Daniel Messenger, Department of Applied Mathematics, University of Colorado Boulder
Data-Driven Model Selection using Weak SINDy with Applications to Spatiotemporal Problems in Biology
The task of identifying governing equations to match observed phenomena is crucial to understanding and predicting the behavior of complex systems for which derivation of models from first principles is not feasible. For spatiotemporal problems in biology, such as morphogenesis, cellular migration, and territory development, mathematical representations of underlying mechanisms are often proposed heuristically. Advancements in data science now allow for validation and improvement of these proposed representations. In particular, the field of data-driven model selection aims to learn appropriate mathematical models from experimental data so that the underlying dynamics can be inferred directly from the learned equations. Recent breakthroughs have involved using experimental data points directly in the computation of trajectories from candidate models. A key challenge in this regard is the approximation of pointwise derivatives from experimental data with high measurement noise and/or low sampling frequency. We present a novel weak formulation of the system discovery problem that replaces numerical differentiation with local integration. This so-called Weak SINDy framework (WSINDy) improves on the standard SINDy algorithm by orders of magnitude, allowing for the discovery of ODEs and PDEs in the presence of significantly higher noise levels than previously reported. In some notoriously challenging problems (Kuramoto-Sivashinsky, Korteweg-de Vries), system identification is possible, with reasonable accuracy in the recovered coefficients, for noise levels upwards of 50%, where the noise level is defined as ||noise||_2 / ||clean data||_2 × 100%, while existing methods fail for noise levels beyond 1%. In addition, in the case of noise-free data, we prove that with suitable test functions WSINDy recovers the correct equations and recovers the coefficients to effectively machine precision (i.e., with error below the error tolerance of the data simulation scheme).
The WSINDy algorithm thus requires no pointwise derivative approximations, noise filtering, or black-box routines, and no knowledge of the noise level. We demonstrate the viability of WSINDy for system identification in biology by applying it to several synthetic test ODEs and PDEs as well as to data from in vivo wound healing experiments. For the wound healing data, we show as a proof of concept that WSINDy produces a PDE that qualitatively matches the experimental spatiotemporal data set.