Below, painted informally and in broad strokes, are some questions from my research.

### Computing for and with chaotic dynamics

Because of its high sensitivity to perturbations, the state of a chaotic system cannot be predicted accurately even when its (deterministic) governing equations are known. This is because the initial condition is, in practice, never known exactly. Furthermore, the governing equations can only be solved inexactly, leading to growing numerical errors. The resulting solution is only a *pseudo*-orbit of the true governing equations. This is also the case when there are – in practice, inevitable – modeling errors in the governing equations. We are interested in two practically meaningful quantities for which mathematical characterizations are known. The presence of chaos makes existing algorithms, developed to estimate these quantities in nonchaotic systems, invalid or infeasible. Thus, it may seem that, in the quest for a mathematically rigorous and feasible computation, the abovementioned high sensitivity to perturbations is a dead end. But, ultimately, this defining feature of chaos gives us wings. When we learn to exploit instability to perturbations, we not only compute the desired quantities in a new way, but also encounter other fundamental quantities that are studied in statistics.
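To make the sensitivity concrete, here is a minimal sketch using the logistic map at `r = 4`, a standard toy chaotic system chosen purely for illustration (it is not a system from this research). Two orbits that start `1e-12` apart are iterated side by side. The function name `logistic_orbit` and all numerical values are assumptions.

```python
def logistic_orbit(x0, n, r=4.0):
    """Return [x_0, ..., x_n] for the logistic map x_{k+1} = r x_k (1 - x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two initial conditions that differ by an (assumed) tiny perturbation.
a = logistic_orbit(0.2, 100)
b = logistic_orbit(0.2 + 1e-12, 100)

# Distance between the two orbits at every step.
separation = [abs(x - y) for x, y in zip(a, b)]
```

Inspecting `separation` shows the gap growing from about `1e-12` to the size of the attractor itself, which is why a point prediction of the state loses meaning after a modest number of steps.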

#### How does the long-term behavior of a chaotic system change with system parameters?

The first quantity of interest is the derivative of statistics with respect to system parameters. Even though instantaneous values are highly sensitive to small changes in the system parameters, long-term averages may vary smoothly with parameter changes. For instance, one hopes that some climate components, when averaged over sufficiently large time and spatial scales, respond smoothly to external forcings. Then, these statistical responses are well-defined, even though the response of chaotic processes at shorter spatiotemporal scales may be highly unpredictable. How do we obtain estimates of these derivatives without using classical tangent/adjoint methods that lead to exploding gradients? The theory of linear response, which also appears in statistical mechanics, is well-established for idealized chaotic systems and gives a formula for the derivative. One such estimate that is provably consistent with this formula is the subject of our joint work with Qiqi Wang.
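The exploding gradients mentioned above can be seen in a small sketch of the classical tangent method on the logistic map. This toy is an assumption made only to illustrate the blow-up; it is not claimed to satisfy linear response, and the code is not the estimator from the joint work. The name `tangent_sensitivity` is hypothetical.

```python
def tangent_sensitivity(x0, r, n):
    """Iterate x_{k+1} = r x_k (1 - x_k) together with v_k = d x_k / d r.

    Returns the list [|v_1|, ..., |v_n|].
    """
    x, v = x0, 0.0
    history = []
    for _ in range(n):
        # Chain rule applied to the map; the right-hand side uses the old x and v.
        x, v = r * x * (1.0 - x), x * (1.0 - x) + r * (1.0 - 2.0 * x) * v
        history.append(abs(v))
    return history

growth = tangent_sensitivity(0.2, 4.0, 50)
```

The entries of `growth` increase roughly exponentially with the number of steps, so averaging this instantaneous derivative along an orbit does not yield a usable estimate of how a long-term average responds to the parameter.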

#### How can we estimate the state of a chaotic system given past observations?

The second quantity is the object of *Bayesian filtering*: the sequence of filtering distributions. In this setting, we have a time series of observations of the chaotic orbits. That is, through experiments or numerical simulations, we collect noisy values of some state functions along orbits. With these observations, the goal of filtering is to estimate the present state. In Bayesian filtering, we are interested in the filtering distributions: the Bayesian posterior distributions of the state conditioned on past observations. Going beyond point estimates and quantifying uncertainty is especially useful in chaotic systems because, as mentioned earlier, even with perfect observations (zero measurement noise), the state can rarely be estimated perfectly.
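As a point of reference, the sketch below approximates the filtering distributions with a standard bootstrap particle filter (importance weighting followed by resampling). This is a textbook baseline swapped in for illustration, not the approach pursued in this research. The logistic map as the dynamics, the Gaussian observation noise `sigma`, and all names are assumptions.

```python
import math
import random

def step(x, r=4.0):
    """One step of the (assumed) chaotic dynamics: the logistic map."""
    return r * x * (1.0 - x)

def bootstrap_update(particles, y, sigma, rng):
    """One filtering step: forecast, weight by the likelihood of y, resample."""
    # Forecast: push every particle through the dynamics.
    forecast = [step(x) for x in particles]
    # Gaussian likelihood of the observation y = x + noise, up to a constant.
    weights = [math.exp(-0.5 * ((y - x) / sigma) ** 2) for x in forecast]
    # Resample with replacement in proportion to the weights.
    return rng.choices(forecast, weights=weights, k=len(forecast))

rng = random.Random(0)
sigma = 0.1
truth = 0.3
particles = [rng.random() for _ in range(500)]  # samples from a uniform prior
for _ in range(20):
    truth = step(truth)
    y = truth + rng.gauss(0.0, sigma)  # noisy observation of the state
    particles = bootstrap_update(particles, y, sigma, rng)

posterior_mean = sum(particles) / len(particles)
```

After each update, `particles` is an equally weighted sample that approximates the filtering distribution at that time, so both a point estimate and a measure of its spread can be read off directly.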

The filtering distributions are repeatedly updated via Bayesian inference each time a new observation is available. In practice, we require an efficient method to sample from the filtering distributions. One way to accomplish this is via *measure transport*, that is, the transformation of samples from a source distribution into samples from a target distribution. Here, we must repeatedly compute transport maps that take samples from the prior to the posterior of each Bayesian update. How can we exploit knowledge about the underlying chaotic attractor, particularly that which can be computed from numerical simulations of the dynamics, to compute a new kind of transport map?

### Computational methods for measure transport

As mentioned before, filtering is an iterative application of Bayesian inference. We are interested more broadly in computational methods for inference based on measure transport. That is, we need to sample from a typically non-Gaussian posterior by computing a transformation of samples from the prior. One setting is when the posterior is known up to the normalization constant, and the *score* – the gradient of the log density – can be computed at prior samples. It turns out that we can devise methods that have a connection to KAM theory – classical results about perturbations of certain classes of dynamical systems. We are interested in exploring and exploiting this connection, and more generally the connection between Hamiltonian dynamics and computational methods for sampling and measure transport.
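The reason the score remains available when the normalization constant is unknown is that the constant vanishes under the gradient of the logarithm. The sketch below checks this numerically with a central finite difference on a hypothetical two-component mixture; the density, the `scale` argument standing in for the unknown constant, and the function names are all assumptions.

```python
import math

def log_unnormalized(x, scale=1.0):
    """Log of an (assumed) bimodal density known only up to the factor `scale`."""
    bumps = math.exp(-0.5 * (x - 2.0) ** 2) + math.exp(-0.5 * (x + 2.0) ** 2)
    return math.log(scale * bumps)

def score(x, scale=1.0, h=1e-5):
    """Central finite-difference approximation of d/dx log density."""
    return (log_unnormalized(x + h, scale) - log_unnormalized(x - h, scale)) / (2.0 * h)

# The same point evaluated under two different normalizations.
s_one = score(0.7, scale=1.0)
s_other = score(0.7, scale=123.0)
```

The two values agree up to rounding error, so any method that only needs the score at sample locations can run without ever computing the normalization constant.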

### The dynamics of learning

Training algorithms for optimization problems can be studied as deterministic or stochastic dynamical systems on the space of parameters (such as the weights and biases of a neural network). These dynamics are generally nonlinear and, in the non-convex settings typical in practice, exhibit behavior other than convergence to fixed points. One theoretical mystery of practical importance surrounds the *generalization* of the learning algorithm. That is, how well do the outputs of the algorithm (i.e., the learned neural network function) perform on unseen inputs? In practice, one often finds that, with the data held fixed, increasing the number of parameters far beyond the input size improves generalization. We are interested in finding clues to generalization in the dynamics of learning. These could add to the large body of interesting theoretical results and empirical observations, from the perspectives of optimization and learning theory, toward a better understanding of generalization.
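A one-parameter toy makes the dynamical-systems view tangible. The sketch below treats gradient descent on the non-convex loss `f(w) = (w**2 - 1)**2 / 4` as an iterated map; the loss, the step sizes, and the name `gd_orbit` are assumptions chosen for illustration, not a model of any particular network.

```python
def gd_orbit(w0, lr, n):
    """Iterate w <- w - lr * f'(w) for f(w) = (w**2 - 1)**2 / 4."""
    w, orbit = w0, [w0]
    for _ in range(n):
        w = w - lr * w * (w * w - 1.0)  # f'(w) = w (w^2 - 1)
        orbit.append(w)
    return orbit

small = gd_orbit(0.5, 0.1, 200)  # small step size
large = gd_orbit(0.5, 1.2, 200)  # large step size, same loss and start

# Spread of the last iterates: near zero only if the orbit has settled.
tail = large[-50:]
tail_spread = max(tail) - min(tail)
```

With the small step size the iterates settle at the minimizer `w = 1`. With the large one they stay bounded but keep moving, because the minimizer becomes an unstable fixed point of the map once `lr` exceeds 1.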