Quantum Mechanics for Closure of Dynamical Systems
Abstract.
We propose a scheme for data-driven parameterization of unresolved dimensions of dynamical systems based on the mathematical framework of quantum mechanics and Koopman operator theory. Given a system in which some components of the state are unknown, this method involves defining a surrogate system whose time-dependent quantum state determines the fluxes from the unresolved degrees of freedom at each timestep. The quantum state is a density operator on a finite-dimensional Hilbert space of classical observables and evolves over time under an action induced by the Koopman operator. The quantum state also updates with new values of the resolved variables according to a quantum Bayes’ law, implemented via an operator-valued feature map. Kernel methods are utilized to learn data-driven basis functions and represent quantum states, observables, and evolution operators as matrices. The resulting computational schemes are automatically positivity-preserving, aiding in the physical consistency of the parameterized system. We analyze the results of two different modalities of this methodology applied to the Lorenz 63 and Lorenz 96 multiscale systems and show how this approach preserves important statistical and qualitative properties of the underlying chaotic dynamics.
1. Introduction.
Among the foundational problems in the modeling of complex dynamical systems is the question of how to account for fine-grained degrees of freedom which are too computationally complex to model directly. Earth’s climate system is a classical example of a multiscale, multiphysics system where direct numerical simulation of all relevant degrees of freedom is not computationally feasible (now and for the foreseeable future [32]), necessitating the use of subgrid-scale models to represent unresolved degrees of freedom. For example, cloud formation (a highly influential process on climate scales) is in part determined by microscopic chemical processes and turbulent convective motions in the atmosphere across the entire globe. Representations of convective cloud physics within global climate models (GCMs) have thus relied on surrogate models of small-scale processes to approximate their aggregate contribution over the larger spatiotemporal climate scales [44]. This is a methodology known as closure, or parameterization, and besides climate dynamics it finds applications in many disciplines dealing with complex time-dependent phenomena; see, e.g., [79, 29].
In this paper, we present a new framework for closure of dynamical systems that models the unresolved degrees of freedom as a quantum mechanical system. Our approach extends a recently developed operator-theoretic framework for data assimilation, called quantum mechanical data assimilation (QMDA) [36, 33], to the setting of two-way coupling between classical and quantum systems representing the resolved and unresolved dynamics, respectively.
1.1. Parameterization.
Early approaches to parameterization (see, e.g., [1, 34]) were based on low-order bulk formulas representing an average flux from unresolved degrees of freedom to the resolved variables. These formulas are typically constructed using physical reasoning, and feature a small number of parameters that can be tuned, e.g., using observational data, so as to best match the behavior of nature, or a high-resolution reference model, on the resolved scales. More recently, with the advent of the “big data” era, data-driven parameterization schemes have received considerable attention [10, 17, 55, 72, 75, 77, 89, 94]. These approaches employ training data generated by high-resolution simulations to learn models of the unresolved flux terms as functions of the resolved variables. The premise of these machine learning approaches is that by fitting closure models in a high-dimensional hypothesis space one can discover functional relationships between the resolved and unresolved variables that would be difficult to do on the basis of physical reasoning and/or asymptotic analysis alone. Indeed, this approach has been shown to outperform conventional parameterizations in terms of capturing the statistical behavior of the resolved variables in a variety of settings [13, 12, 93, 67, 18].
A common feature of parameterization schemes, including the methods outlined above, is dimension reduction. That is, the contribution of the unresolved variables is made to be a function only of larger-scale variables which can be feasibly simulated or consistently measured. Effectively, this means that the equations governing the parameterized small-scale processes lose whatever independence they had from the resolved processes. In the paper [71], Palmer uses ideas from dynamical systems theory (in particular, the Poincaré–Bendixson theorem) to show that in some circumstances, dimension-reducing parameterizations will necessarily yield a system which is not chaotic. Palmer points out that since the chaotic nature of the weather and climate [85] is a vital aspect of their long-term behavior, a loss of chaotic dynamics could be a confounding factor in long-term statistical fidelity. In other words, the (partial) independence of small-scale processes could be a fundamental component of the qualitative behavior of large-scale processes. Simplifying the small-scale subsystems with functional dependencies exclusively in terms of other variables could thus result in the loss of dynamical complexity in the climate model and substantially change the qualitative behavior of long-term climate predictions.
Using the Lorenz 63 (L63) system [63] as an example, Palmer proposed a solution to this issue in which unresolved scales were parameterized by a stochastic variable rather than a deterministic model. The stochastic variable was the least energetic coordinate of the L63 system in an empirical orthogonal function (EOF) basis and was modeled as Gaussian white noise with variance determined from the full system (making its stochastic behavior mimic, in some sense, the deterministic behavior of the fully defined component). Altering the system to have a stochastic component effectively corresponds to state space augmentation; that is, the state space of the parameterized system now includes the resolved variables and the event space associated with the stochastic variables. As a result, the act of approximating some component of the state space no longer makes the parameterized variable totally determined by other components of the system through a functional relationship, which means there is no longer necessarily an erasure of the chaotic behavior.
More broadly, stochastic parameterization schemes provide a means of restoring the independence between resolved and unresolved degrees of freedom, while maintaining computational tractability, by representing the unresolved degrees of freedom as realizations of a stochastic process [66, 61, 91, 25, 22, 6]. In systems exhibiting timescale separation, a common route to stochastic parameterization is to derive stochastic differential equations (SDEs) for the unresolved variables by taking homogenization limits of the primitive governing equations [73, 68, 54]. Alternatively, and particularly in systems where the unresolved dynamics are not consistent with SDEs, stochastic closure schemes are constructed by parameter inference from training data [83]. Recent works have used ideas from data assimilation [65, 59] to sequentially learn the parameters of stochastic closure models from noisy partial observations [20, 42, 21, 60].
In placing these methods and results in context, it should be kept in mind that stochasticity of the parameterized dynamics is not a necessary ingredient for successful subgrid-scale modeling—rather, it is the act of representing unresolved degrees of freedom by surrogate dynamical models (of either deterministic or stochastic nature) that can lead to improved performance over parameterization schemes based on pure functional relationships with the resolved variables. Indeed, in multiscale dynamical systems exhibiting averaging principles [73, 3], the effective slow dynamics is deterministic. The heterogeneous multiscale method [29] is a general multiscale modeling framework that leverages averaging principles to build microscale models for unresolved variables—in a broad range of applications, these models are deterministic. In atmospheric dynamics, one of the most successful approaches to parameterization, called super-parameterization, involves coupling a coarse atmospheric model with embedded column models of moist convection in each gridbox [45, 84], and these models are again oftentimes (but not always [46]) deterministic.
1.2. Quantum mechanical data assimilation.
QMDA is a technique for sequential data assimilation (filtering) of partially observed dynamical systems [65, 59], combining elements of Koopman operator theory [30] with the Dirac–von Neumann quantum mechanical evolution and measurement axioms [88]. In QMDA, the density operator from quantum mechanics is employed as a generalization of the probability distribution for the system state in Bayesian data assimilation. Moreover, the assimilated observables are represented by self-adjoint multiplication operators, and dynamical evolution of observables takes place under the unitary action of the Koopman operator. When observations are made, the density is updated via a projective von Neumann measurement, which is analogous to the Bayesian analysis step. Thus, under the assumption that the assimilated signal is governed by a classical dynamical system, QMDA comprises a one-way-coupled classical–quantum system, where the classical system influences the state of the quantum system through the observation map but the quantum system does not influence the state of the classical system.
QMDA addresses a number of challenges in classical data assimilation, as it avoids the need for ad hoc Gaussian approximations, does not require diffusion regularization, and automatically preserves sign-definite quantities. Being rooted in linear operator theory, the method also has an asymptotically consistent, data-driven formulation, whereby operators are represented by matrices in a basis learned from time-ordered data using kernel algorithms [7, 35, 8, 27].
1.3. Quantum mechanical closure.
Building on the QMDA framework, we propose a data-driven technique for closure of dynamical systems that employs a quantum mechanical system as a surrogate model of the unresolved degrees of freedom. The surrogate quantum system evolves in tandem with the classical system for the resolved variables via two-way coupling, generalizing the one-way coupling in QMDA. A schematic overview of our approach, which we call quantum mechanical closure (QMCl), is depicted in Figure 1.

At any time \(t_n\), the state of the quantum mechanical system is represented by a density operator \(\rho_n\) acting on a Hilbert space \(H_L\) of (finite) dimension \(L\). The closure terms needed to advance the resolved variables \(x_n\) at time \(t_n\) to their state \(x_{n+1}\) at time \(t_{n+1} = t_n+ \Delta t\) are obtained by evaluation of a quantum mechanical observable \(A\) (i.e., a self-adjoint operator acting on \(H_L\)) on the state \(\rho_n\). Intuitively, the density operator \(\rho_n\) can be thought of as a noncommutative analogue of a classical probability density, and the evaluation of \(A\) on \(\rho_n\) is analogous to the evaluation of an expectation functional in classical probability theory. Thus, the update \(x_n \mapsto x_{n+1}\) given \(\rho_n\) can be thought of as an ensemble update based on the statistical information about the unresolved degrees of freedom encoded in \(\rho_n\). In Palmer’s L63 example, which we will study in section 6, the resolved variables \(x_n\) would be the two most energetic L63 coordinates, and the quantum observable \(A\) would represent the influence of the least energetic coordinate on \(x_n\).
To advance \(\rho_n\) to the quantum state \(\rho_{n+1}\) at time \(t_{n+1}\), we first employ a unitary evolution map, \(\rho_{n} \mapsto \tilde \rho_{n+1}\), induced by the Koopman operator of the dynamical system [30]. Then we update \(\tilde \rho_{n+1} \mapsto \rho_{n+1}\) by conditioning by the resolved variables \(x_{n+1}\) using an operator-valued function that we call a quantum feature map. These steps will be described precisely below, but for now we note that the unitary Koopman evolution can be thought of as being analogous to the forecast step in data assimilation [65, 59]. In this analogy, the Koopman evolution leads to a “prior” state \(\tilde \rho_{n+1}\), which is updated projectively to the “posterior” \(\rho_{n+1}\) given \(x_{n+1}\) analogously to the Bayesian analysis step in data assimilation. In quantum theory, the values of the feature map used here to assimilate the resolved degrees of freedom are known as quantum effects [16]. Quantum effects generalize the notion of events in classical statistics. The joint update \((x_n,\rho_n) \mapsto (x_{n+1},\rho_{n+1})\) represents a closed dynamical system that can be iterated forward to yield an approximation of the original system.
1.4. Contributions.
The principal distinguishing aspects of QMCl are as follows.
1.
We cast the problem of closure of dynamical systems into the framework of operator theory and quantum information theory. Quantum systems are known to exhibit new types of behavior (such as entanglement and superposition) and offer greater capacity to encode information than classical systems [41]. In particular, the dimension of the space of density operators on an \(L\)-dimensional Hilbert space is \(O(L^2)\), whereas the dimension of the space of \(L\)-dimensional probability density vectors is \(O(L)\). Our main premise is that the mathematical framework of quantum mechanics provides a flexible arena for building closure schemes for classical dynamical systems enjoying favorable structure preservation and asymptotic consistency properties.
2.
Positivity preservation. Our procedure for constructing the coupled classical–quantum system depicted in Figure 1 employs, as an intermediate step, an information-preserving embedding of the full dynamical system (i.e., both the resolved and the unresolved degrees of freedom) into an infinite-dimensional quantum system. This system is projected to the finite-dimensional system on the Hilbert space \(H_L\). The process of embedding the classical system into the infinite-dimensional quantum system before projecting into finite dimensions allows us to take advantage of properties of non-abelian (that is, noncommutative) operator spaces to ensure that the parameterization scheme is positivity preserving. That is, positive-definite flux terms from the unresolved variables to the resolved variables in the original system are represented by positive-definite operators in the QMCl-parameterized system (and similarly for negative-definite terms). This addresses an important problem in data-driven subgrid-scale modeling, where failure to preserve the sign of sign-definite quantities (e.g., moisture in an atmospheric circulation model) is recognized as a significant source of bias and numerical instability [93].
3.
Data-driven formulation. QMCl employs a data-driven formulation closely related to that of QMDA, which uses eigenfunctions of kernel integral operators as data-driven basis functions for representation of linear operators [7]. The training data requirements of QMCl are comparable to those of other data-driven closure schemes and typically comprise time series of the resolved variables and the unresolved fluxes. In what follows, we explore scenarios where training data for building the basis are either directly available from integrations of the full model (as would be the case, e.g., when coarsening a high-resolution climate model [12, 13, 93]), or are constructed from the resolved variables (as would be the case, e.g., when learning residuals of imperfect models from observations of nature). In the latter scenario, we use delay-coordinate embedding techniques [81] to improve the richness of our approximation models. Using techniques for pointwise and spectral approximation of linear operators [19, 90], the data-driven QMCl models converge in the large-data limit to data-independent finite-dimensional models on the Hilbert space \(H_L\). Increasing the dimension parameter \(L\) allows the finite-dimensional models to capture increasingly complex subgrid-scale dynamics.
1.5. Plan of the paper.
In section 2, we state the parameterization problem under study. In section 3, we describe the construction of the quantum mechanical system that we will use to model the unresolved degrees of freedom. This construction largely parallels the QMDA approach from [33]. In section 4, we describe the QMCl parameterization scheme based on the quantum system from section 3. This is followed by a description of the data-driven implementation of our approach in section 5. In sections 6 and 7, we present applications of QMCl to the L63 and L96 multiscale systems, respectively. Section 8 contains a discussion and perspectives on future work. Appendix A gives an overview of the axioms of quantum mechanics. Appendices B and C contain supplementary material on the discretization procedure used to obtain the finite-dimensional QMCl scheme and its data-driven approximation, respectively. Details on numerical implementation and pseudocode are collected in Appendix D. A MATLAB implementation of the QMCl parameterization model can be found in the repository https://github.com/davidfreeman97/QMCl.
2. Problem statement.
Consider a discrete-time dynamical system with state space \(\Omega\) and evolution map \(\Phi : \Omega \to \Omega\) preserving a probability measure \(\mu\). Suppose that the state space is decomposable as the product \(\Omega ={\mathcal X} \times{\mathcal Y}\), where \(\mathcal X\) and \(\mathcal Y\) are state spaces for the resolved and unresolved degrees of freedom, respectively. Throughout the paper, we will assume that \(\Omega\) is a “nice” (separable, completely metrizable) topological space, and \(\mu\) is a Borel measure with compact support. Additional assumptions will be introduced as needed.
We view the \(\Phi\) dynamics on \(\Omega\) as being inaccessible to direct simulation, e.g., due to the dimension of \(\mathcal Y\) being prohibitively large, lack of complete knowledge of the evolution map \(\Phi\), or a combination thereof. The general problem of parameterization is to build a surrogate dynamical system \(\tilde \Phi : \tilde \Omega \to \tilde \Omega\) on a state space \(\tilde \Omega ={\mathcal X} \times \tilde{{\mathcal Y}}\) which is (i) feasible to integrate and (ii) compatible with the evolution of the resolved degrees of freedom in \(\mathcal X\). Here, \(\tilde{{\mathcal Y}}\) is the state space of a surrogate model of the unresolved degrees of freedom in \(\mathcal Y\).
Let \(P_{{\mathcal X}} : \Omega \to{\mathcal X}\) and \(\tilde P_{{\mathcal X}} : \tilde \Omega \to{\mathcal X}\) denote the canonical projection maps onto \(\mathcal X\) from \(\Omega\) and \(\tilde \Omega\), respectively. Let also \(\alpha : \Omega \to \tilde \Omega\) be a map that assigns a dynamical state in \(\tilde \Omega\) to each dynamical state in \(\Omega\). A parameterized system on \(\tilde \Omega\) which is consistent with the original system on \(\Omega\) would satisfy the following commutative diagram:

[Diagram (2.1): the dynamics \(\Phi : \Omega \to \Omega\) and \(\tilde \Phi : \tilde \Omega \to \tilde \Omega\), the map \(\alpha : \Omega \to \tilde \Omega\), and the projections \(P_{{\mathcal X}}\), \(\tilde P_{{\mathcal X}}\) form a commutative square, i.e., \(\tilde P_{{\mathcal X}} \circ \tilde \Phi \circ \alpha = P_{{\mathcal X}} \circ \Phi\).]

This diagram represents the fact that for initial conditions \(\tilde \omega = \alpha (\omega ) \in \tilde \Omega\) that are consistent with \(\omega \in \Omega\) in terms of the resolved variables, i.e., \(\tilde P_{{\mathcal X}}(\tilde \omega ) = P_{{\mathcal X}}(\omega )\), the \(\tilde \Phi\) dynamics should produce outcomes consistent with \(\Phi\) (again in terms of the resolved variables), i.e., \(\tilde P_{{\mathcal X}} \circ \tilde \Phi (\tilde \omega ) = P_{{\mathcal X}} \circ \Phi (\omega )\). In practice, such a strong form of consistency is seldom achievable, and one seeks to identify an appropriate state space \(\tilde{{\mathcal Y}}\) and dynamics \(\tilde \Phi\) such that (2.1) holds as closely as possible in some sense, e.g., in terms of low-order statistics such as probability distributions and time-autocorrelation functions of the resolved variables with respect to the invariant measure.
In general, it is advantageous to include any partial knowledge about the \(\Phi\) dynamics in the construction of \(\tilde \Phi\). To incorporate that possibility in our formalism, we introduce (without loss of generality) a space \(\mathcal Z\) and maps \(\phi :{\mathcal X} \times \mathcal Z \to{\mathcal X}\) and \(Z :{\mathcal Y} \to{\mathcal Z}\) such that
\begin{align*} P_{{\mathcal X}} \circ \Phi (\omega ) = \phi (x, Z(y)), \quad \omega = (x,y). \end{align*}
Here, \(\phi\) captures the functional form of the evolution map for the resolved variables conditioned on a function \(Z\) (a “flux term”) of the unresolved variables. Assuming that \(\phi\) is known, we can construct \(\tilde \Phi\) as a map of the form
\begin{align} \tilde \Phi ( x, \tilde y) = (\phi (x, \tilde Z(\tilde y )), \tilde \psi (x, \tilde y)), \end{align}
(2.2)
where \(\tilde \psi :{\mathcal X} \times \tilde{{\mathcal Y}} \to \tilde{{\mathcal Y}}\) represents the evolution of the surrogate unresolved variables \(\tilde y\) given the resolved variables \(x\), and \(\tilde Z : \tilde{{\mathcal Y}} \to{\mathcal Z}\) is a function acting as a surrogate for \(Z\). If \(\phi\) is not known, we replace (2.2) by
\begin{align} \tilde \Phi ( x, \tilde y) = (\tilde \phi (x, \tilde Z(\tilde y )), \tilde \psi (x, \tilde y)), \end{align}
(2.3)
where \(\tilde \phi :{\mathcal X} \times{\mathcal Z} \to{\mathcal X}\) is an approximation of \(\phi\).
2.1. Examples.
We outline how some of the commonly used parameterization schemes fit within the framework described above.
2.1.1. Deterministic functional closure.
In closure schemes modeling the unresolved variables through functional relationships with the resolved variables, we formally set \(\tilde{{\mathcal Y}} ={\mathcal X}\), so that the flux term \(\tilde Z :{\mathcal X} \to{\mathcal Z}\) becomes a function of the resolved variables. The map \(\tilde \psi (x,\tilde y)\) is also formally set to a copy of \(\phi (x,\tilde Z(\tilde y))\), so it need not be explicitly retained in calculations. A number of the classical parameterization schemes (see, e.g., [1, 34] in geophysical fluid dynamics) derive \(\tilde Z\) heuristically using prior physical knowledge of the problem at hand. For certain classes of systems exhibiting timescale separation between the \(x\) and \(y\) variables, \(\tilde Z\) can be systematically obtained from the governing equations of the original system using asymptotic analysis techniques such as averaging [73]. In data-driven approaches, the function \(\tilde Z\) is constructed via a supervised learning algorithm using \((x,Z(y))\) pairs as training data; see, e.g., [13, 12, 93].
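To make the data-driven route concrete, the following is a minimal sketch (not the scheme of any specific reference) that fits a deterministic closure \(\tilde Z\) by regularized least squares on \((x, Z(y))\) training pairs; the feature map `_features` and the regularization strength `reg` are illustrative placeholders for whatever hypothesis space a given application would use.

```python
import numpy as np

def fit_deterministic_closure(x_train, z_train, reg=1e-6):
    """Fit a linear-in-features closure z ~ f(x) @ W by ridge regression.

    x_train: N x d_x array of resolved states; z_train: N x d_z array of
    matched flux values Z(y). Returns a callable x -> estimated flux.
    """
    f = _features(x_train)                          # N x m feature matrix
    # Ridge-regularized normal equations: W = (f^T f + reg I)^{-1} f^T z.
    W = np.linalg.solve(f.T @ f + reg * np.eye(f.shape[1]), f.T @ z_train)
    return lambda x: _features(np.atleast_2d(x)) @ W

def _features(x):
    # Quadratic polynomial features [1, x, x^2], a simple stand-in for the
    # high-dimensional hypothesis spaces used in practice.
    return np.hstack([np.ones((x.shape[0], 1)), x, x**2])
```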
2.1.2. Stochastic closure.
To treat stochastic parameterization schemes, we set \(\tilde{{\mathcal Y}}\) to a probability sample space and \(\tilde Z : \tilde{{\mathcal Y}} \to{\mathcal Z}\) to a random variable, and we choose \(\tilde \psi\) such that \(\{ \tilde \Phi^n : n \in{\mathbb N}\}\) defines a stochastic process over \(\tilde \Omega\). As a concrete example, we may understand Palmer’s Gaussian-distributed stochastic closure of the L63 system [71] in this framework as follows.
In a coordinate system obtained via empirical orthogonal function (EOF) analysis, the L63 governing equations with the standard parameter values become
\begin{align} \begin{aligned} \dot{a_1} & = v_1(a_1,a_2,a_3) := 2.3\,a_1 - 6.2\,a_3 - 0.49\, a_1 a_2 - 0.57\, a_2 a_3, \\ \dot{a_2} & = v_2(a_1,a_2,a_3) := -62 - 2.7\,a_2+0.49\, a_1^2 - 0.49\, a_3^2+0.14\, a_1 a_3, \\ \dot{a_3} & = v_3(a_1,a_2,a_3) := -0.63\,a_1 - 13\, a_3+0.43\, a_1 a_2+0.49\, a_2 a_3, \end{aligned} \end{align}
(2.4)
where the coordinates \(a_1\), \(a_2\), and \(a_3\) have decreasing variance. Let \(\Omega = \mathbb R^3\) and \(\Phi^t : \Omega \to \Omega\) with \(t \in \mathbb R\) be the flow generated by (2.4). In this example, we consider the discrete-time system \(\Phi : \Omega \to \Omega\) with \(\Phi = \Phi^{\Delta t}\) induced by the continuous-time flow at a fixed timestep \(\Delta t\) as the “true” dynamics. Moreover, we let \(\Omega ={\mathcal X} \times{\mathcal Y}\) with \(x= (a_1,a_2) \in{\mathcal X} \equiv \mathbb R^2\) and \(y = a_3 \in{\mathcal Y} \equiv \mathbb R\) represent the resolved and unresolved variables, respectively. The corresponding flux term is \(Z :{\mathcal Y} \to{\mathcal Z}\) with \({\mathcal Z} = \mathbb R\) and \(Z(y) = y\). As approximate resolved dynamics, we employ the map \(\tilde \phi :{\mathcal X} \times{\mathcal Z} \to{\mathcal X}\) obtained by a forward Euler discretization of (2.4) at the timestep \(\Delta t\); that is,
\begin{align*} \tilde \phi (x,z) = x + (v_1(x,z), v_2(x,z)) \, \Delta t. \end{align*}
In Palmer’s approach, the evolution of the \(a_3\) variable is approximated by an i.i.d. Gaussian process. To represent this process within the framework described above, we let \(\tilde{{\mathcal Y}} = \mathbb R^{{\mathbb N}}\) be a space of \(\mathbb R\)-valued sequences equipped with a \(\sigma\)-algebra \(\Sigma_{\tilde{{\mathcal Y}}}\) of cylinder sets and a shift-invariant Gaussian measure \(\nu : \Sigma_{\tilde{{\mathcal Y}}} \to [0,1]\). That is, \(\nu\) has the properties that (i) \(\nu \circ T^{-1} = \nu\), where \(T : \tilde{{\mathcal Y}} \to \tilde{{\mathcal Y}}\) is the shift map, \(T(\tilde y_0,\tilde y_1, \ldots ) = (\tilde y_1,\tilde y_2,\ldots )\); and (ii) \(\tilde Z_0, \tilde Z_1,\ldots\) with \(\tilde Z_n = u \circ T^n\) are i.i.d. Gaussian random variables. Here, \(u: \tilde{{\mathcal Y}} \to{\mathcal Z}\) projects onto the first component, \(u(\tilde y_0, \tilde y_1, \ldots ) = \tilde y_0\), and the measure \(\nu\) is chosen such that \(\tilde Z_n\) has variance equal to the variance of \(a_3\). With these definitions, the parameterized dynamics from (2.3) becomes \(\tilde \Phi (x,\tilde y) = (\tilde \phi (x, \tilde Z(\tilde y)), \tilde \psi (x,\tilde y))\), with \(\tilde Z = u\) and \(\tilde \psi (x,\tilde y) = T(\tilde y)\).
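As an illustration of this construction, the sketch below integrates the resolved EOF variables \((a_1,a_2)\) with forward Euler, drawing the surrogate flux from an i.i.d. Gaussian at each step; the variance `var_a3` is assumed to have been estimated from a trajectory of the full system (2.4), and the helper names are our own.

```python
import numpy as np

def v12(a1, a2, a3):
    """Resolved tendencies v1, v2 of the EOF-coordinate L63 system (2.4)."""
    v1 = 2.3 * a1 - 6.2 * a3 - 0.49 * a1 * a2 - 0.57 * a2 * a3
    v2 = -62.0 - 2.7 * a2 + 0.49 * a1**2 - 0.49 * a3**2 + 0.14 * a1 * a3
    return v1, v2

def palmer_closure(x0, var_a3, dt, n_steps, seed=0):
    """Forward-Euler integration of (a1, a2) with a3 replaced by i.i.d.
    N(0, var_a3) draws, mimicking Palmer's stochastic parameterization."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    for n in range(n_steps):
        z = rng.normal(0.0, np.sqrt(var_a3))   # surrogate flux, tilde Z_n
        v1, v2 = v12(x[n, 0], x[n, 1], z)
        x[n + 1] = x[n] + dt * np.array([v1, v2])
    return x
```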
Palmer observed that with this approach the parameterized \(x\) dynamics recovers the qualitative features of the \((a_1,a_2)\) variables of the full L63 system, including the characteristic lobes of the Lorenz attractor projected onto the \((a_1,a_2)\) plane; see [71, Figure 2] and Figure 7(a) below. In particular, since the state space \(\tilde \Omega\) of the stochastically parameterized system is formally infinite-dimensional, the method is able to overcome theoretical limitations of a deterministic functional closure on \({\mathcal X} = \mathbb R^2\), which by the Poincaré–Bendixson theorem cannot exhibit chaos in continuous time under smooth dynamics.
In section 6, we will use Palmer’s L63 closure scheme as a reference example to assess the performance of QMCl. Certain classes of SDE schemes derived via homogenization principles [73] can similarly be understood as a state-space augmentation of the resolved variables space \(\mathcal X\) to include the sample space of a Brownian motion driving the parameterized system.
3. Quantum mechanical representation of classical dynamics.
In this section, we construct the quantum mechanical system that we will use to model the unresolved degrees of freedom in QMCl. We work within the Dirac–von Neumann formulation of quantum mechanics, which associates to every quantum system a Hilbert space and represents its states and observables as density operators and self-adjoint operators, respectively. See Appendix A for an overview of relevant material from quantum theory, as well as one of the many textbooks in the literature; e.g., [88, 80]. Our goal is to build a quantum system on a finite-dimensional Hilbert space, \(H_L\), and use the quantum state space of this system, denoted as \(Q(H_L)\), as the state space of the surrogate dynamical model for the unresolved degrees of freedom, i.e., \(\tilde{{\mathcal Y}} = Q(H_L)\) in the notation of section 2. To that end, we follow the QMDA approach described in [33]. In this section, we give an overview of this procedure, referring the reader to [33] and Appendix B for further details.
The first step in our construction is an embedding of the classical dynamics on \(\Omega\) into a quantum system associated with an infinite-dimensional Hilbert space, \(H\). Under this embedding, classical observables (functions of the state) are represented by quantum mechanical observables, classical probability densities are represented by density operators, and dynamical evolution takes place under the induced action of the Koopman operator. We describe this embedding in subsection 3.2, following a brief overview of relevant definitions from Koopman and transfer operator theory in subsection 3.1. In subsection 3.3, we construct finite-dimensional quantum systems by discretization (i.e., finite-dimensional projection) of the infinite-dimensional system, choosing \(H_L \subset H\) as an \(L\)-dimensional space in a nested family of finite-dimensional subspaces associated with an orthonormal basis of \(H\). In subsection 3.4, we make explicit our choice of basis leading to the construction of \(H_L\) for QMCl applications.
3.1. Koopman and transfer operators.
The measure-preserving dynamical system \(\Phi : \Omega \to \Omega\) induces groups of linear operators \(U^n : L^p(\mu ) \to L^p(\mu )\) on the \(L^p\) spaces associated with the invariant probability measure \(\mu\), acting by composition with the dynamical flow, \(U^n f=f \circ \Phi^n\) with \(n \in \mathbb Z\). These operators are known as Koopman operators [57, 30] and preserve the \(L^p(\mu )\) norm for any \(p \in [1, \infty ]\) since \(\mu\) is an invariant measure under \(\Phi\). In the Hilbert space case, \(H := L^2(\mu )\), the Koopman operators are unitary, i.e., \((U^n)^* = U^{-n}\) for any \(n \in \mathbb Z\). We define the inner product of \(H\) as \(\langle f, g \rangle := \int_\Omega f^* g \, d\mu\) and use the notations \(\lVert f \rVert_{L^p(\mu )} := (\int_\Omega \lvert f \rvert^p \, d\mu )^{1/p}\) for \(p \in [1,\infty )\) and \(\lVert f \rVert_{L^\infty (\mu )} = \lim_{p\to \infty } \lVert f \rVert_{L^p(\mu )}\) for the standard \(L^p(\mu )\) norms.
Identifying \(L^p(\mu )\), \(p \in (1,\infty ]\), with the dual of \(L^q(\mu )\), \(q \in [1,\infty )\), with \(\frac{1}{p} + \frac{1}{q} = 1\), we have that \(U^n : L^p(\mu ) \to L^p(\mu )\) is the adjoint (dual) of the operator \(P^n : L^q(\mu ) \to L^q(\mu )\) with \(P^n f=f \circ \Phi^{-n}\), known as the transfer operator [4]. If we further identify \(L^q(\mu )\) as the space of finite Borel measures on \(\Omega\) with densities in \(L^q(\mu )\), denoted here as \(M_q(\mu )\), the transfer operator can be identified with the pushforward map \(\Phi^n_* : M_q(\mu ) \to M_q(\mu )\), where \(\Phi^n_* \nu = \nu \circ \Phi^{-n}\). That is, if \(\nu \in M_q(\mu )\) has density \(\varrho = \frac{d\nu }{ d\mu } \in L^q(\mu )\), then the measure \(\Phi^n_* \nu\) has density \(\frac{d \Phi^n_* \nu }{d\mu } = P^n \varrho\).
Koopman and transfer operators are central analytical tools in ergodic theory [4, 30]. In recent years, they have found widespread use in data-driven analysis and forecasting methodologies for dynamical systems; see, e.g., [28, 69, 78, 92, 14, 56, 8]. We will be addressing aspects of data-driven approximation in section 5, but for the remainder of this section our focus will be on quantum mechanical systems induced by the unitary Koopman operators on \(H\) and its subspaces.
3.2. Embedding into an infinite-dimensional quantum system.
Consider the space of bounded operators on the Hilbert space \(H\), denoted as \(B(H)\). Equipped with the operator norm, \(\lVert A\rVert := \sup_{f \in H \setminus \{0\}} \frac{\lVert Af\rVert_H}{\lVert f\rVert_H}\), and operator adjoint, \(A^*\), this space has the structure of a non-abelian von Neumann algebra; see Appendix A.4. The self-adjoint elements of \(B(H)\) constitute the space of bounded quantum observables on \(H\). Meanwhile, the space of quantum states of \(H\), denoted as \(Q(H)\), is the set of positive, trace-class operators of unit trace, known as density operators. Every density operator \(\rho \in Q(H)\) induces a bounded linear functional \(\mathbb E_\rho : B(H) \to \mathbb C\) such that \(\mathbb E_\rho A = \operatorname{tr}(\rho A)\). When \(A\) is self-adjoint, \(\mathbb E_\rho A\) is a real number corresponding to the expected value of the quantum observable \(A\) when the system is in state \(\rho\).
The unitary Koopman evolution group \(\{ U^n : H \to H \}_{n\in \mathbb Z}\) acts on quantum states and observables through the conjugation operators \(\mathcal P^n : Q(H) \to Q(H)\) and \(\mathcal U^n : B(H) \to B(H)\), respectively, defined as \(\mathcal P^n \rho = U^{n*}\rho U^n\) and \(\mathcal U^n A=U^n A U^{n*}\). These operators form dual pairs, in the sense that \(\mathbb E_{\mathcal P^n \rho } A = \mathbb E_\rho (\mathcal U^n A)\) for any \(\rho \in Q(H)\) and \(A \in B(H)\).
To embed the classical dynamics represented by the Koopman and transfer operators from subsection 3.1 into the quantum system just described, we restrict our attention to classical observables in the space \(L^\infty (\mu ) \subset H\). This space has the structure of an abelian von Neumann algebra under pointwise function multiplication and complex conjugation (see subsection A.4) and can be naturally embedded into \(B(H)\) through its regular representation. The latter is defined as the map \(\pi : L^\infty (\mu ) \to B(H)\), where \(\pi f\) is the multiplication operator by \(f\), i.e., \((\pi f) g=fg\) for all \(g \in H\). Note that \(\pi f\) is self-adjoint if and only if \(f\) is real-valued, and \(\pi f\) is a positive operator if and only if \(f \geq 0\) \(\mu\)-a.e. Moreover, the spectrum of the multiplication operator \(\pi f\) is equal to the essential range of \(f\), which corresponds to the values of \(f\) that have nonzero probability of occurring with respect to \(\mu\).
One readily verifies that \(\pi\) intertwines the Koopman operator \(U^n : L^\infty (\mu ) \to L^\infty (\mu )\) with its quantum mechanical counterpart \(\mathcal U^n : B(H) \to B(H)\); that is, we have \(\mathcal U^n \circ \pi = \pi \circ U^n\), or equivalently, the square formed by \(U^n\), \(\mathcal U^n\), and \(\pi\) commutes.
Similarly, letting \(P(\mu )\) denote the space of probability densities in \(L^1(\mu )\), we have an embedding \(\Gamma : P(\mu ) \to Q(H)\) that maps a probability density \(p\) to the density operator that projects along the square root of \(p\), i.e.,
\begin{align} \Gamma (p) = \langle \sqrt p, \cdot \rangle \sqrt p. \end{align}
(3.1)
Note that \(\sqrt p\) is a unit vector in \(H=L^2(\mu )\) (since \(p\) is a probability density in \(L^1(\mu )\)), and \(\Gamma (p)\) is a rank-1 operator. Such states are known as pure states and are to be distinguished from mixed states where the rank of the corresponding density operators is greater than 1. Analogously to \(\pi\), the map \(\Gamma\) is compatible with the evolution of probability densities under the transfer operator \(P^n : P(\mu ) \to P(\mu )\) and the induced operator \(\mathcal P^n : Q(H) \to Q(H)\); that is, we have \(\mathcal P^n \circ \Gamma = \Gamma \circ P^n\).
Given a probability density \(p\in L^1(\mu )\), let \(\mathbb E_p : L^\infty (\mu ) \to \mathbb C\) denote the expectation functional \(\mathbb E_p f = \int_\Omega f p \, d\mu\). Then, for any classical observable \(f \in L^\infty (\mu )\), probability density \(p \in P(\mu )\), and number of timesteps \(n \in{\mathbb N}\), the following classical–quantum compatibility relationships hold:
\begin{align} \mathbb E_p(U^n f) \equiv \mathbb E_{P^np} f = \mathbb E_{\Gamma (p)}(\mathcal U^n(\pi f)) \equiv \mathbb E_{\mathcal P^n(\Gamma (p))}(\pi f). \end{align}
(3.2)
Besides embedding dynamics, for the purposes of our closure scheme we will need a quantum mechanical representation of Bayesian conditioning of probability densities provided by quantum effects (see Appendix A.3). An effect in the operator algebra \(B(H)\) is any element \(e \in B(H)\) satisfying \(0 \leq e \leq \operatorname{Id}\). We denote the set of such effects as \(\mathcal E(H)\). Much like a characteristic function \(\chi_S\) of a measurable set \(S \subseteq \Omega\) represents a classical event as an element of the abelian algebra \(L^\infty (\mu )\) satisfying \(0 \leq \chi_S \leq 1\), an effect \(e\in \mathcal E(H)\) can be thought of as a quantum event. Classically, given a probability density \(p \in L^1(\mu )\) such that \(\mathbb E_p \chi_S \gt 0\), we can obtain the conditional density \(p|_{\chi_S}\) through the formula
\begin{align} p|_{\chi_S} = \frac{\chi_S p}{\mathbb E_p \chi_S} \equiv \frac{\sqrt{\chi_S} p \sqrt{\chi_S}}{\mathbb E_p \chi_S}. \end{align}
(3.3)
Given an effect \(e \in \mathcal E(H)\) and a density operator \(\rho \in Q(H)\) such that \(\mathbb E_\rho e \gt 0\), the quantum analogue of (3.3) is
\begin{align} \rho |_e = \frac{\sqrt{e} \rho \sqrt{e}}{\mathbb E_\rho e}, \end{align}
(3.4)
where \(\sqrt{e} \in B(H)\) is the (unique) positive square root of \(e\). As one can directly verify, our quantum mechanical representation of probability densities and classical observables is compatible with these conditioning operations; that is,
\begin{align} \Gamma (p|_{\chi_S}) = \Gamma (p)|_{\pi \chi_S}. \end{align}
(3.5)
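The compatibility relation (3.5) is easy to verify numerically in a toy discrete setting. The sketch below takes \(\mu\) to be the uniform measure on \(M\) atoms (so that functions are vectors and multiplication operators are diagonal matrices), builds \(\Gamma (p)\), and checks that conditioning on the effect \(\pi \chi_S\) via (3.4) reproduces \(\Gamma (p|_{\chi_S})\); all numerical choices here are illustrative.

```python
import numpy as np

# Toy setting: mu uniform on M atoms; the L^2(mu) inner product carries a
# 1/M weight, absorbed below by the rescaling sqrt(p / M).
rng = np.random.default_rng(1)
M = 8
p = rng.random(M) + 0.1
p /= p.mean()                                   # probability density: mean 1 under mu
chi = (np.arange(M) < 5).astype(float)          # indicator chi_S of an event S

# Classical Bayes rule (3.3): p|chi = chi * p / E_p[chi].
p_cond = chi * p / np.mean(chi * p)

# Quantum side: Gamma(p) is the rank-1 projection along sqrt(p).
sq = np.sqrt(p / M)                             # unit vector representing sqrt(p)
rho = np.outer(sq, sq)                          # pure state Gamma(p)
e = np.diag(chi)                                # effect pi(chi_S); sqrt(e) = e here
rho_cond = e @ rho @ e / np.trace(e @ rho @ e)  # quantum Bayes rule (3.4)

# Compatibility (3.5): conditioning commutes with the embedding Gamma.
sq_cond = np.sqrt(p_cond / M)
assert np.allclose(rho_cond, np.outer(sq_cond, sq_cond))
```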
3.3. Discretization.
Let \(\{ \phi_0, \phi_1, \ldots \}\) be an orthonormal basis of \(H\), and let \(H_L \subset H\) be the \(L\)-dimensional subspace defined as
\begin{align} H_L = \operatorname{span}\{ \phi_0, \ldots, \phi_{L-1} \}. \end{align}
(3.6)
Deferring the task of choosing \(\{\phi_l\}\) to subsection 3.4, we construct finite-dimensional quantum systems on \(H_L\) by discretization (i.e., projection) of the infinite-dimensional system on \(H\) as follows.
Let \(\Pi_L : H \to H\) be the orthogonal projection with \(\operatorname{ran} \Pi_L = H_L\). When convenient, we will view \(\Pi_L\) as a map into \(H_L\) without change of notation. The projection \(\Pi_L\) induces a projection \(\boldsymbol \Pi_L : B(H) \to B(H)\) defined as \(\boldsymbol \Pi_L A = \Pi_L A \Pi_L\). The range of \(\boldsymbol \Pi_L\) can be canonically identified with \(B(H_L)\), i.e., the space of linear maps on \(H_L\), so we can consider \(\boldsymbol \Pi_L : B(H) \to B(H_L)\) as a map from (bounded) quantum observables on \(H\) to quantum observables on \(H_L\) (which are necessarily bounded since \(H_L\) is finite-dimensional). An important property of \(\boldsymbol \Pi_L\) is that it is a positive map, i.e., it maps positive operators in \(B(H)\) to positive operators in \(B(H_L)\). This is in contrast to \(\Pi_L\), which in general does not map positive functions to positive functions.
As a von Neumann algebra, \(B(H_L)\) is isomorphic to the \(L^2\)-dimensional algebra of \(L\times L\) complex matrices, which we denote as \(\mathbb M_L\). Such an isomorphism \(\boldsymbol \beta_L : B(H_L) \to \mathbb M_L\) is noncanonical since it depends on a choice of basis for \(H_L\), but for our purposes a natural choice is to define \(\boldsymbol \beta_L\) as the map that yields the matrix representation of operators in \(B(H_L)\) with respect to the \(\{ \phi_l \}\) basis from (3.6); that is, \(\boldsymbol \beta_L A = \boldsymbol A\), where \(\boldsymbol A = [A_{ij} ]_{i,j=0}^{L-1}\) is the \(L \times L\) matrix with elements \(A_{ij} = \langle \phi_i, A \phi_j \rangle\).
Using the quantum representation of classical observables \(\pi : L^\infty (\mu ) \to B(H)\) from subsection 3.2, we define a projected representation \(\pi_L : L^\infty (\mu ) \to B(H_L)\) and its matrix-valued counterpart \(\boldsymbol \pi_L : L^\infty (\mu ) \to \mathbb M_L\) as \(\pi_L = \boldsymbol \Pi_L \circ \pi\) and \(\boldsymbol \pi_L = \boldsymbol \beta_L \circ \pi_L\), respectively. By positivity of \(\pi\), \(\boldsymbol \Pi_L\), and \(\boldsymbol \beta_L\), the maps \(\pi_L\) and \(\boldsymbol \pi_L\) are both positive; i.e., they map positive functions to positive operators and positive matrices, respectively. For a real-valued element \(f \in L^\infty (\mu )\), \(\pi_L f\) is a finite-rank symmetric operator with a spectrum \(\{ a_0, \ldots, a_{J-1} \} \subset \mathbb R\) of real eigenvalues that can be loosely thought of as a “discretization” of the essential range of \(f\); see Appendix B.2.
It is important to note that for a general element \(f \in L^\infty (\mu )\), the matrix representation \(\boldsymbol \pi_L f\) is not diagonal; that is, the range of \(\boldsymbol \pi_L\) is not included in an abelian subalgebra of \(\mathbb M_L\) (every such subalgebra would only contain diagonal matrices). This is in contrast to \(\pi\), which, being an algebra homomorphism, maps \(L^\infty (\mu )\) to an abelian subalgebra of \(B(H)\) (consisting of multiplication operators). Thus, as a result of discretization, our representation of classical observables by the finite-dimensional quantum system exhibits non-abelian behavior, which is an intrinsically quantum mechanical property.
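As a concrete illustration (a minimal sketch, with the sampling assumptions stated in the docstring), the matrix elements \(\langle \phi_i, f \phi_j \rangle\) of \(\boldsymbol \pi_L f\) can be approximated by averaging over \(\mu\)-distributed samples; for nonconstant \(f\) the resulting matrix is symmetric but generically non-diagonal, exhibiting precisely the non-abelian behavior just described.

```python
import numpy as np

def multiplication_operator_matrix(phi_vals, f_vals):
    """Matrix representation of pi_L f in a real orthonormal basis {phi_l}.

    phi_vals: N x L array with phi_vals[n, l] = phi_l(omega_n) at
              mu-distributed sample points omega_n
    f_vals:   length-N array of f(omega_n)
    Returns the L x L matrix with entries <phi_i, f phi_j> approximated by
    the average (1/N) sum_n phi_i(omega_n) f(omega_n) phi_j(omega_n).
    """
    N = phi_vals.shape[0]
    return phi_vals.T @ (f_vals[:, None] * phi_vals) / N
```

For real \(f\) the output is symmetric up to floating-point error, and its eigenvalues play the role of the discretized range \(\{a_0, \ldots, a_{J-1}\}\) mentioned above.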
Using the maps \(\boldsymbol \Pi_L\) and \(\boldsymbol \beta_L\), quantum states, effects, and evolution operators are similarly discretized via projection onto \(H_L\) and represented as \(L \times L\) matrices. Discussion of the method of discretization for these objects, as well as associated consistency results, can be found in Appendix B.
3.4. Choice of basis.
Recall that the family of Hilbert spaces \(H_1 \subset H_2 \subset \cdots\) from subsection 3.3 is determined from an orthonormal basis \(\{ \phi_0, \phi_1, \ldots \}\) of \(H\). In choosing this basis, one must keep in mind that in applications the invariant measure \(\mu\) that defines the inner product of \(H\) is typically supported on an unknown, nonsmooth subset of state space (e.g., a fractal attractor). In such cases, defining the basis vectors \(\phi_l\) through explicit closed-form expressions is generally not possible, so we rely instead on a basis construction through eigendecomposition of kernel integral operators. This approach was introduced in the diffusion forecast technique [7], and was later used in related methods for Koopman operator approximation [40, 35, 27], as well as in QMDA [36, 33]. For our purposes, an advantage of using eigenfunctions of integral operators is their approximability from training data sampled from the invariant measure \(\mu\); we will take up this task in section 5.
Let \(W : \Omega \to{\mathcal W}\) be a measurable function into a data space \(\mathcal W\) that we will use to build our basis. In the applications presented in sections 6 and 7 below, \(\mathcal W\) will be a Euclidean space, \({\mathcal W} = \mathbb R^{d_{\mathcal W}}\) for some dimension \(d_{{\mathcal W}}\), and possibilities for \(W\) will include the following:
•
The identity map \(\operatorname{Id}\) for \({\mathcal W} = \Omega\), which corresponds to using information from the full system state to build the basis.
•
The projection map \(P_{{\mathcal X}} : \Omega \to{\mathcal X}\) for \({\mathcal W} ={\mathcal X}\), which corresponds to building the basis using information from only the resolved variables.
•
A delay-coordinate map [81] for \({\mathcal W} ={\mathcal X}^Q\), where \(Q \in{\mathbb N}\) is the number of delays. This corresponds to using the dynamics to implicitly recover some of the information about the full system state lost through projection onto \(\mathcal X\).
For any of these choices, let \(\kappa_{{\mathcal W}} :{\mathcal W} \times{\mathcal W} \to \mathbb R\) be a symmetric kernel function and \(\kappa : \Omega \times \Omega \to \mathbb R\) its pullback to \(\Omega\), i.e., \(\kappa (\omega,\omega^{\prime }) = \kappa_{{\mathcal W}}(W(\omega ),W(\omega^{\prime }))\). We assume that \(W\) and \(\kappa_{{\mathcal W}}\) have sufficient regularity so that \(\kappa\) lies in \(L^2(\mu \times \mu )\). In that case, the integral operator \(K : H \to H\) defined as
\begin{align} K f = \int_\Omega \kappa (\cdot, \omega ) f(\omega ) \, d\mu (\omega ) \end{align}
(3.7)
is a self-adjoint, real, Hilbert–Schmidt integral operator. As a result, there exists a real orthonormal basis \(\{ \phi_0, \phi_1, \ldots \}\) of \(H\) consisting of eigenvectors of \(K\); that is,
\begin{align} K \phi_l = \lambda_l \phi_l, \end{align}
(3.8)
where the eigenvalues \(\lambda_l\) are real and satisfy \(\lvert \lambda_0 \rvert \geq \lvert \lambda_1 \rvert \geq \cdots \searrow 0\). In the experiments of sections 6 and 7, our nominal choice for \(\kappa_{{\mathcal W}}\) is the radial Gaussian kernel on \({\mathcal W} = \mathbb R^{d_{{\mathcal W}}}\),
\begin{align} \kappa_{{\mathcal W}}(w,w^{\prime }) = e^{-\lVert w - w^{\prime } \rVert^2_2/\epsilon_{{\mathcal W}}^2}, \end{align}
(3.9)
where \(\epsilon_{{\mathcal W}} \gt 0\) is a bandwidth parameter that we tune automatically (see Appendix D.3.1). In some of our experiments, we will use a variable-bandwidth generalization of (3.9) proposed in [9], which was also used in [36, 33] in QMDA applications. We define the Hilbert spaces \(H_L\) using (3.6) with the eigenvectors \(\phi_l\) from (3.8).
Next, note that whenever \(\kappa\) is bounded on the (compact) support of \(\mu\) and the eigenvalue \(\lambda_l\) is nonzero, any corresponding eigenvector \(\phi_l\) (which is an equivalence class of functions on \(\Omega\), defined \(\mu\)-a.e.) is represented by the everywhere-defined function \(\varphi_l : \Omega \to \mathbb R\) such that
\begin{align} \varphi_l(\omega ) = \frac{1}{\lambda_l} \int_\Omega \kappa (\omega,\omega^{\prime }) \phi_l(\omega^{\prime }) \, d\mu (\omega^{\prime }). \end{align}
(3.10)
It follows from (3.10) that every such function \(\varphi_l\) inherits the regularity properties (e.g., boundedness, continuity, differentiability) of the kernel \(\kappa\). This motivates using kernels for which as many as possible of the eigenvalues \(\lambda_l\) are nonzero, so as to obtain a plentiful supply of basis functions with controlled regularity. More specifically, noticing that \(\varphi_l\) from (3.10) is the pullback of a function on \(\mathcal W\), i.e., \(\varphi_l = \varphi_l^{({\mathcal W})} \circ W\) for some \(\varphi_l^{({\mathcal W})} :{\mathcal W} \to \mathbb R\), we obtain a “maximal” number of nonzero eigenvalues \(\lambda_l\) if the \(\varphi_l^{({\mathcal W})}\) form an orthonormal basis of the Hilbert space \(L^2(\mu_{{\mathcal W}})\) on \(\mathcal W\) associated with the pushforward measure \(\mu_{{\mathcal W}}:= W_* \mu\). This will hold, for instance, if \(\kappa_{{\mathcal W}}\) is an integrally strictly positive-definite kernel on \(\mathcal W\) [86], e.g., the radial Gaussian kernel on \({\mathcal W} = \mathbb R^{d_{{\mathcal W}}}\) from (3.9).
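In anticipation of the data-driven formulation of section 5, the following sketch computes such a basis from samples of \(W\) using the Gaussian kernel (3.9), and evaluates the eigenfunctions at new points via the integral formula (3.10) (the Nyström extension). The normalization conventions and the treatment of the bandwidth as a given input are our own simplifications.

```python
import numpy as np

def kernel_eigenbasis(w_samples, epsilon, L):
    """Data-driven basis from the radial Gaussian kernel, cf. (3.7)-(3.9).

    w_samples: N x d array of samples W(omega_n); epsilon: bandwidth; L: dim.
    Returns eigenvalues lam (length L) and basis values phi (N x L),
    normalized so that (1/N) sum_n phi_i(omega_n) phi_j(omega_n) = delta_ij.
    """
    d2 = np.sum((w_samples[:, None, :] - w_samples[None, :, :])**2, axis=-1)
    K = np.exp(-d2 / epsilon**2)
    lam, v = np.linalg.eigh(K / len(w_samples))   # integral op. w.r.t. sampling measure
    idx = np.argsort(-np.abs(lam))[:L]            # leading L eigenvalues by magnitude
    lam, v = lam[idx], v[:, idx]
    phi = np.sqrt(len(w_samples)) * v             # orthonormal in L^2 of the samples
    return lam, phi

def nystrom_extend(w_new, w_samples, phi, lam, epsilon):
    """Out-of-sample evaluation of the eigenfunctions via (3.10)."""
    d2 = np.sum((w_new[:, None, :] - w_samples[None, :, :])**2, axis=-1)
    k = np.exp(-d2 / epsilon**2)
    return (k @ phi) / (len(w_samples) * lam)
```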
4. Quantum mechanical closure (QMCl).
Recall from section 2 that the state space of the parameterized system has the decomposition \(\tilde \Omega = \mathcal{X} \times \tilde{\mathcal{Y}}\), where \(\tilde{\mathcal{Y}}\) is the state space of the surrogate model for the unresolved degrees of freedom, and \(\mathcal{X}\) is the space of variables we seek to accurately predict. In QMCl, we set \(\tilde{{\mathcal Y}}\) to the state space \(\tilde{{\mathcal Y}} = Q(H_L)\) of the finite-dimensional quantum system constructed in subsection 3.3 for some \(L \in{\mathbb N}\). Thus, in order to evolve the parameterized system via (2.2), we need to specify (i) the surrogate flux term \(\tilde Z : Q(H_L) \to{\mathcal Z}\) and (ii) the evolution map \(\tilde \psi :{\mathcal X} \times Q(H_L) \to Q(H_L)\) of the surrogate model. These maps will be specified in subsections 4.1 to 4.3. In subsection 4.4, we define a map \(\alpha : \Omega \to \tilde \Omega\) that assigns initial conditions of the parameterized system on \(\tilde \Omega\) from initial conditions of the true system (see the commutative diagram (2.1)).
4.1. Flux term.
We assume throughout that \(\mathcal Z\) is a Euclidean space, \({\mathcal Z} = \mathbb R^d\). With that assumption, we represent the true and surrogate flux terms componentwise, i.e., \(Z = (Z^{(1)}, \ldots, Z^{(d)})\) and \(\tilde Z = (\tilde Z^{(1)}, \ldots, \tilde Z^{(d)})\), where \(Z^{(i)} :{\mathcal Y} \to \mathbb R\) and \(\tilde Z^{(i)} : Q(H_L) \to \mathbb R\) are real-valued functions. We also assume that for each \(i \in \{ 1, \ldots, d \}\), the pullback function \(\zeta^{(i)} : \Omega \to \mathbb R\) with \(\zeta^{(i)} = Z^{(i)} \circ P_{{\mathcal Y}}\) lies in \(L^\infty (\mu )\). Under that condition, \(\zeta^{(i)}\) has a representation in \(B(H)\) by the multiplication operator \(\pi \zeta^{(i)}\), and we have a corresponding projected operator \(\pi_L \zeta^{(i)} \in B(H_L)\). We set \(\tilde Z^{(i)}\) to the quantum mechanical expectation associated with this multiplication operator, i.e.,
\begin{align} \tilde Z^{(i)}(\rho ) = \mathbb E_\rho (\pi_L \zeta^{(i)}). \end{align}
(4.1)
Given that the resolved variables and the quantum state at time \(t_n\) are \(x_n \in{\mathcal X}\) and \(\rho_n \in Q(H_L)\), respectively, we use (4.1) and the resolved dynamics \(\phi :{\mathcal X} \times{\mathcal Z} \to{\mathcal X}\) to update \(x_n\) to the state \(x_{n+1}\) at time \(t_{n+1}\) via the formula
\begin{align} x_{n+1} = \phi (x_n, z_n), \quad z_n = \tilde Z(\rho_n). \end{align}
(4.2)
This step is depicted in the top line in the schematic of Figure 1.
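In matrix form, with the operators \(\boldsymbol \pi_L \zeta^{(i)}\) represented as \(L \times L\) symmetric matrices (e.g., built as in the earlier multiplication-operator sketch), the update (4.1)-(4.2) amounts to trace evaluations followed by a call to the known resolved dynamics; a minimal sketch:

```python
import numpy as np

def surrogate_flux(rho, zeta_mats):
    """Evaluate (4.1): z^(i) = E_rho(pi_L zeta^(i)) = tr(rho @ Z_i)."""
    return np.array([np.trace(rho @ Zi).real for Zi in zeta_mats])

def resolved_update(phi_map, x, rho, zeta_mats):
    """Evaluate (4.2): x_{n+1} = phi(x_n, tilde Z(rho_n)).

    phi_map is the (assumed known) resolved dynamics phi(x, z) -> x_next.
    """
    return phi_map(x, surrogate_flux(rho, zeta_mats))
```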
4.2. Quantum state update.
Our approach for updating the quantum state \(\rho_n \in Q(H_L)\) given the resolved variables \(x_n \in{\mathcal X}\) is inspired by the prediction–correction scheme employed in sequential data assimilation (filtering) [65, 59]. First, we use the projected transfer operator \(\mathcal P_L\) to evolve \(\rho_n\) to the quantum state
\begin{align} \tilde \rho_{n+1} := \frac{\mathcal P_L \rho_n}{\operatorname{tr}(\mathcal P_L \rho_n)}; \end{align}
(4.3)
see the arrow labeled “transfer operator” in the bottom row of the schematic in Figure 1. The state \(\tilde \rho_{n+1}\) plays a role analogous to the prior density in classical data assimilation. Then we obtain the quantum state \(\rho_{n+1}\) at time \(t_{n+1}\) by conditioning \(\tilde \rho_{n+1}\) by a quantum effect \(e_{n+1} \in \mathcal E(H_L)\) induced by the classical state \(x_{n+1}\) using (3.4), i.e.,
\begin{align} \rho_{n+1} = \tilde \rho_{n+1} \rvert_{e_{n+1}} = \frac{\sqrt{e_{n+1}} \tilde \rho_{n+1} \sqrt{e_{n+1}} }{\operatorname{tr}(\sqrt{e_{n+1}} \tilde \rho_{n+1} \sqrt{e_{n+1}} )}, \quad e_{n+1} = \mathcal F_L(x_{n+1}), \end{align}
(4.4)
where \(\mathcal F_L :{\mathcal X} \to \mathcal E(H_L)\) is an effect-valued map that will be described in subsection 4.3. This update is represented by the arrow labeled “state conditioning” in the second row of the schematic in Figure 1. The state \(\rho_{n+1}\) obtained in this manner is loosely analogous to the posterior density in classical data assimilation. Note that if \(\tilde \rho_{n+1}\) is pure, then \(\rho_{n+1}\) is also pure. The map \(\tilde \psi :{\mathcal X} \times Q(H_L) \to Q(H_L)\) that evolves the state of the quantum system is defined as \(\tilde \psi (x_n,\rho_n) = \rho_{n+1}\), where \(\rho_{n+1}\) is obtained from \((x_n, \rho_n)\) by combining (4.2), (4.3), and (4.4).
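A minimal matrix-level sketch of (4.3)-(4.4) follows. Here `U` stands for an \(L \times L\) matrix representation of the projected Koopman operator, so that conjugation by `U` implements \(\mathcal P_L\) up to trace normalization (the normalization compensates for the projection onto \(H_L\) not being trace-preserving), and `effect_map` stands for \(\mathcal F_L\) as constructed in subsection 4.3.

```python
import numpy as np

def quantum_state_update(rho, x_next, U, effect_map):
    """Prior-posterior update of the quantum state, cf. (4.3)-(4.4)."""
    # (4.3) "Prior": projected transfer-operator evolution plus renormalization.
    rho_prior = U.conj().T @ rho @ U
    rho_prior /= np.trace(rho_prior).real

    # (4.4) "Posterior": quantum Bayes conditioning on e = F_L(x_next).
    e = effect_map(x_next)
    w, v = np.linalg.eigh(e)                       # positive square root of the effect
    e_sqrt = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T
    rho_post = e_sqrt @ rho_prior @ e_sqrt
    return rho_post / np.trace(rho_post).real
```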
In applications, the frequency of updating the state via conditioning on a quantum effect can be varied. Instead of using the map \(\tilde{\psi }\) as defined above, one may use \(\tilde{\psi }_r : \mathcal{X} \times Q(H_L) \to Q(H_L)\), where (4.2) and (4.3) are iterated \(r\) times between every application of (4.4); see Algorithm D.1. This corresponds to lengthening the time under which the quantum component of the system undergoes Koopman evolution before it is updated via conditioning on the observation of an effect. It should be kept in mind that, in general, the Hilbert space dimension \(L\) required to attain a given level of approximation accuracy is not directly related to the dimension of the space \(\mathcal Y\) of unresolved variables. Rather, the performance of QMCl depends on how accurately relevant operators such as the Koopman operator, the multiplication operators representing the flux functions, and the effect-valued feature maps are represented on the subspace \(H_L\).
4.3. Effect-valued feature map.
What remains to completely define the posterior state in (4.4) is to specify the effect-valued map \(\mathcal F_L :{\mathcal X} \to \mathcal E(H_L)\). Following [33], we build this map as an operator-valued feature map induced by a measurable kernel function \(k :{\mathcal X} \times{\mathcal X} \to [0,1]\) on the resolved variables space. Every such function induces a feature map \(F :{\mathcal X} \to L^\infty (\mu )\) with \(\operatorname{ran} F \subseteq L^\infty (\mu )\), given by \(F(x) = k(x,\cdot )\). Composing this map with the representation \(\pi : L^\infty (\mu ) \to B(H)\) leads to a feature map \(\mathcal F :{\mathcal X} \to \mathcal E(H)\), \(\mathcal F = \pi \circ F\), which takes values in the effect space of the infinite-dimensional operator algebra \(B(H)\). We obtain \(\mathcal F_L\) by projection of \(\mathcal F\) onto \(B(H_L)\), i.e., \(\mathcal F_L = \boldsymbol \Pi_L \circ \mathcal F\).
In the applications presented in sections 6 and 7, \({\mathcal X} = \mathbb R^{d_{\mathcal{X}}}\) is a Euclidean space, and we use the radial Gaussian kernel
\begin{align} k(x,x^{\prime }) = e^{-\lVert x - x^{\prime } \rVert^2_2/ \epsilon^2}, \quad \epsilon \gt 0. \end{align}
(4.5)
The radial Gaussian kernel is strictly positive-definite [48], which implies that the corresponding feature map \(F\) is injective.
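Given training values \(\hat x_n\) of the resolved variables and the basis values \(\phi_l(\hat \omega_n)\), the matrix of \(\mathcal F_L(x)\) in the \(\{\phi_l\}\) basis can be approximated along the same lines as the multiplication-operator sketch above; a hedged version:

```python
import numpy as np

def effect_matrix(x, x_samples, phi, epsilon_x):
    """Matrix of the projected effect F_L(x) in the {phi_l} basis.

    x_samples: N x d_x training values of the resolved variables;
    phi:       N x L basis values along the trajectory.
    Entry (i, j) approximates <phi_i, k(x, .) phi_j> with k from (4.5).
    """
    kx = np.exp(-np.sum((x_samples - x)**2, axis=-1) / epsilon_x**2)
    return phi.T @ (kx[:, None] * phi) / len(kx)
```

Since \(0 \leq k \leq 1\), the resulting matrix satisfies \(0 \leq \mathcal F_L(x) \leq \operatorname{Id}\), i.e., it is indeed an effect.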
4.4. Initialization.
Assuming that the parameterized dynamics \(\tilde \Phi : \tilde \Omega \to \tilde \Omega\) have been appropriately defined, we would ideally like to have an initialization map \(\alpha : \Omega \to \tilde \Omega\) such that the commutative diagram (2.1) is satisfied. Practically, we cannot assume to have access to the unresolved variables in \(\mathcal Y\), so we must contend with maps that depend on \((x,y) \in \Omega\) only through \(x \in{\mathcal X}\). With that in mind, it is natural to choose \(\alpha\) using the effect-valued feature map from subsection 4.3, defining
\begin{align} \alpha (x) = (x, \rho_x), \quad \rho_x = \frac{\mathcal F_L(x)}{\operatorname{tr}(\mathcal F_L(x))}, \end{align}
(4.6)
whenever \(\operatorname{tr}(\mathcal F_L(x))\gt 0\). Note that \(\rho_x\) defined in this way is a state since \(\mathcal F_L(x)\) is a positive operator for all \(x \in{\mathcal X}\). Empirically, we find that following the decay of initial transients, the behavior of QMCl-parameterized systems does not depend significantly on the choice of \(\rho_x\). For instance, replacing (4.6) by
\begin{align} \alpha (x) = (x, \bar \rho ), \quad \bar \rho = \langle 1_\Omega, \cdot \rangle 1_\Omega, \end{align}
(4.7)
which ignores any information from \(x\) in the assignment of the initial quantum state \(\bar \rho\), was found to impart negligible changes in the long-term statistical behavior of the parameterized system. The numerical experiments in sections 6 and 7 (which focus on one- and two-point statistics under the invariant measure) utilize \(\alpha\) from (4.7).
Computationally, an advantage of (4.7) over (4.6) is that the former is a pure state, whereas the latter is generally mixed. Since the Koopman evolution and state conditioning preserve the purity of states, this means that the entire QMCl evolution initialized via (4.7) remains pure, resulting in a significant reduction of computational cost (see Appendix C.5). However, it is possible that initializing with the mixed state (4.6) (or a low-rank approximation thereof) as opposed to (4.7) would be beneficial in initial-value prediction experiments, which we do not address in this paper.
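Both initializations admit short matrix realizations; the following is a sketch, in which the projection of the constant function \(1_\Omega\) onto \(H_L\) is renormalized explicitly (a practical choice on our part, since the projected vector need not have unit norm).

```python
import numpy as np

def initial_state_from_x(x, effect_map):
    """Mixed initial state in the spirit of (4.6): normalized effect at x."""
    e = effect_map(x)
    return e / np.trace(e).real

def initial_state_uniform(phi):
    """Pure initial state in the spirit of (4.7): projection along the
    projected (and renormalized) constant function 1_Omega.

    phi: N x L array of basis values along the training trajectory.
    """
    c = phi.mean(axis=0)            # coefficients <phi_l, 1_Omega>
    c = c / np.linalg.norm(c)       # renormalize after projection onto H_L
    return np.outer(c, c)
```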
4.5. Stochastic parameterization.
The QMCl scheme described in subsections 4.1 to 4.3 is deterministic, in the sense that a given initial condition \((x_0,\rho_0) \in{\mathcal X} \times Q(H_L)\) completely determines the sequence \((x_0, \rho_0), (x_1, \rho_1), \ldots\) of resolved variables \(x_n\) and quantum states \(\rho_n\). In another approach, which we call the “stochastic” approach, we set \(\tilde{{\mathcal Y}}\) to the sequence space \(\tilde{{\mathcal Y}} ={\mathcal Z}^{{\mathbb N}}\). As in subsection 2.1.2, we equip \(\tilde{{\mathcal Y}}\) with a \(\sigma\)-algebra \(\Sigma_{\tilde{{\mathcal Y}}}\) and a probability measure \(\nu\) that is invariant under the shift map \(T : \tilde{{\mathcal Y}} \to \tilde{{\mathcal Y}}\). We also let \(u : \tilde{{\mathcal Y}} \to{\mathcal Z}\) be the projection map onto the first coordinate, and we define the random variables \(\tilde Z_n : \tilde{{\mathcal Y}} \to{\mathcal Z}\) such that \(\tilde Z_n = u \circ T^n\). We formally choose the probability space \((\tilde{{\mathcal Y}}, \Sigma_{\tilde{{\mathcal Y}}}, \nu )\) such that the components \(\tilde Z_n^{(1)}, \ldots, \tilde Z_n^{(d)} : \tilde{{\mathcal Y}} \to \mathbb R\) are independent, real-valued random variables, with \(\tilde Z_n^{(j)}\) distributed according to the probability measure \(\mathbb P_{\rho_n, \pi_L \zeta^{(j)}}\) induced by the density operator \(\rho_n\) and the quantum observable \(\pi_L \zeta^{(j)}\) according to the axioms of quantum mechanics (see Appendices A.2 and B.2). Operationally, this means that the values \(z_n = (z_n^{(1)}, \ldots, z_n^{(d)}) \in{\mathcal Z}\) of the flux terms at timestep \(n\) are obtained by independent random draws from the measurement distributions \(\mathbb P_{\rho_n, \pi_L \zeta^{(1)}}, \ldots, \mathbb P_{\rho_n, \pi_L \zeta^{(d)}}\), respectively, determined via (B.4). The update formulas (4.3) and (4.4) for the quantum state \(\rho_n\) remain unchanged. See Algorithm D.3 for pseudocode.
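Operationally, the random draws can be implemented from the spectral decomposition of each flux observable: if \(\pi_L \zeta^{(j)} = \sum_k a_k \langle v_k, \cdot \rangle v_k\), then outcome \(a_k\) occurs with probability \(\langle v_k, \rho_n v_k \rangle\). A minimal sketch (helper names are our own):

```python
import numpy as np

def sample_flux(rho, zeta_mats, rng):
    """Draw flux values from the measurement distributions of the
    observables pi_L zeta^(j) in the state rho (projective measurement)."""
    z = np.empty(len(zeta_mats))
    for j, Zj in enumerate(zeta_mats):
        a, v = np.linalg.eigh(Zj)           # spectral decomposition of pi_L zeta^(j)
        # Outcome probabilities p_k = <v_k| rho |v_k>, clipped and renormalized
        # to guard against floating-point negatives.
        p = np.einsum('ij,jk,ki->i', v.conj().T, rho, v).real
        p = np.clip(p, 0.0, None)
        p /= p.sum()
        z[j] = rng.choice(a, p=p)           # random draw of an eigenvalue
    return z
```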
5. Data-driven formulation.
To move towards a data-driven algorithm, we must first establish a method to generate a discrete analogue of \(H = L^2(\mu )\) (and then of \(H_L\)) which can be manipulated in practice. To that end, we make the following standing assumptions.
A1.
We have access to samples \(\hat w_0, \ldots, \hat w_{N-1} \in{\mathcal W}\), \(\hat x_0, \ldots, \hat x_{N-1} \in{\mathcal X}\), and \(\hat z_0, \ldots, \hat z_{N-1} \in{\mathcal Z}\) from the map \(W\), the resolved variables \(X\), and the unresolved fluxes \(Z\), respectively, with \(\hat w_n = W(\hat \omega_n)\), \(\hat x_n = X(\hat \omega_n)\), and \(\hat z_n = Z(\hat \omega_n)\). The samples are taken along a dynamical trajectory \(\hat \omega_0, \ldots, \hat \omega_{N-1} \in \Omega\) with \(\hat \omega_n = \Phi^n(\hat \omega_0)\).
A2.
The invariant measure \(\mu\) is ergodic.
A3.
There is a compact, forward-invariant set \(\mathcal M \subseteq \Omega\) (i.e., \(\Phi (\mathcal M) \subseteq \mathcal M\)) that contains the support of \(\mu\) and the starting point \(\hat \omega_0\) (and thus the entire orbit \(\hat \omega_0, \hat \omega_1, \ldots\)).
With these assumptions, we use the samples \(\{(\hat w_n,\hat x_n,\hat z_n)\}_{n=0}^{N-1}\) as training data to build data-driven QMCl models that are structurally very similar to the data-independent models described in section 4. Our construction follows closely the data-driven formulation of QMDA [36, 33]. Note that we do not assume knowledge of the states \(\hat \omega_n\) underlying the training data \((\hat w_n,\hat x_n,\hat z_n)\). Also, we do not require that the \(\hat \omega_n\) lie in the support of \(\mu\) (which may be a null set with respect to an ambient measure on \(\Omega\), such as an attractor of a dissipative system), but allow instead initial conditions \(\omega_0\) drawn from the larger set \(\mathcal M\) of potentially positive ambient measure [11].
5.1. Discretization of operators.
Taking \(\mu_N = \sum_{n=0}^{N-1} \delta_{\hat \omega_n}/ N\) to be the discrete sampling measure on \(\{ \hat \omega_n \}_{n=0}^{N-1}\), and \(\hat{H}_N := L^2(\mu_N)\), we begin by approximating operators on \(H=L^2(\mu )\) by operators on \(\hat{H}_N\) (where elements of \(\hat{H}_N\) are represented simply by functions on \(\{ \hat \omega_n \}_{n=0}^{N-1}\); see Appendix C). Operators on \(\hat{H}_N\) are further approximated by operators on an \(L\)-dimensional subspace \(H_{L,N} \subseteq \hat{H}_N\) spanned by the leading \(L\) eigenfunctions of a data-driven kernel integral operator \(K_N : \hat H_N \to \hat H_N\) (defined in (C.3)). The orthogonal projection onto \(H_{L,N}\) will be denoted as \(\Pi_{L,N} : \hat H_N \to \hat H_N\), and we also define \(\boldsymbol \Pi_{L,N} : B(\hat H_N) \to B(\hat H_N)\) such that \(\boldsymbol \Pi_{L,N} A = \Pi_{L,N} A \Pi_{L,N}\). The reason for introducing the subspaces \(H_{L,N}\) is that taking the limit of large training data, \(N\to \infty\), at fixed dimension \(L\) allows us to control errors relative to the QMCl scheme on \(H_L\) described in subsection 3.3. More details on the numerical aspects of the data-driven scheme, along with some associated convergence results, can be found in Appendices C and D.
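As a minimal illustration of this discretization, the following sketch computes a plain Gaussian-kernel analogue of the basis \(\{\phi_{l,N}\}\): it assembles the \(N\times N\) kernel matrix on the training samples, diagonalizes the associated integral operator, and rescales the eigenvectors to unit \(L^2(\mu_N)\) norm. The actual construction (C.3)–(C.4) additionally involves Markov normalization and bandwidth tuning, which we omit here; all names are illustrative.

```python
import numpy as np

def kernel_eigenbasis(W_samples, bandwidth, L):
    """Leading-L eigenbasis of a data-driven kernel integral operator.

    W_samples : (N, D) array of training observations hat{w}_n.
    Returns eigenvalues lam and an (N, L) array Phi whose l-th column is
    phi_{l,N} evaluated at the training points, normalized so that
    Phi.T @ Phi / N = I_L (orthonormality in L^2(mu_N)).
    """
    N = W_samples.shape[0]
    d2 = np.sum((W_samples[:, None, :] - W_samples[None, :, :])**2, axis=-1)
    K = np.exp(-d2 / bandwidth**2)        # radial Gaussian kernel matrix
    lam, V = np.linalg.eigh(K / N)        # discretized integral operator K_N
    idx = np.argsort(lam)[::-1][:L]       # leading L eigenpairs
    Phi = np.sqrt(N) * V[:, idx]          # unit norm in L^2(mu_N)
    return lam[idx], Phi
```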
5.2. Data-driven QMCl framework.
In the data-driven setting, the state space of the quantum mechanical model of the unresolved variables is \(\tilde{{\mathcal Y}} = Q(H_{L,N})\). Analogously to the data-independent formulation in section 4, we evolve the parameterized system on \({\mathcal X} \times \tilde{{\mathcal Y}}\) using a surrogate flux \(\tilde Z : Q(H_{L,N}) \to{\mathcal Z}\) and an evolution map \(\tilde \psi :{\mathcal X} \to Q(H_{L,N})\). In this subsection, we give an outline of the construction of these maps, focusing on the differences between the data-driven approach and the formulation of section 4. Specific formulas and pseudocode relevant to the data-driven setting are included in Appendix D.1.
Flux terms. As in subsection 4.1, we prescribe the surrogate flux \(\tilde Z : Q(H_{L,N}) \to{\mathcal Z}\) using quantum mechanical expectations of discrete multiplication operators. For that, we first note that the space \(L^\infty (\mu_N)\) is a finite-dimensional, abelian von Neumann algebra, which we use as a data-driven analogue of \(L^\infty (\mu )\) (see Appendix A.4). This algebra has a regular representation \(\hat \pi_N : L^\infty (\mu_N) \to B(\hat H_N)\) that maps each element \(f \in L^\infty (\mu_N)\) to the discrete multiplication operator that multiplies by \(f\). Note that \(\hat \pi_N f\) is a diagonal operator in the standard basis of \(\hat H_N\). We use \(\hat \pi_N\) and the projected representation \(\pi_{L,N} := \boldsymbol \Pi_{L,N} \circ \hat \pi_N\) as data-driven analogues of \(\pi : L^\infty (\mu ) \to B(H)\) and \(\pi_L : L^\infty (\mu ) \to B(H_L)\), respectively. The training samples \(\hat z_n \in{\mathcal Z} \equiv \mathbb R^d\) define, componentwise, a collection of elements \(\zeta^{(1)}_N, \ldots, \zeta^{(d)}_N \in L^\infty (\mu_N)\) such that \(\zeta^{(i)}_N(\hat \omega_n) = \hat z_n^{(i)}\), where \(\hat z_n^{(i)}\) is the \(i\)th component of \(\hat z_n = ( \hat z_n^{(1)}, \ldots, \hat z_n^{(d)})\). Analogously to (4.1), we define \(\tilde Z = (\tilde Z^{(1)}, \ldots, \tilde Z^{(d)})\) with \(\tilde Z^{(i)}(\rho ) = \mathbb E_\rho (\pi_{L,N} \zeta^{(i)}_N)\). The evolution of the resolved variables in \(\mathcal X\) given \(\rho \in Q(H_{L,N})\) is carried out via (4.2). Computationally, the quantum observables \(\pi_{L,N} \zeta^{(i)}_N\) and states \(\rho\) are represented by their \(L\times L\) matrix representations in the \(\{ \phi_{l,N} \}\) basis of \(H_{L,N}\), i.e., \(\boldsymbol Z^{(i)}_{L,N} = \boldsymbol \beta_{L,N} (\pi_{L,N} \zeta^{(i)}_N)\) and \(\boldsymbol \rho = \boldsymbol \beta_{L,N} \rho\), where \(\boldsymbol \beta_{L,N} : B(H_{L,N}) \to \mathbb M_L\) is defined analogously to \(\boldsymbol\beta_L\); see Appendix C.2.
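Continuing the sketch above (with Phi the \(N \times L\) array of basis values), the matrix of the projected multiplication operator and the deterministic surrogate flux can be rendered schematically as follows; this is an illustration of (4.1) in the data-driven setting, with hypothetical names.

```python
import numpy as np

def flux_observable(Phi, z_samples):
    """Matrix of pi_{L,N} zeta_N in the {phi_{l,N}} basis:
    [Z]_{ij} = <phi_i, zeta phi_j>_{L^2(mu_N)}
             = (1/N) sum_n phi_i(w_n) z_n phi_j(w_n)."""
    N = Phi.shape[0]
    return Phi.T @ (z_samples[:, None] * Phi) / N

def surrogate_flux(rho, Z_mat):
    """Deterministic QMCl flux: quantum expectation tilde{Z}(rho) = tr(rho Z)."""
    return np.real(np.trace(rho @ Z_mat))
```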
Quantum state update. We employ a predictor–corrector scheme similar to that in subsection 4.2. Given the quantum state \(\rho_n \in Q(H_{L,N})\) and resolved variables \(x_n \in{\mathcal X}\) at time \(t_n\), we use a data-driven approximation \(\mathcal P_{L,N} : B(H_{L,N}) \to B(H_{L,N})\) of the transfer operator to obtain the prior state \(\tilde \rho_{n+1}\) at time \(t_{n+1}\) analogously to (4.3), and an effect-valued feature map \(\tilde{\mathcal F}_{L,N}:{\mathcal X} \to \mathcal E(H_{L,N})\) that updates \(\tilde \rho_{n+1}\) by conditioning on \(e_{n+1} = \tilde{\mathcal F}_{L,N}(x_{n+1})\) as in (4.4). The transfer operator \(\mathcal P_{L,N}\) is based on an approximation of the Koopman operator \(U : H \to H\) by a shift operator \(\hat U_N : \hat H_N \to \hat H_N\) [7]. That is, we have \(\mathcal P_{L,N} A=U_{L,N}^* A U_{L,N}\), where \(U_{L,N} = \boldsymbol \Pi_{L,N} \hat U_N\), and \(U_{L,N}\) is represented by the \(L \times L\) matrix \(\boldsymbol U_{L,N} = \boldsymbol \beta_{L,N} U_{L,N}\); see Appendix D.1.4 for further details. The effect-valued feature map \(\tilde{\mathcal F}_{L,N}\) is constructed analogously to \(\mathcal F_L\) from subsection 4.3 using the radial Gaussian kernel in (4.5) in conjunction with the samples \(\hat x_n\). In the \(\{ \phi_{l,N} \}\) basis of \(H_{L,N}\), the map \(\tilde{\mathcal F}_{L,N}\) is represented by a matrix-valued map \(\tilde{\boldsymbol F}_{L,N} :{\mathcal X} \to \mathbb M_L\) with \(\tilde{\boldsymbol F}_{L,N} = \boldsymbol \beta_{L,N} \circ \tilde{\mathcal F}_{L,N}\) that we use in numerical applications; see Appendix D.1.5.
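The two steps of this predictor–corrector cycle admit a compact sketch under the same conventions: a shift-operator estimate of the Koopman matrix, the transfer (prior) step, and the effect conditioning of (4.4)/(A.7). The explicit trace renormalizations reflect the fact that the projected \(U_{L,N}\) is only approximately unitary; function names are illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def koopman_matrix(Phi):
    """Shift-operator approximation of the Koopman operator in the
    {phi_{l,N}} basis: [U]_{ij} = <phi_i, hat{U}_N phi_j>_{L^2(mu_N)},
    estimated by pairing each basis function with its one-step shift."""
    N = Phi.shape[0]
    return Phi[:-1].T @ Phi[1:] / N   # boundary term at n = N-1 is dropped

def transfer_step(rho, U):
    """Prior (predictor) update, cf. (4.3): rho -> U^* rho U, renormalized
    to unit trace since the projected U_{L,N} is only approximately unitary."""
    rho_new = U.conj().T @ rho @ U
    return rho_new / np.trace(rho_new)

def condition_state(rho, effect):
    """Quantum Bayes (corrector) update, cf. (4.4) and (A.7):
    rho -> sqrt(e) rho sqrt(e) / tr(sqrt(e) rho sqrt(e))."""
    e_sqrt = sqrtm(effect)
    rho_new = e_sqrt @ rho @ e_sqrt
    return rho_new / np.trace(rho_new)
```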
Initialization. In the experiments presented in sections 6 and 7, we initialize the parameterized system with an uninformative quantum state analogously to (4.7). Specifically, given any \(x \in{\mathcal X}\), we set the initial state \(\alpha (x) = (x, \bar \rho_{N,L}) \in{\mathcal X} \times Q(H_{L,N})\), where
\begin{align} \bar \rho_{N,L} = \frac{\boldsymbol \Pi_{L,N} \bar \rho_N}{\operatorname{tr}(\boldsymbol \Pi_{L,N}\bar \rho_N)}, \quad \bar \rho_N= \langle 1_\Omega, \cdot \rangle_N 1_\Omega. \end{align}
(5.1)
An alternative approach would be to set \(\alpha (x) = (x,\rho_x)\), where the quantum state \(\rho_x\) is obtained via the effect-valued feature map \(\tilde{\mathcal F}_{L,N}\) analogously to (4.6). As mentioned in subsection 4.4, numerically we found that the uninformative initialization approach has negligible impact on the ability of the parameterized system to reproduce the equilibrium statistical behavior of the original system, but initializing with (4.6) may be important in initial-value prediction experiments.
Stochastic parameterization. The data-driven QMCl framework has a stochastic variant which is entirely analogous to the scheme described in subsection 4.5.
6. Quantum mechanical closure of the L63 system.
As our first set of numerical examples, we apply QMCl to the L63 system [63]. The L63 system is classically defined as the dynamical system on \(\mathbb{R}^3\) where \((x(t), y(t), z(t)) \in \mathbb{R}^3\) evolves as
\begin{align*} \dot x(t) = \sigma (y(t) - x(t)), \quad \dot y(t) = x(t) (\rho - z(t)) - y(t), \quad \dot z(t) = x(t) y(t) - \beta z(t) \end{align*}
for the parameter values \(\beta=8/3\), \(\rho=28\), and \(\sigma=10\). However, as noted in [71], the L63 system can be expressed in terms of the system (2.4), where the variables \((a_1(t), a_2(t), a_3(t)) \in \Omega \equiv \mathbb R^3\) are obtained by projection of \((x(t),y(t),z(t))\) onto the EOF basis vectors and have decreasing variance.
6.1. Experimental setup.
We follow closely the setup of [71], which we outlined in subsection 2.1.2. We assume knowledge of the equations governing the \(a_1\) and \(a_2\) components but not of that governing the \(a_3\) component. That is, we have \((a_1, a_2) \in{\mathcal X} \equiv \mathbb R^2\), \(a_3 \in{\mathcal Y} \equiv \mathbb R\), and the flux term is \(Z :{\mathcal Y} \to{\mathcal Z} \equiv \mathbb R\) with \(Z(a_3) = a_3\). The true discrete-time system on \(\Omega\) evolves under the time-\(\Delta t\) flow generated by (2.4) for a timestep of \(\Delta t=0.01\); that is, we have \(\Phi : \Omega \to \Omega\) with \(\Phi = \Phi^{0.01}\). Of course, \(\Phi\) is not available in closed form, so practically we consider as the “true” L63 dynamics a numerical approximation of \(\Phi\) by a high-fidelity ordinary differential equation solver. Here we use the ode45 solver of MATLAB which is based on an adaptive Runge–Kutta scheme of order \((4,5)\). As our approximate resolved dynamics \(\tilde \phi :{\mathcal X} \times{\mathcal Z} \to{\mathcal X}\) we use a standard 4th-order Runge–Kutta (RK4) discretization of the \(a_1\) and \(a_2\) equations in (2.4), treating \(a_3\) as fixed; see (D.13) for an explicit formula. We evolve the quantum states in \(\tilde{{\mathcal Y}} = Q(H_{L,N})\) using the map \(\psi_r :{\mathcal X} \times Q(H_{L,N}) \to Q(H_{L,N})\) from subsection 4.2 for various choices of the number of timesteps \(r\) between state updates by the quantum Bayes’ rule (4.4). For each experiment, the value of \(L\) was chosen after numerical experimentation with the goal of balancing accuracy of reproduction of the statistical behavior of the true system (assessed via probability density functions (PDFs) and time-autocorrelation functions) against computational cost. Although increasing \(L\) tends to improve accuracy, beyond a certain value the gains diminish. We attempted to set \(L\) as large as possible without going beyond this heuristic threshold.
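For concreteness, here is a generic RK4 step with the flux held fixed over the step; the actual right-hand side used in our experiments is the projected L63 vector field entering (D.13), which we do not reproduce here.

```python
def rk4_step(f, x, z, dt):
    """One 4th-order Runge--Kutta step for resolved dynamics dx/dt = f(x, z),
    holding the flux term z fixed over the step. f is a user-supplied
    right-hand side (in the paper's experiments, the projected L63 equations)."""
    k1 = f(x, z)
    k2 = f(x + 0.5 * dt * k1, z)
    k3 = f(x + 0.5 * dt * k2, z)
    k4 = f(x + dt * k3, z)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```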
6.2. Experiments with large, full training data.
We first consider QMCl models trained with a long time series of the full system state. In these experiments, the training observation map \(W : \Omega \to{\mathcal W}\) is the identity map on \({\mathcal W} = \Omega = \mathbb R^3\), and we use a training time series \(\hat \omega_0, \ldots, \hat \omega_{N-1} \in{\mathcal W}\) consisting of \(N=\text{150,000}\) samples with \(\hat \omega_n = (\hat x_n,\hat y_n)= \Phi^n(\hat \omega_0)\). The initial training state \(\hat \omega_0\) is obtained by integrating (2.4) with the initial condition \((2,2,2)\) for 500 model time units (i.e., \(500/\Delta t = \text{50,000}\) timesteps) and setting \(\hat \omega_0\) to the final point of that trajectory. We also use the values \(\hat z_0, \ldots, \hat z_{N-1} \in \mathbb R\) of the flux term, \(\hat z_n = Z(\hat y_n) = \hat y_n\), from the training trajectory.
Setting the dimension parameter \(L = \text{1500}\), we compute the kernel eigenfunction basis \(\{ \phi_{l,N} \}_{l=0}^{L-1}\) of \(H_{L,N}\) from (C.4) using the training data \(\hat \omega_n\). The kernel function \(\kappa_{{\mathcal W}}\) is the radial Gaussian kernel (3.9) with the bandwidth parameter \(\epsilon_{{\mathcal W}} = \sqrt{19}\) chosen via automatic tuning (see Appendix D.3.1). We then compute the corresponding \(L \times L\) matrix representations \(\boldsymbol U_{L,N}\) and \(\boldsymbol Z_{L,N}\) of the projected Koopman operator and the projected multiplication operator representing the flux term \(Z\) as described in section 5.2. We also use the basis functions to build the effect-valued feature map \(\tilde{\boldsymbol F}_{L,N}\) described in Appendix D.1.5. This map is based on the radial Gaussian kernel \(k\) in (4.5) with the bandwidth parameter \(\epsilon=2\). This completes the training phase of QMCl.
To run the parameterized system, we generate a state \(\omega_0 \in \Omega\) near the Lorenz attractor analogously to \(\hat \omega_0\); that is, we integrate (2.4) over 500 model time units with the initial condition \((1.99,2,2)\) and set \(\omega_0 =( x_0, y_0)\) to the last point of that trajectory. We then use (5.1) to generate an initial condition \(\alpha ( x_0) = ( x_0, \rho_0) \in{\mathcal X} \times Q(H_{L,N})\) of the parameterized system. We condition the quantum state via the \(\tilde{\mathcal F}_{L,N}\) effect map every \(r=10\) timesteps (i.e., every \(r \, \Delta t=0.1\) model time units).
Starting from \(( x_0, \rho_0)\), the QMCl system generates via Algorithm D.2 a time-ordered sequence of pairs \(( x_0, \rho_0), ( x_1, \rho_1), \ldots\) of resolved variables \(x_n \in{\mathcal X}\) and quantum states \(\rho_n \in Q(H_{L,N})\), as well as a corresponding sequence of flux terms \(z_0, z_1, \ldots \in{\mathcal Z}\) given by \(z_n = \tilde Z(\rho_n)\) in accordance with (4.1). Under a “perfect” closure in the sense of the commutative diagram (2.1), the sequence of resolved variables \(x_0, x_1, \ldots\) should match the time series \(\tilde x_0, \tilde x_1, \ldots\) of the \((a_1,a_2)\) state vector components under the L63 flow starting from the same initial condition, i.e., \(\tilde x_n = (a_1(t_n), a_2(t_n))\) with \((a_1(t_n), a_2(t_n), a_3(t_n) ) = \Phi^{n\,\Delta t}( \omega_0)\). If, in addition, the flux term \(Z=a_3\) is consistently approximated by \(\tilde Z\), then the time series \(( x_0, z_0 ), ( x_1, z_1), \ldots\) generated by the QMCl system should match the full three-dimensional L63 trajectory \(\tilde \omega_0, \tilde \omega_1, \ldots\) with \(\tilde \omega_n = (a_1(t_n),a_2(t_n),a_3(t_n))\). Here we assess the performance of the scheme by examining its ability to reproduce salient qualitative features of the Lorenz attractor and to recover the marginal distributions and time-autocorrelation functions of \(a_1\), \(a_2\), and \(a_3\).
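Combining the helpers sketched in section 5 (surrogate_flux, rk4_step, transfer_step, condition_state), the deterministic QMCl cycle can be outlined as follows; this is a schematic of Algorithm D.2, whose exact bookkeeping may differ.

```python
def run_qmcl(x0, rho0, f, dt, U, Z_mat, effect_map, r, n_steps):
    """Sketch of the deterministic QMCl loop.

    x0        : initial resolved variables; rho0 : initial density matrix.
    f         : resolved vector field f(x, z) used by rk4_step.
    effect_map: x -> L x L matrix of the effect tilde{F}_{L,N}(x) (hypothetical
                helper, built from the feature-map kernel of Appendix D.1.5).
    Returns the sequence of (x_n, z_n) pairs with z_n = tr(rho_n Z)."""
    x, rho = x0, rho0
    traj = []
    for n in range(n_steps):
        z = surrogate_flux(rho, Z_mat)      # closure term from the quantum state
        traj.append((x, z))
        x = rk4_step(f, x, z, dt)           # advance resolved variables
        rho = transfer_step(rho, U)         # Koopman (prior) update
        if (n + 1) % r == 0:                # quantum Bayes correction
            rho = condition_state(rho, effect_map(x))
    return traj
```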
Figure 2 compares representative trajectories of the resolved variables \(x\) under the true L63 dynamics (Figure 2(a)) and the QMCl system (Figure 2(b)). It is readily apparent that the QMCl system generates a structure with a geometry similar to the Lorenz attractor. It is also worth noting that this similarity holds not just in the \(a_1, a_2\) projection (which would be sufficient for a parameterization scheme). In fact, as shown by the three-dimensional trajectory plots in Figure 3, the QMCl algorithm meaningfully reconstructs all three dimensions of the system. Particularly, the trajectories show that the QMCl system evolves in the \(a_3\) component in a qualitatively similar manner to the true system. In Figure 4, we show time series plots for the \(a_1\) component under the true L63 and QMCl dynamics. The general rate and aperiodicity of the transitions, as well as the tendency for the point to sometimes oscillate in one lobe with increasing amplitude prior to switching lobes, are preserved in the QMCl system.



Next, in Figure 5, we examine the ability of QMCl to reproduce some of the one- and two-point statistics under the invariant measure of the L63 system, namely the marginal PDFs and time-autocorrelation functions of the \(a_1\), \(a_2\), and \(a_3\) coordinates. We estimate these statistics numerically using trajectories generated by the two systems spanning 1000 model time units (i.e., \(1000/\Delta t = \text{100,000}\) samples); see Appendix D.4 for further details. It is evident from the results that the QMCl system performs well in terms of reproducing the marginal PDF and time-autocorrelation structure of \(a_1\), \(a_2\), and \(a_3\) with respect to the invariant measure.

Of particular importance in the L63 system is the behavior of the \(a_1\) component. Much of the system’s chaotic behavior manifests in the transitions from one lobe of the attractor to the other, which appear as sign changes of the \(a_1\) coordinate (see, e.g., Figure 2(a)). The autocorrelation plots in Figure 5(b) show that the QMCl system accurately captures the initial correlation decay of \(a_1\) due to these transitions, which takes place over a timescale comparable to the Lyapunov timescale of the system (approximately 1 model time unit). The corresponding PDF is also reproduced reasonably well (see Figure 5(a)), though some differences from the true PDF are visible near \(a_1=0\). These differences are not too surprising given that values of \(a_1\) close to 0 correspond to the mixing region between the two lobes of the attractor, where the dynamics are particularly sensitive to perturbations.
Other notable aspects of the results in Figure 5 are the oscillatory nature of the autocorrelation function for \(a_2\) (Figure 5(d)) and the corresponding bimodal PDF (Figure 5(c)), due to oscillations about the unstable fixed points at the centers of the two “holes” of the attractor. The QMCl system consistently reproduces these features. Meanwhile, the time autocorrelation of \(a_3\) (Figure 5(f)) exhibits a damped oscillatory behavior that is again well reproduced by QMCl.
In calculations not reported here, we observed that replacing the RK4 scheme in the definition of \(\tilde \phi\) by a first-order forward Euler scheme results in a noticeable reduction of statistical accuracy of the QMCl model in terms of the PDFs and autocorrelation functions of \(a_1\), \(a_2\), and \(a_3\). This effect is independent of QMCl since integrating the L63 system (2.4) with a forward Euler scheme of fixed timestep \(\Delta t=0.01\) (as opposed to ode45) was found to impart similar changes to the \(a_1, a_2, a_3\) statistics.
In summary, the results presented in this subsection demonstrate that the QMCl-parameterized system provides an accurate surrogate model of the full L63 dynamics.
6.3. Small, partial training data.
The experiments in subsection 6.2 assumed access to a large training dataset containing full information about the system state. Here we show that with the same parameters but much less training data, \(N = \text{6400}\), along with lower Hilbert space dimensions (\(L=1000\) and \(L=500\)), the QMCl method still yields a reasonable reconstruction of the L63 dynamics. For these experiments, we also impose the constraint that in generating the kernel eigenvectors defining \(H_{L,N}\) we do not have access to full state vectors \((a_1,a_2,a_3)\); instead, only the \(a_1\) component is available. We include results with both \(L=500\) and \(L=1000\) to show the robustness of the approach with respect to reducing the Hilbert space dimension.
Let \(\tilde W : \Omega \to \mathbb R\) be the map that projects onto the first coordinate, i.e., \(\tilde W(\omega ) = a_1\) with \(\omega = (a_1, a_2, a_3)\). Let also \(\tilde w_n = \tilde W(\hat \omega_n)\), where \(\hat \omega_0, \hat \omega_1, \ldots, \hat \omega_{N-1} \in \Omega\) is the training trajectory in state space (defined as in subsection 6.2 but assumed here unobserved). To enrich the data \(\tilde w_n\) with information lost due to projection by \(\tilde W\), delay-coordinate embedding is used on this one-dimensional time series. Setting \({\mathcal W} = \mathbb R^{Q+1}\), where \(Q \in{\mathbb N}\) is an even parameter corresponding to the number of delays, we define the delay-coordinate map \(W : \Omega \to{\mathcal W}\) such that
\begin{align} W(\omega ) = (\tilde W(\Phi^{-Q/2}(\omega )), \tilde W(\Phi^{-Q/2+1}(\omega )), \ldots, \tilde W(\Phi^{Q/2}(\omega ))). \end{align}
(6.1)
From the theory of delay-coordinate maps [81] it is known that for sufficiently large \(Q\), and with “high probability” in a suitable sense, \(W\) is an injective map on compact subsets of \(\Omega\) and thus on the support of the invariant measure \(\mu\) (which is compact since it is contained in an absorbing ball under the L63 dynamics [58]). Thus, for sufficiently large \(Q\), training data obtained through the map \(W\) should be theoretically sufficient to build a basis for the entire Hilbert space \(H\). Importantly for practical applications, we have
\begin{align*} W(\hat \omega_n) = ( \tilde w_{n-Q/2}, \tilde w_{n-Q/2+1}, \ldots, \tilde w_{n+Q/2} ), \end{align*}
which means that we can evaluate \(W\) on the dynamical states \(\hat \omega_n\) underlying the training data without knowledge of these states. In particular, we can compute kernel matrices and build an associated data-driven basis of \(H_{L,N}\) as described in Appendix C.2 using data \(\hat w_n = W(\hat \omega_n)\) sampled from \(W\). It is worth noting that as the number of delays \(Q\) increases, the kernel eigenfunctions obtained from delay-coordinate-mapped data tend to span approximately Koopman-invariant subspaces [26, 37], which improves the quality of Koopman/transfer operator approximation in these subspaces. This fact motivates using delay-coordinate maps even when full system states are available for training.
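A minimal sketch of evaluating (6.1) on the scalar training series, under the convention (assumed here) that samples whose delay window extends past the ends of the series are dropped:

```python
import numpy as np

def delay_embed(w_tilde, Q):
    """Delay-coordinate map (6.1) on a scalar time series, for even Q:
    hat{w}_n = (w_{n-Q/2}, ..., w_{n+Q/2}). Returns an (N - Q, Q + 1) array;
    the first and last Q/2 samples are dropped (incomplete delay windows)."""
    N = len(w_tilde)
    half = Q // 2
    return np.stack([w_tilde[i - half : i + half + 1]
                     for i in range(half, N - half)])
```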
Figure 6 displays representative trajectory, marginal PDF, and autocorrelation results obtained using \(N=\text{6400}\) training samples from (6.1) and \(Q=10\) delays. For the kernel \(\kappa_{{\mathcal W}}\) used to build the basis, we used the variable-bandwidth kernel proposed in [9], normalized to a symmetric Markov kernel using the approach of [23]. A description of the construction of this kernel can be found in [33]. For the feature-map kernel \(k\), a bandwidth value of \(\epsilon=10\) was chosen. Notice that the partial-training systems still closely approximate both the marginal PDFs (Figure 6(c)) and the autocorrelation functions (Figure 6(d)) of the true system under the invariant measure. However, in both cases, it is evident that the larger, more informative training dataset used in the experiment of subsection 6.2 allows for a more accurate approximation of the true system; see, e.g., the autocorrelation plots in Figure 6(d). It is also evident that while the initial decay in autocorrelation is captured well by all systems, the \(L=1000\), and particularly the \(L=1500\), systems capture the correlations beyond the Lyapunov timescale substantially better than the \(L=500\) system.

6.4. Stochastic closure experiments.
For purposes of establishing a performance baseline based on a concise and fast stochastic method, we first examine plots generated by the i.i.d. Gaussian closure of [71] (see subsection 2.1.2). Figure 7 shows representative trajectories for the \((a_1,a_2)\) and \((a_1,a_2,a_3)\) components generated by this system. Though both attractor lobes are visible, the transitions between lobes are captured noticeably less accurately; this is reflected in the autocorrelation plots shown in Figure 8(d). The stochastic closure also fails to reproduce the characteristic “holes” around the fixed points in the lobes of the attractor. This is reflected in the \(a_1\) PDF from the Gaussian closure in Figure 8(c), which exhibits high probability density for values of \(a_1\) that have lower probability density under the true system due to the holes.


Viewing Figure 7(b) allows us a glimpse into what is going on “under the hood” of the stochastic Gaussian closure. Since the choice of the \(a_3\) component is i.i.d. Gaussian at each timestep, the \(a_3\) component appears simply as random noise, while the useful approximation of the attractor manifests only in the \((a_1, a_2)\) projection (see Figure 7(a)). On the other hand, in Figure 3(b) we can see that the deterministic QMCl approach reconstructs the entire attractor, including the parameterized dimension. In some applications, such as climate dynamics, the increased computational cost and data requirements of QMCl (see Appendix C.5) may make parameterizations based on cheaper parametric stochastic models, such as the i.i.d. Gaussian closure of the L63 system, preferable. However, the ability to accurately reconstruct entire attractors may allow for the reproduction of more complicated and subtle dynamical properties in the resolved variables, which could in turn have value in areas where the additional computation time is worth investing.
In comparison with the deterministic QMCl and i.i.d. Gaussian closure, we also examine the stochastic QMCl closure described in subsection 4.5. In this experiment, we use \(N = \text{90,000}\) samples of the state vector (i.e., \(W=\operatorname{Id}\) as in subsection 6.2) and \(L=1200\) eigenfunctions to build the QMCl Hilbert space \(H_{L,N}\). The quantum state was updated through the observation kernel after every \(r=10\) timesteps. The bandwidth parameter for the kernel \(\kappa_{{\mathcal W}}\) used to build the basis was algorithmically chosen to be \(\epsilon_{{\mathcal W}} = \sqrt{19}\), and the bandwidth parameter for the feature map kernel \(k\) was chosen to be \(\epsilon=35\).
It is worth noting that a relatively large value of \(\epsilon\) was chosen for this experiment (cf. \(\epsilon=2\) and \(\epsilon=10\) in subsections 6.2 and 6.3, respectively) due to numerical stability issues. Namely, for values of \(\epsilon\) comparable to the deterministic QMCl experiments, the stochastic QMCl system tends to run into numerical errors associated with the effect-update step if the current point lands too far from the training data (meaning that, for some \(x_n \in{\mathcal X}\) under the parameterized dynamics, \(k(x_n,\hat x_m)\) is numerically zero for all \(\hat x_m\) in the training dataset). Increasing \(\epsilon\) makes the system more stable with respect to this particular problem, at the expense of a less informative kernel. The deterministic QMCl system is less noisy, and it appears inherently less likely to run into this issue, allowing for the value of \(\epsilon\) to be chosen significantly smaller.
In Figures 8(c) and 8(d), we compare the marginal PDF and time-autocorrelation function of the \(a_1\) variable, respectively, of the true system, the deterministic and stochastic QMCl systems, and the i.i.d. Gaussian closure. We can see that the stochastic QMCl closure comes closer to reconstructing the correct autocorrelation function than the i.i.d. Gaussian closure. As is clear from Figure 8(a), the stochastic QMCl algorithm recovers some qualitative aspects of the L63 system but fails in other important respects. In Figure 8(b), we can see that the three-dimensional \((a_1,a_2,a_3)\) trajectories are somewhat more coherent than their counterparts from the Gaussian closure in Figure 7(b) (since the stochastic flux terms in QMCl are not i.i.d., and their distribution depends on the resolved variables), but they are still dominated by noise. Moreover, similarly to the Gaussian closure, the “holes” of the attractor lobes (Figure 8(a)) are not adequately recovered. In Figure 8(c), we notice that the \(a_1\) histograms have significantly different structures: the trajectories generated by the stochastic QMCl system tend to cluster in the two lobes, while the true system has a high density of points in the central transition region. Evidently, while the “double wing” nature of the attractor is present in the stochastic QMCl system, the smoothness of the transitions between lobes, the general distribution of points, and the rate and qualitative behavior of the transitions are all captured less accurately than by the deterministic QMCl system.
7. Lorenz 96 multiscale.
The L96 multiscale model [64, 31] is a system of equations of \(K\) variables \(\{ x_k \}_{k=1}^K\) and \(JK\) variables \(\{y_{j,k}\}_{j,k=1}^{J,K}\) defined by
\begin{align} \begin{gathered} \begin{aligned} \dot x_k &= -x_{k-1} \left( x_{k-2} - x_{k+1} \right) - x_k + F - h_x \bar y_k, \\ \dot y_{j,k} & =\frac{1}{\varepsilon } \left( - y_{j+1,k} \left( y_{j+2,k} - y_{j-1,k} \right) - y_{j,k} + h_y x_k \right), \end{aligned}\\ x_{k+K} = x_k, \quad y_{j,k+K}=y_{j,k}, \quad y_{j+J,k} = y_{j, k+1}, \end{gathered} \end{align}
(7.1)
where \(\bar y_k = \sum_j y_{j,k}/ J\), and \(F\), \(h_x\), \(h_y\), and \(\varepsilon\) are real parameters. The L96 multiscale system is a more comprehensive model of atmospheric dynamics than the L63 system. For \(\varepsilon \ll 1\), it is a multiscale system, in which the variables \(x_k\) vary slowly in time and each has an associated set \(\{y_{j,k}\}_{j=1}^J\) of variables which vary quickly. Each slow variable \(x_k\) is only influenced by the fast variables via the average value \(\bar y_k\) of its associated fast variable set. The L96 multiscale system has been extensively used as a testbed for parameterization schemes; see, e.g., [91, 25, 2, 50, 20, 42], among many references. The objective of parameterization in this case is to approximate the behavior of the slow variables. That is, we have \(x = (x_1, \ldots, x_K) \in{\mathcal X} \equiv \mathbb R^K\), \(y = ( y_{1,1}, \ldots, y_{J,K} ) \in{\mathcal Y} \equiv \mathbb R^{JK}\), and \(Z:{\mathcal Y} \to{\mathcal Z} = \mathbb R^K\) with \(Z(y) = (\bar y_1, \ldots, \bar y_K)\) for the resolved variables, unresolved variables, and flux terms, respectively.
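Since (7.1) is given in closed form, its right-hand side can be coded directly. A sketch, assuming the fast variables are flattened with \(j\) varying fastest, so that the boundary condition \(y_{j+J,k} = y_{j,k+1}\) becomes cyclic wraparound of the flattened vector:

```python
import numpy as np

def l96ms_rhs(x, y, K, J, F, h_x, h_y, eps):
    """Right-hand side of the L96 multiscale system (7.1).

    x : (K,) slow variables; y : (J*K,) fast variables flattened so that
    y[k*J + j] = y_{j+1, k+1}; with this ordering the boundary conditions
    make both x and y cyclic under index shifts."""
    ybar = y.reshape(K, J).mean(axis=1)          # bar{y}_k = (1/J) sum_j y_{j,k}
    dx = (-np.roll(x, 1) * (np.roll(x, 2) - np.roll(x, -1))
          - x + F - h_x * ybar)                  # np.roll(x, 1)[k] = x_{k-1}
    dy = (-np.roll(y, -1) * (np.roll(y, -2) - np.roll(y, 1))
          - y + h_y * np.repeat(x, J)) / eps     # x_k broadcast to its fast set
    return dx, dy
```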
As \(\varepsilon \to 0\), the L96 multiscale system is known to exhibit an averaging limit [73] in which the evolution of the \(x_k\) variables is Markovian and is governed by the system of equations
\begin{align*} \dot x_k = - x_{k-1} \left( x_{k-2} - x_{k+1} \right) -x_k + h_x \hat Z_k(x) \end{align*}
for some functions \(\hat Z_1, \ldots, \hat Z_K :{\mathcal X} \to \mathbb R\). Following [15, 33], we choose the parameters \(K=9\), \(J=8\), \(h_x = - 0.8\), \(h_y=1\), and \(\varepsilon=1/128\). The resulting dynamical regime is chaotic, with approximately Markovian dynamics for the \(x_k\) variables.
7.1. Quantum mechanical closure experiments.
Let \(\Phi^t : \Omega \to \Omega\) with \(\Omega ={\mathcal X} \times{\mathcal Y}\) and \(t \in \mathbb R\) be the flow generated by (7.1). Similarly to the L63 experiments in section 6, we consider a discrete-time system \(\Phi : \Omega \to \Omega\) with \(\Phi = \Phi^{\Delta t}\) obtained by temporal subsampling of the flow. Here the timestep is \(\Delta t=0.01\) model time units. As training data for QMCl, we use time series \(\hat w_0, \ldots, \hat w_{N-1} \in{\mathcal W} \equiv{\mathcal X} \equiv \mathbb R^K\) and \(\hat z_0, \ldots, \hat z_{N-1} \in{\mathcal Z}\), where \(\hat w_n = \mathcal P_{{\mathcal X}}(\hat \omega_n)\), \(\hat z_n = Z(\hat \omega_n)\), and \(\hat \omega_0, \ldots, \hat \omega_{N-1} \in \Omega\) with \(\hat \omega_n = \Phi^n(\hat \omega_0)\). In particular, we build the basis of \(H_{L,N}\) using information from only the slow variables. The numerical trajectory \(\hat \omega_n\) is generated using the ode15s solver of MATLAB which is appropriate for stiff problems. The number of training samples is \(N = \text{40,000}\), and the initial condition \(\hat \omega_0\) is taken on the trajectory starting from \((x_1, \ldots, x_K) = ( 1, 0, \ldots, 0)\) and \((y_{1,1}, \ldots, y_{J,K}) = (1.1, 0,\ldots,0)\) after an equilibration time interval of 200 model time units (i.e., similarly to the L63 experiments; see subsection 6.2).
We build the kernel eigenbasis of the Hilbert space \(H_{L,N}\) with dimension \(L=1900\) using the radial Gaussian kernel \(\kappa_{{\mathcal W}}\) from (3.9). As in the L63 experiments from section 6, the bandwidth parameter \(\epsilon_{{\mathcal W}} \approx 3.16\) was tuned automatically. Moreover, the evolution map \(\tilde \phi :{\mathcal X} \times{\mathcal Z} \to{\mathcal X}\) is based on the RK4 scheme in (D.13). We evolve the quantum states using the map \(\psi_r :{\mathcal X} \times Q(H_{L,N}) \to Q(H_{L,N})\) with \(r=5\) Koopman evolution steps (i.e., \(r \, \Delta t=0.05\) model time units) between each state conditioning via (4.4). The effect-valued feature map \(\tilde{\mathcal F}_{L,N}\) was based on a radial Gaussian kernel, here with bandwidth parameter \(\epsilon = 2\). Similarly to the L63 experiments, we assess the performance of QMCl in terms of its ability to reproduce salient qualitative features of the true dynamics, as well as marginal PDFs and time-autocorrelation functions of the resolved variables. In these tests, the true and QMCl systems are initialized on the trajectory starting from \((x_1, \ldots, x_K) = (1,0,\ldots, 0)\) and \((y_{1,1}, \ldots, y_{J,K}) = (1, 0, \ldots, 0)\) after an equilibration period of 1000 model time units. We use 100,000 samples (i.e., \(\text{100,000}\,\Delta t=1000\) model time units) on the true and QMCl trajectories starting at that point to compute PDFs and autocorrelation functions. The initial QMCl state is obtained via (4.7).
Figure 9 displays Hovmöller diagrams (space–time heat maps) of the \(x_k\) variables under the true L96 and QMCl dynamics. In our chosen dynamical regime, the \(x_k\) variables exhibit characteristic propagating patterns which can be thought of as crude representations of eastward-propagating disturbances in Earth’s midlatitude atmosphere. The dynamics of these wave-like structures, including their aperiodic emergence and decay, appear to be qualitatively well-captured by the QMCl system.

As a more quantitative test, in Figure 10 we compare the marginal PDFs and time-autocorrelation functions of \(x_1\) under the true L96 and QMCl dynamics. We can see in Figure 10(b) that the autocorrelation under the QMCl system matches the periodicity present in the true system, though the decay of the correlation amplitude is somewhat faster in the QMCl system. The \(x_1\) PDFs (Figure 10(a)) are also in good agreement between the true and QMCl systems.

8. Summary and discussion.
We have developed a data-driven framework for closure of dynamical systems that combines aspects of quantum theory, ergodic theory, and kernel methods for machine learning. Our approach, called quantum mechanical closure (QMCl), models the unresolved degrees of freedom as a finite-dimensional quantum mechanical system that is coupled to a classical model governing the resolved variables. The state of the quantum system is a density operator (a quantum mechanical analogue of a probability density) that evolves dynamically under the induced action of the Koopman operator. The fluxes from the unresolved variables to the resolved variables are modeled as expectations of quantum observables (self-adjoint operators) that provide the closure terms needed to advance the resolved variables given the current quantum state. Meanwhile, the state of the resolved variables updates the quantum state by conditioning, using an operator-valued feature map in a step that can be viewed as a quantum mechanical analogue of Bayes’ rule. The resulting two-way coupling between the classical and quantum systems for the resolved and unresolved degrees of freedom, respectively, resembles the prediction–correction cycle of sequential data assimilation (filtering).
QMCl has a data-driven implementation in which a dataset of \(N\) samples of the resolved variables is used to learn a basis for an \(L\)-dimensional Hilbert space \(H_{L,N}\) consisting of eigenfunctions \(\phi_{l,N}\) of a kernel integral operator. The QMCl model is then built over \(H_{L,N}\), and all operators employed in the scheme are numerically represented as \(L\times L\) matrices with respect to the \(\{\phi_{l,N}\}\) basis. The data-driven formulation of QMCl has a well-characterized large-data limit, \(N\to \infty\), leveraging results from spectral approximation of kernel integral operators. In addition, the method has a stochastic variant wherein the closure terms are obtained by random draws at each timestep from the probability distribution of quantum mechanical measurements induced by the quantum state.
We have applied QMCl to deterministic and stochastic closure experiments involving the L63 (section 6) and L96 multiscale (section 7) systems, the latter in a chaotic regime with timescale separation. In these experiments, QMCl was able to reconstruct the qualitative nature of the resolved variables, as well as their marginal distribution and time-autocorrelation functions with respect to the invariant measure. In the L63 examples, the unresolved variable was also recovered, resulting in a three-dimensional reconstruction of the Lorenz attractor.
We end the paper with a discussion of some of the salient features of QMCl and possible avenues for future work.
8.1. Positivity preservation.
A novel aspect of QMCl is that it casts the problem of closure of dynamical systems in the setting of operator algebras and the associated quantum probability theory. What we have argued in this paper is that the structure of these spaces naturally leads to computational algorithms with improved structure-preservation properties compared to formulations in abelian algebras of classical observables. In particular, as a result of basic properties of projections on operator algebras, QMCl represents positive classical observables by positive operators, whereas orthogonal projections onto finite-dimensional function spaces are not, in general, positivity preserving. By virtue of the same properties, QMCl is able to represent classical probability densities by “honest-to-god” density operators in finite dimensions, whereas finite orthogonal basis expansions of probability densities are not, in general, positive functions (and thus not normalizable to probability densities).
We believe that the positivity preservation enjoyed by QMCl is a useful property in applications. In geophysical fluid dynamics, for instance, many relevant physical quantities, such as density, moisture, and pressure, by definition take nonnegative values. Since negative values of such variables have no physical meaning, a parameterization scheme which generates negative-valued outputs of these observables could have particularly problematic effects on the system as a whole. Core conservation laws (e.g., conservation of mass) could be violated by such negative quantities, leading to poor, or even meaningless, predictions as time advances. In the context of parameterizations of climate models, some of the biases and numerical instabilities exhibited by data-driven parameterization schemes have indeed been attributed to failure to respect physical constraints such as energy conservation and positivity of precipitation [93], and it is possible that QMCl may provide a route to addressing such issues.
8.2. Stochastic versus deterministic approaches.
In areas such as climate modeling, stochastic parameterization as a general approach has been increasingly studied and used to successful ends [6]. The stochastic mode of the QMCl framework (subsection 4.5) can be thought of as a new stochastic parameterization algorithm. In this view, at each time, the probability distribution associated with the stochastically drawn flux term \(z\) is determined by a generalized PDF, namely the quantum density operator \(\rho\). Though the QMCl framework fundamentally alters the mathematical structure underlying the probabilistic dynamics of this distribution, it can nonetheless be thought of in the same general terms as stochastic parameterizations based on classical probability theory.
The deterministic approach (subsections 4.1 to 4.4), instead of drawing the value of \(z\) randomly from an induced probability measure, directly chooses \(z\) to be the expectation value of a quantum observable given the quantum state \(\rho\). This is a natural choice as an estimator of the true flux given the unresolved variables, and is similar in spirit to statistical closure methods that estimate fluxes through classical expectations [50], including methods based on ensemble data assimilation [20, 42, 21, 60].
At first glance, one might imagine that dropping stochasticity forfeits the primary benefit of this method relative to more standard approaches to parameterization. Indeed, moving away from deterministic functions of the resolved state, with their inherent dynamics-flattening properties, is one reason why we were interested in such techniques to begin with. However, while the choice of \(z\) at a given time is deterministic in this mode, it is not a function of the classical state at a given time; it is a function of the quantum state. That is, instead of our deterministic parameterization function being a function on the space of unresolved variables \(\mathcal Y\), it is now a function on the space of \(L \times L\) density matrices. The latter is a convex subset, of (real) dimension \(L^2 - 1\), of the set of \(L \times L\) Hermitian matrices [5]. Assuming that the initial state is pure (i.e., a rank-1 projection, as is the case in the experiments in sections 6 and 7), all subsequent states produced by the QMCl algorithm are pure and thus lie in a subset of the set of all quantum states of dimension \(2(L-1)\). Since \(L\) can be very large compared to the dimension of \(\mathcal Y\), our parameterization is no longer a dimension reduction but in fact a massive dimension expansion. These additional dimensions allow for much more information to be carried through time and allow us to avoid the dynamics flattening that comes with a reduction approach.
A number of benefits arise from the deterministic variant of QMCl. For one, it is significantly cheaper computationally than the stochastic version, since the step of generating a probability distribution and drawing points from it is replaced by the comparatively cheap calculation of \(\operatorname{tr}(\rho (\pi_L \zeta^{(i)} ))\) from (4.1) (particularly when \(\rho\) is a pure state, in which case the computational cost is \(O(L)\)). Furthermore, in systems such as L96 multiscale (section 7), where multiple draws need to be taken at each timestep, we have a simple solution to the problem of choosing how to correlate the approximations of the various unresolved components. In the stochastic setup, all flux components were drawn separately (from the same quantum state but nonetheless independently). Potentially important relations between the values of the various components at a given time were thus lost: regardless of the accuracy of the PDFs, independent variation could produce combinations of flux values that could not actually occur in reality. In the deterministic approach, this problem is implicitly resolved. All flux terms are taken to be the expectations of their observables \(\pi_L \zeta^{(i)}\) in the same quantum state, so an implicit correlation between the flux components arises naturally from the quantum state itself. Finally, it is not unreasonable to expect that analyzing convergence properties and error bounds may prove easier for the deterministic approach than for the stochastic mode.
In the L63 results of subsection 6.4, we can already see clear benefits of using the expectation value over a random draw. What is manifest in the trajectory plots in Figure 3, along with the marginal PDF and time-autocorrelation plots in Figure 5, is that the quantum state appears to be carrying sufficient information to resolve the \(a_3\) component at each time in a way that meaningfully corresponds to the underlying dynamics. To achieve this, however, we had to use the expectation value.
8.3. Future work.
This work motivates future research in a number of directions. First, on a fundamental level, it would be useful to establish sufficient conditions for the convergence of the QMCl model to the true model as the dimension \(L\) increases. This problem could be approached using projection operator techniques [43] such as the Mori–Zwanzig formalism. It is also natural to explore extensions of QMCl to closure of semiclassical systems, for instance semiclassical models of plasma dynamics, where connections with Koopman operator theory have already been studied using the Koopman–von Neumann formalism [53]. Second, it would be fruitful to explore ways of improving the computational scalability of QMCl, both with regards to training and out-of-sample evaluation. To that end, methods for kernel approximation based on random features [76] appear well-suited to reduce the computational cost of the current brute-force kernel computations, which is \(O(N^2)\) for training (basis function computation) and \(O(N)\) for out-of-sample evaluation (operator-valued feature map). Random feature methods can reduce these costs to \(O(N)\) and \(O(1)\), respectively, while allowing streaming data processing in the training phase [38]. Another direction would be to employ methods for kernel learning [70] so as to optimize the kernels used in the computation of the basis and operator-valued feature map with respect to an objective. A longer-term goal would be to develop implementations of QMCl on quantum computers. Recent approaches for simulation of dynamical systems on quantum computers based on mathematical techniques closely related to QMCl have shown promising results for simple ergodic dynamical systems [39], and it would be interesting to explore whether these methods, or other methods for quantum simulation of classical dynamics (see, e.g., [52, 62, 74]), can be extended to a parameterization context. Here a challenge would be how to handle sequential two-way interactions between the classical and quantum computational systems representing the resolved and unresolved dynamics, respectively. We believe that addressing this and other related problems would be a fruitful area for future work.
Appendix A. Quantum theory background.
As a statistical theory, quantum mechanics can be viewed as a generalization of Bayesian probability theory [49, 47]. Quantum mechanics deals directly with objects (e.g., quantum particles) for which a measurement of any observable at a given time is inherently probabilistic in nature. The theory describes how the generalized probability distributions associated with these objects (which, in some views, constitute the objects themselves) evolve over time and under experimental observations of the objects’ properties. The probability theory associated with quantum systems, however, is generalized beyond classical probability theory: quantum probabilistic dynamics are understood through non-abelian algebras of operators on Hilbert spaces and spectral theory applied to these operators. The QMDA and QMCl methodologies arise from a parallel between the probabilistic dynamics formulated in this way and the mathematical structure of Koopman operator theory [36, 33].
A standard framework of quantum probabilistic dynamics arises from the Dirac–von Neumann quantum mechanical axioms. In this appendix, we provide an overview of the mathematical objects and evolution rules associated with this theory in order to get a general view of how the mathematical systematization of quantum mechanics can be applied to dynamical systems more broadly. More detailed explication of the foundations of this theory can be found, e.g., in [80, 16, 88].
A.1. Hilbert spaces and quantum observables.
In the Dirac–von Neumann framework, associated to every quantum system is a Hilbert space \(\left( \mathcal H, \langle \cdot, \cdot \rangle \right)\) over the complex numbers. The precise choice of \(\mathcal H\) will depend on the specific system being studied. Quantum observables are defined as elements of the set of self-adjoint (possibly unbounded) linear operators \(A: D(A) \to \mathcal H\), where the domain \(D(A)\) of \(A\) is a dense subspace of \(\mathcal H\).
Every quantum observable is associated with a measurable real-valued quantity of the quantum system (such as momentum, spin, etc.). Importantly, if the quantity associated with a quantum observable \(A\) is measured in an experimental setting, the possible values of the measurement are contained in the spectrum of \(A\). The self-adjointness condition ensures that such measurements are real-valued.
Let \(B(\mathcal H)\) be the space of bounded operators on \(\mathcal H\). From the spectral theorem for self-adjoint operators, associated with every quantum observable \(A : D(A) \to \mathcal H\) is an operator-valued measure \(E_A: \mathcal{B}(\mathbb{R}) \to B(\mathcal H)\) on the Borel \(\sigma\)-algebra \(\mathcal B(\mathbb R)\) over \(\mathbb R\) such that (i) for every Borel set \(S \in \mathcal B(\mathbb R)\), \(E_A(S)\) is an orthogonal projection; (ii) \(E_A(\emptyset ) = 0\); (iii) \(E_A(\mathbb R) = \operatorname{Id}\); and (iv) for every countable collection \(\{S_i \}\) of pairwise-disjoint sets \(S_i \in \mathcal B(\mathbb R)\), we have \(E_A(\bigcup_i S_i) = \sum_i E_A(S_i)\), where the sum over \(i\) converges in the strong topology of \(B(\mathcal H)\). Using \(E_A\), one can reconstruct the operator \(A\) through the spectral integral \(A = \int_{\mathbb{R}} a \,dE_A(a)\).
The measure \(E_A\) plays an important role in calculations, and it can be resolved as a finite discrete sum of projections in QMCl applications (see Appendix B.2). In the general theory, however, the spectrum of \(A\) may have a continuous component, in which case \(A\) cannot be finitely resolved in this way. Unlike classical observables lying in commutative algebras of functions, quantum observables do not, in general, commute. One of the signature characteristics of quantum mechanics arises from this fact: noncommuting observables cannot be measured simultaneously at arbitrarily high precision, giving rise to the famed uncertainty principle.
A.2. Quantum states.
Let \(B_1(\mathcal H) \subseteq B(\mathcal H)\) be the space of trace class operators on \(\mathcal H\). The quantum states of the system are defined as elements of the set \(Q(\mathcal H) \subset B_1(\mathcal H)\) consisting of all positive, trace class operators of unit trace, i.e.,
\begin{align*} Q(\mathcal H) = \{\rho \in B_1(\mathcal H) \mid \text {$\operatorname {tr}\rho =1$, $\rho \geq 0$}\}. \end{align*}
Such operators \(\rho \in Q(\mathcal H)\) are known as density operators. Quantum states can be thought of as roughly analogous to classical probability densities in an \(L^1\) space, representing the degree of knowledge about the system’s current state. With this in mind, a density operator’s positivity and unity of trace can be thought of as analogous to the positivity and normalization of a classical probability density, respectively. We say that \(\rho \in Q(\mathcal H)\) is a pure state if there exists a unit vector \(\xi \in \mathcal H\) such that \(\rho = \langle \xi, \cdot \rangle \xi\), i.e., \(\rho\) is a rank-1 projection along \(\xi\). Otherwise, \(\rho\) is said to be mixed.
Every quantum state \(\rho \in Q(\mathcal H)\) induces a continuous, unital, positive linear functional \(\mathbb E_\rho : B(\mathcal H) \to \mathbb C\) on bounded operators defined as
\begin{align} \mathbb E_\rho A = \operatorname{tr}(\rho A); \end{align}
(A.1)
intuitively, we think of this formula as being analogous to an expectation of a bounded random variable with respect to a classical probability density. If \(A\) is a self-adjoint quantum observable (not necessarily bounded), the associated projection-valued measure \(E_A\) induces a probability distribution for experimental measurements of \(A\). Specifically, the probability that a measurement \(a \in \mathbb{R}\) of the quantum observable \(A\) lies in a Borel set \(S \in \mathcal B(\mathbb R)\) is given by
\begin{align} \mathbb P_{\rho,A}\left( a \in S \right) = \mathbb E_\rho E_A(S). \end{align}
(A.2)
This general formula holds for all quantum observables, and thus the quantum state \(\rho\) alone determines the probability distribution of all quantum observables of a system.
A.3. Quantum evolution and measurement.
In the Dirac–von Neumann axioms, there are two evolution rules for the quantum state. The first is the time-evolution equation. Letting \(\mathcal T\) denote either \(\mathbb Z\) or \(\mathbb R\), respectively, for discrete- or continuous-time systems, suppose that \(\rho_0 \in Q(\mathcal H)\) is the initial quantum state of a system, and \(\rho_t\) is the quantum state of the system at a time \(t \in \mathcal T\) such that no intervening measurement took place. The quantum mechanical time evolution axiom posits that there exists a group of unitary operators \(\{ U^t \in B(\mathcal H) \}_{t\in \mathcal T}\) such that
\begin{align} \rho_t = \mathcal P^t \rho_0 := U^{t*} \rho_0 U^t. \end{align}
(A.3)
The collection \(\{ \mathcal P^t\}_{t \in \mathcal T}\) extends to an evolution group on \(B_1(\mathcal H)\), which can be thought of as an analogue of the group of transfer operators of a measure-preserving dynamical system [4] acting on densities in \(L^1\). If the quantum system under study is open, i.e., it is allowed to interact with its environment, (A.3) is replaced by evolution under a semigroup \(\{ \mathcal P^t : B_1(\mathcal H) \to B_1(\mathcal H) \}_{t \in \mathcal T_+}\) which is required to be trace nonincreasing; i.e., we have \(\operatorname{tr}(\mathcal P^t \rho ) \leq 1\) rather than the stronger property \(\operatorname{tr} (\mathcal P^t \rho ) = 1\) under (A.3). In either case, the evolution of quantum states under \(\mathcal P^t\) is dual to the evolution of quantum observables given by the operators \(\mathcal U^t : B(\mathcal H) \to B(\mathcal H)\) with
\begin{align} \mathcal U^t A = U^t A U^{t*}; \end{align}
(A.4)
i.e., we have \(\mathbb E_{\mathcal P^t \rho_0} A = \mathbb E_{\rho_0} (\mathcal U^t A)\). In quantum mechanics, the evolution of states and observables in (A.3) and (A.4) is known as the Schrödinger and Heisenberg picture, respectively.
The second evolution rule addresses how measurements of quantum observables affect the quantum state. Suppose a quantum system is in state \(\rho \in Q(\mathcal H)\) at a time \(t\), and a quantum observable \(A\) is measured at the time \(t\). Then, if the measurement takes a value \(a \in \mathbb R\) such that \(E_A(\{a\}) \neq 0\) (which necessarily means that \(a\) is an eigenvalue of \(A\)), the quantum state of the system is now given by \(\rho |_a \in Q(\mathcal H)\), where
\begin{align} \rho |_a = \frac{E_A(\{a\}) \rho E_A(\{a\})}{\operatorname{tr}(E_A(\{a\}) \rho E_A(\{a\}))}. \end{align}
(A.5)
The update \(\rho \mapsto \rho |_a\) is oftentimes referred to as a von Neumann measurement and can be thought of as an analogue of the Bayesian conditioning rule of classical probability theory.
In (A.5), the projections \(E_A(\{a\})\) are examples of quantum effects, defined as the operators in the set
\begin{align} \mathcal E(\mathcal H) = \{ e \in B(\mathcal H) \mid 0 \leq e \leq \operatorname{Id} \}. \end{align}
(A.6)
Intuitively, one can think of effects as being analogous to events in classical probability theory. Effects \(e \in \mathcal E(\mathcal H)\) that are projections represent events which are “sharp” in the sense that \(e^2 = e\), whereas other effects, with \(e^2 \lt e\), can be thought of as being “fuzzy.”
A generalization of (A.5) that represents conditioning of \(\rho\) by an arbitrary effect \(e \in \mathcal E(\mathcal H)\) is
\begin{align} \rho |_e = \frac{\sqrt e \rho \sqrt e}{\operatorname{tr}(\sqrt e \rho \sqrt e)}. \end{align}
(A.7)
This update rule reduces to (A.5) when \(e = e^2 = E_A(\{a\})\) is a projection. In applications involving physical quantum systems, (A.7) is used to model the state change of a quantum system being measured due to interactions with the measuring apparatus, a process sometimes called “state collapse.” Equation (A.7), however, also applies more broadly as an inference rule within abstract quantum probability theory [82], which is how we use it in our QMCl schemes; that is, as the “quantum state update” step of Figure 1.
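Since (A.5) and (A.7) drive everything that follows, a small numerical illustration may be helpful. The NumPy sketch below is an assumption of ours, not code from the paper (whose experiments use MATLAB): it conditions a randomly generated density matrix by a randomly generated effect via (A.7) and checks that the update preserves unit trace and positivity.

```python
# Minimal sketch of the quantum Bayes update (A.7) on a dim-4 Hilbert space;
# the random rho and e below are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Random density matrix: positive semidefinite with unit trace.
X = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
rho = X @ X.conj().T
rho /= np.trace(rho).real

# Random effect: Hermitian with 0 <= e <= Id.
Y = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
e = Y @ Y.conj().T
e /= np.linalg.eigvalsh(e).max()

# Operator square root of e via its eigendecomposition.
w, V = np.linalg.eigh(e)
sqrt_e = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

# Quantum Bayes update (A.7): rho|_e = sqrt(e) rho sqrt(e) / tr(...).
num = sqrt_e @ rho @ sqrt_e
rho_cond = num / np.trace(num).real

assert np.isclose(np.trace(rho_cond).real, 1.0)
assert np.linalg.eigvalsh(rho_cond).min() > -1e-12  # positivity preserved
```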
A.4. Von Neumann algebras.
Consider the space of classical observables \(L^\infty (\mu )\) from subsection 3.1. In addition to being a Banach space, as are the other \(L^p(\mu )\) spaces, \(L^\infty (\mu )\) has the distinguished property of being a von Neumann algebra [87] with respect to pointwise function multiplication and complex conjugation. This means the following:
1.
For any \(f, g \in L^\infty (\mu )\), the pointwise product \(fg\) lies in \(L^\infty (\mu )\), and we have
\begin{align*} \lVert f g \rVert_{L^\infty (\mu )} \leq \lVert f \rVert_{L^\infty (\mu )} \lVert g \rVert_{L^\infty (\mu )}, \quad \lVert f^* f \rVert_{L^\infty (\mu )} = \lVert f \rVert_{L^\infty (\mu )}^2, \end{align*}
making \(L^\infty (\mu )\) a \(C^*\)-algebra.
2.
\(L^\infty (\mu )\) is the continuous dual of the Banach space \(L^1(\mu )\), making it a von Neumann algebra.
By commutativity of function multiplication, \(L^\infty (\mu )\) is an abelian algebra. Given a probability density \(p \in L^1(\mu )\), i.e., a positive function satisfying the normalization condition \(\int_\Omega p \, d\mu = 1\), we have a continuous, positive linear functional \(\mathbb E_p : L^\infty (\mu ) \to \mathbb C\), defined as the expectation
\begin{align*} \mathbb E_p f = \int_\Omega p f \,d\mu. \end{align*}
Letting \(P(\mu ) \subset L^1(\mu )\) be the space of probability densities in \(L^1(\mu )\), i.e.,
\begin{align*} P(\mu ) = \left\{ p \in L^1(\mu ) \mid \text {$p \geq 0$, $ \int_\Omega p \, d\mu = 1$} \right\}, \end{align*}
we can think of elements of \(P(\mu )\) as states of the algebra \(L^\infty (\mu )\).
The similarities between this construction and the construction of quantum states from Appendix A.2 are not accidental. Given a Hilbert space \(\mathcal H\), the space \(B(\mathcal H)\) is a Banach space equipped with the operator norm, \(\lVert A \rVert_{B(\mathcal H)} = \sup_{u \in \mathcal H \setminus \{0\}} \frac{\lVert A u \rVert_{\mathcal H}}{\lVert u \rVert_{\mathcal H}}\), and it is a von Neumann algebra with respect to operator composition and adjunction. This means the following:
1.
For any \(A, B \in B(\mathcal H)\), the operator product \(AB\) lies in \(B(\mathcal H)\), and we have
\begin{align*} \lVert A B \rVert_{B(\mathcal H)} \leq \lVert A \rVert_{B(\mathcal H)} \lVert B \rVert_{B(\mathcal H)}, \quad \lVert A^* A \rVert_{B(\mathcal H)} = \lVert A \rVert_{B(\mathcal H)}^2, \end{align*}
making \(B(\mathcal H)\) a \(C^*\)-algebra.
2.
\(B(\mathcal H)\) is the continuous dual of the Banach space \(B_1(\mathcal H)\), equipped with the trace norm \(\lVert A \rVert_{B_1(\mathcal H)} = \operatorname{tr}\sqrt{A^* A}\), making it a von Neumann algebra.
Aside from trivial cases, the algebra \(B(\mathcal H)\) is non-abelian, which is a fundamental reason for the differences in behavior between mathematical models of classical and quantum systems.
Appendix B. Discretization.
This appendix provides further details on aspects of the discretization procedure described in subsection 3.3 that yields the quantum system on the finite-dimensional Hilbert space \(H_L\) from the infinite-dimensional system on \(H\).
B.1. States.
For a given quantum state \(\rho \in Q(H)\), let \(\sigma_0, \sigma_1, \ldots\) be the projected operators \(\sigma_L = \boldsymbol \Pi_L \rho \in B(H)\). While, in general, the \(\sigma_L\) are not density operators, as \(L\to \infty\) they converge to \(\rho\) in the trace norm of \(B_1(H)\). As a result, there exists \(L_* \in{\mathbb N}\) such that for every \(L \gt L_*\) we have \(C_L = \operatorname{tr} \sigma_L \gt 0\), and thus
\begin{align} \rho_L := \frac{\sigma_L}{C_L} \end{align}
(B.1)
is a quantum state in \(Q(H)\). Every such state can be identified with a state in \(Q(H_L)\). As one can directly verify, for every quantum observable \(A \in B(H)\), we have
\begin{align} \lim_{L\to \infty } \mathbb E_{\rho_L} A_L = \mathbb E_\rho A, \end{align}
(B.2)
where \(A_L = \boldsymbol \Pi_L A\). Thus, in the infinite-dimensional limit \(L \to \infty\), the quantum systems on \(H_L\) consistently recover the expectation of any bounded quantum observable on \(H\).
Let \(\beta_L : H_L \to \mathbb C^L\) be the linear map that maps elements of \(H_L\) to their column vector representation with respect to the \(\{ \phi_l \}\) basis from (3.6), i.e., \(\beta_L f = \boldsymbol f = (f_0, \ldots, f_{L-1})^\top\) with \(f_l = \langle \phi_l, f \rangle\). In applications, we represent a quantum state \(\rho \in Q(H_L)\) by an \(L\times L\) density matrix \(\boldsymbol \rho = \boldsymbol \beta_L \rho\) (see subsection 3.3). If \(\rho = \langle \xi, \cdot \rangle \xi\) is a pure state induced by a unit vector \(\xi \in H_L\), then \(\boldsymbol \rho = \boldsymbol \xi \boldsymbol \xi^\dagger\) is a rank-1 projection matrix along the unit vector \(\boldsymbol \xi = \beta_L \xi \in \mathbb C^L\), where \(^\dagger\) denotes the complex conjugate transpose.
Next, if \(\rho = \Gamma (p) \in Q(H)\) is a pure state from (3.1) induced by a probability density \(p \in L^1(\mu )\), i.e., \(\rho = \langle \xi, \cdot \rangle \xi\) with \(\xi = \sqrt{p}\), then for large-enough \(L\) we have \(\rho_L = \langle \xi_L, \cdot \rangle \xi_L\) with \(\xi_L = \Pi_L \xi/ \lVert \Pi_L \xi \rVert_H\). It should be noted that, in general, \(\xi_L\) is not the square root of a probability density in \(P(\mu )\); in fact, \(\xi_L\) need not be a positive function. Thus, analogously to our discretization of observables from subsection 3.3, our finite-dimensional representations of classical probability densities are through intrinsically quantum mechanical objects.
B.2. Spectral decomposition of observables.
Given a real-valued classical observable \(f\in L^\infty (\mu )\), the corresponding projected multiplication operator \(\pi_L f \in B(H_L)\) is a finite-rank self-adjoint operator. Letting \(\{ a_0, \ldots, a_{J-1} \} \subset \mathbb R\), \(J \leq L\), denote the spectrum of \(\pi_L f\) (which consists entirely of real eigenvalues), we have that \(\pi_L f\) is decomposable in terms of the discrete spectral measure \(E_{\pi_L f} : \mathcal B(\mathbb R) \to B(H_L)\) given by
\begin{align} E_{\pi_L f}(S) = \sum_{j: a_j \in S} E_{\pi_L f}^{(j)}, \end{align}
(B.3)
where \(E_{\pi_L f}^{(j)} \in B(H_L)\) is the orthogonal projection onto the eigenspace of \(\pi_L f\) corresponding to \(a_j\) (see Appendix A.1). That is, we have
\begin{align*} \pi_L f = \int_{\mathbb R} a \, dE_{\pi_L f}(a) = \sum_{j=0}^{J-1} a_j E_{{\pi_L f}}^{(j)}. \end{align*}
Equation (B.3) may be used to compute the measurement probability from (A.2) given a quantum state \(\rho \in Q(H_L)\), namely
\begin{align} \mathbb P_{\rho, \pi_Lf}(S) = \sum_{j:a_j \in S} \operatorname{tr}\left( \rho E_{\pi_L f}^{(j)}\right). \end{align}
(B.4)
As \(L \to \infty\), the operators \(\pi_L f\) exhibit spectral convergence to the multiplication operator \(\pi f\) in a suitable sense; see [33] for further details.
B.3. Evolution operators.
Let \(U \equiv U^1\) and \(P \equiv P^1\) be the single-step Koopman and transfer operators on \(H\). For any \(L \in{\mathbb N}\), we define projected operators \(U_L = \boldsymbol \Pi_L U\) and \(P_L = \boldsymbol \Pi_L P\) on \(H_L\) and the corresponding induced operators \(\mathcal U_L : B(H_L) \to B(H_L)\) and \(\mathcal P_L : B_1(H_L) \to B_1(H_L)\) as \(\mathcal U_L A = U_L A U_L^*\) and \(\mathcal P_L A = U_L^* A U_L \equiv P_L A P_L^*\). In general, \(U_L\) and \(P_L\) are not unitary operators. Nevertheless, \(\mathcal P_L\) can be shown to be trace nonincreasing, \(\operatorname{tr}(\mathcal P_L \rho ) \leq 1\) for any \(\rho \in Q(H_L)\). As a result, the operator semigroups \(\{ \mathcal U^n_L \}_{n \in{\mathbb N}}\) and \(\{ \mathcal P^n_L \}_{n \in{\mathbb N}}\) generated by \(\mathcal U_L\) and \(\mathcal P_L\), respectively, define an open quantum system on \(H_L\). Note that if \(H_L\) is a \(U\)-invariant subspace of \(H\), then \(U_L : H_L \to H_L\) and \(P_L : H_L \to H_L\) are unitary, and we can define associated unitary evolution groups \(\{ \mathcal U^n_L \}_{n \in{\mathbb Z}}\) and \(\{ \mathcal P^n_L \}_{n \in{\mathbb Z}}\). However, every such \(U\)-invariant subspace \(H_L\) must necessarily admit a basis of Koopman eigenfunctions, and we cannot in general assume existence of such a basis. Irrespective of whether \(U_L\) and \(P_L\) are unitary, for any number of timesteps \(n \in{\mathbb N}\), observable \(A \in B(H)\), and quantum state \(\rho \in Q(H)\), we have
\begin{align*} \lim_{L \to \infty } \mathbb E_{\mathcal P^n_L \rho_L} A_L = \mathbb E_{\mathcal P^n \rho } A, \end{align*}
where \(A_L = \boldsymbol \Pi_L A\) and \(\rho_L\) is given by (B.1).
B.4. Discretization of effects.
We use the projection maps \(\boldsymbol \Pi_L\) to discretize effects similarly to our discretization of quantum observables from subsection 3.3. First, observe that if \(e \in B(H)\) satisfies \(0 \leq e \leq \operatorname{Id}\), then \(0 \leq \boldsymbol \Pi_L e \leq \operatorname{Id}\); that is, the projections \(\boldsymbol \Pi_L\) map effects to effects, and we can view \(\boldsymbol \Pi_L\) as a map from \(\mathcal E(H)\) to \(\mathcal E(H_L)\). Using (B.2), it follows that for every observable \(A \in B(H)\), state \(\rho \in Q(H)\), and effect \(e \in \mathcal E(H)\),
\begin{align*} \lim_{L\to \infty } \mathbb E_{\rho_L\rvert_{e_L}} A_L = \mathbb E_{\rho \rvert_e} A, \end{align*}
where \(A_L = \boldsymbol \Pi_L A\), \(\rho_L\) is given by (B.1), and \(e_L = \boldsymbol \Pi_L e\). Thus, conditioning by the projected effects \(e_L\) consistently recovers conditioning by \(e\) in the infinite-dimensional limit. Note that if \(e\in \mathcal E(H)\) is “classical,” i.e., it is a multiplication operator by the characteristic function of a classical event, \(e = \pi \chi_S\) for some measurable set \(S \subseteq \Omega\), the projected effect \(e_L = \pi_L \chi_S\) is in general not a multiplication operator. Thus, in the context of the finite-dimensional quantum systems on \(H_L\), state conditioning generally takes place by non-classical events.
Appendix C. Data-driven approximation.
This appendix contains further details on the data-driven formulation of QMCl described in subsection 5.2.
C.1. Finite-dimensional Hilbert space.
Let \(\mu_N\) be the discrete sampling measure on \(\Omega\) supported on the training trajectory \(\hat \omega_0, \ldots, \hat \omega_{N-1}\) from subsection 5.2, defined as \(\mu_N = \frac{1}{N}\sum_{n=0}^{N-1} \delta_{\hat \omega_n}\), where \(\delta_{\hat \omega_n}\) is the Dirac \(\delta\)-measure supported on \(\hat \omega_n \in \Omega\). As a finite-dimensional analogue of \(H=L^2(\mu )\), we employ the finite-dimensional Hilbert space \(\hat H_N := L^2(\mu_N)\), equipped with the inner product
\begin{align*} \langle f, g \rangle_N := \int_\Omega f^* g \, d\mu_N = \frac {1}{N} \sum_{n=0}^{N-1} f^*(\hat \omega_n) g(\hat \omega_n). \end{align*}
Note that the elements of \(\hat H_N\) are equivalence classes of functions on \(\Omega\) with common values at the sampled states \(\hat \omega_n\). Assuming, for simplicity of exposition, that the points \(\hat \omega_0, \ldots, \hat \omega_{N-1}\) are all distinct (which holds true aside from special cases such as \(\hat \omega_0\) being an eventually periodic point under \(\Phi\)), \(\hat H_N\) has dimension \(N\). Thus, every element \(f \in \hat H_N\) can be represented by a column vector \(\boldsymbol f = (f_0, \ldots, f_{N-1} )^\top \in \mathbb C^N\) with components \(f_n = f(\hat \omega_n)\), and every column vector \(\boldsymbol f \in \mathbb C^N\) represents a unique element \(f\in \hat H_N\). Correspondingly, every linear map \(A:\hat H_N \to \hat H_N\) can be represented by a matrix \(\boldsymbol A \in \mathbb C^{N\times N}\) such that \(\boldsymbol A \boldsymbol f\) is the column vector representation of \(Af\).
Let \(C(\mathcal M)\) denote the Banach space of continuous, complex-valued functions on the forward-invariant set \(\mathcal M\), equipped with the uniform norm, \(\lVert f \rVert_{C(\mathcal M)} = \max_{\omega \in \mathcal M} \lvert f(\omega )\rvert\). Under assumptions A2 and A3, for \(\mu\)-a.e. initial condition \(\hat \omega_0 \in \mathcal M\) and in the limit of large data, \(N \to \infty\), the measures \(\mu_N\) converge to the invariant measure in the weak-\(^*\) topology of finite Borel measures on \(\mathcal M\); i.e.,
\begin{align} \lim_{N\to \infty } \int_{\mathcal M} f \, d\mu_N \equiv \lim_{N\to \infty } \frac{1}{N} \sum_{n=0}^{N-1} f(\hat \omega_n) = \int_{\mathcal M} f \, d\mu \quad \forall f \in C(\mathcal M). \end{align}
(C.1)
In (C.1), we may replace \(f\) in the integrals with respect to \(\mu_N\) by a uniformly convergent sequence of functions \(f_N\); that is, we have
\begin{align} \lim_{N\to \infty } \int_{\mathcal M} f_N \, d\mu_N = \int_{\mathcal M} f \, d\mu \end{align}
(C.2)
for \(\mu\)-a.e. \(\hat \omega_0 \in \mathcal M\), where \(\lim_{N\to \infty } f_N = f\) in the \(C(\mathcal M)\) norm. In what follows, (C.2) will play a key role in ensuring the consistency of the data-driven formulation of QMCl.
C.2. Data-driven basis.
In subsection 3.4, we used the eigenfunctions \(\phi_l\) of the kernel integral operator \(K : H \to H\) to define \(L\)-dimensional subspaces \(H_L \subseteq H\) (see (3.6)) on which we built finite-dimensional quantum systems. To do this in the data-driven setting, we replace \(K\) with the operator \(K_N: \hat{H}_N \to \hat{H}_N\) defined analogously to (3.7) as
\begin{align} K_N f = \int_\Omega \kappa (\cdot,\omega ) f(\omega ) \, d\mu_N(\omega ) = \frac{1}{N}\sum_{n=0}^{N-1} \kappa (\cdot,\hat \omega_n) f(\hat \omega_n). \end{align}
(C.3)
Computationally, \(K_N\) is represented by the \(N\times N\) kernel matrix
\begin{align*} \boldsymbol K_N = [\kappa (\hat \omega_i, \hat \omega_j)]_{i,j=0}^{N-1} = [\kappa_{{\mathcal W}}(\hat w_i, \hat w_j)]_{i,j=0}^{N-1}. \end{align*}
Once again, we solve the eigenvalue problem for \(K_N\),
\begin{align} K_N \phi_{l,N} = \lambda_{l,N} \phi_{l,N}, \end{align}
(C.4)
and define \(L\)-dimensional subspaces \(H_{L,N} \subseteq \hat H_N\) as (cf. (3.6))
\begin{align} H_{L,N} = \operatorname{span}\{ \phi_{0,N}, \ldots, \phi_{L-1,N} \}. \end{align}
(C.5)
In (C.5), the eigenvectors \(\phi_{l,N}\) are orthonormal in \(\hat H_N\), and the corresponding eigenvalues \(\lambda_{l,N}\) are real and ordered by decreasing modulus. Analogously to \(\Pi_L : H \to H\) and \(\boldsymbol \Pi_L : B(H) \to B(H)\) from subsection 3.3, we define projection maps \(\Pi_{L,N} : \hat H_N \to \hat H_N\) and \(\boldsymbol \Pi_{L,N} : B(\hat H_N) \to B(\hat H_N)\) such that \(\operatorname{ran} \Pi_{L,N} = H_{L,N}\) and \(\boldsymbol \Pi_{L,N}A = \Pi_{L,N} A \Pi_{L,N}\). We also let \(\beta_{L,N} : H_{L,N} \to \mathbb C^L\) and \(\boldsymbol \beta_{L,N} : B(H_{L,N}) \to \mathbb M_L\) be the maps from vectors in \(H_{L,N}\) and operators in \(B(H_{L,N})\) to their corresponding vector and matrix representations with respect to the \(\{ \phi_{l,N} \}_{l=0}^{L-1}\) basis, defined analogously to \(\beta_L : H_L \to \mathbb C^L\) and \(\boldsymbol \beta_L : B(H_L) \to \mathbb M_L\) from subsection 3.3, respectively.
Similarly to the representatives \(\varphi_l\) of \(\phi_l\) from (3.10), every eigenvector \(\phi_{l,N}\) with nonzero corresponding eigenvalue \(\lambda_{l,N}\) has an everywhere-defined representative \(\varphi_{l,N} : \Omega \to \mathbb R\), given by
\begin{align} \varphi_{l,N}(\omega ) = \frac{1}{\lambda_{l,N}} \int_{\mathcal M} \kappa (\omega,\omega^{\prime }) \phi_{l,N}(\omega^{\prime }) \, d\mu_N(\omega^{\prime }) = \frac{1}{\lambda_{l,N}} \frac{1}{N} \sum_{n=0}^{N-1} \kappa (\omega,\hat \omega_n) \phi_{l,N}(\hat \omega_n). \end{align}
(C.6)
Under assumption A4, every such \(\varphi_l\) and \(\varphi_{l,N}\) is continuous on \(\mathcal M\). By results on spectral approximation of kernel integral operators [90], the following can be shown to hold as \(N\to \infty\) for \(\mu\)-a.e. initial condition \(\hat \omega_0 \in \mathcal M\):
1.
For every nonzero eigenvalue \(\lambda_l\) of \(K\), the sequence of eigenvalues \(\lambda_{l,N}\) of \(K_N\) converges to \(\lambda_l\), including multiplicities.
2.
For every continuous representative \(\varphi_l \in C(\mathcal M)\) of an eigenfunction \(\phi_l \in H\) of \(K\) corresponding to \(\lambda_l \neq 0\), there exists a sequence of eigenfunctions \(\phi_{l,N} \in \hat H_N\) of \(K_N\) whose continuous representatives \(\varphi_{l,N} \in C(\mathcal M)\) converge to \(\varphi_l\) in the \(C(\mathcal M)\) norm.
We refer the reader to the paper [33] for further details on these results in the context of QMDA.
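To make the basis construction concrete, the following NumPy sketch computes the data-driven eigenpairs (C.4) from a kernel matrix and evaluates the Nyström-type out-of-sample extension (C.6). The Gaussian kernel, the synthetic samples, and all variable names are illustrative assumptions on our part; the paper's experiments use MATLAB and problem-specific kernels.

```python
# Illustrative sketch (not the authors' code): data-driven basis (C.4) and
# the out-of-sample representatives (C.6) for an assumed Gaussian kernel.
import numpy as np

rng = np.random.default_rng(1)
N, L, eps = 500, 10, 1.0
w_hat = rng.standard_normal((N, 2))        # synthetic stand-in for training data

def kappa(a, b):
    # Gaussian (RBF) kernel; the paper's kernels may be normalized differently.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / eps**2)

# Eigenpairs of K_N, represented by the matrix K/N (cf. (C.3)).
lam, V = np.linalg.eigh(kappa(w_hat, w_hat) / N)
lam, V = lam[::-1][:L], V[:, ::-1][:, :L]  # leading L, by decreasing eigenvalue
Phi = np.sqrt(N) * V                       # normalization Phi_i^T Phi_j = N delta_ij

def varphi(w_new):
    # Nystrom-type extension (C.6): evaluate all L representatives at new points.
    return (kappa(w_new, w_hat) @ Phi) / (N * lam[None, :])

print(varphi(rng.standard_normal((3, 2))).shape)   # -> (3, 10)
```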
C.3. Operator approximation.
We use the data-driven basis \(\{ \phi_{l,N} \}_{l=0}^{L-1}\) of \(H_{L,N}\) to consistently approximate matrix representations of operators on \(H_L\) by matrix representations of operators on \(H_{L,N}\), for a class of operators that behave consistently with operators on continuous functions. In what follows, \(\iota : C(\mathcal M) \to L^p(\mu )\) and \(\iota_N : C(\mathcal M) \to L^p(\mu_N)\) will denote the standard inclusion and restriction maps from continuous functions on \(\mathcal M\) to their corresponding \(L^p\) equivalence classes with respect to \(\mu\) and \(\mu_N\), respectively.
Consider an operator \(A\in B(H)\) that satisfies
\begin{align} A \circ \iota = \iota \circ \tilde A \end{align}
(C.7)
for some bounded operator \(\tilde A : C(\mathcal M) \to C(\mathcal M)\). Consider also a uniformly bounded sequence of operators \(\hat A_1, \hat A_2, \ldots\) in \(B(\hat H_1), B(\hat H_2), \ldots\), respectively, such that
\begin{align} \lim_{N\to \infty } \lVert (\hat A_N \circ \iota_N - \iota_N \circ \tilde A) f \rVert_{\hat H_N} =0 \quad \forall f \in C(\mathcal M). \end{align}
(C.8)
Given any such operator family \(\{ A, \tilde A, \hat A_1, \hat A_2, \ldots \}\), our approach is to approximate matrix elements of \(A\) with respect to the kernel eigenbasis \(\{ \phi_l \}\) of \(H\) from subsection 3.4 by matrix elements of \(\hat A_N\) with respect to the data-driven eigenbasis \(\{ \phi_{l,N} \}\) of \(\hat H_N\). Specifically, let \(\phi_i\) and \(\phi_j\) be two basis vectors of \(H\) from (3.8) corresponding to nonzero eigenvalues. Let \(\phi_{i,N}\) and \(\phi_{j,N}\) be basis vectors of \(\hat H_N\) from (C.4) chosen such that, as \(N\to \infty\), their continuous representatives \(\varphi_{i,N}\) and \(\varphi_{j,N}\) converge to those of \(\phi_i\) and \(\phi_j\) (i.e., \(\varphi_i\) and \(\varphi_j\)), respectively, as described in Appendix C.2. Then it can be shown [33] that for \(\mu\)-a.e. \(\hat \omega_0 \in \mathcal M\),
\begin{align} \lim_{N\to \infty } \langle \phi_{i,N}, \hat A_N \phi_{j,N} \rangle_N = \langle \phi_i, A \phi_j \rangle. \end{align}
(C.9)
This means that we can consistently approximate matrix elements of \(A\) by matrix elements of \(\hat A_N\). In particular, if \(L \in{\mathbb N}\) is such that \(\lambda_{L-1}\) is nonzero, it follows from (C.9) that as \(N\to \infty\) the matrix representations \(\boldsymbol A_{L,N} = \boldsymbol \beta_{L,N} A_{L,N}\) of the projected operators \(A_{L,N} = \boldsymbol \Pi_{L,N}\hat A_N \in B(H_{L,N})\) converge to the matrix representation \(\boldsymbol A_L = \boldsymbol \beta_L A_L\) of \(A_L = \boldsymbol \Pi_L A\) in any matrix norm. As we explain in subsection 5.2 and Appendix D.1, under assumption A4, all operators employed in QMCl satisfy the compatibility conditions (C.7) and (C.8) and thus can be approximated in this manner.
C.4. Convergence of data-driven approximation.
With the addition of a second approximation parameter (\(N\), alongside \(L\)), we examine the convergence properties of the system under the iterated limit of \(L \to \infty\) after \(N \to \infty\). Previous work [33] has shown that for quantum observables \(\hat A_N \in B(\hat H_N)\) and \(A \in B(H)\) satisfying (C.7) and (C.8) for some \(\tilde A : C(\mathcal M) \to C(\mathcal M)\), and for states \(\hat \rho_N \in Q(\hat H_N)\) and \(\rho \in Q(H)\) satisfying a related compatibility condition with operators on continuous functions, the following asymptotic consistency relationship holds:
\begin{align} \lim_{L \to \infty } \lim_{N\to \infty } \mathbb E_{\mathcal P_{L,N}\rho_{L,N}} A_{L,N} = \lim_{L\to \infty } \mathbb E_{\mathcal P_L \rho_L} A_L = \mathbb E_\rho A. \end{align}
(C.10)
Here, \(A_{L,N} = \boldsymbol \Pi_{L,N} \hat A_N\) and \(A_L = \boldsymbol \Pi_L A\) are the projected quantum observables associated with \(\hat A_N\) and \(A\), respectively, and \(\rho_{L,N} \in Q(H_{L,N})\) and \(\rho_L \in Q(H_L)\) are the projected states associated with \(\hat \rho_N\) and \(\rho\), respectively (see (B.1)). Under assumption A4, the fluxes \(\zeta^{(i)}\) are continuous, which implies that (C.7) and (C.8) are satisfied with \(A = \pi (\iota \zeta^{(i)})\), \(\hat A_N = \hat \pi_N(\iota_N \zeta^{(i)})\), and \(\tilde A\) set to the multiplication operator by \(\zeta^{(i)}\) on continuous functions. Furthermore, the class of states \(\rho\) for which (C.10) holds includes the images \(\rho = \Gamma (p)\) from (3.1) of probability densities in \(L^1(\mu)\) with continuous representatives in \(C(\mathcal M)\), as well as higher-rank generalizations, so the data-driven QMCl formulation is asymptotically consistent as \(N\to \infty\) in a broad range of scenarios encountered in applications. In addition, an analogous convergence result holds for conditioning by effects \(e \in \mathcal E(H)\) and \(\hat e_N \in \mathcal E(\hat H_N)\) which satisfy (C.7) and (C.8) for some operator \(\tilde e : C(\mathcal M) \to C(\mathcal M)\) on continuous functions. We refer the reader to [33] for further details.
C.5. Computational cost.
The training data requirements and computational cost of QMCl are generally comparable with those of kernel methods for supervised machine learning. Given that the data space \(\mathcal W\) has dimension \(d_{{\mathcal W}}\), the brute-force computational cost of forming the \(N\times N\) kernel matrix \(\boldsymbol K_N\) representing the integral operator \(K_N\) is \(O(d_{{\mathcal W}}N^2)\) for radial kernels. This cost can be reduced to \(O(N\log N)\) in data spaces of sufficiently low dimension using randomized methods for approximate nearest neighbors; see, e.g., [51]. In our numerical experiments, we compute \(\boldsymbol K_N\) by brute force and then sparsify it, retaining \(k_{\text{nn}} \ll N\) nearest neighbors per data point. The storage cost and matrix–vector multiplication cost for \(\boldsymbol K_N\) then become \(O(k_{\text{nn}} N)\). We compute the basis vectors \(\phi_{l,N}\) using iterative solvers. The cost of this computation depends on the spectral properties of \(\boldsymbol K_N\) and the number \(L\) of requested eigenvectors, but generally scales linearly with \(k_{\text{nn}}\) and \(N\). Once the basis \(\{ \phi_{l,N} \}_{l=0}^{L-1}\) has been computed, we form the \(L \times L\) matrix \(\boldsymbol U_{L,N}\) representing the projected shift operator (see (D.8)) and the \(L\times L\) observable matrices \(\boldsymbol Z^{(1)}_{L,N}, \ldots, \boldsymbol Z^{(d)}_{L,N}\) (see Appendix D.3.1), each with an \(O(NL^2)\) computational cost. This completes the training phase of QMCl.
The computational cost associated with advancing the parameterized model over one timestep, given that the classical and quantum states are \(x_n \in{\mathcal X}\) and \(\rho_n \in Q(H_{L,N})\), respectively, is as follows (see the note after this list on the dependence of these costs on the training data size):
•
We compute the flux terms \(\tilde Z^{(i)} (\rho_n) = \operatorname{tr}( \boldsymbol \rho_n \boldsymbol Z_{L,N}^{(i)})\) for \(i \in \{ 1, \ldots, d \}\), where \(\boldsymbol \rho_n \in \mathbb M_L\) is the matrix representation of \(\rho_n\) in the \(\{\phi_{l,N}\}\) basis of \(H_{L,N}\). For a quantum state of rank \(r\), the cost of each of these computations is \(O(r L^2)\). This can be as high as \(O(L^3)\) for quantum states of full rank, but in our experiments we work with pure states, \(r=1\), which results in \(O(L^2)\) operations.
•
We advance the resolved variables \(x_n\) via (4.2) using the previously computed fluxes. The cost of this procedure is independent of QMCl, so we do not consider it further here.
•
Using the transfer operator, we advance \(\boldsymbol \rho_n\) to the density matrix \(\tilde{\boldsymbol \rho }_{n+1}\) representing the prior state \(\tilde \rho_{n+1}\). This is again an \(O(rL^2)\) operation which can be as high as \(O(L^3)\) but reduces to \(O(L^2)\) for pure states.
•
We compute the \(L\times L\) matrix \(\boldsymbol F_{L,N}(x_{n+1})\) representing the quantum effect \(\mathcal F_{L,N}(x_{n+1})\). This has an \(O(d_{{\mathcal X}}NL^2)\) cost. Using \(\boldsymbol F_{L,N}(x_{n+1})\), we condition \(\tilde{\boldsymbol \rho }_{n+1}\) to obtain the density matrix \(\boldsymbol \rho_{n+1}\) representing the posterior state from (4.4). The cost of this operation is \(O(rL^2)\), where \(r\) is again the rank of \(\tilde{\boldsymbol \rho }_{n+1}\).
Note that the density matrix update, \(\tilde{\boldsymbol \rho }_{n+1} \mapsto \boldsymbol \rho_{n+1}\), is the only step in the online prediction phase whose cost depends on the size \(N\) of the training data. In subsection 8.3, we discuss possible ways of alleviating this dependence using random feature methods for kernel matrix approximation [76].
Appendix D. Numerical methods.
This appendix contains details on the numerical implementation of the data-driven formulation of QMCl described in section 5 and Appendix C. Appendix D.1 provides details on the linear operators employed in the data-driven scheme. Appendix D.2 gives the formula for classical evolution of the resolved variables used in the experiments of sections 6 and 7. Appendix D.3 contains a high-level algorithmic description of the QMCl pipeline, along with associated pseudocode. Appendix D.4 describes the approach used to compute histograms and time-autocorrelation functions.
D.1. Linear operators.
In Appendices D.1.1 to D.1.5, we describe the construction of the kernel integral operators, multiplication operators, spectral measures, evolution operators, and effect-valued feature maps, respectively, used in the data-driven QMCl formulation. We assume throughout the availability of the training data described in section 5. For simplicity, we suppress the subscripts \(N\) and \(L\) (the number of training samples and eigenfunctions, respectively) from our notation for the column vectors and matrices representing vectors and linear operators.
D.1.1. Kernel integral operators.
The first step in the data-driven QMCl algorithm is to use the training data \(\hat w_0, \ldots, \hat w_{N-1} \in{\mathcal W}\) to build the kernel integral operator \(K_N : \hat H_N \to \hat H_N\) and compute the associated eigenfunctions \(\phi_{l,N}\). The eigenvalue problem for \(K_N\) in (C.4) is equivalent to the matrix eigenvalue problem
\begin{align} \frac{1}{N}\boldsymbol K \boldsymbol \phi_l = \lambda_{l,N} \boldsymbol \phi_l \end{align}
(D.1)
for the \(N\times N\) kernel matrix \(\boldsymbol K = [K_{ij}]_{i,j=0}^{N-1}\) with \(K_{ij} = \kappa_{{\mathcal W}}(\hat w_i,\hat w_j)\), where \(\boldsymbol \phi_l = (\phi_{0l},\ldots,\phi_{N-1,l})^\top \in \mathbb R^N\) are column vectors whose elements give the eigenfunction values, \(\phi_{nl} = \phi_{l,N}(\hat \omega_n)\). We normalize the eigenvectors such that \(\boldsymbol \phi_i^\top \boldsymbol \phi_j = N \delta_{ij}\), which is equivalent to the orthonormality condition \(\langle \phi_{i,N}, \phi_{j,N} \rangle_N = \delta_{ij}\) on \(\hat H_N\). For computation, it is useful to arrange the leading \(L\) eigenvectors \(\boldsymbol \phi_0, \ldots, \boldsymbol \phi_{L-1}\) that form the basis of \(H_{L,N}\) in an \(N\times L\) matrix \(\boldsymbol \Phi\) whose \(l\)th column is equal to \(\boldsymbol \phi_l\).
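As a complement, here is a hedged sketch of the sparsified computation described in Appendix C.5: truncate the kernel matrix to \(k_{\text{nn}}\) nearest neighbors per point, symmetrize, and solve (D.1) with an iterative solver (the role played by eigs in the paper's MATLAB experiments). The helper names and the symmetrization convention are assumptions of ours.

```python
# Sketch of the sparse variant of (D.1), assuming a precomputed dense kernel
# matrix K; the truncation and symmetrization choices are illustrative.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def knn_sparsify(K, k_nn):
    # Keep the k_nn largest entries in each row of K, then symmetrize.
    N = K.shape[0]
    idx = np.argpartition(-K, k_nn, axis=1)[:, :k_nn]
    rows = np.repeat(np.arange(N), k_nn)
    S = sp.csr_matrix((K[rows, idx.ravel()], (rows, idx.ravel())), shape=(N, N))
    return S.maximum(S.T)

def leading_basis(K_sparse, L):
    # Iteratively compute the L leading eigenpairs of K/N and normalize so
    # that Phi^T Phi = N * Id, as in the normalization convention above.
    N = K_sparse.shape[0]
    lam, V = eigsh(K_sparse / N, k=L, which='LA')
    order = np.argsort(-lam)
    return lam[order], np.sqrt(N) * V[:, order]
```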
D.1.2. Multiplication operators.
Every classical observable \(f : \Omega \to \mathbb C\) induces an element \(\hat f_N \in L^\infty (\mu_N)\) by restriction to the finite dynamical trajectory \(\{ \hat \omega_0, \ldots, \hat \omega_{N-1} \} \subset \Omega\) underlying the training data. Since \(L^\infty (\mu_N)\) is isomorphic to \(\mathbb C^N\) whenever the states \(\hat \omega_n\) are distinct (which we assume here), we can represent \(\hat f_N\) by the column vector \(\boldsymbol f = (f_0, \ldots, f_{N-1})^\top \in \mathbb C^N\), where \(f_n = f(\hat \omega_n)\). We stress that \(\boldsymbol f\) is empirically accessible so long as the values \(f_n\) of \(f\) on the states \(\hat \omega_n\) are known, without requiring knowledge of \(\hat \omega_n\). This is the case, for instance, for the values \(z_n^{(i)} = Z^{(i)}(\hat \omega_n)\) of the fluxes used as our training data.
In QMCl, we map every such classical observable \(f\) to a projected multiplication operator \(A_{L,N} := \pi_{L,N} \hat f_N \in B(H_{L,N})\). Computationally, this operator is represented by the \(L\times L\) matrix \(\boldsymbol A := \boldsymbol \beta_{L,N} A_{L,N} = [ A_{ij} ]_{i,j=0}^{L-1}\) with elements \(A_{ij} = \langle \phi_{i,N}, A_{L,N} \phi_{j,N} \rangle_N\). Using \(\odot\) to denote elementwise multiplication of column vectors, we have
\begin{align*} A_{ij} = \frac {1}{N} \sum_{n=0}^{N-1} \phi_{i,N}(\hat \omega_n) f(\hat \omega_n) \phi_{j,N}(\hat \omega_n) = \frac {1}{N} \boldsymbol \phi_i^\top ( \boldsymbol f \odot \boldsymbol \phi_j). \end{align*}
In matrix notation, the above expression becomes
\begin{align} \boldsymbol A = \frac{1}{N} \boldsymbol \Phi^\top (\operatorname{diag} \boldsymbol f) \boldsymbol \Phi. \end{align}
(D.2)
Note that \(\boldsymbol A\) is a self-adjoint matrix whenever \(f\) is real-valued. Henceforth, we will assume that this is the case.
Given a quantum state \(\rho \in Q(H_{L,N})\) with matrix representation
\begin{align} \boldsymbol \rho = [\langle \phi_{i,N}, \rho \phi_{j,N}\rangle_N ]_{i,j=0}^{L-1}, \end{align}
(D.3)
the quantum mechanical expectation \(\mathbb E_{\rho } A\) from (A.1) can be computed as
\begin{align} \mathbb E_{\rho } A = \operatorname{tr}(\boldsymbol \rho \boldsymbol A). \end{align}
(D.4)
If \(\boldsymbol \rho = \boldsymbol \xi \boldsymbol \xi^\dagger\) is a pure state associated with a unit vector \(\boldsymbol \xi \in \mathbb C^L\) (recall that \(^\dagger\) denotes the complex-conjugate transpose), then (D.4) simplifies to
\begin{align*} \mathbb E_{\rho } A = \boldsymbol \xi^\dagger \boldsymbol A \boldsymbol \xi. \end{align*}
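The formulas above translate directly into code. The sketch below forms the matrix (D.2) without materializing \(\operatorname{diag}\boldsymbol f\) and evaluates the expectations (D.4); the inputs \(\boldsymbol\Phi\) (\(N\times L\)) and \(\boldsymbol f\) (length \(N\)) are assumed to come from the training phase, and the function names are illustrative.

```python
# Sketch of (D.2) and (D.4); Phi (N x L) and f (length N) are assumed inputs.
import numpy as np

def mult_operator_matrix(Phi, f):
    # A = (1/N) Phi^T diag(f) Phi, with diag(f) applied by column scaling.
    return (Phi.T * f) @ Phi / Phi.shape[0]

def expectation(rho, A):
    # E_rho A = tr(rho A) for a general density matrix rho.
    return np.trace(rho @ A).real

def expectation_pure(xi, A):
    # Pure-state special case: E = xi^dagger A xi.
    return (xi.conj() @ A @ xi).real
```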
D.1.3. Spectral decomposition.
In the stochastic variant of QMCl (see subsection 4.5 and Appendix B.2), we use the spectral measure \(E_{A_{L,N}}\) of \(A_{L,N} = \pi_{L,N} \hat f_N\) in order to compute the discrete probability distributions \(\mathbb P_{\rho,A_{L,N}}\) in (B.4) for a given quantum state \(\rho \in Q(H_{L,N})\). We stochastically generate flux terms by sampling from these distributions as follows.
Let \(\boldsymbol u_0, \ldots, \boldsymbol u_{L-1} \in \mathbb R^L\) be a set of orthonormal eigenvectors of \(\boldsymbol A\) from (D.2) with corresponding eigenvalues \(a_0, \ldots, a_{L-1} \in \mathbb R\) (potentially with multiplicities). In the \(\{ \phi_{l,N} \}\) basis of \(H_{L,N}\), the spectral measure \(E_{A_{L,N}}\) from (B.3) is represented by a matrix-valued measure \(\boldsymbol E : \mathcal B(\mathbb R) \to \mathbb M_L\) with \(\boldsymbol E(S) = [\langle \phi_{i,N}, E_{A_{L,N}}(S) \phi_{j,N} \rangle_N]_{i,j=0}^{L-1}\), such that
\begin{align*} \boldsymbol E(S) = \sum_{l: a_l \in S} \boldsymbol u_l \boldsymbol u_l^\top. \end{align*}
Note that for any eigenvalue \(a_j\), the sum \(\boldsymbol E^{(j)} = \sum_{l: a_l =a_j} \boldsymbol u_l \boldsymbol u_l^\top\) is the matrix representation of the projection \(E^{(j)}_{A_{L,N}}\) onto the eigenspace of \(A_{L,N}\) corresponding to \(a_j\).
Given a quantum state \(\rho \in Q(H_{L,N})\) with matrix representation \(\boldsymbol \rho\) from (D.3), the probability distribution \(\mathbb P_{\rho,A_{L,N}}\) can be evaluated as
\begin{align} \mathbb P_{\rho,A_{L,N}}(S) = \sum_{l:a_l \in S} \operatorname{tr}\left( \boldsymbol \rho (\boldsymbol u_l \boldsymbol u^\top_l) \right) = \sum_{l:a_l \in S} \boldsymbol u_l^\top \boldsymbol \rho \boldsymbol u_l. \end{align}
(D.5)
If \(\boldsymbol \rho = \boldsymbol \xi \boldsymbol \xi^\dagger\) is pure, then (D.5) simplifies to
\begin{align*} \mathbb P_{\rho,A_{L,N}}(S) = \sum_{l:a_l \in S} \lvert \boldsymbol \xi^\dagger \boldsymbol u_l \rvert^2. \end{align*}
Practically, we draw samples from \(\mathbb P_{\rho,A_{L,N}}\) by computing the probability vector \(\boldsymbol p\in \mathbb R^L\) with
\begin{align} \boldsymbol p= (p_0, \ldots, p_{L-1}), \quad p_l = \mathbb P_{\rho,A_{L,N}}(\{a_l\}), \end{align}
(D.6)
and drawing samples from the spectrum \(\{ a_0, \ldots, a_{L-1} \}\) with distribution \(\boldsymbol p\) using a standard sampling algorithm (e.g., randsample in MATLAB).
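The sampling procedure just described amounts to an eigendecomposition followed by a categorical draw. Here is a hedged NumPy sketch, with rng.choice playing the role the text assigns to MATLAB's randsample; the function name is an illustrative assumption.

```python
# Sketch of stochastic flux generation via (D.5)-(D.6). In practice the
# eigendecomposition of each observable matrix is precomputed during training.
import numpy as np

def spectral_sample(rho, A, rng):
    a, U = np.linalg.eigh(A)                             # eigenpairs of A
    p = np.einsum('il,ij,jl->l', U.conj(), rho, U).real  # p_l = u_l^dag rho u_l
    p = np.clip(p, 0.0, None)
    p /= p.sum()                                         # probability vector (D.6)
    return rng.choice(a, p=p)                            # draw from the spectrum
```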
D.1.4. Evolution operators.
Following [7, 36, 33], we approximate the Koopman operator by the left shift operator \(\hat U_N : \hat H_N \to \hat H_N\) defined as
\begin{align} \hat U_N f(\hat \omega_n) = \begin{cases} f(\hat \omega_{n+1}), & 0 \leq n \leq N-2, \\ 0, & n = N - 1. \end{cases} \end{align}
(D.7)
For completeness, we note that an alternative approach with equivalent asymptotic behavior as \(N \to \infty\) (which we do not use in the experiments of sections 6 and 7) is to employ the unitary shift
\begin{align*} \hat U_N f(\hat \omega_n) = \begin {cases} f(\hat \omega_{n+1}), & 0 \leq n \leq N-2, \\ f(\hat \omega_0), & n = N - 1. \end {cases} \end{align*}
With either approach, we obtain the evolution operator \(U_{L,N} : H_{L,N} \to H_{L,N}\) by projection onto \(H_{L,N}\), \(U_{L,N} = \Pi_{L,N}\hat U_N \Pi_{L,N}\). In the \(\{ \phi_{l,N} \}\) basis of \(H_{L,N}\), \(U_{L,N}\) is represented by the \(L\times L\) matrix
\begin{align*} \boldsymbol U = [U_{ij}] = [ \langle \phi_{i,N}, \hat U_N \phi_{j,N} \rangle_N ]_{i,j=0}^{L-1}. \end{align*}
In particular, for the left shift operator in (D.7), we have
\begin{align*} U_{ij} = \frac {1}{N} \sum_{n=0}^{N-2} \phi_{i,N}(\hat \omega_n) \phi_{j,N}(\hat \omega_{n+1}) = \frac {1}{N} \boldsymbol \phi_i^\top \hat {\boldsymbol U} \boldsymbol \phi_j, \end{align*}
where \(\hat{\boldsymbol U}\) is the \(N\times N\) left shift matrix
\begin{align*} \hat {\boldsymbol U} = \begin {pmatrix} 0 & 1 \\ & \ddots & \ddots \\ & & 0 & 1 \\ & & & 0 \end {pmatrix}. \end{align*}
Equivalently, in matrix notation we have
\begin{align} \boldsymbol U = \frac{1}{N} \boldsymbol \Phi^\top \hat{\boldsymbol U} \boldsymbol \Phi. \end{align}
(D.8)
Analogously to the data-independent case described in Appendix B.3, \(U_{L,N}\) has an induced action \(\mathcal P_{L,N} : B_1(H_{L,N}) \to B_1(H_{L,N})\) defined by the conjugation formula \(\mathcal P_{L,N} A = U_{L,N}^* A U_{L,N}\). (Note that, as a vector space, \(B_1(H_{L,N})\) is isomorphic to \(B(H_{L,N})\) by finite dimensionality of \(H_{L,N}\), but \(B_1(H_{L,N})\) is equipped with the trace norm, whereas \(B(H_{L,N})\) is equipped with the operator norm.) In the \(\{ \phi_{l,N} \}\) basis of \(H_{L,N}\), \(\mathcal P_{L,N}\) is represented by the linear operator \(\mathsf P : \mathbb M_L \to \mathbb M_L\) on \(L \times L\) matrices defined as \(\mathsf P \boldsymbol A = \boldsymbol U^\top \boldsymbol A \boldsymbol U\). That is, if \(\boldsymbol A = \boldsymbol \beta_{L,N} A \in \mathbb M_L\) is the matrix representation of an operator \(A \in B_1(H_{L,N})\), then \(\mathsf P \boldsymbol A\) is the matrix representation of \(\mathcal P_{L,N}A\), i.e., \(\mathsf P \boldsymbol A = \boldsymbol \beta_{L,N} (\mathcal P_{L,N} A)\).
In general, \(U_{L,N}\) is not a unitary operator, so \(\mathcal P_{L,N}\) does not necessarily map quantum states in \(Q(H_{L,N}) \subset B_1(H_{L,N})\) to quantum states (see Appendix B.3). However, \(\mathcal P_{L,N}\) is trace nonincreasing, so it generates an open quantum system. In our computations, we enforce state preservation by replacing \(\mathcal P_{L,N}\) by the nonlinear map
\begin{align*} \tilde {\mathcal P}_{L,N}(\rho ) := \frac {\mathcal P_{L,N}\rho }{\operatorname {tr}(\mathcal P_{L,N}\rho )}, \end{align*}
defined on the set of quantum states \(\rho \in Q(H_{L,N})\) for which \(\operatorname{tr}(\mathcal P_{L,N} \rho )\) is nonzero. This map is represented by the map \(\tilde{\mathsf P}\) on \(L\times L\) density matrices defined as
\begin{align*} \tilde {\mathsf P}(\boldsymbol \rho ) = \frac {\mathsf P \boldsymbol \rho }{\operatorname {tr}(\mathsf P \boldsymbol \rho )}. \end{align*}
If \(\boldsymbol \rho = \boldsymbol \xi \boldsymbol \xi^\dagger\) is pure, then \(\tilde{\mathsf P}(\boldsymbol \rho ) = \tilde{\boldsymbol \xi } \tilde{\boldsymbol \xi }^\dagger\) is a pure state associated with the unit vector
\begin{align*} \tilde {\boldsymbol \xi } = \tilde {\boldsymbol P}(\boldsymbol \xi ) := \frac {\boldsymbol U^\top \boldsymbol \xi }{\lVert \boldsymbol U^\top \boldsymbol \xi \rVert_2}. \end{align*}
Thus, to compute the evolution of pure states it is sufficient to work with the map \(\tilde{\boldsymbol P}\) on state vectors rather than explicitly with \(\tilde{\mathsf P}\) on density matrices, reducing the computational cost from \(O(L^3)\) to \(O(L^2)\); see Appendix C.5.
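For concreteness, the following sketch assembles the matrix (D.8) from the basis matrix \(\boldsymbol\Phi\) and applies the normalized pure-state transfer step; the function names are illustrative assumptions.

```python
# Sketch of (D.8) and the normalized transfer step for pure states.
import numpy as np

def shift_operator_matrix(Phi):
    # U = (1/N) Phi^T U_hat Phi; applying the left shift U_hat to Phi just
    # moves its rows up by one and zeroes the last row.
    N, L = Phi.shape
    Phi_shift = np.vstack([Phi[1:], np.zeros((1, L))])
    return Phi.T @ Phi_shift / N

def transfer_pure(U, xi):
    # Pure-state transfer step: xi -> U^T xi / ||U^T xi||_2.
    eta = U.T @ xi
    return eta / np.linalg.norm(eta)
```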
D.1.5. Effect-valued feature maps.
Recall from subsections 4.3 and 5.2 that the effect-valued feature map \(\mathcal F_{L,N} :{\mathcal X} \to \mathcal E(H_{L,N})\) maps each state \(x \in{\mathcal X}\) of the resolved variables to a projected multiplication operator \(\pi_{L,N}(\hat F_N(x))\), where \(\hat F_N(x) = k(x, X(\cdot )) \in L^\infty (\mu_N)\) is the feature vector associated with the kernel \(k :{\mathcal X} \times{\mathcal X} \to [0,1]\). In the \(\{ \phi_{l,N} \}\) basis of \(H_{L,N}\), \(\mathcal F_{L,N}\) is represented by the matrix-valued map \(\boldsymbol F :{\mathcal X} \to \mathbb M_L\) with \(\boldsymbol F = \boldsymbol \beta_{L,N} \circ \mathcal F_{L,N}\). That is, we have \(\boldsymbol F(x) = \boldsymbol A\), where \(\boldsymbol A\) is given by (D.2) with \(\boldsymbol f = ( k(x,\hat x_0), \ldots, k(x,\hat x_{N-1}))^\top\). State conditioning by the effect \(\mathcal F_{L,N}(x)\) (see (A.7) and (4.4)) can then be computed via the matrix formula
\begin{align} \boldsymbol \rho |_{\boldsymbol F(x)} = \frac{\sqrt{\boldsymbol A} \boldsymbol \rho \sqrt{\boldsymbol A}}{\operatorname{tr}(\sqrt{\boldsymbol A} \boldsymbol \rho \sqrt{\boldsymbol A})}, \end{align}
(D.9)
where \(\boldsymbol \rho |_{\boldsymbol F(x)}\) is the matrix representation of \(\rho |_{\mathcal F_{L,N}(x)}\). If \(\boldsymbol \rho = \boldsymbol \xi \boldsymbol \xi^\dagger\) is pure, then the conditioned state \(\boldsymbol \rho |_{\boldsymbol F(x)}\) is also pure, and the associated state vector is given by
\begin{align} \boldsymbol \xi |_{\boldsymbol F(x)} = \frac{\sqrt{\boldsymbol A}\boldsymbol \xi }{\lVert \sqrt{\boldsymbol A}\boldsymbol \xi \rVert_2}. \end{align}
(D.10)
A drawback of (D.9) and (D.10) is that they require the computation of the matrix square root \(\sqrt{\boldsymbol A}\). To avoid the cost of this step, we can modify \(\mathcal F_{L,N}\) to the effect-valued map \(\tilde{\mathcal F}_{L,N} (x) = (\pi_{L,N} \hat F^{1/2}_N(x))^2\). This map is represented by the matrix-valued function \(\tilde{\boldsymbol F} :{\mathcal X} \to \mathbb M_L\) such that \(\tilde{\boldsymbol F}(x) = \boldsymbol A^2\), where \(\boldsymbol A\) is given by (D.2), now with \(\boldsymbol f = (\sqrt{k(x,\hat x_0)}, \ldots, \sqrt{k(x,\hat x_{N-1})})^\top\). In this case, the conditioning formula by \(\tilde{\boldsymbol F}(x)\) becomes
\begin{align} \boldsymbol \rho |_{\tilde{\boldsymbol F}(x)} = \frac{\boldsymbol A \boldsymbol \rho \boldsymbol A}{\operatorname{tr}(\boldsymbol A \boldsymbol \rho \boldsymbol A)}, \end{align}
(D.11)
which avoids the matrix square root. Analogously to (D.10), when \(\boldsymbol \rho = \boldsymbol \xi \boldsymbol \xi^{\dagger }\) is pure, the state vector update under (D.11) becomes
\begin{align} \boldsymbol \xi |_{\tilde{\boldsymbol F}(x)} = \frac{\boldsymbol A\boldsymbol \xi }{\lVert \boldsymbol A\boldsymbol \xi \rVert_2}, \end{align}
(D.12)
which again does not require computation of a matrix square root. Our experiments in sections 6 and 7 utilize (D.12) for state conditioning.
Even though, in general, \(\mathcal F_{L,N}(x)\) and \(\tilde{\mathcal F}_{L,N}(x)\) are not equal, the two maps have the same asymptotic limit as \(N\to \infty\) and \(L\to \infty\) (see Appendices B.4 and C.4). In that limit, conditioning by either \(\mathcal F_{L,N}(x)\) or \(\tilde{\mathcal F}_{L,N}(x)\) recovers conditioning by \(\mathcal F(x)\) in the infinite-dimensional quantum system on \(H\). The latter is in turn consistent with classical Bayesian conditioning by the feature vectors \(F(x)\) (see subsection 3.2). We note that for the radial basis function kernel in (4.5) used in our experiments, the square-root kernel \(\sqrt k\) can be obtained simply by scaling the bandwidth parameter \(\epsilon\) by a factor of \(\sqrt 2\).
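The square-root kernel trick reduces each conditioning step to matrix-vector products. The sketch below assumes a Gaussian kernel of the form \(k(x,y) = e^{-\lVert x - y\rVert^2/\epsilon^2}\) (the exact form of the kernel in (4.5) may differ) and illustrative helper names.

```python
# Sketch of the squared-effect conditioning (D.11)-(D.12). For the assumed
# Gaussian kernel, sqrt(k) corresponds to scaling eps by sqrt(2), i.e., the
# exponent below is -||x - x_n||^2 / (2 eps^2).
import numpy as np

def effect_matrix(x, x_hat, Phi, eps):
    f = np.exp(-np.sum((x_hat - x) ** 2, axis=1) / (2.0 * eps**2))
    return (Phi.T * f) @ Phi / Phi.shape[0]   # A from (D.2); the effect is A^2

def condition_pure(xi, A):
    # Pure-state quantum Bayes update (D.12): xi -> A xi / ||A xi||_2.
    eta = A @ xi
    return eta / np.linalg.norm(eta)
```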
D.2. Classical evolution.
In the experiments of sections 6 and 7, the classical evolution map \(\tilde \phi :{\mathcal X} \times{\mathcal Z} \to{\mathcal X}\) is based on a standard RK4 discretization of the resolved component of the dynamics on \({\mathcal X} = \mathbb R^{d_{{\mathcal X}}}\), keeping the flux terms in \({\mathcal Z} = \mathbb R^d\) fixed. Specifically, given that the resolved components of the dynamics satisfy \(\dot x(t) = v(x(t), z(t))\) for a vector field \(v:{\mathcal X} \times{\mathcal Z} \to{\mathcal X}\) and the timestep of the parameterized system is \(\Delta t\), we set
\begin{align} \begin{gathered} \tilde \phi (x,z) = x + \frac{1}{6}(k_1 + 2 k_2 + 2 k_3 + k_4) \, \Delta t,\\ \begin{aligned} k_1 = v( x, z), \quad k_2 = v( x + \Delta t\, k_1/ 2, z), \\ k_3 = v( x + \Delta t \, k_2/ 2, z), \quad k_4 = v(x + \Delta t\, k_3, z). \end{aligned} \end{gathered} \end{align}
(D.13)
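Equation (D.13) transcribes directly into code; the sketch below freezes the flux \(z\) over the four substeps, matching the display above.

```python
# Direct transcription of the RK4 map (D.13), with the flux z held fixed.
def rk4_step(v, x, z, dt):
    k1 = v(x, z)
    k2 = v(x + dt * k1 / 2.0, z)
    k3 = v(x + dt * k2 / 2.0, z)
    k4 = v(x + dt * k3, z)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```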
D.3. Algorithm structure.
The data-driven QMCl pipeline consists of training, initialization, and simulation/prediction phases, summarized in Appendices D.3.1 to D.3.3, respectively.
D.3.1. Training.
Recall that our training dataset consists of time-ordered samples \(\hat w_0, \ldots, \hat w_{N-1} \in{\mathcal W}\) for computing basis functions and samples \(\hat z_0^{(i)}, \ldots, \hat z_{N-1}^{(i)} \in \mathbb R\) with \(i \in \{ 1, \ldots, d \}\) of the components of the flux \(Z :{\mathcal Y} \to{\mathcal Z}\). In this paper, we assume that the samples are taken on a single dynamical trajectory \(\hat \omega_0, \ldots, \hat \omega_{N-1} \in \Omega\), i.e., \(\hat w_n = W(\hat \omega_n)\), \(\hat z_n^{(i)} = Z^{(i)}(\hat \omega_n)\), and \(\hat \omega_n = \Phi^n(\hat \omega_0)\) for some initial condition \(\hat \omega_0 \in \Omega\). The methods described below can be readily generalized to training with samples from ensembles of shorter trajectories, so long as the sampling measure \(\mu_N\) of the data converges to the invariant measure in the sense of (C.1). The steps of the training phase are as follows:
1.
Tune the bandwidth parameter \(\epsilon_{{\mathcal W}}\) of the kernel \(\kappa_{{\mathcal W}}\). We perform this step using the automatic tuning procedure developed in [24, 9]. The procedure is based on a grid search over a collection of candidate \(\epsilon_{{\mathcal W}}\) values, selecting the value that maximizes a kernel-dependent measure of the dimension of the dataset. For pseudocode, see, e.g., Algorithm B.5 in [33].
2.
With the bandwidth parameter from step 1, form the kernel matrix \(\boldsymbol K\) and compute the kernel eigenvectors \(\{ \boldsymbol \phi_0, \ldots, \boldsymbol \phi_{L-1} \}\) from (D.1). Arrange the basis vectors in a matrix \(\boldsymbol \Phi\) as described in Appendix D.1.1. In applications, we typically approximate \(\boldsymbol K\) by a sparse matrix using nearest-neighbor truncation and solve the resulting eigenvalue problem with iterative solvers (e.g., eigs of MATLAB).
3.
For each \(i \in \{ 1, \ldots, d \}\), use \(\boldsymbol \Phi\) and the \(\hat z_n^{(i)}\) samples to compute an \(L\times L\) multiplication operator matrix \(\boldsymbol Z^{(i)}\) via (D.2) with \(\boldsymbol f = (\hat z_0^{(i)}, \ldots, \hat z_{N-1}^{(i)})^\top\). In the stochastic variant of QMCl, we also compute the eigenvalues \(a^{(i)}_0, \ldots, a^{(i)}_{L-1} \in \mathbb R\) and corresponding eigenvectors \(\boldsymbol u_0^{(i)}, \ldots, \boldsymbol u_{L-1}^{(i)}\) of each \(\boldsymbol Z^{(i)}\).
4.
Use \(\boldsymbol \Phi\) to form the \(L\times L\) matrix \(\boldsymbol U\) representing the projected shift operator via (D.8). This completes the training phase (cf. Appendix C.5).
D.3.2. Initialization.
Let \(x_0 \in{\mathcal X}\) be a given initial classical state and \(\rho_0 \in Q(H_{L,N})\) an initial quantum state that may depend on \(x_0\). In the experiments of sections 6 and 7, we set \(\rho_0\) to the uninformative state \(\bar \rho_{L,N}\) from (4.7). This is a pure state induced by the unit vector \(\bar \xi_{L,N} = \bar \psi_{L,N}/ \lVert \bar \psi_{L,N} \rVert_{\hat H_N}\), where \(\bar \psi_{L,N} = \sum_{l=0}^{L-1} c_l \phi_{l,N}\) and \(c_l = \langle 1_\Omega, \phi_{l,N} \rangle_N\). In column vector notation, the expansion coefficients \(c_l\) are given by \(c_l = \boldsymbol \phi_l^\top \boldsymbol 1_N/ N\), where \(\boldsymbol 1_N = (1,\ldots, 1)^\top\) is the column vector in \(\mathbb C^N\) whose elements are all equal to 1. Correspondingly, in the \(\{ \phi_{l,N} \}\) basis of \(H_{L,N}\), the state vector \(\bar \xi_{L,N}\) is represented by the column vector \(\bar{\boldsymbol \xi } = \beta_{L,N} \bar \xi_{L,N}\) given by
\begin{align*} \bar {\boldsymbol \xi } = \frac {\boldsymbol c}{ \lVert \boldsymbol c \rVert_2}, \quad \boldsymbol c = (c_0, \ldots, c_{L-1} )^\top, \end{align*}
and the density operator \(\bar \rho_{L,N}\) is represented by the rank-1 density matrix \(\boldsymbol{\bar \rho }_{L,N} = \bar{\boldsymbol \xi } \bar{\boldsymbol \xi }^\top\). Note that if the kernel \(\kappa_{{\mathcal W}}\) used to build the basis is normalized to a Markov kernel (see, e.g., the QMCl experiments in subsection 6.3 and the QMDA experiments in [36, 33]), then the leading basis element \(\boldsymbol \phi_0\) can be chosen as \(\boldsymbol 1_N\), and \(\bar{\boldsymbol \xi }\) simplifies to the vector \((1,0,\ldots, 0)^\top\).
Besides the choice \(\rho_0 = \bar \rho_{L,N}\), an alternative approach to quantum state initialization (which we do not employ in the experiments presented in this paper) is to use the feature map \(\mathcal F_{L,N}\) to set \(\rho_0 = \mathcal F_{L,N}( x_0)/ \operatorname{tr}(\mathcal F_{L,N}( x_0))\). As noted in subsection 4.4, \(\rho_0\) obtained with this approach is not, in general, a pure state.
D.3.3. Simulation/prediction.
Given the initial data \(( x_0,{\boldsymbol \rho }_0)\) obtained by either of the two methods described in Appendix D.3.2, the QMCl parameterized system evolves by alternating between classical and quantum evolution, as described in subsections 4.2 and 5.2. Algorithm D.1 implements this evolution over \(r\) timesteps given that the state of the system at the \(n\)th timestep is \(( x_n,{\boldsymbol \rho }_n)\). We recall that \(r \in{\mathbb N}\) is the number of timesteps between successive updates of the quantum state by the feature map (see subsection 4.2). Thus, the output of Algorithm D.1 is a sequence of classical states \(x_{n+1}, \ldots, x_{n+r} \in{\mathcal X}\) at timesteps \(n+1, \ldots, n + r\), respectively, and a posterior quantum state \({\boldsymbol \rho }_{n+r}\) at timestep \(n+r\). To continue marching the system forward, Algorithm D.1 is executed with initial conditions \(( x_{n+r},{\boldsymbol \rho }_{n+r})\).
Algorithm D.1.
Inputs:
• Resolved variables \(x_n \in{\mathcal X}\) and density matrix \({\boldsymbol \rho }_n \in \mathbb M_L\) at the \(n\)th timestep.
• Number of timesteps \(r \in{\mathbb N}\) per quantum Bayesian update.
Outputs:
• Resolved variables \(x_{n+1}, \ldots, x_{n+r} \in{\mathcal X}\) at timesteps \(n+1, \ldots, n+r\).
• Posterior density matrix \({\boldsymbol \rho }_{n+r} \in \mathbb M_L\) at timestep \(n+r\).
Steps:
1. Set \(\tilde{\boldsymbol \rho }_n ={\boldsymbol \rho }_n\).
2. For \(j \in \{ 1, \ldots, r \}\):
   a. Compute the fluxes \(z_{n+j-1} = (z_{n+j-1}^{(1)}, \ldots, z_{n+j-1}^{(d)}) \in \mathbb R^d\) with \(z_{n+j-1}^{(i)} = \operatorname{tr}({\tilde{\boldsymbol \rho} }_{n+j-1} \boldsymbol Z^{(i)})\).
   b. Update the resolved variables: \(x_{n+j} = \tilde \phi ( x_{n+j-1},z_{n+j-1})\).
   c. Update the density matrix with the transfer operator: \(\tilde{\boldsymbol \rho }_{n+j} = \tilde{\mathsf P}(\tilde{\boldsymbol \rho }_{n+j-1})\).
3. Evaluate the effect-valued feature map: \(\boldsymbol e_{n+r} = \tilde{\boldsymbol F}( x_{n+r})\).
4. Compute the posterior density matrix \({\boldsymbol \rho }_{n+r} = \tilde{\boldsymbol \rho }_{n+r} |_{\boldsymbol e_{n+r}}\) via (D.11).
5. Return \(x_{n+1}, \ldots, x_{n+r},{\boldsymbol \rho }_{n+r}\).
If the initial state \({\boldsymbol \rho }_0 ={\boldsymbol \xi }_0{\boldsymbol \xi }_0^\dagger\) is pure, then under iteration of Algorithm D.1 all subsequent states \({\boldsymbol \rho }_r,{\boldsymbol \rho }_{2r}, \ldots\) are pure. Algorithm D.2 specializes Algorithm D.1 to that setting, which results in a reduction of computational cost from \(O(L^3)\) to \(O(L^2)\) (see Appendix C.5).
Finally, Algorithm D.3 implements the stochastic variant of QMCl for general (mixed) states. The specialization of this algorithm to pure states is entirely analogous to Algorithm D.2, so we do not include it here in the interest of brevity.
Algorithm D.2.
Inputs:
• Resolved variables \(x_n \in{\mathcal X}\) and state vector \({\boldsymbol \xi }_n \in \mathbb C^L\) at the \(n\)th timestep.
• Number of timesteps \(r \in{\mathbb N}\) per quantum Bayesian update.
Outputs:
• Resolved variables \(x_{n+1}, \ldots, x_{n+r} \in{\mathcal X}\) at timesteps \(n+1, \ldots, n+r\).
• Posterior state vector \({\boldsymbol \xi }_{n+r} \in \mathbb C^L\) at timestep \(n+r\).
Steps:
1. Set \(\tilde{\boldsymbol \xi }_n ={\boldsymbol \xi }_n\).
2. For \(j \in \{ 1, \ldots, r \}\):
   a. Compute the fluxes \(z_{n+j-1} = (z_{n+j-1}^{(1)}, \ldots, z_{n+j-1}^{(d)}) \in \mathbb R^d\) with \(z_{n+j-1}^{(i)} =\tilde{\boldsymbol \xi }_{n+j-1}^\dagger \boldsymbol Z^{(i)} \tilde{\boldsymbol \xi }_{n+j-1}\).
   b. Update the resolved variables: \(x_{n+j} = \tilde \phi ( x_{n+j-1},z_{n+j-1})\).
   c. Update the state vector with the transfer operator: \(\tilde{\boldsymbol \xi }_{n+j} = \tilde{\boldsymbol P}(\tilde{\boldsymbol \xi }_{n+j-1})\).
3. Evaluate the effect-valued feature map: \(\boldsymbol e_{n+r} = \tilde{\boldsymbol F}( x_{n+r})\).
4. Compute the posterior state vector \({\boldsymbol \xi }_{n+r} = \tilde{\boldsymbol \xi }_{n+r} |_{\boldsymbol e_{n+r}}\) via (D.12).
5. Return \(x_{n+1}, \ldots, x_{n+r},{\boldsymbol \xi }_{n+r}\).
Algorithm D.3.
Inputs:
• Resolved variables \(x_n \in{\mathcal X}\) and density matrix \({\boldsymbol \rho }_n \in \mathbb M_L\) at the \(n\)th timestep.
• Number of timesteps \(r \in{\mathbb N}\) per quantum Bayesian update.
Outputs:
• Resolved variables \(x_{n+1}, \ldots, x_{n+r} \in{\mathcal X}\) at timesteps \(n+1, \ldots, n+r\).
• Posterior density matrix \({\boldsymbol \rho }_{n+r} \in \mathbb M_L\) at timestep \(n+r\).
Steps:
1. Set \(\tilde{\boldsymbol \rho }_n ={\boldsymbol \rho }_n\).
2. For \(j \in \{ 1, \ldots, r \}\):
   a. For \(i \in \{1, \ldots, d \}\):
      i. Compute the probability vector \(\boldsymbol p_{n+j-1}^{(i)} \in \mathbb R^L\) associated with \(\tilde{\boldsymbol \rho }_{n+j-1}\) and \(\boldsymbol Z^{(i)}\) using (D.6).
      ii. Draw a sample \(z_{n+j-1}^{(i)}\) from the spectrum \(\{ a^{(i)}_0, \ldots, a^{(i)}_{L-1} \}\) of \(\boldsymbol Z^{(i)}\) with distribution \(\boldsymbol p_{n+j-1}^{(i)}\).
   b. Set the flux term \(z_{n+j-1} = (z_{n+j-1}^{(1)}, \ldots, z_{n+j-1}^{(d)}) \in \mathbb R^d\), and update the resolved variables: \(x_{n+j} = \tilde \phi ( x_{n+j-1},z_{n+j-1})\).
   c. Update the density matrix with the transfer operator: \(\tilde{\boldsymbol \rho }_{n+j} = \tilde{\mathsf P}(\tilde{\boldsymbol \rho }_{n+j-1})\).
3. Evaluate the effect-valued feature map: \(\boldsymbol e_{n+r} = \tilde{\boldsymbol F}( x_{n+r})\).
4. Compute the posterior density matrix \({\boldsymbol \rho }_{n+r} = \tilde{\boldsymbol \rho }_{n+r} |_{\boldsymbol e_{n+r}}\) via (D.11).
5. Return \(x_{n+1}, \ldots, x_{n+r},{\boldsymbol \rho }_{n+r}\).
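Putting the pieces together for the pure-state case (Algorithm D.2), the following sketch assembles one outer iteration of the loop from the hypothetical helpers introduced earlier in this appendix (rk4_step, transfer_pure, effect_matrix, condition_pure). It is illustrative, not the authors' implementation.

```python
# Sketch of Algorithm D.2 using the helper functions sketched above.
import numpy as np

def qmcl_step_pure(x, xi, r, U, Z_list, v, dt, x_hat, Phi, eps):
    xs = []
    for _ in range(r):
        # Step 2a: fluxes z^(i) = xi^dagger Z^(i) xi.
        z = np.array([(xi.conj() @ Z @ xi).real for Z in Z_list])
        # Step 2b: classical RK4 update of the resolved variables.
        x = rk4_step(v, x, z, dt)
        # Step 2c: quantum prior update via the transfer operator.
        xi = transfer_pure(U, xi)
        xs.append(x)
    # Steps 3-4: evaluate the effect-valued feature map and condition.
    xi = condition_pure(xi, effect_matrix(x, x_hat, Phi, eps))
    return xs, xi
```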
D.4. Histograms and autocorrelation functions.
For time-ordered data \(f_0, \ldots, f_{\tilde N-1} \in \mathbb R\) sampled at a fixed interval \(\Delta t\), the value of the empirical time-autocorrelation function \(C_{f,\tilde N}(\tau_j)\) at timeshift \(\tau_j = j\, \Delta t\), \(j \in{\mathbb N}\), is given by
\begin{align*} C_{f,\tilde N}(\tau_j) = \frac {1}{\tilde N} \sum_{n=0}^{\tilde N-j-1} f_n f_{n+j}. \end{align*}
In the main text, we show plots of the normalized autocorrelation function \(\bar C_{f,\tilde N}(\tau_j) := C_{f,\tilde N}(\tau_j)/ C_{f,\tilde N}(0)\), where \(\bar C_{f,\tilde N}(0) = 1\) by construction. If the \(f_n\) are samples of an observable \(f \in L^2(\mu )\) taken on an orbit \(\omega_0, \ldots, \omega_{\tilde N-1}\) of the dynamics, i.e., \(f_n = f(\omega_n)\) and \(\omega_n = \Phi^n(\omega_0)\), then by the pointwise ergodic theorem, as \(\tilde N \to \infty\), \(C_{f,\tilde N}(\tau_j)\) converges for \(\mu\)-a.e. \(\omega_0 \in \Omega\) to \(C_f(\tau_j) = \langle f, U^j f \rangle\), where \(U : H \to H\) is the Koopman operator induced by \(\Phi\).
To generate histograms from the data \(f_0, \ldots, f_{\tilde N-1}\), we split the interval \([\min \{ f_n \}_{n=0}^{\tilde N-1}, \max \{ f_n \}_{n=0}^{\tilde N-1} ]\) into \(B\) uniformly sized bins \(S_0, \ldots, S_{B-1}\) and compute the normalized counts \((N_0/\tilde N, \ldots, N_{B-1}/ \tilde N)\), where \(N_i\) is the number of datapoints \(f_n\) lying in \(S_i\). We use \(B=45\) bins in our plots.
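Both diagnostics are short computations in practice; the sketch below implements the (biased) empirical autocorrelation above and the normalized histogram with the stated \(B = 45\) bins.

```python
# Sketch of the diagnostics in D.4: normalized autocorrelation and histogram.
import numpy as np

def autocorr(f, j_max):
    # C(tau_j) = (1/N) sum_{n=0}^{N-j-1} f_n f_{n+j}, normalized so C(0) = 1.
    N = len(f)
    C = np.array([np.dot(f[: N - j], f[j:]) / N for j in range(j_max)])
    return C / C[0]

def histogram(f, B=45):
    # Normalized counts over B uniform bins spanning [min f, max f].
    counts, edges = np.histogram(f, bins=B)
    return counts / len(f), edges
```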
References
1.
A. Arakawa and W. H. Schubert, Interaction of a cumulus cloud ensemble with the large-scale environment, Part I, J. Atmos. Sci., 31 (1974), pp. 674–701, https://doi.org/10.1175/1520-0469(1974)031%3C0674:IOACCE%3E2.0.CO;2.
2.
H. M. Arnold, I. M. Moroz, and T. N. Palmer, Stochastic parametrizations and model uncertainty in the Lorenz ’96 system, Philos. Trans. Roy. Soc. A, 371 (2013), 20110479, https://doi.org/10.1098/rsta.2011.0479.
3.
Z. Artstein, J. Linshiz, and E. S. Titi, Young measure approach to computing slowly advancing fast oscillations, Multiscale Model. Simul., 6 (2007), pp. 1085–1097, https://doi.org/10.1137/070687219.
4.
V. Baladi, Positive Transfer Operators and Decay of Correlations, Adv. Ser. Nonlinear Dynam. 16, World Scientific, River Edge, NJ, 2000.
5.
I. Bengtsson and K. Życzkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement, Cambridge University Press, Cambridge, UK, 2006.
6.
J. Berner et al., Stochastic parameterization: Toward a new view of weather and climate models, Bull. Amer. Meteor. Soc., 98 (2017), pp. 565–588, https://doi.org/10.1175/BAMS-D-15-00268.1.
7.
T. Berry, D. Giannakis, and J. Harlim, Nonparametric forecasting of low-dimensional dynamical systems, Phys. Rev. E (3), 91 (2015), 032915, https://doi.org/10.1103/PhysRevE.91.032915.
8.
T. Berry, D. Giannakis, and J. Harlim, Bridging data science and dynamical systems theory, Notices Amer. Math. Soc., 67 (2020), pp. 1336–1349, https://doi.org/10.1090/noti2151.
9.
T. Berry and J. Harlim, Variable bandwidth diffusion kernels, Appl. Comput. Harmon. Anal., 40 (2016), pp. 68–96, https://doi.org/10.1016/j.acha.2015.01.001.
10.
K. Bhattacharya, B. Hosseini, N. B. Kovachki, and A. M. Stuart, Model reduction and neural networks for parametric PDEs, SMAI J. Comput. Math., 7 (2021), pp. 121–157, https://doi.org/10.5802/smai-jcm.74.
11.
M. Blank, Ergodic averaging with and without invariant measures, Nonlinearity, 30 (2017), pp. 4649–4664, https://doi.org/10.1088/1361-6544/aa8fe8.
12.
T. Bolton and L. Zanna, Applications of deep learning to ocean data inference and subgrid parameterization, J. Adv. Model. Earth Syst., 11 (2019), pp. 376–399, https://doi.org/10.1029/2018MS001472.
13.
N. D. Brenowitz and C. S. Bretherton, Prognostic validation of a neural network unified physics parameterization, Geophys. Res. Lett., 45 (2018), pp. 6289–6298, https://doi.org/10.1029/2018GL078510.
14.
S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nat. Commun., 8 (2017), 19, https://doi.org/10.1038/s41467-017-00030-8.
15.
D. Burov, D. Giannakis, K. Manohar, and A. Stuart, Kernel analog forecasting: Multiscale test problems, Multiscale Model. Simul., 19 (2021), pp. 1011–1040, https://doi.org/10.1137/20M1338289.
16.
P. Busch, P. J. Lahti, and P. Mittelstaedt, The Quantum Theory of Measurement, Lecture Notes in Phys. New Ser., Springer-Verlag, Berlin, 1996.
17.
K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, Data-driven discovery of coordinates and governing equations, Proc. Natl. Acad. Sci., 116 (2019), pp. 22445–22451, https://doi.org/10.1073/pnas.1906995116.
18.
A.-T. G. Charalampopoulos and T. P. Sapsis, Machine-learning energy-preserving nonlocal closures for turbulent fluid flows and inertial tracers, Phys. Rev. Fluids 7 (2022), 024305, https://doi.org/10.1103/PhysRevFluids.7.024305.
[19] F. Chatelin, Spectral Approximation of Linear Operators, Classics Appl. Math. 65, SIAM, Philadelphia, 2011, https://doi.org/10.1137/1.9781611970678.
[20] N. Chen and Y. Li, BAMCAFE: A Bayesian machine learning advanced forecast ensemble method for complex turbulent systems with partial observations, Chaos, 31 (2021), 113114, https://doi.org/10.1063/5.0062028.
[21] Y. Chen, D. Sanz-Alonso, and R. Willett, Autodifferentiable ensemble Kalman filters, SIAM J. Math. Data Sci., 4 (2022), pp. 801–833, https://doi.org/10.1137/21M1434477.
[22] A. J. Chorin and F. Lu, Discrete approach to stochastic parametrization and dimension reduction in nonlinear dynamics, Proc. Natl. Acad. Sci. USA, 112 (2015), pp. 9804–9809, https://doi.org/10.1073/pnas.1512080112.
[23] R. Coifman and M. Hirn, Bi-stochastic kernels via asymmetric affinity functions, Appl. Comput. Harmon. Anal., 35 (2013), pp. 177–180, https://doi.org/10.1016/j.acha.2013.01.001.
[24] R. R. Coifman, Y. Shkolnisky, F. J. Sigworth, and A. Singer, Graph Laplacian tomography from unknown random projections, IEEE Trans. Image Process., 17 (2008), pp. 1891–1899, https://doi.org/10.1109/tip.2008.2002305.
[25] D. Crommelin and E. Vanden-Eijnden, Subgrid-scale parameterization with conditional Markov chains, J. Atmos. Sci., 65 (2008), pp. 2661–2675, https://doi.org/10.1175/2008JAS2566.1.
[26] S. Das and D. Giannakis, Delay-coordinate maps and the spectra of Koopman operators, J. Stat. Phys., 175 (2019), pp. 1107–1145, https://doi.org/10.1007/s10955-019-02272-w.
[27] S. Das, D. Giannakis, and J. Slawinska, Reproducing kernel Hilbert space compactification of unitary evolution groups, Appl. Comput. Harmon. Anal., 54 (2021), pp. 75–136, https://doi.org/10.1016/j.acha.2021.02.004.
[28] M. Dellnitz, G. Froyland, and S. Sertl, On the isolated spectrum of the Perron–Frobenius operator, Nonlinearity, 13 (2000), pp. 1171–1188, https://doi.org/10.1088/0951-7715/13/4/310.
[29] W. E, B. Engquist, X. Li, W. Ren, and E. Vanden-Eijnden, Heterogeneous multiscale methods: A review, Commun. Comput. Phys., 2 (2007), pp. 367–450.
[30] T. Eisner, B. Farkas, M. Haase, and R. Nagel, Operator Theoretic Aspects of Ergodic Theory, Grad. Texts in Math. 272, Springer, Cham, 2015.
[31] I. Fatkullin and E. Vanden-Eijnden, A computational strategy for multiscale systems with applications to Lorenz 96 model, J. Comput. Phys., 200 (2004), pp. 605–638, https://doi.org/10.1016/j.jcp.2004.04.013.
[32] B. Fox-Kemper, S. Bachman, B. Pearson, and S. Reckinger, Principles and advances in subgrid modelling for eddy-rich simulations, CLIVAR Exchanges, 65 (2014), pp. 42–46.
[33] D. C. Freeman, D. Giannakis, B. Mintz, A. Ourmazd, and J. Slawinska, Data assimilation in operator algebras, Proc. Natl. Acad. Sci. USA, 120 (2023), e2211115120, https://doi.org/10.1073/pnas.2211115120.
[34] P. R. Gent and J. C. McWilliams, Isopycnal mixing in ocean circulation models, J. Phys. Oceanogr., 20 (1990), pp. 150–155, https://doi.org/10.1175/1520-0485(1990)020%3C0150:IMIOCM%3E2.0.CO;2.
[35] D. Giannakis, Data-driven spectral decomposition and forecasting of ergodic dynamical systems, Appl. Comput. Harmon. Anal., 47 (2019), pp. 338–396, https://doi.org/10.1016/j.acha.2017.09.001.
[36] D. Giannakis, Quantum mechanics and data assimilation, Phys. Rev. E (3), 100 (2019), 032207, https://doi.org/10.1103/PhysRevE.100.032207.
[37] D. Giannakis, Delay-coordinate maps, coherence, and approximate spectra of evolution operators, Res. Math. Sci., 8 (2021), 8, https://doi.org/10.1007/s40687-020-00239-y.
[38] D. Giannakis, A. Henriksen, J. A. Tropp, and R. Ward, Learning to forecast dynamical systems from streaming data, SIAM J. Appl. Dyn. Syst., 22 (2022), pp. 527–558, https://doi.org/10.1137/21M144983X.
[39] D. Giannakis, A. Ourmazd, P. Pfeffer, J. Schumacher, and J. Slawinska, Embedding classical dynamics in a quantum computer, Phys. Rev. A, 105 (2022), 052404, https://doi.org/10.1103/PhysRevA.105.052404.
[40] D. Giannakis, J. Slawinska, and Z. Zhao, Spatiotemporal feature extraction with data-driven Koopman operators, in Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, Proc. Mach. Learn. Res. 44, D. Storcheus, A. Rostamizadeh, and S. Kumar, eds., PMLR, Montreal, Canada, 2015, pp. 103–115, https://proceedings.mlr.press/v44/giannakis15.html.
[41] K. T. Goh, J. Kaniewski, E. Wolfe, T. Vértesi, X. Wu, Y. Cai, Y.-C. Liang, and V. Scarani, Geometry of the set of quantum correlations, Phys. Rev. A, 97 (2018), 022104, https://doi.org/10.1103/PhysRevA.97.022104.
[42] G. A. Gottwald and S. Reich, Supervised learning from noisy observations: Combining machine-learning techniques with data assimilation, Phys. D, 423 (2021), 132911, https://doi.org/10.1016/j.physd.2021.132911.
[43] H. Grabert, Projection Operator Techniques in Nonequilibrium Statistical Mechanics, Springer Tracts Modern Phys. 95, Springer-Verlag, Berlin, New York, 1982.
[44] W. W. Grabowski, H. Morrison, S.-I. Shima, G. C. Abade, P. Dziekan, and H. Pawlowska, Modeling of cloud microphysics: Can we do better?, Bull. Amer. Meteor. Soc., 100 (2019), pp. 655–672, https://doi.org/10.1175/BAMS-D-18-0005.1.
[45] W. W. Grabowski and P. K. Smolarkiewicz, CRCP: A Cloud Resolving Convection Parameterization for modeling the tropical convecting atmosphere, Phys. D, 133 (1999), pp. 171–178, https://doi.org/10.1016/S0167-2789(99)00104-9.
[46] I. Grooms and A. J. Majda, Efficient stochastic superparameterization for geophysical turbulence, Proc. Natl. Acad. Sci. USA, 110 (2013), pp. 4464–4469, https://doi.org/10.1073/pnas.1302548110.
[47] S. Gudder, Quantum probability, in Handbook of Quantum Logic and Quantum Structures, K. Engesser, D. M. Gabbay, and D. Lehmann, eds., Elsevier, Amsterdam, 2007, pp. 121–146.
[48] T. Hofmann, B. Schölkopf, and A. Smola, Kernel methods in machine learning, Ann. Statist., 36 (2008), pp. 1171–1220, https://doi.org/10.1214/009053607000000677.
[49] A. S. Holevo, Statistical Structure of Quantum Theory, Lect. Notes Phys. Monogr. 67, Springer, Berlin, 2001.
[50] S. W. Jiang and J. Harlim, Modeling of missing dynamical systems: Deriving parametric models using a nonparametric framework, Res. Math. Sci., 7 (2020), 16, https://doi.org/10.1007/s40687-020-00217-4.
[51] P. W. Jones, A. Osipov, and V. Rokhlin, Randomized approximate nearest neighbors algorithm, Proc. Natl. Acad. Sci. USA, 108 (2011), pp. 15679–15686, https://doi.org/10.1073/pnas.1107769108.
[52] I. Joseph, Koopman–von Neumann approach to quantum simulation of nonlinear classical dynamics, Phys. Rev. Res., 2 (2020), 043102, https://doi.org/10.1103/PhysRevResearch.2.043102.
[53] I. Joseph, Y. Shi, M. D. Porter, A. R. Castelli, V. I. Geyko, F. R. Graziani, S. B. Libby, and J. L. DuBois, Quantum computing for fusion energy science applications, Phys. Plasmas, 30 (2023), 010501, https://doi.org/10.1063/5.0123765.
[54] D. Kelly and I. Melbourne, Deterministic homogenization for fast-slow systems, J. Funct. Anal., 272 (2017), pp. 4063–4102, https://doi.org/10.1016/j.jfa.2017.01.015.
[55] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, Data-driven model reduction and transfer operator approximation, J. Nonlinear Sci., 28 (2018), pp. 985–1010, https://doi.org/10.1007/s00332-017-9437-7.
[56] S. Klus, F. Nüske, S. Peitz, J.-H. Niemann, C. Clementi, and C. Schütte, Data-driven approximation of the Koopman generator: Model reduction, system identification, and control, Phys. D, 406 (2020), 132416, https://doi.org/10.1016/j.physd.2020.132416.
[57] B. O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proc. Natl. Acad. Sci. USA, 17 (1931), pp. 315–318, https://doi.org/10.1073/pnas.17.5.315.
[58] K. Law, A. Shukla, and A. M. Stuart, Analysis of the 3DVAR filter for the partially observed Lorenz ’63 model, Discrete Contin. Dyn. Syst., 34 (2014), pp. 1061–1078, https://doi.org/10.3934/dcds.2014.34.1061.
[59] K. Law, A. Stuart, and K. Zygalakis, Data Assimilation: A Mathematical Introduction, Texts Appl. Math. 62, Springer, Cham, 2015, https://doi.org/10.1007/978-3-319-20325-6.
[60] M. E. Levine and A. M. Stuart, A framework for machine learning of model error in dynamical systems, Commun. Amer. Math. Soc., 2 (2022), pp. 283–344, https://doi.org/10.1090/cams/10.
[61] J. W.-B. Lin and J. D. Neelin, Influence of a stochastic moist convective parameterization on tropical climate variability, Geophys. Res. Lett., 27 (2000), pp. 3691–3694, https://doi.org/10.1029/2000GL011964.
[62] J.-P. Liu, H. Ø. Kolden, H. K. Krovi, and A. M. Childs, Efficient quantum algorithm for dissipative nonlinear differential equations, Proc. Natl. Acad. Sci. USA, 118 (2021), e2026805118, https://doi.org/10.1073/pnas.2026805118.
[63] E. N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci., 20 (1963), pp. 130–141, https://doi.org/10.1175/1520-0469(1963)020%3C0130:DNF%3E2.0.CO;2.
[64] E. N. Lorenz, Predictability: A problem partly solved, in Predictability of Weather and Climate, T. Palmer and R. Hagedorn, eds., Cambridge University Press, Cambridge, UK, 2006, pp. 40–58.
[65] A. J. Majda and J. Harlim, Filtering Complex Turbulent Systems, Cambridge University Press, Cambridge, UK, 2012.
[66] A. J. Majda, I. Timofeyev, and E. Vanden-Eijnden, Models for stochastic climate prediction, Proc. Natl. Acad. Sci. USA, 96 (1999), pp. 14687–14691, https://doi.org/10.1073/pnas.96.26.14687.
[67] A. Mathews, M. Francisquez, J. W. Hughes, D. R. Hatch, B. Zhu, and B. N. Rogers, Uncovering turbulent plasma dynamics via deep learning from partial observations, Phys. Rev. E (3), 104 (2021), 025205, https://doi.org/10.1103/PhysRevE.104.025205.
[68] I. Melbourne and A. M. Stuart, A note on diffusion limits of chaotic skew-product flows, Nonlinearity, 24 (2011), pp. 1361–1367, https://doi.org/10.1088/0951-7715/24/4/018.
[69] I. Mezić, Spectral properties of dynamical systems, model reduction and decompositions, Nonlinear Dynam., 41 (2005), pp. 309–325, https://doi.org/10.1007/s11071-005-2824-x.
[70] H. Owhadi and G. R. Yoo, Kernel flows: From learning kernels from data into the abyss, J. Comput. Phys., 389 (2019), pp. 22–47, https://doi.org/10.1016/j.jcp.2019.03.040.
[71] T. N. Palmer, A nonlinear dynamical perspective on model error: A proposal for non-local stochastic-dynamic parametrization in weather and climate prediction models, Quart. J. Roy. Meteorol. Soc., 127 (2001), pp. 279–304, https://doi.org/10.1002/qj.49712757202.
[72] S. Pan and K. Duraisamy, Data-driven discovery of closure models, SIAM J. Appl. Dyn. Syst., 17 (2018), pp. 2381–2413, https://doi.org/10.1137/18M1177263.
[73] G. A. Pavliotis and A. M. Stuart, Multiscale Methods: Averaging and Homogenization, Texts Appl. Math. 53, Springer, New York, 2008, https://doi.org/10.1007/978-0-387-73829-1.
[74] P. Pfeffer, F. Heyder, and J. Schumacher, Hybrid quantum-classical reservoir computing of thermal convection flow, Phys. Rev. Res., 4 (2022), 033176, https://doi.org/10.1103/PhysRevResearch.4.033176.
[75] D. Qi and J. Harlim, Machine learning-based statistical closure models for turbulent dynamical systems, Philos. Trans. Roy. Soc. A, 380 (2022), 20210205, https://doi.org/10.1098/rsta.2021.0205.
[76] A. Rahimi and B. Recht, Random features for large-scale kernel machines, in Advances in Neural Information Processing Systems, Vol. 20, J. Platt, D. Koller, Y. Singer, and S. Roweis, eds., Curran Associates, Red Hook, NY, 2007, pp. 1177–1184, https://proceedings.neurips.cc/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf.
[77] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys., 378 (2019), pp. 686–707, https://doi.org/10.1016/j.jcp.2018.10.045.
[78] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, Spectral analysis of nonlinear flows, J. Fluid Mech., 641 (2009), pp. 115–127, https://doi.org/10.1017/s0022112009992059.
[79] P. Sagaut, Large Eddy Simulation for Incompressible Flows, Springer-Verlag, Berlin, 2006.
[80] J. J. Sakurai, Modern Quantum Mechanics, revised ed., Addison-Wesley, Reading, MA, 1993.
[81] T. Sauer, J. A. Yorke, and M. Casdagli, Embedology, J. Statist. Phys., 65 (1991), pp. 579–616, https://doi.org/10.1007/bf01053745.
[82] R. Schack, Quantum theory from four of Hardy’s axioms, Found. Phys., 33 (2003), pp. 1461–1468, https://doi.org/10.1023/A:1026044329659.
[83] T. Schneider, A. M. Stuart, and J.-L. Wu, Learning stochastic closures using ensemble Kalman inversion, Trans. Math. Appl., 5 (2021), tnab003, https://doi.org/10.1093/imatrm/tnab003.
[84] J. Slawinska, O. Pauluis, A. J. Majda, and W. W. Grabowski, Multiscale interactions in an idealized Walker cell: Simulations with sparse space–time superparameterization, Mon. Wea. Rev., 143 (2015), pp. 563–580, https://doi.org/10.1175/MWR-D-14-00082.1.
[85] J. Slingo and T. Palmer, Uncertainty in weather and climate prediction, Philos. Trans. Roy. Soc. A, 369 (2011), pp. 4751–4767, https://doi.org/10.1098/rsta.2011.0161.
[86] B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, J. Mach. Learn. Res., 12 (2011), pp. 2389–2410.
[87] M. Takesaki, Theory of Operator Algebras I, Encyclopaedia Math. Sci. 124, Springer, Berlin, 2002.
[88] L. A. Takhtajan, Quantum Mechanics for Mathematicians, Grad. Stud. Math. 95, American Mathematical Society, Providence, RI, 2008.
[89] P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, and P. Koumoutsakos, Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks, Proc. Roy. Soc. A, 474 (2018), 20170844, https://doi.org/10.1098/rspa.2017.0844.
[90] U. von Luxburg, M. Belkin, and O. Bousquet, Consistency of spectral clustering, Ann. Statist., 36 (2008), pp. 555–586, https://doi.org/10.1214/009053607000000640.
[91] D. S. Wilks, Effects of stochastic parametrizations in the Lorenz ’96 system, Quart. J. Roy. Meteorol. Soc., 131 (2005), pp. 389–407, https://doi.org/10.1256/qj.04.03.
[92] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition, J. Nonlinear Sci., 25 (2015), pp. 1307–1346, https://doi.org/10.1007/s00332-015-9258-5.
[93] J. Yuval and P. A. O’Gorman, Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions, Nat. Commun., 11 (2020), 3295, https://doi.org/10.1038/s41467-020-17142-3.
[94] Y. Zhu and N. Zabaras, Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification, J. Comput. Phys., 366 (2018), pp. 415–447, https://doi.org/10.1016/j.jcp.2018.04.018.
Copyright
© 2024 SIAM. Published by SIAM under the terms of the Creative Commons Attribution 4.0 International License.
History
Submitted: 8 August 2022
Accepted: 29 August 2023
Published online: 5 February 2024
Funding: The research of the first and second authors was supported by the U.S. Department of Defense, Basic Research Office under Vannevar Bush Faculty Fellowship grant N00014-21-1-2946. The research of the second author was also supported by the U.S. National Science Foundation under grants 1842538 and DMS-1854383 and the U.S. Office of Naval Research under MURI grant N00014-19-1-242.