Strategies for Reduced-Order Models for Predicting the Statistical Responses and Uncertainty Quantification in Complex Turbulent Dynamical Systems

Turbulent dynamical systems characterized by both a high-dimensional phase space and a large number of instabilities are ubiquitous among many complex systems in science and engineering. The existence of a strange attractor in the turbulent systems containing a large number of positive Lyapunov exponents results in a rapid growth of small uncertainties, requiring naturally a probabilistic characterization for the evolution of the turbulent system. Uncertainty quantification in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates such as the change in mean and variance for key physical quantities in their nonlinear responses to changes in external forcing parameters or uncertain initial data. One central issue in contemporary research is the development of a systematic methodology that can recover the crucial features of the natural system in statistical equilibrium (model fidelity) and improve the imperfect model prediction skill in response to various external perturbations (model sensitivity). A general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions with largest energy of a general class of damped and forced complex turbulent dynamical systems is discussed here. The methods are developed under a universal class of turbulent dynamical systems with quadratic nonlinearity that is representative in many applications in applied mathematics and engineering. The validity of general framework of reduced-order models is demonstrated on instructive stochastic triad models. Recent applications to two-layer baroclinic turbulence in the atmosphere and ocean with combinations of turbulent jets and vortices are also surveyed.

Turbulent dynamical systems characterized by both a high-dimensional phase space and a large number of instabilities are ubiquitous among many complex systems in science and engineering [51,54,92,75]. The existence of a strange attractor [93] in turbulent systems containing a large number of positive Lyapunov exponents results in a rapid growth of small uncertainties from imperfect modeling equations or perturbations in initial values, requiring naturally a probabilistic characterization for the evolution of the turbulent system. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates such as the change in mean and variance for key physical quantities in their nonlinear responses to changes in external forcing parameters or uncertain initial data. One problem of practical significance in contemporary science is using UQ to understand the complexity of anisotropic turbulent processes over a wide range of spatio-temporal scales in engineering shear turbulence [35,91,26] as well as climate atmosphere ocean science [84,92,51]. This is especially important from a practical viewpoint because energy often flows intermittently from the smaller scales to affect the largest scales in such anisotropic turbulent flows.
In the development of a proper UQ scheme for systems of high or infinite dimensionality with instabilities, the analysis and prediction of phenomena often occur through complex dynamical equations that have significant model errors compared with the true natural signal. The imperfect model errors are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources needed for repeated Monte-Carlo simulations in each different scenario of high dimensional systems. Clearly, it is important both to improve the imperfect model's capabilities to recover crucial features of the natural system and also to accurately model the sensitivities in the natural system to changes in external or internal parameters. These efforts are hampered by the fact that the actual dynamics of the natural system are unknown. Important examples with major societal impact involve the Earth's climate and climate change where climate sensitivities are studied through a suite of imperfect comprehensive computer models ( [22,80,57], and references therein); other examples include imperfect mesoscopic models in materials science [13,39] and neural science [81] when compared with actual observed behavior in these complex nonlinear systems.
Recently, information theory has been utilized in different ways to systematically improve model fidelity and sensitivity [58,57], to quantify the role of coarse-grained initial states in long-range forecasting [28,29], and to make an empirical link between model fidelity and forecasting skill [19,20]. Imperfect models for complex systems are constrained by their capability to reproduce certain statistics in a training phase where the natural system has been observed; for example, this training phase in climate science is roughly the 60-year dataset of extensive observations of the Earth's climate system. For long-range forecasting, it is natural to guarantee statistical equilibrium fidelity for an imperfect model, and a framework using information theory is a natural way to achieve this in an unbiased fashion [58,57,28,29,20]. First, equilibrium statistical fidelity for an imperfect model depends on the choice of coarse-grained variables utilized [58,57]; second, equilibrium model fidelity is a necessary but not sufficient condition to guarantee long-range forecasting skill [59,29]. For example, Section 2.6 of [48] extensively discusses three very different strongly mixing chaotic dynamical models with 40 variables and with the same Gaussian equilibrium measure, the TBH, K-Z, and IL-96 models.
One significant application of UQ through empirical information theory is quantifying uncertainty in climate change science [57,54]. The climate is an extremely complex coupled system involving multiple physical processes for the atmosphere, ocean, and land over a wide range of spatial scales from millimeters to thousands of kilometers and time scales from minutes to decades or centuries [22,74]. Climate change science focuses on predicting the coarse-grained planetary scale long time changes in the climate system due to either changes in external forcing or internal variability such as the impact of increased carbon dioxide [31,29]. For several decades the predictions of climate change science have been carried out with some skill through comprehensive computational atmospheric and oceanic simulation (AOS) models [22,74,80], which are designed to mimic the complex physical spatio-temporal patterns in nature. Such AOS models either through lack of resolution due to current computing power or through inadequate observation of nature necessarily parameterize the impact of many features of the climate system such as clouds, mesoscale and submesoscale ocean eddies, sea ice cover, etc. Thus, there are intrinsic model errors in the AOS models for the climate system and the effect of such model errors on predicting the coarse-grained large scale long time quantities is of interest. One central scientific issue in contemporary climate change science is the development of a systematic methodology that can recover the crucial features of the natural system in statistical equilibrium/climate (model fidelity) and improve the imperfect model prediction skill in response to various external perturbations like climate change and mitigation scenarios (model sensitivity) [2,5,20,59,54].
Here we discuss a general mathematical framework to construct statistically accurate reduced-order models that have the skill in capturing the statistical variability in the principal directions with largest energy of a general class of damped and forced complex turbulent dynamical systems. Low-order truncation methods is especially important for UQ with practical impact since the curse of ensemble size [6,54] forbids to run Monte-Carlo simulations for all possible uncertain forcing scenarios in order to do attribution studies. Thus reduced-order models (ROM) are needed on a low-dimensional subspace where key physical significant quantities are characterized by the degrees of freedom that carry the largest energy or variance. In general, there are three stages in the modeling procedure, imperfect model selection; calibration of the imperfect model in a training phase using only data in the low-order perfect statistics; and prediction of the responses to a wide class of forcing and perturbation scenarios. The methods are developed for a universal class of turbulent dynamical systems with quadratic nonlinearity that is representative in many applications in applied mathematics and engineering [56,51,54]. Several mathematical ideas will be introduced to help improve the prediction skill of the imperfect reduced-order models. Most importantly, empirical information theory [48] and statistical linear response theory [59] are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics [53] are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of general framework of reducedorder models has been verified by testing the methods on a series of representative turbulent dynamical models ranging from the 40-dimensional Lorenz '96 model [63], one-layer barotropic turbulence with topography [78], and finally the two-layer baroclinic turbulence with internal instability [79].
In the following parts of the paper, we first display the general formulation of the turbulent system with quadratic nonlinearity, and its statistical moment dynamical equations in Section 1. The skill and limitation of many previous low-order modeling ideas are also discussed. Theoretical toolkits that are useful for the development of reduced-order models are introduced in Section 2, where a general strategy to improve imperfect model sensitivity is described using empirical information theory and a general total statistical energy dynamics. Section 3 discusses the construction of reduced-order models in detail under this general framework with these various theoretical tools. In Section 4, we illustrate all these procedures and algorithms for the reduced-order models for some simple but instructive systems of triad stochastic equations with several novel features. In Section 5, we give examples of the skill of the procedures and algorithms on two-layer baroclinic models for both atmosphere and ocean regimes with turbulent jets and vortices with roughly 1 × 10 5 degrees of freedom and direct and inverse turbulent cascades. In these very tough regimes, the reduced-order strategies show skill in capturing the response to changes in external forcing using only 200 modes, less than 0.15% of the modes in the original system.
1. General Formulation of Turbulent Dynamical Systems with Nonlinearity. One representative feature in many turbulent dynamical systems from nature is the quadratic energy conserving nonlinear interaction that transfers energy from the unstable modes to stable ones where the energy is dissipated resulting in a statistical steady state in equilibrium. We consider the following abstract formulation of the turbulent dynamical systems about state variables u ∈ R N in a high-dimensional phase space (1.1) du dt = (L + D) u + B (u, u) + F (t) + σ (t)Ẇ (t; ω) .
On the right hand side of the above equation (1.1), the first two components, (L + D) u, represent linear dispersion and dissipation effects so that where the superscript star ' * ' represents conjugate transpose of the matrix. The nonlinear effect in the dynamical system is introduced through a quadratic form, B (u, u), about the state variables u that conserves energy when linear operators and all forcing in (1.1) are ignored, such that, where the dot on the left hand side denotes the inner product under a proper metric according to the conserved quantity [53,54]. Besides, the system is forced by external forcing effects that are decomposed into a deterministic component, F (t), and a stochastic component usually represented by a Gaussian random process, σ (t)Ẇ (t; ω). It needs to be noticed that F (t) might be inhomogeneous and introduce anisotropic structure into the system, and σ (t)Ẇ (t; ω) might further alter the energy structure in the fluctuation modes. Many complex turbulent dynamical systems can be categorized into this abstract mathematical structure in (1.1) satisfying the properties (1.2a) and (1.2b), including the (truncated) Navier-Stokes equation [76] as well as basic geophysical models for the atmosphere, ocean, and the climate systems with rotation, stratification, and topography [84,51,54]. The main goal of the remainder of this paper is to provide a survey about the development of a consistent mathematical framework for systems like (1.1) and illustrate emerging applications of turbulent dynamical systems with model error and the curse of ensemble size.
1.1. Exact statistical moment equations for the abstract formulation. We use a finite-dimensional representation of the stochastic field u consisting of a fixed-in-time, N-dimensional, orthonormal basis whereū (t) = u (t) represents the ensemble average of the model state variable response (we use angled bracket to represent ensemble average), i.e. the mean field, and Z i (t; ω) are stochastic coefficients measuring the fluctuation processes along the direction e i . By taking the statistical (ensemble) average over the original equation (1.1) and using the mean-fluctuation decomposition (1.3), the evolution equation of the mean stateū = u is given by the following dynamical equation (1.4) dū dt = (L + D)ū + B (ū,ū) + ∑ i, j R i j B (e i , e j ) + F, with R = ZZ * the second-order covariance matrix of the stochastic coefficients Z = {Z i } N i=1 . The term B (ū,ū) represents the nonlinear interactions between the mean state, and R i j B (e i , e j ) is the higher-order feedbacks from the fluctuation modes to the mean state dynamics. Moreover the random fluctuation component of the solution, u = ∑ i Z i (t; ω) e i satisfies du dt = (L + D) u + B ū, u + B u ,ū + B u , u − B u , u + σ (t)Ẇ (t; ω) .
By projecting the above equation to each orthonormal basis element e i we obtain dZ i dt = Z j [(L + D) e j + B (ū, e j ) + B (e j ,ū)] · e i + B u , u − B u , u · e i + σ (t)Ẇ (t; ω) · e i .
From the last equation we directly obtain the exact evolution equation of the covariance matrix R = ZZ * by multiplying Z * j on both sides of the equation and taking ensemble statistical average where we have: i) the linear dynamical operator L v (ū) expresses energy transfers between the mean field and the stochastic modes (effect due to B), as well as energy dissipation (effect due to D) and non-normal dynamics (effect due to L ) ii) the positive definite operator Q σ expresses energy transfer due to the external stochastic forcing iii) as well as the energy flux Q F expresses nonlinear energy transfer between different modes due to non-Gaussian statistics (or nonlinear terms) modeled through third-order moments One important property to notice is that the energy conservation property of the quadratic operator B is inherited in the statistical equations by the matrix Q F since The above exact statistical equations for the state of the mean (1.4) and covariance matrix (1.5) will be the starting point for the developments in the reduced-order models on UQ methods. Note that the statistical dynamics for the mean (1.4) and covariance (1.5) are still not closed due to the inclusion of thirdorder moments through the nonlinear interactions in Q F in (1.6c). The basic idea in the general development of reduced-order schemes concerns about proper approximation about this energy flux term Q F in a simple and efficient manner so that the energy mechanism can be modeled properly in the reduced-order schemes [44,84,89].
1.1.1. Low-order truncation methods for UQ and their limitations. Next we briefly discuss some popular low-order truncation methods for closing the statistical equations (1.4) and (1.5) and their limitations. Low-order truncation models for UQ include projection of the dynamics on leading order empirical orthogonal functions (EOFs) [36], truncated polynomial chaos (PC) expansions [37,41,72], and dynamically orthogonal (DO) truncations [85,86]. Then ideas about closing the low-order truncated system within the resolved modes need to be proposed. A pioneering statistical prediction strategy [23,24] overcomes the curse of ensemble size for moderate size turbulent dynamical systems by simply neglecting the third-order moments by setting Q F ≡ 0 in the covariance equations (1.5). This Gaussian closure method has been applied to short time statistical prediction for truncated geophysical models like the one-layer geophysical models in (1.9c) with some success [24,90]. Based on the similar idea of neglecting third-order moments, the eddy-damped quasi-normal Markovian approximation (EDM) [84,44] is another approximation to the moment hierarchy (1.4) and (1.5) that closes the second moments with (inconsistent) Gaussian approximation in the higher order equations. With a much larger eddy-damped parameters, the EDM equations are realizable in a stochastic model.
Moreover concise mathematical models and analysis reveal fundamental limitations in truncated EOF expansions [3,18], PC expansions [9,56], and DO truncations [87], due to different manifestations of the fact that in many turbulent dynamical systems, modes that carry small variance on average can have important, highly intermittent dynamical effects on the large variance modes. Furthermore, the large dimension of the active variables in turbulent dynamical systems makes direct UQ by large ensemble Monte-Carlo simulations impossible in the foreseeable future while once again, concise mathematical models [56] point to the limitations of using moderately large yet statistically too small ensemble sizes. Other important methods for UQ involve the linear statistical response to change in external forcing or initial data through the fluctuation dissipation theorem (FDT) which only requires the measurement of suitable time correlations in the unperturbed system [1,31,32,33,61]. Despite some significant success with this approach for turbulent dynamical systems [1,31,32,61], the method is hampered by the need to measure suitable approximations to the exact correlations for long time series as well as the fundamental limitation to parameter regimes with a linear statistical response. All the limitations above imply the need of a more careful treatment for the higher-order statistics in Q F in the exact equations for mean and covariance (1.4) and (1.5).
1.2. The overall prediction strategy for the development of reduced-order statistical models. Before preceding to the details about developing the reduced-order statistical model framework, we illustrate the basic ideas in the modeling process as a general overview. Overall, this can serve as a generic procedure where rigorous mathematical theories and various computational strategies are combined to get a crucial improvement for understanding turbulent dynamical systems. In general, we can decompose the reduced-order statistical modeling strategy into three stages: i) imperfect model selection according to the complexity of the problem; ii) model calibration in the training phase using equilibrium data; and iii) model prediction with the optimized model parameters for various responses to external perturbations. The overall prediction strategy is summarized in a diagram in Figure 1.1. The basic procedure for developing statistical models illustrates a representative example where various mathematical theories and numerical methods interact and are combined for achieving a better understanding about the natural system.
1.2.1. Ergodicity and non-trivial invariant measure for the true turbulent dynamical systems. In the first place, the best reduced-order approximation strategy can only be achieved through a good understanding about the true turbulent dynamical system. Several important mathematical theories are especially useful for characterizing the statistical structure of the turbulent system. Under special damping and random noise forms without the deterministic forcing F ≡ 0, a Gaussian invariant measure can be generated in the statistical steady state, whereas this Gaussian distribution from equilibrium statistical mechanics can be only derived from special damping and noise terms [54,78]. One more generalized situation with importance in many realistic applications is when no stochastic forcing in the damped and forced dynamical system (1.1), so that the deterministic system with σ ≡ 0 has non-trivial long-time dynamics. The uncertainty in such deterministic systems is measured by the unstable sub-phase space with a number of positive Lyapunov exponents, thus a nontrivial global attractor is generated through the strong interaction and exchange of energy [82]. This scenario is similar to the Sinai-Ruelle-Bowen (SRB) measure problem [93,83]. In that case, a unique distinguished invariant measure p eq , the SRB measure, is the one selected by the vanishing noise limit with appropriate assumptions on the system and noise. This distinguished invariant measure forms up a stationary statistical solution p eq in equilibrium, so that (1.7) p eq Φ t −1 (Ω) = p eq (Ω) , for any t > 0 and Ω ⊂ R N , with Φ t : R N → R N as the flow map. This invariant measure (1.7) provides a mechanism for explaining how local instability on attractors can produce coherent statistics for orbits starting from large sets in the basin. The statistical ensemble behaviour in equilibrium such as the mean state and covariance can be deduced by taking averages with respect to the invariant measure. Ergodicity is then one important property for the turbulent dynamical system with uncertainty, and means that there exists a unique invariant measure in statistical equilibrium which attracts all statistical initial data. Geometric ergodicity for finite dimensional Garlerkin truncation models (for example, the two or three dimensional Navier-Stokes equations) with minimal stochastic forcing is an important research topic [21,54,65]. With proper ergodicity assumption about the abstract system (1.1) and rigorously justified for the system with minimal stochastic forcing [54,65], the statistical expectation of any functionals about the state variables can be calculated through averaging the time-series in steady state, that is, where g is any functional about the state variables u, and p eq is the invariant measure (1.7) in statistical equilibrium. Taking the ensemble averages from the first equality of (1.8) is usually an extremely challenging problem, while the average along a trajectory over a long time as the right hand side of (1.8) forms a more practical approach. Ergodicity is crucial in this prediction strategy for achieving accurate perfect model statistics, and will be assumed throughout the following discussions.

Modeling Stages Math. & Computational Tools
Model Selection 1.2.2. Model selection, model calibration, and prediction with optimized imperfect model. The ergodic theory and invariant measure enable us to get access to the model equilibrium statistical structures in steady state. Still the major goal in this investigation is to find the model sensitivity in response to various external perturbations. Especially for the turbulent dynamical systems with instability like (1.1), nonlinearity forms the key mechanism in the complex chaotic behaviour, and even small perturbations may drive the system away from its original equilibrium state. Furthermore, strong non-Gaussianity due to the strange attractor from the SRB measure is another characteristic feature in these turbulent systems with non-Gaussian measures even in equilibrium. The reduced-order statistical modeling procedure aims at capturing these nonlinear non-Gaussian statistical responses in the principal directions in the system in an accurate and efficient way.
As illustrated in Figure 1.1 for the general strategy, the modeling procedure begins with the model selection stage where proper approximation method is adopted through a careful analysis about the statistical theories. Specifically in the reducedorder models to be developed here, usually additional damping and random noise corrections are introduced for the unresolved higher-order statistics. The equilibrium invariant measure and ergodic theory [65,89] can help determine the optimal Galerkin truncation wavenumber for the reduced-order model and the proper basis that can cover the most important directions in the system. Especially, non-Gaussian statistics in the unperturbed equilibrium state would also become important and require careful consideration in the model calibration.
The model calibration procedure is usually carried out in a training phase before the prediction, so that the optimal imperfect model parameters can be achieved through a careful calibration about the true higher-order statistics. The ideal way is to find a unified systematic strategy where various external perturbations can be predicted from the same set of optimal parameters through this training phase. To achieve this, various statistical theories and numerical strategies need to be blended together in a judicious fashion. Most importantly, we need to consider the linear statistical response theory to calibrate the model responses in mean and variances [48,52,33]; and use empirical information theory [58,59,51] to get a balanced measure for the error in the leading order moments. In the final model prediction stage, the optimized imperfect model parameters are applied for the forecast of various model responses to perturbations. In the construction about numerical models, numerical issues also need be taken into account to make sure numerical stability and accuracy. Especially, proper schemes with accuracy order consistent with the reduced model approximation error should be proposed to ensure optimal performance.
1.3. Low-order models illustrating model selection, calibration, and prediction in UQ. Here we provide a brief discussion of some instructive quantitative and qualitative low-order models where the above strategy for improved prediction and UQ is displayed. The test models as in nature often exhibit intermittency [26,76] where some components of a turbulent dynamical system have low amplitude phases followed by irregular large amplitude bursts of extreme events. Intermittency is an important physical phenomena. Exactly solvable test models as a test bed for the prediction and UQ strategy [54,48,57] including information barriers are discussed extensively in models ranging from linear stochastic models to nonlinear models with intermittency in the research expository article [56] as well as in [7,8]. Some more sophisticated applications are mentioned next in Section 1.4. Turbulent diffusion in exactly solvable models is a rich source of highly nontrivial spatiotemporal multi-scale models to test the strategies in empirical information theory and kicked statistical response theory in a more complex setting [27,58,59,60]. Even though these models have no positive Lyapunov exponents, they have been shown rigorously to exhibit intermittency and extreme events [66]. Calibration strategies for imperfect models using information theory have been developed recently to yield statistical accurate prediction of these extreme events by imperfect inexpensive linear stochastic models for the velocity field [77]. This topic merits much more attention by other modern applied mathematicians [70,71].
1.3.1. Nonlinear regression models for time series. A central issue in contemporary science is the development of data driven statistical dynamical models for the time series of a partial set of observed variables which arise from suitable observations from nature (see [17] and references therein); examples are multi-level linear autoregressive models as well as ad hoc quadratic nonlinear regression models. It has been established recently [67] that ad hoc quadratic multi-level regression models can have finite time blow up of statistical solutions and pathological behavior of their invariant measure even though they match the data with high precision. A new class of physics-constrained multi-level nonlinear regression models was developed which involve both memory effects in time as well as physics-constrained energy conserving nonlinear interactions [34,62], which completely avoid the above pathological behavior with full mathematical rigor.
A striking application of these ideas combined with information calibration to the predictability limits of tropical intraseasonal variability such as the Madden-Julian oscillation (MJO) and the monsoon has been developed in a series of papers [16,15,14]. They yield an interesting class of low-order turbulent dynamical systems with extreme events and intermittency. The nonlinear low-order stochastic model (see Section 4.2 of [54]) has been shown to have significant skill for determining the predictability limits of the large-scale cloud patterns of the boreal winter MJO [16] and the summer monsoon [14]. It is an interesting open problem to rigorously describe the intermittency and other mathematical features in these low-order turbulent dynamical systems.

Examples of complex turbulent dynamical systems.
Here we list some typical prototype models of complex turbulent dynamical systems with the structure in (1.1). These qualitative and quantitative models with increasing complexity form a desirable set of testing models for prediction, UQ, and state estimation [54]. We will finally test the reduced-order modeling strategies on all these typical models as a thorough discussion about the effectiveness and limitations of the model reduction ideas including a complete new treatment for the triad example. (A) The triad system with quadratic energy transfer. The triad model [50,54] is the elementary building block of complex turbulent systems with energy conserving nonlinear interactions. It is a 3-dimensional ODE system with inhomogeneous damping and both deterministic and stochastic forcing terms The triad system is an instructive test model for the reduced-order strategies. A self-contained pedagogical discussion about the triad system is shown in Section 4. (B) 40-dimensional Lorenz '96 model. The Lorenz '96 model [46,63,54,51] is a 40-dimensional turbulent dynamics defined with periodic boundary condition which mimics weather waves of the mid-latitude atmosphere. Various representative statistical features can be generated by changing the external forcing values in F See [63] for the detailed reduced-order modeling strategy. (C) One-layer barotropic model with topography. The one-layer barotropic system [51,54,78] is a basic and simple geophysical model for the atmosphere or ocean with the essential geophysical effects of rotation, topography, and both deterministic and random forcing.
See [78] for the detailed reduced-order modeling strategy. (D) Two-layer quasi-geostrophic model with baroclinic instability. The two-layer quasi-geostrophic model with baroclinic instability in a two-dimensional periodic domain [84,92,79] is one fully nonlinear fluid model, and is quite capable in capturing the essential physics of the relevant internal variability despite its relatively simple dynamical structure. (1.9d) See [79] and discussions in Section 5 for the reduced-order modeling strategy.

Statistical Theory Toolkits for Improving Model Prediction Skill.
In this section we introduce the general theoretical toolkits that are useful for capturing the key statistical features in turbulent systems like (1.1) and improving imperfect model prediction skill. Despite the complex model statistical responses in each component as the turbulent dynamical system gets perturbed, there exists a simple and exact statistical energy conservation principle for the total statistical energy of the system describing the overall (inhomogeneous) statistical structure in the system through a simple scalar dynamical equation [53,54]. The theory is briefly described in Section 2.1. Then the construction about the imperfect reduced-order models concerns about the consistency in equilibrium (climate fidelity) and the responses to perturbations (model sensitivity). Equilibrium statistical fidelity should be guaranteed in the first place so that the reduced-order model will converge to the true unperturbed equilibrium statistics. To further calibrate the detailed model sensitivity to perturbations in each statistical component, the linear response theory can offer useful quantities to measure for quantifying the crucial statistics in the model structure. Combining with the relative entropy under empirical information theory, a general information-theoretical framework can be proposed to tune the imperfect model parameters in a training phase, thus optimal model parameters can be used for model prediction in various dynamical regimes. We will describe the basic statistical theories in this section.
2.1. A statistical energy conservation principle. Despite the fact that the exact equations for the statistical mean (1.4) and the covariance fluctuations (1.5) are not closed equations, there is suitable statistical symmetry so that the energy of the mean plus the trace of the covariance matrix satisfies an energy conservation principle even with general deterministic and random forcing. Here we briefly introduce the theory developed in [53,54] about a total statistical energy dynamics for the abstract system (1.1). This total statistical energy offers a general description about the total responses in the perturbed system and will be shown useful for the construction of reduced-order models.
Consider the statistical mean energy,Ē = 1 2 |ū| 2 = 1 2ū ·ū, and the statistical fluctuation energy, E = 1 2 u · u = 1 2 trR. Assume the following symmetries involving the nonlinear interaction operator B under the orthonormal basis {e i }: A) The self interactions vanish in the quadratic interaction, B) The dyad interaction coefficients vanish through the symmetry, Therefore the detailed triad symmetry guarantees that the nonlinear interaction B (u, u) will not alter the total statistical energy structure in the system (though the state of the mean and covariance may both change due to the nonlinear term in (1.4) and (1.5)). So we have the following theorem [53,54]: where R satisfies the exact covariance equation in (1.5). Matrix Q σ expresses energy transfer due to external stochastic forcing, and is a diagonal matrix with entries, Q σ ,kk = |σ k | 2 .
For most practical dynamical systems, for example, the systems we have illustrated in (1.9a)-(1.9d), the symmetries in (2.1) are usually satisfied. Also a generalization allowing both dyads and triads in the statistical energy conservation principle is in [54]. Thus the statistical energy conservation principle can always be applied. Notice that especially under the homogeneous dissipation case, D = −dI, the right hand side of the statistical energy equation (2.2) will become a linear damping term for the total energy, −dE, plus the deterministic forcing applying on the mean state and the stochastic forcing contribution. This implies that the total energy structure (and thus the total variance in all the modes) can be determined from the statistical mean state by solving the scalar equation above.

Statistical equilibrium fidelity in approximation models.
Here we consider the statistical energy of the dynamical system in each individual (spectral) mode. Statistical equilibrium fidelity concerns the convergence to the true equilibrium statistics in statistical steady state in the reduced-order models. Recall the true second-order statistical equation (1.5) The most difficult and expensive part in solving the above system comes from evaluating the nonlinear flux term Q F where higher order statistics are involved, that is, Note that the third-order moments always include triad interactions of modes Z m , Z n , Z j between different scales, where nonlinear energy forward-cascade and backward-cascade along the energy spectrum can be induced. Thus the central issue in developing closure models becomes to find proper approximation about the nonlinear flux term Q M F ∼ Q F which can offer a statistically consistent estimation. First of all, it is important to remember the conservation of the total nonlinear flux trQ F ≡ 0 from (1.6d). This equality implies that the nonlinear interactions will not introduce additional energy source or sink into the system. Thus the same constraint should be maintained in designing the approximation models, trQ M F = 0. Consideration about accuracy and computational efficiency should be balanced in determining the explicit form of Q M F in the implementation of reduced methods. Here we first display some theoretical principles about the equilibrium nonlinear flux Q F,eq that can be used as guidelines for determining the values in Q M F . 2.2.1. Calibration about higher-order statistics in full phase space. In the prediction of model responses it is most important to find the variability along each principal direction. In general, the nonlinear flux Q F illustrates the nonlinear energy transfer between modes with different scales. In fact, we can decompose the matrix Q F = Q + F + Q − F by singular value decomposition into a positive-definite and negative-definite component. The positive definite part Q + F illustrates the additional energy that is injected into this mode from other scales, while the negative definite part Q − F shows the extraction of energy through nonlinear transfer to other scales. Thus the accurate approximation about the nonlinear flux Q F,i j in each (spectral) component becomes important. On the other hand, this approximation requires the calibration about the third-order moments Z m Z n Z j and Z m Z n Z i , and will always include the interactions between the (resolved) large-scale modes and (unresolved) smaller-scale fluctuations. Direct simulation would require ensemble averages for the third-order moments, where large numerical errors and high computational loads are almost unavoidable.
Instead, from the statistical dynamics for the covariance equation (1.5) in statistical steady state, the temporal derivative on the left hand side vanishes, d dt R eq ≡ 0, thus the equilibrium solution ū eq , R eq necessarily satisfies the steady state equation where Q F,eq includes the third-order moments evaluated at the statistical steady state. Therefore we can get the measurements about equilibrium third-order nonlinear flux through the lower order steady state solution of the mean,ū eq , and the covariance, The quasi-linear operator L v ū eq is defined through (1.6a) containing the interactions between mean state. Especially, the non-trivial third moments play a crucial dynamical role in the statistical closure models. As an example in the case without random forcing Q σ ≡ 0, the necessary and sufficient condition for a non-Gaussian statistical steady state [54] requires that L v ū eq R eq + R eq L * v ū eq = 0, so the above matrix has non-zero entries. This is an important constraint that needs to be considered first in the construction about reduced-order models in the next section.

2.2.2.
Calibration of higher-order statistics in the reduced subspace. Despite the above exact model calibration for higher-order statistics (2.3) using equilibrium mean and covariance, in many realistic problems, resolving the entire covariance matrix R ∈ C N×N of order O N 2 is still expensive and unnecessary especially for high dimensional systems N 1 with strong interactions between small and large scales. Often the key physical significant quantities are characterized by the degrees of freedom which carry the largest energy (or variance). Thus, for most cases we are mostly interested in the model variability in a low-dimensional subspace along the principal directions spanned by the subspatial basis One simplest proposal to get the low-order basis v j is through the leading order EOFs or energy based proper orthogonal decomposition [36,3]. The reduced-order third-order nonlinear flux can be calculated through a more efficient way using only the mean state,ū eq , and covariance in the subspace of interest, C = P * RP ∈ C s×s . By projecting the original nonlinear flux formulation (2.3) onto the subspace, we have the reduced-order formulation where the reduced-order quasi-linear operator and reduced-order noise can also be calculated efficiently only using information in the subspace with resolved leading order statistics Thus even though Q red F may still include many third moments between the low-wavenumber resolved modes and high-wavenumber modes that are not calculated explicitly in the reduced-order equations only for C, we can still achieve the equilibrium nonlinear flux constrained in the resolved subspace of interest by using only the mean and covariances ū eq ,C eq along the resolved directions {v 1 , · · · , v s }.
In general, the first two moments in equilibrium can be achieved through the ergodicity (1.8) by averaging the variables of interest along one solution trajectory, thus we can get the calibration about the third-order moment feedbacks in the secondorder dynamics by solving the equation (2.3) or (2.4). Besides, we also find one necessary condition for confirming equilibrium fidelity for the reduced-order models for the construction of nonlinear flux term, so that consistent nonlinear flux Q F is guaranteed in the final steady state Actually, the idea of estimating the higher-order statistics through low-order moments has been exploited for several specific models in [89,88,63]. The equilibrium statistics from (2.4) can efficiently calibrate the model nonlinear energy transfer mechanism along each resolved principal direction. However, as external perturbations are exerted, nonlinear responses will take place with large deviation from the original equilibrium statistical data calculated in Q F,eq . Next we will discuss the strategy to calibrate the model sensitivity to perturbations in a unified way.

2.3.
Linear response theory and kicked responses. The linear response theory as well as fluctuation-dissipation theorem (FDT) offers a convenient way to get leading-order statistical linear approximation about model responses to perturbations [12,48,68,59]. Consider the general unperturbed system (1.1), δ F = 0, with invariant measure p eq (u), and an external forcing perturbation in separation with temporal and spatial variables, Therefore the resulting perturbed probability density p δ can be asymptotically expanded as the equilibrium and the fluctuation correction [48] (2.6) The equilibrium statistics and leading-order correction to the perturbation of some functional about the state variable A (u) can be formulated as an asymptotic expansion, according to δ p . Therefore we get the leading order responses from Above the pointed-bracket denotes the statistical average under the solution from Fokker-Planck equation. R A (t) is the linear response operator corresponding to the functional A, which is calculated through correlation functions in the unperturbed statistical equilibrium (climate) only The noise in the equations is not needed for FDT to be valid, but is required to generate the smooth equilibrium measure p eq for the linear response operator R A . There is even a rigorous proof of the validity of FDT in this context [33]. Note that even though in general the linear response operator is difficult to calculate considering the complicated and unaccessible equilibrium distribution, a variety of Gaussian approximations for p eq and improved algorithms have been developed for response via FDT [43,48,61,52]. FDT can have high skill for the mean response and some skill for the variance response for a wide variety of turbulent dynamical systems [55, 1, 61, 2, 47, 32].

Calculate linear response operators through initial kicked responses.
The problem in calculating the leading order response using (2.8) is that the equilibrium distribution p eq is expensive to calculate for general systems with non-Gaussian features in a high dimensional phase space. One strategy to approximate the linear response operator which avoids direct evaluation of p eq through the FDT formula but still includes important non-Gaussian statistics is through the kicked response of an unperturbed system to a perturbation δ u of the initial state from the equilibrium measure, that is, to set the initial distribution with the same variance but a perturbation in the mean state One important advantage of adopting this kicked response strategy is that higher-order statistics due to nonlinear dynamics will not be ignored (compared with the other linearized strategy using only Gaussian statistics [55]). Then the kicked response theory gives the following proposition [48,58] for calculating the linear response operator: For δ small enough, the linear response operator R A (t) can be calculated by solving the unperturbed system (1.1) with a perturbed initial distribution in (2.9). Therefore, the linear response operator can be achieved through Here δ p is the resulting leading order expansion of the transient density function from unperturbed dynamics using initial value perturbation. From the formula in (2.10), the response operators for the mean and variance can be achieved from the perturbation part of the probability density δ p . And this density function can also be used to measure the information distance between the truth and imperfect models in the training phase.
The proof of the above Proposition 2.2 is a direct application of Duhamel's principle to the corresponding Fokker-Planck equation with forcing perturbations [48]. Thus the variability in the external forcing can be transferred to the perturbations in initial values. More importantly, the kicked response formulation (2.10) with initial mean state perturbation (2.9) is independent of the specific perturbation forms. Thus the operator R A describes the inherent dynamical mechanisms of the system. We summarize the practical strategies to calculate the kicked response operators for the mean and variance from (2.10) in Appendix A.
2.4. Empirical information theory for measuring imperfect model errors.
2.4.1. Empirical information theory for leading order statistics. The empirical information theory [38,48] builds the least biased probability measure consistent with the leading order measurements of the true perfect system. Information theory is often used in statistical science for imperfect model selection [11]. A natural way to measure the lack of information in one probability density from the imperfect model, p M , compared with the true probability density, p, is through the relative entropy or information distance [42,48], given by Despite the lack of symmetry in its arguments (that is, P (p 1 , p 2 ) = P (p 2 , p 1 ) in general), the relative entropy, P p, p M provides an attractive framework for assessing model error like a probabilistic metric. Importantly, the following two crucial features are satisfied in the relative entropy: (i) P p, p M ≥ 0, and the equality holds if and only if p = p M ; and (ii) it is invariant under any invertible change of variables. The most practical setup for utilizing the framework of empirical information theory arises when only the Gaussian statistics of the distributions are considered. By only comparing the first two moments of the density functions, we get the following fact [49]: If the probability density functions p, p M contain only the first two moments, that is, p ∼ N (ū, R) and p M ∼ N (ū M , R M ), the relative entropy in (2.11) has the explicit formula The first term on the right hand side of (2.12) is called the signal, reflecting the model error in the mean but weighted by the inverse of the model variance R M ; whereas the second term is the dispersion, involving only the model error covariance ratio RR −1 M , measuring the differences in the covariance matrices. Above usually we will use p to denote the probability distribution of the perfect model, which is actually unknown. Nevertheless, we can construct the measure of the perfect model p L using L measurements of the true system. Consider the imperfect model prediction with its associated probability density p M L , the definition of relative entropy (2.11) facilitates the practical calculation [40,58,59,54,56] precisely measures an intrinsic error from L measurements of the perfect system, and this is a simple example of an information barrier for any imperfect model based on L measurements for calibration. With the measurements L representing the first two moments, the Gaussian approximation (2.12) can be used to estimate the information error P p L , p M L considering only the first L statistical measurements (in practice, it is usually the measurements about the statistical mean and covariance).

Climate information barrier in single point statistics in homogeneous systems.
Here as one example, we briefly illustrate the inherent information barrier in special homogeneous systems like the L-96 model in (1.9b) (see [63]) with uniform damping and forcing using the above relative entropy metric. Of particular interest in both theory and applications, the statistical mean and variance at each individual grid point [51,63,64,20] play an important role as key statistical quantities to predict. In climate science, these might be the mean and variance of the surface temperature at every grid point. The single point mean u 1pt and single point variance r 1pt can be defined by averaging each grid component with presumed homogeneity, that is, For simplicity in representation, we assume homogeneous damping, D = −dI, and forcing, F = f I, in the energy dynamics (2.2), thus the total statistical energy equations for the true model E and reduced-order model approximation E M become Above the statistical energy can be defined through the single point statistics as where we assume homogeneity in the first two moments. The last part on the right hand side of E M equation comes from the error in the approximation for nonlinear flux Q M F . Taking the difference of the above two equations for E and E M and using Gronwall's inequality gives the error in the total statistical energy δ and the definition about the statistical energy offers the estimation The error estimation for the single point variance through the error from the mean and nonlinear flux by combining the above two inequalities with C 0 ,C 1 constants. The inequality in (2.14) illustrates that the error in the second-order statistics δ r 1pt can be controlled by the error in the first-order mean with a good approximation for the nonlinear flux term Q M F . Similar special results for the 40-dimensional L-96 model are described in [63].
With the help of the relative entropy, we can first illustrate the inherent information barrier with the single-point statistics approximation (2.13). It will be shown even with consistent single-point statistics in ū 1pt , r 1pt , large errors may still appear due to the lack of consideration in the covariances between different modes. Through the definition in (2.11) (and referring to Proposition 4.1 of [49]), the relative entropy between the truth p and imperfect model single-point statistics p M 1pt has the form Above, p G is the Gaussian fit for the original probability distribution p with the same mean and covariance from the truth; p G 1pt = N ū 1pt , r 1pt is the single-point approximation for the true system, and p M 1pt = N (ū M , r M ) is the reduced-order model prediction from the imperfect model with consistent single-point statistics. The first part on the right hand side of (2.15) is the intrinsic information barrier in Gaussian approximation. And the third part with homogeneous assumption of the system will vanish (or at least be minimized) due to the single-point statistics fidelity from (2.14). The error from single point approximation (and ignoring the cross-covariance) then comes only from the information barrier in marginal approximation P p G , p G 1pt as shown in the second part on the right hand side of (2.15). Simple calculation using the formula (2.12) and Jensen's inequality [63] yields the estimation for the information barrier in single-point approximation where r j is the variance in the spectral modes, and σ 2 max = max r j , σ 2 min = min r j are the largest and smallest variance. In Figure 2.1, we demonstrate this information barrier for imperfect models with exact one-point statistics ū 1pt , r 1pt consistency for the L-96 model with F = 5, 8. Large errors in the statistical steady state spectra (thus information barrier for these models) The barrier from (2.16) could become significant considering the gap between the largest and smallest variances due to common decaying energy spectra in turbulent systems. See [63] for a detailed example for the L-96 model. This information barrier can only be overcome by introducing more careful calibration about the dynamics in each eigen-direction of the system individually.

2.4.3.
Dynamical calibration for imperfect model improvement. The prediction skill of imperfect models can be improved by comparing the information distance through the linear response operator with the true model. The following fact offers a convenient way to measure the lack of information in the perturbed imperfect model requiring only knowledge of linear responses for the mean and variance δū ≡ δ R u , δ R ≡ δ R (u−ū) 2 . For this result, it is important to tune the imperfect model to satisfy equilibrium model fidelity, in the first place. Statistical equilibrium fidelity is a natural necessary condition to tune the mean and variance of the imperfect model to match those of the perfect model; it is far from a sufficient condition [54,58,59]. Using simplified assumptions with block-diagonal covariance matrices R = diag (R k ) and equilibrium model fidelity P p G , p M G = 0, the relative entropy in (2.11) between the true perturbed density p δ and the perturbed model density p M δ with small perturbation δ can be expanded componentwisely as the following proposition: PROPOSITION 2.4. Under assumptions with block-diagonal covariance matrices R = diag (R k ) and equilibrium model fidelity P p G , p M G = 0, the relative entropy in (2.12) between perturbed model density p M δ and the true perturbed density p δ with small perturbation δ can be expanded componentwisely as Here in the first line S p G,δ − S (p δ ) is the intrinsic error from Gaussian approximation of the system. R k is the equilibrium variance in k-th component, and δū k and δ R k are the linear response operators for the mean and variance in k-th component.
Detailed derivation about this result is shown in [59]. The inherent information error from the first row of (2.17) is due to the measurement in only first two order of moments, and is independent of the specific imperfect model structures. As a result, this component, S p G,δ − S (p δ ), can be viewed as a constant and does not need to be calculated in the optimization procedure.
The second row of the information distance (2.17) illustrates the signal error from the estimation about the mean responses, while the third row is the dispersion error for the errors from the variance responses.
The above Proposition 2.4 about empirical information theory and linear response theory together provides a convenient and unambiguous way of improving the performance of imperfect models in terms of increasing their model sensitivity regardless of the specific form of external perturbations δ f . The formula (2.10) in Proposition 2.2 as well as (2.7) illustrates that the skill of an imperfect model in predicting forced changes to perturbations with general external forcing is directly linked to the model's skill in estimating the linear response operators R A for the mean and variances (that is, use the functional A = u, (u −ū) 2 in calculating the linear response operators) in a suitably weighted fashion as dictated by information theory (2.17). This offers us useful hints of training imperfect models for optimal responses for the mean and variance in a universal sense. From the linear response theory, it shows that the system's responses to various external perturbations can be approximated by a convolution with the linear response operator R A (which is only related to the statistics in the unperturbed equilibrium steady state). It is reasonable to claim that an imperfect model with precise prediction of this linear response operator should possess uniformly good sensitivity to different kinds of perturbations. On the other hand, the response operator can be calculated easily by the transient state distribution density function using the kicked response formula as in (2.10). Considering all these good features of the linear response operator, the information barrier due to model sensitivity to perturbations can be overcome by minimizing the information error in the imperfect model kicked response distribution relative to the true response from observation data.
To summarize, consider a class of imperfect models, M . The optimal model M * ∈ M that ensures best information consistent responses to various kinds of perturbations is characterized with the smallest additional information in the linear response operator R A among all the imperfect models, such that where p M δ can be achieved through a kicked response procedure (2.10) in the training phase compared with the actual observed data p δ in nature, and the information distance between perturbed responses P p δ , p M δ can be calculated with ease through the expansion formula (2.17). The information distance P p δ (t) , p M δ (t) is measured at each time instant, so the entire error is averaged under the L 1 -norm inside a proper time window [0, T ] before the linear response function decays back to zero.
3. Reduced-Order Statistical Models for the Turbulent Systems. Previously in Section 2, the general idea about finding the optimal imperfect model is proposed according to the statistical theories and information distance metric. And we have shown the basic theoretical tools that can help construct the reduced-order statistical approximations and illustrate the information barriers due to these approximations. Then it is important to construct the explicit forms of the reduced-order models according to the exact dynamics for the mean and covariance in (1.4) and (1.5). Generally the statistical model for the leading two moments can be formulated in the full phase space as whereū M ∈ R N is the model approximated mean, and R M is the N × N full order covariance matrix about the fluctuation state variable u ∈ R N . Comparing with the original statistical dynamics (1.4) and (1.5), the most expensive but crucial part comes from the nonlinear flux term Q F in (1.6c) where important third-order moments are included representing the nonlinear interactions between different modes. Therefore the key issue in this section is to construct a judicious estimation about this nonlinear interaction term Q M F in the statistical closure models. Here the basic idea is to start with the simplest possible imperfect model and compare the advantages and limitations of different levels of imperfect models due to different degrees of approximation and model calibration, and finally check how the theories from previous sections can help with improving the model prediction skill, especially the model sensitivity to various perturbations.
3.1. A hierarchy of statistical reduced-order modeling ideas based on stochastic models. We may consider the statistical closure ideas by taking another look at the dynamics for stochastic coefficients Major nonlinearity comes from the term above representing interactions between different fluctuation modes B (u , u ) · e i . The first idea here is to model the effect of the nonlinear energy transfers on each mode by adding additional damping d M,i balancing the linearly unstable character of these modes, and adding additional (white) stochastic excitation with standard deviation σ M,i which will model the energy received by the stable modes. We want to constrain ourselves to second order models concentrating on the mean and variance and maintaining the computational expense in a low level, hence the additional parts d M,i , σ M,i only include statistics up to second order moments. Specifically we replace this high-order nonlinear term by with trR = ∑ j Z j Z * j for measuring the total energy (variance) structure in the system. Corresponding to the statistical equations, the nonlinear flux Q F representing the higher-order interactions is replaced by In (3.2), (D M , Σ M ) are N × N matrices that replace the original nonlinear unstable and stable effects from the original dynamics. Here represents the additional damping effect to stabilize the unstable modes with positive Lyapunov coefficients, while Q M F+ = Σ M,k (R) is the positive-definite additional noise to compensate for the overdamped modes. Now the problem is converted to finding expressions for D M and Σ M . In the following by gradually adding more detailed characterization about the statistical dynamical model we display the general procedure of constructing a hierarchy of the closure methods step by step. Below is a review about several model closure ideas [54,88,63,78] with increasing complexity: 1. Quasilinear Gaussian closure model: The simplest approximation for the closure methods at the first stage should be simply neglecting the nonlinear part entirely [23,25,90]. That is, set Thus the nonlinear energy transfer mechanism will be entirely neglected in this Gaussian closure model. This is the similar idea in the eddy-damped Markovian model where the moment hierarchy is closed at the level of second moments with Gaussian assumption and a much larger eddy-damped parameter is introduced to replace the molecular viscosity (see Chapter 5 of [84] and [44] for details). Obviously this crude Gaussian approximation will not work well in general due to the cutoff of the energy flow when strong nonlinear interactions between modes occur. Actually, the deficiency of this crude approximation have been shown under the L-96 framework, and in final equilibrium state there exists only one active mode with critical wavenumber [89,63]. Such closures are only useful in the weakly nonlinear case where the quasi-linear effects are dominant. 2. Models with consistent equilibrium statistics: Next the strategy is to construct the simplest closure model with consistent equilibrium statistics. So the direct way is to choose constant damping and noise term at most scaled with the total variance. We propose two possible choices as in [63] for the damping and noise in (3.2) below. Gaussian closure 1 (GC1): let Gaussian closure 2 (GC2): let (3.5) Above only two scalar model parameters (ε M , σ M ) are introduced, and I N represents the N × N identity matrix. GC1 is the familiar strategy of adding constant damping and white noise forcing to represent nonlinear interaction; GC2 scales with the total variance trR (or total statistical energy) so that the model sensitivity can be further improved as the system is perturbed. From both GC1 and GC2, we introduce uniform additional damping rate for each spectral mode controlled by a single scalar parameter ε M ; while the additional noise with variance σ 2 M is added to make sure climate fidelity in equilibrium (we leave the detailed discussion for climate fidelity in Section 3.2.1).
The statistical model closure Q M F is used to approximate the third-order moments in the true dynamics, thus the exponents of the total energy trR in GC2 should be consistent in scaling dimension. In the positive-definite part Q M F+ , it calibrates the rate of energy injected into the spectral mode due to nonlinear effect in the order |u | 3 . The factor scales with the total energy with exponent 3/2 so that the corrections keep consistent with the third-order moment approximations; In the negative damping rate Q M F− , the scaling function is used to characterize the amount of energy that flows out the spectral mode due to nonlinear interactions. Scaling factor with a square-root of the total energy with exponent 1/2 is applied for this damping rate multiplying the variance in order |u | 2 to make it consistent in scaling dimension with third moments. 3. Modified quasi-Gaussian closure with equilibrium statistics: In this modified quasi-Gaussian closure model originally proposed in [89,88], we exploit more about the true nonlinear energy transfer mechanism from the equilibrium statistical information. Thus the additional damping and noise proposed like before are calibrated through the equilibrium nonlinear flux by letting N M,eq is the effective damping from equilibrium, and Q + F,eq is the effective noise from the positive-definite component. Unperturbed equilibrium statistics in the nonlinear flux Q F,eq are used to calibrate the higher-order moments as additional energy sink and source. The true equilibrium higher-order flux can be calculated without error from first and second order moments in ū eq , R eq from the unperturbed true dynamics (1.5) in steady state following the steady state statistical solution relation (2.3) as discussed in Section 2.2 Q − F,eq , Q + F,eq are the negative and positive definite components in the unperturbed equilibrium nonlinear flux Q F,eq . Since exact model statistics are used in the imperfect model approximations, the true mechanism in the nonlinear energy transfer can be modeled under this first correction form. This is the similar idea used for measuring higherorder interactions in [88], where more sophisticated and expensive calibrations are required to make that model work there.

3.2.
A reduced-order statistical energy model with optimal consistency and sensitivity. The above closure model ideas, especially (3.4), (3.5), and (3.6), have advantages of their own. Models in (3.4) and (3.5) are simple and efficient to construct with consistent equilibrium consistency, while (3.6) involves the true information about the higher-order statistics in equilibrium so that the energy mechanism can be characterized well. The validity of these approaches has been tested and compared from several papers [89,88,63] using the simplified triad model and L-96 model. Still when it comes to the more complicated and realistic flow systems like the quasi-geostrophic equations, more detailed calibration for model consistency and sensitivity is required to achieve the optimal performance. A preferred approach for the nonlinear flux Q M F combining both the detailed model energy mechanism and control over model sensitivity is proposed in the form The closure form (3.8) consists of three indispensable components: i) Higher-order corrections from equilibrium statistics: In the first part of the correction using the damping and noise operator as N M,eq , Q + F,eq , unperturbed equilibrium statistics in the nonlinear flux Q F,eq are used to calibrate the higher-order moments as additional energy sink and source following the procedure in (3.6). Therefore the equilibrium statistics can be guaranteed to be consistent with the truth, and the true energy mechanism can be restored; ii) Additional damping and noise to model changes in nonlinear flux: The above corrections in step i) by using equilibrium information for nonlinear flux is found to be insufficient for accurate prediction in the reduced-order methods since the scheme is only marginally stable and the energy transferring mechanism may change with large deviation from the equilibrium case when external perturbations are applied. Thus we also introduce the additional damping and noise (d M , Σ M ) as from (3.4). d M is just a constant scalar parameter to add uniform dissipation on each mode, and Σ M is the further correction as an additional energy source to maintain climate fidelity; iii) Statistical energy scaling to improve model sensitivity: Still note that these additional parameters are added regardless of the true nonlinear perturbed energy mechanism where only unperturbed equilibrium statistics are used. To capture the responses to a specific perturbation forcing, it is better to make the imperfect model parameters change adaptively according to the total energy structure. Considering this, the additional damping and noise corrections are scaled with factors f 1 (R) , f 2 (R) related with the total statistical variance trR as Note that in the full model formulation (3.1a) and (3.1b) with the entire covariance matrix R resolved, the total variance structure trR is easy to achieve. However in the low-order models with only the variances in the principal modes resolved explicitly as will be discussed in the following subsection, trR is generally not available directly. This is where the statistical energy dynamics (2.2) can play an important role and help the development of reduced order models. Besides, a further skewsymmetric correction for dispersion effects in addition to the scalar model damping d M in the reduced-order models might be useful in some situations as the following remark. This term will not alter the entire energy structure of the system due to skew symmetry but can offer correction for the dispersion relation in this imperfect model. However, we may have the additional difficulty in fitting the N by N parameter matrix in the general case. In practical applications, instead we can exploit the physical structure of the specific model and introduce only one additional dispersion parameter iω M ; see [78] and [79] for two examples of adding the dispersion correction to effectively improve model prediction skill under the barotropic and baroclinic models.
Next we discuss the detailed calibrations about the nonlinear flux approximations. Two steps of model calibration should be considered as from the general framework described in Section 1.2: i) the equilibrium consistency that the reduced model must converge to the true equilibrium statistical state as no perturbations are added; ii) model sensitivity by blending statistical response and information theory so that the imperfect model can capture the responses to various kinds of perturbations as the system is perturbed. The construction in (3.6) guarantees equilibrium consistency using the true equilibrium model nonlinear flux structure. On the other hand, to improve model sensitivity, the linear response operators with information distance metric are used to find optimal parameters from the correction part in (3.4) or (3.5).
3.2.1. Equilibrium statistical fidelity through the additional damping and noise. In designing the reduced-order models, equilibrium fidelity for consistent statistics should be guaranteed in the first place in the unperturbed climate. That is, the same final unperturbed statistical equilibrium R eq should be recovered from the closure models R M in each component. Comparing the true statistical equation (1.5) with the reduced-order model (3.1b), time derivatives about the statistics on the left hand sides vanish in statistical steady state, thus climate consistency can be achieved if we have exact recovery of the estimation in the nonlinear flux term. Specifically, it requires that the model nonlinear flux correction term (3.8) converges to the truth, Q M → Q F,eq , when no external perturbation is added. Under this condition in steady state the closure model covariance equation (3.1b) goes to the true unperturbed statistics, the equilibrium statistical relation (2.3) implies the relation In construction the first component N M,eq , Q + F,eq comes from the true equilibrium statistics, and in equilibrium state it will guarantee the consistency with the truth that − N M,eq R eq + R eq N * M,eq + Q + F,eq = Q F,eq . This part will be automatically equal to the true nonlinear flux in equilibrium. On the other hand climate consistency requires that the second component correction due to the parameters (d M I N , Σ M ) adds no additional energy source or sink in the unperturbed system, and no further correction in the scaling functionals. That is, we need Σ M to satisfy (3.10) Σ M = 1 2 d M R eq , f 1 trR eq = 1, f 2 trR eq = 1.
Note again f 1 and f 2 in (3.10) calibrate the model sensitivity to perturbations according to the total energy structure trR. Thus it is natural to assume no additional correction in the unperturbed case. By choosing parameters according to (3.10), the climate consistency for the imperfect reduced-order models in (3.1b) in the unperturbed equilibrium is guaranteed. In addition, we still leave one controlling parameter d M for the freedom to tune the imperfect model performance, considering that climate consistency is only the necessary but not sufficient condition for good model prediction [59].
3.2.2. Model calibration blending statistical response and information theory. The above methods (3.4), (3.5), (3.6), as well as (3.8) construct statistical approximation models with consistent equilibrium statistics. Still equilibrium fidelity of imperfect models is a necessary but not sufficient condition for model prediction skill with many examples [54,59,63]. In order to get precise forecasts for various forced responses, it is also crucial to seek models that can correctly reflect the system's 'memory' to its previous states. From Section 2.2, it shows that the linear response operator R A represents the laggedcovariance of certain functions (and thus can describe the 'memory' of the system to previous states). We try to find a unified way to achieve the optimal model parameters d M such that the imperfect models can maintain high performance for various kinds of external perturbations. Adopting the general strategy suggested in Section 2.3, we can improve model sensitivity through tuning imperfect models in a training phase before the prediction step. Thus the optimal model parameter can be selected through minimizing the information distance in the linear response operators in (2.18) between the imperfect closure model and the truth.
Information-theoretical framework to measure the linear responses in the training phase. In this training phase, we try to find the optimal model parameters d M by comparing the linear response operators from the true system and imperfect approximation model. The true model linear response operator and the reduced-order model response operator can be calculated from (2.8), following the procedure from the kicked response strategy with detailed procedure shown in Appendix A. The distance between these two operators can be calculated through the information metric (2.17) which offers an unbiased and invariant measure for model distributions The first row above is the signal error due to the estimation about the mean; and the second row is the dispersion error for calibrating the linear responses in the first two order of moments, δ R k . The intrinsic error due to second-order closure S p G,δ − S (p δ ) is independent of the specific forms of the reduced-order models and is not included in this metric. The optimization principle in (2.18) is then performed over the parameter d M .

Comparisons with stochastic modeling about the mean and fluctuations and realizability.
To achieve a better understanding about the statistical models, it is useful to compare the reduced-order statistical energy model (3.1a) and (3.1b) with its stochastic correspondences. In the stochastic formulation, we consider the separation with a deterministic mean state and the stochastic fluctuations whereū M = u M is the statistical mean state following the same dynamical mean equation as before together with the stochastic dynamics for the fluctuation modes 3.3. Reduced-order statistical model for principal modes. Here for genuinely high-dimensional systems, the computational cost for the full covariance matrix R M is still unaffordable even with the first two moment closure [6,56]; for example, climate systems usually have the dimensionality at least of order 10 3 . On the other hand, in many situations, we are mostly interested in the variability in the statistics of the first most energetic principal directions. Therefore one alternative practical strategy is to develop reduced-order methods that only calculate variances in a low-dimensional subspace spanned by primary EOFs {v 1 , · · · , v s } with s N (N the dimensionality of the full system). The corresponding reduced-order representation of the state variables under these resolved basis becomes u =ū + ∑ s k=1 Y k v k . To see the possibility of achieving this, first note that the dynamical equations for variances (3.1b) in each mode r k = Y k Y * k are rather independent with each other according to the previous closure strategies with higher-order interactions replaced by additional damping and noise terms. Thus it is realizable to restrict the variance equations inside the chosen subspace. Actually following the same strategy by replacing the high-order interaction terms by proper damping and noise, the equivalent counterpart of the closure models can be formulated as a low-order stochastic system The mean dynamics (3.12a) is the same as the previous closure model (3.1a) with an additional correction term G to compensate the unresolved modes. C M is the reduced-order s × s covariance matrix where only the leading primary modes are resolved, that is, where P = [v 1 , v 2 , · · · , v s ] ∈ C N×s projects the modes to the subspace. Through proper choice of the parameters according to GC1 (3.4), GC2 (3.5), MQG (3.6), or the blended method (3.8) as before but concentrating on the resolved subspace, these reduced system should converge to the same first two order statistics with the moment closure model. Still several new problems need to be taken care of for the above model reduction process: i) How to ensure correct modeling about the true statistics in the mean dynamics (3.12a) due to the many unresolved directions in the covariance C M ; ii) How to include the nonlocal scale factor (which always includes the total energy trR = ∑ N k=1 r k ) in (3.9) in the nonlinear flux approximation Q red F,M if only subspace variances are resolved. Here in the general strategy to do this, we follow the ideas in [63,78,79].

Correction for the mean dynamics.
Still the simplest way of estimating the unresolved parts in the mean dynamics is through the statistical equilibrium information G eq . The value of the additional forcing G eq is determined using statistical steady state information for the covariance C eq and the meanū eq . In particular we have the equilibrium equation through the steady state mean dynamics where d dtū eq ≡ 0 (3.13) Similar as in the estimation about the nonlinear flux, the mean dynamics correction term can also be scaled with the total variance in the system, so that, (3.14) G = trR trR eq G eq .
In this way, the mean dynamics (3.12a) become consistent in the statistical equilibrium state, and the corrections G can change sensitively according to the total energy structure through trR.
Remark 3.2. One further optional correction for the unresolved modes in the reduced-order mean equations is to make further use of the linear response theory predictions. To estimate the values for unresolved modes, we can improve it from the equilibrium statistics by introducing finer approximation making use of the linear response operator (2.7) (3.15) r k,un ∼ r k,eq + δ r k = r k,eq +ˆt 0 R r k (t − s) δ F (s) ds, k > s.
Therefore these first-order predictions for the unresolved variances r k,un can also be used in (3.12a) for estimating the unresolved modes. This idea is applied to the L-96 system with improvements shown in [63].

Correction through total statistical energy.
In the reduced-order covariance dynamics (3.12b) using the closure form (3.8), two additional scaling factors, f 1 , f 2 , are introduced to further quantify the nonlinear energy flux in and out the spectral modes due to the nonlinear interactions. We propose the dynamical corrections with the total statistical energy trR as in the forms (3.9). This total energy correction introduces global information into each spectral mode so the nonlinear energy transfer can be better characterized in the imperfect model, while solving only one additional scalar equation is the only additional cost in computation. The scaling factor from trR introduces nonlinear global effect into the additional damping and noise corrections in each mode. This can be solved efficiently by introducing one additional scalar equation as described in Then trR can be achieved by solving E = 1 2ū 2 + 1 2 trR. Especially in uniform damping case, D = −dI, the above statistical energy equation can be simplified as Note that on the right hand side of (3.16), only mean state information (which can be fully resolved in the reduced mean dynamics in (3.12a)) needs to be calculated to get the total statistical energy E. In this way, the total second order moments trR can be entirely determined only through the first order meanū and the scalar statistical energy equation (3.16). The final issue in the reduced-order model construction is about the tuning process in the training phase. Still the same kicked-response strategy (2.10) can be applied to the reduced-order formulation (3.12). Importantly, the relative entropy expansion for the responses in (2.17) is decomposed into each component. Thus it can be directly applied to the reduced-order case by calculating only the signal and dispersion error in the resolved subspace. Therefore through the same procedure as the previous case, we can find the optimal model parameter in the training phase for the reduced-order model, and then apply the optimal model for prediction with various forcing perturbations.
3.4. Summary of the Reduced-Order Statistical Energy Closure algorithm. We summarize the low-dimensional reducedorder statistical closure algorithm with calibrations from total statistical energy and linear response theory. The general reducedorder model algorithm is split into the separated steps of a training phase and a prediction phase after a proper imperfect model selection step according to the problem. The training phase is used to improve model sensitivity by tuning the imperfect model parameter using only unperturbed equilibrium statistics for the linear response operator. Then the optimal parameter can be applied for predicting model responses to different kinds of external perturbations. Note that in the calibration step in the algorithm, only the unperturbed statistics in equilibrium are required. Thus this offers the optimal model parameters that are ideally valid for all kinds of specific forcing perturbation forms. With the help of the linear response operator we are able to find a unified way to tune the imperfect model parameters and avoid the exhausting and impractical process to tune the models each time with different kinds of perturbations.

Reduced-Order Statistical Models Applied to a Suite of Stochastic Triad Models.
In this section, we illustrate the performance of the reduced-order formulations by considering a simple but nevertheless instructive model, namely the triad system with stochastic forcing [30,50,54,89]. The triad systems where three modes interact through quadratic energyconserving nonlinear interactions form the building block for more general complex turbulent flow, thus provide a nice simple test case for the mode elimination strategy in the first stage. The nonlinear interaction in triad systems is generic of nonlinear coupling between any three modes in larger systems with quadratic nonlinearity. For a three-dimensional system about state -Construct low-order approximation of the nonlinear flux Q M F in the statistical equations using the statistical energy closure proposed in (3.8) consistent with the equilibrium first two moments; -Compute the true linear response operator from the unperturbed equilibrium statistics, and calculate the imperfect model predicted linear response operator from the kicked response strategy; -Determine the imperfect model parameter value through minimizing the information distance in (2.17) and (2.18) between linear response operators from true equilibrium statistics and imperfect model approximation. variables u = (u 1 , u 2 , u 3 ) T ∈ R 3 with a quadratic part that is energy preserving, the triad system possesses the general form The triad system (4.1) is easy to summarize in the original abstract formulation (1.1), where are the skew-symmetric and dissipation operator in (1.2a) representing respectively the Coriolis forcing and dissipation; and which forms the nonlinear triad coupling that satisfies the energy conservation in (1.2b). The triad system can form the building block of complex turbulent dynamical systems since it can be viewed as a three-dimensional Galerkin truncation of many general dynamics. One celebrated example is the famous Lorenz model [45] that can be viewed as a special case of this procedure. An interpretation of these low-order models with atmospheric problems and geoscience is illustrated in [30]. Though simple in appearance of this triad system, complex and interesting statistical features can be generated through changing the model parameters.

4.1.
Statistical properties for the triad system. First we can check the general statistical properties described in Section 1.2 with the triad system. Typically we would like to investigate the evolution of a smooth probability density function p (u,t) due to the internal and external stochasticity. Associated with the triad equations (4.1), the statistical solution satisfies the Fokker-Planck equation Above we assume the forcing terms are only dependent on time,  2) is difficult to achieve due to the nonlinear interactions in the triad system. Still under special arrangement about the damping and noise coefficients, one special solution of a Gaussian invariant measure, p eq , can be reached in the equilibrium. In the absence of the deterministic forcing, F = 0, assume the damping operator d i and random noise forcing σ i satisfy the following relation in each component Therefore, a Gaussian invariant measure as defined in (1.7) can be found with equipartition of energy in each component, that is, Above σ 2 eq is the equilibrium variance in the Gaussian invariant distribution p eq that controls the variability in each mode. To see this, we can substitute the invariant measure (4.4) back into the Fokker-Planck equation (4.2). It is a special case from the Theorem in [54], and detailed energy mechanism and stability for the triad system can be found in [50].
In the general case with deterministic external forcing and inhomogeneous structure, energy is injected into the modes and transferred to each other due to the nonlinear quadratic interaction through more complicated mechanism, thus strong nonlinear non-Gaussian statistics with energy cascade and internal instabilities can be generated.

4.1.2.
A link with quasi-geostrophic turbulence. The triad model (4.1) is the building block of complex turbulent dynamical systems since a three-dimensional Galerkin truncation of many complex turbulent dynamics possesses the energyconserving nonlinearity as in (1.1). The random forcing together with the damping term represents the inhomogeneous effect of the interaction with other modes in a turbulent dynamical system that are not resolved in the three dimensional subspace. Stochastic triad models are qualitative models for a wide variety of turbulent phenomena regarding energy exchange and cascades and supply important intuition for many effects. They also provide elementary test models with subtle features for prediction, UQ, and state estimation [50,64,87].
As a simple illustration about the link to more complex turbulent systems, we can consider the quasi-geostrophic (QG) potential vorticity equation with no external forcing and dissipation We have the barotropic triads of three barotropic components, ψ k , ψ m , ψ n , obeying the selecting rule k + m + n = 0. Consider an initial condition in which only these three components of a particular triad are excited, then these three modes will only interact with each other while no other modes will get excited due to the particular triad relations as the system evolves in time. By projecting the above equation to the active triad modes, we get the dynamical equations for the selected modes where A kmn = |n| 2 |k| 2 m ⊥ · n is the triad interaction coefficient with the detailed symmetry A kmn + A mnk + A nkm = 0, showing the conservation of kinetic energy, d dt |k| 2 |ψ k | 2 + |m| 2 |ψ m | 2 + |n| 2 |ψ n | 2 = 0.
The typical forward and backward cascades of energy and enstrophy in turbulent flow are characterized by the triad interactions between the three models. Hence from the above discussion, in the two-dimensional QG turbulence, the nonlinear energy transfer is exactly governed by the barotropic triads the same as (4.1) in the nonlinear interaction part.

4.2.
Typical dynamical regimes in the triad system. Though simple in appearance, the triad system (4.1) has representative statistical features including energy cascade between modes and internal instabilities that can be created in this simple set-up. A fundamental factor in the triad system is the internal instabilities that make the mean unstable over various directions in phase space as is typical for anisotropic fully turbulent systems. Elementary intuition about energy transfer in such models can be gained by looking at the special situation with L = D = F = σ ≡ 0 so that there are only the nonlinear interactions in (4.1). We examine the linear stability of the fixed point,ū = (ū 1 , 0, 0) T . Elementary calculations show that the perturbation δ u 1 satisfies dδ u 1 dt = 0 while the perturbations δ u 2 , δ u 3 satisfy the second-order equations so that there is instability with B 2 B 3 > 0 and the energy of δ u 2 , δ u 3 grows provided B 1 has (4.6) the opposite sign of B 2 and B 3 with The elementary analysis in (4.6) suggests that we can expect a flow or cascade of energy from u 1 to u 2 and u 3 where it is dissipated provided the interaction coefficient B 1 has the opposite sign from B 2 and B 3 .
Then energy cascades can be induced from the strongly forced unstable energetic mode to the stable less energetic modes with stronger damping effects. Particularly, we can generate distinct statistical features from Gaussian to highly skewed non-Gaussian PDFs in the following dynamical regimes: • Regime I: Equipartition of energy. Set the equipartition of energy in stationary steady state. That is, The skew-symmetric interaction is set to be zero, L 1 = L 2 = L 3 = 0, and there is no deterministic forcing F 1 = F 2 = F 3 = 0 added in unperturbed equilibrium. In this case, the first mode is strongly forced by the random forcing while the other two modes are much less energetic. The values in the nonlinear coupling coefficients make sure that the additional energy injected in u 1 cascades to the other two less energetic modes u 2 , u 3 , and then gets dissipated by the strong damping; • Regime III: Nonlinear regime with dual energy cascade. Use the same damping and noise forcing parameters as in the energy cascade case, that is, d 1 = 1, d 2 = 2, d 3 = 2, and σ 2 1 = 10, σ 2 2 = 0.01, σ 2 3 = 0.01. The nonlinear coupling coefficients are also taken the same values as before, B 1 = 2, B 2 = B 3 = −1. Skew-symmetric interactions are added in this regime as L 1 = 0.09, L 2 = 0.06, L 3 = −0.03 to enhance the interactions between modes. Deterministic forcings are applied in modes u 2 , u 3 as F 1 = 0, F 2 = −1, F 3 = 1. In addition to the forward energy cascade from u 1 to u 2 , u 3 as in the previous case, the additional forcing introduces energy sources in modes u 2 , u 3 and leads to backward energy cascade from modes u 2 , u 3 back to u 1 . The linear skew-symmetric operator further alters the equilibrium energy structure in the system, resulting in skewed probability distribution functions in the steady state. This regime is especially interesting because strong internal instability can be generated here. The first test case is the simplest but nevertheless representative with equipartition of energy. The higher-order moment effects are relatively small in this equipartition energy case, while the dynamics are dominantly Gaussian with zero cross-covariances as the system evolves in time. The second test case above is also relatively simple without the skew-symmetric interaction between modes and most energy will accumulate in the first dominant mode. Nevertheless important third-order interactions will take place in this case, and large errors will be introduced if the cross-covariances are ignored without care. In the third test case the non-zero forcing in the unperturbed system creates skewed equilibrium distributions with important third-order moments. Also the non-zero linear skew-symmetric interaction terms add extra emphasis on the cross-covariances. These induce stronger interactions and energy cascades between the triad modes.

4.2.1.
Numerical results about the the unperturbed triad system in equilibrium. The true statistical features of the triad system in the above dynamical regimes are resolved through direct Monte-Carlo simulations. We run an ensemble of N = 10000 particles, which shall be enough for capturing the statistics in a three-dimensional phase space. Forward Euler scheme with small time step is used to integrate the system in time due to its simplicity. The stochastic forcing is simulated through the standard Euler-Maruyama scheme. The initial ensemble is chosen from a standard Gaussian random sampling.
For more details about the model statistics in these regimes, Figure 4.1 displays the probability distributions in these three test regimes. In mode u 1 , Gaussian (or quasi-Gaussian) distributions can always be observed in all the three regimes. Also the equipartition of energy structure can be observed in the first regime as predicted from analysis in the previous sections. Marginal PDFs are consistent with the Gaussian fits from theory and the joint 2-dimensional distributions are also in Gaussian structure. On the other hand, in the forward energy cascade case, fat tails can be observed in the marginal PDFs in both u 2 , u 3 as well as the star-shaped joint-distribution, showing the strong nonlinear effects in the modes. Note that the non-Gaussianity in u 2 , u 3 can affect the final structure in the dominant mode u 1 despite its near-Gaussian marginal distribution. Furthermore, in the third test regime with dual energy cascades, besides the fat tails, skewed PDFs appear in both modes u 2 , u 3 due to the non-zero deterministic forcing applied on them. The non-Gaussianity can be further confirmed by the joint distributions where strong skewness is shown. Also note that the mean states are not centered in zero in this case. These illustrate important third-order moments in this case for accurate predictions. Below we concentrate on this toughest regime.

4.3.
Reduced-order statistical models for the triad system. In the development about reduced-order models, we focus on the accurate estimation about the first two moments, that is, the statistical mean state and covariance matrix. The major interest is to check the models' skill in capturing sensitivity in response to external perturbations besides equilibrium consistency in the models. Considering the relatively simple structure in the triad dynamics, we focus on the GC1 and GC2 formulation in (3.4) and (3.5) respectively in the following test cases and the parameter calibration strategy proposed in (3.8). In the model reduction procedure, we first consider the fully resolved model, where the 3-dimensional mean and 3 × 3 covariance matrix are resolved entirely; and then the diagonal model, where only the mean and diagonal variances are calculated explicitly. In the final step, a severely reduced-order model is introduced where only the variance in the principal mode is resolved.

Reduced model formulation for the triad system (Model selection).
Applying exactly the modeling procedure in Section 3 to the triad system, the two-moment closure schemes replace the higher order moments with additional damping and excitement containing only first two order of moments. The first approach is to run the full system with mean and covariance matrix where both diagonal variances and off-diagonal components are resolved forū ∈ R 3 and R ∈ R 3×3 dū dt As a test for reduced order methods, we also check the models' skill by ignoring the cross-covariances. Especially for the triad system, the off-diagonal cross-covariances play a crucial role in the computation of mean and variance responses, while on the other hand, are difficult and expensive to estimate with accuracy. Then the unresolved covariances are approximated from steady state information.
• Fully Resolved Model: In this fully resolved model, the entire mean and covariance matrix are calculated through the dynamical equations (4.7). The nonlinear flux term is approximated by additional damping and additional noise as in (3.4) and (3.5) • Diagonal Model ignoring cross-covariances: In this diagonal model, the cross-covariances between modes are ignored to improve model efficiency. So the covariance matrix is replaced by diagonal matrix R = diag (r k ). The correction in Q GC for higher-order moments calibration is kept the same as the full model for GC1 or GC2 respectively. To correct the error due to the neglect of cross-covariances, we add the cross-covariance correction using only steady state information r i j = trR trR eq u i u j eq .
The diagonal model can be much more efficient compared with the fully resolved model with a model reduction from O N 2 to O (N). Through construction using the unperturbed equilibrium information, climate consistency is guaranteed in both GC1 and GC2 model. In GC1, only constant damping and noise are added to approximate the unresolved higher order moments, while these terms are further corrected with the total variance in GC2. The exponents in the scaling factors are designed to make them consistent in dimension with the estimated third-order statistics. Actually, these scaling factor becomes quite crucial in capturing model responses to perturbations and model performances in transient state.
As a further reduction in the model we consider the two-moment closure methods in reduced order subspace. In this case, we only resolve the variance in the first mode u 1 , and calculate the mean dynamics for all three modes. Then the reduced order model can be expressed as is the index set that includes the resolved modes. Especially with only the leading variance resolved, C red = r 1 = u 2 1 is only the variance in mode u 1 . The construction about the reduced-order parameters can follow the general strategy in (3.12a) and (3.12b) exactly.
• Reduced Model with only the principal variance resolved: In the variance equation (4.8b), the formulation is similar as before and we only need to constrain the covariance matrix C red in the resolved subspace with the first mode u 1 resolved. The additional damping and noise can be applied in the same way as the reduced-model correction as discussed in (3.12). In the mean dynamics, the unresolved second-order moments in R i j B (e i , e j ) are corrected following (3.13) The total variance for model sensitivity correction can also be achieved through the approximated statistical energy dynamics as in (3.16) Note that we have calculated the mean in each mode, then the total variance can be calculated through the total statistical energy trR = 2E − ∑ 3 i=1ū 2 i . One additional difficulty in the inhomogeneous case is that different damping rates d j are applied to different modes. So we introduce the effective damping rate d eff through the statistical steady state information as We also list the explicit statistical mean and variance dynamical equations in Appendix B, as well as the explicit statistical energy dynamics to achieve the total variance of the system when only the leading mode is resolved.

4.3.2.
Model consistency and tuning parameters in the training phase (Model calibration). In the model calibration step, we check the imperfect models' consistency with climatology when no perturbation is applied. As noticed before, the imperfect models' skill in capturing the right statistics in transient state is crucial for the convergence to the right fixed point in the energy equation. As an example, Figure 4.2 compares the model performances in regimes with dual energy cascade. It can be observed that the statistical steady state mean and variances are recovered with accuracy due to the climate consistent choice of parameters through (3.4) and (3.5), while GC1 lacks the ability to accurately capture the transient state in the beginning due to its lack of sensitivity.
We illustrate the tuning process for optimal model parameters in the training phase in Figure 4.3. Again we use the dual energy cascade regime as an example. Since both the deterministic and stochastic forcing might be perturbed in the external forcing, we consider a kicked response in the initial value according to the perturbation form described in (4.9) and (4.10). In the first row, the information errors with changing model parameter values are shown in total relative entropy from (2.17) as well as the errors in signal and dispersion component. Note that the errors in GC2 model results stay uniformly small in the entire parameter regime, showing the robustness of the method; on the other hand, GC1 model displays larger information error no matter how well we tune the model parameter. This illustrates the inherent information barrier in the closure schemes if we do not consider proper statistical energy scaling in the model damping and noise terms. In the second row, we show the approximation about the linear response operators in the mean and variance using optimal parameters. The transient structures can be captured with accuracy in GC2 model, while GC1 lacks the skill due to the insufficient calibration in the higher-order nonlinear flux approximation. Imperfect model performances in convergence to statistical equilibrium state in dual energy cascade regime. Both GC1 (with constant damping and noise) and GC2 (with correction from total variance) can recover the steady state mean and variance but GC1 lacks the ability to accurately capture the transient state in the beginning due to its lack of sensitivity.

Model prediction skill to perturbations (Model prediction). Periodic perturbations added in both deterministic
and random forcing are representative in checking the imperfect models' prediction skill in response to perturbations. The perturbations are introduced in the following forms: • periodic forcing perturbations in the deterministic forcing: The deterministic forcing perturbation is introduced by a periodic addition to the mean forcing F =F + δ F. As one typical test case, we add periodic forcing perturbations δ F to each mode, that is, with ω taking the value π/4, and A i measures the perturbation amplitude in each mode. • periodic perturbation in the stochastic random forcing: Also to add stronger time-dependent effect to the variances, we add periodic random forcing to the system by setting with δ f (t) = sin (ωt) also set to be periodic. Hereσ j is the mean stochastic forcing amplitude in the unperturbed case with σ T j >σ j to add perturbations to this unperturbed mean. We check the model performances in predicting response to the periodic perturbations in both deterministic and stochastic components. Here we display the imperfect model prediction skill in the toughest regime with dual energy cascade. Thus strong nonlinear coupling is present between the modes. In Figure 4.4, the fully resolved model with mean and 3 × 3 covariance matrix is applied to the typical regime in GC1 and GC2 closure methods. In this regime with skewed distributions, higher order moments become crucial and need more detailed calibration. As we can see from the results, the deficiency of GC1 method appears in this regime due to the inaccurate approximation about the third-order moments. Large deviation takes place in the skewed modes u 2 , u 3 due to the errors from third-order energy transfer. In contrast, GC2 model maintains the high skill in predicting the mean and variances with the more careful calibration about the nonlinear flux through the scaling factor using total statistical energy.
In Figure 4.5, we check the diagonal models with only mean and variances in each mode resolved and ignoring the offdiagonal cross-covariances. Like the previous case of full model, GC1 loses the skill in predicting the responses in u 2 , u 3 due to the lack of information in the third-order interactions. The errors in the beginning transient regime drive the statistical equation to the wrong state or even blowing up. GC2 keeps the skill in capturing the response structures of both the mean and variances. And again most of the error takes place in the variance estimations. Illustration about the training phase for tuning optimal imperfect model parameters for GC1 and GC2. The first row is the information errors with changing values of the tuning parameter σ M . The second row displays the approximation for the linear response operators in GC1 and GC2 using optimal parameters from the tuning process above.  Testing the feasibility of the reduced-order strategies described previous is important for further applications to general high dimensional turbulent systems because the triad interactions always represent the nonlinear interactions in different scales which usually are ignored in realistic modeling. The same dynamical regime is tested here for the reduced-order models. In this case, only the variance in the first mode u 1 is resolved explicitly. Especially for this regime with strong dual energy cascade, as we have seen in the true statistical dynamics, strong coupling exists between the high energy mode u 1 and the less energetic modes, u 2 , u 3 , in both third-order moments and second-order cross-covariances. Thus this becomes a challenging situation for the reduced-order models for capturing the responses with accuracy. The model prediction results are shown in Figure 4.6. With forward and backward energy cascade, strong nonlinear high-order interactions become crucial here. GC1 loses its skill in this case and ends with large errors especially in the mean state u 1 . This is no surprise considering the strong perturbed deviation from the equilibrium state in this regime due to the nonlinear energy transfer while GC1 model only uses unperturbed equilibrium information. On the other hand, GC2 keeps its skill and can capture the responses in both mean and variance with only a single low-order mode resolved.

4.3.4.
Additional results for the triad model with equipartition of energy. In the final part of this section, we illustrate the imperfect model prediction skills in the equipartition of energy regime with Gaussian statistics. In this case, the three modes (u 1 , u 2 , u 3 ) in the triad model possess same amount of energy in statistical equilibrium state as in (4.3) and (4.4). In Figure 4.7 and 4.8, we show the prediction results to the periodic perturbations from the diagonal model with off-diagonal cross-covariances neglected, and from the reduced model with only the variance in the first mode u 1 resolved. Both GC1 and GC2 can capture the mean states in all three modes quite accurately, and GC1 and GC2 results have little difference. This is due to the relatively simple energy mechanism in this regime with same amount of energy in each mode. On the other hand, in the prediction of variances in the diagonal model larger errors appear especially in the modes u 2 and u 3 . This shows the important effects of the off-diagonal covariances in predicting second order moments in this equipartition energy regime. The reduced model gets good predictions for the variance in the first resolved mode u 1 . With the more expensive full model, the prediction for the variances in all three modes will become accurate with larger computational cost.

5.
Reduced-Order Statistical Models Applied to Two-Layer Baroclinic Turbulence. In this section, we validate the performance of the reduced-order models by the more complicated two-layer quasi-geostrophic system with baroclinic instability. It is shown that the baroclinic model is capable in capturing the essential physics of the relevant internal variability despite its relatively simple dynamical structure. Two dynamical regimes with typical statistical features are representative in many applications [90,4,79]. The first one is the fully turbulent flow with homogeneous statistics as a result of internal baroclinic   instability corresponding to the high-latitude ocean and atmosphere; the second one is the anisotropic flow field with strong meandering zonal jets as in the low/mid-latitude regime. Detailed discussions and comparisons of the construction and test about the reduced-order models in various regimes with representative zonal jets and vortices can be found in [79].
The governing two-layer quasi-geostrophic (QG) equations in a barotropic-baroclinic mode formulation for potential vorticity anomalies q ψ , q τ with periodic boundary condition in both x, y directions are [84,92] (5.1) Above q ψ = ∆ψ, q τ = ∆τ − k 2 d τ are the disturbance potential vorticities in barotropic and baroclinic mode respectively, while ψ, τ are the corresponding disturbance barotropic and baroclinic stream functions. The barotropic mode ψ can be viewed as the vertically averaged effect from the flow, and the baroclinic mode τ is usually related with the thermal effect in heat transport. Besides, J (A, B) = A x B y − A y B x represents the Jacobian operator. k d = √ 8/L d = (2 f 0 /NH) 2 is the baroclinic deformation wavenumber corresponding to the Rossby radius of deformation L d . A large-scale vertical shear (U, −U) with the same strength and opposite directions is assumed in the background to induce baroclinic instability. In the dissipation operators on the right hand sides of the equations (5.1), besides the hyperviscosity, ν∆ s q i , we only use Ekman friction, κ∆ψ 2 , with strength κ on the lower layer of the flow. 5.1. Representative dynamical regimes for the two-layer baroclinic turbulence. The two-layer quasi-geostrophic system can display various dynamical regimes with distinct statistical features as the parameters are changed. Parameters for high and low/mid latitude dynamical regimes are shown in Table 1. In numerical simulations, the true statistics are calculated by a pseudo-spectra code by resolving the two-layer equations (5.1) with 128 spectral modes zonally and meridionally, corresponding to 256 × 256 × 2 grid points in total. In the reduced-order methods, only the large-scale modes |k| ≤ 10 are resolved, which is about 0.15% of the full model resolution.
In the simulations for the unperturbed system in ocean and atmosphere regimes, Figure 5.1 displays the two-layer flow structure in high-latitude ocean regime. The first row is the snapshots of the barotropic and baroclinic vorticity. Homogeneous structure can be observed in both cases while larger scale structures appear in the baroclinic mode. It is important to notice the strong correlation in the coherent structures in the barotropic and baroclinic field, illustrating the strong energy transfer between the two modes. The following part shows time-series of energy in barotropic and baroclinic mode, − ffl ψq ψ , − ffl τq τ , as well as TABLE 1 Model parameters for ocean and atmosphere dynamical regimes in high and low/mid latitude. N is the model resolution, β , k d are the rotation parameter and the deformation frequency, U is the background mean shear flow, κ is the Ekman drag in the bottom layer. The last three columns display the unstable waveband from linear analysis. (k min , k max ) shows the range of unstable wavenumbers; σ max is the largest linear growth rate; and (k x , k y ) max is the position of the mode with maximum growth rate.  ffl ψ x τ. In Figure 5.2 the results for the two-layer flow in high-latitude atmosphere regime are compared. One important feature here is the flow field alternating between blocked and unblocked regimes. In the stream functions, it can be observed that in the blocked regime, zonal flow is blocked and the field is restricted at separated regimes, while in the unblocked regime strong zonal flow can be observed. Strong meridional heat flux can be observed in the blocked regime while the flow is in state with lower energy and low heat transfer in the zonal unblocked regime.
In mid/low latitude regimes, both the ocean and atmosphere are distinctly inhomogeneous on large scales. The existence of large-amplitude meandering zonal jets in these regimes suggests the regional metastable equilibria, while the large-scale forced perturbations may lead to regular or irregular fluctuations in some extent. The jet structures are illustrated in more detail in Figure 5.3 for the time-series of the zonally average mean flow, u = −∂ y ψ. In this low/mid latitude case, especially for the ocean regime, due to the strong zonal jets in wavenumber k y = 6, zonal modes with k x = 5, 6 become active due to the nonlinear interactions.
The general steady state statistical structures in the spectral field are shown in Figure 5  statistics in high-latitude, the mean states stay in small values within fluctuation errors in both ocean and atmosphere regimes. From the energy spectra, one observation is that the potential energy is dominant in large scales in the baroclinic modes, and the kinetic baroclinic energy becomes more important in small scales. For both regimes, we observe wide and energetic spectra that exchange energy between different scales, which indicates strong forward and backward energy cascades along the entire spectral modes.

5.2.
Predicting quasi-geostrophic statistical responses in reduced-order models. The quasi-geostrophic response to both stochastic and deterministic perturbations is an important subject in understanding the earth's atmospheric and oceanic interactions [2,47]. The same strategy developed in Section 3.3 and applied in the triad system for the first mode in Section 4.3 can be directly generalized to the statistical modeling of the two-layer QG system here.

5.2.1.
Statistical formulation about the two-layer baroclinic equations. We formulate the two-layer QG system with Galerkin truncation to finite number of spectral modes. In model simulations, consider a set of rescaled normalized quantities with a high wavenumber truncation N under standard Fourier basis e k = exp (ik · x) due to the periodic boundary condition, so that The introduction of this new set of quantities (5.2) offers convenience that the energy inner-product becomes the standard Euclidean form. Under the above settings, the rescaled set of equations of (5.1) can be summarized in the abstract form in the . Radial-averaged spectra in mean and second-order moments in both atmosphere and ocean high-latitude regimes. The first row compares the statistical mean states (in logarithmic coordinate). The following two rows show the variances, and statistical energy, in barotropic and baroclinic modes, as well as the potential energy. truncated subspace |k| ≤ N as in (1.1) where the linear operators are decomposed into the non-symmetric part L k involving β -effect and vertical shear flow U and dissipation part D k , together with the forcing F k combining deterministic component and stochastic component compared with (5.1). Most importantly, B (p, p) is the nonlinear interactions that conserve both energy and enstrophy. The same reduced-order modeling strategy then can be applied to the two-layer model following the algorithm in Section 3.4. Therefore the true dynamical equations for the statistical moment R k = p k * p k in the form of a 2 × 2 matrix containing barotropic and baroclinic mode in same wavenumber k become where c.c. represent the complex completion for the conjugate parts. On the right hand side of the equation, L k , D k represent the linear interactions between modes, including β -effect through the rotation of the earth, the effects from the mean shear flow U, as well as the dissipations from Ekman drag and hyperviscosity. Q σ ,k is the external forcing perturbations represented by hypothetical stirring and heating forces. Importantly, the nonlinear flux Q F,k represents the nonlinear interactions between different wavenumbers due to the advection term. Third-order moments with triad modes m + n = k enter the first two order moments dynamics representing the nonlinear energy transfer between small and large scales. The nonlinear energy exchange mechanism is crucial in the energy budget, and the conservation property is satisfied due to the triad symmetry as ∑ k trQ F,k = 0.

Reduced-order model predictions for responses in various dynamical regimes.
In constructing the reducedorder models, the same strategy is applied to the crucial but expensive nonlinear flux term Q F as in (3.8) Both equilibrium higher-order statistics and additional corrections are combined, and statistical energy equation is important to provide the scaling factor for optimal consistency and sensitivity. See [79] for more details. In checking the model sensitivity in the homogeneous high-latitude regimes, we introduce the forcing perturbation by changing the background jet strength U. Note that the deterministic perturbation about zonal mean flow advection forms a  difficult test case because the forcing is applied along all wavenumbers with stronger mean-fluctuation interactions involved. On the other hand, for the reduced order methods, only the perturbations at the limited resolved modes are quantified. This gives the inherent difficulty for applying the reduced order models to this kind of perturbations since we have no knowledge of the unresolved modes where large amount of energy is contained. The results with mean flow perturbations δU = ±0.05 in the ocean regime and perturbations δU = 0.02, −0.01 in the atmosphere regime are shown in Figure 5.5 and 5.6 separately. The perturbation accounts for about 5%-10% of the original shear strength U, and the corresponding responses in both energy and heat flux spectra are large due to this global perturbation at every wavenumber and nonlinear energy cascade. In the ocean regime, a wide waveband of modes |k| = 3, 4, 5, 6 becomes sensitive to the perturbations; while in the atmosphere regime, the first dominant mode |k| = 1 is especially sensitive to even small perturbations. This illustrates the strong nonlinear interactions between the high and low wavenumber modes. The reduced-order method displays uniform skill in capturing the sensitive responses in the large-scale modes for both positive and negative perturbation cases with only first 10 × 10 spectral modes resolved compared with the 256 × 256 full resolution model.
In Figure 5.7 and 5.8, we compare the model responses in both low/mid-latitude ocean and atmosphere regimes. In this inhomogeneous regime with anisotropic jets, the statistical variables combine the responses in the mean and variance, p * 1,k p 2,k =p * 1,kp2,k + p * 1,k p 2,k , to display the total effect from the perturbation. In the ocean regime, the dominant mode with largest sensitivity is at wavenumber |k| = 6 due to the zonal jet structure. The sensitivity is captured with accuracy in the  reduced-order method. Also we compare the time evolvement of the total resolved energy and heat flux. The prediction is also good with small error. In the atmosphere regime, |k| = 1 mode gets the largest statistical energy and is most sensitive to perturbations. One important feature is the large change in the heat flux in the first two modes, representing the exchange of energy in the dominant barotropic and baroclinic mode. Still the responses can be captured with accuracy in each mode in the spectra as well as the total energy and heat flux profile with only 10 2 modes resolved. Note that in both cases, the heat flux is weak due to the blocking effect from strong zonal jets.
6. Summary and Some Future Research Directions. Understanding and improving the predictive skill of imperfect models for high-dimensional complex turbulent systems is a formidable and challenging problem and has been investigated through multiple approaches with various mathematical theories through the years [84,29,44,63,88,47]. Low-order truncation methods for statistical prediction can overcome the curse of dimensionality [23,62] by concentrating on the subspace containing largest variability. On the other hand, anisotropic turbulent processes are representative in many engineering and environmental fluid flows [35,26] where energy transports intermittently from the smaller scales to impact the largest scales in these flows. Therefore significant model errors always occur due to the high wavenumber truncation in the imperfect model approximations. A systematic information-theoretic framework has been shown useful to improve model fidelity and sensitivity [58,59,10] for complex systems including perturbation formulas and multimodel ensembles that can be utilized to improve model error. In many applications to complex systems with model error such as the climate change science [22,74], it is crucially important to provide guidelines to improve the predictive skill of imperfect models for their responses to changes in various external forcing perturbations.
We discuss the general framework of efficient low-dimensional reduced-order models in this paper for turbulent dynamical systems with nonlinearity to capture statistical responses to external perturbations. The validity of the reduced-order modeling procedure is displayed via the simplest 3-dimensional triad model which is the building block of general turbulent systems, and further on the more complicated two-layer barotropic model with huge model reduction. The computational cost is reduced through a systematic approximation about the expensive nonlinear higher-order interactions. Additional damping and noise corrections are proposed to replace the third-order moments. Model consistency in unperturbed equilibrium and sensitivity to external perturbations are maintained through careful calibration about imperfect model error in a training phase before the prediction. The model errors are calibrated and reduced effectively through a combination of linear response theory involving only unperturbed equilibrium statistics and an information-theoretic framework using information theory. The general framework has been tested in detail for a series of dynamical systems with increasing complexity [63,78,79,54].
For future development about the methods, there exist several interesting and promising directions that are worth further investigation in the next stage: A) Tracking the model fluctuation statistics about the perfect statistical mean state. In many problems for turbulent systems, we can assume the statistical mean state is known with reasonable accuracy by averaging along the data trajectory, while the statistical fluctuations about the mean state are the quantities of interest [51,64]. It is useful to consider accurate and efficient ways to quantify the model fluctuations involving both uncertainties in the perfect system and errors due to imperfect model approximation; B) Design of a mitigation control strategy by using novel low-order statistical models. There is need to combine control theory with the statistical model reduction strategies for the principal large-scale modes in the turbulent dynamical systems. For example, it is interesting to consider the effects of climate change using control and statistical modeling strategies; C) Predicting passive scalar turbulence with complex flow field. Besides the turbulent flow field, the dynamics of the passive tracer advected by the turbulent flow has many interesting features with practical implications and is worth investigating. One important feature in the turbulent tracer field is the appearance of intermittency despite the near-Gaussian statistics in the background advection flow. The intermittency in time-series and fat-tails in the passive tracer distributions have been observed in nature [73], and have been investigated under a simpler modeling framework both theoretically and numerically [60,66,77]. It is thus interesting to consider low-order stochastic and statistical modeling about the tracer field advected by complex turbulent flows; D) More detailed consideration about the model nonlinearity. In the reduced-order approximation here, the overall strategy does not require explicit calculation of the inefficient quadratic forms directly, but instead mimic the statistical symmetry in the nonlinearity in simple and efficient forms. Still it is interesting to check the improvement in the low-order models with more detailed approximation.
Acknowledgment . This research of the Andrew Majda is partially supported by the Office of Naval Research through MURI N00014-16-1-2161 and DARPA through W911NF-15-1-0636. Di Qi is supported as a graduate research assistant on these grant.
Appendix A. Numerical strategies to calculate the kicked response operators. In the calibration step of the reducedorder models, we use the statistical kicked response theory to tune the imperfect model parameters in the training phase. Here we describe the details about calculating the kicked response operators for the mean and variance numerically. From the formula in (2.10), the response operators for the mean and variance can be achieved from the perturbation part of the probability density δ p . And this density function is also used to measure the information distance between the truth and imperfect model result in the training phase. Below we describe the numerical procedure to get this distribution function δ p for the true system and the imperfect closure model separately.
• Kicked response for the true model: For the true system, we want to achieve the most accurate possible estimation for the response operators both for comparison with the imperfect model results and for calculating the FDT linear prediction in (2.7). Therefore we use a Monte-Carlo simulation with an large enough ensemble size to capture the response in density. The initial equilibrium ensemble is picked by sampling from a normal distribution with consistent equilibrium mean and variance of the true system. For the kicked response to the mean, a constant perturbation with ten percent of the equilibrium state mean δ u = 0.1ū eq is added to each initial ensemble member (in fact, as observed in numerical experiments, this perturbation amplitude has little effect on the results in the response distribution as long as it's not too large); and the initial variance of the ensemble is kept unchanged. The response distribution δ p then is achieved by monitoring the decay of the ensemble particles back to equilibrium under unperturbed dynamics and uniformly perturbed initial value (and the length of the time window that we need to monitor depends on the mixing property of the turbulent system). See [5,48] for similar version of this algorithm. • Kicked response for the imperfect model: For the imperfect model, we just need to run the closure equations to get the responses for the mean and variance. In the same way as the true model, the initial mean is taken from the equilibrium distribution and a perturbation with amplitude δ u = 0.1ū eq is added to the initial mean state. The initial value for the variance is taken the same as the equilibrium state value and kept unperturbed. Then using this initial mean and variance, the imperfect model with specific closure strategies is applied to monitor the decay of the mean and variance back to equilibrium. One additional important point that requires attention is that even if the unperturbed equilibrium initial conditions are applied, the system will still deviate from the equilibrium state first and reapproach equilibrium again after some relaxation time. This is due to the insufficient characterization of the entire distribution of the true system with a Gaussian approximation (note that nonlinearities are also included in the imperfect closure methods). To eliminate this effect in computing the kicked response in both the true and imperfect models, we subtract the statistics computed using the unperturbed initial value from the statistics computed using the perturbed Gaussian initial condition to achieve more accurate characterization of the responses.
Appendix B. Explicit statistical dynamical formulations for the triad system. We can derive for the triad system (4.1) the dynamical equations for the mean state On the right hand sides of the above equations (B.1a)-(B.1c), the first parts include the skew-symmetric interactions between modes as well as the linear damping for the mean. The nonlinear interaction parts enter the mean dynamics both from the interactions between the mean states, and more importantly from the second-order moments of the fluctuations. Thus the mean dynamical equations are not closed by themselves due to the inclusion of unresolved higher-order statistics. Also note that in the second-order moments in the mean equations, diagonal variances won't appear while the cross-diagonal covariances take place as the role of transferring energy in the mean. Next we consider the dynamics for the fluctuation parts of the state variables du 1 dt = L 2 u 3 − L 3 u 2 − d 1 u 1 + B 1 ū 2 u 3 + u 2ū3 + u 2 u 3 − u 2 u 3 + σ 1Ẇ1 , (B.2a) du 2 dt = L 3 u 1 − L 1 u 3 − d 2 u 2 + B 2 ū 3 u 1 + u 3ū1 + u 3 u 1 − u 3 u 1 + σ 2Ẇ2 , (B.2b) The above equations can be achieved by subtracting the mean equations from the original triad system. Then the dynamics for higher order moments can be achieved through the fluctuations equations. Importantly, we can get the dynamical equations for the variances in each mode And the dynamical equations for the cross-covariances between modes become du 1 u 2 dt = L 2 u 2 u 3 − L 3 u 2 2 − d 1 u 1 u 2 + B 1 ū 2 u 2 u 3 + u 2 2ū 3 + B 1 u 2 u 2 u 3 + L 3 u 2 1 − L 1 u 1 u 3 − d 2 u 1 u 2 + B 2 ū 1 u 1 u 3 + u 2 1ū 3 + B 2 u 1 u 1 u 3 , (B.4a) du 1 u 3 dt = L 2 u 2 3 − L 3 u 2 u 3 − d 1 u 1 u 3 + B 1 ū 2 u 2 3 + u 2 u 3ū 3 + B 1 u 2 u 3 u 3 + L 1 u 1 u 2 − L 2 u 2 1 − d 3 u 1 u 3 + B 3 ū 1 u 1 u 2 + u 2 1ū 2 + B 3 u 1 u 1 u 2 , (B.4b) du 2 u 3 dt = L 3 u 1 u 3 − L 1 u 2 3 − d 2 u 2 u 3 + B 2 ū 1 u 2 3 + u 1 u 3ū 3 + B 2 u 1 u 3 u 3 + L 1 u 2 2 − L 2 u 1 u 2 − d 3 u 2 u 3 + B 3 ū 1 u 2 2 + u 1 u 2ū 2 + B 3 u 1 u 2 u 2 . (B.4c) For most situations, it is the diagonal variances in (B.3a)-(B.3c) that we are more interested in, while the off-diagonal covariances (B.4a)-(B.4c) are less important and expensive to resolve. On the other hand, it is noticed that only the cross-covariance terms take place in the central variance dynamics in (B.3a)-(B.3c) for the linear and quasi-linear interaction. In this typical case, if we only consider the diagonal model and ignore the off-diagonal terms in the statistical closure dynamics, huge errors could be introduced in the variance dynamical equations. Thus in the development of reduced order statistical models, careful calibration about the unresolved components becomes crucial in the accuracy of the prediction results.
With the dynamical equations for the mean (B.1) and for the variances in each mode (B.3) and (B.4), we can derive the statistical energy dynamics following the general framework proposed in (2.2) of Theorem 2.1 and also [53]. The statistical energy can be defined as the combination of the mean energy and the fluctuation energy as In the dynamics for the mean and covariance, the major difficulty in resolving the equations explicitly comes from the complex third-order moments in Q F as well as the covariance interactions through R i j B (e i , e j ). However due to the conservation of energy and the detailed triad symmetry in the triad system, nonlinear interactions cancel in the mean and variance equations and the statistical energy equation becomes Therefore the total statistical structure can be calculated from (B.5) without knowing the higher-order moments as well as cross-covariances which are in general difficult to resolve exactly without error. Furthermore, considering the special case with homogeneous damping in each mode, d j ≡ d, the dissipation term on the right hand side of (B.5) becomes −dE. Thus we can get the total second-order statistics from only the information in the first-order moments with the help of the statistical energy dynamics.