Asymptotic forecast uncertainty and the unstable subspace in the presence of additive model error

It is well understood that dynamic instability is among the primary drivers of forecast uncertainty in chaotic, physical systems. Data assimilation techniques have been designed to exploit this phenomenon, reducing the effective dimension of the data assimilation problem to the directions of rapidly growing errors. Recent mathematical work has, moreover, provided formal proofs of the central hypothesis of the Assimilation in the Unstable Subspace methodology of Anna Trevisan and her collaborators: for filters and smoothers in perfect, linear, Gaussian models, the distribution of forecast errors asymptotically conforms to the unstable-neutral subspace. Specifically, the column span of the forecast and posterior error covariances asymptotically aligns with the span of backward Lyapunov vectors with non-negative exponents. Earlier mathematical studies have focused on perfect models, and this current work now explores the relationship between dynamical instability, the precision of observations and the evolution of forecast error in linear models with additive model error. We prove bounds for the asymptotic uncertainty, explicitly relating the rate of dynamical expansion, model precision and observational accuracy. Formalizing this relationship, we provide a novel, necessary criterion for the boundedness of forecast errors. Furthermore, we numerically explore the relationship between observational design, dynamical instability and filter boundedness. Additionally, we include a detailed introduction to the Multiplicative Ergodic Theorem and to the theory and construction of Lyapunov vectors.

1. Introduction. The seminal work of Lorenz [41] demonstrated that, even in deterministic systems, infinitesimal perturbations in initial conditions can rapidly lead to a long-term loss of predictability in chaotic, physical models. In weather prediction, this understanding led to the transition from single-trajectory forecasts to ensemble-based, probabilistic forecasting [39]. Historically, ensembles have been initialized in order to capture the spread of rapidly growing perturbations [12,50]. Data assimilation methods have likewise been designed to capture this variability in the context of Bayesian and variational data assimilation schemes; see, e.g., Carrassi et al. for a recent survey of data assimilation techniques in geosciences [13]. The ensemble Kalman filter, particularly, has been shown to strongly reflect these dynamical instabilities [16,42,25,7], and its performance depends significantly upon whether these rapidly growing errors are sufficiently observed and corrected.
The assimilation in the unstable subspace (AUS) methodology of Trevisan et al. [14,51,52,44,45] has provided a robust, dynamical interpretation of these observed properties of the ensemble Kalman filter. For deterministic, linear, Gaussian models, Trevisan et al. hypothesized that the asymptotic filter error concentrates in the span of the unstable-neutral backward Lyapunov vectors (BLVs), and this has recently been mathematically proven. Gurumoorthy et al. [27] demonstrated that the null space of the forecast error covariance matrices asymptotically contains the time varying subspace spanned by the stable BLVs. This result was generalized by Bocquet et al. [8], proving the asymptotic equivalence of reduced rank initializations of the Kalman filter with the full rank Kalman filter: as the number of assimilations increases towards infinity, the covariance of the full rank Kalman filter converges to a sequence of low rank covariance matrices initialized only in the unstable-neutral BLVs.
The convergence of the Kalman smoother error covariances onto the span of the unstable-neutral BLVs, and the stability of low rank initializations, was established by Bocquet and Carrassi [7]; this latter work also numerically extended this relationship to weakly nonlinear dynamics and ensemble-variational methods. The works of Bocquet et al. [8] and Bocquet and Carrassi [7] relied upon the sufficient hypothesis that the span of the unstable and neutral BLVs remained uniformly completely observed. This hypothesis has recently been refined to a necessary and sufficient criterion for the exponential stability of continuous time filters, in perfect models, in terms of the detectability of the unstable-neutral subspace [23].
The present study is concerned with extending the limits of the results developed in deterministic dynamics (perfect models) to the presence of stochastic model errors. This manuscript and its sequel [26] seek to (i) determine the extent to which stable dynamics confine the uncertainty in the sequential state estimation problem in models with additive noise and (ii) to use these results to interpret the properties, and suggest design, of ensemble-based Kalman filters with model error. This manuscript studies the asymptotic properties of the full rank, theoretical Kalman filter [33] and the unfiltered errors in the stable BLVs. The sequel [26] utilizes these results to interpret filter divergence for reduced rank, ensemble-based Kalman filters.
In section 3 we present a detailed introduction to the BLVs. In section 4 we develop novel bounds on the forecast error covariance, describing the evolution of uncertainty as the growth of error due to dynamic instability and model imprecision, with respect to the constraint of observations. Together, the rate of dynamic instability and the observational precision form an inverse relationship which we use to characterize the boundedness of forecast errors. In Corollary 1 and Corollary 2, we prove a necessary criterion for filter boundedness in autonomous and time varying systems: the observational precision, relative to the background uncertainty, must be greater than the leading instability which forces the model error. However, the intuition of AUS needs additional qualifications when interpreting the role of model errors in reduced rank filters. Unlike perfect models, uncertainty in the stable BLVs does not generically converge to zero as a consequence of reintroducing model errors. Moreover, while stability guarantees that unfiltered errors remain uniformly bounded in the stable BLVs, the uncertainty may still be impractically large; even when a Lyapunov exponent is strictly negative, positive realizations of the local Lyapunov exponents can force transient instabilities which strongly amplify the forecast uncertainty. The impact of stable modes on forecast uncertainty differs from similar results for nonlinear, perfect models by Ng et al. [42] and Bocquet, Raanes, and Hannart [9], where the authors demonstrate the need to correct stable modes in the ensemble Kalman filter due to sampling errors induced by nonlinearity. Likewise, this differs from the nonlinear AUS extended Kalman filter (EKF-AUS-NL) of Palatella and Trevisan [45] that accounts for truncation errors in the estimate of the forecast uncertainty in nonlinear models. 
In subsection 5.2, we derive the mechanism for the transient instabilities amplifying perturbations as a linear effect in the presence of model errors. We furthermore provide a computational framework to study the variance of these perturbations.
In subsection 5.3, we study the filter boundedness and stability criteria of Bocquet et al. [8] and Frank and Zhuk [23] in their relation to bounding forecast errors in imperfect models. Likewise, we explore their differences in the context of dynamically selecting observations, similar to the work of Law et al. [37]. With respect to several observational designs as benchmarks, we numerically demonstrate that the unconstrained growth of errors in the stable BLVs of high variance can be impractically large compared to the uncertainty of the full rank Kalman filter. These results have strong implications for ensemble-based filtering in geosciences and weather prediction, where ensemble sizes are typically extremely small relative to the model dimension. In perfect models, an ensemble size large enough to correct the small number of unstable and neutral modes might suffice. However, our results suggest the need to further increase the rank of ensemble-based gains. The significance of this result for ensemble-based Kalman filters and their divergence is further elaborated on in the sequel [26].
2. Linear state estimation. The purpose of recursive data assimilation is to estimate an unknown state from a sequential flow of partial and noisy observations; we make the simplifying assumption that the dynamical and observational models are both linear and the error distributions are Gaussian. In this setting, given a Gaussian distribution for the initial state, the distribution of the estimated state is Gaussian at all times. Formulated as a Bayesian inference problem, we seek to estimate the distribution of the random vector x_k ∈ R^n evolved via a linear Markov model, with observations y_k ∈ R^d given as
x_k = M_k x_{k−1} + w_k, (2.1)
y_k = H_k x_k + v_k. (2.2)
The model variables and observation vectors are related via the linear observation operator H_k : R^n → R^d. Let I_n denote the n × n identity matrix. We denote the model propagator from time t_l to time t_k as M_{k:l} := M_k ··· M_{l+1}, where M_{k:k} := I_n.
For all k, l ∈ N, the random vectors of model and observation noise, w_k, w_l ∈ R^n and v_k, v_l ∈ R^d, are assumed to be mutually independent, unbiased, Gaussian white sequences. Particularly, we define
E[w_k w_l^T] = δ_{k,l} Q_k, E[v_k v_l^T] = δ_{k,l} R_k, E[w_k v_l^T] = 0,
where E is the expectation, δ_{k,l} is the Kronecker delta, R_k ∈ R^{d×d} is the observation error covariance matrix at time t_k, and Q_k ∈ R^{n×n} stands for the model error covariance matrix. The error covariance matrix R_k can be assumed invertible without loss of generality. For simplicity we assume the dimension of the observations d ≤ n is fixed.
For two positive semidefinite matrices, A and B, the partial ordering is defined B ≤ A if and only if A − B is positive semidefinite. To avoid pathologies, we assume that the model error and the observational error covariance matrices are uniformly bounded, i.e., there are constants q_inf, q_sup, r_inf, r_sup ∈ R such that for all k,
q_inf I_n ≤ Q_k ≤ q_sup I_n, 0 < r_inf I_d ≤ R_k ≤ r_sup I_d.
Rather than explicitly computing the evolution of the distribution for x_k, the Kalman filter computes the forecast and posterior distributions parametrically via recursive equations for the mean and covariance of each distribution.

Definition 1. The forecast error covariance matrix P_k of the Kalman filter satisfies the discrete-time dynamic Riccati equation [33]
P_{k+1} = M_{k+1} (I_n + P_k Ω_k)^{−1} P_k M_{k+1}^T + Q_{k+1}, (2.6)
where Ω_k := H_k^T R_k^{−1} H_k is the precision matrix of the observations.

Equation (2.6) expresses the error covariance matrix, P_{k+1}, as the result of a two-step process: (i) the assimilation at time t_k yielding the analysis error covariance,
P^a_k = (I_n + P_k Ω_k)^{−1} P_k,
and (ii) the forecast, where the analysis error covariance is forward propagated by
P_{k+1} = M_{k+1} P^a_k M_{k+1}^T + Q_{k+1}.
Assuming that the filter is unbiased such that the initial error is mean zero, it is easy to demonstrate that the forecast and analysis error distributions are mean zero at all times. In this context, the covariances P_k, P^a_k represent the uncertainty of the state estimate defined by the filter mean. As we will focus on the evolution of the covariances, we neglect the update equations for the mean state and refer the reader to Jazwinski [31] for a more complete discussion.

The classical conditions for the boundedness of filter errors, and the independence of the asymptotic filter behavior from its initialization, are given in terms of observability and controllability. Observability is the condition that, given finitely many observations, the initial state of the system can be reconstructed.
Controllability describes the ability to move the system from any initial state to a desired state given a finite sequence of control actions; in our case the moves are the realizations of model error. These conditions are described in the following definitions, beginning with the information and controllability matrices.
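The two-step recursion in Definition 1 can be sketched numerically as follows. This is a minimal illustration, not code from the original text; the function name and the use of numpy are our own, and the analysis step is written with the precision matrix Ω_k = H_k^T R_k^{−1} H_k.

```python
import numpy as np

def forecast_riccati_step(P, M_next, Q_next, H, R):
    """One step of the forecast-error covariance recursion:
    (i) analysis update with precision Omega = H^T R^{-1} H,
    (ii) forward propagation plus injection of model error."""
    n = P.shape[0]
    Omega = H.T @ np.linalg.inv(R) @ H
    # analysis covariance (I + P Omega)^{-1} P, i.e., (P^{-1} + Omega)^{-1}
    P_a = np.linalg.solve(np.eye(n) + P @ Omega, P)
    # forecast: propagate the analysis covariance and add model error
    return M_next @ P_a @ M_next.T + Q_next
```

For the scalar example M = 2 and Q = R = H = P = 1, one step yields the analysis variance 1/2 and the forecast variance 4 · (1/2) + 1 = 3.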
Definition 2. We define Φ_{k:j} to be the time varying information matrix and Υ_{k:j} to be the time varying controllability matrix, where
Φ_{k:j} := Σ_{l=j}^{k} M_{k:l}^{−T} Ω_l M_{k:l}^{−1}, Υ_{k:j} := Σ_{l=j}^{k} M_{k:l} Q_l M_{k:l}^T.
For γ ≥ 0 let us define the weighted controllability matrix as
Ξ^γ_{k:j} := Σ_{l=j}^{k} (1 + γ)^{−(k−l)} M_{k:l} Q_l M_{k:l}^T.
Note that Ξ^0_{k:j} ≡ Υ_{k:j}. We recall from section 7.5 of Jazwinski [31] the definitions of uniform complete observability (respectively, controllability).
Definition 3. Suppose there exist N_Φ, a, b > 0 independent of k for which k > N_Φ implies
0 < aI_n ≤ Φ_{k:k−N_Φ} ≤ bI_n; (2.11)
then the system is uniformly completely observable. Likewise suppose there exist N_Υ, a, b > 0 independent of k for which k > N_Υ implies
0 < aI_n ≤ Υ_{k:k−N_Υ} ≤ bI_n; (2.12)
then the system is uniformly completely controllable.
Hypothesis 1. Assume that the system of equations (2.1) and (2.2) is (a) uniformly completely observable; (b) uniformly completely controllable.

Remark 1. We will explicitly refer to Hypothesis 1 whenever it is used. When we refer to Hypothesis 1 alone, we refer to both parts (a) and (b). At times, we will explicitly use only part (a) or part (b) of Hypothesis 1.

Theorem 1. Suppose the system of equations (2.1) and (2.2) satisfies Hypothesis 1 and P_0 > 0. Then there exist constants p^a_inf and p^a_sup independent of k such that the analysis error covariance is uniformly bounded above and below,
p^a_inf I_n ≤ P^a_k ≤ p^a_sup I_n. (2.13)
Given any two initializations of the prior error covariance P_0, P̃_0 > 0 with associated sequences of analysis error covariances P^a_k, P̃^a_k, the covariance sequences converge, lim_{k→∞} |P^a_k − P̃^a_k| = 0, exponentially in k.

These are classical results of filter stability; see, for example, Theorem 7.4 of Jazwinski [31] or Bougerol's work with random matrices [10,11] for a generalization. The square root Kalman filter is a reformulation of the recurrence in (2.6) which is used to reduce computational cost and obtain superior numerical precision and stability over standard implementations; see, e.g., [49] and references therein. The advantage of this formulation for our analysis is that it explicitly represents the recurrence in (2.6) in terms of positive semidefinite, symmetric matrices.

Definition 4. Let P_k be a solution to the time varying Riccati equation (2.6), and define X_k ∈ R^{n×n} to be a Cholesky factor of P_k such that
P_k = X_k X_k^T. (2.14)

The root X_k in (2.14) can be interpreted as an ensemble of anomalies about the mean as in the ensemble Kalman filter [22,2]. In operational conditions, it is standard that the forecast error distribution is approximated with a suboptimal, reduced rank surrogate [17].
Using a reduced rank approximation, the estimated covariance and exact error covariance are not equal, and this can lead to the systematic underestimation of the uncertainty [26]. However, in the following we will assume that X k is computed as an exact root. The sequel to this work explicitly treats the case of reduced rank, suboptimal filters [26].
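The systematic underestimation by a reduced rank surrogate can be seen in a small sketch (a hypothetical helper of our own, not from the original text): truncating a root of P_k to its leading r eigenmodes discards exactly the trailing eigenvalues of the covariance, so the surrogate never overestimates the uncertainty.

```python
import numpy as np

def reduced_rank_root(P, r):
    """Return X_r with r columns such that X_r X_r^T is the rank-r
    truncated eigendecomposition of P; X_r X_r^T <= P in the
    positive semidefinite ordering."""
    vals, vecs = np.linalg.eigh(P)       # eigenvalues in ascending order
    lead = np.argsort(vals)[::-1][:r]    # indices of the leading r modes
    return vecs[:, lead] * np.sqrt(vals[lead])

P = np.diag([4.0, 1.0, 0.25])
X2 = reduced_rank_root(P, 2)
# the trace deficit of X2 X2^T equals the discarded eigenvalue 0.25
```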
Definition 5. We order the singular values of R_k^{−1/2} H_k X_k as σ_1 ≥ ··· ≥ σ_n, such that α := inf_k σ_n^2 and β := sup_k σ_1^2, and we write
0 ≤ αI_n ≤ X_k^T Ω_k X_k ≤ βI_n ≤ ∞ (2.15)
for all k. Equation (2.15) is closely related to the singular value analysis of the precision matrix by Johnson, Hoskins, and Nichols [32] and the analysis of the condition number for the Hessian of the variational cost function by Haben, Lawless, and Nichols [28] and Tabeart et al. [48]. These works study the information gain from observations, relative to the background uncertainty, due to the assimilation step. The primary difference between these earlier works and our study here is that the background error covariance is static in these variational formulations, while in the present study the root X_k is flow dependent. In this flow dependent context, the constant α (respectively, β) is interpreted as the minimal (respectively, maximal) observational precision relative to the maximal (respectively, minimal) background forecast uncertainty. The constant α is nonzero if and only if the principal angles between the column span of X_k and the kernel of H_k are bounded uniformly below. Generally, we thus take α = 0 unless observations are full dimensional. A nonzero value for α can be understood as an ideal scenario.
Using Definition 4 and the matrix shift lemma (see [8, Appendix C]) we rewrite the forecast Riccati equation (2.6) as
P_{k+1} = M_{k+1} X_k (I_n + X_k^T Ω_k X_k)^{−1} X_k^T M_{k+1}^T + Q_{k+1}, (2.18)
from which we infer
(1 + β)^{−1} M_{k+1} P_k M_{k+1}^T + Q_{k+1} ≤ P_{k+1} ≤ (1 + α)^{−1} M_{k+1} P_k M_{k+1}^T + Q_{k+1}. (2.19)
Iterating on the above inequality, we obtain the recursive bound
(1 + β)^{−k} M_{k:0} P_0 M_{k:0}^T + Ξ^β_{k:1} ≤ P_k ≤ (1 + α)^{−k} M_{k:0} P_0 M_{k:0}^T + Ξ^α_{k:1}. (2.20)
The bounds in (2.20) explicitly describe the previously introduced uncertainty as dynamically evolved to time k, relative to the constraint of the observations. We will utilize the BLVs to extract the dynamic information from the sequences of matrices M_{k:l}, M_{k:l}^T.

3. Lyapunov vectors. This section contains a short introduction to Lyapunov vectors and the multiplicative ergodic theorem (MET). For a more comprehensive introduction, there are many excellent resources at different levels of complexity; see, for example, [1,38,4,5,36,24]. There is inconsistent use of the terminology for Lyapunov vectors in the literature, so we choose to use the nomenclature of Kuptsov and Parlitz [36] for its accessibility and self-consistency.
Consider the growth or decay of an arbitrary, norm one vector v_0 ∈ R^n to its state at time t_k via the propagator M_{k:0}. This is written as
|M_{k:0} v_0|^2 = v_0^T M_{k:0}^T M_{k:0} v_0, (3.1)
so that the eigenvectors of the matrix M_{k:0}^T M_{k:0} describe the principal axes of the ellipsoid defined by the unit disk evolved to time t_k. Using the above relationship for the reverse time model, we see the growth or decay of the unit disk in reverse time as
|M_{0:−k}^{−1} v_0|^2 = v_0^T M_{0:−k}^{−T} M_{0:−k}^{−1} v_0. (3.2)
The principal axes of the past ellipsoid that evolves to the unit disk at the present time are thus precisely the eigenvectors of the matrix M_{0:−k}^{−T} M_{0:−k}^{−1}. There is no guarantee in general that there is consistency between the asymptotic forward and reverse time growth and decay rates, i.e., in (3.1) and (3.2) as k → ∞. Generally, models may have a Lyapunov spectrum defined by intervals of lower and upper growth rates; see, e.g., Dieci and Van Vleck [19,20]. However, as we are motivated by the tangent-linear model for a nonlinear system, we may assume some "regularity" in the dynamics.
The antisymmetry of the forward/reverse time, regular and adjoint models' growth and decay, is known as Lyapunov-Perron regularity (LP-regularity) [5] and is equivalent to the classical Oseledec decomposition; see [4, Theorem 2.1.1]. LP-regularity guarantees that (i) the Lyapunov exponents are well defined for the linear model as point-spectrum, (ii) the linear space is decomposable into subspaces that evolve covariantly with the linear propagator, and (iii) each such subspace asymptotically grows or decays according to one of the point-spectrum rates. We summarize the essential results of Oseledec's theorem for use in our work in the following; see Theorem 2.1.1 of Barreira and Pesin [4] for a complete statement.
Definition 6. For p ≤ n, the Lyapunov spectrum of the system (2.1) is defined as the set
{ λ_i : i = 1, ..., p },
where λ_1 > ··· > λ_p and κ_i corresponds to the multiplicity (degeneracy) of the exponent λ_i. We separate nonnegative and negative exponents, λ_{n_0} ≥ 0 > λ_{n_0+1}, such that each index i > n_0 corresponds to a stable exponent. The subspaces E^i_k are denoted Oseledec spaces, and the decomposition of the model space into the direct sum
R^n = E^1_k ⊕ ··· ⊕ E^p_k
is denoted the Oseledec splitting.
For arbitrary linear systems LP-regularity is not a generic property; it is the MET that shows that this is a typical scenario for a wide class of nonlinear systems. A point will be defined to be LP-regular if the tangent-linear model along its evolution is LP-regular. We state a classical version of the MET (see Theorem 2.1.2 of [4] and the following discussion), but note that there are more general formulations of this result and more general forms of the associated covariant-subspace decompositions. These results go beyond the current work; see, e.g., Froyland et al. [24] and Dieci, Elia, and Van Vleck [20,18] for a stronger version of the MET and related topics.
Theorem 3 (multiplicative ergodic theorem). If f is a C 1 diffeomorphism of a compact, smooth, Riemannian manifold M , the set of points in M which are LP-regular has measure 1 with respect to any f -invariant Borel probability measure ν on M . If ν is ergodic, then the Lyapunov spectrum is constant with ν-probability 1.
Loosely, the MET states that, with respect to an ergodic probability measure (that is compatible with the map f and the usual topology), there is probability one of choosing initial conditions for which the Lyapunov exponents are well defined and independent of initial condition. This form of the MET has a wide range of applications in differentiable dynamical systems, but the MET is not limited to this setting. The strong version of the MET has been applied in, e.g., hard disk systems, the truncated Fourier expansions of PDEs, and with nonautonomous ODEs and their transfer operators. See Example 1.2 of [24] for a discussion of these topics. For the rest of this work, we will take the hypothesis that our model satisfies LP-regularity.
Hypothesis 2. The model defined by the deterministic equation
x_k = M_k x_{k−1} (3.3)
is assumed to be LP-regular.

For Gaussian error distributions, the evolution of the forecast error distribution is interpreted in terms of Oseledec's theorem as the evolution of deviations from the mean, propagated via the equations for perturbations. While Oseledec's theorem guarantees that a decomposition of the model space exists, constructing such a decomposition is nontrivial. Motivated by (3.1) and (3.2), we define the following operators as in equations (13) and (14) of Kuptsov and Parlitz [36].
Definition 7. We define the far-future operator as
W^+(k) := lim_{l→∞} [ M_{l:k}^T M_{l:k} ]^{1/(2(l−k))}
and the far-past operator as
W^−(k) := lim_{l→∞} [ M_{k:k−l} M_{k:k−l}^T ]^{1/(2l)}.
In the classical proof of the MET, the far-future/past operators are shown to be well defined, positive definite, symmetric operators [43]. As they are diagonalizable over R, we order the eigenvalues of W^+(k) as µ^+_1(k) > ··· > µ^+_p(k) and the eigenvalues of W^−(k) as µ^−_1(k) > ··· > µ^−_p(k). By the MET, the eigenvalues µ^±_i(k) are independent of k and satisfy the relationship
µ^+_i = µ^−_i = e^{λ_i}.

Definition 8. Let the columns of the matrix F_k, respectively, B_k, be any orthonormal eigenbasis for the far-future operator W^+(k), respectively, far-past operator W^−(k). Order the columns blockwise such that for each i = 1, ..., p and each j = 1, ..., κ_i, the column F^{i_j}_k, respectively, B^{i_j}_k, is an eigenvector associated with the eigenvalue µ_i. The columns of F_k are denoted the forward Lyapunov vectors (FLVs), the columns of B_k the backward Lyapunov vectors (BLVs), and any basis subordinate to the Oseledec splitting a choice of the covariant Lyapunov vectors (CLVs).

The CLVs are defined only by the Oseledec spaces and, therefore, are independent of the choice of a norm: any choice of basis subordinate to the Oseledec splitting is valid. On the other hand, the FLVs and the BLVs are determined specifically with respect to a choice of a norm and the induced metric. The choice of basis in each case can be made uniquely (up to a scalar and the choice of a norm) only when p = n. For the remaining work we will focus on the BLVs; for a general survey on constructing FLVs, BLVs, and CLVs, see, e.g., Kuptsov and Parlitz [36] and Froyland et al. [24].
The Oseledec spaces and Lyapunov vectors can also be defined in terms of filtrations, i.e., chains of ascending or descending subspaces of R^n. This forms an axiomatic approach to constructing abstract Lyapunov exponents used by, e.g., Barreira and Pesin [4]. The BLVs describe an orthonormal basis for the ascending chain of Oseledec subspaces, the backward filtration [36]. For all 1 ≤ m ≤ p we obtain the equality
span{ B^{i_j}_k : i ≤ m, j ≤ κ_i } = ⊕_{i=1}^{m} E^i_k (3.9)
by equation (17) of Kuptsov and Parlitz [36] and the decomposition of the backward filtration in equations (1.5.1) and (1.5.2) of Barreira and Pesin [4]. Note that (3.9) does not imply B^{i_j}_k ∈ E^i_k for i > 1, as the BLVs are not themselves covariant with the model dynamics. However, the BLVs are covariant with the QR algorithm.
© 2018 SIAM and ASA. Published by SIAM and ASA under the terms of the Creative Commons 4.0 license.
Lemma 3.1. Outside of a set of Lebesgue measure zero, a choice of i_j linearly independent initial conditions for the recursive QR algorithm converges to some choice for the leading i_j BLVs. For any k, the BLVs satisfy the relationship
M_k B_{k−1} = B_k T_k, (3.10)
where T_k is an upper triangular matrix. Moreover, for any i_j and any k,
lim_{(k−l)→∞} (1/(k−l)) log |T^{i_j}_{k:l}| = λ_i, (3.11)
where T^{i_j}_{k:l} denotes the i_j th row of the product T_{k:l} := T_k ··· T_{l+1}.

Proof. The covariance of the BLVs with respect to the QR algorithm in (3.10) can be derived from (3.3) and (3.9). For all 1 ≤ m ≤ p,
M_k span{ B^{i_j}_{k−1} : i ≤ m } = span{ B^{i_j}_k : i ≤ m },
due to the covariance of the Oseledec spaces. Therefore the transformation M_k represented in a moving frame of BLVs is upper triangular. When the spectrum is degenerate, p < n, there is nonuniqueness in the choice of the BLVs. However, given an initial choice of the BLVs at some time k − 1, the choice of BLVs at time k can be defined directly via the relationship in (3.10). This is the relationship derived in equation (31) by Kuptsov and Parlitz [36] and is the basis of the recursive QR algorithms of Shimada and Nagashima [47] and Benettin et al. [6]. A choice of BLVs gives a special choice of the classical Perron transformation (see Theorems 3.3.1 and 3.3.2 of [1]), and in particular, it is proven by Ershov and Potapov [21] that outside of a set of Lebesgue measure zero, the recursive QR algorithm converges to some choice of BLVs.
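The recursive QR algorithm of Lemma 3.1 can be sketched as follows (a minimal, assumed implementation in the style of the Shimada–Nagashima/Benettin iteration; the function name is our own). The orthogonal factor tracks a choice of the leading BLVs, while the running average of the logarithms of the diagonal of T_k estimates the Lyapunov exponents:

```python
import numpy as np

def qr_lyapunov(M_seq, B0):
    """Benettin-style recursive QR: B_k T_k = M_k B_{k-1} with T_k
    upper triangular. Returns the final BLV estimate and the averaged
    log-diagonal of the T_k, which estimates the Lyapunov exponents."""
    B = B0
    log_diag = np.zeros(B0.shape[1])
    n_steps = 0
    for M in M_seq:
        Q, T = np.linalg.qr(M @ B)
        s = np.sign(np.diag(T))      # fix signs so diag(T) > 0, making QR unique
        B, T = Q * s, s[:, None] * T
        log_diag += np.log(np.abs(np.diag(T)))
        n_steps += 1
    return B, log_diag / n_steps
```

For a time invariant diagonal propagator the estimated exponents recover the logarithms of the diagonal entries.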
Note that the far-future/past operators are also well defined for the propagator of the adjoint model z_k = M_k^{−T} z_{k−1}. Equation (3.11) thus follows from the far-past operator for the adjoint model, defined as
Ŵ^−(k) := lim_{l→∞} [ M_{k:k−l}^{−T} M_{k:k−l}^{−1} ]^{1/(2l)}.
It is easy to verify that the BLVs defined by the adjoint model agree with those defined via the regular model; in each case, the left singular vectors of M_{k:k−l} converge to a choice of the BLVs as l → ∞. Notice that the eigenvalues of Ŵ^−(k) are given by e^{−λ_i}.
Equation (3.10) describes the dynamics in the moving frame of BLVs, where the transition map from the frame at time t_{k−1} to time t_k is given by T_k. Applying the change of basis sequentially for the matrix M_{k:l}, we recover
M_{k:l} = B_k T_{k:l} B_l^T,
where we define T_{k:l} := T_k ··· T_{l+1}. We note that M_{k:k} = I_n implies T_{k:k} ≡ I_n. Let e_{i_j} denote the i_j th standard basis vector such that
M_{k:k−l}^T B_k e_{i_j} = B_{k−l} T_{k:k−l}^T e_{i_j},
where T_{k:k−l}^T e_{i_j} denotes the i_j th column of T_{k:k−l}^T, i.e., the i_j th row of T_{k:k−l}. For any k and any ε > 0, there exists some N_{ε,k} such that if k − l is taken sufficiently large, (3.11) guarantees
e^{(λ_i − ε)(k−l)} ≤ |T^{i_j}_{k:k−l}| ≤ e^{(λ_i + ε)(k−l)}.

Definition 9. For each k > l, i = 1, ..., p, and j = 1, ..., κ_i, we define the i_j th local Lyapunov exponent (LLE) from k to l as
LLE^{i_j}_{k:l} := (1/(k−l)) log |T^{i_j}_{k:l}|.

The convergence of the recursive QR algorithm is also discussed by Ershov and Potapov [21]. For a discussion of the numerical stability and convergence see, e.g., Dieci and Van Vleck [19,20].
Perturbations of model error to the mean equation for the Kalman filter are not governed by the asymptotic rates of growth or decay but, rather, by the LLEs. While the LLE (1/(k−l)) log |T^{i_j}_{k:l}| approaches the value λ_i as k − l approaches infinity, its behavior on short time scales can be highly variable. Particularly, for an arbitrary LP-regular system, the rate of convergence in (3.11) may depend on k. An important class of such systems is, e.g., nonuniformly hyperbolic systems (see chapter 2 of [4]). To make the LLEs tractable, we make an additional assumption, compatible with the typical assumptions for partial hyperbolicity [29]. We adapt the definition of partial hyperbolicity from Hasselblatt and Pesin [30] to our setting.
Definition 10. Let λ_{n_0} = 0. For every k we define the splitting into unstable, neutral, and stable subspaces:
E^u_k := ⊕_{i : λ_i > 0} E^i_k, E^c_k := E^{n_0}_k, E^s_k := ⊕_{i : λ_i < 0} E^i_k. (3.18)
Suppose there exist constants C > 0 and ν_s < 1 < η_u, independent of k, such that for any l > 0, v ∈ E^m_k, |v| = 1, and m ∈ {s, c, u},
|M_{k+l:k} v| ≤ C ν_s^l for m = s,
C^{−1} ν_s^l ≤ |M_{k+l:k} v| ≤ C η_u^l for m = c,
C^{−1} η_u^l ≤ |M_{k+l:k} v| for m = u. (3.19)
Partially hyperbolic systems, as in Definition 10, have LLEs which are bounded uniformly with respect to rates defined on the subspaces in (3.18). When C is taken large the definition permits transient growth of stable modes and transient decay of unstable modes. The neutral subspace encapsulates diverse behaviors which always fall below prescribed rates of exponential growth or decay. We will make a slightly stronger assumption on these uniform growth and decay rates that is equivalent to fixing a uniform window of transient variability on each Lyapunov exponent.
Hypothesis 3. Let ε > 0 be given. We assume that for each i there exists some N_{i,ε}, independent of k and j, such that for any B^{i_j}_k and any k − l ≥ N_{i,ε},
e^{(λ_i − ε)(k−l)} ≤ |T^{i_j}_{k:l}| ≤ e^{(λ_i + ε)(k−l)}; (3.20)
i.e., the growth and decay is uniform (translation invariant) in k.
Unless specifically stated otherwise, we assume Hypothesis 3 for the remainder of this paper. However, our results may be generalized to all systems satisfying Definition 10 by using only the uniform rates of growth or decay on the entire unstable, neutral, and stable subspaces in (3.20). Our results also apply to systems without neutral exponents, i.e., λ_{n_0} > 0, as a trivial extension.

4. Dynamically induced bounds for the Riccati equation.
4.1. Autonomous systems. Consider the classical theorem regarding the existence and uniqueness of solutions to the stable Riccati equation for autonomous dynamics. This is paraphrased from Theorem 2.38, Chapter 7, of Kumar and Varaiya [35] in terms of the forecast error covariance recurrence in (2.18).
Definition 11. The autonomous system is defined such that for every k,
M_k ≡ M, Q_k ≡ Q, and Ω_k ≡ Ω. (4.1)
Let P = X X^T for some X ∈ R^{n×n}; the stable Riccati equation is defined as
P = M X (I_n + X^T Ω X)^{−1} X^T M^T + Q. (4.2)

Theorem 4. Let the autonomous system defined by (2.1), (2.2), and (4.1) satisfy Hypothesis 1. There is a positive semidefinite matrix, P ≡ X X^T, which is the unique solution to the stable Riccati equation (4.2). For any initial choice of P_0, if P_k satisfies the recursion in (2.18), then lim_{k→∞} P_k = P.
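Theorem 4 suggests a simple way to approximate the stable limit numerically: iterate the recursion to convergence from any P_0 > 0. The following is an illustrative sketch with a function name of our own; for production use a dedicated discrete algebraic Riccati solver would be preferable.

```python
import numpy as np

def stable_riccati_limit(M, Q, Omega, P0, tol=1e-13, max_iter=100000):
    """Iterate P <- M (P^{-1} + Omega)^{-1} M^T + Q to its fixed point,
    which Theorem 4 guarantees is unique under Hypothesis 1."""
    n = M.shape[0]
    P = P0.copy()
    for _ in range(max_iter):
        # (P^{-1} + Omega)^{-1} written as (I + P Omega)^{-1} P
        P_new = M @ np.linalg.solve(np.eye(n) + P @ Omega, P) @ M.T + Q
        if np.linalg.norm(P_new - P) < tol:
            return P_new
        P = P_new
    return P
```

For the scalar system M = 2 and Q = Ω = 1, the fixed point solves p = 4p/(1 + p) + 1, i.e., p = 2 + √5.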
Slightly abusing notation, take α and β to be defined by the solution to the stable Riccati equation (4.2),
αI_n ≤ X^T Ω X ≤ βI_n. (4.3)
Then for any k we recover the invariant recursion for the stable limit
(1 + β)^{−1} M P M^T + Q ≤ P ≤ (1 + α)^{−1} M P M^T + Q. (4.4)

Proposition 1. Let the autonomous system defined by (2.1), (2.2), and (4.1) satisfy Hypothesis 1, and let P be the unique solution to the stable Riccati equation (4.2). Suppose that either λ_i < 0 or that ε > 0 may be chosen such that
α > e^{2(λ_i + ε)} − 1. (4.5)
Then
B^{i_j T} P B^{i_j} ≤ (1 + α) q_sup / (1 + α − e^{2(λ_i + ε)}), (4.6)
(1 + β) q_inf / (1 + β − e^{2λ_i}) ≤ B^{i_j T} P B^{i_j}. (4.7)
For every 1 ≤ i ≤ p, any ε > 0 as above, and associated N_{i,ε} as in Hypothesis 3, the bounds in (4.6) and (4.7) hold.

Proof. Note that time invariant propagators trivially satisfy Hypothesis 3 and it is easy to verify the relationship |µ_i| = e^{λ_i} directly from the definition of the Lyapunov exponents. We begin by proving (4.6) and (4.7) for eigenvectors of M^T. If v_{i_j} is an eigenvector of M^T associated to µ_i, (4.4) implies
v_{i_j}^T P v_{i_j} ≤ (1 + α)^{−1} e^{2λ_i} v_{i_j}^T P v_{i_j} + v_{i_j}^T Q v_{i_j}
for every k. For λ_i < 0 generally, or for any λ_i such that α > e^{2λ_i} − 1,
v_{i_j}^T P v_{i_j} ≤ (1 + α) q_sup / (1 + α − e^{2λ_i}).
The stable Riccati equation (4.2) implies Q ≤ P. Therefore, using the left side of (4.4),
v_{i_j}^T P v_{i_j} ≥ (1 + β)^{−1} e^{2λ_i} v_{i_j}^T P v_{i_j} + v_{i_j}^T Q v_{i_j}
for all k. In particular, for every eigenvector v_{i_j},
(1 + β) q_inf / (1 + β − e^{2λ_i}) ≤ v_{i_j}^T P v_{i_j} ≤ (1 + α) q_sup / (1 + α − e^{2λ_i}).
The above argument does not have a straightforward extension to the generalized eigenspaces, so we coarsen the bound to obtain a closed limiting form in terms of the BLVs, which retain the important growth characteristics under M^T. For i > n_0, or for any λ_i such that α > e^{2λ_i} − 1, there is a choice of ε as in (4.5) and N_{i,ε} as in Hypothesis 3. Let P ≤ p_sup I_n; then from the right side of (4.4) we derive
P ≤ (1 + α)^{−k} M^k P (M^k)^T + Σ_{l=0}^{k−1} (1 + α)^{−l} M^l Q (M^l)^T,
which implies B^{i_j T} P B^{i_j} can be bounded above by
p_sup (1 + α)^{−k} e^{2(λ_i + ε)k} + q_sup Σ_{l=0}^{k−1} (1 + α)^{−l} e^{2(λ_i + ε)l}
for every k > N_{i,ε}. Taking the limit as k → ∞ yields (4.6). The lower bound is demonstrated by similar arguments with the lower bound in (4.4), utilizing the property P < ∞.
Proposition 1 is similar to results in perfect models [9,27,8] but with some key differences. Once again the estimation errors are dissipated by the dynamics in the span of the stable BLVs, but the recurrent injection of model error prevents the total collapse of the covariance to the unstable-neutral subspace. In (4.6), we see that for very strong decay, when e^{2λ_i} ≈ 0, or for high precision observations, i.e., when the system is fully observed and as α → ∞, the stable limit of the forecast uncertainty reduces to what is introduced by the recurrent injection of model error. The singular evolutive extended Kalman filter of Pham, Verron, and Roubaud [46] has exploited these properties by neglecting corrections in the stable eigenspaces and only making corrections in the unstable directions. This is likewise the motivation for AUS of Trevisan et al. [14,51,52,44,45], though the work of AUS was concerned with nonlinear, perfect models.
The upper bounds in (4.6) and (4.7) generally hold for i ≤ n_0 only when the system is fully observed. Therefore, these bounds can be considered an ideal bound for the unstable-neutral modes. However, the lower bound holds generally for i < n_0. By assuming the existence of an invariant solution to the stable Riccati equation (4.2), we will recover a necessary condition for its existence.
Then it is necessary that (4.20) holds, where v^{i0} ≡ 0; therefore, for any m ≥ 1, the vector (M^T − μ_i I_n)^m v^{ij} is in the span of the preceding chain vectors. Corollary 1 shows that, for the stable Riccati equation to have a solution, it is necessary that the observations are precise enough, relative to the background uncertainty, to counteract the strongest dynamic instability forcing the model error. The quantity in (4.20) thus represents the stabilizing effect of the observations, similar to the bounds on the condition number provided by Haben, Lawless, and Nichols [28] and Tabeart et al. [48], but in Corollary 1 expressed explicitly in terms of the system's dynamic instabilities.

(4.24)
If Hypothesis 1(a) is also satisfied, then for every 1 ≤ i ≤ p, any ε > 0 and associated N_{i,ε}, the bounds in (4.25) hold. Proof. If the system satisfies Hypothesis 1(b), then (4.26) holds, where b_{N_{i,ε}} is independent of k. Therefore, there exists a constant depending on α and N_{i,ε}, but independent of k, such that Ξ^α_{k:k−N_{i,ε}} ≤ C_{α,N_{i,ε}} I_n. From the above we obtain the upper bound. Suppose that Hypothesis 1(a) and (b) are both satisfied; then by Theorem 1 there exists a uniform bound on P_k, such that X_k must also be uniformly bounded; together with the uniform boundedness of R_k and H_k, this implies β < ∞. Note that the bound (4.32) holds for a constant C_{β,N_{i,ε}} depending on β and N_{i,ε} but independent of k. Utilizing the recursion in (2.20), choosing ε and an appropriate N_{i,ε}, and finally bounding the weighted controllability matrix with (4.32), one recovers the lower bound in (4.25) in a manner similar to the upper bound.
The above proposition shows that there are uniform upper and lower bounds on the forecast error for the Kalman filter, in the presence of model error, which can be described in terms of inverse, competing factors: the constant α (respectively, β) is interpreted as the minimal (respectively, maximal) observational precision relative to the maximal (respectively, minimal) background forecast uncertainty represented in the observation variables. Additionally, C_{β,N_{i,ε}} and C_{α,N_{i,ε}} represent the lower and upper bounds on the local variability of the evolution of model errors before perturbations adhere within an ε threshold of their asymptotic behavior.
Proof. If the forecast-error Riccati equation (2.18) is uniformly bounded, there is a 0 < p_sup < ∞ such that P_k ≤ p_sup I_n for all k, and β < ∞. Using the lower bound in (2.20), for all k we obtain (4.34), whose summands are positive semidefinite, such that for any k > N_Υ + 1, truncating Ξ^β_{k:1} verifies the corresponding truncated bound. Note that, by Definition 2, a further inequality holds if k > N_Υ + 2; therefore, the corresponding bound follows for every k > N_Υ + 2. Using Hypothesis 1(b), for every k > N_Υ + 2 we derive (4.38), using the inequality in (2.12). For any j, multiplying (4.38) on the left by (B^{1j}_k)^T and on the right by B^{1j}_k, and taking the limit as k → ∞, shows that (4.33) must hold for the left side to remain finite.
In contrast to Corollary 1 for autonomous systems, Corollary 2 uses Hypothesis 1(b) to simplify the arguments; this, moreover, guarantees the necessary criterion is with respect to λ_1, as the controllability matrix is guaranteed to be positive definite and thus nonvanishing on every Oseledec space. There is, however, a more direct analog to the statement of Corollary 1, in which the adjoint-covariant Lyapunov vectors play the role of the eigenvectors of M^T. It is easy to demonstrate that the adjoint-covariant Lyapunov vectors have the desired covariance and growth/decay with respect to the reverse-time adjoint model, M^T_k. Under the condition of integrally separated Oseledec spaces, there exist classical constructions for covariant and adjoint-covariant bases that decompose the model propagator into a block upper triangular form (see Theorem 5.4.9 of [1]). This decomposition makes the derivation of a precise statement like Corollary 1 analogous in time-varying models, with respect to the adjoint-covariant Lyapunov vectors and adjoint-covariant Oseledec spaces. However, the above arguments require significant additional exposition which we feel is unnecessary, as Corollary 2 is sufficiently general.
Corollary 3. Assume (2.1) and (2.2) satisfy Hypothesis 1(b), and suppose H_k X_k ≡ 0 for every k, such that α = β = 0. Let k ≥ 1, and choose v ∈ span{B^{ij}_k : i > n_0}. Proof. The inequality in (2.20) is an equality for the unfiltered forecast, where β = α = 0. Thus the corollary follows for any stable BLV directly from Proposition 2, and the conclusion extends to norm-one linear combinations.
Corollary 3 extends the intuition of AUS to the presence of model error: corrections may be targeted along the expanding modes, while the uncertainty in the stable modes remains bounded by the system's dynamic stability alone. In particular, without filtering, uncertainty remains uniformly bounded in the span of the stable BLVs. This is analogous to the results of Bocquet et al. [8], where, in perfect models, the stable dynamics alone are sufficient to dissipate forecast error in the span of the stable BLVs. With α = 0, the uniform bound in Corollary 3 may be understood via the two components of (4.24): the bound on Υ_{k:k−N_{i,ε}} and

q_sup e^{2(λ_i+ε)(N_{i,ε}+1)} / (1 − e^{2(λ_i+ε)}).    (4.40)

The controllability matrix Υ_{k:k−N_{i,ε}} represents the newly introduced uncertainty from model error that has yet to be dominated by the dynamics, while (4.40) represents an upper bound on the past model errors that have already been dissipated by the stable dynamics. Nevertheless, this uniform bound is uninformative about the local variability. In the following sections, we study the variance of the unfiltered uncertainty in the stable BLVs compared to the uncertainty of the Kalman filter.
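The quantity (4.40) is the tail of a geometric series: assuming each past injection of model error is bounded by q_sup and decays at the uniform rate e^{2(λ_i+ε)} < 1, the contribution of errors older than N_{i,ε} steps sums as

```latex
\sum_{m = N_{i,\epsilon}+1}^{\infty} q_{\sup}\, e^{2(\lambda_i + \epsilon) m}
  \;=\; q_{\sup}\,
  \frac{e^{2(\lambda_i + \epsilon)\,(N_{i,\epsilon}+1)}}{1 - e^{2(\lambda_i + \epsilon)}},
  \qquad \lambda_i + \epsilon < 0,
```

so the bound degrades as λ_i + ε approaches zero from below, consistent with the role of weakly stable modes studied in the following sections.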

Experimental setup.
To satisfy Hypothesis 2, we construct a discrete, linear model from the nonlinear Lorenz-96 (L96) equations [40], commonly used in the data assimilation literature; see, e.g., [13] and references therein. For each m ∈ {1, …, n}, the L96 equations read

dx_m/dt = (x_{m+1} − x_{m−2}) x_{m−1} − x_m + F,

such that the components of the vector x are given by the variables x_m with periodic boundary conditions, x_0 = x_n, x_{−1} = x_{n−1}, and x_{n+1} = x_1. The term F in L96 is the forcing parameter.
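For reference, the L96 tendency and a fourth-order Runge-Kutta step can be sketched in a few lines; the spin-up length below is an illustrative choice of ours, not a parameter taken from the text:

```python
import numpy as np

def l96(x, F=8.0):
    """L96 tendency: dx_m/dt = (x_{m+1} - x_{m-2}) x_{m-1} - x_m + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(f, x, h):
    """One fourth-order Runge-Kutta step of size h."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Spin onto the attractor: n = 10, F = 8, fixed step h = 0.01 as in the text;
# 5000 steps of spin-up is our own illustrative choice.
n, h = 10, 0.01
x = 8.0 * np.ones(n)
x[0] += 0.01  # small perturbation to break the symmetric equilibrium
for _ in range(5000):
    x = rk4_step(l96, x, h)
```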
The tangent-linear model [34] is governed by the equations of the Jacobian matrix, ∇L(x). We fix the model dimension at n = 10 and the forcing parameter at the standard F = 8, as the model exhibits chaotic behavior while the small model dimension makes the robust computation of Lyapunov vectors numerically efficient. The linear propagator M_k is generated by computing the discrete, tangent-linear model [34] from the resolvent of the Jacobian equation (5.2) along a trajectory of the L96 model, with an interval of discretization of δ = 0.1. We numerically integrate the Jacobian equation with a fourth-order Runge-Kutta scheme with a fixed time step of h = 0.01.
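A minimal, self-contained sketch of this construction of M_k: the L96 Jacobian is assembled directly, and the coupled system (x, M) is integrated with a single RK4 scheme over one discretization interval. Function names are ours:

```python
import numpy as np

def l96(x, F=8.0):
    """L96 tendency: dx_m/dt = (x_{m+1} - x_{m-2}) x_{m-1} - x_m + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def l96_jacobian(x):
    """Jacobian grad L(x) of the L96 vector field at state x."""
    n = x.size
    J = -np.eye(n)  # dependence of row m on x_m is -1
    for m in range(n):
        J[m, (m - 1) % n] += x[(m + 1) % n] - x[(m - 2) % n]
        J[m, (m + 1) % n] += x[(m - 1) % n]
        J[m, (m - 2) % n] += -x[(m - 1) % n]
    return J

def tangent_linear_propagator(x0, delta=0.1, h=0.01):
    """Resolvent M of d(dx)/dt = grad L(x(t)) dx over [0, delta], computed by
    RK4 on the coupled system (x, M) along the nonlinear trajectory."""
    M = np.eye(x0.size)
    x = x0.copy()
    for _ in range(int(round(delta / h))):
        k1x, k1M = l96(x), l96_jacobian(x) @ M
        x2 = x + 0.5 * h * k1x
        k2x, k2M = l96(x2), l96_jacobian(x2) @ (M + 0.5 * h * k1M)
        x3 = x + 0.5 * h * k2x
        k3x, k3M = l96(x3), l96_jacobian(x3) @ (M + 0.5 * h * k2M)
        x4 = x + h * k3x
        k4x, k4M = l96(x4), l96_jacobian(x4) @ (M + h * k3M)
        x = x + (h / 6.0) * (k1x + 2 * k2x + 2 * k3x + k4x)
        M = M + (h / 6.0) * (k1M + 2 * k2M + 2 * k3M + k4M)
    return x, M

x0 = 8.0 * np.ones(10)
x0[0] += 0.01
x1, M = tangent_linear_propagator(x0)
```

A quick consistency check is that M applied to a perturbation reproduces the finite-difference response of the nonlinear flow to leading order.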
For F = 8, the 10-dimensional nonlinear L96 model has a nondegenerate Lyapunov spectrum, and we replace the superscript ij with i for the BLVs. The model has three positive, one neutral, and six negative Lyapunov exponents, such that n_0 = 4. The Lyapunov spectrum for the discrete, linear model is computed directly via the relationship in Lemma 3.2, where the average is taken over 10^5 iterations of the recursive QR algorithm after precomputing the BLVs to convergence. In our simulations, before our analysis, we precompute the BLVs and the FLVs over 10^5 iterations of the recursive QR algorithm for the forward model or, respectively, for the reverse-time adjoint model (see section 3 of [36]). We note that the Lyapunov spectrum computed for the discrete, linear model in our simulations is related to the spectrum of the nonlinear L96 model by rescaling the linear model's exponents by 1/δ.
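The recursive QR algorithm referenced here can be sketched as follows, with a sign-fixed QR factorization so that the recursion is well defined; `qr_positive` and `lyapunov_spectrum` are our names, and the division by δ matches the rescaling remark above:

```python
import numpy as np

def qr_positive(A):
    """QR factorization with the convention diag(R) > 0."""
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s, (R.T * s).T

def lyapunov_spectrum(Ms, delta):
    """Recursive QR: push an orthonormal basis B through each propagator M_k,
    re-orthonormalize, and average log diag(T_k). With the positive-diagonal
    convention, B_k converges to the backward Lyapunov vectors (BLVs)."""
    n = Ms[0].shape[0]
    B = np.eye(n)
    log_growth = np.zeros(n)
    for M in Ms:
        B, T = qr_positive(M @ B)
        log_growth += np.log(np.diag(T))
    # Divide by delta to express continuous-time exponents.
    return log_growth / (len(Ms) * delta), B
```

On a constant upper-triangular propagator the recursion is exact, which gives a cheap sanity check of an implementation.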

Variability of recurrent perturbations.
While Corollary 3 guarantees that the uncertainty in the stable BLVs is uniformly bounded, this bound strongly reflects the scale of the model error and the local variance of the Lyapunov exponents. If model errors are large, or the stable Lyapunov exponents have high variance, the uniform bound can be impractically large for forecasting. Assume no observational or filtering constraint, i.e., H_k X_k ≡ 0, and suppose that the model error statistics are uniform in time and spatially uncorrelated with respect to a basis of BLVs: Q_k = B_k D B_k^T, where D ∈ R^{n×n} is a fixed diagonal matrix with the ij-th diagonal entry given by D_{ij}. Denote P_0 ≡ Q_0; then (2.20) becomes (5.4). Assuming the errors are uncorrelated in the basis of BLVs is a strict assumption, but studying the free evolution of perturbations has general applicability: B_k^T Q_k B_k ≤ q_sup I_n, and therefore (5.4) may be interpreted in terms of an upper bound on the variance of the freely evolved forecast uncertainty in the ij-th mode. Algorithm 5.1 describes our recursive approximation of the free evolution, given by (5.4), for k ∈ {1, …, m} via the QR algorithm. We assume that the QR algorithm has been run to numerical convergence for the BLVs at time 0. The approximation of (5.4) with Algorithm 5.1 is numerically stable for all k and any i > n_0, precisely due to the upper triangular dynamics in the BLVs. The upper triangularity of each T_k means the lower-right block of T_{k:l} is given as the product of the lower-right blocks of the sequence of matrices {T_j}_{j=l+1}^{k}. Therefore, computing the stable block of T_{k:l} is independent of the unstable exponents, and the row norms of T_{k:l} converge uniformly and exponentially to zero by Hypothesis 3.

Algorithm 5.1. Free evolution of perturbations in the ij-th BLV.
Define B_0 to be the BLVs at time zero and m ≥ 1.
for k = 1, …, m do
    Let T_k, B_k be defined by the QR recursion (3.10), and let T^s_k ∈ R^{s×s} be the lower-right submatrix of T_k corresponding to the stable exponents.
    Set Ψ^{ij}_k = 1 and T = I_s.
    for each i = n_0 + 1, …, p and j = 1, …, κ_i do
        …
    end for
    return Ψ^{ij}_k
end for
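Under the assumption Q_k = B_k D B_k^T with diagonal D, the stable-block accumulation that Algorithm 5.1 approximates can be sketched by the recursion S_k = T^s_k S_{k−1} (T^s_k)^T + D^s with S_0 = D^s, whose i-th diagonal entry tracks the freely evolved variance in the i-th stable mode. This is our reading of (5.4), not the authors' implementation:

```python
import numpy as np

def free_evolution_variance(stable_Ts, D_s):
    """Accumulate the freely evolved covariance restricted to the stable BLV
    modes via S_k = T_k^s S_{k-1} (T_k^s)^T + D^s, with S_0 = D^s.
    Returns the diagonal (variance per stable mode) at every step."""
    S = D_s.copy()
    history = [np.diag(S).copy()]
    for T in stable_Ts:
        S = T @ S @ T.T + D_s  # dissipate old error, inject new model error
        history.append(np.diag(S).copy())
    return np.array(history)
```

For uniformly contracting stable blocks the diagonal converges to a finite limit, mirroring the uniform bound of Corollary 3; e.g., with T^s = 0.5 I and D^s = I the limiting variance is 1/(1 − 0.25).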
In Figure 5.1 we plot Ψ^5_k and Ψ^6_k, as in Algorithm 5.1, and the LLEs for B^5_k and B^6_k, for k ∈ {1, …, 10^4}. Assuming that Q_k ≤ q_sup I_n, Ψ^i_k bounds the variance in the i-th stable mode at time k, up to the scaling factor of q_sup. As n_0 = 4, the exponent λ_5 is the stable exponent closest to zero.

Unfiltered versus filtered uncertainty.
In the following, we compare the variance of the unfiltered error in the stable BLVs, represented by Ψ^i_k for i ∈ {5, …, 10}, with the uncertainty in the Kalman filter. Assuming that Q_k = I_n, Ψ^i_k is equal to the variance of the unfiltered error along B^i_k. While the error in the Kalman filter depends on the observational configuration, the value of Ψ^i_k depends only on the underlying dynamics. We therefore benchmark the variance of the unfiltered error over a range of observational designs to determine under what conditions the unfiltered error in the stable BLVs exceeds the uncertainty of the full-rank Kalman filter. This analysis allows us to evaluate how many of the stable BLVs can remain unfiltered while achieving acceptable forecast performance. This comparison has a special significance when considering reduced-rank, suboptimal filters, which are the subject of the sequel [26].
The recent works of Bocquet et al. [8] and Frank and Zhuk [23] weaken Hypothesis 1 to criteria on the observability, or detectability, of the unstable-neutral subspace to obtain filter stability and boundedness in perfect models. The results in Corollary 1, Corollary 2, and Corollary 3 similarly suggest that the sufficient condition for filter boundedness, Hypothesis 1, may be weakened in the presence of model errors. For this reason, we will study the variance of the filtered error with respect to observations satisfying the criteria discussed by Bocquet et al. [8] and Frank and Zhuk [23].
Given a fixed dimension of the observational space d < n, consider finding an observational operator H_k which minimizes the forecast uncertainty, and consider the singular value decomposition of an arbitrary choice of H_k. For a given observation error covariance matrix, the sizes of the singular values of H_k correspond to the precision of the observations relative to the uncertainty, in the precision matrix Ω_k = H_k^T R_k^{−1} H_k. If we impose that all singular values of H_k equal one, then up to an orthogonal transformation of R_k^{−1}, we equate the choice of an observational operator H_k with the selection of a matrix V_k ∈ R^{n×d} with orthonormal columns.
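This construction can be sketched in a few lines; the choice R_k = I_d below is illustrative, and with it the precision matrix Ω_k = H_k^T R_k^{−1} H_k reduces to the orthogonal projector V_k V_k^T onto the observed subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 4

# Observation operator with all unit singular values: take the first d
# columns of a random orthogonal matrix as V, and set H = V^T.
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
V = V[:, :d]
H = V.T

# Precision matrix Omega = H^T R^{-1} H; with R = I_d this is the
# orthogonal projector V V^T onto the span of the observed directions.
R = np.eye(d)
Omega = H.T @ np.linalg.solve(R, H)
```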
For perfect models, Q_k ≡ 0, we write the forecast-error Riccati equation in terms of a choice of H_k = V_k^T. The Frobenius norm, ‖P_{k+1}‖_F = √(tr(P²_{k+1})), is bounded by an expression that is minimized when the projection coefficients of the forecast uncertainty onto the observed subspace are as large as possible, similar to maximizing the denominator of (4.33).
For a fixed sequence of observation error covariances, finding the largest eigenvalues of (5.9) can be studied by finding the subspace for which the matrix of orthogonal projection coefficients, V_k^T X_k, has the largest singular values. In perfect models, the forecast error covariance for the Kalman filter asymptotically has support confined to the span of the unstable and neutral BLVs [27,8]. This is likewise evidenced for the ensemble Kalman filter in weakly nonlinear models [42,7], suggesting that the columns of V_k should be taken as the leading d BLVs. Definition 13 is a formalization of the AUS observational paradigm [53,15], utilizing "bred vectors" as proxies for the BLVs. The breeding method of Toth and Kalnay [50] simulates how the modes of fast-growing error are maintained and propagated through the successive use of short-range forecasts in weather prediction. The bred vectors are formed by initializing small perturbations of a control trajectory and forecasting these in parallel along the control. Upon iteration, the span of these perturbations generically converges to the leading BLVs. For a discussion of variants of this algorithm, and of the convergence to the BLVs, see, e.g., Balci et al. [3].
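As a toy illustration of why observing the expanding directions matters, the following sketch iterates the forecast-error Riccati recursion P_{k+1} = M(P_k − P_k H^T(H P_k H^T + R)^{−1} H P_k)M^T + Q for a fixed diagonal model with one unstable and one stable mode, observing only the unstable direction. The model and numerical values are illustrative assumptions of ours, not taken from the experiments:

```python
import numpy as np

def riccati_step(P, M, H, R, Q):
    """One forecast-error Riccati update:
    P+ = M (P - P H^T (H P H^T + R)^{-1} H P) M^T + Q."""
    S = H @ P @ H.T + R
    Pa = P - P @ H.T @ np.linalg.solve(S, H @ P)  # analysis covariance
    return M @ Pa @ M.T + Q

# Toy autonomous model: one unstable (1.5) and one stable (0.5) direction.
M = np.diag([1.5, 0.5])
Q = 0.1 * np.eye(2)
P = np.eye(2)

# Observe only the unstable direction, the leading "BLV" of this fixed M.
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
for _ in range(500):
    P = riccati_step(P, M, H, R, Q)
```

The unobserved stable variance settles at q/(1 − m²) = 0.1/0.75, dissipated by the stable dynamics alone, while the observed unstable variance remains bounded by the analysis update, a linear-model caricature of Corollary 3.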
The choice of observation operator in Definition 13 is also related to the numerical study of targeted observations for the L96 model of Law et al. [37]. Note that the observation operator H^{b4}_k uniformly completely observes the span of the unstable and neutral BLVs, and thus for d ≥ 4, H^{bd}_k satisfies the sufficient criterion for filter stability in perfect dynamics discussed by Bocquet et al. [8]. The operator H^{b4}_k likewise satisfies the necessary and sufficient detectability criterion for filter stability in perfect dynamics of Frank and Zhuk [23]. On the other hand, the operator H^{fd}_k observes the span of the leading d FLVs. Unlike the BLVs, the FLVs define a QL decomposition of the span of the CLVs (see equation (53) of [36]). This implies that the columns of the operator H^{f4}_k actually span the orthogonal complement to the stable Oseledec spaces. Therefore, H^{f4}_k satisfies the criterion of Frank and Zhuk [23] but will not generally satisfy the condition of Bocquet et al. [8].
We perform parallel experiments, fixing the sequence of linear propagators M_k and the initial prior error covariance P_0 = I_n, while varying the choice of the observation operator and the observational dimension d. In each parallel experiment, we study the average forecast uncertainty for the full-rank Kalman filter, as described by the Frobenius norm of the forecast error covariance P_k, averaged over 10^5 assimilations, neglecting a separate filter stabilization period of 10^4 assimilations. For each d ∈ {4, …, 9}, we compare the following choices of observation operators: (i) H^{bd}_k; (ii) H^{fd}_k; (iii) H_k = V_k^T for randomly drawn orthogonal matrices V_k ∈ R^{n×d}; and (iv) a fixed network of observations, given by the leading d rows of the identity matrix, i.e., H_k = I_{d×n}. We also compute the average Frobenius norm of the forecast error covariance for full-dimensional observations, with H_k = I_n. In each experiment, the observational and model error covariances are fixed as R_k = I_d and Q_k = I_n. For each i, the value of Ψ^i_k is averaged over the 10^5 assimilations. The results are shown in Figure 5.2.

Our results have strong implications for the necessary rank of the gain in ensemble-based Kalman filters. In perfect, weakly nonlinear models, the ensemble span typically aligns with the leading BLVs [42,7]. From the above results, we conclude that the effective rank of the ensemble-based gain must be increased to account for weakly stable BLVs of high variance in the presence of model errors. The perturbations of model errors excited by transient instabilities in these modes can lead to the unfiltered errors becoming unacceptably large compared to the filtered errors. Likewise, H^{bd}_k yields a slight reduction in the overall forecast error over a choice of d random observations. As the span of the leading n_0 FLVs is orthogonal to the trailing, stable Oseledec spaces, this choice can be considered closer to the minimum necessary observational constraint on the forecast errors.
In particular, the kernel of H^{fn_0}_k is identically equal to the sum of the stable Oseledec spaces. This suggests that a necessary and sufficient condition for filter boundedness can be described in terms of the observability of the n_0 leading FLVs, similar to the criterion of Frank and Zhuk [23]. While it is not necessary, the sufficient condition of Bocquet et al. [8] leads to lower filter uncertainty, as the span of the leading n_0 BLVs generally contains the largest projection of the forecast error. This suggests that observing the leading eigenvectors of M_{k+1} M^T_{k+1} may generally outperform observing the leading eigenvectors of M^T_{k+1} M_{k+1} when the time between observations, δ = t_{k+1} − t_k, leads to significant differences in the linear expansions, as was noted as an alternative design by Law et al. [37]. For operational forecasting, this supports the use of the breeding technique [50] to target observations over using the axes of forward growth.

6. Conclusion. This work formalizes the relationship between the Kalman filter uncertainty and the underlying model dynamics, so far understood in perfect models, now in the presence of model error. Generically, model error prevents the collapse of the covariance onto the unstable-neutral subspace, and our Proposition 1 and Proposition 2 characterize the asymptotic window of uncertainty. We provide a necessary condition for the boundedness of the Kalman filter forecast errors for autonomous and time-varying dynamics in Corollary 1 and Corollary 2: the observational precision, relative to the background uncertainty, must be greater than the leading instability which forces the model error. In particular, Corollary 3 proves that forecast errors in the span of the stable BLVs remain uniformly bounded, in the absence of filtering, by the effect of dynamic dissipation alone.
The uniform bound on the errors in the span of the stable BLVs extends the intuition of AUS to the presence of model error, but with qualifications. Studying this uniform bound with Algorithm 5.1, we identify an important mechanism for the growth of forecast uncertainty in suboptimal filters: variability in the LLEs for asymptotically stable modes can produce transient instabilities, amplifying perturbations of model error. The impact of stable modes close to zero differs from similar results for nonlinear, perfect models by Ng et al. [42] and Bocquet, Raanes, and Hannart [9], where the authors demonstrate the need to correct weakly stable modes in the ensemble Kalman filter due to the sampling error induced by nonlinearity.
Likewise, this differs from the EKF-AUS-NL of Palatella and Trevisan [45] that accounts for the truncation errors in the estimate of the forecast uncertainty in perfect, nonlinear models. Our work instead establishes the fundamental impact of these transient instabilities as a linear effect in the presence of model errors.
In addition to our necessary criterion for filter boundedness, in subsection 5.3 we discuss the criteria of Bocquet et al. [8] and Frank and Zhuk [23] in relation to dynamically targeted observations. Our numerical results suggest how these sufficient, and, respectively, necessary and sufficient, criteria can be extended to the presence of model errors. Moreover, we distinguish between the minimal necessary observational constraints for filter boundedness and more operationally effective, sufficient designs. In particular, our results suggest that while it may be necessary that the observations uniformly completely observe the span of the unstable-neutral FLVs, it is sufficient, and improves performance, to uniformly completely observe the span of the unstable-neutral BLVs. In terms of operational forecasting, this strongly supports the use of bred vectors to target observations to constrain the forecast errors.
Corollaries 1, 2, and 3 and the results of section 5 suggest that, as a theoretical framework for the ensemble Kalman filter, AUS may be extended to the presence of model errors. By uniformly completely observing and correcting for the growth of uncertainty in the span of the unstable, neutral, and some number of stable BLVs, reduced-rank filters in the presence of model errors may obtain satisfactory performance. In practice, one may compute offline the typical uncertainty in the stable BLVs via Algorithm 5.1 and determine the necessary observational and ensemble dimension at which unfiltered forecast error has negligible impact on predictions. However, computational limits on ensemble sizes may make this strategy unattainable in practice; the impact of these unfiltered errors on the performance of a reduced-rank, suboptimal filter is the subject of the direct sequel to this work [26].