Error Bounds for Dynamical Spectral Estimation

Dynamical spectral estimation is a well-established numerical approach for estimating eigenvalues and eigenfunctions of the Markov transition operator from trajectory data. Although the approach has been widely applied in biomolecular simulations, its error properties remain poorly understood. Here we analyze the error of a dynamical spectral estimation method called “the variational approach to conformational dynamics” (VAC). We bound the approximation error and estimation error for VAC estimates. Our analysis establishes VAC’s convergence properties and suggests new strategies for tuning VAC to improve accuracy.


1. Introduction. An essential goal in simulation studies is to identify functions that decorrelate slowly in time. Since the values of these functions can be forecast far into the future, they are used for dimensionality reduction and prediction. Moreover, slowly decorrelating functions describe many scientifically significant processes. For example, in biomolecular systems, large-scale arrangements that control biological activity decorrelate slowly, compared to quickly fluctuating bond lengths and angles.
Dynamical spectral estimation is a numerical method that identifies slowly decorrelating functions by estimating the eigenfunctions and eigenvalues of the Markov transition operator of a system. Under appropriate assumptions, a small number of eigenfunctions span the most slowly decorrelating functions of the system, and the associated eigenvalues determine the slowest decorrelation rates. Dynamical spectral estimation uses simulated trajectories to estimate these quantities of interest.
Despite the wide acceptance of dynamical spectral estimation, estimated eigenfunctions and eigenvalues can have substantial error [47], and the cause of this error is not yet fully understood. Our goal here is to identify and bound the major error sources, thereby identifying opportunities where dynamical spectral estimation can produce accurate results.
Dynamical spectral estimation has been used in fields as diverse as biomolecular simulation [36], fluid mechanics [49], and geophysical analysis [10]. The approach goes by many names in the literature, including Markov state models [43], time-lagged independent component analysis [44], Ulam's method [53], dynamic mode decomposition [42], and extended dynamic mode decomposition [56]. The methods are all closely related, so an error analysis for any one of the methods can shed useful light on the others. Here, for concreteness, we focus on a dynamical spectral estimation method that is well-established in chemistry, called "the variational approach to conformational dynamics" (VAC) [30, 5, 29, 14]. VAC can be applied to any Markov process X_t that is ergodic and reversible with respect to a distribution µ. Starting from a data set of simulated trajectories, VAC is applied in two steps. First, the data set is used to estimate expectations C_ij(τ) = E_µ[φ_i(X_0) φ_j(X_τ)] involving a set of basis functions (φ_i)_{1≤i≤n}. Then, the spectral decomposition of the matrix C(0)^{-1} C(τ) is used to estimate eigenvalues and eigenfunctions of the transition operator of X_t.
Our mathematical analysis establishes bounds on VAC's approximation error and estimation error. Approximation error is the error in eigenvalue and eigenfunction estimates if the expectations C_ij(τ) = E_µ[φ_i(X_0) φ_j(X_τ)] are computed perfectly. Estimation error is the additional error incurred in VAC estimates because matrices C(0) and C(τ) are computed imperfectly using a finite data set.
We are not the first authors to mathematically examine VAC's error. Djurdjevac and coauthors [7] bounded the approximation error of VAC eigenvalues. We extend their work by bounding the approximation error for VAC eigenfunctions, which are the chief objects of interest in most applications of dynamical spectral estimation. Additionally, we provide the first analysis of estimation error both for VAC eigenvalues and for eigenfunctions.
Our analysis of VAC also requires proving original error bounds. Standard bounds for the approximation of eigenspaces (e.g., [40, pg. 103] or [19, pg. 990]) depend on the inverse gap between eigenvalues. However, the gap between any two non-trivial eigenvalues of the transition operator vanishes exponentially fast with the lag time parameter τ. Therefore, the standard bounds increase exponentially as τ → ∞. In contrast to this asymptotic scaling, we contribute a sharp new perturbation bound that depends only on the inverse relative gap between eigenvalues. This new bound reaches its minimal value in the large-τ limit, demonstrating the benefit of long lag times for reducing approximation error. In contrast, our asymptotic expressions for the estimation error do depend on the inverse spectral gap and therefore grow in the large-τ limit. Therefore, it is best to select an intermediate lag time.
While there is no single ideal lag time dictated by our analysis, we offer new tools for tuning VAC to reduce the estimation error. One such tool, the VAC condition number, identifies the subspaces of VAC eigenfunctions most sensitive to estimation error. A second diagnostic, the mean squared estimation error, identifies the typical size of the estimation error at different lag times. We provide data-driven formulas for calculating these quantities, enabling VAC users to identify and avoid the most problematic subspaces and lag times. Our experiments confirm that using these diagnostic tools leads to improved accuracy for VAC estimates.
The paper is organized as follows. Background material is given in section 2, theoretical results are in section 3, numerical experiments are in section 4, mathematical derivations are in section 5, and the conclusions follow in section 6.
2. Background. This section presents background material explaining the VAC algorithm and the dynamical quantities VAC approximates.
2.1. VAC. We begin by introducing the steps of VAC when the algorithm is applied to trajectory data from a Markov process X_t with an ergodic, reversible distribution µ. The algorithm starts by estimating expectations involving a set of basis functions (φ_i)_{1≤i≤n}. Subsequently, VAC solves an eigenvalue problem involving matrices of expectations.
Algorithm 2.1 VAC algorithm at lag time τ.
1. From the trajectory data, estimate the matrices Ĉ_ij(0) ≈ E_µ[φ_i(X_0) φ_j(X_0)] and Ĉ_ij(τ) ≈ E_µ[φ_i(X_0) φ_j(X_τ)] for 1 ≤ i, j ≤ n.
2. Solve the generalized eigenvalue problem Ĉ(τ) v̂_i(τ) = λ̂^τ_i Ĉ(0) v̂_i(τ), ordering the eigenvalues from largest to smallest.
Return VAC eigenvalues λ̂^τ_i and VAC eigenfunctions γ̂^τ_i = Σ_j v̂_ij(τ) φ_j.
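The two steps of Algorithm 2.1 can be sketched in a few lines of Python. This is a minimal illustration (not the authors' code), assuming equally spaced samples from a single trajectory and a symmetrized empirical estimator for Ĉ(0) and Ĉ(τ); the generalized symmetric eigensolver then produces the spectral decomposition of C(0)^{-1} C(τ):

```python
import numpy as np
from scipy.linalg import eigh

def vac(traj, basis_fns, lag):
    """Sketch of Algorithm 2.1 (illustrative, not the paper's code).

    traj: 1D array of equally spaced samples X_0, X_1, ...
    basis_fns: list of callables phi_i mapping samples to basis values
    lag: integer lag time tau in units of the sampling interval
    """
    Phi = np.column_stack([f(traj) for f in basis_fns])
    A, B = Phi[:-lag], Phi[lag:]
    # Symmetrized empirical estimates of C(0) and C(tau)
    C0 = (A.T @ A + B.T @ B) / (2 * len(A))
    Ct = (A.T @ B + B.T @ A) / (2 * len(A))
    # Generalized symmetric eigenproblem C(tau) v = lambda C(0) v
    lam, V = eigh(Ct, C0)
    order = np.argsort(lam)[::-1]          # largest eigenvalue first
    return lam[order], V[:, order]
```

For a reversible test process with known eigenfunctions, such as a discretely sampled Ornstein-Uhlenbeck process with the basis {1, x}, the recovered eigenvalues approach 1 (the trivial eigenfunction) and the lag-τ autocorrelation of x.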
In Algorithm 2.1, we are purposefully vague about the exact method for obtaining trajectory data to estimate the matrices Ĉ(0) and Ĉ(τ). One common approach involves simulating long trajectories X_t and removing the start of each trajectory to limit equilibration bias [45]. A second common approach ("importance sampling" [21]) involves simulating short trajectories and addressing bias through an appropriate reweighting procedure [32, 57]. Since there are no restrictions on how the data set is generated, enhanced sampling techniques can be used to generate the trajectory initial conditions or even the trajectories themselves [3, 34].
In addition to collecting a data set, another key design feature affecting VAC is the choice of basis functions. In the mid-1990s, early versions of VAC used the coordinate axes as basis functions [50, 12], a choice that remains common in molecular dynamics simulations [28, 44, 35]. Independently, in the late 1990s and early 2000s, researchers began constructing spectral estimates using "Markov state models" [43, 47, 48], a procedure mathematically equivalent to performing VAC using a basis of indicator functions on a partition of state space. This idea of using a basis of indicator functions can be traced back to a publication by Stanislaw Ulam in 1960 [53, pg. 74-75] and leads to simplifications in the eigenvalue problem in Algorithm 2.1. In the 2010s, it was observed that these schemes shared a common mathematical framework that could be extended to arbitrary basis sets [30]. Subsequent work led to the development of new families of basis functions [33, 55, 2, 31].
The name "variational approach to conformational dynamics" is inspired by the min-max principle for self-adjoint operators [30, 38]. This variational principle demonstrates that the top eigenfunctions η_1, ..., η_k of the transition operator maximize the value of the autocorrelation function at any lag time τ ≥ 0. Thus, when η is a linear combination of the top k eigenfunctions and u is uncorrelated with the top k eigenfunctions, the autocorrelation functions are related by
⟨η, T_τ η⟩ / ⟨η, η⟩ ≥ ⟨u, T_τ u⟩ / ⟨u, u⟩.
Consistent with this variational principle, VAC constructs linear combinations of basis functions that maximize autocorrelations. A recent approach due to [23] and [4] extends the linear fitting procedure in VAC by using artificial neural networks to maximize autocorrelations. However, in the present analysis we focus on the linear VAC algorithm as described in Algorithm 2.1, and we leave analysis of the nonlinear fitting procedure to future work.
To help clarify the relationship between VAC and other algorithms, we observe that the computational steps in Algorithm 2.1 can be used for many purposes. For example, AMUSE [52, 27] uses the same computational procedure as Algorithm 2.1, but the goal is to solve the blind-source separation problem in signal processing. Likewise, dynamic mode decomposition [39] and extended dynamic mode decomposition [56] use the same computational procedure as Algorithm 2.1, but the goal is to analyze non-reversible processes, particularly deterministic fluid flows. While the underlying computations are similar in all these cases, VAC refers specifically to the spectral estimation of time-reversible processes. To learn more about the connections between VAC and other related algorithms, we refer the reader to the helpful review paper by Klus and coauthors [17].

2.2. Spectral theory. In this subsection, we take a closer look at the transition operator of the process X_t and its eigenfunctions. We assume X_t is either a continuous-time Feller process [15] or a discrete-time process restricted to even times t = 0, 2, 4, .... We assume X_t is ergodic and time-reversible with respect to a distribution µ. We use ⟨·, ·⟩ to denote the inner product on the Hilbert space L²(µ), and we set ∥·∥ = ⟨·, ·⟩^{1/2}. Lastly, we use P_U to denote the orthogonal projection [37, pg. 187] onto the closed linear subspace U and P_f to denote the orthogonal projection onto the one-dimensional subspace spanned by the function f. The transition operator [15], also called the Koopman operator, is defined as the conditional expectation operator satisfying
(2.4) T_t f(x) = E[f(X_t) | X_0 = x], for f ∈ L²(µ).
There are three main properties of the transition operator that determine information about its eigenfunctions.
1. The transition operator T_t is self-adjoint in L²(µ). The self-adjointness follows from the time-reversibility condition
(2.5) µ(dx) p_t(x, dy) = µ(dy) p_t(y, dx),
where p_t(x, dy) denotes the transition probabilities for the process X_t. By integrating over equation (2.5), we verify the self-adjointness property
(2.6) ⟨f, T_t g⟩ = ⟨T_t f, g⟩, for f, g ∈ L²(µ).
2. The transition operator satisfies the semigroup property
(2.7) T_{s+t} = T_s T_t, for s, t ≥ 0.
For discrete-time processes, the semigroup property guarantees a decomposition
(2.8) T_t = (T_2)^{t/2}, for t = 0, 2, 4, ....
For continuous-time Feller processes, the decomposition can be extended even further, leading to the formula
(2.9) T_t = e^{tA}, t ≥ 0,
which relates the semigroup T_t to its infinitesimal generator A [15].
3. The transition operator T_t is nonnegative, that is, ⟨f, T_t f⟩ ≥ 0 for all f ∈ L²(µ), for all t ≥ 0 if X_t is a continuous-time process and for t = 0, 2, 4, ... if X_t is a discrete-time process.
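As a concrete illustration of the conditional-expectation definition (2.4), one can evaluate T_t f by Monte Carlo whenever the transition kernel can be sampled. The sketch below (our own illustration, not part of the paper) uses the Ornstein-Uhlenbeck process dX = −X dt + √2 dW, whose transition kernel is the Gaussian N(e^{−t} x₀, 1 − e^{−2t}); since x is an eigenfunction of this process, applying T_t to f(x) = x returns e^{−t} x₀:

```python
import numpy as np

def transition_op(f, x0, t, n_samples=100_000, seed=0):
    """Monte Carlo sketch of (T_t f)(x0) = E[f(X_t) | X_0 = x0] for the
    OU process dX = -X dt + sqrt(2) dW, whose transition kernel is known
    exactly: X_t | X_0 = x0 ~ N(e^{-t} x0, 1 - e^{-2t})."""
    rng = np.random.default_rng(seed)
    xt = np.exp(-t) * x0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * rng.standard_normal(n_samples)
    return f(xt).mean()
```

Applying the operator to the constant function recovers the trivial eigenfunction property T_t 1 = 1 exactly, while f(x) = x decays at the rate e^{−t}.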
Using the spectral theorem for self-adjoint operators [37], we obtain a decomposition of either A or T_2. Then, we extend this decomposition to the transition operator at all lag times t ≥ 0 or t = 0, 2, 4, .... The spectral decomposition takes the form
T_t = ∫_{[0,∞)} e^{−σt} Π(dσ),
where Π(dσ) is a projection-valued measure.
The spectral decomposition completely determines the time correlations of the process X_t. If the spectrum is discrete, then a finite set of orthonormal eigenfunctions is responsible for all the slowest decorrelations of the process. However, if there is an essential spectrum containing σ = 0, then an infinite set of orthonormal functions decorrelate arbitrarily slowly [37, pg. 236].
To avoid the possibility of having an essential spectrum containing σ = 0, it is sufficient to assume T_t is compact. Under compactness, the spectral decomposition takes the form
(2.12) T_t = Σ_{i≥1} e^{−σ_i t} P_{η_i},
where e^{−σ_1 t} > e^{−σ_2 t} ≥ e^{−σ_3 t} ≥ ··· are eigenvalues and η_1, η_2, η_3, ... are the associated eigenfunctions. Since the process is ergodic, e^{−σ_1 t} = 1 is a simple eigenvalue of T_t corresponding to the eigenfunction η_1 = 1. Figure 1 shows additional examples of eigenfunctions for a compact transition operator T_t. While the compactness assumption is enough to facilitate a rigorous analysis of VAC, it can be overly restrictive. In the Monte Carlo literature, there are numerous examples of transition operators that are not compact, such as the transition operator for the Metropolis-Hastings sampler [25, 1]. Therefore, we prefer to use the quasi-compactness assumption, a weaker assumption satisfied by a broader class of processes.

Assumption 2.1 (Quasi-compactness). The spectral decomposition for the transition operator T_t takes the form
(2.13) T_t = Σ_{i=1}^r e^{−σ_i t} P_{η_i} + ∫_{(σ_r, ∞)} e^{−σt} Π(dσ),
where η_1, ..., η_r are orthonormal eigenfunctions with decorrelation rates σ_1 ≤ ··· ≤ σ_r and the remaining spectral measure Π is supported above σ_r.

Figure 1: Eigenfunctions of a compact transition operator, corresponding to a linear diffusion (X_t, Y_t) with drift matrix [−0.4, 0.17; 0.17, −0.2]. Left: typical trajectory of (X_t, Y_t). Upper middle: time series for eigenfunction η_2 with slow decorrelation timescale σ_2^{−1} = 5. Lower middle: time series for eigenfunction η_50 with fast decorrelation timescale σ_50^{−1} = 0.1. Right: spatial structure of η_2 and η_50.
Remark 2.2. In the analysis to follow, an "eigenspace" of T_t denotes the closed linear subspace of eigenfunctions with a given eigenvalue. An "invariant subspace" U is any closed linear subspace satisfying T_t U ⊆ U.
Remark 2.3. There is a common modification of Algorithm 2.1 where the estimated mean μ̂_i ≈ μ_i = E_µ[φ_i(X_0)] is subtracted from each one of the basis functions φ_i before performing VAC (see the discussion in [17]). When the mean is removed, VAC no longer estimates the trivial eigenfunction η_1 = 1 with eigenvalue e^{−σ_1 t} = 1; however, VAC continues to estimate all other eigenvalues and eigenspaces.

2.3. Approximation of eigenspaces. It is colloquially said that VAC approximates eigenvalues and eigenfunctions, but it is more correct to say that VAC approximates eigenvalues and eigenspaces. Recall that λ̂^τ_i and γ̂^τ_i are the VAC eigenvalues and eigenfunctions, while e^{−σ_i τ} and η_i are the true eigenvalues and eigenfunctions of the transition operator. We assume that VAC eigenvalues are arranged from largest to smallest, so that λ̂^τ_1 ≥ λ̂^τ_2 ≥ ··· ≥ λ̂^τ_n. To compare subspaces of eigenfunctions, we use the gap distance and the projection distance
(2.16) d_2(U, W) = ∥(I − P_W) P_U∥_2, d_F(U, W) = ∥(I − P_W) P_U∥_F.
Here, ∥·∥_2 denotes the operator norm and ∥·∥_F denotes the Hilbert-Schmidt norm, also known as the Frobenius norm.
The gap distance and projection distance are very flexible, and definitions (2.16) can be applied even if dim(U) < dim(W) ≤ ∞. In this case, we observe that d_2(U, W) and d_F(U, W) are not technically distances. Rather, d_2(U, W) and d_F(U, W) are properly interpreted as distances between U and the nearest dim(U)-dimensional subspace of W.
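Numerically, both distances in (2.16) can be computed from the singular values of the residual (I − P_W) P_U, which are the sines of the principal angles between the subspaces. A small sketch for finite-dimensional subspaces given by spanning matrices (our own illustration, not code from the paper):

```python
import numpy as np

def subspace_distances(U, W):
    """Gap distance d_2 and projection distance d_F between range(U) and range(W).
    U, W: matrices whose columns span the subspaces (need not be orthonormal)."""
    Qu, _ = np.linalg.qr(U)
    Qw, _ = np.linalg.qr(W)
    # Residual of projecting an orthonormal basis of U onto range(W):
    # its singular values are the sines of the principal angles
    R = Qu - Qw @ (Qw.T @ Qu)
    s = np.linalg.svd(R, compute_uv=False)
    return s.max(), np.sqrt((s ** 2).sum())   # (d_2, d_F)
```

For one-dimensional subspaces the two distances coincide and equal the sine of the angle between the lines, so orthogonal lines are at distance 1 and identical lines at distance 0.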
We end this section by introducing a useful property of the projection distance, which we apply repeatedly in the analysis.

Lemma 2.5. Consider U = span(U_1, U_2), where U_1 and U_2 are orthogonal subspaces, and W = span(W_1, W_2), where W_1 and W_2 are orthogonal subspaces. Then,
(2.17)
(2.18)

3. Theoretical results. To describe the approach taken in the theoretical analysis, we introduce an idealized VAC algorithm where expectations are computed perfectly. Notationally, we distinguish between VAC and idealized VAC by using the ˆ symbol to indicate the quantities calculated using data. For VAC, we write Ĉ_ij(τ), v̂_i(τ), λ̂^τ_i, and γ̂^τ_i. For idealized VAC, we write C_ij(τ), v_i(τ), λ^τ_i, and γ^τ_i. In the theoretical analysis, we use idealized VAC to isolate two different sources of error. We decompose the subspace error into approximation error and estimation error.
Analogously, we decompose the eigenvalue error into approximation error and estimation error.
Approximation error is the difference between idealized VAC estimates and the true eigenvalues and eigenspaces. Estimation error is the difference between VAC estimates and idealized VAC estimates. We first present approximation error bounds in subsection 3.1, and then we present estimation error bounds in subsection 3.2.
Remark 3.1. To illustrate the implications of our error bounds, we use numerical experiments. Figure 2 and Figure 3 demonstrate the error of VAC when applied to the Ornstein-Uhlenbeck process dX = −X dt + √2 dW using a basis of indicator functions. Details on how the figures were generated appear in the supplement.

3.1. Approximation error. In this subsection, we first bound the approximation error by using traditional Rayleigh-Ritz approximation bounds. However, we find that the Rayleigh-Ritz bounds do not provide enough information to show how approximation error depends on the lag time parameter τ. Therefore, we derive improved bounds by using original methods. The improved bounds are asymptotically sharp at long lag times, revealing that long lag times cause the approximation error to stabilize.

3.1.1. Existing approximation bounds are inadequate. The idealized VAC algorithm is equivalent to the Rayleigh-Ritz method in spectral estimation. In the Rayleigh-Ritz method [46], the eigenvalues and eigenspaces of a target operator A are estimated by introducing a subspace of functions U and then calculating the eigenvalues and eigenspaces of P_U A|_U, where A|_U denotes the restriction of A to the subspace U. This is also exactly what is done in idealized VAC. The target operator is the transition operator T_τ, and the subspace of basis functions is Φ = span_{1≤i≤n} φ_i. Moreover, the idealized VAC eigenfunctions γ^τ_i are eigenfunctions of P_Φ T_τ|_Φ with eigenvalues λ^τ_i. The equivalence between the Rayleigh-Ritz method and idealized VAC is known in the VAC literature [41, 7]. However, the implications for VAC's approximation error have not yet been fully explored. Djurdjevac and coauthors [7] applied Rayleigh-Ritz error bounds to analyze idealized VAC eigenvalues. The following theorem takes a step further, by also applying Rayleigh-Ritz error bounds to analyze idealized VAC eigenspaces.

Theorem 3.2 (Approximation bounds). Fix the lag time τ > 0 and the index 1 ≤ k ≤ r, but allow the basis set Φ to vary. In the limit as d_F(span_{1≤i≤k} η_i, Φ) → 0, the idealized VAC estimates converge as follows:
1. The idealized VAC eigenvalues 1, 2, ..., k all converge, λ^τ_i → e^{−σ_i τ}. Moreover, the kth idealized VAC eigenvalue satisfies an explicit bound in terms of d_F(span_{1≤i≤k} η_i, Φ).
2. When there is a gap between {σ_j, ..., σ_k} and all other σ_i values, the subspace span_{j≤i≤k} γ^τ_i of idealized VAC eigenfunctions converges, d_F(span_{j≤i≤k} γ^τ_i, span_{j≤i≤k} η_i) → 0. Moreover, the top k idealized VAC eigenfunctions are bounded by the Rayleigh-Ritz bound (3.6).
Proof. See [18, 19] for the original proofs, or see the derivations in the supplement.
The main takeaway from Theorem 3.2 is that the approximation error converges to zero in the limit as d_F(span_{1≤i≤k} η_i, Φ) → 0. Condition (3.7) implies that the basis set Φ must become very rich, so that eigenfunctions η_i can be closely approximated using linear combinations of basis functions. The Rayleigh-Ritz error bound (3.6) clearly indicates that the eigenspace approximation error must decay with an increasingly rich basis. However, the bound is not sufficiently detailed to show how approximation error depends on the lag time τ. As seen in Figure 2, the Rayleigh-Ritz bound (3.6) diverges as the lag time increases, implying only that approximation error can grow arbitrarily large. In contrast to this upper bound, however, experiments reveal that approximation error decreases and then stabilizes as the lag time tends to infinity. In the next subsection, we derive an improved bound that is asymptotically sharp, describing the exact behavior of the approximation error as τ → ∞.

3.1.2. New approximation bounds. To analyze the dependence on lag time, we develop a mathematical approach different from the methods applied to the Rayleigh-Ritz method in the past. We start by identifying a key stability property of idealized VAC that has not appeared in the previous literature. As τ → ∞, idealized VAC eigenspaces converge to a well-defined limit, implying that the approximation error must stabilize at long lag times.
To rigorously study the convergence of idealized VAC estimates, our first step is to introduce the orthogonalized projection functions q_1, q_2, .... These are the natural functions to appear in the τ → ∞ limit. They are constructed from the projected eigenfunctions P_Φ η_1, P_Φ η_2, ..., but they are adjusted to meet the orthogonality constraints on idealized VAC eigenfunctions.

Definition 3.3. Set p = min{n, r}, where n is the number of basis functions (φ_i)_{1≤i≤n} and r is the number of eigenfunctions (η_i)_{1≤i≤r} identified in Assumption 2.1. Assume that the projections P_Φ η_i are linearly independent for 1 ≤ i ≤ p. Then, define the Gram-Schmidt orthogonalization
q_1 = P_Φ η_1 / ∥P_Φ η_1∥,
q_i = (P_Φ η_i − Σ_{l=1}^{i−1} ⟨q_l, P_Φ η_i⟩ q_l) / ∥P_Φ η_i − Σ_{l=1}^{i−1} ⟨q_l, P_Φ η_i⟩ q_l∥, for 2 ≤ i ≤ p.
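Given samples from µ, the construction in Definition 3.3 can be sketched with empirical inner products: project each η_i onto span(Φ) by least squares, then orthonormalize by Gram-Schmidt, which a QR factorization performs. The function name and sampling conventions below are our own illustration:

```python
import numpy as np

def orthogonalized_projections(Phi, Eta):
    """Sketch of Definition 3.3 with empirical inner products.

    Phi: (T, n) basis function values at samples from mu
    Eta: (T, p) eigenfunction values at the same samples, p <= n
    Returns a (T, p) matrix whose columns are the values of q_1, ..., q_p:
    Gram-Schmidt (in the empirical L^2(mu) inner product) applied to the
    projections P_Phi eta_i."""
    T = len(Phi)
    # Empirical L^2(mu) projection of each eta_i onto span(Phi): least squares
    coef, *_ = np.linalg.lstsq(Phi, Eta, rcond=None)
    P = Phi @ coef
    # QR performs Gram-Schmidt; dividing by sqrt(T) makes the Euclidean
    # inner product match the empirical L^2(mu) inner product
    Q, R = np.linalg.qr(P / np.sqrt(T))
    Q = Q * np.sign(np.diag(R))    # convention: <q_i, P_Phi eta_i> >= 0
    return Q * np.sqrt(T)
```

The returned columns satisfy ⟨q_i, q_j⟩ = δ_ij in the empirical inner product, and q_1 is the normalized projection of η_1, matching the first line of the definition.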
Our next step is to prove that idealized VAC eigenfunctions γ^τ_i converge to the orthogonalized projections q_i at long lag times.
Theorem 3.4 (The τ → ∞ limit). Fix the basis set Φ but allow the lag time τ to vary. In the limit as τ → ∞, idealized VAC estimates converge as follows:
1. When there is a gap between σ_k and all other σ_i values, the kth idealized VAC eigenvalue converges exponentially fast.
2. When there is a gap between {σ_j, ..., σ_k} and all other σ_i values, the subspace span_{j≤i≤k} γ^τ_i of idealized VAC eigenfunctions converges, d_F(span_{j≤i≤k} γ^τ_i, span_{j≤i≤k} q_i) → 0, exponentially fast.
3. When there is a gap between σ_k and all other σ_i values and a gap between σ_{k+1} and all other σ_i values, the top k idealized VAC eigenfunctions satisfy an exponentially decaying error bound.
Proof. See subsection 5.2, subsection 5.3, and subsection 5.4.
The main message of Theorem 3.4 is that idealized VAC eigenspaces converge exponentially fast as τ → ∞. Because of this convergence, the approximation error must stabilize. As the last step of our approximation error analysis, we use the stabilization at long lag times to provide a new, asymptotically sharp bound on VAC's approximation error.

Theorem 3.5. When λ^τ_k > e^{−σ_{k+1} τ}, the top k idealized VAC eigenfunctions satisfy a bound that depends only on the inverse relative gap between eigenvalues and is asymptotically sharp as τ → ∞.
Proof. See subsection 5.3.
Interpreting the results of this subsection, we can identify concrete strategies for how best to reduce approximation error. The approximation error can be divided into two parts:
d_F(span_{j≤i≤k} γ^τ_i, span_{j≤i≤k} η_i) ≤ d_F(span_{j≤i≤k} q_i, span_{j≤i≤k} η_i) + d_F(span_{j≤i≤k} γ^τ_i, span_{j≤i≤k} q_i).
In this decomposition, we separate the lag-time-independent error and the lag-time-dependent error. In applications of VAC, there are separate strategies for reducing these two error sources. To reduce the lag-time-independent error, the best strategy is to enrich the basis set as much as possible. If the basis set is rich enough to approximate the top eigenfunctions η_1, η_2, ..., η_k with high accuracy, then the lag-time-independent error must be low. Assuming there is a gap between {σ_j, ..., σ_k} and all other σ_i values, Lemma 2.5 guarantees an inequality whose right-hand side is controlled by d_F(span_{1≤i≤k} η_i, Φ). As the basis set becomes increasingly rich, the right-hand side of the inequality converges to zero, implying that the lag-time-independent error must vanish. To reduce the lag-time-dependent error, the best strategy is simply to increase the lag time. As τ → ∞, Theorem 3.4 guarantees that the lag-time-dependent error must decay exponentially fast.

3.2. Estimation error. In this subsection, we present formulas for the estimation error and explain how to calculate the mean squared estimation error using data.
3.2.1. Formulas for the estimation error. In applications of VAC, it is not typically possible to evaluate expectations C_ij(τ) = E_µ[φ_i(X_0) φ_j(X_τ)] exactly. Instead, trajectory data is used to provide estimates Ĉ_ij(τ) ≈ C_ij(τ). In the asymptotic limit as Ĉ(τ) → C(τ) and Ĉ(0) → C(0), the estimation error is governed by the following asymptotic formulas.

Theorem 3.6 (Estimation error). Fix the basis set Φ and the lag time τ > 0, but allow matrices Ĉ(0) and Ĉ(τ) to vary. Assume idealized VAC eigenfunctions are normalized so that ⟨γ^τ_i, γ^τ_j⟩ = δ_ij, and recall v_i(τ) is the coefficient vector with γ^τ_i = Σ_j v_ij(τ) φ_j. Then, VAC estimates have the following behavior as Ĉ(τ) → C(τ) and Ĉ(0) → C(0):
1. When there is a gap between λ^τ_k and all other λ^τ_i values, the kth VAC eigenvalue satisfies a first-order perturbation expansion.
2. When there is a gap between λ^τ_j, ..., λ^τ_k and all other λ^τ_i values, the subspace span_{j≤i≤k} γ̂^τ_i of VAC eigenfunctions satisfies a first-order perturbation expansion. Moreover, the condition number for the subspace span_{j≤i≤k} γ̂^τ_i is given by
(3.21) κ = max{(λ^τ_{j−1} − λ^τ_j)^{−1}, (λ^τ_k − λ^τ_{k+1})^{−1}},
with the conventions λ^τ_0 = ∞ and λ^τ_{n+1} = −∞.
Proof. See subsection 5.5.
A useful quantity identified in Theorem 3.6 is the condition number (3.21), which quantifies VAC's sensitivity to small errors in the matrices Ĉ(τ) and Ĉ(0). In experiments, we find the condition number is a useful heuristic for judging whether a VAC estimation problem is easy or hard, more specifically, whether a large or small data set is required for accurate estimation. When the condition number for a subspace of VAC eigenfunctions is higher than 5 at all lag times, the numerical experiments in section 4 show that VAC is prone to experiencing large amounts of estimation error. Empirically, we can estimate the minimum condition number across all lag times by using
(3.22) min_τ max{(λ̂^τ_{j−1} − λ̂^τ_j)^{−1}, (λ̂^τ_k − λ̂^τ_{k+1})^{−1}}.
We recommend that VAC users identify the minimum condition number for various subspaces and focus on estimating the well-conditioned subspaces whenever possible. Additionally, we recommend that authors report the minimum condition number along with their VAC results, helping readers to assess whether the results could be affected by estimation error.
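Computing the minimum condition number from a table of estimated eigenvalues is simple arithmetic. The sketch below assumes the gap-based form of the condition number (3.21), with the stated conventions λ^τ_0 = ∞ and λ^τ_{n+1} = −∞; the function names are our own:

```python
import numpy as np

def vac_condition_number(lams, j, k):
    """Condition number for the subspace spanned by VAC eigenfunctions j..k
    (1-indexed), from one lag time's sorted eigenvalues lams[0] >= lams[1] >= ...
    Uses the conventions lam_0 = +inf and lam_{n+1} = -inf."""
    lam = np.concatenate(([np.inf], np.asarray(lams, dtype=float), [-np.inf]))
    return max(1.0 / (lam[j - 1] - lam[j]), 1.0 / (lam[k] - lam[k + 1]))

def min_condition_number(lams_by_lag, j, k):
    """Minimum condition number across a collection of lag times, as in (3.22)."""
    return min(vac_condition_number(l, j, k) for l in lams_by_lag)
```

For a subspace starting at the top eigenvalue (j = 1), the first term vanishes and the condition number reduces to the inverse spectral gap below the subspace, matching the usage in the numerical experiments.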

3.2.2. Calculating the asymptotic mean squared estimation error using data. Here, we explain how to calculate the mean squared estimation error using trajectory data. We assume for simplicity that the data consists of a single long stationary trajectory of the process X_t. However, the estimation procedure described here could be generalized to other types of trajectory data.
Our approach for calculating the mean squared estimation error is based on the following convergence in distribution result.
Theorem 3.7. Fix the basis set Φ and the lag time τ > 0, but allow the data set used in VAC to vary. Assume E_µ|φ_i(X_0)|⁴ < ∞ for 1 ≤ i ≤ n. Assume a stationary trajectory X_0, X_Δ, X_{2Δ}, ..., X_{T−Δ} is simulated and estimates Ĉ_ij(t) ≈ C_ij(t) are formed by averaging the products φ_i(X_s) φ_j(X_{s+t}) along the trajectory. Then, VAC estimates have the following behavior as T → ∞:
1. When there is a gap between λ^τ_k and all other λ^τ_i values, the kth VAC eigenvalue satisfies a central limit theorem (3.24).
2. When there is a gap between λ^τ_j, ..., λ^τ_k and all other λ^τ_i values, the subspace span_{j≤i≤k} γ̂^τ_i of VAC eigenfunctions satisfies a central limit theorem (3.25).
Here, (Z^τ_lm)_{1≤l,m≤n} is a mean-zero multivariate Gaussian whose variance terms E|Z^τ_lm|² are given by (3.26).
Proof. See subsection 5.6.

The great value of Theorem 3.7 is that it suggests a data-driven approach for calculating the mean squared estimation error in the asymptotic limit. First, we can use the data set to estimate the E|Z^τ_lm|² terms by means of equation (3.26). Second, we can substitute the E|Z^τ_lm|² estimates into equation (3.24) or (3.25) to compute the mean squared estimation error for eigenvalues or invariant subspaces. In the supplement, we provide a step-by-step description of this estimation procedure.
In Figure 3, we calculate the mean squared estimation error by using a single trajectory of data. We find that it is possible to accurately identify the lag times at which the mean squared estimation error exceeds a critical threshold, such as 0.2. Moreover, in the numerical experiments in section 4, 0.2 is a typical level at which the estimation error begins to contribute significantly to VAC's overall error. Therefore, in VAC applications we recommend calculating the mean squared error and avoiding lag times where the error exceeds such a threshold.

We conclude this section by considering three additional strategies to reduce the estimation error of VAC. The first strategy is to increase trajectory length. By increasing the length T of the trajectory, the estimation error consistently decreases at a 1/√T rate, as shown in Figure 3. The second strategy for reducing the estimation error is to prune the size of the basis set. We find in Theorem 3.6 that the squared estimation error increases linearly with the number of basis functions. Therefore, it is best to include only those basis functions that have the potential to overlap with the eigenfunctions of the transition operator.
The final strategy for reducing the estimation error is to select basis functions with favorable integrability properties. In Theorem 3.7, it is seen that the mean squared estimation error depends on the fourth moments of the idealized VAC coordinates. If the basis functions themselves have large kurtosis, this can increase the estimation error in VAC calculations. Favorable integrability properties may be one factor that helps explain the success of Markov state models, in which the basis consists of indicator functions on a partition of the state space. The higher moments of indicator functions are often well-controlled, compared to, e.g., higher-order polynomials of the coordinate axes.
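A quick numerical illustration of this integrability point (our own example, not from the paper): under a standard Gaussian stationary distribution, the fourth-moment ratio E[f⁴]/(E[f²])² equals 1/µ(A) for an indicator function on a set A, but grows rapidly for higher-order polynomials:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # samples from a Gaussian stationary density

def fourth_moment_ratio(f):
    """E[f(X)^4] / (E[f(X)^2])^2, estimated from samples: a rough proxy for
    how heavy-tailed a basis function is under mu."""
    v = f(x)
    return (v ** 4).mean() / (v ** 2).mean() ** 2

indicator_ratio = fourth_moment_ratio(lambda s: (s > 0).astype(float))  # MSM-style basis
cubic_ratio = fourth_moment_ratio(lambda s: s ** 3)                     # polynomial basis
```

For the indicator on {x > 0} the ratio is 2, while for f(x) = x³ the exact ratio is E[x¹²]/E[x⁶]² = 10395/225 = 46.2, more than twenty times larger.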

4. Numerical experiments. In this section, we report on two numerical experiments that illustrate the major factors impacting VAC accuracy. Moreover, these experiments show how computing the VAC condition number and the mean squared estimation error can help to improve VAC's accuracy.

4.1. Varying the basis size and trajectory length. First, we apply VAC to estimate the span of eigenfunctions η_1, η_2, and η_3 for the Ornstein-Uhlenbeck (OU) process dX = −X dt + √2 dW. The two trials differ in the basis size and the trajectory length; in both trials, the basis functions are indicator functions on disjoint intervals.
The two different trials demonstrate that the breakdown of approximation error and estimation error is sensitive to context, as seen in Figure 4. The approximation error is higher in trial 1 because of the smaller set of basis functions, whereas the estimation error is higher in trial 2 because of the smaller data set. In trial 1, it is optimal to use a comparatively long lag time of τ = 0.7 to reduce the approximation error. In contrast, in trial 2 it is optimal to use a comparatively short lag time of τ = 0.1 to avoid the increase in estimation error at longer lag times.

In addition to showing the true error levels, Figure 4 shows the root mean squared estimation error calculated directly from the data. In trial 1, the calculated estimation error remains below 0.2 for lag times up to 1.5. Therefore, the VAC practitioner can infer that the data set is rich enough to take high lag times without experiencing large estimation error. However, in trial 2, the calculated estimation error rises more rapidly, reaching a level of 0.2 when the lag time is just τ ≈ 0.5. In this case, the VAC practitioner can infer that the data set is not rich enough to take τ > 0.5.

An alternative lag time selection strategy called implied timescale analysis has been advocated in the past by VAC researchers [47]. In this strategy, the VAC eigenvalues are used to compute the implied timescales t_i(τ) = −τ / log λ̂^τ_i. If VAC eigenvalues are perfect estimates of the true eigenvalues, then implied timescales are perfectly flat as functions of τ, and they equal σ_i^{−1}. In practice, however, implied timescales are not flat. They increase quickly at short lag times and then increase more slowly at long lag times. To cut down on VAC's error, Swope and coauthors [47] proposed selecting a long enough lag time so that the implied timescales for the eigenfunctions of interest are approximately level.
Figure 5 presents the implied timescales for the OU process. From the figure it is clear that the implied timescales cannot be used to assess the estimation error. The estimation error is much higher in the second trial, yet the implied timescales for trial 1 and trial 2 are similar. However, implied timescales may help assess the approximation error. As the lag time is increased from τ = 0 to τ = 0.1, the second and third implied timescales become much flatter, which provides an accurate indication that the approximation error is decreasing and beginning to settle. We conclude that implied timescale analysis may provide an approach for assessing approximation error that is complementary to our approach for assessing the estimation error. Whereas our approach is useful for identifying and avoiding the error that is prevalent at long lag times, implied timescale analysis may be useful for identifying and avoiding the error that is prevalent at short lag times. However, while our approach for computing the mean squared estimation error is rigorously justified, it remains an open research problem to rigorously justify this proposed relationship between the implied timescales and the approximation error.
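For reference, implied timescales are straightforward to compute from a table of VAC eigenvalues. The sketch below (our own helper, using the standard formula t_i(τ) = −τ / log λ̂^τ_i) excludes the trivial eigenvalue 1, whose logarithm vanishes:

```python
import numpy as np

def implied_timescales(lags, eigvals):
    """Implied timescales t_i(tau) = -tau / log(lambda_i^tau).

    lags: (m,) array of lag times
    eigvals: (m, k) array of VAC eigenvalues at each lag time, excluding
    the trivial eigenvalue 1
    Returns an (m, k) array; for perfect eigenvalues exp(-sigma_i tau),
    every column is flat and equals 1/sigma_i."""
    lags = np.asarray(lags, dtype=float)[:, None]
    return -lags / np.log(np.asarray(eigvals, dtype=float))
```

Flat columns across lag times indicate that the estimated eigenvalues follow the exact semigroup decay e^{−σ_i τ}, which is the diagnostic used in implied timescale analysis.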

4.2. Varying the size of the subspace. In this second experiment, we apply VAC to estimate the eigenfunctions of the diffusion process where the potential U and the diffusion matrix σ are given by: We simulate a stationary trajectory of length T = 500 and then apply VAC using the basis set {1, x_1, x_2, x_1^2, x_1 x_2, x_2^2}. We investigate how the accuracy changes when VAC is used to estimate two subspaces of different sizes: span{η_1, η_2} and span{η_1, η_2, η_3}. When estimating span{η_1, η_2}, there is a wide range of lag times that all lead to low error levels. As seen in Figure 6, the total error decreases between lag times of τ = 0 and τ = 0.2, but it is nearly constant for all lag times between τ = 0.2 and τ = 1.5. In contrast, when estimating span{η_1, η_2, η_3}, the total error is V-shaped with a distinct minimum at the lag time τ = 0.2. The error rises rapidly as the lag time is increased beyond τ = 0.2 due to an upsurge in the estimation error. What explains the different error profiles when estimating the subspace span{η_1, η_2} versus span{η_1, η_2, η_3}? The explanation is not a difference in the data set or the basis set, since these factors remain the same when estimating the two subspaces. Rather, the increase in estimation error is due to the much higher condition number for the subspace span{η_1, η_2, η_3}. No matter how the lag time is selected, the inverse spectral gap (λ_4^τ − λ_3^τ)^{−1} is at least as high as 9.5. In contrast, when estimating the subspace span{η_1, η_2}, the minimum condition number min_τ (λ_3^τ − λ_2^τ)^{−1} is just 2.0. Here we see that a high condition number is associated with increased levels of estimation error and a stronger relationship between estimation error and lag time.
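The VAC computation underlying these experiments reduces to a generalized symmetric eigenproblem C(τ)v = λC(0)v over the chosen basis. The following is our own self-contained sketch of that core step, not the authors' code; the monomial basis and the OU test process in the example are illustrative choices, and differ from the basis sets used in the paper's experiments:

```python
import numpy as np

def vac(traj_basis, lag):
    """Generalized-eigenproblem core of VAC (a sketch).

    traj_basis: shape (T, n), entry [t, i] = phi_i(X_t) along a trajectory.
    lag:        lag time measured in timesteps.
    Returns eigenvalues (descending) and basis-expansion coefficients.
    """
    A, B = traj_basis[:-lag], traj_basis[lag:]
    m = A.shape[0]
    C0 = (A.T @ A + B.T @ B) / (2 * m)       # C(0), symmetrized in time
    Ct = (A.T @ B + B.T @ A) / (2 * m)       # C(tau), symmetrized in time
    # Reduce C(tau) v = lambda C(0) v to a standard symmetric eigenproblem
    # via the Cholesky factor of C(0).
    L = np.linalg.cholesky(C0)
    M = np.linalg.solve(L, np.linalg.solve(L, Ct).T)   # L^{-1} C(tau) L^{-T}
    evals, W = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]
    evecs = np.linalg.solve(L.T, W)          # map back to basis coefficients
    return evals[order], evecs[:, order]

# Example: exact discrete-time simulation of dX = -X dt + sqrt(2) dW,
# with the illustrative monomial basis {1, x, x^2}.
rng = np.random.default_rng(0)
dt, T = 0.01, 200_000
x = np.empty(T)
x[0] = rng.standard_normal()
decay, noise = np.exp(-dt), np.sqrt(1 - np.exp(-2 * dt))
for t in range(1, T):
    x[t] = decay * x[t - 1] + noise * rng.standard_normal()
basis = np.stack([np.ones_like(x), x, x**2], axis=1)
evals, _ = vac(basis, lag=10)                # lag time tau = 0.1
```

With the time-symmetrized estimators, the constant basis function yields the exact top eigenvalue 1, and the remaining eigenvalues approximate e^{−τ} and e^{−2τ} up to sampling error.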
To avoid situations where the estimation error is uncontrollably high, VAC users should identify well-conditioned subspaces and focus on estimating these subspaces whenever possible. As shown in the VAC eigenvalue plot in Figure 7, eigenvalues for well-conditioned subspaces often visually stand apart from the rest of the eigenvalues. The large gap between the second and third VAC eigenvalues indicates a natural separation of timescales, which implies that span{η_1, η_2} is a well-conditioned subspace.

Applying the spectral decomposition (2.13), we find that each matrix entry C_ij(τ) has an exponentially decaying structure:

(5.1) C_ij(τ) = Σ_{l≥1} e^{−σ_l τ} ⟨η_l, φ_i⟩ ⟨η_l, φ_j⟩.

Thus, the matrix C(τ) is the sum of exponentially decaying rank-one matrices ⟨η_l, φ⟩⟨η_l, φ⟩^T, where we have used the shorthand ⟨η_l, φ⟩ to denote the vector with entries ⟨η_l, φ_i⟩.
To approximate the behavior of idealized VAC at long lag times, we remove the smallest terms in the expansion (5.2) and replace C(τ) by the sum of k rank-one matrices. When this rank-k approximation is used in place of C(τ), the top k idealized VAC eigenfunctions γ_1^τ, . . ., γ_k^τ span the subspace

(5.4) span{q_1, . . ., q_k}.

Therefore, the truncation argument helps intuitively explain the convergence of the idealized eigenfunctions γ_i^τ to the orthogonalized projections q_i as τ → ∞. Our proofs in subsection 5.2, subsection 5.3, and subsection 5.4 essentially serve to justify the truncation argument and to provide rigorous bounds on the speed of convergence.
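The truncation argument can be displayed compactly. The following is our rendering, consistent with the shorthand ⟨η_l, φ⟩ defined above; constants in the remainder term are suppressed:

```latex
% Spectral expansion of the idealized correlation matrix (cf. (5.1)-(5.2)):
C(\tau) \;=\; \sum_{l \ge 1} e^{-\sigma_l \tau}\,
    \langle \eta_l, \varphi \rangle \langle \eta_l, \varphi \rangle^{\mathsf T},
\qquad
\langle \eta_l, \varphi \rangle := \bigl( \langle \eta_l, \varphi_i \rangle \bigr)_{1 \le i \le n}.
% Keeping the k slowest terms leaves a remainder of size O(e^{-\sigma_{k+1}\tau}):
C(\tau) \;=\; \underbrace{\sum_{l=1}^{k} e^{-\sigma_l \tau}\,
    \langle \eta_l, \varphi \rangle \langle \eta_l, \varphi \rangle^{\mathsf T}}_{\text{rank-}k\ \text{part}}
 \;+\; O\!\bigl(e^{-\sigma_{k+1}\tau}\bigr).
```

As τ grows, the rank-k part dominates, which is why the top k idealized eigenfunctions align with the span of the projected eigenfunctions.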

5.2. Convergence of eigenvalues.
In this section, we verify the statement in Theorem 3.4 that the kth idealized eigenvalue converges (3.12) in the limit τ → ∞, provided there is a gap between σ_k and all other σ_i values. To prove this result, our main tool is the min-max principle for self-adjoint operators [38].
Proof. It suffices to consider the case ‖q‖ = 1. Then, q can be decomposed as q = a q′ + b q_k, where a^2 + b^2 = 1, q′ ∈ Q_{1:k−1}, and ‖q′‖ = 1. It follows that ⟨q, T^τ q⟩ is bounded from below by the lowest eigenvalue of the matrix in (5.15). For any symmetric real-valued matrix M = (a b; b c) with a > c, the lowest eigenvalue is at least as large as c − b^2/(a − c) [24]. We can check that (5.16) holds. Therefore, the lowest eigenvalue of the matrix M is at least as large as (5.17).

Proof of (3.12). Using the min-max principle and Proposition 5.2, (5.18) as τ → ∞. Using the min-max principle and Proposition 5.
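The 2 × 2 eigenvalue fact quoted from [24] can be verified directly; a short derivation (our rendering):

```latex
% Lowest eigenvalue of a symmetric 2x2 matrix with a > c:
\lambda_{\min}(M) \;=\; \frac{a+c}{2} - \sqrt{\Bigl(\frac{a-c}{2}\Bigr)^{2} + b^{2}},
\qquad M = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.
% Apply \sqrt{u^2 + v^2} \le u + v^2/(2u) for u > 0, with u = (a-c)/2, v = b:
\lambda_{\min}(M) \;\ge\; \frac{a+c}{2} - \frac{a-c}{2} - \frac{b^{2}}{a-c}
 \;=\; c - \frac{b^{2}}{a-c}.
```

The elementary inequality used in the second step follows by squaring both sides.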

5.3. Convergence of invariant subspaces.
In this section, we verify the statement in Theorem 3.4 that the subspace span{γ_i^τ : j ≤ i ≤ k} of idealized VAC eigenfunctions converges (3.13) span{γ_i^τ : j ≤ i ≤ k} → span{q_i : j ≤ i ≤ k} in the limit τ → ∞, provided there is a gap between {σ_j, . . ., σ_k} and all other σ_i values. To prove this result, our main tool is a well-known lemma due to Davis and Kahan [6].
Lemma 5.4. Suppose A and B are self-adjoint operators and U and W are closed subspaces. If the spectrum of P_U A|_U lies in the interval [a, b] and the spectrum of

The Davis and Kahan lemma leads to the following error bound.
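For context, a textbook form of the Davis–Kahan theorem reads as follows; this standard statement is supplied for the reader and may differ in constants and hypotheses from the paper's Lemma 5.4:

```latex
% Textbook Davis--Kahan bound: let A, B be self-adjoint, let P_A be the
% spectral projector of A for spectrum in [a, b], and let P_B be the spectral
% projector of B for spectrum outside (a - \delta, b + \delta). Then
\| P_A \, P_B \| \;\le\; \frac{\| A - B \|}{\delta},
% so the sines of the principal angles between the corresponding invariant
% subspaces are controlled by the perturbation size relative to the gap.
```

The quantity δ plays the role of the spectral gap in the bounds that follow.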
Proposition 5.5. When λ_k^τ > e^{−σ_{k+1} τ}, the distance between the subspaces Γ_{1:k}^τ = span{γ_i^τ : 1 ≤ i ≤ k} and Q_{1:k} = span{q_i : 1 ≤ i ≤ k} is bounded by

Proof. The spectrum of P_{Φ∩Q_{1:k}^⊥}, where we have used the fact that Γ_{1:k}^τ is an invariant subspace of P_Φ T^τ P_Φ. Next, we introduce the subspace H_{1:k} = span{η_i : 1 ≤ i ≤ k}, which is orthogonal to Φ ∩ Q_{1:k}^⊥. Then, (5.26) holds. To complete the proof, it is enough to show (5.27). To prove equation (5.27), we apply a useful property of the Frobenius norm: for bounded linear operators A and B, if ‖Au‖ ≤ ‖Bu‖ for all u, then ‖A‖_F ≤ ‖B‖_F [13]. Using this property, it is sufficient to prove (5.28). Moreover, it is sufficient to prove that. Using the polarization identity and the fact that H_{1:k}^⊥ is an invariant subspace of T^τ, we conclude.

Proof of Theorem 3.5. When λ_k^τ > e^{−σ_{k+1} τ}, Proposition 5.5 allows us to calculate (5.37).

Proof of (3.13). When there is a gap between {σ_j, . . ., σ_k} and all other σ_i values, Proposition 5.5 shows that span{γ_i^τ : 1 ≤ i ≤ j − 1} → span{q_i : 1 ≤ i ≤ j − 1} and span{γ_i^τ : 1 ≤ i ≤ k} → span{q_i : 1 ≤ i ≤ k} in the limit τ → ∞. By applying Lemma 2.5, we verify span{γ_i^τ : j ≤ i ≤ k} → span{q_i : j ≤ i ≤ k}.

5.4. Exponential speed of convergence. In this section, we verify the last statement in Theorem 3.4 that the top k idealized VAC eigenfunctions satisfy (3.14) in the limit τ → ∞, provided there is a gap between σ_k and all other σ_i values and a gap between σ_{k+1} and all other σ_i values.
Proof of (3.14). If there are fewer orthogonalized projection functions (q_i)_{1≤i≤p} than basis functions (φ_i)_{1≤i≤n}, select additional functions (q_i)_{p+1≤i≤n} so that (q_i)_{1≤i≤n} is a complete orthonormal basis for Φ. Then, each γ_j^τ can be expanded in this basis, and the terms ⟨q_i, γ_j^τ⟩ are determined by the eigenvalue equation

(5.42) λ_j^τ ⟨q_i, γ_j^τ⟩ = ⟨q_i, T^τ γ_j^τ⟩ = Σ_{l=1}^n ⟨q_i, T^τ q_l⟩ ⟨q_l, γ_j^τ⟩.
Proof of Theorem 3.6. First, define matrices. Due to the normalization δ_ij = ⟨γ_i^τ, γ_j^τ⟩, we must have. Therefore, the idealized VAC eigenfunctions γ_i^τ and eigenvalues λ_i^τ are the eigenfunctions and eigenvalues of the multiplication operator (5.51). In contrast, the VAC eigenfunctions γ̂_i^τ and eigenvalues λ̂_i^τ are eigenfunctions and eigenvalues of (5.52). As Ĉ(0) → C(0) and Ĉ(τ) → C(τ), we can calculate, where we have made repeated use of the identities (5.50). Multiplying on the left by V(τ)^T and on the right by V(τ), we find that the VAC eigenspaces are unitarily equivalent to the eigenspaces of the matrix operator (5.57), and the two operators share the same eigenvalues. Theorem 3.6 then follows by applying first-order perturbation bounds [16] for the eigenvalues and invariant subspaces of a diagonal matrix Λ(τ) that is perturbed by a matrix L(τ).
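The final perturbation step can be illustrated numerically: for a diagonal matrix perturbed by a small symmetric matrix, the eigenvalues shift by the diagonal of the perturbation to first order. A sketch with generic matrices (not the paper's Λ(τ) and L(τ)):

```python
import numpy as np

# First-order perturbation: for diagonal Lambda perturbed by a small symmetric
# L, the eigenvalues are Lambda_ii + L_ii + O(||L||^2), and the invariant
# subspaces rotate by O(||L|| / gap).  A numerical check:
rng = np.random.default_rng(1)
Lam = np.diag([1.0, 0.6, 0.3])                # eigenvalue gaps of 0.4 and 0.3
eps = 1e-5
L = rng.standard_normal((3, 3))
L = eps * (L + L.T) / 2                       # small symmetric perturbation
mu = np.linalg.eigvalsh(Lam + L)[::-1]        # exact eigenvalues, descending
first_order = np.diag(Lam) + np.diag(L)       # first-order prediction
err = np.abs(mu - first_order).max()          # should be O(eps^2 / gap)
```

Shrinking `eps` by a factor of 10 shrinks `err` by roughly a factor of 100, consistent with the second-order remainder in the perturbation bound.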
5.6. Distributional formulas for the estimation error. In this section, we verify the distributional formulas for the estimation error that are given in Theorem 3.7.
Proof of Theorem 3.7. We fix the lag time t ≥ 0 and the indices 1 ≤ i, j ≤ n, but allow the total trajectory length T to vary. Then, we write. As T → ∞, we will proceed to show that √T (Ĉ_ij(t) − C_ij(t)) converges to an asymptotic normal distribution.
By assumption, X_t is started from the stationary distribution X_0 ∼ µ, so the random variables (φ_ij(X_{s∆}, X_{s∆+τ}))_{s=0,1,...} are strictly stationary [26, pg. 230-231] with mean C_ij(t). Moreover, the conditional expectations satisfy the bound (5.59) for a constant C < ∞ that is independent of s. Condition (5.59) is an asymptotic negligibility condition that guarantees the validity of the central limit theorem for φ_ij(X_{s∆}, X_{s∆+τ}). Using the central limit theorem in [11, ch. 5], we prove (5.60). For simplicity, we have considered the asymptotic distribution of √T (Ĉ_ij(t) − C_ij(t)). However, by the same approach we can prove the asymptotic normality of any linear combination of the random variables √T (Ĉ_ij(t) − C_ij(t)) involving different values of i, j, and t. By the Cramér–Wold theorem [15], therefore, the array (5.61) converges to a mean-zero multivariate normal distribution.
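The asymptotic variance appearing in such central limit theorems is what Algorithm 7.1 later estimates by summing empirical autocovariances up to a truncation lag. A minimal sketch for a scalar time series (the first-negative truncation heuristic used here is a common stand-in, not necessarily the rule of [45]):

```python
import numpy as np

def asymptotic_variance(z, K=None, max_lag=1000):
    """Estimate sigma^2 = sum_s R(s) for a stationary scalar series z, so that
    Var(mean(z)) ~ sigma^2 / T.  Autocovariances are summed up to lag K, or,
    if K is None, up to the first negative sample autocovariance (a common
    heuristic, used here as a stand-in for the truncation rule of [45])."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    T = len(z)
    R = [np.dot(z, z) / T]                     # lag-0 autocovariance
    for s in range(1, min(T // 2, max_lag)):
        r = np.dot(z[:-s], z[s:]) / T
        if K is None and r < 0:                # heuristic truncation threshold
            break
        R.append(r)
        if K is not None and s >= K:
            break
    return R[0] + 2 * np.sum(R[1:])

# Sanity check on an AR(1) series z_{t+1} = a z_t + sqrt(1 - a^2) xi_t, which
# has stationary variance 1 and known asymptotic variance (1 + a)/(1 - a) = 3.
rng = np.random.default_rng(4)
a, T = 0.5, 200_000
z = np.empty(T)
z[0] = rng.standard_normal()
for t in range(1, T):
    z[t] = a * z[t - 1] + np.sqrt(1 - a * a) * rng.standard_normal()
sigma2 = asymptotic_variance(z)
```

The same computation, applied entrywise to the time series φ_ij(X_{s∆}, X_{s∆+τ}), yields the data-driven estimation error curves shown in the figures.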
We have proved that VAC accurately identifies subspaces of eigenfunctions span{η_i : j ≤ i ≤ k} when three conditions are satisfied:
1. The values {σ_j, . . ., σ_k} are separated from all other σ_i values by a spectral gap.
2. The library of basis functions (φ_i)_{1≤i≤n} becomes very rich, so that linear combinations of basis functions can fully represent η_1, . . ., η_k.
3. The data set becomes very large, so that expectations are evaluated with vanishing error.
VAC converges for any value of the lag time parameter τ > 0, yet the choice of lag time can dramatically alter the speed of convergence. Hence, our main contribution is to prove error bounds that explicitly show how the error depends on the lag time. These bounds provide a full theoretical justification for why limitations in the basis set contribute to the error at short lag times and limitations in the data set contribute to the error at long lag times.
Our numerical analysis approach is flexible, and it could be extended to algorithms besides VAC that estimate dynamical quantities of interest using trajectory data. A broadly useful approach involves decomposing the total error into approximation error and estimation error. Another useful approach involves identifying asymptotic formulas for the estimation error. In future research, it is our goal to rigorously analyze the approximation and estimation error for other powerful algorithms used in biochemical simulation (e.g., [51]).
Lastly, while the main purpose of our work is to deepen theoretical understanding, we also provide diagnostic tools to assess VAC's estimation error and tune VAC's parameters to reduce this error source. We present the VAC condition number as a tool for identifying subspaces of VAC eigenfunctions that are prone to high estimation error. We present the mean squared estimation error as a tool for calculating the estimation error at different lag times. Motivated by the present study, we have also developed an approach for reducing the lag time sensitivity and increasing VAC's robustness by integrating over a window of lag times [22]. These tools have direct relevance for researchers using VAC, pointing the way toward a more streamlined lag time selection process and a more critical assessment of VAC's error in the future.

7.1. Figures for the OU process. The OU process is started from the stationary N(0, 1) distribution. The eigenfunctions of the transition operator T^t are the Hermite polynomials, with eigenvalues 1, e^{−t}, e^{−2t}, e^{−3t}, . . .. The conditional distribution for the OU process is determined by. Therefore, we can simulate the OU process in discrete time using the exact evolution equations (7.4). When we apply VAC to the OU process, we use a basis of n indicator functions on disjoint intervals, namely (7.5). The boundary points are selected as follows:
1. First, we set q_i = Φ^{−1}(i/n), where Φ is the cumulative distribution function for a standard normal random variable and Φ^{−1} is the inverse cumulative distribution function, also called the quantile function.
2. Next, we set q_i ← q_i + ε, where ε is an offset parameter that is always ε = 0.1 in our figures. The offset parameter helps make our examples realistic, since it would typically be impossible in VAC applications to identify quantiles of the equilibrium distribution exactly.
Although many quantities involving the Ornstein–Uhlenbeck process can be calculated analytically, we used numerical quadrature to evaluate the integrals.
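The exact discrete-time OU updates and the quantile-based boundary points described above can be sketched as follows (the bisection quantile routine is our own stand-in for Φ^{−1}, avoiding external dependencies):

```python
import numpy as np
from math import erf, sqrt

def norm_ppf(p):
    """Standard normal quantile Phi^{-1}(p) via bisection on Phi."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * (1 + erf(mid / sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def indicator_boundaries(n, offset=0.1):
    """Boundaries q_i = Phi^{-1}(i/n) + offset for n indicator basis sets."""
    return np.array([norm_ppf(i / n) for i in range(1, n)]) + offset

def simulate_ou(T, dt, rng):
    """Exact OU updates: X_{t+dt} = e^{-dt} X_t + sqrt(1 - e^{-2dt}) xi."""
    x = np.empty(T)
    x[0] = rng.standard_normal()             # stationary N(0, 1) start
    a, b = np.exp(-dt), np.sqrt(1 - np.exp(-2 * dt))
    for t in range(1, T):
        x[t] = a * x[t - 1] + b * rng.standard_normal()
    return x

bnds = indicator_boundaries(4)               # 3 interior boundaries for n = 4
traj = simulate_ou(10_000, 0.05, np.random.default_rng(2))
```

Because the conditional distribution of the OU process is Gaussian with known mean and variance, this discrete-time scheme introduces no timestep discretization error.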

7.2. Figures for the double well process.
Here we provide additional information about how Figure 6 and Figure 7 were generated. These figures show VAC applied to the process (7.10), where the potential U and the diffusion matrix σ are given by: X_t is a double well process that spends long time periods in the potential wells near (−1, 0) and (1, 0), with rare transitions between the wells. We simulate X_t using the BAOAB-limit integrator presented in Leimkuhler and Matthews [20] with the timestep ∆ = 10^{−4}. We discard the first t = 10 time units of each trajectory to reduce equilibration bias.
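A minimal simulation sketch follows. Since the paper's exact potential U is not reproduced in this excerpt, the code uses an illustrative double well U(x, y) = (x² − 1)² + y² with minima near (±1, 0), and plain Euler-Maruyama stands in for the BAOAB-limit integrator of [20]:

```python
import numpy as np

# Euler-Maruyama sketch of an overdamped double-well diffusion.  U below is an
# illustrative stand-in with minima near (-1, 0) and (1, 0); the paper's
# potential and its BAOAB-limit integrator are not reproduced here.
def grad_U(p):
    x, y = p
    return np.array([4 * x * (x * x - 1), 2 * y])   # grad of (x^2-1)^2 + y^2

def simulate(T_steps, dt, rng):
    p = np.array([-1.0, 0.0])                 # start in the left well
    out = np.empty((T_steps, 2))
    for t in range(T_steps):
        p = p - grad_U(p) * dt + np.sqrt(2 * dt) * rng.standard_normal(2)
        out[t] = p
    return out

traj = simulate(20_000, 1e-3, np.random.default_rng(3))
```

Longer trajectories show the characteristic metastability: long residence times near each minimum punctuated by occasional hops across the barrier at x = 0.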
To calculate reference values for the true eigenfunctions η_i and the idealized VAC matrices C(τ), we use the numerical PDE approach from Appendix D of reference [51]. We first construct a grid from −2 to 2 in x and from −5 to 5 in y with a grid spacing of ε = (6 × 10^{−4})^{1/2}. We next construct the transition matrix P for a hopping process on the grid: P(x ± ε, y ∓ ε) = 0, (7.15) P(x, y) = 1 − P(x + ε, y) − P(x − ε, y) − P(x, y + ε) − P(x, y − ε). (7.16) In the ε → 0 limit, the action of ε^{−2}(P − I) on smooth functions approximates the action of the infinitesimal generator L for the process X_t.
We calculate the eigenfunctions η_i using the eigenfunctions of P. We calculate the idealized VAC matrix C(τ) using the approximation

(7.17) C_ij(τ) = ⟨φ_i, e^{Lτ} φ_j⟩ ≈ φ_i^T D_µ e^{Lτ} φ_j,

where φ_i is the vector of φ_i values evaluated at each gridpoint and D_µ is a diagonal matrix with the stationary measure at each gridpoint along the diagonal.

7.3. Mean squared estimation error. In Algorithm 7.1, we describe the procedure for calculating the mean squared estimation error using data. The procedure is similar to the one used to calculate error bars in Markov chain Monte Carlo. Therefore, we take advantage of existing software for Markov chain Monte Carlo sampling [9] in our implementation.

Algorithm 7.1 Asymptotic estimation error
1. For 1 ≤ i, j ≤ n, perform the following calculations.
(c) Use the approach in [45, pg. 143-145] to determine a truncation threshold K such that R̂_ij(s∆) ≈ 0 for s > K, and set

7.4. Rayleigh-Ritz approximation bounds. In this section, we re-derive the classic approximation bounds for the Rayleigh-Ritz method first presented in [18] and [19, pg. 992]. Our first step is to verify an inequality that follows directly from the min-max principle.
The lower bound then follows by applying the min-max principle.
It remains to verify the inequality, where we have used the fact that Γ_{k+1:n}^τ is an invariant subspace of P_Φ T^τ P_Φ and H_{1:k} is an invariant subspace of T^τ.

7.5. Sharper bounds on the lag-time-independent error. Here we prove an elegant bound on the lag-time-independent error, valid for η ∈ H_{j:k}.
Moreover, observing that (7.39) holds, it is enough to prove the bound for η ∈ H_{j:k}.

Figure 2 :
Figure 2: Comparison between different bounds for the approximation error. Left: the Rayleigh-Ritz bound asymptotes to infinity at long and short lag times. Center: the true approximation error stabilizes at long lag times. Right: the improved bound presented in Theorem 3.4 is asymptotically sharp at long lag times.

Figure 3 :
Figure 3: Root mean squared estimation error for different trajectory lengths T. The calculated error is obtained from a single trajectory of data using the formulas in Theorem 3.7. The true error is obtained through 100 independent trials.

(4.1) dX = −X dt + √2 dW.
In two different trials, we show how VAC's accuracy depends on the size of the basis set and the length of the simulated trajectory. The number of basis functions and the trajectory length are varied between the trials: the trajectory length is T = 10^4 in trial 1 and T = 500 in trial 2.

Figure 4 :
Figure 4: True error and calculated error for the OU process. Top: the bold line indicates the true root mean squared (RMS) error over 30 independent trajectories, while the shaded region indicates the mean ± 1 standard deviation. The purple dot shows the optimal lag time. Bottom: the calculated RMS estimation error obtained from each of the 30 trajectories.

Figure 5 :
Figure 5: Implied timescales of the OU process.

Figure 6 :
Figure 6: When the minimum condition number is 2.0, the estimation error is low (left). When the minimum condition number is 9.5, the estimation error is much higher (right).

e^{−σ_k τ} ≤ 1 that appears in the statement of Theorem 3.2.
Proof of equation (3.4). As in the proof of Theorem 3.4, the upper bound