Parareal Convergence for Oscillatory PDEs with Finite Time-scale Separation

A variant of the Parareal method for highly oscillatory systems of PDEs was proposed by Haut and Wingate (2014). In that work they proved superlinear convergence of the method in the limit of infinite time-scale separation. Their coarse solver features a coordinate transformation and a fast-wave averaging method inspired by the analysis of multiple-scales PDEs and is integrated using an HMM-type method. However, for many physical applications the timescale separation is finite, not infinite. In this paper we prove convergence for finite timescale separation by extending the error bound on the coarse propagator to this case. We show that convergence requires the solution of an optimization problem that involves the averaging window interval, the time step, and the parameters in the problem. We also propose a method for choosing the averaging window relative to the time step as a function of the finite frequencies inherent in the problem.


Introduction
A variation of the Parareal method [28, 29] for highly oscillatory systems of equations was proposed in [21]. The method constructs the coarse solver based on a coordinate transformation and fast-wave averaging motivated by multiscale analysis of PDEs [7, 26, 33] and performs the integration using techniques from the Heterogeneous Multiscale Method [14]. They proved that the method provides parallel speedups [21] in the limit of infinite time scale separation. Finite time scale separation is an important case to understand for physical applications because many physical phenomena, such as those occurring in numerical weather prediction, have finite frequencies inherent in the problem, e.g. Earth's rotation rate is finite. In this paper we extend the work of [21] by showing that rapid convergence of the method is also possible for finite timescale separation, by proving error bounds on the coarse solver in this case.
Examples of the Parareal algorithm applied to parabolic PDEs include simulations of financial markets (e.g. the Black-Scholes equation for an American put [5]) and a nonlinear parabolic evolutionary equation solved via the finite element method [22]. Hyperbolic systems solved with Parareal include simulation of molecular dynamics [3], fluid/structure interaction [15], solution of the Navier-Stokes equations [16], and reservoir modelling [20]. In all of these applications, the degree of oscillatory stiffness was not sufficient to impede convergence, but it is known (cf. [1]) to be an issue for the Parareal method.
There have been several modifications of the Parareal method for highly oscillatory systems which assume that the system may be separated into fast and slow variables. In terms of ODEs, [27] proposed a multiscale method for singularly perturbed ODEs where the fast dynamics are dissipative. Ariel et al. (2016) [1] propose a method for highly oscillatory ODEs which is multiscale in nature but does not require explicit knowledge of the fast and slow variables. Gander and Hairer (2014) [19] suggest Parareal methods for Hamiltonian dynamics. Approaches using symplectic integrators with applications to molecular dynamics are presented in, for example, [2] and [6]. Finally, [21] proposed a method motivated by asymptotic solutions for fast singular limits of nonlinear evolutionary PDEs. It is an extension of this method which we study here and which we refer to as Asymptotic Parallel-in-Time (APinT). It takes its name from the modified coarse solver, which is inspired by methods used in the asymptotic analysis of PDEs.
In this work we are primarily interested in oscillatory stiffness. Oscillatory stiffness places a restriction on the convergence of the Parareal method due to the accuracy and stability limitations it places on the timestep size. We consider oscillatory stiffness to be a phenomenon arising from the presence of rapid oscillations which restricts the coarse timestep, with the degree of oscillatory stiffness being the degree to which the timestep is restricted. As discussed by Higham and Trefethen [23], stiffness is a transient phenomenon involving finite time intervals. This is important here as in this paper we are concerned with mitigating oscillatory stiffness over the interval of the coarse timestep.
We consider as a model equation a PDE of the form:

∂u/∂t + (1/ε) L u + N(u, u) = 0,   (1)

where u is the vector of unknowns, L is a skew-Hermitian linear operator with purely imaginary eigenvalues, and N(•, •) is a nonlinear operator which is assumed to be of quadratic type. We further assume that the solution u ∈ L², and that we may approximate eq. (1) as a finite system of ODEs. The linear term then induces temporal oscillations on an O(ε) timescale, which can require the use of prohibitively small timesteps for standard numerical integrators if ε is small. If accuracy is required, as is necessary for the Parareal method, implicit methods must also use a small time step. The contribution of this work lies in an improved error estimate of the asymptotically motivated coarse solver which permits a mathematical description of the relationship between the time stepping error and the time-averaging of the nonlinear operator. This leads to an improved mathematical understanding of the convergence of the APinT method. With this improved notion of accuracy, we are then able to prove that the APinT algorithm converges for finite time scale separation. This is an advance on the previously shown case in the limit of ε → 0.
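The timestep restriction imposed by the linear term can be seen in a scalar caricature of eq. (1); the values of lam, eps, and dt below are illustrative only, not parameters from the paper.

```python
# Scalar caricature of the stiffness in eq. (1): du/dt = (i*lam/eps)*u is
# purely oscillatory, but forward Euler amplifies it unless dt = O(eps/lam).
# All numerical values here are illustrative, not from the paper.
def euler_amplitude(dt, eps, lam=1.0, steps=200):
    """|u| after `steps` forward-Euler steps of du/dt = (i*lam/eps)*u, u(0)=1."""
    u = 1.0 + 0.0j
    for _ in range(steps):
        u = u + dt * (1j * lam / eps) * u
    return abs(u)

eps = 1e-2
blow_up = euler_amplitude(dt=1e-1, eps=eps)   # dt >> eps: explosive growth
stable = euler_amplitude(dt=1e-4, eps=eps)    # dt << eps: amplitude stays near 1
```

Since the eigenvalue is purely imaginary, forward Euler is never exactly neutral, but the amplification per step is negligible once dt is well below ε; this is the O(ε) timestep restriction discussed above.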
The slow solution relies on an averaged version of eq. (1), with the average taken over an infinite window in the limit as ε → 0. This is estimated numerically by a finite sum over a sufficiently large window. It has been shown [12, 14, 21] that the length required for this window may be reduced through the use of a smooth kernel of integration. In the small-ε limit, we find that this method provides a convergent algorithm. We have also found that for finite ε the averaging window may be chosen such that the slow solution is sufficiently accurate that the Parareal method remains convergent, as shown in fig. 1.

Figure 1: The number of iterations required for convergence of the APinT method for the 2-D rotating shallow water equations across three values of timescale separation, ε (cf. eq. (1)). Note that towards the small-ε limit, shown in red, the convergence improves with an increase in the size of the averaging window, as is consistent with the asymptotic theory. Of interest, however, a clear minimum is visible outside of this limit, especially for ε = 1, which marks a departure from the asymptotic theory where the limit is not taken. This makes clear both that the convergence of the method depends on the degree of scale separation, ε, and that the width of the averaging window, shown here proportional to the coarse timestep, may be chosen to control it. We shall rigorously explain this in section 4.
In the next section we give an overview of the Parareal method to set the context of the work. In section 3 we discuss the slow solution, which is found by fast-wave averaging and which forms the coarse propagator of the APinT method. We will discuss the implications of the existence of near resonances on the slow solution. With that in mind, we proceed in section 4 to prove the error bounds and therefore the convergence of the APinT method. Finally, in section 5 we will show numerical experiments on the one-dimensional rotating shallow water equations and discuss some particularities of solving these equations with the APinT method.

The Parareal Algorithm
In this section we briefly review the Parareal method proposed by Lions et al. [28] and further expanded upon by Maday and Turinici [29]. They proposed a generalisation of the concept of domain decomposition to the temporal domain, in which a 'coarse' approximation to the solution is computed and then refined, parallel in time, by the 'fine' timestep. The solution has been shown to iteratively converge to the fine solution [18]. In practice, this method requires that the coarse timestepping method permit large timesteps, that it be inexpensive to compute, and that it be sufficiently accurate that the method converges quickly. In general, the maximum timestep is O(ε), so in the case of ε = O(1), i.e. the less-stiff case, Parareal may be applied without any modifications. The insight of [21] was that a slow solution based on a coordinate transformation and a time average over the fast waves in the nonlinear operator provides a convergent and efficiently computable coarse approximation. In fact, they showed that under suitable assumptions of smoothness, superlinear convergence is obtained as ε → 0.
Also related to this work is that of [30], who paid particular attention to the parallel implementation. They note that the coarse solver may employ a coarser timestep [28], a coarser discretisation [16], and/or a simpler physical model [29]. The APinT coarse solver as presented here is a combination of the first and third of these.
A sketch of the algorithm is in fig. 2. We assume for the sake of simplicity that we are interested in solving eq. (1) on the interval t ∈ [0, 1]. Let ϕ_t(u₀) denote the evolution operator associated with eq. (1) such that u(t) = ϕ_t(u₀) solves the full equation. Similarly, ϕ̄_t(u₀) solves the averaged equations.
We then divide the time domain into N finite subintervals, [nΔT, (n+1)ΔT], where n = 0, . . . , N − 1. The Parareal algorithm begins with a coarse solve and then proceeds by computing approximations to the solution, U_n^k, iteratively, following:

U_{n+1}^{k+1} = φ̄_ΔT(U_n^{k+1}) + ϕ_ΔT(U_n^k) − φ̄_ΔT(U_n^k).   (2)

Here, since the quantities ϕ_ΔT(U_n^k) are already computed at iteration k, the difference can be computed in parallel for all n. Since the computation of φ̄_ΔT(U_{n−1}^k) is cheap, the overall computation is quick in a parallel sense if the iterates converge quickly.

Figure 2: The Parareal algorithm. In (1), the coarse propagator is used to find an initial approximation to the solution. The solutions found by this iteration at coarse timesteps nΔT are then taken as the initial conditions for parallel-in-time refinement by the fine propagator (2). Finally, the initial approximation is refined in a serial fashion by the Parareal correction iteration eq. (2) shown in (3). The process in (2) and (3) is repeated, with the results of the correction iteration providing the new initial conditions for the next time-parallel iteration, until the desired level of convergence is obtained.
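The structure of the iteration can be sketched as follows. The propagators here are plain Euler stand-ins, not the averaged coarse solver of APinT, and the slice counts and step sizes are illustrative.

```python
import numpy as np

# Skeleton of the Parareal correction iteration: a serial coarse sweep,
# parallelisable fine solves, and a serial correction sweep.
def parareal(u0, rhs, T=1.0, n_slices=10, n_iter=5, fine_sub=100):
    dT = T / n_slices

    def coarse(u):                        # one large step: cheap, run serially
        return u + dT * rhs(u)

    def fine(u):                          # many small steps: expensive, parallel
        dt = dT / fine_sub
        for _ in range(fine_sub):
            u = u + dt * rhs(u)
        return u

    U = [u0]                              # (1) initial serial coarse sweep
    for _ in range(n_slices):
        U.append(coarse(U[-1]))

    for _ in range(n_iter):
        F = [fine(U[n]) for n in range(n_slices)]     # (2) parallelisable
        G = [coarse(U[n]) for n in range(n_slices)]
        U_new = [u0]                      # (3) serial correction sweep
        for n in range(n_slices):
            U_new.append(coarse(U_new[-1]) + F[n] - G[n])
        U = U_new
    return U

# Contractive scalar test problem u' = -2u; exact endpoint value is exp(-2).
U = parareal(np.array([1.0]), lambda u: -2.0 * u)
```

On this contractive linear problem the iterates approach the serial fine solution after a handful of corrections, which is the behaviour the convergence theory below quantifies.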

The Slow Solution
Our interest in solving general PDEs which arise in physical modelling, in particular those of weather and climate, requires that we confront the problem of oscillatory stiffness over the interval of a coarse time step. In this section we shall describe the mathematical roots of the slow solution, provide a short description of its historical context, and review its discretisation as a coarse solver for the Parareal algorithm. As the convergence of the coarse solver for the APinT algorithm depends on the quality of the approximation, we will then investigate the numerical behaviour of this solver before providing an improved proof of the performance of the slow solver.
As an example of an application where the actual physics was thought to be asymptotic, but was later shown not to be, we look to the field of numerical weather prediction. Historically, the physical notion of 'slow' dynamics, embodied in the Quasi-Geostrophic (QG) equations, was a major advance in understanding weather. This insight was due to Charney [8], who derived the 'slow' equations. These reduced equations allowed the fast waves, which cause the oscillatory stiffness, to be filtered while still resolving the large-scale motions of the fluid. The work was later expanded upon, leading to what is generally recognised as the first successful numerical weather prediction [9, 10]. Since those early days, the reduced equations have been rigorously shown to hold asymptotically in the limit of ε → 0 [13]. In contrast to these results, modern weather prediction has found that the reduced equations are not accurate enough to be predictive, and therefore it relies on numerical approximations of the full equations of motion [11]. As such, we are confronted with the problem that at least some of the oscillations matter even for the large-scale flow, and so we must find some way to resolve the fast waves in order to capture the full dynamics.
To address this problem for the Parareal method, [21] constructed a numerical approximation to the governing equations eq. (1) based on the consideration of fast singular limits [33]. One of the conclusions of this theory, for when the linear operator L is skew-Hermitian, is that though the leading-order dynamics is not slow itself, the slow dynamics evolve independently of the fast, while the fast dynamics are 'swept' by the slow [33]. For realistic weather, we expect ε ∼ 0.01 to ε ∼ 0.1 [34]. Since ε is finite for realistic cases, the timescale separation is also finite. Working in the limit of small ε and applying the method of multiple scales, an averaged equation for equations of the form eq. (1) was found by [13].
For a slow timescale, t, and a fast timescale, τ, they showed that the averaged equations in the asymptotic limit of ε → 0 must satisfy:

∂ū/∂t + lim_{T→∞} (1/T) ∫₀^T e^{sL} N(e^{−sL} ū, e^{−sL} ū) ds = 0,   (3)

where ū denotes the averaged u and where the integral is taken over the nonlinear operator, not the solution itself, and where there is a mapping by the exponential of the linear operator between the averaged and 'full' solutions:

u(t) = e^{−tL/ε} ū(t).   (4)

The above condition, studied in detail by [7], [26], and [33], motivates our averaging, described below. We follow [21] and write the averaged equation in the following form:

∂ū/∂t + e^{tL/ε} N(e^{−tL/ε} ū, e^{−tL/ε} ū) = 0,   (5)

from which the full solution may be obtained through the application of the matrix exponential as in eq. (4). The averaged equation, eq. (5), has lost the factor of 1/ε present in the full equation, eq. (1), the main source of the oscillations, although taking another derivative will regain this term, and so the oscillations have not been entirely eliminated.
In order to use this equation as a coarse solver for Parareal with finite timescale separation, [21] retreated from the asymptotic limit by taking the integral over the nonlinear operator in eq. (3) over a finite time averaging window, rather than the infinite limit associated with ε → 0. This integral is approximated numerically by using a smooth kernel, ρ(s), 0 ≤ s ≤ 1, which is chosen such that the length T₀ of the time window for the averaging is as small as possible, and we approximate the averaged nonlinear operator eq. (3) as:

N̄(ū) ≈ (1/T₀) ∫₀^{T₀} ρ(s/T₀) e^{sL} N(e^{−sL} ū, e^{−sL} ū) ds.   (6)

The solution here exhibits both rapid oscillations and a slower trend. Of importance here is the difference between the derivative considering the entire solution, i.e. ∂u/∂t, and that of the averaged solution with respect to the given averaging window, i.e. ∂ū/∂t. It is this averaged right-hand side which follows the slow solution without being affected by the fast oscillations and is therefore applicable to the coarse timestepping.
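A minimal numerical sketch of this kernel-weighted average follows, assuming a standard C∞ bump kernel; the paper's specific kernel may differ.

```python
import numpy as np

# Sketch of the finite-window average in eq. (6): an oscillatory right-hand
# side is averaged against a smooth, compactly supported kernel on [0, 1].
# The bump kernel below is an assumed example.
def kernel_weights(n):
    """Quadrature nodes on (0, 1) and normalised smooth bump-kernel weights."""
    s = np.linspace(0.0, 1.0, n + 2)[1:-1]    # interior nodes; kernel -> 0 at ends
    w = np.exp(-1.0 / (s * (1.0 - s)))
    return s, w / w.sum()

def averaged_rhs(g, t, eta, n_quad=400):
    """Kernel-weighted average of g over the window [t, t + eta]."""
    s, w = kernel_weights(n_quad)
    return np.sum(w * g(t + eta * s))

# A fast oscillation superposed on a slow trend: the average keeps the trend.
eps = 1e-2
full = averaged_rhs(lambda t: np.cos(t) + np.sin(t / eps), 0.0, 1.0)
slow = averaged_rhs(np.cos, 0.0, 1.0)
```

The smoothness of the kernel is what makes the oscillatory contribution decay rapidly with the window length, a point made quantitative in section 4.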
The Heterogeneous Multiscale Method (HMM) [14] is then applied to the slow equation by computing the averages numerically, as in eq. (6). In practice, the width of the averaging window may be freely chosen, as illustrated in fig. 3. The choice of this window has a significant effect on the convergence of the method, as illustrated in fig. 1. As the computational cost of Parareal is proportional to the number of iterations required, an optimally chosen window is necessary. In addition to a new error estimate, our insight into the coarse error allows us to choose the optimal averaging window length as a function of the timescale separation.

Triad Resonances
In order to permit a long coarse timestep, the coarse solver proposed by [21] and described in section 3 filters the nonlinear operator. The effect of this is a change in the content of the nonlinear triad interactions which is a function of both the degree of near resonance of the interaction and the length of the averaging window. As we discuss in section 4, the extent to which near-resonant sets are retained or rejected has an important impact on the convergence of the Parareal method. Therefore, in this section we review nonlinear triad resonances for the model problem that we use in our tests in section 5.2, noting that a similar approach applies to all systems of the form eq. (1).
Systems governed by a quadratic nonlinearity with dispersive waves exhibit triad resonances [13, 17, 24]. Since we are motivated by geophysical modelling, we consider the general averaged equation for the rotating shallow water equations, which are commonly used as a test case for geophysical solvers. Following the notation of [13], we decompose the right-hand side of eq. (3) in terms of its basis of eigenvectors and write:

dσ_k^α/dt = Σ_{k₁+k₂=k} Σ_{α₁,α₂} e^{it(ω_{k₁}^{α₁} + ω_{k₂}^{α₂} − ω_k^α)/ε} C_{k₁,k₂,k}^{α₁,α₂,α} σ_{k₁}^{α₁} σ_{k₂}^{α₂},   (7)

where α, α₁, and α₂ refer to the different branches of the eigenvalues, k, k₁, and k₂ are the wavenumbers, ω_k^α is the dispersion relation at a given α and wavenumber, σ denotes the Fourier coefficient in this basis, r_k^α is the right eigenvector of the linear operator, and C_{k₁,k₂,k}^{α₁,α₂,α} is an interaction coefficient [31].
In the asymptotic case, the limit ε → 0, by the orthogonality of the Fourier series, the only waves which remain after the wave-averaging procedure (i.e. where τ → ∞) are the direct three-wave resonances (cf. [13], [31], [35]), i.e. the elements of the resonant set, S_{k,α}:

S_{k,α} = { (k₁, k₂, α₁, α₂) : k₁ + k₂ = k, ω_{k₁}^{α₁} + ω_{k₂}^{α₂} − ω_k^α = 0 }.   (8)

The wave-averaged solution then follows:

dσ_k^α/dt = Σ_{(k₁,k₂,α₁,α₂) ∈ S_{k,α}} C_{k₁,k₂,k}^{α₁,α₂,α} σ_{k₁}^{α₁} σ_{k₂}^{α₂}.   (9)

It is this three-wave resonance condition from which the behaviour of the averaging kernel can be understood. In the limit as τ → ∞, only the direct resonances should remain. Because we have finite timescale separation, this integral is approximated over a finite averaging window, which must be large enough to filter the non-resonant triads.
As shown in [21], using a finite time-averaging window and numerically integrating with respect to a smooth, finitely supported kernel yields a convergent Parareal algorithm in the limit of small ε.
For finite ε we take a finite average, and so the solution set is larger than the direct resonant set; this has an important effect on the convergence of Parareal. To better explain the finite-ε case, we define concentric shells of near-resonances, i.e. we rewrite the triad-based form eq. (7) as:

dσ_k^α/dt = Σ_β Σ_{(k₁,k₂,α₁,α₂) ∈ S_{k,α}^β} e^{it(ω_{k₁}^{α₁} + ω_{k₂}^{α₂} − ω_k^α)/ε} C_{k₁,k₂,k}^{α₁,α₂,α} σ_{k₁}^{α₁} σ_{k₂}^{α₂},   (10)

where S_{k,α}^β, β = 1, 2, . . . refers to a near-resonant set, i.e.:

S_{k,α}^β = { (k₁, k₂, α₁, α₂) : k₁ + k₂ = k, λ_{β−1} < |ω_{k₁}^{α₁} + ω_{k₂}^{α₂} − ω_k^α| ≤ λ_β },   (11)

where λ₀ = 0 by definition. The direct-resonant set results in a solution consisting of only the slow dynamics of the system, which was shown in [13] to be equivalent to the reduced equations for this system. Again, and as we will see in section 4, the extent to which the near-resonant sets are retained and rejected by the averaging procedure is fundamental to the convergence of the APinT variation of the Parareal method.
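The shell construction can be illustrated numerically. The dispersion relation below is a single-branch toy chosen for the sketch, not the shallow water relation used later.

```python
import numpy as np

# Toy illustration of near-resonant shells: for a model dispersion relation
# omega(k) = sqrt(1 + k^2) (an assumption of this sketch), bin every triad
# k1 + k2 = k by its resonance defect |omega(k) - omega(k1) - omega(k2)|.
omega = lambda k: np.sqrt(1.0 + k * k)

def triad_shells(k_max, edges):
    """Classify triads into shells whose outer defect bounds are `edges`."""
    shells = {i: [] for i in range(len(edges))}
    ks = range(-k_max, k_max + 1)
    for k1 in ks:
        for k2 in ks:
            k = k1 + k2
            if abs(k) > k_max:
                continue
            defect = abs(omega(k) - omega(k1) - omega(k2))
            for i, edge in enumerate(edges):
                if defect <= edge:       # first shell whose bound contains it
                    shells[i].append((k1, k2, k))
                    break
    return shells

# Shell 0 = direct resonances; this single-branch relation has none, so all
# triads fall into near-resonant shells of varying defect.
shells = triad_shells(k_max=8, edges=[1e-12, 0.5, 1.0, np.inf])
```

For the shallow water system the zero (slow) branch does support direct resonances; the toy relation merely shows how the finite average must decide which shells to keep.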

Error Bounds
Now that we have discussed the key elements of the algorithm, we are in a position to discuss and prove convergence for the case when ε is finite. We shall first construct an improved error estimate for the coarse solution, and then use that result to prove the convergence of the Parareal method.

A bound on the errors due to time-stepping and time-averaging in the coarse solver
In this section we employ the idea of near-resonant sets to extend the existing proof of APinT convergence [21] to the case of finite ε. As has been demonstrated above (cf. fig. 1), the choice of the averaging window width, η, has a profound effect on the convergence of the method. While the choice of η is well understood in the limit of small ε [21], we show here that η may be similarly chosen to provide convergence for ε up to O(1) for an appropriate coarse timestep.

We first reduce eq. (1) to a standard form for ODEs. Following section 3, we consider the solution on the interval

t ∈ [0, ΔT],   (13)

i.e. we are interested in the solution over a ΔT timescale. Let τ = t/(εΔT), so that ṽ(τ) = ū(εΔT τ) is defined on the interval [0, 1/ε]. Then differentiation gives

dṽ/dτ = εΔT ∂ū/∂t.   (14)

Upon substitution into the coarse solver eq. (5) over the discrete time interval eq. (13), we arrive at the desired form, which permits us to use the framework given in [32], where bounds for averaging methods are derived. The aim is to modify and reapply their result for the error bound due to averaging, which holds for a general dynamical system of finitely many ODEs. This averaging error is one of the two major sources of error in the timestepping of the coarse solver. We then write the coarse solver in the form:

dx/dτ = ε f(x, τ).   (16)

While our interest is in solving PDEs describing physical systems, in practice we employ a Fourier spectral method, which has the effect of treating the PDE as a finite-dimensional system of ODEs. In this paper we show that the APinT method is convergent for finite systems of ODEs. This gives us access to the machinery of the numerical analysis of ODEs and averaging methods, following [32]. Let x solve the governing equations when they are written as a system of ODEs, i.e. in the form shown in eq. (16). For example, in the numerical experiments given in section 5.2, x is the Fourier solution. Similarly, we consider the coarse solver written as an averaged system of ODEs. Let y solve this averaged form of eq. (16), i.e.:

dy/dτ = ε f̄(y, τ),   (18)

where the averaging follows directly from the averaged equation, eq. (6), and is written:

f̄(y, τ) = (1/η) ∫₀^η ρ(s/η) f(y, τ + s) ds,   (19)

where η denotes the finite length of the averaging window.
Lemma 1. Consider the initial value problems in x and y as stated above, where f : Rⁿ × R → Rⁿ is Lipschitz continuous with constant β in x on D ⊂ Rⁿ and t on an O(1) timescale, i.e. for all x₁, x₂ ∈ D, β is such that:

||f(x₁, t) − f(x₂, t)|| ≤ β ||x₁ − x₂||.   (20)

Let:

M = sup_{x ∈ D, t} ||f(x, t)||.   (21)

Then we can bound the difference between the exact solution x and the averaged solution y as:

||x(t) − y(t)|| ≤ C η M e^{βt}.   (22)

The above lemma follows from a modification of Lemma 4.2.8 in [32] in order to include the kernel of integration (cf. appendix A). We have here bounded the error over an O(1) time interval instead of O(1/ε) so that the rate of convergence at different degrees of scale separation may be more easily compared, as in practice we are interested in simulations over fixed timescales. Taking the unmodified lemma provides a slightly different result, as it gives the averaging error over a simulation time which scales with ε. Due to the numerical nature of the proof here, the appropriate timescale is over a coarse timestep. Lemma 1 places a bound on the error committed by averaging over the fast waves, independent of the numerical methods used for spatial or temporal discretisation.

Next, we consider the error arising from the numerical approximation of eq. (18). In doing so, we will need to assume bounds on f in the region of phase space where y exists. Then assume that

||f(y, t)|| ≤ M for y ∈ D,   (23)

and we assume that such a bound exists for higher spatial derivatives of f, such that

||∂ᵖf/∂yᵖ|| ≤ M, p = 1, 2.   (24)

Lemma 2. Denote by y_ΔT(t) the numerical approximation to the averaged solution y(t) computed with timestep ΔT by a second-order timestepping method. Assume that ẏ = εf̄(y, t) and that y ∈ D as in eq. (23). Assume that the integration is performed with respect to a smooth kernel, ρ(•), and let λ_n denote the n-th near-resonant triad (cf. section 3.1). Then the local time-stepping error of a second-order time-stepping scheme applied to eq. (18) satisfies:

||y(ΔT) − y_ΔT(ΔT)|| ≤ C ΔT³ ε M³ (1 + max_n λ_n² |(1/η) ∫₀^η ρ(s/η) e^{iΔTλ_n s} ds|),   (25)

for some constant C ∈ R, C < ∞, and where M is the bound over the nonlinear operator as given in lemma 1.
Proof. The timestepping error of a p-th order scheme is bounded by [25]:

||y(t + ΔT) − y_ΔT(t + ΔT)|| ≤ C_t ΔT^{p+1} sup_t ||d^{p+1}y/dt^{p+1}||.   (26)

First, decompose f in terms of its basis of eigenvectors as discussed in section 3.1. As with eq. (7), we may write the solution as a sum of ODEs, each for a specific resonant nearness, λ_n. Then for the j-th component of y, we write

dy_{n,j}/dt = (ε/η) ∫₀^η ρ(s/η) e^{iΔTλ_n(t+s)} N_{n,j}(y) ds,   (27)

where the nearness of the resonances in any particular ODE is exposed through the eigenvalue sum, λ_n, in the exponent, and where the subscript j denotes the j-th component and not a derivative, as it would in Einstein notation. We then seek the third time derivative, which is found to be

d³y_{n,j}/dt³ = (ε/η) ∫₀^η ρ(s/η) e^{iΔTλ_n(t+s)} [ (iΔTλ_n)² N_{n,j}(y) + 2(iΔTλ_n) dN_{n,j}(y)/dt + d²N_{n,j}(y)/dt² ] ds.   (28)

This is then the right-hand side which is integrated with respect to the smooth kernel. The magnitude of the near-resonant triad, λ_n, now presents itself as a multiplier on the complex exponential. It is then clear that it is this value, which is zero for direct resonances but becomes large in general, which is the source of numerical stiffness. In bounding the timestepping error, we are interested in the norm of this quantity. Recalling that we are working with a finite-dimensional system of ODEs and applying the triangle and Cauchy-Schwarz inequalities, we find that

||d³y_{n,j}/dt³|| ≤ C_f ε (ΔT²λ_n² + 2ΔT|λ_n| + 1) M³ |(1/η) ∫₀^η ρ(s/η) e^{iΔTλ_n s} ds|,

where C_f is a positive constant and where we have used that N and all of its spatial derivatives up to and including p = 2 are bounded by M, by eqs. (23) and (24). We will now assume that |λ_n| ≠ 0, as we are interested in the sup-norm of these values, which is nonzero when near-resonances are included; the directly resonant case has been treated by [21]. Then we must consider two possibilities. Firstly, if |λ_n| ≤ 1, then, since the nonzero |λ_n| are bounded below, we may define some constant, K₁, such that ΔT²λ_n² + 2ΔT|λ_n| + 1 ≤ K₁ λ_n². If |λ_n| > 1, the binomial theorem yields (ΔT|λ_n| + 1)² ≤ (ΔT + 1)² λ_n², and we write K₂ = (ΔT + 1)². As for the rotating shallow water equations there must always be a value of λ_n which is strictly greater than one, we assume that the second bound gives the maximum and let K = max(K₁, K₂). We now let C = C_t K.
Finally, we bound the nonlinear term in the same fashion as in lemma 1, where the fact that

||N_{n,j}(y)|| ≤ M

completes the proof by providing an upper bound for the nonlinear operator as in lemma 1. This provides a bound for the error due to timestepping which does not depend directly on the solution, but rather on the general properties of the nonlinearity, in particular the triadic interactions.
With these two lemmas describing our primary sources of error, we will seek a bound on the error in the APinT algorithm and use this to show convergence. From lemma 2, it follows that the timestepping error depends on the term

λ_n² |(1/η) ∫₀^η ρ(s/η) e^{iΔTλ_n s} ds|.   (36)

We define the following term, which describes the filtering independently of the gain due to the scale separation and the coarse timestep, and which is the key insight into understanding how to regularise an oscillatory problem over a finite time interval:

Λ(η) = max_n λ_n² |(1/η) ∫₀^η ρ(s/η) e^{iΔTλ_n s} ds|.   (37)

Λ(η) provides a measure of the extent to which the averaging integral mitigates the numerical stiffness. Recall that when the maximum λ_n is large, as it is for highly oscillatory problems, it contributes to large gradients on the right-hand side, requiring a small numerical timestep. In contrast, the integral component tends to zero as λ_n gets large, and does so superlinearly because of the smooth kernel, ρ(s) [21]. This term is then where we see precisely how the averaging procedure filters the fast oscillations, causing Λ(η) to achieve a lower magnitude than λ_n² does on its own and therefore reducing the numerical stiffness.

In seeking a bound on the error in the timestepping, it is necessary to bound this term for some particular averaging kernel, ρ(s). The choice of averaging kernel affects the error bounds through this function. Λ(η) is bounded and tends rapidly to zero as η → ∞ (q.v. section 5.3).
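The decay of this filter factor with the window length can be checked numerically. The bump kernel below is an assumed example, not necessarily the kernel used in the experiments, and the ΔT factor in the exponent is absorbed into the illustrative frequency.

```python
import numpy as np

# Numerical sketch of the filter factor behind Lambda(eta): the quantity
# lam**2 * |(1/eta) * integral_0^eta rho(s/eta) * exp(i*lam*s) ds|,
# evaluated with an assumed smooth bump kernel rho. Larger windows damp the
# oscillatory integral, reducing the effective stiffness of the averaged RHS.
def filter_factor(lam, eta, n=2000):
    s = np.linspace(0.0, 1.0, n + 2)[1:-1]       # rescaled window coordinate
    w = np.exp(-1.0 / (s * (1.0 - s)))           # smooth bump kernel rho
    w /= w.sum()                                 # normalise the kernel weights
    return lam**2 * abs(np.sum(w * np.exp(1j * lam * eta * s)))

lam = 50.0                                       # an illustrative fast triad
unfiltered = lam**2                              # stiffness with no averaging
small_window = filter_factor(lam, eta=0.1)
large_window = filter_factor(lam, eta=2.0)
```

The ordering large_window < small_window < unfiltered is exactly the mechanism the text describes: the smooth kernel drives the oscillatory integral to zero faster than λ_n² grows.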
With this in mind, we now prove theorem 1, which bounds the error committed by the coarse timestepping as compared to the fine. This will later allow us to prove error bounds on APinT subject to finite timescale separation.

Theorem 1. Let ΔT denote the coarse timestep for a second-order numerical method. We assume a finite scale separation on the order of ε. For an averaging window of length η, the total error in the coarse timestepping for the APinT algorithm is bounded by:

||ϕ_ΔT(u₀) − φ̄_ΔT(u₀)|| ≤ C₀ η M + C₁ ΔT³ ε M³ (D₁ + Λ(η)),   (38)

where M is the sup-norm over the nonlinear operator as in lemmas 1 and 2 and C₀, C₁, and D₁ are finite constants.
Proof. By the triangle inequality, we may write:

||ϕ_ΔT(u₀) − φ̄_ΔT(u₀)|| ≤ ||ϕ_ΔT(u₀) − ϕ̄_ΔT(u₀)|| + ||ϕ̄_ΔT(u₀) − φ̄_ΔT(u₀)||.

Lemma 1 is used to bound the first term, i.e.:

||ϕ_ΔT(u₀) − ϕ̄_ΔT(u₀)|| ≤ C₀ η M.   (39)

Applying lemma 2 and eq. (37) to the second term yields:

||ϕ̄_ΔT(u₀) − φ̄_ΔT(u₀)|| ≤ C₁ ΔT³ ε M³ (D₁ + Λ(η)),   (41)

where Λ(η) is bounded independently of λ_n for any averaging window length, η. Combining the bounds in eq. (39) and eq. (41) gives the theorem as desired.

Proof of Parareal Convergence
We may now derive error bounds for the Parareal iteration on finite systems of ODEs, given in eq. (2). Using our improved error bound for the coarse solver, which holds for finite ε, we modify the proof given in [21], which held only as ε → 0. For consistency we define several operators following [21]. Let φ̄_ΔT(•) be the evolution operator associated with numerically solving the slow equation using an O(p) method, such that φ̄_ΔT(•) is a numerical approximation of ϕ̄_ΔT(•). Furthermore, let ϕ_ΔT(•) denote the evolution operator for the fine solution. We then define the error operator:

δφ̄(•) = ϕ̄_ΔT(•) − φ̄_ΔT(•).   (42)

Then, as in [4], [21], and section 4.1, we make the following assumptions:

1. The operators ϕ_t(•) and ϕ̄_t(•) are uniformly bounded for 0 ≤ t ≤ 1.   (43)

2. The averaging method is accurate in the sense that:

||ϕ_ΔT(u) − ϕ̄_ΔT(u)|| ≤ C₀ η M.   (44)

3. The averaged evolution operator satisfies:

||ϕ̄_ΔT(u) − ϕ̄_ΔT(v)|| ≤ (1 + CΔT) ||u − v||,   (45)

and the numerical approximation to the evolution equation satisfies:

||φ̄_ΔT(u) − φ̄_ΔT(v)|| ≤ (1 + CΔT) ||u − v||.   (46)

4. Following lemma 1, lemma 2, and eq. (43), the error operators satisfy:

||δφ̄(u)|| ≤ C₁ ΔT³ ε M³ (D₁ + Λ(η)),   (47)

and:

||δφ̄(u) − δφ̄(v)|| ≤ C₁ ΔT³ ε (D₁ + Λ(η)) ||u − v||.   (48)

We have now quantified the major sources of error in the coarse timestepping which will affect the convergence of Parareal. The following proof of convergence follows directly from these bounds.

Theorem 2. Subject to the above assumptions, the error, ||u(T_n) − U_n^k||, after the k-th Parareal iteration is bounded by:

||u(T_n) − U_n^k|| ≤ C (C₀ η M + ΔT³ ε M³ (D₁ + Λ(η)))^{k+1}.   (49)

Proof. The proof is by induction on k. When k = 0:

||u(T_n) − U_n^0|| ≤ ||ϕ_ΔT(u(T_{n−1})) − ϕ̄_ΔT(u(T_{n−1}))|| + ||ϕ̄_ΔT(u(T_{n−1})) − φ̄_ΔT(U_{n−1}^0)||,

where we have used eq. (44), which bounds the error induced by the averaging procedure, to bound the first term and lemma 2, which governs the timestepping error, for the second. Now assume that the bound holds at iteration k − 1. We may then write the Parareal iteration, eq. (2), in the following form, using eq. (47) and eq. (48):

U_n^k = φ̄_ΔT(U_{n−1}^k) + ϕ_ΔT(U_{n−1}^{k−1}) − φ̄_ΔT(U_{n−1}^{k−1}).

By directly substituting eq. (46), eq. (47), and eq. (48), we obtain the inductive step, and, finally, application of the discrete Gronwall inequality gives the theorem.

Theorem 2 is one of the key contributions of this work. Using the understanding of near-resonance and the result of theorem 1, it generalises the proof given by [21] of convergence in the asymptotic limit ε → 0 to finite ε. This is a significant improvement, as for many physical applications such as weather and climate modelling ε remains finite. As the averaging window length, η, may be freely chosen, we may select an optimal η for a wide range of ε, subject to the other constants and the choice of ΔT, such that the method is convergent. We discuss this in the next section.

Convergence for any ε
Given theorem 2, we are finally in a position to discuss convergence for any timescale separation. For the APinT algorithm to converge, we require that:

C₀ η M + ΔT³ ε M³ (D₁ + Λ(η)) < 1.   (50)

We are then left with the problem of choosing an appropriate averaging window length, η, depending on the degree of scale separation, ε, and the filtered contribution of the triads, Λ(η). In the interest of demonstrating that such a choice exists, we assume the scaling (for example):

η = ΔT ε^s.   (51)

We then have: as ε → 0, our error also decreases for any value of the power s. Λ(ΔT ε^s) is bounded, so as ε → 1, all terms remain bounded and we may choose our coarse timestep accordingly to ensure convergence. This means that the method proposed here may be applied across the full range of ε ∈ (0, 1] with only a change of averaging window length, which allows convergence for physical problems where the time scale separation may change throughout the computation. This is in contrast to the proof in the limit [21], which proved convergence only for ε → 0.
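The trade-off that makes an optimal window exist can be illustrated with a model error bound of the form just discussed; the constants A, B, and the triad frequency below are invented for this sketch.

```python
import numpy as np

# Illustration of the window-choice trade-off: a model coarse-error bound
# err(eta) = A*eps*eta + B*filter(eta), where the first (averaging) term grows
# with the window and the second (stiffness) term decays with it.
def filter_factor(lam, eta, n=2000):
    s = np.linspace(0.0, 1.0, n + 2)[1:-1]
    w = np.exp(-1.0 / (s * (1.0 - s)))           # assumed smooth bump kernel
    w /= w.sum()
    return lam**2 * abs(np.sum(w * np.exp(1j * lam * eta * s)))

def model_error(eta, eps, A=1.0, B=1e-4, lam=50.0):
    return A * eps * eta + B * filter_factor(lam, eta)

eps = 0.1
etas = np.linspace(0.05, 3.0, 60)
errs = [model_error(eta, eps) for eta in etas]
eta_opt = etas[int(np.argmin(errs))]             # interior minimum: optimal window
```

The minimum sits in the interior of the scanned range, mirroring the behaviour seen in fig. 1: too short a window leaves stiffness unfiltered, too long a window degrades the averaged solution.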

The One-Dimensional Rotating Shallow Water Equations
We now consider an example, using the one-dimensional rotating shallow water equations as a test case, as did [21]. Let the unknown vector be:

u = (v₁, v₂, h)ᵀ.

We then write the linear and nonlinear operators in the full model eq. (1) as:

L u = ( −v₂ + F ∂h/∂x, v₁, F ∂v₁/∂x )ᵀ,   N(u, u) = ( v₁ ∂v₁/∂x, v₁ ∂v₂/∂x, ∂(v₁h)/∂x )ᵀ,

for some constant F ∈ R. The corresponding eigenvalues are:

ω_k⁰ = 0,   ω_k^± = ±√(1 + F²k²).

In general, as ε → 0 we expect that Λ(η) → 0 as well, due to cancellation of oscillations in the integral [13]. As discussed in section 4, oscillatory stiffness arises due to the magnitude of the gain term outside of the integral, which is large for highly oscillatory systems. The integral itself, however, is bounded from above by one, and achieves this value only for directly resonant triads (cf. section 3.1), where the gain is zero. As the distance from resonance (i.e. the magnitude of |ω_k^α − ω_{k₁}^{α₁} − ω_{k₂}^{α₂}|) increases, the integral tends to zero as well (and does so faster with larger η).
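The branch structure of these eigenvalues can be verified from the Fourier-space symbol of a skew-Hermitian operator with this structure; the placement and scaling of F in the matrix below are assumptions of this sketch.

```python
import numpy as np

# Fourier-space symbol of a skew-Hermitian operator of the kind used above:
# rotation couples the velocities, gradient terms couple v1 and h. The point
# is the branch structure: one zero (slow) eigenvalue and a fast pair
# +/- i*sqrt(1 + F^2 k^2).
def L_symbol(k, F=1.0):
    return np.array([[0.0, -1.0, 1j * F * k],
                     [1.0, 0.0, 0.0],
                     [1j * F * k, 0.0, 0.0]])

k, F = 3.0, 1.0
mu = np.linalg.eigvals(L_symbol(k, F))
freqs = np.sort(mu.imag)                  # dispersion branches omega_k
```

The zero branch carries the slow (balanced) dynamics, while the fast pair generates the oscillations that the averaging must filter.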
The choice is then, for a given degree of scale separation, ε, to choose an η which mitigates the stiffness sufficiently to allow the necessary coarse timestep, while retaining as much fidelity to the full equations as possible (cf. Lemma 1, where the averaging error is proportional to η). We deal with the practical implications of the form of Λ(η) in section 5.3.

The Optimally-Averaged Slow and Fast Solutions
In order to illustrate the slow averaged solution over which the timestepping is performed and its relation to the full solution, fig. 4 compares the slow, full, and true solutions for the stiff case ε = 0.01. The spatio-temporal oscillations are very rapid in the stiff case, which is the source of the timestep limitation imposed by the CFL condition. However, the slow solution over which the timestepping is performed lacks these rapid oscillations and so permits the large timestep.
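The effect can be reproduced on a toy signal: a slow trend plus a fast oscillation, smoothed with a Gaussian window. The signal, frequencies, and window width below are illustrative assumptions; the point is only that the window removes the fast oscillation while tracking the slow part, which is what permits the large coarse timestep.

```python
import numpy as np

def gaussian_average(u, t, eta):
    """Moving average of samples u(t) with a Gaussian window of width eta,
    renormalised at each point (so the boundaries are handled gracefully)."""
    out = np.empty_like(u)
    for i, ti in enumerate(t):
        w = np.exp(-((t - ti) / eta) ** 2)
        out[i] = (w * u).sum() / w.sum()
    return out

t = np.linspace(0.0, 10.0, 4001)
slow = np.sin(0.5 * t)                  # slow trend
u = slow + 0.3 * np.sin(40.0 * t)       # plus a fast oscillation
ubar = gaussian_average(u, t, eta=0.5)
# ubar tracks the slow trend closely; the raw signal u does not.
```

The averaged signal only slightly damps the slow component (the window is short relative to the slow period) while the fast component is suppressed almost entirely.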
Spatio-temporal oscillations from an initially stationary Gaussian height field are shown on a spatially periodic domain, i.e. the top and bottom boundaries of the plots wrap around. The decay of the height field into waves travelling in opposite directions is visible in fig. 4. The optimal averaging window (q.v. section 5.3) was applied, and convergence to single precision was obtained in six iterations.

Figure 4: Comparison of the solution derived from the averaging method with the 'true' solution. All three plots show spatio-temporal oscillations in the height field of the 1D RSWE, with the time coordinate on the x-axis and the spatial coordinate on the y-axis. The top plot shows the slow approximation of the height field (the third component of u). The timestepping is performed over this slower quantity, with decreased oscillatory stiffness and therefore an increased timestep. The middle plot is the projection of this quantity back into normal space by the matrix exponential, e^{τL}. This is the coarse solution which is used after the first coarse solve. Its quality, when compared to the solution computed with the fine solver shown in the last plot (to which the APinT algorithm converges), is what allows rapid convergence of Parareal. In this example, ε = 0.01 and the averaging window η = 1.0, which is optimal for this problem.

Numerical Results on the 1-D RSWE
In this section we present numerical results for the one-dimensional rotating shallow water equations which build on those presented in [21]. Figure 5 shows the norm of the coarse error, i.e. ‖x(t) − y(t)‖_2 computed relative to the fine timestep, versus the width of the averaging window, T_0, which denotes the numerical choice of η, where 0 ≤ t ≤ 1, δt = 2 × 10^{-4}, and the spatial resolution is N_x = 64. This spatial resolution and fine timestepping regime were found to be within the asymptotic range of the timestepping. A second-order Strang splitting method was used for both the coarse and fine solves. The initial flow was stationary with a Gaussian height field.
For the smallest ε the asymptotic behaviour is well approximated, as the fidelity of the coarse timestepping increases as the averaging window increases. This is consistent with the behaviour described in section 3.1, where the theory predicts that η → ∞ as ε → 0. However, for larger ε, such as the two cases shown, there is a clear optimal size for the averaging window, i.e. the minimum in the red and green curves in fig. 5. The location, but not the magnitude, of this point is predicted by eq. (62).
Theorem 1 states that, outside of the small-ε limit, we should expect the iterative error to decrease with k as in Theorem 2 and to exhibit a minimum where the sum of the timestepping and averaging errors is smallest. In the case of ε = 0.01, which is near the limit ε → 0, we expect that for a large enough window we will have optimal convergence, with no improvement in solution quality for a larger averaging window. The numerical results are therefore consistent with the theory developed in this paper.
In practice, we seek a choice of T_0 for which the solution is non-stiff on an O(∆T) interval; therefore, as ∆T increases, so must the averaging window. Similarly, the oscillatory stiffness is proportional to 1/ε, and so as ε → 0 it is necessary to choose a longer averaging window, and therefore to apply stronger smoothing to the solution.
Comparing fig. 5 to fig. 6, which shows the iterative error in the APinT method after three iterations for the same parameters, the direct computation of the coarse timestepping error provides good qualitative agreement with the optimal choice of η for the different values of ε. This is in direct agreement with the prediction of Theorem 2.
As T_0 is taken smaller, instability is observed for all ε. This corresponds to the explicit CFL limit being violated, as reducing the length of the averaging window increases the maximum wave speed in the solution. For large T_0, the iterative error roughly stabilises for finite ε. In the limit as η is taken very large, the coarse timestepping corresponds to an incorrect equation (e.g. the QG equations for our example) being solved in a numerically stable fashion. The difference between the coarse and fine equations is sufficient to inhibit convergence, but does not violate the timestepping limit.

Optimal Averaging for the 1-D RSWE
It was shown in section 4.3 that it is possible to choose the averaging window in such a way as to ensure convergence. Beyond this, we may choose the window optimally to obtain the fastest possible convergence (cf. fig. 6).
The reason we are able to describe the qualitative behaviour in this way is a direct result of the structure of the Parareal algorithm and the sources of error present in it. The Parareal method consists of an initial approximation to the solution performed by the coarse solver, which is accurate to within the coarse error predicted by Theorem 1. This is followed by a series of parallel-in-time corrections, derived from the difference between the coarse and fine solutions, which converge to the full solution. The closer the initial approximation is to the solution, the less correction is required to converge.
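This predictor-corrector structure can be sketched generically. The toy problem below, a scalar decay ODE with forward-Euler coarse and fine solvers, is an illustrative stand-in rather than the APinT configuration; it shows only the Parareal update U_{n+1} = G(U_n) + F(U_n^{old}) − G(U_n^{old}).

```python
import numpy as np

def parareal(f, u0, T, n_slabs, k_iters, fine_substeps=100):
    """Generic Parareal: coarse predictor plus iterative fine corrections."""
    dT = T / n_slabs

    def coarse(u, t):          # one forward-Euler step over a slab
        return u + dT * f(t, u)

    def fine(u, t):            # many small forward-Euler steps over a slab
        dt = dT / fine_substeps
        for i in range(fine_substeps):
            u = u + dt * f(t + i * dt, u)
        return u

    # initial coarse sweep along the whole interval
    U = np.zeros(n_slabs + 1)
    U[0] = u0
    for n in range(n_slabs):
        U[n + 1] = coarse(U[n], n * dT)

    for _ in range(k_iters):
        # these slab solves are independent: parallel in a real implementation
        G_old = np.array([coarse(U[n], n * dT) for n in range(n_slabs)])
        F_old = np.array([fine(U[n], n * dT) for n in range(n_slabs)])
        for n in range(n_slabs):   # sequential correction sweep
            U[n + 1] = coarse(U[n], n * dT) + F_old[n] - G_old[n]
    return U

# decay problem u' = -u on [0, 1]; the end value approaches the fine solution
U = parareal(lambda t, u: -u, 1.0, 1.0, n_slabs=10, k_iters=5)
```

The better the coarse predictor (here deliberately crude), the fewer correction sweeps are needed, which is exactly why the quality of the averaged coarse solver governs the speedup.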
The optimal choice of η may be written as an optimisation problem: for some as-yet unknown constants C_1, C_2, and C_3. It is here that the fact that both the timestepping and averaging errors are bounded proportionally to the norm of the nonlinear term, M, becomes serendipitous, as this constant may be somewhat non-optimal in practice. In seeking the location, but not the magnitude, of the minimum coarse error, the bound on the norm of the nonlinear operator plays no role. Seeking stationary points with respect to η then requires us to find η such that: The result of eq. (59) is used to choose the optimal averaging window. This result captures the dependence of the optimal averaging on parameters such as the timestep and the scale separation, but relies on several unknown constants. If these constants C_n were known, the optimal averaging window could be determined computationally; eq. (59) would then provide an approximation to the optimal window. Given some initial data of the type shown in fig. 6, these constants may be fitted by least squares. Doing so fits the known trend to the known data, and permits the optimal averaging window to be recomputed 'on the fly' during a computation.
Certain practical issues arise in the computation of η. Firstly, the computation of dΛ/dη requires all triads to be investigated, i.e. the maximum is taken over the set of all near-resonant sets. Doing so is computationally expensive, although if this computation were performed infrequently the cost could be negligible compared to the simulation cost. Additionally, finding η requires solving a transcendental equation in at least two variables (η, ε), both for the initial fitting of the constants and for the optimisation on the fly. We therefore propose a simpler model based on the behaviour of Λ(η).
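The behaviour of the filtered triad contribution can be probed numerically: with a Gaussian kernel, the normalised window average of e^{iλs} equals one at direct resonance (λ = 0) and decays as the distance from resonance |λ| or the window length η grows. The kernel and normalisation below are illustrative assumptions, not the paper's exact definition of Λ(η).

```python
import numpy as np

def filtered_osc(lam, eta, n=20001):
    """|window-averaged e^{i*lam*s}| for an assumed Gaussian kernel of
    width eta, normalised so that lam = 0 gives exactly 1."""
    s = np.linspace(-5.0 * eta, 5.0 * eta, n)
    rho = np.exp(-(s / eta) ** 2)          # assumed averaging kernel
    return abs((rho * np.exp(1j * lam * s)).sum()) / rho.sum()

on_res   = filtered_osc(lam=0.0, eta=1.0)  # direct resonance: upper bound of 1
off_res  = filtered_osc(lam=5.0, eta=1.0)  # off resonance: strongly damped
long_win = filtered_osc(lam=5.0, eta=4.0)  # a longer window damps it further
```

For this kernel the average is e^{−λ²η²/4}, which is the "faster decay with larger η" behaviour invoked above; in the method the maximum of such factors over all near-resonant triads is what must be controlled.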
Restricting ourselves in this example to a Gaussian kernel, we may consider the asymptotic behaviour of the kernel for large λ_n. This gives: We then multiply our approximation by 1 = η²/η² to obtain:
for some constant D_1, since x²e^{−x²} is bounded independently of x (cf. fig. 7). We then replace the first term in eq. (58) and seek stationary points corresponding to the minimum error. This yields: This equation provides an estimate for the optimal averaging window length, η_opt, in terms of the computational parameters and the empirically-fitted constants. This result is consistent with Theorem 2, as it exhibits a clear minimum for O(1) values of ε, with the optimal averaging window increasing as ε → 0, as the asymptotic theory predicts. Both approximations are shown in fig. 8 for a set of minima extracted from a series of runs of the algorithm.
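Under a kernel approximation of this kind, the coarse-error bound reduces to a one-dimensional trade-off between a stiffness term that falls with η and an averaging error that grows with η. The model err(η) = a/η² + bη below, and its coefficients, are illustrative stand-ins for the fitted constants C_n; the sketch only checks that the closed-form stationary point of such a trade-off matches a brute-force grid search.

```python
import numpy as np

def eta_opt(a, b):
    """Closed-form minimiser of err(eta) = a/eta**2 + b*eta:
    d(err)/d(eta) = -2a/eta**3 + b = 0  =>  eta = (2a/b)**(1/3)."""
    return (2.0 * a / b) ** (1.0 / 3.0)

a, b = 0.5, 2.0                        # hypothetical fitted constants
eta_star = eta_opt(a, b)

# brute-force cross-check of the stationary point on a fine grid
etas = np.linspace(0.1, 5.0, 100001)
errs = a / etas ** 2 + b * etas
eta_grid = etas[np.argmin(errs)]
```

The same structure explains the measured curves: too small an η leaves the stiffness unmitigated, too large an η incurs averaging error, and the minimum sits at an O(1) window for O(1) values of ε.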
The full model given in eq. (59) provides a much closer approximation, both to the behaviour at ε = 1 and as ε → 0, and as an actual fit to the points. It does this, however, at the cost of several orders of magnitude more computational effort. The simple model of eq. (62), on the other hand, provides a reasonable approximation to the error as a function of ε, but has the disadvantage of poorly resolving the trend in the limit ε → 0. While the simple prediction underestimates the optimal window as ε → 0, the behaviour in this range is well understood (cf. [21], [1]) and so a hybrid model may easily be applied in practice.

Conclusion
We have investigated the convergence of a Parareal method using the APinT coarse solver, which provides a technique by which oscillatory-stiff equations may be solved with the Parareal method. The convergence of this method is due to the averaging applied to the coarse solution, which filters the fast waves and mitigates the oscillatory stiffness present in many of the equations of mathematical physics. This averaging must be performed over the entirety of the nonlinear operator due to the role that direct and near-resonances play in the oscillatory stiffness of the system.
By describing the error of the coarse solver in terms of the interplay between the average over the rapid oscillations and the timestepping, we show that the method converges for finite scale separation, significantly extending the domain of applicability of this method.
We have shown here that this method is convergent across a wide range of scale separations, which is an improvement on the prior result [21], which held only in the small-ε limit. Further, in section 5.3 we considered both a full and a reduced model for predicting the optimal averaging window in practical codes.

Figure 3 :
Figure 3: A schematic showing the averaged versus full derivatives for a toy solution exhibiting fast and slow behaviour, for two different averaging window widths, η_1 and η_2. The solution here exhibits both rapid oscillations and a slower trend. Of importance is the difference between the derivative of the entire solution, i.e. ∂u/∂t, and that of the averaged solution with respect to the given averaging window, i.e. ∂ū/∂t. It is this averaged right-hand side which follows the slow solution without being affected by the fast oscillations, and it is therefore applicable to the coarse timestepping.

Figure 5 :
Figure 5: Computed coarse error for ∆T = 0.1. This is a numerical estimate of the error corresponding to that in Theorem 1, i.e. ‖x(t) − y_{∆T}(t)‖, computed by brute-force comparison of the averaged coarse solution to a finely computed reference solution. Note the clear existence of an optimal averaging window for the case ε = 1.0, and the tendency towards the asymptotic theory, i.e. the error becoming inversely proportional to the averaging window length, T_0, as ε → 0.

Figure 6 :
Figure 6: Iterative error in APinT with the 1-D RSWE after three iterations for ∆T = 0.1. Whereas fig. 5 showed the measured total (i.e. timestepping plus averaging) error in the coarse timestepping, this figure shows the iterative error for a full Parareal solve of the RSWE after three iterations under the same computational conditions. Note that the behaviour with respect to variation of the averaging window, and particularly the location of the optimal window length, is well predicted by the brute-force computation of the coarse timestepping error.

Figure 7 :
Figure 7: Examples of the function x²e^{−Cx²}, showing that it is bounded independently of x and that it tends rapidly to zero as x → ∞. This is used conceptually in bounding the Λ-term, i.e. the mitigated stiffness, in eq. (60). The unmitigated stiffness is shown in grey.

Figure 8 :
Figure 8: The optimal averaging window for the APinT solver, predicted in three different ways as a function of the scale separation. The measured optimal values for several runs with a coarse timestep of ∆T = 0.05 are shown with the filled circles. The so-called 'full' or expensive model from eq. (59) is shown with the green curve. This model shows good agreement throughout the range and also handles the long averaging windows needed as ε → 0. The simple model derived from asymptotic analysis on a Gaussian kernel is shown in red; it provides similar accuracy to the full model outside of the small-ε region, but at a dramatically reduced computational cost. Finally, the dashed line indicates the assumed scaling of the averaging window given in eq. (53), with s taken empirically as 0.2. The trend towards a longer time-averaging window being necessary for smaller ε is captured, and this scaling may be computed very cheaply, although it somewhat overestimates the window for larger values of the scale separation.