A Seamless Multilevel Ensemble Transform Particle Filter

This paper presents a seamless algorithm for the application of the multilevel Monte Carlo (MLMC) method to the ensemble transform particle filter (ETPF). The algorithm uses a combination of optimal coupling transformations between coarse and fine ensembles in difference estimators within a multilevel framework, to minimise estimator variance. It differs from that of Gregory et al. (2016) in that strong coupling between the coarse and fine ensembles is seamlessly maintained during all stages of the assimilation algorithm, instead of using independent transformations to equal weights followed by recoupling with an assignment problem. This modification is found to lead to an increased rate of variance decay between coarse and fine ensembles with level in the hierarchy, a key component of MLMC. This offers the potential for greater computational cost reductions. This is shown, alongside evidence of asymptotic consistency, in numerical examples.

1. Introduction. Particle filters and some parametric filters use ensembles to represent a posterior measure given a partially observed random dynamical system. Recently, a number of studies [10, 11, 12] have considered the extension of the multilevel Monte Carlo (MLMC) method [8] to filtering problems. This involves using a telescoping sum of Monte Carlo estimators of solutions to systems discretized at varying levels of resolution. The method has also been applied to other Bayesian inference problems, such as static parameter estimation [13].
The applications of MLMC have varied greatly, with the main methodological difference being how to 'couple' random samples of the system from two consecutive levels to achieve variance reduction in the hierarchy of Monte Carlo estimators. For nonlinear filtering, [12] used a coupled resampling algorithm for a particle filter (the multilevel particle filter), whereas [10] used an optimal transportation coupling algorithm for the ensemble transform particle filter (ETPF) [15], leading to the multilevel ETPF (MLETPF). Both of these algorithms returned computational cost reductions, for a fixed order of accuracy, over their standard, single-level counterparts.
Since then, the optimal transport algorithm has been investigated in further detail in [17], with the aim of coupling multiple particle filters. That study also considered the use of the iterative Sinkhorn algorithm, as well as the exploitation of sparse optimal transport matrices, to solve these optimal transportation problems at reduced computational cost. The iterative Sinkhorn algorithm has also been implemented to reduce the computational cost of the optimal transportation problem in a second-order accurate ETPF [6]. This paper concentrates on improving the algorithm in [10] to avoid using an assignment problem [14], instead using a 'seamless' combination of three optimal transportation problems. The benefit of this modification is that, for certain numerical cases considered in this paper, the rate of variance decay between samples with consecutive levels of increasing resolution is improved from that in [10]; this leads to a greater rate of reduction in the computational cost of propagating ensembles in the MLETPF. In the notation of [8], we observe a variance decay scaling close to β ≈ 2 (quadratic decay) in multivariate problems, rather than one corresponding to β ≈ 1 (linear decay), which was observed previously. In other cases, including ones where the problem is univariate or localisation is used, both the proposed and previous algorithms scale at the optimal quadratic rate. However, the proposed algorithm is still found to return estimators with a smaller magnitude of variance, which are thus less costly for a fixed accuracy.
The paper proceeds as follows. In Section 2, an introduction to Bayesian filters will be given, and then a new 'seamless' algorithm for the optimal transport resampling step in a variant of the particle filter, the ETPF, will be explained. Numerical examples will provide a proof of concept in Section 3. Finally, a summary and discussion concludes the paper in Section 4.

2. Multilevel Filtering Algorithms.
2.1. Particle Filters for Bayesian Data Assimilation Problems. We consider a Bayesian data assimilation problem for a state vector X ∈ R^d, with discrete time model equation

X^{n+1} = M(X^n; ω),

where ω denotes any stochastic contributions in M, and where M may involve the application of several timesteps, with the index n denoting the observation times. We assume that at each observation time t^n (where the interval between two observation times, ∆t = t^{n+1} − t^n, is assumed fixed), we have noisy observations

Y^n = h(X^n) + η^n,

where h : R^d → R^m is a (nonlinear) observation operator, and η^n ∈ R^m is a random variable representing measurement error, assumed in this paper to be drawn from N(0, R), R ∈ R^{m×m}, so that Y^n has probability density

π(Y^n | X^n) ∝ exp( −(1/2) (Y^n − h(X^n))^T R^{−1} (Y^n − h(X^n)) ),

denoted the likelihood function. A filter is an estimator for the conditional probability density of the state variable X^n, given the observations up to time index n and a given initial probability density π(X^0). The classical particle filter uses an ensemble {X_{i,n}}_{i=1}^N of model states (here i denotes the realisation of the model state and n denotes the time index) together with a set of scalar weights {w_{i,n}}_{i=1}^N, so that expectations over the conditional probability density may be consistently estimated using the formula

E[g(X^n)] ≈ ∑_{i=1}^N w_{i,n} g(X_{i,n}).

A particle filter can be constructed iteratively using three stages.
1. Model propagation:

X_{i,n+1} = M(X_{i,n}; ω_i).    (2.6)

(Here ω_i represents an independent realisation of the random variable ω for each ensemble member X_{i,n}.)
2. Bayesian weight update:

w_{i,n+1} = w_{i,n} π(Y^{n+1} | X_{i,n+1}) / Z,

where Z is a normalisation constant chosen so that ∑_{i=1}^N w_{i,n+1} = 1.
3.
Resampling / transformation: a (possibly stochastic) ensemble transformation

{X_{i,n}, w_{i,n}}_{i=1}^N → {X̃_{i,n}, 1/N}_{i=1}^N.

There are deterministic cases of this transformation [16] that preserve the filter estimate, or, in the stochastic transformation case (random resampling), provide an unbiased estimate,

E[ (1/N) ∑_{i=1}^N g(X̃_{i,n}) ] = ∑_{i=1}^N w_{i,n} g(X_{i,n}).    (2.9)

Specific particle filters are defined through the choice of resampling/transformation method, which is necessary to avoid degenerate weights [1]. In particular, [7] provides a background on a range of resampling techniques, including adaptive resampling, and their effects on the particle filter. Typically, random resampling methods add variance to the filtering estimator; there is literature exploring the rigorous theory behind the impact of this on the particle filter, such as [4]. Resampling schemes like residual resampling try to reduce this additional variance. The deterministic resampling/transformation described below, proposed in [15], is designed to minimise this additional variance.
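For concreteness, stages 1 and 2 above can be sketched as follows, with a hypothetical scalar random-walk model and a direct observation of the state; the resampling stage is the subject of the following subsections.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(X, w, model, likelihood, y):
    """One assimilation cycle: stages 1 and 2 (propagation and reweighting)."""
    # 1. Model propagation, with an independent noise realisation per member.
    X = np.array([model(x) for x in X])
    # 2. Bayesian weight update; division by w.sum() plays the role of Z.
    w = w * likelihood(y, X)
    return X, w / w.sum()

# Toy set-up (hypothetical): scalar random-walk model, direct observation
# of the state with N(0, 0.25) measurement error.
model = lambda x: x + 0.1 * rng.standard_normal()
likelihood = lambda y, X: np.exp(-0.5 * (y - X) ** 2 / 0.25)

N = 50
X = rng.standard_normal(N)
w = np.full(N, 1.0 / N)
X, w = particle_filter_step(X, w, model, likelihood, y=0.3)
posterior_mean = float(np.sum(w * X))   # weighted estimate of E[X^n | Y^n]
```

Iterating this step without resampling leads to degenerate weights, which motivates stage 3.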

2.2. Optimal Coupling Between Discrete Probability Distributions.
The key tool in the type of resampling considered in this paper is an optimal coupling between discrete probability distributions. First, consider a pair (Z_1, Z_2) of discrete random variables, where Z_1 and Z_2 take values in the sets {z_1^i}_{i=1}^N and {z_2^j}_{j=1}^N, with marginal probabilities

P[Z_1 = z_1^i] = p_i,   P[Z_2 = z_2^j] = q_j.    (2.11)

To minimise the variance of the difference between Z_1 and Z_2, one requires the joint probability matrix D_{i,j} that maximises the covariance Cov_D(Z_1, Z_2) subject to these marginals. We note

E_D[|Z_1 − Z_2|^2] = E[|Z_1|^2] + E[|Z_2|^2] − 2 E_D[Z_1 · Z_2],

where E_D and Cov_D are the expectation and covariance with respect to the coupling probability matrix D. Thus maximising the covariance Cov_D(Z_1, Z_2) is equivalent to minimising the cost function (the discrete Wasserstein distance)

∑_{i,j=1}^N D_{i,j} |z_1^i − z_2^j|^2,    (2.13)

over D_{i,j}, subject to the constraints

∑_j D_{i,j} = p_i,   ∑_i D_{i,j} = q_j,   D_{i,j} ≥ 0.

The solution to this optimisation problem can be found by linear programming. The computational cost is O(N log N) if the dimension d is one, since the problem can then be solved by a cheap algorithm [16], but is O(N^3 log N) in multiple dimensions. For cheap forward models, this can be a computational bottleneck; localisation, described in the next section, can provide some relief. As described in [16], the resulting matrix D_{i,j} is typically very sparse, and in the one-dimensional case has a maximum of two non-zeros per row. This optimal coupling can be used to resample weighted ensembles. Let the ensembles {Z_1^i}_{i=1}^N and {Z_2^i}_{i=1}^N be drawn from random variables Z_1 and Z_2 with measures µ_{Z_1} and µ_{Z_2} respectively. Furthermore, suppose they are assigned weights {p_i}_{i=1}^N and {q_i}_{i=1}^N respectively by importance sampling via a third probability measure µ_{Z_3}, all absolutely continuous with respect to the Lebesgue measure. We wish to resample such that

∑_{i=1}^N p_i E[g(Z̃_2^i)] = ∑_{j=1}^N q_j g(Z_2^j),

where the expectation is taken over the random variables used in the resampling. This can be achieved by choosing Z̃_2^i = Z_2^j with probability D_{i,j}/p_i. Then

∑_{i=1}^N p_i E[g(Z̃_2^i)] = ∑_{i,j=1}^N D_{i,j} g(Z_2^j) = ∑_{j=1}^N q_j g(Z_2^j),    (2.16)

as required.
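A minimal numerical sketch of this optimisation, posing (2.13) together with the marginal constraints as a dense linear program via scipy.optimize.linprog; the dense LP scales badly in N and is for illustration only, and the small ensembles below are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_coupling(Z1, Z2, p, q):
    """Joint probability matrix D minimising sum_ij D_ij |z1_i - z2_j|^2
    subject to row marginals p and column marginals q."""
    N = len(p)
    cost = np.array([[np.sum((z1 - z2) ** 2) for z2 in Z2] for z1 in Z1])
    A_eq = []
    for i in range(N):                    # row-sum constraints: sum_j D_ij = p_i
        row = np.zeros((N, N)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(N):                    # column-sum constraints: sum_i D_ij = q_j
        col = np.zeros((N, N)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    res = linprog(cost.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([p, q]), bounds=(0, None),
                  method="highs")
    return res.x.reshape(N, N)

# Two nearly identical, sorted one-dimensional ensembles: the optimal
# coupling concentrates on the diagonal (monotone coupling).
Z1 = np.array([[0.0], [1.0], [2.0]])
Z2 = np.array([[0.1], [1.1], [2.1]])
p = q = np.full(3, 1.0 / 3.0)
D = optimal_coupling(Z1, Z2, p, q)
```

In one dimension the same coupling can be found by a single sweep through the sorted particles, which is how the O(N log N) cost quoted above arises.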
As an alternative to this random resampling, one can also take the deterministic transformation

Z̃_2^i = (1/p_i) ∑_{j=1}^N D_{i,j} Z_2^j.    (2.17)

In [16], the following was proved. If we define the set {Z̃_2^i}_{i=1}^N, computed via the transform in (2.17), then the induced maps Z_1^i ↦ Z̃_2^i converge weakly to a map Ψ : R^d → R^d as N → ∞. In addition, the random variable given by Z̃_2 = Ψ(Z_1) has probability measure µ_{Z_2}. Hence, for arbitrary test functionals g, we can obtain weakly convergent estimates of E[g(Z_2)], with

∑_{i=1}^N p_i g(Z̃_2^i) → E[g(Z_2)],    (2.18)

as N → ∞.
The optimal coupling can be used to define a deterministic ensemble transform, introduced in [15], which replaces a weighted ensemble {X_i, w_i}_{i=1}^N with an evenly weighted one {X̃_j, 1/N}_{j=1}^N via

X̃_j = N ∑_{i=1}^N D_{i,j} X_i.

This gives a consistent ensemble with convergent statistics as N → ∞. This is the basis of the ensemble transform particle filter (ETPF) of [15], where this transform is used as a resampling step. It would be desirable to establish L^2 rates of convergence for this transformation; this could allow a rigorous analysis of the effectiveness of the multilevel Monte Carlo framework described later.
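The transform can be sketched end-to-end as follows, with the target weights fixed at 1/N; the coupling is again solved as a dense linear program purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def etpf_transform(X, w):
    """Deterministic ensemble transform: map the weighted ensemble
    {X_i, w_i} to an evenly weighted one while preserving the mean."""
    N, d = X.shape
    # Pairwise squared distances between ensemble members.
    cost = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1).ravel()
    A_eq, b_eq = [], []
    for i in range(N):                     # row sums equal the weights w_i
        r = np.zeros((N, N)); r[i] = 1.0
        A_eq.append(r.ravel()); b_eq.append(w[i])
    for j in range(N):                     # column sums equal 1/N
        c = np.zeros((N, N)); c[:, j] = 1.0
        A_eq.append(c.ravel()); b_eq.append(1.0 / N)
    D = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                bounds=(0, None), method="highs").x.reshape(N, N)
    # New member j is the weighted average X_tilde_j = N * sum_i D_ij X_i.
    return N * D.T @ X

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
w = rng.random(20); w /= w.sum()
Xt = etpf_transform(X, w)   # evenly weighted; sample mean equals w @ X
```

Because each column of D sums to 1/N and each row to w_i, the sample mean of the transformed ensemble equals the weighted mean of the original one, which is the preservation property used throughout the paper.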

2.3. Localisation.
Localisation is useful in cases where the dimension d is large or one is dealing with spatially extended systems. A multilevel version of the ensemble Kalman filter (EnKF) for these types of system has recently been proposed [3]. Localisation in the ETPF can be used to simplify the optimal transportation problems used above and, in some cases, reduce their computational cost from O(N^3 log N) to O(dN log N). Using a "localisation radius", r_loc, one can solve a separate optimal transportation problem for each individual component of the random variables Z_1 and Z_2, with the cost function for the k'th component weighting the contributions of the other components by a localisation matrix C, for example one whose entries vanish for components separated by more than r_loc. In the case r_loc = 0, each optimal transportation problem is one-dimensional (and can be solved in O(N log N)), and the problems for all individual components can be solved in O(dN log N). This case will be referred to as the fully localised case. It must be noted, however, that localisation adds an additional bias to the ETPF estimate. For more information on this localisation scheme, which can be used alongside the ETPF, refer to [2]. Localisation in the ETPF is important in cases where the optimal transport algorithmic cost dominates the model cost of the problem. In these cases, the multilevel framework (which can reduce the cost of the forward model / propagation of ensembles) would have a negligible impact on the overall cost of the filter, and so localisation should be used. Localisation in the above manner is used in one of the numerical examples at the end of this paper.

2.4. A Multilevel Filter. The recent emergence of the multilevel Monte Carlo method poses the question of how to apply the telescoping sum framework to filters [10, 12, 11]. A multilevel filter estimate makes use of a hierarchy of models M_l, l = 0, ..., L, where l = 0 denotes the coarsest, cheapest and least accurate model and l = L the finest, most expensive and most accurate model (in this paper we only consider models with different time-step sizes, so that all state vectors have the same dimension d). We assume that each model has a time-step size h_l and that the model cost scales as O(h_l^{−γ}). The multilevel filter uses a coarsest-level weighted ensemble {X^{i,n}_0, w^{i,n}_0}_{i=1}^{N_0} (where the subscript denotes the level of resolution of the realisation and n is the time-index), together with a hierarchy of pairs of ensembles {X^{i,n}_l, w^{i,n}_l}_{i=1}^{N_l} and {X̂^{i,n}_{l−1}, ŵ^{i,n}_{l−1}}_{i=1}^{N_l}, for l = 1, ..., L. Here X̂^{i,n}_l ∼ X^n_l, but the samples X̂^{i,n}_l, with importance weights ŵ^{i,n}_l, are independent of the samples X^{i,n}_l. Thus expectations may be estimated using the telescoping sum formula

E[g(X^n_L)] ≈ ŷ^n_0 + ∑_{l=1}^L ŷ^n_l,

where we define the coarse estimator

ŷ^n_0 = ∑_{i=1}^{N_0} w^{i,n}_0 g(X^{i,n}_0),

and the difference estimators

ŷ^n_l = ∑_{i=1}^{N_l} ( w^{i,n}_l g(X^{i,n}_l) − ŵ^{i,n}_{l−1} g(X̂^{i,n}_{l−1}) ),  l = 1, ..., L.

In filtering one is also interested in estimating higher moments of X^n_L, and thus being able to asymptotically consistently estimate E[g(X^n_L)] using this telescoping sum is also important. Despite the weak convergence of these estimators mentioned throughout this paper, establishing L^2 convergence rates for them remains an open problem.
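The telescoping estimator can be sketched for a toy scalar SDE; everything here (the hypothetical model dX = −X dt + σ dW, the level parameters, the per-level sample sizes) is an illustration of the coupled difference estimators rather than the paper's filtering set-up, and importance weights are omitted (pure forecast estimates).

```python
import numpy as np

rng = np.random.default_rng(2)

def euler_path(x0, h, n_steps, dW):
    """Forward Euler for dX = -X dt + 0.5 dW, driven by given increments."""
    x = x0
    for k in range(n_steps):
        x = x - x * h + 0.5 * dW[k]
    return x

def mlmc_estimate(L, N):
    """Telescoping-sum estimate of E[X(T=1)] with coupled coarse/fine pairs."""
    T, est = 1.0, 0.0
    for l in range(L + 1):
        nf = 2 ** (l + 2)                    # fine steps on level l
        hf = T / nf
        x0 = rng.normal(1.0, 0.1, size=N[l])
        fine = np.empty(N[l]); coarse = np.empty(N[l])
        for i in range(N[l]):
            dW = rng.normal(0.0, np.sqrt(hf), size=nf)
            fine[i] = euler_path(x0[i], hf, nf, dW)
            if l > 0:
                # The coarse path reuses the same noise and initial
                # condition, with increments summed pairwise: this is the
                # strong coupling that makes V_l decay with l.
                dWc = dW.reshape(-1, 2).sum(axis=1)
                coarse[i] = euler_path(x0[i], 2 * hf, nf // 2, dWc)
        est += fine.mean() - (coarse.mean() if l > 0 else 0.0)
    return est

estimate = mlmc_estimate(L=3, N=[400, 200, 100, 50])   # close to exp(-1)
```

The decreasing sample sizes N_l mirror the balancing of V_l/N_l against discretization bias discussed below.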
The three iterative filtering steps defined in Section 2.1 are applied to this set of 2L + 1 ensembles, noting that: (a) the random sampling of the initial conditions, as well as the stochastic terms in M, must be drawn independently for the coarse estimator and for each difference estimator; and (b) the correlation between the coarse and fine level ensembles in each difference estimator must be as high as possible. In general, for multilevel Monte Carlo methods, the latter is achieved by using the same initial conditions for the coarse and fine ensembles in each difference estimator, as well as the same realisations of the stochastic terms in M.
If the correlation between the coarse and fine level ensembles in each difference estimator is sufficiently high, then we satisfy a key condition of the multilevel Monte Carlo framework: that the variance of the difference between pairs of samples in the weighted coarse and fine ensembles, V_l, decreases asymptotically with increasing l. When this condition is achieved, and we allow the sample sizes of each of the independent estimators, N_l, to decrease asymptotically with l at a certain rate, then, generally speaking, computational cost reductions over a standard single-level filter of fixed accuracy can be achieved. This is because each independent estimator balances estimator variance (determined by V_l/N_l) against discretization bias; expensive estimators with small discretization bias have fewer samples, and cheap estimators with large discretization bias have more samples.
For the standard multilevel Monte Carlo methodology in [8], an algorithm was presented to find the optimal values of L and N_l that produce the greatest computational cost reduction over the single-level Monte Carlo counterpart of the same accuracy. For more analysis and accompanying theory, see the review article [9], which also gives a brief guide to the range of applications of multilevel Monte Carlo.
For multilevel filtering algorithms, the challenge is to find a resampling/transformation strategy that keeps the coarse and fine ensembles correlated in each difference estimator after each assimilation step.
2.5. The Multilevel Ensemble Transform Particle Filter. In [10], we proposed an algorithm to address this issue. After updating the weights of each pair of ensembles, we took the following steps to apply the multilevel Monte Carlo method to the ETPF, creating the multilevel ensemble transform particle filter (MLETPF).
1. Transform each ensemble independently to equal weights.
We independently apply an ensemble transform to both the coarse and fine ensemble in each difference estimator, following the approach of [15]; this transform was described earlier in the paper. For both the coarse and fine weighted ensembles in each difference estimator, with N_F particles in each, we seek a joint probability (coupling) matrix D_{i,j} between {X_{i,n}, w_{i,n}}_{i=1}^{N_F} and the evenly weighted ensemble {X̃_{i,n}}_{i=1}^{N_F}. In particular, we desire the D_{i,j}, i = 1, ..., N_F, j = 1, ..., N_F, that minimises

∑_{i,j=1}^{N_F} D_{i,j} |X_{i,n} − X_{j,n}|^2,    (2.25)

subject to the marginal constraints

∑_j D_{i,j} = w_{i,n},   ∑_i D_{i,j} = 1/N_F,   D_{i,j} ≥ 0.    (2.26)

This is equivalent to maximising the covariance between them. After using linear programming to obtain the minimal matrix D_{i,j}, we compute the ensemble transform

X̃_{j,n} = N_F ∑_{i=1}^{N_F} D_{i,j} X_{i,n}.

One can verify that the mean of this new ensemble is preserved from the weighted ensemble,

(1/N_F) ∑_{j=1}^{N_F} X̃_{j,n} = ∑_{i=1}^{N_F} w_{i,n} X_{i,n}.

It was demonstrated in [15] that this transformation provides weakly converging approximations of higher moments as N_F → ∞.

2. Re-couple the pair of ensembles in each difference estimator.
For each difference estimator, given the transformed coarse and fine ensembles {X̃^C_{i,n}}_{i=1}^{N_F} and {X̃^F_{i,n}}_{i=1}^{N_F}, each with equal weights 1/N_F, we seek the coupling matrix T_{i,j}, i = 1, ..., N_F, j = 1, ..., N_F, that minimises

∑_{i,j=1}^{N_F} T_{i,j} |X̃^F_{i,n} − X̃^C_{j,n}|^2,    (2.29)

where N_F T_{i,j} must take non-negative integer values, subject to the constraints

∑_j T_{i,j} = 1/N_F,   ∑_i T_{i,j} = 1/N_F.    (2.30)

This is an assignment problem that can be solved by the Hungarian algorithm, resulting in a matrix T_{i,j} with all entries equal to either 1/N_F or 0. We then re-order the coarse ensemble so that T_{i,j} becomes a diagonal matrix.
2.6. The Seamless Multilevel Ensemble Transform Particle Filter. The multilevel ensemble transform filter has an inelegant aspect, in that we first decorrelate the pairs of ensembles through independent ensemble transformations, before subsequently restoring correlation by solving an assignment problem. We should expect the coupling between the pairs of ensembles to be stronger if we avoid this aspect and keep them correlated throughout the transformation / coupling scheme.
To address this, we now describe a new multilevel filtering algorithm, which we call the seamless multilevel ensemble transform particle filter (seamless MLETPF). This algorithm redesigns the assignment problem used to couple pairs of ensembles as two different optimal transportation problems. In this new version of the algorithm, after updating the weights of each pair of ensembles, we perform the following steps for each pair of coarse and fine ensembles in the difference estimators.
1. Find a coupling between the weighted fine and coarse ensembles.
2. Transform the fine ensemble.
Generate a coupling matrix T_{i,j}, i = 1, ..., N_F, j = 1, ..., N_F, that minimises (2.34), subject to the marginal constraints given by (2.35). This forms the matrix needed to transform the weighted fine ensemble to an evenly weighted one.
3. Transform the coarse ensemble, now carrying the fine weights, to an evenly weighted ensemble.
Find another coupling matrix T̃_{i,j}, i = 1, ..., N_F, j = 1, ..., N_F, that minimises (2.36), subject to the marginal constraints given by (2.37). Here we use the evenly weighted, newly transformed finer ensemble in the cost function for T̃_{i,j}, to keep the coarse and fine ensembles closely coupled; this yields the new transformed, evenly weighted coarse ensemble. As mentioned previously, this method aims to couple the ensembles in each multilevel difference estimator so that the sample covariance of the difference between them, V_l, decays as O(h_l^β). Given other assumptions, [5] shows that if β > γ, where γ is the model cost exponent defined earlier in this paper, an optimal computational cost reduction can be reached. The value of β that this seamless coupling achieves appears, from the numerical studies at the end of the paper, to be greater than that of the previous methodology outlined in [10] for multivariate problems. The greater values of β seemingly offered by this new algorithm therefore signify the potential for greater computational cost reductions in some cases.
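Since the cost functions (2.34)-(2.37) are only referenced above, the sketch below illustrates just the final transport problem of the scheme under one reading of it: the weighted coarse ensemble is transformed to even weights with a cost measured against the already-transformed fine ensemble, so that coarse member j stays close to fine member j. The ensembles, weights and helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def coupling(cost, p, q):
    """Generic LP solver for a coupling matrix with row marginals p and
    column marginals q (cf. Section 2.2)."""
    N, M = cost.shape
    A = []
    for i in range(N):
        r = np.zeros((N, M)); r[i] = 1.0; A.append(r.ravel())
    for j in range(M):
        c = np.zeros((N, M)); c[:, j] = 1.0; A.append(c.ravel())
    res = linprog(cost.ravel(), A_eq=np.array(A),
                  b_eq=np.concatenate([p, q]), bounds=(0, None),
                  method="highs")
    return res.x.reshape(N, M)

def seamless_coarse_transform(Xc, wc, Xf_t):
    """Final transport problem (sketch): the cost couples the weighted
    coarse ensemble to the transformed fine ensemble Xf_t, rather than
    to the coarse ensemble itself."""
    N = len(wc)
    cost = ((Xc[:, None, :] - Xf_t[None, :, :]) ** 2).sum(-1)
    D = coupling(cost, wc, np.full(N, 1.0 / N))
    return N * D.T @ Xc       # evenly weighted, coupled coarse ensemble

rng = np.random.default_rng(4)
Xf_t = rng.normal(size=(16, 2))                 # transformed fine ensemble
Xc = Xf_t + 0.1 * rng.normal(size=(16, 2))      # correlated coarse ensemble
wc = rng.random(16); wc /= wc.sum()
Xc_t = seamless_coarse_transform(Xc, wc, Xf_t)
```

As in the standard transform, the marginal constraints guarantee that the sample mean of the output equals the weighted mean of the input coarse ensemble.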
The assignment problem in the algorithm in [10] could be interpreted as rearranging the transformed particles in both the coarse and fine ensembles in each multilevel difference estimator.By replacing this problem with the seamless combination of optimal transportation problems, we instead modify the particles to maintain as strong a coupling as possible between the two ensembles, throughout each step of the algorithm.Thus we are optimising over a continuous space of couplings instead of a discrete space of couplings.
One can verify that the sample means of the coupled, transformed ensembles preserve those of the weighted ensembles given before the transform: for the finer level this follows from [15], and for the coarse level from the composition of its two transforms. In terms of studying the asymptotic consistency of the higher moments of {X̃^C_{i,n}}_{i=1}^{N_F}, we consider the coarse transform as a combination of two ensemble transforms. [15] showed that the first transform, generating the intermediate ensemble, satisfies weak asymptotic convergence (2.41) as N_F → ∞, and by the same logic so does the second. We demonstrate this numerically in the next section. Thus the coupled, transformed coarse ensembles in each of the difference estimators in the seamless MLETPF produce consistent estimators of statistics of X^n_C, and therefore the telescoping formula in Section 2.4 is asymptotically consistent.
One can also use localisation to reduce the dimensionality of the cost functions (and to reduce the computational cost of the linear transport problems) in the same way as was mentioned earlier in the paper, by simply implementing the above algorithm for every individual component of a multivariate random variable.
When localisation is not employed, the forward model cost of the standard ETPF (with a fixed order of accuracy) must dominate the optimal transportation cost of the seamless MLETPF in order for the overall computational cost of the ETPF to be reduced. Suppose we desire a mean-square error (MSE) of O(ε^2), for a small ε; we require N_0 = O(ε^{−2}) (for the seamless MLETPF) and N = O(ε^{−2}) (for the standard ETPF). Also assume h_L = O(ε) for a first-order accurate discretization technique. Then the forward model cost of the standard ETPF is

C_FM = c_m N h_L^{−γ} = O(ε^{−2−γ}).

If we assume that 2N_l = N_{l−1} for simplicity, the optimal transportation cost of the seamless MLETPF can be written as

C_OT = c_t ∑_{l=0}^{L} N_l^3 log N_l = O(ε^{−6} log(ε^{−1})),    (2.44)

using standard (arithmetico-)geometric series results. Here c_m and c_t are constants. In the case where γ is not extremely large, C_FM will become less than C_OT as ε → 0, and the optimal transportation cost will dominate; forward model cost speed-ups from the multilevel framework would then be worthless. However, we expect that the seamless MLETPF will offer overall speed-ups in a certain ε-regime, provided that c_m/c_t ≫ 1.
3. Numerical Examples. The three numerical experiments in this section are framed to test the posterior consistency of the seamless MLETPF, the improved rate of variance decay of a non-localised filter estimate over the algorithm in [10], and the computational cost reductions for a fixed accuracy from localised ETPF estimators. In some of the numerical examples we need to estimate the root-mean-square error (RMSE) of the MLETPF; we take this relative to an accurate ETPF approximation (with variance / discretization bias much lower than any of the experimental MLETPF simulations). The details of the discretization level and sample size of this approximation will be given within these examples. The RMSE, against E[X^n] conditioned on the observations, or an accurate approximation of it as the case may be here, is estimated over the assimilation time window. Computational cost is here given as a runtime in seconds, using a standard Python implementation of the algorithms.

Example (Consistency of posterior statistics).
Here, the consistency of the mean, variance, third and fourth moments of the coarse posterior after a single seamless transform / coupling step will be confirmed numerically. As the fine ensemble is transformed with the standard ETPF transform, whose consistency was established in [15], we concentrate on the coarse ensemble. The model equations are given by the 3-component chaotic nonlinear Lorenz-63 system in X = (x, y, z), with ρ = 28, σ = 10, β = 8/3 and noise coefficient φ = 0.1. The scalar Brownian motion W in the system is the same for each component, to maximise the impact of the strong nonlinearity in the equations, i.e. to make it a more challenging filtering example. Let X^n be the solution to the Lorenz-63 equations at time t^n, and let X^n_l be the discretization of X^n using the forward Euler numerical scheme with time-step h_l = 2^{−9−l}. The seamless MLETPF estimator with N_l = 2^{8−l}, l ∈ [0, 6], using the full optimal transport problems with no localisation, is used to compute an approximation to E[X^n]. Here ∆t = 2^2 h_0, and we take n ∈ [1, 1280]. The coarsest time-step of h_0 = 2^{−9} is used for stability reasons. Observations are taken with measurement error covariance R = 0.25I, where I is the 3 × 3 identity matrix.
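A sketch of one forward Euler (Euler-Maruyama) step of this stochastic Lorenz-63 model; the additive form φ dW of the noise term, applied identically to all three components, is an assumption of this sketch.

```python
import numpy as np

def lorenz63_step(X, h, dW, rho=28.0, sigma=10.0, beta=8.0 / 3.0, phi=0.1):
    """One forward-Euler step of the stochastic Lorenz-63 system; the same
    scalar Brownian increment dW forces all three components."""
    x, y, z = X
    drift = np.array([sigma * (y - x),
                      x * (rho - z) - y,
                      x * y - beta * z])
    return X + h * drift + phi * dW * np.ones(3)

rng = np.random.default_rng(5)
h = 2.0 ** -9                          # coarsest time-step h_0
X = np.array([1.0, 1.0, 1.0])
for _ in range(4):                     # a few steps of the coarsest model
    X = lorenz63_step(X, h, rng.normal(0.0, np.sqrt(h)))
```

Coupled coarse/fine pairs would run this map with shared Brownian increments, summed pairwise on the coarse level, as in Section 2.4.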
First, the example will be used to show the improvement in the rate of variance decay of the coupling between fine and coarse ensembles with the level of the hierarchy, l, in a multivariate case. Figure 2 shows the average rates of asymptotic decay of the sample covariance, from 5 independent implementations of the seamless MLETPF estimator and of the original algorithm in [10]. We observe that the seamless algorithm now produces a variance decay close to β = 2. The rate is certainly improved from the original methodology in [10], where β ≈ 1 is obtained.
Second, this example shows the effect that the increased rate of variance decay, shown in Figure 2, has on the overall forward model cost of the seamless MLETPF estimator. Given that optimal transportation algorithmic costs dominate the cheap forward model (γ = 1) in this example, there is no overall speed-up in convergence offered by either implementation of the MLETPF over the standard ETPF. However, the increased rate of variance decay in the seamless MLETPF estimator can be seen to produce further reductions in the forward model costs relative to the standard MLETPF and ETPF estimators. This means that there would be an overall runtime benefit whenever the forward model cost of the standard ETPF dominates the optimal transportation costs.
We define a desired order of magnitude of RMSE, ε, and use this to determine the parameters L and N_l, as done in [10]. From this, we set L = ⌈log(ε^{−1})/log(2)⌉ and N_l = ε^{−2} 2^{−l} for the standard implementation of the MLETPF, and N_l = ε^{−2} 2^{−(3/2)l} for the seamless one. The choice of geometric decay in N_l is primarily for simplicity and proof of concept; we do not claim these values are optimal, as would be obtained by using the algorithm in [8]. The different rates of geometric decay in N_l are designed to exploit the different rates of sample variance decay shown in Figure 2, in order to gain a reduction in the growth of forward model cost for the seamless MLETPF for a fixed order of estimator variance ∑_{l=0}^L V_l/N_l, and thus accuracy. For the standard ETPF, N = ε^{−2} and L is set to be the same as in the MLETPF estimators. The reduction in the growth of forward model cost for the seamless MLETPF is shown in Figure 3, for decreasing values of ε. The rates, O(ε^{−2}) for the seamless implementation and O(ε^{−2} log(ε)^2) for the standard implementation, are consistent with the analysis in [8] for β > γ and β = γ respectively. Here computational cost is measured by the total number of floating point operations for the forward model computation.
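These parameter choices can be written as a small helper; the rounding and the floor of two samples per level are assumptions of this sketch.

```python
import math

def mletpf_parameters(eps, decay_rate=1.5):
    """Level count and per-level sample sizes used in this study:
    L = ceil(log2(1/eps)) and N_l = eps^-2 * 2^(-decay_rate * l), with
    decay_rate = 3/2 for the seamless filter and 1 for the standard one."""
    L = math.ceil(math.log(1.0 / eps, 2))
    N = [max(2, round(eps ** -2 * 2 ** (-decay_rate * l))) for l in range(L + 1)]
    return L, N

L, N = mletpf_parameters(0.05)   # L = 5, N = [400, 141, 50, 18, 6, 2]
```

The steeper decay rate 3/2 is admissible for the seamless filter because its variance decay β ≈ 2 keeps the total estimator variance ∑ V_l/N_l of order ε².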
Example (Lorenz 96 equations). The Lorenz-96 system is given by the spatial discretization in (3.4). Here, F is a forcing term, typically taken to be F = 8 for chaotic behaviour, j ∈ [1, 40], ∆ = 0.5, and the W_j are independent standard Brownian motions, with σ^2 = 0.1. This independence reduces the spatial correlations in the system, and hence localisation, which will be used for this problem, has little effect on the performance of the assimilation, as in [10]. For systems with strong spatial dependence structures, localisation will have more of an effect on the ETPF and its multilevel counterparts [2]. The choice of some parameters, such as the frequency of assimilation, is an important aspect to consider when deciding whether localisation, and thus the ETPF, is suitable for a particular problem. For example, if assimilation is more frequent, spatial dependence will be altered more frequently, which in turn could lead to a worse-performing filter.
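A sketch of one forward Euler step of this system with periodic index arithmetic; the classical (unscaled) Lorenz-96 right-hand side is used here, whereas (3.4) includes the spatial scaling ∆ = 0.5, and the additive form of the noise is an assumption of this sketch.

```python
import numpy as np

def lorenz96_step(X, h, dW, F=8.0, sigma2=0.1):
    """Forward-Euler step of a stochastic Lorenz-96 system; each of the 40
    components carries an independent Brownian increment dW[j]."""
    # Classical advection term (X_{j+1} - X_{j-2}) X_{j-1}, periodic in j.
    adv = (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1)
    return X + h * (adv - X + F) + np.sqrt(sigma2) * dW

rng = np.random.default_rng(6)
h = 2.0 ** -8                          # coarsest time-step h_0
X = 8.0 + 0.01 * rng.normal(size=40)   # small perturbation of the fixed point
for _ in range(8):
    X = lorenz96_step(X, h, rng.normal(0.0, np.sqrt(h), size=40))
```

The independent increments per component are what weaken the spatial correlations and make the fully localised transform of Section 2.3 appropriate here.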
Let X^n_l(j) represent the j'th component of X at t^n (with time-index n), discretized by the forward Euler numerical scheme with time-step h_l = 2^{−8−l}. The coarsest time-step, h_0 = 2^{−8}, is used here for stability purposes. Here, ∆t = h_0, and we take n ∈ [1, 1280]; observations are taken with a measurement error covariance of R = 0.25I, where I is the 40 × 40 identity matrix. This system will be used to test the computational cost reductions of the seamless MLETPF, relative to the standard ETPF and the standard MLETPF implementation, in finding estimates of statistics of X^n to a pre-defined accuracy. Due to the use of localisation, the model cost will now dominate the cost of the ensemble transform / coupling scheme, and so we evaluate whether the seamless MLETPF offers overall computational cost reductions. As in [10], the use of localisation leads to an inconsistent approximation of statistics of X^n, so we measure RMSE relative to a localised, high-accuracy ETPF approximation with which the estimates are asymptotically consistent, as stated at the start of this section. The discretization level of this approximation is L = 13 and the sample size is N = 40000.
We assume that one has already chosen to use the fully localised ETPF for this problem and is using the multilevel framework to improve its efficiency; hence evaluating the convergence of the MLETPF to a localised ETPF approximation, rather than to the truth, is sensible in this setting. If one were to compare to the truth here, the speed-up in convergence seen in the multilevel cases would plateau due to the localisation bias; this is a problem inherent to localisation in general, and not specifically to the multilevel framework.
Pre-defined values of L and N_l will be used in the seamless and standard MLETPF estimators for a user-defined ε, which determines an order of magnitude of RMSE as in the previous example. We set L = ⌈−log(ε)/log(2)⌉ and, for each ε, N_l = ε^{−2} 2^{−(3/2)l} in both the seamless and standard implementations of the MLETPF. For the standard ETPF estimator, we set the sample size to N = ε^{−2} and L to be the same as above. In Figure 4, the average decay of the sample variance, V_l, with increasing l is shown for both the standard and seamless implementations of the MLETPF. As was observed in [10], the standard implementation of the MLETPF achieves the same optimal quadratic rate of variance decay as the seamless one in this fully localised case; however, the magnitude of the variance is smaller for the seamless implementation.
The RMSE against computational cost for the standard and seamless implementations of the MLETPF estimator, as well as the standard ETPF estimator, approximating E[X^n], is shown for different ε values in Figure 5. One notes that in this problem, where γ = 1, the computational cost of the standard ETPF estimator follows an O(ε^{−3}) scaling. Computational cost reductions, down to approximately O(ε^{−2}) scaling, are seen for both the standard and seamless implementations of the MLETPF estimator. However, the computational costs for the seamless MLETPF estimator are of lower magnitude along this scaling than for the standard implementation of the MLETPF. Similar rates of convergence are shown for the three estimators approximating E[(X^n)^2] in Figure 6.
4. Discussion and Summary. This paper has presented a seamless algorithm for an efficient version of the ensemble transform particle filter (ETPF), the multilevel ETPF (MLETPF), improving on the algorithm in [10]. It is seamless in the sense that we use a combination of three optimal transport problems to transform and positively couple the coarse and fine ensembles simultaneously. This replaces the assignment problem that was previously solved to recouple the ensembles after transforming them each independently.
The benefit of this change is that, as the proof of concept presented in this paper shows, the seamless MLETPF reduces the variance between the coarse and fine ensembles, and in some cases even improves the rate at which this variance decays with increasing levels of accuracy, relative to [10]. This in turn leads to a lower overall variance of the MLETPF estimators, and thus can reduce the forward model cost of computing them for a fixed accuracy.
For cases where the forward model cost is low, the optimal transport cost is the computational bottleneck in both the ETPF and its multilevel equivalent, unless localisation is used. Iterative methods, such as the Sinkhorn approximation in [17], can be used to reduce this algorithmic cost; they have been utilised in [6] within a second-order accurate framework for the ETPF. This is an important future direction for this multilevel research, since the proposed algorithm in this paper involves a combination of optimal transport problems that could be solved using such methods.

Fig. 2: Average, over 5 independent simulations and all assimilation steps n ∈ [1, 1280], of the sample variance Tr V^n_l = Tr Cov[X^n_l − X̂^n_{l−1}] for l ∈ [1, 6]. Here X denotes the solution to the Lorenz 63 system. Results are shown for the seamless MLETPF and the previous algorithm of [10] in the red (with triangular points) and blue (with circular points) lines respectively. Asymptotes of linear and quadratic decay, with decreasing h_l, are shown by black solid and dashed lines respectively.

Fig. 3: Forward model computational costs against RMSE for decreasing ε values, of the seamless and standard MLETPF estimators and the standard ETPF (without localisation) in the Lorenz 63 problem set-up.

Fig. 4: Average, over 5 independent simulations and all assimilation steps n ∈ [1, 1280], of the sample variance Tr V^n_l = Tr Cov[X^n_l − X̂^n_{l−1}] for l ∈ [1, 6]. Here X denotes the solution to the Lorenz 96 system. Results are shown for the seamless MLETPF and the previous algorithm of [10] (both localised) in the red (with triangular points) and blue (with circular points) lines respectively. An asymptote of quadratic decay, with decreasing h_l, is shown by the dashed line.

Fig. 5: Root-mean-square error (RMSE) against runtime of the approximations to E[X^n] from the standard ETPF and both implementations of the MLETPF, for decreasing values of ε, in the Lorenz 96 problem set-up. Rates of O(ε^{−3}) and O(ε^{−2}) computational cost growth are shown in black dashed and solid lines respectively.

Fig. 6: Root-mean-square error (RMSE) against runtime of the approximations to E[(X^n)^2] from the standard ETPF and both implementations of the MLETPF, for decreasing values of ε, in the Lorenz 96 problem set-up. Rates of O(ε^{−3}) and O(ε^{−2}) computational cost growth are shown in black dashed and solid lines respectively.