Bounding stationary averages of polynomial diffusions via semidefinite programming

We introduce an algorithm based on semidefinite programming that yields increasing (resp. decreasing) sequences of lower (resp. upper) bounds on polynomial stationary averages of diffusions with polynomial drift vector and diffusion coefficients. The bounds are obtained by optimising an objective, determined by the stationary average of interest, over the set of real vectors defined by certain linear equalities and semidefinite inequalities which are satisfied by the moments of any stationary measure of the diffusion. We exemplify the use of the approach through several applications: a Bayesian inference problem; the computation of Lyapunov exponents of linear ordinary differential equations perturbed by multiplicative white noise; and a reliability problem from structural mechanics. Additionally, we prove that the bounds converge to the infimum and supremum of the set of stationary averages for certain SDEs associated with the computation of the Lyapunov exponents, and we provide numerical evidence of convergence in more general settings.

1 Introduction. Stochastic differential equations (SDEs) and the diffusion processes they generate are important modelling tools in numerous scientific fields, such as chemistry, economics, physics, biology, finance, and epidemiology [22]. Stationary measures are to SDEs what fixed points are to deterministic systems: if the SDE is stable, then its stationary measures determine its long-term statistics. More concretely, both the ensemble averages and the time averages of the process converge to averages with respect to a stationary measure (which we call stationary averages). For the majority of SDEs, there are no known analytical expressions for their stationary measures. Consequently, large efforts have been directed at developing computational tools that estimate stationary averages and, more generally, at developing tools that can be used to study the long-term behaviour of SDEs. Among these, the most prominent are Monte Carlo discretisation schemes, PDE methods (finite-difference, finite-element, and Galerkin methods), path integral methods, moment closure methods, and linearisation techniques (see, e.g., [39,6,43,45]).
The purpose of this paper is to introduce a new algorithm that provides an alternative approach to the analysis of stationary measures. It uses semidefinite programming to compute hard bounds on polynomial stationary averages of polynomial diffusions. Our approach has distinct advantages: (i) it returns monotone sequences of both upper and lower bounds on the stationary averages, hence quantifying precisely the uncertainty in our knowledge of the stationary averages; (ii) no assumptions are required on the uniqueness of the stationary measures of the SDE; and (iii) the availability of high-quality SDP solvers and of the modelling package GloptiPoly 3 drastically reduces the investment of effort and specialised knowledge required to implement the algorithm for the analysis of a given diffusion process.
1.1 Problem definition. We consider R^n-valued diffusion processes X := {X_t : t ≥ 0} satisfying stochastic differential equations of the form

dX_t = b(X_t) dt + σ(X_t) dW_t,  X_0 = Z,  (1.1)

where the entries of the drift vector b : R^n → R^n and of the diffusion coefficient matrix σ : R^n → R^{n×m} are polynomials. In the above, W := {W_t : t ≥ 0} is a standard R^m-valued Brownian motion and the initial condition Z is a Borel measurable random variable. We assume that the underlying filtered space (Ω, F, {F_t}_{t≥0}, P) satisfies the usual conditions.
A Borel probability measure π on R^n is a stationary (or invariant) measure of the dynamics (1.1) if Z ∼ π ⇒ X_t ∼ π for all t ≥ 0, where we use the notation Y ∼ π to mean that the random variable Y has law π. The set of stationary measures of (1.1) is denoted P.
The problem we address here is how to estimate stationary averages of the form

π(f) := ∫_{R^n} f(x) π(dx)  (1.2)

in a systematic, computationally efficient manner. We present an algorithm that yields bounds on averages (1.2) when f is a polynomial. Hence, our algorithm can be used to bound the moments of the stationary measures of (1.1). More precisely, the algorithm returns lower and upper bounds on the set

B^d_{f,G} := {π(f) : π ∈ P has finite d-th order moments and support contained in G},  (1.3)

where G is a given real algebraic variety

G := {x ∈ R^n : g_1(x) = 0, . . ., g_ℓ(x) = 0},  (1.4)

for given polynomials g_1, . . ., g_ℓ. The case G = R^n corresponds to ℓ = 0. Stationary averages of the form (1.2) and the set (1.3) are of broad interest: if X enjoys some mild stability and regularity properties [32], the stationary averages (1.2) provide succinct, quantitative information about the long-term behaviour of X. In particular, for almost every sample path t → X_t(ω) (that is, for P-almost every ω ∈ Ω), there exists a π ∈ P such that

lim_{t→∞} (1/t) ∫_0^t f(X_s(ω)) ds = π(f);  (1.5)

see Theorem 2.2 for a formal statement. Furthermore, for appropriately chosen d, the set B^d_{f,G} is the set of limits (1.5) where t → X_t(ω) is any sample path contained in G:

B^d_{f,G} = { lim_{t→∞} (1/t) ∫_0^t f(X_s(ω)) ds : ω ∈ Ω_G },  (1.6)

where Ω_G is the subset of samples ω ∈ Ω such that X_t(ω) belongs to G for all t ≥ 0; see [32, §8]. Similar considerations also apply to the ensemble averages E[f(X_t)] (see the remark after Theorem 2.2). Therefore, bounds on the set B^d_{f,G} equip us with quantitative information on the long-term behaviour of the paths of X that are contained in G.
Remark (Linking the long-term behaviour to the stationary measures): Although our work is motivated by the study of the long-term behaviour of a diffusion X, the scope of this paper is restricted to the problem of bounding the stationary averages (1.2) and the associated set (1.3). It is important to remark that, to connect the long-term behaviour of X with the set B^d_{f,G} (and the bounds our algorithm returns), it is necessary to establish separately: (a) the existence of stationary measures of (1.1) with support contained in G and the finiteness of their moments up to order d; (b) the convergence of the time averages, i.e. verify that the limit (1.5) holds for the diffusion at hand; (c) that, for the initial conditions of interest, X takes values in G.
A straightforward way to verify (c) is to apply Itô's formula to each g_j and check that both Ag_j and σ^T ∇g_j vanish on G, so that t → g_j(X_t) is constant along the paths of X. If the initial condition Z takes values in G, then X clearly takes values in G as well. Establishing (a) and (b) typically requires additional proofs beyond the scope of this paper, and has been studied extensively elsewhere (see [32,33,20]). We point out that, whether or not conditions (a)-(c) hold, the algorithm still yields bounds on B^d_{f,G}. Without proving (a)-(c), it may simply be that the set (1.3) is empty, or that the limits (1.5) and (1.6) do not exist. To make the paper self-contained, we briefly recall in Section 2.2 some simple conditions that we use in our later examples to establish (a) and (b).
1.2 Brief description of algorithm. Mathematically, our approach consists of four steps: (1) We derive a finite set of linear equalities satisfied by the moments of any stationary measure of (1.1) (see Lemma 2.3 and Section 3.1). Such a system of equalities is often underdetermined (see Example 3.3) and therefore admits infinitely many solutions.
(2) To rule out spurious solutions, we exploit the fact that the solutions must be the moments of a probability measure, hence must satisfy extra constraints (e.g., even moments cannot be negative). We use well-known results [25] to construct semidefinite inequalities (so-called moment constraints) that are satisfied by the moments of any probability measure with support contained in G. This is done in Section 3.2.
(3) Exploiting the fact that f is a polynomial, we rewrite the stationary average π(f) as a linear combination of the moments of π.
(4) By maximising (resp. minimising) the linear combination over the set of all vectors of real numbers that satisfy both the linear equalities and the semidefinite inequalities, we obtain an upper (resp. lower) bound on the set B^d_{f,G}.
Computationally, to find the upper (or lower) bound we solve a semidefinite program, a particularly tractable class of convex optimisation problems for which high-quality solvers are freely available online. The semidefinite constraints in (2), popularised by Lasserre and co-authors (see [25] and references therein), can be implemented via the freely available, user-friendly package GloptiPoly 3 [13], which makes the approach described here accessible to non-experts.
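For intuition, steps (1), (3) and (4) can be carried out by hand for an illustrative diffusion of our own choosing (not one of the paper's examples): the Ornstein-Uhlenbeck process dX_t = −X_t dt + √2 dW_t, with generator (Ah)(x) = −x h′(x) + h″(x). Taking h = x^k, the condition π(Ah) = 0 reads −k y_k + k(k−1) y_{k−2} = 0. In this special case the linear equalities of step (1) already determine the moments uniquely, so step (2) is unnecessary and the upper and lower bounds of step (4) coincide. A minimal numpy sketch:

```python
import numpy as np

# Step (1): moment equations pi(A h) = 0 for h = x^k, k = 1..d, for the
# (illustrative) OU process dX = -X dt + sqrt(2) dW, where A h = -x h' + h''.
# For h = x^k this reads:  -k * y_k + k*(k-1) * y_{k-2} = 0.
d = 4
A = np.zeros((d + 1, d + 1))
rhs = np.zeros(d + 1)
A[0, 0] = 1.0          # normalisation y_0 = 1
rhs[0] = 1.0
for k in range(1, d + 1):
    A[k, k] = -k
    if k >= 2:
        A[k, k - 2] = k * (k - 1)

# Steps (3)-(4): here the equalities determine y uniquely, so the upper and
# lower "bounds" on any pi(f) = <y, f> collapse to a single value.
y = np.linalg.solve(A, rhs)
print(y)   # moments of N(0,1): [1, 0, 1, 0, 3]
```

For general drifts the system above is underdetermined (see Example 3.3), which is precisely why the semidefinite constraints of step (2) are needed.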
Remark (Convergence of the algorithm): Concerning moment approaches, a question that often arises is that of the convergence of the algorithm. In the context of this paper, this question takes the following form. Suppose that we want to obtain bounds on the set B^∞_{f,G} := ⋂_{d≥d_f} B^d_{f,G}, where d_f is the degree of f. Note that if G is compact, then any measure with support on G has all moments finite, and thus B^∞_{f,G} is the only set of interest. As will become clear later, repeated applications of our algorithm yield both an increasing sequence of lower bounds and a decreasing sequence of upper bounds on B^∞_{f,G}. The algorithm is said to converge if these sequences converge to the infimum and supremum, respectively, of B^∞_{f,G}. In general, the algorithm presented in this paper is not guaranteed to converge in the above sense. However, our numerics indicate that the algorithm often converges in practice (see Examples 3.4, 4.1 and 4.3). In Section 4.2, we do prove convergence for SDEs related to the Lyapunov exponents of linear ordinary differential equations (ODEs) perturbed by multiplicative white noise. The question of convergence is of theoretical interest but, regardless of its answer, the bounds computed are still valid and are often still appropriate for the application in question (e.g., see the examples in Section 4).
1.3 Related literature and contributions of the paper. Computational methods that yield bounds on functionals of Markov processes by combining linear equalities (arising from the definitions of the functional and the process) with moment constraints have appeared before in the literature. We refer to this class of methods as generalised moment approaches (GMAs). The various GMAs differ in the type of Markov processes and moment constraints they consider. The ideas underlying GMAs were first discussed in [4,5], where they were used to obtain analytical bounds on moments of measure-valued diffusions. The first GMA was presented in E. Schwerer's PhD thesis [40] in the context of jump processes with bounded generators and reflected Brownian motion on the nonnegative orthant. In [15, §12.4], the authors present a GMA that yields bounds on the moments of stationary measures of discrete-time Feller Markov chains. In [12], analogous techniques are used to bound the moments of the stationary measures of diffusion approximations of the Wright-Fisher model on the unit simplex. GMAs have also been proposed to solve optimal control problems [14,11], to estimate exit times [10,26], and to price financial derivatives [27,7].
The contributions of this paper are as follows. Whereas [40,12,11] consider only stationary averages of specific SDEs, we introduce GMAs in the setting of general polynomial diffusions over unbounded domains; this requires setting up the technical background of Lemma 2.3. Also in contrast with [40,12,11], we employ moment constraints that lead to SDPs instead of linear programs. In Section 4.2, we prove convergence of our algorithm for SDEs related to the Lyapunov exponents of linear ODEs perturbed by multiplicative white noise. To the best of our knowledge, this is the first such result in the setting of stationary averages of diffusion processes. The remaining contributions are the applications of the algorithm to several examples of interest. Section 4.1 explains how the algorithm can be combined with the ideas underlying the Metropolis Adjusted Langevin Algorithm (a Markov chain Monte Carlo algorithm) to carry out numerical integration with respect to certain target measures, and then applies it to a simple Bayesian inference problem. In Section 4.2, we use our algorithm to obtain bounds on the Lyapunov exponents of linear differential equations perturbed by multiplicative white noise, and we show that in this case our approach is both sufficient and necessary (i.e., with enough computational power, it yields lower (upper) bounds arbitrarily close to the minimum (maximum) Lyapunov exponent; see Theorem 4.2). Finally, in Section 4.3 we explain how the algorithm can be extended to yield bounds on stationary averages π(f) where f is piecewise polynomial, and we use this extension to tackle a reliability problem from structural mechanics.
2 Background and preliminaries.
2.1 Notation. Throughout this paper we use the following notation:
• E[•] denotes expectation with respect to the underlying probability measure P, and we use X_t and X(t) interchangeably.
• Given a function h : R^n → R, ∂_i h denotes its partial derivative with respect to its i-th argument; ∂_{ij} h := ∂_i ∂_j h; ∇h denotes its gradient vector; and ∇²h denotes its Hessian matrix.
• Suppose that M is a smooth manifold. C²(M) denotes the set of real-valued, twice continuously differentiable functions on M.
• For any two matrices A, B ∈ R^{n×m}, ⟨A, B⟩ := tr(A^T B) denotes the trace inner product of A and B, and ||A|| := ⟨A, A⟩^{1/2} denotes the Frobenius norm of A.
• Let N^n be the set of n-tuples α := (α_1, . . ., α_n) of natural numbers α_i ∈ N. Let N^n_d be the subset of n-tuples such that |α| := α_1 + · · · + α_n ≤ d. We define the sum of two tuples α, β ∈ N^n to be the tuple α + β := (α_1 + β_1, . . ., α_n + β_n).
• The space of real-valued vectors indexed by N^n_d, {y : y_α ∈ R, α ∈ N^n_d}, is isomorphic to R^{r(d)}, where r(d) denotes the cardinality of N^n_d, and we make no distinction between them. Similarly for the space of real-valued matrices indexed by N^n_d, {M : M_{αβ} ∈ R, α, β ∈ N^n_d}, and R^{r(d)×r(d)}. With this in mind, we denote the standard inner product on R^{r(d)} by ⟨y, z⟩ := Σ_{α∈N^n_d} y_α z_α, and the standard outer product on R^{r(d)} by (y ⊗ z)w := ⟨z, w⟩ y, for y, z, w ∈ R^{r(d)}.
• Let α ∈ N^n_d and x ∈ R^n. Monomials are denoted by x^α := x_1^{α_1} · · · x_n^{α_n}, and m_d(x) := (x^α)_{α∈N^n_d} denotes the vector of monomials of degree up to d. Given a polynomial p of degree at most d, its vector of coefficients in R^{r(d)} is the vector of coefficients of p in the monomial basis. We denote the degree of any polynomial p by d_p.
• Let µ be a Borel measure on R^n. A vector y in R^{r(d)} is the vector of moments of order up to d of µ if, for any α ∈ N^n_d, the α-th component of y is given by y_α = ∫_{R^n} x^α µ(dx), assuming that the integrals are well defined.
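As a small illustration of this indexing, the set N^n_d and the dimension r(d) can be enumerated directly; r(d) equals the binomial coefficient C(n+d, d), the number of monomials of degree at most d in n variables (a standard count, stated here as an aside). For n = d = 2 this recovers the six-dimensional moment vectors that appear in Example 3.1 below:

```python
from itertools import product
from math import comb

def multi_indices(n, d):
    """All tuples alpha in N^n with |alpha| = alpha_1 + ... + alpha_n <= d."""
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]

n, d = 2, 2
idx = multi_indices(n, d)
r_d = comb(n + d, d)          # dimension r(d) of the moment vector
assert len(idx) == r_d        # stars-and-bars count
print(sorted(idx))            # [(0,0), (0,1), (0,2), (1,0), (1,1), (2,0)]
```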
2.2 Stability and regularity properties of X. We now briefly review well-known properties of X and a relevant drift condition used in the examples below. Throughout this section we assume that G is an (n − ℓ)-dimensional smooth submanifold of R^n. For this to be the case, it is sufficient that the vectors ∇g_1(x), ∇g_2(x), . . ., ∇g_ℓ(x) form a linearly independent set for each x in G, where the g_i's are defined in (1.4). Firstly, smoothness of the components of b and σ implies that (1.1) has a unique strong solution X, which is defined up to a stopping time τ_∞ [47]. The generator (or Kolmogorov operator) A associated with (1.1) is the second-order differential operator

(Ah)(x) := ⟨b(x), ∇h(x)⟩ + (1/2) ⟨a(x), ∇²h(x)⟩,  x ∈ G,

where a := σσ^T denotes the diffusion matrix of (1.1). It is well known that if A is a hypoelliptic operator on C²(G) (see [16]), then (1.1) generates a strong Feller Markov process. This is the regularity property of X we use in our examples below. Although this condition can be replaced with weaker ones (e.g., X being a T-process [32]), such alternative conditions usually require more work to establish in practice. The stability properties we require are summarised as follows:

Condition 2.1 ([32,33]). Suppose that the paths of the diffusion X are contained in G (that is, X_t ∈ G for all t ≥ 0, P-almost surely), and that either one of the following conditions holds: (i) the manifold G is compact.
(ii) (Drift condition) there exists a function u ∈ C²(G) and a constant c > 0 such that for each q ∈ R the sub-level set {x ∈ G : u(x) ≤ q} is compact and such that Au(x) ≤ −cu(x) holds for all x in G except those in a compact subset of G.
Theorem 2.2 ([32,33]). Suppose that the generator A is hypoelliptic on C²(G) and that Condition 2.1 holds. Then:
(i) The solution X of (1.1) is globally defined, that is, τ_∞ = ∞ almost surely.
(ii) The SDE (1.1) has at least one stationary measure with support contained in G.
(iii) If Condition 2.1 (i) holds, then for any measurable f and almost every sample path t → X_t(ω), the limit (1.5) holds, where π ∈ P has support contained in G and depends on the starting position of the path, Z(ω). Furthermore, equation (1.6) holds.
(iv) If G is not compact but Condition 2.1 (ii) holds, then π(u) < ∞ for any π ∈ P with support contained in G, and the conclusion of Theorem 2.2 (iii) holds for every measurable f whose growth is controlled by u.

Proof. In the case (i), it is easy to argue that the solution is globally defined. In the case (ii), global existence of the solution follows from [33, Thrm. 2.1]. The rest follows from Theorems 3.4 and 8.1 in [32], together with Theorem 4.7 in [33] in the case of the drift condition.
Remark (The ensemble averages): If, in addition to the premise of Theorem 2.2, the semigroup generated by (1.1) is aperiodic [32], then the analogous statements to Theorem 2.2 (iii) and (iv) hold for the ensemble averages E[f(X_t)]. In this case, we have that lim_{t→∞} E[f(X_t)] = π(f), where π ∈ P depends on the law of the initial condition Z; see [32, Thrm. 8.1].
2.3 A relationship between the generator of (1.1) and its stationary measures. The following technical lemma is necessary for the development of the algorithm presented in this paper. An application of Itô's formula shows that, if h ∈ C²(R^n), then

h(X_t) − h(X_0) − ∫_0^t (Ah)(X_s) ds

is a local martingale. The generator A evaluated at a function h and a point x describes the rate of change in time of the expected value of h(X_t) conditioned on the event {X_t = x}. If π is the law of X_t, then π(Ah) describes the rate of change in time of E[h(X_t)]. It follows that if π is a stationary measure of (1.1), then the law of X_t does not change in time, and thus we would expect that π(Ah) = 0. Unfortunately, for technical reasons, this is not always the case (see [8] for counterexamples). However, it is not difficult to find sufficient conditions on h such that π(Ah) = 0.
The following lemma gives one such condition, specialised to polynomial functions h, which are the focus of this paper. For a proof of the lemma, see Appendix A.
Lemma 2.3. Let π be a stationary measure of (1.1) whose moments of order d exist and are finite. Let h be a polynomial of degree at most d − d_A, where d_A := max{d_b − 1, d_a − 2}, with b the drift vector and a := σσ^T the diffusion matrix of (1.1). Then π(Ah) = 0.
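The conclusion of the lemma can be checked numerically for an illustrative diffusion of our own choosing (not one of the paper's examples): the Ornstein-Uhlenbeck process dX_t = −X_t dt + √2 dW_t, whose stationary measure is the standard Gaussian and whose generator is (Ah)(x) = −x h′(x) + h″(x). Gauss-Hermite quadrature confirms π(Ah) = 0 for several polynomials h:

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite_e import hermegauss

# Stationary measure pi = N(0,1) of the illustrative OU process
# dX = -X dt + sqrt(2) dW, with generator (A h)(x) = -x h'(x) + h''(x).
nodes, weights = hermegauss(30)          # quadrature for the weight e^{-x^2/2}
weights = weights / np.sqrt(2 * np.pi)   # normalise: weights now sum to 1

def pi_Ah(h):
    """Quadrature estimate of pi(A h) for a polynomial h."""
    Ah = -nodes * h.deriv(1)(nodes) + h.deriv(2)(nodes)
    return np.dot(weights, Ah)

# Coefficient tuples (constant term first) for three test polynomials h.
residuals = [abs(pi_Ah(Polynomial(c)))
             for c in [(0, 2, 0, 1), (0, 0, 0, 0, 1), (1, 1, 1, 1, 1, 1)]]
print(max(residuals))   # effectively zero: pi(A h) vanishes, as the lemma asserts
```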
3 The algorithm. Our algorithm constructs a tractable outer approximation C^d_{f,G} of the set (1.3). As shown below, the approximation we derive is the image of a spectrahedron (a set defined by linear equalities and semidefinite inequalities) through a linear functional. Finding the infimum and supremum of C^d_{f,G} reduces to solving two SDPs, which can be carried out efficiently using one of several freely available high-quality solvers. Since C^d_{f,G} contains B^d_{f,G}, its endpoints bound B^d_{f,G}. To introduce our method, we first take a closer look at the set B^d_{f,G} from two alternative perspectives. First, note that the set B^d_{f,G} defined in (1.3) is the image of the set

P^d_G := {π ∈ P : π has finite d-th order moments and support contained in G}

through the linear functional (on the vector space of signed measures) π → π(f). It is straightforward to check that, as a subset of the vector space of signed measures, P^d_G is convex. Consequently, its image B^d_{f,G} is a (possibly unbounded) interval, which is fully described by its supremum and infimum (leaving aside whether or not B^d_{f,G} contains its endpoints). Alternatively, since f is a polynomial (with vector of coefficients f), the set B^d_{f,G} is the image of the set

Y^d_G := {y ∈ R^{r(d)} : y is a vector of moments up to order d of a measure in P^d_G}

through the linear functional (on R^{r(d)}) y → ⟨y, f⟩. We now look at some concrete examples of these sets.
Example 3.1. To introduce our ideas, we use an example for which there is an extensive body of results. Consider the two-dimensional SDE (3.1). If the initial condition takes values in the circle of radius R, S¹_R, then the paths of X remain in S¹_R. Using Hörmander's condition [16], it is easy to verify that, for each R > 0, the generator A is a hypoelliptic operator on C²(S¹_R). By compactness of S¹_R, Condition 2.1 (i) is satisfied, and Theorem 2.2 states that, for each R > 0, (3.1) has at least one stationary measure with support contained in S¹_R. It is also well known [21] that for each R, (3.1) has only one such measure π_R, which is the uniform distribution on S¹_R. Note that if R = 0, then X ≡ 0, hence π_0 := δ_0, the Dirac measure at zero, is also a stationary measure of (3.1). Consequently, for any d, the sets P^d_G can be written down explicitly, and P^d_G = ∅ for all other varieties G. By the symmetry of the measure π_R, its moments are easily computed in terms of the gamma function Γ(•). It is then straightforward to describe the sets Y^d_G. For instance, consider the sets containing the r(2)-dimensional vectors of moments up to order d = 2, defined as y := (y_{(0,0)}, y_{(1,0)}, y_{(0,1)}, y_{(2,0)}, y_{(1,1)}, y_{(0,2)}) ∈ R^6. Using the above descriptions of the sets P^d_G together with the map π → π(f) or, alternatively, the sets Y^d_G together with the map y → ⟨y, f⟩, we can deduce any projection B^d_{f,G} of interest. In this example, we could obtain the sets P^d_G, Y^d_G, and their projections B^d_{f,G} directly from (1.1). However, this is difficult in general. Indeed, results from real algebraic geometry show that optimising over the cone of vectors whose components are moments of a measure is an NP-hard problem [25]. We believe that, except for trivial cases, the same holds for Y^d_G, which is a subset of this cone. Instead, we construct here a spectrahedral outer approximation O^d_G of Y^d_G, over which optimising a linear functional consists of solving an SDP, a polynomial-time problem. Explicitly, O^d_G is a subset of R^{r(d)} defined by linear equalities and
semidefinite inequalities such that Y^d_G ⊆ O^d_G. (3.3) Because the outer approximation is a convex set, its image through the linear functional y → ⟨y, f⟩ is an interval that contains B^d_{f,G}. Hence our task is reduced to obtaining the outer approximation O^d_G. We do this in two steps:
• In Section 3.1, we use Lemma 2.3 to construct a set of linear equalities satisfied by the moments of any stationary measure of (1.1).
• In Section 3.2, we construct a set of linear equalities and a semidefinite inequality satisfied by the moments of any unsigned measure with support on G.
The outer approximation O^d_G then consists of the set of vectors in R^{r(d)} that satisfy both of the above.
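To preview what such an outer approximation looks like, consider an illustrative double-well diffusion of our own choosing, dX_t = (X_t − X_t³) dt + √2 dW_t (not one of the paper's examples). Its moment equations π(A x^k) = 0 give the recursion y_{k+2} = y_k + (k−1) y_{k−2}, leaving y_1 and y_2 undetermined, while the semidefinite inequality requires the moment matrix with entries M_{ij} = y_{i+j} (defined formally in Section 3.2) to be positive semidefinite. The sketch below, instead of calling an SDP solver, simply grids the two free moments and tests the semidefinite constraint by eigenvalue computation:

```python
import numpy as np

# Illustrative double-well diffusion dX = (X - X^3) dt + sqrt(2) dW.
# With (A h)(x) = (x - x^3) h'(x) + h''(x) and h = x^k, pi(A h) = 0 gives
#   y_{k+2} = y_k + (k - 1) y_{k-2},
# so for d = 6 all moments are linear in the free pair (y_1, y_2).
def moment_vector(y1, y2):
    y = np.empty(7)
    y[0], y[1], y[2] = 1.0, y1, y2
    for k in range(1, 5):
        y[k + 2] = y[k] + (k - 1) * (y[k - 2] if k >= 2 else 0.0)
    return y

def is_psd(y):
    # 4x4 moment matrix for the monomials 1, x, x^2, x^3: M[i][j] = y_{i+j}
    M = np.array([[y[i + j] for j in range(4)] for i in range(4)])
    return np.linalg.eigvalsh(M).min() >= -1e-9

# Brute-force stand-in for the SDP: grid the free moments and keep the
# points where both the equalities (built in) and the inequality hold.
feasible = [y2 for y1 in np.linspace(-2, 2, 41)
            for y2 in np.linspace(0, 2, 1001)
            if is_psd(moment_vector(y1, y2))]
lo, hi = min(feasible), max(feasible)
print(lo, hi)   # approximately 0.77 and 1.62: bounds on pi(x^2)
```

A proper implementation passes the same equalities and the constraint M ⪰ 0 to an SDP solver instead of gridding, as the paper does with GloptiPoly 3 and SDPT3; the point here is only that the equalities and the semidefinite inequality together carve out a bounded interval of candidate values for π(x²).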
3.1 Linear equalities satisfied by the moments of stationary measures. By assumption, both the drift vector and the diffusion coefficients in (1.1) are polynomials. Therefore, if h is a polynomial, Ah is also a polynomial. Suppose that y belongs to Y^d_G and choose any measure π in P^d_G that has y as its vector of moments of order d. By Lemma 2.3, π(A x^α) = 0 for every monomial x^α with α ∈ N^n_{d−d_A}. Expanding each polynomial A x^α in the monomial basis, these conditions become the linear equations (3.4) in the entries of y.
In words, every vector in Y^d_G satisfies the r(d − d_A) linear equalities defined by (3.4). At this point, it is worth remarking that the conditions on h and π in Lemma 2.3 are only sufficient, not necessary, for π(Ah) = 0 to hold, as the following example shows.
Example 3.2. Let λ be a positive even integer, and consider the one-dimensional SDE (3.5). It is straightforward to verify that Condition 2.1 (ii) holds with u(x) := x^λ and G := R, and to use Hörmander's condition to establish that the generator of (3.5) is hypoelliptic on C²(R). From Theorem 2.2, it follows that (3.5) has at least one stationary measure; that all of its stationary measures have moments up to order λ; and that (1.5) holds for any f ∈ R[x]_λ. From (3.4), we deduce the equations (3.6) satisfied by the vector of moments y ∈ R^{λ+1} of any such stationary measure. We are only interested in solutions of these equations that are the moments of a probability measure. Hence we can append the normalisation y_0 = 1 and solve (3.6) to obtain (3.7). In fact, (3.7) holds for k = 1, . . ., λ (instead of only for k = 1, . . ., λ − 2). This is easily deduced by solving the Fokker-Planck equation associated with (3.5) and showing that the density of a stationary measure of (3.5) is given by the inverse gamma distribution (1/λ!) x^{−λ−2} e^{−1/x}. Indeed, the moments of this distribution are given by (3.7) for k = 1, . . ., λ. Additionally employing the Support Theorem of Stroock and Varadhan [2, Thrm. 6.6], we can conclude that this is the only stationary measure of (3.5). In conclusion, the moments of any stationary measure of (3.5) satisfy (3.6) for k = 1, . . ., λ, although Lemma 2.3 only implies that they are satisfied for k = 1, . . ., λ − 2.
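As a check on the density just stated, the substitution u = 1/x turns its k-th moment into (1/λ!) ∫_0^∞ u^{λ−k} e^{−u} du, a Gauss-Laguerre integral, giving the closed form y_k = (λ − k)!/λ! for k ≤ λ (our own rearrangement, consistent with the example). For λ = 4:

```python
import math
import numpy as np
from numpy.polynomial.laguerre import laggauss

lam = 4  # the positive even integer lambda of the example

# The substitution u = 1/x maps the k-th moment of the density
# (1/lam!) x^(-lam-2) e^(-1/x) to (1/lam!) * int_0^inf u^(lam-k) e^(-u) du,
# so Gauss-Laguerre quadrature evaluates it exactly for k = 0..lam.
u, w = laggauss(10)                       # exact for polynomials up to degree 19
moments = [np.dot(w, u ** (lam - k)) / math.factorial(lam)
           for k in range(lam + 1)]
print(np.round(moments, 6))   # y_k = (lam-k)!/lam!: 1, 1/4, 1/12, 1/24, 1/24
```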
For most SDEs, y_0 = 1 together with the equations (3.4) forms an underdetermined set of linear equations, as the following example illustrates.
Example 3.3. Consider the one-dimensional SDE (3.8). It is straightforward to verify that Condition 2.1 (ii) is satisfied with u(x) := e^{x²/2} and G := R, and to use Hörmander's condition to establish that the generator of (3.8) is hypoelliptic on C²(R). Theorem 2.2 then establishes that (3.8) has globally defined solutions; that it has at least one stationary measure; that all stationary measures have all moments finite; and that (1.5) holds for any f ∈ R[x]. In this case, appending y_0 = 1 to the equations (3.4), we can only solve for y_3 = y_0/2 = 1/2. The set of equations formed by the first k ≥ 2 conditions (together with y_0 = 1) is underdetermined, and no other moment can be solved for.
3.2 Moment conditions. That the moment equations (3.4) are underdetermined in the above example is essentially the same issue that moment closure methods attempt to address (see [39, §3.4] for a review). These methods 'close' the equations (3.4) by heuristically removing the dependence of the first few equations on higher-order moments. We do not follow this approach here. Instead, we overcome the issue by exploiting the fact that we are not interested in all solutions of the equations (3.4), but only in those that are vectors of moments of a probability measure with support contained in G. There are various tractable conditions, known as moment conditions, which are satisfied by the vector of moments of any measure with support contained in G, but not in general by an arbitrary vector of real numbers. For example, trivially, the even moments of an unsigned Borel measure on R^n must be non-negative. We now describe in more detail the moment conditions we use in the rest of this paper.
Let π be a measure with vector of moments y of order d, and let g be a polynomial of degree d_g.
By the definition (1.4), g_j is zero everywhere in G for all j = 1, 2, . . ., ℓ. Thus ⟨y, g_j h⟩ = 0 for every j and every h ∈ R[x]_{d−d_{g_j}}, where g_j h denotes the vector of coefficients of the polynomial g_j h. Checking that y satisfies ⟨y, g_j h⟩ = 0 for any given h ∈ R[x]_{d−d_{g_j}} is equivalent to checking that y satisfies the r(d − d_{g_j}) equations (3.9) obtained by letting h range over the monomials of degree at most d − d_{g_j}. To these linear equations, we append a semidefinite inequality that stems from the fact that probability measures are unsigned. Let h be any polynomial of degree at most s(d) := ⌊d/2⌋. Since h² is a non-negative function, we have that π(h²) ≥ 0, which can be rewritten as ⟨h, M_{s(d)}(y) h⟩ ≥ 0, where the moment matrix M_{s(d)}(y) := π(m_{s(d)} ⊗ m_{s(d)}) denotes the element-wise integration of the matrix m_{s(d)} ⊗ m_{s(d)}. Note that the entries of the moment matrix are a function of the moments of π: (M_{s(d)}(y))_{αβ} = y_{α+β}. Since h is arbitrary, the moment matrix must be positive semidefinite: M_{s(d)}(y) ⪰ 0. (3.11) We then combine (3.4), (3.9), (3.11) and the normalisation y_0 = 1 to obtain our outer approximation O^d_G. (3.12) From (3.3), it follows that the projection C^d_{f,G} := {⟨y, f⟩ : y ∈ O^d_G} contains B^d_{f,G}. Therefore, the endpoints of C^d_{f,G} bound B^d_{f,G}. To find bounds on the mean of the stationary measures of the SDE, we construct the outer approximations of B^∞_{x,R} as described above. The first few such approximations can be written down explicitly.
Since a matrix is positive semidefinite if and only if all its principal minors are non-negative, it follows that the lowest-order approximations yield uninformative bounds: inf C^d_{x,R} = −∞ and sup C^d_{x,R} = ∞. The higher-order approximations d > 4, however, lead to non-trivial bounds (Table 1). The endpoints of these higher-order approximations are obtained by solving the corresponding SDPs.

4 Applications. We now consider three applications of the algorithm. To compute the bounds presented in this section, we used the modelling package GloptiPoly 3 [13] to construct the SDPs corresponding to the outer approximations (3.12), and the solver SDPT3 [46] to solve them. All computations were carried out on a desktop computer with a 3.4 GHz processor and 16 GB of memory running Ubuntu 14.04.
4.1 Langevin diffusions, numerical integration, and an inference problem. The Metropolis Adjusted Langevin Algorithm (MALA) [37] is a popular Markov chain Monte Carlo (MCMC) algorithm. MALA can be used to estimate integrals with respect to measures of the form

π_v(dx) := Z_v^{−1} e^{v(x)} dx,  (4.1)

where dx denotes the Lebesgue measure on R^n, v : R^n → R is a smooth confining potential, and Z_v is the normalising constant Z_v := ∫_{R^n} e^{v(x)} dx. It is well known that π_v is the unique stationary measure of the Langevin diffusion (4.2). The SDE (4.2) has globally defined solutions and, regardless of the initial condition, the limit (1.5) holds with π := π_v for all π_v-integrable functions f [37]. MALA proceeds by discretising X, adding a Metropolis accept-reject step to preserve stationarity of π_v, and simulating the resulting chain.
The time averages of the simulation then converge to the desired average [37]. Since π_v is the unique stationary measure of (4.2), we can use our algorithm to compute bounds on π_v(f) directly when both f and v are polynomials, circumventing any discretisation or simulation. We illustrate this idea with the following simple Bayesian inference problem.
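The identity our algorithm exploits can be checked directly in one dimension. Assuming the common convention dX_t = v′(X_t) dt + √2 dW_t for (4.2) (the exact constants in (4.2) are as in [37]; the identity below holds under this assumed convention), integration by parts gives π_v(v′h′ + h″) = 0 for every polynomial h. A quadrature check with an assumed potential v(x) = −x⁴/4:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Illustrative 1-D check: for pi_v ~ e^{v(x)} dx and the assumed convention
# dX = v'(X) dt + sqrt(2) dW (generator A h = v' h' + h''), integration by
# parts gives pi_v(A h) = 0 for every polynomial h.
x, w = np.polynomial.legendre.leggauss(400)
x, w = 8 * x, 8 * w                       # Gauss-Legendre quadrature on [-8, 8]

def v(t):  return -t ** 4 / 4             # assumed confining potential
def dv(t): return -t ** 3                 # its derivative v'
Z = np.dot(w, np.exp(v(x)))               # normalising constant Z_v

def pi_v_Ah(coeffs):
    h = Polynomial(coeffs)
    Ah = dv(x) * h.deriv(1)(x) + h.deriv(2)(x)
    return np.dot(w, Ah * np.exp(v(x))) / Z

residuals = [abs(pi_v_Ah(c)) for c in [(0, 1), (0, 0, 1), (1, 2, 3, 4)]]
print(max(residuals))   # effectively zero (quadrature error only)
```

These vanishing averages are exactly the linear equalities (3.4) that, together with the moment-matrix constraint, bound π_v(f) without any sampling.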
Example 4.1. The scalar noisy time-varying recurrence equation (4.3) is often used to benchmark parameter and state estimation algorithms [9,3]. For simplicity, we assume that the state {z_k : k = 1, . . ., N} is observable, and we focus on the problem of estimating the parameters p_1, p_2, and p_3. The additive noise {ξ_k : k = 1, . . ., N} is typically taken to be an i.i.d. sequence of normally distributed random variables. Since Gaussianity of the noise is not important in our algorithm, we instead choose {ξ_k : k = 1, . . ., N} to be an i.i.d. sequence of random variables with the bimodal law π_{u_ξ}, where π_{u_ξ} is as in (4.1); see Fig. 1a. Choosing parameters p := (p_1, p_2, p_3) = (0.5, 2, 1) and z_0 = 2, we use (4.3) to generate N samples z := {z_1, z_2, . . ., z_N}; see Fig. 1b. The inference problem is to use the generated samples z and the initial condition z_0 to estimate the parameters p.
Taking a Bayesian perspective, we first choose a prior distribution µ_0 over the parameters and then extract information from the posterior distribution µ_{p|z}; see [42]. Our algorithm can be used to this end if the prior µ_0 is of the form (4.1) with a polynomial potential u_0.
From Bayes' formula, the posterior also takes the form (4.1). We can then use our algorithm to obtain lower and upper bounds on the posterior means µ_{p|z}(p_1), µ_{p|z}(p_2), and µ_{p|z}(p_3), and upper bounds on the total variance of the posterior distribution µ_{p|z}. For simplicity, we chose our prior µ_0 to be a zero-mean, unit-variance normal distribution, i.e., u_0(p) := −||p||²/2. The results are shown in Fig. 1c,d. For N ≥ 15 samples, we obtain small upper bounds on the total variance (two orders of magnitude smaller than the lower bounds on the posterior means). For this reason, we expect the posterior distribution to resemble a Dirac measure at the vector of posterior means, indicating that the posterior means are appropriate estimators of the parameters (as confirmed in Fig. 1c), thus solving our inference problem.
The upper bounds on the total variance were obtained by combining bounds on the second-order moments with the lower bounds computed for the posterior means. In total, 847 bounds were computed, taking a total CPU time of 720 seconds, an average of 0.85 seconds per bound.
It is important to remark that Example 4.1 can be solved equally well with MCMC methods; indeed, MCMC algorithms scale better than ours with the dimension of the target measure π_v. However, our alternative approach presents some attractive features: it is fast (see the caption of Fig. 1) and simple to implement; no tuning of the algorithm is required (e.g., choosing the discretisation step size in MALA); and it delivers deterministic bounds on the integrals of interest instead of stochastic estimates (hence avoiding issues concerning the convergence of MCMC simulations). Our algorithm can also provide useful information in situations where MCMC methods face difficulties, in particular when the target distribution has several isolated modes. In such cases, MCMC algorithms get stuck at one of the modes and do not explore the rest of the target distribution. Our numerical observations suggest that our algorithm is also affected by the presence of isolated modes, producing a large gap between the upper and lower bounds for the desired integrals (since each bound is stuck at a different mode). The presence of such large gaps can alert the practitioner to the existence of isolated modes, something which is often not obvious for target distributions of dimension three or more. Methods designed to deal with isolated modes, such as simulated tempering [3], can then be used instead of a standard Metropolis-Hastings MCMC method.
4.2 Lyapunov exponents of linear SDEs driven by multiplicative white noise. In practical applications involving systems of linear ordinary differential equations (ODEs), we are often interested in situations where the parameters are perturbed by Gaussian white noise. In those cases, we obtain the following class of linear stochastic differential equations:

dX_t = A X_t dt + Σ_{i=1}^m B_i X_t dW_i(t), (4.4)

where A ∈ R^{n×n}, B_i ∈ R^{n×n}, and W_i := {W_i(t) : t ≥ 0} are m independent standard Brownian motions on R. It is well known that (4.4) has globally defined solutions. Furthermore, for each initial condition Z there exists a jointly continuous process X^Z := {X^Z_t : t ≥ 0} that is the unique solution of (4.4) (see [17, Thrm. 21.3]).
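As a point of comparison for the moment-based bounds developed in this subsection, the standard simulation-based estimate of the top Lyapunov exponent of a system of the form (4.4) can be sketched as follows. The matrices A and B below are illustrative choices of ours, not taken from the paper:

```python
import numpy as np

def lyapunov_estimate(A, Bs, T=50.0, dt=1e-3, seed=0):
    """Estimate the top Lyapunov exponent of dX = A X dt + sum_i B_i X dW_i
    by Euler-Maruyama, renormalising X at each step to avoid over/underflow."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    log_growth, steps = 0.0, int(T / dt)
    for _ in range(steps):
        dW = rng.standard_normal(len(Bs)) * np.sqrt(dt)
        x = x + (A @ x) * dt + sum(w * (B @ x) for w, B in zip(dW, Bs))
        r = np.linalg.norm(x)
        log_growth += np.log(r)
        x /= r  # keep only the direction: the projection onto the unit sphere
    return log_growth / T

# Illustrative 2x2 system: stable drift plus rotational multiplicative noise
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[0.0, 0.5], [-0.5, 0.0]])
lam = lyapunov_estimate(A, [B])
```

For this strongly stable drift the estimate comes out negative, consistent with almost-sure asymptotic stability of the zero solution; the renormalisation step is exactly the projection onto the unit sphere exploited by Khas'minskii's argument below.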
In 1967, Khas'minskii [21] made the following observation. Applying Itô's formula twice, he found that the projection of X^Z_t onto the unit sphere, Λ^Z_t := X^Z_t/||X^Z_t||, satisfies an SDE of the form

dΛ^Z_t = u_0(Λ^Z_t) dt + Σ_{i=1}^m u_i(Λ^Z_t) dW_i(t), (4.5)

and that the growth rate of ||X^Z_t|| is governed by a polynomial Q evaluated along the paths of Λ^Z (see (4.6) and (4.7)). Suppose that the generator of Λ^Z := {Λ^Z_t : t ≥ 0} is hypoelliptic on C^2(S^{n−1}), where S^{n−1} := {x ∈ R^n : ||x|| = 1} denotes the unit sphere in R^n. Since Λ^Z, by definition, takes values in S^{n−1}, Theorem 2.2 tells us that (4.5) has at least one stationary measure and, together with (4.6) and (4.7), that for P-almost every sample path t ↦ X^Z(t) there exists a stationary measure π of (4.5) such that the Lyapunov exponent λ(Z) := lim_{t→∞} (1/t) log ||X^Z_t|| equals π(Q). Typically, this stationary average is estimated by choosing an appropriate discretisation scheme for (4.4), simulating the resulting chain, and computing the corresponding time average [44]. We instead exploit the fact that u_0, ..., u_m and Q are all polynomials and apply our algorithm to (4.5) to compute bounds on π(Q). In particular, for any d ≥ d_Q the bounds satisfy ρ^d_{Q,S^{n−1}} ≤ π(Q) ≤ η^d_{Q,S^{n−1}} for every stationary measure π of (4.5), which implies that the equilibrium solution X ≡ 0 of (4.4) is almost surely asymptotically stable if η^d_{Q,S^{n−1}} < 0, and almost surely asymptotically unstable if ρ^d_{Q,S^{n−1}} > 0. Therefore, our algorithm applied to (4.5) yields a sufficient test for the asymptotic stability or instability of (4.4). The method also yields a necessary test for asymptotic stability in the following sense:

Theorem 4.2. Suppose that the generator of Λ^Z is hypoelliptic on C^2(S^{n−1}). Let λ_inf := inf_Z λ(Z) and λ_sup := sup_Z λ(Z), where the infimum and supremum are taken over the set of initial conditions Z that are Borel measurable random variables on R^n. For sufficiently large d, the bounds ρ^d_{Q,S^{n−1}} and η^d_{Q,S^{n−1}} are arbitrarily close to λ_inf and λ_sup, respectively.

Proof. Let A denote the generator of Λ^Z (and not of X^Z). We prove the theorem only for the lower bounds ρ^d_{Q,S^{n−1}}; the proof for the upper bounds is identical. The proof proceeds in three steps:

1. We show that the set of limits S := {λ(Z) : Z is a Borel measurable random variable on R^n} coincides with the set {π(Q) : π is a stationary measure of (4.5)}.

2.
We show that the equalities (3.4) fully characterise the stationary measures of (4.5), in the sense that if π is a Borel probability measure with support contained in S^{n−1} such that π(Ax^α) = 0 for all α ∈ N^n, then π is a stationary measure of (4.5).
3. We then only have to apply Theorem 4.3 in [25], which shows that, for large enough d, every point of our feasible set is the vector of moments of a Borel probability measure π with support contained in S^{n−1} that satisfies π(Ax^α) = 0 for all α ∈ N^n.
By step 2, (3.4) then fully characterises the stationary measures of (4.5), which yields the result. For the detailed proof, see Appendix B.
Example 4.3. We exemplify the use of our algorithm in computing Lyapunov exponents through a classic example from [21]. The question of whether an unstable linear system of ODEs can be stabilised by physically realisable multiplicative noise (i.e., multiplicative noise in the sense of Stratonovich) received ample attention in the 1960s. It was shown that this cannot be achieved in one-dimensional systems, and it was hypothesised that it could not be achieved in higher-dimensional systems either [30]. Khas'minskii [21] disproved this hypothesis with the counterexample (4.10), where c_1 > 0, c_2 < 0, σ > 0, and ∘d denotes the Stratonovich differential. For this two-dimensional SDE, the projection (4.5) lives on a one-dimensional manifold (the unit circle), and thus by changing to polar coordinates one can find analytical expressions for its stationary measures. In this case there is a unique stationary measure, whose average π(Q) is given by (4.11). Khas'minskii then argued that one can always find a sufficiently large (in absolute value) c_2 and σ so that π(Q) < 0, i.e., so that (4.10) is stable. Instead, we can verify Khas'minskii's findings for given c_1, c_2, and σ by solving for the bounds (3.13). As an example, we analysed the system (4.10) with c_1 = 1 and c_2 = −30. From the bounds presented in Fig. 2, we conclude that, for these parameters, (4.10) is stable for 3 ≲ σ ≲ 3.7 and unstable otherwise.
In Example 4.3, we could have evaluated (4.11) numerically instead of computing (3.13). However, such analytical expressions for stationary measures of (4.5) are not available in higher dimensions. As explained in [1] (see also [48]): "The direct use of Khas'minskii's method to higher dimensional systems has not met with much success because of the difficulty of studying diffusion processes occurring on surfaces of unit hyperspheres in higher dimensional Euclidean spaces". Our approach complements Khas'minskii's procedure: it is simple to implement; the number of stationary measures of (4.5) is not a limitation; and the fact that the measures have support on the unit sphere is an advantage instead of a disadvantage, as Theorem 4.2 shows.

4.3 Piecewise polynomial averages, a noisy nonlinear oscillator and structural reliability problems. Our algorithm can be extended to bound stationary averages π(f) where f is a piecewise polynomial of the type (4.12). Here the f_i are polynomials, 1_A is the indicator function of the set A, and K_1, ..., K_N are N disjoint sets in G defined by polynomial inequalities as in (4.13). This type of extension was first discussed in [24], and has been considered in [25,27,7]. Many stationary averages of interest can be written in the above form; for instance, if f is the indicator function of a set K_i, then π(f) is the stationary probability of K_i.

Extension of the algorithm.
We follow here similar arguments to those presented in [25,27,7]. Let π be a stationary measure of (1.1) with support in G and with moments of order d. Let π_i be the restriction of π to the set K_i, and let π_0 denote the restriction to K^c := G \ ∪_i K_i, i.e., to the complement in G of the union of all the K_i. Clearly, π = π_0 + π_1 + ... + π_N. Let y^0, y^1, ..., y^N ∈ R^{r(d)} denote the (N + 1) vectors of moments of π_0, π_1, ..., π_N, respectively. From the definition (4.12) it follows that π(f) = Σ_{i=1}^N π_i(f_i). The normalisation condition takes the form Σ_{i=0}^N y^i_0 = 1. Similarly to (3.4), the proof of Lemma 2.3 tells us that the stationarity of π implies linear equality constraints on the vectors y^0, ..., y^N. The vectors of moments y^i ∈ R^{r(d)} also fulfil conditions similar to (3.9) and (3.10); in particular,

M_{s(d)}(y^i) ⪰ 0, ∀i = 0, ..., N. (4.17)

Furthermore, the definition (4.13) of the sets K_i leads to two additional sets of conditions. Firstly, the vectors of moments y^i ∈ R^{r(d)} fulfil (4.18). Secondly, using similar arguments to those leading to (3.10), we obtain additional moment conditions. Let q be a polynomial that is non-negative on K_i and let p be any polynomial of degree s(d − d_q). Then

q(x) p(x)² = ⟨(q(x) m_{s(d−d_q)}(x) ⊗ m_{s(d−d_q)}(x)) p, p⟩.
Since π_i has support in K_i,

0 ≤ π_i(q p²) = ⟨π_i(q m_{s(d−d_q)} ⊗ m_{s(d−d_q)}) p, p⟩ = ⟨M_{s(d−d_q)}(θ_q y^i) p, p⟩, (4.19)

where θ_q : R^{r(d)} → R^{r(d−d_q)} is the shift operator (θ_q y)_α := Σ_{β∈N^n_{d_q}} q_β y_{α+β}, and the localising matrix M_{s(d−d_q)}(θ_q y^i) is the element-wise integration of the matrix q m_{s(d−d_q)} ⊗ m_{s(d−d_q)}. Since (4.19) holds for all p ∈ R[x]_{s(d−d_q)}, the localising matrix M_{s(d−d_q)}(θ_q y^i) is positive semidefinite. From the definition (4.13), q^i_k is non-negative on K_i; hence we obtain our additional set of moment conditions (4.20).

Remark (Support of π_0). Notice that C^d_{f,G} does not include conditions on y^0 related to the support of π_0. The reason is that, in general, K^c has no simple description of the form (4.13). If such a description exists, extra constraints can easily be appended. If such constraints are lacking, there is an unfortunate consequence: C^d_{f,G} always contains the origin, since (y^0, y^1, ..., y^N) := (y, 0, ..., 0) satisfies all the constraints in (4.21). Consequently, our extended algorithm only yields informative bounds if f is non-negative (or non-positive), and, in this case, only the upper (resp. lower) bound will be informative. However, for certain purposes this may be sufficient, as the following example demonstrates.
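The moment and localising matrices appearing in (4.17) and (4.19) can be illustrated numerically. The following one-dimensional toy setup is ours, not an example from the paper: it uses the moments of the uniform measure on [0, 1] together with q(x) = x(1 − x), which is non-negative on [0, 1], so both matrices must be positive semidefinite:

```python
import numpy as np

def moment_matrix(y, d):
    """One-dimensional moment matrix: (M_d(y))_{ij} = y_{i+j}, 0 <= i, j <= d."""
    return np.array([[y[i + j] for j in range(d + 1)] for i in range(d + 1)])

def localizing_matrix(y, q, d):
    """Localising matrix for q(x) = sum_k q[k] x^k, built via the shift
    operator: (M_d(theta_q y))_{ij} = sum_k q[k] y_{i+j+k}."""
    return np.array([[sum(c * y[i + j + k] for k, c in enumerate(q))
                      for j in range(d + 1)] for i in range(d + 1)])

# Moments of the uniform measure on [0, 1]: y_k = 1/(k + 1)
y = [1.0 / (k + 1) for k in range(11)]
d = 4
M = moment_matrix(y, d)                             # uses y_0, ..., y_8
L = localizing_matrix(y, [0.0, 1.0, -1.0], d - 1)   # q(x) = x - x^2

# Any measure supported on [0, 1] makes both matrices positive semidefinite
assert np.linalg.eigvalsh(M).min() > -1e-10
assert np.linalg.eigvalsh(L).min() > -1e-10
```

Replacing the uniform moments with a vector that is not a valid moment sequence would break one of the two semidefiniteness checks, which is exactly the mechanism the SDP relaxations exploit.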
Example 4.4. We consider the analysis of a noisy nonlinear oscillator in relation to reliability problems in structural mechanics. Structures (e.g., buildings, bridges) perturbed by random forces (e.g., waves, earthquakes) are often modelled by the stationary response of a nonlinear oscillator driven by Gaussian white noise [6,39]. A typical such model used in the literature is the Duffing oscillator (4.22), where Y_t ∈ R describes the deviation of the structure from its standing point at time t, and W := {W_t : t ≥ 0} is a standard Brownian motion. Structural reliability examines whether the structure will bend past a critical value of deviation, and how often this event should be expected to occur [39]. First passage times (i.e., the amount of time it takes for the structure to bend past the critical deviation) and extreme-value probabilities (i.e., how likely the structure is to bend past the critical deviation in a given interval of time) are typically investigated. Passage times feature more prominently in the structural mechanics literature since they are more amenable to analysis [6].
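Since (4.22) is not reproduced here, the following simulation sketch uses an illustrative Duffing-type parameterisation of our own (the coefficients γ, ε, σ and the threshold u are assumptions, not values from the paper). It estimates by direct Euler-Maruyama simulation the long-run fraction of time the deviation exceeds a threshold:

```python
import numpy as np

def fraction_beyond(u, gamma=1.0, eps=1.0, sigma=1.0,
                    T=500.0, dt=1e-3, burn=50.0, seed=1):
    """Euler-Maruyama simulation of an illustrative noisy Duffing oscillator,
        dX1 = X2 dt,   dX2 = (-gamma*X2 - X1 - eps*X1**3) dt + sigma dW,
    returning the fraction of post-burn-in time spent with X1 >= u."""
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    dWs = rng.standard_normal(steps) * np.sqrt(dt)  # pregenerated increments
    x1 = x2 = 0.0
    beyond = total = 0
    for k in range(steps):
        x1, x2 = (x1 + x2 * dt,
                  x2 + (-gamma * x2 - x1 - eps * x1**3) * dt + sigma * dWs[k])
        if k * dt > burn:
            total += 1
            beyond += x1 >= u
    return beyond / total

frac = fraction_beyond(u=1.0)  # long-run fraction of time beyond deviation u
```

Such a time average is exactly the quantity that, by ergodicity, converges to a stationary average of the indicator of the exceedance set; the extended algorithm bounds it without simulation.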
Here, we use our extended algorithm to find upper bounds on extreme-value probabilities and on the average fraction of time the structure spends bent beyond the critical deviation. The stationary measure π of (4.22) has a known density (4.23) with respect to the Lebesgue measure dy × dẏ on R². Throughout this section we assume that the process is at stationarity, i.e., (Y_t, Ẏ_t) ∼ π for all t ≥ 0. We begin by rewriting (4.22) in Itô form, with X_1 := Y and X_2 := Ẏ. By Birkhoff's Ergodic Theorem [2, §3], the average fraction of time that the structure spends bent beyond the critical deviation u converges to

F_u := π(1_{{x ∈ R² : x_1 ≥ u}}). (4.24)
The extreme-value probability is defined as P_u := P(sup_{0≤t≤T} Y_t ≥ u), where [0, T] is a given time interval of interest. For sufficiently high u, it is shown in [29] that the up-crossing events become independent; hence, the number of up-crossings in the interval [0, T] has a Poisson distribution with mean v_u T, where v_u is the mean threshold-crossing rate of u.
For sufficiently regular stationary processes, the mean up-crossing rate can be obtained from Rice's formula [34,41]:

v_u := π(x_2 1_{{x ∈ R² : x_1 = u, x_2 ≥ 0}}), (4.26)

whence it follows that

P_u ≈ 1 − e^{−v_u T}. (4.27)

In the computations below, we consider crossings over high deviations u (at least three times larger than the standard deviation of π), which characterises this regime; hence we assume that (4.27) holds exactly. From (4.24) and (4.26), it is clear that our extended algorithm can provide upper bounds on v_u and F_u. These bounds are then used with equation (4.27) to bound P_u. The standard deviation of π is σ ≈ 0.761, as computed directly from (4.23) or using our unmodified algorithm. Following [6], we considered a time interval T = 100 and critical deviations u varying from 3σ to 5σ. The largest SDP successfully solved by SDPT3 was d = 14, and we computed 14 bounds in a CPU time of 403 seconds, averaging 29 seconds per bound. The upper bounds computed with our algorithm are shown together with the exact values computed using (4.23) in Tables 2 and 3. Although the upper bounds are orders of magnitude greater than the true P_u and F_u, they could still be useful for practical purposes. For instance, our bounds state that the probability of the structure bending further than five standard deviations at any point over the time interval [0, 100] is less than 0.011%, and that the structure will spend less than two millionths of a percent of that interval bent beyond this deviation.
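Because the right-hand side of (4.27) is increasing in v_u, an upper bound on the crossing rate converts directly into an upper bound on the extreme-value probability. A minimal sketch (the rate below is an illustrative placeholder, not one of the paper's computed bounds):

```python
import math

def extreme_value_prob(v_u, T):
    """Poisson up-crossing approximation (4.27): P_u = 1 - exp(-v_u * T).
    Monotone increasing in v_u, so an upper bound on the crossing rate v_u
    immediately yields an upper bound on P_u."""
    return 1.0 - math.exp(-v_u * T)

v_upper = 1.1e-6  # hypothetical upper bound on the mean crossing rate
P_upper = extreme_value_prob(v_upper, T=100.0)
assert extreme_value_prob(2 * v_upper, 100.0) > P_upper  # monotone in v_u
```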
This example was chosen so that exact values of F_u and P_u could be computed directly from (4.23), allowing us to evaluate the quality of the bounds. For most oscillator models, no such analytical expressions are available, and it is often not even clear how many stationary measures exist. Our method applies equally to these other oscillator models.

5 Concluding remarks. In this paper, we have introduced an algorithm based on semidefinite programming that yields upper and lower bounds on stationary averages of SDEs with polynomial drift and diffusion coefficients. The motivation behind our work is the study of the long-term behaviour of such SDEs. As explained in the introduction, additional work is required to link the bounds obtained by our algorithm with this long-term behaviour. Typically, a drift condition must be verified by finding an appropriate Lyapunov function [33]. For polynomial drift vectors and diffusion matrices, one can also employ semidefinite programming to search for these Lyapunov functions (see, e.g., the sum-of-squares programming approaches [35,36]). In many respects, these approaches are dual to the method we describe in this paper; see [28,25,8,18,19] for more on the connections.
Our algorithm is also applicable to SDEs whose diffusion coefficients σ are not polynomial, but whose diffusion matrix a := σσ^T is polynomial (e.g., the Cox-Ingersoll-Ross interest rate model in financial mathematics). We have concentrated on polynomial diffusion coefficients for simplicity, in order to guarantee uniqueness of solutions. Furthermore, our algorithm can be extended to SDEs with rational drift vector and diffusion matrix: one must just carefully choose polynomials h such that Ah is still a polynomial.
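For instance, for the Cox-Ingersoll-Ross model dX_t = θ(µ − X_t) dt + σ√X_t dW_t, the diffusion coefficient σ√x is not polynomial but a(x) = σ²x is, so applying stationarity (π(A x^k) = 0) to monomial test functions yields a closed moment recursion. The sketch below cross-checks it against the known Gamma stationary law; these are standard CIR facts, not derivations from the paper:

```python
import math

def cir_stationary_moments(theta, mu, sigma, kmax):
    """Stationary moments of dX = theta*(mu - X) dt + sigma*sqrt(X) dW.
    From A x^k = theta*(mu - x)*k*x^(k-1) + (sigma^2*x/2)*k*(k-1)*x^(k-2),
    setting pi(A x^k) = 0 gives the recursion
        m_k = (mu + (k - 1) * sigma**2 / (2 * theta)) * m_{k-1},  m_0 = 1."""
    m = [1.0]
    for k in range(1, kmax + 1):
        m.append((mu + (k - 1) * sigma**2 / (2 * theta)) * m[-1])
    return m

# Cross-check against the Gamma(alpha, beta) stationary distribution,
# with alpha = 2*theta*mu/sigma^2 and rate beta = 2*theta/sigma^2
theta, mu, sigma = 2.0, 1.5, 1.0
alpha, beta = 2 * theta * mu / sigma**2, 2 * theta / sigma**2
m = cir_stationary_moments(theta, mu, sigma, 4)
gamma_m = [math.gamma(alpha + k) / (math.gamma(alpha) * beta**k) for k in range(5)]
```

Here the moment equations close exactly, so no SDP is needed; in general they do not close, which is precisely the situation the semidefinite bounds address.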
Our choice of moment constraints was motivated by the convenience of the modelling package GloptiPoly 3. However, there is a wide selection of moment conditions, some of which lead to easier conic programs (e.g., linear programs or second-order cone programs) [25,26]. We also restricted ourselves to stationary measures with supports contained in algebraic varieties. We did this to simplify the exposition and because the applications chosen did not require more generality. However, from Section 4.3 it is clear that a similar algorithm can be constructed for measures with supports in so-called basic semialgebraic sets of the form (4.13) [25]. Such an approach could be advantageous for Example 3.3: using the Stroock-Varadhan Support Theorem, it is not difficult to deduce that any stationary measure of those SDEs must have support on the non-negative semiaxis [0, ∞).
Lastly, we discuss a practical issue relating to numerical aspects of our algorithm, and of GMAs in general. In contrast with other approaches, our algorithm runs in polynomial time; its computational complexity follows from that of primal-dual interior point algorithms [23]. However, in our experience, the applicability of the algorithm is affected by certain numerical issues. Specifically, the SDPs that arise from moment problems can be badly conditioned, causing the solvers to perform poorly for medium to large problems. We believe that this is a consequence of using raw moments as the basis of the space of moment vectors, as is done in most GMAs. Indeed, the order of magnitude of the moments of a distribution varies rapidly (e.g., see (3.7) in Example 3.2). Since the feasible set O^d_G over which we optimise contains such a vector of moments, and these sets are often compact [25], we expect large discrepancies in the order of magnitude of the entries of our feasible points y and of the moment matrix M_d(y). This can lead to a bad condition number of M_d(y) and consequently to poor performance of the solver. Improvements could be achieved by using an orthonormal basis with respect to the measures of interest, but this is not easy in practice: not only do we usually have little a priori information about the measures to guide our choice of basis, but the necessary modifications of the algorithmic implementation are substantial, and the package GloptiPoly 3 could not be used in its current form. In our experience, a simple way to mitigate the bad conditioning is to scale the entries of the vectors y by ỹ_α := y_α / z^α, where z ∈ R^n_+. This is equivalent to scaling by the moments of a Dirac measure at z, so z should be chosen so that the entry z_i is close to the absolute value of the i-th component of the mean of the measure of interest. A similar scaling was employed in [10]. It is then straightforward to show that ỹ satisfies the same semidefinite inequalities as y, and all that is
left to do is to rewrite the equality constraints in terms of the rescaled variables ỹ.

Since π is a stationary measure of (1.1), E[h(X_t)] = E[h(X_0)] = π(h) for any t > 0. Since h is of degree at most d − max{d_{b_i}, d_{a_{ij}}}, our assumptions on π imply that π(|Ah|) < ∞; thus Ah(X_s) is integrable with respect to P × λ_t, where λ_t denotes the Lebesgue measure on [0, t]. Choosing any t > 0 and applying Fubini's Theorem yields the desired equality. We now need to argue that D^h is indeed a martingale. To show this, it suffices to show that E[D^h_t | F_s] = D^h_s for all t ≥ s, or, equivalently, that E[1_A D^h_t] = E[1_A D^h_s] for any A ∈ F_s and t ≥ s. We do this by finding a sequence of martingales {D^{h_m} : m ∈ Z_+} such that, for every t ≥ 0, {D^{h_m}_t : m ∈ Z_+} is dominated by a P-integrable random variable and converges almost surely to D^h_t. With such a sequence at hand, we can use dominated convergence and the martingale property of each D^{h_m} to establish the desired result. Thus, all that remains is to construct the sequence {D^{h_m} : m ∈ Z_+}. We do so by using the fact that if g ∈ C²(R^n) is compactly supported, then D^g is a martingale; see [38, §V].

Recall that m_d(x) : R^n → R^{r(d)} denotes the vector of monomials of degree d or less; the α-th component of the r(d)-dimensional vector m_d(x) is given by (m_d(x))_α := x^α. Let R[x]_d denote the vector space of real polynomials on R^n of degree at most d. The set {x^α : α ∈ N^n_d} is a basis (known as the canonical or monomial basis) of R[x]_d, and so we can write any polynomial p ∈ R[x]_d as p(x) = Σ_{α∈N^n_d} p_α x^α = ⟨p, m_d(x)⟩ for all x ∈ R^n, where p := (p_α)_{α∈N^n_d} ∈ R^{r(d)}. Since (3.10) holds for all h ∈ R[x]_{s(d)}, M_{s(d)}(y) is positive semidefinite:

M_{s(d)}(y) ⪰ 0. (3.11)
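The effect on conditioning of the rescaling ỹ_α := y_α/z^α discussed in the concluding remarks can be illustrated in one dimension. The exponential distribution below is a purely illustrative choice of ours; its mean is z, so scaling by the moments of a Dirac measure at z turns y_k = k! z^k into ỹ_k = k!:

```python
import math
import numpy as np

def moment_matrix(y, d):
    """One-dimensional moment matrix: (M_d(y))_{ij} = y_{i+j}."""
    return np.array([[y[i + j] for j in range(d + 1)] for i in range(d + 1)])

# Moments of an exponential distribution with mean z = 10: y_k = k! * z^k
z, d = 10.0, 3
y_raw = [math.factorial(k) * z**k for k in range(2 * d + 1)]
y_scaled = [float(math.factorial(k)) for k in range(2 * d + 1)]  # y_k / z^k

cond_raw = np.linalg.cond(moment_matrix(y_raw, d))
cond_scaled = np.linalg.cond(moment_matrix(y_scaled, d))
# The rescaled moment matrix is far better conditioned than the raw one
```

The rescaled matrix satisfies the same positive-semidefiniteness constraints, so the SDP is unchanged up to a diagonal congruence, but the solver sees a far better-conditioned problem.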

Figure 1: (a) The density of the noise ξ. (b) Sample path of length N = 250 generated to estimate the parameters p. The noise was generated by inverting the cdf of µ_ξ numerically (uniform grid on [−10, 10] with step size 10^{−4}) and drawing independent samples from a uniform [0, 1] distribution. (c) Upper and lower bounds on the means µ_{p|z}(p_1), µ_{p|z}(p_2), µ_{p|z}(p_3) of the posterior distribution µ_{p|z} (solid lines), plotted against the number of samples N = 10, 12, ..., 250 of z used to generate the posterior. The actual values of the parameters used to generate the samples are shown with dashed lines. The upper and lower bounds were computed by solving (3.13) using d = 5 and the appropriate objective f(x) = x_1, x_2, or x_3. The gap between the upper and lower bounds was always smaller than 10^{−2}, hence the upper and lower bounds are indistinguishable in the plot. (d) Upper bounds on the total variance of the posterior distribution, var_{p|z} = µ_{p|z}(p_1² + p_2² + p_3²) − µ_{p|z}(p_1)² − µ_{p|z}(p_2)² − µ_{p|z}(p_3)². These were obtained by computing upper bounds on µ_{p|z}(p_1² + p_2² + p_3²) (solving (3.13) with d = 5 and f(x) = x_1² + x_2² + x_3²), and combining these bounds with the lower bounds computed for the posterior means. In total, 847 bounds were computed taking a total CPU time of 720 seconds, averaging 0.85 seconds per bound.

Figure 2: Upper and lower bounds on the Lyapunov exponents of the SDE (4.10) with c_1 = 1 and c_2 = −30, computed for σ = 0.2, 0.3, ..., 4.5 by solving (3.13) with d = 16. The gap between the lower and upper bounds was always smaller than 10^{−3}, hence the two sets of bounds are indistinguishable in the plot. In total, 88 bounds were computed in a total CPU time of 364 seconds, averaging 4.1 seconds per bound.

For any natural number m, let h_m(x) := φ(x_1/m) φ(x_2/m) ··· φ(x_n/m) h(x), where φ is the smooth, compactly supported function

φ(y) := exp(−y²/(1 − y²)) if |y| < 1, and φ(y) := 0 otherwise.

By definition of h_m, D^{h_m}_t tends almost surely to D^h_t, and so we just need to show that D^{h_m}_t is dominated by a P-integrable random variable. Since ∂_z φ(z/m)|_{z=y} = ∂_x φ(x)/m|_{x=y/m} and ∂_{zz} φ(z/m)|_{z=y} = ∂_{xx} φ(x)/m²|_{x=y/m}, and since φ, ∂φ, and ∂²φ are all bounded (they are continuous functions that are non-zero only on the compact set [−1, 1]), we have that

|Ah_m| ≤ β [ Σ_{i=1}^n (|h b_i| + |∂_i h b_i|) + Σ_{i,j=1}^n (|h a_{ij}| + |∂_i h a_{ij}| + |∂_j h a_{ij}| + |∂_{ij} h a_{ij}|) ],

where β is a constant that depends on the maxima of φ, ∂φ, and ∂²φ. Since all the polynomials on the right-hand side are of degree d or less, the right-hand side is π-integrable. Thus, the sequence of random variables {D^{h_m}_t}_{m∈Z_+} is dominated by

|h(X_t)| + |h(X_0)| + β ∫_0^t [ Σ_{i=1}^n (|h(X_s) b_i(X_s)| + |∂_i h(X_s) b_i(X_s)|) + Σ_{i,j=1}^n (|h(X_s) a_{ij}(X_s)| + |∂_i h(X_s) a_{ij}(X_s)| + |∂_j h(X_s) a_{ij}(X_s)| + |∂_{ij} h(X_s) a_{ij}(X_s)|) ] ds.

Since for any continuous function f and i = 1, ..., n it is the case that |G_i f| ≤ G_i |f|, we have that

sup_{x∈M} |∂^α h(x) − ∂^α q(x)| ≤ sup_{x∈M} G^{2−α}(|∂² h − p|)(x) ≤ G^{2−α} sup_{x∈M} |∂² h(x) − p(x)|

Table 1: Bounds on the mean of the stationary measures of the SDE (3.8). In total, 40 bounds were computed taking a total CPU time of 10.3 seconds, averaging 0.26 seconds per bound. To solve the SDPs we used the SDP-solver SDPT3.

For many SDEs, like those in Examples 3.2 and 3.3, all stationary measures have moments of some order d. If G is compact, d = ∞; otherwise, such a d can be found by verifying a drift condition like the one in Condition 2.1. In such cases, B^k_{f,G} = B^d_{f,G} for all d_f ≤ k ≤ d. However, as Example 3.4 shows, this does not hold in general for our outer approximations. Instead, we only have that the infima (resp. suprema) of the objective over the sets C^k_{f,G}
are monotone non-decreasing (resp. non-increasing) sequences of lower (resp. upper) bounds on B^d_{f,G}. In practice, the best bounds are obtained by solving for the infimum/supremum of C^k_{f,G} for the largest d_f ≤ k ≤ d that can be handled computationally by the solver.
That is, the extended bounds are obtained by solving two SDPs with (N + 1) times as many variables as in our original algorithm.

Table 2: Computed upper bounds on P_u and exact values from (4.27).

Table 3: Computed upper bounds on F_u and exact values from (4.25).
≤ G^{|2−α|} ε. Since ε was arbitrary, (B.1) follows.

Lemma B.2. Under the conditions of Theorem 4.2 we have that, for every t ≥ 0 and x ∈ R^n,

E[(Ah)(Λ^x_t)] = (Au_t)(x).

Proof. Choose any t, s ≥ 0 and x ∈ R^n. For typographical convenience, we use Λ^x_t and Λ(t, x) interchangeably in this proof. If µ^x_s denotes the law of Λ^x_s, by the definition of u_t we have that

E[u_t(Λ^x_s)] = ∫ E[h(Λ^y_t)] µ^x_s(dy) = E[h(Λ(t, Λ(s, x)))].

By the Markov property of Λ, E[h(Λ(t, Λ(s, x)))] = E[h(Λ(t + s, x))]. Applying Itô's rule, h(Λ(t + s, x)) equals h(Λ(t, x)) plus the drift term ∫_t^{t+s} (Ah)(Λ(v, x)) dv and a stochastic integral against dW_v. Since the paths of Λ take values in a compact set, the stochastic integral is a martingale (as a function of s), and taking expectations and applying Fubini's Theorem we obtain

E[u_t(Λ^x_s)] − u_t(x) = E[ ∫_t^{t+s} (Ah)(Λ(v, x)) dv ].

Dividing by s and letting s → 0, the right-hand side tends to E[(Ah)(Λ^x_t)], while, by the definition of the generator and (B.3),

lim_{s→0} (E[u_t(Λ^x_s)] − u_t(x))/s = (Au_t)(x).

Comparing the two limits gives the desired result.