Multigrid with rough coefficients and Multiresolution operator decomposition from Hierarchical Information Games

We introduce a $\mathcal{O}(N \ln^2 N)$-complexity (geometric and meshless/algebraic) multigrid method for PDEs with rough ($L^\infty$) coefficients with rigorous a-priori accuracy and performance estimates. The method is discovered through a decision theory/information game formulation of the problems of (1) identifying restriction and interpolation operators (2) recovering a signal from incomplete measurements based on norm constraints on its image under a linear operator (3) gambling on the value of the solution of the PDE based on a hierarchy of nested measurements of its solution or source term. The resulting elementary gambles form a hierarchy of (deterministic) basis functions of $H^1_0(\Omega)$ (gamblets) that (1) are orthogonal across subscales/subbands with respect to the scalar product induced by the energy norm of the PDE (2) enable sparse compression of the solution space in $H^1_0(\Omega)$ (3) induce an orthogonal multiresolution operator decomposition. The operating diagram of the multigrid method is that of an inverted pyramid in which gamblets are computed locally (by virtue of their exponential decay), hierarchically (from fine to coarse scales), and the PDE is decomposed into a hierarchy of independent linear systems with uniformly bounded condition numbers. The resulting algorithm is parallelizable both in space (via localization) and in bandwidth/subscale (subscales can be computed independently from each other). Although the method is deterministic, it has a natural Bayesian interpretation under the measure of probability emerging (as a mixed strategy) from the information game formulation, and the multiresolution approximations form a martingale with respect to the filtration induced by the hierarchy of nested measurements.


Scientific discovery as a decision theory problem
The process of scientific discovery is oftentimes based on intuition, trial and error and plain guesswork. This paper is motivated by the question of the existence of a rational decision framework that could be used to guide/automate this process, in the context of the design of fast solvers for (1.1), by reformulating the process of discovery as a decision problem. This paper shows that this reformulation is possible and leads to a multigrid/multiresolution method/algorithm solving (1.1), up to a pre-specified level of accuracy $\epsilon$ in $H^1$-norm (i.e. finding $u_{\mathrm{app}}$ such that $\|u - u_{\mathrm{app}}\|_{H^1_0(\Omega)} \le \epsilon \|g\|_{H^{-1}(\Omega)}$ for an arbitrary $g$ decomposed over $N$ degrees of freedom), in $\mathcal{O}\big(N \ln^{3d} \max(\frac{1}{\epsilon}, N^{1/d})\big)$ operations (for $\epsilon \sim N^{-1/d}$, the hierarchical matrix method achieves $\epsilon$-accuracy in $L^2$-norm in $\mathcal{O}(N \ln^{2d+8} N)$ operations and the proposed multiresolution method achieves $\epsilon$-accuracy in $H^1$-norm in $\mathcal{O}(N \ln^{3d} N)$ operations). For subsequent solves (i.e. if (1.1) needs to be solved for more than one $g$), the proposed multiresolution method achieves accuracy $\approx N^{-1/d}$ in $H^1$-norm in $\mathcal{O}(N \ln^{d+1} N)$ operations (we refer to Subsection 5.4 and in particular to Table 1 for a detailed complexity analysis of the proposed method, which can also achieve sublinear complexity if one only requires $L^2$-approximations).
The core mechanism supporting the complexity of the method presented here is the fast decomposition of H 1 0 (Ω) into a direct sum of linear subspaces that are orthogonal (or near-orthogonal) with respect to the energy scalar product and over which (1.1) has uniformly bounded condition numbers. It is, to some degree, surprising that this decomposition can be achieved in near linear complexity and not in the complexity of an eigenspace decomposition. Naturally [86], this decomposition can be applied to the fast simulation of the wave and parabolic equations associated to (1.1) or to its fast diagonalization.
The essential step behind the automation of the discovery/design of scalable numerical solvers is the observation that fast computation requires repeated computation with partial information (and limited resources) over hierarchies of levels of complexity and the reformulation of this process as that of playing underlying hierarchies of adversarial information games [111,112].
Although the problem of finding a fast solver for (1.1) may appear disconnected from that of finding statistical estimators or making decisions from data sampled from an underlying unknown probability distribution, the proposed game theoretic reformulation is, to some degree, analogous to the one developed in Wald's Decision Theory [113], evidently influenced by Von Neumann's Game Theory [111,112] (the generalization of worst case Uncertainty Quantification analysis [83] to sample data/model uncertainty requires an analogous game theoretic formulation [80], see also [79] for how the underlying calculus could be used to guide the discovery of new Selberg identities). We also refer to subsection 1.3 for a review of the correspondence between statistical inference and numerical approximation.

Outline of the paper
The essential difficulty in generalizing the multigrid concept to PDEs with rough coefficients lies in the fact that the interpolation (downscaling) and restriction (upscaling) operators are, a priori, unknown. Indeed, in this situation, piecewise linear finite-elements can perform arbitrarily badly [7] and the design of the interpolation operator requires the identification of accurate basis elements adapted to the microstructure a(x).
This identification problem has also been the essential difficulty in numerical homogenization [117,6,4,18,59,33,84,17]. Although inspired by classical homogenization ideas and concepts (such as oscillating test functions [68,36,35], cell problems/correctors and effective coefficients [13,90,1,72,39,46], harmonic coordinates [61,6,4,76,12,2,84], compactness by compensation [100,45,67,14]), an essential goal of numerical homogenization has been the numerical approximation of the solution space of (1.1) with arbitrarily rough coefficients [84], i.e., in particular, without the assumptions found in classical homogenization, such as scale separation, ergodicity at fine scales and $\epsilon$-sequences of operators (otherwise the resulting method could lack robustness to rough coefficients, even under the assumption that the coefficients are stationary [8]). Furthermore, to envisage applications to multigrid methods, the computation of these basis functions must also be provably localized [5,85,64,48,87,58] and compatible with nesting strategies [87]. In [77], it has been shown that this process of identification (of accurate basis elements for numerical homogenization) could, in principle, be guided through its reformulation as a Bayesian Inference problem in which the source term g in (1.1) is replaced by noise ξ and one tries to estimate the value of the solution at a given point based on a finite number of observations. In particular, it was found that Rough Polyharmonic Splines [87] and Polyharmonic Splines [54,30,31,32] can be re-discovered as solutions of Gaussian filtering problems.
This paper is inspired by the suggestion that this link between numerical homogenization and Bayesian Inference (and the link between Numerical Quadrature and Bayesian Inference [92,27,97,74,75]) are not coincidences but particular instances of mixed strategies for underlying information games, and that optimal or near optimal methods could be obtained by identifying such games and their optimal strategies. The process of identification of these games starts with the (Information Based Complexity [119]) notion that computation can only be done with partial information. For instance, since the operator (1.1) is infinite dimensional, one cannot directly compute with $u \in H^1_0(\Omega)$ but only with finite-dimensional features of $u$. An example of such finite-dimensional features is the $m$-dimensional vector $u_m := (\int_\Omega u \phi_1, \ldots, \int_\Omega u \phi_m)$ obtained by integrating the solution $u$ of (1.1) against $m$ test/measurement functions $\phi_i \in L^2(\Omega)$. However, to achieve an accurate approximation of $u$ through computation with $u_m$, one must fill the information gap between $u_m$ and $u$ (i.e. construct an interpolation operator giving $u$ as a function of $u_m$). We will, therefore, reformulate the identification of this interpolation operator as a non-cooperative (min max) game where Player I chooses the source term $g$ in (1.1) in an admissible set/class (e.g. the unit ball of $L^2(\Omega)$) and Player II is shown $u_m$ and must approximate $u$ from these incomplete measurements. Using the energy norm to quantify the accuracy of the recovery and calling $u^*$ Player II's bet (on the value of $u$), the objective of Player I is to maximize the approximation error $\|u - u^*\|_a$, while the objective of Player II is to minimize it. A remarkable result from Game Theory (as developed by Von Neumann [111], Von Neumann and Morgenstern [112] and Nash [70]) is that optimal strategies for deterministic zero sum finite games are mixed (i.e. randomized) strategies.
Although the information game described above is zero sum, it is not finite. Nevertheless, as in Wald's Decision Theory [113], under sufficient regularity conditions it can be made compact and therefore approximable by a finite game. Therefore, although the information game described above is purely deterministic (and has no a priori connection to statistical estimation), under compactness (and continuity of the loss function) the best strategy for Player I is to play at random by placing a probability distribution $\pi_I$ on the set of candidates for $g$ (and selecting $g$ as a sample from $\pi_I$), and the optimal strategy for Player II is to place a probability distribution $\pi_{II}$ on the set of candidates for $g$ and approximate the solution of (1.1) by the expectation of $u$ (under $\pi_{II}$ used as a prior distribution) conditioned on the measurements $\int_\Omega u \phi_i$.
Although the estimator employed by Player II may be called Bayesian, the game described here is not (i.e. the choice of Player I might be distinct from that of Player II), and Player II must solve a min max optimization problem over $\pi_I$ and $\pi_{II}$ to identify an optimal prior distribution for the Bayesian estimator (a careful choice of the prior also appears to be important due to the possible high sensitivity of posterior distributions [81,79,82]). Although solving the min max problem over $\pi_I$ and $\pi_{II}$ may be one way of determining the strategy of Player II, it will not be the method employed here. We will instead analyze the error of Player II's approximation as a function of Player II's prior and of the source term $g$ picked by Player I. Furthermore, to preserve the linearity of the calculations, we will restrict Player II's decision space (the set of possible priors $\pi_{II}$) to Gaussian priors on the source term $g$. Since the resulting analysis is independent of the structure of (1.1) and solely depends on its linearity, we will first perform this investigation, in Section 2, in the algebraic framework of linear systems of equations, identify Player II's optimal mixed strategy, and show that it is characterized by deterministic optimal recovery and accuracy properties. The mixed strategy identified in Section 2 will then be applied in Section 3 to the numerical homogenization of (1.1) and the discovery of interpolation operators. In particular, it will be shown that the resulting elementary gambles form a set of deterministic basis functions (gamblets) characterized by (1) optimal recovery and accuracy properties (2) exponential decay (enabling their localized computation) (3) robustness to high contrast.
To compute fast, the game presented above must not be limited to filling the information gap between $u_m \in \mathbb{R}^m$ and $u \in H^1_0(\Omega)$. This game must be played (and repeated) over hierarchies of levels of complexity (e.g. one must fill information gaps between $\mathbb{R}^4$ and $\mathbb{R}^{16}$, then between $\mathbb{R}^{16}$ and $\mathbb{R}^{64}$, etc.). We will therefore, in Section 4, consider the (hierarchical) game where Player I chooses the right-hand side of (1.1) and Player II must (iteratively) gamble on the value of its solution based on a hierarchy of nested measurements of $u$ (from coarse to fine measurements). Under Player II's mixed strategy (identified in Section 2 and used in Section 3), the resulting sequence of multi-resolution approximations forms a martingale. Conditioning and the independence of martingale increments lead to the hierarchy of nested interpolation operators and to the multiresolution orthogonal decomposition of (1.1) into independent linear systems of uniformly bounded condition numbers. The resulting elementary gambles (gamblets) (1) form a hierarchy of nested basis functions leading to the orthogonal decomposition (in the scalar product of the energy norm) of $H^1_0(\Omega)$ (2) enable the sparse compression of the solution space of (1.1) (3) can be computed and stored in near-linear complexity by solving a nesting of linear systems with uniformly bounded condition numbers (4) enable the computation of the solution of (1.1) (or its hyperbolic or parabolic analogues) in near-linear complexity. The implementation and complexity of the algorithm are discussed in Section 5 with numerical illustrations.
1.3 On the correspondence between statistical inference and numerical approximation

As exposed by Diaconis [27], the investigation of the correspondence between statistical inference and numerical approximation can be traced back to Poincaré's course in Probability Theory [92]. It is useful to recall Diaconis' compelling example [27] as an illustration of this connection. Let $f : [0,1] \to \mathbb{R}$ be a given function and assume that we are interested in the numerical approximation of $\int_0^1 f(x)\,dx$. The Bayesian approach to this quadrature problem is to (1) put a prior (probability distribution) on the space of continuous functions $C[0,1]$ (2) calculate $f$ at $x_1, x_2, \ldots, x_n$ (to obtain the data $(f(x_1), \ldots, f(x_n))$) (3) compute a posterior (4) estimate $\int_0^1 f(x)\,dx$ by its posterior expectation given the data. If the prior is Brownian motion, then the posterior mean given $f(x_1), \ldots, f(x_n)$ is the piecewise linear interpolation of $f$ between the points $x_1, \ldots, x_n$ and one re-discovers the trapezoidal quadrature rule. If the prior is integrated Brownian motion, then the posterior mean given $f(x_1), \ldots, f(x_n)$ is the cubic spline interpolant, and integrating $k$ times yields splines of order $2k + 1$.
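Diaconis' example can be checked numerically. The sketch below is a hypothetical illustration (the integrand, nodes and tolerances are ours, not from the paper): it uses the Brownian-motion covariance kernel $k(s,t) = \min(s,t)$, which pins $f(0) = 0$; the Gaussian-process posterior mean is then the piecewise linear interpolant of the data, and the Bayes estimate of the integral reduces to the trapezoidal rule.

```python
import numpy as np

def k(s, t):
    """Brownian-motion covariance kernel k(s, t) = min(s, t)."""
    return np.minimum.outer(s, t)

f = lambda x: np.sin(3 * x) + x ** 2        # test integrand (satisfies f(0) = 0)
nodes = np.linspace(0.1, 1.0, 10)           # quadrature nodes in (0, 1]
fx = f(nodes)

# Gaussian-process posterior mean at points t: m(t) = k(t, nodes) K^{-1} f(nodes)
w = np.linalg.solve(k(nodes, nodes), fx)
t = np.linspace(0.0, 1.0, 1001)
post_mean = k(t, nodes) @ w

# the posterior mean is the piecewise linear interpolant through (0, 0) and (nodes, fx)
xs = np.concatenate(([0.0], nodes))
ys = np.concatenate(([0.0], fx))
assert np.allclose(post_mean, np.interp(t, xs, ys), atol=1e-8)

# Bayes estimate of the integral, using int_0^1 min(t, x_j) dt = x_j - x_j^2 / 2,
# equals the trapezoidal rule on the same nodes
bayes_integral = (nodes - nodes ** 2 / 2) @ w
trapezoid = np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))
assert np.isclose(bayes_integral, trapezoid)
```

The design choice of pinning $f(0)=0$ simply reflects that standard Brownian motion starts at zero; a prior that does not fix $f(0)$ would add a constant (or polynomial) trend to the interpolant.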
Subsequent to Poincaré's early discovery [92], Sul'din [102] and (in particular) Larkin [62] initiated the systematic investigation of the correspondence between conditioning Gaussian measures/processes and numerical approximation. As noted by Larkin [62], despite Sard's introduction of probabilistic concepts in the theory of linear approximation [95], and Kimeldorf and Wahba's exposition [60] of the correspondence between Bayesian estimation and spline smoothing/interpolation, the application of probabilistic concepts and techniques to numerical integration/approximation "attracted little attention among numerical analysts" (perhaps due to the counterintuitive nature of the process of randomizing a known function). However, a natural framework for understanding this process of randomization can be found in the pioneering works of Woźniakowski [118], Packel [88], and Traub, Wasilkowski, and Woźniakowski [104] on Information Based Complexity [71,119], the branch of computational complexity that studies the complexity of approximating continuous mathematical operations with discrete and finite ones up to a specified level of accuracy. Indeed the concept that numerical implementation requires computation with partial information and limited resources emerges naturally from Information Based Complexity, where it is also augmented by concepts of contaminated and priced information associated with, for example, truncation errors and the cost of numerical operations. In this framework, the performance of an algorithm operating on incomplete information can be analysed in the usual worst case setting or the average case (randomized) setting [93,73] with respect to the missing information.
Although the measure of probability (on the solution space) employed in the average case setting may be arbitrary, as observed by Packel [88], if that measure is chosen carefully (as the solution of a game theoretic problem) then the average case setting can be interpreted as lifting a (worst case) min max problem (where saddle points of pure strategies do not, in general, exist) to a min max problem over mixed (randomized) strategies (where saddle points do exist [111,112]). As exposed by Diaconis [27] (see also Shaw [97]), the randomized setting also establishes a correspondence between Numerical Analysis and Bayesian Inference, providing a natural framework for the statistical description of numerical errors (in which confidence intervals can be derived from posterior distributions). Furthermore [89,27], classical min max numerical quadrature rules can be formulated as solutions of Bayesian inference problems with carefully chosen priors [27] and, as shown by O'Hagan [74,75], this correspondence can be exploited to discover new and useful numerical quadrature rules. As envisioned by Skilling [99], by placing a (carefully chosen) probability distribution on the solution space of an ODE and conditioning on quadrature points, one obtains a posterior distribution on the solution whose mean may coincide with classical numerical integrators such as Runge-Kutta methods [96]. As shown in [23], the statistical approach is particularly well suited for chaotic dynamical systems for which deterministic worst case error bounds may provide little information. While in [99,96,23] the probability distribution is directly placed on the solution space, for PDEs [77] argues that the prior distribution must be placed on source terms (or on the image space of an integro-differential operator) and propagated/filtered through the inverse operator to reflect the structure of the solution space.
In particular [77] shows that this process of filtering noise with the inverse operator, when combined with conditioning, produces accurate finite-element basis functions for the solution space whose deterministic worst case errors can be bounded by standard deviation errors using the reproducing kernel structure of the covariance function of the filtered Gaussian field. As already witnessed in [23,96,77,56,55,19,25], it is natural to expect that the possibilities offered by combining numerical uncertainties/errors with model uncertainties/errors in a unified framework will stimulate a resurgence of the statistical inference approach to numerical analysis.
2 Linear Algebra with incomplete information

The recovery problem
The problem of identifying interpolation operators for (1.1) is equivalent (after discretization or in the algebraic setting) to that of recovering or approximating the solution of a linear system of equations from an incomplete set of measurements (coarse variables) given known norm constraints on the image of the solution.
Let $n \ge 2$ and let $A$ be a known real invertible $n \times n$ matrix. Let $b$ be an unknown element of $\mathbb{R}^n$. Our purpose is to approximate the solution $x$ of
$$A x = b \qquad (2.1)$$
based on the information that (1) $x$ solves
$$\Phi x = y, \qquad (2.2)$$
where $\Phi$ (the measurement matrix) is a known, rank $m$, $m \times n$ real matrix such that $m < n$ and $y$ (the measurement vector) is a known vector of $\mathbb{R}^m$, and (2) the norm $b^T T^{-1} b$ of $b$ is known or bounded by a known constant (e.g., $b^T T^{-1} b \le 1$), where $T^{-1}$ is a known positive definite $n \times n$ matrix (with $T^{-1}$ being the identity matrix as a prototypical example). Observe that since $m < n$, the measurements (2.2) are, a priori, not sufficient to recover the exact value of $x$.
As described in Section 1, by formulating this recovery problem as a (non-cooperative) information game (where Player I chooses $b$ and Player II chooses an approximation $x^*$ of $x$ based on the observation $\Phi x$), one (Player II) is naturally led to search for a mixed strategy in the Bayesian class by placing a prior distribution on $b$. The purpose of this section is to analyze the resulting approximation error and select the prior distribution accordingly. To preserve the linearity (i.e. simplicity and computational efficiency) of the calculations, we will restrict Player II's decision space to Gaussian priors.

Player I's mixed strategy
We will therefore, in the first step of the analysis, replace $b$ in (2.1) by $\xi$, a centered Gaussian vector of $\mathbb{R}^n$ with covariance matrix $Q$ (which may be distinct from $T$), and consider the following stochastic linear system:
$$A X = \xi. \qquad (2.3)$$
The Bayesian answer (a mixed strategy for Player II) to the recovery problem of Section 2 is to approximate $x$ by the conditional expectation $\mathbb{E}[X \mid \Phi X = y]$.
Theorem 2.1. The solution $X$ of (2.3) is a centered Gaussian vector of $\mathbb{R}^n$ with covariance matrix
$$K := A^{-1} Q (A^{-1})^T. \qquad (2.4)$$
Furthermore, $X$ conditioned on the value $\Phi X = y$ is a Gaussian vector of $\mathbb{R}^n$ with mean $\mathbb{E}[X \mid \Phi X = y] = \Psi y$ and covariance matrix $K_\Phi$, where $\Psi$ is the $n \times m$ matrix
$$\Psi := K \Phi^T (\Phi K \Phi^T)^{-1} \qquad (2.5)$$
and $K_\Phi$ is the rank $n - m$ positive symmetric $n \times n$ matrix defined by $K_\Phi := K - \Psi \Phi K$.
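The structural properties asserted by Theorem 2.1 are easy to sanity-check numerically. The sketch below is an illustrative example (the random matrices, dimensions and tolerances are ours): it verifies $\Phi \Psi = I_m$, that the conditional covariance $K_\Phi$ vanishes in the measured directions ($\Phi K_\Phi = 0$), and that $K_\Phi$ is symmetric positive semidefinite of rank $n - m$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # invertible (SPD here)
Q = rng.standard_normal((n, n)); Q = Q @ Q.T + n * np.eye(n)   # covariance of xi
Phi = rng.standard_normal((m, n))                              # rank-m measurement matrix

Ainv = np.linalg.inv(A)
K = Ainv @ Q @ Ainv.T                              # covariance of X = A^{-1} xi, eq. (2.4)
Psi = K @ Phi.T @ np.linalg.inv(Phi @ K @ Phi.T)   # eq. (2.5)
K_Phi = K - Psi @ Phi @ K                          # conditional covariance

assert np.allclose(Phi @ Psi, np.eye(m))           # Phi Psi = I_m
assert np.allclose(Phi @ K_Phi, 0)                 # no residual variance in measured directions
assert np.allclose(K_Phi, K_Phi.T)                 # symmetric
eig = np.linalg.eigvalsh(K_Phi)
assert np.all(eig > -1e-10) and np.sum(eig > 1e-8) == n - m    # PSD of rank n - m
```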

Variational/optimal recovery properties and approximation error
For a $n \times n$ symmetric positive definite matrix $M$, let $\langle \cdot, \cdot \rangle_M$ be the scalar product $\langle u, v \rangle_M := u^T M v$ and $\|\cdot\|_M$ the corresponding norm. When $M$ is the identity matrix, we write $\langle u, v \rangle$ and $\|v\|$ for the corresponding scalar product and norm. For a linear subspace $V$ of $\mathbb{R}^n$, we write $P_{V,M}$ for the orthogonal projection onto $V$ with respect to the scalar product $\langle \cdot, \cdot \rangle_M$. For a (possibly rectangular) matrix $B$, we write $\mathrm{Im}(B)$ for the image (range) of $B$ and $\mathrm{Ker}(B)$ for the null space of $B$. For an integer $n$, let $I_n$ be the $n \times n$ identity matrix.
Theorem 2.2. For $w \in \mathbb{R}^m$, $\Psi w$ is the unique minimizer of the following quadratic problem:
$$\text{Minimize } \langle v, v \rangle_{K^{-1}} \text{ subject to } \Phi v = w \text{ and } v \in \mathbb{R}^n. \qquad (2.6)$$
In particular, $v = \Psi y$, the Bayesian approximation of the solution of (2.1), is the unique minimizer of $\|A v\|_{Q^{-1}}$ under the measurement constraints $\Phi v = y$. Furthermore, it also holds true that (1) $\Phi \Psi = I_m$, (2) $\mathrm{Im}(\Psi)$ is the orthogonal complement of $\mathrm{Ker}(\Phi)$ with respect to the product $\langle \cdot, \cdot \rangle_{K^{-1}}$, and (3) $\Psi \Phi = P_{\mathrm{Im}(K \Phi^T), K^{-1}}$ and $I_n - \Psi \Phi = P_{\mathrm{Ker}(\Phi), K^{-1}}$.
Proof. First observe that (2.5) implies that $\Phi \Psi = I_m$, where $I_m$ is the $m \times m$ identity matrix; therefore $\Phi(\Psi w) = w$. Note also that (2.5) implies that for all $w \in \mathbb{R}^m$ and all $v \in \mathrm{Ker}(\Phi)$, $\langle \Psi w, v \rangle_{K^{-1}} = w^T (\Phi K \Phi^T)^{-1} \Phi K K^{-1} v = w^T (\Phi K \Phi^T)^{-1} \Phi v = 0$. Since the dimension of $\mathrm{Im}(\Psi)$ is $m$ and that of $\mathrm{Ker}(\Phi)$ is $n - m$, we conclude that $\mathrm{Im}(\Psi)$ is the orthogonal complement of $\mathrm{Ker}(\Phi)$ with respect to the product $\langle \cdot, \cdot \rangle_{K^{-1}}$ and, in particular,
$$\langle \Psi w, v \rangle_{K^{-1}} = 0, \quad \forall w \in \mathbb{R}^m \text{ and } \forall v \in \mathbb{R}^n \text{ such that } \Phi v = 0. \qquad (2.7)$$
Let $w \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ be such that $\Phi v = w$. Since $\Psi w - v \in \mathrm{Ker}(\Phi)$, it follows from (2.7) that $\langle v, v \rangle_{K^{-1}} = \langle \Psi w, \Psi w \rangle_{K^{-1}} + \langle v - \Psi w, v - \Psi w \rangle_{K^{-1}}$. Therefore $\Psi w$ is the unique minimizer of $\langle v, v \rangle_{K^{-1}}$ over all $v \in \mathbb{R}^n$ such that $\Phi v = w$. Now consider $f \in \mathbb{R}^n$. Since $\mathrm{Im}(\Psi) = \mathrm{Im}(K \Phi^T)$ and $\mathrm{Im}(\Psi)$ is the orthogonal complement of $\mathrm{Ker}(\Phi)$ with respect to the product $\langle \cdot, \cdot \rangle_{K^{-1}}$, there exist a unique $z \in \mathbb{R}^m$ and a unique $g \in \mathrm{Ker}(\Phi)$ such that $f = K \Phi^T z + g$. Since $\Psi \Phi = K \Phi^T (\Phi K \Phi^T)^{-1} \Phi$, it follows that $\Psi \Phi f = K \Phi^T z$ and $(I_n - \Psi \Phi) f = g$. We conclude by observing that $g = P_{\mathrm{Ker}(\Phi), K^{-1}} f$.
Theorem 2.3. Let $D$ be a $n \times n$ symmetric positive definite matrix. Then, for all $v \in \mathbb{R}^n$,
$$\|v - \Psi \Phi v\|_D \le \Big( \sup_{z \in \mathrm{Ker}(\Phi),\, z \ne 0} \frac{\|z\|_D}{\|z\|_{K^{-1}}} \Big) \|v\|_{K^{-1}}.$$
Proof. The proof follows by observing that $v - \Psi \Phi v$ belongs to the null space of $\Phi$ which, from Theorem 2.2, is the orthogonal complement of the image of $\Psi$ with respect to the scalar product defining the norm $\|\cdot\|_{K^{-1}}$ (so that $\|v - \Psi \Phi v\|_{K^{-1}} \le \|v\|_{K^{-1}}$). Observe also that the image of $\Psi$ is equal to that of $K \Phi^T$.
Remark 2.4. Observe that, from Theorem 2.2, $v - \Psi \Phi v$ spans the null space of $\Phi$ as $v$ ranges over $\mathbb{R}^n$, and $\|v\|_{K^{-1}}^2 = \|\Psi \Phi v\|_{K^{-1}}^2 + \|v - \Psi \Phi v\|_{K^{-1}}^2$. In particular, if $x$ is the solution of (2.1) and $y$ the vector in (2.2), then $\|x - \Psi y\|_D / \|b\|_{Q^{-1}} \le \sup_{v \in \mathbb{R}^n,\, \Phi v = 0,\, v \ne 0} \|v\|_D / \|v\|_{K^{-1}}$, and the right hand side is the smallest constant for which the inequality holds (for all $b$).
Remark 2.5. A simple calculation (based on the reproducing kernel property $\langle v, K_{\cdot, i} \rangle_{K^{-1}} = v_i$) shows that if $x$ is the solution of (2.1) and $y$ the vector in (2.2), then $|x_i - (\Psi y)_i| \le (K_\Phi)_{i,i}^{1/2} \|b\|_{Q^{-1}}$, i.e. the variance of the $i$th entry of the solution of the stochastic system (2.3) conditioned on $\Phi X = y$ controls the accuracy of the approximation of the $i$th entry of the solution of the deterministic system (2.1). In that sense, the role of $K_\Phi$ is analogous to that of the power function in radial basis function interpolation [116,40] and that of the Kriging function [120] in geostatistics [69].
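The entrywise control described in Remark 2.5 can be illustrated numerically. The sketch below is an assumed small random example (our reading of the remark is the bound $|x_i - (\Psi y)_i| \le (K_\Phi)_{i,i}^{1/2}\, (b^T Q^{-1} b)^{1/2}$); it checks the bound over many random right-hand sides.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 3
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
Q = rng.standard_normal((n, n)); Q = Q @ Q.T + n * np.eye(n)
Phi = rng.standard_normal((m, n))

Ainv = np.linalg.inv(A)
K = Ainv @ Q @ Ainv.T
Psi = K @ Phi.T @ np.linalg.inv(Phi @ K @ Phi.T)
K_Phi = K - Psi @ Phi @ K
Qinv = np.linalg.inv(Q)

worst = 0.0
for _ in range(200):
    b = rng.standard_normal(n)
    x = Ainv @ b                                    # solution of (2.1)
    y = Phi @ x                                     # measurements (2.2)
    err = np.abs(x - Psi @ y)                       # entrywise recovery error
    bound = np.sqrt(np.diag(K_Phi)) * np.sqrt(b @ Qinv @ b)
    worst = max(worst, np.max(err / bound))
assert worst <= 1.0 + 1e-9      # the conditional standard deviation bounds each entry's error
```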

Energy norm estimates and selection of the prior
We will from now on assume that $A$ is symmetric positive definite. Observe that in this situation the energy norm $\|\cdot\|_A$ is of practical significance for quantifying the approximation error, and Theorem 2.3 leads to the following estimate.
Corollary 2.6. Assume that $A$ is symmetric positive definite and $Q = A$. If $x$ is the solution of (2.1) and $y$ the vector in (2.2), then $\Psi y$ is the $\langle \cdot, \cdot \rangle_A$-orthogonal projection of $x$ onto $\mathrm{Im}(\Psi)$ and $\|x - \Psi y\|_A \le \sup_{v \in \mathrm{Ker}(\Phi),\, v \ne 0} (\|v\| / \|v\|_A) \|b\|$.
Remark 2.7. Therefore, according to Corollary 2.6, if $Q = A$, then $\Psi y$ is the Galerkin approximation of $x$, i.e. the best approximation of $x$ in the $\|\cdot\|_A$-norm in the image of $\Psi$ (which is equal to the image of $A^{-1} \Phi^T$). This is interesting because $\Psi y$ is obtained without the prior knowledge of $b$.
Corollary 2.6 and Remark 2.7 motivate us to select Q = A as the covariance matrix of the Gaussian prior distribution (mixed strategy of Player II).

Impact and selection of the measurement matrix Φ
It is natural to wonder how good this recovery strategy is (under the choice $Q = A$) compared to the best possible function of $y$, and how the approximation error is impacted by the measurement matrix $\Phi$. If the energy norm is used to quantify accuracy, then the recovery problem can be expressed as finding the function $\theta$ of the measurements $y$ minimizing the (worst case) approximation error $\inf_\theta \sup_{\|b\| \le 1} \|x - \theta(y)\|_A$ with $x = A^{-1} b$ and $y = \Phi A^{-1} b$. Writing $0 < \lambda_1(A) \le \cdots \le \lambda_n(A)$ for the eigenvalues of $A$ in increasing order, and $a_1, \ldots, a_n$ for the corresponding eigenvectors, it is easy to obtain that (1) the best choice for $\Phi$ would correspond to measuring the projection of $x$ on $\mathrm{span}\{a_1, \ldots, a_m\}$ and would lead to the worst case approximation error $1/\sqrt{\lambda_{m+1}(A)}$, and (2) the worst choice would correspond to measuring the projection of $x$ on a subspace orthogonal to $a_1$ and would lead to the worst case approximation error $1/\sqrt{\lambda_1(A)}$. Under the decision $Q = A$, the worst case error of the recovery,
$$\sup_{\|b\| \le 1} \|x - \Psi y\|_A, \qquad (2.8)$$
has minimal value (over $\Phi$) $1/\sqrt{\lambda_{m+1}(A)}$, achieved for $\mathrm{Im}(\Phi^T) = \mathrm{span}\{a_1, \ldots, a_m\}$, and maximal value $1/\sqrt{\lambda_1(A)}$, achieved when $\mathrm{Im}(\Phi^T)$ is orthogonal to $a_1$. The following theorem, which is a direct application of (2.8) and the estimate derived in [53, p. 10] (see also [66]), shows that the subset of measurement matrices that are not nearly optimal is of small measure if the rows of $\Phi$ are sampled independently on the unit sphere of $\mathbb{R}^n$.
Theorem 2.8. Let $x$ be the solution of the original equation (2.1). If the rows of $\Phi$ are sampled independently and uniformly on the unit sphere of $\mathbb{R}^n$, then, with high probability, the resulting measurement matrix is nearly optimal (we refer to [53, p. 10] and [66] for the precise quantitative estimate).
Although the randomization of the measurement matrix [42,63,43,78] can be an efficient strategy in compressed sensing [105,21,20,28,44,22] and in Singular Value Decomposition/Low Rank approximation [53], we will not use this strategy here because the design of the interpolation operator presents the (added) difficulty of approximating the eigenvectors associated with the smallest eigenvalues of $A$ rather than those associated with the largest ones. Furthermore, $\Psi$ has to be computed efficiently, and the dependence of the approximation constant in Theorem 2.8 on $n$ and $m$ can be problematic if sharp convergence estimates are to be obtained. We will instead select the measurement matrix based on the transfer property introduced in [14] and given in a discrete context in the following theorem.
Theorem 2.9. Let $B$ be a symmetric positive definite $n \times n$ matrix such that $\alpha_1 B \le A \le \alpha_2 B$ (as quadratic forms) for some constants $\alpha_1, \alpha_2 > 0$. If $Q = A$, $x$ is the solution of (2.1) and $y$ the vector in (2.2), then
$$\|x - \Psi y\|_A \le \alpha_1^{-1/2} \sup_{v \in \mathrm{Ker}(\Phi),\, v \ne 0} \frac{\|v\|}{\|v\|_B} \|b\|.$$
Proof. Corollary 2.6 implies that if $x$ is the solution of the original equation (2.1), then $\|x - \Psi y\|_A \le \sup_{v \in \mathrm{Ker}(\Phi),\, v \ne 0} (\|v\| / \|v\|_A) \|b\|$. We finish the proof by observing that if $A$ and $B$ are symmetric positive definite matrices such that $\alpha_1 B \le A \le \alpha_2 B$ for some constants $\alpha_1, \alpha_2 > 0$, then $\|v\|_A \ge \alpha_1^{1/2} \|v\|_B$ for all $v \in \mathbb{R}^n$.
Therefore, according to Theorem 2.9, once a good measurement matrix $\Phi$ has been identified for a symmetric positive definite matrix $B$ such that $\alpha_1 B \le A$, the same measurement matrix can be used for $A$ at the cost of an increase of the bound on the error by the multiplicative factor $\alpha_1^{-1/2}$. As a prototypical example, one may consider a (stiffness) matrix $A$ obtained from a finite element discretization of the PDE (1.1), and $B$ may be the stiffness matrix of the finite element discretization of the Laplace-Dirichlet PDE
$$-\Delta u'(x) = g(x) \text{ on } \Omega, \quad u' = 0 \text{ on } \partial \Omega, \qquad (2.10)$$
obtained from the same finite elements (e.g. piecewise-linear nodal basis functions over the same fine mesh $\mathcal{T}_h$). Using the energy norm (1.3), Theorem 2.9 and Remark 2.7 imply the following proposition.
Proposition 2.10. Let $u_h$ (resp. $u'_h$) be the finite element approximation of the solution $u$ of (1.1) (resp. the solution $u'$ of (2.10)) over the fine nodal elements of $\mathcal{T}_h$. Let $u_H$ (resp. $u'_H$) be the finite element approximation of the solution $u$ of (1.1) (resp. the solution $u'$ of (2.10)) over the linear space spanned by the columns of $A^{-1} \Phi^T$ (resp. of $B^{-1} \Phi^T$). It holds true that
$$\|u_h - u_H\|_a \le \lambda_{\min}(a)^{-1/2} \Big( \sup_{v \in \mathrm{Ker}(\Phi),\, v \ne 0} \frac{\|v\|}{\|v\|_B} \Big) \|b\|, \qquad (2.11)$$
where $b$ is the load vector of $g$ over the fine finite element basis. Observe that the right hand side of (2.11) does not depend on $\lambda_{\max}(a)$; therefore, if $\lambda_{\min}(a) = 1$, then the error bound on $\|u_h - u_H\|_a$ does not depend on the contrast of $a$ (i.e. $\lambda_{\max}(a) / \lambda_{\min}(a)$).
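The matrix inequality underlying the transfer property is easy to illustrate in one dimension. The sketch below is a hypothetical finite-difference example (the grid, the high-contrast coefficient and the tolerances are ours): with $B$ the Laplacian stiffness matrix and $\alpha_1 = \min a$, one has $\alpha_1 B \le A$ as quadratic forms, hence $\|v\|_A \ge \alpha_1^{1/2} \|v\|_B$ for every $v$.

```python
import numpy as np

# 1D stiffness matrices on a uniform fine grid (Dirichlet boundary conditions):
# A ~ -(a u')' with a rough, high-contrast coefficient; B ~ -u''
N = 64                                               # interior fine-grid points
h = 1.0 / (N + 1)
a = np.where(np.arange(N + 1) % 2 == 0, 1.0, 1e4)    # oscillating, contrast 1e4

def stiffness(coef):
    """Assemble the tridiagonal stiffness matrix from edge coefficients coef[0..N]."""
    M = np.zeros((N, N))
    for i in range(N):
        M[i, i] = (coef[i] + coef[i + 1]) / h
        if i + 1 < N:
            M[i, i + 1] = M[i + 1, i] = -coef[i + 1] / h
    return M

A = stiffness(a)
B = stiffness(np.ones(N + 1))        # Laplacian stiffness
alpha1 = a.min()

# transfer inequality alpha1 * B <= A as quadratic forms (A - alpha1 B is PSD)
assert np.linalg.eigvalsh(A - alpha1 * B).min() > -1e-8

# hence ||v||_A >= sqrt(alpha1) ||v||_B for every v: accuracy estimates for the
# Laplacian transfer to A at the cost of the factor alpha1^{-1/2}
v = np.random.default_rng(4).standard_normal(N)
assert v @ A @ v >= alpha1 * (v @ B @ v) - 1e-6
```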

Numerical homogenization and design of the interpolation operator in the continuous case
We will now generalize the results and continue the analysis of Section 2 in the continuous case and design the interpolation operator for (1.1) in the context of numerical homogenization.

Information Game and Gamblets
As in Section 2, we will identify the interpolation operator (that will be used for the multigrid algorithm) through a non-cooperative game formulation where Player I chooses the source term $g$ in (1.1) and Player II tries to approximate the solution $u$ of (1.1) based on a finite number of measurements $(\int_\Omega u \phi_i)_{1 \le i \le m}$ obtained from linearly independent test functions $\phi_i \in L^2(\Omega)$. As in Section 2, this game formulation motivates the search for a mixed strategy for Player II that can be expressed by replacing the source term $g$ with noise $\xi$. We will therefore consider the following SPDE:
$$\begin{cases} -\mathrm{div}\big(a(x) \nabla v(x)\big) = \xi(x), & x \in \Omega, \\ v = 0, & x \in \partial \Omega, \end{cases} \qquad (3.1)$$
where $\Omega$ and $a$ are the domain and conductivity of (1.1). As in Section 2, to preserve the computational efficiency of the interpolation operator, we will assume that $\xi$ is a centered Gaussian field on $\Omega$; the decision space of Player II is therefore the covariance function of $\xi$. Write $L$ for the differential operator $-\mathrm{div}(a \nabla)$ with zero Dirichlet boundary condition, mapping $H^1_0(\Omega)$ onto $H^{-1}(\Omega)$. Motivated by the analysis (Remark 2.7) of Subsection 2.4 (which can be reproduced in the continuous case), we will select the covariance operator of $\xi$ (Player II's decision) to be $L$. Under that choice, for all $f \in H^1_0(\Omega)$,
$$\mathbb{E}\Big[ \Big( \int_\Omega f \xi \Big)^2 \Big] = \|f\|_a^2,$$
where $\|f\|_a$ is the energy norm of $f$ defined in (1.3). Introducing the scalar product $\langle f, g \rangle_a := \int_\Omega a \nabla f \cdot \nabla g$ on $H^1_0(\Omega)$, recall that if $(e_1, e_2, \ldots)$ is an orthonormal basis of $(H^1_0(\Omega), \|\cdot\|_a)$ diagonalizing $L$, then $\xi$ can formally be represented as $\xi = \sum_i Z_i L e_i$ (where the $Z_i$ are i.i.d. $\mathcal{N}(0,1)$ random variables) and, therefore, $\xi$ can also be identified with the linear isometry mapping $(H^1_0(\Omega), \|\cdot\|_a)$ onto the Gaussian space spanned by the $Z_i$. Note that, as in [77], if $\xi$ is White Noise on $\Omega$ (i.e. a Gaussian field with covariance function $\delta(x - y)$), then $\xi$ can be represented as $\xi = \sum_i Z_i e_i$ for an orthonormal basis $(e_i)$ of $L^2(\Omega)$. Let $\mathcal{F}$ be the $\sigma$-algebra generated by the random variables $\int_\Omega v(x) \phi_i(x)\,dx$ for $i \in \{1, \ldots, m\}$ (with $v$ the solution of (3.1)). We will identify the interpolation basis elements by conditioning the solution of (3.1) on $\mathcal{F}$.
Observe that the covariance matrix of the measurement vector $(\int_\Omega v \phi_1, \ldots, \int_\Omega v \phi_m)$ is the $m \times m$ symmetric matrix $\Theta$ defined by
$$\Theta_{i,j} := \int_\Omega \int_\Omega \phi_i(x) G(x, y) \phi_j(y)\,dx\,dy,$$
where $G$ is the Green's function of the operator $L$. Note that for $l \in \mathbb{R}^m$, $l^T \Theta l = \|w\|_a^2$, where $w$ is the solution of (1.1) with right hand side $g = \sum_{i=1}^m l_i \phi_i$. Therefore (since the test functions $\phi_i$ are linearly independent) $\Theta$ is positive definite and we will write $\Theta^{-1}$ for its inverse. Write $\delta_{i,j}$ for the Kronecker delta ($\delta_{i,i} = 1$ and $\delta_{i,j} = 0$ for $i \ne j$).
Theorem 3.1. Let $v$ be the solution of (3.1). It holds true that
$$\mathbb{E}[v \mid \mathcal{F}] = \sum_{i=1}^m \Big( \int_\Omega v \phi_i \Big) \psi_i, \qquad (3.4)$$
where the functions $\psi_i \in H^1_0(\Omega)$ are defined by
$$\psi_i(x) := \mathbb{E}\Big[ v(x) \,\Big|\, \int_\Omega v \phi_j = \delta_{i,j},\; j \in \{1, \ldots, m\} \Big] \qquad (3.5)$$
and admit the following representation formula:
$$\psi_i(x) = \sum_{j=1}^m \Theta^{-1}_{i,j} \int_\Omega G(x, y) \phi_j(y)\,dy. \qquad (3.6)$$
Furthermore, the distribution of $v$ conditioned on $\mathcal{F}$ is that of a Gaussian field with mean (3.4) and covariance function
$$\Gamma(x, y) = G(x, y) - \sum_{i,j=1}^m \Theta^{-1}_{i,j} \Big( \int_\Omega G(x, s) \phi_i(s)\,ds \Big) \Big( \int_\Omega G(y, t) \phi_j(t)\,dt \Big).$$
Proof. The proof is similar to that of [77, Thm. 3.5]. The identification of the covariance function follows from the expansion of $\Gamma(x, y) = \mathbb{E}\big[ (v(x) - \mathbb{E}[v(x) \mid \mathcal{F}]) (v(y) - \mathbb{E}[v(y) \mid \mathcal{F}]) \big]$.
Since, according to (3.5) and the discussion preceding (3.1), each $\psi_i$ is an elementary gamble (bet) on the value of the solution of (1.1) given the information $\int_\Omega \phi_j u = \delta_{i,j}$ for $j = 1, \ldots, m$, we will refer to the basis functions $(\psi_i)_{1 \le i \le m}$ as gamblets. According to (3.4), once gamblets have been identified, they form a basis for betting on the value of the solution of (1.1) given the measurements $(\int_\Omega \phi_j u)_{1 \le j \le m}$.
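A discrete one-dimensional analogue of gamblets can be computed directly from the representation formula. The sketch below is an illustrative example (the log-normal coefficient, the grid and the thresholds are ours): a finite-difference stiffness matrix stands in for $L$, indicator functions of coarse intervals stand in for the $\phi_j$, and the columns of $L^{-1}\Phi^T\Theta^{-1}$ play the role of the $\psi_i$.

```python
import numpy as np

# discrete 1D analogue of (3.1): L = stiffness of -(a u')' on a fine grid,
# measurement functions phi_j = indicators of m coarse intervals
N, m = 256, 8                       # fine interior points, coarse cells
h = 1.0 / (N + 1)
rng = np.random.default_rng(5)
a = np.exp(2 * rng.standard_normal(N + 1))      # rough log-normal coefficient

main = (a[:-1] + a[1:]) / h
off = -a[1:-1] / h
L = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

Phi = np.zeros((m, N))
for j in range(m):
    Phi[j, j * (N // m):(j + 1) * (N // m)] = h     # integration against phi_j

Linv = np.linalg.inv(L)
Theta = Phi @ Linv @ Phi.T                  # covariance of the measurements
Psi = Linv @ Phi.T @ np.linalg.inv(Theta)   # columns are the gamblets psi_i

# unit-measurement property: int psi_i phi_j = delta_{ij}, cf. (3.5)
assert np.allclose(Phi @ Psi, np.eye(m), atol=1e-8)

# gamblets are localized: psi_0 is small far away from its coarse cell
psi0 = Psi[:, 0]
assert np.max(np.abs(psi0[N // 2:])) < 5e-2 * np.max(np.abs(psi0))
```

The last assertion only probes the exponential decay qualitatively (at half-domain distance); the paper's localization results are, of course, much sharper.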

Optimal recovery properties
Although gamblets admit the representation formula (3.6), we will not use it for their practical (numerical) computation. Instead we will work with variational properties inherited from the conditioning of the Gaussian field $v$. To guide our intuition, note that since $\mathcal{L}$ is the precision function (the inverse of the covariance function) of $v$, the conditional expectation of $v$ can be identified by minimizing $\int_\Omega \psi\mathcal{L}\psi$ subject to the measurement constraints. This observation motivates us to consider, for $i\in\{1,\ldots,m\}$, the following quadratic optimization problem:
$$\psi_i := \operatorname{argmin}\Big\{\|\psi\|_a \,:\, \psi\in H^1_0(\Omega),\ \int_\Omega\psi\varphi_j=\delta_{i,j}\ \text{for } j\in\{1,\ldots,m\}\Big\}, \tag{3.7}$$
where $\|\psi\|_a$ is the energy norm of $\psi$ defined in (1.3).
The following theorem shows that (3.7) can be used to identify $\psi_i$ and that gamblets are characterized by optimal (variational) recovery properties.

Theorem 3.2. For $i\in\{1,\ldots,m\}$, the minimizer of (3.7) is the gamblet $\psi_i$ of (3.5) and (3.6); more generally, for $w\in\mathbb{R}^m$, $\psi_w:=\sum_{i=1}^m w_i\psi_i$ is the unique minimizer of $\|\psi\|_a$ subject to $\int_\Omega\psi\varphi_j=w_j$ for $j\in\{1,\ldots,m\}$.

Proof. Let $\psi_w:=\sum_i w_i\psi_i$ with $\psi_i$ as in (3.6). The definition of $\Theta$ implies that $\int_\Omega\psi_w(x)\varphi_j(x)\,dx=w_j$ for $j\in\{1,\ldots,m\}$. Furthermore, we obtain by integration by parts that $\langle\psi_w,\varphi\rangle_a=0$ for all $\varphi\in H^1_0(\Omega)$ satisfying $\int_\Omega\varphi\varphi_j=0$ for all $j$, so that $\|\psi_w+\varphi\|_a^2=\|\psi_w\|_a^2+\|\varphi\|_a^2$, which finishes the proof of the optimality of $\psi_i$ and $\psi_w$. □
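The equivalence between the conditioning formula (3.6) and the constrained minimization (3.7) can be checked directly in a generic discrete setting. The sketch below (synthetic `A` and `Phi`, as before, are assumptions of the sketch) computes one gamblet twice: from the representation formula, and as the minimizer of the energy $\psi^T A\psi$ via the KKT saddle-point system of the constrained problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # energy scalar product <u, v>_a = u^T A v
Phi = rng.standard_normal((m, n))  # discretized measurement functions

# Gamblet psi_1 from the representation formula (3.6)
Ainv_PhiT = np.linalg.solve(A, Phi.T)
Theta = Phi @ Ainv_PhiT
psi_repr = Ainv_PhiT @ np.linalg.solve(Theta, np.eye(m)[:, 0])

# Same gamblet as the minimizer of psi^T A psi subject to Phi psi = e_1,
# obtained from the KKT (saddle-point) system of the constrained problem (3.7)
KKT = np.block([[A, Phi.T], [Phi, np.zeros((m, m))]])
rhs = np.concatenate([np.zeros(n), np.eye(m)[:, 0]])
psi_var = np.linalg.solve(KKT, rhs)[:n]

assert np.allclose(psi_repr, psi_var)
```

Eliminating the multiplier from the KKT system reproduces exactly $\psi_1=A^{-1}\Phi^T\Theta^{-1}e_1$, i.e. the variational and Bayesian definitions coincide.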

Optimal accuracy of the recovery
Let
$$u^* := \sum_{i=1}^m \psi_i \int_\Omega u(y)\varphi_i(y)\,dy, \tag{3.9}$$
where $u$ is the solution of (1.1) and the $\psi_i$ are the gamblets defined by (3.5) and (3.6). Note that $u^*$ corresponds to Player II's bet on the value of $u$ given the measurements $(\int_\Omega u(y)\varphi_i(y)\,dy)_{1\le i\le m}$; in particular, if $v$ is the solution of (3.1) then
$$u^* = \mathbb{E}\Big[v \,\Big|\, \int_\Omega v\varphi_i=\int_\Omega u\varphi_i,\ i\in\{1,\ldots,m\}\Big]. \tag{3.10}$$
The following theorem shows that $u^*$ is the best approximation (in energy norm) of $u$ in $\operatorname{span}\{\mathcal{L}^{-1}\varphi_i : i\in\{1,\ldots,m\}\}$.

Theorem 3.3. Let $u$ be the solution of (1.1) and let $u^*$ be defined by (3.9) and (3.10). It holds true that
$$\|u-u^*\|_a=\min\big\{\|u-v\|_a : v\in\operatorname{span}\{\psi_1,\ldots,\psi_m\}\big\}. \tag{3.11}$$

Proof. By (3.6), $\operatorname{span}\{\mathcal{L}^{-1}\varphi_i : i\in\{1,\ldots,m\}\}=\operatorname{span}\{\psi_1,\ldots,\psi_m\}$, and (3.11) follows from the fact that $\int_\Omega(u-u^*)\varphi_j=0$ for all $j$ implies that $u-u^*$ is orthogonal to $\operatorname{span}\{\psi_1,\ldots,\psi_m\}$ with respect to the scalar product $\langle\cdot,\cdot\rangle_a$. □
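The Galerkin orthogonality underlying the proof can be verified numerically. In the discrete sketch below (synthetic `A`, `Phi` as before), the bet $u^*=\sum_i\psi_i\,(\Phi u)_i$ leaves a residual $u-u^*$ that is $a$-orthogonal to every gamblet, hence no other element of the span does better in energy norm.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 30, 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
Phi = rng.standard_normal((m, n))

Ainv_PhiT = np.linalg.solve(A, Phi.T)
Psi = Ainv_PhiT @ np.linalg.inv(Phi @ Ainv_PhiT)   # gamblets

g = rng.standard_normal(n)
u = np.linalg.solve(A, g)                          # solution of A u = g
u_star = Psi @ (Phi @ u)                           # Player II's bet (3.9)

# Galerkin orthogonality: u - u* is a-orthogonal to span{psi_1, ..., psi_m}
assert np.allclose(Psi.T @ A @ (u - u_star), 0)

# hence u* is the best approximation in the energy norm
v = Psi @ rng.standard_normal(m)                   # an arbitrary competitor
err = lambda w: (u - w) @ A @ (u - w)
assert err(u_star) <= err(v) + 1e-12
```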

Transfer property and selection of the measurement functions
We will now select the measurement (test) functions $\varphi_i$ by extending the result of Proposition 2.10 (the transfer property) to the continuous case, for $V$ a finite dimensional linear subspace of the space of admissible source terms (Proposition 3.4). Proof. Write $G$ for the Green's function of (1.1) and $G^*$ for the Green's function of (2.10). The monotonicity of the Green's function as a quadratic form (see for instance [12, Lemma 4.13]) concludes the proof. □ This extension, which is also directly related to the transfer property of the flux-norm (introduced in [14] and generalized in [103], see also [115]), allows us to select accurate finite dimensional bases for the approximation of the solution space of (1.1).
Construction 3.5. Let $(\tau_i)_{1\le i\le m}$ be a partition of $\Omega$ such that each $\tau_i$ is Lipschitz, convex and of diameter at most $H$. Let $(\varphi_i)_{1\le i\le m}$ be elements of $L^2(\Omega)$ such that, for each $i$, the support of $\varphi_i$ is contained in the closure of $\tau_i$ and $\int_{\tau_i}\varphi_i\neq0$. Proposition 3.6. Let $(\varphi_i)_{1\le i\le m}$ be the elements of Construction 3.5 and let $u$ be the solution of (1.1). Then $\min_{v\in\operatorname{span}\{\psi_1,\ldots,\psi_m\}}\|u-v\|_a\le CH\|g\|_{L^2(\Omega)}$, where the constant $C$ depends only on $\lambda_{\min}(a)$ and on the choice of the $\varphi_i$.
Proof. Using Proposition 3.4 it is sufficient to complete the proof when $a$ is the identity matrix. Let $u$ be the solution of (2.10); Poincaré's inequality for convex domains (see [91] for the optimal constant $1/\pi$ used here) leads to a cell-by-cell bound on $\|u-v\|$, and the conclusion follows from the Cauchy–Schwarz inequality and similar elementary estimates. □ The value of the constant $C$ in Proposition 3.6 motivates us to modify Construction 3.5 as follows.
Construction 3.7. Let (φ i ) 1≤i≤m be the elements constructed in 3.5 under the additional assumptions that (a) each φ i is equal to one on τ i and zero elsewhere (b) there exists δ ∈ (0, 1) such that for each i ∈ {1, . . . , m}, τ i contains a ball of diameter δH.
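Construction 3.7 replaces generic test functions by indicator functions of the partition cells. A minimal one-dimensional sketch of this construction (grid size and cell count are illustrative choices of the sketch) shows the two structural properties used later: the measurement functions have disjoint supports, and every cell contains a comparable number of fine degrees of freedom (the role of the ball of diameter $\delta H$).

```python
import numpy as np

# Partition of Omega = (0, 1) into m cells of width H; phi_i = indicator of tau_i
# sampled on a fine grid of N points (a toy instance of Construction 3.7)
N, m = 64, 8
H = 1.0 / m
x = (np.arange(N) + 0.5) / N
Phi = np.stack([((i * H <= x) & (x < (i + 1) * H)).astype(float) for i in range(m)])

# Disjoint supports: the rows of Phi are mutually orthogonal
overlap = Phi @ Phi.T
assert np.allclose(overlap, np.diag(np.diag(overlap)))

# Each cell of width H = 1/8 contains exactly N/m = 8 fine-grid points
assert np.all(Phi.sum(axis=1) == N // m)
```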
Let $(\varphi_i)_{1\le i\le m}$ be as in Construction 3.7. Note that the additional assumption (a) implies that the constant $C$ in Proposition 3.6 is equal to $2/(\pi\sqrt{\lambda_{\min}(a)})$. Assumption (b) will be used for localization purposes in Subsections 3.5 and 3.6 (and is not required for Theorem 3.8). The following theorem is a direct consequence of Proposition 3.6 and Theorem 3.3.
Theorem 3.8. It holds true that
$$\min_{v\in\operatorname{span}\{\psi_1,\ldots,\psi_m\}}\|u-v\|_a \le \frac{2H}{\pi\sqrt{\lambda_{\min}(a)}}\,\|g\|_{L^2(\Omega)}, \tag{3.15}$$
and the minimum in the l.h.s. of (3.15) is achieved for $v=u^*$ defined in (3.9) and (3.10).
Remark 3.9. The assumption of convexity of the subdomains $\tau_i$ is only used to derive sharper constants via Poincaré's inequality for convex domains (without it, the approximation error bounds remain valid after multiplication by $\pi$). Similarly, the transfer property can be used to derive constructions that are distinct from 3.5 and 3.7.
Remark 3.10. Gamblets defined via the constrained energy minimization problems (3.7) are analogous to the energy minimizing bases of [65,114,122,121] and in particular [108]. However, they form a different set of basis functions when global constraints are added: the (total) energy minimizing bases of [65,114,122,121,108] are defined by minimizing the total energy $\sum_i\|\psi_i\|_a^2$ subject to the constraint $\sum_i\psi_i(x)=1$ related to the local preservation of constants. Numerical experiments [122] suggest that total energy minimizing basis functions could lead to an $O(\sqrt H)$ convergence rate (with rough coefficients). Note that (3.7) is also analogous to the constrained minimization problems associated with Polyharmonic Splines [54,30,31,32,87], which can be recovered with a Gaussian prior (on $\xi$) with covariance function $\delta(x-y)$ (corresponding to exciting (3.1) with white noise). We suspect that the basis functions obtained in the orthogonal decomposition of [64] can also be recovered via the variational formulation (3.7) by identifying the null space of the Clément quasi-interpolation operator with that of appropriately chosen measurement functions $\varphi_i$.

Exponential decay of gamblets
Theorems 3.2 and 3.3 show that the gamblets $\psi_i$ have optimal recovery properties analogous to the discrete case of Theorem 2.2 and Corollary 2.6. However, one may wonder why one should compute these gamblets rather than the elements $(-\operatorname{div}(a\nabla))^{-1}\varphi_i$, since they span the same linear space (by the representation formula (3.6)). The answer lies in the fact that each gamblet $\psi_i$ decays exponentially as a function of the distance from the support of $\varphi_i$, and its computation can therefore be localized to a subdomain of diameter $O(H\ln\frac1H)$ without impacting the order of accuracy (3.15). Consider Construction 3.7. Let $\psi_i$ be defined as in Theorem 3.2 and let $x_i$ be an element of $\tau_i$. Write $B(x,r)$ for the ball of center $x$ and radius $r$.
Theorem 3.11 (Exponential decay of the basis elements $\psi_i$). There exists a constant $l>0$ (depending only on $d$, $\delta$, $\lambda_{\min}(a)$ and $\lambda_{\max}(a)$) such that for $n\ge1$,
$$\int_{\Omega\setminus B(x_i,nH)} a\,|\nabla\psi_i|^2 \le e^{-n/l}\,\|\psi_i\|_a^2.$$

Proof. Let $k,l\in\mathbb{N}^*$ and $i\in\{1,\ldots,m\}$. Let $S_0$ be the union of all the domains $\tau_j$ that are contained in the closure of $B(x_i,klH)\cap\Omega$, let $S_1$ be the union of all the domains $\tau_j$ that are contained in the closure of $(B(x_i,(k+1)lH))^c\cap\Omega$, and let $S^*=S_0^c\cap S_1^c\cap\Omega$ (the union of the remaining elements $\tau_j$ contained in neither $S_0$ nor $S_1$). Let $\eta$ be a cut-off function on $\Omega$ equal to one on $S_0$ and zero on $S_1$, and write $I_2=-\int_\Omega\eta\psi_i\operatorname{div}(a\nabla\psi_i)$. By (3.6), $-\operatorname{div}(a\nabla\psi_i)$ is piecewise constant and equal to $\Theta^{-1}_{i,j}$ on $\tau_j$. By the constraints of (3.7), $\int_{\tau_j}\psi_i=0$ for $i\ne j$; therefore (writing $\eta_j$ for the volume average of $\eta$ over $\tau_j$) we may subtract $\eta_j$ from $\eta$ on each $\tau_j$ in $I_2$. We will now need the following lemma. Writing $G_j$ for the Green's function of the operator $-\operatorname{div}(a\nabla\cdot)$ with Dirichlet boundary condition on $\partial\tau_j$, and using the monotonicity of the Green's function as a quadratic form (as in the proof of Proposition 3.4), we can compare $G_j$ with $G_j^*$, the Green's function of the operator $-\Delta$ with Dirichlet boundary condition on $\partial\tau_j$. Recall that $2\int_{\tau_j}G_j^*(x,y)\,dy$ is the mean exit time (from $\tau_j$) of a Brownian motion started from $x$, and that the mean exit time of a Brownian motion started from the center of a ball of radius $r$ is proportional to $r^2$ (see for instance [12]). Since $\tau_j$ contains a ball of diameter $\delta H$, the corresponding lower bound follows, which finishes the proof after taking the sum over $j$. □

Localization of the basis elements
Theorem 3.11 allows us to localize the construction of the basis elements $\psi_i$ as follows. For $r>0$ let $S_r$ be the union of the subdomains $\tau_j$ intersecting $B(x_i,r)$ (recall that $x_i$ is an element of $\tau_i$) and let $\psi_i^{\mathrm{loc},r}$ be the minimizer of the following quadratic problem:
$$\min\Big\{\|\psi\|_a \,:\, \psi\in H_0^1(S_r),\ \int_{S_r}\psi\varphi_j=\delta_{i,j}\ \text{for all } \tau_j\subset S_r\Big\}. \tag{3.23}$$
We will naturally identify $\psi_i^{\mathrm{loc},r}$ with its extension to $H_0^1(\Omega)$ obtained by setting $\psi_i^{\mathrm{loc},r}=0$ outside of $S_r$. From now on, to simplify the expression of the constants, we will assume without loss of generality that the domain is rescaled so that $\operatorname{diam}(\Omega)\le1$.
Theorem 3.13. It holds true that $\|\psi_i-\psi_i^{\mathrm{loc},r}\|_a$ is bounded by a constant multiple of $e^{-r/(lH)}$ (up to factors polynomial in $H^{-1}$), where $l$ is defined in Theorem 3.11. Proof. We will need the following lemma.
Lemma 3.14. It holds true that $|\Theta^{-1}_{i,j}|$ is bounded by a constant multiple of $e^{-r_{i,j}/(lH)}$, where $V_d$ is the volume of the $d$-dimensional unit ball, $l$ is the constant of Theorem 3.11, and $r_{i,j}$ is the distance between $\tau_i$ and $\tau_j$.
Let us now prove Theorem 3.13. Let $S_0$ be the union of the subdomains $\tau_j$ not contained in $S_r$ and let $S_1$ be the union of the subdomains $\tau_j$ that are at distance at least $H$ from $S_0$ (for $S_0=\emptyset$ the proof is trivial, so we may assume that $S_0\ne\emptyset$; similarly it is no restriction to assume that $S_1\ne\emptyset$). Let $\eta$ be a cut-off function on $\Omega$ equal to one on $S_1$ and zero on $S_0$. Since $\eta\psi_i$ satisfies the constraints of (3.7) restricted to $S_r$, we obtain from (3.8) a bound on the energy of the localized minimizer (3.28). Noting that $\eta\psi_i\in H^1_0(S_r)$, Theorem 3.2 implies that $\|\psi_i^{\mathrm{loc},r}\|_a\le\|\eta\psi_i\|_a$, which, combined with (3.29) and (3.28), controls $\|\psi_i-\psi_i^{\mathrm{loc},r}\|_a$. Combining these equations with the exponential decay of Theorem 3.11 we deduce (3.31). Similarly, using the Cauchy–Schwarz and Poincaré inequalities we obtain an analogous bound for $\tau_j\subset S^*$, which by the exponential decay of Theorem 3.11 (and $\|\psi_i\|_a\le\|\psi_i^{\mathrm{loc},r}\|_a$) leads to the result. □ The following theorem shows that gamblets preserve the $O(H)$ rate of convergence (in energy norm) after localization to sub-domains of size $O(H\ln(1/H))$. They can therefore be used as localized basis functions in numerical homogenization [5,85,64,87]. Section 4 will show that they can also be computed hierarchically at near-linear complexity.
We conclude by using Theorem 3.13 to bound $\max_i\|\psi_i-\psi_i^{\mathrm{loc},r}\|_a$. □
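Exponential decay and localization can be observed directly on a toy problem. The sketch below (a 1D Laplacian with indicator measurements; the helper `gamblet` and all sizes are assumptions of the sketch) computes one global gamblet and its localized versions on windows of increasing radius $r$ (in cells), and checks that the localization error in energy norm drops rapidly as the window grows, which is the content of Theorem 3.13.

```python
import numpy as np

# 1D toy problem: A = fine-grid stiffness of -u'' on (0, 1), Dirichlet b.c.
N, m = 128, 16
h = 1.0 / (N + 1)
A = (np.diag(2 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h
cell = np.arange(N) * m // N                      # cell index of each fine node
Phi = np.stack([(cell == j).astype(float) for j in range(m)])

def gamblet(idx, dofs):
    """Minimize psi^T A psi over the given dofs subject to the local constraints."""
    cells = np.unique(cell[dofs])
    P, Al = Phi[cells][:, dofs], A[np.ix_(dofs, dofs)]
    k = len(cells)
    KKT = np.block([[Al, P.T], [P, np.zeros((k, k))]])
    rhs = np.concatenate([np.zeros(len(dofs)), (cells == idx).astype(float)])
    psi = np.zeros(N)
    psi[dofs] = np.linalg.solve(KKT, rhs)[:len(dofs)]
    return psi

i = m // 2
psi = gamblet(i, np.arange(N))                    # global gamblet
errs = []
for r in (1, 3, 5):
    dofs = np.where(np.abs(cell - i) <= r)[0]     # window of radius r cells
    d = psi - gamblet(i, dofs)
    errs.append(d @ A @ d)                        # squared energy-norm error

# Localization error decays rapidly with the window radius r
assert errs[0] > errs[1] > errs[2] > 0
```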

Multiresolution operator decomposition
Building on the analysis of Section 3, we will now gamble on the approximation of the solution of (1.1) based on measurements performed at different levels of resolution. The resulting hierarchical (and nested) games will then be used to derive a multiresolution decomposition of (1.1) (orthogonal across subscales) and a near-linear complexity multiresolution algorithm with a priori error bounds.

Hierarchy of nested measurement functions
In order to define the hierarchy of games we will first define a hierarchy of nested measurement functions.
Definition 4.1. We say that $I$ is an index tree of depth $q$ if it is a finite set of $q$-tuples of the form $i=(i_1,\ldots,i_q)$ with $1\le i_1\le m_0$ and $1\le i_j\le m_{(i_1,\ldots,i_{j-1})}$ for $j\ge2$, where $m_0$ and the $m_{(i_1,\ldots,i_{j-1})}$ are strictly positive integers. For $1\le k\le q$ and $i=(i_1,\ldots,i_q)\in I$, we write $i^{(k)}:=(i_1,\ldots,i_k)$ and $I^{(k)}:=\{i^{(k)} : i\in I\}$. For $k\le k'\le q$ and $j=(j_1,\ldots,j_{k'})\in I^{(k')}$ we write $j^{(k)}:=(j_1,\ldots,j_k)$. For $i\in I^{(k)}$ and $k\le k'\le q$ we write $i^{(k,k')}$ for the set of elements $j\in I^{(k')}$ such that $j^{(k)}=i$. Construction 4.2. Let $I$ be an index tree of depth $q$. Let $\delta\in(0,1)$ and $0<H_q<\cdots<H_1<1$. Let $(\tau_i^{(k)},\ k\in\{1,\ldots,q\},\ i\in I^{(k)})$ be a collection of subsets of $\Omega$ such that (1) each $\tau_i^{(k)}$ is a Lipschitz, convex subset of $\Omega$ of diameter at most $H_k$ and contains a ball of diameter $\delta H_k$; (2) the sequence of partitions is nested, i.e. for $k\in\{1,\ldots,q-1\}$ and $i\in I^{(k)}$, $\tau_i^{(k)}=\cup_{j\in i^{(k,k+1)}}\tau_j^{(k+1)}$. As in Remark 3.9, the assumption of convexity of the subdomains $\tau_i^{(k)}$ is only used to derive sharper constants. Note that the nesting of the domain decomposition implies that of the measurement functions, i.e. for $k\in\{1,\ldots,q-1\}$ and $i\in I^{(k)}$,
$$\varphi_i^{(k)}=\sum_{j\in i^{(k,k+1)}}\pi^{(k,k+1)}_{i,j}\varphi_j^{(k+1)}. \tag{4.1}$$
We will assume without loss of generality that the $\varphi_i^{(k)}$ are normalized (rescaling the entries of $\pi^{(k,k+1)}$ by the corresponding multiplicative factors), and we will keep track of the dependence of some of the constants on quantities such as $\max_{i,j}|\tau_i^{(k)}|/|\tau_j^{(k+1)}|$.
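The index-tree bookkeeping of Definition 4.1 can be sketched in a few lines. The example below (a dyadic tree with two children per node; depth and branching are illustrative choices of the sketch) builds the sets $I^{(k)}$, the truncation $i^{(k)}$, and the children sets $i^{(k,k+1)}$, and checks the nesting property used throughout Section 4.

```python
from itertools import product

# Dyadic index tree of depth q = 3 with m = 2 children per node: level-k
# indices are k-tuples i = (i_1, ..., i_k), a toy instance of Definition 4.1
q, m = 3, 2
I = {k: list(product(*(range(1, m + 1) for _ in range(k)))) for k in range(1, q + 1)}

trunc = lambda i, k: i[:k]                        # i^{(k)} := (i_1, ..., i_k)

# Nesting: truncating a level-(k+1) index gives a valid level-k index
for k in range(1, q):
    assert {trunc(j, k) for j in I[k + 1]} == set(I[k])

# Children of i at level k+1: the set i^{(k, k+1)}
children = lambda i: [j for j in I[len(i) + 1] if trunc(j, len(i)) == i]
assert len(children((1,))) == m
```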

Hierarchy of nested gamblets and multiresolution approximations
Let us now consider the problem of recovering the solution of (1.1) based on the nested measurements $(\int_\Omega u\varphi_i^{(k)})_{i\in I^{(k)}}$ for $k\in\{1,\ldots,q\}$. As in Section 3 we are led to investigate the mixed strategy (for Player II) expressed by replacing the source term $g$ with a centered Gaussian field with covariance function $\mathcal{L}=-\operatorname{div}(a\nabla\cdot)$. Under that mixed strategy, Player II's bet on the value of the solution of (1.1), given the measurements, is
$$u^{(k)} := \sum_{i\in I^{(k)}}\psi_i^{(k)}\int_\Omega u\varphi_i^{(k)}, \tag{4.2}$$
where (see Theorem 3.2), for $k\in\{1,\ldots,q\}$ and $i\in I^{(k)}$, $\psi_i^{(k)}$ is the minimizer of
$$\min\Big\{\|\psi\|_a : \psi\in H^1_0(\Omega),\ \int_\Omega\psi\varphi_j^{(k)}=\delta_{i,j}\ \text{for } j\in I^{(k)}\Big\}. \tag{4.3}$$
Define $V^{(q+1)}:=H^1_0(\Omega)$ and, for $k\in\{1,\ldots,q\}$,
$$V^{(k)} := \operatorname{span}\{\psi_i^{(k)} \mid i\in I^{(k)}\}; \tag{4.4}$$
the nesting (4.1) of the measurement functions implies the nesting of the spaces $V^{(k)}$. The following theorem (a direct application of Theorems 3.3 and 3.8) shows that $u^{(k)}$ is the best (energy norm) approximation of the solution of (1.1) in $V^{(k)}$.

Nested games and martingale/multiresolution decomposition
As in Section 3 we consider the mixed strategy (for Player II) expressed by replacing the source term $g$ with a centered Gaussian field with covariance function $\mathcal{L}$. Under this mixed strategy, Player II's bet (4.2) on the value of the solution of (1.1), given the measurements $(\int_\Omega u(y)\varphi_i^{(k)}(y)\,dy)_{i\in I^{(k)}}$, can also be obtained by conditioning the solution $v$ of the SPDE (3.1) (see (3.10)).
$v^{(k)}$ represents Player II's bet on the value of the solution of (1.1) given the measurements $\int_\Omega u(y)\varphi_i^{(k)}(y)\,dy$, $i\in I^{(k)}$. Now consider the nesting of non-cooperative games where Player I chooses $g$ in (1.1) and Player II is shown the measurements $(\int_\Omega u\varphi_i^{(k)})_{i\in I^{(k)}}$, step by step, in a hierarchical manner, from coarse ($k=1$) to fine ($k=q$), and must, at each step $k$ of the game, gamble on the value of the solution $u$. The following theorem and (4.6) show that the resulting sequence of approximations $u^{(k)}$ forms the realization of a martingale with independent increments. Theorem 4.4. Let $\mathcal{F}_k$ be the $\sigma$-algebra generated by the random variables $\int_\Omega v\varphi_i^{(k)}$, $i\in I^{(k)}$. It holds true that (1) $\mathcal{F}_1,\ldots,\mathcal{F}_q$ forms a filtration, i.e. $\mathcal{F}_k\subset\mathcal{F}_{k+1}$; (2) $(v^{(k)})_{1\le k\le q}$ is a martingale with respect to this filtration, with independent increments.
Proof. The nesting (4.1) of the measurement functions implies $\mathcal{F}_k\subset\mathcal{F}_{k+1}$, and $(\mathcal{F}_k)_{k\ge1}$ is therefore a filtration. The fact that $(v^{(k)})$ is a martingale follows from $v^{(k)}=\mathbb{E}[v\mid\mathcal{F}_k]$.
Since $v^{(1)}$ and the increments $(v^{(k+1)}-v^{(k)})_{k\ge1}$ are Gaussian fields belonging to the same Gaussian space, their independence is equivalent to zero covariance, which follows from the martingale property. □ Taking $q=\infty$ (with a sequence $H_k$ decreasing towards 0) and using the martingale convergence theorem implies that, for all $\varphi\in C_0^\infty(\Omega)$, $\int_\Omega v^{(k)}\varphi\to\int_\Omega v\varphi$ as $k\to\infty$ (a.s. and in $L^1$).
The independence of the increments $v^{(k+1)}-v^{(k)}$ is related to the following orthogonal multiresolution decomposition of the operator in (1.1). For $V^{(k)}$ defined as in (4.4) and for $k\in\{2,\ldots,q+1\}$, let $W^{(k)}$ be the orthogonal complement of $V^{(k-1)}$ within $V^{(k)}$ with respect to the scalar product $\langle\cdot,\cdot\rangle_a$. Write $\oplus_a$ for the orthogonal direct sum with respect to the scalar product $\langle\cdot,\cdot\rangle_a$. Note that by Theorem 4.3, $u^{(k)}$ defined by (4.2) is the finite element solution of (1.1) in $V^{(k)}$ (in particular we will write $u^{(q+1)}=u$).

of Theorem 4.3 and integration by parts), for
Finally, (4.2), the constraints of (4.3) and the nesting property (4.1) imply that for i ∈ I (k) ,

Interpolation and restriction matrices/operators
Since the spaces $V^{(k)}$ are nested, there exists an $I^{(k)}\times I^{(k+1)}$ matrix $R^{(k,k+1)}$ such that for $1\le k\le q-1$ and $i\in I^{(k)}$,
$$\psi_i^{(k)}=\sum_{j\in I^{(k+1)}}R^{(k,k+1)}_{i,j}\psi_j^{(k+1)}. \tag{4.11}$$
We will refer to $R^{(k,k+1)}$ as the restriction matrix and to its transpose $R^{(k+1,k)}:=(R^{(k,k+1)})^T$ as the interpolation/prolongation matrix. The following theorem shows that (see Figure 1) $R^{(k,k+1)}_{i,j}$ is Player II's best bet on the value of $\int_\Omega u\varphi_j^{(k+1)}$ given the information that $\int_\Omega u\varphi_s^{(k)}=\delta_{i,s}$, $s\in I^{(k)}$.
Theorem 4.7. It holds true that for $i\in I^{(k)}$ and $j\in I^{(k+1)}$,
$$R^{(k,k+1)}_{i,j}=\int_\Omega\psi_i^{(k)}\varphi_j^{(k+1)}=\mathbb{E}\Big[\int_\Omega v\varphi_j^{(k+1)}\,\Big|\,\int_\Omega v\varphi_s^{(k)}=\delta_{i,s},\ s\in I^{(k)}\Big].$$
Proof. The first equality is obtained by integrating (4.11) against $\varphi_j^{(k+1)}$ and using the constraints satisfied by $\psi_j^{(k+1)}$ in (4.3). For the second equality, observe that since $(\mathcal{F}_k)$ is a filtration we can replace $v$ in the representation formula (4.7) by $v^{(k)}$ (as defined by the r.h.s. of (4.8)) and obtain that $\mathbb{E}\big[v^{(k)}\,\big|\,\int_\Omega v(y)\varphi_l^{(k)}(y)\,dy=\delta_{i,l},\ l\in I^{(k)}\big]=\psi_i^{(k)}$, which corresponds to (4.11). □
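The two-level relation between gamblets can be checked numerically. In the sketch below (synthetic SPD `A`; fine measurements `Phi2` aggregated in pairs by `pi` to form the nested coarse measurements, all assumptions of the sketch), the matrix with entries $\int_\Omega\psi_i^{(1)}\varphi_j^{(2)}$ indeed interpolates the coarse gamblets from the fine ones, as in (4.11).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m2, m1 = 40, 8, 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)

Phi2 = rng.standard_normal((m2, n))              # fine measurements phi^{(2)}_j
pi = np.kron(np.eye(m1), np.ones((1, 2)))        # aggregation pi^{(1,2)}
Phi1 = pi @ Phi2                                 # nested coarse measurements (4.1)

def gamblets(Phi):
    W = np.linalg.solve(A, Phi.T)
    return W @ np.linalg.inv(Phi @ W)

Psi2, Psi1 = gamblets(Phi2), gamblets(Phi1)

# Restriction matrix R^{(1,2)}_{ij} = int psi^{(1)}_i phi^{(2)}_j (Theorem 4.7)
R = (Phi2 @ Psi1).T

# Coarse gamblets are interpolated from fine ones: psi^{(1)}_i = sum_j R_{ij} psi^{(2)}_j
assert np.allclose(Psi1, Psi2 @ R.T)
```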

Nested computation of the interpolation and stiffness matrices
Let $v$ be the solution of (3.1). Observe that $(\int_\Omega v(x)\varphi_i^{(k)})_{i\in I^{(k)}}$ is a Gaussian vector with (symmetric, positive definite) covariance matrix $\Theta^{(k)}$ defined, for $i,j\in I^{(k)}$, by
$$\Theta^{(k)}_{i,j}=\int_\Omega\int_\Omega \varphi_i^{(k)}(x)G(x,y)\varphi_j^{(k)}(y)\,dx\,dy.$$
Observe also that, as in Theorem 3.2, $\Theta^{(k),-1}=A^{(k)}$, where $A^{(k)}$ is the (symmetric, positive definite) stiffness matrix of the elements $\psi_i^{(k)}$, i.e., for $i,j\in I^{(k)}$,
$$A^{(k)}_{i,j}=\langle\psi_i^{(k)},\psi_j^{(k)}\rangle_a. \tag{4.14}$$
Write $\pi^{(k+1,k)}$ for the transpose of the matrix $\pi^{(k,k+1)}$ (defined below (4.1)) and $I^{(k)}$ for the $I^{(k)}\times I^{(k)}$ identity matrix. The following theorem enables the hierarchical/nested computation of $A^{(k)}$ from $A^{(k+1)}$.
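The nesting of the measurement functions makes the coarse covariance (and hence the coarse stiffness) computable from the finer level. A minimal discrete sketch (synthetic `A`, `Phi2`, `pi` as assumptions) verifies both $\Theta^{(1)}=\pi^{(1,2)}\Theta^{(2)}\pi^{(2,1)}$ and the identity $A^{(1)}=\Theta^{(1),-1}$ with $A^{(1)}$ the stiffness matrix of the coarse gamblets.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m2, m1 = 40, 8, 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
Phi2 = rng.standard_normal((m2, n))
pi = np.kron(np.eye(m1), np.ones((1, 2)))        # pi^{(1,2)}, binary aggregation
Phi1 = pi @ Phi2

Theta2 = Phi2 @ np.linalg.solve(A, Phi2.T)       # Theta^{(2)} = Phi2 L^{-1} Phi2^T
Theta1 = Phi1 @ np.linalg.solve(A, Phi1.T)

# Nesting of measurements gives Theta^{(1)} = pi Theta^{(2)} pi^T, so the coarse
# covariance/stiffness is computable from level 2 alone
assert np.allclose(Theta1, pi @ Theta2 @ pi.T)

A1 = np.linalg.inv(Theta1)                       # A^{(1)} = Theta^{(1),-1}
Psi1 = np.linalg.solve(A, Phi1.T) @ A1           # coarse gamblets
assert np.allclose(Psi1.T @ A @ Psi1, A1)        # A^{(1)} is their stiffness (4.14)
```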
From now on we choose, for each $k\in\{2,\ldots,q\}$, a $J^{(k)}\times I^{(k)}$ matrix $W^{(k)}$ as in Lemma 4.9. This choice is not unique, and to enable fast multiplication by $W^{(k)}$ (or its transpose) we require that for $(j,i)\in J^{(k)}\times I^{(k)}$, $W^{(k)}_{j,i}=0$ if $j^{(k-1)}\ne i^{(k-1)}$. Therefore the construction of $W^{(k)}$ requires, for each $s\in I^{(k-1)}$, the specification of $m_s-1$ vectors of dimension $m_s$ that are orthogonal to $(1,1,\ldots,1)$. We propose two simple constructions. For $k\in\{2,\ldots,q\}$ and $i=(i_1,\ldots,i_{k-1},i_k)\in J^{(k)}$ define $i^+:=(i_1,\ldots,i_{k-1},i_k+1)$; under Construction 4.11, each row of $W^{(k)}$ is supported on a pair of consecutive siblings $(i,i^+)$, and its game-theoretic interpretation is provided in Figure 1 ($\chi_i^{(k)}$ is Player II's best bet on the value of the solution $u$ of (1.1) given the corresponding difference of measurements over $\tau_i^{(k)}$ and $\tau_{i^+}^{(k)}$). For the second construction we need the following lemma, whose proof is trivial.
Lemma 4.12. For $n\ge2$ let $U^{(n)}$ be the $n\times n$ matrix defined by $U^{(n)}_{i,j}=1$ for $j\le i\le n-1$, $U^{(n)}_{i,i+1}=-i$ for $1\le i\le n-1$, $U^{(n)}_{i,j}=0$ for $i\le n-1$ and $j>i+1$, and $U^{(n)}_{n,j}=1$ for $1\le j\le n$. Then, for $n\ge2$, the rows of $U^{(n)}$ are orthogonal, and we write $\bar U^{(n)}$ for the corresponding orthonormal matrix obtained by renormalizing the rows of $U^{(n)}$. Observe that under Construction 4.13 the complexity of constructing $W^{(k)}$ remains proportional to its number of non-zero entries.
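The matrix $U^{(n)}$ of the lemma can be built in a few lines; the sketch below follows the entry pattern reconstructed above (rows $(1,\ldots,1,-i,0,\ldots)$ with $i$ leading ones, last row all ones — an assumption of this sketch insofar as the original display is garbled) and verifies the claimed row orthogonality.

```python
import numpy as np

def U(n):
    """Rows 1..n-1: (1, ..., 1, -i, 0, ..., 0) with i leading ones; row n: all ones."""
    M = np.zeros((n, n))
    for i in range(1, n):
        M[i - 1, :i] = 1.0
        M[i - 1, i] = -float(i)
    M[n - 1, :] = 1.0
    return M

M = U(4)
G = M @ M.T
# The rows of U^{(n)} are mutually orthogonal and the last row is (1, ..., 1)
assert np.allclose(G, np.diag(np.diag(G)))
assert np.all(M[-1] == 1.0)
```

Renormalizing the rows yields the orthonormal matrix $\bar U^{(n)}$; the rows other than the last supply the $m_s-1$ vectors orthogonal to $(1,\ldots,1)$ needed per parent node.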
Let $g$ be the r.h.s. of (1.1). For $k\in\{1,\ldots,q\}$ let $g^{(k)}$ be the $|I^{(k)}|$-dimensional vector defined by $g^{(k)}_i=\int_\Omega \psi_i^{(k)}g$ for $i\in I^{(k)}$. Observe that $g^{(k)}$ can be computed iteratively across levels. For $k\in\{2,\ldots,q\}$, let $w^{(k)}$ be the $|J^{(k)}|$-dimensional vector defined as the solution of
$$B^{(k)}w^{(k)}=W^{(k)}g^{(k)}. \tag{4.26}$$
Furthermore, let $U^{(1)}$ be the $|I^{(1)}|$-dimensional vector defined as the solution of
$$A^{(1)}U^{(1)}=g^{(1)}. \tag{4.27}$$
According to the following theorem, which is a direct consequence of Theorem 4.6, the solution of (1.1) can be computed at any scale by solving the decoupled linear systems (4.26) and (4.27).
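The decoupling can be demonstrated end to end on a two-level discrete toy problem. In the sketch below the fine level is the full discrete space (so the level-2 gamblets are the canonical basis), `pi` aggregates node pairs, and `W` spans the per-pair complement of the constant vector; all sizes and matrices are assumptions of the sketch. The coarse solve and the well-conditioned subband solve are independent, and their sum recovers the exact solution.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 16                                    # fine level: Phi^{(2)} = I, so psi^{(2)}_i = e_i
B0 = rng.standard_normal((n, n))
A = B0 @ B0.T + n * np.eye(n)             # A^{(2)}: fine-level stiffness matrix
g = rng.standard_normal(n)
u = np.linalg.solve(A, g)                 # reference solution of A u = g

pi = np.kron(np.eye(n // 2), np.ones((1, 2)))                      # pi^{(1,2)}
W = np.kron(np.eye(n // 2), np.array([[1.0, -1.0]]) / np.sqrt(2))  # W pi^T = 0

A1 = np.linalg.inv(pi @ np.linalg.solve(A, pi.T))  # A^{(1)} = Theta^{(1),-1}
Psi1 = np.linalg.solve(A, pi.T) @ A1               # coarse gamblets psi^{(1)}_i

# Decoupled solves: coarse system and well-conditioned subband system
u1 = Psi1 @ np.linalg.solve(A1, Psi1.T @ g)        # u^{(1)}
B2 = W @ A @ W.T                                   # B^{(2)} = W A^{(2)} W^T
du = W.T @ np.linalg.solve(B2, W @ g)              # u^{(2)} - u^{(1)}

assert np.allclose(u1 + du, u)                     # u = u^{(1)} + (u^{(2)} - u^{(1)})
assert np.allclose(Psi1.T @ A @ W.T, 0)            # a-orthogonality across subscales
```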

Uniformly bounded condition numbers across subscales/subbands
Taking $q=\infty$ in Theorem 4.6, the construction of the basis elements $\psi_i^{(k)}$ leads to the multiresolution orthogonal decomposition $H^1_0(\Omega)=V^{(1)}\oplus_a W^{(2)}\oplus_a W^{(3)}\oplus_a\cdots$. In that sense the basis elements $\psi_i^{(k)}$ could be seen as a generalization of wavelets to the orthogonal decomposition of $H^1_0(\Omega)$ (rather than $L^2(\Omega)$), adapted to the solution space of the PDE (1.1). We will now show that this orthogonal decomposition induces a subscale decomposition of the operator $-\operatorname{div}(a\nabla\cdot)$ into layered subbands of increasing frequencies. Moreover, the condition number of the operator $-\operatorname{div}(a\nabla\cdot)$ restricted to each subspace $W^{(k)}$ will be shown to be uniformly bounded if $H_{k-1}/H_k$ is uniformly bounded (e.g. if $H_k$ is a geometric sequence). Write $H_0:=1$ and let $\delta$ be defined as in Construction 4.2.

Well conditioned relaxation across subscales
If $H_k$ is a geometric sequence, or if $H_{k-1}/H_k$ is uniformly bounded, then, by Theorem 4.17, the linear systems (4.26) and (4.27) entering the calculation of the gamblets $\chi_i^{(k)}$ (and therefore $\psi_i^{(k)}$) and the subband/subscale solutions $u^{(1)}$ and $u^{(k+1)}-u^{(k)}$ have uniformly bounded condition numbers (in particular, these condition numbers are bounded independently of the mesh size/resolution and of the regularity of $a(x)$). Therefore these systems can be solved efficiently using iterative methods. One such method is the Conjugate Gradient (CG) method [57]. Recall [98] that the application of the CG method to a linear system $Ax=b$ (where $A$ is an $n\times n$ symmetric positive definite matrix) with initial guess $x^{(0)}$ yields a sequence of approximations $x^{(l)}$ satisfying (writing $|e|_A^2:=e^TAe$)
$$|x-x^{(l)}|_A \le 2\Big(\frac{\sqrt{\mathrm{Cond}(A)}-1}{\sqrt{\mathrm{Cond}(A)}+1}\Big)^l\,|x-x^{(0)}|_A.$$
Recall [98] also that the maximum number of iterations required to reduce the error by a factor $\epsilon$ (i.e. $|x-x^{(l)}|_A\le\epsilon|x-x^{(0)}|_A$) is bounded by $\frac12\sqrt{\mathrm{Cond}(A)}\ln\frac2\epsilon$, so that achieving this accuracy has complexity (number of required arithmetic operations) $O(\sqrt{\mathrm{Cond}(A)}\,N_A\ln\frac2\epsilon)$ (writing $N_A$ for the number of non-zero entries of $A$).
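The dependence of CG iteration counts on the condition number is easy to observe. The plain CG implementation below (`cg` is a helper of this sketch, not the paper's solver; the diagonal test matrices are synthetic) shows that a well-conditioned system of the kind produced by the multiresolution decomposition converges in far fewer iterations than an ill-conditioned one of the same size.

```python
import numpy as np

def cg(A, b, tol=1e-10, maxit=10_000):
    """Plain conjugate gradient; returns the iterate and the iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for it in range(1, maxit + 1):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
            return x, it
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, maxit

n = 200
b = np.ones(n)
well = np.diag(np.linspace(1.0, 4.0, n))    # Cond = 4 (uniformly bounded regime)
ill = np.diag(np.linspace(1.0, 1e4, n))     # Cond = 1e4

x_well, it_well = cg(well, b)
x_ill, it_ill = cg(ill, b)
assert it_well < it_ill                     # iterations grow like sqrt(Cond(A))
```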

Hierarchical localization and error propagation across scales
Although the multiresolution decomposition presented in this section leads to well-conditioned linear systems, the resulting matrices $B^{(k)}$ and $A^{(k)}$ are dense, and to achieve near-linear complexity in the resolution of (1.1) these matrices must be truncated by localizing the computation of the basis functions $\psi_i^{(k)}$. The approximation error induced by these localization/truncation steps is controlled by the exponential decay of gamblets and the uniform bound on the condition numbers of the matrices $B^{(k)}$. To make this control explicit and derive a bound on the size of the localization sub-domains, we need to quantify the propagation of truncation/localization errors across scales, and this is the purpose of this subsection.
We will also need the following lemma.
Lemma 4.23. Let $k\in\{1,\ldots,q-1\}$ and let $R$ be the $I^{(k)}\times I^{(k+1)}$ matrix defined by $R_{i,j}=0$ for $j\in i^{\rho_k,k+1}$ and $R_{i,j}=R^{(k,k+1)}_{i,j}$ for $j\in I^{(k+1)}\setminus i^{\rho_k,k+1}$. Using Theorem 4.7 and the Cauchy–Schwarz inequality we can bound $|R_{i,j}|$, and using Theorem 3.11 we obtain an exponentially small bound on the discarded entries. Let us now prove Theorem 4.21. We obtain by induction (using the constraints in (4.44)) that for $k\in\{1,\ldots,q\}$ and $i\in I^{(k)}$, $\psi_i^{(k),\mathrm{loc}}$ satisfies the constraints of (4.3). Moreover, (3.8) implies that if $\psi$ satisfies the constraints of (4.3) then $\|\psi\|_a^2$ is bounded below by the corresponding minimum; since $\psi^*$ satisfies the constraints of (4.44), the energy comparison follows. The following theorem allows us to control the effect of the localization error on the approximation of the solution of (1.1). Proof. We will need the following lemma. Let $B$ (resp. $B'$) be the $m\times m$ matrix defined by $B_{i,j}=\langle\chi_i,\chi_j\rangle_a$ (resp. $B'_{i,j}=\langle\chi'_i,\chi'_j\rangle_a$). Let $u_m$ (resp. $u'_m$) be the solution of (1.1) in $\operatorname{span}\{\chi_i\mid i=1,\ldots,m\}$ (resp. $\operatorname{span}\{\chi'_i\mid i=1,\ldots,m\}$). It holds true that for $E\le\lambda_{\min}(B)/2$ the corresponding perturbation bound holds, and we conclude the proof after simplification. □
for some constants $\underline\gamma,\bar\gamma,C_0=O(1)$. In addition to (5.1), the regularity of the finite elements is used to ensure the availability of the inverse Poincaré inequality for $v\in\operatorname{span}\{\phi_i\mid i\in\mathcal{N}\}$ and some constant $C_1=O(1)$, used to generalize the proof of Theorem 4.16 to the discrete case. Given $g=\sum_{i\in\mathcal{N}}g_i\phi_i$, we want to find $u\in\operatorname{span}\{\phi_i\mid i\in\mathcal{N}\}$ such that $\langle u,\phi_j\rangle_a=\int_\Omega g\phi_j$ for all $j\in\mathcal{N}$ (5.3). In practical applications $a$ is naturally assumed to be piecewise constant over the fine mesh (e.g. of constant value in each triangle or square of $T_h$) and one purpose of the algorithm is the fast resolution of the linear system (5.3) up to accuracy $\epsilon\in(0,1)$. Example 5.1. We will illustrate the presentation of the algorithm with a numerical example in which $T_h$ is a square grid of mesh size $h=(1+2^q)^{-1}$ with $q=6$ and $64\times64$ interior nodes (Figure 2). $a$ is piecewise constant on each square of $T_h$ and is given by a product over the scales $k=1,\ldots,6$ of oscillatory factors of the form $1+0.5\cos(2^k\pi(\cdot))$ of the grid coordinates. The contrast of $a$ (i.e., when $a$ is scalar, the ratio between its maximum and minimum value) is 1866. The finite-element discretization (5.3) is obtained using continuous nodal basis elements $\phi_i$ spanned by $\{1,x_1,x_2,x_1x_2\}$ in each square of $T_h$. Writing $z_i$ for the positions of the interior nodes of $T_h$, we choose, for our numerical example, $g(x)=\sum_{i\in\mathcal{N}}\big(\cos(3z_{i,1}+z_{i,2})+\sin(3z_{i,2})+\sin(7z_{i,1}-5z_{i,2})\big)\phi_i(x)$. To ensure a uniform bound on the condition numbers of the stiffness matrices (4.19) one must select the resolutions $H_k$ to form a geometric sequence (or simply such that $H_{k-1}/H_k$ is uniformly bounded), i.e. $H_k=H^k$ for some $H\in(0,1)$ (for our numerical example $H=1/2$, $q=6$, and we identify $I^{(k)}$ with the indices of the interior nodes of a square grid of resolution $(1+2^k)^{-1}$, as illustrated in Figure 3). In this construction $H_q=h$ corresponds to the resolution of the fine mesh and each subset $\tau_i^{(q)}$ ($i\in I^{(q)}$) contains one and only one element of $\mathcal{N}$ (interior node of the fine mesh).
Using this one-to-one correspondence we use the elements of $I=I^{(q)}$ to (re)label the nodal elements $(\phi_i)_{i\in\mathcal{N}}$ as $(\phi_i)_{i\in I}$. The measurement functions $(\varphi_i^{(k)})_{i\in I^{(k)}}$ are then identified (1) by selecting $\varphi_i^{(q)}=\phi_i$ for $i\in I^{(q)}$ and (2) via the nested aggregation (4.1) of the nodal elements (as commonly done in AMG), i.e. $\varphi_i^{(k)}=\sum_{j\in i^{(k,k+1)}}\varphi_j^{(k+1)}$ for $k\in\{1,\ldots,q-1\}$ and $i\in I^{(k)}$.
Remark 5.1. We refer to Figure 4 for an illustration of these measurement functions for our numerical example. Note that the support of each $\varphi_i^{(k)}$ is only approximately (and not exactly) $\tau_i^{(k)}$ and that the $\varphi_i^{(k)}$ are only approximate set functions (and not exact ones). This does not affect the design, accuracy or localization of the algorithm presented here, because the frame inequalities (4.32) and the Poincaré inequalities continue to hold. Indeed, (5.1) and Construction 4.2 imply the frame inequalities (4.32) with $\bar\gamma_k\le\bar\gamma\delta^{-d}$ and $\underline\gamma_k\ge\underline\gamma\delta^{d}$, and the Poincaré inequalities are regularity/homogeneity conditions on the mesh and the aggregated elements. Although a fine mesh has been used to facilitate the presentation of the algorithm, the proposed method is meshless (it only requires the specification of the basis elements $(\phi_i)_{i\in I}$).
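A coefficient of the flavor of Example 5.1 can be generated in a few lines. The exact formula of the example is garbled above, so the sketch below is an illustrative variant only (a single product of cosine factors along one grid direction; its contrast is far smaller than the 1866 of the paper's coefficient): it shows how a high-contrast multiscale $a(x)$ is assembled from oscillations at scales $2^1,\ldots,2^6$.

```python
import numpy as np

q = 6
n = 2 ** q
i, j = np.meshgrid(np.arange(1, n + 1), np.arange(1, n + 1), indexing="ij")
x1, x2 = i / (n + 1), j / (n + 1)

# Toy multiscale coefficient in the spirit of Example 5.1 (illustrative only;
# not the paper's exact formula)
a = np.ones_like(x1)
for k in range(1, 7):
    a *= 1.0 + 0.5 * np.cos(2 ** k * np.pi * (x1 + x2))

contrast = a.max() / a.min()
assert a.min() > 0 and contrast > 3       # rough, positive, high-contrast
```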

Exact gamblet transform and multiresolution operator inversion
The near-linear complexity of the proposed multiresolution algorithm (Algorithm 2) is based on three properties: (i) nesting, (ii) uniformly bounded condition numbers, (iii) localization/truncation based on exponential decay. Truncation/localization levels/subsets are, a priori, functions of the desired level of accuracy $\epsilon\in(0,1)$ in approximating the solution of (5.3), and to isolate the implementation of localization/truncation (and its consequences) we will first describe this algorithm in its zero-approximation-error version (i.e. $\epsilon=0$ and without using localization/truncation, Algorithm 1). Although this error-free version (Algorithm 1) decomposes the resolution of the linear system (5.3) (whose condition number is of the order of $h^{-d-2}\gg1$) into the resolution of a nesting of linear systems with uniformly bounded condition numbers, it is not of near-linear complexity due to the presence of dense matrices. Algorithm 2 achieves near-linear complexity by truncating/localizing the dense matrices appearing in Algorithm 1 ($\epsilon$-accuracy is ensured using the off-diagonal exponential decay of these dense matrices). Let us now describe Algorithm 1 in detail. Lines 1 and 2 correspond to the computation of the (sparse) mass and stiffness matrices of (5.3). Line 4 corresponds to the calculation of the level-$q$ gamblets $\psi_i^{(q)}$, defined as the minimizers of $\|\psi\|_a$ subject to $\int_\Omega\psi\varphi_j^{(q)}=\delta_{i,j}$ and $\psi\in\operatorname{span}\{\phi_l\mid l\in I\}$; note that since the number of constraints is equal to the number of degrees of freedom of $\psi$, these gamblets are obtained by inverting the mass matrix (which, by (5.1), has $O(1)$ condition number). Although not done here, one can also initialize the algorithm (and its fast version) with the $\phi_j$ as level-$q$ measurement functions. Line 5 corresponds to the initialization of the vector $g^{(q)}$ introduced above (4.25). Line 6 corresponds to the initialization of the stiffness matrix $A^{(q)}$ introduced in (4.14).
The core of the algorithm is the nested computation performed (iteratively from $k=q$ down to $k=2$) in lines 8 to 16. Note that this nested computation takes $A^{(k)}$, $g^{(k)}$ and $(\psi_i^{(k)})_{i\in I^{(k)}}$ as inputs and produces (1) $A^{(k-1)}$, $g^{(k-1)}$ and $(\psi_i^{(k-1)})_{i\in I^{(k-1)}}$ as outputs for the next iteration and (2) the subband $u^{(k)}-u^{(k-1)}$ of the solution and the subband gamblets $(\chi_i^{(k)})_{i\in J^{(k)}}$ (which do not need to be explicitly computed/stored, since Line 11 is equivalent to an update expressed in terms of the $\psi_i^{(k)}$). Note also that the gamblets $(\psi_i^{(k)})$ can be stored and displayed using the hierarchical structure (4.11). Throughout this section and the remaining part of the paper we assume that the matrices $W^{(k)}$ are obtained as in Construction 4.11 or 4.13. Note that the number of non-zero entries of $\pi^{(k-1,k)}$ and $W^{(k)}$ is $O(|I^{(k)}|)$ (proportional to $H^{-kd}$ in our numerical example). Line 9 corresponds to solving the well-conditioned linear system $B^{(k)}w^{(k)}=W^{(k)}g^{(k)}$ and the $|I^{(k-1)}|$ well-conditioned linear systems $B^{(k)}D^{(k,k-1)}=-W^{(k)}A^{(k)}\pi^{(k,k-1)}$. Note that by Theorem 4.17 the matrices $B^{(k)}$ have uniformly bounded condition numbers, and these linear systems can be solved efficiently using iterative methods (such as the Conjugate Gradient method recalled in Subsection 4.9). $u^{(1)}$ is computed in lines 19 and 20 (recall that $A^{(1)}$ is also of uniformly bounded condition number) and the last step of the algorithm is to obtain $u$ via the simple addition of the subband/subscale solutions $u^{(1)}$ and $(u^{(k)}-u^{(k-1)})_{2\le k\le q}$. Observe that the operating diagram of Algorithm 1 is not a V or a W but an inverted pyramid (or a comb): the basis functions $\psi_i^{(k)}$ are computed hierarchically from fine to coarse, after which the subscale solutions can be computed independently of each other. We refer to Figure 8 for an illustration of the condition numbers of $A^{(k)}$ and $B^{(k)}$ (with $W^{(k)}$ still defined by Construction 4.11).
Observe that the bound of Theorem 4.17 on the condition numbers is confirmed in Figure 8. As illustrated in Figure 11, the approximation error at level $q$ drops to zero because there is no gap between $H_q$ and the fine mesh (i.e., the $\psi_i^{(q)}$ and the $\phi_i$ span the same linear space in the discrete case). Moreover, as illustrated in Figure 10, the representation of $u$ in the basis formed by the gamblets is sparse. Therefore, as illustrated in Figure 11, one can compress $u$ in this basis by setting the smallest coefficients to zero without loss in energy norm.

Fast Gamblet transform/solve
Algorithm 2 achieves near-linear complexity (1) in approximating the solution of (5.3) to a given level of accuracy $\epsilon$ and (2) in performing an approximate gamblet transform (sufficient to achieve that level of accuracy). This fast algorithm is obtained by localizing/truncating the linear systems corresponding to lines 3 and 12 of Algorithm 1. We define these localization/truncation steps as follows. For $k\in\{1,\ldots,q\}$ and $i\in I^{(k)}$ define $i^\rho$ as in Subsection 4.10 (i.e. as the subset of indices $j\in I^{(k)}$ whose corresponding subdomains $\tau_j^{(k)}$ are within distance $\rho H_k$ of $\tau_i^{(k)}$). Note that the associated gamblet $\psi_i^{(q),\mathrm{loc}}$ (Line 4 of Algorithm 2) is also the solution of the problem of finding $\psi\in\operatorname{span}\{\phi_j\mid j\in i^{\rho_q}\}$ such that $\int_\Omega\psi\phi_j=\delta_{i,j}$ for $j\in i^{\rho_q}$ (i.e. localizing the computation of the gamblet $\psi_i^{(q)}$ to a subdomain of size $H_q\rho_q$). Line 5 can be replaced by its localized counterpart.
Line 12 of Algorithm 2 is defined in a similar way, as follows.
Definition 5.3. Let $k\in\{2,\ldots,q\}$ and let $B$ be the positive definite $J^{(k)}\times J^{(k)}$ matrix $B^{(k),\mathrm{loc}}$ computed in Line 8 of Algorithm 2. For $i\in I^{(k-1)}$, let $\rho=\rho_{k-1}$ and let $i^\chi$ be the subset of indices $j\in J^{(k)}$ such that $j^{(k-1)}\in i^\rho$ (recall that if $j=(j_1,\ldots,j_k)\in J^{(k)}$ then $j^{(k-1)}:=(j_1,\ldots,j_{k-1})\in I^{(k-1)}$). We define the solution $D^{(k,k-1),\mathrm{loc}}$ of the localized linear system $\mathrm{Inv}(B^{(k),\mathrm{loc}}D^{(k,k-1),\mathrm{loc}}=-W^{(k)}A^{(k),\mathrm{loc}}\pi^{(k,k-1)},\rho_{k-1})$ as the $J^{(k)}\times I^{(k-1)}$ sparse matrix given by $D^{(k,k-1),\mathrm{loc}}_{j,i}=0$ for $j\notin i^\chi$ and $D^{(k,k-1),\mathrm{loc}}_{j,i}=y^{(i,\rho)}_j$ for $j\in i^\chi$. $D^{(k-1,k),\mathrm{loc}}$ (Line 13 of Algorithm 2) is then defined as the transpose of $D^{(k,k-1),\mathrm{loc}}$. The last line of Algorithm 2 assembles $u^{\mathrm{loc}}=u^{(1),\mathrm{loc}}+(u^{(2),\mathrm{loc}}-u^{(1),\mathrm{loc}})+\cdots+(u^{(q),\mathrm{loc}}-u^{(q-1),\mathrm{loc}})$ at cost $O(Nq)$. We will restrict the presentation to using Algorithm 2 as a direct solver. Note that, when used as a direct solver, Algorithm 2 is parallel both in space (via localization) and in bandwidth/subscale (subscales can be computed independently of each other, and $\psi_i^{(k-1),\mathrm{loc}}$ and $u^{(k),\mathrm{loc}}-u^{(k-1),\mathrm{loc}}$ can be resolved in parallel). We will base our analysis on the results of Subsection 4.10 and in particular Theorem 4.27. Although obtained in a continuous setting, these results can be generalized to the discrete setting without difficulty. Two small differences are worth mentioning. (1) In this discrete setting, an alternative approach for obtaining localization error bounds in the first step of the algorithm (the computation of the localized gamblets $\psi_i^{(q),\mathrm{loc}}$) is to use the exponential decay property of the inverse of symmetric well-conditioned banded matrices [26]: since $M$ is banded and of uniformly bounded condition number, [26] (see also [11, Thm 4.10]) implies that $M^{-1}_{i,j}$ decays exponentially with $\operatorname{dist}(\tau_i^{(q)},\tau_j^{(q)})$.
This implies that, in the discrete setting, $\int_\Omega \psi^{(k),\mathrm{loc}}_i \phi^{(k)}_j$ is not necessarily equal to zero if $\tau^{(k)}_j$ is adjacent to $S^{i,\rho_k}$ (with $j \in i^{\rho_k}$, using the notation of Subsection 4.10). This, however, does not prevent the generalization of the proof, because the value of $\int_\Omega \psi^{(k),\mathrm{loc}}_i \phi^{(k)}_j$ (when $\tau^{(k)}_j$ is adjacent to $S^{i,\rho_k}$) can be controlled via the exponential decay of the basis functions (e.g. as done in the proof of Theorem 3.13). We summarize this generalization in the following theorem (where the constant $C$ depends on the constants $C_1$, $C_0$, $\gamma$ and $\bar{\gamma}$ associated with the finite elements $(\varphi_i)$ in (5.1), in addition to $d$, $\Omega$, $\lambda_{\min}(a)$, $\lambda_{\max}(a)$, $\delta$).
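The exponential decay property of [26] invoked here is easy to observe numerically. The following minimal sketch (our own toy example, not from the paper) inverts a symmetric, banded, well-conditioned matrix and checks that the entries of the inverse decay geometrically away from the diagonal.

```python
import numpy as np

# A symmetric, banded, well-conditioned matrix: by [26], the entries of its
# inverse decay exponentially with the distance to the diagonal.
n = 40
M = 4 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
Minv = np.linalg.inv(M)
row = np.abs(Minv[n // 2])                 # |M^{-1}_{i,j}| along one row
# successive ratios away from the diagonal stay below a fixed factor r < 1
ratios = row[n // 2 + 1:] / row[n // 2:-1]
```

For this tridiagonal Toeplitz example the contraction factor is $2 - \sqrt{3} \approx 0.27$ per step, so truncating the row at distance $O(\ln \frac{1}{\epsilon})$ from the diagonal incurs an $O(\epsilon)$ error; this is exactly the mechanism behind the localization radii $\rho_k$.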
Let us now describe the complexity of Algorithm 2. This complexity depends on the desired accuracy $\epsilon \in (0, 1)$. Lines 1 and 2 correspond to the computation of the (sparse) mass and stiffness matrices of (5.3). Since $A$ and $M$ are sparse and banded (of bandwidth $2d = 4$ in our numerical example), this computation is of $O(N)$ complexity. Line 3 corresponds to the resolution of the localized linear system introduced in Definition 5.2 using $M^{i,\rho_q}$, the $i^{\rho_q} \times i^{\rho_q}$ sub-matrix of $M$. According to Theorem 5.5, the accuracy of each solve must be $|y - y^{\mathrm{app}}|_{M^{i,\rho_q}} \le C^{-1} \epsilon H^{7d/2+3} / q^2$. Since $|i^{\rho_q}| = O(\rho_q^d)$ and since $M^{i,\rho_q}$ is of condition number bounded by that of $M$, for each $i$ the linear system of Line 3 can be solved efficiently (to accuracy $O(C^{-1} \epsilon H^{7d/2+3} / q^2)$) using $O(\rho_q) = O\big(\ln \max(\frac{1}{\epsilon}, q)\big)$ iterations of the CG method (recalled in Subsection 4.9) at a cost of $O(\rho_q^d)$ per iteration, which results in a total cost of $O\big(N \rho_q^d \ln \max(\frac{1}{\epsilon}, q)\big)$. Lines 4 and 5 are naturally of complexity $O(N \rho_q^d)$. Since $A^{(q),\mathrm{loc}}_{i,j} = 0$ if $\tau^{(q)}_i$ and $\tau^{(q)}_j$ are at a distance larger than $2 H_q \rho_q$, the complexity of Line 6 is $O(N \rho_q^{2d})$. Note that $A^{(k),\mathrm{loc}}$ and $B^{(k),\mathrm{loc}}$ are banded, of bandwidth $O(\rho_k^d)$. It follows that Line 8 is of complexity $O(|I^{(k)}| \rho_k^d)$. According to Theorem 5.6, the linear system of Line 9 needs to be solved up to accuracy $|y - y^{\mathrm{app}}|_{B^{(k),\mathrm{loc}}} \le \epsilon \|g\|_{H^{-1}(\Omega)} / 2$. Since $B^{(k),\mathrm{loc}}$ is of uniformly bounded condition number, this can be done using $O(\ln \frac{1}{\epsilon})$ iterations of the CG method at a cost of $O(|I^{(k)}| \rho_k^d)$ per iteration (using $O(|J^{(k)}|) = O(|I^{(k)}|)$), which results in a total cost of $O(|I^{(k)}| \rho_k^d \ln \frac{1}{\epsilon})$ for Line 9. Storing the fine-mesh values of $u^{(k),\mathrm{loc}} - u^{(k-1),\mathrm{loc}}$ in Line 11 costs $O(N \rho_k^d)$ (since, for each node $x$ on the fine mesh, only $O(\rho_k^d)$ localized basis functions contribute to the value of $u^{(k),\mathrm{loc}} - u^{(k-1),\mathrm{loc}}$).
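The iteration counts above rest on a standard fact: for a matrix with uniformly bounded condition number, CG contracts the error by a fixed factor per iteration, so accuracy $\epsilon$ takes $O(\ln \frac{1}{\epsilon})$ iterations. A minimal self-contained sketch (our own, with a toy tridiagonal matrix of condition number at most 3):

```python
import numpy as np

def cg(A, b, tol):
    """Plain conjugate gradient. For uniformly bounded condition number the
    residual shrinks geometrically, so reaching tolerance tol takes
    O(ln(1/tol)) iterations -- the count used in the complexity analysis."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    iters = 0
    while np.sqrt(rs) > tol:
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        iters += 1
    return x, iters

n = 200
# eigenvalues lie in [2, 6], so cond(A) <= 3 independently of n
A = 4 * np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
b = np.ones(n)
x, iters = cg(A, b, tol=1e-10)
```

With a banded matrix each iteration costs $O(n \cdot \text{bandwidth})$, which combined with the $O(\ln \frac{1}{\epsilon})$ iteration count gives the per-line costs tallied above.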
According to Theorem 5.6, for each $i \in I^{(k-1)}$ the linear system $B^{(i,\rho)} y = b$ of Line 12 needs to be solved up to accuracy $|y - y^{\mathrm{app}}|_{B^{(i,\rho)}} \le C^{-1} \epsilon H^{-k+7d/2+4} / (k-1)^2$. Since the matrix $B^{(i,\rho)}$ inherits the uniformly bounded condition number of $B^{(k),\mathrm{loc}}$, this can be done using $O(\ln \frac{1}{\epsilon})$ iterations of the CG method at a cost of $O(H^{-d} \rho_{k-1}^d \rho_k^d) = O(\rho_{k-1}^d \rho_k^d)$ per iteration. This results in a total cost of $O(|I^{(k)}| \rho_{k-1}^d \rho_k^d \ln \frac{1}{\epsilon})$ for Line 12. Using the sparsity structures of $D^{(k-1,k),\mathrm{loc}}$ and $R^{(k-1,k),\mathrm{loc}}$, we obtain that the complexity of Line 13 is $O(|I^{(k-1)}| \rho_{k-1}^d H^{-d}) = O(|I^{(k-1)}| \rho_{k-1}^d)$ and that of Line 14 is $O(|I^{(k-1)}| \rho_{k-1}^{2d} \rho_k^d)$. The complexity of Lines 15 and 16 is summarized in the display of Algorithm 2 and is a simple consequence of the sparsity structure of $R^{(k-1,k),\mathrm{loc}}$. Line 18 is of complexity $O(|I^{(1)}| \rho_1^d \ln \frac{1}{\epsilon})$ (using CG as in Line 9). As in Line 11, storing the values of $u^{(1),\mathrm{loc}}$ costs $O(N \rho_1^d)$. Finally, obtaining $u^{\mathrm{loc}}$ in Line 20 costs $O(Nq)$ (observe that $q = O(\ln N)$).
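Line 20's $O(Nq)$ cost is just that of a telescoping sum: the subband corrections $u^{(k),\mathrm{loc}} - u^{(k-1),\mathrm{loc}}$ recombine exactly into the finest approximation. A minimal sketch with stand-in vectors (the arrays below are random placeholders for the $u^{(k),\mathrm{loc}}$, not actual PDE solutions):

```python
import numpy as np

# Telescoping recombination of Line 20: u^loc = u^(1) + sum_k (u^(k) - u^(k-1)).
# The q-1 subband differences can be computed independently (and in parallel),
# and summing them costs O(N q) additions.
rng = np.random.default_rng(1)
N, q = 1000, 5
u = [rng.standard_normal(N) for _ in range(q)]        # stand-ins for u^(k),loc
bands = [u[0]] + [u[k] - u[k - 1] for k in range(1, q)]
u_loc = np.sum(bands, axis=0)                          # recovers u^(q),loc
```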
Total computational complexity, first solve. Summarizing, we obtain that the complexity of Algorithm 2, i.e. the cost of computing the gamblets $(\psi^{(k),\mathrm{loc}}_i)$ and the approximation $u^{\mathrm{loc}}$ of the solution of (5.3) to accuracy $\epsilon$ in $H^1$-norm, is $O\big(N \ln^{3d} \max(\frac{1}{\epsilon}, N^{1/d})\big)$. Computational complexity of subsequent solves with $g \in H^{-1}(\Omega)$. If (5.3) (i.e. (1.1)) needs to be solved for more than one $g$, then the gamblets $\psi^{(k),\mathrm{loc}}_i$ need only be computed once, and each subsequent solve achieves accuracy $\approx N^{-1/d}$ in $H^1$-norm in $O(N \ln^{d+1} N)$ operations (… ergodicity cells with a tight control on mixing as in [47]). The resulting cost of obtaining $u^{(k),\mathrm{hom}}$ (in a first solve) such that $\|u^{(k),\mathrm{loc}} - u^{(k),\mathrm{hom}}\|_{L^2(\Omega)} \le C \epsilon \big(\|g\|_{L^2(\Omega)} + \|g\|_{\mathrm{Lip}}\big)$ is $O\big((N \ln^{3d} N\, H^p + \epsilon^{-d}) \ln^{d+1} \frac{1}{\epsilon}\big)$.
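The near-linear total can be checked with a back-of-the-envelope tally of the per-level costs. Assuming (as in the analysis above) $|I^{(k)}| \sim 2^{dk}$ degrees of freedom at level $k$ and a logarithmically growing localization radius $\rho_k \sim k$, the sum of the dominant per-level costs $|I^{(k)}| \rho_k^d$ stays within a polylogarithmic factor of $N = |I^{(q)}|$; the constants below are illustrative, not the paper's.

```python
# Per-level cost tally: |I^(k)| * rho_k^d with |I^(k)| ~ 2^(d k), rho_k ~ k.
# The geometric growth of |I^(k)| means the finest level dominates, so the
# total is N times a polylog factor (here at most q^d up to a constant).
d, q = 2, 10
per_level = [(2 ** (d * k)) * (k ** d) for k in range(1, q + 1)]
N = 2 ** (d * q)                 # degrees of freedom at the finest level
total = sum(per_level)
ratio = total / N                # polylogarithmic in N since q = O(ln N)
```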