Survey of multifidelity methods in uncertainty propagation, inference, and optimization

In many situations across computational science and engineering, multiple computational models are available that describe a system of interest. These different models have varying evaluation costs and varying fidelities. Typically, a computationally expensive high-fidelity model describes the system with the accuracy required by the current application at hand, while lower-fidelity models are less accurate but computationally cheaper than the high-fidelity model. Outer-loop applications, such as optimization, inference, and uncertainty quantification, require multiple model evaluations at many different inputs, which often leads to computational demands that exceed available resources if only the high-fidelity model is used. This work surveys multifidelity methods that accelerate the solution of outer-loop applications by combining high-fidelity and low-fidelity model evaluations, where the low-fidelity evaluations arise from an explicit low-fidelity model (e.g., a simplified physics approximation, a reduced model, a data-fit surrogate, etc.) that approximates the same output quantity as the high-fidelity model. The overall premise of these multifidelity methods is that low-fidelity models are leveraged for speedup while the high-fidelity model is kept in the loop to establish accuracy and/or convergence guarantees. We categorize multifidelity methods according to three classes of strategies: adaptation, fusion, and filtering. The paper reviews multifidelity methods in the outer-loop contexts of uncertainty propagation, inference, and optimization.

1. Introduction. We begin by introducing the setting and concepts surveyed in this paper: Section 1.1 defines the setting of multifidelity models and Section 1.2 introduces the concepts of multifidelity methods. Section 1.3 discusses different types of low-fidelity models that may arise in the multifidelity setting. Section 1.4 defines the three outer-loop applications of interest: uncertainty propagation, statistical inference, and optimization. Section 1.5 outlines the remainder of the paper.
1.1. Multifidelity models. Models serve to support many aspects of computational science and engineering, from discovery to design to decision-making and more. In some of these settings, one primary purpose of a model is to characterize the input-output relationship of the system of interest-the input describes the relevant system properties and environmental conditions, and the output describes quantities of interest to the task at hand. In this context, evaluating a model means performing a numerical simulation that implements the model, computes a solution, and thus maps an input onto an approximation of the output. For example, the numerical simulation might involve solving a partial differential equation (PDE), or solving a system of ordinary differential equations, or applying a particle method. Mathematically, we denote a model as a function f : Z → Y that maps an input z ∈ Z to an output y ∈ Y, where Z ⊆ R d is the domain of the inputs of the model, with dimension d ∈ N, and Y ⊆ R d is the domain of the outputs of the model, with dimension d ∈ N. Model evaluations (i.e., evaluations of f ) incur computational costs c ∈ R + that typically increase with the accuracy of the approximation of the output, where R + = {x ∈ R : x > 0} is the set of positive real numbers.
In many situations, multiple models are available that estimate the same output quantity with varying approximation qualities and varying computational costs. We define a high-fidelity model f hi : Z → Y as a model that estimates the output with the accuracy that is necessary for the current task at hand. We define a low-fidelity model f lo : Z → Y as a model that estimates the same output with a lower accuracy than the high-fidelity model. The costs c hi ∈ R + of the high-fidelity model f hi are typically higher than the costs c lo ∈ R + of a low-fidelity model f lo . More generally, we consider k ∈ N low-fidelity models, f (1) lo , . . . , f (k) lo , that each represent the relationship between the input and the output, f 1.2. Multifidelity methods for the outer loop. The use of principled approximations to accelerate computational tasks has long been a mainstay of scalable numerical algorithms. For example, quasi-Newton optimization methods [57,69,31] construct approximations of Hessians and apply low-rank updates to these approximations during the Newton iterations. Solvers based on Krylov subspace methods [121,9,122,184] and on Anderson relaxation [5,211,202] perform intermediate computations in low-dimensional subspaces that are updated as the computation proceeds. Whereas these methods-and many others across the broad field of numerical algorithms-embed principled approximations within a numerical solver, we focus in this paper on the particular class of multifidelity methods that invoke explicit approximate models in solution of an outer-loop problem. We define this class of methods more precisely below; first we introduce the notion of an outer-loop application problem.
We use the term outer-loop application to define computational applications that form outer loops around a model-where in each iteration an input z ∈ Z is received and the corresponding model output f (z) is computed, and an overall outer-loop result is obtained at the termination of the outer loop. For example, in optimization, the optimizer provides at each iteration the design variables to evaluate (the input) and the model must evaluate the corresponding objective function value, the constraints values, and possibly gradient information (the outputs). At the termination, an optimal design is obtained (the outer-loop result). Another outer-loop application is uncertainty propagation, which can be thought of conceptually as a loop over realizations of the input, requiring the corresponding model evaluation for each realization. In uncertainty propagation, the outer-loop result is the estimate of the statistics of interest. Other examples of outer-loop applications include inverse problems, data assimilation, control problems, and sensitivity analysis. 1 Note that although it is helpful for the exposition to think of outer-loop applications as loops, they are often not implemented as such. For example, in uncertainty propagation, once the realizations of the input have been drawn, the model outputs can be typically computed in parallel.
The term many-query application is often used to denote applications that evaluate a model many times [182], a categorization that applies to most (if not all)  outer-loop applications. We distinguish between many-query and outer-loop applications by considering the latter to be the class of applications that target a specific outer-loop result. In contrast, many-query applications do not necessarily target a specific outer-loop result (and thus the set of outer-loop applications is essentially a subset of the set of many-query applications). For example, performing a parameter study is many-query but does not necessarily lead to a specific outer-loop result. This distinction is important in the discussion of multifidelity methods, since accuracy and/or convergence will be assessed relative to a specific outer-loop result. The accuracy of the outer-loop result, as required by the problem at hand, can be achieved by using the high-fidelity model f hi in each iteration of the outer loop; however, evaluating the high-fidelity model in each iteration often leads to computational demands that exceed available resources. Simply replacing the high-fidelity model f hi with a low-fidelity model f lo can result in significant speedups but leads to a lower-and typically unknown-approximation quality of the outer-loop result. This is clearly unsatisfactory and motivates the need for multifidelity methods.
We survey here multifidelity methods for outer-loop applications. We consider the class of multifidelity methods that have two key properties: (1) They leverage a lowfidelity model f lo (or in the general case multiple low-fidelity models f (1) lo , . . . , f (k) lo , k ∈ N), to obtain computational speedups, and (2) they use recourse to the high-fidelity model f hi to establish accuracy and/or convergence guarantees on the outer-loop result, see Figure 1. Thus, multifidelity methods use low-fidelity models to reduce the runtime where possible, but recourse to the high-fidelity model to preserve the accuracy of the outer-loop result that would be obtained with a method that uses only the high-fidelity model. The two key ingredients of multifidelity methods are (1) low-fidelity models f In many situations, different types of low-fidelity models are available, e.g., coarsegrid approximations, projection-based reduced models, data-fit interpolation and regression models, machine-learning-based models, and simplified models. The low-fidelity models vary with respect to error and costs. Multifidelity methods leverage these heterogeneous types of low-fidelity models for speedup.
the models while providing theoretical guarantees that establish the accuracy and/or convergence of the outer-loop result. Note that a crucial component of this characterization of multifidelity methods for outer-loop problems is the use of explicit low-fidelity models that approximate the same output quantity as the high-fidelity model. This distinguishes the methods from those that embed approximations within the solver itself, such as the quasi-Newton and Krylov subspace methods discussed above.
The multifidelity methods we survey are applicable to a broad range of problems, but of particular interest is the setting of a high-fidelity model that corresponds to a fine-grid discretization of a PDE that governs the system of interest. In this setting, coarse-grid approximations have long been used as cheaper approximations. Varying the discretization parameters generates a hierarchy of low-fidelity models. We are here interested in richer and more heterogeneous sets of models, including projectionbased reduced models [191,182,87,19], data-fit interpolation and regression models [72,70], machine-learning-based models such as support vector machines (SVMs) [207,49,38], and other simplified models [132,151], see Figure 2. We further discuss types of low-fidelity models in Section 1.3. In a broader sense, we can think of the models as information sources that describe the input-output relationships of the system of interest. In that broader sense, expert opinions, experimental data, and historical data are potential information sources. We restrict the following discussion to models, because all of the multifidelity methods that we survey are developed in the context of models; however, we note that many of these multifidelity methods could potentially be extended to this broader class of information sources.
Model management serves two purposes. First is to balance model evaluations among the models (i.e., to decide which model to evaluate when). Second is to guarantee the same accuracy in the outer-loop result as if only the high-fidelity model were used. We distinguish between three types of model management strategies (see  high-fidelity model only when indicated by a low-fidelity filter. 2 The appropriate model management strategy for the task at hand typically depends on the nature of the outer-loop application. We survey model management techniques that fall into these three categories in Section 2. Comparison to multilevel methods. Multilevel methods have a long history in computational science and engineering, e.g., multigrid methods [28,93,30,142,204], multilevel preconditioners [27,55], and multilevel function representations [216,14,56,32]. Multilevel methods typically derive a hierarchy of low-fidelity models of the high-fidelity model by varying a parameter. For example, the parameter could be the mesh width and thus the hierarchy of low-fidelity models would be the hierarchy of coarse-grid approximations. A common approach in multilevel methods is to describe the approximation quality and the costs of the low-fidelity model hierarchy with rates, and then to use these rates to distribute work among the models. In this paper, we consider more general low-fidelity models with properties that cannot necessarily be well described by rates. Even though many multilevel methods are applicable to more heterogeneous models than coarse-grid approximations, describing the model properties by rates only, and consequently distributing work with respect to rates, can be too coarse a description and can miss important aspects of the models. Furthermore, in our setting, low-fidelity models are often given and cannot be easily generated on request by varying a (e.g., discretization) parameter. The multifidelity techniques that we describe here explicitly take such richer sets of models into account.
Comparison to traditional model reduction. Traditionally, model reduction [7,182,19] first constructs a low-fidelity reduced model, and then replaces the highfidelity model with the reduced model in an outer-loop application. Replacing the high-fidelity model often leads to significant speedups, but it also means that the accuracy of the outer-loop result depends on the accuracy of the reduced model. In some settings, error bounds or error estimates are available for the reduced model outputs [182,209,86], and it may be possible to translate these error estimates on the model outputs into error estimates on the outer loop result. In contrast, multifidelity  We categorize low-fidelity models as being of three types: simplified models, projection-based models, and data-fit models. methods establish accuracy and convergence guarantees-instead of providing error bounds and error estimates only-by keeping the high-fidelity model in the loop and thus trading some speedup for guarantees-even if the quality of the low-fidelity model is unknown.
1.3. Types of low-fidelity models. We categorize low-fidelity as being of three types: simplified low-fidelity models, projection-based low-fidelity models, and datafit low-fidelity models. Figure 4 depicts this categorization. For a given application, knowledge of and access to the high-fidelity model affects what kind of low-fidelity models can be created. In some cases, the high-fidelity system has a known structure that can be exploited to create low-fidelity models. In other cases, the high-fidelity models are considered to be "black box": they can be evaluated at the inputs in Z to obtain outputs in Y, however no details are available on how the outputs are computed.
Simplified low-fidelity models. Simplified models are derived from the high-fidelity model by taking advantage of domain expertise and in-depth knowledge of the implementation details of the high-fidelity model. Domain expertise allows the derivation of several models with different computational costs and fidelities that all aim to estimate the same output of interest of the system. For example, in computational fluid dynamics, there is a clear hierarchy of models for analyzing turbulent flow. From high to low fidelity, these are direct numerical simulations (DNS), large eddy simulations (LES), and Reynolds averaged Navier-Stokes (RANS). All these model turbulent flows, but DNS resolves the whole spatial and time domain to the scale of the turbulence, LES eliminates small scale behavior, and RANS applies the Reynolds decomposition to average over time. In aerodynamic design, an often-employed hierarchy comprises RANS, Euler equations, and potential theory [97]. The supersonic aerodynamic design problem in [43,42] employs the Euler equations, a vortice lattice model, and a classical empirical model. Similar hierarchies of models exist in other fields of engineering. Models for subsurface flows through karst aquifers reach from simple continuum pipe flow models [36] to coupled Stokes and Darcy systems [35]. In climate modeling, low-fidelity models consider only a limited number of atmospheric effects whereas high-fidelity models are fully-coupled atmospheric and oceanic simulation models [99,132]. There are more general concepts to derive low-fidelity models by simplification, which also require domain expertise but which are applicable across disciplines. Coarse-grid discretizations are an important class of such approximations.
As another example, in many settings a low-fidelity model can be derived by neglecting nonlinear terms. For example, lower-fidelity linearized models are common in aerodynamic and structural analyses [166]. Yet another example is when the highfidelity model relies on an iterative solver (e.g., Krylov subspace solvers or Newton's method), a low-fidelity model can be derived by loosening the residual tolerances of the iterative method-thus, to derive a low-fidelity approximation, the iterative solver is stopped earlier than if a high-fidelity output were computed.
Projection-based low-fidelity models. Model reduction derives low-fidelity models from a high-fidelity model by mathematically exploiting the problem structure, rather than using domain knowledge of the problem at hand. These methods proceed by identifying a low-dimensional subspace that is constructed so as to retain the essential character of the system input-output map. Projecting the governing equations onto the low-dimensional subspace yields the reduced model. The projection step is generally (but not always) intrusive and requires knowledge of the high-fidelity model structure. There are a variety of ways to construct the low-dimensional subspace, see [19] for a detailed review. One common method is the proper orthogonal decomposition (POD) [191,21,175,120,119], which uses so-called "snapshots"-state vectors of the high-fidelity model at selected inputs-to construct a basis for the low-dimensional subspace. POD is a popular basis generation method because it is applicable to a wide range of problems, including time-dependent and nonlinear problems [39,91,119,120,186]. Another basis generation approach is based on centroidal Voronoi tessellation (CVT) [60], where a special Voronoi clustering of the snapshots is constructed. The reduced basis is then derived from the generators of the Voronoi clustering. The work [33] discusses details on CVT-based basis construction. A combination of POD and CVT-based basis construction is introduced in [61]. There are also methods based on Krylov subspaces to generate a reduced basis [67,75], including multivariate Padé approximations and tangential interpolation for linear systems [13,17,74,87]. Dynamic mode decomposition is another basis generation method that is popular in the context of computational fluid dynamics [187,205,171]. Balanced truncation [145,146] is a common basis construction method used in the systems and control theory community. For stable linear time-invariant systems, balanced truncation provides a basis that guarantees asymptotically stable reduced systems and provides an error bound [7,88]. Another basis generation approach is the reduced basis method [182,86,183,181], where orthogonalized carefully selected snapshots are the basis vectors. Depending on the problem of interest, these reduced basis models can be equipped with cheap a posteriori error estimators for the reduced model outputs [107,86,92,182,206,215]. Efficient error estimators can also sometimes be provided for other basis generation methods, such as the POD [102,103].
Data-fit low-fidelity models. Data-fit low-fidelity models are derived directly from inputs and the corresponding outputs of the high-fidelity model. Thus, data-fit models can be derived from black-box high-fidelity models because only the inputs and outputs of the high-fidelity model need to be available. In many cases, data-fit models are represented as linear combinations of basis functions. Data-fit models are constructed by fitting the coefficients of the linear combination via interpolation or regression to the inputs and the corresponding high-fidelity model outputs. The choice of the interpolation and regression bases is critical for the approximation quality of the data-fit models. Polynomials, e.g., Lagrange polynomials, are classical basis functions that can be used for deriving data-fit models. Piecewise-polynomial interpolation approaches allow use of low-degree polynomials, which avoids problems with global polynomial interpolation of high degree, e.g., Runge's phenomenon. If the inputs are low dimensional, a multi-variate data-fit model can be derived with tensor product approaches. In higher-dimensional settings, discretization methods based on sparse grids [32] can be employed. Radial basis functions are another type of basis functions that are widely used for constructing data-fit models [174,70]. If based on the Gaussian density function, radial basis functions typically lead to numerically stable computations of coefficients of the linear combination of the data-fit model. Often the radial basis functions depend on hyper-parameters, e.g., the bandwidth of the Gaussian density function. Well-chosen hyper-parameters can greatly improve the approximation accuracy of the data-fit model but the optimization for these hyper-parameters is often computationally expensive [70]. A widely used approach to interpolation with radial basis functions is kriging, for which a sound theoretical understanding has been obtained and efficient approaches to optimize for the hyper-parameters have been developed [141,185,112,138,174,72]. In particular, kriging models are equipped with error indicators, see, e.g., [185]. There are also support vector machines [207,49,23,188], which have been developed by the machine-learning community for classification tasks but are now used as surrogates in science and engineering as well, see, e.g., [70,16,59,164].

Outer-loop applications.
We focus on three outer-loop applications for which a range of multifidelity methods exist: uncertainty propagation, statistical inference, and optimization.
Uncertainty propagation. In uncertainty propagation, the model input is described by a random variable and one is interested in statistics of the model output. Using Monte Carlo simulation to estimate statistics of the model output often requires a large number of model evaluations to achieve accurate approximations of the statistics. A multifidelity method that combines outputs from computationally cheap low-fidelity models with outputs from the high-fidelity model can lead to significant reductions in runtime and provide unbiased estimators of the statistics of the high-fidelity model outputs [76,150,148,198,160]. Note that we consider probabilistic approaches to uncertainty propagation only; other approaches to uncertainty propagation are, e.g., fuzzy set approaches [22] and worst-case scenario analysis [11].
Statistical inference. In inverse problems, an indirect observation of a quantity of interest is given. A classical example is that limited and noisy observations of a system output are given and one wishes to estimate the input of the system. In statistical inference [195,194,110,192], the unknown input is modeled as a random variable and one is interested in sampling the distribution of this random variable to assess the uncertainty associated with the input estimation. Markov chain Monte Carlo (MCMC) methods provide one way to sample the distribution of the input random variable. MCMC is an outer-loop application that requires evaluating the high-fidelity model many times. Multifidelity methods in MCMC typically use multistage adaptive delayed acceptance formulations that leverage low-fidelity models to speed up the sampling [44,62,51,54].
Optimization. The goal of optimization is to find an input that leads to an optimal model output with respect to a given objective function. Optimization is typically solved using an iterative process that requires evaluations of the model in each iteration. Multifidelity optimization reduces the runtime of the optimization process by using low-fidelity models to accelerate the search [24,112,71,70] or by using a low-fidelity model in conjunction with adaptive corrections and a trust-region modelmanagement scheme [1,2,24,135,172]. Other multifidelity optimization methods build a surrogate using evaluations from multiple models, and then optimize using this surrogate. For example, efficient global optimization (EGO) is a multifidelity optimization method that adaptively constructs a low-fidelity model by interpolating the objective function corresponding to the high-fidelity model with Gaussian process regression (kriging) [109].
1.5. Outline of the paper. The remainder of this paper focuses on model management strategies. Section 2 overviews the model management strategies of adaptation, fusion, and filtering. Sections 3-5 survey specific techniques in the context of uncertainty propagation, inference, and optimization, respectively. The outlook in Section 6 closes the survey.

Multifidelity model management strategies.
Model management in multifidelity methods defines how different models are employed during execution of the outer loop and how outputs from different models are combined. Models are managed such that low-fidelity models are leveraged for speedup, while judicious evaluations of the high-fidelity model establish accuracy and/or convergence of the outer-loop result. This section describes a categorization of model management methods into three types of strategies. The following sections then survey specific model management methods in the context of uncertainty propagation, statistical inference, and optimization.
As shown in Figure 3, we distinguish between three types of model management strategies: adaptation, fusion, and filtering.

2.1.
Adaptation. The first model management strategy uses adaptation to enhance the low-fidelity model with information from the high-fidelity model while the computation proceeds. One example of model management based on adaptation is global optimization with EGO, where a kriging model is adapted in each iteration of the optimization process [109,208]. Another example is the correction of lowfidelity model outputs via updates, which are derived from the high-fidelity model. It is common to use additive updates, which define the correction based on the difference between sampled high-fidelity and low-fidelity outputs, and/or multiplicative updates, which define the correction based on the ratio between sampled high-fidelity and low-fidelity outputs [1,2]. The correction model is then typically built using Taylor series expansion based on gradients, and possibly also on higher-order derivative information [63]. In [113], low-fidelity models are corrected (calibrated) with Gaussian process models to best predict the output of the high-fidelity model. Another multifidelity adaptation strategy is via adaptive model reduction, where projection-based reduced models are efficiently adapted as more data of the high-fidelity model become available during solution of the outer-loop application problem. Key to online adaptive model reduction is an efficient adaptation process. In [162,163], the basis and operators of projection-based reduced models are adapted with low-rank updates. In [37], an h-adaptive refinement of the basis vectors uses clustering algorithms to learn and adapt a reduced basis from high-fidelity model residuals. The work [4] adapts localized reduced bases to smooth the transition from one localized reduced basis to another localized basis.

2.2.
Fusion. The second model management strategy is based on information fusion. Approaches based on fusion evaluate low-and high-fidelity models and then combine information from all outputs. An example from uncertainty propagation is the control variate framework [29,96,149], where the variance of Monte Carlo estimators is reduced by exploiting the correlation between high-and low-fidelity models. The control variate framework leverages a small number of high-fidelity model evaluations to obtain unbiased estimators of the statistics of interest, together with a large number of low-fidelity model evaluations to obtain an estimator with a low variance. Another example from uncertainty propagation is the fusion framework introduced in [118], which is based on Bayesian regression.
Co-kriging is another example of a multifidelity method that uses model management based on fusion. Co-kriging derives a model from multiple information sources, e.g., a low-and a high-fidelity model [6,147,165]. Co-kriging is often used in the context of optimization if gradient information of the high-fidelity model is available, see [71]. The work [123] compares kriging and co-kriging models on aerodynamic test functions. In [214], gradients are computed cheaply with the adjoint method and then used to derive a co-kriging model for design optimization in large design spaces. In [97], co-kriging with gradients and further developments of co-kriging are compared for approximating aerodynamic models of airfoils.

2.3.
Filtering. The third model management strategy is based on filtering, where the high-fidelity model is invoked following the evaluation of a low-fidelity filter. This might entail evaluating the high-fidelity model only if the low-fidelity model is deemed inaccurate, or it might entail evaluating the high-fidelity model only if the candidate point meets some criterion based on the low-fidelity evaluation. One example of a multifidelity filtering strategy is a multi-stage MCMC algorithm. For example, in two-stage MCMC [44,73], a candidate sample needs to be first accepted by the likelihood induced by the low-fidelity model before the high-fidelity model is evaluated at the candidate sample. As another example, in the multifidelity stochastic collocation approach in [148], the stochastic space is explored with the low-fidelity model to derive sampling points at which the high-fidelity model is then evaluated. A third example is multifidelity importance sampling, where the sampling of the high-fidelity model is guided by an importance sampling biasing distribution that is constructed with a low-fidelity model [160].
3. Multifidelity model management in uncertainty propagation. Inputs of models are often formulated as random variables to describe the stochasticity of the system of interest. With random inputs, the output of the model becomes a random variable as well. Uncertainty propagation aims to estimate statistics of the output random variable [140]. Sampling-based methods for uncertainty propagation evaluate the model at a large number of inputs and then estimate statistics from the corresponding model outputs. Examples of sampling-based methods are Monte Carlo simulation and stochastic collocation approaches. In this section, we review multifidelity approaches for sampling-based methods in uncertainty propagation. These multifidelity approaches shift many of the model evaluations to low-fidelity models while evaluating the high-fidelity model a small number of times to establish unbiased estimators. Section 3.1 introduces the problem setup and briefly overviews the Monte Carlo simulation method. Sections 3.2-3.4 discuss multifidelity methods for Monte Carlo based on control variates, importance sampling, and other techniques, respectively. Multifidelity methods for stochastic collocation are discussed in Section 3.5.

Uncertainty propagation and Monte Carlo simulation.
Consider the high-fidelity model f hi : Z → Y and let the uncertainties in the inputs be represented by a random variable Z with probability density function p. At this point, the only assumption we make on the random variable Z is that the distribution is absolutely continuous such that a density function exists. In particular, the random variable Z can be a non-Gaussian random variable. The goal of uncertainty propagation is to estimate statistics of the random variable f hi (Z), e.g., the expectation which we assume to exist. The Monte Carlo method draws m ∈ N independent and identically distributed (i.i.d.) realizations z 1 , . . . , z m ∈ Z of the random variable Z, and estimates the expectation E[f hi ] as

The Monte Carlo estimator is an unbiased estimators
The convergence rate O(m −1/2 ) of the root mean squared error (RMSE) e(s hi m ) is low if compared to deterministic quadrature rules, see Section 3.5; however, the rate is independent of the smoothness of the integrand and the dimension d of the input z, which means that the Monte Carlo method is well-suited for high dimensions d, and, in fact, is often the only choice available if d is large. Typically more important in practice, however, is the pre-asymptotic behavior of the RMSE of the Monte Carlo estimator. In the pre-asymptotic regime, the variance V[f hi ] dominates the RMSE. Variance reduction techniques reformulate the estimation problem such that a function with a lower variance is integrated instead of directly integrating f hi (Z). Examples of variance reduction techniques are antithetic variates, control variates, importance sampling, conditional Monte Carlo sampling, and stratified sampling [96,177]. Variance reduction techniques often exploit the correlation between the random variable f hi (Z) of interest and an auxiliary random variable. Multifidelity methods construct the auxiliary random variable using low-fidelity models. We discuss multifidelity methods for variance reduction based on control variates in Section 3.2, and variance reduction based on importance sampling in Section 3.3.

Multifidelity uncertainty propagation based on control variates.
The control variate framework [96,29,149] aims to reduce the estimator variance of a random variable by exploiting the correlation with an auxiliary random variable. In the classical control variate method, as discussed in, e.g., [96], the statistics of the auxiliary random variable is known. Extensions relax this requirement by estimating the statistics of the auxiliary random variable from prior information [65,159]. We now discuss multifidelity approaches that construct auxiliary random variables from low-fidelity models.

Control variates based on low-fidelity models.
Consider the highfidelity model f hi and k ∈ N low-fidelity models f lo . In [150,164], a multifidelity method is introduced that uses the random variables f stemming from the low-fidelity models as control variates for estimating statistics of the random variable f hi (Z) of the high-fidelity model. An optimal model management is derived that minimizes the MSE of the multifidelity estimator for a given computational budget. In the numerical experiments, high-fidelity finite element models are combined with projection-based models, data-fit models, and support vector machines, which demonstrates that the multifidelity approach is applicable to a wide range of low-fidelity model types.
Let m 0 ∈ N be the number of high-fidelity model evaluations. Let m i ∈ N be the number of evaluations of the low-fidelity model f The multifidelity approach presented in [164] draws m k realizations from the random variable Z and computes the model outputs f hi (z 1 ), . . . , f hi (z m0 ) and for i = 1, . . . , k. These model outputs are used to derive Monte Carlo estimates Note that the estimates (3.8) use the first f lo (z mi−1 ) model outputs of (3.6) only, whereas the estimate (3.7) uses all m i model outputs (3.6) for i = 1, . . . , k. Following [164], the multifidelity estimator of The control variate coefficients α 1 , . . . , α k ∈ R balance the terms hi m0 stemming from the high-fidelity model and the termss mi−1 from the low-fidelity models. The multifidelity estimator (3.9) based on the control variate framework evaluates the highand the low-fidelity model and fuses both outputs into an estimate of the statistics of the high-fidelity model. The multifidelity estimator (3.9) therefore uses a model management based on fusion, see Section 2. We note that (3.9) could also be viewed as a correction, although the correction is to the estimators stemming from the lowfidelity models, not to the low-fidelity model outputs directly.
Properties of the multifidelity estimator. The multifidelity estimators MF is an Therefore The high-fidelity model is evaluated at m 0 realizations and the low-fidelity model f  [150,164], these parameters are chosen such that the MSE of the estimator (3.9) is minimized for a given computational budget γ ∈ R + . The solution to the optimization problem gives the coefficients α * 1 , . . . , α * k and the number of model evaluations m * 0 , . . . , m * k that minimize the MSE of the multifidelity estimators MF for the given computational budget γ. The constraints impose that 0 < m * 0 ≤ m * 1 ≤ · · · ≤ m * k and that the costs c(s MF ) of the estimator equal the computational budget γ.
Variance of the multifidelity estimator. Since the multifidelity estimators MF is unbiased, we have e(s MF ) = V[s MF ], and therefore the objective of minimizing the MSE e(s MF ) can be replaced with the variance V[s MF ] in the optimization problem (3.10). The variance V[s MF ] of the multifidelity estimators MF is are the variances of f hi (Z) and f Optimal selection of the number of samples and control variate coefficients. Under certain conditions on the low-and the high-fidelity model, the optimization problem (3.10) has a unique, analytic solution [164]. The optimal control variate coefficients are The optimal number of evaluations m * 0 , m * 1 , . . . , m * k are where the components of the vector r = [1, r 1 , . . . , r k ] T ∈ R k+1 are given as Note that the convention ρ k+1 = 0 is used in (3.14). We refer to [164] for details.
Interaction of models in multifidelity Monte Carlo. We compare the multifidelity estimators MF to a benchmark Monte Carlo estimators MC that uses the high-fidelity model alone. The multifidelity estimators MF and the benchmark estimators MC have the same costs γ. With the MSE e(s MF ) of the multifidelity estimator and the MSE e(s MC ) of the benchmark Monte Carlo estimator, the variance reduction ratio is The ratio (3.15) quantifies the variance reduction achieved by the multifidelity estimator compared to the benchmark Monte Carlo estimator. The variance reduction ratio is a sum over the costs c hi , c lo , . . . , c lo and the correlation coefficients ρ 1 , . . . , ρ k of all models in the multifidelity estimator. This shows that the contribution of a low-fidelity model to the variance reduction of the multifidelity estimator cannot be determined by the properties of that low-fidelity model alone but only by taking into account all other models that are used in the multifidelity estimator. Thus, the interaction between the models is what drives the efficiency of the multifidelity estimators MF . We refer to [164] for an in-depth discussion of the interaction between the models and a more detailed analysis.
Efficiency of the multifidelity estimator. It is shown in [164] that the MFMC estimators MF is computationally cheaper than the benchmark Monte Carlo estimator that uses the high-fidelity model f hi alone if The inequality (3.16) emphasizes that both correlation and costs of the models are critical for an efficient multifidelity estimator.
Algorithm. Algorithm 1 summarizes the multifidelity Monte Carlo method as presented in [164]. Inputs are the models f hi , f Set ρ k+1 = 0 and define vector r = [1, r 1 , . . . , r k ] T ∈ R k+1 Evaluate high-fidelity model f hi at realizations z 1 , . . . , z m * Compute the multifidelity estimates MF as in (3.9) 10: return multifidelity estimates MF 11: end procedure f hi (Z) stemming from the high-fidelity model and the random variables f lo are derived from r as in (3.13). The control variate coefficients α * 1 , . . . , α * k are obtained as in (3.12). In line 6, m * k realizations z 1 , . . . , z m * k are drawn from the random variable Z. The high-fidelity model f hi is evaluated at the realizations z 1 , . . . , z m * 0 and models f (i) lo are evaluated at z 1 , . . . , z m * i for i = 1, . . . , k. The multifidelity estimate is obtained as in (3.9) and returned.

Other uses of control variates as a multifidelity technique.
In [25,26], the reduced basis method is used to construct control variates. The reduced basis models are built with greedy algorithms that use a posteriori error estimators to particularly target variance reduction. The work [210] uses error estimators to combine reduced basis models with control variates. The StackMC method presented in [203] successively constructs machine-learning-based low-fidelity models and combines them with the control variate framework. In [151], the multifidelity control variate method is used in the context of optimization, where information of previous iterations of the optimization problem are used as control variate. This means that data from previous iterations serve as a kind of low-fidelity "model".
The multilevel Monte Carlo method [98,76] uses the control variate framework to combine multiple low-fidelity models with a high-fidelity model. Typically, in multilevel Monte Carlo, the low-fidelity models are coarse-grid approximations, where the accuracy and costs can be controlled by a discretization parameter. The properties of the low-fidelity models are therefore often described with rates. For example, the rate of the decay of the variance of the difference of two successive coarse-grid approximations and the rate of the increase of the costs with finer grids play a critical role in determining the efficiency of the multilevel Monte Carlo method. Additionally, rates are used to determine the number of evaluations of each low-fidelity model and the high-fidelity model, see, e.g., [45,Theorem 1]. In the setting of stochastic differential equations and coarse-grid approximations, multilevel Monte Carlo has been very successful, see, e.g., [45,199], the recent advances on multi-index Monte Carlo [95], and the nesting of multilevel Monte Carlo and control variates [155] for detailed studies and further references.
3.3. Multifidelity uncertainty propagation based on importance sampling. Importance sampling [96] uses a problem-dependent sampling strategy. The goal is an estimator with a lower variance than a Monte Carlo estimator such as (3.3). The problem-dependent sampling means that samples are drawn from a biasing distribution, instead of directly from the distribution of the random variable Z of interest, and then the change of the distribution is compensated with a re-weighting. Importance sampling is particularly useful in the case of rare event simulation, where the probability of the event of interest is small and therefore many realizations of the random variable Z are necessary to obtain a Monte Carlo estimate of reasonable accuracy. Importance sampling with a suitable biasing distribution can explicitly target the rare event and reduce the number of realizations required to achieve an acceptable accuracy. The challenge of importance sampling is the construction of a biasing distribution, which usually is problem-dependent and typically requires model evaluations.
We discuss multifidelity methods that use low-fidelity models for the construction of biasing distributions.
We define the set I = {z ∈ Z | I hi (z) = 1}. The goal is to estimate the probability of the event Z −1 (I), which is E p [I hi ], with importance sampling. Note that we now explicitly denote in the subscript of E with respect to which distribution the expectation is taken.
Step 1: Construction of biasing distribution. Traditionally, importance sampling consists of two steps. In the first step, the biasing distribution with density q is constructed. Let Z be the biasing random variable with the biasing density q. Recall that the input random variable Z with the nominal distribution has the nominal density p. Let be the support of the density p. If the support of the nominal density p is a subset of the support of the biasing density q, i.e., supp(p) ⊂ supp(q), then the expectation E p [I hi ] can be rewritten in terms of the biasing density q as where the ratio p/q serves as a weight.
Step 2: Deriving an importance sampling estimate. In the second step, the importance sampling estimator is evaluated for realizations z 1 , . . . , z m ∈ Z of the random variable Z . The MSE of the estimator (3.17) is Variance of importance sampling estimator. The variance in (3.18) is with respect to the biasing density q, cf. Section 3.1 and the MSE of the Monte Carlo estimator (3.4). Therefore, the goal is to construct a biasing distribution with to obtain an importance sampling estimators IS m that has a lower MSE than the Monte Carlo estimators hi m for the same number of realizations m.

Construction of the biasing distribution with low-fidelity models.
The multifidelity importance sampling approach introduced in [160] uses a low-fidelity model to construct the biasing distribution in the first step of importance sampling, and derives the statistics using high-fidelity model evaluations in step two. In that sense, multifidelity importance sampling uses a model management strategy based on filtering, see Section 2.
In step one, the low-fidelity model f lo is evaluated at a large number n ∈ N of realizations z 1 , . . . , z n of the input random variable Z. This is computationally feasible because the low-fidelity model is cheap to evaluate. A mixture model q of Gaussian distributions is fitted with the expectation-maximization algorithm to the set of realizations The mixture model q serves as a biasing distribution. Note that other density estimation methods can be used instead of fitting a mixture model of Gaussian distributions with the expectation-maximization algorithm [190,143,161].
In step two, the high-fidelity model is evaluated at realizations z 1 , . . . , z m from the biasing random variable Z with biasing density q. From the high-fidelity model evaluations f hi (z 1 ), . . . , f hi (z m ) an estimate of the event probability E p [I hi ] is obtained. Under the condition that the support of the biasing density q includes the support of the nominal density p, the multifidelity importance sampling approach leads to an unbiased estimator of the probability of the event. If the low-fidelity model is sufficiently accurate, then significant runtime savings can be obtained during the construction of the biasing distribution. Note that using I lo in the second step of the multifidelity approach would lead to a biased estimator of E p [I hi ] because f lo is only an approximation of f hi and thus E p [I hi ] = E p [I lo ] in general.

3.4.
Other model management strategies for probability estimation and limit state function evaluation. The multifidelity approaches for estimating E p [I hi ] that are discussed in Section 3.3 first use the low-fidelity model to construct a biasing density q and then the high-fidelity model to estimate the failure probability E p [I hi ] with importance sampling. Under mild conditions on the biasing density q derived from the low-fidelity model, the importance sampling estimator using the high-fidelity model is an unbiased estimator of the failure probability E p [I hi ]. In this section, we review multifidelity methods that combine low-and high-fidelity model evaluations to obtain an indicator functionĨ that approximates I hi and that is computationally cheaper to evaluate than I hi . Thus, instead of exploiting the two-step importance sampling procedure as in Section 3.3, the techniques in this section leverage the low-and high-fidelity model to approximate I hi withĨ with the aim that the error The multifidelity approach introduced in [128] is based on filtering to combine low-and high-fidelity model outputs to obtain an approximationĨ of I hi . Let f lo be a low-fidelity model and f hi the high-fidelity model. Let further γ > 0 be a positive threshold value. The approach in [128] considers the indicator functioñ I(z) = 1 , f lo (z) < −γ or |f lo (z)| ≤ γ and f hi (z) < 0 0 , else , which evaluates to 1 at an input z if either f lo (z) < −γ or |f lo (z)| ≤ γ and f hi (z) < 0. Evaluating the indicator functionĨ at z means that first the low-fidelity model f lo is evaluated at z. If f lo (z) < −γ, then 1 is returned, and no high-fidelity model evaluation is necessary. Similarly, if f lo (z) > γ, then the indicator functionĨ evaluates to 0 without requiring a high-fidelity model evaluation. However, if |f lo (z)| ≤ γ, then the input z lies near the failure boundary, and the high-fidelity model is evaluated to decide whether the indicator function returns 0 or 1. How often the high-fidelity model is evaluated is determined by the positive threshold value γ. The choice of γ directly depends on the error of the low-fidelity model f lo in a certain norm. If the error of f lo is known, the work [128] establishes a convergence theory under mild conditions on the error of f lo . In particular, the authors of [128] show that the error |E p [Ĩ] − E p [I hi ]| can be reduced below any > 0 by a choice of γ that depends on the error of f lo . If the error of the low-fidelity model is unknown, the work [128] proposes an iterative heuristic approach that avoids the choice of γ. In [127], the multifidelity approach of [128] is extended to importance sampling with the cross-entropy method.
Similarly to the approaches in [128,127], the work [41] switches between low-and high-fidelity model evaluations by relying on a posteriori error estimators for reduced basis models to decide if either the reduced model or the high-fidelity model should be used.
The multifidelity methods in [16,59,163] all use a model management strategy based on adaptation to estimate a failure boundary or failure probability. In [16], a support vector machine is used to derive a low-fidelity model of the limit state function in failure probability estimation and design. The authors decompose the input space using decision boundaries obtained via the support vector machine, and so handle the discontinuities that arise when approximating the limit state function. An adaptive sampling scheme is introduced that refines the low-fidelity model along the failure boundary. In [59], another approach is introduced that uses a support vector machine to approximate the failure boundary. It is proposed to train the support vector machine with data obtained from a low-and a high-fidelity model. With the low-fidelity model, an initial approximation of the failure boundary is obtained by extensively sampling the input domain, which is computationally tractable because the low-fidelity model is cheap to evaluate. The training of the support vector machine for approximating the failure boundary corresponding to the high-fidelity model is then initialized with the approximate boundary obtained with the low-fidelity model. Additional samples are drawn in regions where the low-and the high-fidelity failure boundary differ, to refine the approximation of the high-fidelity failure boundary. The work [163] presents an online adaptive reduced modeling approach, which is demonstrated in the context of failure probability estimation. To increase the accuracy of the reduced model, it is adapted to the failure boundary as the estimation of the failure probability proceeds. The adaptation is performed via low-rank updates to the basis matrix of the reduced model. The low-rank updates are derived from sparse samples of the high-fidelity model.

Stochastic collocation and multifidelity.
Stochastic collocation methods [12,154,89] compute statistics such as the expectation (3.1) and the variance (3.2) by using a deterministic quadrature rule instead of the Monte Carlo method. The quadrature rules are often based on sparse grids [32,154] to perform the quadrature efficiently for high-dimensional inputs.
In [64], statistics are computed using stochastic collocation, where the outputs of a low-fidelity model are corrected with a discrepancy model that accounts for the difference between the high-and the low-fidelity model. The discrepancy model is then used to derive either an additive correction, a multiplicative correction, or a weighted combination of additive and multiplicative corrections to the low-fidelity model outputs. Thus, this is another example of model management based on adaptation, see Section 2. The authors of [64] point out that an adaptive refinement of the discrepancy model is necessary because the complexity of the discrepancy between the high-and the low-fidelity model varies distinctly in the stochastic domain. This is because low-fidelity models tend to approximate the high-fidelity model well only in certain regions of the stochastic domain, whereas in other regions they hardly match the high-fidelity model at all.
Another multifidelity stochastic collocation method is presented in [148]. This method is based on a filtering model management strategy. The low-fidelity model is first evaluated at a large number of collocation points to sample the stochastic domain. From these samples, a small number of points is selected via a greedy procedure, and the high-fidelity model is evaluated at these points. The state solutions of the highfidelity model at the selected collocation points span a space in which approximations of the high-fidelity model states at all other sampling points are derived.
In [198], a multilevel stochastic collocation method uses a hierarchy of models to accelerate convergence. Low-fidelity models are coarse-grid approximations of the high-fidelity model. Similarly to the multilevel Monte Carlo method, a reduction of the computational complexity can be shown if the errors of the models in the hierarchy decay with a higher rate than the rate of the increase of the costs. We categorize this multilevel stochastic collocation method as model management based on fusion, because low-and high-fidelity model outputs are fused to derive an estimate of the statistics of the high-fidelity model. 4. Multifidelity model management in statistical inference. In a Bayesian setting, inverse problems are cast in a statistical formulation where the unknown input is modeled as a random variable and is described by its posterior distribution [195,194,110,192]. MCMC is a popular way to sample from the posterior distribution. Statistical inference raises several computational challenges, including the design of MCMC sampling schemes, the construction of approximate models that can reduce the costs of MCMC sampling, and the development of alternatives to MCMC sampling such as variational approaches [140]. Detailed discussions of these and many other important aspects of statistical inference can be found in the literature, e.g., [177,110,192,130]. We focus here on a few specific aspects of statistical inference in which multifidelity methods have been used. In particular, we survey multifidelity methods that use a two-stage formulation of MCMC, where a candidate sample has to be first accepted by a low-fidelity model before it is passed on to be either accepted or rejected by the high-fidelity model. Section 4.1 describes the problem setup. Section 4.2 describes a two-stage MCMC framework and Section 4.3 discusses a framework where a low-fidelity model is adapted while it is evaluated in an MCMC algorithm. A Bayesian approach to model and correct the error of low-fidelity models is discussed in Section 4.4.

Bayesian framework for inference.
Consider the high-fidelity model f hi that maps inputs z ∈ Z onto outputs y ∈ Y. Let p 0 be a prior density that describes the input z before any measurements. Let further y obs be noisy observational data with the stochastic relationship where is a random vector that captures the measurement error, noise, and other uncertainties of the observation y obs . In the following, the random vector is modeled as a zero-mean Gaussian with covariance Σ ∈ R d×d . Define the data-misfit function as with the norm · . The likelihood function L : Z → R is then proportional to ) .
An evaluation of the likelihood L entails an evaluation of the high-fidelity model f hi . The posterior probability density is p(z|y obs ) ∝ L(y obs |z)p 0 (z) .
We note that there are many challenging and important questions regarding the formulation of an inverse problem in the Bayesian setting. For example, the selection of the prior is a delicate and problem-dependent question, see for example [110,Section 3.3] for a discussion of prior models. We do not address those issues here, but Draw candidate z * from proposal q(·|z i−1 )

6:
Set the sample z i to end for 8: return z 1 , . . . , z m 9: end procedure note that our stated goal of a multifidelity formulation is to recover a solution of the outer-loop problem (here the inverse problem) that retains the accuracy of the highfidelity formulation. Thus, the multifidelity approaches described below will inherit the choices made in the high-fidelity formulation of the Bayesian inverse problem.
Exploring MCMC methods are a popular way to sample from the posterior distribution, which have been studied extensively [178,90,139,80,50,137,53,52]; see also the books [77,130]. One example of an MCMC method is the Metropolis-Hastings algorithm, which is an iterative scheme that draws candidate samples from a proposal distribution and then accepts the candidate sample with a probability that depends on the ratio of the posterior at the current candidate sample and the posterior at the sample of the previous iteration.
Metropolis-Hastings algorithm. Algorithm 2 summarizes the Metropolis-Hastings algorithm. Inputs are the likelihood L, the prior density p 0 , a proposal density q, and the number of samples m. The proposal density q is used to draw the next candidate sample. A typical choice for the proposal density is a Gaussian distribution that is centered at the sample of the previous iteration. In each iteration i = 1, . . . , m, a candidate sample z * is drawn from the proposal q(·|z i−1 ) that depends on the sample z i−1 of the previous iteration. The candidate sample z * is accepted z i = z * with the probability α(z * , z i−1 ), which depends on the ratio of the likelihood at the candidate sample z * and the sample z i−1 of the previous iteration. If the candidate sample is rejected, then z i = z i−1 . Algorithm 2 returns the samples z 1 , . . . , z m .
Metropolis-Hastings algorithm in practice. Several techniques are necessary to make the Metropolis-Hastings algorithm practical. For example, the samples generated in the first iterations are usually discarded (burn-in) because they are strongly influenced by the initial starting point z 0 . We refer to the literature [177] for more details and further practical considerations.
Efficiency of MCMC sampling. Once samples z 1 , . . . , z m are drawn with an MCMC algorithm, they can be used to, e.g., estimate the expectation (4.3) as MCMC generates correlated samples z 1 , . . . , z m . The efficiency of MCMC sampling can therefore be measured by the effective sample size for a given computational budget with respect to the estimatorh. To define the effective sample size, consider the samples z 1 , z 2 , z 3 , . . . drawn with an MCMC algorithm and define the integrated autocorrelation time as where ρ j = corr(h(z 1 ), h(z j+1 )) is the correlation between h(z 1 ) and h(z j+1 ). The effective sample size m eff (h) is see [130, p. 125f] for a detailed derivation and further references. This means that there are at least two ways to improve the efficiency of sampling with MCMC [54]: (1) Increase the effective sample size for a given number of MCMC iterations with, e.g., adaptive MCMC [78,79,179,90].
(2) Increase the number of MCMC iterations for a given computational budget, so that more samples can be generated for a given budget with, e.g., two-stage MCMC [44,73].

Two-stage Markov chain Monte
Carlo. Two-stage MCMC methods aim to increase the number of MCMC iterations for a given computational budget. In many applications, the Metropolis-Hastings algorithm, and MCMC in general, requires many iterations to produce an acceptable effective sample size. Each iteration means a likelihood evaluation, which means a high-fidelity model f hi evaluation in the case of the likelihood L as defined in (4.2). Two-stage MCMC methods employ delayed acceptance or delayed rejection strategies that use multifidelity models to reduce the number of samples evaluated using the expensive high-fidelity model.
The work [44,73] proposes a two-stage delayed acceptance MCMC sampling. The candidate sample z * has to be accepted with the likelihood induced by a low-fidelity model first, before z * is passed on to be either accepted or rejected with the likelihood induced by the high-fidelity model.
Multi-stage MCMC methods based on delayed rejection, in contrast to delayed acceptance, have been proposed in [200,144,85]. A candidate sample that is rejected in one stage is retried with a different proposal distribution at a subsequent stage. Ref. [200] suggests using an independence sampler based on a density that is thought to be a good approximation of the posterior density on the first stage. In case the density corresponding to the independence sampler is a poor approximation, a random walk is used on the second stage. Another approach discussed in [200] is to use a proposal based on a local quadratic approximation of the log posterior density, which is restricted to smaller and smaller neighborhoods of the current point in a trustregion fashion. In [85], the proposal at the first stage is a normal distribution. The proposal at the second stage is a normal distribution with the same mean but with a higher variance. In principle, the proposals in the context of delayed rejection MCMC, just as in delayed acceptance MCMC, can also be derived from models with different fidelities and costs (although we are not aware of such implementations in the literature that pre-date the work by [44,73]).
Two-stage delayed acceptance MCMC algorithm. The two-stage MCMC method introduced in [44] is summarized in Algorithm 3. Inputs are the likelihood L hi and L lo , corresponding to the high-and low-fidelity model respectively, the prior density p 0 , the proposal q, and the number of samples m. Consider one of the iterations i = 1, . . . , m. The first stage of Algorithm 3 (lines 4-6) proceeds as the Metropolis-Hastings algorithm with a candidate sample z lo drawn from the proposal distribution, except that the likelihood of the low-fidelity model is used, instead of the likelihood of the high-fidelity model. The result of the first stage is a sample z hi , which either is z hi = z lo (accept) or z hi = z i−1 (reject). In the second stage of the algorithm (lines 7-9), the high-fidelity model is used to either accept or reject the candidate sample z hi of the first stage. If the sample z hi is accepted in the second stage, then the algorithm sets z i = z hi . If the sample z hi is rejected in the second stage, then z i = z i−1 . Note that in case the first stage rejected the sample z lo , i.e., z hi = z i−1 , no high-fidelity model evaluation is necessary in the second stage because the highfidelity model output at z i−1 is available from the previous iteration. The proposal distribution Q in the second stage depends on the likelihood L lo of the low-fidelity model. Note that δ zi−1 is the Dirac mass at z i−1 . Algorithm 3 returns the samples z 1 , . . . , z m .
Efficiency of two-stage delayed acceptance MCMC. The key to the efficiency of Algorithm 3 is that no high-fidelity model evaluation is necessary in the second stage if the candidate sample z lo was rejected at the first stage. Since the rejection rate of MCMC is typically high, many high-fidelity model evaluations are saved by the two-stage MCMC. We refer to [44] for an asymptotic analysis and the convergence properties of the two-stage MCMC. The two-stage MCMC uses a model management based on filtering because only candidate samples accepted with the low-fidelity model are passed on to the high-fidelity model, see Section 2.
Other multifidelity MCMC approaches. In [62], two-stage MCMC is seen as a preconditioned Metropolis-Hastings algorithm. The low-fidelity model in [62] is a coarse-scale model of a high-fidelity multiscale finite volume model. We also mention [101], where multiple MCMC chains from coarse-(low-fidelity) and fine-scale (highfidelity) models are coupled using a product chain. The work [58] couples multilevel Monte Carlo and MCMC to accelerate the estimation of expected values with respect to a posterior distribution. The low-fidelity models form a hierarchy of coarse-grid approximations of the high-fidelity model.

Markov chain Monte
Carlo with adaptive low-fidelity models. In [54] an algorithm for combining high-and low-fidelity models in MCMC sampling Draw candidate z lo from proposal q(·|z i−1 )

5:
Compute acceptance probability Set the candidate sample z hi to Compute acceptance probability Set sample z i to z i = z hi , with probability α hi (z i−1 , z hi ) , z i−1 , with probability 1 − α hi (z i−1 , z hi ) 10: end for 11: return z 1 , . . . , z m 12: end procedure is presented. The low-fidelity model is used in the first step of a two-stage MCMC approach to increase the acceptance rate of candidates in the second step, where the high-fidelity model is used to either accept or reject the sample. Additionally, the high-fidelity model outputs computed in the second step are used to adapt the lowfidelity model. In that sense, the approach uses a model management based on a combination of adaptation and filtering.
The low-fidelity model is a projection-based reduced model in [54], see Section 1.3. The low-fidelity model is constructed in an offline phase from an initial reduced basis. At each MCMC iteration, the low-fidelity model is used in a first stage to generate a certain number of samples with the Metropolis-Hastings algorithm. The goal is to generate so many samples with the low-fidelity model that the initial sample and the last sample are almost uncorrelated. Then, in the second stage of the MCMC iteration, the last sample generated with the low-fidelity model is used as a candidate sample and the acceptance probability is computed using the high-fidelity model. Similarly to the two-stage MCMC methods discussed above, this algorithm aims to increase the acceptance probability at the second stage; however, the high-fidelity model output is used to improve the low-fidelity model after a sample has been accepted, and is not discarded. In that way, the authors of [54] improve the low-fidelity model during the MCMC iterations, and consequently increase the acceptance probability in the second stage of the MCMC iterations. The authors show that under certain technical conditions, the Hellinger distance d Hell (p hi , p lo ) between the posterior distribution p hi corresponding to the high-fidelity model f hi and the approximate posterior distribution p lo induced by the reduced model f lo decreases as the reduced model f lo is improved during the MCMC iterations. Thus, for any > 0, the Hellinger distance between the high-fidelity and low-fidelity posterior distribution can be bounded d Hell (p hi , p lo ) ≤ after a sufficiently large number of MCMC iterations. The resulting low-fidelity model is data-driven because it uses information provided by the observation y obs , rather than only prior information. Using samples from this adaptive two-stage MCMC approach yields unbiased Monte Carlo estimators, see the analysis in [54].

Bayesian estimation of low-fidelity model error.
In [110,111], a multifidelity approach for Bayesian inference is presented that relies on a Bayesian approximate error model of the low-fidelity model derived from the high-fidelity model. The inference is performed with the low-fidelity model but an additional term is introduced that quantifies the error between the low-fidelity and the high-fidelity model. The error term relies on the high-fidelity model and is used to correct for the error introduced by the low-fidelity model. The approach follows the Bayesian paradigm and considers the error term as a random variable, which becomes another noise term in the Bayesian formulation of the inverse problem. Within our categorization, this approach uses a model management based on model adaptation.
In [110,111], the relationship y obs = f hi (z) + is formulated as which can be rewritten as with the error term e(z). If f hi and f lo are linear in z, the sum of the covariance of the error e(z) and noise term can derived explicitly under certain assumptions [110,111]. If f hi and f lo are nonlinear, a Gaussian approximation of the covariance is proposed in [10].
The Bayesian approximate error model is used in [10,117] for a diffuse optical tomography problem, where the low-fidelity model corresponds a coarse-grid finiteelement approximation and the high-fidelity model is a fine-grid approximation. In [197,196], diffuse optical tomography is considered with a simplified-physics lowfidelity model. The error introduced by the low-fidelity model is corrected with the Bayesian approximate error modeling approach. Correcting for errors due to truncation of the computational domain and unknown domain shape are investigated in [125,153]. The Bayesian approximate error model approach was verified on real data in [152]. Extensions to time-dependent problems are presented in [106,105].
We note that estimating the error term e is related to estimating the model inadequacy of the low-fidelity model with respect to the high-fidelity model. Often, Gaussian-process models are used to attempt to correct model inadequacy [114,115,100,157,18]. The corrected low-fidelity model is typically cheap to evaluate, because the Gaussian-process model is cheap, and so the corrected low-fidelity model can be used in MCMC sampling with the aim of decreasing the costs per MCMC iteration compared to using the high-fidelity model. The work [110,111] can be seen as dealing with model inadequacy in the specific context of inverse problems.

Multifidelity model management in optimization.
The goal of optimization is to find an input that leads to an optimal model output with respect to a given objective function. Optimization is typically formulated as an iterative process that requires evaluating a model many times. Using only an expensive high-fidelity model is often computationally prohibitive. We discuss multifidelity methods that leverage low-fidelity models for speeding up the optimization while still resulting in a solution that satisfies the optimality conditions associated with the high-fidelity model. Section 5.1 formalizes the optimization task. Section 5.2 discusses global multifidelity optimization, which searches over the entire feasible domain, and Section 5.3 reviews local multifidelity optimization, which searches for a locally optimal solution, typically in a neighborhood of an initial input.
5.1. Optimization using a single high-fidelity model. We consider the setting of unconstrained optimization. Given is the high-fidelity model f hi : Z → Y, with the d-dimensional input z ∈ Z. The goal is to find an input z * ∈ Z that solves Typically, the optimal input z * is obtained in an iterative way, where a sequence of inputs z 1 , z 2 , . . . is constructed such that this sequence (z i ) converges to the optimal input z * in a certain sense [156].
Local and global optimization. We distinguish between local and global optimization approaches. Global optimization searches over the entire feasible domain Z for a minimizer z * , whereas local optimization terminates when a local optimum is found. Local optimization thus searches in a neighborhood of an initial point z ∈ Z. Global methods typically do not require the gradient of f hi , which may be advantageous in situations where the model is given as a black box and where approximations of the gradient contain significant noise. The use of gradient information in local methods leads to a more efficient search process that typically uses fewer model evaluations and that is more scalable to problems with high-dimensional input. There are also local methods that do not require gradient information.
We note that the multifidelity solution of constrained optimization problems can be achieved using penalty formulations that convert the constrained problem into an unconstrained one, although some multifidelity methods admit more sophisticated ways to handle approximations of the constraints. Local methods can more easily deal with constraints, whereas global methods often fall back to heuristics (see, e.g., the discussion in [133]).

Global multifidelity optimization.
A large class of global multifidelity optimization methods uses adaptation as a model management strategy. These methods search for a minimizer with respect to an adaptively refined low-fidelity model f lo . They guarantee that the resulting optimal solution is also a minimizer of the highfidelity model f hi by the way in which the low-fidelity model is adapted throughout the optimization search, using information from high-fidelity model evaluations. A critical task in this class of global multifidelity optimization methods therefore is to balance between exploitation (i.e., minimizing the current low-fidelity model) and exploration (i.e., adapting the low-fidelity model with information from the high-fidelity model).

Efficient global optimization (EGO)
. EGO with expected improvement is frequently used to balance exploitation and exploration in cases where the low-fidelity model is a kriging model, see [109] and Section 1.3. Let z be the input that minimizes the low-fidelity model f lo at the current iteration. Sampling at a new point z ∈ Z means evaluating the high-fidelity model at z and then adapting the low-fidelity model with information obtained from f hi (z ).
EGO provides a formulation for choosing the new sample point z . It does this by considering the value f hi (z ) of the high-fidelity model at the new sampling point z to be uncertain before the high-fidelity model is evaluated at z . The uncertainty in f hi (z ) is modeled as a Gaussian random variable Y with mean and standard deviation given by the kriging (low-fidelity) model, see Section 1.3. The improvement at point z is then given by I(z ) = max{f hi (z) − Y, 0}, and the expected improvement is The expected improvement at a point z can be efficiently computed by exploiting the cheap computation of the MSE (and therefore standard deviation) of kriging models [72,109]. The optimization process is then to start with an initial kriging model, find a new input that maximizes the expected improvement, evaluate the high-fidelity model at this new input, update the kriging model, and iterate. The optimization loop is typically stopped when the expected improvement is less than some small positive number [108].
EGO can be made globally convergent and does not require high-fidelity model derivatives [108]. However, as discussed in [108,133], EGO is sensitive to the initial points used for building the kriging model, which means that first a fairly exhaustive search around the initial points might be necessary before a more global search begins. The reason for this behavior is that EGO proceeds in two steps. In the first step, a kriging model is learned, and in the second step, the kriging model is used as if it were correct to determine the next point. We refer to [108,72] for details and possible improvements. The work in [173], builds on EGO and develops a concept of expected improvement for multiobjective optimization that considers improvement and risk when selecting a new point. We also mention [40], where EGO is equipped with an adaptive target method that learns from previous iterations how much improvement to expect in the next iteration.

5.2.2.
Other approaches to combine multifidelity models in the context of global optimization. In the work [3], information from multiple kriging models is fused. Each model in the multifidelity hierarchy is approximated by a kriging model, and the random variables representing the kriging models are fused following the technique introduced in [213]. The authors of [81] propose to use a weighted average of an ensemble of low-fidelity models in an optimization framework. The weights are derived from the errors of the low-fidelity models.
In [84], Gaussian-process low-fidelity models are adapted via active learning, instead of EGO. The approach samples the input space and efficiently fits local Gaussian-process models. These local models are obtained via the treed Gaussianprocess implementation [83] that partitions the input space into several regions and builds separate Gaussian-process models in each region. The approach in [84] is particularly suited to the case where sampling the high-fidelity model entails large-scale, parallel computations. In [82], Gaussian-process models and expected improvement are developed for constrained optimization. The augmented Lagrangian is used to reduce the constrained optimization problem into an unconstrained one, for which Gaussian-process models can be derived.
There are multifidelity optimization methods based on pattern search [104,201] that use filtering as model management strategy. For example, the multifidelity pattern search algorithm presented in [24] uses low-fidelity models to provide additional search directions, while preserving the convergence to a minimizer of the high-fidelity model. In [193], a Bayesian approach to pattern search with Gaussian-process models is presented. The low-fidelity model is used to guide the pattern search. We also refer to [172] for a discussion of pattern search algorithms with low-fidelity models.

Local multifidelity optimization.
Local multifidelity optimization methods in the literature typically use adaptation as a model management strategy. One class of approaches uses direct adaptation of a low-fidelity model, using high-fidelity evaluations that are computed as the optimization proceeds. In each optimization iteration, the high-fidelity model is evaluated at the minimizer of the low-fidelity model. The corresponding high-fidelity model output is then used to adapt the low-fidelity model. Convergence to the minimizer of the high-fidelity model can be guaranteed in limited situations, see [15,172,72] for details. Another class of approaches performs optimization on a corrected low-fidelity model, where the corrections are computed from high-fidelity model evaluations as the optimization proceeds. These latter approaches typically use a trust-region framework to manage the corrections, which we discuss in more detail now.

Multifidelity trust-region methods.
A classical way of exploiting a low-fidelity model in an optimization framework is to optimize over the low-fidelity model in a trust region-that is, to solve an optimization subproblem using the lowfidelity model, but to restrict the size of the step to lie within the trust region. A classical example is to derive a quadratic approximation of the high-fidelity model with its gradient and Hessian at the center of the trust region. The size of the trust region is then determined depending on the approximation quality of the quadratic approximation. We refer to [46] for an introduction, theory, and implementation of these classical trust region methods. Early formulations of trust region ideas can be found in the papers by Levenberg [126] and Marquardt [136], with the papers by Powell [168,169,170] providing the important contribution of establishing and proving convergence of trust-region algorithms.
Classical trust-region methods use quadratic approximations of the objective in the trust region. The work [1] established a multifidelity trust region framework for more general low-fidelity models. In particular, [1] formulates a first-order consistency requirement on the low-fidelity model. The first-order consistency requirement is that the low-and the high-fidelity model have equal value and gradient at the center of the trust region. This consistency requirement ensures that the resulting multifidelity optimization algorithm converges to an optimal solution of the high-fidelity model.

2:
Choose a starting point z 1 and initial step size ∆ 1 > 0 3: Construct low-fidelity model f Update point Compare actual and estimated decrease in f hi Update step size end for 10: return z 1 , z 2 , z 3 , . . . 11: end procedure the loop, an update to the current point is proposed and the trust-region is either expanded or contracted. Note that we have omitted stopping criteria in Algorithm 4 for the ease of exposition. In each iteration i = 1, 2, 3, . . . , a low-fidelity model f (i) lo is constructed so that it satisfies the first-order consistency requirement. Thus, the low-fidelity model output f  lo (z i ) = ∇f hi (z i ) equals the gradient of the high-fidelity model. It is shown in [1] that with an additive or multiplicative correction an arbitrary low-fidelity model can be adjusted to satisfy the first-order consistency requirement. The first-order consistency requirement can also be established with the scaling approach introduced in [94]. Then, the step s i is computed and accepted if the point z i + s i decreases the high-fidelity model f hi with respect to the current point z i , i.e., if f hi (z i + s i ) < f hi (z i ). The size of the trust region is updated depending on the ratio of the actual decrease f hi (z i ) − f hi (z i + s i ) to the estimated decrease f hi (z i ) − f (i) lo (z i + s i ). The parameters γ 1 and γ 2 control when to shrink and when to expand the trust region, respectively. The parameters η 1 and η 2 define the size of the contracted and expanded trust region, respectively. The algorithm returns the points z 1 , z 2 , z 3 , . . . .

5.3.3.
Generalized multifidelity trust-region methods. In Algorithm 4, the low-fidelity model is corrected in each iteration. In [8,66], the trust-region POD (TRPOD) is proposed, which uses a projection-based POD reduced model in conjunction with a trust-region optimization framework and adapts the POD reduced model in each optimization iteration. Similarly, in [20,218], projection-based models are adapted within similar trust region frameworks. The work [217] uses error bounds for interpolatory reduced models to define the trust region and refinement of the reduced models to guarantee convergence to the optimality conditions associated with the high-fidelity model. In [68], a correction model is used to correct low-fidelity model outputs in a trust-region model management optimization scheme. The correction model is constructed in the trust region and utilized to calculate additive and multiplicative adjustments to the low-fidelity model. The work [180] develops a trust-region multifidelity approach for combining low-and high-fidelity models that are defined over different design spaces. For example, the number of design variables used with the low-fidelity model can be different than the number of design variables with the high-fidelity model. Space mappings are derived that map between the design spaces. The space mappings are corrected to obtain a provably convergent method.
If gradients are unavailable, or too costly to approximate, then the framework developed by [1] relying on the first-order consistency of the low-fidelity model cannot be applied. In [134], a gradient-free multifidelity trust-region framework with radial basis function interpolants as low-fidelity models is introduced, building on the gradientfree trust-region methods of [47,48] and the tailored radial basis function modeling of [212]. This leads to an error interpolation that makes the low-fidelity model satisfy the sufficient condition to prove convergence to a minimizer of the high-fidelity model.

Multifidelity optimization under uncertainty.
In their simplest forms, optimization under uncertainty formulations consider a similar problem to (5.1), but where now the objective function f hi incorporates one or more statistics that in turn depend on underlying uncertainty in the problem. For example, a common objective function is a weighted sum of expected performance and performance standard deviation. Thus, each optimization iteration embeds an uncertainty quantification loop (e.g., Monte Carlo sampling or stochastic collocation) over the uncertain parameters. [150] uses the control variate-based multifidelity Monte Carlo method of Section 3.2 to derive a multifidelity optimization under uncertainty method that provides substantial reductions in computational cost using a variety of low-fidelity model types. In [151], it is shown that evaluations computed at previous optimization iterations form an effective and readily-available low-fidelity model that can be exploited by this control-variate formulation in the optimization under uncertainty setting. The work in [189] uses a combination of polynomial chaos stochastic expansions and corrections based on coarse-grid approximations to formulate a multifidelity robust optimization approach.
We highlight optimization under uncertainty as an important target area for future multifidelity methods. It is a computationally demanding process, but one with critical importance to many areas, such as engineering design.
6. Conclusions and outlook. As can be seen from the vast literature surveyed in this paper, multifidelity methods have begun to have impact across diverse outer-loop applications in computational science and engineering. Multifidelity methods have been used for more than two decades to accelerate solution of optimization problems. Their application in uncertainty quantification is more recent, but appears to offer even more opportunity for computational speedups, due to the heavy computational burden typically associated with uncertainty quantification tasks such as Monte Carlo and MCMC sampling. This paper highlights the broad range of multifidelity approaches that exist in the literature. These approaches span many types of low-fidelity models as well as many specific strategies for achieving the multifidelity model management. We attempt to bring some perspective on the similarities and differences across methods, as well as their relative advantages, by categorizing methods that share a common theoretical foundation. We discuss methods to create low-fidelity models according to the three areas of simplified models, projection-based models, and data-fit models. We categorize multifidelity model management methods as being based on adaptation, fusion, and filtering. In most settings, one can flexibly choose a combination of model management strategy and low-fidelity model type, although-as is always the case in computational modeling-bringing to bear knowledge of the problem structure helps to make effective decisions on the low-fidelity modeling and multifidelity model management strategies that are best suited to the problem at hand. We note that this paper focused on outer-loop applications and therefore mentioned only briefly multifidelity methods that do not directly exploit the outer-loop setting.
Multifidelity methods have advanced considerably, especially in the past decade. Yet a number of important challenges remain. In almost all existing multifidelity methods (and in the presentation of material in this paper), the assumption is that the high-fidelity model represents some "truth." This ignores the important fact that the output of the high-fidelity model is itself an approximation of reality. Even in the absence of uncertainties in the inputs, all models, including the high-fidelity model, are inadequate [114]. Furthermore, the relationships among different models may be much richer than the linear hierarchy assumed by existing multifidelity methods. One way multifidelity methods can account for model inadequacy is by fusing outputs from multiple models. Model inadequacy is often quantified by probabilities, which describe the belief that a model yields the true output. Techniques for assigning model probabilities reach from expert opinion [219,176] to statistical information criteria [129,34] to quantifying the difference between experimental data and model outputs [158]. The probabilities are then used to fuse the model outputs with, e.g., Bayesian model averaging [124] or adjustment factors [176]. The fundamental work by Kennedy and O'Hagan [114] treats model inadequacy in a Bayesian setting and led to a series of papers on correcting model inadequacy and validating models [115,100,157,18]. The work [18] introduces a six-step process to validate models. In [3], Bayesian estimation is employed to fuse model outputs together. Incorporating such approaches into multifidelity model management strategies for outer loop applications remains an important open research challenge.
Another important challenge, already discussed briefly in this paper, is to move beyond methods that focus exclusively on models, so that decision-makers can draw on a broader range of available information sources. Again, some of the foundations exist in the statistical literature, such as [113] which derives models by fusing multiple information sources such as experiments, expert opinions, lookup tables, and computational models. In drawing on various information sources, multifidelity model management strategies must be expanded to address not just which information source to evaluate when, but also where (i.e., at what inputs) to evaluate the information source. Relevant foundations to address this challenge include the experimental design literature and value of information analysis [167].