Emulation of multivariate simulators using thin-plate splines with application to atmospheric dispersion

It is often desirable to build a statistical emulator of a complex computer simulator in order to perform analysis which would otherwise be computationally infeasible. We propose methodology to model multivariate output from a computer simulator taking into account output structure in the responses. The utility of this approach is demonstrated by applying it to a chemical and biological hazard prediction model. Predicting the hazard area that results from an accidental or deliberate chemical or biological release is imperative in civil and military planning and also in emergency response. The hazard area resulting from such a release is highly structured in space and we therefore propose the use of a thin-plate spline to capture the spatial structure and fit a Gaussian process emulator to the coefficients of the resultant basis functions. We compare and contrast four different techniques for emulating multivariate output: dimension-reduction using (i) a fully Bayesian approach with a principal component basis, (ii) a fully Bayesian approach with a thin-plate spline basis, assuming that the basis coefficients are independent, and (iii) a"plug-in"Bayesian approach with a thin-plate spline basis and a separable covariance structure; and (iv) a functional data modeling approach using a tensor-product (separable) Gaussian process. We develop methodology for the two thin-plate spline emulators and demonstrate that these emulators significantly outperform the principal component emulator. Further, the separable thin-plate spline emulator, which accounts for the dependence between basis coefficients, provides substantially more realistic quantification of uncertainty, and is also computationally more tractable, allowing fast emulation. For high resolution output data, it also offers substantial predictive and computational advantages over the tensor-product Gaussian process emulator.

1. Introduction. The simulation of scientific and engineering systems via complex mathematical models has become a common method of gaining knowledge about processes where physical experimentation is infeasible or unaffordable. Encapsulated in computer codes or simulators, many of these models require substantial computing time to evaluate the response for a given set of inputs. For even moderately expensive simulators, the computational resources required to perform, for example, Monte Carlo inference may be prohibitive in practice. Hence, building an emulator or surrogate for the computer model trained on a-usually small-set of simulator evaluations has become standard practice; see, for example, the seminal paper of Sacks et al. [29], Kennedy, Anderson, and O'Hagan [19], who presented a number of case studies of such computer experiments, and the book-length treatments of Santner, Williams, and Notz [30], Fang, Li, and Sudjianto [9], and Forrester, Sobester, and Keane [10]. Computationally cheap emulators allow for real-time decision making and greater scientific understanding through, for example, sensitivity and uncertainty analyses.
Statistical modeling is a common method for constructing emulators. Essentially, simulator output is treated as a realization of a stochastic process, and regression models are fitted to the data in order to approximate the relationship between the simulator inputs and the outputs. The most common emulator is the Gaussian process (see, e.g., [25]), a smooth nonparametric interpolator. An emulator based on a statistical model allows for prediction of the simulator at untested inputs and quantification of the associated uncertainty. For deterministic simulators, as considered in this paper, this uncertainty is a result of incomplete knowledge of the simulator across the whole input space and approximation error from the regression model.
Modern applications increasingly involve highly multivariate simulators, with each run of the simulator producing data from a curve, surface, or other high-dimensional output structure. The standard approach is to perform dimension reduction on the multivariate output using a set of appropriate basis functions, and then use scalar emulation methods, such as the Gaussian process, on the basis coefficients. We delay our discussion of the related statistical literature to section 2.
Motivated by a simulator of chemical and biological dispersion, the contribution of this paper is to propose the use of thin-plate splines as basis functions for multivariate data with spatial structure, and to develop the necessary methodology for their application. For a twodimensional output, a thin-plate spline basis provides a spatial mapping using the proximity of data to a set of knots (see section 3). Splines have also recently been employed for computer experiment modeling [6] as an alternative to Gaussian process models. We implement both a fully Bayesian thin-plate spline emulator using Markov chain Monte Carlo and also a "plug-in" emulator (see, e.g., [20]) with a separable covariance structure [27] and correlation parameters estimated using a validation data set. We provide a detailed comparison of these competing methodologies with the application of a functional data model using a Gaussian process defined on both the input and output domains.
In this paper, the problem of emulating a multivariate simulator built from a Gaussian puff model is considered. This model accounts for both the effects of an urban environment and the underlying terrain features. For each run of the simulator, the response of interest is the dosage, defined as the integrated concentration over time, measured at a large number of points in a two-dimensional geographical domain. Inputs to the simulator include meteorological variables (such as wind speed and direction) and variables related to the source of the dispersion (such as location and size of release). While relatively fast (∼ 1 minute per run for a complete spatial output grid) and simple to compute, the dispersion model is employed in the optimization of sensor placements. Therefore computationally inexpensive surrogates are required that can provide accurate high-resolution prediction on the spatial output grid.
Previously, a number of authors have considered the univariate problem of calibrating (relatively simple) dispersion models using actual dispersion data obtained under a single, but uncertain, set of meteorological and source conditions; see, for example, [31], [20], [23], and [26]. Typically, the aim of such work is to solve the inverse problem of identifying unknown features of the source.
The remainder of this paper is organized as follows. In section 2, we introduce and describe the multivariate emulator, dimension-reduction techniques for multivariate outputs, and separable covariance structures. The necessary methods for thin-plate spline emulation are developed in section 3 and applied to an illustrative dispersion model in section 4. In that section, we also compare our methodology to applying dimension reduction via principal components and to a tensor-product, separable Gaussian process model. We explore the effectiveness of the methodology for high-resolution prediction. In section 5, we compare dimension-reduction and functional data modeling on an artificial environmental example. We conclude in section 6 with some discussion and areas for future research. The appendices provide mathematical and computational details of the methods.

Multivariate emulation. Let
be the vector of input values at which the ith run of the simulator is performed, i.e., the ith input point (i = 1, . . . , n), and let Y ( T be the vectorized output from this run. The vector s j = (s 1j , . . . , s qj ) T ∈ S locates the jth output in the q-dimensional output domain S ⊆ R q (j = 1, . . . , r). For example, for a two-dimensional simulator, q = 2 and s j = (s 1j , s 2j ) T , which may be the geographical coordinates of the response. We denote the complete output grid as an r × q matrix S = (s 1 , . . . , s r ) T .
We achieve dimension reduction through the assumption of a linear model for the response: In (2.1), a 1 (S), . . . , a p (S) are a set of r × 1 basis vectors which are assumed independent of x but which may depend on the output grid S, the coefficients β(x) = [β 1 (x), . . . , β p (x)] T are functions of the input variables x, and e i is an r-vector of errors resulting from the basis function approximation. Figure 1 summarizes this hierarchical model and demonstrates the dependence of β(x 1 ), . . . , β(x n ) on x 1 , . . . , x n , and of a 1 , . . . , a p on s 1 , . . . , s r . In our motivating application, interest is only in prediction at the s j , and hence for each output location we mean-center and standardize the data [Y (x 1 , s j ), . . . , Y (x n , s j )] to have variance equal to one. We complete the hierarchical specification of the model through choice of prior distributions. For β k (x), we assume the impact of the input variables is modeled using a Gaussian process, where GP denotes a Gaussian process and ρ(x, x ; θ) is a correlation function dependent on input vectors x and x . Scale and correlation length hyperparameters are labeled τ k and θ k = (θ 1k , . . . , θ dk ) T , respectively. We discuss the choice of joint distribution of β k (x) and β k (x ) below. For e i we assume with I r the r × r identity matrix and σ 2 > 0. The choice of an independent multivariate prior distribution for e i assumes that the regression function p k=1 a k (S)β k (x i ) is detailed enough to capture the vast majority of the variation in the data. The adequacy of this assumption for a particular application can be checked via standard residual diagnostics.
To specify the joint distribution of β k (x) and β k (x ), k, k = 1, . . . , p, we consider two simplifying cases.
(i) Independence of β k (x) and β k (x ) for k = k , that is, Then the block diagonal variance-covariance matrix of β is given by with τ W being the n × n common variance-covariance matrix for β k with ijth entry τ ρ(x i , x j ; θ), and V a p×p scale matrix for β( The exact form of φ and construction of V depend on the basis chosen; we discuss the choice of V for our thin-plate spline emulators in section 3. Point prediction of the response at a untried input point, x , is via whereβ k (x ) is an appropriate summary of the posterior distribution of β k (x ). There is no conjugate prior distribution available for unknown θ, and so to obtain the full marginal distribution, numerical methods must be used, such as Markov chain Monte Carlo (MCMC; [22,Chap. 10]) with a sample from the posterior predictive distribution obtained by plugging samples from β k (x )|Y into (2.6). Alternatively, a maximum a posteriori (MAP) or other estimator of θ can be substituted into the conditional posterior distribution of β k (x * ).
Similar models have been developed and applied by a variety of authors. Campbell, McKay, and Williams [5] considered a variety of basis functions for one-dimensional functional responses, including orthogonal polynomials and data-adaptive choices such as principal components and partial least squares. A wavelet basis was used by Bayarri et al. [1] in a calibration problem with a functional response. Higdon et al. [16] used a principal components basis for an example of cylinder deformation from the Manhattan project. These latter authors had a two-dimensional output grid, consisting of angles and times, and were again concerned with model calibration.
Principal component analysis (PCA; [17]) is a dimension-reduction technique that has been commonly used in the literature for defining a new, orthogonal basis for a set of multivariate data. For an r × n matrix Y = [Y (x 1 ), . . . , Y (x n )], the principal components are defined through the singular value decomposition of Y = ΛΣΩ T , where Λ is an r × n orthogonal matrix whose columns are the left singular vectors of Y , Σ is an n × n diagonal matrix holding the singular values of Y , and Ω is an n × n orthonormal matrix whose columns are the right singular vectors. A p-dimensional principal component basis a 1 , . . . , a p is given by the first p columns of n −1/2 ΛΣ, with corresponding weights β k (x i ) given by entry (k, i) in n 1/2 Ω (k = 1, . . . , p; i = 1, . . . , n). The basis functions are orthogonal and have the property of defining subspaces with the largest variance; see [15,Chap. 3]. The data-dependent nature of the principal components provides a flexible nonparametric modeling approach. However, it does not take the structure of the output grid S into account.

Thin-plate regression splines.
To provide a set of flexible and data-driven basis vectors a 1 (S), . . . , a p (S) that maintain the multidimensional spatial structure inherent in S, we use thin-plate regression splines (TPRSs) [33]. To define the TPRS basis, we start with the r × r matrix E having uvth entry η lq (||s u − s v ||), with || · || defined as Euclidean distance and with 2l > d controlling the smoothness of the spline (l = 2 corresponds to a smoothness penalty in terms of second derivatives of the response function, and is adopted in this paper). A thin-plate spline for the ith simulator run is then the solution to with respect to δ i and α i for λ i ≥ 0 (i = 1, . . . , n). The matrix T holds basis vectors corresponding to orthogonal polynomials in R q of degree less than l. In our applications with l = 2 and a regular lattice output grid, T is an r × 3 matrix with jth row [1 | s T j ]. The thin-plate spline is therefore the solution to a penalized least squares problem.
To avoid problems of choosing "knot locations" (e.g., a subsample of the output grid) in selecting a regression basis, Wood [33] defined a TPRS basis as a rank-m approximation to the spline, with basis vectors given by the columns of U m D m Z m and T , where D m is an m × m diagonal matrix holding the m largest eigenvalues of E ordered by absolute value, U m is an r × m matrix holding the corresponding eigenvectors, and Z m is an m-dimensional orthogonal column basis such that T T U m Z m = 0. That is, an approximation to problem (3.1) with λ i = 0 (i.e., unpenalized) is given by the solution to (3.2). We require a constant scale matrix for β(x i ) for all x i in order to construct the separable variance-covariance matrix (2.5) for the complete parameter vector β and to allow prediction for untried input points. Hence, rather than derive the variance- , a spatial correlation function defined on the output locations. It then follows that Cov{β m (

Application to the dispersion simulator.
In this section, the methodology from sections 2 and 3 is applied to an exemplar dispersion simulator. As described in section 1, the response of interest is dosage (integrated concentration over time). The resultant dosage map is known as a hazard prediction and forms the basis for many modeling systems designed to mitigate hazard such as source-term estimators [26], sensor placement optimizers, and training scenarios. The motivating system for this paper is a sensor placement tool which requires rapid hazard predictions over which the probability of detection of a chemical or biological release is numerically maximized via the placement of a set of sensors. For other applications, the methodology could be applied to emulate concentration at a given time, or time could be included as an extra dimension to either the input or output (cf. [8]).
In the application of interest, it is key that (i) a hazard prediction can be made rapidly to assess performance of a sensor grid against a particular threat in real time, and (ii) there is accurate prediction of the dosage on a common high-resolution spatial output grid for untried values of the input variables. Our data comes from the Urban Dispersion Model (UDM), an example of a Gaussian puff model. In these simulators, continuous releases are modeled as a serious of instantaneous releases ("puffs") with Gaussian profiles in the "x" and "y" planes (forming a bell shape). The UDM can model dispersion across complex terrain such as urban areas by including the effect of buildings, in addition to underlying terrain (for example, hills or valleys), by tracking the interaction of individual puffs with local geographical features.
There are numerous inputs to a dispersion simulator, including those related to the release (see, e.g., [4]). For this application, the input space is limited to the d = 2 dimensions which, from previous experience, are known to have the greatest effect on dosage: mass of the release (0 ≤ x 1 ≤ 50kg) and wind speed (5 ≤ x 2 ≤ 30m/s). For a given terrain, other variables known to affect dosage, such as wind direction and location of release, can be specified postsimulation through rotation and translation operations. Typical simulator responses exhibit a very rapid drop in dosage upwind of the release and a complex downwind gradual reduction in dosage. There is also different behavior in the downwind ("x") and crosswind ("y") directions. This behavior is not well described by standard exponential functions; see Figure 2.
Our data results from simulator runs on a "uniform" urban area representative of a medium-rise city environment outside of the central business district. Accurate high-resolution spatial prediction is important for both comparing competing sensor placements in the optimization tool and ensuring that assessments of chosen sensor placements are realistic summaries of actual system performance. Errors of only a few tens of meters in sensor placement, perhaps caused by small prediction errors in the modeling, can result in large discrepancies in predicted dosage and therefore lead to substantial underperformance of a sensor system against a specific threat. Any emulator must therefore be capable of producing rapid predictions on a high-resolution spatial grid, and even small improvements in predictive performance are practically valuable.
For the ith run, dosage is output on a k × k grid (i = 1, . . . , n). Hence, s j = (s 1j , s 2j ) T locates the jth (standardized) geographical position at which dosage is calculated (j = 1, . . . , r). The output grid is common to all runs; if this were not the case, linear interpolation could be used to map the outputs from different simulator runs to a common output grid. The vector Y (x i ) holds the r = k 2 dosages for the ith run; we in fact model log (Y (x i ) + 1). The simulator input values are held in We apply and compare four emulation approaches: 1. A fully Bayesian approach using a principal components basis (PC-GP emulator). 2. A fully Bayesian approach using a TPRS basis and assuming independence of the elements of β(x i ) (cf. (2.4); iTPRS-GP emulator). 3. An empirical Bayes approach using a TPRS basis and assuming a separable covariance structure for β (cf. (2.5); sTPRS-GP emulator). 4. An empirical Bayes approach using a Gaussian process with a separable covariance structure defined on X × S (sGP emulator). For emulators 1 and 2, the posterior predictive distribution for Y is obtained using MCMC with convergence assessed graphically using diagnostic plots. For these emulators, we also assume V = I p ; that is, the elements of β(x i ) are assumed independent. While for emulator 1 this assumption has a heuristic justification via adoption of an orthogonal PC basis, we acknowledge that for emulator 2 it may lead to overconfident prediction intervals and an overestimate of the effective sample size. However, it substantially reduces the computational burden of the MCMC algorithm and provides a benchmark for emulators 3 and 4. Here, the correlation length parameters θ and ν are chosen using validation simulation runs and substituted into the conditional posterior predictive density (see also [27] and [28]).
Emulator 4 models the simulator output directly as a function of both x and s, that is, The assumption of a separable correlation structure results in a tensor-product variance- . See [27] for a detailed discussion of this model and extensions.

Choice of correlation function.
In this paper, we use a squared exponential correlation function [16], with parameterization Assuming a standardized wind direction, which can be adjusted post-simulation, the main variation in the response will be axially aligned, making (4.1) a reasonable choice. A fully Bayesian approach requires a prior distribution for each θ j , and we assume common Beta prior densities with π(θ k ) ∝ θ a θ −1 A nugget term was added to improve the conditioning of the variance-covariance matrix of β. Clearly, alternative correlation functions could be employed; we also tried the Matérn correlation function but found for this application it led to less accurate results and did not improve numerical stability of the modeling. The addition of a nugget is a pragmatic method for reducing the sensitivity of the modeling results to underlying assumptions, including the choice of covariance function [12].

Design of the simulator runs.
The emulators were trained using n = 80 simulator runs generated as a maximin Latin hypercube sample [21]. A further 75 simulator runs were generated randomly as a Monte Carlo sample, with 40 of these runs used for validation and choice of some prior hyperparameters. The remaining 35 runs were used as test cases to assess emulator accuracy. Initially, all simulator runs were generated for a low-resolution k = 8 regular lattice in S with no boundary points. We assess the more promising emulators on a high-resolution output grid in section 4.6.

Model fitting and choice of hyperparameters.
Bayesian inference for the PC-GP and iTPRS-GP emulators proceeded using MCMC and the likelihoods derived by Higdon et al. [16] (for PC-GP) and in Appendix B (for iTPRS-GP). To reduce computational complexity for the iTPRS-GP emulator, we applied the Woodbury matrix identity (see, e.g., [13,Chap. 8]) to update the inverse and determinant of the expanded model matrix within the MCMC iterations. This replaces the inversion of an np × np square matrix with the inversion of an n × n matrix when sampling each of the nugget, τ k , and θ k . An outline of this implementation is given in Appendix C. The MCMC was run for 10000 iterations with a burn-in of 1000, which produced acceptable trace plots with no bias from starting location. No thinning is performed, as the trace plots showed the chains mixing well. Specification of the prior hyperparameters for the common Gamma distributions for each τ −1 k is simplified by the standardization of the simulator output, which leads to an expectation of the variance of each β k (x) being close to one. Hence, we chose an informative Gamma(5, 0.2) prior distribution, with density f (τ −1 ) ∝ τ −4 exp (−5/τ ), again following [16].
Other hyperparameters for all four emulators, including the numbers of regression basis functions in (2.1), were chosen to minimize the root mean squared error (RMSE) of prediction on the validation runs. Alternatives include choosing the hyperparameters to maximize the integrated likelihood, but use of the RMSE is consistent with our goal of accurate prediction for untried x. For all the dimension-reduced emulators (PC-GP, iTPRS-GP, and sTPRS-GP), the number of basis functions p was the main determinant of RMSE. The choice of p controls the trade-off between detailed modeling of the training data and the generalization to untried cases. Increasing p also increases the computational expense of the model fitting. For all three dimension-reduced emulators, we found that relatively small values of p minimized the RMSE.
For the PC-GP and iTPRS emulators, these hyperparameters consisted of a θ and b θ , common to the independent Beta prior distributions assumed for the entries of θ k (with density , a σ and b σ for the Gamma distribution assumed for σ −2 , and the numbers of principal components (PC-GP) and thin-plate spline basis functions (iTPRS-GP). We choose a σ = 2 to define a Gamma distribution with lim z→0 + f (z) = 0 and lim z→0 + f (z) = b −2 σ , so that larger values of b σ result in higher prior density for larger values of σ −2 . In general, RMSE was lowest for lower values of b σ and also smaller numbers of basis functions. Various shapes of Beta prior density were tested, and the results are fairly robust to the choices of a θ and b θ . For both emulators, we chose b σ = 0.01 and a θ = 1; for the PC-GP emulator, p = 3 and b θ = 3, and for the iTPRS-GP emulator, p = 5 and b θ = 0.1. Several similar hyperparameter settings produced comparable RMSEs.
For the sTPRS-GP and sGP emulators, correlation parameters θ = (θ 1 , θ 2 ) T and ν = (ν 1 , ν 2 ) T are chosen using the validation runs, together with hyperparameters a τ and b τ for the Gamma prior distribution on τ . The numerical results indicated that the RMSE was robust to the choice of a τ and b τ , and a Gamma(1,1) prior distribution was chosen, giving near linearly decreasing support between 0 and 1. For the sTPRS-GP emulator, the minimum RMSE was achieved with p = 5 and θ = ν = (0.05, 0.05) T . For the sGP emulator, the minimum RMSE occurred with θ = (0.5, 0.65) T and ν = (0.8, 0.4) T . For these two emulators, σ 2 is estimated from the mean squared error on the test data.

Comparison of emulators.
Each of the four emulators was used to predict the logged dosage for the 35 test runs using the posterior predictive mean. All four emulators produced similar spatial dispersion to the true simulator, illustrated in Figure 3. The PC-GP emulator has generally higher RMSE across all of the test runs; see Figure 4. The median RMSE for the PC-GP emulator (3.70) is considerably greater than the upper quartile of the iTPRS-GP (2.28) or sTPRS-GP (1.85) emulators. The sTPRS-GP emulator performs slightly better than the iTPRS-GP emulator, with lower quartiles of 1.07 for sTPRS-GP and 1.35 for iTPRS-GP. These differences demonstrate the advantage of modeling the within-run correlation between basis functions. For this low-resolution spatial grid, the sGP emulator shows better performance than any of the other emulators, with lower and upper quartiles of 0.19 and 0.61, respectively. In section 4.6, we investigate if the sGP emulator maintains this advantage over the sTPRS-GP emulator for a high-resolution output grid where the sGP emulator can only be trained on a subset of the output data.

Uncertainty quantification.
The Bayesian approach to emulation allows the uncertainty associated with predictions of the simulator to be quantified and assessed. Realistic and appropriate uncertainty quantification is a key determinant of a good emulator. Prediction uncertainty for the PC-GP and iTPRS-GP emulators can be obtained from the MCMC samples. For the sTPRS emulator, we use (2.6) with the conditional posterior distribution for β and an additive error term corresponding to the approximation error (2.3). A similar approach is adopted for the sGP emulator [27].
To demonstrate the predicted uncertainty from the four emulators of the dispersion simulation, in Figure 5 we present prediction intervals (posterior predictive mean ±3 posterior predictive standard errors) for a single test run for each of the PC-GP, iTPRS-GP, sTPRS-GP, and sGP emulators; other test runs produce similar results. The sample points in these figures are ordered by row, taking every 16th row and column from Figure 2. They can be related to the output grid in Figure 2 by considering each set of 8 consecutive points as a horizontal transect across the grid. The locations of the points in the output space S are illustrated in   We also calculate the frequentist coverage of these prediction intervals using the test data; we would expect coverage of around 99%. From Figure 5, it is clear that excluding within-run correlation in the iTPRS-GP emulator has led to substantial underestimation of the uncertainty (coverage of 28%), with almost all observed responses lying outside the (very narrow) prediction intervals. The MCMC chains had converged and sufficiently explored the parameter space, and hence the underestimation of the uncertainty appears to be a consequence of modeling assumptions. The other emulators give more realistic assessment of uncertainty, with coverages of 83% (PC-GP), 98% (sTPRS-GP), and 100% (sGP). The much wider predictive intervals for the sGP emulator plausibly result from the larger effective number of parameters in this emulator (with separate, dependent, Gaussian process models for each output point).

Prediction on a high-resolution output grid.
In our example simulation, the output grid covers a 10km × 10km spatial area. The low-resolution output grid considered up to Logged Dosage this point can only accurately assess the performance of sensor locations at around a 1km resolution. For application in a sensor placement tool, much higher resolution (∼ 10m) is required. To assess the performance of the sTPRS-GP and sGP emulators at a high resolution, we define a 63 × 63 regular lattice as an output grid (with no boundary points) and attempt to predict the output for untried x on this much finer scale. We choose these two emulators because of their superior performance on the low-resolution output grid. For this size output grid (r = 3639), it is no longer computationally feasible to train the sGP emulator using the whole output grid in realistic time (it requires multiple inversions of a 3696 × 3696 correlation matrix). However, as the sGP is a model for functional data, instead we train the emulator using an 8 × 8 subgrid (actually the same output grid as in the previous sections) and then predict at the interim locations. The sTPRS-GP emulator, however, performs dimension reduction on the output grid and hence can be trained using much larger data sets, with the basis functions defined using the complete output grid. The benefit of this method is twofold: prediction is actually computationally less expensive, and all of the output data can be used, providing more accurate prediction across the whole output domain S. The RMSEs of the two approaches are shown in Figure 7.
It is clear from this figure that the sTPRS emulator provides substantially more accurate predictions than the sGP emulator. An important advantage here of the sTPRS-GP emulator is that it does not need to use an estimated correlation structure on the output grid for prediction, unlike the sGP emulator. When the output grid displays nonstationary correlation, as with this dispersion example, the sGP emulator struggles to accurately predict at interim  locations; see Figure 8 for an illustration. In general, for detailed output domains, e.g., urban locations, using a low-resolution output grid to train the sGP emulator could result in missing important details in the response.

Artificial example.
The dispersion example demonstrated the effectiveness of the sTPRS-GP emulator for prediction on dense grids with changing spatial variation. In this section, we use an artificial environmental example to further compare the sTPRS-GP and sGP emulators.
The example is a modified version of a simulator that has previously been used to demonstrate calibration methodology [3]. The simulator models a chemical pollutant spill at two locations into a long, narrow holding channel. We assume the location and time of both spills is known and fixed but that the mass and diffusion rate at each spill location may be inputs to the simulator. For the pollutant concentration at location s 1 and time s 2 (so q = 2), the simulator has the simple closed form where x 1 , x 2 and x 3 , x 4 are the mass and diffusion rate of spilled pollutant at spills 1 and 2, respectively. The first spill is at (location, time) = (0, 0); the second is at (L, τ ) = (1.505, 30.1525). We generated data from simulator (5.1) for four scenarios, corresponding to d = 1, . . . , 4 variables, starting with varying just the first mass x 1 , then the first mass and diffusion rate (x 1 , x 2 ), and so on. Variables held fixed were set to the midpoints of their ranges, 7 ≤ x 1 , x 3 ≤ 13 and 0.02 ≤ x 2 , x 4 ≤ 0.12. We adapted code from http://www.sfu.ca/ ∼ ssurjano and used four n = 80 run maximin Latin hypercube designs with q = 1, . . . , 4 to generate the training data, and separate samples from continuous distributions to generate validation and test data sets, each with 10 runs.
We took the most effective emulators from the dispersion example, sTPRS-GP and sGP, and applied them to the four scenarios. For each scenario, the validation data was used to choose the various prior hyperparameters for each emulator, including the number, p, of functions in the TPRS basis. The sTPRS-GP emulator was estimated using an output grid with r = 2500 location/time points, whereas the sGP emulator was estimated using a much smaller grid with r = 100 points. Fitting the sGP emulator to the larger output grid was computationally infeasible. In each case, an r = 2500 output grid was used for the validation and test sets. All output grids were regular two-dimensional lattices with no boundary points. We again scaled the training, validation, and test data sets to have concentration with meanzero and standard deviation 1 at each output point. Figure 9 shows some typical output from this modeling exercise when d = 4. In this case, both the sTPRS-GP and sGP emulators capture the basic features of the true concentration surface. However, the sGP emulator, which was estimated using the much smaller output grid, has overestimated the mass and/or diffusion at both pollutant spills, resulting in higher predicted concentrations than the sTPRS-GP, which better captures the true concentration. Here, the sTPRS-GP has average RMSE 7% smaller than the TP-GP.
For fewer input variables (d < 4), the sGP emulator had a predictive advantage in terms of RMSE; see Figure 10. For d = 1, 2, 3, the improvement in average RMSE for the sGP was between 2% and 26%, with the sTPRS-GP performing poorest for d = 3. While there seems to be little pattern in performance here, it is plausible that the improved performance of the TPRS-GP emulator for d = 4 is due to (i) greater variation in the response across the output grid, particularly when there are high levels of concentrated mass at the two sources (high mass, low diffusion rate), and (ii) more detailed changes in the response surface occurring between runs. Both of these situations occur more often when both diffusion rates are varied.
6. Discussion and future work. We have presented the emulation of multivariate simulators with two-dimensional output structure. Thin-plate regression splines were demonstrated to be an effective method for dimensional reduction that resulted in substantially increased prediction accuracy over the use of principal components. When a separable covariance structure is assumed for the basis coefficients, a computationally feasible emulator results that provides realistic measures of uncertainty. For the separable emulator, we have adopted a plug-in approach using the posterior predictive distribution conditional on the correlation parameters. Clearly, there is the potential for this approach to underestimate the posterior uncertainty. However, the results in section 4.5 suggest that it is more important to account for the within-run correlations resulting from the nonorthogonal thin-plate spline basis functions.
When high-resolution prediction was required in the dispersion example, the thin-plate regression methodology proved more accurate and computationally feasible than functional data modeling using a separable Gaussian process model. A key reason for the advantage of the TPRS approach is the nonstationary nature of the output data, with differing correlation lengths across the output domain. The artificial example demonstrated that the two methodologies can perform very similarly, although the sTPRS-GP emulator was again preferred when nonstationary output variation was observed. An area for future research is to define more fully the scenarios (nonstationary correlation, high-resolution spatial features) in which the TPRS approach has a predictive advantage. Other research could include developing multivariate emulators using Gaussian processes with nonstationary or nonseparable covariance structures [11], modeling dispersion simulators with qualitative inputs (see, e.g., [24]) such as release type relating to the source term, and building emulators for dynamic responses such as concentration over time. Further research is also required into designing the computer experiments for these multivariate hierarchical problems, including the choice of the simulator inputs and also, in some applications, the selection of output domain locations (cf. [2]). Hence, for a separable correlation structure with C = W ⊗ V , point estimates of β can be provided by In addition, π(τ |β) ∝ τ −1−(aτ +np)/2 exp −(b τ + β T C −1 β)/2τ −1 .
Then the normal-Gamma model Gamma(a σ , b σ ) .
The likelihood depends on the simulator data only throughβ; therefore integrating out β with respect to its prior distribution (B.1) gives This posterior distribution is explored via MCMC using standard metropolis updates. Conditional on the hyperparameters, the posterior distribution of β k (x) is a Gaussian process of fairly standard form [27]. Samples from the unconditional posterior for β k (x) can be obtained by substituting samples from the MCMC chain for σ −2 , τ , θ|Ỹ .
The MCMC simulation is hampered by the inversion of the matrix which is of size np × np. However, only part of the matrix is updated at each step of the MCMC; therefore the matrix inversion is carried out once at the start of the chain and then updated via the method described in Appendix C.
Appendix C. Reduction of computational burden. Inversion of the np × np matrix [(σ −2 A T A) −1 + C] (see Appendix B) is computationally intensive when using a nonorthogonal basis such as a thin-plate spline. To overcome this problem and make the MCMC updates feasible, we note that any update of the parameters τ , θ, or the nugget only changes an n × n submatrix of the covariance matrix C. This submatrix depends on the parameter being updated, and hence care is required in locating the correct segment. Once the inverse of the whole matrix has been performed for the update of σ −2 the proceeding inverses can be computed using the Woodbury formula, where D, P , and Q all denote matrices of the correct size, and I is an identity matrix. For our problem, D = [(σ −2 A T A) −1 + C] and is size np × np, P is size np × n, and Q is size n × np. In order to make use of this result, we need to find a P Q equal to the difference in the D matrix resulting from the update.
where each 0 denotes a matrix of the correct dimension to position the change in the D matrix for the current update. Note that in the first updates, the initial 0 matrix will be of size 0 × 0, and in the last updates the final 0 matrix will be of size 0 × 0. Then We obtain P 1 Q 1 by performing LU decomposition on the difference between the current D matrix and the proposed D matrix, a simple np × np subtraction. Using block notation and remembering that D and therefore D −1 are symmetric, let This can be computed efficiently using bespoke multiplication routines. However, for large iteration cycles it is recommended that a full inversion be performed approximately once every 100 steps to ensure that small numerical errors do not accumulate. A similar derivation can be performed using the matrix determinant lemma [14].