Dimension reduction for Gaussian process emulation: an application to the influence of bathymetry on tsunami heights

High accuracy complex computer models, or simulators, require large resources in time and memory to produce realistic results. Statistical emulators are computationally cheap approximations of such simulators. They can be built to replace simulators for various purposes, such as the propagation of uncertainties from inputs to outputs or the calibration of some internal parameters against observations. However, when the input space is of high dimension, the construction of an emulator can become prohibitively expensive. In this paper, we introduce a joint framework merging emulation with dimension reduction in order to overcome this hurdle. The gradient-based kernel dimension reduction technique is chosen due to its ability to drastically decrease dimensionality with little loss in information. The Gaussian process emulation technique is combined with this dimension reduction approach. Our proposed approach provides an answer to the dimension reduction issue in emulation for a wide range of simulation problems that cannot be tackled using existing methods. The efficiency and accuracy of the proposed framework is demonstrated theoretically, and compared with other methods on an elliptic partial differential equation (PDE) problem. We finally present a realistic application to tsunami modeling. The uncertainties in the bathymetry (seafloor elevation) are modeled as high-dimensional realizations of a spatial process using a geostatistical approach. Our dimension-reduced emulation enables us to compute the impact of these uncertainties on resulting possible tsunami wave heights near-shore and on-shore. We observe a significant increase in the spread of uncertainties in the tsunami heights due to the contribution of the bathymetry uncertainties. These results highlight the need to include the effect of uncertainties in the bathymetry in tsunami early warnings and risk assessments.


Introduction
Simulators are widely employed to reproduce physical processes and explore their behavior, in fields such as fluid dynamics or climate modeling.To characterize the impact of the uncertainties in the boundary conditions or the parameterizations of the underlying physical processes, a sufficient number of simulations are required.However, when the simulators are computationally expensive, as it is the case for high accuracy simulations, the task can become extremely costly or even prohibitive.One prevailing way to overcome this hurdle is to construct statistical surrogates, namely emulators, to approximate the computer simulators in a probabilistic way [29].Emulators are trained on a relatively small number of well chosen simulations, i.e. a design of computer experiments.Outputs at any input can be predicted at little computational cost with emulators.One can then employ emulators for any subsequent purposes such as uncertainty propagation, sensitivity analysis and calibration.
With high-dimensional inputs, say beyond 20 dimensions (usually in the hundreds or thousands), a large design is usually required to explore the input space, typically in the order of 10 times the number of dimensions, for a reasonable level of approximation.One would face serious computational problems since the original simulator cannot be run many times or is very expensive to run.Advanced designs such as Latin Hypercubes or new sequential designs [4] that are more efficient than Latin Hypercubes only partially alleviate the issue.As a result, methods that adequately reduce the dimension of the input space are required, as high-dimensional inputs are often present in computer models, e.g. as boundary conditions like the bathymetry (i.e.seafloor elevation) in tsunami modeling.Some approaches ignore high-dimensional inputs and add stochastic terms to account for their contribution [19,25].These methods are easy to implement and effective in some applications.However, repeated simulations at the same input parameters that are encoded in the emulation are often required to estimate the variability due to those parameters that are ignored.The variability estimates are often restricted to the second moments, and the input-output relationships over the ignored inputs are not clear.Constantine et al. [6] proposed to find rotations of the input space with the strongest variability in the gradients of the simulators and constructed a response surface on such a low-dimensional active subspace.This Active Subspace (AS) approach has been demonstrated to be effective theoretically and numerically.Constantine and Gleich [5] studied further the properties of the Monte Carlo approximation of the subspace.However, this method requires the calculation of a sufficient number of gradients explicitly, which unfortunately prevents its use in many applications.The gradients are often unavailable in many realistic simulators, and typically intractable for systems of mixed PDEs or multi-physics simulations.Even in the rare situations where gradients are computable numerically, the computational cost of obtaining them could be prohibitive.
The concept of active subspace is closely related to the sufficient dimension reduction (SDR) [7,9] and effective dimension reduction (EDR) [22] in the statistical community.Given an explanatory variable X ∈ R m (input) and response variable Y (output), the aim of SDR (or EDR) is to find the directions in the subspace of X that contain sufficient information about the response for statistical inference.More specifically, a SDR R(X) ∈ R d where d < m satisfies p(Y |X) = p(Y |R(X)), where p(Y |X) and p(Y |R(X)) are conditional probability density functions with respect to X and R(X) respectively.The EDR approach aims to specifically find a linear projection matrix B onto a d-dimensional subspace (d < m) such that B T B = I d and p(Y |X) = p(Y |B T X) or equivalently Y ⊥ X|B T X. (1) Several methods have been developed to find SDR including nonparametric approaches such as sliced inverse regression (SIR) [22], minimum average variance estimation (MAVE) [30], and parametric approaches like principal fitted components (PFC) [8,10].In this paper, we adopt the gradient-based kernel dimension reduction (gKDR) [15] to construct low-dimensional approximations to the simulators.The gKDR approach does not require any strong assumptions on the variables and distributions.The response variable can be of arbitrary type: continuous or discrete, univariate or multivariate.Unlike the active subspace method, gradients are not required to be computed explicitly but are estimated non-parametrically and implicitly using stable kernel methods.Our proposed approach therefore provides an answer to the dimension reduction issue in emulation for a wide range of problems that cannot be tackled using existing methods at the moment.Moreover, the gKDR approach ends up with an eigen-problem without any needs of elaborate numerical optimization and thus can be applied to large and high-dimensional problems.
We introduce a joint framework to approximate the high-dimensional simulators by combining statistical emulators with the gKDR approach.Deterministic simulators are considered here, however the framework could potentially be applied to stochastic simulators, with additional treatments to the stochastic effect in the emulation, see e.g.[18].Throughout this paper, the mainstream Gaussian process (GP) emulators are employed for illustration.But the general framework and most of the results would potentially hold for other emulation techniques.
The paper is organized as follows.Section 2 and 3 review GP emulation and the gKDR approach respectively.In Section 4, a joint framework of dimension reduction combined with emulation is proposed and some theoretical properties are established.Section 5 contains the numerical experiments on an elliptic PDE and an application to the propagation of uncertainties in the bathymetry to tsunami wave heights, as well as comparison with alternative methods.Section 6 consists of some conclusive discussion.

Gaussian process emulation
A Gaussian process is a collection of random variables such that any finite subset of these variables follow a joint Gaussian distribution [27].It is widely used in various scientific fields.Here we briefly review some basics of its application in statistical emulation.
A deterministic simulator with multivariate input X = (x 1 , ..., x m ) T ∈ R m and univariate output Y ∈ R can be represented as Y = f (X).The GP emulator assumes that the simulator output Y = f (X) can be modeled with a Gaussian process.It is commonly assumed that the mean can be written as E(Y ) = m(X) = h T (X)β, where h(X) is a q-vector of pre-defined regression functions and the coefficients β ∈ R q .In practice, a constant or linear form for the regression functions would perform well.The covariance between two simulator outputs Y = f (X) and Y = f (X ) is usually represented as Cov(Y, Y ) = k(X, X ) = σ 2 c(X, X ), where the positive scalar parameter σ 2 is the process variance and c(X, X ) is the correlation function.One common choice for the correlation function is the squared-exponential correlation c(X, X ) = m i=1 exp (−(x i − x i ) 2 /δ 2 i ) , where δ = (δ 1 , ..., δ m ) T ∈ (0, ∞) m controls the correlation lengths.
Suppose that the simulator is run at n inputs X 1 , ..., X n and the outputs are Y 1 , ..., Y n respectively.We may need to impose a prior for the parameter β in the mean function m(X) = h T (X)β.One of the popular choices is a Gaussian prior, β ∼ N(b, V), which forms a conjugate prior with the GP likelihood.At any n * desired inputs X * 1 , X , the predictive process of Y * (also known as kriging prediction) given the observed data and the covariance function is where θ represents the hyperparameters from the covariance function, K, K * and K * * are n × n, n * × n and n * × n * matrices respectively with the associated (i, j)-th entry as We can see that the output at any desired input predicted using a GP emulator is a distribution rather than a single value.This could be used to estimate the uncertainty introduced into the prediction with emulator and to evaluate the confidence about the prediction.However, the hyperparameters θ is usually unknown and needs to be specified properly.It is possible to make a fully Bayesian inference with appropriate prior π(θ).But this usually requires costly MCMC approach for the analytically intractable posterior.In practice, a computationally cheap alternative is often employed by specifying the hyperparameters θ at the most probable values.This could be done by maximizing the marginal likelihood, Then the prediction can be completed by plugging in θ = argmax θ L(θ).
Usually there is no sufficient information about the parameter β, hence a vague prior can be imposed by letting V −1 → O and b = 0, where O is the matrix of zeros.In this case, the conditional mean and covariance of the predictive process are respectively where β = (HK −1 H T ) −1 HK −1 Y.This is closely related to the t-process [26] when a weak prior for (β, σ 2 , δ) that π(β, σ 2 , δ) ∝ σ −2 π δ (δ) is assumed with the mean function m(•) = h T (•)β and the covariance function k(•, •) = σ 2 c(•, •; δ), where δ contains the parameters in the correlation function c(•, •).When k(X, X ) = σ 2 c(X, X ) is used with a continuous correlation function, such as the squared-exponential correlation, the emulator interpolates through the training data, i.e. m(X i ) = Y i and v(X i ) = 0 at the training points {X i } n i=1 .When a nugget term is included, this is no longer true.A nugget term can be included, e.g. to mitigate numerical instabilities or account for the stochastic terms in simulations [3].The correlation function c(X, X ) can be extended with the addition of a nugget as c(X, X ) = νI X=X + (1 − ν)c(X, X ), where ν > 0 is the nugget term, and I X=X is the indicator function that takes 1 if X = X and 0 otherwise.The associated correlation matrix is K = (1 − ν)K + νI, where I is the identity matrix.
In practice, the error in the prediction of GP emulator depends on the number of training data points.As there are more and more training data points, the GP emulator will be expected to recover the simulator.There are several theoretical results on how well the GP emulator f can approximate the simulator f in the literature.For example, given n training samples that are quasi-uniformly distributed on Ω ⊂ R d , the error can be bounded [13] as f − f ∞ ≤ C d n −p/d f H for any f in some function space H over Ω, typically a Hilbert space, where p controls the smoothness of the function and C d is a constant depending on the dimension d.This result suggests that f provides arbitrarily high approximation order when p = ∞, i.e. f is infinitely smooth.However, this rate decreases as the dimension increases and the constant C d also grows with d.Hence, the approximation deteriorates in very high dimensions.This implies that more evaluations of the simulator are required to train an accurate emulator when the number of input parameters d increases and the associated computational cost of constructing an emulator could increase dramatically as a result.Therefore, it is desirable to reduce the dimension of the problem from the perspectives of both accuracy and efficiency.

Gradient-based kernel dimension reduction
For a set Ω, a symmetric kernel k : Ω × Ω → R is positive-definite if n i,j=1 c i c j k(ω i , ω j ) ≥ 0 for any ω 1 , ..., ω n ∈ Ω and c 1 , ..., c n ∈ R. Then a positive-definite kernel k on Ω is uniquely associated with a Hilbert space H consisting of functions on Ω such that 1) k(•, ω) ∈ H; 2) the linear hull of {k(•, ω)|ω ∈ Ω} is dense in H; 3) h, k(•, ω) H = h(ω) for any ω ∈ Ω and h ∈ H where •, • H is the inner product in H.Because the third property implies that the kernel k reproduces any function h ∈ H, the Hilbert space H is called the reproducing kernel Hilbert space (RKHS) associated with k.Let (X, Y ) be a random vector on the domain R m × Y, and k X and k Y be positive definite-kernels on R m and Y with respective RKHS H X and H Y .We shortly present the salient facts about the gKDR approach.
Fukumizu and Leng [15] noted that for any g ∈ H Y , there exists a function ϕ g (z) on Then, under mild assumptions, we have, for any X = x, On the other hand, defining the cross-covariance operator C Y X : holds for all h 1 ∈ H X , h 2 ∈ H Y , and using the fact that Equating the two expressions above yields for i, j = 1, ..., m, Therefore, the dimension reduction projection matrix B is formed as the eigenvectors associated with the nontrivial eigenvalues of the m × m matrix M(x).Given i.i.d.samples (X 1 , Y 1 ), ..., (X n , Y n ), the matrix B can be approximated with B [15] that contains the first d eigenvectors of the following m × m symmetric matrix, where G X and G Y are the Gram matrices with the (i, j)-entry as k Sometimes there may not exist such a sufficient subspace rigorously so that d = m, or we may want to select less dimensions d < d for later analysis even in cases where such a subspace exists in order to achieve a more stringent reduction (albeit with a small loss).For convenience, we slightly reformulate the gKDR approach into a more general form without any change to the results in [15].Let W be an m × m matrix with Following the same procedure as before, it is easy to see that If there exists B satisfying (1) with d < m, ∇ a ϕ(W T x) = 0 for any a > d, hence the respective columns correspond to the zero eigenvalues of M(x).The projection matrix W does not depend on the value of x, while the nontrivial eigenvalues vary with x.Therefore, we obtain the following eigen-decomposition

Joint emulation with dimension reduction
The gKDR approach is now applied together with GP emulation, to construct a lowdimensional approximation to a simulator.Thus the following procedure is employed to emulate a high-dimensional simulator.
Step 2. Split W into [ W1 W2 ], where W1 consists of the first d columns of W corresponding to the largest d eigenvectors.
Step 3. Design a set of n 2 runs (X 1 , Y 1 ), ..., (X n 2 , Y n 2 ) of the simulator, e.g. based on the reduced space WT 1 X, and construct an emulator using the lower dimensional pairs ( WT 1 X 1 , Y 1 ), ..., ( WT 1 X n 2 , Y n 2 ).In Step 1, sufficient samples are needed to estimate W accurately.The theoretical results in [15] on the convergence rate of Mn would provide some insights.In practice, the number of directions that have a major influence may also affect the sample size n 1 needed.Step 2 requires an appropriate selection of d to construct an efficient and effective emulator.The samples to train the emulator in Step 3 can be different (e.g.additional runs) from those already collected to find W in Step 1.There is a benefit in terms of design arising from the dimension reduction.Indeed, in step 3, the design can be built to explore the reduced space of possible WT 1 X but the actual inputs of the simulator are of the corresponding high-dimensional values of X , as the dimensions left out are deemed unimportant.

Approximation properties
We now explore some theoretical properties of the low-dimensional approximation to a simulator using the gKDR approach.For any X = x ∈ R m , if M(x) is known exactly, we have the eigen-decomposition (16).Suppose that the eigenvectors and eigenvalues are partitioned as where Λ 1 (x) = diag(λ 1 (x), ..., λ d (x)) with d < m consists of the first d largest eigenvalues, W 1 is the m × d matrix whose columns are the associated eigenvectors.Then for any X, we can define the projected coordinates by U Our proposed approach suggests to make inference on Y based on U instead of the full explanatory variable X.The following proposition establishes an error bound for such approximation.
The approximation error is bounded as follows: where C 1 is a constant depending on the domain of x, b i (i = d + 1, ..., m) are positive constants relating to W 1 and g.
.., m.Following [2], for any g ∈ H Y , we can define a bounded linear operator Φ : Its adjoint is a mapping Φ Because g is arbitrage, it must hold for any a ∈ R m that Φ * a = m i=1 a i φ i .
From the derivation of the gKDR approach, the derivative of G(x) w.r.t x is just ∇ x G = Φg and M = K = ΦΦ * .We denote the range of an operator A as Ra(A) and its kernel (null space) as Ker(A).The space Ra(Φ * ) is finite-dimensional and hence closed, so we have the decomposition Given the projection of coordinates from x to u and v, we can write The gradient of G w.r.t u can be obtained by the chain rule as where a ∈ R m relates to g.Then it is easy to see that where the positive constants b i depend on W 1 and g, for i = 1, ..., d.Similarly, we have where the positive constants b i depend on W 2 and g, for i = d + 1, ..., m.
We now infer g(Y ) based on u ∈ R d rather than x ∈ R m with d < m.For any u, we have Therefore, for any fixed u, we estimate Note that for any fixed u, the original function ) is a function of only v, while the approximation Ĝ(x) = E [G|u] is in fact the average of G(u, v) over all possible v, and so is not a function of v.The Poincaré inequality yields where C 1 is a constant depending on the domain of x.
When W 1 represents a sufficient dimension reduction, Though the result is presented with conditional mean E[g(Y )|•] for any g ∈ H Y , it is not limited to the first moment only.For characteristic kernels such as the popular Gaussian RBF kernel k(x, y) = exp(− x − y 2 /(2σ 2 )) and the Laplace kernel k(x, y) = exp(−α m i=1 |x i − y i |), probabilities are uniquely determined by their means on the associated RKHS [15]; see also [17] for a definition of the distance between probabilities using their means.
In practice, W cannot be known exactly.We can only estimate a perturbed version W = [ W1 W2 ] instead using the eigen-decomposition of Mn .Under some mild conditions, Mn converges in probability to E[M(x)] with order O p n − min{1/3,(2β+1)/(4β+4)} for some β > 0 [15].As a result, we have the following result.
where C 1 is a constant depending on the domain of x and the b i (i = 1, ..., m) are positive constants related to W and g.
Proof.Denoting Mn = E[M(x)] + E n and e n = n − min{1/3,(2β+1)/(4β+4)} , the convergence result on Mn [15] entails that for any > 0, there exists a constant C > 0 and N such that for any n ≥ N , Then there exists N such that for any n ≥ N , where where N can be chosen as The distance between subspaces that are spanned by columns of W 1 and W1 , denoted by span(W 1 ) and span( W1 ) respectively, can be defined as [16] dist(span(W 1 ), span( W1 Using Corollary 8.1.11 of [16], we have . We also note that W T 2 W2 ≤ W 2 W2 = 1.Then for any x, we have the following approximation to G(x), Letting ṽ = WT 2 x and following the same procedure as Proposition 1, for any fixed ũ we have, where The result holds by plugging the respective terms.
The approximation procedure generates an "innovative simulator" f on the reduced input space of U = WT 1 X, which is however not deterministic.Suppose there are two distinct inputs X 1 and X 2 with the respective outputs Y 1 = Y 2 .It may happen that WT 1 X 1 = WT 1 X 2 , i.e. the approximated simulator f may yield different outputs given the same input.The low-dimensional stochastic simulator f can nevertheless be emulated, for example using GP with nugget effect, assuming that the influence of the dropped components is relatively small and simple enough to be captured by the nugget.The overall approximate error of the final emulator f to f can be decomposed into f − f ≤ f − f + f − f , where the first term in the right hand side is due to the low-dimensional approximation which has been investigated in Proposition 2, and the second term depends on the emulation procedure.

Choice of parameters and structural dimension
When applying the proposed framework for emulation, several parameters need to be specified properly, e.g. the parameters in the kernels and the regularization parameter n .The cross validation approach can be used to tune such parameters as in many nonparametric statistical methods.In addition, it is also required to choose an appropriate structural dimension d to construct an accurate emulator.
One of the possible ways is to choose d within the dimension reduction procedure.Fukumizu and Leng [15] pointed out that it might not be practical to select d based on asymptotic analysis of some test statistics, as in many existing dimension reduction techniques, when the dimension is high and the sample size is small.They mentioned that the ratio of the sum of the largest d eigenvalues over the sum of all the eigenvalues, d i=1 λ i / m i=1 λ i , might be useful in identifying the conditional independence of Y and X given B T X.In addition, Proposition 2 shows that the approximation error decreases as a function of λ d − λ d+1 .As discussed in [5], d might be chosen such that λ d − λ d+1 is maximized.However, we may notice that the approximation error also depends on the squares of the eigenvalues with some unknown weights.Therefore it seems to be not very practical to select d solely based on the eigenvalues.
On the other hand, Fukumizu and Leng [15] suggested to select d based on the subsequent utilization of d rather than the dimension reduction procedure when dimension reduction serves as a pre-processing step.For example the ultimate goal of our proposed framework here is to construct an accurate emulator, hence it is intuitive to select the structural dimension that produces the best predictive performance.Therefore, in the following numerical studies, we select d as well as other parameters for the gKDR approach using simple trial-and-error or more formal cross validation approach based on the predictive accuracy of the respective emulators.

Numerical simulations
In this section we conduct two numerical studies.In the first study, the proposed emulation framework using the gKDR approach is compared with several alternatives of dimension reduction and the full emulation on a PDE problem.This problem set up allows the computation of gradients explicitly.In the second study, we illustrate the emulation framework with an application to tsunami modeling; we also provide a comparison to other methods, except AS which cannot be applied.Throughout the simulations, the GPML code using maximum likelihood method implemented by [27] is employed for the emulation assuming a linear form mean function with intercept, and a squared exponential correlation function.

Study 1: elliptic PDE with explicit gradients available
In this example, we investigate the elliptic PDE problem with random coefficients as studied in [6].Let u = u(s, x) satisfy the linear elliptic PDE The homogeneous Dirichlet boundary conditions are set on the left, top and bottom boundary (denoted by Γ 1 ) of the spatial domain of s, and a homogeneous Neumann boundary condition is imposed on the right side of the spatial domain denoted Γ 2 .The coefficients a = a(s, x) are modeled by a truncated Karhunen-Loeve (KL) type expansion log(a(s, x)) where the x i are i.i.d.standard Normal random variables, and φ i (s), γ i are the eigenpairs of the correlation operator The target value is a linear function of the solution The problem is discretized using a finite element method on a triangulation mesh, then f and ∇ x f can be computed as a forward and adjoint problem; see [6] for more details.
We choose m = 100 and examine two cases of the correlation lengths β = 1 or β = 0.01.Therefore, the original input space is X = R 100 with standard Normal distribution and the output f (x) is univariate.The gKDR approach is applied to reduce the dimension of the problem using M samples.We also compare with several popular alternative dimension reduction techniques: AS (here possible due to the explicit gradients), SIR, SIR-II [22], sliced average variance estimation (SAVE) [11], MAVE and PFC.After reducing the dimension of the problem, the GP emulator is trained using a Latin Hypercube design of 10d points on the reduced d-dimensional space so that the whole procedure needs M + 10d samples in total using each dimension reduction method.For comparison, we also emulate the problem on the original 100-dimensional input space directly with M + 10d samples, which is the full emulation.The gKDR approach is implemented in Matlab by K. Fukumizu, see http://www.ism.ac.jp/\string ~fukumizu/.The Matlab code for AS and solving the PDE by [6] is available on https://bitbucket.org/paulcon/active-subspace-methods-in-theory-and-practice.For SIR, SAVE and PFC, the codes are provided in the Matlab LDR-package (https: //sites.google.com/site/lilianaforzani/ldr-package),and SIR-II is implemented by simply modifying the SIR code.For MAVE, the Matlab code by Y. Xia is available from http://www.stat.nus.edu.sg/\string~staxyc/.The associated parameters in some methods, such as the kernel and regularization parameters for gKDR, the number of slices for the sliced methods and the degree of polynomial basis for PFC, are chosen in a simple trial-and-error way by trying several values and selecting the best.
The final emulators are used to make prediction on a testing set of n evaluations {f 1 , ..., f n } that differ from the training set, where f i = f (x i ) and x i ∈ R 100 is drawn randomly from the standard Normal distribution.The predictive performance is measured by the normalized predictive root-mean-square-error (PRMSE) where fi is the prediction (predictive mean) using emulation.The associated computing time is also recorded with three parts: T1 for running the simulator, T2 for estimating the dimension reduction and T3 for training the emulator and making prediction.Note that T1 includes the time devoted to run the simulator M times when using all the dimension reduction methods.It also includes the time used to compute the gradients for AS and the additional 10d runs for full emulation.T2 is zero for full emulation since there is no dimension reduction.T3 also includes the time for running the simulator 10d times on the designed points except full emulation.
In this study, we choose M = 300 and d = 1, ..., 5. Table 1 presents the results on a testing set with n = 500 evaluations using different emulation approaches and Figure 1 shows an example of the associated computing time when β = 1 and d = 5.Compared with the full emulation results, by reducing the dimension properly, the predictive accuracy can be improved, especially when the correlation length is long (β = 1).Also, as a result, the computing time for training GP emulator (T3) decreases dramatically.In terms of predictive accuracy, AS naturally performs the best, as it is using exact gradients, followed by gKDR.MAVE, SIR and PFC are better than SIR-II and SAVE, but PFC does not work very well when β = 1 and MAVE spends more computing time on dimension reduction.Most methods yield smaller errors for β = 1 than β = 0.01, except PFC and full emulation.Unlike the other techniques, AS employs exact gradients which explains its lead in performance.However, as shown in Figure 1, the computing time T1 for AS is about two orders of magnitude longer than the others making the method most computationally expensive.Moreover, computing gradients ∇ x f is sometimes impossible, e.g. for the tsunami simulation in the next study.This restricts the applicability of AS method to a few applications.To summarize, when the exact gradients are computable the proposed gKDR approach is able to produce comparable results (though not as good) as the AS method that uses exact gradients, and outperforms the other SDR methods in most cases.However, the computational cost of applying gKDR is much less than that employing AS.In fact, gKDR not only is able to find the SDR accurately and efficiently, but also can be applied in a wide range of scenarios where complicated variable types or very high dimensions are involved.The next application into tsunami simulation provides a snapshot of its wide capability when there are few applicable alternatives.

Study 2: tsunami emulation where no gradients available
Here we apply the proposed general framework to investigate the impact of uncertainties in the bathymetry on tsunami modeling, where the bathymetry is included as a highdimensional input.
A synthetic bathymetry surface is created in the (s 1 , s 2 ) coordinate system to conduct tsunami simulations as shown in Figure 2 (a).For simplicity, we assume that the seabed elevation only vary along the first coordinate s 1 .Though simple, it still captures the typical continental characteristics: the continental shelf spans from shore line (s 1 = 0) to around s 1 = −25 km at the water depth of around 150 m; the continental slope is between s 1 = −25 km and s 1 = −75 km with water depth of 150 ∼ 1800 m; west of −75 km it is the deep ocean with water depth of 1800 ∼ 2200 m .To model uncertainties and mimic the realistic boat tracks of oceanic surveys, some irregular lines are drawn.We consider two levels of survey density which are denoted by survey level 1 and 2 respectively.Considering that the surveys are usually constrained within budgets, the total lengths of the two level surveys are fixed at 1000 and 100 km.To account for different possible survey traces, 20 samples of boat tracks are drawn at each level of survey density; see in Figure 3 three samples per level for illustration.In this study, we only consider the impact of the uncertainties in the bathymetry within the area (s 1 , s 2 ) ∈ [−40000, 0] × [−5000, 5000] as shown with a blue rectangle in Figure 3.The bathymetry at other locations are fixed at the true values.This assumption is based on the physical knowledge that deep ocean has a relatively small influence on tsunami waves.Along the possible boat tracks, observations of bathymetry are collected every 500 m.Then the whole bathymetry surface can be modeled using the SPDE approach [23] and inferred using INLA method [28] for approximate Bayesian inference.Given observations of bathymetry z = (z 1 , ..., z n ) at locations s = (s 1 , ..., s n ) , it is assumed that z i = Z(s i ) + i , i = 1, ..., n, where the unknown bathymetry surface Z(s) is Gaussian field with Matérn covariance function where s − s * is the Euclidean distance between two locations s and s * ∈ R 2 , K ν is the modified Bessel function of the second kind and order ν > 0, κ > 0 controls the nominal correlation range through ρ = √ 8ν/κ corresponding to correlations near 0.1 at the Euclidean distance ρ, and σ 2 is the marginal variance.Lindgren et al. [23] noted that Z(s) also satisfies the stochastic partial differential equation (SPDE) where the innovation process W is spatial Gaussian white noise with unit variance, ∆ = is the Laplacian operator, and τ controls the marginal variance through the relationship With a finite elements representation over an appropriate triangular mesh, a stochastic weak solution to the SPDE can be approximated.It is shown that the coefficients w = (w 1 , ..., w m ) T can be approximated by a Gaussian Markov random field, i.e. w ∼ N(0, Q −1 ) for Q is sparse.Note that bivariate splines could be used [24] to reduce the number of parameters required for specific approximation order, which is good, but not enough, for dimension reduction.Then we build the following hierarchical spatial model, where A ij = ψ j (s i ) and θ contains all the hyperparameters.Since w uniquely determines the bathymetry, it is the de facto input for uncertain bathymetry.In this study, we build a mesh for the finite elements representation in the SPDE approach as shown in Figure 4 (a).The dense triangles in the middle cover the uncertain bathymetry area and the outer extension with coarse triangles is added to avoid boundary effect.There are 3200 nodes that influence the bathymetry, hence the uncertain input for bathymetry is of dimension 3200.Given each boat track and the associated observations z, 20 samples of the finite element coefficients are drawn from the posterior π(w|z) to construct a range of possible initial bathymetries.Thus there are 400 (20 samples of boat tracks times 20 samples of w) sets of possible initial bathymetries in total at each survey level.Figure 5 shows the empirical sample mean and standard deviation of these 400 possible bathymetries at survey 1 and 2. Tsunami waves are triggered by the following simplified seabed deformation, where dz(s 1 , s 2 ; t) is the seabed uplift at location s = (s 1 , s 2 ) and time t, h max denotes the maximum seabed uplift; see   h max = 1, ..., 5 m.These values are evenly combined with the uncertain initial bathymetry.Thus there are two sources of uncertainties: w for bathymetry and h max for tsunami source, where w is high-dimensional.We employ the tsunami code VOLNA [12], an advanced non-linear shallow water equation solver using the finite volume method on a high performance computing facility.The computational domain and mesh for VOLNA are presented in Figure 4 (b).There are 120, 661 triangles and 61, 068 nodes in the mesh, where the coarse triangles in both ends are added to avoid boundary reflection.The output of the simulation is chosen to be ∆η(s) = max η t (s) − η 0 (s), where η t (s) is the free surface elevation at simulation time t and location s. ∆η represents the maximum wave height at off shore locations or the maximum inundation depth at on shore locations.For illustration, we consider simulation values at gauge 1: (−2000, 0), gauge 2: (−400, 0), gauge 3: (0, 0) and gauge 4: (200, 0), which are at far shore, near shore, shore line and land respectively; see Figure 2 (c).
The simulation results are presented in Figure 6, with those using the true bathymetry shown in red lines and those using the sample mean bathymetry shown in green dash lines.We can see that ∆η increases with h max but also shows variation due to the uncertain inputs w for fixed h max , especially at gauges 2-4 around the shore line.In general, the simulations with sample mean bathymetry would deviate from true values, while those with random bathymetry samples can cover the true events quite well.The survey level also has significant influence that differs across the four gauges.In most cases the wider range of possible simulation values with coarser survey level 2 indicates that the uncertainty in the bathymetry would spread the tsunami waves out to simulate more extreme scenarios and such effect could be amplified around the shore line.
Following the procedure in Section 4, we can construct a low-dimensional emulator for such high dimensional simulator with 3200 input parameters for the bathymetry (w) and 1 parameter for the seabed deformation (h max ).Denoting the VOLNA code with f , the output can be represented as ∆η = f (w, h max ).Because Figure 6 displays a significant relationship between h max and ∆η, we keep it as a separate input in the emulator and reduce the dimension of w only.In this case, we try to find a projection matrix B such that w ⊥ (∆η, h max )|B T w.The conditional independence just implies the sufficiency of B T w, i.e. p(∆η|h max , w) = p(∆η|h max , B T w) [31].Therefore, (∆η, h max ) is regarded as a temporary output when applying gKDR to reduce the dimension of w.
For the gKDR approach, Gaussian RBF kernels k(x, y) = exp( x − y 2 /(2σ 2 )) are deployed for both k X and k Y but with different parameters σ 2 .Following [15], we have the parameterization σ X = c 1 σ med (X), σ Y = c 2 σ med (Y) where σ med (•) is the median of pairwise distance of the data.The regularization parameter n is fixed at 10 −5 because its influence is shown to be negligible after a few trials.There are three parameters to be specified properly: c 1 , c 2 , and d, the number of directions included in the emulator.We consider possible candidates c 1 , c 2 ∈ [0.5, 1, 5, 10, 15, 20], d ∈ [1, ..., 5] here.Then, based on each of the possible parameters combination, we can construct a GP emulator on the low-dimensional inputs (h max , B T w) and make predictions on the new inputs ( hmax , B T w).
For comparison, we also apply alternative dimension reduction techniques to construct the low-dimensional approximations.Due to the complexity of the VOLNA code, the gradients of simulation values with respect to the inputs are not computable.Hence the active subspace method cannot be employed.Most of the methods in Study 1 cannot be applied directly because of the need for partial dimension reduction, or the "large p, small n" feature, i.e. there are much more input parameters than the number of simulations.We consider two extensions to PFC and SIR.The partial PFC (PPFC) method [20] is implemented based on the R package ldr [1] to find the reduction on w only meanwhile taking the effect of h max into account.Note that PFC is not developed for the problem where p > n or p n. Another method we compare to is the sequential sufficient dimension reduction (SSDR) [31].It is specifically proposed to overcome the "large p, small n" difficulty by decomposing the variables into pieces each of which has p 1 < n variables so that conventional dimension reduction methods can be applied.The projective resampling approach [21] with SIR is employed.The R code for SSDR by [31] is available from http://wileyonlinelibrary.com/journal/rss-datasets.
To measure the predictive performance and select proper parameters, a 10-fold cross validation approach is employed.For each survey level, 400 simulations are divided evenly into 10 groups.Each group is retained as testing set once, while the other nine groups are used to estimate the projection matrix B using the gKDR, PPFC or SSDR approaches and train the respective GP emulator.Table 2 presents the normalized PRMSEs from the cross validation for each survey level and gauge using different dimension reduction techniques.The errors of survey level 1 are in general smaller than those of survey level 2. This implies that as the uncertainties in the bathymetry increase, it gets more difficult to make accurate predictions using emulation.The methods gKDR and SSDR outperform PPFC in all cases, especially in survey level 1 where the normalized PRMSEs can be 50% lower in some cases.In survey level 1, the errors of the gKDR approach are slightly larger than those of the SSDR for gauge 2-4 where the normalized PRMSEs using SSDR are around 1.1% ∼ 3.7% lower.But in survey level 2, the gKDR approach is more accurate than the SSDR approach for all gauges with reduction of normalized PRMSEs at 1.0% for gauge 1 and 10.1% ∼ 18.9% for gauge 2-4.Therefore, gKDR is comparable with SSDR in survey level 1 but works much better than SSDR in survey level 2 when there are more uncertainties involved.We can conclude that the proposed GP emulation framework combined with the gKDR dimension reduction approach is effective and accurate for this complicated tsunami simulator and overall it outperforms the alternatives.To investigate the impact of the training set size on the predictive performance of the proposed emulation framework with gKDR, we conduct repeated random sub-sampling cross validations with various training set sizes.We consider training set size as 2%, 5%, 10%, 20%, ..., 90% and test set size as 10% of the total 400 simulations.For each training set size, the sampling is repeated 50 times.The parameters c 1 , c 2 , d are fixed at those values selected through the above 10-fold cross validation.Figure 7 displays the normalized PRMSEs with various training set sizes.In general, the predictive errors decrease as the training set size increases, and eventually converge to a relatively flat level, after about 100 simulations.It is reassuring that such a small number of simulations is enough to allow an efficient and effective dimension reduction and Gaussian process emulation.In the end, we apply the resulting emulator to predict the simulation values over a large number of new inputs.The predictions can be used for probabilistic risk assessment and many other purposes.For illustration, 10, 000 samples of ( hmax , w) are drawn where hmax are drawn from a Normal distribution N(3, 1) truncated at 0 and 5.For each survey level, 100 samples of possible boat tracks are drawn randomly.Given the observations along each boat track, 100 samples of w are drawn from the posterior.In most of the current tsunami research work, the bathymetry is usually considered as fixed, which would neglect the possible uncertainty in the outputs.For comparison, we also conduct another set of simulations with the 5 possible values of h max , but with a fixed w taken to be the sample mean shown in Figure 5.In this case, h max is the only uncertain parameter.Then we can also make another set of predictions on the 10, 000 samples of hmax only.The predictions for these two cases are presented in Figure 8.We can see that at gauge 1 it makes no significant difference to include the uncertainty in the bathymetry or not, as the impact is relatively small on the far shore waves, as shown in Figure 6.However, the impact of the uncertainty in the bathymetry on the simulation values is more significant at gauges around the shore line.The distributions are shifted, skewed and spread out, covering more extreme events with larger ∆η.These features are potentially important, for example in the catastrophe models for (re)insurance, or hazard assessment used in coastal planning.

Discussion
We proposed a joint framework for emulation of high-dimensional simulators with dimension reduction.The gKDR approach is employed to construct low-dimensional approximations to the simulators.The approximations retain most of the information about the input-output behavior and make the emulation much more efficient.Both theoretical properties and numerical studies have demonstrated the efficiency and accuracy of the proposed approach and its advantages over other dimension reduction techniques.Our method can be applied for many purposes of uncertainty quantification such as risk assessment, sensitivity analysis and calibration, with great perspectives in real world applications.
There are some practical issues when applying the proposed framework.The hyperparameters in gKDR and the number of dimensions to be included in the emulator need to be specified properly.In practice, a simple trial and error procedure could be applied, especially when the results are not very sensitive to the choices.Cross validation steps could also benefit from parallel computing technique.The sample size also affect the predictive ability of the final emulator, as sufficient samples are needed to estimate the dimension reduction accurately.A diagnostic plot of predictive errors with increasing number of sizes such as Figure 7 could help identify the convergence.After determining the dimension reduction, a sufficient number of training samples with a proper design are often required to train the emulator in order to balance the computational cost and accuracy.The benefits of our approach are multiple.One can tackle uncertainty quantification tasks for complex models where boundary conditions are of high dimension.Beyond tsunami modeling, in climate simulation, weather forecasting or geophysical sciences, uncertainty quantification studies would become tractable and potentially offer solutions to important scientific problems.
In fact, if there exists a B matrix satisfying (1), we can just set W = [B C], where C is an m × (m − d) matrix such that C T C = I m−d and the column vectors of C are orthogonal to those of B; otherwise, W = B and d = m.

Proposition 2 .
For any g ∈ H Y and ũ ∈ R d , we approximate E[g(Y )|X = x] by E[g(Y )| Ũ = ũ] for every x such that WT 1 x = ũ.Then we have

Figure 3 :
Figure 3: Three samples of boat tracks at two levels of survey density; the bathymetry within the blue rectangle are assumed uncertain.

Figure 2 (
b) for example.We take 5 different values

Figure 5 :
Figure 5: Empirical mean and standard deviation of the 400 sets of bathymetry input across both uncertain boat tracks and posterior samples of w; note the different scales of standard deviation for survey level 1 and 2.

Figure 6 :
Figure 6: Simulation values with different inputs (w, h max ) at four gauges.The uncertain input w are drawn based on survey level 1 and 2, together with the true values and sample mean values.

Figure 8 :
Figure 8: Histogram of predictions of 10, 000 tsunami wave heights at four gauges due to the uncertain seabed uplift (prediction P2).Prediction P1 also accounts for the uncertainties in the bathymetry.Left column: high-resolution bathymetry survey (level 1); Right column: coarse bathymetry survey (level 2).

Table 2 :
Study 2: Normalized PRMSEs of the 10-fold cross validation using GP emulation combined with different dimension reduction methods.