Abstract

Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the data set. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless data set. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Matérn kernel) is estimated by maximum likelihood. We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model in the sense that the model can become “slowly” overconfident at worst, regardless of the difference between the smoothness of the data-generating function and that expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.
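The setting described in the abstract — noiseless observations of a deterministic function, a Matérn kernel, and a scale (magnitude) parameter fitted by maximum likelihood — can be sketched in a few lines. For a fixed kernel matrix K, the scale MLE has the standard closed form σ̂² = yᵀK⁻¹y / n; the Matérn-3/2 kernel, the fixed length-scale, and the test function below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def matern32(x, y, ell=1.0):
    # Matérn kernel with smoothness 3/2 and length-scale ell (illustrative choice)
    r = np.abs(x[:, None] - y[None, :]) / ell
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def fit_scale_and_predict(x_train, y_train, x_test, ell=1.0):
    # Noiseless GP regression: only a tiny jitter is added for numerical stability
    K = matern32(x_train, x_train, ell)
    L = np.linalg.cholesky(K + 1e-12 * np.eye(len(x_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    n = len(x_train)
    sigma2_hat = float(y_train @ alpha) / n  # closed-form MLE of the scale parameter
    k_star = matern32(x_test, x_train, ell)
    mean = k_star @ alpha                    # posterior mean (independent of the scale)
    v = np.linalg.solve(L, k_star.T)
    # Posterior variance is multiplied by the estimated scale: this is the
    # adaptation mechanism the article analyzes
    var = sigma2_hat * (np.diag(matern32(x_test, x_test, ell)) - np.sum(v**2, axis=0))
    return mean, var, sigma2_hat

# Hypothetical data-generating function on [0, 1]
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2.0 * np.pi * x)
xt = np.linspace(0.0, 1.0, 50)
mean, var, s2 = fit_scale_and_predict(x, y, xt)
```

Note that the posterior mean does not depend on the scale parameter; only the width of the credible intervals does, which is why estimating the scale alone can temper over- or underconfidence without changing the point predictions.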

Keywords

  1. nonparametric regression
  2. scattered data approximation
  3. credible sets
  4. Bayesian cubature
  5. model misspecification

MSC codes

  1. 60G15
  2. 62G20
  3. 68T37
  4. 65D05
  5. 46E22



Published In

SIAM/ASA Journal on Uncertainty Quantification
Pages: 926--958
ISSN (online): 2166-2525

History

Submitted: 30 January 2020
Accepted: 12 May 2020
Published online: 4 August 2020


Funding Information

Aalto ELEC Doctoral School
Academy of Finland https://doi.org/10.13039/501100002341
Alan Turing Institute https://doi.org/10.13039/100012338
Engineering and Physical Sciences Research Council https://doi.org/10.13039/501100000266 : 18000171
