Analysis of the Generalization Error: Empirical Risk Minimization over Deep Artificial Neural Networks Overcomes the Curse of Dimensionality in the Numerical Approximation of Black--Scholes Partial Differential Equations

Abstract

The development of new classification and regression algorithms based on empirical risk minimization (ERM) over deep neural network hypothesis classes, commonly termed deep learning, has revolutionized the areas of artificial intelligence, machine learning, and data analysis. In particular, these methods have been applied with great success to the numerical solution of high-dimensional partial differential equations. Recent simulations indicate that deep learning--based algorithms are capable of overcoming the curse of dimensionality in the numerical solution of Kolmogorov equations, which are widely used in models from engineering, finance, and the natural sciences. The present paper investigates under which conditions ERM over a deep neural network hypothesis class approximates, up to error $\varepsilon$, the solution of a $d$-dimensional Kolmogorov equation with affine drift and diffusion coefficients and typical initial values arising from problems in computational finance. We establish that, with high probability over draws of the training samples, such an approximation can be achieved with both the size of the hypothesis class and the number of training samples scaling only polynomially in $d$ and $\varepsilon^{-1}$. It follows that ERM over deep neural network hypothesis classes overcomes the curse of dimensionality in the numerical solution of linear Kolmogorov equations with affine coefficients.
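The ERM approach analyzed in the paper recasts the PDE problem as a supervised learning problem: the solution of the Kolmogorov equation at time $T$ equals the expectation of the initial value evaluated at the terminal state of the associated affine stochastic differential equation, and this conditional expectation is the unique minimizer of a quadratic risk that can be estimated from simulated samples. The following Python sketch illustrates this reduction for a Black--Scholes (geometric Brownian motion) model with a basket put initial value; the dimension, network architecture, sample size, and model parameters are illustrative choices of the sketch and are not taken from the paper.

# Minimal ERM sketch: learn u(T, x) = E[phi(X_T) | X_0 = x] for a Black--Scholes model
# by regressing a ReLU network on simulated samples (x_i, phi(X_T^{x_i})).
import torch
import torch.nn as nn

torch.manual_seed(0)

d = 10                 # dimension of the underlying assets (illustrative)
T = 1.0                # time horizon
r, sigma = 0.05, 0.2   # drift and volatility parameters (illustrative)
K = 1.0                # "strike" of the basket put initial value

def payoff(x):
    # basket put initial value: phi(x) = max(K - mean(x), 0)
    return torch.clamp(K - x.mean(dim=1, keepdim=True), min=0.0)

def sample(n):
    # draw initial values uniformly from a hypercube and simulate X_T exactly
    # (geometric Brownian motion has an explicit solution, so no time stepping is needed)
    x0 = torch.rand(n, d) + 0.5                              # x ~ Unif([0.5, 1.5]^d)
    w = torch.randn(n, d) * (T ** 0.5)                       # Brownian increments over [0, T]
    xT = x0 * torch.exp((r - 0.5 * sigma ** 2) * T + sigma * w)
    return x0, payoff(xT)                                    # label is a single Monte Carlo draw

# ReLU hypothesis class; in the paper's analysis its size scales polynomially in d and 1/eps
model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# empirical risk minimization over m i.i.d. training samples
m = 20000
x_train, y_train = sample(m)
for step in range(2000):
    idx = torch.randint(0, m, (256,))
    loss = loss_fn(model(x_train[idx]), y_train[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final empirical risk:", loss.item())

Note that each label is a single noisy Monte Carlo realization of the initial value at the terminal state; the trained network nevertheless approximates the conditional expectation, and hence the PDE solution at time $T$ on the sampling hypercube, because the conditional expectation is the minimizer of the underlying $L^2$ risk.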

Keywords

  1. deep learning
  2. curse of dimensionality
  3. Kolmogorov equation
  4. generalization error
  5. empirical risk minimization

MSC codes

  1. 60H30
  2. 65C30
  3. 62M45
  4. 68T05

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: Analysis of the Generalization Error: Empirical Risk Minimization over Deep Artificial Neural Networks Overcomes the Curse of Dimensionality in the Numerical Approximation of Black--Scholes Partial Differential Equations

Authors: Julius Berner, Philipp Grohs, and Arnulf Jentzen

File: M125649_01.pdf

Type: PDF

Contents: two additional proofs


Information & Authors

Published In

SIAM Journal on Mathematics of Data Science
Pages: 631 - 657
ISSN (online): 2577-0187

History

Submitted: 16 April 2019
Accepted: 25 March 2020
Published online: 28 July 2020

Authors

Julius Berner, Philipp Grohs, and Arnulf Jentzen

Funding Information

Austrian Science Fund (https://doi.org/10.13039/501100002428): I3403-N32
Deutsche Forschungsgemeinschaft (https://doi.org/10.13039/501100001659): EXC 2044-390685587
