Abstract

Deep learning (DL) is transforming whole industries as complicated decision-making processes are being automated by deep neural networks (DNNs) trained on real-world data. Driven in part by a rapidly expanding literature on DNN approximation theory showing that DNNs can approximate a rich variety of functions, these tools are increasingly being considered for problems in scientific computing. Yet, unlike more traditional algorithms in this field, relatively little is known about DNNs in relation to the principles of numerical analysis, namely, stability, accuracy, computational efficiency, and sample complexity. In this paper we first introduce a computational framework for examining DNNs in practice, and then use it to study their empirical performance with regard to these issues. We examine the performance of DNNs of different widths and depths on a variety of test functions in various dimensions, including smooth and piecewise smooth functions. We also compare DL against best-in-class methods for smooth function approximation based on compressed sensing. Our main conclusion from these experiments is that there is a crucial gap between the approximation theory of DNNs and their practical performance: trained DNNs perform relatively poorly on functions for which strong approximation results exist (e.g., smooth functions), yet perform well in comparison to best-in-class methods on other functions. To analyze this gap further, we then provide some theoretical insights. We establish a practical existence theorem, which asserts the existence of a DNN architecture and training procedure that offers the same performance as compressed sensing. This result provides a key theoretical benchmark: it shows that the gap can be closed, albeit via a DNN approximation strategy guaranteed to perform as well as, but no better than, current best-in-class schemes. Nevertheless, it highlights the promise of practical DNN approximation and the potential for developing better schemes through the careful design of DNN architectures and training strategies.
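
The experiments summarized above train fully connected networks on point samples of a target function and measure the resulting approximation error on held-out samples. As a rough, self-contained illustration of this kind of experiment only (a minimal sketch assuming PyTorch is available, with a placeholder architecture, sample sizes, optimizer settings, and target function; it is not the framework or configuration used in the paper), one might proceed as follows:

```python
# Minimal illustrative sketch (not the authors' framework): train a fully
# connected DNN on point samples of a smooth test function and measure the
# approximation error on held-out samples. Width, depth, sample sizes,
# optimizer settings, and the target function are placeholders.
import torch

torch.manual_seed(0)

d = 2                                          # input dimension (placeholder)

def f(x):                                      # smooth test function (placeholder)
    return torch.exp(-x.pow(2).sum(dim=1, keepdim=True))

# Training and test points drawn uniformly from [-1, 1]^d
x_train = 2.0 * torch.rand(1000, d) - 1.0
y_train = f(x_train)
x_test = 2.0 * torch.rand(5000, d) - 1.0
y_test = f(x_test)

# Fully connected network: 4 hidden layers of width 50 with tanh activations
width, depth = 50, 4
layers, in_dim = [], d
for _ in range(depth):
    layers += [torch.nn.Linear(in_dim, width), torch.nn.Tanh()]
    in_dim = width
layers.append(torch.nn.Linear(in_dim, 1))
model = torch.nn.Sequential(*layers)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(5000):                       # plain full-batch training
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

with torch.no_grad():                          # relative L2 error on the test set
    rel_err = torch.linalg.norm(model(x_test) - y_test) / torch.linalg.norm(y_test)
print(f"relative L2 test error: {rel_err.item():.3e}")
```

Sweeping the width, depth, and number of training samples in such a loop, and comparing the resulting errors against a compressed sensing polynomial approximation computed from the same samples, is the type of accuracy and sample-complexity comparison the paper reports.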

Keywords

  1. neural networks
  2. deep learning
  3. function approximation
  4. compressed sensing
  5. numerical analysis

MSC codes

  1. 41A25
  2. 41A46
  3. 42C05
  4. 65D05
  5. 65D15
  6. 65Y20
  7. 94A20

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: The gap between theory and practice in function approximation with deep neural networks

Authors: Ben Adcock and Nick Dexter

File: MLFA_supplement.pdf

Type: PDF

Contents: Additional information on the testing setup for the numerical experiments in this work; additional numerical experiments relevant to the discussions; further details on truncation parameters and lower-set-motivated recovery strategies in compressed sensing; and proofs of the exponential convergence of best s-term polynomial approximations of holomorphic functions, of the convergence of compressed sensing for the same functions, and of the main result on the approximation of such functions with deep neural networks.
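
For orientation, the exponential convergence results referenced above take the following standard form (stated schematically here; the precise assumptions, approximation spaces, norms, and constants are those given in the supplement): if f is holomorphic in a sufficiently large complex region containing [-1,1]^d and f_s denotes its best s-term approximation in a tensorized Chebyshev or Legendre polynomial basis, then

\[
\| f - f_s \| \;\le\; C \exp\!\bigl( -\gamma\, s^{1/d} \bigr), \qquad s \ge 1,
\]

with constants C, \gamma > 0 depending on f and d but not on s.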

Information & Authors

Published In

SIAM Journal on Mathematics of Data Science
Pages: 624 - 655
ISSN (online): 2577-0187

History

Submitted: 17 January 2020
Accepted: 11 January 2021
Published online: 6 May 2021

Authors

Ben Adcock and Nick Dexter

Funding Information

Natural Sciences and Engineering Research Council of Canada https://doi.org/10.13039/501100000038 : R611675
Pacific Institute for the Mathematical Sciences https://doi.org/10.13039/100009059
Simon Fraser University https://doi.org/10.13039/501100004326
