# The Gap between Theory and Practice in Function Approximation with Deep Neural Networks

## Abstract

Deep learning (DL) is transforming industry as decision-making processes are being automated by *deep neural networks* (DNNs) trained on real-world data. Driven in part by a rapidly expanding literature on DNN approximation theory showing that DNNs can approximate a rich variety of functions, these tools are increasingly being considered for problems in scientific computing. Yet, unlike more traditional algorithms in this field, relatively little is known about DNNs in relation to the principles of numerical analysis, namely, stability, accuracy, computational efficiency, and sample complexity. In this paper we first introduce a computational framework for examining DNNs in practice, and then use it to study their empirical performance with regard to these issues. We examine the performance of DNNs of different widths and depths on a variety of test functions in various dimensions, including smooth and piecewise smooth functions. We also compare DL against best-in-class methods for smooth function approximation based on compressed sensing. Our main conclusion from these experiments is that there is a crucial gap between the approximation theory of DNNs and their practical performance, with trained DNNs performing relatively poorly on functions for which there are strong approximation results (e.g., smooth functions), yet performing well in comparison to best-in-class methods for other functions. To analyze this gap further, we then provide some theoretical insights. We establish a *practical existence theorem*, which asserts the existence of a DNN architecture and training procedure that offers the same performance as compressed sensing. This result establishes a key theoretical benchmark: it demonstrates that the gap can be closed, albeit via a DNN approximation strategy guaranteed to perform as well as, but no better than, current best-in-class schemes. Nevertheless, it highlights the promise of practical DNN approximation, pointing to the potential for developing better schemes through the careful design of DNN architectures and training strategies.
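The abstract's experimental setup — training DNNs of varying width and depth to approximate smooth test functions and measuring the resulting error — can be illustrated in miniature. The sketch below is an illustration only, not the paper's computational framework: the one-hidden-layer tanh network, the 1-D target `sin(πx)`, and plain full-batch gradient descent are all assumptions chosen for simplicity.

```python
import numpy as np

# Target: a smooth 1-D function, the easiest case for DNN approximation theory.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 128).reshape(-1, 1)
t = np.sin(np.pi * x)

# One hidden layer of 32 tanh units, trained by full-batch gradient descent.
W1 = rng.normal(0.0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, (32, 1)); b2 = np.zeros(1)

lr, losses = 0.1, []
for _ in range(3000):
    h = np.tanh(x @ W1 + b1)             # hidden activations
    pred = h @ W2 + b2                   # network output
    err = pred - t
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation for the mean squared-error loss
    dpred = 2.0 * err / len(x)
    gW2, gb2 = h.T @ dpred, dpred.sum(axis=0)
    dz = (dpred @ W2.T) * (1.0 - h ** 2)
    gW1, gb1 = x.T @ dz, dz.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

The paper's experiments use far larger architectures, higher-dimensional targets, and stochastic optimizers such as Adam; this sketch only shows the basic train-and-measure loop underlying such studies.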


## Supplementary Material

**PLEASE NOTE: These supplementary files have not been peer-reviewed.**

**Index of Supplementary Materials**

**Title of paper:** *The gap between theory and practice in function approximation with deep neural networks*

**Authors:** Ben Adcock and Nick Dexter

**File:** MLFA_supplement.pdf

**Type:** PDF

**Contents:** Additional information on the testing setup for the numerical experiments in this work; additional numerical experiments relevant to the discussions; further details on truncation parameters and lower-set-motivated recovery strategies in compressed sensing; and proofs of the exponential convergence of best *s*-term polynomial approximations for holomorphic functions, of the convergence of compressed sensing on the same functions, and of the main result on approximating such functions with deep neural networks.
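The compressed-sensing results mentioned above rest on recovering a sparse polynomial expansion from random samples. As a minimal illustration — not the method analyzed in the paper, which is based on ℓ1-minimization-style solvers; orthogonal matching pursuit is swapped in here as a simpler greedy stand-in, and the dictionary size, sparsity level, and coefficient values are assumptions — the following NumPy sketch recovers a 3-sparse Chebyshev expansion from random point samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown target: a 3-sparse combination of Chebyshev polynomials T_k.
N, m = 40, 60                        # dictionary size, number of random samples
c_true = np.zeros(N)
c_true[[2, 5, 11]] = [2.0, -1.5, 1.0]

x = rng.uniform(-1.0, 1.0, m)        # random sample points in [-1, 1]
A = np.cos(np.arccos(x)[:, None] * np.arange(N)[None, :])  # A[i, k] = T_k(x_i)
y = A @ c_true                       # noiseless samples of the target

# Orthogonal matching pursuit: greedily grow a support, re-fit by least squares.
col_norm = np.linalg.norm(A, axis=0)
support, r = [], y.copy()
while np.linalg.norm(r) > 1e-10 and len(support) < 8:
    k = int(np.argmax(np.abs(A.T @ r) / col_norm))  # most correlated atom
    if k in support:                 # residual is orthogonal to selected atoms
        break
    support.append(k)
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    r = y - A[:, support] @ coef

c_hat = np.zeros(N)
c_hat[support] = coef                # recovered sparse coefficient vector
```

With noiseless samples and many more measurements than nonzero coefficients, the greedy loop recovers the sparse coefficients essentially exactly; the ℓ1-based schemes studied in the paper target the same recovery problem with stronger guarantees.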


## Information & Authors


#### History

**Submitted**: 17 January 2020

**Accepted**: 11 January 2021

**Published online**: 6 May 2021


## Cited By

- Solving Elliptic Problems with Singular Sources Using Singularity Splitting Deep Ritz Method, *SIAM Journal on Scientific Computing*, Vol. 45, No. 4, 8 August 2023
- An Adaptive Sampling and Domain Learning Strategy for Multivariate Function Approximation on Unknown Domains, *SIAM Journal on Scientific Computing*, Vol. 45, No. 1, 23 February 2023
- Deep Neural Network Surrogates for Nonsmooth Quantities of Interest in Shape Uncertainty Quantification, *SIAM/ASA Journal on Uncertainty Quantification*, Vol. 10, No. 3, 23 August 2022
- WARPd: A Linearly Convergent First-Order Primal-Dual Algorithm for Inverse Problems with Approximate Sharpness Conditions, *SIAM Journal on Imaging Sciences*, Vol. 15, No. 3, 15 September 2022
- Recovering Wavelet Coefficients from Binary Samples Using Fast Transforms, *SIAM Journal on Scientific Computing*, Vol. 44, No. 3, 19 May 2022