We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb{R}^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy. Additionally, we prove that our lower bounds are achievable for a broad family of function classes. Specifically, all function classes that are optimally approximated by a general class of representation systems---so-called affine systems---can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, ridgelets, curvelets, shearlets, $\alpha$-shearlets, and, more generally, $\alpha$-molecules. Our central result elucidates a remarkable universality property of neural networks and shows that they achieve the optimum approximation properties of all affine systems combined. As a specific example, we consider the class of $\alpha^{-1}$-cartoon-like functions, which is approximated optimally by $\alpha$-shearlets. We also explain how our results can be extended to the approximation of functions on low-dimensional immersed manifolds. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm yields deep neural networks with close-to-optimal approximation rates. Moreover, these results indicate that stochastic gradient descent can learn approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on.
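The abstract's closing claim, that plain stochastic gradient descent finds networks with close-to-optimal approximation rates, can be illustrated in miniature. The sketch below is hypothetical and is not the authors' experimental setup: it fits a one-hidden-layer ReLU network to $f(x) = |x|$ (which the class represents exactly, since $|x| = \rho(x) + \rho(-x)$ for the ReLU $\rho$) using minibatch SGD on the squared loss, then reports the empirical $L^2$ error and the number of nonzero weights, a crude proxy for the connectivity the paper bounds.

```python
import numpy as np

# Hypothetical minimal sketch (not the paper's code): train a
# one-hidden-layer ReLU network y = W2 @ relu(W1 x + b1) + b2
# with plain minibatch SGD to approximate f(x) = |x| on [-1, 1].
rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

width = 8
W1 = rng.normal(size=(width, 1))
b1 = np.zeros((width, 1))
W2 = rng.normal(scale=0.1, size=(1, width))
b2 = np.zeros((1, 1))

target = np.abs   # |x| = relu(x) + relu(-x), so the class contains f exactly
lr = 0.05

for step in range(10000):
    x = rng.uniform(-1.0, 1.0, size=(1, 32))   # minibatch of 32 points
    h = relu(W1 @ x + b1)
    y = W2 @ h + b2
    err = y - target(x)                        # residual of the squared loss
    # backpropagate mean-squared-error gradients through the ReLU
    gW2 = err @ h.T / x.shape[1]
    gb2 = err.mean(axis=1, keepdims=True)
    dh = (W2.T @ err) * (h > 0)
    gW1 = dh @ x.T / x.shape[1]
    gb1 = dh.mean(axis=1, keepdims=True)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# empirical L2 error on a test grid, and connectivity of the trained net
xt = np.linspace(-1.0, 1.0, 200).reshape(1, -1)
yt = W2 @ relu(W1 @ xt + b1) + b2
l2_err = float(np.sqrt(np.mean((yt - target(xt)) ** 2)))
nonzero = sum(int(np.count_nonzero(np.abs(p) > 1e-8)) for p in (W1, b1, W2, b2))
print(f"empirical L2 error: {l2_err:.4f}, nonzero weights: {nonzero}")
```

Even this toy run shows the trade-off studied in the paper: the error achieved is governed by how many nonzero weights the optimizer effectively uses, not by the nominal width alone.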


Keywords

  1. neural networks
  2. function approximation
  3. optimal sparse approximation
  4. sparse connectivity
  5. wavelets
  6. shearlets

MSC codes

  1. 41A25
  2. 82C32
  3. 42C40
  4. 42C15
  5. 41A46
  6. 68T05
  7. 94A34
  8. 94A12


Published In

SIAM Journal on Mathematics of Data Science
Pages: 8--45
ISSN (online): 2577-0187

Submitted: 14 May 2018
Accepted: 29 November 2018
Published online: 12 February 2019





Funding Information

Einstein Center for Mathematics Berlin
Deutsche Forschungsgemeinschaft (https://doi.org/10.13039/501100001659): 1446/18, DFG-SPP 1798, KU 1446/21, KU 1446/23
Einstein Stiftung Berlin (https://doi.org/10.13039/501100006188)
European Commission (https://doi.org/10.13039/501100000780): 665044
Stanford University (https://doi.org/10.13039/100005492)
