Abstract.

We consider a singularly perturbed system of stochastic differential equations proposed by Chaudhari et al. (Res. Math. Sci. 2018) to approximate, via homogenization, the entropic gradient descent used in the optimization of deep neural networks. We embed it in a much larger class of two-scale stochastic control problems and rely on convergence results for Hamilton–Jacobi–Bellman equations with unbounded data that we proved recently (ESAIM Control Optim. Calc. Var. 2023). We show that the limit of the value functions is itself the value function of an effective control problem with extended controls, and that the trajectories of the perturbed system converge in a suitable sense to the trajectories of the limiting effective control system. These rigorous results improve the understanding of the convergence of the algorithms used by Chaudhari et al., as well as of possible extensions in which some tuning parameters are modeled as dynamic controls.
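The abstract does not write out the system, so the following is only an illustrative sketch of the homogenization idea, not the authors' model or method. It assumes a one-dimensional, uncontrolled slow-fast system in the spirit of Chaudhari et al.: a slow variable X relaxes toward a fast variable Y, while Y follows Langevin dynamics at speed 1/ε for the modified loss f(y) + (y − x)²/(2γ). The quadratic loss f(y) = a y²/2, the noise scaling sqrt(2/(βε)), and the function names `simulate_two_scale` and `homogenized_solution` are hypothetical choices made so that the averaged (homogenized) limit is explicit.

```python
import math
import random


def simulate_two_scale(x0, gamma, beta, eps, a, T, dt, seed=0):
    """Euler-Maruyama for an ASSUMED slow-fast system (illustration only):

        dX = -(1/gamma) (X - Y) dt                          (slow)
        dY = -(1/eps) [f'(Y) + (Y - X)/gamma] dt
             + sqrt(2 / (beta * eps)) dW                    (fast)

    with the hypothetical quadratic loss f(y) = a * y**2 / 2.
    The step dt must be much smaller than eps for the fast variable.
    """
    rng = random.Random(seed)
    x, y = x0, x0  # start the fast variable at the slow one
    noise_scale = math.sqrt(2.0 * dt / (beta * eps))  # sqrt(2/(beta*eps)) * sqrt(dt)
    n_steps = int(round(T / dt))
    for _ in range(n_steps):
        grad_f = a * y  # f'(y) for the quadratic loss
        x_new = x - (x - y) / gamma * dt
        y_new = (y - (grad_f + (y - x) / gamma) / eps * dt
                 + noise_scale * rng.gauss(0.0, 1.0))
        x, y = x_new, y_new
    return x


def homogenized_solution(x0, gamma, a, T):
    """Averaged (homogenized) limit for the quadratic loss.

    For frozen x, the fast variable's invariant law is proportional to
    exp(-beta * (f(y) + (y - x)**2 / (2 * gamma))), a Gaussian with mean
    x / (1 + a * gamma). Averaging the slow drift -(x - y)/gamma against it
    gives dX/dt = -a X / (1 + a * gamma).
    """
    return x0 * math.exp(-a * T / (1.0 + a * gamma))


if __name__ == "__main__":
    params = dict(x0=1.0, gamma=1.0, a=1.0, T=1.0)
    x_eps = simulate_two_scale(beta=1.0, eps=1e-3, dt=5e-5, **params)
    x_bar = homogenized_solution(**params)
    print(f"two-scale X(T) = {x_eps:.4f}, homogenized X(T) = {x_bar:.4f}")
```

In this toy setting the fast variable averages out, so for small ε the simulated X(T) should stay close to the homogenized value, with a random gap that shrinks roughly like sqrt(ε). The paper's setting is much more general (controlled, unbounded data, non-quadratic losses), and there the limit is identified through the HJB equations rather than by an explicit formula.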

Keywords

  1. entropic stochastic gradient descent
  2. stochastic optimal control
  3. deep neural networks
  4. multiscale systems
  5. Hamilton–Jacobi–Bellman equations
  6. homogenization

MSC codes

  1. 93C70
  2. 68T07
  3. 90C26
  4. 93E20

Acknowledgments.

We are very grateful to a referee for valuable feedback, which helped us to improve the paper considerably. We also thank Alekos Cecchin for useful discussions about the dynamic programming principle and for pointing us to the papers [18] and [19].

References

1. J.-P. Aubin and H. Frankowska, Set-Valued Analysis, Birkhäuser, Boston, 1990.
2. C. Baldassi, A. Ingrosso, C. Lucibello, L. Saglietti, and R. Zecchina, Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses, Phys. Rev. Lett., 115 (2015), 128101.
3. M. Bardi and A. Cesaroni, Optimal control with random parameters: A multiscale approach, Eur. J. Control, 17 (2011), pp. 30–45.
4. M. Bardi, A. Cesaroni, and L. Manca, Convergence by viscosity methods in multiscale financial models with stochastic volatility, SIAM J. Financial Math., 1 (2010), pp. 230–265, https://doi.org/10.1137/090748147.
5. M. Bardi and H. Kouhkouh, Singular perturbations in stochastic optimal control with unbounded data, ESAIM Control Optim. Calc. Var., 29 (2023), 52.
6. R. Bhatia, Matrix Analysis, Springer-Verlag, New York, 1997.
7. P. Billingsley, Probability and Measure, John Wiley & Sons, New York, 2008.
8. V. Bogachev, A. Kirillov, and S. Shaposhnikov, Invariant measures of diffusions with gradient drifts, Dokl. Math., 82 (2010), pp. 790–793.
9. V. I. Bogachev, A. I. Kirillov, and S. V. Shaposhnikov, The Kantorovich and variation distances between invariant measures of diffusions and nonlinear stationary Fokker–Planck–Kolmogorov equations, Math. Notes, 96 (2014), pp. 855–863.
10. V. Bogachev and M. Röckner, A generalization of Khasminskii’s theorem on the existence of invariant measures for locally integrable drifts, Theory Probab. Appl., 45 (2000), pp. 363–378, https://doi.org/10.1137/S0040585X97978348.
11. V. I. Bogachev, M. Röckner, and W. Stannat, Uniqueness of solutions of elliptic equations and uniqueness of invariant measures of diffusions, Sb. Math., 193 (2002), pp. 945–976.
12. V. Borkar and V. Gaitsgory, Averaging of singularly perturbed controlled stochastic differential equations, Appl. Math. Optim., 56 (2007), pp. 169–209.
13. V. S. Borkar and V. Gaitsgory, Singular perturbations in ergodic control of diffusions, SIAM J. Control Optim., 46 (2007), pp. 1562–1577, https://doi.org/10.1137/060657327.
14. P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina, Entropy-SGD: Biasing gradient descent into wide valleys, J. Stat. Mech. Theory Exp., 2019 (2019), 124018.
15. P. Chaudhari, A. Oberman, S. Osher, S. Soatto, and G. Carlier, Deep relaxation: Partial differential equations for optimizing deep neural networks, Res. Math. Sci., 5 (2018), 30.
16. F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski, Nonsmooth Analysis and Control Theory, Springer-Verlag, New York, 1998.
17. F. Da Lio and O. Ley, Uniqueness results for second-order Bellman–Isaacs equations under quadratic growth assumptions and applications, SIAM J. Control Optim., 45 (2006), pp. 74–106, https://doi.org/10.1137/S0363012904440897.
18. M. F. Djete, D. Possamaï, and X. Tan, McKean–Vlasov optimal control: The dynamic programming principle, Ann. Probab., 50 (2022), pp. 791–833.
19. N. El Karoui and X. Tan, Capacities, Measurable Selection and Dynamic Programming. Part II: Application in Stochastic Control Problems, preprint, arXiv:1310.3364, 2015.
20. W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions, 2nd ed., Springer, New York, 2006.
21. J.-P. Fouque, G. Papanicolaou, R. Sircar, and K. Sølna, Multiscale Stochastic Volatility for Equity, Interest Rate, and Credit Derivatives, Cambridge University Press, Cambridge, UK, 2011.
22. P. Kokotović, H. K. Khalil, and J. O’Reilly, Singular Perturbation Methods in Control: Analysis and Design, Academic Press, London, 1986.
23. H. Kouhkouh, Some Asymptotic Problems for Hamilton–Jacobi–Bellman Equations and Applications to Global Optimization, Ph.D. thesis, University of Padova, 2022, https://hdl.handle.net/11577/3444759.
24. H. Kushner, Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems, Birkhäuser, Boston, 1990.
25. Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436–444.
26. Q. Li, C. Tai, and W. E, Stochastic modified equations and adaptive stochastic gradient algorithms, in Proceedings of the International Conference on Machine Learning, PMLR, 2017, pp. 2101–2110.
27. E. Pardoux and A. Y. Veretennikov, On the Poisson equation and diffusion approximation, I, Ann. Probab., 29 (2001), pp. 1061–1085.
28. E. Pardoux and A. Y. Veretennikov, On Poisson equation and diffusion approximation, II, Ann. Probab., 31 (2003), pp. 1166–1192.
29. E. Pardoux and A. Y. Veretennikov, On the Poisson equation and diffusion approximation, III, Ann. Probab., 33 (2005), pp. 1111–1133.
30. M. Pavon, On local entropy, stochastic control, and deep neural networks, IEEE Control Syst. Lett., 7 (2022), pp. 437–441.
31. F. Pittorino, C. Lucibello, C. Feinauer, G. Perugini, C. Baldassi, E. Demyanenko, and R. Zecchina, Entropic gradient descent algorithms and wide flat minima, J. Stat. Mech. Theory Exp., 2021 (2021), 124015.
32. D. W. Stroock and S. R. S. Varadhan, Multidimensional Diffusion Processes, Grundlehren Math. Wiss. 233, Springer-Verlag, Berlin, New York, 1979.
33. T. P. Wihler, On the Hölder continuity of matrix functions for normal matrices, JIPAM J. Inequal. Pure Appl. Math., 10 (2009), 91.
34. J. Yong and X. Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer-Verlag, New York, 1999.

Information & Authors

Information

Published In

SIAM Journal on Control and Optimization
Pages: 2229–2253
ISSN (online): 1095-7138

History

Submitted: 3 January 2023
Accepted: 7 May 2024
Published online: 24 July 2024

Authors

Affiliations

Department of Mathematics “T. Levi-Civita”, University of Padova, I-35121 Padova, Italy.
RWTH Aachen University, Institut für Mathematik, RTG Energy, Entropy, and Dissipative Dynamics, Templergraben 55 (111810), 52062 Aachen, Germany. Current address: Department of Mathematics and Scientific Computing, NAWI, University of Graz, 8010 Graz, Austria.

Funding Information

Funding: The first author is a member of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM). He also participates in the King Abdullah University of Science and Technology (KAUST) project CRG2021-4674 “Mean-Field Games: models, theory, and computational aspects,” and in the project funded by the European Union–NextGenerationEU under the National Recovery and Resilience Plan (NRRP), Mission 4, Component 2, Investment 1.1, Call PRIN 2022 No. 104 of February 2, 2022 of the Italian Ministry of University and Research; Project 2022W58BJ5 (subject area: PE, Physical Sciences and Engineering) “PDEs and optimal control methods in mean field games, population dynamics and multi-agent models.” The research of the second author was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 320021702/GRK2326, Energy, Entropy, and Dissipative Dynamics (EDDy).
