Abstract.

We consider the generalized (higher-order) Langevin equation for simulated annealing and the optimization of nonconvex functions. Our approach modifies the underdamped Langevin equation by replacing the Brownian noise with an appropriate Ornstein–Uhlenbeck process to account for memory in the system. Under reasonable conditions on the loss function and the annealing schedule, we establish convergence of the continuous-time dynamics to a global minimum. We also study the method numerically and observe better performance and greater exploration of the state space than with underdamped Langevin dynamics under the same annealing schedule.
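
The idea of driving the momentum with an Ornstein–Uhlenbeck process instead of Brownian noise can be sketched numerically. The block below is a minimal illustration, not the authors' scheme: it assumes a commonly used Markovian form of the generalized Langevin equation with a single auxiliary variable z (dx = p dt, dp = -∇U(x) dt + λ z dt, dz = -λ p dt - α z dt + sqrt(2 α T_t) dW), a logarithmic cooling schedule T_t = c / log(e + t), and a plain Euler–Maruyama discretization. The names `temperature`, `gle_annealing`, `lam`, `alpha`, and `c` are hypothetical and do not come from the paper.

```python
import numpy as np


def temperature(t, c=1.0):
    """Assumed logarithmic cooling schedule T_t = c / log(e + t)."""
    return c / np.log(np.e + t)


def gle_annealing(grad_u, x0, n_steps, dt=1e-3, lam=1.0, alpha=1.0, c=1.0, seed=0):
    """Euler-Maruyama sketch of an annealed generalized Langevin system.

    Assumed dynamics (one auxiliary variable z acting as colored noise):
        dx = p dt
        dp = -grad_u(x) dt + lam * z dt
        dz = -lam * p dt - alpha * z dt + sqrt(2 * alpha * T_t) dW
    Returns the positions x at every step, including the initial one.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    p = np.zeros_like(x)  # momentum
    z = np.zeros_like(x)  # Ornstein-Uhlenbeck-type auxiliary variable
    path = np.empty((n_steps + 1,) + x.shape)
    path[0] = x
    for k in range(n_steps):
        temp = temperature(k * dt, c)
        noise = rng.standard_normal(x.shape)
        # All three updates use the state from the previous step.
        x_new = x + dt * p
        p_new = p + dt * (-grad_u(x) + lam * z)
        z_new = (z + dt * (-lam * p - alpha * z)
                 + np.sqrt(2.0 * alpha * temp * dt) * noise)
        x, p, z = x_new, p_new, z_new
        path[k + 1] = x
    return path


if __name__ == "__main__":
    # Illustrative tilted double-well U(x) = (x^2 - 1)^2 + 0.3 x (not from the paper).
    grad = lambda x: 4.0 * x * (x**2 - 1.0) + 0.3
    xs = gle_annealing(grad, x0=1.0, n_steps=20000, dt=1e-3)
    print("final position:", xs[-1])
```

Setting `c=0` removes the noise entirely, which gives a quick deterministic check of the update rule. In this sketch the noise reaches the position only through z and then p, which is the sense in which the driving force has memory.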

Keywords

  1. nonconvex optimization
  2. generalized Langevin equation
  3. simulated annealing

MSC codes

  1. 60J25
  2. 46N10
  3. 60J60

Acknowledgments.

The authors would like to thank Tony Lelievre, Gabriel Stoltz, Urbain Vaes and the anonymous referees for their helpful remarks.

Supplementary Materials

PLEASE NOTE: These supplementary files have not been peer-reviewed.
Index of Supplementary Materials
Title of paper: On the Generalized Langevin Equation for Simulated Annealing
Authors: Martin Chak, Nikolas Kantas, and Grigorios A. Pavliotis
File: supplement.pdf
Type: PDF
Contents: This file includes background and additional results, a description of the numerical methodology used, and figures and tables from additional numerical experiments. The background results provide full justification of the main results in the manuscript; the additional results provide a peripheral perspective on the main results; the numerical methodology allows reproduction of the numerical experiments; the additional numerical experiments provide a wider variety of examples.

Published In

SIAM/ASA Journal on Uncertainty Quantification
Pages: 139 - 167
ISSN (online): 2166-2525

History

Submitted: 2 December 2021
Accepted: 22 August 2022
Published online: 1 March 2023

Authors

Affiliations

Martin Chak
Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.
Nikolas Kantas
Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.
Grigorios A. Pavliotis
Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.

Funding Information

Engineering and Physical Sciences Research Council (EPSRC): EP/P031587/1, EP/L024926/1, EP/L020564/1
Funding: The first author was supported by an EPSRC studentship. The second and third authors were partially supported by JPMorgan Chase & Co. under J.P. Morgan A.I. Research Awards 2019. The third author was also partially supported by the EPSRC through grants EP/P031587/1, EP/L024926/1, and EP/L020564/1.
