Abstract

This paper develops algorithms for high-dimensional stochastic control problems based on deep learning and dynamic programming. Unlike classical approximate dynamic programming approaches, we first approximate the optimal policy by means of neural networks, in the spirit of deep reinforcement learning, and then the value function by Monte Carlo regression. This is achieved within the dynamic programming recursion by combining performance or hybrid iteration with the regress-now method from numerical probability. We provide a theoretical justification of these algorithms: consistency and rates of convergence for the control and value function estimates are established and expressed in terms of the universal approximation error of the neural networks and of the statistical error incurred when estimating the network functions, leaving aside the optimization error. Numerical results on various applications are presented in a companion paper [Deep neural networks algorithms for stochastic control problems on finite horizon: Numerical applications, Methodol. Comput. Appl. Probab., to appear] and illustrate the performance of the proposed algorithms.
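
To make the two-stage scheme concrete: at each step of the backward dynamic programming recursion, a neural network policy is optimized first (in the spirit of deep reinforcement learning), and the value function is then estimated by a Monte Carlo "regress-now" least-squares fit. Below is a minimal PyTorch sketch of one such backward step for a finite-horizon cost-minimization problem with dynamics X_{t+1} = F(X_t, a_t, ε_{t+1}); the functions `dynamics`, `running_cost`, and `g`, the network architectures, and the sample sizes are illustrative placeholders, not the paper's specification.

```python
# Sketch of one backward step of a hybrid scheme: (1) learn a feedback
# policy by minimizing the simulated one-step cost-to-go, then (2) fit
# the value function by least-squares Monte Carlo regression ("regress-now").
import torch
import torch.nn as nn

d = 10      # state dimension (placeholder)
M = 4096    # number of Monte Carlo training samples (placeholder)

def dynamics(x, a, noise):      # placeholder controlled dynamics F(x, a, eps)
    return x + a + 0.1 * noise

def running_cost(x, a):         # placeholder quadratic running cost f(x, a)
    return x.pow(2).sum(1) + a.pow(2).sum(1)

def make_net(out_dim):          # small feedforward network (placeholder size)
    return nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

def hybrid_now_step(x, value_next):
    """Learn the policy at one time step, then regress the value function."""
    policy = make_net(d)        # (1) policy learning
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(500):
        noise = torch.randn(M, d)
        a = policy(x)
        x_next = dynamics(x, a, noise)
        loss = (running_cost(x, a) + value_next(x_next).squeeze(1)).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    value = make_net(1)         # (2) regress-now value estimate
    opt = torch.optim.Adam(value.parameters(), lr=1e-3)
    for _ in range(500):
        noise = torch.randn(M, d)
        with torch.no_grad():   # regression target along the learned policy
            a = policy(x)
            y = running_cost(x, a) + value_next(dynamics(x, a, noise)).squeeze(1)
        loss = ((value(x).squeeze(1) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return policy, value

# Usage: backward induction from a placeholder terminal cost g.
x_samples = torch.randn(M, d)                    # training states, law mu (placeholder)
value_next = lambda xT: xT.pow(2).sum(1, keepdim=True)
policies = []
for t in reversed(range(3)):                     # toy horizon N = 3
    policy, value = hybrid_now_step(x_samples, value_next)
    policies.insert(0, policy)
    value_next = value
```

Under performance iteration, step (2) would instead be estimated by simulating the already-learned policies forward to the horizon; the regress-now variant sketched here avoids that nested simulation at the price of an additional regression error at each step.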

Keywords

  1. deep learning
  2. dynamic programming
  3. performance iteration
  4. regress-now
  5. convergence analysis
  6. statistical risk

MSC codes

  1. 65C05
  2. 90C39
  3. 93E35

References

1. K. Asadi and M. Littman, An alternative softmax operator for reinforcement learning, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70, 2017, pp. 243--252.
2. F. Bach, Breaking the curse of dimensionality with convex neural networks, J. Mach. Learn. Res., 18 (2017), pp. 1--53.
3. A. Bachouch, C. Huré, N. Langrené, and H. Pham, Deep neural networks algorithms for stochastic control problems on finite horizon: Numerical applications, Methodol. Comput. Appl. Probab., to appear.
4. C. Beck, A. Jentzen, and B. Kuckuck, Full Error Analysis for the Training of Deep Neural Networks, arXiv:1910.00121v2, 2020.
5. B. Bercu and J. Fort, Generic stochastic gradient methods, in Wiley Encyclopedia of Operations Research and Management Science, Wiley, New York, 2011, pp. 1--8.
6. D. P. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
7. G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems, 2 (1989), pp. 303--314.
8. W. E, J. Han, and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat., 5 (2017), pp. 349--380.
9. A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed., O'Reilly Media, Sebastopol, CA, 2019.
10. L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer Ser. Statist., Springer, New York, 2002.
11. J. Han and W. E, Deep learning approximation for stochastic control problems, in Proceedings of the NIPS Deep Reinforcement Learning Workshop, 2016.
12. J. Han and J. Long, Convergence of the deep BSDE method for coupled FBSDEs, Probab. Uncertainty Quant. Risk, 5 (2020), pp. 1--33.
13. P. Henry-Labordère, Deep Primal-Dual Algorithm for BSDEs: Applications of Machine Learning to CVA and IM, SSRN:3071506, 2017.
14. K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks, 4 (1991), pp. 251--257.
15. A. Jentzen, B. Kuckuck, A. Neufeld, and P. von Wurstemberger, Strong Error Analysis for Stochastic Gradient Descent Optimization Algorithms, arXiv:1801.09324v1, 2018.
16. M. Kohler, Nonparametric regression with additional measurement errors in the dependent variable, J. Statist. Plann. Inference, 136 (2006), pp. 3339--3361.
17. M. Kohler, A. Krzyżak, and N. Todorovic, Pricing of high-dimensional American options by neural networks, Math. Finance, 20 (2010), pp. 383--410.
18. A. N. Kolmogorov, On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables, Math. Appl. (Soviet Ser.), 25 (1991).
19. S. Kou, X. Peng, and X. Xu, EM Algorithm and Stochastic Control in Economics, SSRN:2865124, 2016.
20. Y. Li, Deep Reinforcement Learning: An Overview, arXiv:1701.07274v3, 2017.
21. F. A. Longstaff and E. S. Schwartz, Valuing American options by simulation: A simple least-squares approach, Rev. Financial Stud., 14 (2001), pp. 113--147.
22. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, et al., Human-level control through deep reinforcement learning, Nature, 518 (2015), pp. 529--533.
23. W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, Wiley, New York, 2011.
24. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.


Published In

SIAM Journal on Numerical Analysis
Pages: 525 - 557
ISSN (online): 1095-7170

History

Submitted: 3 February 2020
Accepted: 23 November 2020
Published online: 22 February 2021

Funding Information

FiME
Norwegian Research Council: 250768/F20
Agence Nationale de la Recherche: ANR-15-CE05-0024
