Abstract

We consider a stochastic optimal exit time feedback control problem. The Bellman equation is solved approximately via the Policy Iteration algorithm on a polynomial ansatz space, which reduces it to a sequence of linear equations. As high-degree multipolynomials are needed, the corresponding equations suffer from the curse of dimensionality even in moderate dimensions. We employ Tensor-Train methods to address this problem. The approximation step within the Policy Iteration is carried out via a Least-Squares ansatz, and the integration is done via Monte Carlo methods. Numerical evidence is given for the (multidimensional) double-well potential, a three-hole potential, and a 40-dimensional stochastic Van der Pol oscillator.
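
To give a rough sense of the curse of dimensionality mentioned here, the following minimal sketch compares the number of coefficients in a full tensor-product polynomial basis with the number of parameters in a tensor train. It is an illustration under stated assumptions, not the authors' implementation: the function names, the basis size `n = 5` per variable, and the uniform rank `r = 4` are hypothetical choices; only the dimension `d = 40` is taken from the abstract.

```python
def full_coefficient_count(n: int, d: int) -> int:
    """Coefficients of a full tensor-product basis with n functions per variable."""
    return n ** d


def tt_parameter_count(n: int, d: int, r: int) -> int:
    """Parameters of a tensor train with d cores of shape (r_left, n, r_right).

    Assumes a uniform internal rank r; the boundary ranks are 1.
    """
    ranks = [1] + [r] * (d - 1) + [1]
    return sum(ranks[k] * n * ranks[k + 1] for k in range(d))


if __name__ == "__main__":
    d = 40  # dimension of the Van der Pol example named in the abstract
    n = 5   # hypothetical: polynomials up to degree 4 in each variable
    r = 4   # hypothetical uniform tensor-train rank
    print("full basis coefficients:", full_coefficient_count(n, d))
    print("tensor-train parameters:", tt_parameter_count(n, d, r))
```

The full count grows exponentially in `d`, while the tensor-train count grows linearly in `d` for a fixed rank. Whether a small rank suffices depends on the problem and is not guaranteed in general.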

Keywords

  1. policy iteration
  2. tensor train
  3. Koopman operator
  4. variational Monte Carlo
  5. Hamilton--Jacobi--Bellman (HJB)
  6. exit time

MSC codes

  1. 93E03
  2. 65C30
  3. 90C39


Information & Authors

Information

Published In

Multiscale Modeling & Simulation
Pages: 379 - 403
ISSN (online): 1540-3467

History

Submitted: 9 October 2020
Accepted: 30 November 2021
Published online: 21 March 2022
