Abstract.

We consider a stochastic optimal control problem with separable drift uncertainty in the strong formulation on a finite time horizon. The drift of the state \(Y^{u}\) is multiplicatively influenced by an unknown random variable \(\lambda\), while admissible controls \(u\) are required to be adapted to the observation filtration. Choosing a control thus influences the state and the acquisition of information simultaneously, which gives rise to a learning effect. The problem, initially non-Markovian, is embedded into a higher-dimensional Markovian full-information control problem with control-dependent filtration and noise. To that problem we apply the stochastic Perron method to characterize the value function as the unique viscosity solution of the HJB equation, explicitly construct \(\varepsilon\)-optimal controls, and show that the values in the strong and weak formulations agree. Numerical illustrations show a significant difference between the adaptive control and the certainty equivalent control, highlighting a substantial learning effect.
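To illustrate how the control can drive information acquisition, the following minimal sketch uses a toy model that is an assumption made here and is not taken from the paper: scalar dynamics \(dY_t = \lambda u_t\,dt + dW_t\) with a Gaussian prior \(\lambda \sim N(m_0, v_0)\) independent of the Brownian motion \(W\). In this conjugate setting the posterior of \(\lambda\) given the observations stays Gaussian, with precision \(1/v_0 + \int_0^t u_s^2\,ds\), so a larger control magnitude yields faster learning. The function names `posterior_path` and `simulate` and all parameter values are hypothetical.

```python
import numpy as np


def posterior_path(u, dY, dt, m0, v0):
    """Gaussian posterior of lambda after each observed increment.

    Assumed toy model (not the paper's): dY = lambda * u * dt + dW,
    with prior lambda ~ N(m0, v0) independent of W.
    """
    u = np.asarray(u, dtype=float)
    dY = np.asarray(dY, dtype=float)
    # Posterior precision grows with the accumulated squared control.
    precision = 1.0 / v0 + np.cumsum(u ** 2) * dt
    var = 1.0 / precision
    # Posterior mean combines the prior with the control-weighted observations.
    mean = var * (m0 / v0 + np.cumsum(u * dY))
    return mean, var


def simulate(u_const, lam, n_steps, dt, rng):
    """Simulate observation increments under a constant control u_const."""
    u = np.full(n_steps, float(u_const))
    dY = lam * u * dt + np.sqrt(dt) * rng.standard_normal(n_steps)
    return u, dY


# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(0)
lam_true, m0, v0, dt, n_steps = 1.5, 0.0, 1.0, 0.01, 1000

for u_const in (0.0, 0.5, 2.0):
    u, dY = simulate(u_const, lam_true, n_steps, dt, rng)
    mean, var = posterior_path(u, dY, dt, m0, v0)
    print(f"u = {u_const}: posterior mean {mean[-1]:.3f}, variance {var[-1]:.4f}")
```

With \(u \equiv 0\) the observations carry no information about \(\lambda\) and the posterior equals the prior, whereas a larger constant control shrinks the posterior variance more quickly. In this toy setting, that is the trade-off between controlling the state and learning the drift.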

Keywords

  1. adaptive control
  2. drift uncertainty
  3. exploration-exploitation trade-off

MSC codes

  1. 93E20
  2. 93E11
  3. 93C40
  4. 49L25



Information & Authors

Information

Published In

SIAM Journal on Control and Optimization
Pages: 1348–1373
ISSN (online): 1095-7138

History

Submitted: 12 November 2023
Accepted: 3 January 2025
Published online: 22 April 2025


Authors

Affiliations

Mathematical Institute, University of Oxford, Oxford, United Kingdom, OX2 6GG.
School of Computation, Information and Technology, Technische Universität München, Parkring 11–13, 85748, Garching bei München, Germany.
Alexander Merkel
Institut für Mathematik, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany.

Funding Information

Oxford-Man Institute for Quantitative Finance
Funding: This research was supported by the Deutsche Forschungsgemeinschaft through the Berlin–Oxford IRTG 2544, Stochastic Analysis in Interaction. The first author also acknowledges the support of the UKRI Prosperity Partnership Scheme (FAIR) under EPSRC grant EP/V056883/1, the Alan Turing Institute, and the Oxford–Man Institute for Quantitative Finance.
