
We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularizers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. In contrast to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures–Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a priori bound and to converge globally to the optimal policy at a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
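To illustrate what a gradient step in the Bures–Wasserstein geometry can look like for a covariance matrix, the following is a minimal sketch on a toy objective. It is not the algorithm of the paper: the objective here is assumed to be the relative entropy KL(N(0, Σ) || N(0, Σ*)) to a fixed target covariance rather than the LQC cost, and the names `bw_step` and `run_bw_descent`, the step size, and the iteration count are all hypothetical choices made for illustration. The update multiplies Σ on both sides by the same matrix, so the iterate stays symmetric positive semidefinite without any projection.

```python
import numpy as np


def bw_step(sigma, target_inv, step):
    """One Bures-Wasserstein gradient step on the toy objective
    KL(N(0, sigma) || N(0, target)); an assumption, not the LQC cost."""
    d = sigma.shape[0]
    # Symmetric matrix m such that the Wasserstein gradient is x -> m x.
    m = target_inv - np.linalg.inv(sigma)
    push = np.eye(d) - step * m
    # Congruence transform: the result is symmetric positive semidefinite.
    return push @ sigma @ push.T


def run_bw_descent(sigma0, target, step=0.1, iters=200):
    """Iterate bw_step from sigma0 (hypothetical helper for this sketch)."""
    target_inv = np.linalg.inv(target)
    sigma = np.array(sigma0, dtype=float)
    for _ in range(iters):
        sigma = bw_step(sigma, target_inv, step)
    return sigma


if __name__ == "__main__":
    target = np.array([[2.0, 0.5], [0.5, 1.0]])
    sigma = run_bw_descent(np.eye(2), target, step=0.1, iters=500)
    print(np.round(sigma, 6))
```

A plain Euclidean step `sigma - step * grad` can leave the cone of positive semidefinite matrices when the step is large; the two-sided form above avoids this by construction, which is one reason such geometry-aware updates are attractive for covariance parameters.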


Keywords

  1. continuous-time linear-quadratic control
  2. policy optimization
  3. relative entropy
  4. geometry-aware gradient
  5. global linear convergence
  6. mesh-independent convergence

MSC codes

  1. 68Q25
  2. 93E20

Published In

SIAM Journal on Control and Optimization
Pages: 1060 - 1092
ISSN (online): 1095-7138


Submitted: 8 November 2022
Accepted: 5 January 2024
Published online: 22 March 2024


Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK.
Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.

