
\(Q\)-learning with function approximation is one of the most empirically successful yet theoretically mysterious reinforcement learning (RL) algorithms, and its analysis was identified in [R. S. Sutton, in European Conference on Computational Learning Theory, Springer, New York, 1999, pp. 11–17] as one of the most important theoretical open problems in the RL community. Even in the basic setting where linear function approximation is used, there are well-known divergent examples. In this work, we propose a stable online variant of \(Q\)-learning with linear function approximation that uses a target network and truncation and is driven by a single trajectory of Markovian samples. We present finite-sample guarantees for the algorithm, which imply a sample complexity of \(\tilde{\mathcal{O}}(\epsilon^{-2})\) up to a function approximation error. Importantly, we establish the results under minimal assumptions and do not modify the problem parameters to achieve stability.
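
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is one plausible reading of "target network and truncation" for linear \(Q\)-learning on a single trajectory, not the algorithm as specified in the paper. The truncation level \(1/(1-\gamma)\) (assuming rewards in \([0,1]\)), the uniform behavior policy, the warm start of each inner loop from the target parameters, and all names (`q_learning_target_truncation`, `phi_onehot`, `step_two_state`) are assumptions made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def truncate(x, bound):
    """Clip x to the interval [-bound, bound]."""
    return max(-bound, min(bound, x))


def q_learning_target_truncation(phi, step, s0, n_actions, gamma,
                                 outer_iters, inner_iters, alpha, seed=0):
    """Linear Q-learning with a frozen target vector and truncated targets."""
    rng = random.Random(seed)
    bound = 1.0 / (1.0 - gamma)  # assumed truncation level (rewards in [0, 1])
    theta_target = [0.0] * len(phi(s0, 0))
    s = s0  # a single trajectory: the state is never reset
    for _ in range(outer_iters):
        theta = list(theta_target)  # warm start (an illustrative choice)
        for _ in range(inner_iters):
            a = rng.randrange(n_actions)  # uniform behavior policy (assumed)
            reward, s_next = step(s, a)
            # Bootstrap from the frozen target parameters, truncating each
            # estimated Q-value before taking the maximum over actions.
            next_value = max(
                truncate(dot(phi(s_next, b), theta_target), bound)
                for b in range(n_actions))
            feats = phi(s, a)
            td_error = reward + gamma * next_value - dot(feats, theta)
            theta = [w + alpha * td_error * x for w, x in zip(theta, feats)]
            s = s_next
        theta_target = theta  # synchronize the target with the online iterate
    return theta_target


def phi_onehot(s, a):
    """Hypothetical one-hot features for a 2-state, 2-action toy problem."""
    v = [0.0] * 4
    v[2 * s + a] = 1.0
    return v


def step_two_state(s, a):
    """Toy deterministic dynamics: action a leads to state a; reward 1 in state 1."""
    s_next = a
    return (1.0 if s_next == 1 else 0.0), s_next


theta_hat = q_learning_target_truncation(
    phi_onehot, step_two_state, s0=0, n_actions=2, gamma=0.5,
    outer_iters=30, inner_iters=200, alpha=0.5)
print([round(w, 3) for w in theta_hat])
```

In this toy problem the one-hot features make the function approximation error zero, so the returned parameters should approach the optimal values \(Q^*(s,0)=1\) and \(Q^*(s,1)=2\) for both states. Freezing the parameters used in the bootstrap target turns each inner loop into a regression toward a fixed target, and the truncation keeps that target bounded regardless of the current iterate.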


Keywords

  1. reinforcement learning
  2. \(Q\)-learning
  3. linear function approximation
  4. finite-sample analysis

MSC codes

  1. 60J20
  2. 93E20
  3. 90C40
  4. 62L20

Supplementary Materials

PLEASE NOTE: These supplementary files have not been peer-reviewed.
Index of Supplementary Materials
Title of paper: Target Network and Truncation Overcome the Deadly Triad in Q-Learning
Authors: Zaiwei Chen, John-Paul Clarke, and Siva Theja Maguluri
File: supplement.pdf
Type: PDF
Contents: The supplement contains additional discussions.


Information & Authors


Published In

SIAM Journal on Mathematics of Data Science
Pages: 1078 - 1101
ISSN (online): 2577-0187


Submitted: 3 June 2022
Accepted: 5 September 2023
Published online: 7 December 2023


Zaiwei Chen
Computing + Mathematical Sciences, California Institute of Technology, Pasadena, CA 91106 USA.
John-Paul Clarke
Aerospace Engineering and Engineering Mechanics, University of Texas at Austin, Austin, TX 78712 USA.
Siva Theja Maguluri
Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.

Funding Information

National Science Foundation: EPCN-2144316, CPS-2240982, CMMI-2112533
Funding: This work was partially supported by NSF grants EPCN-2144316, CPS-2240982, and CMMI-2112533, and by RTX.


