Abstract

In distributional reinforcement learning, the entire distribution of the return is modeled rather than just its expectation. Approximating return distributions by categorical distributions is a well-known approach in Q-learning, and convergence results have been established in the tabular case. In this work, speedy Q-learning is extended to categorical distributions, a finite-time analysis is performed, and probably approximately correct bounds in terms of the Cramér distance are established. It is shown that, also in the distributional case, the new update rule yields faster policy evaluation than the standard Q-learning update, and that the sample complexity is essentially the same as that of the value-based algorithmic counterpart. Hence, without requiring more state-action-reward samples, categorical distributions provide significantly more information about the return. Even though the results do not extend easily to policy control, a slight modification of the update rule yields promising numerical results.
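
As an illustration of the categorical representation the abstract refers to, the following is a minimal Julia sketch of the standard categorical projection step used in categorical distributional Q-learning: a distributional Bellman target supported on the shifted atoms r + γ z_j is projected back onto the fixed grid z_1 < ... < z_K. This is not the authors' implementation; the function name and parameter values are illustrative.

    # Minimal sketch (not the paper's source code): projection of a distributional
    # Bellman target onto a fixed, equally spaced atom grid z[1] < ... < z[K],
    # as used in categorical distributional Q-learning.
    function project_categorical(z::AbstractVector{<:Real},
                                 p::AbstractVector{<:Real},  # next-state probabilities
                                 r::Real, γ::Real)
        K  = length(z)
        Δz = z[2] - z[1]
        q  = zeros(K)                               # projected probabilities
        for j in 1:K
            g = clamp(r + γ * z[j], z[1], z[K])     # Bellman-updated atom, clipped to the grid
            b = clamp((g - z[1]) / Δz + 1, 1, K)    # fractional index of g in 1..K
            l, u = floor(Int, b), ceil(Int, b)
            if l == u
                q[l] += p[j]                        # g lies exactly on an atom
            else
                q[l] += p[j] * (u - b)              # split mass between the two neighboring atoms
                q[u] += p[j] * (b - l)
            end
        end
        return q
    end

    # Example: project a uniform distribution on the atoms -1, -0.5, 0, 0.5, 1
    # after observing reward r = 0.3 with discount γ = 0.9.
    z = collect(-1.0:0.5:1.0)
    p = fill(1 / length(z), length(z))
    println(project_categorical(z, p, 0.3, 0.9))    # projected probabilities, sums to 1

A speedy variant would combine two such projected targets, built from the previous and the current estimate, in the speedy Q-learning fashion; the precise update rule and its analysis are given in the paper.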

Keywords

  1. reinforcement learning
  2. distributional reinforcement learning
  3. Q-learning
  4. PAC bounds
  5. complexity analysis

MSC codes

  1. 68Q25
  2. 68Q32
  3. 68T05
  4. 68T42

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis

Authors: Markus Bock and Clemens Heitzinger

File: source-code.zip

Type: Zip file

Contents: Commented source code to recreate all figures of the paper. Run with julia main.jl.


Information

Published In

SIAM Journal on Mathematics of Data Science
Pages: 675 - 693
ISSN (online): 2577-0187

History

Submitted: 4 September 2020
Accepted: 18 February 2022
Published online: 6 June 2022
