Singularly Perturbed Markov Decision Processes: A Multiresolution Algorithm

Singular perturbation techniques allow the derivation of an aggregate model whose solution is asymptotically optimal for Markov decision processes with strong and weak interactions. We develop an algorithm that exploits the asymptotic optimality of the aggregate model to compute the solution of the original model. We derive conditions under which the proposed algorithm has better worst-case complexity than conventional contraction algorithms. Based on our complexity analysis, we show that the major benefit of aggregation is that the reduced-order model is no longer ill-conditioned; the reduction in the number of states due to aggregation is a secondary benefit. This result is surprising, since intuition would suggest that the reduced-order model can be solved more efficiently because it has fewer states. However, we show that having fewer states does not by itself guarantee a more efficient solution. Our theoretical analysis and numerical experiments show that the proposed algorithm can compute the optimal solution with reduced computational complexity and without any penalty in accuracy.
