Acceleration of Stochastic Approximation by Averaging

A new recursive algorithm of stochastic approximation type with the averaging of trajectories is investigated. Convergence with probability one is proved for a variety of classical optimization and identification problems. It is also demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
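The averaging idea can be sketched in a few lines (a toy illustration only, not the paper's exact algorithm or conditions): run a Robbins-Monro recursion with step sizes that decay more slowly than the classical 1/k, and report the running mean of the iterates rather than the last iterate. The function name `sa_with_averaging` and the scalar quadratic test problem below are assumptions made for illustration.

```python
import random

def sa_with_averaging(grad_oracle, x0, n_steps, seed=0):
    """Robbins-Monro recursion with trajectory averaging:
    x_{k+1} = x_k - gamma_k * grad_oracle(x_k), with slowly
    decaying steps gamma_k = k**(-2/3), returning both the last
    iterate and the running average of the whole trajectory."""
    rng = random.Random(seed)
    x = x0
    avg = 0.0
    for k in range(1, n_steps + 1):
        gamma = k ** (-2.0 / 3.0)      # slower decay than the classical 1/k
        x -= gamma * grad_oracle(x, rng)
        avg += (x - avg) / k           # incremental running mean of iterates
    return x, avg

# Toy problem (an assumption for illustration): find the root of
# h(x) = x - 5 from observations corrupted by unit-variance noise.
noisy_grad = lambda x, rng: (x - 5.0) + rng.gauss(0.0, 1.0)
last, averaged = sa_with_averaging(noisy_grad, x0=0.0, n_steps=20000)
```

In this sketch the individual iterates fluctuate because of the relatively large steps, while the averaged estimate smooths that noise out; the paper's point is that such averaging attains the optimal convergence rate without the delicate step-size tuning that single-iterate schemes require.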

  • [1]  M. A. Aizerman, E. M. Braverman, and L. I. Rozonoer, The Method of Potential Functions in Machine Learning Theory, Nauka, Moscow, 1970. (In Russian.)

  • [2]  A. M. Benderskij and M. B. Nevel'son, Multidimensional asymptotically optimal stochastic approximation, Problems Inform. Transmission, 17 (1982), 423–434.

  • [3]  A. Benveniste, M. Metivier, and P. Priouret, Algorithmes Adaptatifs et Approximations Stochastiques (Théorie et Applications), Masson, Paris, 1987.

  • [4]  N. Berman, A. Feuer, and E. Wahnon, Convergence analysis of smoothed stochastic gradient-type algorithm, Internat. J. Systems Sci., 18 (1987), 1061–1078.

  • [5]  Yu. M. Ermol'ev, Stochastic Programming Methods, Nauka, Moscow, 1976. (In Russian.)

  • [6]  V. Fabian, Asymptotically efficient stochastic approximation; the RM case, Ann. Statist., 1 (1973), 486–495.

  • [7]  V. Fabian, On asymptotically efficient recursive estimation, Ann. Statist., 6 (1978), 854–866.

  • [8]  V. N. Fomin, Recursive Estimation and Adaptive Filtering, Nauka, Moscow, 1984. (In Russian.)

  • [9]  K. S. Fu, Sequential Methods in Pattern Recognition and Machine Learning, Academic Press, New York, 1968.

  • [10]  G. C. Goodwin and K. S. Sin, Adaptive Filtering, Prediction and Control, Prentice-Hall, Englewood Cliffs, NJ, 1984.

  • [11]  A. M. Gupal and L. G. Bajenov, Stochastic analog of the conjugate gradients method, Cybernetics, no. 1 (1972), 125–126. (In Russian.)

  • [12]  J. Kiefer and J. Wolfowitz, Stochastic estimation of the maximum of a regression function, Ann. Math. Statist., 23 (1952), 462–466.

  • [13]  I. P. Kornfel'd and Sh. E. Shteinberg, Estimation of parameters of linear and nonlinear stochastic systems using the method of averaged residuals, Automat. Remote Control, 46 (1986), 966–974.

  • [14]  A. P. Korostelev, Stochastic Recurrent Procedures, Nauka, Moscow, 1981. (In Russian.)

  • [15]  A. P. Korostelev, On multi-step stochastic optimization procedures, Automat. Remote Control, 43 (1982), 606–611.

  • [16]  H. J. Kushner and D. S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences, Vol. 26, Springer-Verlag, New York, 1978.

  • [17]  R. Sh. Liptser and A. N. Shiryaev, Martingale Theory, Nauka, Moscow, 1986. (In Russian.)

  • [18]  L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, MIT Press Series in Signal Processing, Optimization, and Control, 4, MIT Press, Cambridge, MA, 1983.

  • [19]  A. V. Nazin, Informational bounds for gradient stochastic optimization and optimal implementable algorithms, Automat. Remote Control, 50 (1989), 520–531.

  • [20]  A. S. Nemirovskij and D. B. Yudin, Complexity of Problems and Effectiveness of Optimization Methods, Nauka, Moscow, 1980. (In Russian.)

  • [21]  M. B. Nevel'son and R. Z. Khas'minskij, Stochastic Approximation and Recursive Estimation, American Mathematical Society, Providence, RI, 1973.

  • [22]  M. B. Nevel'son and R. Z. Khas'minskij, Adaptive Robbins-Monro procedure, Automat. Remote Control, 34 (1974), 1594–1607.

  • [23]  B. T. Polyak, Comparison of convergence rate for single-step and multi-step optimization algorithms in the presence of noise, Engrg. Cybernet., 15 (1977), 6–10.

  • [24]  B. T. Polyak, A new method of stochastic approximation type, Avtomat. i Telemekh., (1990), 98–107.

  • [25]  B. T. Polyak, Introduction to Optimization, Translations Series in Mathematics and Engineering, Optimization Software Inc., Publications Division, New York, 1987.

  • [26]  B. T. Polyak and Ya. Z. Tsypkin, Attainable accuracy of adaptation algorithms, in Problems of Cybernetics: Adaptive Systems, Nauka, Moscow, 1976, 6–19. (In Russian.)

  • [27]  B. T. Polyak and Ya. Z. Tsypkin, Adaptive estimation algorithms (convergence, optimality, stability), Automat. Remote Control, 40 (1980), 378–389.

  • [28]  B. T. Polyak and Ya. Z. Tsypkin, Optimal pseudogradient adaptation algorithms, Automat. Remote Control, 41 (1981), 1101–1110.

  • [29]  H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22 (1951), 400–407.

  • [30]  H. Robbins and D. Siegmund, A convergence theorem for non-negative almost supermartingales and some applications, in Optimizing Methods in Statistics (Proc. Sympos., Ohio State Univ., Columbus, OH, 1971), J. S. Rustagi, ed., Academic Press, New York, 1971, 233–257.

  • [31]  D. Ruppert, A Newton-Raphson version of the multivariate Robbins-Monro procedure, Ann. Statist., 13 (1985), 236–245.

  • [32]  D. Ruppert, Efficient estimators from a slowly convergent Robbins-Monro process, Tech. Report 781, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1988.

  • [33]  A. Ruszczynski and W. Syski, Stochastic approximation method with gradient averaging for unconstrained problems, IEEE Trans. Automat. Control, 28 (1983), 1097–1105.

  • [34]  D. T. Sakrison, Stochastic approximation: A recursive method for solving regression problems, in Advances in Communication Theory and Applications, Vol. 2, A. V. Balakrishnan, ed., Academic Press, New York, 1966, 51–106.

  • [35]  A. N. Shiryaev, Probability, Nauka, Moscow, 1980. (In Russian.)

  • [36]  Ya. Z. Tsypkin, Adaptation and Learning in Automatic Systems, Academic Press, New York, 1971.

  • [37]  Ya. Z. Tsypkin, Foundations of Informational Theory of Identification, Nauka, Moscow, 1984. (In Russian.)

  • [38]  J. H. Venter, An extension of the Robbins-Monro procedure, Ann. Math. Statist., 38 (1967), 181–190.

  • [39]  M. L. Vil'k and S. V. Shil'man, Convergence and optimality of implementable adaptation algorithms (informational approach), Problems Inform. Transmission, 20 (1985), 314–326.

  • [40]  M. T. Wasan, Stochastic Approximation, Cambridge Tracts in Mathematics and Mathematical Physics, No. 58, Cambridge University Press, London, 1969.