The Approximate Duality Gap Technique: A Unified Theory of First-Order Methods

We present a general technique for the analysis of first-order methods. The technique relies on the construction of a duality gap for an appropriate approximation of the objective function, where the function approximation improves as the algorithm converges. We show that in continuous time the enforcement of an invariant, which corresponds to the approximate duality gap decreasing at a certain rate, exactly recovers a wide range of first-order continuous-time methods. We characterize the discretization errors incurred by different discretization methods, and show how iteration-complexity-optimal methods for various classes of problems cancel out the discretization error. The techniques are illustrated on various classes of problems---including convex minimization for Lipschitz-continuous objectives, smooth convex minimization, composite minimization, smooth and strongly convex minimization, solving variational inequalities with monotone operators, and convex-concave saddle-point optimization---and naturally extend to other settings.
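The core construction can be illustrated numerically. The sketch below is our own illustration, not code from the paper: it runs plain gradient descent on a smooth convex quadratic and tracks an approximate duality gap G_t = U_t - L_t, taking the upper bound U_t = f(x_t) and the lower bound L_t as the average of the gradient linearizations of f at the query points, regularized by (1/2)||u - x_0||^2 so the minimum over u is finite. All function names, constants, and weight choices (a_i = 1, so A_t = t) are assumptions made for this example.

```python
# Illustrative sketch (our assumptions, not the paper's code): an approximate
# duality gap G_t = U_t - L_t for plain gradient descent on the smooth convex
# quadratic f(x, y) = (1/2)(x^2 + 10 y^2), whose minimum value is 0.

def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    return [x[0], 10.0 * x[1]]

L_smooth = 10.0          # smoothness constant of f; step size is 1/L_smooth
x0 = [1.0, 1.0]
x = list(x0)
gsum = [0.0, 0.0]        # running sum of a_i * grad f(x_i), with weights a_i = 1
lin_sum = 0.0            # running sum of a_i * (f(x_i) - <grad f(x_i), x_i>)
gaps = []

for t in range(1, 201):
    g = grad(x)
    gsum = [gsum[0] + g[0], gsum[1] + g[1]]
    lin_sum += f(x) - (g[0] * x[0] + g[1] * x[1])
    # Gradient step: x_t = x_{t-1} - (1/L) grad f(x_{t-1}).
    x = [x[0] - g[0] / L_smooth, x[1] - g[1] / L_smooth]
    # Lower bound L_t: minimize over u the averaged linearizations plus the
    # regularizer (1/2)||u - x0||^2.  Setting the u-gradient to zero gives
    # the closed-form minimizer u = x0 - gsum.
    u = [x0[0] - gsum[0], x0[1] - gsum[1]]
    A_t = float(t)       # A_t = sum of the weights a_i
    L_t = (lin_sum + gsum[0] * u[0] + gsum[1] * u[1]
           + 0.5 * ((u[0] - x0[0]) ** 2 + (u[1] - x0[1]) ** 2)) / A_t
    U_t = f(x)           # upper bound: the current function value
    gaps.append(U_t - L_t)
```

For this unaccelerated method the gap G_t decays roughly as 1/t, i.e., A_t G_t stays bounded; in the paper's framework, faster methods correspond to enforcing the same invariant with a faster-growing A_t, with discretization errors determining which growth rates are achievable.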
