Abstract

We propose an inexact variable-metric proximal point algorithm to accelerate gradient-based optimization algorithms. The proposed scheme, called QNing, can notably be applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization algorithms. QNing is also compatible with composite objectives, meaning that it can produce exactly sparse solutions when the objective involves a sparsity-inducing regularization. When combined with limited-memory BFGS rules, QNing is particularly effective at solving high-dimensional optimization problems while enjoying a worst-case linear convergence rate for strongly convex problems. We present experimental results in which QNing gives significant improvements over competing methods for training machine learning models on large datasets and in high dimensions.
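
The scheme described in the abstract can be illustrated concretely. Below is a minimal, self-contained sketch (not the authors' implementation) of the QNing idea on an l1-regularized least-squares instance: L-BFGS is run on the Moreau-Yosida envelope F(x) = min_z f(z) + (kappa/2)||z - x||^2, whose gradient has the closed form kappa*(x - p(x)) with p(x) the proximal point, and each proximal sub-problem is solved inexactly by an inner first-order method (here ISTA). All function names, constants, and problem sizes are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1; produces exact zeros."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def moreau_envelope(x_bar, A, b, lam, kappa, iters=50):
    """Inexactly evaluate the Moreau-Yosida envelope of
    f(z) = 0.5*||A z - b||^2 + lam*||z||_1 at x_bar.
    Returns (value, proximal point, envelope gradient)."""
    L = np.linalg.norm(A, 2) ** 2 + kappa          # smoothness of the sub-problem
    z = x_bar.copy()
    for _ in range(iters):                         # ISTA on the prox sub-problem
        grad = A.T @ (A @ z - b) + kappa * (z - x_bar)
        z = soft_threshold(z - grad / L, lam / L)
    val = (0.5 * np.sum((A @ z - b) ** 2) + lam * np.sum(np.abs(z))
           + 0.5 * kappa * np.sum((z - x_bar) ** 2))
    return val, z, kappa * (x_bar - z)             # closed-form envelope gradient

def qning_sketch(A, b, lam, kappa=1.0, outer=100, mem=5, tol=1e-8):
    """L-BFGS outer loop on the envelope, with inexact prox evaluations."""
    x = np.zeros(A.shape[1])
    F, z, g = moreau_envelope(x, A, b, lam, kappa)
    S, Y = [], []                                  # L-BFGS curvature pairs
    for _ in range(outer):
        if np.linalg.norm(g) < tol:
            break
        # two-loop recursion: q approximates (inverse Hessian) @ g
        q, alphas = g.copy(), []
        for s, y in zip(reversed(S), reversed(Y)):
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q -= a * y
        q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1]) if S else 1.0 / kappa
        for (s, y), a in zip(zip(S, Y), reversed(alphas)):
            q += (a - (y @ q) / (y @ s)) * s
        # backtracking (Armijo) line search on the inexactly evaluated envelope
        t = 1.0
        F_new, z_new, g_new = moreau_envelope(x - t * q, A, b, lam, kappa)
        while F_new > F - 1e-4 * t * (g @ q) and t > 1e-10:
            t *= 0.5
            F_new, z_new, g_new = moreau_envelope(x - t * q, A, b, lam, kappa)
        s_vec, y_vec = -t * q, g_new - g
        if s_vec @ y_vec > 1e-12:                  # keep pair only if curvature holds
            S.append(s_vec)
            Y.append(y_vec)
            if len(S) > mem:
                S.pop(0)
                Y.pop(0)
        x, F, z, g = x - t * q, F_new, z_new, g_new
    return z                                       # proximal point: exactly sparse
```

Because the inner solver applies soft-thresholding, the returned proximal point contains exact zeros, which is the mechanism behind the abstract's claim that QNing provides exactly sparse solutions under sparsity-inducing regularization.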

Keywords

  1. convex optimization
  2. quasi-Newton
  3. L-BFGS

MSC codes

  1. 90C25
  2. 90C53

Published In

SIAM Journal on Optimization
Pages: 1408 - 1443
ISSN (online): 1095-7189

History

Submitted: 10 April 2017
Accepted: 22 January 2019
Published online: 28 May 2019

Funding Information

Agence Nationale de la Recherche https://doi.org/10.13039/501100001665 : ANR-14-CE23-0003-01
Canadian Institute for Advanced Research https://doi.org/10.13039/100007631
H2020 European Research Council https://doi.org/10.13039/100010663 : 714381
