Abstract

Backtracking line search is an old yet powerful strategy for finding better step sizes in proximal gradient algorithms. The main principle is to locally find a simple convex upper bound of the objective function, which in turn controls the step size. For inertial proximal gradient algorithms, the situation becomes much more difficult and usually leads to very restrictive rules for the extrapolation parameter. In this paper, we show that the extrapolation parameter can be controlled by additionally finding a simple concave lower bound of the objective function locally. This gives rise to a double convex-concave backtracking procedure that allows for an adaptive choice of both the step size and the extrapolation parameter. We apply this procedure to the class of inertial Bregman proximal gradient methods and prove that any sequence generated by these algorithms converges globally to a critical point of the function at hand. Numerical experiments on a number of challenging nonconvex problems in image processing and machine learning demonstrate the power of combining inertial steps with a double backtracking strategy to achieve improved performance.
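To make the principle concrete, the two local bounds can be written with a Bregman distance $D_h$ generated by a kernel $h$. This is a minimal sketch of the inequalities the abstract alludes to; the iterate name $y_k$, the constants $\bar{L}_k$ and $\bar{l}_k$, and the verification rules below are notational assumptions, not quoted from the paper:

\[
f(x) \;\le\; f(y_k) + \langle \nabla f(y_k),\, x - y_k \rangle + \bar{L}_k\, D_h(x, y_k),
\qquad
f(x) \;\ge\; f(y_k) + \langle \nabla f(y_k),\, x - y_k \rangle - \bar{l}_k\, D_h(x, y_k).
\]

The convex upper bound is the classical descent-lemma surrogate and yields the step size $1/\bar{L}_k$; the concave lower bound certifies the extrapolated point $y_k = x_k + \gamma_k (x_k - x_{k-1})$ and thereby constrains the extrapolation parameter $\gamma_k$. The following Python sketch shows how the two backtracking loops might interact in one iteration in the Euclidean case $h = \tfrac{1}{2}\|\cdot\|^2$, where $D_h(u, v) = \tfrac{1}{2}\|u - v\|^2$. It is a schematic illustration under these assumptions, not the authors' exact procedure; the doubling/halving rules are arbitrary choices:

    import numpy as np

    def double_backtracking_step(f, grad_f, prox, x, x_prev,
                                 L=1.0, l=1.0, gamma=1.0,
                                 eta=2.0, max_tries=50):
        # Concave (lower-bound) backtracking: enlarge the lower-curvature
        # estimate l and shrink the extrapolation parameter gamma until
        # the local concave minorant holds at the extrapolated point y.
        for _ in range(max_tries):
            y = x + gamma * (x - x_prev)
            model = f(x) + grad_f(x) @ (y - x)
            if f(y) >= model - 0.5 * l * np.sum((y - x) ** 2):
                break
            l *= eta       # larger lower-curvature estimate ...
            gamma /= eta   # ... forces a more cautious extrapolation
        # Convex (upper-bound) backtracking: increase L until the local
        # convex majorant holds at the proximal gradient candidate.
        for _ in range(max_tries):
            x_new = prox(y - grad_f(y) / L, 1.0 / L)
            model = f(y) + grad_f(y) @ (x_new - y)
            if f(x_new) <= model + 0.5 * L * np.sum((x_new - y) ** 2):
                break
            L *= eta       # smaller step size 1/L
        return x_new, L, l, gamma

For instance, with $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ and soft thresholding as the proximal operator of an $\ell_1$ term, one iteration reads:

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.array([1.0, 0.5])
    f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
    grad_f = lambda x: A.T @ (A @ x - b)
    prox = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - 0.1 * t, 0.0)
    x1, L, l, gamma = double_backtracking_step(
        f, grad_f, prox, x=np.zeros(2), x_prev=np.zeros(2))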

Keywords

  1. composite minimization
  2. proximal gradient algorithms
  3. inertial methods
  4. convex-concave backtracking
  5. non-Euclidean distances
  6. Bregman distance
  7. global convergence
  8. Kurdyka--Łojasiewicz property

MSC codes

  1. 90C25
  2. 26B25
  3. 49M27
  4. 52A41
  5. 65K05

Supplementary Material

PLEASE NOTE: These supplementary files have not been peer-reviewed.

Index of Supplementary Materials

Title of paper: Convex-Concave Backtracking for Inertial Bregman Proximal Gradient Algorithms in Non-Convex Optimization

Authors: Mahesh Chandra Mukkamala, Peter Ochs, Thomas Pock, and Shoham Sabach

File: supplement.pdf

Type: PDF

Contents: Additional proofs.

Published In

SIAM Journal on Mathematics of Data Science
Pages: 658--682
ISSN (online): 2577-0187

History

Submitted: 5 November 2019
Accepted: 14 May 2020
Published online: 6 August 2020

Funding Information

Deutsche Forschungsgemeinschaft (https://doi.org/10.13039/501100001659): OC 150/1-1
H2020 European Research Council (https://doi.org/10.13039/100010663): 640156
