Abstract

We introduce two algorithms for nonconvex regularized finite sum minimization, where typical Lipschitz differentiability assumptions are relaxed to the notion of relative smoothness. The first one is a Bregman extension of Finito/MISO [A. Defazio and J. Domke, Proc. Mach. Learn. Res. (PMLR), 32 (2014), pp. 1125--1133; J. Mairal, SIAM J. Optim., 25 (2015), pp. 829--855], studied for fully nonconvex problems when the sampling is randomized, or under convexity of the nonsmooth term when it is essentially cyclic. The second algorithm is a low-memory variant, in the spirit of SVRG [R. Johnson and T. Zhang, Advances in Neural Information Processing Systems 26, Curran Associates, Red Hook, NY, 2013, pp. 315--323] and SARAH [L. M. Nguyen et al., Proc. Mach. Learn. Res. (PMLR), 70 (2017), pp. 2613--2621], that also allows for fully nonconvex formulations. Our analysis is made remarkably simple by employing a Bregman--Moreau envelope as the Lyapunov function. In the randomized case, linear convergence is established when the cost function is strongly convex, yet with no convexity requirements on the individual functions in the sum. For the essentially cyclic and low-memory variants, global and linear convergence results are established when the cost function satisfies the Kurdyka--Łojasiewicz property.
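For context, the following is a minimal sketch of the standard notions referenced above, in illustrative notation (kernel \(h\), stepsize \(\gamma\), nonsmooth term \(g\)) that need not match the paper's. Given a convex kernel \(h\), differentiable on the interior of its domain, the Bregman distance is
\[
D_h(x,y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle ,
\]
and a function \(f\) is \(L\)-smooth relative to \(h\) when \(Lh - f\) is convex there (with a symmetric lower-bound condition in the fully nonconvex case), which replaces Lipschitz continuity of \(\nabla f\). In one common convention, the Bregman--Moreau envelope of \(g\), the kind of quantity employed as a Lyapunov function, is
\[
g^{h}_{\gamma}(x) = \inf_{z}\Big\{ g(z) + \tfrac{1}{\gamma}\, D_h(z,x) \Big\}.
\]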

Keywords

  1. nonsmooth nonconvex optimization
  2. incremental aggregated algorithms
  3. Bregman--Moreau envelope
  4. KL inequality

MSC codes

  1. 90C06
  2. 90C25
  3. 90C26
  4. 49J52
  5. 49J53

References

1.
M. Ahookhosh, A. Themelis, and P. Patrinos, A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: Superlinear convergence to nonisolated local minima, SIAM J. Optim., 31 (2021), pp. 653--685.
2.
Z. Allen-Zhu and Y. Yuan, Improved SVRG for non-strongly-convex or sum-of-non-convex objectives, Proc. Mach. Learn. Res. (PMLR), 48 (2016), pp. 1080--1089.
3.
H. Attouch and J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Math. Program., 116 (2009), pp. 5--16.
4.
H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Math. Oper. Res., 35 (2010), pp. 438--457.
5.
H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program., 137 (2013), pp. 91--129.
6.
H. H. Bauschke, J. Bolte, J. Chen, M. Teboulle, and X. Wang, On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity, J. Optim. Theory Appl., 182 (2019), pp. 1068--1087.
7.
H. H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications, Math. Oper. Res., 42 (2017), pp. 330--348.
8.
H. H. Bauschke and J. M. Borwein, Legendre functions and the method of random Bregman projections, J. Convex Anal., 4 (1997), pp. 27--67.
9.
H. H. Bauschke, J. M. Borwein, and P. L. Combettes, Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces, Commun. Contemp. Math., 3 (2001), pp. 615--647.
10.
A. Beck, First-Order Methods in Optimization, SIAM, Philadelphia, 2017.
11.
A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Oper. Res. Lett., 31 (2003), pp. 167--175.
12.
A. Beck and L. Tetruashvili, On the convergence of block coordinate descent type methods, SIAM J. Optim., 23 (2013), pp. 2037--2060.
13.
D. P. Bertsekas, Incremental proximal methods for large scale convex optimization, Math. Program., 129 (2011), pp. 163--195.
14.
D. P. Bertsekas, Convex Optimization Theory, Athena Scientific, Belmont, MA, 2009.
15.
D. P. Bertsekas and J. N. Tsitsiklis, Gradient convergence in gradient methods with errors, SIAM J. Optim., 10 (2000), pp. 627--642.
16.
D. Blatt, A. O. Hero, and H. Gauchman, A convergent incremental gradient method with a constant step size, SIAM J. Optim., 18 (2007), pp. 29--51.
17.
J. Bolte, A. Daniilidis, and A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM J. Optim., 17 (2007), pp. 1205--1223.
18.
J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota, Clarke subgradients of stratifiable functions, SIAM J. Optim., 18 (2007), pp. 556--572.
19.
J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., 146 (2014), pp. 459--494.
20.
J. Bolte, S. Sabach, M. Teboulle, and Y. Vaisbourd, First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems, SIAM J. Optim., 28 (2018), pp. 2131--2151.
21.
E. J. Candes, X. Li, and M. Soltanolkotabi, Phase retrieval via Wirtinger flow: Theory and algorithms, IEEE Trans. Inform. Theory, 61 (2015), pp. 1985--2007.
22.
G. Chen and M. Teboulle, Convergence analysis of a proximal-like minimization algorithm using Bregman functions, SIAM J. Optim., 3 (1993), pp. 538--543.
23.
Y. Chen and E. Candes, Solving random quadratic systems of equations is nearly as easy as solving linear systems, in Advances in Neural Information Processing Systems 28, Curran Associates, Red Hook, NY, 2015, pp. 739--747.
24.
Y. T. Chow, T. Wu, and W. Yin, Cyclic coordinate-update algorithms for fixed-point problems: Analysis and applications, SIAM J. Sci. Comput., 39 (2017), pp. A1280--A1300.
25.
D. Davis, D. Drusvyatskiy, and K. J. MacPhee, Stochastic Model-based Minimization under High-order Growth, preprint, arXiv:1807.00255, 2018.
26.
D. Davis, D. Drusvyatskiy, and C. Paquette, The nonsmooth landscape of phase retrieval, IMA J. Numer. Anal., 40 (2020), pp. 2652--2695.
27.
A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, in Advances in Neural Information Processing Systems 27, Curran Associates, Red Hook, NY, 2014, pp. 1646--1654.
28.
A. Defazio and J. Domke, Finito: A faster, permutable incremental gradient method for big data problems, Proc. Mach. Learn. Res. (PMLR), 32 (2014), pp. 1125--1133.
29.
J. C. Duchi and F. Ruan, Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval, Inf. Inference, 8 (2019), pp. 471--529.
30.
J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning, Vol. 1, Springer Ser. Statist., Springer, New York, 2001.
31.
T. Gao, S. Lu, J. Liu, and C. Chu, Randomized Bregman Coordinate Descent Methods for Non-Lipschitz Optimization, preprint, arXiv:2001.05202, 2020.
32.
F. Hanzely and P. Richtárik, Fastest rates for stochastic mirror descent methods, Comput. Optim. Appl., 79 (2021), pp. 717--766.
33.
M. Hong, X. Wang, M. Razaviyayn, and Z.-Q. Luo, Iteration complexity analysis of block coordinate descent methods, Math. Program., 163 (2017), pp. 85--114.
34.
R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Advances in Neural Information Processing Systems 26, Curran Associates, Red Hook, NY, 2013, pp. 315--323.
35.
C. Kan and W. Song, The Moreau envelope function and proximal mapping in the sense of the Bregman distance, Nonlinear Anal., 75 (2012), pp. 1385--1399.
36.
K. Kurdyka, On gradients of functions definable in \(o\)-minimal structures, Ann. Institut Fourier, 48 (1998), pp. 769--783.
37.
P. Latafat, A. Themelis, and P. Patrinos, Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems, Math. Program., 193 (2021), pp. 195--224.
38.
S. Łojasiewicz, Sur la géométrie semi- et sous-analytique, Ann. Institut Fourier, 43 (1993), pp. 1575--1595.
39.
H. Lu, “Relative continuity” for non-Lipschitz nonsmooth convex optimization using stochastic (or deterministic) mirror descent, INFORMS J. Optim., 1 (2019), pp. 288--303.
40.
H. Lu, R. M. Freund, and Y. Nesterov, Relatively smooth convex optimization by first-order methods, and applications, SIAM J. Optim., 28 (2018), pp. 333--354.
41.
D. R. Luke, J. V. Burke, and R. G. Lyon, Optical wavefront reconstruction: Theory and numerical methods, SIAM Rev., 44 (2002), pp. 169--224.
42.
J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM J. Optim., 25 (2015), pp. 829--855.
43.
C. A. Metzler, M. K. Sharma, S. Nagesh, R. G. Baraniuk, O. Cossairt, and A. Veeraraghavan, Coherent inverse scattering via transmission matrices: Efficient phase retrieval algorithms and a public dataset, in Proceedings of the 2017 IEEE International Conference on Computational Photography (ICCP), IEEE, Piscataway, NJ, 2017, pp. 1--16.
44.
A. Mokhtari, M. Gürbüzbalaban, and A. Ribeiro, Surpassing gradient descent provably: A cyclic incremental method with linear convergence rate, SIAM J. Optim., 28 (2018), pp. 1420--1447.
45.
A. Nedić and S. Lee, On stochastic subgradient mirror-descent algorithm with weighted averaging, SIAM J. Optim., 24 (2014), pp. 84--107.
46.
A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM J. Optim., 19 (2009), pp. 1574--1609.
47.
Yu. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J. Optim., 22 (2012), pp. 341--362.
48.
L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, SARAH: A novel method for machine learning problems using stochastic recursive gradient, Proc. Mach. Learn. Res. (PMLR), 70 (2017), pp. 2613--2621.
49.
P. Ochs, J. Fadili, and T. Brox, Non-smooth non-convex Bregman minimization: Unification and new algorithms, J. Optim. Theory Appl., 181 (2019), pp. 244--278.
50.
X. Qian, A. Sailanbayev, K. Mishchenko, and P. Richtárik, MISO is Making a Comeback with Better Proofs and Rates, preprint, arXiv:1906.01474, 2019.
51.
H. Robbins and D. Siegmund, A convergence theorem for non negative almost supermartingales and some applications, in Herbert Robbins Selected Papers, Springer, New York, 1985, pp. 111--135.
52.
R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
53.
R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Grundlehren Math. Wiss. 317, Springer, Berlin, 2009.
54.
M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Math. Program., 162 (2017), pp. 83--112.
55.
Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, Phase retrieval with application to optical imaging: A contemporary overview, IEEE Signal Process. Mag., 32 (2015), pp. 87--109.
56.
M. V. Solodov and B. F. Svaiter, An inexact hybrid generalized proximal point algorithm and some new results on the theory of Bregman functions, Math. Oper. Res., 25 (2000), pp. 214--230.
57.
M. R. Spiegel, Mathematical Handbook of Formulas and Tables, McGraw-Hill, New York, 1999.
58.
J. Sun, Q. Qu, and J. Wright, A geometric analysis of phase retrieval, Found. Comput. Math., 18 (2018), pp. 1131--1198.
59.
M. Teboulle, A simplified view of first order methods for optimization, Math. Program., 170 (2018), pp. 67--96.
60.
P. Tseng, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl., 109 (2001), pp. 475--494.
61.
P. Tseng, On Accelerated Proximal Gradient Methods for Convex-Concave Optimization, manuscript.
62.
P. Tseng and D. P. Bertsekas, Relaxation methods for problems with strictly convex separable costs and linear constraints, Math. Program., 38 (1987), pp. 303--321.
63.
P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program., 117 (2009), pp. 387--423.
64.
P. Tseng and S. Yun, Incrementally updated gradient methods for constrained and regularized optimization, J. Optim. Theory Appl., 160 (2014), pp. 832--853.
65.
N. D. Vanli, M. Gürbüzbalaban, and A. Ozdaglar, Global convergence rate of proximal incremental aggregated gradient methods, SIAM J. Optim., 28 (2018), pp. 1282--1300.
66.
L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), pp. 2057--2075.
67.
Y. Xu and W. Yin, A globally convergent algorithm for nonconvex optimization based on block coordinate update, J. Sci. Comput., 72 (2017), pp. 700--734.
68.
P. Yu, G. Li, and T. K. Pong, Deducing Kurdyka-Łojasiewicz Exponent via Inf-projection, preprint, arXiv:1902.03635, 2019 (published version: https://link.springer.com/article/10.1007/s10208-021-09528-6).
69.
H. Zhang, Y. Chi, and Y. Liang, Provable non-convex phase retrieval with outliers: Median truncated Wirtinger flow, Proc. Mach. Learn. Res. (PMLR), 48 (2016), pp. 1022--1031.
70.
H. Zhang, Y.-H. Dai, L. Guo, and W. Peng, Proximal-like incremental aggregated gradient method with linear convergence under Bregman distance growth conditions, Math. Oper. Res., 46 (2021), pp. 61--81.
71.
S. Zhang and N. He, On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization, preprint, arXiv:1806.04781, 2018.

Published In

SIAM Journal on Optimization
Pages: 2230 - 2262
ISSN (online): 1095-7189

History

Submitted: 9 March 2021
Accepted: 13 June 2022
Published online: 13 September 2022

Funding Information

Fonds De La Recherche Scientifique - FNRS https://doi.org/10.13039/501100002661 : 30468160
Fonds Wetenschappelijk Onderzoek https://doi.org/10.13039/501100003130 : 1196820N, G0A0920N, G086518N, G086318N
KU Leuven https://doi.org/10.13039/501100004040 : C14/18/068
Japan Society for the Promotion of Science https://doi.org/10.13039/501100001691 : JP21K17710
