Abstract

We develop a trust-region method for minimizing the sum of a smooth term $f$ and a nonsmooth term $h$, both of which can be nonconvex. Each iteration of our method minimizes a possibly nonconvex model of $f + h$ in a trust region. The model coincides with $f + h$ in value and subdifferential at the trust-region center. We establish global convergence to a first-order stationary point when $f$ satisfies a smoothness condition that holds, in particular, when it has a Lipschitz-continuous gradient, and $h$ is proper and lower semicontinuous. The model of $h$ is required to be proper, lower semicontinuous, and prox-bounded. Under these weak assumptions, we establish a worst-case $O(1/\epsilon^2)$ iteration complexity bound that matches the best known complexity bound of standard trust-region methods for smooth optimization. We detail a special instance, named TR-PG, in which we use a limited-memory quasi-Newton model of $f$ and compute a step with the proximal gradient method, resulting in a practical proximal quasi-Newton method. We establish similar convergence properties and a complexity bound for a quadratic regularization variant, named R2, and provide an interpretation as a proximal gradient method with adaptive step size for nonconvex problems. R2 may also be used to compute steps inside the trust-region method, resulting in an implementation named TR-R2. We describe our Julia implementations and report numerical results on inverse problems from sparse optimization and signal processing. Both TR-PG and TR-R2 exhibit promising performance and compare favorably with two linesearch proximal quasi-Newton methods based on convex models.
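
To make the adaptive-step-size interpretation concrete, below is a minimal Julia sketch (Julia being the language of the paper's implementations) of a quadratic-regularization iteration in the spirit of R2, applied to $f + h$ with the convex choice $h = \lambda\|\cdot\|_1$. Every name and parameter here (`grad_f`, `σ`, `γ`, `η1`, the stopping test) is an illustrative assumption, not the article's actual code.

```julia
using LinearAlgebra

# Soft thresholding: the proximal operator of ν‖⋅‖₁,
# i.e., argmin_x  ν‖x‖₁ + ‖x − y‖²/2.
soft_threshold(y, ν) = sign.(y) .* max.(abs.(y) .- ν, 0)

# R2-style sketch: at x, minimize the model
#   φ(s) = f(x) + ∇f(x)ᵀs + (σ/2)‖s‖² + λ‖x + s‖₁,
# whose minimizer is a proximal gradient step with step size ν = 1/σ.
function r2_sketch(f, grad_f, x0; λ=1.0, σ=1.0, η1=1e-3, γ=3.0, maxiter=500, ϵ=1e-6)
    x = copy(x0)
    F(z) = f(z) + λ * norm(z, 1)                    # objective f + h
    for _ in 1:maxiter
        g = grad_f(x)
        ν = 1 / σ
        s = soft_threshold(x - ν .* g, ν * λ) - x   # proximal gradient step
        norm(s) ≤ ϵ * ν && break                    # approximate first-order stationarity
        ΔF_pred = -(dot(g, s) + (σ / 2) * dot(s, s) +
                    λ * (norm(x + s, 1) - norm(x, 1)))  # model decrease, ≥ 0
        ρ = (F(x) - F(x + s)) / ΔF_pred             # actual vs. predicted decrease
        if ρ ≥ η1
            x += s                                  # successful: accept the step
            σ = max(σ / γ, 1e-8)                    # and relax the regularization
        else
            σ *= γ                                  # unsuccessful: shorten the next step
        end
    end
    return x
end

# Example: sparse least-squares recovery, min ‖Ax − b‖²/2 + λ‖x‖₁.
A = randn(20, 50); b = A * [1.0; zeros(49)]
x_opt = r2_sketch(x -> 0.5 * norm(A * x - b)^2, x -> A' * (A * x - b), zeros(50); λ = 0.1)
```

The regularization parameter $\sigma$ acts as an inverse step size, tightened or relaxed according to the ratio of actual to predicted decrease, much as a trust-region radius would in TR-PG and TR-R2.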

Keywords

  1. nonsmooth optimization
  2. nonconvex optimization
  3. composite optimization
  4. trust-region methods
  5. quasi-Newton methods
  6. proximal gradient method
  7. proximal quasi-Newton method

MSC codes

  1. 49J52
  2. 65K10
  3. 90C53
  4. 90C56

Information

Published In

SIAM Journal on Optimization
Pages: 900–929
ISSN (online): 1095-7189

History

Submitted: 2 April 2021
Accepted: 11 November 2021
Published online: 19 May 2022

Funding Information

Natural Sciences and Engineering Research Council of Canada, https://doi.org/10.13039/501100000038
U.S. Department of Energy, grant DE-FG02-97ER25308, https://doi.org/10.13039/100000015
Washington Research Foundation, https://doi.org/10.13039/100001906
