Society for Industrial and Applied Mathematics: SIAM Journal on Optimization: Table of Contents
Table of Contents for SIAM Journal on Optimization. List of articles from both the latest and ahead-of-print issues.
https://epubs.siam.org/loi/sjope8?af=R
Society for Industrial and Applied Mathematics
en-US
SIAM Journal on Optimization
https://epubs.siam.org/na101/home/literatum/publisher/siam/journals/covergifs/sjope8/cover.jpg
https://epubs.siam.org/loi/sjope8?af=R

Piecewise Polyhedral Relaxations of Multilinear Optimization
https://epubs.siam.org/doi/abs/10.1137/22M1507486?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3167–3193, December 2024. <br/> Abstract. In this paper, we consider piecewise polyhedral relaxations (PPRs) of multilinear optimization problems over axis-parallel hyperrectangular partitions of their domain. We improve formulations for PPRs by linking components that are commonly modeled independently in the literature. Numerical experiments with ALPINE, an open-source software for global optimization that relies on piecewise approximations of functions, show that the resulting formulations speed up the solver by an order of magnitude when compared to its default settings. If given the same time, the new formulation can solve more than twice as many instances from our test set. Most results on piecewise functions in the literature assume that the partition is regular. Regular partitions arise when the domain of each individual input variable is divided into nonoverlapping intervals and when the partition of the overall domain is composed of all Cartesian products of these intervals. We provide the first locally ideal formulation for general (nonregular) hyperrectangular partitions. We also perform experiments that show that, for a variant of tree ensemble optimization, our formulations based on nonregular partitions outperform an existing formulation for piecewise linear functions commonly used in the literature and also outperform by an order of magnitude formulations over regular partitions.
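The PPR formulations linked in this abstract build on standard polyhedral envelopes of bilinear terms. Purely as background (this is not the paper's formulation), a minimal sketch of the classic McCormick envelope for a single product w = x·y over a box, which piecewise schemes apply on each element of a domain partition:

```python
# Background sketch: the McCormick envelope for w = x*y on the box
# [xl, xu] x [yl, yu].  Piecewise polyhedral relaxations tighten this by
# applying such envelopes per partition element.

def mccormick_bounds(x, y, xl, xu, yl, yu):
    """Lower/upper envelope values for w = x*y at (x, y) over the box."""
    lo = max(xl * y + x * yl - xl * yl,
             xu * y + x * yu - xu * yu)
    hi = min(xu * y + x * yl - xu * yl,
             xl * y + x * yu - xl * yu)
    return lo, hi

# The envelope sandwiches the true product and is exact at box corners.
lo, hi = mccormick_bounds(0.5, 0.5, 0.0, 1.0, 0.0, 1.0)
print(lo, hi)  # 0.0 0.5, containing the true value 0.25
```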
Piecewise Polyhedral Relaxations of Multilinear Optimization
10.1137/22M1507486
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Jongeun Kim
Jean-Philippe P. Richard
Mohit Tawarmalani
Piecewise Polyhedral Relaxations of Multilinear Optimization
34
4
3167
3193
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/22M1507486
https://epubs.siam.org/doi/abs/10.1137/22M1507486?af=R
© 2024 Society for Industrial and Applied Mathematics

Further Development in Convex Conic Reformulation of Geometric Nonconvex Conic Optimization Problems
https://epubs.siam.org/doi/abs/10.1137/23M1593346?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3194–3211, December 2024. <br/> Abstract. A geometric nonconvex conic optimization problem (COP) was recently proposed by Kim, Kojima, and Toh (SIAM J. Optim., 30 (2020), pp. 1251–1273) as a unified framework for convex conic reformulation of a class of quadratic optimization problems and polynomial optimization problems. The nonconvex COP minimizes a linear function over the intersection of a nonconvex cone [math], a convex subcone [math] of the convex hull co[math] of [math], and an affine hyperplane with a normal vector [math]. Under the assumption co[math], the original nonconvex COP in their paper was shown to be equivalently formulated as a convex conic program by replacing the constraint set with the intersection of [math] and the affine hyperplane. This paper further studies the key assumption co[math] in their framework and provides three sets of necessary-sufficient conditions for the assumption. Based on the conditions, we propose a new wide class of quadratically constrained quadratic programs with multiple nonconvex equality and inequality constraints, which can be solved exactly by their semidefinite relaxation.
Further Development in Convex Conic Reformulation of Geometric Nonconvex Conic Optimization Problems
10.1137/23M1593346
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Naohiko Arima
Sunyoung Kim
Masakazu Kojima
Further Development in Convex Conic Reformulation of Geometric Nonconvex Conic Optimization Problems
34
4
3194
3211
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/23M1593346
https://epubs.siam.org/doi/abs/10.1137/23M1593346?af=R
© 2024 Society for Industrial and Applied Mathematics

Proximity Operators of Perspective Functions with Nonlinear Scaling
https://epubs.siam.org/doi/abs/10.1137/23M1583430?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3212–3234, December 2024. <br/> Abstract. A perspective function is a construction which combines a base function defined on a given space with a nonlinear scaling function defined on another space and which yields a lower semicontinuous convex function on the product space. Since perspective functions are typically nonsmooth, their use in first-order algorithms necessitates the computation of their proximity operator. This paper establishes closed-form expressions for the proximity operator of a perspective function defined on a Hilbert space in terms of a proximity operator involving its base function and one involving its scaling function.
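The closed-form prox expressions derived in this paper generalize familiar scalar cases. As background only (not the paper's operator), the best-known closed-form proximity operator, soft-thresholding for f(x) = λ|x|:

```python
# Background example: the proximity operator of f(x) = lam * |x|,
#   prox_f(v) = argmin_x lam*|x| + (x - v)**2 / 2,
# has the closed form known as soft-thresholding.

def prox_abs(v, lam):
    """Shrink v toward 0 by lam, clipping to 0 inside [-lam, lam]."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

print(prox_abs(3.0, 1.0))   # 2.0
print(prox_abs(-3.0, 1.0))  # -2.0
print(prox_abs(0.5, 1.0))   # 0.0
```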
Proximity Operators of Perspective Functions with Nonlinear Scaling
10.1137/23M1583430
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Luis M. Briceño-Arias
Patrick L. Combettes
Francisco J. Silva
Proximity Operators of Perspective Functions with Nonlinear Scaling
34
4
3212
3234
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/23M1583430
https://epubs.siam.org/doi/abs/10.1137/23M1583430?af=R
© 2024 Society for Industrial and Applied Mathematics

A Search-Free [math] Homotopy Inexact Proximal-Newton Extragradient Algorithm for Monotone Variational Inequalities
https://epubs.siam.org/doi/abs/10.1137/23M1593000?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3235–3258, December 2024. <br/> Abstract. We present and study the iteration-complexity of a relative-error inexact proximal-Newton extragradient algorithm for solving smooth monotone variational inequality problems in real Hilbert spaces. We removed a search procedure from Monteiro and Svaiter (2012) by introducing a novel approach based on homotopy, which requires the resolution (at each iteration) of a single strongly monotone linear variational inequality. For a given tolerance [math], our main algorithm exhibits pointwise [math] and ergodic [math] iteration-complexities. From a practical perspective, preliminary numerical experiments indicate that our main algorithm outperforms some previous proximal-Newton schemes.
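The proximal-Newton scheme above refines the classical extragradient template for monotone variational inequalities. For orientation only, a minimal sketch of the basic (Korpelevich) extragradient iteration on a toy monotone operator; the operator, step size, and iteration count here are illustrative assumptions, not from the paper:

```python
# Toy monotone operator F(z) = (y, -x), arising from the bilinear saddle
# function x*y; its unique solution (unconstrained) is the origin.

def F(z):
    x, y = z
    return (y, -x)

def extragradient_step(z, t):
    # Predictor: a forward step using F at the current point.
    gx, gy = F(z)
    z_half = (z[0] - t * gx, z[1] - t * gy)
    # Corrector: re-evaluate F at the predicted point and step from z.
    gx, gy = F(z_half)
    return (z[0] - t * gx, z[1] - t * gy)

z = (1.0, 1.0)
for _ in range(100):
    z = extragradient_step(z, 0.5)
print(z)  # near (0, 0)
```

Note that plain gradient steps diverge on this operator; the corrector half-step is what makes the iteration contract.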
A Search-Free [math] Homotopy Inexact Proximal-Newton Extragradient Algorithm for Monotone Variational Inequalities
10.1137/23M1593000
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
M. Marques Alves
J. M. Pereira
B. F. Svaiter
A Search-Free [math] Homotopy Inexact Proximal-Newton Extragradient Algorithm for Monotone Variational Inequalities
34
4
3235
3258
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/23M1593000
https://epubs.siam.org/doi/abs/10.1137/23M1593000?af=R
© 2024 Society for Industrial and Applied Mathematics

Parameter-Free FISTA by Adaptive Restart and Backtracking
https://epubs.siam.org/doi/abs/10.1137/23M158961X?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3259–3285, December 2024. <br/> Abstract. We consider a combined restarting and adaptive backtracking strategy for the popular fast iterative shrinking-thresholding algorithm (FISTA) frequently employed for accelerating the convergence speed of large-scale structured convex optimization problems. Several variants of FISTA enjoy a provable linear convergence rate for the function values [math] of the form [math] under the prior knowledge of problem conditioning, i.e., of the ratio between the (Łojasiewicz) parameter [math] determining the growth of the objective function and the Lipschitz constant [math] of its smooth component. These parameters are nonetheless hard to estimate in many practical cases. Recent works address the problem by estimating either parameter via suitable adaptive strategies. In our work both parameters can be estimated at the same time by means of an algorithmic restarting scheme where, at each restart, a nonmonotone estimation of [math] is performed. For this scheme, theoretical convergence results are proved, showing that an [math] convergence speed can still be achieved along with quantitative estimates of the conditioning. The resulting free-FISTA is therefore parameter-free. Several numerical results are reported to confirm the practical interest of its use in many example problems.
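As orientation for the restart ingredient, a minimal sketch of FISTA with the classical function-value adaptive restart of O'Donoghue and Candès; this is not the paper's free-FISTA, and the step size is an assumed known 1/L, which is precisely what backtracking would estimate:

```python
# Sketch: FISTA with function-value restart on a smooth objective.
# The objective, step size, and iteration count are illustrative.

def fista_restart(f, grad, x0, step, iters):
    x, y, t = x0, x0, 1.0
    for _ in range(iters):
        x_new = y - step * grad(y)          # forward step at extrapolated point
        if f(x_new) > f(x):                 # restart: momentum hurt progress
            t = 1.0
            x_new = x - step * grad(x)      # plain gradient step from x
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # Nesterov extrapolation
        x, t = x_new, t_new
    return x

f = lambda x: 0.5 * x * x
grad = lambda x: x
x_star = fista_restart(f, grad, 2.0, 0.4, 60)
print(x_star)  # near 0
```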
Parameter-Free FISTA by Adaptive Restart and Backtracking
10.1137/23M158961X
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Jean-François Aujol
Luca Calatroni
Charles Dossal
Hippolyte Labarrière
Aude Rondepierre
Parameter-Free FISTA by Adaptive Restart and Backtracking
34
4
3259
3285
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/23M158961X
https://epubs.siam.org/doi/abs/10.1137/23M158961X?af=R
© 2024 Society for Industrial and Applied Mathematics

A Disjunctive Cutting Plane Algorithm for Bilinear Programming
https://epubs.siam.org/doi/abs/10.1137/22M1515562?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3286–3313, December 2024. <br/> Abstract. In this paper, we present and analyze a finitely convergent disjunctive cutting plane algorithm to obtain an [math]-optimal solution or detect the infeasibility of a general nonconvex continuous bilinear program. While the cutting planes are obtained as in Saxena, Bonami, and Lee [Math. Prog., 130 (2011), pp. 359–413] and Fampa and Lee [J. Global Optim., 80 (2021), pp. 287–305], a feature of the algorithm that guarantees finite convergence is exploring near-optimal extreme point solutions to a current relaxation at each iteration. In this sense, the presented algorithm and its analysis extend the work of Owen and Mehrotra [Math. Prog., 89 (2001), pp. 437–448] for solving mixed-integer linear programs to general bilinear programs.
A Disjunctive Cutting Plane Algorithm for Bilinear Programming
10.1137/22M1515562
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Hamed Rahimian
Sanjay Mehrotra
A Disjunctive Cutting Plane Algorithm for Bilinear Programming
34
4
3286
3313
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/22M1515562
https://epubs.siam.org/doi/abs/10.1137/22M1515562?af=R
© 2024 Society for Industrial and Applied Mathematics

Zeroth-Order Riemannian Averaging Stochastic Approximation Algorithms
https://epubs.siam.org/doi/abs/10.1137/23M1605405?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3314–3341, December 2024. <br/> Abstract. We present Zeroth-order Riemannian Averaging Stochastic Approximation (Zo-RASA) algorithms for stochastic optimization on Riemannian manifolds. We show that Zo-RASA achieves optimal sample complexities for generating [math]-approximate first-order stationary solutions using only one-sample or constant-order batches in each iteration. Our approach employs Riemannian moving-average stochastic gradient estimators and a novel Riemannian–Lyapunov analysis technique for convergence analysis. We provably improve the algorithm’s practicality by using retractions and vector transport, instead of exponential mappings and parallel transports, thereby reducing per-iteration complexity. To do so, we introduce a novel geometric condition, satisfied by manifolds with bounded second fundamental form, which enables new error bounds for approximating parallel transport with vector transport.
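The estimators above are Riemannian versions of a standard zeroth-order construction. For orientation only, a Euclidean sketch of the two-point random-direction gradient estimator (the paper replaces straight-line perturbations with retractions); the test function, smoothing radius, and sample count are illustrative assumptions:

```python
# Euclidean two-point zeroth-order gradient estimator:
#   g = (f(x + mu*u) - f(x)) / mu * u,  with u ~ N(0, I),
# which is (approximately) unbiased for the gradient as mu -> 0.

import random

def zo_gradient(f, x, mu=1e-4, samples=2000):
    """Average of two-point finite-difference estimates along random directions."""
    n = len(x)
    g = [0.0] * n
    fx = f(x)
    for _ in range(samples):
        u = [random.gauss(0.0, 1.0) for _ in range(n)]
        fd = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for i in range(n):
            g[i] += fd * u[i] / samples
    return g

# For f(x) = sum(x_i^2), the true gradient at x is 2x.
random.seed(0)
g = zo_gradient(lambda v: sum(t * t for t in v), [1.0, -2.0])
print(g)  # roughly [2.0, -4.0], up to sampling noise
```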
Zeroth-Order Riemannian Averaging Stochastic Approximation Algorithms
10.1137/23M1605405
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Jiaxiang Li
Krishnakumar Balasubramanian
Shiqian Ma
Zeroth-Order Riemannian Averaging Stochastic Approximation Algorithms
34
4
3314
3341
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/23M1605405
https://epubs.siam.org/doi/abs/10.1137/23M1605405?af=R
© 2024 Society for Industrial and Applied Mathematics

Efficient First Order Method for Saddle Point Problems with Higher Order Smoothness
https://epubs.siam.org/doi/abs/10.1137/23M1566972?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/4">Volume 34, Issue 4</a>, Page 3342–3370, December 2024. <br/> Abstract. This paper studies the complexity of finding approximate stationary points for the smooth nonconvex-strongly-concave (NCSC) saddle point problem: [math]. Under the standard first-order smoothness conditions where [math] is [math]-smooth in both arguments and [math]-strongly concave in [math], existing literature shows that the optimal complexity for first-order methods to obtain an [math]-stationary point is [math], where [math] is the condition number. However, when [math] in addition has a [math]-Lipschitz continuous Hessian, we derive a first-order algorithm with an [math] complexity by designing an accelerated proximal point algorithm enhanced with the “Convex Until Proven Guilty” technique. Moreover, an improved [math] lower bound for first-order methods is also derived for sufficiently small [math]. As a result, given the second-order smoothness of the problem, the complexity of our method improves the state-of-the-art result by a factor of [math], while almost matching the lower bound except for a small [math] factor.
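The accelerated scheme in the abstract targets much sharper rates than the baseline methods it improves on. Purely for orientation, a minimal sketch of plain simultaneous gradient descent-ascent on a toy saddle function that is strongly concave in y; the function, step size, and iteration count are illustrative assumptions, not the paper's algorithm:

```python
# Toy saddle problem min_x max_y f(x, y) with f(x, y) = x*y - y**2 / 2,
# which is strongly concave in y; its saddle point is (0, 0).

def gda(x, y, eta, iters):
    for _ in range(iters):
        gx = y                               # df/dx
        gy = x - y                           # df/dy
        x, y = x - eta * gx, y + eta * gy    # descend in x, ascend in y
    return x, y

x, y = gda(1.0, 1.0, 0.1, 400)
print(x, y)  # near (0, 0)
```

The strong concavity in y (the -y**2/2 term) is what damps the rotation; on a purely bilinear coupling, simultaneous steps like these would spiral outward.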
Efficient First Order Method for Saddle Point Problems with Higher Order Smoothness
10.1137/23M1566972
SIAM Journal on Optimization
2024-10-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Nuozhou Wang
Junyu Zhang
Shuzhong Zhang
Efficient First Order Method for Saddle Point Problems with Higher Order Smoothness
34
4
3342
3370
2024-12-31T08:00:00Z
2024-12-31T08:00:00Z
10.1137/23M1566972
https://epubs.siam.org/doi/abs/10.1137/23M1566972?af=R
© 2024 Society for Industrial and Applied Mathematics

A Feasible Method for General Convex Low-Rank SDP Problems
https://epubs.siam.org/doi/abs/10.1137/23M1561464?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2169–2200, September 2024. <br/> Abstract. In this work, we consider the low-rank decomposition (SDPR) of general convex semidefinite programming (SDP) problems that contain both a positive semidefinite matrix and a nonnegative vector as variables. We develop a rank-support-adaptive feasible method to solve (SDPR) based on Riemannian optimization. The method is able to escape from a saddle point to ensure its convergence to a global optimal solution for generic constraint vectors. We prove its global convergence and local linear convergence without assuming that the objective function is twice differentiable. Due to the special structure of the low-rank SDP problem, our algorithm can achieve better iteration complexity than existing results for more general smooth nonconvex problems. In order to overcome the degeneracy issues of SDP problems, we develop two strategies based on random perturbation and dual refinement. These techniques enable us to solve some primal degenerate SDP problems efficiently, for example, Lovász theta SDPs. Our work is a step forward in extending the application range of Riemannian optimization approaches for solving SDP problems. Numerical experiments are conducted to verify the efficiency and robustness of our method.
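As background for the low-rank decomposition idea (the Burer-Monteiro-type factorization that such feasible methods build on, not the paper's algorithm), a hypothetical toy example: writing the PSD variable as X = R Rᵀ with few columns makes X ⪰ 0 hold by construction, and unit-norm rows enforce diag(X) = 1:

```python
# Toy sketch: minimize <C, R R^T> over R with unit-norm rows, i.e., an SDP
# with diag(X) = 1 solved through its low-rank factorization by projected
# gradient descent.  Problem, rank p, and step size are illustrative.

import math
import random

def low_rank_sdp_toy(C, p=2, steps=500, lr=0.1, seed=1):
    random.seed(seed)
    n = len(C)

    def normalize(row):
        s = math.sqrt(sum(v * v for v in row))
        return [v / s for v in row]

    R = [normalize([random.gauss(0.0, 1.0) for _ in range(p)]) for _ in range(n)]
    for _ in range(steps):
        # Gradient of <C, R R^T> with respect to R is 2 * C @ R (C symmetric).
        G = [[2.0 * sum(C[i][k] * R[k][j] for k in range(n)) for j in range(p)]
             for i in range(n)]
        # Projected step: pull each row back to the unit sphere.
        R = [normalize([R[i][j] - lr * G[i][j] for j in range(p)]) for i in range(n)]
    return R

# 2x2 example: min 2*X12 s.t. X11 = X22 = 1, X PSD; optimum X12 = -1, value -2.
C = [[0.0, 1.0], [1.0, 0.0]]
R = low_rank_sdp_toy(C)
value = 2.0 * sum(a * b for a, b in zip(R[0], R[1]))
print(value)  # near -2
```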
A Feasible Method for General Convex Low-Rank SDP Problems
10.1137/23M1561464
SIAM Journal on Optimization
2024-07-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Tianyun Tang
Kim-Chuan Toh
A Feasible Method for General Convex Low-Rank SDP Problems
34
3
2169
2200
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1561464
https://epubs.siam.org/doi/abs/10.1137/23M1561464?af=R
© 2024 Society for Industrial and Applied Mathematics

Path-Following Methods for Maximum a Posteriori Estimators in Bayesian Hierarchical Models: How Estimates Depend on Hyperparameters
https://epubs.siam.org/doi/abs/10.1137/22M153330X?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2201–2230, September 2024. <br/> Abstract. Maximum a posteriori (MAP) estimation, like all Bayesian methods, depends on prior assumptions. These assumptions are often chosen to promote specific features in the recovered estimate. The form of the chosen prior determines the shape of the posterior distribution, thus the behavior of the estimator and complexity of the associated optimization problem. Here, we consider a family of Gaussian hierarchical models with generalized gamma hyperpriors designed to promote sparsity in linear inverse problems. By varying the hyperparameters, we move continuously between priors that act as smoothed [math] penalties with flexible [math], smoothing, and scale. We then introduce a predictor-corrector method that tracks MAP solution paths as the hyperparameters vary. Path following allows a user to explore the space of possible MAP solutions and to test the sensitivity of solutions to changes in the prior assumptions. By tracing paths from a convex region to a nonconvex region, the user could find local minimizers in strongly sparsity promoting regimes that are consistent with a convex relaxation derived using related prior assumptions. We show experimentally that these solutions are less error prone than direct optimization of the nonconvex problem.
Path-Following Methods for Maximum a Posteriori Estimators in Bayesian Hierarchical Models: How Estimates Depend on Hyperparameters
10.1137/22M153330X
SIAM Journal on Optimization
2024-07-01T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Zilai Si
Yucong Liu
Alexander Strang
Path-Following Methods for Maximum a Posteriori Estimators in Bayesian Hierarchical Models: How Estimates Depend on Hyperparameters
34
3
2201
2230
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M153330X
https://epubs.siam.org/doi/abs/10.1137/22M153330X?af=R
© 2024 Society for Industrial and Applied Mathematics

Scalable Frank–Wolfe on Generalized Self-Concordant Functions via Simple Steps
https://epubs.siam.org/doi/abs/10.1137/23M1616789?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2231–2258, September 2024. <br/> Abstract. Generalized self-concordance is a key property present in the objective function of many important learning problems. We establish the convergence rate of a simple Frank–Wolfe variant that uses the open-loop step size strategy [math], obtaining an [math] convergence rate for this class of functions in terms of primal gap and Frank–Wolfe gap, where [math] is the iteration count. This avoids the use of second-order information or the need to estimate local smoothness parameters of previous work. We also show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
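A minimal sketch of the open-loop rule on a toy problem (a smooth quadratic over the probability simplex, not a generalized self-concordant objective; `frank_wolfe_simplex` is a name invented here):

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, T):
    """Frank-Wolfe with the open-loop step 2/(t+2) over the
    probability simplex, whose linear minimization oracle simply
    picks the vertex at the coordinate with the smallest gradient."""
    x = x0.copy()
    for t in range(T):
        g = grad(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0          # LMO: best simplex vertex
        gamma = 2.0 / (t + 2.0)        # open-loop step size
        x = (1 - gamma) * x + gamma * v
    return x
```

Because the step size is fixed in advance, no line search, curvature estimate, or second-order information is needed, which is the appeal of the rule analyzed in the paper.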
Scalable Frank–Wolfe on Generalized Self-Concordant Functions via Simple Steps
10.1137/23M1616789
SIAM Journal on Optimization
2024-07-02T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Alejandro Carderera
Mathieu Besançon
Sebastian Pokutta
Scalable Frank–Wolfe on Generalized Self-Concordant Functions via Simple Steps
34
3
2231
2258
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1616789
https://epubs.siam.org/doi/abs/10.1137/23M1616789?af=R
© 2024 Society for Industrial and Applied Mathematics

Fast Convergence of Inertial Multiobjective Gradient-Like Systems with Asymptotic Vanishing Damping
https://epubs.siam.org/doi/abs/10.1137/23M1588512?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2259–2286, September 2024. <br/> Abstract. We present a new gradient-like dynamical system related to unconstrained convex smooth multiobjective optimization which involves inertial effects and asymptotic vanishing damping. To the best of our knowledge, this system is the first inertial gradient-like system for multiobjective optimization problems including asymptotic vanishing damping, expanding the ideas previously laid out in [H. Attouch and G. Garrigos, Multiobjective Optimization: An Inertial Dynamical Approach to Pareto Optima, preprint, arXiv:1506.02823, 2015]. We prove existence of solutions to this system in finite dimensions and further prove that its bounded solutions converge weakly to weakly Pareto optimal points. In addition, we obtain a convergence rate of order [math] for the function values measured with a merit function. This approach presents a good basis for the development of fast gradient methods for multiobjective optimization.
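For intuition, the single-objective analogue of inertial dynamics with vanishing damping is the Nesterov-style discretization below (an illustration only, not the paper's multiobjective system; `inertial_gradient` and the constant `alpha` are choices made here):

```python
import numpy as np

def inertial_gradient(grad, x0, step, alpha=3.0, T=500):
    """Discrete-time analogue of inertial dynamics with vanishing
    damping alpha/t: extrapolation weight (k-1)/(k+alpha) tends to 1,
    so the effective friction vanishes as the iterations proceed."""
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, T + 1):
        beta = (k - 1) / (k + alpha)            # momentum -> 1
        y = x + beta * (x - x_prev)             # inertial extrapolation
        x_prev, x = x, y - step * grad(y)       # gradient step at y
    return x
```

The [math]-type rate for function values quoted in the abstract is exactly the behavior this kind of discretization exhibits in the single-objective smooth convex case.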
Fast Convergence of Inertial Multiobjective Gradient-Like Systems with Asymptotic Vanishing Damping
10.1137/23M1588512
SIAM Journal on Optimization
2024-07-02T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Konstantin Sonntag
Sebastian Peitz
Fast Convergence of Inertial Multiobjective Gradient-Like Systems with Asymptotic Vanishing Damping
34
3
2259
2286
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1588512
https://epubs.siam.org/doi/abs/10.1137/23M1588512?af=R
© 2024 Society for Industrial and Applied Mathematics

Fast Optimization of Charged Particle Dynamics with Damping
https://epubs.siam.org/doi/abs/10.1137/23M1599045?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2287–2313, September 2024. <br/> Abstract. In this paper, the convergence analysis of accelerated second-order methods for convex optimization problems is developed from the point of view of autonomous dissipative inertial continuous dynamics in a magnetic field. In contrast to the classical heavy-ball model with damping, we consider a model of the motion of a charged particle in a magnetic field involving linear asymptotically vanishing damping. The model is a coupled ordinary differential system obtained by adding the magnetic coupling term [math] to the heavy-ball system with [math]. In order to develop fast optimization methods, our first contribution is to prove the global existence and uniqueness of a smooth solution of this system, under certain regularity conditions, via the Banach fixed point theorem. Our second contribution is to establish the convergence rates of the corresponding inertial algorithms obtained as discrete-time versions of the inertial dynamics under the magnetic field. Finally, the connection between algorithms derived from the heavy-ball model and from the charged-particle model is established.
Fast Optimization of Charged Particle Dynamics with Damping
10.1137/23M1599045
SIAM Journal on Optimization
2024-07-02T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Weiping Yan
Yu Tang
Gonglin Yuan
Fast Optimization of Charged Particle Dynamics with Damping
34
3
2287
2313
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1599045
https://epubs.siam.org/doi/abs/10.1137/23M1599045?af=R
© 2024 Society for Industrial and Applied Mathematics

A Descent Algorithm for the Optimal Control of ReLU Neural Network Informed PDEs Based on Approximate Directional Derivatives
https://epubs.siam.org/doi/abs/10.1137/22M1534420?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2314–2349, September 2024. <br/> Abstract. We propose and analyze a numerical algorithm for solving a class of optimal control problems for learning-informed semilinear partial differential equations (PDEs). Such PDEs contain constituents that are in principle unknown and are approximated by nonsmooth ReLU neural networks. We first show that directly smoothing the ReLU network, with the aim of using classical numerical solvers, can have disadvantages, such as potentially introducing multiple solutions for the corresponding PDE. This motivates us to devise a numerical algorithm that directly treats the nonsmooth optimal control problem by employing a descent algorithm inspired by a bundle-free method. Several numerical examples are provided, and the efficiency of the algorithm is demonstrated.
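The "direct smoothing" the abstract warns about can be as simple as replacing the ReLU with a softplus (an illustrative sketch; `smoothed_relu` and the parameter `eps` are notation assumed here, not the paper's):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def smoothed_relu(x, eps):
    """Softplus smoothing eps*log(1 + exp(x/eps)), computed stably as
    eps*(max(z, 0) + log1p(exp(-|z|))) with z = x/eps; it deviates
    from the ReLU by at most eps*log(2), attained at x = 0."""
    z = np.asarray(x) / eps
    return eps * (np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z))))
```

Even though the smoothing error is uniformly small, the abstract notes that handing the smoothed network to a classical solver can change the solution set of the underlying PDE, which motivates treating the nonsmooth problem directly.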
A Descent Algorithm for the Optimal Control of ReLU Neural Network Informed PDEs Based on Approximate Directional Derivatives
10.1137/22M1534420
SIAM Journal on Optimization
2024-07-02T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Guozhi Dong
Michael Hintermüller
Kostas Papafitsoros
A Descent Algorithm for the Optimal Control of ReLU Neural Network Informed PDEs Based on Approximate Directional Derivatives
34
3
2314
2349
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1534420
https://epubs.siam.org/doi/abs/10.1137/22M1534420?af=R
© 2024 Society for Industrial and Applied Mathematics

Subgradient Regularized Multivariate Convex Regression at Scale
https://epubs.siam.org/doi/abs/10.1137/21M1413134?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2350–2377, September 2024. <br/> Abstract. We present new large-scale algorithms for fitting a subgradient regularized multivariate convex regression function to [math] samples in [math] dimensions—a key problem in shape-constrained nonparametric regression with applications in statistics, engineering, and the applied sciences. The infinite-dimensional learning task can be expressed via a convex quadratic program (QP) with [math] decision variables and [math] constraints. While instances with [math] in the lower thousands can be addressed with current algorithms within reasonable runtimes, solving larger problems (e.g., [math] or [math]) is computationally challenging. To this end, we present an active-set-type algorithm on the dual QP. For computational scalability, we allow for approximate optimization of the reduced subproblems and propose randomized augmentation rules for expanding the active set. We derive novel computational guarantees for our algorithms. We demonstrate that our framework can approximately solve instances of the subgradient regularized convex regression problem with [math] and [math] within minutes and that it shows strong computational performance compared to earlier approaches.
Subgradient Regularized Multivariate Convex Regression at Scale
10.1137/21M1413134
SIAM Journal on Optimization
2024-07-02T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Wenyu Chen
Rahul Mazumder
Subgradient Regularized Multivariate Convex Regression at Scale
34
3
2350
2377
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/21M1413134
https://epubs.siam.org/doi/abs/10.1137/21M1413134?af=R
© 2024 Society for Industrial and Applied Mathematics

A Novel Nonconvex Relaxation Approach to Low-Rank Matrix Completion of Inexact Observed Data
https://epubs.siam.org/doi/abs/10.1137/22M1543653?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2378–2410, September 2024. <br/> Abstract. In recent years, matrix completion has become one of the main concepts in data science. In the process of data acquisition in real applications, in addition to missing data, the observed data may be inaccurate. This paper is concerned with matrix completion of such inexact observed data, which can be modeled as a rank minimization problem. We adopt the difference of the nuclear norm and the Frobenius norm as an approximation of the rank function, employ Tikhonov-type regularization to preserve the inherent characteristics of the original data and control oscillation arising from inexact observations, and then establish a new nonsmooth and nonconvex relaxation model for such low-rank matrix completion. We propose a new accelerated proximal gradient-type algorithm to solve this nonsmooth and nonconvex minimization problem and show that the generated sequence is bounded and globally converges to a critical point of our model. Furthermore, the rate of convergence is given via the Kurdyka–Łojasiewicz property. We evaluate our model and method on visual images and on received signal strength fingerprint data from an indoor positioning system. Numerical experiments illustrate that our approach outperforms some state-of-the-art methods and also verify the efficacy of the Tikhonov-type regularization.
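The difference-of-norms surrogate from the abstract is cheap to evaluate from the singular values (a sketch under the standard norm definitions; `nuc_minus_fro` is a name chosen here):

```python
import numpy as np

def nuc_minus_fro(X):
    """Rank surrogate ||X||_* - ||X||_F: the nuclear norm is the sum
    of singular values, the Frobenius norm their Euclidean norm, so
    the difference is zero exactly when rank(X) <= 1."""
    s = np.linalg.svd(X, compute_uv=False)
    return s.sum() - np.linalg.norm(s)
```

The surrogate vanishes on matrices of rank at most one and is strictly positive otherwise, which is what makes it a workable nonconvex proxy for the rank function.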
A Novel Nonconvex Relaxation Approach to Low-Rank Matrix Completion of Inexact Observed Data
10.1137/22M1543653
SIAM Journal on Optimization
2024-07-02T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Yan Li
Liping Zhang
A Novel Nonconvex Relaxation Approach to Low-Rank Matrix Completion of Inexact Observed Data
34
3
2378
2410
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1543653
https://epubs.siam.org/doi/abs/10.1137/22M1543653?af=R
© 2024 Society for Industrial and Applied Mathematics

High Probability Complexity Bounds for Adaptive Step Search Based on Stochastic Oracles
https://epubs.siam.org/doi/abs/10.1137/22M1512764?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2411–2439, September 2024. <br/> Abstract. We consider a step search method for continuous optimization under a stochastic setting where the function values and gradients are available only through inexact probabilistic zeroth- and first-order oracles. (We introduce the term step search for a class of methods, similar to line search, but where the step direction can change during the backtracking procedure.) Unlike the stochastic gradient method and its many variants, the algorithm does not use a prespecified sequence of step sizes but increases or decreases the step size adaptively according to the estimated progress of the algorithm. These oracles capture multiple standard settings including expected loss minimization and zeroth-order optimization. Moreover, our framework is very general and allows the function and gradient estimates to be biased. The proposed algorithm is simple to describe and easy to implement. Under fairly general conditions on the oracles, we derive a high-probability tail bound on the iteration complexity of the algorithm when it is applied to nonconvex, convex, and strongly convex (more generally, those satisfying the Polyak–Łojasiewicz (PL) condition) functions. Our analysis strengthens and extends prior results for stochastic step and line search methods.
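A deterministic caricature of the adaptive step-size control (a hedged sketch: `adaptive_step_search` and its constants are illustrative choices, and the paper's oracles are stochastic and possibly biased):

```python
import numpy as np

def adaptive_step_search(f, grad, x0, alpha0=1.0, theta=0.5,
                         gamma=2.0, c=1e-4, T=100):
    """Step-search sketch: test a sufficient-decrease condition at
    the trial point; on success accept and expand the step size, on
    failure reject and shrink it. No prespecified step schedule."""
    x, alpha = x0.copy(), alpha0
    for _ in range(T):
        g = grad(x)
        trial = x - alpha * g
        if f(trial) <= f(x) - c * alpha * (g @ g):  # sufficient decrease
            x, alpha = trial, gamma * alpha         # accept, grow step
        else:
            alpha = theta * alpha                   # reject, shrink step
    return x
```

In the stochastic setting of the paper, `f` and `grad` would be replaced by inexact probabilistic oracles, and the analysis bounds the iteration count with high probability rather than deterministically.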
High Probability Complexity Bounds for Adaptive Step Search Based on Stochastic Oracles
10.1137/22M1512764
SIAM Journal on Optimization
2024-07-02T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Billy Jin
Katya Scheinberg
Miaolan Xie
High Probability Complexity Bounds for Adaptive Step Search Based on Stochastic Oracles
34
3
2411
2439
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1512764
https://epubs.siam.org/doi/abs/10.1137/22M1512764?af=R
© 2024 Society for Industrial and Applied Mathematics

The Rate of Convergence of Bregman Proximal Methods: Local Geometry Versus Regularity Versus Sharpness
https://epubs.siam.org/doi/abs/10.1137/23M1580218?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2440–2471, September 2024. <br/> Abstract. We examine the last-iterate convergence rate of Bregman proximal methods—from mirror descent to mirror-prox and its optimistic variants—as a function of the local geometry induced by the prox-mapping defining the method. For generality, we focus on local solutions of constrained, nonmonotone variational inequalities, and we show that the convergence rate of a given method depends sharply on its associated Legendre exponent, a notion that measures the growth rate of the underlying Bregman function (Euclidean, entropic, or other) near a solution. In particular, we show that boundary solutions exhibit a stark separation of regimes between methods with a zero and nonzero Legendre exponent: The former converge at a linear rate, while the latter converge, in general, sublinearly. This dichotomy becomes even more pronounced in linearly constrained problems where methods with entropic regularization achieve a linear convergence rate along sharp directions, compared to convergence in a finite number of steps under Euclidean regularization.
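For the entropic case, the prox-mapping on the simplex reduces to a multiplicative update (a standard sketch for intuition, not tied to the paper's variational-inequality setting; `mirror_descent_simplex` is a name invented here):

```python
import numpy as np

def mirror_descent_simplex(grad, x0, step, T):
    """Entropic mirror descent on the probability simplex: the
    Bregman prox induced by negative entropy is a multiplicative
    (softmax-like) update followed by renormalization."""
    x = x0.copy()
    for _ in range(T):
        w = x * np.exp(-step * grad(x))   # multiplicative update
        x = w / w.sum()                   # renormalize onto the simplex
    return x
```

The boundary behavior studied in the paper is visible even here: for a linear objective the iterates approach a simplex vertex geometrically, never reaching it in finitely many steps, unlike a Euclidean projection method.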
The Rate of Convergence of Bregman Proximal Methods: Local Geometry Versus Regularity Versus Sharpness
10.1137/23M1580218
SIAM Journal on Optimization
2024-07-09T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Waïss Azizian
Franck Iutzeler
Jérôme Malick
Panayotis Mertikopoulos
The Rate of Convergence of Bregman Proximal Methods: Local Geometry Versus Regularity Versus Sharpness
34
3
2440
2471
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1580218
https://epubs.siam.org/doi/abs/10.1137/23M1580218?af=R
© 2024 Society for Industrial and Applied Mathematics

Complexity of Finite-Sum Optimization with Nonsmooth Composite Functions and Non-Lipschitz Regularization
https://epubs.siam.org/doi/abs/10.1137/23M1546701?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2472–2502, September 2024. <br/> Abstract. In this paper, we present a complexity analysis of proximal inexact gradient methods for finite-sum optimization with a nonconvex nonsmooth composite function and non-Lipschitz regularization. By accessing a convex approximation to the Lipschitz function and a Lipschitz continuous approximation to the non-Lipschitz regularizer, we construct a proximal subproblem at each iteration without using exact function values and gradients. With certain accuracy control on the inexact gradients and subproblem solutions, we show that the oracle complexity in terms of the total number of inexact gradient evaluations is of order [math] to find an [math]-approximate first-order stationary point, ensuring that within a [math]-ball centered at this point the maximum reduction of an approximation model does not exceed [math]. This shows that we can attain the same worst-case evaluation complexity order as in [C. Cartis, N. I. M. Gould, and P. L. Toint, SIAM J. Optim., 21 (2011), pp. 1721–1739; X. Chen, Ph. L. Toint, and H. Wang, SIAM J. Optim., 29 (2019), pp. 874–903], even though we introduce the non-Lipschitz singularity and the nonconvex nonsmooth composite function into the objective function. Moreover, we establish that the oracle complexity with respect to the total number of stochastic oracle calls is of order [math] with high probability for stochastic proximal inexact gradient methods. We further extend the algorithm to stochastic problems in expectation form and derive the associated oracle complexity, of order [math] with high probability.
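As a simplified stand-in for the proximal subproblem step (the paper's regularizers are non-Lipschitz and its gradients inexact; here an exact-gradient l1 sketch, so purely illustrative, with `prox_grad_l1` a made-up name):

```python
import numpy as np

def prox_grad_l1(grad, x0, step, lam, T):
    """Proximal gradient sketch with the l1 soft-thresholding prox:
    x+ = prox_{step*lam*||.||_1}(x - step*grad(x)). The l1 prox has
    the closed form sign(z) * max(|z| - step*lam, 0)."""
    x = x0.copy()
    for _ in range(T):
        z = x - step * grad(x)
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x
```

In the paper's setting both the gradient and the subproblem solution are only approximate, and the complexity bounds account for that inexactness.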
Complexity of Finite-Sum Optimization with Nonsmooth Composite Functions and Non-Lipschitz Regularization
10.1137/23M1546701
SIAM Journal on Optimization
2024-07-10T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Xiao Wang
Xiaojun Chen
Complexity of Finite-Sum Optimization with Nonsmooth Composite Functions and Non-Lipschitz Regularization
34
3
2472
2502
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1546701
https://epubs.siam.org/doi/abs/10.1137/23M1546701?af=R
© 2024 Society for Industrial and Applied Mathematics

Using Taylor-Approximated Gradients to Improve the Frank–Wolfe Method for Empirical Risk Minimization
https://epubs.siam.org/doi/abs/10.1137/22M1519286?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2503–2534, September 2024. <br/> Abstract. The Frank–Wolfe method has become increasingly useful in statistical and machine learning applications due to the structure-inducing properties of the iterates and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of empirical risk minimization—one of the fundamental optimization problems in statistical and machine learning—the computational effectiveness of Frank–Wolfe methods typically grows linearly in the number of data observations [math]. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on [math], we look to second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example), and we propose amending the Frank–Wolfe method with Taylor series–approximated gradients, including variants for both deterministic and stochastic settings. Compared with current state-of-the-art methods in the regime where the optimality tolerance [math] is sufficiently small, our methods are able to simultaneously reduce the dependence on large [math] while obtaining optimal convergence rates of Frank–Wolfe methods in both convex and nonconvex settings. We also propose a novel adaptive step-size approach for which we have computational guarantees. Finally, we present computational experiments which show that our methods exhibit very significant speedups over existing methods on real-world datasets for both convex and nonconvex binary classification problems.
Using Taylor-Approximated Gradients to Improve the Frank–Wolfe Method for Empirical Risk Minimization
10.1137/22M1519286
SIAM Journal on Optimization
2024-07-11T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Zikai Xiong
Robert M. Freund
Using Taylor-Approximated Gradients to Improve the Frank–Wolfe Method for Empirical Risk Minimization
34
3
2503
2534
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1519286
https://epubs.siam.org/doi/abs/10.1137/22M1519286?af=R
© 2024 Society for Industrial and Applied Mathematics

A Finitely Convergent Circumcenter Method for the Convex Feasibility Problem
https://epubs.siam.org/doi/abs/10.1137/23M1595412?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2535–2556, September 2024. <br/> Abstract. In this paper, we present a variant of the circumcenter method for the convex feasibility problem (CFP), ensuring finite convergence under a Slater assumption. The method replaces exact projections onto the convex sets with projections onto separating half-spaces, perturbed by positive exogenous parameters that decrease to zero along the iterations. If the perturbation parameters decrease slowly enough, such as the terms of a diverging series, finite convergence is achieved. To the best of our knowledge, this is the first circumcenter method for CFP that guarantees finite convergence.
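The building block the method substitutes for exact projections, projection onto a single separating halfspace, has a closed form; a small sketch (names illustrative):

```python
import numpy as np

def project_halfspace(x, a, beta):
    """Project x onto the halfspace H = {y : <a, y> <= beta}.

    If x already satisfies the inequality it is its own projection; otherwise
    we step against the normal a by exactly the scaled constraint violation.
    """
    viol = a @ x - beta
    if viol <= 0:
        return x.copy()
    return x - (viol / (a @ a)) * a

x = np.array([3.0, 4.0])
a = np.array([1.0, 0.0])
p = project_halfspace(x, a, beta=1.0)   # halfspace x1 <= 1, so p = [1, 4]
```

This cheap surrogate is what makes replacing exact projections attractive when the convex sets themselves are hard to project onto.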
A Finitely Convergent Circumcenter Method for the Convex Feasibility Problem
10.1137/23M1595412
SIAM Journal on Optimization
2024-07-15T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Roger Behling
Yunier Bello-Cruz
Alfredo N. Iusem
Di Liu
Luiz-Rafael Santos
A Finitely Convergent Circumcenter Method for the Convex Feasibility Problem
34
3
2535
2556
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1595412
https://epubs.siam.org/doi/abs/10.1137/23M1595412?af=R
© 2024 Society for Industrial and Applied Mathematics

Fast Gradient Algorithm with Dry-like Friction and Nonmonotone Line Search for Nonconvex Optimization Problems
https://epubs.siam.org/doi/abs/10.1137/22M1532354?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2557–2587, September 2024. <br/> Abstract. In this paper, we propose a fast gradient algorithm for the problem of minimizing a differentiable (possibly nonconvex) function in Hilbert spaces. We first extend the dry friction property for convex functions to what we call the dry-like friction property in a nonconvex setting, and then employ a line search technique to adaptively update parameters at each iteration. Depending on the choice of parameters, the proposed algorithm exhibits subsequential convergence to a critical point or full sequential convergence to an “approximate” critical point of the objective function. We also establish the full sequential convergence to a critical point under the Kurdyka–Łojasiewicz (KL) property of a merit function. Thanks to the parameters’ flexibility, our algorithm can reduce to a number of existing inertial gradient algorithms with Hessian damping and dry friction. By exploiting variational properties of the Moreau envelope, the proposed algorithm is adapted to address weakly convex nonsmooth optimization problems. In particular, we extend the result on KL exponent for the Moreau envelope of a convex KL function to a broad class of KL functions that are not necessarily convex nor continuous. Simulation results illustrate the efficiency of our algorithm and demonstrate the potential advantages of combining dry-like friction with extrapolation and line search techniques.
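One variational object the abstract leans on, the Moreau envelope, can be made concrete for f = |·|, where both the prox and the envelope are explicit (a sketch for that specific choice of f, not the paper's algorithm):

```python
import numpy as np

def moreau_envelope_abs(x, lam):
    """Moreau envelope of f = |.| with parameter lam:
    e_lam(x) = min_y |y| + (y - x)^2 / (2*lam).

    The minimizer is the soft-threshold of x, and the envelope (the Huber
    function) is smooth even though |.| is not -- the kind of variational
    smoothing used to handle weakly convex nonsmooth problems.
    """
    y = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # prox of lam*|.|
    return np.abs(y) + (y - x) ** 2 / (2.0 * lam)
```

For |x| ≤ lam the envelope is the quadratic x²/(2·lam); beyond that it grows linearly.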
Fast Gradient Algorithm with Dry-like Friction and Nonmonotone Line Search for Nonconvex Optimization Problems
10.1137/22M1532354
SIAM Journal on Optimization
2024-07-17T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Lien T. Nguyen
Andrew Eberhard
Xinghuo Yu
Chaojie Li
Fast Gradient Algorithm with Dry-like Friction and Nonmonotone Line Search for Nonconvex Optimization Problems
34
3
2557
2587
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1532354
https://epubs.siam.org/doi/abs/10.1137/22M1532354?af=R
© 2024 Society for Industrial and Applied Mathematics

Provably Faster Gradient Descent via Long Steps
https://epubs.siam.org/doi/abs/10.1137/23M1588408?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2588–2608, September 2024. <br/> Abstract. This work establishes new convergence guarantees for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster [math] rate for gradient descent is also motivated along with simple numerical validation.
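A toy illustration of the idea, periodic strides well above the classical 1/L safe step, on a diagonal quadratic; the schedule below is invented for illustration and is not the paper's certified stepsize pattern:

```python
import numpy as np

def gd_long_steps(grad, x0, L, iters=200, long_every=5, long_mult=4.0):
    """Gradient descent where every long_every-th iterate takes a stride
    well beyond the classical safe step 1/L.

    A long step can increase the objective momentarily; the argument in the
    abstract is about the combined effect of a whole block of iterations,
    not per-step descent. (Illustrative schedule, assumed parameters.)
    """
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, iters + 1):
        step = long_mult / L if t % long_every == 0 else 1.0 / L
        x = x - step * grad(x)
    return x

D = np.array([0.1, 1.0, 3.0])   # eigenvalues of the toy quadratic 0.5 * x^T diag(D) x
x = gd_long_steps(lambda v: D * v, np.ones(3), L=D.max())
```

On this instance the occasional long step drains the small-eigenvalue components much faster than the constant 1/L schedule does.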
Provably Faster Gradient Descent via Long Steps
10.1137/23M1588408
SIAM Journal on Optimization
2024-07-18T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Benjamin Grimmer
Provably Faster Gradient Descent via Long Steps
34
3
2588
2608
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1588408
https://epubs.siam.org/doi/abs/10.1137/23M1588408?af=R
© 2024 Society for Industrial and Applied Mathematics

Square Root LASSO: Well-Posedness, Lipschitz Stability, and the Tuning Trade-Off
https://epubs.siam.org/doi/abs/10.1137/23M1561968?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2609–2637, September 2024. <br/> Abstract. This paper studies well-posedness and parameter sensitivity of the square root LASSO (SR-LASSO), an optimization model for recovering sparse solutions to linear inverse problems in finite dimension. An advantage of the SR-LASSO (e.g., over the standard LASSO) is that the optimal tuning of the regularization parameter is robust with respect to measurement noise. This paper provides three point-based regularity conditions at a solution of the SR-LASSO: the weak, intermediate, and strong assumptions. It is shown that the weak assumption implies uniqueness of the solution in question. The intermediate assumption yields a directionally differentiable and locally Lipschitz solution map (with explicit Lipschitz bounds), whereas the strong assumption gives continuous differentiability of said map around the point in question. Our analysis leads to new theoretical insights on the comparison between SR-LASSO and LASSO from the viewpoint of tuning parameter sensitivity: noise-robust optimal parameter choice for SR-LASSO comes at the “price” of elevated tuning parameter sensitivity. Numerical results support and showcase the theoretical findings.
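For reference, the two objectives being compared, written in the standard form (notation assumed, not taken from the paper):

```latex
% LASSO: squared residual plus l1 penalty
\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\,\|Ax - b\|_2^2 + \lambda \|x\|_1
\qquad \text{(LASSO)}

% SR-LASSO: un-squared residual plus l1 penalty
\min_{x \in \mathbb{R}^n} \ \|Ax - b\|_2 + \lambda \|x\|_1
\qquad \text{(SR-LASSO)}
```

The only difference is the un-squared residual, which is what decouples the optimal choice of [math] from the noise level and drives the sensitivity trade-off the abstract describes.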
Square Root LASSO: Well-Posedness, Lipschitz Stability, and the Tuning Trade-Off
10.1137/23M1561968
SIAM Journal on Optimization
2024-07-18T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Aaron Berk
Simone Brugiapaglia
Tim Hoheisel
Square Root LASSO: Well-Posedness, Lipschitz Stability, and the Tuning Trade-Off
34
3
2609
2637
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1561968
https://epubs.siam.org/doi/abs/10.1137/23M1561968?af=R
© 2024 Society for Industrial and Applied Mathematics

Small Errors in Random Zeroth-Order Optimization Are Imaginary
https://epubs.siam.org/doi/abs/10.1137/22M1510261?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2638–2670, September 2024. <br/> Abstract. Most zeroth-order optimization algorithms mimic a first-order algorithm but replace the gradient of the objective function with some gradient estimator that can be computed from a small number of function evaluations. This estimator is constructed randomly, and its expectation matches the gradient of a smooth approximation of the objective function whose quality improves as the underlying smoothing parameter [math] is reduced. Gradient estimators requiring a smaller number of function evaluations are preferable from a computational point of view. While estimators based on a single function evaluation can be obtained by use of the divergence theorem from vector calculus, their variance explodes as [math] tends to 0. Estimators based on multiple function evaluations, on the other hand, suffer from numerical cancellation when [math] tends to 0. To combat both effects simultaneously, we extend the objective function to the complex domain and construct a gradient estimator that evaluates the objective at a complex point whose coordinates have small imaginary parts of the order [math]. As this estimator requires only one function evaluation, it is immune to cancellation. In addition, its variance remains bounded as [math] tends to 0. We prove that zeroth-order algorithms that use our estimator offer the same theoretical convergence guarantees as the state-of-the-art methods. Numerical experiments suggest, however, that they often converge faster in practice.
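The imaginary-perturbation idea is closely related to the classical complex-step derivative; a deterministic coordinate-wise sketch (the paper's estimator is randomized, and the test function here is an assumed toy):

```python
import numpy as np

def complex_step_grad(f, x, h=1e-20):
    """Gradient estimate via the complex step f'(x) ~= Im f(x + i*h) / h.

    There is no subtraction of nearly equal function values, hence no
    cancellation, so h can be taken extremely small without losing accuracy.
    One evaluation per coordinate; the paper's randomized estimator needs
    only one evaluation total.
    """
    g = np.empty_like(x, dtype=float)
    for j in range(x.size):
        z = x.astype(complex)
        z[j] += 1j * h                  # tiny imaginary perturbation
        g[j] = f(z).imag / h
    return g

f = lambda z: np.exp(z[0]) * np.sin(z[1])   # analytic toy function
x = np.array([0.3, 1.2])
g = complex_step_grad(f, x)
```

For analytic f the estimate is accurate to machine precision even at h = 1e-20, exactly the regime where finite differences collapse.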
Small Errors in Random Zeroth-Order Optimization Are Imaginary
10.1137/22M1510261
SIAM Journal on Optimization
2024-07-19T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Wouter Jongeneel
Man-Chung Yue
Daniel Kuhn
Small Errors in Random Zeroth-Order Optimization Are Imaginary
34
3
2638
2670
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1510261
https://epubs.siam.org/doi/abs/10.1137/22M1510261?af=R
© 2024 Society for Industrial and Applied Mathematics

Stochastic Trust-Region Algorithm in Random Subspaces with Convergence and Expected Complexity Analyses
https://epubs.siam.org/doi/abs/10.1137/22M1524072?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2671–2699, September 2024. <br/> Abstract. This work proposes a framework for large-scale stochastic derivative-free optimization (DFO) by introducing STARS, a trust-region method based on iterative minimization in random subspaces. This framework is both an algorithmic and theoretical extension of a random subspace derivative-free optimization (RSDFO) framework, and an algorithm for stochastic optimization with random models (STORM). Moreover, like RSDFO, STARS achieves scalability by minimizing interpolation models that approximate the objective in low-dimensional affine subspaces, thus significantly reducing per-iteration costs in terms of function evaluations and yielding strong performance on large-scale stochastic DFO problems. The user-determined dimension of these subspaces, when the latter are defined, for example, by the columns of so-called Johnson–Lindenstrauss transforms, turns out to be independent of the dimension of the problem. For convergence purposes, inspired by the analyses of RSDFO and STORM, both a particular quality of the subspace and the accuracies of random function estimates and models are required to hold with sufficiently high, but fixed, probabilities. Using martingale theory under the latter assumptions, an almost sure global convergence of STARS to a first-order stationary point is shown, and the expected number of iterations required to reach a desired first-order accuracy is proved to be similar to that of STORM and other stochastic DFO algorithms, up to constants.
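The subspace restriction underlying this kind of method can be sketched as composing the objective with a random Johnson–Lindenstrauss-style matrix; a minimal illustration with assumed names and a toy objective, not the STARS algorithm itself:

```python
import numpy as np

def subspace_objective(f, x, p, rng):
    """Restrict f : R^n -> R to a random p-dimensional affine subspace through x.

    P has i.i.d. N(0, 1/p) entries, a simple Johnson-Lindenstrauss-style
    transform. All model building then happens in R^p, so per-iteration
    interpolation cost depends on the user-chosen p rather than on n.
    """
    n = x.size
    P = rng.standard_normal((n, p)) / np.sqrt(p)
    return (lambda y: f(x + P @ y)), P

f = lambda v: float(v @ v)                        # smooth toy objective on R^8
rng = np.random.default_rng(1)
g, P = subspace_objective(f, np.zeros(8), p=3, rng=rng)
```

Evaluating g at the subspace origin recovers f at the current iterate, and any trial step y in R^3 maps back to the full space as x + P @ y.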
Stochastic Trust-Region Algorithm in Random Subspaces with Convergence and Expected Complexity Analyses
10.1137/22M1524072
SIAM Journal on Optimization
2024-07-25T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
K. J. Dzahini
S. M. Wild
Stochastic Trust-Region Algorithm in Random Subspaces with Convergence and Expected Complexity Analyses
34
3
2671
2699
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1524072
https://epubs.siam.org/doi/abs/10.1137/22M1524072?af=R
© 2024 Society for Industrial and Applied Mathematics

Convergence Analysis of a Norm Minimization-Based Convex Vector Optimization Algorithm
https://epubs.siam.org/doi/abs/10.1137/23M1574580?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2700–2728, September 2024. <br/> Abstract. In this work, we propose an outer approximation algorithm for solving bounded convex vector optimization problems (CVOPs). The scalarization model solved iteratively within the algorithm is a modification of the norm-minimizing scalarization proposed in [Ç. Ararat, F. Ulus, and M. Umer, J. Optim. Theory Appl., 194 (2022), pp. 681–712]. For a predetermined tolerance [math], we prove that the algorithm terminates after finitely many iterations, and it returns a polyhedral outer approximation to the upper image of the CVOP such that the Hausdorff distance between the two is less than [math]. We show that for an arbitrary norm used in the scalarization models, the approximation error after [math] iterations decreases by the order of [math], where [math] is the dimension of the objective space. An improved convergence rate of [math] is proved for the special case of using the Euclidean norm.
Convergence Analysis of a Norm Minimization-Based Convex Vector Optimization Algorithm
10.1137/23M1574580
SIAM Journal on Optimization
2024-07-25T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Çağın Ararat
Firdevs Ulus
Muhammad Umer
Convergence Analysis of a Norm Minimization-Based Convex Vector Optimization Algorithm
34
3
2700
2728
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1574580
https://epubs.siam.org/doi/abs/10.1137/23M1574580?af=R
© 2024 Society for Industrial and Applied Mathematics

Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
https://epubs.siam.org/doi/abs/10.1137/22M1540156?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2729–2755, September 2024. <br/> Abstract. Natural policy gradient (NPG) methods, equipped with function approximation and entropy regularization, achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the persistence of excitation condition, and achieves a fast convergence rate of [math] up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to the compatible function approximation error. Finally, we provide sample complexity results for sample-based NPG with entropy regularization.
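The "linear function approximation under softmax parameterization" setting can be sketched for a single state, a log-linear policy over actions; shapes and names below are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def softmax_policy(theta, phi):
    """Log-linear (softmax) policy for one state: pi(a) proportional to
    exp(<theta, phi(a)>), where phi is an (actions x d) feature matrix.

    This is the parameterization class referred to in the abstract; the NPG
    update itself (natural gradient + entropy regularization) is not shown.
    """
    logits = phi @ theta
    logits = logits - logits.max()   # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pi = softmax_policy(np.zeros(2), phi)   # zero parameters give a uniform policy
```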
Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
10.1137/22M1540156
SIAM Journal on Optimization
2024-07-30T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Semih Cayci
Niao He
R. Srikant
Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation
34
3
2729
2755
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1540156
https://epubs.siam.org/doi/abs/10.1137/22M1540156?af=R
© 2024 Society for Industrial and Applied Mathematics

Variational and Strong Variational Convexity in Infinite-Dimensional Variational Analysis
https://epubs.siam.org/doi/abs/10.1137/23M1604667?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2756–2787, September 2024. <br/> Abstract. This paper is devoted to a systematic study and characterizations of the fundamental notions of variational and strong variational convexity for lower semicontinuous functions. While these notions have been quite recently introduced by Rockafellar, the importance of them has already been recognized and documented in finite-dimensional variational analysis and optimization. Here we address general infinite-dimensional settings and derive comprehensive characterizations of both variational and strong variational convexity notions by developing novel techniques, which are essentially different from finite-dimensional counterparts. As a consequence of the obtained characterizations, we establish new quantitative and qualitative relationships between strong variational convexity and tilt stability of local minimizers in appropriate frameworks of Banach spaces.
Variational and Strong Variational Convexity in Infinite-Dimensional Variational Analysis
10.1137/23M1604667
SIAM Journal on Optimization
2024-08-12T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
P. D. Khanh
V. V. H. Khoa
B. S. Mordukhovich
V. T. Phat
Variational and Strong Variational Convexity in Infinite-Dimensional Variational Analysis
34
3
2756
2787
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1604667
https://epubs.siam.org/doi/abs/10.1137/23M1604667?af=R
© 2024 Society for Industrial and Applied Mathematics

MGProx: A Nonsmooth Multigrid Proximal Gradient Method with Adaptive Restriction for Strongly Convex Optimization
https://epubs.siam.org/doi/abs/10.1137/23M1552140?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2788–2820, September 2024. <br/> Abstract. We study the combination of proximal gradient descent with multigrid for solving a class of possibly nonsmooth strongly convex optimization problems. We propose a multigrid proximal gradient method called MGProx, which accelerates the proximal gradient method by multigrid, based on using hierarchical information of the optimization problem. MGProx applies a newly introduced adaptive restriction operator to simplify the Minkowski sum of subdifferentials of the nondifferentiable objective function across different levels. We provide a theoretical characterization of MGProx. First we show that the MGProx update operator exhibits a fixed-point property. Next, we show that the coarse correction is a descent direction for the fine variable of the original fine-level problem in the general nonsmooth case. Last, under some assumptions we provide the convergence rate for the algorithm. In the numerical tests on the elastic obstacle problem, which is an example of a nonsmooth convex optimization problem where the multigrid method can be applied, we show that MGProx has a faster convergence speed than competing methods.
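For context, the single-level proximal gradient iteration that a multigrid scheme like MGProx accelerates, with the ℓ1 prox written out; the multigrid hierarchy itself is not reproduced, and the test problem and names are illustrative:

```python
import numpy as np

def soft_threshold(v, tau):
    """prox of tau*||.||_1, the nonsmooth half of a proximal gradient step."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_gradient(A, b, lam, iters=500):
    """Plain proximal gradient for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    Each iteration is one forward gradient step on the smooth part followed
    by the prox of the nonsmooth part. A multigrid method layers coarse-grid
    corrections on top of exactly this kind of fine-level smoothing iteration.
    """
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 4))
x_true = np.array([1.0, 0.0, -2.0, 0.0])
b = A @ x_true
x = prox_gradient(A, b, lam=0.0)         # lam = 0 reduces to least squares
```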
MGProx: A Nonsmooth Multigrid Proximal Gradient Method with Adaptive Restriction for Strongly Convex Optimization
10.1137/23M1552140
SIAM Journal on Optimization
2024-08-13T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Andersen Ang
Hans De Sterck
Stephen Vavasis
MGProx: A Nonsmooth Multigrid Proximal Gradient Method with Adaptive Restriction for Strongly Convex Optimization
34
3
2788
2820
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1552140
https://epubs.siam.org/doi/abs/10.1137/23M1552140?af=R
© 2024 Society for Industrial and Applied Mathematics
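MGProx accelerates the classical proximal gradient iteration. As a self-contained illustration of that base iteration (the lasso-style objective, soft-thresholding prox, and fixed 1/L step size below are assumptions for the sketch, not the paper's multigrid scheme), a plain proximal gradient method for a nonsmooth strongly convex problem looks like this:

```python
import numpy as np

def prox_l1(v, t):
    # Soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, steps=500):
    # Minimize 0.5*||Ax - b||^2 + lam*||x||_1, strongly convex when A has
    # full column rank; gradient step on the smooth part, prox on the rest.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of A^T(Ax - b)
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - b)
        x = prox_l1(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x = proximal_gradient(A, b, lam=0.1)
```

The nonsmooth term enters only through its prox; this is the structure the paper's adaptive restriction operator must preserve across coarse levels.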

On a Differential Generalized Nash Equilibrium Problem with Mean Field Interaction
https://epubs.siam.org/doi/abs/10.1137/22M1489952?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2821–2855, September 2024. <br/> Abstract. We consider a class of [math]-player linear quadratic differential generalized Nash equilibrium problems (GNEPs) with bound constraints on the individual control and state variables. In addition, we assume the individual players’ optimal control problems are coupled through their dynamics and objectives via a time-dependent mean field interaction term. This assumption allows us to model the realistic setting that strategic players in large games cannot observe the individual states of their competitors. We observe that the GNEPs require a constraint qualification, which necessitates sufficient robustness of the individuals, in order to prove the existence of an open-loop pure strategy Nash equilibrium and to derive optimality conditions. In order to gain qualitative insight into the [math]-player game, we assume that players are identical and pass to the limit in [math] to derive a type of first-order constrained mean field game (MFG). We prove that the mean field interaction terms converge to an absolutely continuous curve of probability measures on the set of possible state trajectories. Using variational convergence methods, we show that the optimal control problems converge to a representative agent problem. Under additional regularity assumptions, we provide an explicit form for the mean field term as the solution of a continuity equation and demonstrate the link back to the [math]-player GNEP.
On a Differential Generalized Nash Equilibrium Problem with Mean Field Interaction
10.1137/22M1489952
SIAM Journal on Optimization
2024-08-23T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Michael Hintermüller
Thomas M. Surowiec
Mike Theiß
On a Differential Generalized Nash Equilibrium Problem with Mean Field Interaction
34
3
2821
2855
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1489952
https://epubs.siam.org/doi/abs/10.1137/22M1489952?af=R
© 2024 Society for Industrial and Applied Mathematics

MIP Relaxations in Factorable Programming
https://epubs.siam.org/doi/abs/10.1137/22M1515537?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2856–2882, September 2024. <br/> Abstract. In this paper, we develop new discrete relaxations for nonlinear expressions in factorable programming. We utilize specialized convexification results as well as composite relaxations to develop mixed-integer programming relaxations. Our relaxations rely on ideal formulations of convex hulls of outer functions over a combinatorial structure that captures local inner-function structure. The resulting relaxations often require fewer variables and are tighter than currently prevalent ones. Finally, we provide computational evidence to demonstrate that our relaxations close approximately 60%–70% of the gap relative to McCormick relaxations and significantly improve the relaxations used in a state-of-the-art solver on various instances involving polynomial functions.
MIP Relaxations in Factorable Programming
10.1137/22M1515537
SIAM Journal on Optimization
2024-08-26T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Taotao He
Mohit Tawarmalani
MIP Relaxations in Factorable Programming
34
3
2856
2882
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1515537
https://epubs.siam.org/doi/abs/10.1137/22M1515537?af=R
© 2024 Society for Industrial and Applied Mathematics
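The McCormick relaxation that serves as the baseline here is easy to state for a single bilinear term: four linear inequalities bound w = x·y over a box. The sketch below (function name is illustrative) evaluates the resulting under- and overestimate at a point:

```python
def mccormick_bounds(x, y, xL, xU, yL, yU):
    """Tightest under/overestimate of the bilinear term w = x*y on the box
    [xL, xU] x [yL, yU] implied by the four McCormick inequalities."""
    under = max(xL * y + yL * x - xL * yL,   # w >= xL*y + yL*x - xL*yL
                xU * y + yU * x - xU * yU)   # w >= xU*y + yU*x - xU*yU
    over = min(xU * y + yL * x - xU * yL,    # w <= xU*y + yL*x - xU*yL
               xL * y + yU * x - xL * yU)    # w <= xL*y + yU*x - xL*yU
    return under, over

# Midpoint of the unit box: true product is 0.25, envelope gives [0.0, 0.5].
lo, hi = mccormick_bounds(0.5, 0.5, 0.0, 1.0, 0.0, 1.0)
```

The gap between `lo` and `hi` at such points is exactly what the discrete relaxations in the paper are reported to shrink by roughly 60%–70%.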

Optimality Conditions and Numerical Algorithms for a Class of Linearly Constrained Minimax Optimization Problems
https://epubs.siam.org/doi/abs/10.1137/22M1535243?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2883–2916, September 2024. <br/> Abstract. Many numerical algorithms exist for solving nonsmooth minimax problems; however, numerical algorithms for nonsmooth minimax problems with joint linear constraints are very rare. This paper aims to discuss optimality conditions and develop practical numerical algorithms for minimax problems with joint linear constraints. First, we use the properties of the proximal mapping and the KKT system to establish optimality conditions. Second, we propose a framework of an alternating coordinate algorithm for the minimax problem and analyze its convergence properties. Third, we develop a proximal gradient multistep ascent descent method (PGmsAD) as a numerical algorithm and demonstrate that the method can find an [math]-stationary point for this kind of nonsmooth problem in [math] iterations. Finally, we apply PGmsAD to generalized absolute value equations, generalized linear projection equations, and linear regression problems, and we report the efficiency of PGmsAD on large-scale optimization.
Optimality Conditions and Numerical Algorithms for a Class of Linearly Constrained Minimax Optimization Problems
10.1137/22M1535243
SIAM Journal on Optimization
2024-09-03T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
YuHong Dai
Jiani Wang
Liwei Zhang
Optimality Conditions and Numerical Algorithms for a Class of Linearly Constrained Minimax Optimization Problems
34
3
2883
2916
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1535243
https://epubs.siam.org/doi/abs/10.1137/22M1535243?af=R
© 2024 Society for Industrial and Applied Mathematics
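The multistep ascent-descent template can be illustrated on a toy smooth saddle-point problem (this is not PGmsAD itself, which handles nonsmooth terms and joint linear constraints through proximal mappings; the objective and step sizes below are assumptions for the sketch):

```python
def multistep_ascent_descent(steps=200, eta=0.05, inner=10):
    # min_x max_y f(x, y) with f(x, y) = x**2 + 2*x*y - y**2:
    # strongly convex in x, strongly concave in y, saddle point at (0, 0).
    x, y = 3.0, -1.0
    for _ in range(steps):
        for _ in range(inner):           # several ascent steps in y ...
            y += eta * (2 * x - 2 * y)   # df/dy
        x -= eta * (2 * x + 2 * y)       # ... then one descent step in x (df/dx)
    return x, y

x, y = multistep_ascent_descent()
```

Running the inner ascent loop to near-convergence before each descent step is what distinguishes this template from simultaneous gradient descent-ascent, which can cycle on such objectives.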

Data-Driven Distributionally Robust Multiproduct Pricing Problems under Pure Characteristics Demand Models
https://epubs.siam.org/doi/abs/10.1137/23M1585131?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2917–2942, September 2024. <br/> Abstract. This paper considers a multiproduct pricing problem under pure characteristics demand models when the probability distribution of the random parameter in the problem is uncertain. We formulate this problem as a distributionally robust optimization (DRO) problem based on a constructive approach to estimating pure characteristics demand models with pricing by Pang, Su, and Lee. In this model, the consumers’ purchase decision is to maximize their utility. We show that the DRO problem is well-defined, and the objective function is upper semicontinuous by using an equivalent hierarchical form. We also use the data-driven approach to analyze the DRO problem when the ambiguity set, i.e., a set of probability distributions that contains some exact information of the underlying probability distribution, is given by a general moment-based case. We give convergence results as the data size tends to infinity and analyze the quantitative statistical robustness in view of the possible contamination of driven data. Furthermore, we use the Lagrange duality to reformulate the DRO problem as a mathematical program with complementarity constraints, and give a numerical procedure for finding a global solution of the DRO problem under certain specific settings. Finally, we report numerical results that validate the effectiveness and scalability of our approach for the distributionally robust multiproduct pricing problem.
Data-Driven Distributionally Robust Multiproduct Pricing Problems under Pure Characteristics Demand Models
10.1137/23M1585131
SIAM Journal on Optimization
2024-09-03T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Jie Jiang
Hailin Sun
Xiaojun Chen
Data-Driven Distributionally Robust Multiproduct Pricing Problems under Pure Characteristics Demand Models
34
3
2917
2942
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1585131
https://epubs.siam.org/doi/abs/10.1137/23M1585131?af=R
© 2024 Society for Industrial and Applied Mathematics

A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts
https://epubs.siam.org/doi/abs/10.1137/22M1507681?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2943–2972, September 2024. <br/> Abstract. We propose a new method for linear second-order cone programs. It is based on the sequential quadratic programming framework for nonlinear programming. In contrast to interior point methods, it can capitalize on the warm-start capabilities of active-set quadratic programming subproblem solvers and achieve a local quadratic rate of convergence. In order to overcome the nondifferentiability or singularity observed in nonlinear formulations of the conic constraints, the subproblems approximate the cones with polyhedral outer approximations that are refined throughout the iterations. For nondegenerate instances, the algorithm implicitly identifies the set of cones for which the optimal solution lies at the extreme points. As a consequence, the final steps are identical to regular sequential quadratic programming steps for a differentiable nonlinear optimization problem, yielding local quadratic convergence. We prove the global and local convergence guarantees of the method and present numerical experiments that confirm that the method can take advantage of good starting points and can achieve higher accuracy compared to a state-of-the-art interior point solver.
A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts
10.1137/22M1507681
SIAM Journal on Optimization
2024-09-03T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Xinyi Luo
Andreas Wächter
A Quadratically Convergent Sequential Programming Method for Second-Order Cone Programs Capable of Warm Starts
34
3
2943
2972
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1507681
https://epubs.siam.org/doi/abs/10.1137/22M1507681?af=R
© 2024 Society for Industrial and Applied Mathematics
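The polyhedral outer approximation at the heart of the subproblems can be sketched in three dimensions: supporting hyperplanes of the second-order cone give a polyhedral relaxation that contains the cone. The uniform angle sampling below is an illustrative, non-adaptive stand-in for the paper's refinement scheme:

```python
import numpy as np

def soc_outer_cuts(m):
    # m supporting hyperplanes of K = {(x1, x2, t): sqrt(x1^2 + x2^2) <= t}:
    # every point of K satisfies t >= cos(a)*x1 + sin(a)*x2 for all angles a,
    # so keeping finitely many such cuts yields a polyhedral outer approximation.
    angles = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    return [(np.cos(a), np.sin(a)) for a in angles]

def in_outer_approximation(cuts, x1, x2, t):
    # Membership in the polyhedral relaxation: all cuts must hold.
    return all(t >= c * x1 + s * x2 - 1e-12 for c, s in cuts)

cuts = soc_outer_cuts(16)
```

Points inside the cone satisfy every cut; points outside are separated once the approximation is refined near their direction, which is what drives the iterative refinement in the method.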

Consensus-Based Optimization Methods Converge Globally
https://epubs.siam.org/doi/abs/10.1137/22M1527805?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 2973–3004, September 2024. <br/> Abstract. In this paper we study consensus-based optimization (CBO), which is a multi-agent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. Based on an experimentally supported intuition that, on average, CBO performs a gradient descent of the squared Euclidean distance to the global minimizer, we devise a novel technique for proving the convergence to the global minimizer in mean-field law for a rich class of objective functions. The result unveils internal mechanisms of CBO that are responsible for the success of the method. In particular, we prove that CBO performs a convexification of a large class of optimization problems as the number of optimizing agents goes to infinity. Furthermore, we improve prior analyses by requiring mild assumptions about the initialization of the method and by covering objectives that are merely locally Lipschitz continuous. As a core component of this analysis, we establish a quantitative nonasymptotic Laplace principle, which may be of independent interest. From the result of CBO convergence in mean-field law, it becomes apparent that the hardness of any global optimization problem is necessarily encoded in the rate of the mean-field approximation, for which we provide a novel probabilistic quantitative estimate. The combination of these results allows us to obtain probabilistic global convergence guarantees of the numerical CBO method.
Consensus-Based Optimization Methods Converge Globally
10.1137/22M1527805
SIAM Journal on Optimization
2024-09-03T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Massimo Fornasier
Timo Klock
Konstantin Riedl
Consensus-Based Optimization Methods Converge Globally
34
3
2973
3004
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1527805
https://epubs.siam.org/doi/abs/10.1137/22M1527805?af=R
© 2024 Society for Industrial and Applied Mathematics
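The generic CBO dynamics the analysis covers are short enough to sketch: particles drift toward a softmin "consensus point" and diffuse with noise scaled by their distance to it. All parameter values below are illustrative defaults, not taken from the paper:

```python
import numpy as np

def cbo_minimize(f, dim=2, n=100, steps=400, dt=0.01, lam=1.0, sigma=0.7,
                 alpha=50.0, seed=0):
    # Discretized CBO: consensus point = softmin-weighted particle average;
    # each particle contracts toward it plus scaled Gaussian noise.
    rng = np.random.default_rng(seed)
    X = rng.uniform(-3.0, 3.0, size=(n, dim))
    for _ in range(steps):
        fvals = np.array([f(x) for x in X])
        weights = np.exp(-alpha * (fvals - fvals.min()))  # shift avoids underflow
        consensus = weights @ X / weights.sum()
        drift = X - consensus
        X = X - lam * dt * drift \
              + sigma * np.sqrt(dt) * drift * rng.standard_normal(X.shape)
    return consensus

x_star = cbo_minimize(lambda x: float(np.sum(x ** 2)))  # global minimizer at 0
```

No gradients of `f` are evaluated anywhere, which is what makes CBO applicable to nonsmooth nonconvex objectives; the drift-toward-consensus term is the mechanism the paper interprets, on average, as a gradient descent of the squared distance to the minimizer.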

Complexity-Optimal and Parameter-Free First-Order Methods for Finding Stationary Points of Composite Optimization Problems
https://epubs.siam.org/doi/abs/10.1137/22M1498826?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 3005–3032, September 2024. <br/> Abstract. This paper develops and analyzes an accelerated proximal descent method for finding stationary points of nonconvex composite optimization problems. The objective function is of the form [math], where [math] is a proper closed convex function, [math] is a differentiable function on the domain of [math], and [math] is Lipschitz continuous on the domain of [math]. The main advantage of this method is that it is “parameter-free” in the sense that it does not require knowledge of the Lipschitz constant of [math] or of any global topological properties of [math]. It is shown that the proposed method can obtain an [math]-approximate stationary point with iteration complexity bounds that are optimal, up to logarithmic terms over [math], in both the convex and nonconvex settings. Some discussion is also given about how the proposed method can be leveraged in other existing optimization frameworks, such as min-max smoothing and penalty frameworks for constrained programming, to create more specialized parameter-free methods. Finally, numerical experiments are presented to support the practical viability of the method.
Complexity-Optimal and Parameter-Free First-Order Methods for Finding Stationary Points of Composite Optimization Problems
10.1137/22M1498826
SIAM Journal on Optimization
2024-09-04T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Weiwei Kong
Complexity-Optimal and Parameter-Free First-Order Methods for Finding Stationary Points of Composite Optimization Problems
34
3
3005
3032
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/22M1498826
https://epubs.siam.org/doi/abs/10.1137/22M1498826?af=R
© 2024 Society for Industrial and Applied Mathematics
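The "parameter-free" idea can be illustrated with the standard backtracking device it generalizes: instead of assuming the Lipschitz constant of the gradient, a local estimate is doubled until the usual descent inequality holds. The sketch below is this textbook device on smooth gradient descent, not the paper's accelerated proximal method:

```python
import numpy as np

def adaptive_gradient_descent(grad, f, x0, steps=100, L0=1.0):
    # No Lipschitz constant is supplied: L is a running local estimate,
    # doubled whenever the descent inequality
    #   f(y) <= f(x) + <g, y - x> + (L/2) * ||y - x||^2
    # fails at the trial point y = x - g/L.
    x, L = np.asarray(x0, float), L0
    for _ in range(steps):
        g = grad(x)
        while True:
            y = x - g / L
            if f(y) <= f(x) + g @ (y - x) + 0.5 * L * np.sum((y - x) ** 2):
                break
            L *= 2.0
        x, L = y, max(L / 2.0, 1e-12)   # relax the estimate between steps
    return x

f = lambda x: np.sum(x ** 2)            # true Lipschitz constant of grad is 2
grad = lambda x: 2.0 * x
x = adaptive_gradient_descent(grad, f, np.array([5.0, -3.0]))
```

Each failed test at most doubles L past the true constant, so the total backtracking cost is logarithmic in how far the initial guess `L0` is from the truth.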

Interpolation Conditions for Linear Operators and Applications to Performance Estimation Problems
https://epubs.siam.org/doi/abs/10.1137/23M1575391?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 3033–3063, September 2024. <br/> Abstract. The performance estimation problem methodology makes it possible to determine the exact worst-case performance of an optimization method. In this work, we generalize this framework to first-order methods involving linear operators. This extension requires an explicit formulation of interpolation conditions for those linear operators. We consider the class of linear operators [math], where matrix [math] has bounded singular values, and the class of linear operators, where [math] is symmetric and has bounded eigenvalues. We describe interpolation conditions for these classes, i.e., necessary and sufficient conditions that, given a list of pairs [math], characterize the existence of a linear operator mapping [math] to [math] for all [math]. Using these conditions, we first identify the exact worst-case behavior of the gradient method applied to the composed objective [math], and observe that it always corresponds to [math] being a scaling operator. We then investigate the Chambolle–Pock method applied to [math], and improve the existing analysis to obtain a proof of the exact convergence rate of the primal-dual gap. In addition, we study how this method behaves on Lipschitz convex functions, and obtain a numerical convergence rate for the primal accuracy of the last iterate. We also show numerically that averaging iterates is beneficial in this setting.
Interpolation Conditions for Linear Operators and Applications to Performance Estimation Problems
10.1137/23M1575391
SIAM Journal on Optimization
2024-09-05T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Nizar Bousselmi
Julien M. Hendrickx
François Glineur
Interpolation Conditions for Linear Operators and Applications to Performance Estimation Problems
34
3
3033
3063
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1575391
https://epubs.siam.org/doi/abs/10.1137/23M1575391?af=R
© 2024 Society for Industrial and Applied Mathematics
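For the bounded-singular-value class, the interpolation condition has a compact Gram-matrix form that is easy to check numerically. The sketch below assumes the pairs are stored as columns of X and Y and uses the standard condition that a map M with spectral norm at most L and MX = Y exists iff L²XᵀX − YᵀY is positive semidefinite; this is a simplified rendering, not the paper's full statement:

```python
import numpy as np

def interpolable_by_bounded_operator(X, Y, L, tol=1e-9):
    # Pairs (x_i, y_i) are the columns of X and Y.  Necessity of the condition
    # is immediate: ||M X v|| <= L ||X v|| for every v gives
    # v^T (L^2 X^T X - Y^T Y) v >= 0, i.e. the Gram matrix is PSD.
    G = L ** 2 * (X.T @ X) - Y.T @ Y
    return bool(np.linalg.eigvalsh((G + G.T) / 2.0).min() >= -tol)

I = np.eye(2)
```

Conditions of this shape are exactly what lets the performance estimation problem be posed over iterate Gram matrices as a semidefinite program.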

A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization
https://epubs.siam.org/doi/abs/10.1137/23M1617965?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 3064–3087, September 2024. <br/> Abstract. We consider stochastic optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector, a dependent random vector, and the decision variables. We call such problems conditional stochastic optimization problems. They arise in many applications, such as uplift modeling, reinforcement learning, and contextual optimization. We propose a specialized single time-scale stochastic method for nonconvex constrained conditional stochastic optimization problems with a Lipschitz smooth outer function and a generalized differentiable inner function. In the method, we approximate the inner conditional expectation with a rich parametric model whose mean squared error satisfies a stochastic version of a Łojasiewicz condition. The model is used by an inner learning algorithm. The main feature of our approach is that unbiased stochastic estimates of the directions used by the method can be generated with one observation from the joint distribution per iteration, which makes it applicable to real-time learning. The directions, however, are not gradients or subgradients of any overall objective function. We prove the convergence of the method with probability one, using the method of differential inclusions and a specially designed Lyapunov function, involving a stochastic generalization of the Bregman distance. Finally, a numerical illustration demonstrates the viability of our approach.
A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization
10.1137/23M1617965
SIAM Journal on Optimization
2024-09-10T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Andrzej Ruszczyński
Shangzhe Yang
A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic Optimization
34
3
3064
3087
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1617965
https://epubs.siam.org/doi/abs/10.1137/23M1617965?af=R
© 2024 Society for Industrial and Applied Mathematics

On Minimal Extended Representations of Generalized Power Cones
https://epubs.siam.org/doi/abs/10.1137/23M1617205?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 3088–3111, September 2024. <br/> Abstract. In this paper, we analyze minimal representations of [math]-power cones as simpler cones. We derive some new results on the complexity of the representations, and we provide a procedure to construct a minimal representation by means of second-order cones when [math] and [math] are rational. The construction is based on the identification of the cones with a graph, the mediated graph. We then develop a mixed-integer linear optimization formulation to obtain the optimal mediated graph, and from it the minimal representation. We present the results of a series of computational experiments analyzing the computational performance of the approach, both in obtaining the representation and in incorporating it into a practical conic optimization model that arises in facility location.
On Minimal Extended Representations of Generalized Power Cones
10.1137/23M1617205
SIAM Journal on Optimization
2024-09-10T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Víctor Blanco
Miguel Martínez-Antón
On Minimal Extended Representations of Generalized Power Cones
34
3
3088
3111
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1617205
https://epubs.siam.org/doi/abs/10.1137/23M1617205?af=R
© 2024 Society for Industrial and Applied Mathematics

Minimum Spanning Trees in Infinite Graphs: Theory and Algorithms
https://epubs.siam.org/doi/abs/10.1137/23M157627X?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 3112–3135, September 2024. <br/> Abstract. We discuss finding minimum spanning trees (MSTs) on connected graphs with countably many nodes of finite degree. When edge costs are summable and an MST exists (which is not guaranteed in general), we show that an algorithm that finds MSTs on finite subgraphs (called layers) converges in objective value to the cost of an MST of the whole graph as the sizes of the layers grow to infinity. We call this the layered greedy algorithm since a greedy algorithm is used to find MSTs on each finite layer. We stress that the overall algorithm is not greedy since edges can enter and leave iterate spanning trees as larger layers are considered. However, in the setting where the underlying graph has the finite cycle (FC) property (meaning that every edge is contained in at most finitely many cycles) and distinct edge costs, we show that a unique MST [math] exists and the layered greedy algorithm produces iterates that converge to [math] by eventually “locking in” edges after finitely many iterations. Applications to network deployment are discussed.
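The layered greedy idea can be sketched on a toy infinite graph of our own making (not from the paper): nodes 0, 1, 2, … with path edges (i, i+1) of summable cost 2^−(i+1) and chord edges (i, i+2) of cost 3·2^−i. Every edge lies in at most one cycle, so the FC property holds and edge costs are distinct; the unique MST of the whole graph is the infinite path, with total cost 1. Running a greedy MST solver (Kruskal) on each finite layer shows both convergence in cost and the "locking in" of edges:

```python
def layer_edges(n):
    """Edges of the layer induced by nodes 0..n-1, as (cost, u, v).

    Path edge (i, i+1) costs 2^-(i+1); chord (i, i+2) costs 3 * 2^-i.
    All costs are distinct, so each layer has a unique MST.
    """
    edges = [(2.0 ** -(i + 1), i, i + 1) for i in range(n - 1)]
    edges += [(3.0 * 2.0 ** -i, i, i + 2) for i in range(n - 2)]
    return edges

def kruskal(n, edges):
    """Greedy MST on a finite layer; returns (cost, set of chosen edges)."""
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    cost, chosen = 0.0, set()
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            cost += w
            chosen.add((u, v))
    return cost, chosen

prev = set()
for n in range(2, 22):
    cost, tree = kruskal(n, layer_edges(n))
    assert prev <= tree  # edges "lock in": earlier MST edges persist
    prev = tree
# The layer on nodes 0..20 has MST cost 1 - 2^-20, approaching 1.
```

In this particular example the layer MSTs are nested, so locking-in is immediate; the paper's point is that on general FC graphs edges may enter and leave iterate trees before eventually locking in.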
Minimum Spanning Trees in Infinite Graphs: Theory and Algorithms
10.1137/23M157627X
SIAM Journal on Optimization
2024-09-11T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Christopher T. Ryan
Robert L. Smith
Marina A. Epelman
Minimum Spanning Trees in Infinite Graphs: Theory and Algorithms
34
3
3112
3135
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M157627X
https://epubs.siam.org/doi/abs/10.1137/23M157627X?af=R
© 2024 Society for Industrial and Applied Mathematics

Newton-Based Alternating Methods for the Ground State of a Class of Multicomponent Bose–Einstein Condensates
https://epubs.siam.org/doi/abs/10.1137/23M1580346?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 3136–3162, September 2024. <br/> Abstract. The computation of the ground state of special multicomponent Bose–Einstein condensates (BECs) can be formulated as an energy functional minimization problem with spherical constraints. It leads to a nonconvex quartic-quadratic optimization problem after suitable discretizations. First, we generalize the Newton-based methods for single-component BECs to the alternating minimization scheme for multicomponent BECs. Second, the globally convergent alternating Newton-Noda iteration (ANNI) is proposed. In particular, we prove the positivity-preserving property of ANNI under mild conditions. Finally, our analysis is applied to a class of more general “multiblock” optimization problems with spherical constraints. Numerical experiments are performed to evaluate the performance of the proposed methods for different multicomponent BECs, including pseudo spin-1/2, antiferromagnetic spin-1, and spin-2 BECs. These results support our theory and demonstrate the efficiency of our algorithms.
Newton-Based Alternating Methods for the Ground State of a Class of Multicomponent Bose–Einstein Condensates
10.1137/23M1580346
SIAM Journal on Optimization
2024-09-11T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Pengfei Huang
Qingzhi Yang
Newton-Based Alternating Methods for the Ground State of a Class of Multicomponent Bose–Einstein Condensates
34
3
3136
3162
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/23M1580346
https://epubs.siam.org/doi/abs/10.1137/23M1580346?af=R
© 2024 Society for Industrial and Applied Mathematics

Corrigendum and Addendum: Newton Differentiability of Convex Functions in Normed Spaces and of a Class of Operators
https://epubs.siam.org/doi/abs/10.1137/24M1669542?af=R
SIAM Journal on Optimization, <a href="https://epubs.siam.org/toc/sjope8/34/3">Volume 34, Issue 3</a>, Page 3163–3166, September 2024. <br/> Abstract. As formulated, Proposition 3.12 of [M. Brokate and M. Ulbrich, SIAM J. Optim., 32 (2022), pp. 1265–1287] contains an error, but it can be corrected in the way described below. The results of that paper based on Proposition 3.12 are not affected. We also use the opportunity to add a further illustrative example and to rectify some inaccuracies which may be confusing.
Corrigendum and Addendum: Newton Differentiability of Convex Functions in Normed Spaces and of a Class of Operators
10.1137/24M1669542
SIAM Journal on Optimization
2024-09-16T07:00:00Z
© 2024 Society for Industrial and Applied Mathematics
Martin Brokate
Michael Ulbrich
Corrigendum and Addendum: Newton Differentiability of Convex Functions in Normed Spaces and of a Class of Operators
34
3
3163
3166
2024-09-30T07:00:00Z
2024-09-30T07:00:00Z
10.1137/24M1669542
https://epubs.siam.org/doi/abs/10.1137/24M1669542?af=R
© 2024 Society for Industrial and Applied Mathematics