Society for Industrial and Applied Mathematics: SIAM Journal on Optimization: Table of Contents

Sparse Polynomial Matrix Optimization

Jared Miller — 2026-04-01T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 503-533, June 2026.
Abstract. A polynomial matrix inequality (PMI) is a formula asserting that a polynomial matrix is positive semidefinite. Polynomial matrix optimization (PMO) concerns minimizing the smallest eigenvalue of a symmetric polynomial matrix subject to a tuple of PMIs. This work explores the use of sparsity methods in reducing the complexity of sum of squares–based methods in verifying PMIs or solving PMO. In the unconstrained setting, Newton polytopes can be employed to sparsify the monomial basis, resulting in smaller semidefinite programs. In the general setting, we show how to exploit different types of sparsity (term sparsity, correlative sparsity, matrix sparsity) encoded in polynomial matrices to derive sparse semidefinite programming relaxations for PMO. For term sparsity, we show that the block structures of the term sparsity iterations with maximal chordal extensions converge to the one determined by PMI sign symmetries. For correlative sparsity, unlike the scalar case, we provide a counterexample showing that asymptotic convergence does not hold under the Archimedean condition and the running intersection property. By employing the theory of matrix-valued measures, we establish several results on detecting global optimality and retrieving optimal solutions under correlative sparsity. The effectiveness of sparsity methods on reducing computational complexity is demonstrated on various examples of PMO.

A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization

Youssef Diouane — 2026-04-01T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 534-563, June 2026.
Abstract. We develop R2N, a modified quasi-Newton method for minimizing the sum of a [math] function [math] and a lower semicontinuous prox-bounded [math]. Both [math] and [math] may be nonconvex. At each iteration, our method computes a step by minimizing the sum of a quadratic model of [math], a model of [math], and an adaptive quadratic regularization term. A step may be computed by way of a variant of the proximal-gradient method. An advantage of R2N over competing trust-region methods is that proximal operators do not involve an extra trust-region indicator. We also develop the variant R2DH, in which the model Hessian is diagonal, which allows us to compute a step without relying on a subproblem solver when [math] is separable. R2DH can be used as a standalone solver, but also as a subproblem solver inside R2N. We describe nonmonotone variants of both R2N and R2DH. Global convergence of a first-order stationarity measure to zero holds without relying on local Lipschitz continuity of [math], while allowing model Hessians to grow unbounded, an assumption particularly relevant to quasi-Newton models. Under Lipschitz-continuity of [math], we establish a tight worst-case evaluation complexity bound of [math] to bring said measure below [math], where [math] controls the growth of model Hessians. Specifically, the latter must not diverge faster than [math], where [math] is the set of successful iterations up to iteration [math]. When [math], we establish the tight exponential complexity bound [math], where [math] is a constant. We describe our Julia implementation and report numerical experience on a classic basis-pursuit problem, an image denoising problem, a minimum-rank matrix completion problem, a nonlinear support vector machine, and an inverse nonlinear problem.
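
As a rough illustration of why a diagonal model Hessian can remove the need for a subproblem solver when the nonsmooth term is separable, the sketch below computes one regularized step in closed form. It assumes the nonsmooth term is lam * ||x||_1, which is one possible separable choice and is not stated in the abstract. The names `soft_threshold` and `diagonal_prox_step` are hypothetical and are not taken from the authors' Julia implementation.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.| evaluated at the scalar z."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def diagonal_prox_step(x, grad, diag_hess, sigma, lam):
    """Return the step s minimizing, coordinate by coordinate,

        grad_i*s_i + 0.5*(d_i + sigma)*s_i**2 + lam*|x_i + s_i|,

    where d_i are the diagonal model-Hessian entries and sigma is the
    quadratic regularization parameter. Assumes d_i + sigma > 0.
    """
    step = []
    for xi, gi, di in zip(x, grad, diag_hess):
        curv = di + sigma
        if curv <= 0.0:
            raise ValueError("d_i + sigma must be positive")
        # Completing the square reduces each coordinate to a soft-threshold.
        new_xi = soft_threshold(xi - gi / curv, lam / curv)
        step.append(new_xi - xi)
    return step
```

Because the model separates across coordinates, the cost of a step is linear in the dimension; a non-diagonal model Hessian would generally require an iterative inner solver instead.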

Zeroth-Order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization

Luke Marrinan — 2026-04-02T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 564-596, June 2026.
Abstract. We consider the minimization of a Lipschitz continuous and expectation-valued function, denoted by [math] and defined as [math], over a closed and convex set [math]. Our focus lies on both deriving asymptotics and obtaining rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense) via zeroth-order schemes. We adopt a smoothing-based approach reliant on minimizing [math], where [math], [math] is a random variable defined on a unit sphere, and [math]. In fact, it has been observed that a stationary point of the [math]-smoothed problem is an [math]-stationary point for the original problem in the Clarke sense. In such a setting, we develop two sets of schemes with promising empirical behavior. (I) We develop a smoothing-enabled variance-reduced zeroth-order gradient framework for minimizing [math] over [math]. In this setting, we make two sets of contributions for the sequence generated by the proposed zeroth-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, allowing for making guarantees for [math]-Clarke stationary solutions of the original problem. (b) To compute an [math] that ensures that the expected norm of the residual of the [math]-smoothed problem is within [math] requires no greater than [math] projection steps and [math] function evaluations. (II) Our second scheme is a zeroth-order stochastic quasi-Newton scheme reliant on a combination of randomized and Moreau smoothing; the corresponding iteration and sample complexities for this scheme are [math] and [math], respectively. These statements appear to be novel in the case of both constrained problems as well as in stochastic quasi-Newton settings, as there appear to be few available results that can contend with general nonsmooth, nonconvex, and stochastic regimes via zeroth-order approaches.
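
To make the idea of a smoothing-based zeroth-order scheme concrete, the sketch below implements a standard sphere-sampled gradient estimator that uses only function values. The exact estimator, smoothing parameter, and variance-reduction rule of the paper are elided in the abstract, so the formula (n/eta) * (f(x + eta*v) - f(x)) * v and the names `sample_unit_sphere` and `zo_gradient` are assumptions made for illustration.

```python
import math
import random


def sample_unit_sphere(n, rng):
    """Draw a vector uniformly from the unit sphere in R^n."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def zo_gradient(f, x, eta, rng, num_samples=1):
    """Average of (n/eta) * (f(x + eta*v) - f(x)) * v over random directions v.

    Only evaluations of f are used; no gradient of f is required.
    """
    n = len(x)
    fx = f(x)
    est = [0.0] * n
    for _ in range(num_samples):
        v = sample_unit_sphere(n, rng)
        shifted = [xi + eta * vi for xi, vi in zip(x, v)]
        scale = (n / eta) * (f(shifted) - fx) / num_samples
        for i in range(n):
            est[i] += scale * v[i]
    return est
```

For a linear function the average of this estimator recovers the true gradient, which gives a simple sanity check; for a nonsmooth function it instead estimates the gradient of a smoothed surrogate, with eta controlling how far that surrogate is from the original.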

Effective Front-Descent Algorithms with Convergence Guarantees

Matteo Lapucci — 2026-04-06T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 597-625, June 2026.
Abstract. In this manuscript, we address continuous unconstrained multiobjective optimization problems and we discuss descent type methods for the reconstruction of the Pareto set. Specifically, we analyze the class of Front Descent methods, which generalizes the Front Steepest Descent algorithm allowing the employment of suitable, effective search directions (e.g., Newton, Quasi-Newton, Barzilai-Borwein). We provide a deep characterization of the behavior and the mechanisms of the algorithmic framework, and we prove that, under reasonable assumptions, standard convergence results and some complexity bounds hold for the generalized approach. Moreover, we prove that popular search directions can indeed be soundly used within the framework. Then, we provide a completely novel type of convergence result, concerning the sequence of sets produced by the procedure. In particular, iterate sets are shown to asymptotically approach stationarity for all of their points; the convergence result is accompanied by a worst-case iteration complexity bound; additionally, in finite precision settings, the sets are shown to only be enriched through exploration steps in later iterations, and suitable stopping conditions can be devised. Finally, the results from a large experimental benchmark show that the proposed class of approaches far outperforms state-of-the-art methodologies.

Nonsmooth Exact Penalty Methods for Equality-Constrained Optimization: Complexity and Implementation

Youssef Diouane — 2026-04-08T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 626-650, June 2026.
Abstract. Penalty methods are a well-known class of algorithms for constrained optimization. They transform a constrained problem into a sequence of unconstrained penalized problems in the hope that approximate solutions of the latter converge to a solution of the former. If Lagrange multipliers exist, exact penalty methods ensure that the penalty parameter only need increase a finite number of times, but they are typically scorned in smooth optimization because the penalized problems are not smooth. This led researchers to consider the implementation of exact penalty methods inconvenient. Recent advances in proximal methods have led to increasingly efficient solvers for nonsmooth optimization. We study a general exact penalty algorithm and use it to show that the exact [math]-penalty method for equality-constrained optimization can, in fact, be implemented efficiently by solving the penalized problem using a proximal-type algorithm. We study the convergence of our algorithm and establish a worst-case complexity bound of [math] to bring a stationarity measure below [math] under the Mangasarian–Fromovitz constraint qualification and Lipschitz continuity of the objective gradient and constraint Jacobian. While the Lipschitz continuity of the objective gradient is not required for convergence in view of recent works, it is used in our analysis to derive the complexity bound. In a degenerate scenario where the penalty parameter grows unbounded, the complexity becomes [math], which is worse than another bound found in the literature. We justify the difference by arguing that our feasibility measure is properly scaled. Finally, we report numerical experience on small-scale problems from a standard collection and compare our solver with an augmented-Lagrangian and an SQP method. Our preliminary implementation is superior to the augmented Lagrangian in terms of robustness and efficiency and is competitive with the SQP method in terms of robustness, though the latter retains a slight edge in terms of number of problem function evaluations.
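
The exactness property mentioned above (a finite penalty parameter suffices when multipliers exist) can be seen on a one-dimensional toy problem. The sketch below is an illustration only: the problem, minimizing (x - 2)^2 subject to x - 1 = 0, and the names `penalized_objective` and `grid_argmin` are hypothetical, and a brute-force grid search stands in for the proximal-type solver studied in the paper. The constrained solution is x = 1 with multiplier of magnitude 2, so the penalized minimizer should coincide with it once the penalty parameter is at least 2.

```python
def penalized_objective(x, rho):
    """phi_rho(x) = f(x) + rho*|c(x)| with f(x) = (x - 2)**2 and c(x) = x - 1."""
    return (x - 2.0) ** 2 + rho * abs(x - 1.0)


def grid_argmin(rho, lo=-1.0, hi=3.0, steps=4000):
    """Brute-force minimizer of the penalized objective on a uniform grid."""
    best_x, best_val = lo, penalized_objective(lo, rho)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        val = penalized_objective(x, rho)
        if val < best_val:
            best_x, best_val = x, val
    return best_x
```

With a penalty parameter of 3 the grid minimizer sits at the feasible point x = 1, while with a penalty parameter of 1 it drifts to the infeasible point x = 1.5, which is why the parameter must be increased until it exceeds the multiplier.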

Rockafellian Relaxation for PDE-Constrained Optimization with Distributional Ambiguity

Harbir Antil — 2026-04-15T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 651-677, June 2026.
Abstract. Stochastic optimization problems are generally known to be unstable to the form of the underlying uncertainty. A framework is introduced for optimal control problems with partial differential equations as constraints that is robust to inaccuracies in the precise form of the problem uncertainty. The framework is based on problem relaxation and involves optimizing a bivariate, “Rockafellian” objective functional that features both a standard control variable and an additional perturbation variable that handles the distributional ambiguity. In the presence of distributional corruption, the Rockafellian objective functionals are shown in the appropriate settings to [math]-converge to uncorrupted objective functionals in the limit of vanishing corruption. Numerical examples illustrate the framework’s utility for outlier detection and removal and for variance reduction.

Two-Norm Discrepancy and Convergence of the Stochastic Gradient Method with Application to Shape Optimization

Marc Dambrine — 2026-04-20T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 678-702, June 2026.
Abstract. The present article is dedicated to proving convergence of the stochastic gradient method in the case of random shape optimization problems. To that end, we consider Bernoulli’s exterior free boundary problem with a random interior boundary. We recast this problem into a shape optimization problem by means of the minimization of the expected Dirichlet energy. By restricting ourselves to the class of convex, sufficiently smooth domains of bounded curvature, the shape optimization problem becomes strongly convex with respect to an appropriate norm. Since this norm is weaker than the differentiability norm, we are confronted with the so-called two-norm discrepancy, a well-known phenomenon from optimal control. We therefore need to adapt the convergence theory of the stochastic gradient method to this specific setting correspondingly. The theoretical findings are supported and validated by numerical experiments.

Smoothed Gradient Clipping and Error Feedback for Decentralized Optimization under Symmetric Heavy-Tailed Noise

Shuhua Yu — 2026-04-23T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 703-728, June 2026.
Abstract. Motivated by understanding and analysis of large-scale machine learning under heavy-tailed gradient noise, we study decentralized optimization with gradient clipping, i.e., in which certain clipping operators are applied to the gradients or gradient estimates computed from local nodes prior to further processing. While vanilla gradient clipping has proven effective in mitigating the impact of heavy-tailed gradient noise in nondistributed setups, it incurs bias that causes convergence issues in heterogeneous distributed settings. To address the inherent bias introduced by gradient clipping, we develop a smoothed clipping operator, and propose a decentralized gradient method equipped with an error feedback mechanism, i.e., the clipping operator is applied on the difference between some local gradient estimator and local stochastic gradient. We consider strongly convex and smooth local functions under symmetric heavy-tailed gradient noise that may not have finite moments of order greater than one. We show that the proposed decentralized gradient clipping method achieves a mean-square error (MSE) convergence rate of [math], [math], where the exponent [math] is independent of the existence of higher order gradient noise moments [math] and lower bounded by some constant dependent on condition number. To the best of our knowledge, this is the first MSE convergence result for decentralized gradient clipping under heavy-tailed noise without assuming bounded gradient. Numerical experiments validate our theoretical findings.
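
The error feedback idea described above, clipping the difference between a local gradient estimator and a fresh stochastic gradient rather than the gradient itself, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses vanilla norm clipping in place of the paper's smoothed clipping operator (whose form is not given in the abstract), it omits the decentralized consensus step, and the names `clip_by_norm` and `error_feedback_update` as well as the weight `beta` are hypothetical.

```python
import math


def clip_by_norm(g, tau):
    """Vanilla clipping: rescale g so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(c * c for c in g))
    if norm <= tau:
        return list(g)
    return [c * tau / norm for c in g]


def error_feedback_update(estimate, stochastic_grad, tau, beta):
    """Move the local estimate toward the new stochastic gradient,
    clipping only the difference between the two (illustrative rule).
    """
    diff = [g - m for g, m in zip(stochastic_grad, estimate)]
    clipped = clip_by_norm(diff, tau)
    return [m + beta * c for m, c in zip(estimate, clipped)]
```

In this sketch a heavy-tailed outlier of any magnitude moves the estimate by at most beta * tau in norm, whereas a sample close to the current estimate passes through unclipped.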

Sum-of-Squares Hierarchy for the Gromov–Wasserstein Problem

Hoang Anh Tran — 2026-04-27T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 729-759, June 2026.
Abstract. The Gromov–Wasserstein (GW) problem is a variant of the classical optimal transport problem that allows one to compute meaningful transportation plans between incomparable spaces. At an intuitive level, it seeks plans that minimize the discrepancy between metric evaluations of pairs of points. The GW problem is typically cast as an instance of a nonconvex quadratic program that is, unfortunately, intractable to solve. In this paper, we describe tractable semidefinite relaxations of the GW problem based on the sum-of-squares (SOS) hierarchy. We describe how the Putinar-type and the Schmüdgen-type moment hierarchies can be simplified using marginal constraints, and we prove convergence rates for these hierarchies towards computing global optimal solutions to the GW problem. The proposed SOS hierarchies naturally induce a distance measure analogous to the distortion metrics, and we show that these are genuine distances in that they satisfy the triangle inequality. In particular, the proposed SOS hierarchies provide computationally tractable proxies of the GW distance and the associated distortion distances (over metric measure spaces) that are otherwise intractable to compute.

Convergence Rates of Sum-of-Squares Hierarchies for Polynomial Semidefinite Programs

Hoang Anh Tran — 2026-04-29T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 760-790, June 2026.
Abstract. We introduce a novel moment-sum-of-squares (SOS) hierarchy of lower bounds for a polynomial optimization problem whose feasible set is defined by polynomial matrix inequalities. Our hierarchy avoids the Kronecker product structure in Hol and Scherer’s hierarchy, thus resulting in smaller-sized semidefinite programs. Our approach involves utilizing a penalty function framework to directly address the matrix-based constraint, which is applicable to both discrete and continuous polynomial optimization problems. We investigate the convergence rates of these bounds for both types of problems. The proposed method yields a variant of Putinar’s theorem, tailored for positive polynomials on a compact set [math] defined by a polynomial matrix inequality. More specifically, we derive novel insights into the bounds on the degree of the SOS polynomials required to certify positivity over [math], based on Jackson’s theorem and a variant of the Łojasiewicz inequality in the matrix setting.

A Particle Algorithm for Mean-Field Variational Inference

Qiang Du — 2026-04-30T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 791-810, June 2026.
Abstract. Variational inference (VI) is a fast and scalable alternative to Markov chain Monte Carlo and has been widely applied to posterior inference tasks in statistics and machine learning. A traditional approach for implementing mean-field variational inference (MFVI) is coordinate ascent variational inference (CAVI), which relies crucially on parametric assumptions on complete conditionals. We introduce a novel particle-based algorithm for MFVI, named PArticle VI (PAVI), for nonparametric mean-field approximation. We obtain nonasymptotic error bounds for our algorithm. To the best of our knowledge, this is the first end-to-end guarantee for particle-based MFVI.

A Minimization Approach for Minimax Optimization with Coupled Constraints

Xiaoyin Hu — 2026-05-06T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 811-840, June 2026.
Abstract. In this paper, we focus on the nonconvex-strongly-concave minimax optimization problem (MCC), where the inner maximization subproblem contains constraints that couple the primal variable of the outer minimization problem. Based on the nondegeneracy of the coupled constraints, we prove that by introducing the dual variable of the inner maximization subproblem, (MCC) has the same first-order minimax points as a nonconvex-strongly-concave minimax optimization problem without coupled constraints (MOL). We then extend our focus to a class of nonconvex-strongly-concave minimax optimization problems (MM) that generalize (MOL). By applying the partial forward-backward envelope to the primal variable of the inner maximization subproblem, we propose a minimization problem (MMPen) whose objective function is explicitly formulated. We prove that the first-order stationary points of (MMPen) coincide with the first-order minimax points of (MM). Therefore, various efficient minimization methods and their convergence guarantees can be directly employed to solve (MM), hence solving (MCC) through (MOL). Preliminary numerical experiments demonstrate the great potential of our proposed approach.

Convergence of Trust-Region Algorithms in Metric Spaces

Paul Manns — 2026-05-22T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 841-865, June 2026.
Abstract. Trust-region algorithms can be applied to very abstract optimization problems because they do not require a specific direction of descent or gradient. This has led to recent interest in them, in particular in the area of integer optimal control problems, where the infinite-dimensional problem formulations do not assume vector space structure. We analyze a trust-region algorithm in the abstract setting of a metric space, a setting in which integer optimal control problems with total variation regularization can be formulated. Our analysis avoids a reset of the trust-region radius upon acceptance of the iterates when proving convergence to stationary points. This reset has been present in previous analyses of trust-region algorithms for integer optimal control problems. Our computational benchmark shows that the runtime can be considerably improved when avoiding this reset, which is now theoretically justified.

Duality of Hoffman Constants

Javier F. Peña — 2026-05-22T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 866-886, June 2026.
Abstract. We show that a suitable Slater condition implies a duality inequality between the Hoffman constants of the following feasibility problems: [math] where [math], and [math] and [math] are reference polyhedral cones, with respective dual cones [math] and [math]. Our approach relies on an exact characterization of Hoffman constants and introduces a novel Hoffman duality inequality for polyhedral set-valued mappings. These two fundamental results also yield a striking identity between the Hoffman constants of box-constrained feasibility problems, which feature a similar primal-dual structure with a box and a linear subspace as reference sets. Additionally, we establish a surprising identity between the Hoffman constants of box-constrained feasibility problems and the chi condition measures for weighted least-squares problems.

Comprehensive Analysis of Kernel-Based Interior-Point Methods for the [math]-LCP

Zsolt Darvay — 2026-05-27T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 887-911, June 2026.
Abstract. We present an interior-point algorithm for [math]-linear complementarity problems, based on a barrier function defined by a new class of univariate kernel functions, called standard kernel functions (SKFs). A comprehensive and unified complexity analysis of the algorithm is provided, and a general procedure is developed to determine the iteration bounds for long-step and short-step versions of the method for the entire class of SKFs. We illustrate the general procedure by determining the iteration bounds for several parametric SKFs, which include, as special cases, all eligible kernel functions from the literature that have rational or exponential barrier terms. In all cases, we match the best iteration bounds obtained in the literature for these special cases of SKFs.

Exponential Convergence of General Iterative Proportional Fitting Procedures

Stephan Eckstein — 2026-05-27T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 912-937, June 2026.
Abstract. Motivated by the success of Sinkhorn’s algorithm for entropic optimal transport, we study convergence properties of iterative proportional fitting procedures (IPFP) used to solve more general information projection problems. We establish exponential convergence guarantees for the IPFP whenever the set of probability measures which is projected onto is defined through constraints arising from linear function spaces. This unifies and extends recent results from multimarginal, adapted, and martingale optimal transport. The proofs are based on strong convexity arguments for the dual problem, and the key contribution is to illuminate the role of the geometric interplay between the subspaces defining the constraints. In this regard, we show that the larger the angle (in the sense of Friedrichs) between the linear function spaces, the better the rate of contraction of the IPFP.
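
Sinkhorn's algorithm, the motivating special case named above, is the iterative proportional fitting procedure with two marginal constraints: it alternately rescales the rows and the columns of a positive kernel until both marginals are matched. The sketch below is a minimal pure-Python version for small discrete problems; the function name `sinkhorn`, its arguments, and the fixed iteration count are illustrative assumptions, and the general constraint sets treated in the paper are not covered.

```python
import math


def sinkhorn(cost, mu, nu, eps, iters=500):
    """Entropic optimal transport plan between discrete marginals mu and nu.

    Alternately rescales rows and columns of K = exp(-cost/eps).
    """
    m, n = len(mu), len(nu)
    K = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(m)]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Fit the row marginals (projection onto the first constraint set).
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        # Fit the column marginals (projection onto the second constraint set).
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]
```

After the final column update the column sums match nu up to rounding, and the remaining mismatch in the row sums shrinks geometrically with the number of iterations, which is the kind of exponential convergence the abstract refers to.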

Avoiding Strict Saddle Points of Nonconvex Regularized Problems

Luwei Bai — 2026-05-29T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 2, Page 938-967, June 2026.
Abstract. This paper considers a class of nonconvex and nonsmooth sparse optimization problems, encompassing most existing nonconvex sparsity-inducing terms. We show that their second-order optimality conditions depend only on the nonzeros of the stationary points. We propose two damped iteratively reweighted algorithms, which are the iteratively reweighted [math] algorithm (DIRL[math]) and the iteratively reweighted [math] (DIRL[math]) algorithm, to solve these problems. For DIRL[math], we show that the reweighted [math] subproblem has the support identification property so that DIRL[math] locally reverts to a gradient descent algorithm around a stationary point. For DIRL[math], we show that the solution map of the reweighted [math] subproblem is differentiable and Lipschitz continuous everywhere. Therefore, the solution maps of DIRL[math] and DIRL[math] and their inverses are Lipschitz continuous, and the strict saddle points are their unstable fixed points. By applying the stable manifold theorem, these algorithms starting from almost every initial point are shown to converge to local minima when the strict saddle point property is assumed.

Variance-Reduced First-Order Methods for Deterministically Constrained Stochastic Nonconvex Optimization with Strong Convergence Guarantees

Zhaosong Lu — 2026-01-02T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 1-31, March 2026.
Abstract. In this paper, we study a class of deterministically constrained stochastic nonconvex optimization problems. Existing methods typically aim to find an [math]-expectedly feasible stochastic stationary point, where the expected violations of both constraints and first-order stationarity are within a prescribed tolerance [math]. However, in many practical applications, it is crucial that the constraints be nearly satisfied with certainty, making such an [math]-stochastic stationary point potentially undesirable due to the risk of substantial constraint violations. To address this issue, we propose single-loop variance-reduced stochastic first-order methods, where the stochastic gradient of the stochastic component is computed using either a truncated recursive momentum scheme or a truncated Polyak momentum scheme for variance reduction, while the gradient of the deterministic component is computed exactly. Under the error bound condition with a parameter [math] and other suitable assumptions, we establish that these methods respectively achieve sample complexity and first-order oracle complexity of [math] and [math] for finding an [math]-surely feasible stochastic stationary point ([math] represents [math] with logarithmic factors hidden), where the constraint violation is within [math] with certainty, and the expected violation of first-order stationarity is within [math]. For [math], these complexities reduce to [math] and [math], respectively, which match, up to a logarithmic factor, the best-known complexities achieved by existing methods for finding an [math]-stochastic stationary point of unconstrained smooth stochastic nonconvex optimization problems.

Large Deviation Upper Bounds and Improved MSE Rates of Nonlinear SGD: Heavy-Tailed Noise and Power of Symmetry

Aleksandar Armacki — 2026-01-27T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 32-59, March 2026.
Abstract. We study large deviation upper bounds and mean squared error (MSE) guarantees of a general framework of nonlinear stochastic gradient methods in the online setting and in the presence of heavy-tailed noise. Unlike existing works that rely on the closed form of a nonlinearity (typically clipping), our framework treats the nonlinearity in a black-box manner, allowing us to provide unified results for a broad class of bounded nonlinearities, including many popular ones, like sign, quantization, normalization, as well as componentwise and joint clipping. We provide several strong guarantees for a broad range of step-sizes in the presence of heavy-tailed noise with symmetric probability density function, positive in a neighborhood of zero and potentially unbounded moments. In particular, for nonconvex costs, we provide a large deviation upper bound for the minimum norm-squared of gradients, showing an asymptotic tail decay on an exponential scale, at a rate [math]. We establish the accompanying rate function, showing an explicit dependence on the choice of step-size, nonlinearity, noise, and problem parameters. Next, for nonconvex costs and the minimum norm-squared of gradients, we derive the optimal MSE rate [math]. Moreover, for strongly convex costs and the last iterate, we provide an MSE rate that can be made arbitrarily close to the optimal rate [math], improving on the state-of-the-art results in the presence of heavy-tailed noise. Finally, we establish almost sure (a.s.) convergence of the minimum norm-squared of gradients, providing an explicit rate which can be made arbitrarily close to [math].

Tightness of SDP and Burer–Monteiro Factorization for Phase Synchronization in a High-Noise Regime

Anderson Ye Zhang — 2026-02-02T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 60-89, March 2026.
Abstract. We study the difference between the maximum likelihood estimation (MLE) and its semidefinite programming (SDP) relaxation for the phase synchronization problem, where [math] latent phases are estimated based on pairwise observations corrupted by Gaussian noise at a level [math]. While previous studies have established that SDP coincides with the MLE when [math], the behavior in the high-noise regime [math] remains unclear. We address this gap by quantifying the deviation between the SDP and the MLE in the high-noise regime as [math], indicating an exponentially small discrepancy. In fact, we establish more general results for the Burer–Monteiro factorization, which covers the SDP as a special case: it has an exponentially small deviation from the MLE in the high-noise regime and coincides with the MLE when [math] is small. To obtain our results, we develop a refined entrywise analysis of the MLE that goes beyond the existing [math] analysis in the literature.

Exploring Chordal Sparsity in Semidefinite Programming with Sparse plus Low-Rank Data Matrices

Tianyun Tang — 2026-02-04T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 90-119, March 2026.
Abstract. Semidefinite programming (SDP) problems are challenging to solve because of their high dimensionality. However, solving sparse SDP problems with small tree width is known to be relatively easier because (1) they can be decomposed into smaller multiblock SDP problems through chordal conversion and (2) they have low-rank optimal solutions. In this paper, we study more general SDP problems whose coefficient matrices have sparse plus low-rank (SPLR) structure. We develop a unified framework to convert such problems into sparse SDP problems with bounded tree width. Based on this, we derive rank bounds for SDP problems with SPLR structure, which are tight in the worst case.

Policy Gradient Algorithms for Robust MDP[math] with Nonrectangular Uncertainty Sets

Mengmeng Li — 2026-02-09T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 120-151, March 2026.
Abstract. We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with nonrectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. We first present a randomized projected Langevin dynamics algorithm that solves the robust policy evaluation problem to global optimality but is inefficient. We also propose a deterministic policy gradient method that is efficient but solves the robust policy evaluation problem only approximately, and we prove that the approximation error scales with a new measure of nonrectangularity of the uncertainty set. Finally, we describe an actor-critic algorithm that finds an [math]-optimal solution for the robust policy improvement problem in [math] iterations. We thus present the first complete solution scheme for robust MDPs with nonrectangular uncertainty sets offering global optimality guarantees. Numerical experiments show that our algorithms compare favorably against state-of-the-art methods.

Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective

Jelena Diakonikolas — 2026-02-17T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 152-184, March 2026.
Abstract. We initiate the study of nonsmooth optimization problems under bounded local subgradient variation, which postulates bounded difference between (sub)gradients in small local regions around points, in either the average or the maximum sense. The resulting class of objective functions encapsulates the classes of objective functions traditionally studied in the optimization literature, which are defined based on either Lipschitz continuity of the objective or Hölder/Lipschitz continuity of the function’s gradient. Further, the defined class is richer in the sense that it contains functions that neither are Lipschitz continuous nor have a Hölder-continuous gradient. Finally, when restricted to the aforementioned traditional classes of optimization problems, the constants defining the studied classes lead to more fine-grained oracle complexity bounds. Some highlights of our results are that (i) it is possible to obtain complexity results for both convex and nonconvex optimization problems with (local or global) Lipschitz constant being replaced by a constant of local subgradient variation, corresponding to small local regions, and (ii) complexity of the subgradient set around the set of optima—measured by its mean width in a local region around optima—plays a role in the complexity of nonsmooth optimization, particularly in parallel optimization settings. A consequence of (ii) is that for any error parameter [math], parallel oracle complexity of nonsmooth Lipschitz convex optimization is lower than its sequential oracle complexity by a factor [math] whenever the objective function is piecewise-affine with the number of pieces polynomial in the dimension and [math]. This is particularly surprising considering that existing parallel complexity lower bounds are based on such classes of functions. The seeming contradiction is resolved by considering the region in which the algorithm is allowed to query the objective.

New Interior-Point Algorithm for Linear Optimization Based on a Universal Tangent Direction

Marianna Eisenberg-Nagy — 2026-02-19T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 185-203, March 2026.
Abstract. In this paper, we suggest a new interior-point algorithm for linear optimization, based on the idea of parabolic target space. Our algorithm can start at any strictly feasible primal-dual pair and go directly towards a solution by a predictor-corrector scheme. We prove that the complexity of the proposed method coincides with the currently known best complexity results for interior-point algorithms. The method demonstrates a very fast local convergence on the test set problems we have evaluated. One of the main differences between our approach and the standard framework is that our algorithm is based on a parabolic primal-dual barrier function.

Moment-Sos and Spectral Hierarchies for Polynomial Optimization on the Sphere and Quantum de Finetti Theorems

Alexander Taveira Blomenhofer — 2026-02-20T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 204-232, March 2026.
Abstract. We revisit the convergence analysis of two approximation hierarchies for polynomial optimization on the unit sphere. The first one is based on the moment-sos approach and gives semidefinite bounds for which Fang and Fawzi (2021) showed an analysis in [math] for the [math]th level bound, using the polynomial kernel method. The second hierarchy was recently proposed by Lovitz and Johnston (2023) and gives spectral bounds for which they show a convergence rate in [math], using a quantum de Finetti theorem of Christandl et al. (2007) that applies to complex Hermitian matrices with a “double” symmetry. We investigate links between these approaches, in particular, via duality of moments and sums of squares. Our main results include showing that the spectral bounds cannot have a convergence rate better than [math] and that they do not enjoy generic finite convergence. In addition, we propose alternative performance analyses that involve explicit constants depending on intrinsic parameters of the optimization problem. For this we develop a novel “banded” real de Finetti theorem that applies to real matrices with “double” symmetry. We also show how to use the polynomial kernel method to obtain a de Finetti type result in [math] for real maximally symmetric matrices, improving an earlier result in [math] of Doherty and Wehner (2012).

Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems

Tomás González — 2026-02-20T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 233-262, March 2026.
Abstract. We study the problem of differentially-private (DP) stochastic (convex-concave) saddle-points in the [math] setting. We propose [math]-DP algorithms based on stochastic mirror descent that attain nearly dimension-independent convergence rates for the expected duality gap, a type of guarantee that was known before only for bilinear objectives. For convex-concave and first-order-smooth stochastic objectives, our algorithms attain a rate of [math], where [math] is the dimension of the problem and [math] the dataset size. Under an additional second-order-smoothness assumption, we show that the duality gap is bounded by [math] with high probability, by using bias-reduced gradient estimators. This rate provides evidence of the near-optimality of our approach, since a lower bound of [math] exists. Finally, we show that combining our methods with acceleration techniques from online learning leads to the first algorithm for DP stochastic convex optimization in the [math] setting that is not based on Frank–Wolfe methods. For convex and first-order-smooth stochastic objectives, our algorithms attain an excess risk of [math], and when additionally assuming second-order-smoothness, we improve the rate to [math]. Instrumental to all of these results are various extensions of the classical Maurey sparsification lemma, which may be of independent interest.

Distributed Nonlinear Conic Optimization with Partially Separable Structure

Richard Heusdens — 2026-02-24T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 263-289, March 2026.
Abstract. In this paper, we consider the problem of distributed nonlinear optimization of a separable convex cost function over a graph subject to cone constraints. Using convex analysis, monotone operator theory, and fixed-point theory, we show how to generalize the primal-dual method of multipliers (PDMM), originally designed for equality-constrained optimization and recently extended to include linear inequality constraints, so that it can also accommodate cone constraints. The resulting algorithm can be applied to a variety of optimization problems, including the important class of semidefinite programs with partially separable structure, in a fully distributed fashion without relying on interior-point methods. We derive update equations by applying the Peaceman–Rachford splitting algorithm to the monotone inclusion related to the lifted dual problem. The cone constraints are implemented by a reflection method in the lifted dual domain where auxiliary variables are reflected with respect to the intersection of the polar cone and a subspace relating the dual and lifted dual domains. Convergence results are provided for both synchronous and stochastic update schemes, and the proposed algorithm is demonstrated through an application to fully distributed sensor localization based on semidefinite programming.

Moving Higher-Order Taylor Approximations Method for Smooth Constrained Minimization Problems

Yassine Nabou — 2026-02-25T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 290-319, March 2026.
Abstract. In this paper, we introduce a higher-order method for solving composite (non)convex minimization problems with constraints expressed as inequalities involving smooth (non)convex functions. Starting from a feasible point, at each iteration, our method approximates the smooth part of the objective function and of the constraints by higher-order Taylor approximations, leading to a moving Taylor approximation (MTA) method. We present convergence guarantees for the MTA algorithm for both nonconvex and convex problems. In particular, when the objective and the constraints are nonconvex functions, and assuming some regularity condition on the constraints at any point in some sublevel set, we prove that the sequence generated by the MTA algorithm converges globally to a KKT point. Moreover, we derive convergence rates in the iterates when the problem’s data satisfy the Kurdyka–Łojasiewicz property. Further, when the objective function is (uniformly) convex and the constraints are also convex, we provide (linear/superlinear) sublinear convergence rates for our algorithm. Finally, we present an efficient implementation of the proposed algorithm and compare it with some existing methods from the literature.

Convergence Regions of Alternating Minimization Algorithms for Dictionary Learning

Simon Ruetz — 2026-02-26T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 320-349, March 2026.
Abstract. In this paper we derive sufficient conditions for the convergence of two popular alternating minimization algorithms for dictionary learning, the method of optimal directions (MOD) and online dictionary learning (ODL), which can also be thought of as approximative K-SVD. We show, for a generating dictionary with [math] atoms, that both algorithms converge to the generating dictionary at a geometric rate, given enough training signals in each iteration and a well-behaved initialization. Such an initialization is either within distance at most [math] of the generating dictionary or has a special structure ensuring that each atom of the initialization points to only one generating atom. This holds even for signal models with nonuniform distributions on the supports of the sparse coefficients. These allow the appearance frequency of the dictionary atoms to vary heavily and thus model real data more closely.

Alternating Gradient-Type Algorithm for Bilevel Optimization with Inexact Lower-Level Solutions via Moreau Envelope–Based Reformulation

Xiaoning Bai — 2026-03-06T08:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 350-380, March 2026.
Abstract. In this paper, we study a class of bilevel optimization problems where the lower-level problem is a convex composite optimization model, which arises in various applications, including bilevel hyperparameter selection for regularized regression models. To solve these problems, we propose an alternating gradient–type algorithm with inexact lower-level solutions (AGILS) based on a Moreau envelope–based reformulation of the bilevel optimization problem. The proposed algorithm does not require exact solutions of the lower-level problem at each iteration, improving computational efficiency. We prove the convergence of AGILS to stationary points and, under the Kurdyka–Łojasiewicz property, establish its sequential convergence. Numerical experiments, including a toy example and a bilevel hyperparameter selection problem for the sparse group Lasso model, demonstrate the effectiveness of the proposed AGILS.

A QP1QC Approach for Deciding Whether or Not Two Quadratic Surfaces Intersect

Huu-Quang Nguyen — 2026-03-09T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 381-408, March 2026.
Abstract. Given two [math]-variate quadratic functions [math] and [math], we are interested in knowing whether or not the two hypersurfaces [math] and [math] intersect with each other. There are two ways of looking at this problem. On the one hand, the famous Finsler–Calabi theorem (1936, 1964) asserts that if [math] and [math] are quadratic forms, [math] and [math] have no common solution other than the trivial one, [math], if and only if there exists a positive definite matrix pencil [math]. The result is in general not true for nonhomogeneous quadratic functions. On the other hand, Levin (c. late 1970s) tried to directly solve for the intersection curve of [math] and [math], but this turned out to be far too ambitious. In this paper, we show that by incorporating the information about the unboundedness and the unattainability of several (at most 4) quadratic programming problems with one single quadratic constraint (QP1QC), the answer as to whether or not [math] and [math] intersect can be successfully determined.

Efficient Solutions in Uncertain Multiobjective Optimization with Countably Many Scenarios

C. Gutiérrez — 2026-03-16T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 409-433, March 2026.
Abstract. We use a vector approach to address, from an efficient point of view, an uncertain unconstrained multiobjective optimization problem with countably many scenarios. Specifically, we introduce several efficient solution notions that work not only in the Pareto case, but also when the preferences in the image space depend on the scenario and they are defined by a convex cone in the usual way. We state basic properties of these notions and we relate the involved solution sets with other well-known solution sets of the literature. Particularly, it is shown that the so-called highly solutions are a particular case of efficient solutions. In addition, we obtain characterizations through solutions of associated scalar optimization problems and we derive existence theorems. Finally, two applications are provided to illustrate the main results.

Non-degenerate Rigid Alignment in a Patch Framework

Dhruv Kohli — 2026-03-18T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 434-465, March 2026.
Abstract. Given a set of overlapping local views (patches) of a dataset, we consider the problem of finding a rigid alignment of the views that minimizes a 2-norm based alignment error. In general, the views are noisy and a perfect alignment may not exist. In this work, we characterize the non-degeneracy of an alignment in the noisy setting based on the kernel and positivity of a certain matrix. This leads to a polynomial time algorithm for testing the non-degeneracy of a given alignment. Subsequently, we focus on Riemannian gradient descent for minimizing the alignment error, providing a sufficient condition on an alignment for the algorithm to converge (locally) linearly to it. Additionally, we provide an exact recovery and noise stability analysis of the algorithm. In the case of noiseless views, a perfect alignment exists, resulting in a realization of the points that respects the geometry of the views. Under a mild condition on the views, we show that a non-degenerate perfect alignment characterizes the infinitesimal rigidity of a realization and thus the local rigidity of a generic realization. By specializing the non-degeneracy conditions to the noiseless case, we derive necessary and sufficient conditions on the overlapping structure of the views for a perfect alignment to be non-degenerate and, equivalently, for the resulting realization to be infinitesimally rigid.

Global Convergence of an Augmented Lagrangian Method for Nonlinear Programming via Riemannian Optimization

Roberto Andreani — 2026-03-19T07:00:00Z

SIAM Journal on Optimization, Volume 36, Issue 1, Page 466-501, March 2026.
Abstract. Considering a standard nonlinear programming problem, one may view a subset of the equality constraints as an embedded Riemannian manifold. In this paper we investigate the differences between the Euclidean and the Riemannian approaches to this problem. It is well known that the linear independence constraint qualifications for both approaches are equivalent. However, when considering recently introduced constant rank constraint qualifications, the Riemannian approach provides a weaker condition, as the rank of the gradients must remain constant only inside the manifold, while the Euclidean approach requires constant rank properties inside a full-dimensional neighborhood of the ambient space. Therefore, by applying a Riemannian augmented Lagrangian method to a standard nonlinear programming problem, we are able to obtain standard global convergence to a Karush–Kuhn–Tucker point under a new weaker constant rank condition that considers only lower-dimensional neighborhoods. In this way we illustrate how the Riemannian perspective can provide new and stronger results for classical problems traditionally addressed through Euclidean theory. We also investigate the two alternative augmented Lagrangian algorithms in a comprehensive computational study, where we show some classes of problems for which the Riemannian approach is much more effective in attaining better quality solutions.