SIAM Journal on Optimization


Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

Article Data

History

Submitted: 18 February 2014
Accepted: 27 January 2015
Published online: 14 April 2015

Publication Data

ISSN (print): 1052-6234
ISSN (online): 1095-7189
CODEN: SJOPE8

Abstract

Majorization-minimization algorithms consist of successively minimizing a sequence of upper bounds of the objective function. These upper bounds are tight at the current estimate, and each iteration monotonically drives the objective function downhill. Such a simple principle is widely applicable and has been very popular in various scientific fields, especially in signal processing and statistics. We propose an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning. We present convergence guarantees for nonconvex and convex optimization when the upper bounds approximate the objective up to a smooth error; we call such upper bounds “first-order surrogate functions.” More precisely, we study asymptotic stationary point guarantees for nonconvex problems, and for convex ones, we provide convergence rates for the expected objective function value. We apply our scheme to composite optimization and obtain a new incremental proximal gradient algorithm with linear convergence rate for strongly convex functions. Our experiments show that our method is competitive with the state of the art for solving machine learning problems such as logistic regression when the number of training samples is large enough, and we demonstrate its usefulness for sparse estimation with nonconvex penalties.
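
To make the incremental scheme concrete, here is one plausible instantiation in Python, assuming each component f_i has an L-Lipschitz gradient. Under that assumption, the quadratic g_i(θ) = f_i(κ_i) + ∇f_i(κ_i)ᵀ(θ − κ_i) + (L/2)‖θ − κ_i‖² is a first-order surrogate of f_i: it majorizes f_i and is tight at its anchor point κ_i. Minimizing the average of the n surrogates has the closed form θ = (1/n) Σ_i (κ_i − ∇f_i(κ_i)/L), so an incremental iteration only re-anchors one randomly chosen surrogate and updates a running mean. The sketch below follows this recipe; the names (miso_quadratic, grad_fns) and the toy problem are illustrative, not taken from the paper.

```python
import numpy as np


def miso_quadratic(grad_fns, dim, L, n_iters=10000, seed=0):
    """Incremental MM with quadratic (Lipschitz-gradient) surrogates.

    Minimizes (1/n) * sum_i f_i(theta), where grad_fns[i](theta) returns
    the gradient of f_i and every gradient is assumed L-Lipschitz.  Each
    surrogate is fully summarized by z_i = kappa_i - grad f_i(kappa_i) / L,
    and the minimizer of the averaged surrogates is the mean of the z_i.
    """
    rng = np.random.default_rng(seed)
    n = len(grad_fns)
    theta = np.zeros(dim)
    # Anchor every surrogate at the initial point theta = 0.
    z = np.stack([theta - g(theta) / L for g in grad_fns])
    z_mean = z.mean(axis=0)
    for _ in range(n_iters):
        i = rng.integers(n)                      # draw one component f_i
        z_new = theta - grad_fns[i](theta) / L   # re-anchor surrogate i at theta
        z_mean = z_mean + (z_new - z[i]) / n     # O(dim) update of the mean
        z[i] = z_new
        theta = z_mean                           # exact minimizer of the average
    return theta


if __name__ == "__main__":
    # Toy least-squares check: f_i(theta) = 0.5 * (x_i @ theta - y_i) ** 2.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 5))
    theta_true = rng.standard_normal(5)
    y = X @ theta_true
    grads = [lambda t, x=x, yi=yi: x * (x @ t - yi) for x, yi in zip(X, y)]
    L = max(float(x @ x) for x in X)  # Lipschitz constant of each gradient
    theta_hat = miso_quadratic(grads, dim=5, L=L, n_iters=50000)
    print(np.linalg.norm(theta_hat - theta_true))  # should be close to zero
```

Because every g_i upper-bounds f_i and matches it at its anchor, re-anchoring a surrogate can only lower the averaged upper bound, so its minimum decreases monotonically; this is the mechanism behind the guarantees stated in the abstract. The composite (proximal) variant mentioned there replaces the closed-form mean with a proximal step on the same quadratic model.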

© 2015, Society for Industrial and Applied Mathematics

Cited by

(2020) Stochastic DCA for minimizing a large sum of DC functions with application to multi-class logistic regression. Neural Networks 132, 220-231. Crossref
(2020) Linear convergence of inexact descent method and inexact proximal gradient algorithms for lower-order regularization problems. Journal of Global Optimization 35. Crossref
(2020) Enhanced Low-rank Constraint for Temporal Subspace Clustering and Its Acceleration Scheme. Pattern Recognition, 107678. Crossref
(2020) Linear convergence of cyclic SAGA. Optimization Letters 14:6, 1583-1598. Crossref
(2020) Modulus-based iterative methods for constrained ℓp–ℓq minimization. Inverse Problems 36:8, 084001. Crossref
(2020) A data-driven group-sparse feature extraction method for fault detection of wind turbine transmission system. Measurement Science and Technology 31:7, 074008. Crossref
(2020) An accelerated stochastic variance-reduced method for machine learning problems. Knowledge-Based Systems 198, 105941. Crossref
(2020) A generalized proximal linearized algorithm for DC functions with application to the optimal size of the firm problem. Annals of Operations Research 289:2, 313-339. Crossref
(2020) Stochastic quasi-gradient methods: variance reduction via Jacobian sketching. Mathematical Programming 66. Crossref
(2020) SVRG-MKL: A Fast and Scalable Multiple Kernel Learning Solution for Features Combination in Multi-Class Classification Problems. IEEE Transactions on Neural Networks and Learning Systems 31:5, 1710-1723. Crossref
(2020) Accelerating incremental gradient optimization with curvature information. Computational Optimization and Applications 2010. Crossref
(2020) Generalized stochastic Frank–Wolfe algorithm with stochastic “substitute” gradient for structured convex optimization. Mathematical Programming 25. Crossref
(2020) Provable Convergence of Plug-and-Play Priors With MMSE Denoisers. IEEE Signal Processing Letters 27, 1280-1284. Crossref
(2019) The Double-Accelerated Stochastic Method for Regularized Empirical Risk Minimization. IEEE Transactions on Emerging Topics in Computational Intelligence 3:6, 440-451. Crossref
(2019) Incremental quasi-subgradient methods for minimizing the sum of quasi-convex functions. Journal of Global Optimization 75:4, 1003-1028. Crossref
(2019) Stochastic sub-sampled Newton method with variance reduction. International Journal of Wavelets, Multiresolution and Information Processing 17:06, 1950041. Crossref
(2019) Parametric Majorization for Data-Driven Energy Minimization Methods. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 10261-10272. Crossref
(2019) Stochastic proximal quasi-Newton methods for non-convex composite optimization. Optimization Methods and Software 34:5, 922-948. Crossref
(2019) Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering. Neuroinformatics 17:3, 407-421. Crossref
(2019) Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence Under Bregman Distance Growth Conditions. Mathematics of Operations Research. Crossref
(2019) Convergence rates of accelerated proximal gradient algorithms under independent noise. Numerical Algorithms 81:2, 631-654. Crossref
(2019) Majorization and Dynamics of Continuous Distributions. Entropy 21:6, 590. Crossref
(2019) Generalized forward–backward splitting with penalization for monotone inclusion problems. Journal of Global Optimization 73:4, 825-847. Crossref
(2019) Riemannian Stochastic Variance Reduced Gradient Algorithm with Retraction and Vector Transport. SIAM Journal on Optimization 29:2, 1444-1472.
(2019) An Inexact Variable Metric Proximal Point Algorithm for Generic Quasi-Newton Acceleration. SIAM Journal on Optimization 29:2, 1408-1443.
(2019) A Coordinate-Descent Primal-Dual Algorithm with Large Step Size and Possibly Nonseparable Functions. SIAM Journal on Optimization 29:1, 100-134.
(2019) Stochastic Momentum Method With Double Acceleration for Regularized Empirical Risk Minimization. IEEE Access 7, 166551-166563. Crossref
(2018) A Distributed, Asynchronous, and Incremental Algorithm for Nonconvex Optimization: An ADMM Approach. IEEE Transactions on Control of Network Systems 5:3, 935-945. Crossref
(2018) Sparsity-based signal extraction using dual Q-factors for gearbox fault detection. ISA Transactions 79, 147-160. Crossref
(2018) Improving Sparsity and Scalability in Regularized Nonconvex Truncated-Loss Learning Problems. IEEE Transactions on Neural Networks and Learning Systems 29:7, 2782-2793. Crossref
(2018) Sparse Representation Using Multidimensional Mixed-Norm Penalty With Application to Sound Field Decomposition. IEEE Transactions on Signal Processing 66:12, 3327-3338. Crossref
(2018) Majorization Minimization Technique for Optimally Solving Deep Dictionary Learning. Neural Processing Letters 47:3, 799-814. Crossref
(2018) Large-scale asynchronous distributed learning based on parameter exchanges. International Journal of Data Science and Analytics 5:4, 223-232. Crossref
(2018) Kernel group sparse representation classifier via structural and non-convex constraints. Neurocomputing 296, 1-11. Crossref
(2018) Stream-suitable optimization algorithms for some soft-margin support vector machine variants. Japanese Journal of Statistics and Data Science 1:1, 81-108. Crossref
(2018) Duality for Gaussian Processes from Random Signed Measures. Mathematical Analysis and Applications, 23-56. Crossref
(2018) Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods. SIAM Journal on Optimization 28:2, 1282-1300.
(2018) Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate. SIAM Journal on Optimization 28:2, 1420-1447.
(2018) IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate. SIAM Journal on Optimization 28:2, 1670-1698.
(2018) Composite Difference-Max Programs for Modern Statistical Estimation Problems. SIAM Journal on Optimization 28:4, 3344-3374.
(2018) Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization. Multi-agent Optimization, 141-308. Crossref
(2018) Robust Guided Image Filtering Using Nonconvex Potentials. IEEE Transactions on Pattern Analysis and Machine Intelligence 40:1, 192-207. Crossref
(2018) A Unified Convergence Analysis of the Multiplicative Update Algorithm for Regularized Nonnegative Matrix Factorization. IEEE Transactions on Signal Processing 66:1, 129-138. Crossref
(2017) A new linear convergence result for the iterative soft thresholding algorithm. Optimization 66:7, 1177-1189. Crossref
(2017) Image Fusion With Cosparse Analysis Operator. IEEE Signal Processing Letters 24:7, 943-947. Crossref
(2017) Nonconvex nonsmooth optimization via convex–nonconvex majorization–minimization. Numerische Mathematik 136:2, 343-381. Crossref
(2017) Majorization–minimization generalized Krylov subspace methods for ℓp–ℓq optimization applied to image restoration. BIT Numerical Mathematics 57:2, 351-378. Crossref
(2017) An introduction to Majorization-Minimization algorithms for machine learning and statistical estimation. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7:2, e1198. Crossref
(2017) Who are the spoilers in social media marketing? Incremental learning of latent semantics for social spam detection. Electronic Commerce Research 17:1, 51-81. Crossref
(2017) A unified convergence analysis of the multiplicative update algorithm for nonnegative matrix factorization. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2562-2566. Crossref
(2017) A double incremental aggregated gradient method with linear convergence rate for large-scale optimization. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4696-4700. Crossref
(2017) Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning. IEEE Transactions on Signal Processing 65:3, 794-816. Crossref
(2017) Katyusha: the first direct acceleration of stochastic gradient methods. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing - STOC 2017, 1200-1205. Crossref
(2016) Optimizing cluster structures with inner product induced norm based dissimilarity measures: Theoretical development and convergence analysis. Information Sciences 372, 796-814. Crossref
(2016) Global convergence rate of incremental aggregated gradient methods for nonsmooth problems. 2016 IEEE 55th Conference on Decision and Control (CDC), 173-178. Crossref
(2016) Coordinate descent with arbitrary sampling I: algorithms and complexity. Optimization Methods and Software 31:5, 829-857. Crossref
(2016) Cooperative coevolution with dependency identification grouping for large scale global optimization. 2016 IEEE Congress on Evolutionary Computation (CEC), 5201-5208. Crossref
(2016) Optical Hyperacuity Mechanism by Incorporating Human Eye Microsaccades. Journal of Software Engineering 10:4, 416-423. Crossref
(2016) A Survey of Stochastic Simulation and Optimization Methods in Signal Processing. IEEE Journal of Selected Topics in Signal Processing 10:2, 224-241. Crossref
(2015) Adaptive Sampling for Incremental Optimization Using Stochastic Gradient Descent. Algorithmic Learning Theory, 317-331. Crossref