Abstract

Emerging applications in multiagent environments, such as the internet of things, networked sensing, autonomous systems, and federated learning, call for decentralized algorithms for finite-sum optimization that are resource-efficient in terms of both computation and communication. In this paper, we consider the prototypical setting where the agents work collaboratively to minimize the sum of local loss functions by only communicating with their neighbors over a predetermined network topology. We develop a new algorithm, called DEcentralized STochastic REcurSive gradient methodS (DESTRESS), for nonconvex finite-sum optimization, which matches the optimal incremental first-order oracle complexity of centralized algorithms for finding first-order stationary points while maintaining communication efficiency. Detailed theoretical and numerical comparisons corroborate that DESTRESS improves upon the resource efficiency of prior decentralized algorithms over a wide range of parameter regimes. DESTRESS leverages several key algorithm design ideas, including stochastic recursive gradient updates with minibatches for local computation, gradient tracking with extra mixing (i.e., multiple gossiping rounds) for per-iteration communication, and careful choices of hyperparameters together with new analysis frameworks, to provably achieve a desirable computation-communication trade-off.
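
As a companion to the description above, the following is a minimal, illustrative sketch (Python/NumPy) of how these ingredients can fit together: SARAH-style stochastic recursive gradient updates on local minibatches, gradient tracking, and extra mixing via multiple gossip rounds per iteration. It is not the authors' implementation; the function and parameter names (destress_sketch, gossip, grad_i, mix_rounds) and the toy quadratic example are hypothetical placeholders chosen only to make the sketch self-contained.

# Minimal sketch of a decentralized recursive-gradient loop with gradient
# tracking and extra mixing. All names are hypothetical; this is not the
# paper's algorithm verbatim.
import numpy as np

def gossip(X, W, rounds):
    """Apply the mixing matrix W to the stacked agent variables several times."""
    for _ in range(rounds):
        X = W @ X
    return X

def destress_sketch(grad_i, x0, W, n_agents, n_samples,
                    step=0.05, n_iters=100, batch=8, mix_rounds=2, seed=0):
    rng = np.random.default_rng(seed)
    X = np.tile(x0, (n_agents, 1))                     # one local iterate per agent
    # Initialize local estimators with full local gradients (refreshed periodically in practice).
    V = np.stack([np.mean([grad_i(i, j, X[i]) for j in range(n_samples)], axis=0)
                  for i in range(n_agents)])
    S = V.copy()                                       # gradient-tracking variables
    for _ in range(n_iters):
        # Descent step followed by extra mixing (multiple gossip rounds) of the iterates.
        X_new = gossip(X - step * S, W, mix_rounds)
        # SARAH-style recursive update of each local gradient estimator on a minibatch.
        V_new = V.copy()
        for i in range(n_agents):
            idx = rng.integers(n_samples, size=batch)
            V_new[i] = V[i] + np.mean([grad_i(i, j, X_new[i]) - grad_i(i, j, X[i])
                                       for j in idx], axis=0)
        # Gradient tracking: mix the accumulated change in the local estimators.
        S = gossip(S + V_new - V, W, mix_rounds)
        X, V = X_new, V_new
    return X.mean(axis=0)

# Toy usage: 4 agents on a ring, each holding quadratic losses 0.5*||x - a_{ij}||^2.
if __name__ == "__main__":
    n_agents, n_samples, dim = 4, 20, 5
    data = np.random.default_rng(1).normal(size=(n_agents, n_samples, dim))
    grad_i = lambda i, j, x: x - data[i, j]            # per-sample local gradient
    W = 0.5 * np.eye(n_agents) + 0.25 * (np.roll(np.eye(n_agents), 1, axis=0)
                                         + np.roll(np.eye(n_agents), -1, axis=0))
    x_hat = destress_sketch(grad_i, np.zeros(dim), W, n_agents, n_samples)
    print(np.linalg.norm(x_hat - data.mean(axis=(0, 1))))  # distance to the global minimizer

In this sketch the mixing matrix W is symmetric and doubly stochastic (a ring graph), and increasing mix_rounds trades extra communication per iteration for faster consensus, which is the computation-communication trade-off the abstract refers to.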

Keywords

  1. decentralized optimization
  2. nonconvex finite-sum optimization
  3. stochastic recursive gradient methods

MSC codes

  1. 68Q25
  2. 68Q11
  3. 68T09

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization

Authors: Boyue Li, Zhize Li, and Yuejie Chi

File: supplement.pdf

Type: PDF

Contents: Proofs of technical lemmas used in the main paper.


Information & Authors

Information

Published In

SIAM Journal on Mathematics of Data Science
Pages: 1031 - 1051
ISSN (online): 2577-0187

History

Submitted: 5 October 2021
Accepted: 24 May 2022
Published online: 4 August 2022

Authors

Boyue Li, Zhize Li, and Yuejie Chi

Funding Information

Carnegie Mellon University https://doi.org/10.13039/100008047
Air Force Research Laboratory https://doi.org/10.13039/100006602 : FA8750-20-2-0504
Office of Naval Research https://doi.org/10.13039/100000006 : N00014-19-1-2404
National Science Foundation https://doi.org/10.13039/100000001 : CCF-1901199, CCF-2007911
