Abstract

Distributed computing is a standard way to scale up machine learning and data science algorithms to process large amounts of data. In such settings, avoiding communication among machines is paramount for achieving high performance. Rather than distribute the computation of existing algorithms, a common practice for avoiding communication is to compute local solutions or parameter estimates on each machine and then combine the results; in many convex optimization problems, even simple averaging of local solutions can work well. However, these schemes do not work when the local solutions are not unique. Spectral methods give rise to a collection of such problems, where solutions are orthonormal bases of the leading invariant subspace of an associated data matrix and are unique only up to rotations and reflections. Here, we develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix. Our algorithm uses a novel alignment scheme that minimizes the Procrustean distance between local solutions and a reference solution, and it requires only a single round of communication. For the important case of principal component analysis (PCA), we show that our algorithm achieves a similar error rate to that of a centralized estimator. We present numerical experiments demonstrating the efficacy of our proposed algorithm for distributed PCA, as well as for other problems where solutions exhibit rotational symmetry, such as node embeddings for graph data and spectral initialization for quadratic sensing.
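
To make the alignment idea concrete, the following is a minimal NumPy sketch of a one-round align-then-average scheme of the kind described above: each machine computes a local orthonormal basis, each basis is aligned to a reference basis by solving an orthogonal Procrustes problem, and the aligned bases are averaged and re-orthonormalized. The choice of the first machine's solution as the reference, the plain averaging step, and the final QR re-orthonormalization are illustrative assumptions for this sketch, not the paper's exact estimator.

    import numpy as np

    def local_eigenspace(A, k):
        """Orthonormal basis of the leading k-dimensional invariant subspace of symmetric A."""
        # np.linalg.eigh returns eigenvalues in ascending order; take the top-k eigenvectors.
        _, vecs = np.linalg.eigh(A)
        return vecs[:, -k:]

    def procrustes_align(V, V_ref):
        """Rotate/reflect the basis V to best match V_ref (orthogonal Procrustes)."""
        # The minimizer of ||V Z - V_ref||_F over orthogonal Z is U W^T,
        # where V^T V_ref = U S W^T is a singular value decomposition.
        U, _, Wt = np.linalg.svd(V.T @ V_ref)
        return V @ (U @ Wt)

    def distributed_eigenspace(local_matrices, k):
        """One communication round: align each local basis to a reference, then average."""
        bases = [local_eigenspace(A, k) for A in local_matrices]
        V_ref = bases[0]  # first machine's solution serves as the reference (an assumption)
        aligned = [procrustes_align(V, V_ref) for V in bases]
        V_bar = sum(aligned) / len(aligned)
        # The average is no longer orthonormal; restore that with a thin QR factorization.
        Q, _ = np.linalg.qr(V_bar)
        return Q

    # Toy usage: local sample covariances of data with a planted 3-dimensional top eigenspace.
    rng = np.random.default_rng(0)
    d, k, machines, n_local = 50, 3, 8, 2000
    cov = np.diag(np.r_[10.0 * np.ones(k), np.ones(d - k)])
    X = [rng.multivariate_normal(np.zeros(d), cov, n_local) for _ in range(machines)]
    V_hat = distributed_eigenspace([Xi.T @ Xi / n_local for Xi in X], k)

Because each machine ships only its aligned d-by-k basis, the scheme communicates O(dk) numbers per machine in a single round, in contrast to iterative distributed eigensolvers that communicate at every iteration.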

Keywords

  1. distributed computing
  2. spectral methods
  3. nonconvex optimization
  4. principal component analysis
  5. statistics

MSC codes

  1. 62-08
  2. 65F15
  3. 65F55
  4. 68Q87

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.

Index of Supplementary Materials

Title of paper: Communication-Efficient Distributed Eigenspace Estimation

Authors: Vasileios Charisopoulos, Austin R. Benson, and Anil Damle

File: supplement.pdf

Type: PDF File

Contents: Auxiliary results and proofs omitted from manuscript.


Information & Authors

Published In

SIAM Journal on Mathematics of Data Science
Pages: 1067 - 1092
ISSN (online): 2577-0187

History

Submitted: 10 September 2020
Accepted: 28 June 2021
Published online: 5 October 2021

Authors

Vasileios Charisopoulos, Austin R. Benson, and Anil Damle

Funding Information

Multidisciplinary University Research Initiative https://doi.org/10.13039/100014036
Army Research Office https://doi.org/10.13039/100000183 : W911NF19-1-0057
JPMorgan Chase and Company https://doi.org/10.13039/100004332
National Science Foundation https://doi.org/10.13039/100000001 : DMS-1830274
