Abstract.

Multiple tensor-times-matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required (under mild conditions) to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a constrained, nonlinear optimization problem. We also present a parallel algorithm to perform this computation that organizes the processors into a logical grid with twice as many modes as the input tensor. We show that, with correct choices of grid dimensions, the communication cost of the algorithm attains the lower bounds and is therefore communication optimal. Finally, we show that our algorithm can significantly reduce communication compared to the straightforward approach of expressing the computation as a sequence of tensor-times-matrix operations when the input and output tensors vary greatly in size.
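The computation can be illustrated with a small sketch. In a Multi-TTM, an input tensor is contracted with one matrix per mode, either all at once or as a sequence of single tensor-times-matrix (TTM) operations; both orderings yield the same result. The snippet below is an illustrative sketch (the names X, A1, A2, A3 and the chosen dimensions are ours, not from the paper) using NumPy's einsum for a 3-way tensor:

```python
import numpy as np

# Illustrative dimensions: a small input tensor and much smaller output
# (core) tensor, the regime where the sizes vary greatly.
rng = np.random.default_rng(0)
n, r = (8, 9, 10), (3, 4, 5)
X = rng.standard_normal(n)                                  # input tensor
A1, A2, A3 = (rng.standard_normal((r[k], n[k])) for k in range(3))

# Multi-TTM as a single multilinear contraction across all three modes.
Y_multi = np.einsum('ijk,ai,bj,ck->abc', X, A1, A2, A3)

# Equivalent sequence of single TTM (mode-k product) operations.
Y_seq = np.einsum('ijk,ai->ajk', X, A1)   # mode-1 product
Y_seq = np.einsum('ajk,bj->abk', Y_seq, A2)  # mode-2 product
Y_seq = np.einsum('abk,ck->abc', Y_seq, A3)  # mode-3 product

assert Y_multi.shape == (3, 4, 5)
assert np.allclose(Y_multi, Y_seq)
```

In a distributed setting the two formulations differ in data movement: the sequential approach materializes and communicates intermediate tensors, which is the overhead the paper's all-at-once parallel algorithm avoids when input and output sizes differ greatly.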

Keywords

  1. communication lower bounds
  2. Multi-TTM
  3. tensor computations
  4. parallel algorithms
  5. HBL-inequalities

MSC codes

  1. 15A69
  2. 68Q17
  3. 68Q25
  4. 68W10
  5. 68W15
  6. 68W40


Published In

SIAM Journal on Matrix Analysis and Applications
Pages: 450 - 477
ISSN (online): 1095-7162

History

Submitted: 20 July 2022
Accepted: 4 August 2023
Published online: 6 February 2024


Authors

Affiliations

Science and Technology Facilities Council, Rutherford Appleton Laboratory, Didcot, Oxfordshire, OX11 0QX, UK.
Computer Science, Wake Forest University, Winston-Salem, NC 27106, USA.
École Polytechnique Fédérale de Lausanne (EPFL), Institute of Mathematics, 1015 Lausanne, and Paul Scherrer Institute, Laboratory for Simulation and Modelling, 5232 Villigen PSI, Switzerland; a significant part of the work was performed while at Sorbonne Université, Inria, CNRS, Université de Paris, Laboratoire Jacques-Louis Lions, Paris, France.
Inria Lyon Centre, France; a significant part of the work was performed while at Sorbonne Université, Inria, CNRS, Université de Paris, Laboratoire Jacques-Louis Lions, Paris, France.
Inmar Intelligence, Winston-Salem, NC 27101, USA.

Funding Information

Funding: This work was supported by the National Science Foundation under grants CCF-1942892 and OAC-2106920. This material is based upon work supported by the US Department of Energy, Office of Science, Advanced Scientific Computing Research program under award DE-SC-0023296. This project also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement 810367).
