Software and High-Performance Computing

Parallel Candecomp/Parafac Decomposition of Sparse Tensors Using Dimension Trees


CANDECOMP/PARAFAC (CP) decomposition of sparse tensors has been successfully applied to problems in web search, graph analytics, recommender systems, health care data analytics, and many other domains. In these applications, computing the CP decomposition of sparse tensors efficiently is essential for processing and analyzing data of massive scale. For this purpose, we investigate the efficient computation of the CP decomposition of sparse tensors and its parallelization. We propose a novel computational scheme for reducing the cost of a core operation in computing the CP decomposition with the traditional alternating least squares (CP-ALS) based algorithm. We then effectively parallelize this computational scheme in the context of CP-ALS in shared and distributed memory environments and propose data and task distribution models for better scalability. We implement parallel CP-ALS algorithms and compare our implementations with an efficient tensor factorization library using tensors formed from real-world and synthetic datasets. With our algorithmic contributions and implementations, we report speedups of up to 5.96x, 5.65x, and 3.9x in sequential, shared memory parallel, and distributed memory parallel executions, respectively, over the state of the art, and we achieve strong scalability up to 4096 cores on an IBM BlueGene/Q supercomputer.
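To make the "core operation" concrete, the following is a minimal sketch that assumes it refers to the matricized tensor times Khatri-Rao product (MTTKRP), the usual bottleneck of each CP-ALS subiteration; the abstract itself does not name the operation. The sketch shows only the straightforward per-nonzero baseline, not the dimension-tree scheme proposed in the article. The function name `mttkrp`, the coordinate-dictionary storage, and the sample data are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]
Matrix = List[List[float]]


def mttkrp(nonzeros: Dict[Coord, float], factors: List[Matrix], mode: int) -> Matrix:
    """Baseline sparse MTTKRP (hypothetical helper, not the article's scheme).

    nonzeros maps each coordinate tuple to its value; factors[m] is the
    I_m x R factor matrix of mode m. Returns the I_mode x R result.
    """
    rank = len(factors[0][0])
    result = [[0.0] * rank for _ in range(len(factors[mode]))]
    for index, value in nonzeros.items():
        target_row = result[index[mode]]
        for r in range(rank):
            # Multiply the nonzero by the matching rows of all other factors.
            product = value
            for m, i in enumerate(index):
                if m != mode:
                    product *= factors[m][i][r]
            target_row[r] += product
    return result


# Illustrative 2 x 2 x 2 tensor with three nonzeros and rank-2 factors.
nonzeros = {(0, 0, 0): 1.0, (1, 0, 1): 2.0, (1, 1, 1): 3.0}
factors = [
    [[1.0, 2.0], [3.0, 4.0]],  # mode-0 factor
    [[1.0, 0.0], [0.0, 1.0]],  # mode-1 factor
    [[2.0, 1.0], [1.0, 3.0]],  # mode-2 factor
]
print(mttkrp(nonzeros, factors, mode=0))
```

In this baseline, every mode recomputes the products over all other factor rows from scratch, so the work per nonzero grows with the number of modes. That redundancy across modes is the kind of cost a scheme that reuses partial results would aim to reduce.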


Keywords

  1. sparse tensors
  2. CP decomposition
  3. dimension tree
  4. parallel algorithms

MSC codes

  1. 15-04
  2. 05C70
  3. 15A69
  4. 15A83




Information & Authors


Published In

SIAM Journal on Scientific Computing
Pages: C99 - C130
ISSN (online): 1095-7197


Submitted: 8 November 2016
Accepted: 14 December 2017
Published online: 20 February 2018





Funding Information

GENCI: 2016-i2016067501
