Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is a recently proposed version of tensor decomposition that allows for a variety of loss functions, such as Bernoulli loss for binary data or Huber loss for robust estimation. The stochastic gradient is formed from randomly sampled elements of the tensor and is efficient because it can be computed using the sparse matricized-tensor times Khatri-Rao product (MTTKRP) kernel. For dense tensors, we simply use uniform sampling. For sparse tensors, we propose two types of stratified sampling that give precedence to sampling nonzeros. Numerical results demonstrate the advantages of the proposed approach and its scalability to large-scale problems.
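
To make the sampling idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a squared-error loss f(x, m) = (x - m)^2 purely for illustration, and the function names (`model_values`, `sampled_gradient`, `uniform_sample`, `stratified_sample`) are hypothetical. The stratified sampler is one plausible scheme (draw nonzeros with replacement, draw zeros by rejection) and may differ in detail from the two variants the paper proposes. Each sampled entry is weighted by the inverse of its sampling rate so that the expected value of the sampled gradient matches the full gradient.

```python
import numpy as np

def model_values(factors, idx):
    """CP model entries m_i = sum_r prod_k A_k[i_k, r] at the sampled indices."""
    prod = np.ones((idx.shape[0], factors[0].shape[1]))
    for k, A in enumerate(factors):
        prod *= A[idx[:, k], :]
    return prod.sum(axis=1)

def sampled_gradient(factors, idx, x_vals, weights, dloss):
    """Gradient estimate from sampled entries (a sparse MTTKRP-style scatter-add)."""
    m = model_values(factors, idx)
    y = weights * dloss(x_vals, m)  # weighted elementwise loss derivative
    grads = []
    for k, A in enumerate(factors):
        rows = np.repeat(y[:, None], A.shape[1], axis=1)
        for j, B in enumerate(factors):
            if j != k:
                rows = rows * B[idx[:, j], :]
        G = np.zeros_like(A)
        np.add.at(G, idx[:, k], rows)  # accumulate samples sharing a mode-k index
        grads.append(G)
    return grads

def uniform_sample(X, s, rng):
    """Dense case: s entries drawn uniformly with replacement."""
    idx = np.column_stack([rng.integers(0, n, size=s) for n in X.shape])
    return idx, X[tuple(idx.T)], np.full(s, X.size / s)

def stratified_sample(shape, nz_idx, nz_vals, p, q, rng):
    """Sparse case (assumed scheme): p nonzeros and q zeros, weighted per stratum."""
    nnz, total = nz_idx.shape[0], int(np.prod(shape))
    pick = rng.integers(0, nnz, size=p)
    nz_set = {tuple(int(v) for v in row) for row in nz_idx}
    zeros = []
    while len(zeros) < q:  # rejection sampling of zero entries
        cand = tuple(int(rng.integers(0, n)) for n in shape)
        if cand not in nz_set:
            zeros.append(cand)
    z_idx = np.array(zeros, dtype=int).reshape(q, len(shape))
    idx = np.vstack([nz_idx[pick], z_idx])
    vals = np.concatenate([nz_vals[pick], np.zeros(q)])
    w = np.concatenate([np.full(p, nnz / p), np.full(q, (total - nnz) / q)])
    return idx, vals, w

# Illustrative usage: one plain gradient step on a small random dense tensor.
rng = np.random.default_rng(0)
shape, rank = (5, 4, 3), 2
X = rng.standard_normal(shape)
factors = [rng.standard_normal((n, rank)) for n in shape]
squared_error_grad = lambda x, m: 2.0 * (m - x)  # d/dm of (x - m)^2
idx, vals, w = uniform_sample(X, 20, rng)
grads = sampled_gradient(factors, idx, vals, w, squared_error_grad)
factors = [A - 1e-3 * G for A, G in zip(factors, grads)]
```

Other losses can be swapped in by passing a different elementwise derivative as `dloss`; the step size `1e-3` above is arbitrary and only there to show where the gradient estimate would be used.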


Keywords

  1. tensor decomposition
  2. stochastic gradients
  3. stochastic optimization
  4. stratified sampling

MSC codes

  1. 15A69






Published In

SIAM Journal on Mathematics of Data Science
Pages: 1066 - 1095
ISSN (online): 2577-0187


Submitted: 4 June 2019
Accepted: 10 July 2020
Published online: 27 October 2020





Funding Information

Sandia National Laboratories https://doi.org/10.13039/100006234

National Science Foundation https://doi.org/10.13039/100000001 : IIS 1838179
