Abstract

The low-rank canonical polyadic tensor decomposition is useful in data analysis and can be computed by solving a sequence of overdetermined least squares subproblems. Motivated by consideration of sparse tensors, we propose sketching each subproblem using leverage scores to select a subset of the rows, with probabilistic guarantees on the solution accuracy. We randomly sample rows proportional to leverage score upper bounds that can be efficiently computed using the special Khatri--Rao subproblem structure inherent in tensor decomposition. Crucially, for a $(d+1)$-way tensor, the number of rows in the sketched system is $O(r^d/\epsilon)$ for a decomposition of rank $r$ and $\epsilon$-accuracy in the least squares solve, independent of both the size and the number of nonzeros in the tensor. Along the way, we provide a practical solution to the generic matrix sketching problem of sampling overabundance for high-leverage-score rows, proposing to include such rows deterministically and combine repeated samples in the sketched system; we conjecture that this can lead to improved theoretical bounds. Numerical results on real-world large-scale tensors show the method is significantly faster than deterministic methods at nearly the same level of accuracy.
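To make the sampling scheme concrete, below is a minimal NumPy sketch of a leverage-score-sampled Khatri--Rao least squares solve. It is an illustration under stated assumptions, not the authors' CP-ARLS-LEV implementation: the function names are hypothetical, the right-hand side B is assumed to be ordered so that the index of the first factor varies fastest, and the paper's refinements (deterministic inclusion of high-leverage rows and combining of repeated samples) are omitted. It relies on the standard bound that the leverage score of a row of a Khatri--Rao product is at most the product of the leverage scores of the corresponding factor rows, so rows can be drawn from a product of per-factor distributions without ever forming the full matrix.

    import numpy as np

    def leverage_scores(A):
        # Leverage scores of a tall matrix A: squared row norms of the
        # orthogonal factor Q from a thin QR factorization.
        Q, _ = np.linalg.qr(A, mode="reduced")
        return np.sum(Q * Q, axis=1)

    def sketched_krp_least_squares(factors, B, num_samples, rng=None):
        # Approximately solve  min_X || Z X - B ||_F, where Z is the
        # Khatri-Rao product of the matrices in `factors` (each n_k x r),
        # by sampling rows of Z proportional to the product of per-factor
        # leverage scores (an upper bound on the leverage scores of Z).
        # Assumes rows of B are ordered so factors[0]'s index varies fastest.
        rng = np.random.default_rng() if rng is None else rng
        r = factors[0].shape[1]
        sizes = [A.shape[0] for A in factors]

        # Per-factor sampling distributions; leverage scores sum to the
        # rank of each factor, so normalize explicitly.
        dists = []
        for A in factors:
            ell = leverage_scores(A)
            dists.append(ell / ell.sum())

        # Draw one row index per factor for each sample. The sampled row
        # of Z is the elementwise product of the chosen factor rows, so Z
        # is never formed explicitly.
        idx = [rng.choice(n, size=num_samples, p=p) for n, p in zip(sizes, dists)]
        Zs = np.ones((num_samples, r))
        prob = np.ones(num_samples)
        lin = np.zeros(num_samples, dtype=np.int64)
        stride = 1
        for A, p, i, n in zip(factors, dists, idx, sizes):
            Zs *= A[i, :]
            prob *= p[i]
            lin += stride * i
            stride *= n

        # Importance-sampling weights 1/sqrt(s * p) keep the sketched
        # problem an unbiased surrogate for the full least squares problem.
        w = 1.0 / np.sqrt(num_samples * prob)
        X, *_ = np.linalg.lstsq(w[:, None] * Zs, w[:, None] * B[lin, :], rcond=None)
        return X

In a CP alternating least squares sweep, a sketched solve of this form would stand in for the exact solve of each factor-matrix subproblem, with the number of sampled rows on the order of r^d/epsilon as described above.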

Keywords

  1. tensor decomposition
  2. CANDECOMP/PARAFAC
  3. canonical polyadic
  4. CP
  5. matrix sketching
  6. leverage score sampling
  7. randomized numerical linear algebra
  8. RandNLA

MSC codes

  1. 15A69


Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition

Authors: Brett W. Larsen and Tamara G. Kolda

  1. EndToEndComplexity.pdf (PDF): CP-ARLS-LEV End-to-End Complexity.
  2. EnronRRF.pdf (PDF): Enron Tensor with RRF Initialization.
  3. UberFullRuns.pdf (PDF): Detailed Runs on Uber Tensor.
  4. RedditFactors.pdf (PDF): Visualizations of Reddit Factors.

Information & Authors

Information

Published In

SIAM Journal on Matrix Analysis and Applications
Pages: 1488--1517
ISSN (online): 1095-7162

History

Submitted: 20 August 2021
Accepted: 6 June 2022
Published online: 30 August 2022

Authors

Brett W. Larsen and Tamara G. Kolda

Funding Information

U.S. Department of Energy (https://doi.org/10.13039/100000015): DE-FG02-97ER25308
