Abstract.

Randomized matrix algorithms have become workhorse tools in scientific computing and machine learning. To use these algorithms safely in applications, they should be coupled with posterior error estimates to assess the quality of the output. To meet this need, this paper proposes two diagnostics: a leave-one-out error estimator for randomized low-rank approximations and a jackknife resampling method to estimate the variance of the output of a randomized matrix computation. Both diagnostics are rapid to compute for randomized low-rank approximation algorithms such as the randomized SVD and randomized Nyström approximation, and they provide useful information for assessing the quality of the computed output and guiding algorithmic parameter choices.
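
To make the jackknife diagnostic concrete, the following NumPy sketch applies it to the randomized SVD. This is a minimal, brute-force illustration, not the paper's method: the function names, the loop that recomputes the approximation once per deleted column of the test matrix, and the unscaled sum of squared Frobenius deviations are all illustrative assumptions (the diagnostics described in the paper are rapid to compute; this naive loop is not).

    import numpy as np

    def rand_svd_factors(A, Omega, k):
        # Rank-k randomized SVD: project A onto the range of the sketch
        # A @ Omega, take an exact SVD of the small projected matrix, truncate.
        Q, _ = np.linalg.qr(A @ Omega)   # orthonormal basis for the sketch
        U, svals, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
        return Q @ U[:, :k], svals[:k], Vt[:k, :]

    def jackknife_spread(A, k, s, seed=None):
        # Leave-one-out jackknife diagnostic for the rank-k randomized SVD
        # built from an n x s Gaussian test matrix Omega. Replicate j is the
        # same approximation recomputed with column j of Omega deleted.
        assert k < s, "need at least k + 1 sketch columns"
        rng = np.random.default_rng(seed)
        Omega = rng.standard_normal((A.shape[1], s))
        reps = []
        for j in range(s):               # naive: s separate recomputations
            U, svals, Vt = rand_svd_factors(A, np.delete(Omega, j, axis=1), k)
            reps.append((U * svals) @ Vt)
        mean_rep = sum(reps) / s
        # Spread of the replicates about their mean in the Frobenius norm.
        return np.sqrt(sum(np.linalg.norm(R - mean_rep, "fro") ** 2 for R in reps))

A small spread of the replicates about their mean suggests the randomized approximation has low variance, while a large spread signals that the sketch size s should be increased before trusting the output.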

Keywords

  1. jackknife resampling
  2. low-rank approximation
  3. error estimation
  4. randomized algorithms

MSC codes

  1. 62F40
  2. 65F55
  3. 68W20

Disclaimer.

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Acknowledgment.

We thank Robert Webber for his advice and feedback.

Supplementary Materials

PLEASE NOTE: These supplementary files have not been peer-reviewed.

Index of Supplementary Materials

Title of paper: Efficient Error and Variance Estimation for Randomized Matrix Computations
Authors: Ethan N. Epperly and Joel A. Tropp
File: 126831_1_supp_537877_rzx98s_sc.pdf
Type: PDF
Contents: Supplement, including additional algorithmic details, more numerical experiments, and code segments

Information & Authors

Published In

SIAM Journal on Scientific Computing
Pages: A508–A528
ISSN (online): 1095-7197

History

Submitted: 13 March 2023
Accepted: 30 October 2023
Published online: 8 February 2024

Authors

Affiliations

Ethan N. Epperly, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA.
Joel A. Tropp, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA.

Funding Information

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Department of Energy Computational Science Graduate Fellowship under award DE-SC0021110. The second author was supported in part by ONR awards N00014-17-1-2146 and N00014-18-1-2363 and NSF FRG award 1952777.
