Abstract

The computation of the singular value decomposition, or SVD, has a long history, with many improvements over the years, both in its implementations and in its algorithms. Here, we survey the evolution of SVD algorithms for dense matrices, discussing the motivation for each change and its impact on performance. There are two main branches of dense SVD methods: bidiagonalization and Jacobi. Bidiagonalization methods started with the implementation by Golub and Reinsch in Algol60, which was subsequently ported to Fortran in the EISPACK library and later implemented more efficiently in the LINPACK library, targeting contemporary vector machines. To address cache-based memory hierarchies, the SVD algorithm was reformulated to use Level 3 BLAS in the LAPACK library. To address new architectures, ScaLAPACK was introduced to take advantage of distributed computing, and MAGMA was developed for accelerators such as GPUs. Algorithmically, the divide and conquer and MRRR algorithms were developed to reduce the number of operations. Still, these methods remained memory bound, so two-stage algorithms were developed to reduce memory operations and increase the computational intensity, with efficient implementations in PLASMA, DPLASMA, and MAGMA. Jacobi methods started with the two-sided method of Kogbetliantz and the one-sided method of Hestenes. They have likewise had many developments, including parallel and block versions and preconditioning to improve convergence. In this paper, we investigate the impact of these changes by testing various historical and current implementations on a common, modern multicore machine and a distributed computing platform. We show that algorithmic and implementation improvements have increased the speed of the SVD by several orders of magnitude, while using up to 40 times less energy.

Keywords

  1. singular value decomposition
  2. SVD
  3. bidiagonal matrix
  4. QR iteration
  5. divide and conquer
  6. bisection
  7. MRRR
  8. Jacobi method
  9. Kogbetliantz method
  10. Hestenes method

MSC codes

  1. 15A18
  2. 15A23
  3. 65Y05

References

1.
E. Agullo, B. Hadri, H. Ltaief, and J. Dongarra, Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, in Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis (SC'09), ACM, 2009, art. 20, https://doi.org/10.1145/1654059.1654080.
2.
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide, 3rd ed., SIAM, Philadelphia, 1999, https://doi.org/10.1137/1.9780898719604.
3.
H. Andrews and C. Patterson, Singular value decomposition (SVD) image coding, IEEE Trans. Commun., 24 (1976), pp. 425--432, https://doi.org/10.1109/TCOM.1976.1093309.
4.
P. Arbenz and G. H. Golub, On the spectral decomposition of Hermitian matrices modified by low rank perturbations with applications, SIAM J. Matrix Anal. Appl., 9 (1988), pp. 40--58, https://doi.org/10.1137/0609004.
5.
P. Arbenz and I. Slapničar, An analysis of parallel implementations of the block-Jacobi algorithm for computing the SVD, in Proceedings of the 17th International Conference on Information Technology Interfaces ITI, 1995, pp. 13--16, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.4595.
6.
G. Ballard, J. Demmel, and N. Knight, Avoiding communication in successive band reduction, ACM Trans. Parallel Comput., 1 (2015), art. 11, https://doi.org/10.1145/2686877.
7.
J. L. Barlow, More accurate bidiagonal reduction for computing the singular value decomposition, SIAM J. Matrix Anal. Appl., 23 (2002), pp. 761--798, https://doi.org/10.1137/S0895479898343541.
8.
M. Bečka, G. Okša, and M. Vajteršic, Dynamic ordering for a parallel block-Jacobi SVD algorithm, Parallel Comput., 28 (2002), pp. 243--262, https://doi.org/10.1016/S0167-8191(01)00138-7.
9.
M. Bečka, G. Okša, and M. Vajteršic, New dynamic orderings for the parallel one-sided block-Jacobi SVD algorithm, Parallel Process. Lett., 25 (2015), art. 1550003, https://doi.org/10.1142/S0129626415500036.
10.
M. Bečka, G. Okša, M. Vajteršic, and L. Grigori, On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm, Parallel Comput., 36 (2010), pp. 297--307, https://doi.org/10.1016/j.parco.2009.12.013.
11.
M. Bečka and M. Vajteršic, Block-Jacobi SVD algorithms for distributed memory systems I: Hypercubes and rings, Parallel Algorithms Appl., 13 (1999), pp. 265--287, https://doi.org/10.1080/10637199808947377.
12.
M. Bečka and M. Vajteršic, Block-Jacobi SVD algorithms for distributed memory systems II: Meshes, Parallel Algorithms Appl., 14 (1999), pp. 37--56, https://doi.org/10.1080/10637199808947370.
13.
C. Bischof, B. Lang, and X. Sun, Algorithm 807: The SBR Toolbox---software for successive band reduction, ACM Trans. Math. Software, 26 (2000), pp. 602--616, https://doi.org/10.1145/365723.365736.
14.
C. Bischof and C. Van Loan, The WY representation for products of Householder matrices, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 2--13, https://doi.org/10.1137/0908009.
15.
C. H. Bischof, Computing the singular value decomposition on a distributed system of vector processors, Parallel Comput., 11 (1989), pp. 171--186, https://doi.org/10.1016/0167-8191(89)90027-6.
16.
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet et al., ScaLAPACK Users' Guide, SIAM, Philadelphia, 1997, https://doi.org/10.1137/1.9780898719642.
17.
L. S. Blackford, A. Petitet, R. Pozo, K. Remington, R. C. Whaley, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry et al., An updated set of basic linear algebra subprograms (BLAS), ACM Trans. Math. Software, 28 (2002), pp. 135--151, https://doi.org/10.1145/567806.567807.
18.
G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief et al., Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA, in 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Ph.D. Forum (IPDPSW), IEEE, 2011, pp. 1432--1441, https://doi.org/10.1109/IPDPS.2011.299.
19.
G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, DAGuE: A generic distributed DAG engine for high performance computing, Parallel Comput., 38 (2012), pp. 37--51, https://doi.org/10.1016/j.parco.2011.10.003.
20.
W. H. Boukaram, G. Turkiyyah, H. Ltaief, and D. E. Keyes, Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression, Parallel Comput., 74 (2018), pp. 19--33, https://doi.org/10.1016/j.parco.2017.09.001.
21.
R. P. Brent and F. T. Luk, The solution of singular-value and symmetric eigenvalue problems on multiprocessor arrays, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 69--84, https://doi.org/10.1137/0906007.
22.
R. P. Brent, F. T. Luk, and C. Van Loan, Computation of the singular value decomposition using mesh-connected processors, J. VLSI Comput. Syst., 1 (1985), pp. 242--270, http://maths-people.anu.edu.au/~brent/pd/rpb080i.pdf.
23.
T. F. Chan, An improved algorithm for computing the singular value decomposition, ACM Trans. Math. Software, 8 (1982), pp. 72--83, https://doi.org/10.1145/355984.355990.
24.
J. Choi, J. Dongarra, and D. W. Walker, The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form, Numer. Algorithms, 10 (1995), pp. 379--399, https://doi.org/10.1007/BF02140776.
25.
J. J. M. Cuppen, A divide and conquer method for the symmetric tridiagonal eigenproblem, Numer. Math., 36 (1980), pp. 177--195, https://doi.org/10.1007/BF01396757.
26.
P. I. Davies and N. J. Higham, Numerically stable generation of correlation matrices and their factors, BIT, 40 (2000), pp. 640--651, https://doi.org/10.1023/A:102238421.
27.
P. P. M. de Rijk, A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 359--371, https://doi.org/10.1137/0910023.
28.
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, J. Amer. Soc. Inform. Sci., 41 (1990), pp. 391--407, https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9.
29.
J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal parallel and sequential QR and LU factorizations, SIAM J. Sci. Comput., 34 (2012), pp. A206--A239, https://doi.org/10.1137/080731992.
30.
J. Demmel, M. Gu, S. Eisenstat, I. Slapničar, K. Veselić, and Z. Drmač, Computing the singular value decomposition with high relative accuracy, Linear Algebra Appl., 299 (1999), pp. 21--80, https://doi.org/10.1016/S0024-3795(99)00134-2.
31.
J. Demmel and W. Kahan, Accurate singular values of bidiagonal matrices, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 873--912, https://doi.org/10.1137/0911052.
32.
J. Demmel and K. Veselić, Jacobi's method is more accurate than QR, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1204--1245, https://doi.org/10.1137/0613074.
33.
J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997, https://doi.org/10.1137/1.9781611971446.
34.
J. W. Demmel, I. Dhillon, and H. Ren, On the correctness of some bisection-like parallel eigenvalue algorithms in floating point arithmetic, Electron. Trans. Numer. Anal., 3 (1995), pp. 116--149, http://emis.ams.org/journals/ETNA/vol.3.1995/pp116-149.dir/pp116-149.pdf.
35.
I. S. Dhillon, A New O($n^2$) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem, Ph.D. thesis, EECS Department, University of California, Berkeley, 1997, http://www.dtic.mil/docs/citations/ADA637073.
36.
I. S. Dhillon and B. N. Parlett, Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices, Linear Algebra Appl., 387 (2004), pp. 1--28, https://doi.org/10.1016/j.laa.2003.12.028.
37.
I. S. Dhillon and B. N. Parlett, Orthogonal eigenvectors and relative gaps, SIAM J. Matrix Anal. Appl., 25 (2004), pp. 858--899, https://doi.org/10.1137/S0895479800370111.
38.
I. S. Dhillon, B. N. Parlett, and C. Vömel, The design and implementation of the MRRR algorithm, ACM Trans. Math. Software, 32 (2006), pp. 533--560, https://doi.org/10.1145/1186785.1186788.
39.
J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users' Guide, SIAM, Philadelphia, 1979, https://doi.org/10.1137/1.9781611971811.
40.
J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff, A set of level $3$ basic linear algebra subprograms, ACM Trans. Math. Software, 16 (1990), pp. 1--17, https://doi.org/10.1145/77626.79170.
41.
J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, An extended set of FORTRAN basic linear algebra subprograms, ACM Trans. Math. Software, 14 (1988), pp. 1--17, https://doi.org/10.1145/42288.42291.
42.
J. Dongarra, D. C. Sorensen, and S. J. Hammarling, Block reduction of matrices to condensed forms for eigenvalue computations, J. Comput. Appl. Math., 27 (1989), pp. 215--227, https://doi.org/10.1016/0377-0427(89)90367-1.
43.
Z. Drmač, Algorithm 977: A QR-preconditioned QR SVD method for computing the SVD with high accuracy, ACM Trans. Math. Software, 44 (2017), art. 11, https://doi.org/10.1145/3061709.
44.
Z. Drmač and K. Veselić, New fast and accurate Jacobi SVD algorithm, I, SIAM J. Matrix Anal. Appl., 29 (2008), pp. 1322--1342, https://doi.org/10.1137/050639193.
45.
Z. Drmač and K. Veselić, New fast and accurate Jacobi SVD algorithm, II, SIAM J. Matrix Anal. Appl., 29 (2008), pp. 1343--1362, https://doi.org/10.1137/05063920X.
46.
P. Eberlein, On one-sided Jacobi methods for parallel computation, SIAM J. Algebraic Discrete Methods, 8 (1987), pp. 790--796, https://doi.org/10.1137/0608064.
47.
48.
K. V. Fernando and B. N. Parlett, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), pp. 191--229, https://doi.org/10.1007/s002110050024.
49.
G. E. Forsythe and P. Henrici, The cyclic Jacobi method for computing the principal values of a complex matrix, Trans. Amer. Math. Soc., 94 (1960), pp. 1--23, https://doi.org/10.2307/1993275.
50.
B. S. Garbow, J. M. Boyle, C. B. Moler, and J. Dongarra, Matrix Eigensystem Routines -- EISPACK Guide Extension, Lecture Notes in Comput. Sci. 51, Springer, Berlin, 1977, https://doi.org/10.1007/3-540-08254-9.
51.
M. Gates, S. Tomov, and J. Dongarra, Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs, Parallel Comput., 74 (2018), pp. 3--18, https://doi.org/10.1016/j.parco.2017.10.004.
52.
G. Golub, Some modified matrix eigenvalue problems, SIAM Rev., 15 (1973), pp. 318--334, https://doi.org/10.1137/1015032.
53.
G. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205--224, https://doi.org/10.1137/0702016.
54.
G. Golub and C. Reinsch, Singular value decomposition and least squares solutions, Numer. Math., 14 (1970), pp. 403--420, https://doi.org/10.1007/BF02163027.
55.
B. Großer and B. Lang, Efficient parallel reduction to bidiagonal form, Parallel Comput., 25 (1999), pp. 969--986, https://doi.org/10.1016/S0167-8191(99)00041-1.
56.
M. Gu, J. Demmel, and I. Dhillon, Efficient Computation of the Singular Value Decomposition with Applications to Least Squares Problems, Tech. Report LBL-36201, Lawrence Berkeley Laboratory, 1994, http://www.cs.utexas.edu/users/inderjit/public_papers/least_squares.pdf.
57.
M. Gu and S. C. Eisenstat, A Divide and Conquer Algorithm for the Bidiagonal SVD, Tech. Report YALEU/DCS/TR-933, Department of Computer Science, Yale University, 1992, http://cpsc.yale.edu/research/technical-reports/1992-technical-reports.
58.
M. Gu and S. C. Eisenstat, A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 1266--1276, https://doi.org/10.1137/S089547989223924X.
59.
M. Gu and S. C. Eisenstat, A divide-and-conquer algorithm for the bidiagonal SVD, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 79--92, https://doi.org/10.1137/S0895479892242232.
60.
A. Haidar, J. Kurzak, and P. Luszczek, An improved parallel singular value algorithm and its implementation for multicore hardware, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC'13), ACM, 2013, art. 90, https://doi.org/10.1145/2503210.2503292.
61.
A. Haidar, H. Ltaief, and J. Dongarra, Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels, in Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11), ACM, 2011, art. 8, https://doi.org/10.1145/2063384.2063394.
62.
A. Haidar, H. Ltaief, P. Luszczek, and J. Dongarra, A comprehensive study of task coalescing for selecting parallelism granularity in a two-stage bidiagonal reduction, in 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2012, pp. 25--35, https://doi.org/10.1109/IPDPS.2012.13.
63.
S. Hammarling, A note on modifications to the Givens plane rotation, IMA J. Appl. Math., 13 (1974), pp. 215--218, https://doi.org/10.1093/imamat/13.2.215.
64.
V. Hari, Accelerating the SVD block-Jacobi method, Computing, 75 (2005), pp. 27--53, https://doi.org/10.1007/s00607-004-0113-z.
65.
V. Hari and J. Matejaš, Accuracy of two SVD algorithms for $2\times 2$ triangular matrices, Appl. Math. Comput., 210 (2009), pp. 232--257, https://doi.org/10.1016/j.amc.2008.12.086.
66.
V. Hari and K. Veselić, On Jacobi methods for singular value decompositions, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 741--754, https://doi.org/10.1137/0908064.
67.
M. Heath, A. Laub, C. Paige, and R. Ward, Computing the singular value decomposition of a product of two matrices, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 1147--1159, https://doi.org/10.1137/0907078.
68.
M. R. Hestenes, Inversion of matrices by biorthogonalization and related results, J. Soc. Indust. Appl. Math., 6 (1958), pp. 51--90, https://doi.org/10.1137/0106005.
69.
G. W. Howell, J. W. Demmel, C. T. Fulton, S. Hammarling, and K. Marmol, Cache efficient bidiagonalization using BLAS $2.5$ operators, ACM Trans. Math. Software, 34 (2008), art. 14, https://doi.org/10.1145/1356052.1356055.
70.
IBM Corporation, ESSL Guide and Reference, 2016, http://publib.boulder.ibm.com/epubs/pdf/a2322688.pdf.
71.
Intel Corporation, User's Guide for Intel Math Kernel Library for Linux OS, 2015, http://software.intel.com/en-us/mkl-for-linux-userguide.
72.
I. C. F. Ipsen, Computing an eigenvector with inverse iteration, SIAM Rev., 39 (1997), pp. 254--291, https://doi.org/10.1137/S0036144596300773.
73.
C. G. J. Jacobi, Über ein leichtes Verfahren die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen, J. Reine Angew. Math., 30 (1846), pp. 51--94, http://eudml.org/doc/147275.
74.
E. Jessup and D. Sorensen, A divide and conquer algorithm for computing the singular value decomposition, in Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, 1989, SIAM, Philadelphia, pp. 61--66.
75.
W. Kahan, Accurate Eigenvalues of a Symmetric Tri-diagonal Matrix, Tech. Report, Stanford University, Stanford, CA, 1966, http://www.dtic.mil/docs/citations/AD0638796.
76.
E. Kogbetliantz, Solution of linear equations by diagonalization of coefficients matrix, Quart. Appl. Math., 13 (1955), pp. 123--132, http://www.ams.org/journals/qam/1955-13-02/S0033-569X-1955-88795-9/S0033-569X-1955-88795-9.pdf.
77.
J. Kurzak, P. Wu, M. Gates, I. Yamazaki, P. Luszczek, G. Ragghianti, and J. Dongarra, Designing SLATE: Software for Linear Algebra Targeting Exascale, SLATE Working Note 3, Innovative Computing Laboratory, University of Tennessee, 2017, http://www.icl.utk.edu/publications/swan-003.
78.
B. Lang, Parallel reduction of banded matrices to bidiagonal form, Parallel Comput., 22 (1996), pp. 1--18, https://doi.org/10.1016/0167-8191(95)00064-X.
79.
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic linear algebra subprograms for FORTRAN usage, ACM Trans. Math. Software, 5 (1979), pp. 308--323, https://doi.org/10.1145/355841.355847.
80.
R.-C. Li, Solving Secular Equations Stably and Efficiently, Tech. Report UCB/CSD-94-851, Computer Science Division, University of California, Berkeley, 1994, http://www.netlib.org/lapack/lawns/. Also: LAPACK Working Note 89.
81.
S. Li, M. Gu, L. Cheng, X. Chi, and M. Sun, An accelerated divide-and-conquer algorithm for the bidiagonal SVD problem, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 1038--1057, https://doi.org/10.1137/130945995.
82.
H. Ltaief, J. Kurzak, and J. Dongarra, Parallel two-sided matrix reduction to band bidiagonal form on multicore architectures, IEEE Trans. Parallel Distrib. Syst., 21 (2010), pp. 417--423, https://doi.org/10.1109/TPDS.2009.79.
83.
H. Ltaief, P. Luszczek, and J. Dongarra, High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures, ACM Trans. Math. Software, 39 (2013), art. 16, https://doi.org/10.1145/2450153.2450154.
84.
F. T. Luk, Computing the singular-value decomposition on the ILLIAC IV, ACM Trans. Math. Software, 6 (1980), pp. 524--539, https://doi.org/10.1145/355921.355925.
85.
F. T. Luk and H. Park, On parallel Jacobi orderings, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 18--26, https://doi.org/10.1137/0910002.
86.
O. Marques and P. B. Vasconcelos, Computing the bidiagonal SVD through an associated tridiagonal eigenproblem, in International Conference on Vector and Parallel Processing (VECPAR), Springer, 2016, pp. 64--74, https://doi.org/10.1007/978-3-319-61982-8_8.
87.
W. F. Mascarenhas, On the convergence of the Jacobi method for arbitrary orderings, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1197--1209, https://doi.org/10.1137/S0895479890179631.
88.
J. Matejaš and V. Hari, Accuracy of the Kogbetliantz method for scaled diagonally dominant triangular matrices, Appl. Math. Comput., 217 (2010), pp. 3726--3746, https://doi.org/10.1016/j.amc.2010.09.020.
89.
J. Matejaš and V. Hari, On high relative accuracy of the Kogbetliantz method, Linear Algebra Appl., 464 (2015), pp. 100--129, https://doi.org/10.1016/j.laa.2014.02.024.
91.
J. D. McCalpin, A survey of memory bandwidth and machine balance in current high performance computers, IEEE Comput. Soc. Tech. Committee Comput. Architect. (TCCA) Newslett., 19 (1995), pp. 19--25, http://tab.computer.org/tcca/NEWS/DEC95/dec95_mccalpin.ps.
92.
B. Moore, Principal component analysis in linear systems: Controllability, observability, and model reduction, IEEE Trans. Automat. Control, 26 (1981), pp. 17--32, https://doi.org/10.1109/TAC.1981.1102568.
93.
MPI Forum, MPI: A Message-Passing Interface Standard, Version 3.1, June 2015, http://www.mpi-forum.org/.
94.
NVIDIA Corporation, CUDA Toolkit 7.0, March 2015, http://developer.nvidia.com/cuda-zone.
95.
G. Okša and M. Vajteršic, Efficient pre-processing in the parallel block-Jacobi SVD algorithm, Parallel Comput., 32 (2006), pp. 166--176, https://doi.org/10.1016/j.parco.2005.06.006.
96.
OpenBLAS, OpenBLAS User Manual, 2016, http://www.openblas.net/.
97.
B. N. Parlett, The new QD algorithms, Acta Numer., 4 (1995), pp. 459--491, https://doi.org/10.1017/S0962492900002580.
98.
B. N. Parlett and I. S. Dhillon, Fernando's solution to Wilkinson's problem: An application of double factorization, Linear Algebra Appl., 267 (1997), pp. 247--279, https://doi.org/10.1016/S0024-3795(97)80053-5.
99.
V. Rokhlin, A. Szlam, and M. Tygert, A randomized algorithm for principal component analysis, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 1100--1124, https://doi.org/10.1137/080736417.
100.
H. Rutishauser, Der Quotienten-Differenzen-Algorithmus, Z. Angew. Math. Phys., 5 (1954), pp. 233--251, https://doi.org/10.1007/BF01600331.
101.
H. Rutishauser, Solution of eigenvalue problems with the LR-transformation, Nat. Bur. Standards Appl. Math. Ser., 49 (1958), pp. 47--81.
102.
H. Rutishauser, The Jacobi method for real symmetric matrices, in Handbook for Automatic Computation: Volume II: Linear Algebra, Grundlehren Math. Wiss. 186, Springer-Verlag, New York, 1971, pp. 202--211, https://doi.org/10.1007/978-3-642-86940-2.
103.
A. H. Sameh, On Jacobi and Jacobi-like algorithms for a parallel computer, Math. Comp., 25 (1971), pp. 579--590, https://doi.org/10.1090/S0025-5718-1971-0297131-6.
104.
R. Schreiber and C. Van Loan, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 53--57, https://doi.org/10.1137/0910005.
105.
B. T. Smith, J. M. Boyle, J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler, Matrix Eigensystem Routines -- EISPACK Guide, Second Edition, Lecture Notes in Comput. Sci. 6, Springer, Berlin, 1976, https://doi.org/10.1007/3-540-07546-1.
106.
G. W. Stewart, The efficient generation of random orthogonal matrices with an application to condition estimators, SIAM J. Numer. Anal., 17 (1980), pp. 403--409, https://doi.org/10.1137/0717034.
107.
G. W. Stewart, On the early history of the singular value decomposition, SIAM Rev., 35 (1993), pp. 551--566, https://doi.org/10.1137/1035134.
108.
G. W. Stewart, QR Sometimes Beats Jacobi, Tech. Report CS-TR-3434, University of Maryland, 1995, http://drum.lib.umd.edu/handle/1903/709.
109.
S. Tomov, R. Nath, and J. Dongarra, Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing, Parallel Comput., 36 (2010), pp. 645--654, https://doi.org/10.1016/j.parco.2010.06.001.
110.
S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, in 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Ph.D. Forum (IPDPSW), IEEE, 2010, pp. 1--8, https://doi.org/10.1109/IPDPSW.2010.5470941.
111.
M. A. Turk and A. P. Pentland, Face recognition using eigenfaces, in Proceedings of 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 1991, pp. 586--591, https://doi.org/10.1109/CVPR.1991.139758.
112.
C. Van Loan, The Block Jacobi Method for Computing the Singular Value Decomposition, Tech. Report TR 85-680, Cornell University, 1985, https://ecommons.cornell.edu/handle/1813/6520.
113.
F. G. Van Zee, R. A. Van de Geijn, and G. Quintana-Ortí, Restructuring the tridiagonal and bidiagonal QR algorithms for performance, ACM Trans. Math. Software, 40 (2014), art. 18, https://doi.org/10.1145/2535371.
114.
F. G. Van Zee, R. A. Van de Geijn, G. Quintana-Ortí, and G. J. Elizondo, Families of algorithms for reducing a matrix to condensed form, ACM Trans. Math. Software, 39 (2012), art. 2, https://doi.org/10.1145/2382585.2382587.
115.
R. C. Whaley and J. Dongarra, Automatically tuned linear algebra software, in Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, IEEE Computer Society, 1998, pp. 1--27, https://doi.org/10.1109/SC.1998.10004.
116.
J. H. Wilkinson, Note on the quadratic convergence of the cyclic Jacobi process, Numer. Math., 4 (1962), pp. 296--300, https://doi.org/10.1007/BF01386321.
117.
J. H. Wilkinson and C. Reinsch, Handbook for Automatic Computation: Volume II: Linear Algebra, Grundlehren Math. Wiss. 186, Springer-Verlag, New York, 1971, https://doi.org/10.1007/978-3-642-86940-2.
118.
P. R. Willems and B. Lang, A framework for the $MR^3$ algorithm: Theory and implementation, SIAM J. Sci. Comput., 35 (2013), pp. A740--A766, https://doi.org/10.1137/110834020.
119.
P. R. Willems, B. Lang, and C. Vömel, Computing the bidiagonal SVD using multiple relatively robust representations, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 907--926, https://doi.org/10.1137/050628301.
120.
B. B. Zhou and R. P. Brent, A parallel ring ordering algorithm for efficient one-sided Jacobi SVD computations, J. Parallel Distrib. Comput., 42 (1997), pp. 1--10, https://doi.org/10.1006/jpdc.1997.1304.

Published In

SIAM Review
Pages: 808 - 865
ISSN (online): 1095-7200

History

Submitted: 21 February 2017
Accepted: 22 March 2018
Published online: 8 November 2018

Funding Information

National Science Foundation https://doi.org/10.13039/100000001 : 1339822
Nvidia https://doi.org/10.13039/100007065
Intel Corporation https://doi.org/10.13039/100002418
