Software and High-Performance Computing

A Hierarchically Blocked Jacobi SVD Algorithm for Single and Multiple Graphics Processing Units

Abstract

We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of the GPUs' memory hierarchy. The algorithm may outperform MAGMA's dgesvd while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on the GPU's shared address space that are also applicable to inter-GPU communication. Unlike common hybrid approaches, in a single-GPU setting our algorithm needs a CPU only for control purposes, while utilizing the GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by a more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs versus a single Fermi card.
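The abstract builds on two ingredients: the one-sided (Hestenes) Jacobi method, which orthogonalizes the columns of a matrix by plane rotations, and parallel pivot strategies, in which the column pairs handled in one step are disjoint and can therefore be processed concurrently. As a rough, non-authoritative illustration of these ingredients only, the sketch below implements the plain unblocked one-sided Jacobi SVD (singular values only) in pure Python, with the classic round-robin ordering standing in for a parallel pivot strategy. It is not the paper's hierarchically blocked GPU algorithm or any of its pivot strategies; the function names `round_robin_steps` and `one_sided_jacobi_svd`, the tolerance, and the sweep limit are hypothetical choices.

```python
import math


def round_robin_steps(n):
    """Hypothetical helper: split all column pairs (p, q), p < q, into steps
    of disjoint pairs with the round-robin (circle) method. For even n there
    are n - 1 steps of n / 2 pairs; an odd n is padded with a dummy index."""
    m = n + (n % 2)                      # pad to an even number of indices
    players = list(range(m))
    steps = []
    for _ in range(m - 1):
        step = []
        for k in range(m // 2):
            p, q = sorted((players[k], players[m - 1 - k]))
            if q < n:                    # drop pairs involving the dummy index
                step.append((p, q))
        steps.append(step)
        # Keep players[0] fixed and rotate the remaining indices by one.
        players = [players[0], players[-1]] + players[1:-1]
    return steps


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """Sketch of the unblocked one-sided Jacobi SVD of a (rows >= columns,
    given as a list of rows). Columns are rotated until pairwise numerically
    orthogonal; the singular values are the final column norms, returned in
    descending order. U and V are not accumulated here."""
    rows, n = len(a), len(a[0])
    g = [[a[i][j] for i in range(rows)] for j in range(n)]  # columns of a
    steps = round_robin_steps(n)
    for _ in range(max_sweeps):
        rotated = False
        for step in steps:
            # Pairs in one step touch disjoint columns, so a parallel machine
            # could process them concurrently; here they simply run in order.
            for p, q in step:
                gp, gq = g[p], g[q]
                alpha = sum(x * x for x in gp)
                beta = sum(y * y for y in gq)
                gamma = sum(x * y for x, y in zip(gp, gq))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue             # already numerically orthogonal
                rotated = True
                # Choose the smaller root t = tan(phi) that zeroes the
                # off-diagonal entry of the 2x2 Gram matrix of (gp, gq).
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                g[p] = [c * x - s * y for x, y in zip(gp, gq)]
                g[q] = [s * x + c * y for x, y in zip(gp, gq)]
        if not rotated:                  # a full sweep without rotations
            break
    return sorted((math.sqrt(sum(x * x for x in col)) for col in g), reverse=True)
```

For example, `one_sided_jacobi_svd([[3.0, 0.0], [4.0, 5.0]])` yields values close to √45 ≈ 6.708 and √5 ≈ 2.236. Since the pairs within a step share no columns, the inner loop is where concurrency would naturally go; according to the abstract, the paper's algorithm additionally blocks the columns to match the levels of the GPU memory hierarchy, which this sketch does not attempt.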

Keywords

  1. Jacobi (hyperbolic) singular value decomposition
  2. parallel pivot strategies
  3. graphics processing units

MSC codes

  1. 65Y05
  2. 65Y10
  3. 65F15

Information

Published In

SIAM Journal on Scientific Computing
Pages: C1–C30
ISSN (online): 1095-7197

History

Submitted: 13 January 2014
Accepted: 17 September 2014
Published online: 20 January 2015
