Abstract

The standard implementation of the conjugate gradient algorithm suffers from communication bottlenecks on parallel architectures, due primarily to the two global reductions required at each iteration. In this paper, we study conjugate gradient variants that decrease the runtime per iteration by overlapping global synchronizations with computation and, in the case of pipelined variants, with matrix-vector products. Through the use of a predict-and-recompute scheme, whereby recursively updated quantities are first used as a predictor for their true values and then recomputed exactly at a later point in the iteration, these variants are observed to have convergence behavior nearly as good as that of the standard conjugate gradient implementation on a variety of test problems. We give a rounding error analysis that provides insight into this observation. We also verify experimentally that the variants studied do indeed reduce the runtime per iteration in practice and that they scale similarly to previously studied communication-hiding variants. Finally, because these variants achieve good convergence without the use of any additional input parameters, they have the potential to be used in place of the standard conjugate gradient implementation in a range of applications.
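
To make the predict-and-recompute scheme concrete, the following is a minimal serial sketch in Python/NumPy; the function name and structure are illustrative assumptions, not the paper's exact communication-hiding variants. In exact arithmetic, r_{k+1} = r_k - alpha_k s_k with s_k = A p_k implies <r_{k+1}, r_{k+1}> = alpha_k^2 <s_k, s_k> - <r_k, r_k>, so this recurrence can serve as a predictor for the squared residual norm before the corresponding inner product has been formed; the exact value is then recomputed from the updated residual.

```python
import numpy as np

def predict_recompute_cg(A, b, tol=1e-8, maxiter=1000):
    """Minimal serial sketch of a predict-and-recompute CG iteration.

    Illustrative only: in the parallel variants studied in the paper, the
    predicted quantities allow the global reductions to be overlapped with
    the matrix-vector product; here everything runs sequentially.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                 # initial residual
    p = r.copy()                  # initial search direction
    nu = r @ r                    # nu_0 = <r_0, r_0>, computed exactly
    b_norm = np.linalg.norm(b)
    for _ in range(maxiter):
        s = A @ p                 # matrix-vector product, s_k = A p_k
        mu = p @ s                # mu_k = <p_k, A p_k>
        sigma = s @ s             # sigma_k = <s_k, s_k>
        alpha = nu / mu           # step length
        x = x + alpha * p
        r = r - alpha * s
        # Predict: in exact arithmetic, r_{k+1} = r_k - alpha_k s_k gives
        # nu_{k+1} = alpha_k^2 * sigma_k - nu_k, so this value is available
        # before the inner product <r_{k+1}, r_{k+1}> has been formed.
        nu_pred = alpha**2 * sigma - nu
        beta = nu_pred / nu       # predictor drives the new search direction
        p = r + beta * p
        # Recompute: replace the prediction with the exact inner product so
        # that errors in the recurrence do not accumulate across iterations.
        nu = r @ r
        if np.sqrt(nu) <= tol * b_norm:
            break
    return x
```

For a symmetric positive definite A, this sketch produces the same iterates as textbook conjugate gradient in exact arithmetic. The point of the reordering is that, in a distributed implementation, the reduction that recomputes the exact <r, r> can be grouped with the other inner products and hidden behind the matrix-vector product, rather than forming a second synchronization point.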

Keywords

  1. Krylov subspace methods
  2. conjugate gradient
  3. parallel algorithms
  4. numerical algorithms

MSC codes

  1. 65F10
  2. 65N12
  3. 65Y05

Published In

SIAM Journal on Scientific Computing
Pages: A3084--A3108
ISSN (online): 1095-7197

History

Submitted: 23 July 2019
Accepted: 29 June 2020
Published online: 7 October 2020

Funding Information

Charles University: PRIMUS/19/SCI/11
National Science Foundation: DGE-1762114
