Abstract

Parallel implementations of linear iterative solvers generally alternate between phases of data exchange and phases of local computation. As problem sizes grow and compute architectures become more heterogeneous, it becomes very challenging both to balance load and to design low-latency network interconnects that can satisfy the communication requirements of linear solvers. In particular, global communication patterns such as inner products become increasingly limiting at scale. We explore the use of asynchronous communication based on one-sided Message Passing Interface (MPI) primitives in the context of domain decomposition solvers and present a scalable asynchronous two-level Schwarz method. We discuss practical issues encountered in the development of a scalable solver and show experimental results, obtained on a state-of-the-art supercomputer system, that illustrate the benefits of asynchronous solvers in load-balanced as well as load-imbalanced scenarios. With the novel method, we observe speedups of up to four times over its classical synchronous equivalent.

Keywords

  1. asynchronous iteration
  2. domain decomposition
  3. Schwarz methods
  4. chaotic relaxation

MSC codes

  1. 68W10
  2. 65Y05
  3. 68W15
  4. 65N55

Information & Authors

Information

Published In

SIAM Journal on Scientific Computing
Pages: C384 - C409
ISSN (online): 1095-7197

History

Submitted: 11 October 2019
Accepted: 3 August 2020
Published online: 14 December 2020

Authors

Affiliations

Sivasankaran Rajamanickam
