We present an implementation of a stage-parallel preconditioner for fully implicit Runge–Kutta methods of Radau IIA type. The preconditioner approximates the inverse of the Runge–Kutta matrix \(A_Q\) from the Butcher tableau by the lower triangular matrix resulting from an LU decomposition, and it diagonalizes the system into as many blocks as there are stages. For the transformed system, we employ a block preconditioner in which each block is distributed across, and solved in parallel by, its own subgroup of processes. To combine partial results, we use either a communication pattern resembling Cannon's algorithm or shared memory. A performance model and a large set of performance studies (including strong-scaling runs with up to 150k processes on 3k compute nodes), conducted for a time-dependent heat problem with matrix-free finite element methods, indicate that the stage-parallel implementation can reach higher throughput near the scaling limit. The achievable speedup grows linearly with the number of stages and is bounded by the number of stages. Furthermore, we show that the presented stage-parallel concepts also apply when \(A_Q\) is diagonalized directly, which requires either complex arithmetic or the solution of two-by-two blocks; both variants expose about half the parallelism. As an alternative to distributing stages across distinct processes, we discuss batching operations from different stages together.
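The algebra behind the stage-parallel preconditioner can be illustrated with a small dense NumPy sketch. It rests on assumptions that go beyond the abstract: the stage system of the heat equation is taken to have the Kronecker form \(A_Q^{-1} \otimes M + \tau I \otimes K\); the LU factorization is applied to \(A_Q^{-1}\) without pivoting and with a unit upper triangular \(U\), so that the retained factor \(L\) carries distinct diagonal entries and can be diagonalized as \(L = S \Lambda S^{-1}\); and a 1D linear finite element discretization with dense direct solves stands in for the distributed matrix-free solvers. The function names `radau_iia`, `crout_lu`, `heat_1d`, and `stage_parallel_preconditioner` are hypothetical.

```python
import numpy as np


def radau_iia(s):
    """Nodes c and Runge-Kutta matrix A_Q of the s-stage Radau IIA method."""
    # Radau IIA nodes: roots of P_s(2x-1) - P_{s-1}(2x-1) mapped to [0, 1].
    coef = np.zeros(s + 1)
    coef[s], coef[s - 1] = 1.0, -1.0
    y = np.real(np.polynomial.legendre.legroots(coef))
    c = np.sort((y + 1.0) / 2.0)
    # Collocation conditions sum_j a_ij c_j^(k-1) = c_i^k / k, k = 1..s,
    # written as A V = W with V[j, k] = c_j^k and W[i, k] = c_i^(k+1) / (k+1).
    V = np.vander(c, s, increasing=True)
    W = np.vander(c, s + 1, increasing=True)[:, 1:] / np.arange(1, s + 1)
    A = np.linalg.solve(V.T, W.T).T
    return A, c


def crout_lu(B):
    """B = L U without pivoting; U is unit upper triangular (Crout variant)."""
    n = B.shape[0]
    L, U = np.zeros((n, n)), np.eye(n)
    for j in range(n):
        for i in range(j, n):
            L[i, j] = B[i, j] - L[i, :j] @ U[:j, j]
        for k in range(j + 1, n):
            U[j, k] = (B[j, k] - L[j, :j] @ U[:j, k]) / L[j, j]
    return L, U


def heat_1d(n):
    """Mass M and stiffness K of linear elements on (0, 1), Dirichlet BCs."""
    h = 1.0 / (n + 1)
    off = np.ones(n - 1)
    K = (2.0 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)) / h
    M = h / 6.0 * (4.0 * np.eye(n) + np.diag(off, 1) + np.diag(off, -1))
    return M, K


def stage_parallel_preconditioner(A, M, K, tau):
    """Return r -> P^{-1} r for P = L (x) M + tau I (x) K, where A_Q^{-1} = L U."""
    L, _ = crout_lu(np.linalg.inv(A))
    lam, S = np.linalg.eig(L)  # L = S diag(lam) S^{-1}; lam = diag(L), real
    S_inv = np.linalg.inv(S)
    # One decoupled system per stage; these could be solved by separate
    # process subgroups (dense solves stand in for parallel solvers here).
    blocks = [lam_i * M + tau * K for lam_i in lam]

    def apply(r):
        # Transformation (S^{-1} (x) I) mixes all stages: inter-group communication.
        R = S_inv @ r.reshape(len(lam), -1)
        Y = np.stack([np.linalg.solve(B, R[i]) for i, B in enumerate(blocks)])
        # Back-transformation (S (x) I), again coupling all stages.
        return (S @ Y).ravel()

    return apply


# Usage on a small model problem with s = 3 stages.
s, n, tau = 3, 20, 0.01
A, c = radau_iia(s)
M, K = heat_1d(n)
apply_pinv = stage_parallel_preconditioner(A, M, K, tau)

L, _ = crout_lu(np.linalg.inv(A))
P = np.kron(L, M) + tau * np.kron(np.eye(s), K)
r = np.random.default_rng(0).standard_normal(s * n)
print(np.allclose(apply_pinv(r), np.linalg.solve(P, r)))  # True

# Diagonalizing A_Q^{-1} directly yields complex-conjugate eigenvalue pairs,
# so only about half as many independent (complex or 2x2) blocks remain.
ev = np.linalg.eigvals(np.linalg.inv(A))
n_real = int(np.sum(np.abs(ev.imag) < 1e-10))
n_blocks = n_real + (s - n_real) // 2
print(n_real, n_blocks)  # 1 2
```

The \(s\) solves inside `apply` are mutually independent, which is what makes it possible to assign each stage to its own process subgroup; only the small transformations with \(S^{-1}\) and \(S\) couple the stages, and in this sketch they stand in for the data exchange that the abstract attributes to a Cannon-like pattern or shared memory. The last lines show why direct diagonalization exposes less parallelism: for three stages, \(A_Q^{-1}\) has one real eigenvalue and one complex-conjugate pair, leaving two independent blocks instead of three.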


Keywords

  1. implicit Runge–Kutta methods
  2. Radau quadrature
  3. stage-parallel preconditioning
  4. finite element methods
  5. matrix-free methods
  6. geometric multigrid
  7. massively parallel

MSC codes

  1. 65Y05
  2. 65M55
  3. 68W10



Acknowledgments

The authors acknowledge discussions with Ben Southworth regarding extensions of the algorithms toward nonlinear equations and collaboration with the deal.II community.


References

R. Abu-Labdeh, S. MacLachlan, and P. E. Farrell, Monolithic multigrid for implicit Runge-Kutta discretizations of incompressible fluid flow, preprint, https://arxiv.org/abs/2202.07381, 2022.
M. Adams, M. Brezina, J. Hu, and R. Tuminaro, Parallel multigrid smoothing: Polynomial versus Gauss-Seidel, J. Comput. Phys., 188 (2003), pp. 593–610.
D. Arndt, W. Bangerth, B. Blais, M. Fehling, R. Gassmöller, T. Heister, L. Heltai, U. Köcher, M. Kronbichler, M. Maier, P. Munch, J.-P. Pelteret, S. Proell, K. Simon, B. Turcksin, D. Wells, and J. Zhang, The deal.II library, version 9.3, J. Numer. Math., 29 (2021), pp. 171–186.
D. Arndt, W. Bangerth, D. Davydov, T. Heister, L. Heltai, M. Kronbichler, M. Maier, J.-P. Pelteret, B. Turcksin, and D. Wells, The deal.II finite element library: Design, features, and insights, Comput. Math. Appl., 81 (2021), pp. 407–422.
O. Axelsson, Global integration of differential equations through Lobatto quadrature, BIT, 4 (1964), pp. 69–86.
O. Axelsson, I. Dravins, and M. Neytcheva, Stage-parallel preconditioners for implicit Runge-Kutta methods of arbitrarily high order, linear problems, to appear.
O. Axelsson and M. Neytcheva, Numerical solution methods for implicit Runge-Kutta methods of arbitrarily high order, in 21st Conference on Scientific Computing, Vysoké Tatry-Podbanské 7, Slovakia, 2020, pp. 11–20.
O. Axelsson, M. Pourbagher, and D. K. Salkuyeh, Robust iteration methods for complex systems with an indefinite matrix term, preprint, https://arxiv.org/abs/2110.00537, 2021.
T. A. Bickart, An efficient solution process for implicit Runge-Kutta methods, SIAM J. Numer. Anal., 14 (1977), pp. 1022–1027.
M. Bolten, D. Moser, and R. Speck, A multigrid perspective on the parallel full approximation scheme in space and time, Numer. Linear Algebra Appl., 24 (2017), e2110.
K. Burrage, C. Eldershaw, and R. Sidje, A parallel matrix-free implementation of a Runge-Kutta code, in Proceedings of the Joint Australian-Taiwanese Workshop on Analysis and Applications, Australian National University, Mathematical Sciences Institute, 1999, pp. 83–88.
J. C. Butcher, On the implementation of implicit Runge-Kutta methods, BIT, 16 (1976), pp. 237–240.
L. E. Cannon, A Cellular Computer to Implement the Kalman Filter Algorithm, Ph.D. thesis, Montana State University, 1969.
J. J. B. De Swart, W. M. Lioen, and W. A. Van Der Veen, Specification of PSIDE, CWI, Amsterdam, 1998.
P. E. Farrell, R. C. Kirby, and J. Marchena-Menendez, Irksome: Automating Runge-Kutta time-stepping for finite element methods, ACM Trans. Math. Software, 47 (2021), pp. 30/1–26.
M. W. Gee, C. M. Siefert, J. J. Hu, R. S. Tuminaro, and M. G. Sala, ML 5.0 Smoothed Aggregation User's Guide, Technical Report SAND2006-2649, Sandia National Laboratories, 2006.
G. Hager and G. Wellein, Introduction to High Performance Computing for Scientists and Engineers, CRC Press, Boca Raton, FL, 2010.
E. Hairer and G. Wanner, Solving Ordinary Differential Equations II, Springer Ser. Comput. Math. 14, Springer, Berlin, 1996.
K. R. Jackson and S. P. Nørsett, The potential for parallelism in Runge-Kutta methods. Part I. RK formulas in standard form, SIAM J. Numer. Anal., 32 (1995), pp. 49–82.
L. O. Jay and T. Braconnier, A parallelizable preconditioner for the iterative solution of implicit Runge-Kutta-type methods, J. Comput. Appl. Math., 111 (1999), pp. 63–76.
C. A. Kennedy, M. H. Carpenter, and R. M. Lewis, Low-storage, explicit Runge-Kutta schemes for the compressible Navier-Stokes equations, Appl. Numer. Math., 35 (2000), pp. 177–219.
T. Kolev, P. Fischer, M. Min, J. Dongarra, J. Brown, V. Dobrev, T. Warburton, S. Tomov, M. S. Shephard, A. Abdelfattah, V. Barra, N. Beams, J.-S. Camier, N. Chalmers, Y. Dudouit, A. Karakus, I. Karlin, S. Kerkemeier, Y.-H. Lan, D. Medina, E. Merzari, A. Obabko, W. Pazner, T. Rathnayake, C. W. Smith, L. Spies, K. Swirydowicz, J. Thompson, A. Tomboulides, and V. Tomov, Efficient exascale discretizations: High-order finite element methods, Int. J. High Perform. Comput. Appl., 35 (2021), pp. 527–552.
M. Kronbichler and K. Kormann, A generic interface for parallel cell-based finite element operator application, Comput. Fluids, 63 (2012), pp. 135–147.
M. Kronbichler and K. Kormann, Fast matrix-free evaluation of discontinuous Galerkin finite element operators, ACM Trans. Math. Software, 45 (2019), 29.
M. Kronbichler, D. Sashko, and P. Munch, Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations, Int. J. High Perform. Comput. Appl. (2022).
J. L. Lions, Y. Maday, and G. Turinici, A "parareal" in time discretization of PDEs, C. R. Acad. Sci. Ser. I Math., 332 (2001), pp. 661–668.
K.-A. Mardal, T. K. Nilssen, and G. A. Staff, Order-optimal preconditioners for implicit Runge-Kutta schemes applied to parabolic PDEs, SIAM J. Sci. Comput., 29 (2007), pp. 361–375.
M. Masud Rana, V. E. Howle, K. Long, A. Meek, and W. Milestone, A new block preconditioner for implicit Runge-Kutta methods for parabolic PDE problems, SIAM J. Sci. Comput., 43 (2021), pp. S475–S495.
P. Munch, T. Heister, L. Prieto Saavedra, and M. Kronbichler, Efficient distributed matrix-free multigrid methods on locally refined meshes for FEM computations, ACM Trans. Parallel Comput., https://arxiv.org/abs/2203.12292, 2023, https://doi.org/10.1145/3580314.
P. Munch, K. Kormann, and M. Kronbichler, hyper.deal: An efficient, matrix-free finite-element library for high-dimensional partial differential equations, ACM Trans. Math. Software, 47 (2021), pp. 33/1–34.
W. Pazner and P.-O. Persson, Stage-parallel fully implicit Runge-Kutta solvers for discontinuous Galerkin fluid simulations, J. Comput. Phys., 335 (2017), pp. 700–717.
B. S. Southworth, O. Krzysik, and W. Pazner, Fast solution of fully implicit Runge-Kutta and discontinuous Galerkin in time for numerical PDEs, part II: Nonlinearities and DAEs, SIAM J. Sci. Comput., 44 (2022), pp. A636–A663.
B. S. Southworth, O. Krzysik, W. Pazner, and H. De Sterck, Fast solution of fully implicit Runge-Kutta and discontinuous Galerkin in time for numerical PDEs, part I: The linear setting, SIAM J. Sci. Comput., 44 (2022), pp. A416–A443.
G. A. Staff, K.-A. Mardal, and T. K. Nilssen, Preconditioning of fully implicit Runge-Kutta schemes for parabolic PDEs, MIC J., 27 (2006), pp. 109–123.
R. A. Van De Geijn and J. Watts, SUMMA: Scalable universal matrix multiplication algorithm, Concurrency-Pract. Ex., 9 (1997), pp. 255–274.

Information & Authors


Published In

SIAM Journal on Scientific Computing
Pages: S71–S96
ISSN (online): 1095-7197


Submitted: 15 June 2022
Accepted: 20 December 2022
Published online: 18 July 2023



Dedicated to the memory of Owe Axelsson



Corresponding author. Helmholtz-Zentrum Hereon, Geesthacht, 21502, and High-Performance Scientific Computing, University of Augsburg, Augsburg, 86159, Germany.
Department of Information Technology, Uppsala University, Uppsala, SE-75105, Sweden.
High-Performance Scientific Computing, University of Augsburg, Augsburg, 86159, Germany, and Uppsala University, Uppsala, SE-75105, Sweden.
Department of Information Technology, Uppsala University, Uppsala, SE-75105, Sweden.

Funding Information

Gauss Centre for Supercomputing: pr83te
Funding: This work was supported by the Bayerisches Kompetenznetzwerk für Technisch-Wissenschaftliches Hoch- und Höchstleistungsrechnen (KONWIHR) through the project “High-order matrix-free finite element implementations with hybrid parallelization and improved data locality.” The Gauss Centre for Supercomputing e.V. (https://www.gauss-centre.eu) funded this project by providing computing time on the GCS Supercomputer SuperMUC-NG at Leibniz Supercomputing Centre (LRZ, https://www.lrz.de) through project id pr83te. The work of the second author (fully) and the fourth author (partly) was supported by research grant VR-2017-03749, financed by the Swedish Research Council.
