Software and High-Performance Computing

Performance and Scalability of Hierarchical Hybrid Multigrid Solvers for Stokes Systems


In many applications involving incompressible fluid flow, the Stokes system plays an important role. Complex flow problems may require extremely fine resolutions, easily resulting in saddle-point problems with more than a trillion ($10^{12}$) unknowns. Even on the most advanced supercomputers, the fast solution of such systems of equations is a highly nontrivial and challenging task. In this work we consider a realization of an iterative saddle-point solver which is based mathematically on the Schur-complement formulation of the pressure and algorithmically on the abstract concept of hierarchical hybrid grids. The design of our fast multigrid solver is guided by an innovative performance analysis for the computational kernels in combination with a quantification of the communication overhead. Excellent node performance and good scalability to almost a million parallel threads are demonstrated on different characteristic types of modern supercomputers.
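
The Schur-complement formulation of the pressure mentioned above can be sketched as follows. This is a generic illustration and not necessarily the exact formulation used in the article: the block names $A$ (discrete velocity operator), $B$ (discrete divergence), $C$ (an optional symmetric positive semidefinite stabilization block, zero for stable element pairs), and the right-hand sides $f$, $g$ are assumed notation.

```latex
% Discrete Stokes saddle-point system (assumed notation, A invertible):
\begin{pmatrix} A & B^{T} \\ B & -C \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix}.

% Eliminate the velocity using the first block row:
u = A^{-1}\bigl(f - B^{T} p\bigr).

% Substitute into the second block row, B u - C p = g:
\bigl(B A^{-1} B^{T} + C\bigr)\, p = B A^{-1} f - g,
\qquad S := B A^{-1} B^{T} + C .
```

In such a scheme an outer iteration acts on the pressure system with $S$, and each application of $A^{-1}$ is typically approximated by an inner solver such as multigrid.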


Keywords

  1. hierarchical hybrid grids
  2. multigrid methods
  3. parallel solver
  4. node performance
  5. Stokes system

MSC codes

  1. 65C
  2. 65N
  3. 68W




Information & Authors


Published In

SIAM Journal on Scientific Computing
Pages: C143 - C168
ISSN (online): 1095-7197


Submitted: 15 October 2013
Accepted: 10 November 2014
Published online: 18 March 2015




