Computational Methods in Science and Engineering

Stencil Scaling for Vector-Valued PDEs on Hybrid Grids With Applications to Generalized Newtonian Fluids

Abstract

Matrix-free finite element implementations for large applications provide an attractive alternative to standard sparse matrix data formats due to their significantly reduced memory consumption. Here, we show that they are also competitive with respect to run-time in the low-order case if combined with suitable stencil scaling techniques. We focus on variable coefficient vector-valued partial differential equations as they arise in many physical applications. The presented method is based on scaling constant reference stencils originating from a linear finite element discretization instead of evaluating the bilinear forms on the fly. This method assumes the use of hierarchical hybrid grids, and it may be applied to vector-valued second-order elliptic partial differential equations directly or as part of more complicated problems. We provide theoretical and experimental performance estimates showing the advantages of this new approach compared to the traditional on-the-fly integration and stored matrix approaches. In our numerical experiments, we consider two specific mathematical models, namely, linear elastostatics and incompressible Stokes flow. The final example considers a nonlinear shear-thinning generalized Newtonian fluid. For this type of nonlinearity, we present an efficient approach for computing a regularized strain rate, which is then used to define the nodewise viscosity. Depending on the compute architecture, we observed maximum speedups of 64% and 122% compared to on-the-fly integration. The largest considered example involved solving a Stokes problem with 12288 compute cores on the state-of-the-art supercomputer SuperMUC-NG.
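
To illustrate the idea of scaling a constant reference stencil, the following is a minimal one-dimensional sketch; it is not the vector-valued method on hierarchical hybrid grids described above. It assumes a scalar model problem $-(k u')' = f$ on a uniform mesh with linear elements. It also assumes a scaling rule in which each off-diagonal reference entry is multiplied by the arithmetic mean of the two nodal coefficient values, with the diagonal chosen so that the row sum vanishes. The function names `apply_scaled_stencil` and `apply_on_the_fly` are hypothetical. In this 1D setting with a linearly interpolated coefficient the two variants coincide, which should not be assumed to carry over to higher dimensions.

```python
import numpy as np


def apply_scaled_stencil(k, u, h):
    """Apply the discrete operator for -(k u')' at interior nodes by
    scaling a constant reference stencil (assumed scaling rule)."""
    # Off-diagonal entry of the constant-coefficient linear FE stencil.
    ref_off = -1.0 / h
    # Scale each off-diagonal entry by the mean of the two nodal coefficients.
    w_left = 0.5 * (k[1:-1] + k[:-2]) * ref_off
    w_right = 0.5 * (k[1:-1] + k[2:]) * ref_off
    # Enforce a zero row sum to obtain the diagonal entry.
    w_center = -(w_left + w_right)
    return w_left * u[:-2] + w_center * u[1:-1] + w_right * u[2:]


def apply_on_the_fly(k, u, h):
    """Comparison variant: integrate element by element, with k
    interpolated linearly between its nodal values."""
    out = np.zeros(len(u))
    for e in range(len(u) - 1):
        # Exact integral mean of the linear coefficient over element e.
        k_mean = 0.5 * (k[e] + k[e + 1])
        local = (k_mean / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        out[e:e + 2] += local @ u[e:e + 2]
    # Return interior rows only, matching apply_scaled_stencil.
    return out[1:-1]


if __name__ == "__main__":
    n, h = 9, 1.0 / 8.0
    rng = np.random.default_rng(0)
    k = 1.0 + rng.random(n)  # positive nodal coefficient values
    u = rng.random(n)
    diff = apply_scaled_stencil(k, u, h) - apply_on_the_fly(k, u, h)
    print("max difference:", np.max(np.abs(diff)))
```

The scaled variant reads only the nodal coefficient values and one constant reference entry, whereas the comparison variant rebuilds a local element matrix for every element.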

Keywords

  1. matrix-free
  2. finite elements
  3. variable coefficients
  4. stencil scaling

MSC codes

  1. 65N30
  2. 65N55
  3. 65Y05
  4. 65Y20

Information

Published In

SIAM Journal on Scientific Computing
Pages: B1429 - B1461
ISSN (online): 1095-7197

History

Submitted: 13 June 2019
Accepted: 27 August 2020
Published online: 1 December 2020

Funding Information

Deutsche Forschungsgemeinschaft https://doi.org/10.13039/501100001659 : W0671/11-1
