Abstract

A form compiler takes a high-level description of the weak form of partial differential equations and produces low-level code that carries out the finite element assembly. In this paper we present the Two-Stage Form Compiler (TSFC), a new form compiler whose main design goal is to maintain the structure of the input expression for as long as possible. This facilitates the application of optimizations at the highest possible level of abstraction. TSFC features a novel, structure-preserving method for separating the contributions of a form to the subblocks of the local tensor in discontinuous Galerkin problems. This enables us to preserve the tensor structure of expressions longer through the compilation process than is possible with other form compilers. Structure preservation is also achieved in part by a two-stage approach that cleanly separates the lowering of finite element constructs to tensor algebra in the first stage from the scheduling of those tensor operations in the second stage. TSFC also traverses complicated expressions efficiently, and experimental evaluation demonstrates good compile-time performance even for highly complex forms.
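The two-stage separation described in the abstract can be illustrated with a toy sketch for the mass form on a single P1 reference triangle. This is not TSFC's implementation or API: the function names `lower_mass_form`, `schedule`, and `local_mass_matrix`, the dictionary-based tensor expression, and the choice of quadrature rule are all assumptions made for illustration, and the second stage here evaluates the loop nest directly instead of emitting low-level code.

```python
from itertools import product

# Toy illustration only; none of these names come from TSFC.

# Midpoint quadrature rule on the reference triangle (exact for degree 2).
POINTS = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
WEIGHTS = [1.0 / 6.0] * 3

# P1 basis functions on the reference triangle.
BASIS = [lambda x, y: 1.0 - x - y, lambda x, y: x, lambda x, y: y]


def lower_mass_form():
    """Stage 1 (hypothetical): lower the form u*v*dx to tensor algebra.

    Returns the operand tensors and an index-level description of
    A[i, j] = sum_q W[q] * Phi[q, i] * Phi[q, j].
    """
    phi = [[b(x, y) for b in BASIS] for (x, y) in POINTS]
    operands = {"W": WEIGHTS, "Phi": phi}
    expression = {
        "free": ("i", "j"),
        "summed": ("q",),
        "factors": (("W", ("q",)), ("Phi", ("q", "i")), ("Phi", ("q", "j"))),
    }
    return operands, expression


def schedule(operands, expression, extents):
    """Stage 2 (hypothetical): realise the index expression as a loop nest."""
    free, summed = expression["free"], expression["summed"]
    result = {}
    # Outer loops run over the free indices of the local tensor.
    for free_vals in product(*(range(extents[n]) for n in free)):
        total = 0.0
        # Inner loops run over the summed (quadrature) indices.
        for sum_vals in product(*(range(extents[n]) for n in summed)):
            env = dict(zip(free + summed, free_vals + sum_vals))
            term = 1.0
            for name, indices in expression["factors"]:
                value = operands[name]
                for idx in indices:
                    value = value[env[idx]]
                term *= value
            total += term
        result[free_vals] = total
    return result


def local_mass_matrix():
    """Run both stages and return the 3x3 local tensor as nested lists."""
    operands, expression = lower_mass_form()
    table = schedule(operands, expression, {"i": 3, "j": 3, "q": 3})
    return [[table[(i, j)] for j in range(3)] for i in range(3)]
```

In this sketch the first stage knows about basis functions and quadrature but nothing about loops, while the second stage sees only named indices and operand tensors. Any optimization that reorders or factorizes the index expression could, under these assumptions, be applied between the two stages without touching finite element details.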

Keywords

  1. code generation
  2. finite element method
  3. form compiler
  4. tensor algebra
  5. weak form

MSC codes

  1. 68N20
  2. 65M60
  3. 65N30

Supplementary Material


PLEASE NOTE: These supplementary files have not been peer-reviewed.


Index of Supplementary Materials

Title of paper: A Structure-Preserving Form Compiler

Authors: Miklos Homolya, Lawrence Mitchell, Fabio Luporini, David A. Ham

File: supplement.pdf

Type: PDF

Contents: Additional figures and code listings, as well as full description of the test cases used for the experimental evaluation.

Published In

SIAM Journal on Scientific Computing
Pages: C401 - C428
ISSN (online): 1095-7197

History

Submitted: 16 May 2017
Accepted: 6 April 2018
Published online: 26 June 2018

Funding Information

Grantham Foundation for the Protection of the Environment https://doi.org/10.13039/100008118
Imperial College London https://doi.org/10.13039/501100000761
Engineering and Physical Sciences Research Council https://doi.org/10.13039/501100000266 : EP/M011054/1
Natural Environment Research Council https://doi.org/10.13039/501100000270 : NE/K008951/1
