Abstract

A priori bounds are derived for the discrete solution of second-order elliptic partial differential equations (PDEs). The bounds have two contributions. First, the influence of boundary conditions is taken into account through a discrete maximum principle. Second, the contribution of the source field is evaluated in a fashion similar to that used in the treatment of the continuous a priori operators. In particular, closed-form expressions are obtained for a conservative, second-order finite-difference approximation of the diffusion equation with variable scalar diffusivity. The bounds are then incorporated into a resilient domain decomposition framework in order to verify the admissibility of local PDE solutions. The computations demonstrate that the bounds detect most system faults and thus considerably enhance the resilience and overall performance of the solver.
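
To illustrate the kind of admissibility check the abstract describes, the following is a minimal sketch for a simplified setting: a 1D diffusion problem with zero source, discretized with a conservative three-point stencil. In that case the discrete maximum principle confines every interior value to the range of the two boundary values. The function names, the test diffusivity, the tolerance, and the injected fault are all illustrative assumptions; this is not the paper's closed-form bound, which also accounts for the source field.

```python
import math


def solve_diffusion_1d(kappa_faces, u_left, u_right):
    """Solve -(kappa u')' = 0 on a uniform grid with Dirichlet ends.

    kappa_faces[i] is the (positive) diffusivity on the face between
    nodes i and i + 1, so n interior nodes require n + 1 face values.
    With zero source the grid spacing cancels out of the equations.
    """
    n = len(kappa_faces) - 1
    # Conservative stencil: -k_w*u[i-1] + (k_w + k_e)*u[i] - k_e*u[i+1] = 0
    lower = [-kappa_faces[i] for i in range(n)]
    diag = [kappa_faces[i] + kappa_faces[i + 1] for i in range(n)]
    upper = [-kappa_faces[i + 1] for i in range(n)]
    rhs = [0.0] * n
    rhs[0] += kappa_faces[0] * u_left      # left Dirichlet value
    rhs[-1] += kappa_faces[n] * u_right    # right Dirichlet value
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return u


def is_admissible(u, u_left, u_right, tol=1e-9):
    """Check the zero-source maximum-principle bound on all interior values.

    NaN entries fail the comparison and are therefore flagged as well.
    """
    lo, hi = min(u_left, u_right), max(u_left, u_right)
    return all(lo - tol <= v <= hi + tol for v in u)


n = 20
# Hypothetical variable diffusivity, bounded below by 0.5.
kappa = [1.0 + 0.5 * math.sin(3.0 * i) for i in range(n + 1)]
u = solve_diffusion_1d(kappa, 2.0, 5.0)

corrupted = list(u)
corrupted[3] = 1.0e6  # emulate a silent data corruption in one entry

print(is_admissible(u, 2.0, 5.0))
print(is_admissible(corrupted, 2.0, 5.0))
```

A check of this kind only requires the boundary data and one pass over the local solution, which is why it can be applied cheaply to each subdomain solve. A fault that leaves the corrupted values inside the admissible range would go unnoticed, consistent with the abstract's statement that most, rather than all, faults are detected.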

Keywords

  1. elliptic PDE
  2. maximum principle
  3. discrete bounds
  4. resilience
  5. exascale computing
  6. domain decomposition

MSC codes

  1. 35J15


Published In

SIAM Journal on Scientific Computing
Pages: C1 - C28
ISSN (online): 1095-7197

History

Submitted: 9 December 2015
Accepted: 19 October 2016
Published online: 5 January 2017

Funding Information

U.S. Department of Energy (https://doi.org/10.13039/100000015): DE-SC0010540
