An Arbitrary Precision Scaling and Squaring Algorithm for the Matrix Exponential

The most popular algorithms for computing the matrix exponential are those based on the scaling and squaring technique. For optimal efficiency these are usually tuned to a particular precision of floating-point arithmetic. We design a new scaling and squaring algorithm that takes the unit roundoff of the arithmetic as input and chooses the algorithmic parameters in order to keep the forward error in the underlying Padé approximation below the unit roundoff. To do so, we derive an explicit expression for all the coefficients in an error expansion for Padé approximants to the exponential and use it to obtain a new bound for the truncation error. We also derive a new technique for selecting the internal parameters used by the algorithm, which at each step decides whether to scale or to increase the degree of the approximant. The algorithm can employ diagonal Padé approximants or Taylor approximants and can be used with a Schur decomposition or in transformation-free form. Our numerical experiments show that the new algorithm performs in a forward stable way for a wide range of precisions and that the most accurate of our implementations, the Taylor-based transformation-free variant, is superior to existing alternatives.

1. Introduction. The exponential of a matrix has been the subject of much research in the 150 years or so since Laguerre first defined it [28], thanks to its many applications and in particular its central role in the solution of differential equations. Several equivalent definitions of this matrix function exist [18, Table 10.1], of which perhaps the most well known is the representation via its Taylor series expansion: the exponential of A P C nˆn is the matrix Since the analogous series expansion of e z , for z P C, has an infinite radius of convergence, the power series (1.1) is convergent for any A P C nˆn [18,Thm. 4.6], and truncating it to the first few terms gives a crude algorithm for approximating e A . This method is known to be unsatisfactory-so much so that Moler and Van Loan [33], [34] take it as a lower bound on the performance of any algorithm for computing the matrix exponential. The most popular method for computing the exponential of a matrix is the scaling and squaring algorithm paired with Padé approximation. This technique, originally proposed by Lawson [29], and further developed and analyzed by various authors over the past half-century, proves remarkably reliable in finite precision arithmetic, but its numerical stability is not fully understood. The method owes its name to the identity where r km pzq is the rk{ms Padé approximant to e z at 0 and the nonnegative integers k, m, and s are chosen so that r km p2´sAq achieves a prescribed accuracy while minimizing the computational cost of the algorithm. In practice, diagonal approximants r m :" r mm are the most common choice, as symmetries in the coefficients of the numerator and denominator enable an efficient evaluation of r m pAq.
In recent years there has been a sharp rise of interest in multiprecision computation, and the number of programming languages that support arbitrary precision floating-point arithmetic, either natively or through dedicated libraries, is growing. In many cases, a wide range of arbitrary precision linear algebra kernels is available. Numerical routines for the evaluation of matrix functions are also sometimes provided, as we now explain.
The computer algebra systems Maple [30] and Mathematica [31] offer functions that can evaluate in arbitrary precision real matrix powers, the matrix logarithm, the matrix exponential, and a function that computes f pAq given a scalar function f and a square matrix A. The open source computer algebra system Sage [39], [44] supports arbitrary precision floating-point arithmetic, but does not implement any algorithms for the evaluation of matrix functions.
Turning to software focused on floating-point arithmetic, the mpmath library [26] for Python provides functions for evaluating in arbitrary precision a wide range of matrix functions, including real powers, exponential, logarithm, sine, and cosine. MATLAB does not support arbitrary precision floating-point arithmetic natively, but arbitrary precision floating-point data types are provided by the Symbolic Math Toolbox [41] and the Multiprecision Computing Toolbox [35]. Both toolboxes implement algorithms for the matrix square root, the exponential, the logarithm, and general matrix functions, and the Multiprecision Computing Toolbox also includes the hyperbolic and trigonometric sine and cosine of a matrix. Finally, the Julia language [6] supports multiprecision floating-point numbers by means of the built-in data type BigFloat, which provides only a few basic linear algebra kernels for arbitrary precision computation, and the ArbFloats package, a wrapper to the C library Arb [25] for arbitrary-precision ball arithmetic, which is capable of computing the matrix square root and exponential.
The algorithms underlying the functions described above are not publicly available, to our knowledge. Nor are details of the implementations (albeit embodied in the source code of the open source packages), which in some cases may involve symbolic arithmetic.
The MATLAB function expm is a careful implementation of the algorithm of Al-Mohy and Higham [2], which relies on diagonal Padé approximants and exploits precomputed constants θ m that specify how small the 1-norm of certain powers of a matrix A must be in order for r m pAq to provide an accurate approximation to e A in IEEE double precision arithmetic. These constants are obtained by combining a floating-point backward error analysis with a mix of symbolic and high precision computations, and, at the price of a computationally expensive algorithm design stage, provide a very efficient algorithm. For arbitrary precision computations, however, a new approach is required, since this procedure, despite being in principle repeatable for any given precision, is impractical to carry out when the accuracy at which the function should be evaluated is known only at runtime and should hence be treated as an input parameter to the algorithm.
The only published algorithm that we are aware of for computing the matrix exponential in arbitrary precision is that of Caliari and Zivcovich [7], which employs a scaling and squaring algorithm based upon Taylor approximation. It includes a new shifting technique less prone to overflow than the classic approach in [18, sect. 10.7.3] and a novel way to compute at runtime a bound on the backward error of the truncated Taylor series. The underlying backward error analysis relies on an explicit series expansion for the backward error of truncated Taylor series approximants [40] that does not readily extend to general Padé approximants, and the technique used to bound the error relies on a conjecture on the decay rate of the terms of this series expansion.
The goal of this work is to develop an algorithm for evaluating the exponential of a matrix in arbitrary precision floating-point arithmetic that can be used with diagonal Padé approximants or Taylor approximants and is fully rigorous. We wish to avoid symbolic computation and we are particularly interested in precisions higher than double. The algorithms we develop work in lower precision arithmetic as well, but they can suffer from overflow or underflow when formats with limited range, such as IEEE half precision, are used.
The techniques discussed here, together with those in [12], provide algorithms for evaluating in arbitrary precision most matrix functions that appear in applications. In particular, the inverse scaling and squaring algorithms for the matrix logarithm developed in [12] can be adapted in a straightforward way to the evaluation of fractional powers of a matrix [21], [22], whereas the algorithms proposed here can be used to compute trigonometric [1], [4], [15], [18,Chap. 12], [20], [23] and hyperbolic functions [1], [8], [20], and their inverses [5], by relying on functional identities involving only the matrix exponential.
The broad need for arbitrary precision matrix functions is clear from their inclusion in the software mentioned above. The need to compute the matrix exponential to high precision is needed, for example, in order to compute accurate solutions to the burnup equations in nuclear engineering [38]. Our particular interest in the matrix exponential stems not only from its many applications but also from algorithm development. Estimating the forward error of algorithms for matrix functions requires a reference solution computed in higher precision, and an arbitrary precision algorithm for the matrix exponential can be used both for the exponential and for other types of functions as mentioned above. Furthermore, such an algorithm allows us to estimate the backward error of algorithms for evaluating matrix functions defined implicitly by equations involving the exponential, such as the logarithm [3], [12], the Lambert W function [13], and inverse trigonometric and hyperbolic functions [5].
After giving some notation and background material in the next section, we derive in section 3 a new bound on the forward error of Padé approximants to the matrix exponential. We also make a conjecture that, if true, would lead to a more cheaply computable error bound. In section 4 we develop a novel algorithm for evaluating the exponential of a matrix in arbitrary precision. In section 5 we test experimentally several versions of this algorithm and compare their performance with that of existing algorithms. In section 6 we summarize our findings and discuss future lines of research.
2. Notation and background. We denote by R`" tx P R : x ě 0u the set of nonnegative real numbers, by N the set of nonnegative integers, and by }¨} any consistent matrix norm. The spectrum of a square matrix A P C nˆn is denoted by σpAq, its spectral radius by ρpAq " maxt|λ| : λ P σpAqu, and the unit roundoff of floating-point arithmetic by u. Given f : C nˆn Ñ C nˆn and A P C nˆn , we measure the sensitivity of f pAq by means of the relative condition number which is given explicitly by [18,Thm. 3.1] where D f : C nˆn Ñ C nˆn is the Fréchet derivative of f at A, which is the unique linear map that, for all E P C nˆn , satisfies f pA`Eq " f pAq`D f pAqrEs`op}E}q.
3. Padé approximation of the matrix exponential. The state of the art scaling and squaring algorithm for the matrix exponential relies on a bound on the relative backward error of Padé approximation in order to select suitable algorithmic parameters [2]. This approach requires an expensive precision-dependent design step that is unpractical to carry out when the precision at which the computation will be performed is known only at runtime. For that reason, we prefer to use a bound on the forward (truncation) error of the Padé approximants to the exponential that is cheap to evaluate at runtime.
Let f be a complex function analytic at 0, and let k, m P N. The rational function r km pzq " p km pzq{q km pzq is the rk{ms Padé approximant of f at 0 if p km pzq and q km pzq are polynomials of degree at most k and m, respectively, the denominator is normalized so that q km p0q " 1, and f pzq´r km pzq " Opz k`m`1 q.
The numerator and denominator of the rk{ms Padé approximant to the exponential at 0 are [14, Thm. 5.9.1] In our algorithm, we will approximate e A by means of the rational matrix function r km pAq " q km pAq´1p km pAq, which we evaluate by first computing P " p km pAq and Q " q km pAq and then solving a multiple right-hand side linear system in order to obtain X :" Q´1P . The computational efficiency of this method depends entirely on the evaluation scheme chosen to compute P and Q. In the literature, the customary choice is the Paterson-Stockmeyer method [37], which we now briefly recall.
If we use Horner's method with (3.2), then the number of matrix multiplications required to evaluate ppXq is which is approximately minimized by taking either ν " t ? ku or ν " r ? ks. Therefore evaluating r km pXq requires, in general, rounded to the nearest integer. This cost can be considerably reduced for diagonal Padé approximants (for which k " m) by exploiting the identity p m pXq " q m p´Xq, where p m :" p mm and q m :" q mm . By rewriting the numerator as where U o and U e are the sums of the monomials with even and odd powers, respectively, we obtain that q m pAq " U e´Uo . By using ν stages of the Paterson-Stockmeyer method on A 2 , computing U e and U o requires one matrix product to form A 2 and ν´1 matrix multiplications to compute the first ν powers of A 2 ; evaluating the polynomials U e and V require X tm{2u{ν \´η pν, tm{2uq and X tpm´1q{2u{ν \´η pν, tpm´1q{2uq matrix multiplications, respectively; and computing U o requires one additional multiplication by A. Therefore evaluating both p m pAq and q m pAq requires C e ν pmq " ν`1`Z In principle, when designing an algorithm based on Padé approximation, one could use approximants of any order but, for any given cost, it is worth considering only the approximant that will deliver the most accurate result. By definition, this will be that of highest order, thus if evaluating the approximant of order m requires Cpmq matrix multiplications, an algorithm will typically examine only approximants of optimal order m 1 pζq " maxt m : Cpmq " ζ u, for some ζ P N. For truncated Taylor series and diagonal Padé approximants to the exponential, the sequences of optimal orders are [11, eqs. (2.7) and (4.6)] respectively. Note that, for diagonal Padé approximants to the exponential, all optimal orders but b 1 are odd. For a thorough discussion of the effect of rounding errors on the evaluation of matrix polynomials using the scheme (3.2) see [18, 3.1. Forward error. In this section we present a new upper bound on the norm of the forward error of r km pAq as an approximation to e A . In section 4, this bound will play a central role in the design of a scaling and squaring algorithm for computing the matrix exponential in arbitrary precision. The leading term of the truncation error of the rk{ms Padé approximant is known [14, Thm. 5.9.1], since We begin by obtaining all the terms in the series expansion of q km pzqe z´p km pzq, which is closely related to the truncation error in (3.7).
Lemma 3.1. Let r km pzq " p km pzq{q km pzq be the rk{ms Padé approximant to e z at 0. Then for all z P C, (3.9) q km pzqe z´p km pzq " Proof. By equating the coefficients of z k`m`i on the left-and right-hand sides of (3.9), we obtain that c i k,m is the sum, for j from 0 to m, of the jth coefficient of q km pzq multiplied by the pk`m`i´jqth coefficient of the series expansion of e z : We prove (3.10) by induction on m. For m " 1 we have By exploiting the identity`a`1 b˘"`a b˘``a b´1˘, for the inductive step we have This result can be exploited to bound the truncation error of r km pAq. We will use the result in [2, Thm. 4.2(a)]: if f pxq " ř 8 i" c i x i and c i has the same sign for all i ě , then for any X such that is less than the radius of convergence of the series, we have An alternative definition of α d pXq has been recently proposed for the computation of the wave-kernel matrix functions [36]. This more refined strategy requires the computation of }A di } 1 for all d i , d j such that gcdpd i , d j q " 1 and d i d j´di´dj ă k`m. The cost of finding all such pairs is difficult to determine, but it must be at least O`pk`mq logpk`mq˘operations, as there are at least O`pk`mq 2˘p airs to test and Euclid's algorithm for finding the greatest common divisor of two integers a, b P N such that a ă b requires 2 log 2 a`1 operations in the worst case. Moreover, even if all the pairs to be tested were known, the cost of evaluating the bound would increase with k and m, and as both can be potentially large when resorting to high precision, we prefer to use the cheaper bound given by (3.11). However, all the results in this section can be modified by replacing α d pXq with .
For the truncation error of the rk{ms Padé approximant to the matrix exponential, we would like to obtain a bound of the form (3.13) }e X´r km pXq} ď |e α d pXq´r km pα d pXqq|, which would be true if all the nonzero terms in the series expansion at 0 of e z´r km pzq had the same sign. By Lemma 3.1, this would be true if q km pzq´1 had a power series expansion with all coefficients of the same sign. This applies to Taylor approximants t k :" r k0 , since q km pzq´1 " 1, and by (3.12) we can derive the bound For all other even values of m that we have checked, this is not the case. For example, the series expansion for the reciprocal of the denominator of the r2{2s Padé approximant is However, for the algorithm we are designing we are interested only in bounding the forward error of diagonal approximants of optimal degree, and from (3.6) we know that most optimal degrees are, in fact, odd. The evidence suggests that, in this case, the coefficients of the series expansion are indeed one-signed, and we make a conjecture. If this conjecture were true, then we could use the bound (3.13) for all diagonal approximants of degree b i in (3.6) for i P Nz t1u, but since we do not have a proof of the conjecture we will bound the truncation error of r km for any k and m. We will use the next result, which combines Lemma 3.1 and (3.12). Corollary 3.3 (bound on the truncation error of Padé approximants). Let r km pzq " p km pzq{q km pzq be the rk{ms Padé approximant to e z at 0. Then for X P C nˆn and any positive integer d such that dpd´1q ď k`m`1, where for the last inequality we used the fact that the coefficients of the series expansion of q km pzqe z´p km pzq all have the same sign, by Lemma 3.1. Since the norm of q´1 km pXq does not depend on α d pXq, the bound in (3.14) is nondecreasing in α d pXq and therefore is minimized by choosing for d the value Depending on the size of k and m, this choice might require the estimation of α d pXq for too many values of d, and thus be unpractical. On the other hand, it has been observed [2] that the sequence pα d pXqq dPN is typically roughly decreasing, so it is reasonable to use the considerably cheaper approximation α d rk{ms pXq. In our algorithm, we adopt an intermediate approach that has the same cost as the computation of α d rk{ms pXq, but improves on it by reusing previously computed quantities. We discuss this in detail in section 4.

4.
A multiprecision scaling and squaring algorithm. In this section we develop a novel scaling and squaring method for computing the matrix exponential in arbitrary precision floating-point arithmetic. Our algorithm differs from traditional scaling and squaring approaches, such as those of Al-Mohy and Higham [2] and Caliari and Zivcovich [7], in several respects. First, it relies on a bound on the forward error rather than on the backward error of the Padé approximants to the matrix exponential and avoids the use of any precomputed precision-dependent constants by evaluating at runtime the bound (3.14) for some choice of d. Moreover, unlike scaling and squaring algorithms for double precision based on diagonal Padé approximants [2], [17], [18,Alg. 10.20], [19], which use approximants of order at most 13 and a nonzero scaling parameter only if the approximant of highest degree is expected not to deliver either a truncation error smaller than u or an accurate evaluation of r 13 pAq, our algorithm blends the two phases together and tries to determine both parameters at the same time.
Our arbitrary precision scaling and squaring algorithm for the computation of e A is given in Algorithm 4.1. Besides the matrix A P C nˆn , the algorithm accepts several additional input arguments.
‚ The arbitrary precision floating point parameter u P R`specifies the unit roundoff of the working precision of the algorithm. ‚ The Boolean parameter use taylor specifies the kind of Padé approximants the algorithm will use: truncated Taylor series if set to true, diagonal Padé approximants otherwise. ‚ The parameter u bnd P R`specifies the unit roundoff of the precision used to evaluate }q mi p2´sAq´1} 1 in (4.1) below. The value of u bnd is ignored if use taylor is set to true. ‚ The vector m P N N , sorted in ascending order, specifies what orders of Padé approximants the algorithm can consider. The algorithm will select i between 1 and N , and then evaluate either the truncated Taylor series of order m i or the rm i {m i s Padé approximant, depending on the value of use taylor. ‚ The nonnegative integer s max specifies the maximum number of binary powerings the algorithm is allowed to compute during the squaring stage, or, equivalently, the maximum number of times the matrix can be multiplied by 1 2 during the initial scaling stage. ‚ The function parameter ζ : R`Ñ R`specifies a precision-dependent value that is used to predict whether the evaluation of q mi pAq will be accurate or not. This parameter is not used when use taylor is set to true. We now discuss the outline of Algorithm 4.1. The variables A, γ, and α min are assumed to be available within the following code fragments (that is, their scope is global). The Boolean variable use taylor chooses between the auxiliary functions in Fragments 4.3 and 4.5, tailored to the case of diagonal Padé approximants and truncated Taylor series, respectively. Finally, we use the notation rx, x, . . . s, to denote a vector whose elements are all initialized to x and whose length is unimportant. We Algorithm 4.1: Scaling and squaring algorithm for the matrix exponential.
Given A P C nˆn , this algorithm computes an approximation to e A in floatingpoint arithmetic with unit roundoff u using a scaling and squaring method based upon Padé approximants. The pseudocode of evalBoundDiag and evalPadeDiag is given in Fragment 4.3, that of evalBoundTayl and evalPadeTayl in Fragment 4.5, and that of recompDiags in Fragment 4.2.
assume that very few of its entries will take a value different from the default, and thus that such a vector can be stored in a memory-efficient way.
The algorithm starts by determining a suitable order and a scaling parameter s for the scaling and squaring method. To this end, on line 10-11 it sets s and i to 0 and then increments them, trying to find a choice for which the right-hand side of (3.14), for X " 2´sA, k " m i , and m " k or m " 0, is smaller than uψp2´sAq, where ψpXq approximates › › e X › › 1 . As long as this condition is not satisfied, two heuristics are used to decide which parameter is more convenient to change. One approach is aimed at 1 function recompDiagspA P C nˆn , X P C nˆn q Ź Compute main diagonal and first upper-diagonal of X « e A from A.
keeping the evaluation of the Padé approximant as accurate as possible, by taking into account the conditioning of q mi ; being specific to diagonal approximants this is discussed in section 4.1. On the other hand, we noticed that when α d p2´sAq " 1, the bound (3.14) can sometimes decrease exceedingly slowly as m increases, leading to the use of an approximant of degree much larger than needed, which in turn causes loss of accuracy and unnecessary computation. We found that monitoring the rate at which our bound on the truncation error of the approximant decreases provides an effective strategy to prevent this from happening. In particular, we increment s when the bound on the truncation error does not decrease at least quadratically, that is, when δ old ă δ 2 , where δ old and δ are the values of the error bound at the previous and current iteration of the while loop on line 16 of Algorithm 4.1, respectively.
As soon as a combination of scaling parameter and Padé approximant order is found, the algorithm computes Y by evaluating the Padé approximant (diagonal or Taylor) of order m i at 2´sA, and finally computes e A « Y 2 s , by applying s steps of binary powering to Y . If A is upper quasi-triangular, then in order to improve the accuracy of the final result, the function recompDiags in Fragment 4.2 is used to recompute the diagonal and first upperdiagonal of the intermediate matrices from the elements of A, as recommended by Al-Mohy and Higham [2].
In the next two sections, we discuss how the functions evalBound and evalPade can be implemented efficiently for diagonal Padé approximants and truncated Taylor series.

Diagonal Padé approximants.
When use taylor is set to false, that is, when diagonal Padé approximants are being considered, the condition that needs to be tested on lines 15 and 22 of Algorithm 4.1 is, for some d such that dpd´1q ă 2 m i`1 , As discussed in section 3, the choice of α d pXq that would guarantee the best bound is α for at most two values of d independently of the value of m i , and is often not far from the best choice, since the sequence`}A d } 1{d˘d PN is typically roughly decreasing [2]. However, it is sometimes possible to obtain a better bound at almost no extra cost, by reusing quantities computed during previous steps of the algorithm. Observe that, since }p2´sAq d } 1{d " 2´s}A d } 1{d and thus α d p2´sAq " 2´sα d pAq, it is enough to estimate the norm of powers of A and then scale their value as required. Moreover, since the algorithm considers the approximants in nondecreasing order of cost, the value of d rmi{mis is nondecreasing in i. Therefore, in (4.1) we can replace α d p2´2Aq by 2´sα min , where α min is a variable that keeps track of the smallest value of α d rm i {m i s pXq computed so far, and is updated only when a new value α d rm j {m j s pXq ă α min is found for some j ą i.
Since only the order of magnitude of α min is actually needed, we estimate }A d } 1 by running in precision u bnd the 1-norm estimation algorithm of Higham and Tisseur [24]. This method repeatedly computes the action of A on a tall and skinny matrix without explicitly forming any powers of A, and thus requires only Opn 2 q flops. In the pseudocode, the 1-norm estimation is performed by the function normest1, whose only input is a function that computes the product AX given the matrix X P C nˆt . In order to keep the notation as succinct as possible, anonymous functions are specified using a lambda notation, and λx.f pxq denotes a function that replaces all the occurrences of x in the body of f with the value of its input argument.
By storing the values of }A d } 1 in the global array γ, the 1-norm of each power of A is estimated at most once. Further computational savings can be achieved by computing some carefully chosen powers of A within the algorithm and using them to evaluate of the action of powers of A on a vector, as we will discuss later.
As long as the bound (4.1) is not satisfied, the algorithm can decide to either increment the scaling factor or increase the order of the Padé approximant, since either choice will reduce the truncation error of the approximation. Both options, however, may have an adverse effect on the numerical behavior of the algorithm, since taking a Padé approximant of higher degree may significantly increase the conditioning of the coefficient of the linear system to be solved, thus jeopardizing the accurate evaluation of the approximant, whereas increasing s will increase the number of matrix multiplications that will occur during the squaring phase of the algorithm, which is the most sensitive to rounding errors, as shown by [18,Thm. 10.21].
We solve this dilemma by means of a heuristic that prevents the 1-norm condition number of q kimi p2´sAq from getting too large. In particular, if our estimate κ A " normest1pλx.q m p2´sAq´1xq normest1pλx.q m p2´sAqxq is larger than a constant ζpuq that depends on the unit roundoff u, we update the scaling parameter and leave the order of the Padé approximant unchanged. Otherwise, we increment i and take an approximant of higher order chosen according to the elements in m. In practice, we set ζpuq :" u´1 {8 . For IEEE double precision, this choice gives ζpuq " 2 53{8 « 98.70, which agrees (within a factor of 1.4) with the largest condition number allowed by Al-Mohy and Higham [2, Table 3.1] for double precision.
Within our algorithm, we can exploit the evaluation scheme discussed in section 3 to reduce the computational cost of the evaluation not only of r mi p2´sAq, but also of the term }q mi p2´2Aq} 1 appearing in the bound (4.1).
Since Algorithm 4.1 considers Padé approximants of increasing cost, for diagonal Padé approximants we have that m i ă m j for i ă j. Hence, whenever in Algorithm 4.1 the bound (4.1) is evaluated for the approximant of order m i , we are guaranteed that on line 23 r mj p2´sAq will be evaluated for some j ě i. Since numerator and denominator of the approximant are evaluated by means of the Paterson-Stockmeyer method, we know that at least the first ν " r ? m i s powers of 2´sA will be needed, and since scaling a matrix requires only Opn 2 q flops, it is worth it to compute immediately the first 1 function evalBoundDiagpA P C nˆn , m P N, s P Nq Ź Check (4.1) for r m and estimate κ 1 pq m pAqq. 2 α min Ð optAlphaDiagpA, m, α min q 3 rU e , U o s Ð evalPadeDiagAuxpA, m, s, u bnd q 4 Set working precision to u bnd . 5 rL, U s Ð lupU e´Uo q 6 η Ð normest1pλx.pU´1pL´1xqq 7 Set working precision to u. 8 δ Ð η |q m p2´sα min qe 2´sαmin´p m p2´sα min q| 9 ψ Ð normest1pλx.pU´1pL´1p2U o xqq`xqq 10 κ A Ð η normest1pλx.pU e´Uo qxq 11 return δ, ψ, κ A 12 function evalPadeDiagpA P C nˆn , m P N, s P Nq Ź Evaluate r m p2´sAq. 13 rU e , U o s Ð evalPadeDiagAuxpA, m, s, uq 14 rL, U s Ð lupU e´Uo q 15 return U´1pL´1p2U o`I qq 16 function evalPadeDiagAuxpA P C nˆn , m P N, s P N, u P R`q Ź Evaluate components of p mm pAq and q mm pAq. 17 Set working precision to u. This implementation stores the powers of A 2 in the global array A, which is updated only when it does not already contain the first r ? ms powers of A 2 . Since the number of matrices stored in A changes with m, we introduce a function lengthpAq, that returns the number of positive powers stored in A. In other words, if lengthpAq " , then A contains `1 matrices from A 0 " I to A " A 2 .
Note that although it makes sense, from a performance point of view, to compute U e and U o in lower precision, the elements of A must be computed at precision u in order to be reused to evaluate numerator and denominator of r m p2´sAq. These lower precision approximation of U e and U o can be used to compute a cheap approximation ψpe 2´sA q to }e 2´sA } 1 needed in (4.1), since q m pXq´1p m pXq " 2pU e´Uo q´1U o`I can be evaluated by means of only one multiple right-hand side system solve at precision u bnd .
In addition, the elements of A can be used to reduce the computational cost of estimating the 1-norm of powers of A. In order to estimate }A d } 1 , normest1 computes repeatedly Y :" A d X, where X P C nˆt , with t ! n. If the matrix multiplications are performed from right to left, evaluating Y requires 2dtn 2 flops, but if some of the powers of A are available, the factor d can be reduced to as little as log 2 d. We illustrate the strategy to perform this cost reduction in the function evalPowVecDiag, which evaluates A d X using the powers of A 2 stored in A. We use the two variables r d and , initialized to d and lengthpAq, respectively, to keep track of the state of the computation. The function repeatedly multiplies X by A " A 2 for t " t r d{p2 qu times, that is, until r d becomes smaller than 2 . At this point, the algorithm updates r d and , setting the former to the number of matrix multiplications left to perform, r d´t, and the latter to the largest integer smaller than the new value of r d. Since A contains powers of A 2 rather than A, an additional multiplication by A is necessary for odd d.
This algorithm requires`minpC t ? miu pm i q, C r ? mis pm i qq`s˘n 3 flops in precision u, for evaluating r mi pAq and performing the final squaring phase, and`2 ? m i`2 3˘i n 3 flops in precision u bnd for evaluating and factorizing q mi p2´sAq, in order to check whether the bound (4.1) is satisfied.

Taylor approximants.
Truncated Taylor series are appealing Padé approximants to use in conjunction with bound (3.14), as the property that q k0 pxq " 1 enables us to eliminate the computation of q mi p2´sAq, the most expensive term to evaluate in (4.1), and thus obtain substantial computational savings. Even though for truncated Taylor series there is no need to evaluate the approximant when evaluating the bound, the function evalBoundTayl updates the array A, which in this case stores powers of A rather than A 2 . In fact, these powers can be used to reduce the cost of estimating }A d } 1 as well as }e A } 1 . The elements of A are estimated by means of normest1, and the action of the powers of A on a vector is computed by means of the function evalPowVecTayl in Fragment 4.5, which uses the elements in A analogously to evalPowVecDiag.
For ψ in the bound (4.1) one can use a lower bound on the 1-norm of e A . The inequality }e A } 1 ě e´} A}1 [18,Thm. 10.10] can be exploited at no extra cost, but being typically not very sharp can potentially lead to unnecessary computation. Van 1 function evalPolyPSps P N, β P C t , ν P Nq Ź Evaluate ř t "0 β p2´sAq using elements of A.
which is typically tighter than the previous bound, and always so when λ˚ą 0. Estimating λ˚, however, requires either the eigendecomposition of A, or the solution of a family of shifted linear systems [32], and both solutions might be unpractical in that they require Opn 3 q flops for dense matrices. A practical estimate that can be computed with only Opn 2 q extra cost is provided by the function estimateNormExp in Fragment 4.5, which relies on the approximation If only the elements of A already computed on line 2-3 of evalBoundTayl are used, computing this estimate requires only 2 n 2 flops. Additional savings can be gained by noting that ξ s `1 can be obtained from ξ s with only one matrix scaling and one matrix sum, and the full cost of 2 n 2 flops need not be paid as long as s does not change.
Overall, this algorithm requires`2 ? m i`s˘n 3 floating-point operations.

Schur-Padé variants.
If A is normal (A˚A " AA˚) and a multiprecision implementation of the QR algorithm is available, diagonalization in higher precision is another approach for computing e A . More generally, a (real) Schur decomposition can be used: A " QT Q˚, where T, Q P C nˆn are, respectively, upper triangular and unitary if A has complex entries and upper quasi-triangular and orthogonal if A has real entries. Then e A " Qe T Q˚. In our experiments, we consider a Schur-Padé approach that computes the Schur decomposition of the input matrix and exploits Algorithm 4.1 to compute the exponential of its triangular factor.
Overall, this algorithm requires`28``mintC t ? miu pm i q, C r ? mis pm i qu`s˘{3˘n 3 and 28`p2 ? m i`s q{3˘n 3 flops for diagonal Padé approximants and truncated Taylor series, respectively.

Numerical experiments.
We now test the algorithm derived in section 4 and compare it with two existing codes for computing the matrix exponential in arbitrary precision floating point arithmetic. We consider two test sets: H, which Fragment 4.5: Auxiliary functions for truncated Taylor series t m .
17 return mintmaxtγ d , γ d`1 u, α min u 18 function evalPowVecTaylpd P N, X P C nˆt q Ź Compute A d X using elements in A.

19
Ð lengthpAq 20 while d ą 0 do contains 35 Hermitian matrices, and N , which consists of 97 non-Hermitian matrices. These matrices, of size ranging between 2 and 1000, are taken from the literature of the matrix exponential, from a collection of benchmark problems for the burnup equations [27], [43], and from the MATLAB gallery function. The experiments were performed using the 64-bit (glxna64) version of MATLAB 9.5 (R2018b Update 3) on a machine equipped with an Intel I5-3570 processor running at 3.40GHz and with 8GB of RAM. The code uses the Multiprecision Computing Toolbox (version 4.4.7.12739) [35], which provides the class mp to represent arbitrary precision floating-point numbers and overloads all the MATLAB functions we need in our implementations. We note that this toolbox allows the user to specify the number of decimal digits of working precision, but not the number of bits in the fraction of its binary representation, thus, in this section, whenever we refer to d (decimal) digits of precision, we mean that the working precision is set using the command mp.Digits(d). The MATLAB code that runs the tests in this section is available on GitHub. 1 In our experiments, we compare the following codes. ‚ expm, the built-in function expm of MATLAB, which implements the algorithm of Al-Mohy and Higham [2], and is intended for double precision only. ‚ exp mct, the expm function provided by the Multiprecision Computing Toolbox. ‚ exp otf, the algorithm by Caliari and Zivcovich [7], a shifted scaling and squaring method based on truncated Taylor series. The matrix is shifted by tracepAq{n, and (1.2) is replaced by The order of the approximant is chosen by estimating at runtime a bound on the backward error of the approximant in exact arithmetic. ‚ exp d, an implementation of Algorithm 4.1 with use taylor " false. ‚ exp t, an implementation of Algorithm 4.1 with use taylor " true. ‚ exp sp d, an implementation of the Schur-Padé approach discussed in section 4.3 using Algorithm 4.1, with use taylor " false, for the triangular Schur factor. ‚ exp sp t, an implementation of the Schur-Padé approach discussed in section 4.3 using Algorithm 4.1, with use taylor " true, for the triangular Schur factor. In our implementations of Algorithm 4.1, we set u bnd " 2´5 3 , " u, s max " 100, and the entries of m to the elements of (3.5) smaller than 1000 and those of (3.6) smaller than 400, for truncated Taylor series and diagonal Padé approximant, respectively.
We do not include the function expm provided by the Symbolic Math Toolbox in our tests since, for precision higher than double, it appears to be extending the precision internally, using a number of extra digits that increases with the working precision. As a result, the forward error of the algorithm is typically several orders of magnitude smaller than machine epsilon, when computing with 20 or more digits, but tends to be larger than the unit roundoff u, for u « 10´2 0 or larger. We note that the accuracy drops for matrices that are nondiagonalizable, singular, or have ill-conditioned eigenvectors. Moreover, the Symbolic Math Toolbox implementation is rather slow on moderately large matrices (n Á 50, say), which makes this code unsuitable for extended testing.
In our tests, we assess the accuracy of the solution r X computed by an algorithm running with d digits of precision by means of the relative forward error }X´r X} 1 {}X} 1 , where X is a reference solution computed using exp mct with 2d significant digits. Since the magnitude of forward errors depends not only on the working precision but also on the conditioning of the problem, in our plots we com-   pare the forward error of the algorithms with κ exp pAqu, where κ exp pAq is the 1-norm condition number [18,Chap. 4] of the matrix exponential of A (see (2.1)). We estimate it in double precision using the funm condest1 function provided by the Matrix Function Toolbox [16] on expm.
When plotting the forward error, we choose the limits of the y-axis in order to show the area where most of the data points lie, and move the outliers to the closest edge of the box containing the plot. In several cases, we present our experimental results with the aid of performance profiles [10], and adopt the technique of Dingle and Higham [9] to rescale values smaller than u.

Comparison with expm in double precision.
In this experiment, we compare the performance of exp d, exp t, exp sp d, and exp sp t running in IEEE double precision, with that of expm and exp otf. The purpose of this experiment is to verify that the new algorithms are broadly competitive with expm; since expm is optimized for double precision, we do not expect them to be as efficient. Figure 5.1a compares the forward error of the algorithms on the matrices in our test sets, sorted by decreasing condition number. The performance profile in Figure 5.1b presents the results for N on a by-algorithm rather than by-matrix basis: for a given method, the height of the line at θ " θ 0 represents the fraction of matrices in the data set for which the relative forward error is within a factor θ 0 of the error of the algorithm that delivers the most accurate result for that matrix.
For Hermitian matrices, we do not provide a performance profile, as the error plot clearly shows that while exp otf, exp t, and exp d all provide forward error well below κ exp pAqu, exp otf is consistently the most accurate on that data set. The performance of expm is the same as that of exp sp d because both implementations reduce to the evaluation of the scalar exponential at the eigenvalues of A when A is Hermitian.
For the matrices in N , the errors of expm, exp otf, exp d, and exp t are approximately bounded by κ exp pAqu, with the algorithms based on truncated Taylor series being overall more accurate than those based on diagonal Padé approximants. The algorithms based on the Schur decomposition of A tend to give somewhat larger errors, with the result that the performance profile curves are the least favorable. The results show that, as for double precision, transformation-free algorithms tend to produce a more accurate approximation of e A than those based on the Schur decomposition of A. The code exp mct typically delivers the least accurate results, with a forward error usually larger than κ exp pAqu.
On the Hermitian matrices in the test set, exp otf is typically the most accurate algorithm, having a surprising ability to produce errors much less than κ exp pAqu. On the set of non-Hermitian matrices, exp otf and exp t are the most accurate algorithms, with exp t having the superior performance profile. On this dataset, the least accurate of our versions of Algorithm 4.1 is exp sp t, closely followed by exp sp d; the forward error of both, despite being typically smaller than that of exp mct, is often slightly larger than κ exp pAqu, especially for the better conditioned of our test matrices. Finally, exp d performs better than Schur-based variants, but is slightly less accurate than exp otf and exp t on most of the matrices in this data set. We remark, however, that its forward error is typically smaller than κ exp pAqu when that of the other two Taylor-based algorithms is.
We note that the tests of exp otf in [7] have precision target 10´2 5 or larger, which is between double and quadruple precision. This experiment shows that exp otf maintains its good accuracy up to much higher precisions. Table 5.1, we compare the execution time of our MAT-LAB implementations of exp t and exp d, running in quadruple precision (d " 34 and unit roundoff u " 2´1 13 ), on the matrices A = 1000 * triu(ones(n),1); B = zeros(n); B(n+1:n+1:n^2) = 1:n-1; % Upper bidiagonal. C = gallery('lotkin', n); where n ranges between 10 and 1000. The first two matrices are strictly upper triangular, thus singular, but the 1-norm of A is much larger than that of B. The full matrix C is nonsingular, but the exponential decay of the absolute value of its eigenvalues makes it extremely ill-conditioned. For each matrix, we report the overall execution time in seconds of the two implementations (T tot ), specifying how much time is spent evaluating the scalar bound (T bnd ), evaluating the approximant on the input matrix (T eval ), and performing the final squaring stage (T sqr ).

Code profiling. In
For exp d, the evaluation of the forward bound on the truncation error of the Padé approximant is typically the most expensive operation for small matrices, and even when the size of the matrix increases the cost of this operation remains nonnegligible, as the estimation of the quantity › › pq km pAqq´1 › › 1 , which appears in the bound (4.1), has cubic dependence on the size of the input matrix. For exp t, on the other hand, T bnd         depends only quadratically on n, and tends to become relatively small for matrices of size larger than 100.
6. Conclusions. State-of-the-art scaling and squaring algorithms for computing the matrix exponential typically rely on a backward error analysis in order to scale the matrix and then select an appropriate Padé approximant to meet a given accuracy threshold. The result of this analysis is a small set of precision-dependent constants that are hard to compute but can be easily stored for subsequent use. This approach is not viable in multiprecision environments, where the working precision is known only at run time and is an input argument of the algorithm rather than a property of the underlying floating-point number system. For truncated Taylor series, it is possible to estimate on-the-fly a bound on the backward error of the approximation [7], but this technique relies on a conjecture and does not readily generalize to other Padé approximants.
We have developed a new algorithm (Algorithm 4.1) based on Padé approximation for computing the matrix exponential in arbitrary precision. In particular, we derived a new representation for the truncation error of a general rk{ms Padé approximant and showed how it can be used to compute practical error bounds for truncated Taylor series and diagonal Padé approximants. In the first case, the bound is cheap to compute, requiring only Opn 2 q flops. For diagonal Padé approximants the new bound requires Opn 3 q low-precision flops, but if Conjecture 3.2 turns out to be true then this cost will reduce to Opn 2 q flops.
According to our experimental results, Algorithm 4.1 in transformation-free form using truncated Taylor series is the best option for computing the matrix exponential in precisions higher than double. In particular, the algorithm is the most accurate on non-Hermitian matrices. For Hermitian matrices, it is natural to compute e A via a spectral decomposition (as does the MATLAB function expm), but an interesting finding is that on our Hermitian test matrices this approach (to which exp sp d and exp sp t effectively reduce) is less accurate than Algorithm 4.1 and the algorithm of [7]. When developing the algorithm and testing it experimentally, we focused on precisions higher than double. We believe that a different approach is needed to address the computation of the matrix exponential in low precision arithmetic, such as IEEE half precision. Indeed, due to its very limited range this number format is prone to underflow and overflow, which makes accuracy and robustness difficult to achieve. How to handle these challenges will be the subject of future work.