A Multiprecision Derivative-Free Schur–Parlett Algorithm for Computing Matrix Functions

. The Schur–Parlett algorithm, implemented in MATLAB as funm , computes a function f ( A ) of an n × n matrix A by using the Schur decomposition and a block recurrence of Parlett. The algorithm requires the ability to compute f and its derivatives, and it requires that f has a Taylor series expansion with a suitably large radius of convergence. We develop a version of the Schur–Parlett algorithm that requires only function values and uses higher precision arithmetic to evaluate f on the diagonal blocks of order greater than 2 (if there are any) of the reordered and blocked Schur form. The key idea is to compute by diagonalization the function of a small random diagonal perturbation of each triangular block, where the perturbation ensures that diagonalization will succeed. This multiprecision Schur–Parlett algorithm is applicable to arbitrary functions f and, like the original Schur–Parlett algorithm, it generally behaves in a numerically stable fashion. Our algorithm is inspired by Davies’s randomized approximate diagonalization method, but we explain why that is not a reliable numerical method for computing matrix functions. We apply our algorithm to the matrix Mittag–Leﬄer function and show that it yields results of accuracy similar to, and in some cases much greater than, the state of the art algorithm for this function. The algorithm will be useful for evaluating any matrix function for which the derivatives of the underlying function are not readily available or accurately computable.

1. Introduction. The need to compute matrix functions arises in many applications in science and engineering. Specialized methods exist for evaluating particular matrix functions, including the scaling and squaring algorithm for the matrix exponential [1], [28] Newton's method for matrix sign function [23,Chap. 5], [33], and the inverse scaling and squaring method for the matrix logarithm [2], [27]. See [25] for links to software for these and other methods. For some functions a specialized method is not available, in which case a general purpose algorithm is needed. The Schur-Parlett algorithm [8] computes a general function f of a matrix, with the function dependence restricted to the evaluation of f on the diagonal blocks of the re-ordered and blocked Schur form. It evaluates f on the nontrivial diagonal blocks via a Taylor series, so it requires the derivatives of f and it also requires the Taylor series to have a sufficiently large radius of convergence. However, the derivatives are not always available or accurately computable.
We develop a new version of the Schur-Parlett algorithm that requires only the ability to evaluate f itself and can be used whatever the distribution of the eigenvalues. Our algorithm handles close or repeated eigenvalues by an idea inspired by Davies's idea of randomized approximate diagonalization [7] together with higher precision arithmetic. We therefore assume that as well as the arithmetic of the working precision, with unit roundoff u, we can compute at a higher precision with unit roundoff u h < u, where u h can be arbitrarily chosen. Higher precisions will necessarily be done in software, and so will be expensive, but we aim to use them as little as possible.
We note that multiprecision algorithms have already been developed for the matrix exponential [12] and the matrix logarithm [11]. Those algorithms are tightly coupled to the functions in question, whereas here we place no restrictions on the function. Indeed the new algorithm greatly expands the range of functions f for which we can reliably compute f (A). A numerically stable algorithm for evaluating the Lambert W function of a matrix was only recently developed [13]. Our algorithm can readily compute this function, as well as other special functions and multivalued functions for which the Schur-Parlett algorithm is not readily applicable.
In section 2 we review the Schur-Parlett algorithm. In section 3 we describe Davies's randomized approximate diagonalization and explain why it cannot be the basis of a reliable numerical algorithm. In section 4 we describe our new algorithm for evaluating a function of a triangular matrix using only function values. In section 5 we use this algorithm to build a new Schur-Parlett algorithm that requires only function values and we illustrate its performance on a variety of test problems. We apply the algorithm to the matrix Mittag-Leffler function in section 6 and compare it with a special purpose algorithm for this function. Conclusions are given in section 7.
We will write "normal (0,1) matrix" to mean a random matrix with elements independently drawn from the normal distribution with mean 0 and variance 1. We will use the Frobenius norm, A F = ( i,j |a ij | 2 ) 1/2 , and the p-norms A p = max{ Ax p : 2. Schur-Parlett algorithm. The Schur-Parlett algorithm [8] for computing a general matrix function f (A) is based on the Schur decomposition A = QT Q * ∈ C n×n , with Q ∈ C n×n unitary and T ∈ C n×n upper triangular. Since f (A) = Qf (T )Q * , computing f (A) reduces to computing f (T ), the same function evaluated at a triangular matrix. If the function of the square diagonal blocks F ii = f (T ii ) can be computed, the off-diagonal blocks F ij of f (T ) can be obtained using the block form of Parlett's recurrence [31], from which F ij can be computed either a block superdiagonal at a time or a block row or block column at a time. To address the potential problems caused by close or equal eigenvalues in two diagonal blocks of T , Davies and Higham [8] devised a scheme with a blocking parameter δ > 0 to reorder T into a partitioned upper triangular matrix T = U * T U = ( T ij ) by a unitary similarity transformation such that • eigenvalues λ and µ from any two distinct diagonal blocks T ii and T jj satisfy min |λ − µ| > δ, and • the eigenvalues of every block T ii of size larger than 1 are well clustered in the sense that either all the eigenvalues of T ii are equal or for every eigenvalue λ 1 of T ii there is an eigenvalue λ 2 of T ii with λ 1 = λ 2 such that |λ 1 − λ 2 | ≤ δ. To evaluate f ( T ii ), the Schur-Parlett algorithm expands f in a Taylor series about σ = trace( T ii )/m i , the mean of the eigenvalues of T ii ∈ C mi×mi , truncating the series after an appropriate number of terms. All the derivatives of f up to a certain order are required in (2.2), where that order depends on how quickly the powers of T ii −σI decay. Moreover, for the series (2.2) to converge we need λ−σ to lie in the radius of convergence of the series for every eigenvalue λ of T ii . Obviously, this procedure for evaluating f ( T ) may not be appropriate if it is difficult or expensive to accurately evaluate the derivatives of f or if the Taylor series has a finite radius of convergence.
For normal matrices, V can be chosen to be unitary and this approach is an excellent way to compute f (A). However, for nonnormal A the eigenvector matrix V can be ill-conditioned, in which case an inaccurate computed f (A) can be expected in floating-point arithmetic [23, sect. 4.5].
A way to handle a nonnormal matrix is to perturb it before diagonalizing it. Davies [7] suggested perturbing A to A = A + E, computing the diagonalization This approach relies on the fact that even if A is defective, A + E is likely to be diagonalizable because the diagonalizable matrices are dense in C n×n . Davies measured the quality of the approximate diagonalization by the quantity where the condition number κ 2 (V ) = V 2 V −1 2 and can be thought of as the unit roundoff. Minimizing over E and V (since V is not unique) gives which is a measure of the best approximate diagonalization that this approach can achieve. Davies conjectured that (3.2) σ(A, ) ≤ c n 1/2 for some constant c n , where A 2 ≤ 1 is assumed, and he proved the conjecture for Jordan blocks and triangular Toeplitz matrices (both with c n = 2) and for arbitrary 3 × 3 matrices (with c 3 = 4). Davies's conjecture was recently proved by Banks, Kulkarni, Mukherjee, and Srivastava [4, Thm. 1.1] with c n = 4n 3/2 + 4n 3/4 ≤ 8n 3/2 . Building on the solution of Davies' conjecture a randomized algorithm with low computational complexity is developed in [5] for approximately computing the eigensystem. Note that (3.2) suggests it is sufficient to choose E such that E 2 ≈ 1/2 in order to obtain an error of order 1/2 . As we have stated it, the conjecture is over C n×n . Davies's proofs of the conjecture for Jordan blocks and triangular Toeplitz matrices have E real when A is real, which is desirable. In the proof in [4], E is not necessarily real when A is real. However, Jain, Sah, and Sawhney [26] have proved the conjecture for real A and real perturbations E.
The matrix E can be thought of as a regularizing perturbation for the diagonalization. For computing matrix functions, Davies suggests taking E as a random matrix and gives empirical evidence that normal (0,1) matrices E scaled so that E 2 ≈ u 1/2 are effective at delivering a computed result with error of order u 1/2 when A 2 ≤ 1. One of us published a short MATLAB code to implement this idea [24], 1 as a way of computing f (A) with error of order u 1/2 . However, this approach does not give the change in f induced by E is as much as L f (A) 2 E 2 , and the factor L f (A) 2 can greatly exceed 1. A simple experiment with = u illustrates the point. All the experiments in this paper are carried out in MATLAB R2020a with a working precision of double (u ≈ 1.1 × 10 −16 ). We take A to be an n × n Jordan block with eigenvalue λ and f (A) = A 1/2 (the principal matrix square root), for which L f (A) F = (I ⊗ A 1/2 + (A 1/2 ) T ⊗ I) −1 2 [21]. The diagonalization and evaluation of f ( A) is done at the working precision. In Table 3.1 we show the relative errors where E is a (full) normal (0,1) matrix scaled so that E F = u 1/2 A F and the reference solution f (A) is computed in 100 digit precision using the function sqrtm from the Multiprecision Computing Toolbox [30]. For λ = 1 we obtain an error of order u 1/2 , but the errors grow as λ decreases and we achieve no correct digits for λ = 0.1. The reason is clear from Table 3.2, which shows the values of the term that multiplies E F in (3.3), which are very large for small λ. We stress that increasing the precision at which f ( A) is evaluated does not reduce the errors; the damage done by the perturbation E cannot be recovered.
In this work we adapt the idea of diagonalizing after a regularizing perturbation, but we take a new approach that does not depend on Davies's theory.
4. Evaluating a function of a triangular matrix. Our new algorithm uses the same blocked and re-ordered Schur form as the Schur-Parlett algorithm. The key difference from that algorithm is how it evaluates a function of a triangular block. Given an upper triangular block T ∈ C m×m of the reordered Schur form and an arbitrary function f we apply a regularizing perturbation with norm of order u and evaluate f (T ) at precision u h < u. We expect m generally to be small, in which case the overhead of using higher precision arithmetic is small. In the worst case this approach should be competitive with the worst case for the Schur-Parlett algorithm [8,Alg. 2.6], since (2.2) requires up to O(m 4 ) (working precision) flops.
We will consider two different approaches.
4.1. Approximate diagonalization with full perturbation. Our first approach is a direct application of approximate diagonalization, with = u 2 . Here, E is a multiple of a (full) normal (0,1) matrix with norm of order 1/2 = u. Whereas Davies considered only matrices A of 2-norm 1, we wish to allow any norm, and the norm of E should scale with that of A. We will scale E so that We evaluate f (T +E) by diagonalization at precision u h = u 2 and hope to obtain a computed result with relative error of order u. Diagonalization requires us to compute the Schur decomposition of a full matrix T + E, and it costs about 28 2 3 m 3 flops in precision u h .
Although we do not expect this approach to provide a numerical method that works well for all problems, in view of the discussion and example in section 3, it is a useful basis for comparison with the new method in the next section.

Approximate diagonalization with triangular perturbation.
Instead of regularizing by a full perturbation, we now take the perturbation E to be an upper triangular normal (0,1) matrix, normalized by (4.1). An obvious advantage of taking E triangular is that T = T + E is triangular and we can compute the eigenvectors (needed for diagonalization) by substitution, which is substantially more efficient than computing the complete eigensystem of a full matrix. Note that the diagonal entries of T are distinct with probability 1, albeit perhaps differing by as little as order E F . This approach can be thought of as indirectly approximating the derivatives by finite differences. Indeed for m = 2 we have so when t 11 = t 22 , perturbing to t 11 = t 22 results in a first order finite difference approximation to f (t 11 ). For m > 2, these approximations are intertwined with the evaluation of f (T ). In order to find the eigenvector matrix V of the perturbed triangular matrix T = T + E we need to compute a set of m linearly independent eigenvectors v i , i = 1 : m. This can be done by solving at precision u h the m triangular systems where we set v i to be 1 in its ith component, zero in components i + 1 : m, and solve for the first i − 1 components by substitution. Thus the matrix V is upper triangular. Careful scaling is required to avoid overflow [35].
To summarize, we compute in precision u h the diagonalization where in practice the λ i will be distinct. We then form f ( T ) = V f (D)V −1 in precision u h , which involves solving a multiple right-hand side triangular system with a triangular right-hand side. The cost of the computation is We expect the error in the computed approximation F to F = f ( T ) to be bounded approximately by (cf. [23, p. 82 (The choice of norm is not crucial; the 1-norm is convenient here.) We will use this bound to determine u h . Note that Since we do not know f ( T ) 1 a priori we will approximate f (D) 1 / f ( T ) 1 by 1 (the geometric mean of its bounds), and hence we will use Since we need to know how to choose u h before we compute V , we need an estimate of κ(V ) based only on T . Since we are using a triangular perturbation its regularizing effect will be less than that of a full perturbation, so we expect that we may need a precision higher than double the working precision. Demmel [3, sect. 5.3], [9] showed that κ 2 (V ) is within a factor m of max i P i 2 , where P i is the spectral projector corresponding to the eigenvalue λ i . Writing the spectral projector for the eigenvalue λ 1 = t 11 is, with the same partitioning, From (4.6) we have This bound will be very pessimistic if we apply it to t 11 I − T 22 , because for the bound to be a good approximation it is necessary that many diagonal elements of U are of order α, yet t 11 I − T 22 will typically have only a few (if any) small elements. Let us group the t ii according to the Schur-Parlett blocking criteria described in section 2, with blocking parameter δ = δ 1 . Suppose the largest block has size k ≥ 2 and suppose, without loss of generality, that it comprises the first k diagonal elements of T . Then we will approximate ( t 11 I − T 22 ) −1 1 by ( t 11 I − T 22 (1 : k − 1, 1 : k − 1)) −1 1 , and bound it by (4.7), leading to the approximation where the parameter c m is such that c m u ≈ min i |w ii |, where the w ii are the diagonal elements of t 11 I − T 22 (1 : k − 1, 1 : k − 1), and hence this is an estimate of κ 1 (V ) by Demmel's result. We are aiming for an error of order u, so from (4.5) we need κ 1 (V )u h u, which gives the requirement In the case k = 2, the bound (4. . If the largest block size is k = 1, we use u h = u 2 (corresponding to Davies's conjecture (3.2)) since we do not expect κ(V ) to be so large that a precision higher than double the working precision is required.
Importantly, for 2 × 2 diagonal blocks of T with distinct eigenvalues we are able to use (4.2) at the working precision.
In summary, we make an upper triangular perturbation E of norm O(u T 1 ) to T and evaluate f (T +E). We choose the precision of the evaluation in order to be sure of obtaining an accurate evaluation of f (T + E). Our approach differs fundamentally from Davies's randomized approximate diagonalization. Our triangular E has a lesser regularizing effect than a full one, so it results in a potentially larger κ(V ), but our choice of u h takes this into account. On the other hand, since our perturbation E is upper triangular and of order u T 1 , it corresponds to a backward error of order u and so is harmless. A full perturbation E cannot be interpreted as a backward error for the f (T ) evaluation as it perturbs the zeros in the lower triangle of T .
The analysis above is unchanged if E is diagonal, so we allow E to be chosen as diagonal or upper triangular in Algorithm 4.1 and will compare the two choices experimentally in the next section.
We now discuss the choice of δ 1 and c m . The parameter c m is such that c m u ≈ min i |w ii |, where the w ii are the diagonal elements of t 11 I − T 22 (1 : k − 1, 1 : k − 1). We will determine c m by considering the extreme case when min i |w ii | is extremely small, which is when all t ii in the largest block of size k ≤ m differ only by the perturbation we added (in which case the t ii are exactly repeated) and thus t 11 This choice of c m makes the chosen higher precision u h < u 2 pessimistic when some of the t ii are close in the sense they are partitioned in the same block by δ = δ 1 but not all of them are exactly repeated, but it helps to ensure the accuracy of the algorithm Algorithm 4.1 Multiprecision algorithm for function of a triangular matrix. Given a triangular matrix T ∈ C m×m and a function f , this algorithm computes F = f (T ). It uses arithmetics of unit roundoff u (the working precision), u 2 , and possibly a higher precision u h ≤ u 2 . Lines 9-11 are to be executed at precision u 2 and lines 12-16 are to be executed at precision u h .
in all cases. However, since the algorithm is to be employed in the next section for computing a function of T ∈ C m×m where generally m (and hence k) is expected to be small, we do not expect this approach to seriously affect the efficiency of the overall algorithm. In the case we are considering we have |w ii | = | t 11 − t ii | = |e 11 − e ii |. The matrix E on line 8 of Algorithm 4.1 has entries u(max i,j |t ij |)| n ij |, where N F = 1, and we expect | n ij | ≈ 1/m. This suggests taking c m = θ max i,j |t ij |/m for some constant θ. In our experiments with different choices of θ we found θ = 0.5 to be a good choice.
The blocking parameter δ = δ 1 is important in determining the largest group size k in (4.8). A smaller δ can potentially group fewer eigenvalues and decrease k, causing a larger u h to be used. Yet too large a δ can result in a u h that is much smaller than necessary to achieve the desired accuracy. We have found experimentally that δ 1 = 5 × 10 −3 is a good choice.

Numerical experiments.
In this section we describe a numerical experiment with the methods of sections 4.1 and 4.2 for computing a function of a triangular matrix. Precisions higher than double precision are implemented with the Multiprecision Computing Toolbox [30].
We set the function f to be the exponential, the square root, the sign function, the logarithm, the cosine, and the sine. The algorithms for computing f (T ) to be tested are • Alg full: approximate diagonalization with a full perturbation and u h = u 2 , as described in section 4.1, • Alg diag: Algorithm 4.1 with diagonal E, c m = 0.5 max i,j |t ij |/m, and δ 1 = 5 × 10 −3 . We use the following matrices, generated from built-in MATLAB functions.
• T 6 = triu(rand(m)). • T 7 = gallery('jordbloc',m,0.5): a Jordan block with eigenvalue 0.5. Since we are computing the principal matrix square root and the principal logarithm we multiply matrices T 3 , T 4 , and T 5 by 1 + i for these functions to avoid their eigenvalues being on the negative real axis.
We report the equivalent number of decimal digits for the higher precision u h used by Algorithm 4.1 for each test matrix in the computation in Table 4.1. Since the outputs of Alg full and Alg diag depend on the random perturbation E, we compute the function of each matrix 10 times and report in Table 4.2 the maximum relative error F − F / F F , where F is a reference solution computed by the functions expm, sqrtm, logm, cosm, and sinm provided by the Multiprecision Computing Toolbox running at 200 digit precision, and rounded back to double precision. We use a diagonal perturbation E in Algorithm 4.1. For the reference solution of the matrix sign function, we run signm from the Matrix Function Toolbox [20] at 200 digit precision, and round back to double precision. The same procedure is followed in the experiments in the following sections.
We show in Table 4.2 the quantity κ f (A)u, where κ f (A) is the 1-norm condition number [23,Chap. 3] of f at A, which we estimate using the funm condest1 function provided by [20]. A numerically stable algorithm will produce forward errors bounded by a modest multiple of κ f (A)u.
The results show that Algorithm 4.1 behaves in a numerically stable fashion in every case, typically requiring a higher precision with unit roundoff u h equal to or not much smaller than u 2 . We see that for the same class of matrices the number of digits of precision used is nondecreasing with the matrix size m, which is to be expected since we expect a larger maximum block size (equal to k in (4.8)) for a larger matrix. On the other hand, as expected in view of the discussion in section 3, the randomized approximate diagonalization method Alg full is less reliable and sometimes not accurate at all when f is the matrix square root, the matrix sign function, or the matrix logarithm.
Note that our test matrices here are more general than will arise in the algorithm of the next section, for which the triangular blocks will have clustered eigenvalues.
We repeated this experiment with an upper triangular E in Algorithm 4.1. The errors were of the same order of magnitude as for diagonal E. Since a diagonal E requires slightly less computation, we will take E diagonal in the rest of this paper.

5.
Overall algorithm for computing f (A). Our algorithm for computing f (A) follows the framework of the Schur-Parlett algorithm [8]. First the Schur decomposition A = QT Q * is computed. Then the triangular matrix T is reordered to a partitioned upper triangular matrix T by a unitary similarity transformation, which is achieved by    In the reordering and blocking of the Schur-Parlett framework the blocking parameter δ > 0, described in section 2, needs to be specified. A large δ leads to greater separation of the eigenvalues of the diagonal blocks, which improves the accuracy of the solutions to the Sylvester equations. In this respect, there is a significant difference between Algorithm 5.1 and the standard Schur-Parlett algorithm: the latter algorithm cannot tolerate too large a δ because it slows down convergence of the Taylor series expansion, meaning that more terms may be needed (or the series may simply not converge). Since Algorithm 4.1 performs well irrespective of the eigenvalue distribution we can choose δ without consideration of the accuracy of the evaluation of f on the diagonal blocks and larger δ will in general do no harm to accuracy. In the extreme case where δ is so large that one block is employed, Algorithm 5.1 does not solve Sylvester equations and thus avoids the potential error incurred in the process, and in general this is when our algorithm attains its optimal accuracy, but the price to pay is that it becomes very expensive because higher precision arithmetic is being used on an n × n matrix. We investigate the choice of δ experimentally in the next subsection.

Numerical experiments.
In the Schur-Parlett algorithm [8] the blocking parameter δ = 0.1 is chosen, which is shown there to perform well most of the time. In order to investigate a suitable value for δ in Algorithm 5.1, we compare the following four algorithms, where "nd" stands for "no derivative".
• funm nd 0.1, Algorithm 5.1 with δ = 0.1; • funm nd 0.2, Algorithm 5.1 with δ = 0.2; • funm nd norm, Algorithm 5.1 with δ = 0.1 max i |t ii |; and • funm nd ∞, Algorithm 5.1 with δ = ∞ (no blocking, so the whole Schur factor T is computed by Algorithm 4.1). The 35 tested matrices are nonnormal taken from • the MATLAB gallery; • the Matrix Computation Toolbox [19]; • other MATLAB matrices: magic, rand, and randn. We set their size to be 32 × 32, and we also test the above matrices multiplied by 10 ±2 to examine the robustness of the algorithms under scaling. We set the function f to be the matrix sine; similar results were obtained with the other functions. Figure 5.1, in which the solid line is κ f (A)u, shows that Algorithm 5.1 with a constant δ is fairly stable under scaling while using a δ that scales with the matrix A (funm nd norm) can produce large errors when A is small. This is not unexpected since a smaller δ results in a smaller separation of the blocks and more ill-conditioned Sylvester equations.
In most cases there is no difference in accuracy between the algorithms. The results show no significant benefit of δ = 0.2 over δ = 0.1, and the former produces larger blocks in general so increases the cost.
In general, the choice δ in Algorithm 5.1 must be a balance between speed and accuracy, and the optimal choice of δ will be problem-dependent. We suggest taking δ = 0.1 as the default blocking parameter in Algorithm 5.1.
Next we set the function f to the sine, the cosine, the hyperbolic sine, and the hyperbolic cosine and use the same set of 35 test matrices as in the previous experiment. We compare the following three algorithms: • funm, the built-in MATLAB function implementing the standard Schur-Parlett algorithm [8] with δ = 0.1; • funm nd, Algorithm 5.1 with δ = 0.1.
• funm nd ∞, Algorithm 5.1 with δ = ∞ (no blocking, so the whole Schur factor T is computed by Algorithm 4.1). Note that since we are comparing with the Schur-Parlett algorithm funm we are restricted to functions f having a Taylor expansion with an infinite radius of convergence and for which derivatives of all orders can be computed. Also, we exclude the exponential, square root, and logarithm because for these functions the specialized codes expm, sqrtm, and logm are preferred to funm.
From Figure 5.2 we observe that, overall, there is no significant difference between funm nd and funm in accuracy, and funm nd ∞ is superior to the other algorithms in accuracy, as expected.
We list the computational cost of the three algorithms in flops in Table 5.1. We note that the cost of reordering and blocking, and solving the Sylvester equations that are executed in precision u, is usually negligible compared with the overall cost. For more details of the reordering and partitioning processes of T and evaluating the upper triangular part of f (A) via the block Parlett recurrence see [8]. In most cases the blocks are expected to be of much smaller dimension than A, especially when n is large. Obviously, funm nd is not more expensive than funm nd ∞ and it can be substantially cheaper; indeed funm nd requires no higher than the working precision to compute 1 × 1 and 2 × 2 diagonal blocks in the Schur form. Table 5.2 compares in a working precision of double the mean execution times in seconds and the maximal forward errors of funm, funm nd, and funm nd ∞ over ten runs, and reports the maximal block size in the reordered and blocked Schur form for each matrix and the maximal number of equivalent decimal digits used by funm nd. We choose f = sin and f = cosh and consider the following matrices, generated from built-in MATLAB functions and scaled to different degrees to have non-trivial blocks of the reordered and blocked Schur form in the Schur-Parlett algorithms.
• A 1 = rand(n)/5. -5): upper triangular with 1s on the diagonal and -5s off the diagonal. We see from Table 5.2 that funm, funm nd, and funm nd ∞ provide the same level of accuracy except for one case: f = sin and A 3 . In this case funm requires about n Taylor series terms and produces an error several orders of magnitude larger than that of other algorithms. For the matrix A 3 with repeated eigenvalues, funm nd is much slower than funm due to the use of higher precision arithmetic in a large block, and in this case there is no noticeable difference in execution time between funm nd and funm nd ∞, which confirms that the cost of the reordering and blocking in funm nd Asymptotic cost in flops of funm, funm nd, and funm nd ∞. Here, n = s i=1 m i is the size of the original matrix A, s is the number of the diagonal blocks in the Schur form after reordering and blocking, and m i is the size of the ith block. is negligible. For the randomly generated matrices (A 1 and A 2 ) funm can be up to 3.8 times faster than funm nd, but in some cases when the block size is small funm nd is competitive with funm in speed. For these matrices, funm nd is much faster than funm nd ∞. Finally, we note that Algorithm 5.1 is not restricted only to a working precision of double since its framework is precision independent. For other working precisions suitable values for the parameters c m , δ 1 and δ may be different, but they can be determined in an approach similar to the one used in this work.
The reason for developing Algorithm 5.1 is that it requires only accurate function values and not derivative values. In the next section we consider a function for which accurate derivative values are not easy to compute.
6. An application to the matrix Mittag-Leffler function. The matrix Mittag-Leffler function is the two-parameter function defined by the convergent series where A ∈ C n×n , α, β ∈ C, and Re α > 0. Analogously to the matrix exponential in the solution of systems of linear differential equations, the Mittag-Leffler function plays an important role in the solution of linear systems of fractional differential equations [17], [34], including time-fractional Schrödinger equations [15], [16] and multiterm fractional differential equations [32]. Despite the importance of the matrix Hence ml opc can produce large relative errors when |E α,β (z)| 1. By the power series definition we intuitively expect that the function value |E α,β (z)| will generally decrease with decreasing |z| or increasing β. Hence we do not expect ml opc to provide small relative errors for all arguments.

Numerical experiments.
In this section we present numerical tests of Algorithm 5.1 (funm nd). In funm nd the ability to accurately evaluate the scalar Mittag-Leffler function in precisions beyond the working precision is required. We evaluate the scalar Mittag-Leffler function by truncating the series definition and we use a precision a few digits more than the highest precision required by the algorithms for the evaluation of the triangular blocks.
In the literature particular attention has been paid to the Mittag-Leffler functions with 0 < α < 1 and β > 0 as this is the case that occurs most frequently in applications [16], [29]. In addition to the Mittag-Leffler functions with β ≈ 1 that are often tested in the literature, we will also investigate the cases when β takes other positive values that appear in actual applications. For example, in linear multiterm fractional differential equations the source term can often be approximated by polynomials, and then the exact solution involves evaluating the matrix Mittag-Leffler function with β = α + , = 1, 2, . . . [18].
We compare the accuracy of our algorithm funm nd with that of mlm, the numerical scheme proposed by Garrappa and Popolizio [18]. The normwise relative forward Example 1: the Redheffer matrix. We first use the Redheffer matrix, which is gallery('redheff') in MATLAB and has been used for test purposes in [18]. It is a square matrix R with r ij = 1 if i divides j or if j = 1 and otherwise r ij = 0. The Redheffer matrix has n − log 2 n − 1 eigenvalues equal to 1 [6], which makes it necessary to evaluate high order derivatives in computing E α,β (−R) by means of the standard Schur-Parlett algorithm. The dimension of the matrix is set to n = 20.
In this case the Schur-Parlett algorithm funm nd chooses five blocks: one 16×16  On the other hand, for β ≥ 6, mlm produces errors that grow with β and become much larger than κ M L (A)u, so it is behaving numerically unstably. It is not surprising to see that mlm becomes numerically unstable when β = 8.0, as it aims to achieve (6.1) and |E α,β (z)| decays to 0 when β increases; for example, |E 0.5,10 (1)| ≈ 4.0e-6.
Example 2: matrices with clustered eigenvalues. In the second experiment we test two matrices A 1 and A 2 of size 30 × 30 and 40 × 40 with both fixed and randomly generated eigenvalues that are clustered to different degrees, as explained in Table 6.1.
The test matrices were designed to have nontrivial diagonal blocks in the reordered and blocked Schur form. We assigned the specified values to the diagonal matrices and performed similarity transformations with random matrices having a condition number of order the matrix size to obtain the full matrice A 21 and A 22 with the desired spectrum.
In this example, funm nd chooses six blocks for A 21 and ten blocks for A 22 . Figure 6.2 shows that for these matrices funm nd performs in a numerically stable fashion, whereas mlm does not for β ≥ 6.
Example 3: matrices from the MATLAB gallery. Now we take 10 × 10 matrices from the MATLAB gallery and test the algorithm using the matrix Mittag-Leffler functions with α = 0.8 and β = 1.2 or β = 8.0. The forward errors are shown in Figure 6.3. We see that mlm is mostly numerically unstable for β = 8 while funm nd remains largely numerically stable.
One conclusion from these experiments is that by exploiting higher precision arithmetic it is possible to evaluate the Mittag-Leffler function with small relative error even when the function has small norm.

Conclusions.
We have built an algorithm for evaluating arbitrary matrix functions f (A) that requires only function values and not derivatives. The inspiration for our algorithm is Davies's randomized approximate diagonalization. We have shown that the measure of error that underlies randomized approximate diagonalization makes it unsuitable for computing matrix functions. Nevertheless, we have exploited the approximate diagonalization idea within the Schur-Parlett algorithm by making random diagonal perturbations to the nontrivial blocks of order greater than 2 in the reordered and blocked triangular Schur factor and then diagonalizing the perturbed blocks in higher precision. Our new multiprecision algorithm, Algorithm 5.1, is applicable to any sufficiently smooth function and requires only function values. By contrast, the standard Schur-Parlett algorithm, implemented as funm in MATLAB, requires derivatives and is applicable only to functions that have a Taylor series with a sufficiently large radius of convergence.
Numerical experiments show similar accuracy of our algorithm to funm. We found that when applied to the Mittag-Leffler function E α,β our algorithm provides results of accuracy at least as good as, and systematically for β ≥ 6 much greater than, the special-purpose algorithm mlm of [18].
Our multiprecision Schur-Parlett algorithm requires at most 2n 3 /3 flops to be carried out in higher precisions in addition to the approximately 28n 3 flops at the working precision, and the amount of higher precision arithmetic needed depends on the eigenvalue distribution of the matrix. When there are only 1×1 and 2×2 blocks on the diagonal of the reordered and blocked triangular Schur factor no higher precision arithmetic is required.
Our new algorithm is a useful companion to funm that greatly expands the class of readily computable matrix functions. Our MATLAB code funm_nd is available from https://github.com/Xiaobo-Liu/mp-spalg.