A Block Krylov Method to Compute the Action of the Fréchet Derivative of a Matrix Function on a Vector with Applications to Condition Number Estimation

We design a block Krylov method to compute the action of the Fréchet derivative of a matrix function on a vector using only matrix-vector products, i.e., the derivative of $f(A)b$ when $A$ is subject to a perturbation in the direction $E$. The algorithm we derive is especially effective when the direction matrix $E$ in the derivative is of low rank, while there are no such restrictions on $A$. Our results and experiments focus mainly on Fréchet derivatives with rank-1 direction matrices. Our analysis applies to all functions with a power series expansion convergent on a subdomain of the complex plane, which in particular includes the matrix exponential. We perform an a priori error analysis of our algorithm to obtain rigorous stopping criteria. Furthermore, we show how our algorithm can be used to estimate the 2-norm condition number of $f(A)b$ efficiently. Our numerical experiments show that our new algorithm for computing the action of a Fréchet derivative typically requires a small number of iterations.

subspace (1.1) by a rational Krylov subspace, utilizing the inverses of shifted versions of $A$, have also been developed (see Güttel [8] for a recent survey). For the matrix exponential there are a number of alternatives to using a Krylov subspace, such as the Leja method [3] and methods based upon Taylor series approximation [1], which have been compared in a recent review paper [2].
In any numerical computation it is important to know the condition number of the problem: when the algorithm used is backward stable, we know that the relative error is approximately bounded above by the condition number times the unit roundoff. Therefore, being able to estimate the condition number efficiently is critical when determining how accurate a computed solution can be expected to be.
Before discussing the condition number of the $f(A)b$ problem, we must first define the Fréchet derivative of the matrix function $f$. The Fréchet derivative is an operator $L_f(A, \cdot) : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$, linear in its second argument, which for any matrix $E \in \mathbb{C}^{n\times n}$ satisfies

$$f(A + E) = f(A) + L_f(A, E) + o(\|E\|),$$

where $o(\|E\|)$ represents a remainder term that, when divided by $\|E\|$, tends to zero as $\|E\| \to 0$.
If we first consider the matrix function $f(A)$, without applying it to a vector, its relative condition number is closely related to the Fréchet derivative of the function. We have (see [10, Chap. 3])

$$\mathrm{cond}(f, A) = \frac{\|L_f(A)\| \, \|A\|}{\|f(A)\|}, \qquad \|L_f(A)\| := \max_{E \neq 0} \frac{\|L_f(A, E)\|}{\|E\|}.$$

For the action of the matrix function, Deadman [5, p. 4] defines the relative condition number of $f(tA)b$. If one ignores the dependence on $t$ by setting $t = 1$, he defines

$$\mathrm{cond}(f, A, b) := \lim_{\epsilon \to 0} \sup_{\substack{\|E\| \le \epsilon \|A\| \\ \|\Delta b\| \le \epsilon \|b\|}} \frac{\|f(A + E)(b + \Delta b) - f(A)b\|}{\epsilon \, \|f(A)b\|} \qquad (1.3)$$

for any induced matrix norm and goes on to show that

$$f(A + E)(b + \Delta b) - f(A)b = L_f(A, E)b + f(A)\Delta b + o(\|E\|) + o(\|\Delta b\|),$$

where the $o(\|E\|)$ and $o(\|\Delta b\|)$ terms disappear as $\epsilon \to 0$. In order to estimate this condition number efficiently we need to compute $L_f(A, E)b$, for multiple matrices $E$, in a highly efficient manner. The details of how to estimate this condition number are discussed further in section 5.
The primary goal of this paper is to give an efficient means of evaluating $L_f(A, E)b$ for $E$ of rank 1 based upon block Krylov subspace methods. This can easily be extended to $E$ of rank $k$, since $L_f(A, E)$ is linear in $E$. We also provide an a priori error analysis for our algorithm in order to derive a rigorous stopping criterion. As a secondary goal we show how our algorithm can be applied to improve the computation of the condition number $\mathrm{cond}(f, A, b)$.
The paper is organized as follows. In section 2 we give some background information on the block Krylov method that we use throughout the rest of our analysis. Following this, in sections 3 and 4, we derive an algorithm to compute $L_f(A, E)b$ for a rank-1 matrix $E$ and perform our a priori error analysis. In section 5 we show how our algorithm can be used to estimate $\mathrm{cond}(f, A, b)$, before performing a battery of numerical experiments in section 6. Finally, in section 7 we give some concluding remarks.
2. Block Krylov method. The algorithm for computing $L_f(A, E)b$ that we introduce in section 3 is based upon the block Krylov decomposition; this section introduces the notation required throughout the rest of this work.
We define a block Krylov subspace, with block size $p$, as the set

$$K(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots\}, \qquad B \in \mathbb{C}^{n \times p}.$$

In practice we use a basis only for the span of the first $M = mp$ vectors, which we call

$$K_m(A, B) = \mathrm{span}\{B, AB, \ldots, A^{m-1}B\}.$$

Note that, for $p = 1$, the algorithm reverts to the standard Arnoldi process. One potential advantage of this algorithm over other variants is that the dimension of the subspace approximation $M$ is not necessarily forced to be a multiple of $p$. This property can be advantageous in some applications, as it may require less computational effort to achieve convergence, but it is not employed here. The following analysis assumes that $M$ is a multiple of $p$.
When the subspace dimension $M$ is a multiple of $p$, the algorithm can be rewritten to use level-3 BLAS routines, to take advantage of the induced block parallelism. To increase the stability of the algorithm we can also add a reorthogonalization step at the end of each iteration. In the rest of our analysis we will denote by $m$ the number of block Arnoldi steps and by $M$ the dimension of the resulting block Krylov subspace.
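To fix ideas, here is a minimal sketch of a block Arnoldi process of the kind Algorithm 1 describes, written in Python. This is our own illustration, not the paper's implementation: the function name, array layout, and the absence of breakdown handling and reorthogonalization are all simplifying assumptions.

```python
import numpy as np

def block_arnoldi(A, B, m):
    """Sketch of a block Arnoldi process (cf. the paper's Algorithm 1).

    Builds an orthonormal basis V of the block Krylov subspace
    K_m(A, B) = span{B, AB, ..., A^{m-1}B} together with the band
    Hessenberg matrix H satisfying A V[:, :m*p] = V H exactly
    (in exact arithmetic). No breakdown handling is included.
    """
    n, p = B.shape
    V = np.zeros((n, (m + 1) * p))
    H = np.zeros(((m + 1) * p, m * p))
    # Orthonormalize the starting block.
    V[:, :p] = np.linalg.qr(B)[0]
    for j in range(m):
        W = A @ V[:, j*p:(j+1)*p]
        # Block modified Gram-Schmidt against all previous blocks.
        for i in range(j + 1):
            Hij = V[:, i*p:(i+1)*p].conj().T @ W
            H[i*p:(i+1)*p, j*p:(j+1)*p] = Hij
            W -= V[:, i*p:(i+1)*p] @ Hij
        # New orthonormal block and subdiagonal coefficient block.
        Q, R = np.linalg.qr(W)
        V[:, (j+1)*p:(j+2)*p] = Q
        H[(j+1)*p:(j+2)*p, j*p:(j+1)*p] = R
    return V, H
```

The square leading $M \times M$ block of `H` plays the role of $H_m$ below, and the final $p$ rows correspond to the residual term in the block Arnoldi decomposition.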
For $M = mp$ we have analogues of the standard Arnoldi decomposition which are useful for our later analysis. Before giving these relationships, we need to define our notation. First, let $U_i$ be the $n \times p$ matrix with columns $v_{p(i-1)+1}, \ldots, v_{pi}$ from Algorithm 1 and $V_m = [U_1, \ldots, U_m]$. Second, let $\widetilde{H}_m$ be the band Hessenberg matrix with nonzero entries $h_{ij}$ from Algorithm 1. We split this matrix in the following fashion:

$$\widetilde{H}_m = \begin{bmatrix} H_m \\ H_{m+1,m} J_m^* \end{bmatrix},$$

where $H_m \in \mathbb{C}^{M \times M}$, and denote by $J_m$ the final $p$ columns of the identity matrix $I_M$. In this case we have the following analogues of the Arnoldi decomposition (see [18, p. 161]):

$$A V_m = V_{m+1} \widetilde{H}_m = V_m H_m + U_{m+1} H_{m+1,m} J_m^*.$$

3. Algorithm derivation. In this section we derive the basic algorithm upon which the rest of our work builds. For the analysis in the next two sections we take the direction matrix $E = \eta y z^*$ in the Fréchet derivative to be of rank 1 with $\eta \in \mathbb{R}$ and $\|y\|_2 = \|z\|_2 = 1$. Furthermore, without loss of generality, we will assume that $\|b\|_2 = 1$. We can easily generalize the development to rank-$k$ matrices, as the Fréchet derivative is linear in its second argument. In fact, for a rank-2 matrix $E = \eta_1 y_1 z_1^* + \eta_2 y_2 z_2^*$ we have

$$L_f(A, E)b = \eta_1 L_f(A, y_1 z_1^*)b + \eta_2 L_f(A, y_2 z_2^*)b,$$

which we might expect to be in the block Krylov space $K(A, [y_1, y_2, b])$. More generally, we would ideally like to use a block Krylov space with block size $k + 1$ for an $E$ of rank $k$. Analyzing this extension more formally is outside the scope of the current article but would make an interesting idea for future research.
We start by proving that the quantity that we wish to compute is in the Krylov subspace $K_m(A, y)$ for some $m > 0$.

Theorem 1. Given $A, E \in \mathbb{C}^{n\times n}$ such that $f$ is defined and Fréchet differentiable at $A$ and such that $E = \eta y z^*$, then $L_f(A, E)b \in K_m(A, y)$ for some $m > 0$, where $K_m(A, y)$ is the Krylov subspace with starting vector $y$ of dimension $m$.
Proof. Assuming that the scalar function $f$ is sufficiently differentiable, we can form a $2\times 2$ block matrix to get

$$f\!\left( \begin{bmatrix} A & E \\ 0 & A \end{bmatrix} \right) = \begin{bmatrix} f(A) & L_f(A, E) \\ 0 & f(A) \end{bmatrix}. \qquad (3.1)$$

We refer to [10, sect. 3.2] for the derivation of this formula. Therefore, we can multiply both sides on the right by the vector $[0, b^T]^T$ to obtain

$$f\!\left( \begin{bmatrix} A & E \\ 0 & A \end{bmatrix} \right) \begin{bmatrix} 0 \\ b \end{bmatrix} = \begin{bmatrix} L_f(A, E)b \\ f(A)b \end{bmatrix},$$

so that the quantity we desire is in the upper half of the resulting vector.
Now for $X \in \mathbb{C}^{l\times l}$ we know that $f(X) = p(X)$ for some polynomial $p$ of degree at most $l - 1$ [10, Chap. 1], and therefore $L_f(A, E)b$ is in the span of the top halves of the vectors formed by the matrix-vector products

$$\begin{bmatrix} A & E \\ 0 & A \end{bmatrix}^k \begin{bmatrix} 0 \\ b \end{bmatrix}, \qquad k = 0\colon l - 1,$$

where $l \le 2n$. To complete the proof it remains to show that the top halves of these vectors are in $K_m(A, y)$, which we achieve via induction. The base case for $k = 0$ is obvious. For $k = 1$, recalling that $E = \eta y z^*$, we have

$$\begin{bmatrix} A & E \\ 0 & A \end{bmatrix} \begin{bmatrix} 0 \\ b \end{bmatrix} = \begin{bmatrix} Eb \\ Ab \end{bmatrix} = \begin{bmatrix} \eta (z^* b) y \\ Ab \end{bmatrix},$$

where the top half of the result is in $K_1(A, y)$ and the bottom half is in $K_1(A, b)$. For the inductive step, suppose we have a vector $[y_k^T, b_k^T]^T$ where $y_k \in K_k(A, y)$ and $b_k \in K_k(A, b)$. Applying our $2\times 2$ block matrix, we obtain

$$\begin{bmatrix} A & E \\ 0 & A \end{bmatrix} \begin{bmatrix} y_k \\ b_k \end{bmatrix} = \begin{bmatrix} A y_k + \eta (z^* b_k) y \\ A b_k \end{bmatrix} = \begin{bmatrix} y_{k+1} \\ b_{k+1} \end{bmatrix},$$

where $y_{k+1} \in K_{k+1}(A, y)$ and $b_{k+1} \in K_{k+1}(A, b)$, completing the proof.
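The $2\times 2$ block formula (3.1) used in the proof is easy to check numerically for $f = \exp$. The sketch below is a dense reference implementation only (it forms the full $2n \times 2n$ block matrix, which the Krylov method of this paper deliberately avoids); the function name is ours.

```python
import numpy as np
from scipy.linalg import expm

def frechet_action_block(A, E, b):
    """L_exp(A, E) b via the 2x2 block formula (3.1).

    Forms X = [[A, E], [0, A]]; by (3.1) the top half of
    exp(X) [0; b] equals L_exp(A, E) b. Dense and O(n^3): a
    reference for small matrices, not a scalable method.
    """
    n = A.shape[0]
    X = np.block([[A, E], [np.zeros((n, n)), A]])
    v = np.concatenate([np.zeros(n), b])
    return (expm(X) @ v)[:n]
```

For large sparse $A$ the block matrix (and a dense rank-1 $E$) quickly becomes impractical to store, which is exactly the situation the block Krylov algorithm below is designed for.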
With this result we know that $L_f(A, yz^*)b \in K_m(A, y)$, and our hope is that in practice a Krylov subspace with $m < n$ is required. Unfortunately, we have been unable to see how to approximate $L_f(A, yz^*)b$ solely from the space $K_m(A, y)$. However, we will show that, by using the block Krylov subspace $K_m(A, [y, b])$, we can compute an approximation to $L_f(A, yz^*)b$. Since $K(A, b)$ is already required for computing $f(A)b$ with a Krylov method, this is not an unreasonable restriction.
When the scalar function $f$ is analytic on and inside a contour $\Gamma$ enclosing $\Lambda(A)$, the spectrum of $A$, we can represent $f(A)$ using the Cauchy integral formula (see [10, Def. 1.11])

$$f(A) = \frac{1}{2\pi i} \int_\Gamma f(\omega) (\omega I - A)^{-1}\, d\omega.$$

Taking the Fréchet derivative of this representation using the chain rule [10, Prob. 3.8], we obtain

$$L_f(A, \eta y z^*)b = \frac{\eta}{2\pi i} \int_\Gamma f(\omega) (\omega I - A)^{-1} y \, z^* (\omega I - A)^{-1} b \, d\omega.$$

For the next step, assume that we have a block Krylov subspace $K_m(A, [y, b])$, so that we can form the approximations

$$(\omega I - A)^{-1} y \approx V_m (\omega I - H_m)^{-1} V_m^* y, \qquad (\omega I - A)^{-1} b \approx V_m (\omega I - H_m)^{-1} V_m^* b.$$

By employing these two approximations, and noting that $V_m^* y = e_1$, we see that

$$L_f(A, \eta y z^*)b \approx \frac{\eta}{2\pi i} \int_\Gamma f(\omega) V_m (\omega I - H_m)^{-1} e_1 \, z^* V_m (\omega I - H_m)^{-1} V_m^* b \, d\omega = \eta V_m L_f(H_m, e_1 z^* V_m) V_m^* b =: L^{(m)}.$$

Overall, we have shown that if the block Krylov subspace $K_m(A, [y, b])$ can be used to approximate both $(\omega I - A)^{-1} y$ and $(\omega I - A)^{-1} b$, then we can approximate $L^{(m)} \approx L_f(A, \eta y z^*)b$ by computing the Fréchet derivative at the (potentially much smaller) matrix $H_m$. We note that $H_m \in \mathbb{C}^{2m \times 2m}$, as we use a block Krylov subspace of block size 2.
To make this into an iterative algorithm we must now introduce a stopping criterion. We note that, in exact arithmetic, the algorithm will converge in at most $n/2$ iterations, since we add two vectors to the subspace in each iteration. If we want to compute the answer to some relative tolerance, tol, we can continue adding vectors into our Krylov subspace by increasing $m$ until

$$\frac{\| L^{(m)} - L^{(m-1)} \|_2}{\| L^{(m)} \|_2} < \mathrm{tol}. \qquad (3.6)$$

This results in the following basic algorithm.

Algorithm 2
Let $A$ be an $n\times n$ matrix, and $f$ a matrix function that is defined and Fréchet differentiable at $A$ while being analytic on and inside a contour enclosing $\Lambda(A)$. Furthermore, suppose that $y$, $z$, and $b$ are three vectors satisfying $\|y\|_2 = \|z\|_2 = \|b\|_2 = 1$. Finally, let $\eta \in \mathbb{R}$ and a tolerance tol > 0 be given. Then the following algorithm approximates $L_f(A, \eta y z^*)b$ to within a relative tolerance tol.

The above algorithm can be used to compute an approximation to $L_f(A, E)b$ but has one major issue: the rather crude stopping criterion can lead to early termination of the algorithm if any stagnation in the convergence is encountered. Therefore, our next section aims to design an a priori error bound so that (an upper bound on) the number of iterations required to achieve a given tolerance can be found analytically.
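To make the structure of Algorithm 2 concrete for $f = \exp$, the following Python sketch grows the block Krylov space $K_m(A, [y, b])$, forms $L^{(m)} = \eta V_m L_{\exp}(H_m, e_1 z^* V_m) V_m^* b$ by calling SciPy's `expm_frechet` on the small projected matrix, and stops on the heuristic relative-difference test. It is our reconstruction under the paper's assumptions (unit-norm $y$, $z$, $b$; no breakdown handling), not the authors' code.

```python
import numpy as np
from scipy.linalg import expm_frechet

def frechet_action_krylov(A, y, z, b, eta=1.0, tol=1e-9, max_m=20):
    """Sketch of Algorithm 2 for f = exp.

    Approximates L_exp(A, eta*y*z^H) b from K_m(A, [y, b]) with
    block size p = 2, assuming y, z, b have unit 2-norm. Stops when
    the relative difference between successive iterates is <= tol.
    """
    n, p = A.shape[0], 2
    V = np.zeros((n, p * (max_m + 1)), dtype=A.dtype)
    Hfull = np.zeros((p * (max_m + 1), p * max_m), dtype=A.dtype)
    V[:, :p] = np.linalg.qr(np.column_stack([y, b]))[0]
    L_prev = None
    for j in range(max_m):
        W = A @ V[:, j*p:(j+1)*p]
        for i in range(j + 1):          # block modified Gram-Schmidt
            Hij = V[:, i*p:(i+1)*p].conj().T @ W
            Hfull[i*p:(i+1)*p, j*p:(j+1)*p] = Hij
            W -= V[:, i*p:(i+1)*p] @ Hij
        Q, R = np.linalg.qr(W)
        V[:, (j+1)*p:(j+2)*p] = Q
        Hfull[(j+1)*p:(j+2)*p, j*p:(j+1)*p] = R
        M = (j + 1) * p
        Vm, Hm = V[:, :M], Hfull[:M, :M]
        # Projected rank-1 direction (V_m^* y in place of e_1 to be
        # robust to the sign convention of the QR factorization).
        D = np.outer(Vm.conj().T @ y, z.conj() @ Vm)
        L = eta * (Vm @ (expm_frechet(Hm, D)[1] @ (Vm.conj().T @ b)))
        if L_prev is not None and \
           np.linalg.norm(L - L_prev) <= tol * np.linalg.norm(L):
            return L
        L_prev = L
    return L_prev
```

Each iteration costs two matrix-vector products with $A$ plus $O(M^3)$ work on the small projected problem, which is cheap whenever $M \ll n$.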

4. A priori error analysis. As mentioned above, the goal of this section is to improve the reliability of the stopping criterion in Algorithm 2 by performing an a priori error analysis. Our methodology is based upon that of Saad [17] for the matrix exponential, though we need to make a number of modifications to account for the Fréchet derivative. More specifically, we aim to show that our approximation to the Fréchet derivative at each step is equivalent to forming a bivariate polynomial in $A$ and $E$. Once this has been achieved, we can use a bivariate power series as the remainder function and bound its norm. Since the size of this bound will decrease with $m$, we can find the number of iterations required to guarantee a given tolerance in our approximation. During our analysis we will repeatedly use the fact that if a power series $f(x) = \sum_{i=0}^\infty a_i x^i$ has radius of convergence $r$, then for $A, E \in \mathbb{C}^{n\times n}$ with $\|A\| < r$ we have (see [10, Prob. 3.6])

$$L_f(A, E) = \sum_{i=1}^\infty a_i \sum_{j=1}^{i} A^{j-1} E A^{i-j}. \qquad (4.1)$$

Our first step towards an a priori error estimate is to relate polynomials of the above form with corresponding polynomials in $H_m$, the block Hessenberg matrix, which is the subject of the following lemma.
Lemma 2. Let $A$ be any matrix and $V_m$, $H_m$ the results of $m$ steps of the block Arnoldi method applied to $A$ with initial vectors $[y, b]$; also let $E = \eta y z^*$. Then for all nonnegative integers $j$ and $k$ with $j + k \le m - 1$,

$$A^j E A^k b = \eta V_m H_m^j e_1 \, z^* V_m H_m^k V_m^* b, \qquad (4.2)$$

and hence, for any bivariate polynomial $q(x, e)$ of the form (4.1) whose terms all satisfy this degree restriction,

$$q(A, E)b = \eta V_m q(H_m, e_1 z^* V_m) V_m^* b. \qquad (4.3)$$

Proof. It suffices to prove (4.2), from which (4.3) follows easily. Writing out $E$ in full, we have

$$A^j E A^k b = \eta \, A^j y \, z^* A^k b.$$

Since $j, k \le m - 1$, the standard block Krylov relations give $A^j y = V_m H_m^j e_1$ and $A^k b = V_m H_m^k V_m^* b$. Making these substitutions, we obtain (4.2).

Now, our plan is to approximate $L_f(A, E)b$ using a truncation of the power series expansion (4.1). As a corollary of our previous result, we show that truncating this series to order $m - 1$ is equivalent to forming a polynomial using the basis of our Krylov subspace. In particular, let us define

$$s_{m-1}(A, E) = \sum_{i=1}^{m-1} a_i \sum_{j=1}^{i} A^{j-1} E A^{i-j} \qquad (4.4)$$

to be a truncation of the power series (4.1).
Corollary 3. Under the assumptions of Lemma 2,

$$s_{m-1}(A, E)b = \eta V_m s_{m-1}(H_m, e_1 z^* V_m) V_m^* b.$$

Proof. This is a special case of (4.3) in Lemma 2. Therefore, the approximation $L^{(m)} = \eta V_m L_f(H_m, e_1 z^* V_m) V_m^* b$ is exact whenever $f$ is a polynomial of degree at most $m - 1$.

Using this result, we can now write an expression for the error which we will then bound. Before we do, let us define the remainder function as follows:

$$r_m(A, E) = L_f(A, E) - s_{m-1}(A, E). \qquad (4.5)$$

This function quantifies how close our approximating polynomial $s_{m-1}(A, E)$ is to the actual Fréchet derivative $L_f(A, E)$. In the following lemma we will relate the remainder of $L_f(A, E)$ to the corresponding remainder of $\eta V_m L_f(H_m, e_1 z^* V_m) V_m^*$.

Lemma 4. Let $A$ be any matrix and $V_m$, $H_m$ the results of $m$ steps of the block Arnoldi method applied to $A$ with starting block $[y, b]$; also let $E = \eta y z^*$. Then

$$L_f(A, E)b - \eta V_m L_f(H_m, e_1 z^* V_m) V_m^* b = r_m(A, E)b - \eta V_m r_m(H_m, e_1 z^* V_m) V_m^* b,$$

where the remainder $r_m(x, e)$ is defined above.
Proof. By rearranging the definition of the remainder polynomial, we have

$$r_m(A, E)b - \eta V_m r_m(H_m, e_1 z^* V_m) V_m^* b = L_f(A, E)b - s_{m-1}(A, E)b - \eta V_m \bigl( L_f(H_m, e_1 z^* V_m) - s_{m-1}(H_m, e_1 z^* V_m) \bigr) V_m^* b.$$

By Corollary 3 the two truncated-series terms cancel, leaving $L_f(A, E)b - \eta V_m L_f(H_m, e_1 z^* V_m) V_m^* b$, as required.
Following this lemma, taking the norm of both sides, we have

$$\| L_f(A, E)b - \eta V_m L_f(H_m, e_1 z^* V_m) V_m^* b \|_2 \le \| r_m(A, E) \|_2 + |\eta| \, \| r_m(H_m, e_1 z^* V_m) \|_2, \qquad (4.6)$$

where the final inequality is justified by the fact that $\|Xq\|_2 \le \|X\|_2$ for any $X$ and $q$ with $\|q\|_2 = 1$ and by the unitary invariance of the 2-norm. Recall that (without loss of generality) at the start of section 3 we made the assumption that $\|b\|_2 = 1$ and therefore $\|\tilde{e}\|_2 = 1$, where $\tilde{e} = V_m^* b$.
With this result, we can now obtain a priori error bounds on our approximation which depend upon the size of the Krylov subspace. Using the polynomial $s_{m-1}(x, e)$ as defined in (4.4) at the $m$th iteration of the Krylov method, we obtain the remainder function

$$r_m(A, E) = \sum_{i=m}^\infty a_i \sum_{j=1}^{i} A^{j-1} E A^{i-j}. \qquad (4.7)$$

Our next lemma bounds this remainder term from above using the power series $t_m(x) = \sum_{i=m}^\infty |a_i| \, i \, x^{i-1}$.

Lemma 5. Let $f$ have a Taylor series convergent in a disc of radius $r$, let $\|A\|_2 < r$, and let $f$ be Fréchet differentiable at $A$. Furthermore, let $E \in \mathbb{C}^{n\times n}$ be given. Then the remainder function (4.7) is bounded above by

$$\| r_m(A, E) \|_2 \le \|E\|_2 \, t_m(\|A\|_2).$$

Proof. We can see that

$$\| r_m(A, E) \|_2 \le \sum_{i=m}^\infty |a_i| \sum_{j=1}^{i} \|A\|_2^{j-1} \|E\|_2 \|A\|_2^{i-j} = \|E\|_2 \sum_{i=m}^\infty |a_i| \, i \, \|A\|_2^{i-1} = \|E\|_2 \, t_m(\|A\|_2).$$

This leads to the following corollary.
Corollary 6. For functions $f$ and matrices $A$ satisfying the criteria of Lemma 5, with $E = \eta y z^*$ and $b$ given such that $\|b\|_2 = \|y\|_2 = \|z\|_2 = 1$, the error in approximating $L_f(A, E)b$ by $L^{(m)}$ satisfies

$$\| L_f(A, E)b - \eta V_m L_f(H_m, e_1 z^* V_m) V_m^* b \|_2 \le 2 |\eta| \, t_m(\|A\|_2),$$

where $t_m$ is defined just prior to Lemma 5.
Proof. Combine the upper bound (4.6) with Lemma 5, noting that $\|H_m\|_2 \le \|A\|_2$ and that $t_m$ is a monotonically increasing function.
To obtain more concrete bounds it is helpful to look at a specific function. For example, let us consider the matrix exponential. In this case we have the coefficients $a_i = 1/i!$ and hence

$$t_m(x) = \sum_{i=m}^\infty \frac{i \, x^{i-1}}{i!} = \sum_{i=m}^\infty \frac{x^{i-1}}{(i-1)!}.$$

Corollary 7. When considering the matrix exponential in Corollary 6, we obtain the upper bound

$$\| L_f(A, E)b - \eta V_m L_f(H_m, e_1 z^* V_m) V_m^* b \|_2 \le 2 |\eta| \, \frac{\|A\|_2^{m-1} e^{\|A\|_2}}{(m-1)!}.$$

Proof. See above, noting that $\sum_{i=m}^\infty x^{i-1}/(i-1)! \le x^{m-1} e^x/(m-1)!$.
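The exponential bound lends itself to choosing $m$ a priori. The small Python helper below (our own, not from the paper) returns the smallest $m$ for which the reconstructed Corollary 7 bound drops below a given absolute tolerance; the function name and defaults are assumptions for illustration.

```python
import math

def a_priori_m(normA, eta=1.0, tol=2**-53, max_m=1000):
    """Smallest m with 2*|eta| * normA**(m-1) * exp(normA) / (m-1)! <= tol,
    i.e., the (reconstructed) Corollary 7 bound for f = exp."""
    for m in range(2, max_m):
        bound = (2 * abs(eta) * normA**(m - 1) * math.exp(normA)
                 / math.factorial(m - 1))
        if bound <= tol:
            return m
    return max_m
```

Because the factorial in the denominator eventually dominates any fixed power of $\|A\|_2$, the returned $m$ grows only slowly with $\|A\|_2$, although, as the experiments in section 6 show, the bound is typically pessimistic compared with the observed convergence.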
The analysis in this section has allowed us to use a relative a priori error bound to terminate our iterative algorithm, as opposed to simply taking the relative difference between iterates in Algorithm 2. The resulting algorithm is given below.
5. Application to condition number estimation. In this section we use our new algorithm for computing $L_f(A, E)b$ to design an algorithm for estimating the condition number of $f(A)b$, building upon work by Deadman [5]. In particular, Deadman proves the following bounds on $\mathrm{cond}(f, A, b)$, which we defined in (1.3).

Algorithm 3
Let $A$ be an $n\times n$ matrix, and $f$ a matrix function that is defined and Fréchet differentiable at $A$ while being analytic on and inside a contour enclosing $\Lambda(A)$. Furthermore, suppose that $y$, $z$, and $b$ are three vectors satisfying $\|y\|_2 = \|z\|_2 = \|b\|_2 = 1$. Finally, let $\eta \in \mathbb{R}$ and tol > 0 be given. Then the following algorithm attempts to approximate $L_f(A, \eta y z^*)b$ to within a relative tolerance tol. Set $\alpha$ equal to the a priori error estimate given by Corollary 6 or 7.

Lemma 8. Given $A \in \mathbb{C}^{n\times n}$, $b \in \mathbb{C}^n$, an induced matrix norm $\|\cdot\|$, and a matrix function $f$ which is defined and Fréchet differentiable at $A$, the condition number $\mathrm{cond}(f, A, b)$ satisfies the following bounds.

Proof. For a proof, see [5, p. 5].
Deadman then proceeds to work in the 1-norm and shows that

$$\mathrm{cond}(f, A, b) \approx \frac{\sqrt{n}\, \gamma \, \|A\|_1 + \|f(A)\|_1 \|b\|_1}{\|f(A)b\|_1},$$

where $\|A\|_1$ and $\|f(A)\|_1$ can be estimated using the 1-norm estimation algorithm by Higham and Tisseur [11]. The quantity $\gamma$ is a 2-norm estimate of the matrix $K_f(A, b) \in \mathbb{C}^{n \times n^2}$ defined by $L_f(A, E)b = K_f(A, b)\,\mathrm{vec}(E)$. Instead of following Deadman by approximating the 1-norm condition number, we choose to estimate the 2-norm condition number, for which this approximation becomes

$$\mathrm{cond}(f, A, b) \approx \frac{\gamma \, \|A\|_2 + \|f(A)\|_2 \|b\|_2}{\|f(A)b\|_2}.$$

Note that this avoids introducing the $\sqrt{n}$ factor in the first term which, for large sparse matrices, could potentially dominate the approximation. We can then estimate $\|f(A)\|_2$ and $\|A\|_2$ using a power method. This leaves only the estimation of $\gamma$.
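The power method mentioned here needs only products with an operator and its adjoint, so the same routine serves for $\|A\|_2$ and for $\|f(A)\|_2$ (via $f(A)v$ products). A generic sketch, with function and parameter names that are our own:

```python
import numpy as np

def norm2_est(matvec, rmatvec, n, iters=10, seed=0):
    """Power-method estimate of the 2-norm of a linear operator.

    matvec(v) applies the operator, rmatvec(v) its adjoint; the
    iteration is the power method on the Gram operator A^* A, so
    the estimate converges to the largest singular value from below.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    est = 0.0
    for _ in range(iters):
        x = x / np.linalg.norm(x)
        y = matvec(x)             # one application of the operator
        est = np.linalg.norm(y)   # current singular-value estimate
        x = rmatvec(y)            # one application of the adjoint
    return est
```

For $\|f(A)\|_2$ the two callbacks would be Krylov-based routines for $f(A)v$ and $f(A)^*v$; each power-method step then costs two $f(A)$-type products, which is why only a handful of iterations are used in practice.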
To estimate $\gamma$ we first need to introduce the adjoint of the Fréchet derivative, $L_f^\star(A, E)$. For simplicity we will focus on matrix functions with real power series (such as the matrix exponential), for which we know that $L_f^\star(A, E) = L_f(A^*, E)$. For further details on the adjoint, see Higham [10, p. 66], for example.

Algorithm 4 Condition number estimate; cf. [5, Alg. 3.3].
For the function $f$, $A \in \mathbb{C}^{n\times n}$, and $b \in \mathbb{C}^n$ this algorithm computes an estimate $\gamma \approx \|K_f(A, b)\|_2$.
1  Choose a nonzero starting vector $y_0 \in \mathbb{C}^n$.
2  it_max = 10  % Choose the maximum number of iterations.
Using the adjoint, the value $\gamma \approx \|K_f(A, b)\|_2$ is estimated by Algorithm 4, given by Deadman [5].
The major cost of this algorithm is computing the derivative on line 4, which can be computed by extracting the top $n$ elements of the vector

$$f\!\left( \begin{bmatrix} A & L_f^\star(A, y_k b^*) \\ 0 & A \end{bmatrix} \right) \begin{bmatrix} 0 \\ b \end{bmatrix}. \qquad (5.3)$$

Let us denote the $2\times 2$ block matrix in the above equation by $X$, and the vector by $v$. Furthermore, we denote by fAmv(X, v) a method that computes $f(X)v$ using only matrix-vector products. This quantity could be computed using a Krylov method or any other method requiring only matrix-vector products. However, such a method will have to deal with the inner subproblem of computing matrix-vector products with $L_f^\star(A, y_k b^*)$, found in the top-right block of the matrix in (5.3). Following the approach taken by Deadman, since $L_f^\star(A, y_k b^*) = L_f(A^*, y_k b^*)$ for functions with real power series expansions, we could compute $L_f(A^*, y_k b^*)v$ as the top $n$ elements of

$$f\!\left( \begin{bmatrix} A^* & y_k b^* \\ 0 & A^* \end{bmatrix} \right) \begin{bmatrix} 0 \\ v \end{bmatrix}.$$

Now, since the direction term in the Fréchet derivative is of rank 1, our new algorithm can compute this inner subproblem in an extremely efficient manner. Note that, since the condition number does not generally need to be known to high precision, our new algorithm can be run with a loose tolerance for even greater efficiency. Before we give the overall algorithm, we first define the following code fragment for efficiently computing the quantity given by (5.3).
Code Fragment 9 ($w = Xu$). This code fragment computes the matrix-vector product $w = Xu$, where $X$ is the $2\times 2$ block matrix in (5.3); the Fréchet derivative in its top-right block has a rank-1 direction and is applied using Algorithm 2 or 3.

Algorithm 5
Given $A \in \mathbb{C}^{n\times n}$ and $y_k, b \in \mathbb{C}^n$ with $\|b\|_2 = \|y_k\|_2 = 1$, and a user-supplied routine fAmv, for which fAmv(X, v) = $f(X)v$ using only matrix-vector products $w = Xu$ evaluated by Code Fragment 9, this algorithm computes $x = L_f(A, L_f^\star(A, y_k b^*))b$.
6. Numerical experiments. In this section we will perform a number of numerical experiments to test the accuracy and efficiency of our new algorithms. All experiments were performed using MATLAB 2015a on a Linux machine (specifically the 64-bit variant, glnxa64), and all timings use only one processor, which we achieved by using the option -singleCompThread. We also deactivated the Java virtual machine (and hence ran MATLAB without a GUI) by using -nojvm, to further increase the reliability of the results. The Python code is run using the Anaconda distribution of Python 2.7, with the libraries scipy 0.17.0 and numpy 1.10.4, linked to the Intel MKL BLAS.
We use the matrix exponential (almost) exclusively throughout our tests, since this allows us to test the performance of our algorithm using expmv [1], which has implementations in both MATLAB and Python. We can use expmv to compute $L_{\exp}(A, E)b$ using the $2\times 2$ block formula (3.1). We will refer to expmv applied within the block formula for the Fréchet derivative as the "block algorithm." Throughout our experiments the direction matrix $E$ will be rank 1, with $E = yz^*$, where $y$ and $z$ have elements sampled from a random normal $N(0, 1)$ distribution.
We will also need to compute $L_{\exp}(A, E)b$ to high accuracy, so that we can compute the relative error achieved by our algorithm. We do this by forming the $2\times 2$ block formula (3.1) in variable precision arithmetic from the MATLAB Symbolic Toolbox. We compute the function $f(X)$ to high precision by taking an eigendecomposition $VDV^{-1} = X + \Delta X$ using 200 digits, where $\|\Delta X\|_2 = 10^{-100}$ is used to ensure that, with probability 1, the matrix is diagonalizable. We can then form $Vf(D)V^{-1}$ and round back to double precision floating point.
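The reference-solution trick, diagonalizing a slightly perturbed matrix so the eigendecomposition exists with probability 1, can be illustrated in ordinary double precision. The paper performs it at 200 digits in the Symbolic Toolbox; the Python sketch below, with our own function name and default perturbation size, only demonstrates the idea.

```python
import numpy as np

def f_matrix_via_eig(X, f, pert=1e-10, seed=0):
    """Evaluate a matrix function via a perturbed eigendecomposition.

    Adds a tiny random perturbation so X + dX is diagonalizable with
    probability 1, then forms V f(D) V^{-1} with f applied to the
    eigenvalues. A double-precision stand-in for the paper's
    200-digit reference computation.
    """
    rng = np.random.default_rng(seed)
    dX = rng.standard_normal(X.shape)
    dX *= pert / np.linalg.norm(dX, 2)      # ||dX||_2 = pert
    w, V = np.linalg.eig(X + dX)
    return (V * f(w)) @ np.linalg.inv(V)    # V diag(f(w)) V^{-1}
```

In double precision the achievable accuracy is limited by the eigenvector conditioning; at 200 digits, as used in the paper, that limitation is immaterial after rounding back to double precision.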
Our first two experiments are designed to investigate the behavior of Algorithms 2 and 3. We compare the relative errors achieved by the block algorithm and Algorithm 2 when aiming for half, single, or double precision accuracy over a range of test matrices. We also compare the amount of work performed by each algorithm.
Our second experiment shows the convergence of our new algorithm as $m$, the number of block Krylov iterations, increases. This allows us to compare the effectiveness of the two stopping criteria in Algorithms 2 and 3. We also include a convergence profile using the matrix function $\varphi(A) = A^{-1}(e^A - I)$ in this experiment.
The third experiment illustrates the performance of our new algorithm on large sparse matrices. We also illustrate the storage advantages of our algorithm in comparison to the block algorithm.
Finally, in our fourth experiment, we investigate the application of our algorithm to computing a bound on the condition number of $e^A b$ for several different matrices.
Experiment 1 (behavior in floating point arithmetic). This experiment is designed to test the behavior of the method in floating point arithmetic. To do this, we will check that the algorithms attain the prescribed relative error. In particular, we use the tolerances half, single, and double, which in IEEE floating point arithmetic correspond to $2^{-11}$, $2^{-24}$, and $2^{-53}$, respectively.
The test matrices used in this experiment are taken from the Matrix Computation Toolbox [9], where we set their dimension to $n = 100$ and compute $L_{\exp}(A, yz^*)b$ for $y$, $z$, and $b$ with elements sampled from a normal $N(0, 1)$ distribution. For reference we also compute $L_{\exp}(A, yz^*)b$ by applying expmv to the $2\times 2$ block matrix shown in (3.1), i.e., the block algorithm. We compute the exact solution, from which we calculate the relative error obtained, using variable precision arithmetic as described previously.
In an attempt to control the (sometimes extremely large) condition numbers of the matrix functions, we rescaled all the matrices to have unit 2-norm. Nevertheless, several of the matrices were still too ill conditioned to yield reasonable results with either method. The excluded matrices were those with indices 4, 7, 12, 17, 18, 29, 35, and 40 from the Matrix Computation Toolbox.
In Figure 6.1 we see the relative error obtained by each algorithm. The results show that, for each test matrix, both algorithms obtain a relative error close to the desired accuracy. In particular this shows that our new algorithm appears to behave in a forward stable manner.
In Figure 6.2 we show the amount of work required by each algorithm. We define one unit of work to be the effort (in flops) required to compute one matrix-vector product with an $n \times n$ matrix. Therefore, if our new algorithm is using an $m \times m$ matrix on the current iteration, then each matrix-vector product with that matrix contributes $m^2/n^2$ units of work. We see that the block algorithm performs a similar amount of work regardless of the accuracy desired: the three lines are barely distinguishable on the graph since they overlap one another. Meanwhile, our new algorithm is often cheaper when aiming for double precision accuracy, and more than a factor of 10 cheaper for single and half precision accuracy.
In Figure 6.3 we see how the savings in work described above translate to savings in time. In order to obtain reliable timings we ran each code 500 times and display the mean time. We see that Algorithm 2 is faster when the desired accuracy is half or single precision. However, with a few exceptions, it is slower when we desire double precision accuracy. This is partially due to the fact that inside our implementation we evaluate the stopping criterion (3.6) at the end of every iteration. We may be able to improve the runtime by, for example, checking the stopping criterion less frequently.

Experiment 2 (convergence profile). In our second experiment we illustrate the convergence behavior of Algorithms 2 and 3 with respect to the degree $m$ of the block Arnoldi process; see Figure 6.4. This will allow us to compare the two different stopping criteria used in Algorithms 2 and 3.
For the convergence profile we have chosen the matrix $A$ to be the poisson2D matrix from the University of Florida Sparse Matrix Collection [4]. The vectors $b$, $y$, and $z$ are chosen randomly with elements sampled from a normal $N(0, 1)$ distribution. As before, we begin by computing $L_{\exp}(A, yz^*)b$.
For the computation shown in Figure 6.4 the stopping criteria are ignored, so that we continue iterating until $m = \lfloor n/2 \rfloor$. The exact solution is computed in variable precision arithmetic with 200 digits in the manner described at the beginning of this section. The graphs are cropped at $m = 25$ as, by this point, full double precision accuracy has already been achieved.
As we might expect, the a priori estimate overestimates the number of iterations required to compute the Fréchet derivative to the desired accuracy, but it certainly provides an upper bound on the number of iterations required. On the other hand, we can see that the heuristic stopping criterion, $\|L^{(m)} - L^{(m-1)}\|_2 / \|L^{(m)}\|_2$, captures the rate of convergence well.
In our second convergence profile we switch to computing $L_\varphi(A, yz^*)b$, where $\varphi(A) = A^{-1}(\exp(A) - I)$, using the same matrix and vectors as above. The function $\varphi$ is used extensively in exponential integrators [14]. In order to use our a priori error bound we must tailor Lemma 5, and the corollaries following it, for use with the $\varphi$ function. In particular, this function has the Taylor series

$$\varphi(x) = \sum_{i=0}^\infty \frac{x^i}{(i+1)!},$$

so that $a_i = 1/(i+1)!$ and $t_m$ can be bounded as follows:

$$t_m(x) = \sum_{i=m}^\infty \frac{i \, x^{i-1}}{(i+1)!} \le \sum_{i=m}^\infty \frac{x^{i-1}}{(i-1)!} \le \frac{x^{m-1} e^x}{(m-1)!}.$$

The final inequality is obtained by applying Corollary 7. The results of our experiment when using this a priori error bound are given in Figure 6.5. As we can see, the behavior here is very similar to that observed for the matrix exponential in Figure 6.4: the a priori error estimate tends to overestimate the number of block Krylov steps required to reach the desired tolerance, whereas the simplistic error bound from Algorithm 2 follows the true error in each iteration closely.
Experiment 3 (sparse test matrices). In this experiment we compare Algorithm 2 and the block algorithm on matrices of larger scale. We use examples from the University of Florida Sparse Matrix Collection [4] and the MUFC Twitter matrix. We choose single precision as the desired accuracy, which is achieved in all of the examples; in order to estimate the error, we compute a reference solution by calling the block algorithm with double precision accuracy. From Table 6.1 we can see that the results for larger scale matrices are similar to those seen in Experiment 1.
Some further comments are in order for the MUFC Twitter matrix, found in the final row of Table 6.1. This matrix is of size 148918 × 148918 with 193032 nonzero entries, and therefore the rank-1 direction matrix $E$ in the Fréchet derivative (which is dense) would require approximately 165.2 GB of storage. As a result, applying the expmv algorithm directly is not possible on a standard workstation. The result given for the block algorithm in Table 6.1 for this matrix required some modification of the expmv algorithm, allowing it to use only matrix-vector products.
Experiment 4 (condition numbers). Our final experiment shows how our algorithm can be used to bound the condition number of computing $e^A b$ by combining Algorithms 3 and 6. For each of the matrices in Table 6.1, except the MUFC Twitter matrix, which was too large for the intermediate calculations in Code Fragment 9 to fit in memory, we report in Table 6.2 the relative error obtained when comparing our new algorithm to the exact solution, together with the estimated condition number multiplied by the machine epsilon $u = 2^{-53}$. The exact solution is computed using variable precision arithmetic, as explained at the beginning of this section. In order to avoid overflow, the matrices were scaled such that $\|A\|_2 = 1$, and the vectors $b$ were chosen to have elements sampled from a normal $N(0, 1)$ distribution. In all cases we can see that the relative error is bounded above by the condition number times $u$, meaning that our method behaves in a forward stable manner.
For comparison we have also included the 1-norm relative error and 1-norm condition number, where the latter is computed via the Python code corresponding to [5, Alg. 4.1]. It is interesting to note that, for the extremely ill-conditioned matrix Gset/G45, estimating the condition number with Algorithm 6 would inform the user that the computed solution is untrustworthy, while the 1-norm algorithm returns a condition number estimate that is orders of magnitude smaller than the observed relative error. This discrepancy is likely due to the design choice mentioned by Deadman [5, p. 13] of choosing only a single set of parameters for computing all the matrix exponentials required by the algorithm: clearly this can lead to poor condition number estimates in some cases.

7. Conclusions. We have derived a new method for computing the action of the Fréchet derivative of a matrix function on a vector for low rank direction matrices $E$, making use of block Krylov subspace approximations. After introducing the method, we showed how one can obtain rigorous a priori error bounds to ensure that the action of the Fréchet derivative is computed to sufficient accuracy. Our numerical experiments confirm that our error bounds are effective in practice. Furthermore, we observe that typically the size of the required Krylov subspace is much smaller than the size of the original matrix. This allows the action of the Fréchet derivative to be computed much more efficiently than by other currently used methods.
We have also shown how our algorithm can be used to efficiently estimate the condition number of computing f (A)b in the 2-norm. The growing use of matrix functions in a variety of areas within applied mathematics makes it vital to have efficient condition number estimates available. Our experiments also show that our condition number estimator can detect ill-conditioned problems, where the returned solution is likely to be suspect, that are not detected by competing algorithms.
Looking towards the future, the block Krylov scheme used by our algorithms is relatively basic at this stage. By incorporating data reuse, implicit restarting, rational Krylov approximations, and other state-of-the-art techniques, it is likely that the action of the Fréchet derivative could be computed even more efficiently. This would have a dramatic impact on the computation of the condition number, for which multiple Fréchet derivative computations are required.