A New Analysis of Iterative Reﬁnement and its Application to Accurate Solution of Ill-Conditioned Sparse Linear Systems

. Iterative reﬁnement is a long-standing technique for improving the accuracy of a computed solution to a nonsingular linear system Ax = b obtained via LU factorization. It makes use of residuals computed in extra precision, typically at twice the working precision, and existing results guarantee convergence if the matrix A has condition number safely less than the reciprocal of the unit roundoﬀ, u . We identify a mechanism that allows iterative reﬁnement to produce solutions with normwise relative error of order u to systems with condition numbers of order u − 1 or larger, provided that the update equation is solved with a relative error suﬃciently less than 1. A new rounding error analysis is given and its implications are analyzed. Building on the analysis, we develop a GMRES-based iterative reﬁnement method (GMRES-IR) that makes use of the computed LU factors as preconditioners. GMRES-IR exploits the fact that even if A is extremely ill conditioned the LU factors contain enough information that preconditioning can greatly reduce the condition number of A . Our rounding error analysis and numerical experiments show that GMRES-IR can succeed where standard reﬁnement fails, and that it can provide accurate solutions to systems with condition numbers of order u − 1 and greater. Indeed in our experiments with such matrices—both random and from the University of Florida Sparse Matrix Collection—GMRES-IR yields a normwise relative error of order u in at most 3 steps in every case.

1. Introduction.Ill-conditioned linear systems Ax = b arise in a wide variety of science and engineering applications, ranging from geomechanical problems [13] to computational number theory [5].When A is ill conditioned the solution x to Ax = b is extremely sensitive to changes in A and b.Indeed, when the condition number κ(A) = A A −1 is of order u −1 , where u is the unit roundoff, we cannot expect any accurate digits in a solution computed by standard techniques.This poses an obstacle in applications where an accurate solution is required, of which there is a growing number [4], [15], [23], [24], [34].
Iterative refinement (Algorithm 1.1) is frequently used to obtain an accurate solution to a linear system Ax = b.Typically, one computes an initial approximate solution x using Gaussian elimination (GE), saving the factorization A = LU .Here and throughout, to simplify the notation we assume that A is a nonsingular matrix for which any required row or column interchanges have been carried out in advance (that is, "A ≡ P AQ", where P and Q are appropriate permutation matrices).After computing the residual r = b − A x in higher precision 1 u, where typically u = u 2 , one reuses L and U to solve the system Ad = r, rewritten as LU d = r, by substi-If A is very ill conditioned however, the iterative refinement process may not converge, and indeed all existing results on the convergence of iterative refinement require A to be safely less than u −1 .Nevertheless, despite the ill conditioning of A, there is still useful information contained in the LU factors and their inverses (perhaps implicitly applied).It has been observed that if L and U are the computed factors of A from LU factorization with partial pivoting then κ( L −1 A U −1 ) ≈ 1 + κ(A)u even for κ(A) u −1 , where L −1 A U −1 is computed by substitution; for discussion and further experiments see [26], [30], [31].This approximation can also be made in the case where left-preconditioned is used, that is, κ( U −1 L −1 A) ≈ 1 + κ(A)u.The experimental results shown in Figure 1.1 illustrate the quality of this approximation.
The results in Figure 1.1 lead us to question the conventional wisdom that iterative refinement cannot work in the regime where κ(A) > u −1 .If the LU factors contain useful information in that regime, can iterative refinement be made to work there too?We will give a new rounding error analysis that identifies a mechanism by which iterative refinement can indeed work when κ(A) > u −1 , provided that we can solve the equations for the updates on line 5 of Algorithm 1.1 with some relative accuracy.
We will then use the analysis to develop an implementation of Algorithm 1.1 that enables accurate solution of sparse, very ill conditioned systems.We use the generalized minimal residual (GMRES) method [33] preconditioned by the computed LU factors to solve for the corrections.We refer to this method as GMRES-based iterative refinement (GMRES-IR).
We show that in the case where the condition number κ ∞ (A) is around u −1 , our approach can obtain a solution x to Ax = b for which the forward error x−x ∞ / x ∞ is of order u.Extra precision (assumed to be with a unit roundoff of u 2 ) need only be used in computing the residual, in the triangular solves involved in applying the preconditioner, and in matrix-vector multiplication with A. All other computations are performed in the working precision u.
An advantage of our approach is that it can succeed when standard iterative refinement (using the LU factors to solve Ad = r) fails or, in the words of Ma et al. [24], "is on the brink of failure".Ma et al. [24] need to solve linear programming problems arising in a biological application and their attempts to use iterative refinement in the underlying linear system solutions were only partially successful.GMRES-IR offers  A) and 1 + κ∞(A)u for a computed LU factorization with partial pivoting, for matrices of dimension shown on the x-axis.These matrices are very ill conditioned with κ∞(A) ranging from 10 13 to 10 35 .Computations were in double precision arithmetic.Top: inverse Hilbert matrix.Bottom: matrices generated as gallery('randsvd',n,10^(n+5)) in MATLAB.
the potential for better results in this application.
We note that there is growing interest in using lower precisions such as single or even half precision in climate and weather modeling [29] and machine learning [14], but this brings an increased likelihood that the problems encountered will be very ill conditioned relative to the working precision.Our work, which is applicable for any u, could be especially relevant in these contexts.
In section 2 we present the new rounding error analysis and investigate its implications.In section 3 we explain how we use preconditioned GMRES within iterative refinement and give, using the results of section 2, theoretical justification that this GMRES-based iterative refinement can yield an error of the order of u even in cases where κ ∞ (A) ≥ u −1 .Since we are primarily concerned with sparse linear systems, we discuss in section 3.1 various choices of pivoting strategy and how these affect the numerical behavior.In section 4 we discuss other, related work on iterative refinement.Numerical experiments presented in section 5 confirm our theoretical analysis.Our numerical experiments motivate a two-stage iterative refinement approach, which we briefly discuss in section 5.3, that first attempts the less expensive standard iterative refinement and switches to a GMRES-IR stage in the case of slow convergence or divergence.
2. Error analysis of iterative refinement.The most general rounding error analysis of iterative refinement is that of Higham [17, sec.12.1], which appeared first in [16].That analysis cannot provide the result we want; convergence is guaranteed only when κ ∞ (A) ≤ u −1 .We therefore carry out a new analysis with different assumptions.A key observation is that an inequality used without comment in previous analyses can be very weak.We introduce a quantity µ (p) i , in (2.2) below, that captures the sharpness of the inequality and allows us to draw stronger conclusions.
We first define some notation that will be used in the remaining text.Given an integer k, we define where c is some small constant independent of the problem size.For a matrix A and vector x, we define the condition numbers Let A ∈ R n×n be nonsingular and let x be a computed solution to Ax = b.Define x 0 = x and consider the following iterative refinement process: The only assumption we will make on the solver for Ad i = r i is that the computed solution d i satisfies (2.1) where θ i is a constant depending on A, r i , n, and u.Thus the solver need not be LU factorization, or even a factorization method.From this point until the statement of Theorem 2.1 we define r i , d i , and x i to be the computed quantities, in order to avoid a profusion of hats.
For any p-norm we define µ and note that As we argue in the next subsection, µ (p) i may be far below its upper bound, and this is the key reason why iterative refinement can work when κ(A) u −1 .
Consider first the computation of r i .There are two stages.First, [17, sec. 3.5].Second, the residual is rounded to the working precision: For the second step we have, by (2.1) and (2.2), For the last step, using the variant [17, Eq. (2.5)] of the rounding error model we have Rewriting gives Hence, using (2.3) and (2.4), We summarize the analysis in the following theorem.As a reminder, we will now use hats to denote computed quantities.
Theorem 2.1.Let iterative refinement in precisions u and u ≤ u be applied to a linear system Ax = b with nonsingular A ∈ R n×n and a given approximation x 0 to x, and assume that the solver used on step 5 of Algorithm 1.1 satisfies (2.1).Then for i ≥ 0 the computed iterate Proof.The result follows from (2.7) on dropping second order terms, since θ i u ≤ 1 by assumption.
We conclude that as long as (2.9) 2µ for all i, the error will contract until a limiting normwise relative error of order is achieved, where θ is an upper bound on the θ i terms (here we have used Thus the normwise relative error will be of order u as long as cond ∞ (A, x)u u, or equivalently cond ∞ (A, x)u 1 when u = u 2 .To achieve (2.9) we need θ i u to be sufficiently less than 1, which is a condition on the solver and the data, and µ (∞) i κ ∞ (A)u to be sufficiently less than 1, which is essentially a condition on the iteration.In the next subsection we consider the latter condition.
Note that the limiting accuracy is essentially independent of θ, as long as θu < 1. Therefore it is not necessary to solve the correction equation Ad i = r i to high accuracy in order to achieve a final relative error of order u.
2.1.Bounding µ i .Now we consider the size of µ (p) i in (2.2).We will focus on the 2-norm, but by equivalence of norms our conclusions also apply to the ∞-norm (although note that translating between norms involves factors depending on the dimension of the problem).Let A have the singular value decomposition A = U ΣV T and denote the jth columns of the matrices of left singular vectors U and right singular vectors V by u j and v j , respectively.(Note that in this subsection only, U denotes the matrix of left singular vectors of A rather than the upper triangular factor from an LU factorization of A.) Since we are interested in the case where A is ill conditioned (but nonsingular), the singular values satisfy 0 < σ n σ 1 .Denote by r i = b − A x i = A(x − x i ) the exact residual for the computed x i .Then we can rewrite (2.2) for p = 2 as (2.10) We have and so The bound tells us that µ (2) i will be much less than 1 if r i contains a significant component in the subspace span(U k ) for any k such that σ n+1−k ≈ σ n .
This argument says that we can expect µ (2) i 1 when r i is a "typical" vectorone having sizeable components in the direction of every left singular vector of A-in which case x − x i is not typical, in that it has large components in the direction of the right singular vectors of A corresponding to small singular values.We cannot prove that r i is typical, but we can verify it numerically, which we do in section 5.
We can gain further insight from backward error considerations.For any backward stable solver (such as LU factorization with appropriate pivoting for stability, or GMRES with Householder orthogonalization [10] or modified Gram-Schmidt orthogonalization [28]) we know that the backward error r i 2 /( A 2 x i 2 ) of the computed solution x i to Ax = b will be small, yet the forward error may be large.For the refinement, the initial backward error will be small and the same will be true for each iterate x i , as refinement does not degrade the backward error.So for an ill-conditioned system we would expect to see that or equivalently µ (2) i 1 assuming x i 2 ≈ x 2 , at least in the early stages of the refinement when x i is not very accurate.However, close to convergence both the residual and the error will be small, so that to increase as the refinement steps progress and this could result in a slowing of the convergence.
We note that Wilkinson [38] comments that "The successive r derived during the course of iterative refinement become progressively more deficient in components corresponding to the smaller singular values of A".This claim is equivalent to saying that µ (2) i will increase with i, but Wilkinson does not justify the claim or make any further use of it.
2.2.The role of θ i .In standard iterative refinement the LU factorization of A is reused to solve Ad i = r i by substitution in each refinement step, where here and in the remaining text r i denotes the computed residual vector.We will now show that no matter how much precision is used in the substitutions, a relative error less than 1 for the computed solution d i cannot be guaranteed.Indeed, it suffices to assume that the substitutions with the computed LU factors L and U are carried out exactly.Then, since by [17,Thm. 9.3] (2.11) we have

Hence
(2.12) , which is at least as large as cond ∞ (A), will usually be of similar size to κ ∞ (A), unless A has poor row scaling.Therefore if κ ∞ (A) ≥ u −1 then (2.12) does not guarantee any relative accuracy in d i , so we have θ i u > 1 in (2.1) and our analysis suggests that iterative refinement may fail.The culprit is the ∆A term, which comes from the LU factorization in precision u.
We conclude that if the correction equation is solved using the original LU factors then standard iterative refinement may fail to converge for very ill conditioned A, no matter how much precision is used in the triangular solves and regardless of the size of the µ i values.
One way to satisfy (2.1) is to use higher precision in computing the LU factorization, but this is very expensive.In the following section we present an alternative approach.We show that the correction equations can be solved with some relative accuracy even for numerically singular A by using a different solver: GMRES preconditioned by the existing (precision u) LU factors.This approach can be motivated by the observation, mentioned in section 1, that even if A is very ill conditioned the computed LU factors still contain useful information.

GMRES iterative refinement.
In this section we will show that we can use GMRES [33] to solve Ad i = r i in iterative refinement in such a way that Theorem 2.1 guarantees accurate solution of ill-conditioned systems.We will use the computed LU factors as left preconditioners, so that GMRES solves the preconditioned system (3.1) where The GMRES method presented in Algorithm 3.1 is a simplified variant in which no restarting is used and we assume that the Compute z = U −1 (L −1 (Av k )) in precision u; store in precision u.

5:
for = 1, . . ., k do 6: end for 9: : 11: Update decomposition Q = HR (via Givens rotations). 12: iteration is started with the zero vector as the initial guess.Additionally, the method uses precision u = u 2 in the triangular solves with L and U and in matrix-vector multiplication with A. The remaining computations are performed in precision u and all quantities are stored in precision u.For clarity, in this section we use hats to decorate all quantities computed in finite precision.To avoid confusion, in the remaining text we will use the word iterations and indices j and k in association with GMRES and the word steps and index i in association with the iterative refinement process.
Our analysis proceeds in three main steps.First, we show that κ ∞ ( A) is small.Then we show that the error s i − s i ∞ in the computed right-hand side s i = f l( U −1 f l( L −1 r i )) is small.Then we use the analysis of [28] to show that our GMRES variant provides a backward stable solution to Ad i = s i .These three results allow us to conclude that Ad i = s i can be solved with some degree of relative accuracy, that is, (2.1) is satisfied.To simplify the analysis we assume in this section that κ ∞ (A) is not too much larger than u −1 , although our experiments in section 5 suggest that the GMRES-based approach can work even when κ ∞ (A) is a few orders of magnitude larger than u −1 .
We begin by showing that the matrix A is well conditioned.Using (2.11) we can write

which give the bounds
Therefore even if A is so ill conditioned that γ n |A −1 || L|| U | ∞ is of order 100 (say), we still expect κ ∞ ( A) to be of modest size.(Note that by comparison with the observation in section 1 that κ( U −1 L −1 A) ≈ 1 + κ(A)u for the computed matrix, here we have a strict bound for the corresponding exact matrix.)Of course, the matrix A is not explicitly formed in preconditioned GMRES.Since GMRES only requires matrix-vector products with the preconditioned coefficient matrix, we compute Av i by forming w i = Av i and performing the triangular solves Ly i = w i and U z i = y i , all at precision u = u 2 .
Unlike A, the right-hand side s i is explicitly formed at the beginning of the preconditioned GMRES algorithm.Using precision u, this computation yields Again, the quantity |A −1 || L|| U | ∞ can be as large as κ ∞ (A).Nevertheless, since precision u is used, we still expect s i − s i ∞ γn s i ∞ as long as κ ∞ (A) is not too much larger than u −1 .We now want to show that GMRES provides a backward stable solution to Ad i = s i .We will use the analysis of [28], where it is proved that the variant of GMRES that uses modified Gram-Schmidt orthogonalization (MGS-GMRES) is backward stable.This proof relies on the observation that, given a matrix A and right-hand side b, carrying out k − 1 iterations of the Arnoldi process is equivalent to applying k steps of modified Gram-Schmidt to the matrix where The term ∆V k−1 = [∆v 1 , . . ., ∆v k−1 ] contains both errors in applying the matrix A to vectors v j and errors in normalizing v j , for j ≤ k − 1.Assuming that matrixvector products and inner products are computed in floating point arithmetic in the usual way, ∆V k−1 ≤ k 1/2 γ n A F .In [28], this bound is then combined with results on the finite precision behavior of MGS, including the loss of orthogonality in the MGS process and the backward stability of MGS for solving linear least squares problems, to show the backward stability of MGS-GMRES for solving Ax = b.
We now consider the case where MGS-GMRES is used to solve Ad i = s i .The only thing that will change computationally is the error in applying A to a vector, which is done in this case without explicitly forming A. Other aspects of the MGS-GMRES algorithm, such as the MGS orthogonalization process and least squares solve, remain unchanged.Therefore if we can show that then carrying through the remaining analysis in [28] shows that the MGS-GMRES backward error results of [28] hold for the left-preconditioned GMRES method run with A and s i , that is, for some k ≤ n, we have We now show that if precision u is used in implicitly applying A to v j , then ∆V k−1 indeed satisfies the required bound (3.5).Using precision u, we compute The computed vector z j can therefore be written where If κ F (A) ≈ u −1 and κ F ( L) is of modest size (which will usually be the case, as L is unit triangular with off-diagonal elements bounded by 1), then since κ F ( U ) κ F (A)κ F ( L), we have Accounting for the errors in normalization, with (3.4) we have with ∆v j = ∆ A v j + A∆v j .Using (3.7), this gives Then after k − 1 iterations, Therefore (3.5) is satisfied, and so the backward error result (3.6) holds.
We now want to show that the computed d i is a backward stable solution to Ad i = s i (with the exact preconditioned residual s i rather than the computed preconditioned residual s i as the right-hand side).Writing s i = s i + ( s i − s i ), from (3.6) we have which, using (3.3) and (3.6) gives the bound Thus the normwise relative backward error for the system (3.1) is and therefore the relative error of the computed d i can be bounded by Thus in (2.1) we can take θ i u = nγ kn κ ∞ ( A). Since, as we have shown, κ ∞ ( A) is small (see (3.2)), we expect that θ i u 1.We note in passing that it can be shown that if the normwise relative backward error is small for the system A d i = s i then it is also small for the system A d i = r i .
We conclude that this variant of preconditioned GMRES can solve for the correction vector with sufficient accuracy to allow convergence of the iterative refinement process.Thus we define a new iterative refinement scheme, where in Algorithm 1.1, the solve in line 5 is performed by invoking GMRES (Algorithm 3.1) with input matrix A, right-hand side r i , preconditioners L, U , and a specified tolerance τ and maximum number of iterations m.We call this method GMRES-based iterative refinement (GMRES-IR).In section 5, we show experimentally that GMRES-IR can indeed converge to an accurate solution to Ax = b even when κ ∞ (A) is a few orders of magnitude larger than u −1 .
In discussing the backward stability of GMRES, we have used results from [28] that are specific to the MGS variant of GMRES.We conjecture however that one could prove similar results (that is, show θ i u 1) when certain other GMRES variants are used to solve for the corrective term with LU preconditioning, such as Householder GMRES [37], and flexible GMRES (FGMRES) [32], which were proved to be backward stable in [10] and [1], respectively.Although out of the scope of this paper, one could potentially use existing results to prove that our approach will work with other Krylov subspace methods besides GMRES.One such potential method is the full orthogonalization method (FOM), for which bounds on the forward error have been given in [3].
As an alternative to A we could use right preconditioning or split preconditioning.Consider the split preconditioning case, where Ā = L The first of these two bounds is the more favorable as it allows the diagonal of U , which has elements of potentially widely varying magnitude, to cancel, whereas in the second bound the L-based term intervenes.A related observation is that for the exact LU factors we have AD = LU D for diagonal D, and D does not affect the pivot sequence.Therefore, since |U ||U −1 | = |U D||(U D) −1 |, the bound for Ā − I has the desirable property of being insensitive to the column scaling of A, so split preconditioning might be the best choice when the matrix is badly-scaled.
3.1.Pivoting strategies for sparse LU.In the sparse case, it is common to use a pivoting strategy that allows for minimizing fill-in of the triangular factors and preallocating data structures.One such technique is static pivoting [11], [12], [22], in which a strict pivot ordering is decided during a structural analysis phase.If a pivot is encountered that is too small then a small perturbation can be added to the diagonal in order to limit the element growth.Another technique for sparse matrices is threshold pivoting, in which an entry a pq is selected as a pivot only if |a pq | ≥ φ max p |a pj |, where 0 < φ < 1.This limits the growth factor to (1 + 1/φ) n−1 .
Another point of interest is the use of an incomplete LU factorization, where the nonzero structure of L and U is restricted based on the nonzero structure of A k for some fill level k ≥ 0. One possibility is to use the complete LU factors for the initial solve and to drop entries from L and U for their use as preconditioners in GMRES-IR.This could allow the preconditioned system to remain reasonably wellconditioned while reducing the cost of applying the preconditioner in some cases.The investigation of incomplete LU factorizations for our purposes remains future work.[20], [21] have designed an iterative refinement method for linear systems Ax = b with ill-conditioned A. They compute an LU factorization with partial pivoting of A T , perform an initial solve, then carry out standard iterative refinement.If iterative refinement fails to converge in a set number of steps then a second phase is entered: W = U −T is computed, the products Z = W A and d = W b are formed using special techniques that yield greater accuracy, and the system Zx = d is solved by LU factorization with partial pivoting followed by iterative refinement.An alternative approach requiring fewer flops is given in [21], in which the preconditioned matrix is constructed by computing an accurate residual of the LU factorization.However, both methods require explicit construction of the preconditioned system, making them unsuitable for sparse problems, and they need a second LU factorization of the preconditioned coefficient matrix.

Related work. Kobayashi and Ogita
No analysis is given in [20], [21] to support the method.However, our analysis is applicable, as we briefly indicate.We need to determine a bound on θu in (2.1) for solution of the update equation Ad i = r i , which is actually solved via (W A)d i = W r i .Here, both W A and W r i are effectively computed at precision u and then rounded to precision u, and an LU factorization of W A is used.Relative errors of order roughly κ(W A)u + u Z −1 W A are incurred.It is an assumption of this method that κ(W A) κ(A), and if this inequality is true we can expect θu 1. Ogita [26] and Oishi, Ogita, and Rump [27] develop algorithms for accurate solution of ill-conditioned linear systems that build approximate inverses of A or its LU factors.These algorithms are very different from that developed here and are not applicable to sparse matrices because of the need to form explicit approximations to Arioli and Duff [1] show that FGMRES implemented in double precision and preconditioned with an LU factorization computed in single precision can deliver backward stability at double precision, even for ill conditioned systems.This work builds on the earlier work of Arioli et al. [2], which focuses on the symmetric indefinite case.
Based on this work, Hogg and Scott [18] have implemented an algorithm for symmetric indefinite systems that computes a solution using a direct solver in single precision, performs iterative refinement using the factorization of A, and then uses mixed precision FGMRES preconditioned by the direct solver to solve the original system.The stopping criteria are backward error-based.
Turner and Walker [36] frame restarted GMRES as a type of "abstract improvement algorithm".They show that restarted GMRES can be viewed as an iterative refinement process where, in each step, the corrective term is found using a fixed number of GMRES iterations.They use this connection with standard iterative refinement to justify the use of high-accuracy computations in selected parts of restarted GMRES.They do not, however, give any supporting analysis, nor do they consider preconditioned versions of GMRES.It is also worth noting that restarted GMRES may not converge.
Our approach is related to those in [1], [2], and [18] in the sense that restarted GMRES can be viewed as an iterative refinement process (see [36]).However our approach differs from those in [1], [2], and [18] in a number of ways.First, we analyze the convergence of the iterative refinement process where a preconditioned GMRES solver is used for refinement, rather than analyze the convergence of GMRES (right) preconditioned by the triangular factors.Second, our emphasis is on solving sparse nonsymmetric linear systems, whereas the algorithms in [2] and [18] are aimed at the sparse symmetric case.Finally-and most importantly-our focus is on the forward error as opposed to the backward error.Our goal is to obtain a forward error of order the unit roundoff, u, whereas a backward error of order u only guarantees a forward error of order κ ∞ (A)u.

Numerical experiments.
In this section we compare the convergence of the forward error in standard iterative refinement and GMRES-IR for problems where the matrix is very ill conditioned.Our test problems include both random dense matrices generated in MATLAB and real-life problems from the University of Florida Sparse Matrix Collection [7], [8].We test two combinations of u and u: single/double precision (u = 2 −24 , u = 2 −53 ) and double/quadruple precision (u = 2 −53 , u = 2 −113 ).Single and double quantities and computations use built-in MATLAB datatypes and routines.For quadruple precision, we use the Advanpix Multiprecision Computing Toolbox [25] with the setting mp.Digits (34), which is compliant with the IEEE 754-2008 standard [19].
For all the test problems in this section, the right-hand-side is generated in MAT-LAB by b = randn(n,1) and then normalized so that b ∞ ≈ 1.This results in a true solution x for which x ∞ is large.We also carried out the same experiments using a small-normed x, by choosing x as a random vector and constructing the righthand side b = Ax using extra precision.The results were similar to the results for large-normed x presented in this section.At the start of each experiment we use the MATLAB command rng(1) to seed the random number generator for reproducibility.We use the MATLAB LU function to compute the LU factorization with partial pivoting.
All quantities are stored in the working precision u.The computation of the residual at the start of each refinement step is done in precision u.Within the GMRES method, the matrix-vector multiplication with A and the triangular solves with L and U are also performed in precision u (as explained in section 3).All other computations are performed in precision u.
In the figures, plots on the left show the relative error e i = x − x i ∞ / x ∞ for standard iterative refinement (red line and circles) and GMRES-IR (blue line and squares), both started from the initial solution obtained via LU factorization with partial pivoting.Here, x is a reference solution computed in precision u 2 (and stored in precision u).We let the process run until the forward error converges to the level = n 1/2 u (indicated by a dashed black line) or the maximum number of refinement steps is reached.Plots in the middle show the computed values of µ (∞) i (in (2.2)), and plots on the right show the computed values of θ i u (in (2.1)) for the solves for the correction terms.In these plots, the dashed black line marks 1, which is an upper bound on µ (∞) i and a constraint on θ i u for convergence of the iterative refinement process.
In all tests in this section, we set the maximum number of iterative refinement steps (parameter i max in Algorithm 1.1) to 15.For GMRES-IR, the maximum number of GMRES iterations m in each iterative refinement step is set to n, although convergence always occurs well before n iterations.The GMRES convergence tolerance (the parameter τ in Algorithm 3.1) is set to 10 −4 .As discussed just after Theorem 2.1, it is not necessary to solve the correction equation to high accuracy.Since we expect the preconditioned matrix A to be very well conditioned the forward error of the correction will be not too much larger than the backward error, so GMRES can be terminated long before the backward error is at the level O(u).In these tests, we found that τ = 10 −4 provided a good balance between ensuring convergence of the GMRES-IR process and minimizing the number of GMRES iterations required.In practice, this parameter may be adjusted depending on the application, the conditioning of A, and the relative costs of standard iterative refinement and GMRES-IR steps.
Table 5.1 shows the number of standard iterative refinement (SIR) steps, the number of GMRES-IR steps, and the number of GMRES iterations summed over all GMRES-IR steps.In the parenthetical list next to the number of GMRES iterations, element i gives the number of GMRES iterations in GMRES-IR step i.Dashes in the table indicate that the method did not converge to the level = n 1/2 u within the maximum number of refinement steps (15 in all experiments).From Figure 5.1, we can see that when the condition number of A is close to but still less than u −1 (Test 1), standard iterative refinement converges within 4 steps.GMRES-IR converges in 2 steps, each of which consists of 3 iterations of preconditioned GMRES.When the condition number of A grows to u −1 and larger, however, standard iterative refinement no longer converges within 15 steps (in fact it diverges in Tests 2, 3, and 4).From the plots on the right we can see that θ i u > 1 for standard iterative refinement in these tests, and so by Theorem 2.1, we should not expect standard iterative refinement to converge.For GMRES-IR, however, θ i u < 1 for all tests, and GMRES-IR converges in at most 3 refinement steps, though Table 5.1 shows that as κ ∞ (A) grows larger more GMRES iterations are required for convergence in each refinement step.
The middle plots display µ 2) and section 2.1) for each iterative refinement step.We see that µ (∞) i starts out close to κ ∞ (A) −1 and grows at a rate proportional to the rate of the decrease of the error e i .So if the iterative refinement process is converging, the error becomes small and µ (∞) i increases towards 1.In Tests 2, 3, and 4, where standard iterative refinement does not converge, µ (∞) i stays small.Failure of GMRES-IR is possible if κ ∞ (A) becomes large enough relative to u −1 , but the algorithm often does better than we might hope.For this class of matrices, GMRES-IR exhibits slower convergence and/or stagnation of the error once κ ∞ (A) 5 • 10 10 .
We now perform an analogous experiment using u = 2 −53 (double precision) and u = 2 −113 (quadruple precision).The problems are the same size (n = 100) and are generated in the same way as before using the MATLAB gallery('randsvd') function, but now with kappa values 10 15 , 10 16 , 10 17 , and 10 18 .
The results are shown in Figure 5.2.We give the total number of standard iterative refinement steps, GMRES-IR steps, and GMRES iterations required for convergence in each test in Table 5.2.
The observations from the single/double experiments hold for the double/quad case as well.In these tests, standard iterative refinement converged in Test 1 and 2 but not in Tests 3 and 4. In Test 3, we can see that it appears that standard iterative refinement may eventually converge to level = n 1/2 u if allowed enough refinement steps.Interestingly, the corresponding plot for θ i u shows that θ i u is very close to 1 (it is around 0.4 in each step), confirming that the iterative refinement process can still make progress despite not having θ i u 1. GMRES-IR converges in 3 refinement steps in all the double/quad tests.Table 5.2 shows that the number of GMRES iterations required per GMRES-IR step increases with κ ∞ (A), although when both standard iterative refinement and GMRES-IR converge, the total number of GMRES iterations is about the same as the number of  standard iterative refinement steps.In Test 4, we see that GMRES-IR can converge to a relative error of order n 1/2 u even when κ ∞ (A) is orders of magnitude larger than u −1 (in this test, κ ∞ (A) = 1.6 • 10 18 ).

University of Florida Sparse Matrix Collection tests.
We now test the two iterative refinement schemes on some ill-conditioned problems from the University of Florida Sparse Matrix Collection [7], [8].Properties of the test matrices are listed in Table 5.3.
We first test matrices that are close to numerically singular for u = 2 −24 (single precision) and u = 2 −53 (double precision); see the the first four rows of Table 5.3.
Figure 5.3 and Table 5.4 show the results in the same format as previous experiments.For the matrices adder dcop 06 and adder dcop 26, standard iterative refinement does not converge, as we would expect since θ i u ≥ 1.For matrices radfr1 and adder dcop 19, θ i u is close to but still less than 1, and standard iterative refinement converges slowly.In all tests, GMRES-IR converges in only a single refinement step consisting of at most 3 iterations of GMRES.
The last four matrices in Table 5.3 were tested using u = 2 −53 (double precision) and u = 2 −113 (quadruple precision).The results can be found in Figure 5.4 and Table 5.5.Again, we see that the behavior of standard iterative refinement and GMRES-IR is as expected.In short, GMRES-IR enables the accurate solution of very ill conditioned problems even when standard iterative refinement fails.5.3.Two-stage iterative refinement.From our numerical experiments, we can see that in some cases, even though A is close to numerically singular, standard iterative refinement still converges (see, e.g., Test 1 in Figures 5. 1-5.4).Since a  We therefore propose a two-stage iterative refinement process, which starts by trying standard iterative refinement and switches to GMRES-IR (and makes use of the existing LU factorization) in case of slow convergence or divergence.The decision of whether to switch from standard iterative refinement to GMRES-IR can be based on the stopping criteria suggested by Demmel et al. [9], which detect when standard refinement is converging too slowly or not at all.The optimal parameters to use in these stopping criteria will be dependent on the particular application.

Conclusions and future work.
There is an argument in numerical analysis that a nearly singular problem does not deserve to be solved accurately, because if the data is inexact there may be an exactly singular problem within the region of uncertainty.The problem should therefore be reformulated or regularized.While this argument is often valid, there is an increasing number of applications where very ill conditioned problems do arise and an accurate solution is warranted, as explained in section 1.Moreover, the trend towards trading precision for performance (single precision for double precision, or half precision for single precision) means that problems that are only moderately ill conditioned at one precision become extremely ill conditioned at the reduced precision.
We have shown that, contrary to the conventional wisdom, iterative refinement can provide a highly accurate solution to a linear system Ax = b with condition number of order u −1 .Our new rounding error analysis shows that it is sufficient to obtain some correct significant digits in solving the correction equation, thanks to a special property of the residuals of the iterates that enables much smaller error bounds to be obtained.Our use of GMRES to solve the correction equation preconditioned by the LU factors (GMRES-IR) yields the necessary accuracy for refinement to work, so it expands the range of accurately solvable linear systems.
More work is required on practical implementation of GMRES-IR.As noted in section 3.1, various pivoting strategies as well as incomplete LU factorization might be used, and the two-stage process proposed in section 5.3 requires various choices of parameters.
Finally, we note that in further work we have shown that GMRES-IR can tolerate the use of an LU factorization computed at less than the working precision.See [6] for details.

Table 5 . 1
Comparison of refinement steps for each method shown in Figure 5.1.

Table 5 . 2
Comparison of refinement steps for each method shown in Figure 5.2.

Table 5 . 4
Comparison of refinement steps for each method shown in Figure 5.3.

Table 5 . 5
Comparison of refinement steps for each method shown in Figure5.4.iterative refinement is likely to be less expensive than a step of GMRES-IR (how much less expensive depends on the number of GMRES iterations in each GMRES-IR step), it may be preferable in such cases to use standard iterative refinement.