Random Matrices Generating Large Growth in LU

Abstract. We identify a class of random, dense, n \times n matrices for which LU factorization with any form of pivoting produces a growth factor typically of size at least n/(4 log n) for large n. The condition number of the matrices can be arbitrarily chosen, and large growth also happens for the transpose. Previously, no matrices with all these properties were known. The matrices can be generated by the MATLAB function gallery('randsvd',...), and they are formed as the product of two random orthogonal matrices from the Haar distribution with a diagonal matrix having only one diagonal entry different from 1, which lies between 0 and 1 (the ``one small singular value'' case). Our explanation for the large growth uses the fact that the maximum absolute value of any element of a Haar distributed orthogonal matrix tends to be relatively small for large n. We verify the behavior numerically and find that for partial pivoting the actual growth is significantly larger than the lower bound and much larger than the growth observed for random matrices with elements from the uniform [0, 1] or standard normal distributions. We show more generally that a rank-1 perturbation of an orthogonal matrix producing large growth for any form of pivoting also generates large growth under reasonable assumptions. Finally, we demonstrate that GMRES-based iterative refinement can provide stable solutions to Ax = b when large growth occurs in low precision LU factors, even when standard iterative refinement cannot.

The code uses the function gep from the Matrix Computation Toolbox [18] to compute the growth factor for LU factorization with partial pivoting on a random n \times n matrix A with n = 750. The growth factor is defined by

    \rho_n = \max_{i,j,k} |a_{ij}^{(k)}| / \max_{i,j} |a_{ij}|,

where the a_{ij}^{(k)} (k = 1:n) are the elements at the kth stage of the factorization [19, sect. 9.3], [39]. Growth of over 100 for a matrix of this size with partial pivoting is very unusual. Unusually large growth is also obtained for the same matrix with rook pivoting and complete pivoting (see [19, sect. 9.1], [34], [39] for details of all these pivoting strategies). Large growth factors are undesirable because they are a warning that numerical instability is likely in the LU factorization, as originally shown by Wilkinson [39].

Several classes of matrices generating large growth factors for partial pivoting are known. Wilkinson [39, p. 327], [40, p. 212] showed that the n \times n matrix of the form illustrated for n = 4 by

    [  1   0   0   1
      -1   1   0   1
      -1  -1   1   1
      -1  -1  -1   1 ]

gives \rho_n = 2^{n-1}, which is the worst case for partial pivoting. Higham and Higham [20] give examples of practically occurring n \times n matrices for which \rho_n(A) \gtrsim n/2 for any pivoting strategy; they are all orthogonal matrices or well conditioned diagonal scalings of orthogonal matrices. Wright [41] describes a class of two-point boundary value problems for which the multiple shooting method leads to a linear system on which partial pivoting suffers exponential growth. The matrix is block lower bidiagonal, except for a nonzero block in the top right-hand corner. Foster [10] shows that a quadrature method for solving a practically occurring Volterra integral equation gives rise to dense linear systems for which partial pivoting again gives growth factors exponential in the dimension. In all these examples the matrices are well conditioned.
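The growth factor computation itself is easy to sketch. The following Python function (an illustrative analogue of the Toolbox function gep, not the Toolbox code itself) tracks the maximum element over all stages of outer-product LU with partial pivoting:

```python
import numpy as np

def growth_factor_pp(A):
    """Exact growth factor rho_n = max_{i,j,k} |a_ij^(k)| / max_{i,j} |a_ij|
    for LU factorization with partial pivoting."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    max0 = np.abs(A).max()
    maxk = max0
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # partial pivoting: pick largest pivot
        A[[k, p]] = A[[p, k]]                 # row interchange
        A[k + 1:, k] /= A[k, k]               # multipliers (column of L)
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
        maxk = max(maxk, np.abs(A[k + 1:, k + 1:]).max())
    return maxk / max0
```

On the Wilkinson matrix illustrated above this returns exactly 2^{n-1}, the worst case for partial pivoting.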
The matrix in our example has 2-norm condition number \kappa_2(A) = \sigma_1/\sigma_n = 10^8 and a singular value decomposition (SVD) of the form

    A = P \Sigma Q^T \in \mathbb{R}^{n\times n},   P^T P = Q^T Q = I,        (1.1a)
    \Sigma = diag(1, ..., 1, \sigma_n),   1 \geq \sigma_n \geq 0.            (1.1b)

Here, n - 1 of the singular values of A are equal to 1, and the last one is less than or equal to 1. The matrices P and Q are orthogonal matrices from the Haar distribution, that is, they are distributed according to the Haar measure, which is the unique measure on the orthogonal matrices that is invariant under multiplication on the left and right by orthogonal matrices [31]. A Haar distributed random orthogonal matrix can be obtained as the orthogonal QR factor of a matrix with elements from the normal (0, 1) distribution, provided that the factorization is normalized so that the diagonal elements of R are nonnegative [3], [36].
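This sampling recipe can be sketched in a few lines; the following Python version (an illustrative translation, not the code used in the paper) uses NumPy's QR factorization and flips column signs so that diag(R) is nonnegative:

```python
import numpy as np

def haar_orthogonal(n, rng=None):
    """Haar distributed orthogonal matrix: QR factorize a matrix of
    standard normal samples and normalize so that diag(R) >= 0."""
    rng = np.random.default_rng(rng)
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))   # scale column j of Q by sign(r_jj)
```

Without the sign normalization the QR factor is not Haar distributed, which is why the adjustment is essential.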
Matrices of the form (1.1) are generated by a MATLAB function call of the form gallery('randsvd',n,kappa,mode) with kappa = \sigma_n^{-1} \geq 1 and mode = 2 (the default value of mode is 3, which produces geometrically distributed singular values). Figure 1.1 plots the mean growth factors of these matrices against n for partial pivoting, rook pivoting, and complete pivoting. For each dimension, we generated 12 matrices and took the mean growth factor. The figure illustrates the results for \kappa_2(A) = 10^2, 10^6, 10^{10}. As above, we used the gep function, which computes the exact growth factor (as opposed to the lower bound \max_{i,j} |u_{ij}| / \max_{i,j} |a_{ij}| that must be used if we have access to the LU factors but not the intermediate quantities). We used the Parallel Computing Toolbox [33] to speed up the computations. We see that irrespective of the condition number, the growth factor increases with n at a rate roughly proportional to n for all three pivoting strategies. Experiments with other condition numbers confirm that the condition number has little effect on the growth factor. The largest growth factor observed in this experiment was 497. By contrast, for random matrices with elements from the uniform [0, 1] distribution (rand in MATLAB) or the normal (0, 1) distribution (randn in MATLAB) the figure shows that the growth factor for partial pivoting grows more slowly than linearly in n (as previously observed in [38]). The curves for the minimum and maximum growth factors are broadly similar to those for the means shown in Figure 1.1, and indeed every growth factor for the matrices (1.1) lies above the black curve, whose significance is explained in the following sections.
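The construction itself is easy to reproduce. The following Python sketch (an illustrative stand-in for the gallery call, assuming the mode 2 semantics described above) builds a matrix of the form (1.1) with \kappa_2(A) = kappa:

```python
import numpy as np

def haar(n, rng):
    """Haar distributed orthogonal matrix via sign-normalized QR."""
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))

def randsvd_one_small(n, kappa, rng=None):
    """A = P diag(1,...,1,1/kappa) Q^T with Haar P and Q, as in (1.1)."""
    rng = np.random.default_rng(rng)
    P, Q = haar(n, rng), haar(n, rng)
    sigma = np.ones(n)
    sigma[-1] = 1.0 / kappa           # the one small singular value
    return (P * sigma) @ Q.T          # P * sigma scales the columns of P

A = randsvd_one_small(100, 1e8, rng=1)
```

By construction the singular values of A are 1, ..., 1, 1/kappa, so the 2-norm condition number is exactly kappa.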
The significance of the matrices (1.1) is that they provide a new class of dense matrices A for which
- A generates large growth for any pivoting strategy,
- A^T also generates large growth for any pivoting strategy, and
- \kappa_2(A) is arbitrary and easily assigned by choosing \sigma_n in (1.1).
The existing examples of large growth mentioned above are all well conditioned, some produce large growth only for partial pivoting, and not all of them produce large growth for A T .
A growth factor of order \alpha n for some constant \alpha < 1 with \alpha > 1/10 (say) may not seem to be a serious problem, given that the worst-case growth for partial pivoting is 2^{n-1}. But matrix dimensions in practical problems are increasing, with dense linear systems of order 10^7 being solved on today's largest machines [4]. The backward error bound for solution of a linear system Ax = b by LU factorization is proportional to \rho_n u, where u is the unit roundoff [19, Thm. 9.5], so growth of order n can be problematic. Matters are exacerbated by the increasing use of low precision arithmetics such as IEEE half precision (u \approx 5 \times 10^{-4}) [24] and bfloat16 (u \approx 4 \times 10^{-3}) [25]. Low precision LU factorizations are being combined with iterative refinement to achieve faster solution times [14], [15], [16], [23], and the new HPL-AI benchmark uses this approach [8]. In low precision arithmetic large growth can even cause overflow. Indeed, we spotted the large growth factor for randsvd matrices with mode 2 because it led to overflow in LU factorization of these matrices in IEEE half precision arithmetic, for which the largest finite number is of order 6 \times 10^4 [15].
In the next section we prove that for large n, large growth typically occurs for the matrices (1.1) with \kappa_2(A) = 1. This is the case of Haar distributed orthogonal matrices. In section 3 we show that if an orthogonal matrix generates large growth for any pivoting strategy then large growth persists after a random rank-1 perturbation, under reasonable assumptions. We specialize the results to a rank-1 perturbation of a Haar distributed orthogonal matrix, that is, matrices of the form (1.1) with an arbitrary \kappa_2(A). In section 4 we provide an alternative analysis for the growth factor of a rank-1 perturbation of an orthogonal matrix based on the Sherman--Morrison formula. In section 5 we investigate the ability of mixed precision iterative refinement to overcome the instability in LU factorization caused by large growth factors.
2. Orthogonal matrices from the Haar distribution. We first consider the case where \sigma_n = 1 in (1.1), so that A = PQ^T with P and Q orthogonal matrices from the Haar distribution. Since the Haar distribution is invariant under left or right multiplication by an orthogonal matrix, A is also Haar distributed, so we are effectively taking a single sample from the Haar distribution.
We need Theorem 2.1 of [20], which is used there to show that for certain specific matrices that are orthogonal, or are well conditioned diagonal scalings of orthogonal matrices, the inequality \rho_n(A) \gtrsim n/2 holds for any pivoting strategy.
The lower bound in (2.1) is not as large as those for the orthogonal matrices and well conditioned diagonal scalings of orthogonal matrices in [20], but those matrices are nonrandom. Orthogonal matrices from the Haar distribution are the first class of random orthogonal matrices to be shown to give large growth. Figure 2.1 shows the results of an experiment in which we generated Haar distributed orthogonal matrices of dimensions n = 100 : 100 : 2500 and computed the growth factors for partial pivoting, rook pivoting, and complete pivoting. For each dimension we generated 12 matrices and we show the maximum and average growth factors. All the growth factors in this experiment exceed n/(4 log n) by a factor of more than 5, and they increase with n a little more rapidly than this approximate lower bound. As expected, the growth factor for partial pivoting exceeds that for rook pivoting, which in turn exceeds that for complete pivoting.
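This experiment is easy to reproduce in outline. The following Python sketch samples one Haar distributed orthogonal matrix and compares the lower bound \max_{i,j}|u_{ij}| / \max_{i,j}|a_{ij}| on the partial pivoting growth factor (computable from the final factors alone) with n/(4 log n); as observed above, the growth typically exceeds the approximate lower bound by a wide margin:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)
n = 300
Z = rng.standard_normal((n, n))
Q, R = np.linalg.qr(Z)
W = Q * np.sign(np.diag(R))               # Haar distributed orthogonal matrix

P, L, U = lu(W)                           # LU with partial pivoting: W = P L U
growth_lb = np.abs(U).max() / np.abs(W).max()
bound = n / (4 * np.log(n))
print(growth_lb, bound)                   # growth_lb typically well above bound
```

Computing the exact growth factor requires tracking the intermediate matrices, as in gep, but for these matrices the largest element of U already reveals the large growth.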
3. Rank-1 perturbations of orthogonal matrices. Matrices A of the form (1.1) can be written as

    A = PQ^T + (\sigma_n - 1) p_n q_n^T,        (3.1)

where P and Q are orthogonal matrices from the Haar distribution and p_n and q_n are the last columns of P and Q, respectively. So A is a rank-1 perturbation of PQ^T, which is a Haar distributed orthogonal matrix and hence tends to give large growth. Preservation of large growth under rank-1 perturbations is not limited to Haar distributed orthogonal matrices or to the special form of the vectors making up the rank-1 perturbation in (3.1), as we now show with an experiment. We consider four different nonrandom orthogonal matrices W identified in [20] that have a maximum element of at most (2/n)^{1/2} and produce growth factors of at least n/2 for any pivoting strategy. Specifically, we take for W the MATLAB

Table 3.1: Growth factors for LU factorization with partial pivoting for four orthogonal matrices W = gallery('orthog',1000,j) and 20 rank-1 perturbations A = W + xy^T with random x and y sampled from the uniform (0, 1) or normal (0, 1) distributions and scaled so that \|x\|_2 = \|y\|_2 = 1.

                          Random uniform                              Random normal
  j   \rho_n(W)   min \rho_n(A)  mean \rho_n(A)  max \rho_n(A)   min \rho_n(A)  mean \rho_n(A)  max \rho_n(A)

matrices gallery('orthog',n,j) with j = 1, 2, 5, 6 and dimension n = 1000. We use LU factorization with partial pivoting and show in Table 3.1 the growth factor for W and the minimum, mean, and maximum growth factors for A = W + xy^T with 20 vectors x and y having elements sampled from the uniform (0, 1) or normal (0, 1) distributions and then scaled to have unit 2-norm. The results show at worst a minor attenuation of the growth factor, with at least one perturbation giving an increased growth factor for every matrix type. Another notable feature of this experiment is that the largest element in magnitude of the upper triangular factor U was in the (n, n) position in every case for both W and A.
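The experiment can be sketched with one of these matrices. gallery('orthog',n,1) is the orthogonal symmetric sine-transform matrix with entries \sqrt{2/(n+1)} sin(ij\pi/(n+1)); the Python sketch below builds it directly, perturbs it by a random rank-1 term as in Table 3.1, and uses the lower bound \max|u_{ij}|/\max|a_{ij}| as a proxy for the growth factor (a good proxy here, since the largest element of U sits at (n, n)):

```python
import numpy as np
from scipy.linalg import lu

def growth_lb(A):
    """Lower bound max|u_ij| / max|a_ij| on the partial pivoting growth factor."""
    _, _, U = lu(A)
    return np.abs(U).max() / np.abs(A).max()

n = 500
i = np.arange(1, n + 1)
W = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(i, i) * np.pi / (n + 1))

rng = np.random.default_rng(0)
x = rng.random(n); x /= np.linalg.norm(x)   # uniform (0,1) entries, unit 2-norm
y = rng.random(n); y /= np.linalg.norm(y)
A = W + np.outer(x, y)

print(growth_lb(W), growth_lb(A))           # both large, of order n/2 per [20]
```

The observed attenuation of the growth factor under the perturbation is typically minor, consistent with Table 3.1.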
3.1. Analysis for general orthogonal matrices. We wish to analyze how element growth changes under a rank-1 update. We will initially derive a formula that holds for a rank-1 update

    A = B + xy^T,   x, y \in \mathbb{R}^n,        (3.2)

of a general nonsingular matrix B \in \mathbb{R}^{n\times n}. We assume that B has an LU factorization. We know that the U factor in the LU factorization of B is given explicitly by [19, sect. 9.2]

    u_{ij} = det(B(1:i, [1:i-1, j])) / det(B(1:i-1, 1:i-1)),   j \geq i.        (3.3)

The identity det(B + xy^T) = det(B)(1 + y^T B^{-1} x) gives

    det(A(1:i-1, 1:i-1)) = det(B(1:i-1, 1:i-1)) (1 + y(1:i-1)^T B(1:i-1, 1:i-1)^{-1} x(1:i-1)),        (3.4)

and hence, analogously to (3.4), assuming B(1:i, [1:i-1, j]) is nonsingular,

    det(A(1:i, [1:i-1, j])) = det(B(1:i, [1:i-1, j])) (1 + y([1:i-1, j])^T B(1:i, [1:i-1, j])^{-1} x(1:i)).

Combining this equation with (3.3) we obtain

    \widetilde{u}_{ij} / u_{ij} = (1 + y([1:i-1, j])^T B(1:i, [1:i-1, j])^{-1} x(1:i)) / (1 + y(1:i-1)^T B(1:i-1, 1:i-1)^{-1} x(1:i-1)),        (3.5)

where \widetilde{u}_{ij} denotes the (i, j) element of the U factor of A.
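The ratio formula (3.5), as reconstructed here, is easy to check numerically. The following Python sketch compares the ratio \widetilde{u}_{ij}/u_{ij} computed from pivot-free Gaussian elimination with the determinant-based expression; the test matrix is arbitrary, shifted to ensure the elimination can proceed without pivoting:

```python
import numpy as np

def lu_nopivot_U(B):
    """U factor from Gaussian elimination without pivoting."""
    B = B.copy()
    n = B.shape[0]
    for k in range(n - 1):
        B[k + 1:, k] /= B[k, k]
        B[k + 1:, k + 1:] -= np.outer(B[k + 1:, k], B[k, k + 1:])
    return np.triu(B)

rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n)) + 3 * np.eye(n)   # safely factorizable
x = rng.standard_normal(n); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)
A = B + np.outer(x, y)

i, j = 4, 6                                # 1-based indices, j >= i
cols = list(range(i - 1)) + [j - 1]        # columns [1:i-1, j]
Btil = B[np.ix_(range(i), cols)]           # B(1:i, [1:i-1, j])
Bhat = B[:i - 1, :i - 1]                   # B(1:i-1, 1:i-1)
ratio_det = ((1 + y[cols] @ np.linalg.solve(Btil, x[:i]))
             / (1 + y[:i - 1] @ np.linalg.solve(Bhat, x[:i - 1])))
ratio_lu = lu_nopivot_U(A)[i - 1, j - 1] / lu_nopivot_U(B)[i - 1, j - 1]
print(ratio_det, ratio_lu)                 # agree to rounding error
```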
Now we specialize to the situation of interest, where W is an orthogonal matrix. Since we found in our experiments at the start of this section that the largest element of U in magnitude was always the (n, n) element, we take i = j = n. Writing W_{n-1} = W(1:n-1, 1:n-1) and using W^{-1} = W^T, (3.5) becomes

    \widetilde{u}_{nn} / u_{nn} = (1 + y^T W^T x) / (1 + y(1:n-1)^T W_{n-1}^{-1} x(1:n-1)).        (3.7)
We will argue that this ratio is likely to be of order 1, so that large growth for W manifested in a large |u_{nn}| translates into a large |\widetilde{u}_{nn}| and hence large growth for A. This is certainly not guaranteed; if y = -W^T x with \|x\|_2 = 1 then the numerator is zero, but such x, y, and W are specially correlated. Indeed, for (3.1) we have y^T W^T x = \sigma_n - 1, so for small \sigma_n the ratio (3.7) need not be of order 1. We will adapt our analysis for this case in the next subsection.
To analyze the more typical behavior, we will assume that x is constructed as follows, and likewise for y: let z_1, ..., z_n be independent random variables from the normal (0, 1) distribution and set x = z/\|z\|_2. Then x and y are uniformly distributed over the n-dimensional unit sphere [29]. We will assume that W is a fixed matrix and will bound the expectations of the numerator and denominator in (3.7).
We need the following lemmas.

Lemma 3.1. Let z \in \mathbb{R}^n be uniformly distributed over the n-dimensional unit sphere and let g \in \mathbb{R}^n be a constant vector. Then \mathbb{E}(|z^T g|) = \mu_n \|g\|_2, where the constant \mu_n depends only on n and satisfies \mu_n < n^{-1/2}.
Lemma 3.2. Let x, y \in \mathbb{R}^n be independent vectors uniformly distributed over the n-dimensional unit sphere and let B \in \mathbb{R}^{n\times n} be a constant matrix. Then \mathbb{E}(|y^T B x|) \leq \mu_n \|B\|_F / n^{1/2}.

First, we consider the numerator of (3.7). By Lemma 3.2, \mathbb{E}(|y^T W^T x|) \leq \mu_n < n^{-1/2}, since \|W^T\|_F = n^{1/2}. Hence the expected value of the numerator of (3.7) is of order 1.
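Both lemmas are cheap to sanity-check by Monte Carlo simulation. The following Python sketch estimates \mathbb{E}(|z^T g|) for z uniform on the unit sphere (generated, as above, by normalizing a standard normal vector) and confirms that it falls below n^{-1/2}\|g\|_2 while being close to it:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 20000
g = rng.standard_normal(n)                       # arbitrary fixed vector

Z = rng.standard_normal((trials, n))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)    # rows uniform on unit sphere
est = np.abs(Z @ g).mean()                       # Monte Carlo E(|z^T g|)

print(est, np.linalg.norm(g) / np.sqrt(n))       # est is a bit below the bound
```

For large n the ratio of the two printed values approaches (2/\pi)^{1/2} \approx 0.8, reflecting the fact that z^T g / \|g\|_2 is approximately normal with standard deviation n^{-1/2}.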
Now we turn to the denominator of (3.7). Using Lemma 3.2 (applied to the subvectors, whose 2-norms are at most 1), we have

    \mathbb{E}(|y(1:n-1)^T W_{n-1}^{-1} x(1:n-1)|) \lesssim \mu_{n-1} \|W_{n-1}^{-1}\|_F / (n-1)^{1/2}.

By the CS decomposition (see Theorem A.1), W_{n-1} has n - 2 singular values equal to 1 and one singular value c \leq 1, with |w_{nn}| = |c|, so \|W_{n-1}^{-1}\|_F^2 = n - 2 + 1/c^2.
For the matrices in the example at the start of this section, for which the maximum element of W is at most (2/n)^{1/2}, the square of this bound is approximately 1/n + (n^2/4)(2/n)/n^2 = 3/(2n).
In both cases these quantities are much less than 1 for large n, so the expected value of the denominator of (3.7) will be of order 1.

We have focused on the (n, n) element of U and argued that when x and y are uniformly distributed on the unit sphere and |u_{nn}| = \max_{i,j} |u_{ij}| we can expect |\widetilde{u}_{nn}| to be of a similar order of magnitude to |u_{nn}|. This is what we observed in the experiment at the start of this section, where the (n, n) element was always the largest for both U and \widetilde{U}. If large growth is not reflected in the (n, n) element one can consider a pair (i, j) in (3.5) for which |u_{ij}| is large. As long as i = n - k and j \geq n - k with k small, the analysis generalizes. In particular, we can apply a similar argument to the (n-k) \times (n-k) submatrices appearing in the numerator (after suitable column permutations) and denominator of (3.5). A case in point is the randsvd matrices, which we consider next.

This analysis is for LU factorization without pivoting. If the pivoting strategy produces an LU factorization \Pi_1 A \Pi_2 = \widehat{L}\widehat{U}, with \Pi_1 and \Pi_2 permutation matrices, then we can rewrite (3.2) as \Pi_1 A \Pi_2 = \Pi_1 B \Pi_2 + (\Pi_1 x)(\Pi_2 y)^T and apply the analysis with A, B, x, and y replaced by \Pi_1 A \Pi_2, \Pi_1 W \Pi_2, \Pi_1 x, and \Pi_2 y, respectively. Since we are interested in orthogonal W for which large growth is obtained for any pivot sequence, our conclusions are unaffected.
3.2. Analysis for randsvd matrices. We now consider matrices A of the form (1.1) with P and Q from the Haar distribution, which we recall from (3.1) can be written as

    A = PQ^T + (\sigma_n - 1) p_n q_n^T =: W + xy^T,        (3.10)

where p_n and q_n are the last columns of P and Q, respectively. Since W = PQ^T is Haar distributed, it typically gives a large growth factor for large n, as shown in section 2. However, this large growth is not usually reflected in u_{nn} when \sigma_n is small. Indeed, A has just one nonunit singular value, \sigma_n, and \Pi A = LU implies \pm\sigma_n = det(A) = \pm det(U) = \pm u_{11} \cdots u_{nn}; for \sigma_n \ll 1, the pivoting strategy will tend to produce a rank revealing factorization, which in this context means one with a well conditioned leading principal (n-1) \times (n-1) submatrix and hence a small |u_{nn}|. However, in the experiment with randsvd matrices reported in section 1, large growth was always observed in the (n-1, n-1) element of U; indeed, the ratio |u_{n-1,n-1}| / \max_{i,j} |u_{ij}| exceeded 0.5 and 0.1 in 82 percent and 98 percent of the cases, respectively. We therefore set i = j = n-1 in (3.5) to obtain

    \widetilde{u}_{n-1,n-1} / u_{n-1,n-1} = (1 + y(1:n-1)^T W_{n-1}^{-1} x(1:n-1)) / (1 + y(1:n-2)^T W_{n-2}^{-1} x(1:n-2)).        (3.11)
The numerator is the same as the denominator in (3.7), and the denominator has an analogous form. Therefore in (3.10) we take x = (\sigma_n - 1)p_n and y = q_n. For large n, the vectors p_n and q_n have components that are approximately normally distributed random variables with mean 0 and standard deviation n^{-1/2} [12], [27, Cor. 1], so they are approximately uniformly distributed on the unit sphere. The analysis of section 3.1 therefore gives insight into why the ratio (3.11) is typically of order 1 and hence why A inherits a large growth factor from PQ^T.
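The determinant argument above can be checked directly: |det(A)| = \sigma_n, and LU with partial pivoting gives |det(A)| = |u_{11} \cdots u_{nn}|, so a small \sigma_n forces a small trailing pivot. The Python sketch below builds a matrix of the form (1.1) and verifies this, also reporting how small the trailing pivot is relative to the largest element of U:

```python
import numpy as np
from scipy.linalg import lu

def haar(n, rng):
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
n, sigma_n = 100, 1e-4
P, Q = haar(n, rng), haar(n, rng)
s = np.ones(n); s[-1] = sigma_n
A = (P * s) @ Q.T                            # the form (1.1)

_, _, U = lu(A)                              # partial pivoting
d = np.abs(np.diag(U))
print(d.prod(), sigma_n)                     # |u_11 ... u_nn| = sigma_n
print(np.abs(U[-1, -1]) / np.abs(U).max())   # trailing pivot relatively small
```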
Experiments show that Haar distributed orthogonal matrices maintain large growth under a wider class of rank-1 perturbations than (3.10). Figure 3.1 plots growth factors for partial pivoting for A = W + xy^T, with W an orthogonal matrix from the Haar distribution and x and y generated with elements from the uniform distribution on [0, 1] and then scaled so that \|x\|_2 = \|y\|_2 = 1. For each n = 100:100:2500 we generated 12 random A and took the mean growth factor. We see that the growth factors for A are very similar to those for W.
It is interesting to note that, unlike for (3.10) with small \sigma_n, A = W + uv^T is very well conditioned when u and v are random unit 2-norm vectors with independent entries from the same distribution: Benaych-Georges and Nadakuditi [1, sect. 3.2] show that, almost surely, the extreme singular values of A converge to constants as n \to \infty, so \kappa_2(A) tends to a modest constant.

We mention some further matrices that generate large growth and are related to those we have considered. Matrices of the form (1.1a) with arithmetically distributed singular values are found to produce large growth in [15], though these are not low rank updates of orthogonal matrices. A referee reported experimental evidence that matrices of the form (1.1a) whose first n-1 singular values are exponentially distributed on [0, 1/2] with \sigma_n \ll 1/2 give large growth. We have also observed that large growth is preserved under rank-k perturbations of Haar distributed orthogonal matrices for k \geq 1, with the growth factor decreasing as k increases.

4. Analysis via the Sherman--Morrison formula.
In section 3 we used the explicit characterization (3.3) of U in order to study growth factors for rank-1 perturbations xy^T of orthogonal matrices, focusing on the case where \|x\|_2 \leq 1 and \|y\|_2 \leq 1. In this section we look at rank-1 perturbations of orthogonal matrices from a different perspective, applying the Sherman--Morrison formula and then making use of the indirect bound from Theorem 2.1. We will show that growth of order n/(4 log(n)) typically arises for large n for any rank-1 perturbation xy^T of a Haar distributed orthogonal matrix whenever the vectors x and y have 1-norm bounded by 1 and have elements of roughly uniform magnitude. We need the following general result.

Theorem 4.1. Let A = W + t xy^T, where W \in \mathbb{R}^{n\times n} is orthogonal, t \in [0, 1], and x, y \in \mathbb{R}^n satisfy \|x\|_1 \leq 1 and \|y\|_1 \leq 1. Let \alpha_w = \max_{i,j} |w_{ij}| and suppose that \alpha_w < 1. Then A is nonsingular and, for any pivoting strategy producing an LU factorization of A, the growth factor satisfies

    \rho_n(A) \geq (1 - t\alpha_w) / (\alpha_w (\alpha_w + t \|x\|_\infty \|y\|_\infty)).        (4.2)
In the case where W is an orthogonal matrix from the Haar distribution, we have \alpha_w \lesssim 2\sqrt{log(n)/n} for large n, as noted in section 2. In this case, Theorem 4.1 gives a lower bound of order n/(4 log n), which matches the bound (2.1) for the unperturbed case. Under the constraints \|x\|_1 \leq 1 and \|y\|_1 \leq 1, the additional requirement (4.5) will hold for any 0 \leq t \leq 1 when the vectors x and y have elements of roughly equal magnitude, because then \|x\|_\infty \approx \|y\|_\infty \approx 1/n. Now we consider two particular random choices of x and y.
Now let x = u/\|u\|_1 and y = v/\|v\|_1, where u and v are columns of Haar distributed orthogonal matrices. For large n, the vectors u and v have components that are approximately independent normally distributed random variables with mean 0 and standard deviation n^{-1/2} [27, Cor. 1], [28, Thm. 3]. Since the mean of the absolute value of a standard normal random variable is (2/\pi)^{1/2} [30, eq. (3)], the 1-norms of u and v have means approximately (2/\pi)^{1/2} n^{1/2}. Moreover, the \infty-norm of a random vector z \in \mathbb{R}^n with independent standard normal components has mean and variance bounded above by terms of order \sqrt{log n} and log n, respectively [7, Appendix A]; an application of the Chebyshev inequality [2, p. 80] then allows us to bound \|z\|_\infty by a term of order \sqrt{log n} with high probability. Identifying u and v with z/\sqrt{n}, we find that x and y both have \infty-norms bounded above by a term of order \sqrt{log n}/n with high probability, whence, with t = 1, (4.5) and (4.6) follow.
Theorem 4.1 also shows that existing growth factor bounds obtained for orthogonal matrices, such as those in [20], are essentially unchanged under appropriate rank-1 perturbations.
We note that Theorem 4.1 constrains the 1-norms of x and y. Since the 1-norm is generally larger than the 2-norm, this Sherman--Morrison-based analysis complements rather than replaces that in section 3.
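For reference, the Sherman--Morrison formula specialized to an orthogonal W (so that W^{-1} = W^T) reads (W + xy^T)^{-1} = W^T - (W^T x)(W y)^T / (1 + y^T W^T x), valid when 1 + y^T W^T x \neq 0. A quick numerical check in Python:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
Q, R = np.linalg.qr(rng.standard_normal((n, n)))
W = Q * np.sign(np.diag(R))                  # Haar distributed orthogonal matrix
x = rng.standard_normal(n); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)

denom = 1 + y @ W.T @ x                      # must be nonzero
Ainv = W.T - np.outer(W.T @ x, W @ y) / denom
A = W + np.outer(x, y)
print(np.abs(A @ Ainv - np.eye(n)).max())    # residual at rounding-error level
```

For random unit x and y the quantity y^T W^T x is typically of order n^{-1/2}, so the denominator is close to 1 and the update is benign.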

5. Curing instability with mixed precision iterative refinement. When an LU factorization of A suffers large growth and we use the factorization to solve Ax = b, the solution usually (but not always [20]) has a correspondingly large backward error. Suppose A is one of the types of matrix identified in this paper that has an LU factorization with a large growth factor; how can we obtain a backward stable solution to Ax = b using this factorization? The natural answer is to apply iterative refinement. Indeed, it has been known since the 1970s that iterative refinement can cure instability in LU factorization [26], [35].
A recent usage of iterative refinement is with the LU factorization computed at a lower precision than the working precision, with residuals possibly computed in extra precision, and with the refinement equation solved either by substitution using the LU factors (denoted LU-IR) or by GMRES using the LU factors as preconditioners (known as GMRES-IR). GMRES-IR was proposed by Carson and Higham in [5], [6], and the analysis therein (notably [6, Thm. 4.1]) implies that it can tolerate instability in the factorization provided that the convergence of GMRES is not hindered by a lower quality preconditioner. Element growth is likely to reduce the quality of the preconditioner, so it is of interest to test experimentally the effect of a large growth factor on the convergence of GMRES.
We present an experiment in which we used mode 2 gallery('randsvd') matrices (that is, matrices of the form (1.1)) of varying dimensions, with \kappa_2(A) = 10^2 and \kappa_2(A) = 10^7. The iterative refinement algorithms that we use are characterized by a triple of precisions (p_1, p_2, p_3), where p_1 is the precision at which the LU factorization is computed, p_2 is the working precision, and p_3 is the precision at which the residual is computed. We consider three precision combinations (H, S, D), (H, D, D), and (S, D, D), where H, S, and D denote half precision (u \approx 4.88 \times 10^{-4}), single precision (u \approx 5.96 \times 10^{-8}), and double precision (u \approx 1.11 \times 10^{-16}), respectively. Half precision computations are performed using the chop function of Higham and Pranesh [22]. The right-hand side vector is generated using randn. Iterative refinement is terminated when

    \|b - A\widehat{x}\|_\infty / (\|b\|_\infty + \|A\|_\infty \|\widehat{x}\|_\infty) \leq nu,

where the left-hand side is the relative backward error and u is the unit roundoff of the working precision. The inner GMRES iterations are terminated based on a backward error criterion for the preconditioned system with tolerances 10^{-2} and 10^{-4} for working precisions of single and double, respectively, and a maximum of 20 iterative refinement steps are performed. In practice, we hope for convergence in a handful of iterative refinement steps, but we allow more in order to explore the speed of convergence for different problems and the two methods. Table 5.1 shows the convergence for \kappa_2(A) = 10^2 and Table 5.2 shows the growth factors and condition numbers. Tables 5.3 and 5.4 give the corresponding information for \kappa_2(A) = 10^7. We need \kappa_\infty(A)u sufficiently less than 1 to guarantee convergence of LU-IR and \kappa_\infty(\widehat{U}^{-1}\widehat{L}^{-1}A)u sufficiently less than 1 to guarantee convergence of GMRES-IR [6].
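To illustrate the flavor of these algorithms, the following Python sketch implements LU-IR for the (S, D, D) combination: the factorization is computed in single precision while the iteration and residuals use double, with the same termination test as above. True half precision would require a simulator such as chop, so single precision is a portability-minded stand-in here; GMRES-IR would replace the triangular-solve correction step with preconditioned GMRES.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def lu_ir_sdd(A, b, max_steps=20):
    """LU-IR with (p1, p2, p3) = (S, D, D): factorize in single precision,
    iterate and compute residuals in double precision."""
    fac = lu_factor(A.astype(np.float32))                  # p1 = single
    x = lu_solve(fac, b.astype(np.float32)).astype(np.float64)
    n = len(b)
    u = np.finfo(np.float64).eps / 2                       # unit roundoff, p2
    for _ in range(max_steps):
        r = b - A @ x                                      # residual at p3
        berr = np.abs(r).max() / (np.abs(b).max() +
                                  np.abs(A).sum(axis=1).max() * np.abs(x).max())
        if berr <= n * u:                                  # backward error test
            return x, berr
        x = x + lu_solve(fac, r.astype(np.float32)).astype(np.float64)
    return x, berr

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x, berr = lu_ir_sdd(A, b)
print(berr)                                 # at the level of double precision
```

For this moderately conditioned A the iteration reaches a backward error below nu in a handful of steps, as the theory predicts when \kappa_\infty(A) u_{single} is sufficiently less than 1.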
Both LU-IR and GMRES-IR successfully solve the problems with \kappa_2(A) = 10^2. For \kappa_2(A) = 10^7, LU-IR fails to converge in several instances, whereas GMRES-IR always converges within three iterative refinement steps, even though the condition guaranteeing convergence is not satisfied for (H, S, D). This behavior is consistent with the theory [6]. The important finding is that the inner GMRES solves converge in a modest number of iterations, which shows that the large growth does not inhibit the ability of the computed low precision LU factors to act as effective preconditioners for GMRES.
We note that the convergence of the refinement could be further enhanced, for example by computing the residuals at higher precision. Overall, iterative refinement is able to cure the instability, and we found that the performance of GMRES-IR, which uses the low precision computed LU factors as preconditioners for a GMRES-based solution of the correction equations, is unaffected by the lower quality computed LU factors.
Appendix A. The CS decomposition. We state here the CS decomposition with square diagonal blocks. For more details (including the most general form of the CS decomposition) see, e.g., Golub and Van Loan [11, p. 85], Paige and Wei [32], or Stewart and Sun [37, sect. 5.1].