A NEW APPROACH TO PROBABILISTIC ROUNDING ERROR ANALYSIS

. Traditional rounding error analysis in numerical linear algebra leads to backward error bounds involving the constant γ n = nu/ (1 − nu ), for a problem size n and unit roundoﬀ u . In the light of large-scale and possibly low-precision computations, such bounds can struggle to provide any useful information. We develop a new probabilistic rounding error analysis that can be applied to a wide range of algorithms. By using a concentration inequality and making probabilistic assumptions about the rounding errors, we show that in several core linear algebra computations γ n can be replaced by a relaxed constant (cid:101) γ n proportional to √ n log nu with a probability bounded below by a quantity independent of n . The new constant (cid:101) γ n grows much more slowly with n than γ n . Our results have three key features: they are backward error bounds; they are exact, not ﬁrst order; and they are valid for any n , unlike results obtained by applying the central limit theorem, which apply only as n → ∞ . We provide numerical experiments that show that for both random and real-life matrices the bounds can be much smaller than the standard deterministic bounds and can have the correct asymptotic growth with n . We also identify two special situations in which the assumptions underlying the analysis are not valid and the bounds do not hold. Our analysis provides, for the ﬁrst time, a rigorous foundation for the rule of thumb that “one can take the square root of an error constant because of statistical eﬀects in rounding error propagation”.

1. Introduction.Modern rounding error analysis faces a double challenge with the rise of large-scale, mixed-precision computations.On the one hand, increasingly large problems are being solved.For example, the supercomputers currently at the top of the TOP500 list 1 are there by virtue of having solved, in record time, linear systems Ax = b of dimensions of order 10 8 by LU factorization, and future exascale systems will solve problems of even larger size.Traditional error analysis does not guarantee small residuals for such large systems because the error constants are so large-yet a small residual is obtained in practice.
On the other hand, low-precision floating-point arithmetic-in particular half precision (fp16)-is becoming increasingly attractive due to both its higher speed and its lower energy consumption [5], [9], [10].For half precision arithmetic the unit roundoff is so large that traditional bounds cannot guarantee an accurate inner product even for relatively small problems (of dimensions in the thousands).Yet, machine learning algorithms do successfully run in half, or even lower, precision.
This discrepancy between theory and practice stems from the fact that traditional rounding error bounds are worst-case bounds and so are pessimistic on average.In most practical cases, they do not provide good estimates of the size of the error, and in particular they overestimate the error growth, that is, the asymptotic dependence of the error on the problem size.
Since the beginning of the digital computer era many researchers have modeled rounding errors as random variables in an attempt to obtain better estimates of how the error behaves on average.These include (in chronological order) von Neumann and Goldstine [27], Henrici [11], [12], [13], Hull and Swenson [17], Tienari [26], Barlow and Bareiss [1], Chatelin and Brunet [7], and Calvetti [4].These treatments typically linearize the forward error into a sum of the form e = n i=1 δ i t i , with |δ i | ≤ u and where u is the unit roundoff.The key intuition is that it is very unlikely that |e| will attain its worst-case magnitude u n i=1 |t i |, which can happen only when each δ i is of maximal magnitude and the δ i t i have identical signs.In most of these references the δ i are assumed to be independent random variables with mean zero, and usually also assumed to be from a uniform distribution or a normal distribution.The central limit theorem (e.g., [2, sec. 27]) shows that as n → ∞ the probability distribution of e/( n i=1 t 2 i ) 1/2 tends towards a normal distribution of mean zero and standard deviation σ ≤ u; therefore for sufficiently large n, the probability that |e| will not exceed u( n i=1 t 2 i ) 1/2 times a small constant λ is very high (e.g., by the "three-sigma rule", it is about 99.7% for λ = 3).Compared to the worst-case constant n i=1 |t i |, ( n i=1 t 2 i ) 1/2 can be smaller by a factor up to √ n.This probabilistic approach to rounding error analysis has led to the well-known rule of thumb, based on informal arguments and assumptions, that constants in rounding error bounds can be replaced by their square roots.For example, Wilkinson applies this rule of thumb in [28, p. 318], [29, pp. 26, 52, 102, 151].However, a rigorous result along these lines for a wide class of algorithms has not previously been obtained, to our knowledge.
The goal of this work is to obtain rigorous probabilistic rounding error bounds for a wide range of linear algebra algorithms based on a minimal number of assumptions and to test them experimentally.
Previous probabilistic rounding error analyses suffer from four important shortcomings.Our results, based on Theorem 2.4 in the next section, overcome these shortcomings as follows.
Backward rather than forward bounds.Previous work has almost exclusively focused on forward error bounds.We choose instead to analyze backward errors.Backward error bounds have the advantage that they bound perturbations to the data and can be interpreted without the need for condition numbers.Moreover, forward error bounds can be directly derived from backward ones as their product with the condition number of the problem.
Bounds correct to all orders.Rounding error analysis of a sequence of operations leads to products of terms 1+δ i with |δ i | ≤ u.When these products are linearized and the central limit theorem is applied to the first order term, bounds containing (implicitly or explicitly) a "+O(u 2 )" term are obtained.In the proof of Theorem 2.4 we overcome this limitation by taking the logarithm of the product, thereby transforming it to a sum.This transformation has the drawback of introducing nonlinearities, but we are able to bound the expected value of a random variable of the form log(1 + δ i ) using Taylor expansions.
Fewer assumptions for a more rigorous proof.Some assumptions made in previous probabilistic error analysis are unnecessary.Our analysis requires only two assumptions beyond rounding errors being bounded in modulus by u (see Model 2.1 below): that the rounding errors have mean zero and that they are independent.In particular, in contrast to much existing work we do not assume any specific probability distribution for the rounding errors (e.g., uniform or normal).Such assumptions are usually made so as to bound the standard deviation σ of the rounding errors.However, since rounding errors are bounded by u, σ is obviously also bounded by u.Additional assumptions on σ are thus mostly unnecessary, as they would only slightly improve the constants in the resulting bounds.Moreover, our assumptions even allow the rounding errors not to be identically distributed, as long as they are independent.
Finally, the major assumption that we drop is the one on the problem size n.To the best of our knowledge, all previous work is based on the crucial assumption that n is "sufficiently large", which is necessary to use the central limit theorem.However, it is difficult to quantify the accuracy of the resulting approximation for a given n.In Theorem 2.4, we overcome this issue by using a concentration inequality (specifically, Hoeffding's inequality [16]) instead of the central limit theorem, and this inequality is valid for all n.
General framework applicable to a wide class of algorithms.Modern rounding error analysis builds on a basic result (Lemma 2.2 below) that bounds the distance from 1 of a product We provide in Theorem 2.4 a probabilistic analogue of this result with bounds proportional to √ nu and with no restriction on n.We also derive corresponding bounds for an inner product combined with a subtraction and a division, matrix-vector products, matrix-matrix products, the solution of triangular systems, and LU and Cholesky factorizations.These are all key computational kernels and so our analysis facilitates the probabilistic error analysis of a wide class of algorithms.
We note that recent work has shown how to relax the condition nu < 1 in some traditional analysis, at the cost of stronger assumptions on the arithmetic than (1.2) below and more complicated proofs [20], [25].
In the next section we obtain the main result of this paper: we show that under a certain probabilistic model the constant (1.1) in the basic result described above can be replaced by a constant γ n proportional to √ nu with probability at least a certain value.We apply this result in section 3 to a variety of numerical linear algebra algorithms.In section 4, we perform an extensive set of numerical experiments on both random and real-life matrices.We provide our concluding remarks in section 5.
Throughout the paper we denote the expectation and standard deviation of a random variable x by E(x) and σ(x), respectively.We use the following classical model for floating-point arithmetic [14, sec.2.2]: This model holds for IEEE arithmetic [19]; indeed, the IEEE standard requires more: that fl(a op b) is the correctly rounded (to nearest) value of a op b.We will refer to δ as the rounding error in the operation, though the absolute error a op b − fl(a op b) is perhaps more commonly so-described.
2. Probabilistic bound for product of rounding errors.To derive our probabilistic error bounds we will use the following model of rounding errors in a given computation.
Model 2.1 (probabilistic model of rounding errors).In the computation of interest, the quantities δ in the model (1.2) associated with every pair of operands are independent random variables of mean zero.
Note that this model does not require the rounding errors to be identically distributed.
Model 2.1 is clearly not always realistic.For example, in some cases δ is necessarily zero, such as when the operands are (not too large) integers in an addition, subtraction, or multiplication; when the operands in a subtraction differ by at most a factor two and so are subtracted exactly (by Sterbenz's result [14,Thm. 2.5]); or when one of the operands is a power of two in a multiplication or division.Or pairs of operands might be repeated, so that different occurrences of δ are in the fact the same.Indeed non-pathological examples can be found where rounding errors are strongly correlated-notably a rational function example of Kahan [14,sec. 1.17].More subtly, if an operand comes from an earlier computation it will depend on an earlier δ and so the new δ will depend on the previous one, violating the independence assumption.
We can hope, nevertheless, that conclusions drawn from Model 2.1 remain approximately true.Indeed, as Kahan notes [21] "The fact that rounding errors are neither random nor uncorrelated will not in itself preclude the possibility of modelling them usefully by uncorrelated random variables."In a similar vein, Hull and Swenson [17] point out that "There is no claim that ordinary rounding and chopping are random processes, or that successive errors are independent.The question to be decided is whether or not these particular probabilistic models of the processes will adequately describe what actually happens." In backward error analysis, products of terms of the form 1 + δ i or its reciprocal, where δ i is a rounding error satisfying (1.2), are ubiquitous.These products are typically simplified by means of the following result [14,Lem. 3.1], which employs the constant We now derive a probabilistic version of this result containing a constant γ n that is proportional to √ n rather than n.Note that the new constant γ n does not require that nu < 1.To do so, we need to use a concentration inequality, that is, an inequality that bounds the probability that the sum of n independent random variables X i deviates from its expected value by a given quantity.Several such inequalities exist [3]; we choose to use Hoeffding's inequality [16,Thm. 2], which requires the variables X i to be bounded.Lemma 2.3 (Hoeffding's inequality).Let X 1 , . . ., X n be independent random variables satisfying Then the sum S = n i=1 X i satisfies We are now ready to state our main result.Define (2.1) Theorem 2.4 (Probabilistic error bound).Let δ 1 , δ 2 , . . ., δ n be independent random variables of mean zero bounded in absolute value by the unit roundoff u, and let ρ i = ±1, i = 1 : n.Then for any constant λ > 0 the relation to obtain the upper and lower bounds We therefore have and taking the absolute value we obtain We can therefore apply Lemma 2.3 with Moreover, by taking the expected value in (2.3) and using E(δ i ) = 0, we obtain and therefore E(log φ) can be bounded by linearity of the expected value: We then have Therefore, with probability at least 1 − 2 exp(−ξ 2 (1 − u) 2 /(2nu 2 )), log φ lies within the interval with endpoints ±(ξ + nu 2 /(1 − u)).To make this probability independent of n, we set ξ = λ √ nu, for some constant λ > 0 independent of n.Then the bound holds with probability at least Taking the exponential, the bound 1 − u also holds with probability at least P (λ).Then holds with probability at least P (λ).The equality θ n = φ − 1 concludes the proof.
Note that γ n (λ) is defined for any n, though the bound (2.5) requires λ The probability P (λ) in (2.2) is independent of n.Moreover, it rapidly approaches 1 as λ increases and is essentially independent of u, as shown in Table 2.1.
It is also important to note that the second order part of the argument of the exponential in (2.1) is innocuous.The argument is λ( , so for λ ≥ 1 (say) the second order term is smaller than the first order term unless the latter is of order 1, in which case the error bound of Theorem 2.4 provides no useful information.
Many rounding error analyses rely on Lemma 2.2 and can therefore potentially yield probabilistic bounds if they are adapted to make use of Theorem 2.4.In the next section we explore some examples from numerical linear algebra.

3.
Application to numerical linear algebra.We now apply Theorem 2.4 within the error analysis of a variety of algorithms in numerical linear algebra.We aim to derive probabilistic bounds that have the same form as the original ones but with γ n replaced by γ n (λ).3.1.Inner products.We first apply Theorem 2.4 to the computation of the inner product of two vectors.Recall that γ n (λ) and P (λ) are defined in (2.1) and (2.2), respectively.We define where ε i and δ i represent the rounding errors from the products and additions, respectively, and δ 1 = 0. We therefore have where by Theorem 2.4 |ψ i | ≤ γ n−max(i,2)+2 (λ) holds for any particular i with probability at least P (λ).For a given i, the latter bound fails to hold with probability at most 1 − P (λ), therefore by the inclusion-exclusion principle [30, p. 39] the bound fails to hold for at least one i with probability at most n(1 − P (λ)).It follows that the probability that the bounds hold for all i is at least holds for all i with at least the same probability, and it is not hard to see that this bound holds for any ordering.
Proof.Assume for the moment that we first form the sum and so by (3.5) Applying Theorem 2.4 to each of the rounding error terms gives where |ψ 0 | ≤ γ 2 (λ) holds with probability at least P (λ) and |ψ i | ≤ γ k−max(i,2)+1 (λ) holds for any particular i with probability at least P (λ).By the same reasoning as in the proof of Theorem 3.1, these inequalities on |ψ i | hold for i = 0 : k − 1 with probability at least Q(λ, k).Different orderings of the evaluation will give different expressions of the form (3.8) with different numbers of (1 + δ i ) ±1 terms corresponding to each 1 + ψ i , but there can never be more than k such terms.The worst case, which leads to γ k (λ), is when the subtraction with c is done first-see [14,Lem. 8.2].The theorem as stated therefore holds for any ordering.
Note that Theorem 3.2 is a backward error result showing that the computed y is the exact result for a perturbed set of b i .Alternatively, the result can be recast to perturb the a i and c instead of the b i .
3.2.Matrix-vector and matrix-matrix products.Building on the error analysis for inner products we can obtain results for matrix-vector and matrix-matrix products.
Theorem 3.3 (Matrix-vector products).Let A ∈ R m×n , x ∈ R n , and y = Ax.Under Model 2.1, the computed result y satisfies with probability at least Q(λ, mn).
Proof.The vector y is obtained via m inner products y i = a T i x, where a i is the ith row of A. By Theorem 3.1, we know that (3.10) holds with probability at least Q(λ, n).Combining the m instances of (3.10) for i = 1 : m, we have that (3.9) holds with probability at least 1 with probability at least Q(λ, mn), and hence with probability at least Q(λ, mnp).
Proof.Equation (3.11) is simply an application of Theorem 3.3.The bound (3.12) follows by combining the p instances of (3.11) and from the fact that 1 The two loops used to evaluate a matrix-vector product can be ordered in an "ij" form based on inner products or a "ji" form based on vector operations.While the proof of Theorem 3.3 assumes the use of inner products, the error bound is nevertheless applicable to both orderings.As for standard error analysis [14, sec.3.5], different orderings result in the same operations being performed in a different order, so the same rounding errors are generated but in a different order and the same error bounds are satisfied.The same is true for Theorem 3.4, with the six possible orderings of the three nested loops underlying a matrix product.In the rest of this paper we will implicitly use this equivalence of different orderings.

LU factorization and linear systems.
In the following three theorems we give probabilistic backward error bounds for triangular systems, LU factorization, and general linear systems.Theorem 3.5 (Solution of triangular systems).Let the triangular system T x = b, where T ∈ R n×n is nonsingular, be solved by substitution.Under Model 2.1, the computed solution x satisfies with probability at least Q(λ, n(n + 1)/2).
Proof.Assuming, without loss of generality, that T is lower triangular, we have The probability of (3.14) holding for all i is therefore at least The result follows by weakening the bounds on |µ j | to γ n (λ).Proof.The Doolittle form of Gaussian elimination [14, secs.2.2, 2.3] gives the following recurrences for the LU factors: We apply Theorem 3.2 to each of these n 2 equations.This readily gives (3.15) with probability at least Theorem 3.7 (Linear system).Let A ∈ R n×n and suppose Gaussian elimination produces a computed solution x to Ax = b.Under Model 2.1, holds with probability at least Q(λ, n 3 /3 + 3n 2 /2 + 7n/6).
Proof.From Theorem 3.6, L U = A + ∆A 1 , where |∆A 1 | ≤ γ n (λ)| L|| U | with probability at least Q(λ, n 3 /3 + n 2 /2 + n/6).By Theorem 3.5, the triangular solves produce y and x satisfying each inequality holding with probability at least Q(λ, n(n + 1)/2).Thus holds with probability at least one minus the probability that one of the bounds for ∆A 1 , ∆L, and ∆U is violated, namely which yields the result.
3.4.Cholesky factorization.We also give a result for Cholesky factorization, because symmetry brings an improvement in the probability compared with LU factorization.
Theorem 3.8.If Cholesky factorization applied to the symmetric positive definite matrix A ∈ R n×n runs to completion then the computed factor R satisfies Proof.The recurrences for R can be written We apply Theorem 3.2 to each of the n(n − 1)/2 equations for the r ij with i < j.For r jj we need an analogue of Theorem 3.2 in which a square root replaces the division (cf.[14, Prob.10.4]): the constant is γ k+1 (λ) and the probability remains Q(λ, k).
We readily find that (3.17) holds with probability at least 1 The interesting feature of Theorem 3.8 is that the argument f (n) in the probability Q(λ, f (n)) for Cholesky factorization is about half that for LU factorization.The reason is that there are half as many invocations of Theorem 3.2, which is essentially because there are half as many flops, and hence half as many probabilistic events that need to hold.In fact, for all the algorithms analyzed in this section, the argument f (n) in the probability is approximately half the flop count, that is, the bounds hold with probability approximately Q(λ, flops/2); this is a direct consequence of Theorem 3.1: an inner product requires 2n − 1 flops and the error bound holds with probability Q(λ, n).9.9646e−01 −2.5407e+00 −3.5397e+03 11.5 1.0000e+00 1.0000e+00 9.9999e−01 9.8723e−01 −1.1770e+01 12.0 1.0000e+00 1.0000e+00 1.0000e+00 9.9996e−01 9.6413e−01 12.5 1.0000e+00 1.0000e+00 1.0000e+00 1.0000e+00 9.9992e−01 13.0 1.0000e+00 1.0000e+00 1.0000e+00 1.0000e+00 1.0000e+00 The second important feature is that the overall probability Q(λ, f (n)) depends on the problem dimensions, where f (n) is as large as n 3 /3 + 3n 2 /2 + 7n/6 for the solution of Ax = b by LU factorization.An important question is thus how fast λ must increase in order that Q(λ, cn 3 ), where c is a constant as in our bounds, stays independent of n.Write P (λ) as Then, recalling the definition (3.1), we have 3 ) is bounded below independent of n.From the expression of ε, it is easy to check that λ must therefore increase proportionally to √ log n, which represents an extremely slow increase.The dependence on n of Q(λ, f (n)) therefore does not cause any serious deterioration in the probabilities as long as we increase λ a little.Table 3.1 shows values of Q(λ, n 3 /3) for double precision arithmetic, a range of n up to 10 10 , and λ = 10 : 13.In order to avoid cancellation affecting the results the values are computed at 100-digit precision using the Multiprecision Computing Toolbox [23] and then rounded to the accuracy shown.For λ = 13, we have a probability of 1.0000 for n ≤ 10 10 (the same is true for half precision, which is not shown in the table); this value of λ therefore suffices for the largest dense linear systems that will be solved on exascale computers.Furthermore, as we show in the next section, the probability Q(λ, f (n)) is actually very pessimistic and in practice the bounds hold with much smaller values of λ.
4. Numerical experiments.We now present a set of numerical experiments designed to give insight into our probabilistic bounds.The experiments have three main aims: to test the sharpness of the bounds and the probabilities; to compare the rate of growth of the error with n with that of the probabilistic (γ n (λ) ≈ √ nu) and deterministic (γ n ≈ nu) bounds; and to check whether the probabilistic bounds are applicable at all, since they were derived under Model 2.1.
These experiments are carried out with MATLAB R2018b.Computations are performed in single precision, except in section 4.1.2,where half (fp16) and quarter (fp8) precisions are used, and section 4.5, where double precision is used.Half precision (corresponding to the IEEE standard) and quarter precision (not standard, and as suggested in [22]) are simulated with the rounding function chop.m from [15].The "exact" quantities appearing in the backward error formulas for inner products and matrix-vector products are computed in double precision.
In sections 4.1-4.4we use randomly generated matrices and vectors.We compare different types of distributions, in particular • random uniform [0, 1]: rand(m,n); • random uniform [−1, 1]: (rand(m,n)-0.5)*2; • random constant: rand*ones(m,n).In order to make the experiments reproducible, we use rng(1) to seed the random number generator at the beginning of each script generating a figure of this section.We have made these scripts available online2 .For each size of problem n, we run the same experiment N test times and plot the maximum and mean backward errors, denoted by ε max bwd and ε mean bwd , respectively.We have set N test = 100 for Figures 4.1  4.7.We compare these backward errors with their deterministic and probabilistic bounds γ n and γ n (λ).Throughout this section, the probabilistic bounds are plotted taking λ = 1.In Section 4.5 we report numerical results on a large set of real-life matrices coming from a variety of applications, taken from the SuiteSparse collection [8].
The inner product and matrix-vector product computations were implemented in MATLAB using loops such as, for x = a T b, x = 0; for i=1:n x = x + a(i)*b(i); end This code corresponds to our analysis, where every floating-point operation involves a rounding, so that the model (1.2) of floating-point arithmetic is applicable throughout.If we compute the inner product as x = a'*b then, depending on the details of how the underlying BLAS operation is coded and optimized, the sum could be accumulated using fan-in (a binary tree) or with extra precision for intermediate quantities, both of which can make the constants in our analysis and the traditional deterministic bounds pessimistic by a factor up to n [6].
4.1.Inner products.We first report numerical experiments with inner products y = a T b.We record the backward error of the computed y, 4.1.1.Random uniform vectors.We consider the case where the vectors a and b have random entries from the uniform [0, 1] or uniform [−1, 1] distributions.The results, plotted in Figure 4.1, show that the probabilistic bound, plotted with λ = 1, is in much better agreement with the actual backward error than the deterministic one and is sharp for the [0, 1] data.In theory, the probabilistic bound can only be guaranteed to hold with high probability for λ 6.Nevertheless, out of 5000 runs (50 different problem sizes times N test = 100), the bound holds with λ = 1 in all cases except one (which would require a slightly higher value λ = 1.02) in Figure 4.1a and in all cases in Figure 4.1b.Therefore, the probability Q(λ, n) is extremely pessimistic.
However, and perhaps most importantly, in the case of vectors with entries on [0, 1] (Figure 4.1a) the probabilistic bound successfully captures the asymptotic behavior of the error growth, which follows √ n rather than n.Interestingly, this is not the case for vectors with entries on [−1, 1] (Figure 4.1b).Since our theory does not assume the data to follow any specific random distribution, and since the probabilistic bound   is sharp for the [0, 1] data, we conclude that it cannot be further improved without additional assumptions.
4.1.2.Lower precision arithmetics.Now we repeat the experiment from section 4.1.1 with uniform [0, 1] vectors a and b but using precisions lower than single to compute the inner product y = a T b.We report the results using fp16 (u = 2 −11 ) and fp8 (u = 2 −4 ) arithmetics in Figures 4.2a and 4.2b, respectively.The results lead to the same conclusions as when using single precision.Importantly, the deterministic bound is unable to guarantee even a single digit of accuracy when n ≥ 10 3 in fp16, and yet the error is only of order 10 −2 .Our probabilistic bound is able to successfully explain and predict this behavior.This effect is even clearer with fp8 arithmetic.4.2.Two cases where Model 2.1 is invalid.In this section, we present two cases where Model 2.1 is invalid and the probabilistic bound does not hold.In the first case the rounding errors have nonzero mean and in the second case they are dependent.
4.2.1.Very large nonnegative vectors: rounding errors have nonzero mean.We consider the inner product of two vectors a and b of very large size n = 10 8 .Their entries are sampled from the uniform [0, 1] distribution and are thus nonnegative.In Figure 4.3a, we plot the backward error and its bounds at each step i of the algorithm.As expected and previously analyzed (see Figure 4.1a), the error is in good agreement with the probabilistic bound (here, with λ = 1) for moderate values of i.However, for large values of i (starting at around 10 6 ), the error starts increasing rapidly and violates the probabilistic bound.Increasing λ is not sufficient in this case, because for i ≥ 10 6 the error increases at a faster rate than the bound.Model 2.1 is thus clearly invalid in this case.
The explanation is that for large nonnegative vectors the value of the inner product y = a T b is large, whereas each increment a i b i is (potentially much) smaller and certainly bounded by 1.Let y i be the partial sum i−1 k=1 a k b k , so that y i+1 = y i + a i b i .The computed y i+1 satisfies, by ( If n is very large, then at some point y i will become so large that incrementing it by fl(a i b i ) will not change its computed value, that is, y i+1 = y i .Specifically, let q be the integer such that 2 q−1 ≤ y i < 2 q ; the spacing between each floating point number in this interval is 2 q u.Therefore, if fl(a i b i ) < 2 q−1 u, we have y i+1 = y i and so by (4.2) y i δ i + fl(a i b i )(1 + δ i ) = 0, that is, δ i = − fl(a i b i )/( y i + fl(a i b i )) < 0. As i and thus y i and q increase, the probability that fl(a i b i ) < 2 q−1 u also increases.It is thus clear that as i increases the mean of the errors δ i will deviate from zero, which violates the assumption made by Model 2.1.This is illustrated in Figure 4.3b, which shows a histogram of the values of δ i .For i ≤ 10 6 the δ i have a distribution of mean approximately zero, but for 10 6 ≤ i ≤ 10 8 the mean is significantly smaller than zero.We note that in half precision this issue arises as soon as n ≈ 10 4 (cf.Figure 4.2a).
4.2.2.Constant vectors: rounding errors not independent.We now perform the same experiment as in section 4.1.1 but with vectors a and b for which a i = α, b i = β, i = 1 : n, where α and β are from the uniform [0, 1] distribution.This experiment leads to a very different result.Indeed, as shown in Figure 4.4a, the probabilistic bound does not bound the error.Indeed the probabilistic bound has a slower asymptotic growth than the error, which is unaffected by increasing λ (which has only a constant effect on a logarithmic scale), so clearly Model 2.1 is not valid for this constant data.
The explanation is that, in this case, the δ i in the proof of Theorem 3.1 are not independent and thus Model 2.1 is violated.In fact, all computed quantities s i that lie between the same consecutive powers of two produce the same rounding error.As illustrated in Figure 4.5, since the spacing between floating-point numbers is constant between consecutive powers of two, and since the increments a i b i = αβ are also constant, the errors in the sums s i = f l( s i−1 + f l(αβ)) must be constant as well.This phenomenon is illustrated in Figure 4.4b, for n = 10 4 .Each time s i crosses a power of two, the rounding error being accumulated changes.If it remains of the same sign, the overall error keeps increasing (this is what happens, e.g., around i ≈ 850 and i ≈ 3400), whereas if it switches sign, the overall error first decreases until it crosses the exact result and starts increasing again (this is what happens, e.g., around i ≈ 1700 and i ≈ 6800).
We mention that the phenomenon of successive rounding errors reinforcing rather than cancelling was observed and analyzed by Huskey and Hartree [18] in the numerical solution of differential equations on the ENIAC .4.3.Matrix-vector products.Next, we consider matrix-vector products y = Ax, where A ∈ R n×n .We compute the backward error where the latter formula follows from the Oettli-Prager theorem [14, Thm.7.3], [24].
In Figure 4.6 we compare the backward error with the probabilistic bound given by Theorem 3.3, with constant γ n (λ) and the deterministic bound with constant γ n [14, sec.3.5].Similarly to the case for inner products, the probabilistic bound can be guaranteed to hold with high probability only for λ 7, yet λ = 1 leads to only 63 cases out of 500 violating the bound and for λ = 1.24 the bound holds in every case.Even though the probability is pessimistic, the bound γ n (λ) itself is quite sharp in the [0, 1] case and successfully predicts the error growing proportionally to √ n.
4.4.Linear systems.We now consider the solution of linear systems Ax = b via LU factorization.We compute the backward error Fig. 4.5: Illustration of the fact that the error θ i = s i +incr − s i+1 is constant between consecutive powers of two when the increment fl(a i b i ) = incr is constant (here, s i is the first instance of the sum in the interval [2 q−1 , 2 q ]).We first compute the LU factorization of A and measure the backward error For 25 of these real-life matrices, the backward error violates the deterministic, worstcase bound γ n due to a large amount of underflow, which is not included in our standard floating-point model (1.2).We filter these matrices out and plot the backward errors for the LU factorization for the remaining 1139 matrices in Figure 4.8a.Then we use the computed LU factors to solve the system Ax = b.For an additional 195 matrices, we are unable to compute a solution because the U factor is singular.We are left with 944 matrices, for which we plot the linear system backward errors in Figure 4.8b.The probabilistic bound holds with λ = 1 for all but two matrices for LU factorization and one matrix for the solution of Ax = b.Increasing λ to 2.26 makes it hold for every matrix.The probabilistic bound often exceeds the actual error by still a few orders of magnitude but is closer to the actual backward error than the deterministic bound by several orders of magnitude.4.6.Discussion.From the numerical experiments presented in this section, we can draw the following overall conclusions.
• The values of Q(λ, f (n)) that provide a lower bound on the probability of our bounds holding are extremely pessimistic.The probabilistic bounds hold in all cases with a value of λ smaller than 2.26.While our analysis in section 3.5 suggests that λ should increase asymptotically like √ log n, in order to keep the probability bound independent of n, we have not detected any such effect in our experiments.
• The constant γ n (λ) in the probabilistic bounds can be sharp, both in the sense of yielding sharp bounds (as in Figure 4.6a) and, more importantly, in correctly predicting that the error grows like √ n.Therefore without any as-sumption on the matrices and vectors, or further assumptions on the rounding errors, the probabilistic bounds cannot be improved upon.• Overall, the experiments show that our probabilistic Model 2.1 can give useful predictions of the backward error for both random matrices and matrices from real-life applications.The examples in sections 4.2.1 and 4.2.2, however, reveal two situations in which the assumptions in the model do not hold and the probabilistic bound can then be violated.• For matrices and vectors with elements from the uniform [−1, 1] distribution we have observed the backward errors to be much smaller than for the uniform [0, 1] case, and to grow little, or not at all, with n.Even the probabilistic bounds are pessimistic in this case.Explaining the difference between the [0, 1] and [−1, 1] cases is an interesting question for future research.

Conclusions.
We have shown that under the assumption that rounding errors are independent and of zero mean, probabilistic backward error bounds for several key matrix and vector operations can be obtained that have exactly the same form as classic deterministic bounds, but are of order λ √ nu instead of nu, for a constant λ.Strictly, λ should be proportional to √ log n in order to keep the probabilities from decaying with n, but this term is less than 5 for dense linear systems that can currently be solved, that is, for n ≤ 10 8 .
The new bounds hold with a probability bounded below by a quantity Q(λ, f (n)), and λ = 12 suffices to give a probability within 10 −7 of 1 for n ≤ 10 8 for matrix factorizations.Even this value is pessimistic, as throughout our tests the probabilistic bounds held with λ = 2.26, except in two tests where the model is not applicable.
Our analysis therefore provides the first rigorous justification of the rule of thumb that one can take the square root of the constant in a deterministic error bound to obtain a more realistic bound that takes account of statistical effects in rounding error propagation.Moreover, our results apply for any n, not just for sufficiently large n, as would be the case for results based on the central limit theorem.
A key question underlying any probabilistic analysis is whether the results it produces are useful for error estimation and for predicting the asymptotic rate of error growth with problem dimension.Our experiments show that the probabilistic bounds are indeed useful for both random matrices and real-life matrices from the SuiteSparse collection.However, we identified two situations involving inner products in which the underlying assumptions are not valid and the bounds do not apply: large nonnegative vectors, for which the rounding errors eventually have nonzero mean, and constant vectors for which the rounding errors are dependent.Clearly, care is required in applying and interpreting the probabilistic error bounds.
In future work we will explore further applications of our probabilistic analysis.We will also modify our analysis to allow for inner products and matrix-vector and matrix-matrix multiplications being implemented in optimized forms that use extra precision internally, as is often the case in modern processors.

From
the backward error bound of Theorem 3.1 we immediately have the forward error bound (3.6) |y − y| |y| ≤ γ n (λ) |a| T |b| |a T b| with the same probability bound.Next we analyze an important kernel that appears in the solution of triangular systems and in LU factorization.Theorem 3.2.Let y = c − k−1 i=1 a i b i /b k be evaluated in floating-point arithmetic.Under Model 2.1, no matter what the order of evaluation the computed y satisfies (3.7) then subtract the result from c and divide by b k .Then

Theorem 3 . 4 (
Matrix-matrix products).Let C = AB with A ∈ R m×n and B ∈ R n×p .Under Model 2.1, the jth column of the computed C satisfies(3.11)

Fig. 4 . 1 :
Fig. 4.1: Backward error and its bounds for the computation in single precision of the inner product y = a T b, for vectors a and b with random uniform entries.Here, N test = 100 and λ = 1.

Fig. 4 . 2 :
Fig. 4.2: Backward error and its bounds for the computation of the inner product a T b for random uniform [0, 1] vectors in lower precision arithmetic.Here, N test = 100 and λ = 1.γ n and γ n are plotted as 1 when their value exceeds 1.

Fig. 4 . 3 :
Fig. 4.3: Computation of the inner product of two vectors a and b of size n = 10 8 from the uniform [0, 1] distribution.Here, λ = 1 and γ i is plotted as 1 when iu > 1.

Fig. 4 . 4 :
Fig. 4.4: (a): backward error and its bounds for the inner product s n = a T b of two random vectors a and b with constant entries from the uniform [0, 1] distribution.Here, N test = 100 and λ = 1.(b): Backward error at step i of the computation of s n with n = 10 4 .Vertical dashed lines indicate the values of i for which s i crosses a power of two.

Fig. 4 . 6 :
Fig.4.6:Backward error and its bounds for the matrix-vector product y = Ax, with a matrix A and vector x with random uniform entries.Here, N test = 10 and λ = 1.

Fig. 4 . 8 :
Fig. 4.8: Backward error and its bounds for LU factorization and the solution of a linear system Ax = b for a set of matrices from the SuiteSparse collection sorted by size.Here, λ = 1.