INCOMPLETE PRECONDITIONER MAX-PLUS APPROXIMATION LU FACTORIZATION

. We present a new method for the a priori approximation of the orders of magnitude of the entries in the LU factors of a complex or real matrix A . This approximation is used to determine the positions of the largest entries in the LU factors of A , and these positions are used as the sparsity pattern for an incomplete LU factorization preconditioner. Our method uses max-plus algebra and is based solely on the moduli of the entries of A . We also present techniques for predicting which permutation matrices will be chosen by Gaussian elimination with partial pivoting. We exploit the strong connection between the ﬁeld of Puiseux series and the max-plus semiring to prove properties of the max-plus LU factors. Experiments with a set of test matrices from the University of Florida Sparse Matrix Collection show that our max-plus LU preconditioners outperform traditional level of ﬁll methods and have similar performance to those preconditioners computed with more expensive threshold-based methods.

1. Introduction. Max-plus algebra is the analogue of linear algebra developed for the binary operations max and plus over the real numbers together with −∞, the latter playing the role of additive identity. Max-plus algebraic techniques have already been used in numerical linear algebra to, for example, approximate the orders of magnitude of the roots of scalar polynomials [19], approximate the moduli of the eigenvalues of matrix polynomials [1,10,14], and approximate singular values [9]. These approximations have been used as starting points for iterative schemes and in the design of preprocessing steps to improve the numerical stability of standard algorithms [3,6,7,14,20]. Our aim is to show how max-plus algebra can be used to approximate the sizes of the entries in the LU factors of a complex or real matrix A and how these approximations can subsequently be used in the construction of an incomplete LU (ILU) factorization preconditioner for A.
In order to be able to apply max-plus techniques to the matrix A ∈ C n×n we must first transform it into a max-plus matrix. We do this using the valuation map (1) V c : C → R := R ∪ {−∞}, V c (x) = log |x| (log 0 = −∞).
The valuation map is applied to matrices componentwise, so that V c (A) ∈ R n×n is a max-plus matrix. Note that for x, y ∈ C, V c (xy) = V c (x) + V c (y), and when |x| |y| or |x| |y|, then V c (x + y) ≈ max{V c (x), V c (y)}. This suggests using the operations max and plus, which we denote by ⊕ and ⊗, respectively, in place of the classical addition and multiplication once we have applied the map V c . The fundamental basis for our approximation of the magnitude of the entries of the LU factors of A ∈ C n×n is (a) the fact that the entries in the lower triangle of L and the upper triangle of U can be expressed explicitly in terms of determinants of submatrices S of A, and (b) the heuristic that, when the matrix S has large variation in the size of its entries, V c (det(S)) ≈ perm(V c (S)), where perm is the max-plus permanent. We use (a) and (b) to define a lower triangular max-plus matrix L and an upper triangular max-plus matrix U such that (2) V c (L) ≈ L, V c (U ) ≈ U, and we refer to L and U as the max-plus LU factors of A := V c (A) ∈ R n×n . The approximation (2) is a heuristic which only aims to capture the order of magnitude of the entries of L and U . One way to think about the max-plus LU approximation of the LU factors of A is as an intermediate between the true LU factors of A and a symbolic or Boolean factorization which, based purely on the pattern of nonzero entries in A, predicts the nonzero patterns of the LU factors. We show that the matrix-matrix product L ⊗ U is usually not a factorization of A but that it "balances" A, in a sense made precise below.
In order for the max-plus approximation to be useful in practice, it is essential that the cost of computing it be less than the cost of computing the LU factorization exactly. We show that the max-plus LU factors can be computed by solving maximally weighted tree problems. As a result we provide an algorithm for computing the LU approximation of A ∈ C n×n with worst case cost O(nτ + n 2 log n), where τ is the number of nonzero entries in A. Note that this cost depends on the number of nonzero entries in A and not on the number of nonzero entries in the LU factors of A. Thus while the approximate LU factors will exhibit fill-in just as in the exact case, the cost of computing the approximation is not affected by fill-in and will therefore be less than that of computing the exact LU factors. If the matrix A is first reordered according to its optimal assignment, so that the product of the moduli of the entries on its diagonal is maximized, then our approximation of the LU factors can be computed in parallel by n separate computations, each of individual cost O(τ + n log n). If we seek only the positions and values of the k largest entries in each row of U and column of L, or if we seek only the position and values of the entries that are greater in modulus than some threshold, then this cost can be reduced further.
An approximation of the size of the entries in the LU factors of a sparse matrix A can be used to help construct an ILU preconditioner for solving Ax = b, that is, a pair of sparse lower-and upper triangular matrices L, U such that the preconditioned matrix AU −1 L −1 is more amenable to iterative methods such as GMRES [17]. Two classes of ILU preconditioners are threshold ILU and ILU(k). In threshold ILU, Gaussian elimination is applied to A, but any computed element with modulus less than some threshold value is set to zero. By storing only the larger entries, threshold ILU is able to compute effective preconditioners.
For ILU(k) preconditioners, a sparsity pattern for the ILU factors is first computed from a symbolic factorization that determines the level of fill-in of each fill-in entry of A [17, sec. 10.3]. A fill-in entry is dropped when its level of fill is above k, and the corresponding entry in the sparsity pattern matrix is set to zero. The ILU factors are then computed using a variant of Gaussian elimination restricted to the sparsity c 2017 SIAM. Published by SIAM under the terms of the Creative Commons 4.0 license pattern, such as that presented in [17,Alg. 10.3]. ILU(k) preconditioners can be computed using fast parallel algorithms, e.g., [12], but that may not reliably result in effective preconditioners because the sparsity pattern of the factors depends only on the sparsity pattern of A and not on the numerical values of its entries.
Our max-plus LU approximation enables us to take a hybrid approach that offers the best of both of these methods. We use the max-plus LU factors L and U to define the sparsity pattern of the ILU preconditioners by allowing only entries with a value over a certain threshold. Provided the entries of L and U give good approximations of the size of the true LU entries, our approach results in an ILU pair very close to that obtained through standard threshold ILU. Our method for computing the maxplus sparsity pattern makes n independent calls to Dijkstra's algorithm with an early stopping condition. This is remarkably similar to the basic method for computing an ILU(k) sparsity pattern, which makes n independent calls to the breadth first search algorithm with an early stopping condition. So it is reasonable to assume that the max-plus sparsity pattern can be computed almost as efficiently as the ILU(k) sparsity pattern. As a result, the total time for computing the max-plus ILU factors should be competitive with that of the ILU(k) factors, since once we have the max-plus sparsity pattern, we can use the same efficient algorithm for computing the incomplete factors as ILU(k) does.
The remainder of this paper is organized as follows. In section 2 we introduce the max-plus permanent and discuss how it can be used to approximate the order of magnitude of the determinant of a complex matrix. This approximation forms the basis of our LU factor approximation. In section 3 we define the max-plus LU factors of a max-plus matrix and argue that they can be used to approximate the orders of magnitude of the entries in the LU factors of a complex matrix. We also show how our max-plus LU factorization can be adapted to include pivoting and examine the special case of Hungarian scaled matrices. In section 4 we examine the connection between max-plus LU factors and the LU decomposition of matrices of Puiseux series, and use this connection to prove several of the theoretical results that are stated earlier in the paper. In section 5 we give a derivation of our different max-plus LU algorithms and describe our max-plus ILU preconditioner. In section 6 we apply our max-plus LU approximation and ILU preconditioning technique to a set of test problems from real life scientific computing problems.
Throughout this paper, complex matrices will be denoted by capital letters, with their entries denoted by the corresponding lowercase letter in the usual way, A = (a ij ) ∈ C n×n . Matrices of complex Puiseux series will be denoted by capital letters with a tilde, and their entries by the corresponding lowercase letter also with a tilde, A = (ã ij ) ∈ C{{z}} n×n , where C{{z}} denotes the field of Puiseux series. Maxplus matrices will be denoted by calligraphic capital letters, and their entries by the corresponding lowercase calligraphic letter, A = (a ij ) ∈ R n×n . Since the most important results of this paper are the heuristic max-plus approximations, we will present these results in the style of theorems with a justification following each heuristic in lieu of a proof.
2. Heuristic approximation of the determinant. If we replace the sum by a maximum and the product by a summation in the Leibniz formula for the determinant of A ∈ C n×n , where Π(n) is the set of all permutations on {1, . . . , n}, and replace the complex scalars a i,π(i) by scalars a i,π(i) ∈ R, we obtain the formula for the max-plus permanent of The following heuristic is fundamental to our max-plus LU approximation.
Heuristic 2.1. Let A ∈ C n×n be sparse with nonzero entries that vary widely in magnitude, and let V c be as in (1). Then Justification. The determinant of A is a sum of terms the vast majority of which are zero (due to sparsity) and the remainder of which vary widely in order of magnitude (due to the wide variation in entry magnitude). The order of magnitude of the sum of a small number of terms of widely varying magnitude can then be approximated by the order of magnitude of the greatest of those terms, which is precisely perm(V c (A)).
We show in section 4 that the permanent can also be used to calculate the exact asymptotic growth rate of the determinant of a generic matrix of a Puiseux series, which provides some additional support for Heuristic 2.1. In the meantime let us look at a few examples.
Of course we can easily find counterexamples where the approximation in (4) is very poor. However, we can think of these matrices as occupying a set of small measure, so that the order of magnitude of the determinant of a "typical" complex matrix will be well approximated.
Choosing ω i = 1, i = 1, . . . , 4, yields a singular A and log | det(A)| = −∞, which is not detected by the max-plus approximation since perm(V c (A)) = 0. Likewise whenever det(A) is close to zero, the max-plus approximation will not be accurate. However, for most choices of ω the approximation will capture the order of magnitude of det(A). Indeed, if each ω i is an independent random variable uniformly distributed on the unit circle, then | det(A)| has expected value E(| det(A)|) = 4/π ≈ 1, and for small > 0 the probability P(| det(A)| ≤ ) ≈ /π. Thus the choices of ω i for which the max-plus approximation fails to capture the order of magnitude of det(A) represent a set of small measure.
3. Max-plus LU factors. An LU decomposition of A ∈ C n×n is a factorization of A into two factors, a unit lower triangular matrix denoted by L and an upper triangular matrix denoted by U such that A = LU . The entries of the L and U factors can be given explicitly in terms of determinants of submatrices of A (see [5, p. 35] or [11, p. 11]) by and l ik = u kj = 0 for i, j < k, where A(i : j, k : ) denotes the submatrix of A formed by the intersection of the rows i to j and columns k to . If both the numerator and denominator in (5)- (6) are zero, then we use the convention 0/0 = 0. If the denominator is equal to zero but the numerator is not, then we say that A does not admit an LU decomposition. If all of the denominators in (5)-(6) are nonzero, then A = LU is the unique LU decomposition of A.
Based on these formulae we define the max-plus LU factors of A ∈ R n×n to be the unit lower triangular max-plus matrix L and the upper triangular max-plus matrix U with entries given by and l ik = u kj = −∞ if i, j < k. If the two terms on the right-hand side of (7) or (8) are −∞, then we use the convention −∞ − (−∞) = −∞. If the second term is −∞ but the first is not, then we say that A does not admit max-plus LU factors.
Justification. From Heuristic 2.1, we expect that the determinant of a submatrix of A is zero if and only if the permanent of the corresponding submatrix of V c (A) is minus infinity. Therefore if V c (A) admits max-plus LU factors, then A admits an LU factorization A = LU , where the LU factors are as in (5)- (6). Taking the valuation of these expressions, applying Heuristic 2.1, and comparing to (7)- (8) gives the required result.
which provide good approximations of the orders of magnitude of the entries in L, U .
Since V c (A) is independent of the choice of the ω i since |ω i | = 1, so are its max-plus LU factors. The (2, 2) entry of U is the only entry where the max-plus approximation is not guaranteed to be perfectly accurate, but for most choices of the ω i the max-plus approximation captures the order of magnitude of the entries of L and U . There is, however, a small set of parameter values of small measure for which the max-plus approximation is not accurate.
Our definition of the max-plus LU factors of a max-plus matrix was chosen so that that we could use it to approximate the orders of magnitude of the entries in the LU factors of a complex matrix. But what do the max-plus LU factors of a max-plus matrix A ∈ R n×n tell us about A?
Theorem 3.4. Suppose that A = (a ij ) ∈ R n×n has max-plus LU factors L, U ∈ R n×n . Then for each i, j = 1, . . . , n either where the maximum is attained by at least two different values of k, or The proof of Theorem 3.4 is provided in section 4. We say that the max-plus matrix product L ⊗ U balances A.
3.1. Pivoting. After k steps of Gaussian elimination applied to A ∈ C n×n , the matrix A is reduced to where the M i are Gauss transforms and U (k) 11 ∈ C k×k is upper triangular. Like the LU factors themselves, the entries of U (k) can be expressed in terms of determinants of submatrices of A, as the next lemma shows.
Lemma 3.5. Let A ∈ C n×n have an LU factorization, and let U (k) be as in (11). Then where U = U (n−1) is the upper triangular factor in the LU factorization of A.
Proof. Suppose that i, j > k. Let R i and C j be elementary matrices swapping rows k + 1 and i, and columns k + 1 and j, respectively. Define A := R i AC j , and let U (k) be the matrix obtained after performing k steps of Gaussian elimination on A . Then U (k) = R i U (k) C j and in particular u so that the (k + 1, k + 1) entries of U (k) and M k+1 U (k) = U (k+1) are the same, that k+1,k+1 = u k+1,k+1 , and by (6), The next steps of Gaussian elimination leave the (i, j) entries of U (k) with min{i, j} ≤ k unchanged so that u (k) ij = u ij for min{i, j} ≤ k. We say that A ∈ C n×n is partial pivoting free if (13) |u where U (0) = A. If the matrix A is partial pivoting free, then it is possible to apply Gaussian elimination with partial pivoting to A without the need for any row interchanges. Let A = LU be the LU decomposition of A, and suppose that we compute an approximate LU pair L, U ∈ C n×n using Gaussian elimination. The backward error of these approximate LU factors is equal to the perturbation ∆A ∈ C n×n such that A + ∆A = L U and is known to satisfy [8, Lem. 9.6]: where u is the unit roundoff and ρ n (A) is the growth factor for A defined by (14) ρ n (A) = max 0≤k≤n−1 max k≤i,j≤n |u Thus if ∆A ∞ is small relative to A ∞ , which certainly happens when ρ n (A) is small, then the factorization is stable; otherwise it it unstable [8, sec. 9.3].
In analogy to (13), we say that the max-plus matrix A ∈ R n×n is partial pivoting free if (15) where u (0) ij := a ij and Also, in analogy to (14) we define the max-plus growth factor of A ∈ R n×n by Theorem 3.6. If A ∈ R n×n is partial pivoting free, then n (A) = 0.
The proof of Theorem 3.6 is deferred to the end of section 4.
Heuristic 3.7. For A ∈ C n×n we have V c (ρ n (A)) ≈ n (V c (A)). ) ≈ U (k) for k = 0, . . . , n − 1. Then using this and taking the valuation of (14) yields If V c (A) is partial pivoting free, then it follows from Theorem 3.6 that n V c (A) = 0 so that, based on Heuristic 3.7, the growth factor ρ n (A) should be of order one, implying a backward stable LU factorization. As before, this is a heuristic, and it is not difficult to construct counterexample matrices A for which V c (A) is partial or full pivoting free but that cannot be factorized in a stable way without further pivoting. Theorem 3.6 and Heuristic 3.7 suggest applying a permutation P to a given A such that V c (P A) is partial pivoting free. We show in section 5.2 how to update our max-plus LU algorithm to include partial pivoting. Another option is to apply Hungarian scaling, which is a two-sided diagonal scaling applied to A ∈ C n×n along with a permutation P that maximizes the product of the moduli of the diagonal entries of the matrix where D 1 , D 2 ∈ R n×n are nonsingular and diagonal, and such that H's entries satisfy (19) |h ij | ≤ 1, |h ii | = 1, i, j = 1, . . . , n.
We refer to any complex matrix satisfying (19) as a Hungarian matrix. The max-plus . . , n, and is referred to as a max-plus Hungarian matrix.
Theorem 3.8. Max-plus Hungarian matrices always admit max-plus LU factors. Proof. Suppose that H ∈ R n×n max is Hungarian. From (7) and (8)  Theorem 3.9. A max-plus Hungarian matrix is partial pivoting free. Proof. It follows from (15) and (16) that H is partial pivoting free if (20) perm for all i = k + 1, . . . , n and for all k = 0, . . . , n − 1. But since h ij ≤ 0 for all i, j, the permanent of any submatrix of H must be nonpositive. Hence the right-hand side of the inequality in (20) must be less than or equal to zero. Also, since H has zero diagonal entries, the permanent of any principal leading submatrix of H is equal to zero. Therefore the inequality in (20) must have left-hand side equal to zero, so that H is partial pivoting free. Therefore, given A ∈ C n×n , we apply Hungarian scaling to obtain H = P D 1 AD 2 , and from Theorems 3.6 and 3.9 and Heuristic 3.7, we expect that it should be possible to factorize H in a stable way without any need for interchange. This preprocessing technique was originally suggested by Olschowka and Neumaier in [15]. They prove that Hungarian scaling removes the need for interchange in Gaussian elimination for some special classes of matrices. While our results do not constitute a definite theorem, they provide some intuitive explanation for the widely observed fact that Hungarian scaling significantly reduces the need for pivoting (see section 6).
is not partial Similarly A is not partial pivoting free. It is easy to check that the matrices P A and V c (P A) with P = 0 1 1 0 are both partial pivoting free. Now a Hungarian scaling for A with P = I, is partial pivoting free, and it is easy to check that H is also partial pivoting free.

Puiseux series.
There is a stronger connection between the field of complex Puiseux series and the semiring R max = (R, ⊕, ⊗) than between the field of complex numbers and R max , which we now exploit to prove properties of the max-plus LU factors as well as Theorems 3.4 and 3.6 and to provide further justification of Heuristic 2.1. This section is not needed for the derivation of the max-plus LU algorithms presented in section 5.
Complex Puiseux series with m ∈ N, k ∈ Z, c i ∈ C, i ≥ k, and c k = 0 form an algebraically closed field under addition and multiplication denoted by C{{z}}. On that field, we define the valuation that is, the valuation of a Puiseux series is minus the degree of its lowest order term. This valuation provides a near homeomorphism between C{{z}} and R max , where the third relation holds except for when V p (f ) = V p (g) and where the coefficient of the lowest order term of f is equal to minus that of g. As for complex matrices, the valuation V p is applied componentwise to matrices with Puiseux series entries. We decorate matrices in C{{z}} n×n with a tilde to distinguish them from matrices in C n×n .
Any entry of A ∈ C{{z}} n×n can be written as where C = (c ij ) =: L( A) ∈ C n×n is the matrix of lowest order term coefficients of A with L : C{{z}} n×n → C n×n . For a set of permutations Φ ⊂ Π(n), we define the the set of optimal assignments for A. The next lemma identifies the set of matrices with Puiseux series entries such that the valuation of the determinant is exactly the permanent of the valuation (see Heuristic 2.1 for matrices with complex entries).
where h.o.t. stands for higher order terms. We break the sum into two parts, one over ap(A) and one over Π(n) \ ap(A). We have that The next lemma will be useful in showing that V p (det( A)) = perm(V p ( A)) holds for generic matrices A ∈ C{{z}} n×n but also in explaining what we mean by generic in this context. Proof. For each Φ ⊂ Π(n), g Φ (C) is a polynomial in the coefficients of C. A polynomial is either identically equal to zero or only zero on some low dimensional subset. Therefore is either the whole of C n×n or it is a lower dimensional subset of C n×n . Choose some permutation π ∈ Φ, and define C π ∈ C n×n by c ij = 1 if j = π(i) and c ij = 0 otherwise. By construction we have g Φ (C π ) = sign(π) = ±1 = 0 and therefore C π ∈ V (g φ ). Therefore V (g φ ) cannot be the whole of C n×n and must instead be a lower dimensional subset.
is a finite intersection of generic subsets and is therefore generic.
Now if A ∈ C{{z}} n×n is such that L( A) ∈ G n , then g ap(Vp( A)) (L( A)) = 0. Lemma 4.2 states that G n is a generic set, so that the property g ap(Vp( A)) (L( A)) = 0 is a generic property for A ∈ C{{z}} n×n with respect to the topology induced by the map L : C{{z}} n×n → C n×n . A more intuitive way of understanding this result is that, if we have a matrix A where the leading order coefficients L( A) have been chosen at random, according to a continuous distribution, then with probability one g ap(Vp( A)) (L( A)) = 0 will hold. We then say that It is easy to check that L( A) ∈ G 3 , det( A) = −z −3 + z −2 , and that V p (det( A)) = perm(V p ( A)) = 3 as expected from Lemma 4.1.
We now show how to use Puiseux series to further justify Heuristic 2.1.

Justification of Heuristic
For a desired level of approximation accuracy, we define the domain of the asymptotic regime of f to be the neighborhood of zero, denoted by Then, assuming thatž 0 is in the domain of the asymptotic regime of f , we have This approximation falls short of being a theorem because, given onlyž 0 and V p (f ), we have no way of guaranteeing thatž 0 is in the domain of the asymptotic regime of f . In other words, there is no uniform scale for determining what constitutes a small value of z ∈ C.
We can apply the same idea to approximate the determinant of A ∈ C n×n . Suppose that we know V p ( A) with A ∈ C{{z}} n×n such that A(ž 0 ) = A for somež 0 ∈ C and L( A) ∈ G n . Then, assumingž 0 is in the domain of the asymptotic regime of A, it follows from (26) that Assuming thatž 0 is in the domain of the asymptotic regime of det( A) ∈ C{{z}} and applying (26), we have Using Lemma 4.1 and (27), we obtain that V c (det(A)) ≈ perm(V c (A)), which provides another justification for Heuristic 2.1. We will need the next lemma.
Proof. If P, Q ∈ C n×n are permutation matrices, then C ∈ C n×n \ G n if and only if P CQ ∈ C n×n \ G n , so it suffices to prove the result for the principal submatrix C of order k, which we denote by S. Then As for complex matrices, an LU factorization of A ∈ C{{z}} n×n is a factorization of A into a lower triangular matrix L ∈ C{{z}} n×n with ones on the diagonal and an upper triangular matrix U ∈ C{{z}} n×n such that A = L U . When the factorization exists, the nonzero entries of L and U can be defined as in (5)-(6) with A in place of A. The next result should be compared to Heuristic 3.1.
Theorem 4.5. Let A ∈ C{{z}} n×n be such that L( A) ∈ G n , and suppose that V p ( A) ∈ R n×n admits max-plus LU factors L, U ∈ R n×n . Then A admits an LU If for A ∈ C{{z}} n×n , V p ( A) does not admit max-plus LU factors, then A does not admit an LU factorization. to minus infinity. Thus if A admits max-plus LU factors, then an LU factorization of A exists with entries given by (5)- (6). If V p ( A) does not have max-plus LU factors, then this means that for some i, j, k the first term on the right-hand side of (7) or (8) is equal to −∞ but the second term is not. As a result, the denominator on the right-hand side of (5) or (6) is equal to 0 but the numerator is not, so A does not have an LU factorization.
Recall from section 3 that for A, B, C ∈ R n×n the product A ⊗ B balances C if for where the maximum must be attained by at least two different values of k.
where c ∈ C is the coefficient of the lowest order term in the sum. Therefore We are now ready to prove Theorems 3.4 and 3.6.
Next, we show by induction on k that u (k) ij ≤ max p,q a p,q for all i, j, and k. Since ik , which combined with (29) and the induction hypothesis gives for all i, j. Hence n (A) = max 0≤k≤n−1 (max k≤i,j≤n u (k) ij ) − max 1≤i,j≤n a ij ≤ 0. But, by definition, n (A) ≥ 0, so n (A) = 0.

Max-plus LU algorithm.
Computing the max-plus LU factors directly from the formulae (7)-(8) is computationally expensive, as each entry in either the lower part of L or the upper part of U requires the computation of two max-plus permanents or, in other words, the solution of two optimal assignment problems. The best known x (1) x (2) x (3) y(1) algorithms for computing an optimal assignment of A ∈ R n×n max have worst case cost O nτ + n 2 log n , where τ is the number of nonzeros in A. So the computation of the LU factors using (7)-(8) can cost as much as O n 2 τ + n 3 log n operations. We now describe a more efficient approach, which consists of simultaneously computing all the permanents needed for all the entries in a row of U or a column of L, while sharing some of the computation along the way. The bipartite graph setup underpinning our method will be familiar to readers who already have some knowledge of primal dual optimal assignment solvers such as the Hungarian algorithm.
We include an edge e ij in E from x(i) to y(j) with weight w(e ij )) = a ij whenever a ij = −∞. Thus the edges out of a left vertex x(i) represent the finite entries in the ith row of A (see Figure 1(a)).
A matching M is a subset of E with the property that no vertex in M is incident to more than one edge. Vertices which are incident to edges in M are said to be matched. The weight of a matching w(M ) is the sum of its edge weights. Given a matching M , we define the residual graph R G (M ) to be the bipartite graph obtained from G by reversing the direction of all of the edges in M and changing the sign of the edges' weights (see Figure 1(b)). Note that we maintain the labelling of edges even when they are reversed. Thus e ij labels either the forward edge from x(i) to y(j) or the backward edge from y(j) to x(i), depending on whether or not it has been reversed. We do not switch to labelling this edge as e ji . We define the weight w(σ) of a directed path or cycle σ through R G (M ) to be equal to the sum of the weights of its constitute edges in R G (M ). For the directed path σ = {e 32 , e 12 , e 11 , e 21 , e 23 } through R G (M ) in Figure 1 We denote by R T G (M ) the transpose residual graph obtained from R G (M ) by reversing the direction of all edges (see Figure 1(c)).
Given a subset S of the edges of R G (M ), we augment M according to S, written as M S, by taking all the edges that appear in either M or S but not both; that is, When we augment with respect to a path/cycle through R G (M ), we treat the path/ cycle as a set of edges. For the path σ = {e 32 , e 12 , e 11 , e 21 , e 23 } in Figure 1 (ii) u ki is either the weight of the maximally weighted path through (iii) l ik is either the weight of the maximally weighted path through R T The proof of Proposition 5.1 is technical and is left to Appendix A. According to Proposition 5.1(i), the sequence of maximally weighted matchings M 1 , . . . , M n can be obtained iteratively starting with M 0 = ∅. At step k > 0, M k−1 is augmented with respect to a maximally weighted path through R G (M k−1 ) from x(k) to y(k) to form M k . From Proposition 5.1(ii), the entries in the kth row of U are given by the weights of n − k different maximally weighted paths through R G (M k−1 ). Likewise, from Proposition 5.1(iii), the entries in the kth columns of L are given by the weights of n − k different maximally weighted paths through R T G (M k ). For a given k, the weights of these maximally weighted paths can be obtained by solving two maximally weighted spanning tree problems rooted at x(k), one through R G (M k−1 ), the other through R T G (M k ). A maximally weighted spanning tree T through R G (M ) rooted at x(k) consists of the maximally weighted paths from x(k) to every reachable left and right vertex. The depth of a reachable vertex is the weight of the corresponding maximally weighted path in T . If T does not reach a vertex, then this vertex has depth −∞. All these facts lead to the following algorithm.  Compute the maximally weighted spanning tree T through R T G (M k ) rooted at x(k). 10 for i = k + 1 : n 11 l ik = depth of x(k) in T . 12 end 13 end If A does not admit max-plus LU factors, then at some step k of Algorithm 5.2, y(k) will have depth −∞ and there will be no path from x(k) to y(k), so the algorithm will not be able to augment the matching M k−1 in line 8.
The maximally weighted spanning tree T through R T G (M 1 ) rooted at x(1) is highlighted with thicker red lines in Figure 2(d). The depths of the X vertices give the entries for the first column of L. k = 2. Figure 2(b) highlights the maximally weighted spanning tree T through the residual graph R G (M (1)) rooted at x(2). The depths of the Y vertices give the entries for the second row of U.
The maximally weighted path σ through R G (M 1 ) from x(2) to y(2) consists of a single edge σ = {e 22 } (Figure 2 The residual graph R G (M (2)) is shown in Figure 2(c). The maximally weighted spanning tree T through R T G (M (2)) rooted at x(2) is highlighted in Figure 2(e). The depths of the X vertices give the entries for the second column of L below the diagonal. k = 3. The maximally weighted spanning tree T through R G (M 2 ) rooted at x(3) is highlighted in Figure 2(c). The depth of the y(3) vertex gives the entry for the third row of U. The algorithm returns the max-plus LU factors Algorithm 5.2 requires the solution of maximally weighted spanning tree problems in steps 4 and 9. Note that the spanning tree T in step 4 provides the maximally weighted path σ needed in step 8. To efficiently solve the maximally weighted spanning tree problems for a given bipartite graph G = (X, Y ; E), we follow an approach taken by Orlin and Lee for the optimal assignment problem [16]. Their approach consists of adjusting the edge weights of G by defining a potential φ : X, Y → R so that, for each edge e ∈ E from a vertex a to a vertex b, the new edge weight is given by w (e) = w(e) − φ(a) + φ(b) with the property that w (e) ≤ 0. This leads to adjusted path weights w (σ) = w(σ) − φ(a) + φ(b) for a path σ from vertex a to vertex b. Hence if σ is a maximally weighted path from a to b for the original bipartite graph, then it stays maximally weighted for the bipartite graph with adjusted weights, and vice versa. Now since all the adjusted edge weights are nonpositive, Dijkstra's algorithm can then be used to compute the maximally weighted spanning trees and the depth w (σ) of each of its maximally weighted paths σ. Then the depth of σ for the original graph G is given by w(σ) = w (σ) + φ(a) − φ(b).
The computational cost of adjusting the weights using the technique in [16] is O(τ ) operations, where τ is the number of edges in G. Dijkstra's algorithm solves the maximally weighted spanning tree problem with worst case cost O τ + n log n for a graph with n vertices and τ edges. Since we need to compute 2n of such spanning trees, our max-plus LU algorithm applied to A ∈ R n×n will have worst case cost O(nτ + n 2 log n), where τ is the number of finite entries in A.

5.1.
Max-plus LU algorithm for Hungarian matrices. Algorithm 5.2 simplifies if we first apply a Hungarian scaling and an optimal assignment to A to produce a Hungarian scaled and reordered max-plus matrix H. In particular, if A = V c (A) with A ∈ C n×n , then H = V c (H) with H as in (18) is Hungarian. The next lemma shows that there is no need to compute the sequence of maximally weighted matchings M 1 , . . . , M n (see step 8 of Algorithm 5.2), as these come for free for Hungarian matrices. Proof. We note that every principal submatrix H k of order k of a Hungarian matrix H is Hungarian. Since H k has nonpositive entries, k i=1 h i,π(i) ≤ 0 for any π ∈ Π(k). Now for π = id, the identity permutation, k i=1 h ii = 0 so that π = id is an optimal assignment for H k . But an optimal assignment corresponds to a permutation representing a maximally weighted perfect matching between the left and right vertices of the bipartite graph associated with H k ; in other words, M k = {e 11 , . . . , e kk } is a maximally weighted matching between {x(1), . . . , x(k)} and {y (1), . . . , y(k)}.
Knowing this sequence of maximally weighted matchings a priori enables us to parallelize Algorithm 5.2. We no longer need to compute the maximally weighted paths through R G (M k ) before we can form M k+1 and compute the maximally weighted paths through R G (M k+1 ). Instead we can treat each R G (M k ) separately, computing the kth row of U and (k − 1)th column of L in parallel. This approach yields the following algorithm.  Compute the maximally weighted spanning tree T through R G (M k−1 ) rooted at x(k). 6 for j = k : n 7 u kj = depth of y(j) in T . Compute the maximally weighted spanning tree T through R T G (M k ) rooted at x(k).  Because the entries of H are nonpositive, we can use Dijkstra's algorithm to compute the maximally weighted spanning trees in steps 5 and 9. Thus our max-plus LU algorithm applied to H ∈ R n×n max will have worse case cost O nτ + n 2 log n , where τ is the number of finite entries in H.
Dijkstra's algorithm permanently labels vertices in decreasing order of their depth in the tree. This means that when we run Dijkstra's algorithm to compute the kth row of U or kth column of L, we are given the position and value of the entries one at a time in decreasing order of their value. If we are only interested in entries that are greater than some threshold, or if we only want to know the m largest entries in each row, then we can stop Dijkstra's algorithm earlier and reduce the cost considerably. The exact cost of this implementation will depend heavily on the particular details of the problem matrix. This approach will not work for a non-Hungarian scaled matrix as, while Dijkstra's algorithm will always label vertices in decreasing order, it will be in the order of their adjusted depths. So we cannot be sure that we have found the m largest entries until we have computed all of the adjusted depths and then converted them back into their true depths using the potential.

5.2.
Max-plus LU algorithm with partial pivoting. At each step k of Algorithm 5.6, the unmatched left vertices are permuted to maximize the weight of the augmenting path. For this we find a maximally weighted path through R G (M k−1 ) from {x(k), . . . , x(n)} to y(k). If this maximally weighted path begins at x(i), then we interchange x(i) with x(k) and perform step k of Algorithm 5.2. This is analogous to interchanging rows in Gaussian elimination with partial pivoting. Note that we can compute the maximally weighted path through R G (M k−1 ) from {x(k), . . . , x(n)} to y(k) by solving a single maximally weighted spanning tree problem. This is done by adding a root vertex r and connecting it to each unmatched left vertex x(j), j = k, . . . , n, with an edge of weight zero (see Figure 3(a)). We then compute the maximally weighted spanning tree through R G (M k−1 )∪{r} rooted at r. Then, rather than choosing the starting vertex of the maximally weighted path from {x(k), . . . , x(n)} to y(k), we take the second vertex on the maximally weighted path from r to y(k). This ensures the same choice of vertex, but the algorithm requires us to compute fewer maximally weighted spanning trees. This approach yields the following algorithm.
Algorithm 5.6 (max-plus LU with partial pivoting). Given A ∈ R n×n , this algorithm returns a permutation π, a unit lower triangular matrix L, and an upper triangular matrix U such that L, U are max-plus LU factors for the partial pivoting free matrix P π ⊗ A, where (P π ) ij = 0 for j = π(i) and (P π ) ij = −∞ otherwise. % G π denotes the bipartite graph associated with A with left vertices % X π = {x(π(1)), . . . , x(π(n))} and right vertices Y = {y(1), . . . , y(n)} % for some permutation π. 1 Set the lower part of U and strictly upper part of L to −∞, and the diagonal entries of L to 0. 2 Set M 0 = ∅ and π = [1, 2, . . . , n]. 3 for k = 1 : n 4 Add a root vertex r and connect it to each left vertex x π(j) , j = k, . . . , n, by an edge of weight zero. 5 Compute the maximally weighted spanning tree T through R Gπ (M k−1 ) rooted at r. 6 Swap π(k) with π(i), where x(π(i)) is the 2nd vertex on the maximally weighted path from r to y(k). 7 for j = k, . . . , n 8 u kj = depth of y(j) in T. 9 end 10 M k = M k−1 σ, where σ is the maximally weighted path through R Gπ (M k−1 ) from x(π(k)) to y(k). 11 Compute the maximally weighted spanning tree T through R T Gπ (M k ) rooted at x(π(k)). 12 for i = k + 1 : n 13 l ik = depth of x(π(k)) in T . Line 6 in Algorithm 5.6 is the row interchanging step. Since we added a root vertex r before the left vertices X π = {x(π(1)), . . . , x(π(n))}, we need to look for the second vertex, say x(π(i)), on the maximally weighted path from r to y(k), the first vertex being r. We then swap π(k) with π(i), which corresponds to interchanging x(π(k)) with x(π(i)).
Proposition 5.7. For A ∈ R n×n , Algorithm 5.6 returns a permutation π such that the permuted matrix P π ⊗ A is partial pivoting free.
Proof. Applying the result of Proposition 5.1(iv) to the matrix P π ⊗ A, we have i,k+1 is the weight of the maximally weighted path through R G (M k ) from x(i) to y(k + 1) for i ≥ k + 1. But from line 6 in Algorithm 5.6 we are guaranteed because of the interchanging that the maximally weighted path from {x(k + 1), . . . , x(n)} to y(k + 1) is the path from x(k + 1) to y(k + 1) so that u (k) k+1,k+1 = max k+1≤i≤n u (k) i,k+1 , k = 0, . . . , n − 1, which is the definition of max-plus partial pivoting free as given in (15).

5.
3. Max-plus ILU preconditioner. Given the max-plus LU factors L and U of V c (A), we define the max-plus ILU preconditioner as follows. For a threshold t we store We then compute the ILU factors for A restricted to positions where S is nonzero using, for example, the general static pattern ILU algorithm described in [17,Alg. 10.1] or the more practical variant [17, Alg. 10.3].
6. Numerical experiments. For our numerical experiments, we select all real nonsymmetric matrices in the University of Florida sparse matrix collection [4] of size 100 ≤ n ≤ 5000 that have numeric value symmetry no greater than 0.9, and that are structurally nonsingular. When two matrices from the same group have size n and number of nonzero entries within 1% of each other, then we consider these matrices as duplicate and remove one of them. This leaves us with a total of 260 matrices. We used MATLAB version R2015b to perform the computations. Note that our max-plus LU algorithms are implemented as research codes rather than efficient implementations and, for this reason, we only work with matrices of moderate sizes. For the valuation V c we use the logarithm to base 10.
6.1. Stability of Gaussian elimination with no pivoting. The aim of our first set of experiments is to compare the numerical stability of Gaussian elimination with no pivoting applied to Hungarian scaled and reordered matrices H in (18), and to reordered matrices P π A, where π is the permutation returned by Algorithm 5.6. For each matrix A in the test set we construct H using the HSL code MC64 [18] and P π A using our MATLAB implementation of Algorithm 5.6. Theorem 3.6 together with Heuristic 3.7 suggests that the growth factors for both H and P π A are of order one since both V c (H) and V c (P π A) are partial pivoting free. Although this is just a heuristic, we expect the LU factorization with no pivoting of H and P π A to have better numerical stability than for A. To examine the stability of Gaussian elimination on these two classes of matrices we compute the relative backward errors for X = H and X = P π A, where L and U are the computed LU factors of the LU factorization of X. We also use Gaussian elimination with no pivoting to compute the LU factorizations of the original matrices A. For each class of matrices, i.e., X = A, H and P π A, we plot in Figure 4 the proportion of problems for which we are able to compute LU factors without breakdown and with η X ≤ α against α. If the factorization breaks down or if η X ≥ 10 −1 , we record a fail. Without pivoting or scaling, the LU factorization fails for almost half of the test matrices A, and η A ≤ 10 −10 for 53% of the test matrices. After applying the max-plus LU permutation P π to A, the number of failed LU factorizations falls from 120 to 60, and η P π A ≤ 10 −10 for 64% of the test matrices. The number of failed LU factorizations is lower for Hungarian scaled matrices H (only 23 fails), and η H ≤ 10 −10 for 86% of the test matrices. Since our aim is to build a new class of ILU preconditioners and because • for the vast majority of matrices in the test set a reasonably stable LU factorization with no pivoting is possible if A is Hungarian scaled and reordered into H, • Hungarian scaling has been shown experimentally to be beneficial for iterative methods [2], • the max-plus LU algorithm is easier to implement for Hungarian scaled matrices, from here on we will work with the subset of test problems for which the Hungarian scaled and reordered matrix H can be factorized with no pivoting and with η H < 0.1. This results in a test subset of 233 matrices.
6.2. Max-plus LU approximation. The max-plus LU approximation can assist in the computation of an ILU preconditioner by providing a prediction of the positions of larger entries in the LU factors of H. One way of measuring the quality of this prediction is to treat the max-plus LU approximation as a binary classifier. For the LU factorization H = LU of a matrix H from the test set and its max-plus LU factors L, U, we predict that |l ij | ≥ 10 −t for i > j if and only if l ij ≥ −t, and likewise for the entries of U . The entries of L and U are then labeled as true positive or true negative according to the scheme displayed in Figure 5 We also define the soft accuracy and soft precision, which are calculated in the same way using the number of entries in L and U that are soft true positive and soft true negative, where the labelling "soft true positive" and "soft true negative" is done l ij , u ij log 10 |l ij |, log 10 |u ij | false -ve true +ve false +ve true -ve  according to the scheme displayed in Figure 5(b). We record in Table 1 Table 1 are quite high and show that, for our test set, the max-plus LU factors provide a good prediction of the position of the larger entries in L and U .
6.3. Behavior of max-plus ILU preconditioning. In this section, we examine the behavior of our max-plus ILU preconditioner on our test set of 233 Hungarian matrices H. We apply GMRES with a right ILU preconditioner to the systems Hx = b. For the ILU preconditioner, we use one of the following: 1. Threshold ILU as implemented in the ilu function of MATLAB with the option setup.type = 'crout', which performs the Crout version of the threshold ILU factorization (see [17, sec. 10.4.6]) and drop tolerance equal to 10 −2 , which means that the nonzero and nondiagonal entries of U and L satisfy |u ij | ≥ 10 −2 H(i, :) 2 , |l ij | ≥ 10 −2 H(:, j) 2 /|u jj |.
Since the Hungarian scaled matrices we are working with typically have all row/col norms O(1), this choice of dropping strategy is roughly equivalent to the max-plus method we describe below.  2. ILU(k) through the MATLAB function iluk from [13], which is an optimized implementation of the incomplete LU factorization with fill level k described in [17,Chap. 10]. 3. ILU(0) with zero level of fill-in. 4. Max-plus ILU by forming the pattern matrix S in (30) with t = 10 −2 and by calling iluk with input parameters set up to bypass the symbolic factorization. The level k for ILU(k) is chosen as the smallest integer such that the resulting ILU factors are denser than those obtained by the max-plus method. We justify this choice at the end of this section. The distribution of the levels k is shown in Table 2.
We use unrestarted GMRES and stop the iterations when either the approximate solution x satisfies Hx − b 2 ≤ 10 −5 b 2 , in which case we say that GMRES has converged, or the maximum number of iterations has been reached (maxit=100), in which case the test matrix is marked as a fail. For each preconditioned system, we record the number of GMRES iterations required for convergence multiplied by the sum of the number of nonzero entries in H and the number of nonzero entries in the ILU factors. For the unpreconditioned systems we record the number of GMRES iterations multiplied by the number of nonzero entries in H. These measures give an approximation of the cost of the GMRES solves but do not include the cost of constructing the preconditioner. The left plot in Figure 6 is a performance profile that compares this cost measure over the different ILU strategies. For the unpreconditioned systems, GMRES fails to converge in less than maxit = 100 iterations for 144 out of 233 problems and has clearly the worse performance. Systems preconditioned by ILU(k) have a cost more than double that of the best method for about 50% of the problems. The figure shows that the performance of the max-plus ILU preconditioned systems is close to that of the standard threshold ILU preconditioned systems. For about 80% of problems the cost of the max-plus method is within a factor of 2 of the best method. We obtain an almost identical performance profile plot for GMRES with left ILU preconditioners.
We performed the same set of tests but with BiCGSTAB in place of GMRES for the iterative solver. The corresponding performance profile (see Figure 6(right)) shows that the different ILU preconditioners exhibit the same behavior as for GMRES.
The cost measure that we have used for these performance profiles tends to decrease with the number of nonzero positions included in the ILU factors for all three different ILU techniques. As a result, our choice for the level k is slightly generous to the ILU(k) method. Of course the total cost of solving the linear system should also include the cost of computing the ILU factors, which generally increases with the number of nonzero positions allowed, so that in practice choosing denser ILU factors is not always an advantage. We use this cost measure here for its simplicity. In future work we hope to develop an efficient implementation of our max-plus ILU algorithm that can be timed against state-of-the-art ILU(k) and threshold ILU methods.

Conclusion.
We have presented a new method for approximating the order of magnitude of the entries in the LU factors of a matrix A ∈ C n×n . This method uses max-plus algebra and is based solely on the moduli of the entries in A. If the matrix A is first Hungarian scaled and reordered, then this LU approximation can be computed in parallel with n independent computations, of cost O(τ + log n). If we seek only the positions and values of the largest entries in the LU factors, then this cost can be reduced further.
We have shown that this approximation can be used to help compute an ILU preconditioner for A. First we reorder and rescale A to obtain a Hungarian matrix H, then we compute the positions of the largest entries in the LU factors of H, and finally we use these positions as the sparsity pattern for an ILU preconditioner. The resulting preconditioner tends to outperform the comparable ILU(k) preconditioner and have performance very close to a comparable threshold ILU preconditioner when applied as preconditioner to GMRES or BiCGSTAB.
Like the traditional threshold ILU method, the idea behind our max-plus preconditioner is to try to capture the largest entries in the exact LU factors of A. By attempting to find the locations of these large entries a priori using our heuristic maxplus approximation, we trade some accuracy for potential computational advantage. Inaccuracy in the approximation results in slightly less effective preconditioners when compared to the threshold method. But since the columns of the max-plus sparsity pattern can be computed independently of each other and since we can then use the same techniques as in ILU(k) to compute the numerical factors, there is potential to develop a scalable parallel algorithm for max-plus ILU preconditioning.
The numerical examples presented in this paper represent a proof of principal that the max-plus ILU technique can be advantageous in the solution of sparse linear systems.
Appendix A. This appendix presents several technical results needed to prove Proposition 5.1. We refer to section 5 for notation and definitions. Proof. (i) There are no edges into any unmatched left vertices or out of any unmatched right vertices, so that σ can only visit its origin vertex x(k+1), destination vertex y(k + 1), as well as the vertices matched by M . Since σ is a path from x(k + 1) it must include exactly one edge out of this vertex, and since this vertex is unmatched in M there can be no edge out of it in M . Thus M σ contains exactly one edge incident to the origin vertex x(k + 1). Likewise for the destination vertex, M σ contains exactly one edge incident to y(k + 1).
Let u be a matched left vertex visited by σ. Then σ must include an edge into u, which, being a right-to-left edge, must be an edge in M ; σ must also include an edge out of u, which, being a left-to-right edge, must not be in M . Thus M σ contains exactly one edge incident to u. Likewise if v is a matched right vertex visited by σ, then M σ contains exactly one edge incident to v.
Matched vertices that are not visited by σ are unaffected; likewise unmatched vertices are unaffected. Thus M σ is a subset of E, with exactly one edge incident to each of the left vertices {x(1), . . . , x(k), x(k + 1)} and each of the right vertices {y(1), . . . , y(k), y(k + 1)}, and no edge incident to any other vertices.
The weight of M σ is equal to w(M ) plus the weight of any edges in σ but not in M minus the weight of any edges in M and σ. Since any edge in M ∩ σ is a backward edge with an extra minus sign and any edge in σ/M is a forwards edge without an extra minus sign, we have w(M σ) = w(M ) + w(σ).
(ii) The proof of (ii) is the same as for (i) but without the origin or destination vertices.
(iii) The argument is the same as (i) except for the destination vertex x(k). This vertex is incident to exactly one edge in M , which is a right-to-left vertex; since σ ends at this vertex it must also contain this edge, and therefore M σ does not contain an edge incident to x(k).
(iv) Since S 1 ∪ S 2 = ∅ we have M (S 1 ∪ S 2 ) = (M S 1 ) S 2 . Hence, whereŵ(S 2 ) is the weight of the edge set S 2 in the residual graph R G (M S 1 ). However, since S 1 and S 2 are disjoint we haveŵ(S 2 ) = w(S 2 ), where w(S 2 ) is the weight of the edge set S 2 in the residual graph R G (M ). (This follows because none of the edges affected by augmenting with respect to S 1 are in the set S 2 .) Therefore we have For the residual graph R G (M ) in Figure 1 (1), . . . , y(k)} can be represented by a unique permutation π ∈ Π(k) so that M π is the matching that matches x(i) to y π(i) . Now consider w(M π ) = k i=1 a x(i)y(π(i)) so that the weight of the maximally weighted matching is given by Lemma A.3. Let G = (X, Y ; E) be a bipartite graph, and let M π ⊂ E, M ω ⊂ E be matchings defined by the permutations π ∈ Π(k) and ω ∈ Π(k + 1), respectively. Then there exists a path σ through R G (M ) from x(k +1) to y(k +1) as well as disjoint cycles C 1 , . . . , C m through R G (M ) such that M ω = M π (σ ∪ C 1 ∪ · · · ∪ C m ).
There is no right vertex y(j) with π −1 (j) = k + 1 since the domain of π is {1, . . . , k}. Also all subsequent vertices visited by the constructed path can have only one predecessor, as π and ω are permutations. Therefore σ cannot contain any cycle and must terminate since there are only finitely many vertices that it can visit without repetition. The only way that the path can terminate is by reaching a vertex where the next step is not well defined, and the only such vertex is y(k +1). The constructed path therefore terminates at σ(2 ) = y(k + 1). Next we pick any left vertex x matched by M π , which is not visited by σ. We construct a new path starting from x with the same rule that we used for constructing σ. Since there are no possible termination points, where π −1 or ω are not defined, this new path must be cyclic. If the constructed cycle is of length 2, then we discount the cycle but still record the constituent vertices as having been visited. We construct further cycles C 1 , . . . , C m until all matched vertices have been visited.
By construction each vertex matched by M π either has the same matching under M ω or is incident to two edges in σ ∪ C 1 ∪ · · · ∪ C m , one edge from M π and one from M ω . When we augment by taking the symmetric difference, the edge from M π is replaced by the one from M ω . Likewise the origin and destination vertices are each incident to an edge which is in M ω but not in M π , so these edges are also included when we augment.
The following theorem shows us how we can obtain a sequence of maximally weighted matchings by augmenting with respect to maximally weighted paths through the residual graph. This is the mechanism by which we will compute all of the entries in the max-plus LU factors. Proof. (i) Let σ be a path through R G (M ), and let C 1 , . . . , C m be disjoint cycles in R G (M ). Then by Lemma A.1(iv), w(M (σ ∪ C 1 ∪ · · · ∪ C m )) = w(M ) + w(σ) + w(C 1 ) + · · · + w(C m ). It follows from Lemma A.1(ii) that if C is single cycle, then M C is a matching between the same vertices as M , and since M is the maximally weighted matching on its matched vertices, w(M C) = w(M ) + w(C) ≤ w(M ), and thus w(C) ≤ 0, showing that any cycle in R G (M ) must have nonpositive weight. Now consider max M w(M ), where the maximum is taken over all matchings from {x(1), . . . , x(k + 1)} to {y(k), . . . , y(k + 1)}. Lemma A.3 tells us that every matching M from {x(1), . . . , x(k + 1)} to {y(k), . . . , y(k + 1)} can be written as the augmentation of the matching M with respect to some path and cycles. Therefore max M w(M ) = w(M ) + max σ,C1,...,Cm w(σ) + w(C 1 ) + · · · + w(C m ), where the maximum is taken over all paths through R G (M ) from x(k + 1) to y(k + 1) and disjoint cycles C 1 , . . . , C m in R G (M ). Since the cycle weights are all nonpositive we have max We are now ready to prove Proposition 5.1.