A MAX-PLUS APPROACH TO INCOMPLETE CHOLESKY FACTORIZATION PRECONDITIONERS

Abstract. We present a new method for constructing incomplete Cholesky factorization preconditioners for use in solving large sparse symmetric positive-definite linear systems. This method uses max-plus algebra to predict the positions of the largest entries in the Cholesky factor and then uses these positions as the sparsity pattern for the preconditioner. Our method builds on the max-plus incomplete LU factorization preconditioner recently proposed in [J. Hook and F. Tisseur, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 1160–1189] but is applied to symmetric positive-definite matrices, which comprise an important special case for the method and its application. An attractive feature of our approach is that the sparsity pattern of each column of the preconditioner can be computed in parallel. Numerical comparisons are made with other incomplete Cholesky factorization preconditioners using problems from a range of practical applications. We demonstrate that the new preconditioner can outperform traditional level-based preconditioners and offer a parallel alternative to a serial limited-memory-based approach.

1. Introduction. Incomplete Cholesky (IC) factorizations are an important tool in the solution of large sparse symmetric positive-definite linear systems of equations Ax = b. Preconditioners based on an incomplete factorization of A (that is, a factorization in which some of the fill entries that would occur in a complete factorization and possibly some of the entries of A are ignored) have been in widespread use for more than 50 years (see, for example, [34] for a brief historical overview that highlights some of the most significant developments). For general nonsymmetric systems, incomplete LU (ILU) factorization preconditioners are frequently used as they work well for a wide range of problems, and, again, many variants have been proposed (see [31] for an introduction). The basic idea is to compute a factorization A ≈ LU (or A ≈ LL^T in the positive-definite case) with L and U sparse triangular matrices with the fill-in (that is, the entries in L and U that lie outside the sparsity pattern of A) restricted to some sparsity pattern S. Recently, Hook and Tisseur [20] have shown how max-plus algebra can be used to approximate the order of magnitude of the moduli of the entries in the LU factors of A and have used this to construct the sparsity pattern of ILU preconditioners. Max-plus algebra is the analogue of linear algebra developed for the binary operations max and plus over the real numbers together with −∞, the latter playing the role of additive identity; an introduction and a large number of references to the literature may be found in [16, Chap. 35]. In the past few years, max-plus algebra has been used to examine a number of numerical linear algebra problems [2, 12, 15, 28]. While the numerical experiments reported on in [20] were limited to modest-sized sparse problems (of order up to 10^3), they did indicate the potential of the approach to compute ILU preconditioners that can outperform the traditional level of fill methods and be competitive with a threshold-based method.
One drawback of the max-plus method is that the sparsity pattern S cannot be updated to account for any pivoting during the factorization of A, so the pattern chosen by the max-plus analysis is only useful when the factorization does not require row or column interchanges. An attractive feature of symmetric positive-definite matrices is that they always (in exact arithmetic) admit a Cholesky factorization without pivoting. However, if A is close to being indefinite or if an incomplete factorization is computed, then breakdown can occur (that is, a zero or negative pivot is encountered). In this case, we follow Manteuffel [27] and add a small multiple of the identity; that is, we factorize DAD + αI for some shift α > 0 and diagonal scaling D. This avoids the need for pivoting and preserves the chosen sparsity structure S of the factors, and, as shown recently by Scott and Tůma [35, 36], the resulting IC factors generally still provide an effective preconditioner for A. We observe that prescaling of A is essential to limit the size of the shift; preordering is also normally needed to limit fill in the factors (and hence the number of entries that are dropped during the incomplete factorization).
The aim of this paper is to present an algorithm for constructing IC preconditioners for large sparse positive-definite problems using max-plus algebra to predict the positions of the largest entries in the Cholesky factor. The sparsity pattern S_j of each column j can be determined in parallel. Once S_j is found, the IC factorization can be computed using a conventional serial procedure or using the novel approach of Chow and Patel [3], who propose using an iterative method to compute the entries of the factors. In their approach, all the nonzero entries in the incomplete factors can be computed in parallel and asynchronously, using a number of sweeps of an iterative method. A key issue with their approach is that the pattern S must be chosen a priori.
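To make the Chow and Patel idea concrete, the sketch below applies fixed-point sweeps to the IC equations on a prescribed pattern. The function name, the dict-of-entries storage, and the serial sweep order are illustrative choices of ours; the method of [3] updates the entries in parallel and asynchronously rather than in a serial loop.

```python
import math

def chow_patel_ic_sweeps(A, pattern, nsweeps=5):
    """Fixed-point sweeps for an incomplete Cholesky factor on a fixed
    sparsity pattern (a serial sketch of the Chow-Patel iteration).
    A: dict mapping (i, j) -> value (symmetric, 0-based indices).
    pattern: set of (i, j) with i >= j prescribing the sparsity of L."""
    L = {(i, j): A.get((i, j), 0.0) for (i, j) in pattern}
    rows = {}  # column indices of the nonzeros of L in each row
    for (i, j) in pattern:
        rows.setdefault(i, set()).add(j)
    for _ in range(nsweeps):
        for (i, j) in sorted(pattern):
            # inner product sum_{k < j} l_ik * l_jk over the common pattern
            s = sum(L[(i, k)] * L[(j, k)]
                    for k in rows[i] & rows[j] if k < j)
            if i == j:
                L[(i, j)] = math.sqrt(A[(i, i)] - s)
            else:
                L[(i, j)] = (A.get((i, j), 0.0) - s) / L[(j, j)]
    return L
```

On a full pattern the sweeps reproduce the exact Cholesky factor; on an incomplete pattern each sweep refines the entries toward the fixed point defined by the IC equations.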
The remainder of this paper is organized as follows. In section 2, we briefly recall max-plus incomplete LU factorizations, and then, in section 3, we focus on positive-definite matrices. We present algorithms for computing each column of the max-plus factor independently. Results for a wide range of problems are presented in section 4, and comparisons are made between our new max-plus IC preconditioner and both level-based and memory-based IC preconditioners used with the preconditioned conjugate gradient method. Finally, some concluding remarks and possible future directions are given in section 5. Note that we do not assume that the reader is familiar with the use of max-plus algebra but define the concepts that we need as necessary.
Throughout this paper, matrices are denoted by capital letters with their entries given by the corresponding lower case letter in the usual way, that is, A = (a_ij) ∈ R^{n×n}. Max-plus matrices are denoted by calligraphic capital letters and their entries by the corresponding lower case calligraphic letter, that is, A = (a_ij) ∈ R^{n×n}_max, where R_max = R ∪ {−∞}.

2. Max-plus approximation of LU factorization. In [20], the authors use max-plus algebra to introduce the following method for approximating the moduli of the entries in the L and U factors of sparse matrices.
Consider the map V : R → R_max defined by

(2.1)   V(x) = log |x|,

with the convention that log 0 = −∞. For x, y ∈ R, V(xy) = V(x) + V(y), and when |x| ≫ |y| or |x| ≪ |y|, V(x + y) ≈ max{V(x), V(y)}, which suggests using the operations max and plus in place of the classical addition and multiplication once we have applied the map V. The set R_max endowed with the addition x ⊕ y = max{x, y} and the multiplication x ⊗ y = x + y is called the max-plus semiring. It is not a ring as there is no additive inverse and hence there is no max-plus subtraction. The identity elements are −∞ for the addition and 0 for the multiplication. When applied componentwise to a matrix, the map (2.1) allows us to transform the matrix A ∈ R^{n×n} into a max-plus matrix; i.e., V(A) = A is a matrix with entries a_ij = V(a_ij) = log |a_ij|. The max-plus matrix V(A) is termed the valuation of A.
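As a concrete illustration of the valuation map and the two max-plus operations (the helper names below are ours):

```python
import math

NEG_INF = float('-inf')  # the max-plus additive identity

def val(x):
    """Valuation map V(x) = log|x|, with the convention V(0) = -inf."""
    return math.log(abs(x)) if x != 0 else NEG_INF

def oplus(x, y):
    """Max-plus addition: x (+) y = max{x, y}."""
    return max(x, y)

def otimes(x, y):
    """Max-plus multiplication: x (x) y = x + y.
    Note -inf + anything = -inf, so -inf really acts as a zero."""
    return x + y

# V turns ordinary multiplication into max-plus multiplication exactly:
assert abs(val(6.0) - otimes(val(2.0), val(3.0))) < 1e-12
# and ordinary addition into max-plus addition approximately, when the
# magnitudes of the summands differ widely:
assert abs(val(1000.0 + 0.001) - oplus(val(1000.0), val(0.001))) < 1e-5
```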
For A ∈ R^{n×n}_max, define the max-plus permanent

perm(A) = max_{π ∈ Π(n)} (a_{1π(1)} + a_{2π(2)} + · · · + a_{nπ(n)}),

where Π(n) is the set of all permutations on {1, . . ., n}. In [20], Hook and Tisseur make repeated use of the heuristic that

(2.2)   V(det(A)) ≈ perm(V(A)).

This approximation can be intuitively justified by expressing the determinant as a sum of terms coming from each permutation and the permanent as the maximum of those terms. If the matrix A is sparse and has a wide range of entry sizes, then the heuristic approximation should be more accurate. It is well known that the entries in the lower triangle of L and the upper triangle of U of an LU factorization of A ∈ R^{n×n} can be expressed explicitly in terms of determinants of submatrices of A (see [11, p. 35]) by

(2.3)   l_ik = det A([1 : k − 1, i], 1 : k) / det A(1 : k, 1 : k),   i ≥ k,
(2.4)   u_kj = det A(1 : k, [1 : k − 1, j]) / det A(1 : k − 1, 1 : k − 1),   j ≥ k.

Here the notation [1 : k − 1, i] means the indices 1, 2, . . ., k − 1, i. If both the numerator and denominator in either (2.3) or (2.4) are zero, then the convention 0/0 = 0 is used. Using this fact and heuristic (2.2), Hook and Tisseur [20] define the max-plus LU factors of A ∈ R^{n×n}_max as the lower triangular max-plus matrix L and upper triangular max-plus matrix U with entries

(2.5)   l_ik = perm A([1 : k − 1, i], 1 : k) − perm A(1 : k, 1 : k),   i ≥ k,
(2.6)   u_kj = perm A(1 : k, [1 : k − 1, j]) − perm A(1 : k − 1, 1 : k − 1),   j ≥ k,

and l_ik = −∞ for i < k, u_kj = −∞ for j < k. If the two terms on the right-hand side of either (2.5) or (2.6) are −∞, then the convention −∞ − (−∞) = −∞ is used. If the second term is −∞ but the first is not, then A does not admit max-plus LU factors. Hook and Tisseur show that L and U are such that

(2.7)   V(L) ≈ L,   V(U) ≈ U,

where the symbol "≈" should be interpreted componentwise as "offers an order of magnitude approximation."

We compute the max-plus LU factors of V(A) using (2.5) and (2.6). For a small example, we can evaluate the permanent of a matrix or submatrix by simply evaluating all possible permutations and recording the maximum. Note that the permanent can also be expanded along a row or a column in a similar way to the Leibniz formula for determinants but with no sign function, and the sum replaced by a maximum and the product by a sum (see [19, Prop. 1.2]). The resulting max-plus factors provide a good approximation of the order of magnitude of the moduli of the entries in the LU factors of A.
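The brute-force evaluation described above can be sketched as follows. This is an illustrative implementation of the permanent and of (2.5)-(2.6), written by us and practical only for tiny examples since it enumerates all permutations:

```python
from itertools import permutations

NEG_INF = float('-inf')

def perm_maxplus(A):
    """Max-plus permanent: the maximum over permutations pi of
    sum_i A[i][pi(i)] (the Leibniz expansion with max replacing the
    sum, plus replacing the product, and no sign function)."""
    best = NEG_INF if A else 0.0  # the permanent of an empty matrix is 0
    for pi in permutations(range(len(A))):
        best = max(best, sum(A[i][pi[i]] for i in range(len(A))))
    return best

def submatrix(A, rows, cols):
    return [[A[i][j] for j in cols] for i in rows]

def maxplus_lu(A):
    """Max-plus LU factors via the permanent differences (2.5)-(2.6):
    subtraction of permanents replaces the quotient of determinants."""
    n = len(A)
    L = [[NEG_INF] * n for _ in range(n)]
    U = [[NEG_INF] * n for _ in range(n)]
    for k in range(n):
        d_k = perm_maxplus(submatrix(A, range(k + 1), range(k + 1)))
        d_km1 = perm_maxplus(submatrix(A, range(k), range(k)))
        assert d_k > NEG_INF and d_km1 > NEG_INF, \
            "A does not admit max-plus LU factors"
        for i in range(k, n):
            p = perm_maxplus(submatrix(A, list(range(k)) + [i], range(k + 1)))
            L[i][k] = p - d_k if p > NEG_INF else NEG_INF
        for j in range(k, n):
            p = perm_maxplus(submatrix(A, range(k + 1), list(range(k)) + [j]))
            U[k][j] = p - d_km1 if p > NEG_INF else NEG_INF
    return L, U
```

For a Hungarian valuation (introduced next) all the leading permanents are 0, so the two permanent differences coincide and U = L^T for symmetric input.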

2.1. Hungarian matrices.
A matrix H ∈ R^{n×n} is said to be Hungarian if its entries satisfy |h_ij| ≤ 1 and |h_ii| = 1 for all i, j = 1, . . ., n. It is well known that for any matrix A ∈ R^{n×n} of full structural rank there exists a permutation matrix P and diagonal matrices D_1, D_2 such that P D_1 A D_2 is a Hungarian matrix and that such a scaling is a highly effective preprocessing step both for sparse direct solvers and for incomplete factorizations. The idea was originally introduced by Olschowka and Neumaier [29] in the mid 1990s. They proposed using the solution to an optimal assignment problem as an ordering and scaling to reduce the need for pivoting within Gaussian elimination; their work was further developed by Duff and Koster [8] (see also Gupta and Ying [13]). The idea was subsequently extended to symmetric systems [7, 9] and, over the last 15 years or so, it has been adopted by the sparse linear algebra community for both nonsymmetric and symmetric problems (see, for example, [1, 14, 17, 18, 25, 32, 33]).
A max-plus matrix H ∈ R^{n×n}_max is said to be Hungarian if its entries satisfy h_ij ≤ 0 and h_ii = 0 for all i, j = 1, . . ., n. Note that H ∈ R^{n×n} is Hungarian if and only if V(H) is Hungarian.

Proposition 2.2. Let H ∈ R^{n×n}_max be Hungarian; then H admits max-plus LU factors.
Proof. From (2.5) and (2.6) we have that perm H(1 : k, 1 : k) ≠ −∞ for all k is a sufficient condition for H to admit max-plus LU factors. But since h_ij ≤ 0 and h_ii = 0 for all i, j, we have perm H(1 : k, 1 : k) = 0 for all k.
3. The max-plus IC factorization. We now focus on symmetric positive-definite matrices A ∈ R^{n×n}. For such matrices, there exists a unique lower triangular matrix L ∈ R^{n×n} with positive diagonal entries such that A = LL^T; this is the Cholesky factorization of A. The following result is from Olschowka and Neumaier [29].
Lemma 3.1. Let A ∈ R^{n×n} be a symmetric positive-definite matrix, and let D ∈ R^{n×n} be the diagonal matrix with diagonal entries d_ii = a_ii^{−1/2}. Then DAD is a symmetric positive-definite Hungarian matrix.
Thus, given a symmetric positive-definite matrix A ∈ R^{n×n}, we can easily scale it to obtain a symmetric positive-definite Hungarian matrix H ∈ R^{n×n}. Since the Hungarian property is very beneficial to our max-plus IC algorithm, in the remainder of the paper we assume that the matrix A has been scaled so that the scaled matrix H = DAD is symmetric positive-definite Hungarian.
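A minimal sketch of this scaling (the D of Lemma 3.1 with d_ii = a_ii^{−1/2}; the function name and the small SPD matrix below are illustrative):

```python
import math

def hungarian_scale(A):
    """Symmetric scaling H = D A D with d_ii = 1/sqrt(a_ii).
    For SPD A, |a_ij| <= sqrt(a_ii * a_jj), so H has unit diagonal
    and off-diagonal entries of modulus at most 1, i.e. H is a
    symmetric positive-definite Hungarian matrix."""
    n = len(A)
    d = [1.0 / math.sqrt(A[i][i]) for i in range(n)]
    return [[d[i] * A[i][j] * d[j] for j in range(n)] for i in range(n)]

# a small SPD example (hypothetical data)
A = [[4.0, 2.0, 0.0],
     [2.0, 9.0, 3.0],
     [0.0, 3.0, 16.0]]
H = hungarian_scale(A)
assert all(abs(H[i][i] - 1.0) < 1e-12 for i in range(3))
assert all(abs(H[i][j]) <= 1.0 + 1e-12 for i in range(3) for j in range(3))
```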
Since H is symmetric, l_ik = u_ki for all i and k, i.e., U = L^T. We say that L as defined in (3.1) is the max-plus lower Cholesky factor of H. In the special case of a symmetric positive-definite Hungarian matrix, the heuristic order of magnitude approximation in (2.7) can be rewritten as follows.

Heuristic 3.3 (max-plus approximation of Cholesky factors). Let H ∈ R^{n×n} be a symmetric positive-definite Hungarian matrix, let L ∈ R^{n×n} be the lower Cholesky factor of H, and let H = V(H). Then

V(L) ≈ L,

where L is the max-plus lower Cholesky factor of H.

Hook and Tisseur [20] present several algorithms for computing max-plus LU factors for nonsymmetric matrices. Their algorithms all work on a directed bipartite graph, with a pair of vertices for each row in the matrix. In the special case of a symmetric Hungarian matrix, we are able to simplify their algorithm so that it works on an undirected graph with a single vertex for each row.

3.1. Precedence graphs and fill paths.
Let H ∈ R^{n×n}_max be a symmetric Hungarian matrix, and let G(H) be the precedence graph of H, that is, the graph with vertices {1, . . ., n} and an undirected edge e : i ↔ j of weight h_ij whenever h_ij ≠ −∞. The weight W(σ) of a path σ is given by the sum of its edge weights. We allow paths of zero length that consist of a single vertex and have zero weight, but we do not allow paths to visit the same vertex more than once. Let Σ(i, j, H) be the set of all paths in G(H) from i to j. A path σ of length ℓ from i to j is a fill path if σ(1) = i, σ(k) < min{i, j} for k = 2, . . ., ℓ, and σ(ℓ + 1) = j. Let Σ_F(i, j, H) be the set of all fill paths from i to j in the graph G(H). Clearly, Σ_F(i, j, H) ⊆ Σ(i, j, H).

Theorem 3.4. Let H ∈ R^{n×n} be a symmetric positive-definite Hungarian matrix, and let H = V(H). Then the max-plus Cholesky lower factor L of H is given by

(3.2)   l_ik = max_{σ ∈ Σ_F(i,k,H)} W(σ),   i ≥ k.

Proof. From Proposition 3.2 and (2.2) we have that for i ≥ k, l_ik = perm H([1 : k − 1, i], 1 : k), since perm H(1 : k, 1 : k) = 0 for a Hungarian matrix. It therefore suffices to show that

(3.3)   max_{σ ∈ Σ_F(i,k,H)} W(σ) = perm H([1 : k − 1, i], 1 : k) = max_{φ ∈ Φ} Σ_{j ∈ {1,...,k−1,i}} h_{j φ(j)},

where Φ denotes the set of bijections from {1, 2, . . ., k − 1, i} to {1, 2, . . ., k}.
Since we allow paths of zero length that consist of a single vertex and have zero weight, and perm H(1 : k, 1 : k) = 0, (3.3) holds when i = k. Assume now that i > k. Define the map ℓ : Φ → N by φ^{ℓ(φ)}(i) = k, where φ^t(i) means that the map φ is applied t times to the vertex i. Also define the map P : Φ → Σ_F(i, k, H) by P(φ) = σ, where σ = (i, φ(i), φ^2(i), . . ., φ^{ℓ(φ)}(i)). To see that the maps ℓ and P are well defined, consider the sequence σ = (i, φ(i), φ^2(i), . . .). By injectivity of φ and the fact that i has no φ preimage when i > k, the sequence σ = P(φ) can never repeat itself and must therefore terminate at vertex k after at most k steps.
Conversely, for a fill path σ ∈ Σ_F(i, k, H), define Q(σ) ∈ Φ to be the bijection that maps each vertex of σ to its successor on the path and fixes every other vertex of {1, . . ., k − 1}. Then P(Q(σ)) = σ by construction. Next, we show that for any φ ∈ Φ, the bijection ψ = Q(P(φ)) has weight at least that of φ and equal to the weight of the fill path P(φ). To see this, let V ⊆ {1, . . ., k − 1, i} be the set of vertices in G(H) visited by the sequence σ = P(φ). The vertices outside V contribute nonpositive terms to the weight of φ, since h_ij ≤ 0 for all i, j, whereas for ψ the vertices outside V are fixed points and contribute h_jj = 0, since h_ii = 0 for all i; the vertices in V contribute the edge weights of σ to both. Hence the weight of ψ is greater than or equal to the weight of φ and equals W(σ). Finally, the permanent in (3.3) is given by the maximum weight of a bijection φ ∈ Φ. From the construction of P : Φ → Σ_F(i, k, H), we know that each of these bijections can be mapped to a fill path P(φ) from k to i. Moreover, we know that for every bijection φ ∈ Φ, there is a bijection ψ = Q(P(φ)) ∈ Φ whose weight, greater than or equal to the weight of φ, is equal to the weight of the fill path P(φ). Therefore, since P is a surjection, the maximum weight of a bijection is equal to the maximum weight of a fill path.
The remaining columns of L can be computed in the same way to give the max-plus Cholesky factor of H, which provides a good approximation of the order of magnitude of the moduli of the entries in the Cholesky factor of H.

We now define H(k) ∈ R^{n×n}_max, 1 ≤ k ≤ n, to be the matrix with entries given by

(3.4)   h(k)_ij = h_ij if i ≤ k, and h(k)_ij = −∞ otherwise.

Thus G(H(k)) is the graph with vertices {1, . . ., n} that contains all edges from {1, . . ., k} to itself and all edges from {1, . . ., k} to {k + 1, . . ., n} but no edges from {k + 1, . . ., n} to itself or from {k + 1, . . ., n} to {1, . . ., k}. This construction is illustrated for the matrix of Example 3.5 and k = 2 in Figure 3.2.

Lemma 3.6. Let H ∈ R^{n×n}_max be the valuation of a symmetric positive-definite Hungarian matrix; then

max_{σ ∈ Σ_F(i,k,H)} W(σ) = max_{σ ∈ Σ(k,i,H(k))} W(σ)

for k = 1, . . ., n and i = k, . . ., n.
Corollary 3.7. Let H ∈ R^{n×n}_max be symmetric and Hungarian, and let L be the max-plus Cholesky factor of H. Then for i ≥ k,

l_ik = max_{σ ∈ Σ(k,i,H(k))} W(σ).
To compute the kth column of L it is therefore sufficient to compute the weight of the maximally weighted path from k to i for all i = k + 1, . . ., n in H(k). Since the entries of H are nonpositive, this can be done using Dijkstra's algorithm [5] with worst-case cost O(E_k + n log n), where E_k is the number of nonzero entries in H(k). The algorithm is given as Algorithm 1. For efficiency it uses a priority heap data structure that stores indices with an associated priority. The operation push(heap, index, priority) adds or updates an index-priority pair in the heap, while the operation (index, priority) = pop max(heap) returns and removes the index-priority pair with maximum priority.
Algorithm 1 Given the valuation H ∈ R^{n×n}_max of a real symmetric positive-definite Hungarian matrix and an integer k, 1 ≤ k ≤ n, this algorithm computes the kth column of the max-plus Cholesky factor L of H.
1: for i = 1, . . ., n set d_i = −∞ and checked_i = false
2: initialize heap; push(heap, k, 0)
3: while heap is nonempty do
4:   (i, d_i) = pop max(heap)
5:   checked_i = true
6:   if i ≥ k then l_ik = d_i end if
7:   if i ≤ k then
8:     for all j such that h_ij ≠ −∞ and checked_j = false do
9:       d_cand = d_i + h_ij
10:      if d_cand > d_j then
11:        d_j = d_cand
12:        push(heap, j, d_cand)
13:      end if
14:    end for
15:  end if
16: end while

3.2. Max-plus IC preconditioner pattern. We can use the max-plus Cholesky factor L ∈ R^{n×n}_max of V(H) to construct a sparsity pattern for an IC preconditioner for a symmetric positive-definite Hungarian matrix H ∈ R^{n×n} as follows. If we want the IC pattern to include the positions of all of the entries in the exact Cholesky factor L of H that are greater in modulus than some drop tolerance ε > 0, then Heuristic 3.3 suggests using the pattern S ∈ {0, 1}^{n×n} given by

(3.5)   s_ij = 1 if l_ij > log ε, and s_ij = 0 otherwise.

The total number of nonzero entries per column can also be restricted to an integer m, by setting s_ij = 1 for the m largest positions in the jth column only, as predicted by Heuristic 3.3. Once we have a suitable pattern matrix, we can compute an IC factor with that pattern using standard algorithms from the level of fill IC(k) approach. Specifically, we compute one column of the preconditioner at a time using a left-looking algorithm; details are given in [34].
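The column computation can be sketched in Python as follows (a sketch of ours, with 0-based indices and a dense adjacency scan for simplicity; the optional m and log_eps arguments give the early termination used by Algorithm 2 below):

```python
import heapq

NEG_INF = float('-inf')

def maxplus_cholesky_column(H, k, m=None, log_eps=NEG_INF):
    """Entries l_ik, i >= k, of the max-plus Cholesky factor of a
    Hungarian valuation H (H[i][j] = log|h_ij|, NEG_INF for structural
    zeros), computed as maximum-weight fill-path distances from vertex
    k via Dijkstra's algorithm.  Only vertices <= k are expanded, which
    enforces the fill-path condition that interior vertices precede k.
    With m and/or log_eps set, the search stops early."""
    n = len(H)
    d = [NEG_INF] * n
    checked = [False] * n
    d[k] = 0.0
    heap = [(0.0, k)]            # negated weights: heapq is a min-heap
    col = {}
    while heap:
        dist, i = heapq.heappop(heap)
        dist = -dist
        if checked[i]:
            continue             # stale heap entry
        checked[i] = True
        if dist < log_eps:
            break                # everything left is below the threshold
        if i >= k:
            col[i] = dist        # predicted log-magnitude of l_ik
            if m is not None and len(col) >= m:
                break
        if i <= k:               # expand only the start/interior vertices
            for j in range(n):
                if H[i][j] != NEG_INF and not checked[j] and dist + H[i][j] > d[j]:
                    d[j] = dist + H[i][j]
                    heapq.heappush(heap, (-d[j], j))
    return col
```

Because all edge weights are nonpositive, extending a path never increases its weight, which is exactly the monotonicity that Dijkstra's algorithm requires.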
If we are computing the max-plus Cholesky factor of H to predict the positions of large entries in the Cholesky factor L of H, then we can speed up Algorithm 1 by terminating it early. Algorithm 2 calculates the positions of the m largest entries in the kth column of L. If there are fewer than m entries greater than some chosen threshold log ε, then it will instead return the positions of the r < m entries that are greater than log ε. The worst-case cost of this algorithm is O(E_{k,m,ε} + V_{k,m,ε} log n), where E_{k,m,ε} is the total number of edges explored in the inner loop of Algorithm 2 and V_{k,m,ε} is the number of times the algorithm passes around its outer while loop. These quantities can be approximated for the case ε = 0 under the assumption that vertices are examined and checked in a uniform random order and that the number of nonzero entries per row in H is constant.
It is interesting to compare the max-plus pattern (3.5) with the level of fill IC(k) pattern. The IC(k) pattern P ∈ {0, 1}^{n×n} can be expressed as

p_ij = 1 if there is a fill path of length ≤ k from i to j through G(H), and p_ij = 0 otherwise,

whereas, using (3.2), the max-plus IC pattern S ∈ {0, 1}^{n×n} in (3.5) can be expressed as

s_ij = 1 if there is a fill path of weight > log ε from i to j through G(H), and s_ij = 0 otherwise.

The level of fill approach drops entries that correspond to longer paths, while the max-plus approach drops entries that correspond to paths with less weight. By taking this extra information into account, the max-plus approach has the ability to produce more effective preconditioners by dropping some smaller entries with lower levels of fill and including some larger entries with higher levels of fill. Note that in the special case that H ∈ R^{n×n} has h_ii = 1, 1 ≤ i ≤ n, and h_ij = γ < 1 for all other nonzero positions, the IC(k) pattern will be identical to the max-plus pattern chosen using ε = γ^{k+1}. Note also that Scott and Tůma [34] explored IC factorizations with variable levels of fill, allowing large entries to contribute to more levels of fill than small entries. Their numerical results illustrated the potential effectiveness of such an approach.
4. Numerical results. We present numerical results for problems taken from the University of Florida Sparse Matrix Collection [4]. We select all symmetric positive-definite matrices of order n > 5000 except those that are diagonal or represent minor variations on other matrices. This gives a set of 132 problems. In each test, the matrix A is reordered, scaled, and, if necessary, shifted to avoid breakdown of the factorization, so that the incomplete factorization of Ā = DQ^T AQD + αI is computed, where Q is a permutation matrix, D is a diagonal scaling matrix with entries d_ii = 1/√((Q^T AQ)_ii), and α is a nonnegative shift. The permutation matrix Q is computed using the Sloan profile reduction ordering algorithm [30, 37, 38]. This ordering is used since, in our experience, it frequently leads to a reduction in the number of conjugate gradient iterations [35, 36]. Preconditioned conjugate gradient (PCG) is applied to the original matrix A (so that the incomplete preconditioner is (L̄L̄^T)^{−1} with L̄ = QD^{−1}L). The strategy for choosing α is as described in [35] (see also [26]). The max-plus IC factorization is started with α = 0, but if a zero (or negative) pivot is encountered, a nonzero α is employed (the initial value used in our experiments is 0.001) and the factorization is restarted. This process may need to be repeated more than once, with α increased (normally by a factor of 2) each time the factorization breaks down. For our test set, we found that the largest shift needed by the max-plus IC factorization was 0.064.

Algorithm 2 Given the valuation H ∈ R^{n×n}_max of a real symmetric positive-definite Hungarian matrix, a tolerance ε > 0, and two integers m, k, 1 ≤ m, k ≤ n, this algorithm computes the kth column of a pattern matrix with 0 and 1 entries such that there are 1's in the r ≤ m entries corresponding to the largest entries in the kth column of the max-plus Cholesky factor L of H that are greater than log ε.
1: for i = 1, . . ., n set d_i = −∞ and checked_i = false
2: initialize heap; push(heap, k, 0)
3: set p_ik = 0 for i = 1, 2, . . ., n; set r = 0
4: while r < m and heap is nonempty do
5:   (i, d_i) = pop max(heap)
6:   if d_i < log ε then exit
7:   checked_i = true
8:   if i ≥ k then p_ik = 1; r = r + 1 end if
9:   if i ≤ k then
10:    for all j such that h_ij ≠ −∞ and checked_j = false do
11:      d_cand = d_i + h_ij
12:      if d_cand > d_j then
13:        d_j = d_cand
14:        push(heap, j, d_cand)
15:      end if
16:    end for
17:  end if
18: end while
We use the implementation MI21 of PCG provided by the HSL Mathematical Software Library [21]. For each problem, we terminate the computation if a limit of either 10,000 iterations or 10 minutes is reached. The PCG algorithm is considered to have converged on the ith step if the relative residual ‖r_i‖/‖r_0‖ falls below a prescribed convergence tolerance, where r_i is the current residual vector and r_0 = b − Ax_0 is the initial residual. In all our experiments, we take the initial solution guess to be x_0 = 0 and choose the right-hand side b so that the solution is the vector of 1's. If PCG fails to converge within our chosen limits, the result is recorded as a failure. All runs are performed on a dual socket E5-2695 v3 machine using the Intel Compiler Suite v16.0.1. We use the Intel MKL sparse triangular solve and matrix-vector routines to apply the preconditioner and calculate matrix-vector products Ax.
To measure the performance of PCG, we use the following statistics: nitr is the number of iterations required for PCG to converge; ma_pcg is the total number of memory accesses to perform PCG, given by

(4.1)   ma_pcg = nitr × (nnz(A) + 2 nnz(L)),

where nnz(A) and nnz(L) are the number of entries in the lower triangle of A and in the (incomplete) Cholesky factor, respectively. This represents a matrix-vector multiplication, and a forward and a backward solve with L at each iteration. In an ideal implementation, runtime would be proportional to ma_pcg.

We compare the new max-plus IC preconditioner with the following preconditioners.
Diagonal: Equivalent (in exact arithmetic) to no preconditioning, as our prescaling results in all diagonal entries being 1.0.
IC(0): Incomplete Cholesky based on the sparsity pattern of A. A drop tolerance δ = 10^{−3} is applied in a postfactorization filtering step (so that all entries in the computed factor that are of absolute value less than δ are discarded).
IC(1): Incomplete Cholesky based on the pattern of A plus one level of fill. A drop tolerance δ = 10^{−3} is applied in a postfactorization filtering step.
HSL MI28: A limited-memory IC preconditioner developed by Scott and Tůma [35, 36].
We use the default drop tolerances of 10^{−3} and 10^{−4} and allow up to 10 fill entries in each column of the incomplete factor (that is, the HSL MI28 parameters lsize and rsize that control the memory usage and sparsity of the factor are both set to 10). Note that, for each problem and each algorithm, a different shift α may be needed. A nonzero value is only used if, during the construction of the preconditioner, a zero or negative pivot is encountered. We apply postfactorization filtering to the IC(0) and IC(1) patterns so as to be consistent with HSL MI28. In practice, we find that dropping small entries can improve the sparsity of the factors without significantly affecting the quality of the preconditioner; see, e.g., [34].
We do not report detailed times to form the preconditioner as the purpose of this study is to evaluate the numerical quality of the preconditioner rather than to develop an efficient implementation.However, to give an indication of runtimes for our current code, we comment that for each of our test problems, our basic max-plus serial implementation takes less than 7 minutes to construct the preconditioner.The slowest max-plus time that gives a preconditioner that leads to PCG converging within our chosen limits is for problem Janna/Bump 2911.In this case, our prototype code takes approximately 260 seconds to construct the preconditioner, followed by 58 seconds to run PCG, which compares to around 14 seconds for constructing IC(0), followed by 90 seconds to run PCG.For IC(1) (respectively, HSL MI28) the corresponding times are 30 seconds (respectively, 55 seconds) for constructing the preconditioner and 95 seconds (respectively, 35 seconds) to run PCG.The max-plus implementation can potentially be accelerated through the use of parallel processing as the pattern of each column can be calculated independently, but implementing this efficiently is nontrivial and outside our current study.We observe that the patterns of the columns of IC(1) (and, more generally, IC(k)) can also be computed in parallel [22], but HSL MI28 is a serial approach.
4.1. Max-plus parameters. Our algorithm for determining the incomplete max-plus pattern has three parameters: m, the maximum number of entries per column; ε, the max-plus drop tolerance; and δ, a tolerance that is applied to filter the final L factor. To be consistent with the IC(0) and IC(1) preconditioners, the latter is set to 10^{−3}.
To establish suitable settings for m and ε, we perform experiments on a subset of 15 matrices. This subset was chosen by ordering the test set in order of nnz(A) and then choosing (approximately) every 10th example. We present results in Table 4.2 for m = 10 and 20 with ε = 10^{−4}, 10^{−5}, and 10^{−6}. For comparison, the results for the other IC approaches are shown in Table 4.3. As m increases and ε decreases, more entries are included in the factors. The results show that typically the relaxation of ε has little effect on the size of the factors, although for some problems using ε ≤ 10^{−5} can significantly decrease the number of iterations required (e.g., AMD/G2 circuit, ND/nd6k, Janna/Bump 2911). We therefore choose to use ε = 10^{−6} in the rest of this paper. We observe that we also experimented with using ε = 0.0. For a small number of examples, this can further reduce the number of iterations (e.g., for Williams/cant, the count is cut from 2016 to 1617), but the time to compute the preconditioner increases significantly (for many of our tests, compared to using ε = 10^{−6}, the time for ε = 0.0 increases by more than 50 percent, and this is not fully offset by the reduction in the iteration count).
The effect of m is much more dramatic, both in terms of increased factor size and decreased number of iterations. When we consider the balance of these qualities in the number of memory accesses ma_pcg, the best result can go in either direction. As m = 10 always gives the sparsest factors, we choose m = 10 for the remainder of this paper. The combination m = 10, ε = 10^{−6} has the property that (on these 15 matrices) the size of the factors is always smaller than or commensurate with those produced by HSL MI28 with the selected settings for its input parameters.

4.2. Comparison with other IC preconditioners.
To assess the performance of the different preconditioners on our test set of 132 problems, we employ performance profiles [6]. The performance ratio for an algorithm on a particular problem is the performance measure for that algorithm divided by the best performance measure for the same problem over all the algorithms being tested (here we are assuming that the performance measure is one for which smaller is better, for example, the iteration count). The performance profile is the set of functions {p_i(f) : f ∈ [1, ∞)}, where p_i(f) is the fraction of problems where the performance ratio of the ith algorithm is at most f. Thus p_i(f) is a monotonically increasing function taking values in the interval [0, 1]. In particular, p_i(1) gives the fraction of the examples for which algorithm i is the winner (that is, the best according to the performance measure), while if we assume failure to solve a problem (for example, through the maximum iteration count or time limit being exceeded) is signaled by a performance measure of infinity, p*_i := lim_{f→∞} p_i(f) gives the fraction for which algorithm i is successful. Figures 4.1 and 4.2 present performance profiles for the iteration counts nitr and memory accesses ma_pcg, respectively. We use a logarithmic scale in order to observe the performance of the algorithms over a large range of f while still being able to discern in some detail what happens for small f. The highest number of failures (a third of the examples) results from using diagonal preconditioning, while HSL MI28 has only three failures. In terms of both iteration counts and memory accesses, HSL MI28 has the best performance, but the max-plus preconditioner also performs well and outperforms the level-based preconditioners.
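The performance-profile construction can be sketched as follows (an illustrative function of ours; it assumes every problem is solved by at least one algorithm, with failures encoded as infinity):

```python
def performance_profile(measures):
    """Build performance-profile functions from a table of performance
    measures (smaller is better).  measures[i][p] is the measure for
    algorithm i on problem p, with float('inf') marking a failure.
    Returns p(i, f): the fraction of problems on which algorithm i's
    ratio to the best measure is at most f."""
    nalg = len(measures)
    nprob = len(measures[0])
    best = [min(measures[i][p] for i in range(nalg)) for p in range(nprob)]
    ratios = [[measures[i][p] / best[p] for p in range(nprob)]
              for i in range(nalg)]

    def p(i, f):
        return sum(1 for r in ratios[i] if r <= f) / nprob

    return p
```

Note that p(i, 1) is the fraction of problems on which algorithm i wins, and p(i, f) for very large f tends to the success fraction, since failed runs have infinite ratio.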
5. Concluding remarks. We have described a novel approach to computing the sparsity pattern of an IC preconditioner, which makes use of max-plus algebra. Our numerical results demonstrate that this approach is able to produce effective preconditioners for use with the PCG method. It is outside the scope of the present study to develop an efficient implementation, and more work is needed to obtain a high-quality efficient parallel implementation that, in terms of the total solution time, can compete with simpler established IC preconditioners. The max-plus IC problem has some nice features that might lead one to think that this is possible. Importantly, each column can be computed independently using Algorithm 1, and having computed the max-plus preconditioner sparsity pattern, the parallel approach of Chow and Patel [3] can be used to (approximately) perform the incomplete factorization.
The process of computing the max-plus preconditioner sparsity pattern is remarkably similar to the process of computing a level of fill IC(k) preconditioner sparsity pattern. The level of fill sparsity pattern is computed using an unweighted graph G, such that the level of fill of the (i, j) entry in the Cholesky factor is equal to the length (counted in number of steps) of the shortest fill path through G from i to j. Just as in the max-plus case, the sparsity pattern for each column can be computed independently using a shortest path algorithm. The difference between the max-plus method and the level of fill method is that in the max-plus case the graph has weighted edges, and we seek the maximally weighted path rather than simply the path with the fewest steps. This extra flexibility allows the max-plus method to take into account the different-sized entries in the problem matrix and as a result produce better preconditioners. However, using a weighted graph means that we have to use Dijkstra's algorithm instead of a breadth-first search (the algorithm typically used for the level of fill method). A breadth-first search is generally a little faster than Dijkstra's algorithm as the data structure required to store the integer depths it requires is simpler. The exact costs of applying both methods depend strongly on the structure of the problem matrix and the parameters chosen for the preconditioner. For example, using larger thresholds in the max-plus method allows Dijkstra's algorithm to be terminated early, which speeds up the computation of the preconditioner. If we can develop an implementation of Algorithm 1 that is not significantly slower than the symbolic phase of IC(1), then this should result in an overall faster method for solving sparse positive-definite linear systems.
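For comparison, the breadth-first level-of-fill analogue of the max-plus column search can be sketched as follows (a sketch of ours, 0-based indices; only the distances to vertices i ≥ k are meaningful for column k, and the exact convention for converting path lengths to levels varies between implementations):

```python
from collections import deque

def fill_path_lengths(adj, k):
    """Shortest fill-path lengths from vertex k on the unweighted graph
    of A (0-based adjacency lists).  As in the max-plus column search,
    only the start vertex and vertices with index < k are expanded, so
    every reported target i > k is reached by a genuine fill path
    (interior vertices all precede k).  Returns -1 for unreachable
    vertices."""
    n = len(adj)
    dist = [-1] * n
    dist[k] = 0
    q = deque([k])
    while q:
        i = q.popleft()
        if i > k:
            continue  # endpoints are recorded but never expanded
        for j in adj[i]:
            if dist[j] == -1:
                dist[j] = dist[i] + 1
                q.append(j)
    return dist
```

The only structural change relative to the max-plus version is that the priority heap of Dijkstra's algorithm is replaced by a plain FIFO queue, which is exactly the simpler data structure referred to above.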
Finally, we note that the Factorized Sparse Approximate Inverse (FSAI) preconditioner that was introduced more than 20 years ago by Kolotilina and Yeremin [24] requires a pattern for the nonzero entries of the factors. Originally, this had to be set statically by the user (typically using small powers of A), although, more recently, dynamic schemes have been proposed (see, for example, [10, 23] for further details and references). Theory would suggest that the positions of the largest entries of L^{−1} would be the ideal choice for the sparsity pattern; it remains an open question whether we can use max-plus algebra in this case.

Proposition 3.2. Let H ∈ R^{n×n} be a symmetric positive-definite Hungarian matrix, and let H = V(H) ∈ R^{n×n}_max. Then the max-plus LU factors L and U of H satisfy U = L^T and

(3.1)   l_ik = perm H([1 : k − 1, i], 1 : k) − perm H(1 : k, 1 : k),   i ≥ k.

Table 4.2
Results for various values of the max-plus parameters m and ε for a subset of 15 matrices. Entries in bold are within 10% of the best.

Table 4.3
Results for other IC preconditioners for our subset of 15 matrices. A dash indicates failure to converge within our set limits.