Eﬃcient block preconditioning for a C1 ﬁnite element discretisation of the Dirichlet biharmonic problem

. We present an eﬃcient block preconditioner for the two-dimensional biharmonic Dirichlet problem discretised by C 1 bicubic Hermite ﬁnite elements. In this formulation each node in the mesh has four diﬀerent degrees of freedom (DOFs). Grouping DOFs of the same type together leads to a natural blocking of the Galerkin coeﬃcient matrix. Based on this block structure, we develop two preconditioners: a 2 × 2 block diagonal preconditioner (BD) and a block bordered diagonal (BBD) preconditioner. We prove mesh-independent bounds for the spectra of the BD-preconditioned Galerkin matrix under certain conditions. The eigenvalue analysis is based on the fact that the proposed preconditioner, like the coeﬃcient matrix itself, is symmetric positive deﬁnite and is assembled from element matrices. We demonstrate the eﬀectiveness of an inexact version of the BBD preconditioner, which exhibits near-optimal scaling in terms of computational cost with respect to the discrete problem size. Finally, we study robustness of this preconditioner with respect to element stretching, domain distortion and non-convex domains.

1. Introduction.The biharmonic operator is a key component in mathematical models of a number of important physical problems.It arises in plane strain and plane stress elasticity problems, where the solution is expressed in terms of an Airy stress function [32, p. 79], [37, p. 288], and in plate bending problems.It also occurs in the stream-function-vorticity formulation of two-dimensional Stokes flow [27].
The strong formulation of the Dirichlet biharmonic problem seeks the function u ∈ C 4 (Ω) that satisfies in the domain (x 1 , x 2 ) ∈ Ω ⊂ R 2 with piecewise smooth boundary ∂Ω and source function f ∈ L 2 (Ω) subject to the Dirichlet boundary conditions where ∂u ∂ n denotes the outward normal derivative and g 1 and g 2 are given functions.In the context of the plate bending problem the case g 1 = g 2 = 0 corresponds to a clamped boundary.
Numerical schemes for solving (1.1)-(1.2) either approach the problem directly, or reformulate it as a mixed formulation (i.e., solve a system of two second-order problems).The advantages of using the former approach include better asymptotic † School of Mathematics, The University of Manchester, Manchester, M13 9PL, UK, (jennifer.pestana@manchester.ac.uk, rlmuddle@googlemail.com, matthias.heil@manchester.ac.uk, francoise.tisseur@manchester.ac.uk) ‡ School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK (milan.mihajlovic@manchester.ac.uk) § These authors were supported by Engineering and Physical Sciences Research Council grant EP/I005293.The work of the fourth author was supported by a Royal Society-Wolfson Research Merit Award.
accuracy for the same level of grid resolution [1,Theorem 5.4], [10, Theorems 6.1.6and 7.1.6and p. 392], and a symmetric positive definite coefficient matrix for the discrete problem.Conversely, for the mixed formulation, discretisation (by a finite difference or finite element method, for example) results in a linear algebraic system that is symmetric, but indefinite.
In this paper we consider a conforming C 1 finite element approach [26], for which the standard weak form is to find u ∈ H 2 (Ω) satisfying (1.2) such that holds for all test functions v ∈ H 2 0 (Ω), where The discrete weak formulation is obtained by restricting (1.3) to a finite dimensional space S(Ω) ⊂ H 2 (Ω), for which we adopt a basis associated with the bicubic Hermite (Bogner-Fox-Schmit) finite elements [6, p. 72]; these are formed from a tensor product of one-dimensional Hermite polynomials.The C 1 continuity across element boundaries is ensured by assigning four degrees of freedom (DOF) to each node, corresponding to four different basis functions.
The finite element approximation of (1.3) is then obtained by solving a linear system Ax = b, where A ∈ R N ×N is a large, sparse and symmetric positive definite (SPD) matrix and b ∈ R N .Such systems are usually solved by iterative methods, with the conjugate gradient (CG) method a popular choice [16,Ch. 2].Grouping together the unknowns corresponding to the same DOF type leads to the following natural 4 × 4 blocking of the coefficient matrix where A ij ∈ R n×n , i, j = 1, . . ., 4, and N = 4n, where n is the number of interior nodes.Since the biharmonic operator is fourth order, the two-norm condition number of the matrix A behaves as κ(A) = O(h −4 ), where h is the mesh parameter (assuming uniform discretisation) and we find that mesh refinement generally has a detrimental effect on the convergence speed of the CG method.This problem can be rectified by effective preconditioning.
Block preconditioners with multigrid components have also been considered.Aksoylu and Yeter [2] develop preconditioners with blocks based on regions of high and low bending while Bjørstad [3] uses blocks arising from a separation of variables of a related problem.Peisker and Braess [29] use a blocking based on basis function types, as we do, but their preconditioner is based on a mixed formulation of the biharmonic problem.Other preconditioners for the mixed formulation use blocks associated with different differential operators and efficient preconditioners of this sort apply multigrid to the Dirichlet Laplacian blocks [31], or the Schur complement system [17].
In this paper we propose two novel preconditioners that are fully algebraic and are assembled from the element matrices in an analogous manner to the matrix A, making them easy to implement.The first of these preconditioners is a 2 × 2 block diagonal (BD) matrix.The positive definiteness of A and the assembly of the preconditioner from element matrices means that analysis based on the general ideas of Wathen [34], [35], [36] can be applied to demonstrate that mesh-independent convergence is guaranteed in certain cases.
The second preconditioner introduced in this paper is a computationally cheaper block bordered diagonal (BBD) approximation of the block diagonal preconditioner, that is feasible for larger problems, and that can be implemented in a cost-effective manner.For this second preconditioner we provide some spectral analysis.We then employ numerical experiments to demonstrate mesh independent convergence rates and show that it is possible to deploy off-the-shelf multigrid approximations for certain matrix blocks.
The paper is organised as follows.In Section 2 we discuss the finite element assembly process of the matrix A in (1.4) and relevant aspects of the conjugate gradient method.Section 3 describes the new block diagonal preconditioner.We characterise the eigenvalues of the preconditioned matrix and give conditions for mesh-independent convergence.However, the preconditioner is costly to apply.We therefore introduce a more practical block bordered diagonal preconditioner in Section 4, and provide an eigenvalue analysis.We propose an inexact version of the BBD preconditioner, which involves matrix lumping and algebraic multigrid approximation.Finally, we present numerical experiments in Section 5 that verify the effectiveness of the inexact BBD preconditioner and investigate its robustness with respect to changes in the domain and element shape.

Preliminaries.
In this section we describe the details of the finite element assembly process for the biharmonic problem and introduce the preconditioned conjugate gradient (PCG) method.
2.1.The finite element assembly process.The analysis of the spectra of the preconditioned matrices in later sections will be based on the fact that the finite element matrix A in (1.4) is assembled from element contributions.In this section we describe this assembly process.
We discretise (1.3) using C 1 Hermite finite elements, defined in a reference domain with local co-ordinates (s 1 , s 2 ) ∈ Ω = [−1, 1] 2 .The solution within the element is represented as where U jk are the unknown coefficients and ψjk are the reference Hermitian basis functions.The subscript j represents the node number and k enumerates the DOF type, such that at node j, U jk interpolates u, ∂u ∂s1 , ∂u ∂s2 and ∂ 2 u ∂s1∂s2 for k = 1, . . ., 4, respectively.The same basis functions are used to isoparametrically map the reference element to the actual element Ω e .
Consider now a finite element discretisation of the domain Ω consisting of M elements and let A e ∈ R 16×16 , e = 1, . . ., M , be the biharmonic element matrices associated with these elements.The matrices A e are symmetric positive semidefinite and each entry is of the form where i = 4(i 2 −1)+i 1 , j = 4(j 2 −1)+j 1 and J e is the element Jacobian.Consequently, multiplying A e by a vector u ∈ R 16 , with elements u j = u j1j2 , is equivalent to computing integrals of linear combinations of basis vectors, that is, Thus, the nullspace vectors of A e can be thought of in terms of linear combinations of certain basis functions.These nullspace basis functions are harmonic functions, i.e., functions for which the Laplacian is zero (see (1.3)).It is straightforward to verify that a basis for these harmonic functions is from which the nullspace of A e can be computed.Now let us describe the assembly process of (1.4) mathematically.We introduce the matrix L e ∈ R 16×N that maps the entries of A e to entries of A. Then where and diag(A e ) is a block diagonal matrix of element matrices A e , i = 1, . . ., M .The matrix diag(A e ) is related to the differential operator and the choice of basis functions, while L provides information about the geometry and boundary conditions.During this assembly process, unknowns corresponding to the same DOF type are grouped together and this leads to the natural blocking of the coefficient matrix as: (2.4) The unknown vector x and the right-hand side b are blocked accordingly.

2.2.
The conjugate gradient method.The conjugate gradient method (CG) is perhaps the best known Krylov subspace method for solving sparse linear systems, and is suitable for systems with an SPD coefficient matrix.The relative error after k iterations of CG is bounded by [16, p. 51] , where e (k) = x − x (k) and α(A) = λ max (A)/λ min (A).Since A is symmetric positive definite α(A) corresponds to the 2-norm condition number κ(A).As mentioned in the introduction and verified numerically in Table 2.1, κ(A) = O(h −4 ).Although this bound may be pessimistic, we do see a deterioration in convergence speed of the CG solver as the mesh is refined (see the computations in Section 5).
The effective condition number [4], [30] can better describe the effect of perturbations of A and the right-hand side b on the solution x, but does not describe the convergence rate of the conjugate gradient method (which is determined by a complex interaction between the spectrum of A and the right-hand side).Li, Huang and Huang [24] have shown that for the biharmonic equation and Hermite elements the effective condition number is O(h −3.5 ) for general problems but can be as low as O(1) for certain boundary conditions.We observe this O(1) behaviour for the homogeneous Dirichlet biharmonic problem (1.1) with f = 1 when square, stretched or deformed elements are used (see Figure 5.

2).
The problem of slow convergence rates can be alleviated by solving an equivalent preconditioned system P − 1 2 AP − 1 2 y = P − 1 2 b with x = P − 1 2 y, where P ∈ R N ×N is SPD.Note that the CG algorithm itself requires only a linear system solve with P at each iteration, i.e., the matrix P − 1 2 is never explicitly formed.The error of the preconditioned CG iterates can be bounded by The error bound shows that the convergence of the CG method is accelerated when the condition number of the preconditioned matrix P −1 A is small.It can also be shown that fast convergence rates are achieved when the eigenvalues belong to a small number of tightly bounded clusters (see, for example, [16,Section 3.1]).If the eigenvalues of P −1 A can be bounded independently of the mesh size h (and possibly other problem parameters) then P is an optimal preconditioner, in the sense of convergence of the conjugate gradient method.If, in addition, linear systems involving P can be solved in a manner that scales linearly with the problem size then we have an optimal solver.
3. An ideal preconditioner.We first consider the block diagonal preconditioner Since any principal submatrix of an SPD matrix A is itself SPD [21, p. 397], the preconditioner P BD ∈ R N ×N is also symmetric positive definite.Additionally, P BD is formed from a subset of the block matrices A ij of A and so it is possible to assemble P BD from the element matrix contributions in a manner analogous to that described in Section 2.1.Thus, where P e is obtained from A e , with values that would be assembled into A i4 or A T i4 set to zero for i = 1, 2, 3.The element contribution to the preconditioner (henceforth, the element preconditioner) P e , like A e , is symmetric positive semidefinite, but it has rank 11 rather than 8. Straightforward computation shows that the nullspace of P e is spanned by vectors corresponding to 1, . Note that the last of these functions is a combination of the last three functions in (2.1).Consequently, the nullspace of diag(P e ) is contained in the nullspace of diag(A e ) as stated in the following lemma, which will be relevant in the subsequent analysis.
We investigate analytically the spectral properties of P −1 BD A. For convenience we introduce the notation Then the eigenvalues of P −1 BD A are characterised by the following theorem.Theorem 3.2.Assume that rank(B) = r in (3.3).Then P −1 BD A ∈ R N ×N , with A and P BD given by (1.4) and (3.1), respectively, has N − 2r unit eigenvalues.The remaining 2r eigenvalues λ satisfy where µ max ∈ (0, 1) is the largest eigenvalue of BD , which is symmetric positive definite, any eigenvalue λ of P −1 BD A is real and positive.Using (3.3), we see that λ satisfies where u ∈ R 3n and v ∈ R n are not simultaneously zero and N = 4n.If λ = 1 then (3.4) implies that Bv = 0, i.e., that v = 0 or v ∈ null(B).We can find n − r linearly independent vectors in null(B) for which (3.4) and (3.5) are satisfied with u = 0. Otherwise, v = 0 and it follows from (3.5) that u ∈ null(B T ).Since we can find 3n − r linearly independent vectors in null(B T ), we have that one is an eigenvalue of P −1 BD A with multiplicity 4n − 2r = N − 2r.If λ = 1 then (3.4) implies that (λ − 1) −1 A −1 Bv = u and substituting for u in (3.5) gives that From this we see that non-unit eigenvalues λ of P −1 BD A are given by λ = 1 + √ µ and [21,Theorem 7.7.7].The result follows.
The rank of B is at most n, so at least 2n eigenvalues are equal to one, while the largest non-unit eigenvalue is less than two, regardless of the mesh size.The focus of the remainder of this section is to bound the smallest non-unit eigenvalue, since doing so ensures mesh-independent convergence.
To bound the smallest eigenvalue of P −1 BD A we adapt the analysis of Wathen [34], [35], [36] to our case.The basic idea is to determine the eigenvalues of the preconditioned element matrix diag(P e ) −1 diag(A e ) and to then obtain mesh-independent bounds using Rayleigh quotients.In our case this approach is complicated by the fact that the singular matrices diag(A e ) and diag(P e ) have nullspaces of different dimensions.However, we can still apply the general methodology since we know from Lemma 3.1 that null(diag(P e )) ⊂ null(diag(A e )) ⊂ R 16M .
To deal with the different nullspaces involved it is useful to introduce certain subspaces of R 16M .Specifically, we define: R := range(diag(A e )), Z := null(diag(A e )), where N ⊥ is the space of all vectors orthogonal to vectors in N .With these spaces, R 16M = R + N + M, with N ⊂ Z. Furthermore, since the matrices diag(A e ) and diag(P e ) are block diagonal, the basis vectors of R, N , Z and M can be constructed from their element contributions.
In addition to the spaces defined above, we require the following lemma that shows that nonzero vectors in R N cannot be mapped to Z by the connectivity matrix.
y T diag(A e )y y T diag(P e )y .
Let y = y R + y M + y N , where y R ∈ R, y N ∈ N , and y M ∈ M with R, N and M defined in (3.6).Lemma 3.3 shows that y R = 0 and so This appears problematic because, without any restriction on the size of y M , the smallest eigenvalue λ min (P −1 BD A) could asymptotically tend to zero.To prevent this, we must somehow bound the size of the denominator of (3.7).This is achieved by the next result, provided that y T R diag(P e )y R ≥ δ y T M diag(P e )y M for some δ ≥ δ * > 0, a condition that we verified numerically in Table 3.3 for the regular and stretched grids depicted in Figure 3.1.
Lemma 3.4.Let y = Lx, x ∈ R N , x = 0 be decomposed as where y R ∈ R, y N ∈ N and y M ∈ M with R, N and M defined in (3.6).Additionally, assume that for some δ ≥ δ * > 0. Then where ζ = 2 (1 + 1/δ) .Proof.From Lemma 3.3 we know that y R = 0. Since diag(P e ) is symmetric positive semidefinite it has a semidefinite square root and there are vectors Now, for any vectors a and b of the same dimension Combining (3.11) with (3.9) gives (3.10).
We have been unable to prove that (3.9) holds for all meshes for the Dirichlet biharmonic problem, since it does not appear straightforward to remove the influence of the connectivity matrix L. However, there is strong numerical evidence to suggest that the assertion holds.In particular, let P R and P M be orthogonal projectors onto R and M, respectively.Then for any vector y = Lx, x = 0, where The value of δ min is tabulated for a sequence of uniformly refined meshes of square elements in Table 3.1.From this we see that for square elements δ min appears to tend to 1.05, so that (3.9) is satisfied for uniformly refined meshes of square elements with δ > 1.05.
With these results in hand, we now bound the smallest eigenvalue of P −1 BD A. Under the assumption (3.9), we combine the decomposition (3.8) with Lemma 3.4 to give that, for any y = Lx, x = 0, where by Lemma 3.3, y R = 0.It follows from (3.7) that Since diag(P e ), diag(A e ) and the projector P R onto R are block diagonal, the above minimisation over all nonzero y R can be carried out using individual element matrices.We computed the minimum for our element matrices and found that θ in (3.14) is larger than 0.046 for square elements.Since ζ < 3.91 for square domains and square elements, we have that λ min (P −1 BD A) > 0.0118.Combining (3.9) with Theorem 3.2 gives the following bounds on the eigenvalues of P −1 BD A. Corollary 3.5.Let A and P BD be as in (1.4) and (3.1) and assume that (3.9) holds.Then for square domains and square elements the eigenvalues λ of P −1 BD A satisfy 0.0118 < λ ≤ 2 and κ 2 (P −1 BD A) < 170 independently of the mesh spacing parameter h.
Comparison with Table 3.1 shows that the bounds in Corollary 3.5 are pessimistic.However, combined with the high multiplicity of the unit eigenvalue, they show that we can expect fast convergence of preconditioned CG whenever (3.9) is satisfied.
We also tested assumption (3.9) for rectangular domains using elements that are stretched in the x 1 direction, with a denoting the ratio of the length of the horizontal side to the length of the vertical, as shown in Figure 3.1.We see from Table 3.2 that δ min decreases as the aspect ratio increases but that, for a fixed aspect ratio, δ min seems to tend to a constant as the mesh is refined.On the other hand, θ in (3.14) actually increases with a (see Table 3.3).The net result is the eigenvalue bound θ/ζ in Table 3.3 that slowly decreases as the aspect ratio increases but is asymptotically independent of the mesh width, and that qualitatively captures the behaviour of the smallest eigenvalue.4. A practical preconditioner.Although the preconditioner P BD in (3.1) has favourable spectral properties, it is prohibitively expensive to apply for large problems, since it requires the solution of linear subsystems involving the 3n × 3n matrix A. We will now investigate the block bordered diagonal (BBD) preconditioner that is formed by omitting A 23 and A T 23 from P BD .Unlike P BD , the symmetric positive definiteness of A is not enough to guarantee that P BBD is positive definite.However, P BBD was found to be positive definite in all the numerical experiments (performed with square, stretched and deformed meshes) presented in Section 5 below.Compared to the even simpler block Jacobi preconditioner P BBD retains the coupling between u and both first derivative ( ∂u ∂s1 and ∂u ∂s2 ) DOFs.We will see in Section 5 that this is essential to obtaining low and asymptotically constant iteration counts as the mesh is refined.
The action of P −1 BBD on a vector can be computed by means of the unsymmetric UL decomposition where Note that the solve U w = v can be performed in a block parallel manner.The remainder of this section is devoted to understanding the spectral properties of P −1 BBD A and deriving an approximation that can be implemented in a cost-optimal manner.4.1.Eigenvalue analysis.The block structure of P BBD and, in particular, the indefiniteness of the element matrices used in its assembly prevent us from applying the previously introduced analysis to bound the spectrum of P −1 BBD A. Instead, we consider the eigenvalues of P −1 BBD P BD and then apply the bounds where v = 0, Proof.In the notation of (3.3) where I n is the identity matrix of dimension n.This shows that 1 is an eigenvalue of P −1 BBD P BD with multiplicity at least n.
To obtain the remaining 3n eigenvalues let us further partition A and A as where F and G are as in (4.7).Then, the result is obtained by a straightforward extension of Theorem 3.1 of Dollar et al. [13] to the case of rank-deficient F , which we sketch out for completeness.The eigenvalues η of A −1 A satisfy where u ∈ R n and v ∈ R 2n are not simultaneously zero.From (4.8) we see that either η = 1 or , we find that there are n − s linearly independent vectors v 1 ∈ null(A 23 ) for which (4.8) and (4.9) are satisfied with v 2 = u = 0. Similarly, there are n−s linearly independent vectors v 2 ∈ null(A T 23 ) for which (4.8) and (4.9) are satisfied with v 1 = u = 0. Otherwise, v = 0 and we can find n linearly independent vectors u = 0. Combining these results shows that η = 1 with multiplicity 3n − 2s.If η = 1 then u = −A −1 11 F T v and substituting into (4.9)gives (4.6).
Similarly to Theorem 3.2 we see that λ = 1 is an eigenvalue of P −1 BBD P BD with high multiplicity.However, we have been unable to bound the remaining 2s eigenvalues.In spite of this, combining Lemma 4.1 with Corollary 3.5 and (4.5) shows that most of the eigenvalues of P − The extreme eigenvalues of P −1 BBD A are given in Table 4.1 as a function of the problem size N .From this we see that these eigenvalues do not differ greatly from the extreme eigenvalues of P −1 BD A (see Table 3.1), and in practice little is lost in terms of the asymptotic convergence speed by using a more practical preconditioner.The numerical evidence in Table 4.1 suggests that the extreme eigenvalues of P −1 BBD A appear to be bounded under mesh refinement, although we have been unable to prove this analytically.

Further simplifications.
Although the block decomposition (4.3) allows the efficient application of P BBD , to achieve a preconditioner with optimal cost we require optimal solvers for linear systems involving the principal diagonal blocks S where lump(H) = {h ij } with Then for uniformly refined meshes of square elements the eigenvalues of L − .All six of these element matrices are SPD.As a result, we can use the approach of Wathen [34] to prove the result.(Recall that a similar result was used in Section 3, where we had to deal with singular element matrices.) Remark 2. This result is not surprising since the spectrum of A 22 = A 33 resembles that of a scaled mass matrix, and for such matrices lumping often gives spectrally equivalent operators.In particular, for our problem λ(A 22 ) = λ(A 33 ) ∼ O(h −4 )λ(M ), where M is a mass matrix; for uniform grids this can be verified using Fourier analysis, similarly to the approach in [14,Section 1.6].Additionally, on a uniform mesh all entries of A 22 and A 33 are nonnegative.
Remark 3.For the stretched grids used in the numerical experiments in Section 5 below, L 22 , L 33 and diag(A 44 ) are still spectrally equivalent to A 22 , A 33 and A 44 .However, the spectral equivalence bounds deteriorate as the aspect ratio increases.For example, when a = 2.5 the eigenvalues of L − where (4.12) The block S 11 in (4.12) is a sparse approximation of the Schur complement S 11 from (4.4), owing to the diagonal approximations (4.10), and can be assembled cheaply.
To apply the preconditioner P BBD within the preconditioned CG algorithm we must solve systems with S 11 , L 22 , L 33 and diag(A 44 ).The last three matrices are diagonal, and hence trivial to invert.For systems with S 11 we consider two approaches: an LU factorisation, which yields an exact solution but is not computationally optimal, or two V(2,2)-cycles of classical algebraic multigrid (AMG) with point Gauss-Seidel smoothing and Ruge-Stüben coarsening [25], which has optimal cost but leads to an inexact solution (cf.Table 5.1).Using these approximations for the Schur complement subsystem, we obtain the preconditioners P

[LU ]
BBD and P in which the Schur complement subsidiary system is solved using an LU factorisation and AMG, respectively.In Table 4.2 we present the spectral properties of the preconditioned operators P A. These results suggest that the spec- A is bounded under mesh refinement, as we might expect from the spectral equivalence bounds in Lemma 4.3, although the eigenvalues are not as tightly clustered as those of P −1 BBD A in Table 3.1.However, the smallest eigenvalue of A decreases with mesh refinement; that is, the AMG approximation is not spectrally equivalent to S 11 .

Numerical experiments.
In this section we examine the effectiveness of the preconditioners P BD , P BBD and P BBD at reducing the number of conjugate gradient A for a sequence of uniformly refined grids.For the AMG solver we use the HSL routine MI 20 [5,22].Note that we were unable to obtain the eigenvalues of the largest P iterations and the computational time.Additionally, we investigate the robustness of their performance with respect to stretching of the finite elements as well as deformations and non-convexity of the domain.Throughout, we choose the homogeneous Dirichlet boundary conditions g 1 = g 2 = 0 in (1.2).Our default domain is the unit square domain Ω = [0, 1] 2 discretised by a uniform grid of square elements.Although we note that for finite element problems the stopping criterion for CG should be tied to the discretisation error, to demonstrate mesh independence we terminate the preconditioned CG method when the residual decreases in norm by six orders of magnitude, that is, r (k) 2 ≤ 10 −6 r (0) 2 .All AMG results in this section are obtained with two V(2,2)-cycles using Ruge-Stüben coarsening and point Gauss-Seidel smoothing.We note that different AMG methods may give different results.However, our aim is to develop an effective preconditioner that is easy to implement, and so we choose off-the-shelf codes that generally work well for finite element problems [5].Thus, for the smaller experiments in Tables 5.1 and 5.2 we use the HSL code MI20 [5,22] with default options, except that we change the coarsening criterion c fail from 1 to 2 (and alter the number of V-cycles).For all the other experiments we use Hypre's BoomerAMG [20].
We first compare preconditioned CG iterations for P BD , P BBD , P for smaller problems using Matlab.For comparison, we also present iteration counts for the block Jacobi preconditioner (4.2) and AMG applied as a preconditioner to the entire block re-ordered matrix A from (1.4).We stress that preconditioners P BBD in (4.3) and P BBD in (4.11) are parallelisable, like the block Jacobi preconditioner P J in (4.2), as discussed in Section 4. Since the problems considered here are of relatively small dimension, in addition to measuring the norm of the residual we computed the relative error x − x (k) A / x A in the energy norm at termination, which we found to be uniformly smaller than 1.8 × 10 −7 .Computations were performed with different right-hand-sides b: we used the right-hand-side from the finite element discretisation of (1.1) for f = 1 and a random right-hand side b.Both choices result in similar behaviour, which shows that the convergence behaviour does not depend on the regularity of the forcing term.Consequently, only results for f = 1 are presented.
The results are given in Table 5.1, from which we see that without preconditioning the number of CG iterations increases rapidly, and appears to grow as O(h −2 ).The application of the AMG and block Jacobi preconditioners reduces the number of iterations somewhat, but convergence is still mesh dependent.This is not surprising since the condition numbers of these preconditioned matrices increase as the mesh is refined, as shown in Table 5.2.(Note that we were unable to obtain the eigenvalues of the largest AMG preconditioned matrix because of memory constraints.)Con- versely, the number of iterations required for P BD , P BBD and P [LU ] BBD do not increase markedly with mesh refinement, and our experiments later in this section for larger problems indicate asymptotically mesh-independent convergence.This is in line with the spectral analysis in previous sections and the computed eigenvalues in Tables 3.1, 4.1 and 4.2.
To explore the asymptotic behaviour of P BBD for larger problems, Figure 5.1 shows the number of iterations required for convergence of preconditioned CG, and solution times, for P BD and the three block bordered diagonal preconditioners (P BBD , P

[LU ]
BBD and P [AM G] BBD ).These results were obtained using a C++ implementation in oomph-lib [18] with SuperLU [12] for the direct solver.Note that times for the unpreconditioned system, and for the block Jacobi and full AMG preconditioned systems, were similar to or larger than those of the direct method, and had poor asymptotic behaviour.For this reason timings for these preconditioners are not shown in Figure 5.1.
We see from Figure 5.1 that P BD and P BBD give mesh independent convergence and that for both preconditioners the time to solution is lower than for the direct method.The use of P

[LU ]
BBD instead of P BBD leads to a slight increase in iteration counts but mesh independence is retained.Furthermore, the solution times in Figure 5.1 show that this increase in iterations is more than compensated for by the drastically reduced computational cost of applying the preconditioner.A further improvement can be achieved by replacing P  4.2, using AMG still reduces the solution times.This is due to the optimal cost of the AMG solver.Moreover, the solution times for P wall clock time for a simple test problem.We will now evaluate the robustness of our preconditioners for problems with stretched grids and domains that are non-square and non-convex.Stretched grids are needed, for example, for accurate computations of biharmonic eigenfunctions near the corners of the domain (see [8]). Figure 5.2 illustrates the three tests considered: 1. Stretched Elements.The domain is stretched in the x 1 direction and the deformation is described by the aspect ratio a. 2. Distorted Elements.The top right corner is stretched upwards in the x 2 direction; the ratio of the heights of the vertical boundaries is parameterised by b. 3. Curved Domain.The rectangular domain is isoparametrically deformed to form a non-convex curved domain.In all tests we used the same number of elements in each coordinate direction.
We start by examining the effect of stretching the grid, as in the left of Figure 5.2, on the preconditioners.The analysis in Section 3 (for P BD and P BBD ) suggests that an increase in the element aspect ratio is likely to have a detrimental effect on the effectiveness of the preconditioners.This is confirmed by Figure 5.3 which shows the iteration counts and solution times for a stretch ratio of a = 2.5.Element stretching leads to a slight increase in the iteration counts and the solution times for P BD and P BBD , though the asymptotic convergence rates obtained with these preconditioners remain mesh independent.As expected, the two inexact implementations of the block-bordered preconditioner, P BBD , are affected more strongly.We attribute this to the fact that for sufficiently large stretch ratios A 22 and A 33 have negative entries, which implies that the use of lumping and diagonal approximations in these preconditioners is less effective (cf.Remark 3).P [AM G] BBD is most sensitive because this preconditioner is also affected by the behaviour of AMG on stretched meshes [11].However, despite the noticeable increase in iteration counts, the plot of the solution times shows that the two inexact preconditioners remain significantly faster and scale better than the two exact preconditioners or the direct solver.In fact, over the range of problem sizes considered here, P [AM G] BBD performs best.Figure 5.4 illustrates the effect of element stretching on the CG convergence histories.For all four preconditioners, an increase in the element aspect ratio can be seen to lead to a decrease in the convergence rates (again consistent with the eigenvalue computations in Sections 3 and 4).In all cases the norm of the scaled residual starts with a value of one but jumps to a much larger value during the first CG iteration.Subsequently it decreases approximately linearly on a semi-log scale as the CG iteration proceeds.This implies that a reduction in the CG convergence tolerance will result in a controlled increase in the number of iterations required to achieve a solution of the desired accuracy.BBD now yields the shortest execution times.
Finally, Figure 5.6 illustrates the performance of the preconditioners for the case of the curved, non-convex domain shown in Figure 5.2.Here the trends are similar to those observed for the case of stretched grids.In particular, we observe meshindependent convergence for P [LU ] BBD but with higher iteration counts than for square elements.This again suggests issues with lumping/diagonal approximations for the show some signs of saturation, and this preconditioner leads to the shortest execution times overall, closely followed by P [LU ] BBD .6. Conclusions.We have presented effective preconditioners for the C 1 finite element discretisation of the Dirichlet biharmonic problem using Hermitian bicubic elements.The preconditioners are easy to set up as they only involve operations on blocks that are readily extracted from the full system; these blocks can also be computed from element matrices.On uniform meshes both the block diagonal and block bordered diagonal preconditioners appear to give mesh independent convergence.Moreover, we analysed the spectrum of block diagonal and block bordered diagonal preconditioners and showed that, under a certain condition, the block diagonal preconditioner P BD gives mesh independent convergence; the required condition holds for the uniform and stretched meshes tested here.Our analysis of the block diagonal preconditioner uses the approach of Wathen [34], [35], [36], which assumes that the coefficient matrix and preconditioner are symmetric positive definite and are assembled from element matrices.As such, Wathen's appealing technique is applicable to other finite element discretisations, differential operators and preconditioners.
To obtain a cost-optimal implementation, we further simplified the block bordered diagonal preconditioner P BBD by lumping certain block matrices and using AMG for the approximate solution of a sparse Schur complement subsidiary linear system.We tested this approximate preconditioner on square, stretched and distorted elements and on non-convex domains.In all cases we observed mesh-independent convergence for P BD , P BBD and P does not give mesh independent convergence, in many cases it gives the fastest execution time.However, stretching or distorting elements increased both the iteration counts and wall clock times, particularly for the AMG version of the preconditioner.An alternative to the current AMG solver could alleviate this issue.For example, in geometric multigrid, line smoothing is known to improve performance in the case of stretched or distorted meshes.It would be interesting to investigate this issue further.

Lemma 3 . 3 .
If x ∈ R N is a nonzero vector then Lx ∈ Z, where L, diag(A e ), diag(P e ) and Z are defined by (2.3), (2.2), (3.2) and (3.6), respectively.Proof.Both A and P BD are positive definite, which implies that for any x = 0, Ax = L T diag(A e )(Lx) = 0 and P BD x = L T diag(P e )(Lx) = 0.We know from Lemma 3.1 that null(diag(P e )) ⊂ null(diag(A e )) = Z, so Lx ∈ Z.Both A and P BD are positive definite and so λ min (P −1 BD A) has the variational characterisation[28, Chapters 1 and 15]

2 Fig. 3 . 1 .
Fig. 3.1.Stretched elements.The domain is stretched in the x 1 direction and the deformation is described by the aspect ratio a.

4 . 1 .
) which follow from the Courant-Fischer theorem[21, Theorem 4.2.11], in conjunction with the bounds in Corollary 3.5.The eigenvalues of P −1 BBD P BD are given in the following lemma.Lemma Let rank(A 23 ) = s.Then 1 is an eigenvalue of P −1 BBD P BD with multiplicity N − 2s while the remaining 2s eigenvalues η satisfy that although the iteration count increases significantly, as expected from the eigenvalue computations in Table

Fig. 5 . 1 .Fig. 5 . 2 .
Fig. 5.1.Number of preconditioned CG iterations (left) and solution times (in seconds).The execution time of the direct method SuperLU applied to the system with coefficient matrix (1.4) is presented for comparison.The legend applies to both plots.

Fig. 5 . 3 .
Fig. 5.3.Number of CG iterations (left) and solution time in seconds (right) for robustness test 1 (stretched elements) with aspect ratio a = 2.5.The execution time of the direct method SuperLU applied to the system with coefficient matrix (1.4) is presented for comparison.The legend applies to both plots.

Figure 5 .
5 shows the iteration counts and solution times for the deformed domain shown in the middle of Figure 5.2 (b = 1.5).In this case P BD , P BBD , and P [LU ] BBD yield mesh independent convergence rates, whereas the number of iterations obtained with P [AM G] BBD appears to increase linearly with the problem size.While P [AM G] BBD is still much faster than the exact preconditioners and the direct solver, P [LU ]

Fig. 5 . 6 .
Fig. 5.6.Number of CG iterations (left) and solution time in seconds (right) for robustness test 3 (curved non-convex domain).The execution time of the direct method SuperLU applied to the system with coefficient matrix (1.4) is presented for comparison.The legend applies to both plots.

Table 2 . 1
Extremal eigenvalues and 2-norm condition number of A for uniform meshes as a function of the problem size N .

Table 3 . 1
Smallest (λ min ) and largest (λmax) eigenvalues of the preconditioned operator P −1 BD A, rank(B) from Theorem 3.2 and δ min from (3.12) for a sequence of uniformly refined grids and square elements.

Table 3 . 2
Smallest (λ min ) and largest (λmax) eigenvalues of the preconditioned operator P −1 BD A and δ min from (3.12) for stretched grids with different aspect ratios a.

Table 3 . 3
The values of δ * and ζ in Lemma 3.4, θ in (3.14) and the lower bound θ/ζ on λ min (P −1 BD A) for stretched elements with different aspect ratios a.

Table 4 . 1
Computed smallest (λ min ) and largest (λmax) eigenvalues of the preconditioned operator P −1 BBD A as well as rank(A 23 ) for a sequence of uniformly refined grids of square elements.
1 BBD A lie in a bounded interval.Corollary 4.2.When square elements are used in a square domain and (3.9) is satisfied at least N − 2s eigenvalues of P −1 BBD A lie in (0.0118, 2).Any remaining eigenvalues lie in (0.0118η min , 2η max ), where η min and η max are the smallest and largest eigenvalues of the generalised eigenvalue problem (4.6).Remark 1. Analogous results hold for rectangular domains and stretched elements if we replace 0.0118 in Corollary 4.2 by the appropriate bound θ/ζ on λ min 11 , A 22 , A 33 and A 44 .First, we consider spectrally equivalent approximations of A 22 , A 33 and A 44 .
1 22 A 22 and L −1 33 A 33 are contained in [1/3, 1] while the eigenvalues of diag(A 44 ) −1 A 44 are contained in [0.43, 1.24].Proof.The matrices A 22 , A 33 and A 44 are assembled from 4 × 4 submatrices A of the element matrix A e .Additionally, the approximations L 22 and L 33 are assembled from lumped versions of A 1 22 A 22 lie in [0.04, 1], the eigenvalues of L −1 33 A 33 lie in [0.21, 1] and the eigenvalues of diag(A 44 ) −1 A 44 lie in [0.28, 2.4].Some off-diagonal elements of A 22 and A 33 were found to be negative even for an aspect ratio of a = 1.5.Using (4.3) and Lemma 4.3 we approximate P BBD by

Table 5 .
1 CG iteration counts for the unpreconditioned system, and preconditioned CG iterations counts for several different preconditioners: AMG applied to the whole matrix A in (1.4), P J , P BD , P BBD and the two inexact versions P