REDUCTION OF MATRIX POLYNOMIALS TO SIMPLER FORMS

Abstract. A square matrix can be reduced to simpler form via similarity transformations. Here “simpler form” may refer to diagonal (when possible), triangular (Schur), or Hessenberg form. Similar reductions exist for matrix pencils if we consider general equivalence transformations instead of similarity transformations. For both matrices and matrix pencils, well-established algorithms are available for each reduction, which are useful in various applications. For matrix polynomials, unimodular transformations can be used to achieve the reduced forms, but we do not have a practical way to compute them. In this work we introduce a practical means to reduce a matrix polynomial with nonsingular leading coefficient to a simpler (diagonal, triangular, Hessenberg) form while preserving the degree and the eigenstructure. The key to our approach is to work with structure preserving similarity transformations applied to a linearization of the matrix polynomial instead of unimodular transformations applied directly to the matrix polynomial. As an application, we illustrate how to use these reduced forms to solve parameterized linear systems.

1. Introduction. Almost all matrices in C^{n×n} can be reduced to diagonal form via a similarity transformation. (The exceptions constitute the measure-zero set of defective matrices.) Furthermore, all matrices in C^{n×n} can be reduced to triangular and upper Hessenberg form via unitary similarity transformations. For matrices in R^{n×n}, we have similar results with the difference that we now have quasi-diagonal and quasi-triangular forms instead of diagonal and triangular forms. Here the prefix "quasi" means that all diagonal blocks are of size 1 × 1 or 2 × 2. Now, consider matrix polynomials with a nonsingular leading coefficient

(1) P(λ) = λ^ℓ P_ℓ + · · · + λP_1 + P_0, with det(P_ℓ) ≠ 0,

over F = C or R. Is it possible to reduce such matrix polynomials to the simpler forms mentioned above while preserving the degree and the eigenstructure, that is, the eigenvalues and their partial multiplicities? If we use only similarity transformations, the answer is, in general, no. Even if we use the broader class of strict equivalence transformations, that is, multiplication by nonsingular constant matrices from the left and the right, it is, in general, not possible. Indeed, if there were to exist nonsingular matrices E and F such that EP(λ)F = T(λ) is triangular, say, with ℓ > 1 and det(P_ℓ) ≠ 0, then the family of matrices (P_ℓ^{−1}P_{ℓ−1}, . . . , P_ℓ^{−1}P_1, P_ℓ^{−1}P_0) would be simultaneously triangularizable by similarity. This would imply (see, for example, [7, Thms. 2.4.8.6 and 2.4.8.7]) that for all i ≠ j, i, j = 0, 1, . . . , ℓ − 1, the eigenvalues of P_ℓ^{−1}P_i P_ℓ^{−1}P_j − P_ℓ^{−1}P_j P_ℓ^{−1}P_i are all equal to zero. This is a very restrictive condition. A type of transformation that gives us a sufficient amount of freedom while preserving the eigenstructure is multiplication by unimodular matrix polynomials.
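For a quadratic example (ℓ = 2), this necessary condition is easy to probe numerically. The following Python sketch (an illustration we add here, not code from the paper; NumPy stands in for the MATLAB used elsewhere) checks the eigenvalues of P_2^{−1}P_0 · P_2^{−1}P_1 − P_2^{−1}P_1 · P_2^{−1}P_0 for randomly chosen coefficients:

```python
import numpy as np

def commutator_eigs(P2, P1, P0):
    """Eigenvalues of A0 A1 - A1 A0 with Ai = P2^{-1} Pi.

    By the simultaneous-triangularizability criterion cited in the text,
    these must all be zero if strict equivalence could triangularize P."""
    A1 = np.linalg.solve(P2, P1)
    A0 = np.linalg.solve(P2, P0)
    return np.linalg.eigvals(A0 @ A1 - A1 @ A0)

rng = np.random.default_rng(0)
n = 4
P2, P1, P0 = (rng.standard_normal((n, n)) for _ in range(3))
eigs = commutator_eigs(P2, P1, P0)
# For generic coefficients the condition fails: some eigenvalue is far from 0.
print(np.max(np.abs(eigs)))
```

For generic coefficients the largest eigenvalue modulus is far from zero, confirming that strict equivalence alone cannot triangularize a typical quadratic matrix polynomial.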
A matrix polynomial U (λ) ∈ F[λ] n×n is said to be unimodular if det U (λ) ∈ F \ {0}, and two matrix polynomials that differ only by multiplication by unimodular matrix polynomials (from the left and the right) are said to be equivalent. It was shown in [16] and [17] that unimodular transformations are enough to reduce any square matrix polynomial to triangular form over C and quasi-triangular form over R, while preserving the degree. Of course, this includes the case of Hessenberg form since (quasi-) triangular matrices are also Hessenberg. Further, it is a straightforward exercise to show that any complex/real matrix polynomial with semisimple eigenstructure is equivalent to a diagonal/quasi-diagonal matrix polynomial of the same degree.
The reduction to diagonal form has applications in structural engineering, where it has been used to decouple systems of second-order differential equations (see, for example, [3] and [12]). In applications where parametrized linear systems of the form P (ω)x = b(ω) with P as in (1) need to be solved for many values of ω over a large range, it may be useful to first reduce P to simpler form before solving the linear systems (see section 5).
How can we compute these simpler forms in practice? The approach taken in the recent paper [2] is to define a pseudoinner product on the vector space F(λ)^n, where F(λ) is the field of rational functions. Then a Krylov-like subspace method is applied to the matrix polynomial to reduce it to Hessenberg form. In general, the entries of this Hessenberg matrix are rational functions. On the other hand, the discussions in [4, Thm. 1.7] and [16] are based on applying unimodular transformations to the Smith form, and a numerical implementation of these is nontrivial. To avoid working with unimodular transformations, which, in general, affect the degree, we use linearizations. Recall that a pencil λI − A is a monic linearization of the matrix polynomial P(λ) ∈ F[λ]^{n×n} in (1) if A ∈ F^{ℓn×ℓn} and λI − A has the same elementary divisors as P(λ). Suppose P(λ) has the same eigenstructure as the monic matrix polynomial R(λ) = λ^ℓ I + Σ_{j=0}^{ℓ−1} λ^j R_j and take any monic linearization λI − A of P(λ). Note that λI − A is also a linearization of R(λ). The Gohberg, Lancaster, and Rodman theory [4, sect. 1.10] tells us that there is an ℓn × n matrix X such that (A, X) is a left standard pair for R(λ), that is, the ℓn × ℓn matrix

(2) S = [X AX · · · A^{ℓ−1}X]

is nonsingular and

(3) A^ℓ X + A^{ℓ−1}XR_{ℓ−1} + · · · + AXR_1 + XR_0 = 0.

Taken together, (2) and (3) can be rewritten as

(4) S^{−1}AS = C_L(R) := [0 · · · 0 −R_0; I · · · 0 −R_1; ⋮ ⋱ ⋮; 0 · · · I −R_{ℓ−1}],

showing that A is similar to the left companion matrix associated with R(λ). Actually, for any given monic linearization λI − A of P(λ) and any nonsingular matrix S of the form (2), S^{−1}AS will always be the left companion matrix of some monic matrix polynomial, as in (4). This matrix polynomial R(λ) = λ^ℓ I + λ^{ℓ−1}R_{ℓ−1} + · · · + λR_1 + R_0 will have the same degree and eigenstructure as P(λ). The above discussion suggests that, in order to reduce P(λ) in (1) to a simpler form, it is enough to find an ℓn × n matrix X such that S in (2) is nonsingular and S^{−1}AS has the desired zero pattern in the coefficient matrices (in the last block column), where A can be any matrix such that λI − A is a linearization of P(λ).
One of the main contributions in this paper is to give a characterization of such a matrix X in terms of block Krylov subspaces (see section 2).
In the generic case, when all the eigenvalues are distinct, it turns out to be surprisingly easy to find X such that S^{−1}AS is the left companion matrix of a matrix polynomial in triangular, diagonal, or Hessenberg form. We illustrate this with a snippet of MATLAB code in Figure 1. If we replace schur(C_P,'complex') by eig(C_P), then C_R becomes the companion matrix of an equivalent diagonal matrix polynomial, and if we replace schur(C_P,'complex') by hess(C_P) and ones(deg,1) by eye(deg,1), then C_R becomes the companion matrix of an equivalent matrix polynomial in Hessenberg form. The code can be generalized to any degree and works as long as the block Krylov matrix S on line 8 is nonsingular, which it is for almost all coefficient matrices, as we will see in section 3.2. A colored spy plot from one execution of the MATLAB code in Figure 1 is shown on the left of Figure 2. The other plots correspond to the diagonal reduction (middle plot) and the Hessenberg reduction (right plot). We remark that the reduction to Hessenberg form requires no iterative process (such as computing the eigenvalues) and uses a fixed number of arithmetic operations. Our reduction gives a Hessenberg matrix polynomial in which all but the second leading coefficient are triangular.
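Figure 1 itself is not reproduced here, but the construction it describes can be sketched as follows. This is a Python/NumPy translation we add for illustration, under the stated genericity assumption, for a random quadratic example; the particular choice X = Q·kron(I_n, ones) built from the complex Schur factor Q is an assumption consistent with the ones(deg,1) hint above, not code taken from the paper:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
n, ell = 3, 2                       # size n, degree ell (quadratic example)
P0, P1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# Left companion linearization A of the monic P(lam) = lam^2 I + lam P1 + P0.
A = np.block([[np.zeros((n, n)), -P0],
              [np.eye(n),        -P1]])

# Complex Schur form A = Q T Q^H; each x_i is a combination of the leading
# ell*i Schur vectors, so every A^ell x_i stays in an invariant subspace.
T, Q = schur(A, output='complex')
X = Q @ np.kron(np.eye(n), np.ones((ell, 1)))

S = np.hstack([X, A @ X])           # block Krylov matrix [X AX]
Y = np.linalg.solve(S, np.linalg.matrix_power(A, ell) @ X)
R0, R1 = -Y[:n], -Y[n:]             # coefficients of the triangular R(lam)

upper = all(np.allclose(np.tril(R, -1), 0, atol=1e-6) for R in (R0, R1))
# Eigenstructure is preserved: det R(mu) = 0 at every eigenvalue mu of A.
resid = max(abs(np.linalg.det(mu**2 * np.eye(n) + mu * R1 + R0))
            for mu in np.linalg.eigvals(A))
print(upper, resid)
```

The extracted coefficients R_0, R_1 come out upper triangular, and det R(μ) vanishes (to roundoff) at each eigenvalue μ of A, as the theory predicts.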
The code in Figure 1 is not meant to be a numerically efficient or stable algorithm. Although the threshold specified in the last line of the M-file works for all monic matrix polynomials with randomly chosen coefficients that we tried, the choice of X is by no means unique and can likely be improved. Also, we need only the last block column of C_L(R), and there may be more efficient ways to compute it. Nevertheless, the code suggests a possible practical procedure to reduce P(λ) in (1) to triangular form while preserving its degree and eigenstructure when all eigenvalues are distinct. In this paper we discuss why and when this procedure works. We also want to allow the monic matrix polynomials to have multiple eigenvalues and to use real arithmetic if the given matrix polynomial is real. In these cases, computations additional to those described in the code of Figure 1 may be needed. To be precise, one of the main goals of this work is to give a practical procedure to reduce any P(λ) in (1) to triangular or quasi-triangular form according as F = C or R, while preserving its degree and eigenstructure. The proposed procedure consists of the following steps:
1. Choose a monic linearization λI − A of P(λ).
2. Compute a real or complex Schur form, T_0, of A according as F = R or C.
3. Reorder the diagonal entries of T_0 and, in the real case, the 2 × 2 blocks along its diagonal to produce a new Schur form T = U^*AU that can be split into blocks suited to constructing the matrix X of the next step.
4. Use U and the diagonal blocks of T to produce a matrix X ∈ F^{ℓn×n} of full column rank such that S in (2) is nonsingular and S^{−1}AS is the left companion matrix of a monic upper triangular matrix polynomial.
5. Compute S^{−1}A^ℓX, i.e., the last block column of C_L(R) in (4), and extract the blocks R_j, j = 0, . . . , ℓ − 1, defining R(λ) = λ^ℓ I + λ^{ℓ−1}R_{ℓ−1} + · · · + λR_1 + R_0.
The matrix polynomial R(λ) will be upper (quasi-) triangular and have the same eigenstructure as P(λ). We remark that the structure of A and S can be exploited to compute S^{−1}A^ℓX at a reduced cost in step 5, but this is outside the scope of this work. Notice also that we could replace A in steps 4 and 5 by the Schur form T obtained in step 3. Nevertheless, the analysis of how to implement steps 4 and 5 in a numerically reliable and efficient way remains an open problem.
We will show in section 3 how to implement step 3 in a numerically stable manner when all the eigenvalues of P(λ) have algebraic multiplicity not greater than n (the size of P(λ)). The matrices X that are used to implement step 4 are characterized in section 2; a method to obtain them explicitly is provided in section 3. It will also be shown in that section that X can be constructed to have orthogonal columns. As mentioned above, it is left as an open problem how to obtain it in such a way that step 5 can be computed in a numerically stable manner. The quadratic case (ℓ = 2) is fully examined in section 4, where a stable way of implementing step 3 is given that works independently of the algebraic multiplicity of the eigenvalues of P(λ).
To be slightly more general, we will also study how to construct matrices X to reduce P(λ) to one of the following forms:
• block-diagonal form (5): R(λ) = diag(T_{11}(λ), . . . , T_{kk}(λ)), monic of degree ℓ, with T_{ii}(λ) ∈ F[λ]^{s_i×s_i}, 1 ≤ i ≤ k, and s_1 + · · · + s_k = n;
• block-triangular form (6): R(λ) block upper triangular, monic of degree ℓ, with diagonal blocks T_{jj}(λ) ∈ F[λ]^{s_j×s_j}, 1 ≤ j ≤ k, and s_1 + · · · + s_k = n; and
• Hessenberg form (7): R(λ) = λ^ℓ I + λ^{ℓ−1}H_{ℓ−1} + · · · + λH_1 + H_0, with coefficient matrices H_i, i = 0, . . . , ℓ − 1, in Hessenberg form.
We will discuss in section 5 how to use the simpler forms to solve parameterized linear systems P(ω)x = b(ω), where x is to be computed for many values of the parameter ω.
2. Conditions for reduction to simpler forms. For matrices A ∈ F^{m×m} and V ∈ F^{m×j} we define the block Krylov matrix K_d(A, V) := [V AV · · · A^{d−1}V]. For a subspace X of F^m and a matrix A operating on that subspace, we define AX := {Ax : x ∈ X}.
Assume that P(λ) is given by (1), and let λI − A be any monic linearization of P(λ), for example, the left companion linearization of the monic polynomial P_ℓ^{−1}P(λ). Recall that we are looking for a matrix X ∈ F^{ℓn×n} such that (i) S := K_ℓ(A, X) = [X AX · · · A^{ℓ−1}X] is nonsingular, and (ii) λI − S^{−1}AS is the left companion linearization of one of the reduced forms in (5)–(7). If (i) holds, then S^{−1}AS is the left companion matrix of a monic matrix polynomial, say R(λ) = λ^ℓ I + · · · + λR_1 + R_0, and

(8) S^{−1}A^ℓ X = −[R_0^T R_1^T · · · R_{ℓ−1}^T]^T = −Σ_{m=0}^{ℓ−1} (e_{m+1} ⊗ I_n) R_m,

where e_j denotes the jth column of the identity matrix and ⊗ denotes the Kronecker product. Then we see that the (i, j) entry of R(λ) with i ≠ j is zero if and only if the vector S^{−1}A^ℓ Xe_j has zeros in the entries i, i + n, . . . , i + (ℓ − 1)n. This means that S^{−1}A^ℓ Xe_j is in the span of the columns of the submatrix of I_{ℓn} obtained by deleting the columns i, i + n, . . . , i + (ℓ − 1)n. Thus, taking into account that S^{−1}[X AX · · · A^{ℓ−1}X] = I and (8), it follows that

(9) the (i, j) entry of R(λ) with i ≠ j is zero if and only if A^ℓ x_j ∈ span{A^p x_m : 0 ≤ p ≤ ℓ − 1, m ≠ i},

where x_j denotes the jth column of X.
We are now ready to state our main theorem, but before we do so we introduce some new notation. For the block reductions (5) and (6), it is convenient to partition X as X = [X_1 X_2 · · · X_k], where X_j ∈ F^{ℓn×s_j} and s_1 + · · · + s_k = n. Also, we let x_{1:i} and X_{1:i} denote the matrices [x_1 x_2 · · · x_i] ∈ F^{ℓn×i} and [X_1 X_2 · · · X_i] ∈ F^{ℓn×σ_i}, respectively, where σ_i := s_1 + · · · + s_i. Finally, we define σ_0 := 0.
Theorem 1. Let P(λ) ∈ F[λ]^{n×n} be of degree ℓ with nonsingular leading matrix coefficient and let λI − A be any monic linearization of P(λ). Then P(λ) is equivalent to a monic matrix polynomial R(λ) of degree ℓ having one of the reduced forms (5)–(7) if and only if there exists a full rank matrix X ∈ F^{ℓn×n} such that
(i) S := K_ℓ(A, X) is nonsingular, and
(ii) (a) the columns of A^ℓ X_i lie in the column space of K_ℓ(A, X_i), i = 1, . . . , k, for block-diagonal form as in (5);
(b) the columns of A^ℓ X_i lie in the column space of K_ℓ(A, X_{1:i}), i = 1, . . . , k, for block-triangular form as in (6);
(c) A^ℓ x_i lies in the column space of K_ℓ(A, x_{1:i+1}), i = 1, . . . , n − 1, for Hessenberg form as in (7).
(ii)(c): Suppose that R(λ) has the Hessenberg form of H(λ) in (7). From AS = SC_L(H) and (9), we see that A^ℓ x_i lies in the span of the columns of K_ℓ(A, x_{1:i+1}).
(⇐) Suppose that there exists X such that S = [X AX · · · A^{ℓ−1}X] is nonsingular. Then the matrix S^{−1}AS is the left companion form of a monic matrix polynomial of degree ℓ, say R(λ), equivalent to P(λ). Now, AS = SC_L(R), (ii)(a), and (9) imply that the n × n blocks R_0, . . . , R_{ℓ−1} in the last block column of C_L(R) (see (8)) are block-diagonal with k diagonal blocks, the ith diagonal block being of size s_i × s_i, where s_i is the number of columns of X_i, i = 1 : k. The proofs for (ii)(b) and (ii)(c) are similar.
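The Hessenberg condition (ii)(c) can be exercised numerically. In the following Python sketch (our illustration, not code from the paper), X is built from the orthogonal factor of a Hessenberg decomposition of A, taking x_i to be column ℓ(i − 1) + 1 of Q; since Q^T AQ is upper Hessenberg, each A^ℓ x_i then lies in the span of the columns of K_ℓ(A, x_{1:i+1}), so the extracted coefficients come out in Hessenberg form:

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(2)
n, ell = 3, 2
P0, P1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = np.block([[np.zeros((n, n)), -P0],    # left companion linearization
              [np.eye(n),        -P1]])

H, Q = hessenberg(A, calc_q=True)          # A = Q H Q^T, H upper Hessenberg
# x_i is column ell*(i-1)+1 of Q, i.e., X = Q kron(I_n, e_1).
X = Q @ np.kron(np.eye(n), np.eye(ell, 1))

S = np.hstack([X, A @ X])                  # K_ell(A, X)
Y = np.linalg.solve(S, np.linalg.matrix_power(A, ell) @ X)
R0, R1 = -Y[:n], -Y[n:]                    # coefficients of R(lam)

# Both coefficients are upper Hessenberg: zero below the first subdiagonal.
hess_ok = all(np.allclose(np.tril(R, -2), 0, atol=1e-8) for R in (R0, R1))
print(hess_ok)
```

As remarked in the introduction, this route needs no iterative eigenvalue computation: the Hessenberg decomposition is a finite (direct) process.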
3. Construction of the matrix X. In this section we discuss a process to construct the matrix X in Theorem 1 such that properties (i) and (ii) hold.
3.1. Auxiliary results. We start by proving some technical results that will be needed for the triangularization. Let λI − C_L be the left companion matrix of a monic matrix polynomial P(λ) of size n × n and degree ℓ, and let Π denote the permutation matrix Π = [π_1 π_2 · · · π_n], π_i = [e_i e_{n+i} · · · e_{(ℓ−1)n+i}] for i = 1, . . . , n.
Then the permuted linearization λI − Π^T C_LΠ will be called the left companion linearization of P(λ) in controller form. This is not a common term in the context of linearizations; the name comes from the theory of linear control systems, where controllable systems whose state matrices have the form of Π^T C_LΠ are said to be in controller form [8]. If we view this linearization as an n × n block pencil with blocks of size ℓ × ℓ, then the zero-block structure of the pencil is the same as the zero structure of P(λ). Furthermore, the diagonal ℓ × ℓ blocks are the companion matrices of the corresponding scalar polynomials on the diagonal of P(λ). To illustrate the controller form, Figure 3 shows the spy plots of the left companion matrix for P(λ) in dense (no structure), diagonal, triangular, and Hessenberg forms. The controller form is useful in the proofs of the following theorems. In these theorems we will work with matrices having eigenvalues of geometric multiplicity at most n. The rationale behind this is that if λI − A is a linearization of an n × n matrix polynomial P(λ) as in (1), then, by [4, Thm. 1.7], the geometric multiplicity of the eigenvalues of A cannot be greater than n.
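The permutation Π is easy to form explicitly. The following Python sketch (an illustration we add, using a concrete diagonal quadratic P) checks that Π^T C_LΠ is block diagonal with the 2 × 2 companion matrices of the diagonal scalar polynomials, as stated above:

```python
import numpy as np

n, ell = 2, 2
# Diagonal monic P(lam) = diag(lam^2 + 3 lam + 2, lam^2 + 7 lam + 5).
a0, a1, b0, b1 = 2.0, 3.0, 5.0, 7.0
P0, P1 = np.diag([a0, b0]), np.diag([a1, b1])

# Left companion matrix of P.
C_L = np.block([[np.zeros((n, n)), -P0],
                [np.eye(n),        -P1]])

# Permutation Pi = [pi_1 ... pi_n] with pi_i = [e_i e_{n+i} ... e_{(ell-1)n+i}].
I = np.eye(ell * n)
Pi = np.hstack([I[:, [i + m * n for m in range(ell)]] for i in range(n)])

B = Pi.T @ C_L @ Pi   # controller form: block diagonal, companion blocks
comp = lambda c0, c1: np.array([[0.0, -c0], [1.0, -c1]])
print(B)
```

Here the diagonal 2 × 2 blocks of B are exactly comp(a0, a1) and comp(b0, b1), the companion matrices of the scalar polynomials on the diagonal of P(λ), and the off-diagonal blocks vanish.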
3.1.1. Existence of Schur form for triangular reduction. Recall that a matrix is called nonderogatory if every eigenvalue has geometric multiplicity one.
Theorem 2 (Schur form with nonderogatory blocks, complex version). Let A ∈ C^{ℓn×ℓn} be a matrix whose eigenvalues have geometric multiplicity at most n. Then A has a Schur decomposition A = QTQ^H in which, viewing T as an n × n block matrix with blocks of size ℓ × ℓ, the diagonal blocks T_ii ∈ C^{ℓ×ℓ}, i = 1, . . . , n, are upper triangular and nonderogatory.
Proof. Since A has no eigenvalue with geometric multiplicity greater than n, it follows from [4, Proof of Thm. 1.7] that λI − A is a linearization of an n × n upper triangular monic matrix polynomial R(λ) of degree ℓ. This matrix polynomial has a left companion linearization in controller form, which itself must be monic. Denote this linearization by λI − B. Then A = SBS^{−1} for some nonsingular S. Furthermore, B is block upper triangular, with blocks of size ℓ × ℓ, and all diagonal blocks must be nonderogatory (since they are companion matrices). Let U_iT_iU_i^H be a Schur decomposition of the ith diagonal block and set U = U_1 ⊕ U_2 ⊕ · · · ⊕ U_n. Then B = UTU^H with T := U^H BU upper triangular is a Schur decomposition of B. Finally, let SU = QR be a QR factorization of SU and note that since R is upper triangular and nonsingular, A = Q(RTR^{−1})Q^H is a Schur decomposition of A. The theorem now follows from the fact that the ith diagonal ℓ × ℓ block of RTR^{−1} is similar to T_i.
We now prove the real analog of Theorem 2.
Theorem 3 (Schur form with nonderogatory blocks, real version). Let A ∈ R^{ℓn×ℓn} be a matrix whose eigenvalues have geometric multiplicity at most n. Then A has a real Schur decomposition A = QTQ^T in which each diagonal block T_ii of T is either of size ℓ × ℓ and nonderogatory or of size 2ℓ × 2ℓ and such that all its eigenvalues have geometric multiplicity one or two.
Proof. Since all eigenvalues of A have geometric multiplicity at most n, it follows that λI − A has a real Smith form D(λ) ⊕ I_{(ℓ−1)n} with deg det D(λ) = nℓ. By [16, Theorem 4.1], D(λ) is equivalent to some real quasi-triangular matrix polynomial T(λ) of degree ℓ, which may be assumed to be monic. It follows that λI − A ∼ D(λ) ⊕ I_{(ℓ−1)n} ∼ T(λ) ⊕ I_{(ℓ−1)n}, where ∼ denotes the equivalence relation for matrix polynomials. In other words, λI − A is a linearization of some monic quasi-triangular matrix polynomial of degree ℓ. If B denotes the constant matrix of the left companion linearization of T(λ) in controller form, then the rest of the proof is essentially the same as the last part of the proof of Theorem 2, with the only difference being that we consider the real Schur decomposition instead of the complex one.
3.1.2. Numerically stable construction of a Schur form for (block-) triangular reduction. The above theorems are the keystones of the process of constructing the matrix X of Theorem 1 for the (block-) triangular reduction of P(λ) in (1). To be numerically useful, we need to overcome the drawback that the linearization λI − B in the proofs of Theorems 2 and 3 is obtained from λI − A via unimodular transformations. In what follows we propose a numerically stable procedure to construct the desired Schur form of A in Theorem 2 or Theorem 3 out of any of its Schur forms. This procedure works as long as all eigenvalues of A have algebraic multiplicity at most n. This will be our standing assumption.
We will proceed by induction on n. If n = 1, then A ∈ F^{ℓ×ℓ} is a nonderogatory matrix because, by assumption, all its eigenvalues have algebraic multiplicity at most n = 1. Thus any Schur form of A (real or complex) is nonderogatory. Assume n > 1 and that any matrix in F^{ℓm×ℓm}, with m ≤ n − 1 and all its eigenvalues of algebraic multiplicity at most m, admits a Schur form satisfying the conditions of Theorem 2 or 3 according as F = C or F = R, respectively.
First, compute any (real or complex) Schur decomposition of A. Then reorder the diagonal entries/blocks using the procedure in Bai and Demmel [1] according to the rules described below. We discuss the real and complex cases separately.
(I) Complex case. Suppose there are k distinct eigenvalues of algebraic multiplicity n and s distinct eigenvalues of algebraic multiplicity less than n. Note that k ≤ ℓ, and s = 0 or s > ℓ − k according as k = ℓ or k < ℓ, respectively. Reorder the Schur form so that the leading k × k principal submatrix has one instance of each eigenvalue of algebraic multiplicity n. If there are k < ℓ such eigenvalues, pick any ℓ − k (< s) distinct eigenvalues of algebraic multiplicity less than n and reorder the diagonal so that these appear after the k eigenvalues of algebraic multiplicity n. The leading ℓ × ℓ submatrix obtained in this way has simple eigenvalues and is thus nonderogatory. We can use the induction hypothesis on the lower right ℓ(n − 1) × ℓ(n − 1) part of the matrix because all eigenvalues of A with algebraic multiplicity n have been used.
(II) Real case. The procedure over R is more involved because we need to move nonreal eigenvalues in complex conjugate pairs in order to keep the decomposition real. In addition, 2 × 2 diagonal blocks may appear with eigenvalues of geometric multiplicity two. An example that illustrates the main features of the procedure that follows is given in Example 5.
At this point it is important to recall that when applying the Bai–Demmel algorithm [1] to block triangular matrices of size 3 × 3 (one element and one 2 × 2 block on the diagonal) or 4 × 4 (two 2 × 2 blocks on the diagonal), the blocks of size 2 × 2 before and after applying the algorithm are similar but not necessarily identical. In what follows we will very often use phrases like "reordering" or "moving the diagonal blocks" to mean that consecutive diagonal elements and blocks are swapped to place them in desired diagonal positions. As in the complex case, our goal is to move (if needed) diagonal elements and blocks in order to obtain a real Schur form T of A whose ℓ × ℓ diagonal blocks have distinct eigenvalues and whose 2ℓ × 2ℓ diagonal blocks have eigenvalues of algebraic multiplicity at most 2. This matrix would, of course, satisfy the conditions of Theorem 3.
Let us assume that the matrix A has the following:
• k_r distinct real eigenvalues of algebraic multiplicity n;
• k_c distinct pairs of nonreal complex conjugate eigenvalues of algebraic multiplicity n;
• s_i distinct real eigenvalues of algebraic multiplicity i < n;
• q_i distinct pairs of nonreal complex conjugate eigenvalues of algebraic multiplicity i < n.
Define s := s_1 + s_2 + · · · + s_{n−1}, q := q_1 + q_2 + · · · + q_{n−1}, and k := k_r + 2k_c. So, s and q are the numbers of distinct real eigenvalues and of distinct pairs of nonreal complex conjugate eigenvalues, respectively, of multiplicity smaller than n. We need some inequalities that will be useful to ensure that the inductive process can be completed. First, k ≤ ℓ and

(11) n(ℓ − k) = s_1 + 2s_2 + · · · + (n − 1)s_{n−1} + 2q_1 + 4q_2 + · · · + 2(n − 1)q_{n−1}.

Then,

(12) n(ℓ − k) ≤ (n − 1)s + 2q_1 + 4q_2 + · · · + 2(n − 1)q_{n−1} ≤ (n − 1)(s + 2q), so that s + 2q ≥ ℓ − k.

We also claim that

(13) ℓ − k < s + 2q − q_1 whenever ℓ − k > 0 and n ≥ 3.

In fact, first we have (s + 2q − q_1)n = (s + q_1 + 2q_2 + · · · + 2q_{n−1})n.
Thus, if ℓ − k > 0 and n ≥ 3, then (s + 2q − q_1)n > (n − 1)s + 2q_1 + 4q_2 + · · · + 2(n − 1)q_{n−1}, and (13) follows from (12). Start by reordering the Schur form so that one instance of each of the k_r real eigenvalues and one instance of each of the 2 × 2 blocks corresponding to the k_c pairs of nonreal complex conjugate eigenvalues of algebraic multiplicity n appear in the leading k × k principal submatrix. Let T^{(k)}_{11} denote this submatrix. If ℓ − k = 0, then T_{11} := T^{(k)}_{11} is nonderogatory and we can apply the induction hypothesis to the ℓ(n − 1) × ℓ(n − 1) lower right part of the matrix. If ℓ − k > 0, this positive integer is either even or odd. Let us assume first that it is even. If there are at least (ℓ − k)/2 diagonal blocks of size 2 × 2 corresponding to distinct nonreal complex conjugate eigenvalues of algebraic multiplicity smaller than n (i.e., if q ≥ (ℓ − k)/2), then we can move (ℓ − k)/2 such diagonal blocks to appear after T^{(k)}_{11}, and the resulting leading ℓ × ℓ submatrix is nonderogatory. If, on the contrary, q < (ℓ − k)/2, then we will have to move all diagonal blocks of nonreal complex conjugate eigenvalues of multiplicity less than n and ℓ − k − 2q distinct real eigenvalues to appear after T^{(k)}_{11}. The question is: Do we have ℓ − k − 2q distinct real eigenvalues? The answer is in the affirmative because, by either (11) or (13), s + 2q ≥ ℓ − k. In summary:
(i) If ℓ − k is even, then let k_1 = min{q, (ℓ − k)/2}. Choose k_1 2 × 2 blocks corresponding to distinct nonreal complex conjugate eigenvalues of algebraic multiplicity less than n and move them so that they appear directly after T^{(k)}_{11}. Denote the new submatrix by T^{(k+2k_1)}_{11}.
(i_1) If k_1 = (ℓ − k)/2 ≤ q, then T_{11} := T^{(k+2k_1)}_{11} is nonderogatory.
(i_2) If k_1 = q < (ℓ − k)/2, then it follows from either (11) or (13) that we can move ℓ − k − 2q distinct real eigenvalues of algebraic multiplicity less than n so that they appear after T^{(k+2k_1)}_{11}. The leading ℓ × ℓ submatrix is nonderogatory.
Apply the induction hypothesis to the ℓ(n − 1) × ℓ(n − 1) lower right part of the matrix as above. Let us assume now that ℓ − k is odd.
In order to complete T^{(k)}_{11} up to an ℓ × ℓ upper (block-) triangular matrix, we need at least one real eigenvalue of algebraic multiplicity less than n, i.e., s > 0. If this is the case, then we can proceed as in the case when ℓ − k is even but replacing ℓ − k by ℓ − k − 1 and moving one available real eigenvalue to position (ℓ, ℓ). On the other hand, if s = 0, then we can try to produce a 2ℓ × 2ℓ diagonal block with eigenvalues of algebraic multiplicity at most 2.
We will see that this is always possible.
(ii) If ℓ − k is odd and s > 0, then let k_1 = min{q, (ℓ − k − 1)/2}. As in the case when ℓ − k is even, choose k_1 2 × 2 blocks corresponding to distinct nonreal complex conjugate eigenvalues of algebraic multiplicity less than n and move them so that they appear directly after T^{(k)}_{11}. Denote the new leading submatrix by T^{(k+2k_1)}_{11}.
(ii_1) If k_1 = (ℓ − k − 1)/2 ≤ q, then ℓ = k + 2k_1 + 1. Since s > 0, one real eigenvalue of algebraic multiplicity less than n can be placed after T^{(k+2k_1)}_{11} so that the ℓ × ℓ principal submatrix is nonderogatory.
(ii_2) If k_1 = q < (ℓ − k − 1)/2, then move ℓ − k − 2k_1 distinct real eigenvalues of algebraic multiplicity less than n so that they appear after T^{(k+2k_1)}_{11}; the resulting ℓ × ℓ principal submatrix is nonderogatory.
Apply the induction hypothesis as above.
(iii) If ℓ − k is odd and s = 0, we aim to form a 2ℓ × 2ℓ block with eigenvalues of geometric multiplicity at most two. Recall that we already have one instance of each of the k_r real eigenvalues and one instance of each of the 2 × 2 blocks corresponding to the k_c pairs of nonreal complex conjugate eigenvalues of algebraic multiplicity n ≥ 2 in T^{(k)}_{11}. Next, move another instance of each of the k_r real eigenvalues and another instance of each of the 2 × 2 blocks corresponding to the k_c pairs of nonreal complex conjugate eigenvalues of algebraic multiplicity n, so that they appear just after T^{(k)}_{11}. Let T^{(2k)}_{11} denote the resulting submatrix. These eigenvalues may have geometric multiplicity two in T^{(2k)}_{11}.
(iii_1) If n = 2, then, from (11), 2(ℓ − k) = 2q_1 + s = 2q_1. This means that there are ℓ − k diagonal blocks of size 2 × 2 corresponding to pairs of nonreal complex conjugate eigenvalues of algebraic multiplicity one. Reorder (if necessary) the diagonal blocks so that they appear just after T^{(2k)}_{11}. The resulting 2ℓ × 2ℓ submatrix has all its eigenvalues of geometric multiplicity two at the most.
(iii_2) If n ≥ 3, then we have two possibilities: either ℓ − k ≤ q or ℓ − k > q.
If ℓ − k ≤ q, then we have ℓ − k 2 × 2 blocks corresponding to distinct nonreal complex conjugate eigenvalues of algebraic multiplicity less than n that can be moved so that they appear directly after T^{(2k)}_{11}. Then all the eigenvalues of the obtained 2ℓ × 2ℓ submatrix have geometric multiplicity one or two. If ℓ − k > q, the process is a little more involved. First, we move q 2 × 2 blocks corresponding to distinct nonreal complex conjugate eigenvalues of algebraic multiplicity less than n so that they appear directly after T^{(2k)}_{11}. Let T^{(2k+2q)}_{11} be the obtained submatrix. We need to move ℓ − k − q additional 2 × 2 blocks corresponding to distinct nonreal complex conjugate eigenvalues of algebraic multiplicity less than n. Notice that, since we have moved q such blocks to form T^{(2k+2q)}_{11}, we have already used all blocks corresponding to nonreal complex conjugate eigenvalues of algebraic multiplicity 1. So we are left with q − q_1 2 × 2 blocks corresponding to distinct nonreal complex conjugate eigenvalues of algebraic multiplicity less than n. But it follows from (13) (recall that ℓ − k > 0 and n ≥ 3) that ℓ − k < 2q − q_1 + s = 2q − q_1, and this means that ℓ − k − q < q − q_1. Therefore, another copy of ℓ − k − q 2 × 2 blocks corresponding to nonreal complex conjugate eigenvalues of algebraic multiplicities between 2 and n − 1 can be moved to appear directly after T^{(2k+2q)}_{11}. The eigenvalues of the resulting 2ℓ × 2ℓ matrix have algebraic multiplicity at most 2. We can now apply the induction hypothesis to the ℓ(n − 2) × ℓ(n − 2) lower right part of the matrix. We note that when ℓ − k is odd and s = 0 (case (iii)), the constructed 2ℓ × 2ℓ block may or may not be further split into two ℓ × ℓ nonderogatory blocks by moving the eigenvalues and blocks along the diagonal. The following example illustrates the two possibilities.
Example 4. (a) The matrix A considered here is in real Schur form and satisfies the requirements of Theorem 3: it has one 2ℓ × 2ℓ diagonal block with eigenvalues of geometric multiplicity at most 2.
However, we can swap the diagonal block [0 1; −1 0] and the last diagonal element of A to obtain another Schur form that also satisfies the conditions of Theorem 3. The MATLAB code in Figure 4 implements the Bai–Demmel algorithm [1] to perform the swapping and returns the corresponding matrices Q and T.
(b) Let n = 2 and ℓ = 3. Despite the eigenvalues of A being simple, there is no real Schur form of A with two nonderogatory diagonal blocks of size 3 × 3. If we apply the procedure of item (II) to A, then k_r = k_c = k = 0, q_1 = q = 3, and s = 0. Since ℓ − k = 3 is odd and s = 0, we use item (iii_1). In fact, 2(ℓ − k) = 2 · 3 = 6 = 2q_1, and we must put together three diagonal blocks of size 2 × 2. This means that A is itself the desired matrix.
The following example clarifies the main features of the procedure for the real case (item (II)) to bring a matrix in real Schur form to another one satisfying the requirements in Theorem 3.
Example 5. Let n = 4, = 2, and let A ∈ R 8×8 be a matrix in real Schur form with the following diagonal blocks: We can write A = B+T where T is a strict block-upper triangular matrix (block-upper triangular with zero blocks in the diagonal). We are going to apply the procedure of item (II) to A to find an orthogonal matrix Q such that Q T AQ is a real Schur form satisfying the requirements in Theorem 3.
Step 1. For A we have k_r = k_c = 0, s = 2, and q_1 = 1. Then k = k_r + 2k_c = 0 and ℓ − k = 2. Thus ℓ − k is even and k_1 = min{q, (ℓ − k)/2} = min{1, 1} = 1, so case (i) applies: move the 2 × 2 block corresponding to the pair of nonreal complex conjugate eigenvalues to the leading position. We then remove the first two rows and columns and obtain A_1, for which n = 3 and ℓ = 2.
Step 2. For A_1 we have k_r = 1, k_c = 0, s = 1, and q_1 = 1. Since k_r = 1 for the eigenvalue 1, first of all we must move it to position (1, 1). In this case no action is needed because it is already there. Now, k = k_r + 2k_c = 1, ℓ − k = 1 is odd, s > 0, and k_1 = min{q, (ℓ − k − 1)/2} = min{1, 0} = 0. Hence we use (ii_2): move a real eigenvalue of multiplicity less than n = 3 to position (2, 2). There is only one choice: use the Bai–Demmel algorithm to exchange the block [0 1; −1 0] and the entry in position (5, 5) (actually, we must first swap the diagonal entries 1 and 2 and then swap 2 and the block [0 1; −1 0]). We again remove the first two rows and columns, obtaining A_2. Now n = 2 and ℓ = 2.
Step 3. For A_2 we have k_r = 1, k_c = 0, s = 0, and q_1 = 1. Again k_r = 1 and we must place the eigenvalue of algebraic multiplicity 2 in position (1, 1). We use the Bai–Demmel algorithm to swap the diagonal block B_{21} and the diagonal entry in position (3, 3). Let B_{21} denote the resulting 2 × 2 block. Now k = k_r + 2k_c = 1, ℓ − k = 1 is odd, and s = 0, so case (iii) applies: move another copy of the eigenvalue of algebraic multiplicity n = 2 to position (2, 2). We use the Bai–Demmel algorithm to exchange the diagonal block B_{21} and the entry in position (4, 4). Let B_{21} be the block so obtained. We observe that n = 2 and 2(ℓ − k) = 2 = 2q_1. So we proceed as indicated in item (iii_1): move the block B_{21} so that it appears right after the two repeated eigenvalues, producing a diagonal block of size 4. In this case, no action is needed. Thus there is an orthogonal Q_3 such that Q_3^T A_2Q_3 is in real Schur form with diagonal blocks (1, 1, B_{21}), with B_{21} similar to [0 1; −1 0]. Combining the orthogonal transformations of the three steps into a single matrix Q, we obtain that Q is an orthogonal matrix and Q^T AQ is a matrix in real Schur form. Blocks B_{11} and B_{21} are both similar to [0 1; −1 0]. The 4 × 4 block in the lower-right corner will be nonderogatory if its (1, 2) entry is not zero; otherwise, the geometric multiplicity of 1 in that block would be 2.
Matrix A in Example 5 has repeated eigenvalues, but even in the generic case of real matrices with simple eigenvalues, the diagonal blocks of a computed real Schur form might need to be rearranged in order to satisfy the requirements of Theorem 3. In addition, as part (b) of Example 4 shows, the diagonal blocks in the Schur form of Theorem 3 for matrices with simple eigenvalues may need to be of size 2 × 2 .
If one eigenvalue has algebraic multiplicity greater than n, the problem of computing the desired Schur forms in a stable manner, using unitary/orthogonal transformations, becomes significantly more complicated. We devote section 4 to this problem for quadratic matrix polynomials (ℓ = 2). The higher-degree case ℓ > 2 is left as an open problem. There is a process to obtain a desired form by manipulating Jordan forms, but we omit the details as it is an unstable process.
3.1.3. Sufficient conditions for nonsingular K_ℓ(A, X). Theorems 2 and 3 will be used in combination with the following lemmas. They show a nice connection with the following known result in the theory of linear control systems: the minimum number of inputs needed to control a linear time-invariant system is the geometric multiplicity of the eigenvalue with highest geometric multiplicity (see, for example, [19]).

Lemma 6. If B ∈ F^{ℓ×ℓ} is nonderogatory, then there exists x ∈ F^ℓ such that the Krylov matrix K_ℓ(B, x) is nonsingular.
Proof. Since B is nonderogatory it is similar to the left companion matrix C_L of its characteristic polynomial [7, Thm. 3.3.15], that is, B = S C_L S^{-1} for some nonsingular matrix S. It is now easy to see that K_ℓ(C_L, e_1) = I. Hence letting x = S e_1 yields the desired result.
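The identity K_ℓ(C_L, e_1) = I used in this proof is easy to verify numerically. The following Python/NumPy sketch (helper names are hypothetical; the coefficient convention, with the coefficients of the characteristic polynomial in the last column, is the one assumed from the proof) checks it for a monic cubic:

```python
import numpy as np

def left_companion(a):
    """Companion matrix of x^l + a[l-1]x^{l-1} + ... + a[0], with the
    coefficients in the last column, so that C @ e_j = e_{j+1} for j < l."""
    l = len(a)
    C = np.zeros((l, l))
    C[1:, :-1] = np.eye(l - 1)
    C[:, -1] = -np.asarray(a)
    return C

def krylov(B, x, l):
    """Krylov matrix K_l(B, x) = [x, Bx, ..., B^{l-1}x]."""
    cols = [x]
    for _ in range(l - 1):
        cols.append(B @ cols[-1])
    return np.column_stack(cols)

C = left_companion([2.0, -3.0, 1.0])              # x^3 + x^2 - 3x + 2
e1 = np.eye(3)[:, 0]
assert np.allclose(krylov(C, e1, 3), np.eye(3))   # K_3(C_L, e_1) = I
```

With this convention C maps e_1 to e_2 and e_2 to e_3, so the Krylov matrix at e_1 is exactly the identity, which is the key step of the proof.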
The next lemma is the real counterpart of Lemma 6.
Lemma 7. If all eigenvalues of B ∈ R^{2ℓ×2ℓ} have geometric multiplicity at most 2, then there exist x, y ∈ R^{2ℓ} such that [K_ℓ(B, x) K_ℓ(B, y)] is nonsingular.

Proof. Since every eigenvalue of B has geometric multiplicity at most 2, there is a nonsingular S ∈ R^{2ℓ×2ℓ} such that S^{-1}BS = J_1 ⊕ J_2, with J_1 ∈ R^{m_1×m_1} and J_2 ∈ R^{m_2×m_2} (m_1 ≥ m_2, m_1 + m_2 = 2ℓ) nonderogatory. Note that the matrix B is allowed to be nonderogatory: in this case S^{-1}BS = J_1, m_2 = 0, and J_2 is empty. Since J_1 and J_2 are nonderogatory matrices, they are similar (via real arithmetic) to the left companion matrices C_1 ∈ R^{m_1×m_1} and C_2 ∈ R^{m_2×m_2} of their characteristic polynomials, respectively. Hence there exists a nonsingular W ∈ R^{2ℓ×2ℓ} such that W^{-1}BW = C_1 ⊕ C_2 =: C. If m_2 = 0, then C := C_1. It suffices to prove that there exist u, v ∈ R^{2ℓ} such that M = [K_ℓ(C, u) K_ℓ(C, v)] is nonsingular, because we then get the desired result by taking x = W u and y = W v.
If m_1 = m_2 or m_2 = 0, then u = e_1 and v = e_{ℓ+1} yield M = I_{2ℓ} and we are done. If m_1 > m_2 > 0, we let u = e_1 and v = e_{ℓ−m_2+1} + e_{m_1+1}. Then direct calculations show that, up to a permutation of its columns, M is block triangular with nonsingular diagonal blocks, the only unspecified entries forming a block of size m_2 × (ℓ − m_2). It is now easy to see that M has full column rank, and thus is nonsingular.
Finally, we provide a lemma that can be seen as a block generalization of Lemmas 6 and 7.
Lemma 8. If all eigenvalues of A ∈ F^{kℓ×kℓ} have geometric multiplicity at most k, then there exists X ∈ F^{kℓ×k} such that K_ℓ(A, X) is nonsingular.
Proof. We handle the real and complex cases simultaneously. Let A = ZTZ^{-1} be the decomposition from Theorem 2 or Theorem 3 and denote the diagonal blocks by T_ii, i = 1 : r. For each T_ii we define W_i in the following way. If T_ii is of size ℓ × ℓ, take W_i to be the ℓ × 1 vector in Lemma 6 such that K_ℓ(T_ii, W_i) is nonsingular, and if T_ii is of size 2ℓ × 2ℓ, take W_i to be the 2ℓ × 2 matrix whose columns are the two real vectors in Lemma 7. Letting W = W_1 ⊕ W_2 ⊕ · · · ⊕ W_r and X = ZW yields K_ℓ(A, X) = Z K_ℓ(T, W), which is of full rank.

3.2. Reduced forms.
For a given matrix polynomial with nonsingular leading matrix coefficient and monic linearization λI − A, we now discuss how to construct a matrix X such that properties (i) and (ii) in Theorem 1 hold.

3.2.1. Block-triangular form.
For the reduction to (block-) triangular form we have the following result.
Proposition 9. Let s_1, ..., s_k be positive integers such that s_1 + · · · + s_k = n, and let T be block upper triangular with diagonal blocks T_ii ∈ F^{s_iℓ×s_iℓ}, where T_ii has no eigenvalues of geometric multiplicity more than s_i for i = 1, ..., k. If A ∈ F^{nℓ×nℓ} is similar to T, then there exists X = [X_1 X_2 · · · X_k] with X_i ∈ F^{nℓ×s_i} such that S = K_ℓ(A, X) is nonsingular and K_ℓ(A, X_{1:i}) is A-invariant for i = 1, ..., k.

Proof. By Lemma 8, we can for each i = 1, ..., k choose V_i ∈ F^{s_iℓ×s_i} such that K_ℓ(T_ii, V_i) is nonsingular, and we set V = V_1 ⊕ V_2 ⊕ · · · ⊕ V_k. Let Z be a nonsingular matrix such that Z^{-1}AZ = T and put X = ZV. Then S = K_ℓ(A, X) = Z K_ℓ(T, V) is nonsingular. In addition, if σ_i = s_1 + · · · + s_i and W_i = V(:, 1 : σ_i), then K_ℓ(A, X_{1:i}) = Z K_ℓ(T, W_i). Since the columns of T^j W_i are also columns of K_ℓ(T, W_i) for j < ℓ, we only have to show that there is a matrix R such that T^ℓ W_i = K_ℓ(T, W_i) R. If T̃_i is the submatrix of T formed by its σ_iℓ first rows and columns and Ṽ_i = V_1 ⊕ · · · ⊕ V_i, then W_i = [Ṽ_i; 0], so T^ℓ W_i = [T̃_i^ℓ Ṽ_i; 0] and R = K_ℓ(T̃_i, Ṽ_i)^{-1} T̃_i^ℓ Ṽ_i works, as desired.
Remark 10. The proof of Proposition 9 provides a practical means to construct X. From the proof we see that the columns of K_ℓ(A, X_{1:i}) must form a basis for the invariant subspace of A corresponding to the eigenvalues of T_11, T_22, ..., T_ii.
We now explain why the MATLAB M-file in Figure 1 successfully reduced P(λ) to triangular form (see the left plot of Figure 2). Since the coefficients are generated randomly, the eigenvalues are all distinct with probability one. Therefore, MATLAB's schur function computes a Schur decomposition C_P = ZTZ^H, where Z^H = Z^{-1} and the ℓ × ℓ diagonal blocks of T are all nonderogatory. Thus each K_ℓ(T_ii, V_i) ∈ F^{ℓ×ℓ} becomes nonsingular by taking each V_i ∈ F^{ℓ×1} to be a vector of ones (almost any random vector would do). Hence X := Z(V_1 ⊕ V_2 ⊕ · · · ⊕ V_k) is as in Proposition 9, and so conditions (i) and (ii)(b) in Theorem 1 are fulfilled.
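Since the M-file of Figure 1 is not reproduced here, the following Python sketch (an assumed reimplementation, not the paper's code; it uses scipy.linalg.schur for the complex Schur form and one particular companion convention, with the coefficient blocks in the last block column) carries out the same construction for a random monic quadratic and checks that the recovered coefficients R_0, R_1 are upper triangular:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
n, ell = 3, 2
P0, P1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
# companion linearization of the monic quadratic P(w) = w^2 I + w P1 + P0
A = np.block([[np.zeros((n, n)), -P0],
              [np.eye(n),        -P1]])
T, Z = schur(A, output='complex')           # A = Z T Z^H; eigenvalues distinct a.s.
V = np.kron(np.eye(n), np.ones((ell, 1)))   # each V_i a vector of ones
X = Z @ V                                   # X = Z (V_1 (+) ... (+) V_n), direct sum
S = np.hstack([X, A @ X])                   # structure-preserving similarity S = K_2(A, X)
M = np.linalg.solve(S, A @ S)               # left companion matrix of R
R0, R1 = -M[:n, n:], -M[n:, n:]             # R(w) = w^2 I + w R1 + R0
assert np.allclose(np.tril(R0, -1), 0, atol=1e-6)   # R is upper triangular
assert np.allclose(np.tril(R1, -1), 0, atol=1e-6)
```

Each K_2(A, X_{1:i}) spans the invariant subspace of the leading 2i Schur vectors, which is exactly why the recovered quadratic is triangular.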

3.2.2. Block-diagonal form.
For the reduction to block-diagonal form we have the following result.
Proposition 11. Let A ∈ F^{nℓ×nℓ} and assume that for some nonsingular Z, (14) Z^{-1}AZ = D_11 ⊕ D_22 ⊕ · · · ⊕ D_kk, where D_ii ∈ F^{s_iℓ×s_iℓ} has eigenvalues of geometric multiplicity at most s_i ∈ N, i = 1, ..., k, with s_1 + · · · + s_k = n. Then there exists X = [X_1 X_2 · · · X_k] with X_i ∈ F^{nℓ×s_i} such that S = K_ℓ(A, X) is nonsingular and K_ℓ(A, X_i) is A-invariant for i = 1, ..., k.
The proof is similar to that of Proposition 9 and is omitted. We have the following analog to Remark 10.
Remark 12. With the notation of Proposition 11, the columns of K_ℓ(A, X_i) form a basis for the invariant subspace of A corresponding to the eigenvalues of D_ii.
We now explain how the diagonalization corresponding to the middle plot of Figure 2 was achieved. The eigenvalues are again all distinct (with probability one), and the eig function computes Λ, Z such that C_P = ZΛZ^{-1} is an eigenvalue decomposition with Λ diagonal. Thus, by taking each V_i ∈ F^{ℓ×1} to be a vector of ones and letting X := Z(V_1 ⊕ V_2 ⊕ · · · ⊕ V_k), conditions (i) and (ii)(a) in Theorem 1 are satisfied.
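In the same spirit, the following sketch (assumed, not the paper's code; same hypothetical companion convention as before) performs the diagonalization with numpy.linalg.eig: each column x_i of X sums a pair of eigenvectors, so span{x_i, Ax_i} is A-invariant and S^{-1}AS becomes the left companion matrix of a diagonal quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
P0, P1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
A = np.block([[np.zeros((n, n)), -P0],
              [np.eye(n),        -P1]])
lam, Z = np.linalg.eig(A)                # A = Z diag(lam) Z^{-1}; distinct eigenvalues a.s.
V = np.kron(np.eye(n), np.ones((2, 1)))  # pair consecutive eigenvectors: x_i = z_{2i-1}+z_{2i}
X = Z @ V
S = np.hstack([X, A @ X])                # S = K_2(A, X)
M = np.linalg.solve(S, A @ S)            # left companion matrix of a *diagonal* quadratic
D0, D1 = -M[:n, n:], -M[n:, n:]
assert np.allclose(M[n:, :n], np.eye(n), atol=1e-6)
assert np.allclose(D0, np.diag(np.diag(D0)), atol=1e-6)   # D0 and D1 are diagonal
assert np.allclose(D1, np.diag(np.diag(D1)), atol=1e-6)
```

Each diagonal entry pair satisfies (D1)_ii = −(λ_a + λ_b) and (D0)_ii = λ_a λ_b for the two eigenvalues assigned to the ith scalar quadratic, so the eigenstructure is preserved.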
Clearly the number of blocks in the decomposition (14) of Proposition 11 is not arbitrary. Indeed, the linear matrix polynomial λI − J_α, where J_α is a single 2 × 2 Jordan block with eigenvalue α, cannot be reduced to a block-diagonal structure with smaller block sizes. Further, since λI − J_α is a linearization of the scalar polynomial (λ − α)^2, it may also be the case that for matrix polynomials of degree ℓ > 1 the block sizes of a block-diagonal form cannot be reduced. Let λI − A be a linearization of P(λ) ∈ F[λ]^{n×n} in (1). From Theorem 1 and Proposition 11, we see that a reduction to diagonal form is possible if we can partition the Jordan blocks associated with A into n sets such that (a) each set has at most one Jordan block of each eigenvalue, and (b) the sizes of all Jordan blocks in each set sum up to ℓ. The result also holds in the opposite direction, that is, it is possible to reduce P(λ) to diagonal form only if we can partition the Jordan blocks of A such that (a) and (b) hold. To see this, we simply note that any diagonal monic matrix polynomial D(λ) = d_1(λ) ⊕ d_2(λ) ⊕ · · · ⊕ d_n(λ) has a left companion linearization in controller form. The following question arises: When is it possible to partition the Jordan blocks such that (a) and (b) are satisfied? This problem was solved by Lancaster and Zaballa [10] for the special case of quadratic matrix polynomials with nonsingular leading matrix coefficient, and by Zúñiga Anaya [20] for general regular quadratics. For matrix polynomials of higher degree the problem is still open.

3.2.3. Hessenberg form.
For the reduction to Hessenberg form we have the following result.
Proposition 13. Let A ∈ F^{nℓ×nℓ}, and let Z be a nonsingular matrix such that Z^{-1}AZ = H, where H is upper Hessenberg and partitioned in ℓ × ℓ blocks. Assume that the ℓ × ℓ diagonal blocks are unreduced, that is, their subdiagonal entries are all nonzero. If we let V = [e_1 e_{ℓ+1} · · · e_{(n−1)ℓ+1}] and X = ZV ∈ F^{nℓ×n}, then K_ℓ(A, X) is nonsingular and A^ℓ x_i ∈ K_ℓ(A, x_{1:i+1}) for i = 1, ..., n − 1.
Proof. We have K_ℓ(A, X) = Z K_ℓ(H, V), which is obviously nonsingular. Furthermore, if v_i and x_i are the ith columns of V and X, respectively, then A^ℓ x_i = Z H^ℓ v_i and, because the ℓ × ℓ diagonal blocks of H are unreduced, H^ℓ v_i lies in the column space of K_ℓ(H, v_{1:i+1}); hence A^ℓ x_i ∈ K_ℓ(A, x_{1:i+1}), completing the proof.
In practice we are interested in Hessenberg decompositions A = UHU^H, where U is unitary or real orthogonal, depending on whether we work over C or R. By the implicit Q-theorem [5, Thm. 7.4.2], the Hessenberg matrix H is uniquely defined, up to products by real or complex numbers of absolute value 1, by the first column of U. Hence a random Hessenberg matrix similar to A via unitary/real orthogonal transformations can be constructed using, e.g., the Arnoldi algorithm with a random starting vector (or equivalently, the standard Hessenberg reduction [5, sect. 7.4.2] applied to QAQ^H, where Q is a random orthogonal matrix). If a matrix has distinct eigenvalues, the resulting Hessenberg matrix will be unreduced with probability one. Since this is the generic case for matrix polynomials, Proposition 13 may be used to reduce almost all matrix polynomials to Hessenberg form without further care. This is how the right plot of Figure 2 was obtained.
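The Arnoldi construction mentioned here is easy to sketch. The helper below (hypothetical name; plain modified Gram-Schmidt in Python/NumPy) runs full Arnoldi with a random start and checks that the resulting H is unreduced upper Hessenberg and orthogonally similar to A:

```python
import numpy as np

def arnoldi(A, v):
    """Full Arnoldi (modified Gram-Schmidt): A @ Q = Q @ H with Q orthogonal
    and H upper Hessenberg; H is unreduced when no breakdown occurs."""
    m = A.shape[0]
    Q, H = np.zeros((m, m)), np.zeros((m, m))
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
Q, H = arnoldi(A, rng.standard_normal(6))
assert np.allclose(Q.T @ A @ Q, H, atol=1e-8)   # orthogonal similarity to Hessenberg form
assert np.allclose(np.tril(H, -2), 0)           # Hessenberg structure
assert np.all(np.abs(np.diag(H, -1)) > 1e-12)   # unreduced: the generic case for random data
```

A breakdown (a zero subdiagonal entry) occurs with probability zero for a random starting vector, matching the genericity argument in the text.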
If a matrix, on the other hand, has an eigenvalue of geometric multiplicity greater than one, then any similar Hessenberg matrix is necessarily reduced. Now, according to Proposition 13 the reduction of P(λ) to Hessenberg form is still valid if H is reduced, as long as the diagonal ℓ × ℓ blocks are unreduced. This means that all zeros on the subdiagonal must lie in some of the positions (ℓ+1, ℓ), (2ℓ+1, 2ℓ), ..., ((n−1)ℓ+1, (n−1)ℓ). If H has a zero in any other position on the subdiagonal (that is, if some Hessenberg diagonal block is reduced), K_ℓ(A, X) becomes singular and the reduction will fail with the matrix X selected in the statement of Proposition 13. This raises the following question: Is it possible to move zeros on the subdiagonal, from unwanted to wanted positions, using a finite number of Givens rotations or Householder reflectors? Intuitively, this should not be possible since if moving a zero were possible, then we could change the number of "deflated" eigenvalues; a rigorous argument can be found in [15, pp. 104-105].
Proposition 14. The matrices X of Propositions 9 and 13 can be taken to have orthonormal columns.
Proof. Let X be the matrix constructed in the proof of either Proposition 9 or Proposition 13, and let X = QR be a QR factorization of X. Since X has full column rank, R is nonsingular and Q = XR^{-1}. Since R is upper triangular, Q_{1:i} = X_{1:i} R_i^{-1}, where R_i is the leading principal i × i submatrix of R. Therefore, K_ℓ(A, X_{1:i}) and K_ℓ(A, Q_{1:i}) have the same column space, i = 1, ..., k, and so if X is the matrix of Proposition 9, then K_ℓ(A, Q_{1:i}) is A-invariant. Similarly, the ith columns of X and Q are x_i = Q_{1:i} r_i and q_i = X_{1:i} u_i, respectively, where r_i and u_i are the last columns of R_i and R_i^{-1}, respectively. A simple induction argument shows that A^ℓ x_i ∈ K_ℓ(A, x_{1:i+1}) if and only if A^ℓ q_i ∈ K_ℓ(A, q_{1:i+1}).
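The invariance of the Krylov column spaces under this QR normalization can be checked numerically. A small sketch (an assumed illustration with random data and ℓ = 2, not tied to any particular reduction):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 8))
X = rng.standard_normal((8, 4))
Q, R = np.linalg.qr(X)                         # X = QR with R upper triangular
for i in range(1, 5):
    KX = np.hstack([X[:, :i], A @ X[:, :i]])   # K_2(A, X_{1:i})
    KQ = np.hstack([Q[:, :i], A @ Q[:, :i]])   # K_2(A, Q_{1:i})
    # Q_{1:i} = X_{1:i} R_i^{-1}, so both Krylov matrices span the same space:
    assert np.linalg.matrix_rank(np.hstack([KX, KQ])) == np.linalg.matrix_rank(KX)
```

The rank test verifies that appending the columns of K_2(A, Q_{1:i}) adds nothing to the span of K_2(A, X_{1:i}), which is the content of the first half of the proof.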
In practice, using a matrix X with orthonormal columns to construct S = K (A, X) may result in a more reliable way of computing S −1 AS to obtain the left companion matrix of a block-triangular or Hessenberg matrix polynomial equivalent to P (λ). We note that while we have discussed (when possible) how to compute X in a stable manner, finding the reduced matrix polynomial R(λ) appears to require further computing S −1 AS. As mentioned in the introduction, we leave the stable computation of R(λ) as an open problem.
4. Stable computation of a special Schur form for quadratic matrix polynomials. A procedure was exhibited in section 3.1.2 to compute the Schur decompositions in Theorems 2 and 3 in a numerically stable manner when all eigenvalues have algebraic multiplicity at most n. In this section we aim to complete the study to cover the case of eigenvalues of arbitrary algebraic multiplicity for quadratic matrix polynomials, that is, when ℓ = 2. Recall that the eigenvalues of any linearization of P(λ) in (1) have geometric multiplicity at most n [4, Thm. 1.7], but they may have algebraic multiplicity greater than n. We will assume in this section that one eigenvalue (and only one, because ℓ = 2) has algebraic multiplicity greater than n. It must be real (again because ℓ = 2) and will be denoted by α.
We collect key tools in three lemmas, which all have algorithmic proofs. These proofs rely on the possibility of computing the geometric multiplicity of α; equivalently, computing rank (A − α I). Since we are dealing with a very ungeneric case, this is not an unreasonable assumption. Our goal is to follow a procedure similar to that of section 3.1.2: we start with a computed Schur form of a linearization of P (λ) and use the Bai-Demmel algorithm [1] to move the eigenvalues (or 2 × 2 blocks with complex conjugate eigenvalues if F = R) along the main diagonal. Now we have one eigenvalue, α, whose algebraic multiplicity is n + t with t > 0. When F = C we will pair a copy of α with an eigenvalue different from α. The corresponding 2×2 diagonal block will be nonderogatory. Once the n − t eigenvalues different from α have been used and the rows and columns associated to the corresponding 2 × 2 blocks have been constructed, we are left with a 2t × 2t triangular matrix, T 1 say, whose only eigenvalue is α. We also need the 2 × 2 diagonal blocks of T 1 to be nonderogatory. Hence α, as an eigenvalue of T 1 , must have geometric multiplicity at most t. So, our strategy will be to move the eigenvalues along the diagonal in order to pair a copy of α with an eigenvalue different from α in such a way that, when removing the rows and columns of the corresponding 2 × 2 diagonal block, the geometric multiplicity of α, as an eigenvalue of the resulting submatrix, is at most n − 1. This is Lemma 15. In this way, α as an eigenvalue of T 1 will have geometric multiplicity at most t. Then we show that any 2n × 2n matrix with α as the only eigenvalue and geometric multiplicity at most n admits a Schur form with 2 × 2 nonderogatory diagonal blocks (Lemma 16). When F = R the Schur form may have 2 × 2 diagonal blocks associated to pairs of complex conjugate eigenvalues in addition to real eigenvalues. 
In this case, we first construct 4×4 blocks with two copies of α so that, after removing the first four rows and columns, the geometric multiplicity of α as an eigenvalue of the obtained submatrix is at most n − 2. This is Lemma 17. Examples are provided at the end of the section.
Here and below, MATLAB notation is used to denote the submatrices of a given matrix. For instance, X(i 1 : i 2 , j 1 : j 2 ) is the submatrix of X ∈ F m×n formed with the i 1 through i 2 rows and the j 1 through j 2 columns and X(:, j 1 : j 2 ) = X(1 : m, j 1 : j 2 ).
In some parts of the proofs we will use "bottom-up" QR factorizations of matrices, that is, factorizations C = QR in which Q is unitary and the nonzero rows of the triangular factor R are its last rows, so that when C has more rows than columns the leading rows of R are zero. We discuss the complex and real cases in different subsections.
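A bottom-up QR factorization can be obtained from an ordinary QR factorization by reversing the rows; a sketch (hypothetical helper name, Python/NumPy):

```python
import numpy as np

def bottom_up_qr(C):
    """Factor C (m x k, m >= k) as C = Q @ R with Q orthogonal and the first
    m - k rows of R zero: reverse the rows, take an ordinary QR, reverse back."""
    m, k = C.shape
    J = np.eye(m)[::-1]                          # row-reversal permutation
    Q0, R0 = np.linalg.qr(J @ C, mode='complete')
    return J @ Q0 @ J, J @ R0                    # C = (J Q0 J)(J R0) since J J = I

rng = np.random.default_rng(5)
C = rng.standard_normal((7, 3))
Q, R = bottom_up_qr(C)
assert np.allclose(Q @ R, C)
assert np.allclose(Q.T @ Q, np.eye(7))
assert np.allclose(R[:4], 0)                     # first m - k = 4 rows of R are zero
```

The zero leading rows of R are exactly the property exploited below, where the first row of R being zero lets one peel off a row and column without changing a rank.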
4.1. The complex case. We start by proving two lemmas.
Lemma 15. Assume that A ∈ C^{2n×2n} (n ≥ 2) has at least two distinct eigenvalues α and β, with α of geometric multiplicity at most n. Then there exists a Schur form of A, A = UTU^H, such that (i) T(1:2, 1:2) = [β *; 0 α], and (ii) the geometric multiplicity of α as an eigenvalue of T(3:2n, 3:2n) is at most n − 1.
Proof. Let T be a Schur form of A. By using, if necessary, the Bai-Demmel algorithm [1], we can assume that the diagonal entries of T are ordered so that T(1:2, 1:2) is as in (i). Then condition (ii) necessarily holds if the geometric multiplicity of α as an eigenvalue of A is less than n. Hence below we suppose that it is equal to n.
Let m 1 ≥ m 2 ≥ · · · ≥ m n be the partial multiplicities of α as an eigenvalue of A (that is, the sizes of the Jordan blocks) and s = m 1 + · · · + m n . Since A has at least two distinct eigenvalues, n ≤ s < 2n and, given that the geometric multiplicity of α is n and s < 2n, we have m n = 1. We aim to detect one eigenvalue α in the diagonal of T associated with a Jordan block of size 1. This will be the copy of α to be placed in T (1 : 2, 1 : 2).
We use again the Bai-Demmel algorithm [1] to reorder the diagonal of T (2 : 2n, 2 : 2n) so that in the new matrix T 0 the s copies of the eigenvalue α appear in the submatrix T 1 = T 0 (2 : s + 1, 2 : s + 1). Observe that T 0 is still a Schur form of A.
Thus α is the only eigenvalue of T_1; recall that we are assuming that its geometric multiplicity is n. Let Q_1 ∈ C^{s×n} be a matrix whose columns are an orthonormal basis of Ker(T_1 − αI_s) and complete Q_1 to a unitary matrix Q = [Q_1 Q̂_1] ∈ C^{s×s} (using a full QR factorization of Q_1, for example). Then we have (16) Q^H (T_1 − αI_s) Q = [0 C; 0 B] for some B ∈ F^{(s−n)×(s−n)} and C ∈ F^{n×(s−n)}. Let C = Q_2 R be a bottom-up QR factorization of C. Since s < 2n, C has more rows than columns and so the entries of the first row of R are all zero. Hence, conjugating T_0 by the unitary matrix 1 ⊕ Q diag(Q_2, I_{s−n}) ⊕ I_{2n−s−1} and reducing the trailing blocks back to triangular form, we obtain a Schur decomposition T_2 of A in which α is not an eigenvalue of the trailing block T_2(s+2 : 2n, s+2 : 2n). Notice that the first row of R is only multiplied on the right by unitary matrices in this last reduction, so it remains zero, and hence the first row of T_2(2 : s+1, 2 : s+1) − αI_s is also zero. Therefore, rank(T_2(2:s+1, 2:s+1) − αI_s) = rank(T_2(3:s+1, 3:s+1) − αI_{s−1}). Since T_0 and T_2 are similar, the geometric multiplicity of α as an eigenvalue of T_2(2:s+1, 2:s+1) is n and so s − n = rank(T_2(2:s+1, 2:s+1) − αI_s) = rank(T_2(3:s+1, 3:s+1) − αI_{s−1}). This means that null(T_2(3:s+1, 3:s+1) − αI_{s−1}) = n − 1.
That is to say, the geometric multiplicity of α as an eigenvalue of T_2(3:s+1, 3:s+1) is n − 1, and T_2 satisfies conditions (i) and (ii).
We note that the structure in (16) is the first step of a proof of the Jordan canonical form (e.g., [18, sect. 2.4]), and a further reduction of B establishes the Weyr characteristics [13], leading to the Weyr canonical form.
The next lemma is needed for dealing with a matrix with only one real eigenvalue.
Lemma 16. Let A ∈ F 2n×2n (F = R or C) be upper triangular with zero diagonal entries and assume that the geometric multiplicity of the zero eigenvalue is at most n. Then there exists a unitary U (orthogonal if F = R) such that U H AU is upper triangular with nonderogatory 2 × 2 diagonal blocks.
Proof. Notice that the hypothesis on the geometric multiplicity of zero as an eigenvalue of A is equivalent to rank(A) ≥ n. We will assume F = C; the proof for the real case is the same with unitary matrices replaced by orthogonal ones.
We use induction on n. For n = 1, rank(A) ≥ n implies that A = [0 a_12; 0 0] with a_12 ≠ 0, that is, A is nonderogatory. Suppose the result holds for n − 1. Let A ∈ C^{2n×2n} be upper triangular with zero diagonal and rank(A) ≥ n. If a_12 = 0, then we can unitarily transform A so that its (1,2) entry becomes nonzero as follows. Use a sequence of Givens rotations G to transform the first nonzero column of A, say Ae_m, m ≥ 2, to a multiple of e_1. Then G^H A G is still upper triangular with zero diagonal, its first m − 1 columns equal to zero, and its mth column equal to a multiple of e_1. Then we move the (1, m) nonzero entry to the (1,2) position with a permutation P_{2,m}, where P_{2,m} swaps the second and mth rows/columns of G^H A G. The resulting matrix P_{2,m}^H G^H A G P_{2,m} is still upper triangular. Hence below we assume that a_12 ≠ 0 in A.
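The column-relocation step just described can be sketched as follows (Python/NumPy; the helper name is hypothetical, and a single Householder reflector plays the role of the sequence of Givens rotations G):

```python
import numpy as np

def move_first_nonzero_column(A):
    """For a nonzero strictly upper triangular A with A[0, 1] == 0, build an
    orthogonal Q such that B = Q.T @ A @ Q is still strictly upper triangular
    and B[0, 1] != 0 (a reflector stands in for the Givens sequence)."""
    N = A.shape[0]
    m = next(j for j in range(N) if np.linalg.norm(A[:, j]) > 0)  # first nonzero column
    c = A[:m, m]                       # its nonzero part lies above the diagonal
    v = c - np.linalg.norm(c) * np.eye(m)[:, 0]
    U = np.eye(m) if np.linalg.norm(v) < 1e-14 else np.eye(m) - 2 * np.outer(v, v) / (v @ v)
    G = np.eye(N)
    G[:m, :m] = U                      # after G.T A G, column m is a multiple of e_1
    perm = np.arange(N)
    perm[1], perm[m] = perm[m], perm[1]
    P = np.eye(N)[:, perm]             # swap the 2nd and mth rows/columns (1-based)
    Q = G @ P
    return Q, Q.T @ A @ Q

A = np.zeros((5, 5))
A[0, 2], A[1, 2], A[0, 3], A[2, 4] = 3.0, 4.0, 1.0, 2.0   # strictly upper, A[0, 1] = 0
Q, B = move_first_nonzero_column(A)
assert np.allclose(np.tril(B), 0)      # still strictly upper triangular, zero diagonal
assert abs(B[0, 1]) > 1e-10            # the (1, 2) entry is now nonzero
assert np.allclose(Q.T @ Q, np.eye(5))
```

Because the first m − 1 columns of A are zero, the transformation acting on coordinates 1 through m − 1 cannot create entries below the diagonal, which is why the triangular structure survives.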
To use the induction hypothesis, we need to make sure that A_22 := A(3:2n, 3:2n) is upper triangular with rank(A_22) ≥ n − 1, that is, that the geometric multiplicity of its eigenvalue zero is at most n − 1. Since this multiplicity cannot be greater than n, we have rank(A_22) ≥ n − 2, and so care is needed only when rank(A_22) = n − 2. Notice first that, in this case, n ≥ null(A) ≥ null(A_22) = n and so rank(A) = null(A) = n. Hence the geometric multiplicity of the eigenvalue zero in A is n. Now, rank(A) = n and rank(A_22) = n − 2 imply that the second row of A cannot be zero. In order to unitarily transform A so that rank(A_22) = n − 1, we can use the same technique as in the proof of Lemma 15: since the geometric multiplicity of the eigenvalue zero is n, we can unitarily transform A into Ã = Q^H A Q with Q = [1] ⊕ Q_1, so that Ã is upper triangular and the first n columns of Ã(2:2n, 2:2n) become zero. Notice that the first column of A(2:2n, 2:2n) is zero and so e_1 can be taken as the first column of Q_1; in other words, Q can be chosen of the form Q = I_2 ⊕ Q_1 with Q_1 a unitary matrix of order 2n − 2. Bearing in mind that the second row of A is not zero, we conclude that the second row of Ã (and so the first row of the corresponding block C in the partition of (16)) is not zero.
Assume now that rank([C(2:n, :); B]) = n − 2. Since the size of C is n × (n − 1), its rank is at most n − 1, so there exists a row of C, say the kth, that depends linearly on the remaining rows. But the first row of C is not zero. Hence k can be chosen such that 1 < k ≤ n + 1. We can then apply a Givens rotation G in the plane (1, k) to change the kth row of C in such a way that, if C̃ denotes the block replacing C after the similarity by G, then rank([C̃(2:n, :); B]) = n − 1 and the induction hypothesis applies.

We are now ready to describe an algorithm that stably computes the Schur form in Theorem 2 when ℓ = 2. Let A = UTU^H be any computed Schur decomposition of the matrix A in Theorem 2, and suppose that some of the 2 × 2 diagonal blocks of T are derogatory.
If all eigenvalues have algebraic multiplicity at most n, then we can reorder the diagonal entries of T using the Bai-Demmel algorithm [1], as was discussed in section 3.1.2. Thus assume that the eigenvalue α has algebraic multiplicity n + t with 1 ≤ t ≤ n. If t = n, we use the procedure described in the proof of Lemma 16 to further unitarily reduce T − αI to an upper triangular matrix T 1 with 2 × 2 nonderogatory diagonal blocks. T 1 + αI is the desired Schur form of A.
If t < n, note that all other eigenvalues must have algebraic multiplicity less than n. By using the Bai-Demmel algorithm [1] we pair as many copies of α as possible with eigenvalues other than α, thereby forming nonderogatory blocks in the top-left corner of T. In doing so, we use Lemma 15 to ensure that the resulting 2t × 2t bottom-right corner of T has eigenvalue α with geometric multiplicity no larger than t. Thus we are left with a block triangular matrix whose bottom-right block T_22 ∈ C^{2t×2t} contains 2 × 2 diagonal blocks with eigenvalue α and satisfies rank(T_22 − αI) ≥ t. Lemma 16 is then applied to T_22 − αI as above to obtain a unitarily similar upper triangular matrix with nonderogatory 2 × 2 diagonal blocks.

4.2. The real case.
To describe an algorithm that works in real arithmetic and computes the Schur decomposition in Theorem 3, we need a real version of Lemma 15.
Lemma 17. Let A ∈ R^{2n×2n} (n ≥ 2) and suppose that the spectrum of A contains a pair of nonreal complex conjugate eigenvalues a ± ib and a real eigenvalue α of geometric multiplicity at most n and algebraic multiplicity greater than n. Then there exists a Schur form T of A such that (i) T(1:2, 1:2) has eigenvalues a ± ib and T(3:4, 3:4) = [α *; 0 α], and (ii) the geometric multiplicity of α as an eigenvalue of T(5:2n, 5:2n) is at most n − 2.

Proof. Let s ≤ 2n − 2 and k ≤ n be the algebraic and geometric multiplicities of α as an eigenvalue of A. Consider an arbitrary real Schur form of A and reorder its diagonal blocks, using the Bai-Demmel algorithm [1], so as to obtain a Schur form T of A such that the leading 2 × 2 block is as in (i) and α appears on the diagonal of X = T(3 : s+2, 3 : s+2). Thus T is block upper triangular with diagonal blocks T(1:2, 1:2), X, and Y, and α is not an eigenvalue of Y. Then T satisfies condition (ii) of the lemma if and only if the geometric multiplicity of α as an eigenvalue of X(3:s, 3:s) is at most n − 2.
Hence it is enough to show how to construct a Schur form T_1 of X such that the geometric multiplicity of α as an eigenvalue of T_1(3:s, 3:s) is at most n − 2. In fact, if Q_1 is an orthogonal matrix such that Q_1^T X Q_1 = T_1 and Q = I_2 ⊕ Q_1 ⊕ I_{2n−s−2}, then Q^T T Q satisfies conditions (i) and (ii). In what follows we show how to obtain the desired Schur form T_1 of X.
First, the geometric multiplicity of α as an eigenvalue of X is the same as that of α as an eigenvalue of A, which we are assuming to be k. Then dim Ker(X − αI_s) = k ≤ n, and if k ≤ n − 2, then dim Ker(X(3:s, 3:s) − αI_{s−2}) ≤ k ≤ n − 2. This means that if T_1 = X, then the geometric multiplicity of α as an eigenvalue of T_1(3:s, 3:s) is not greater than n − 2. Hence we only have to analyze the cases k = n and k = n − 1. In both cases we proceed as in Lemma 15: we find an orthogonal Q_0 such that Q_0^T (X − αI_s) Q_0 = [0 C; 0 B] is as in (16) and compute a bottom-up QR factorization of C, C = Q_1 R.
Recall that s ≤ 2n − 2 and for k ≤ n − 2 the lemma has been already proved. We split the remaining possibilities into three different cases. Each case requires a different proof.
(i) If k = n, then the size of R is n × (s − n) and, since s ≤ 2n − 2, its first two rows are zero. Hence s − n = rank(T_1 − αI_s) = rank(T_1(3:s, 3:s) − αI_{s−2}), so the geometric multiplicity of α as an eigenvalue of T_1(3:s, 3:s) is (s − 2) − (s − n) = n − 2.

(ii) If k = n − 1 and s < 2n − 2, then null(X − αI_s) = n − 1, the size of R is (n − 1) × (s − n + 1), and s − n + 1 < 2n − 2 − n + 1 = n − 1. Then the first row of R is zero and, since s − n + 1 = rank(T_1 − αI_s) = rank(T_1(2:s, 2:s) − αI_{s−1}), it follows that the geometric multiplicity of α as an eigenvalue of T_1(3:s, 3:s) is at most n − 2.

(iii) If k = n − 1 and s = 2n − 2, then X − αI_s fulfils the hypothesis of Lemma 16 because this matrix is nilpotent, its size is 2(n − 1), and the geometric multiplicity of its eigenvalue zero is n − 1. Following the procedure designed in the proof of that lemma, an orthogonal matrix U can be obtained such that U^T (X − αI_{2n−2}) U is upper triangular with nonderogatory 2 × 2 diagonal blocks. Therefore, T_1 = U^T X U is a Schur form of X such that the geometric multiplicity of α as an eigenvalue of T_1(3:s, 3:s) is at most n − 2, as desired.
We now have the artillery to describe a stable algorithm that computes the Schur form of A in Theorem 3 for ℓ = 2. Suppose a real Schur form of A is given. Any 2 × 2 block on the diagonal associated with a pair of nonreal complex conjugate eigenvalues is obviously nonderogatory, so we need only take care of the real eigenvalues. The case when all eigenvalues have algebraic multiplicity at most n was discussed in section 3. In the real case with ℓ = 2 there cannot be nonreal complex eigenvalues of algebraic multiplicity greater than n. Hence we only have to deal with the case when exactly one real eigenvalue α has algebraic multiplicity greater than n. We first use Lemma 17 as many times as possible, that is, we pair two copies of α with as many pairs of nonreal complex conjugate eigenvalues as possible. After doing this we are left with real eigenvalues only. Then Lemmas 15 and 16 can be used as in the complex case to get a Schur form of A with all its diagonal blocks either nonderogatory of size 2 × 2 or of size 4 × 4 with eigenvalues whose geometric multiplicity is at most two.
We illustrate this process in the following long but complete example.
Example 18. Let n = 4, ℓ = 2, and let A be the following matrix in real Schur form, where a is either 1 or 2. The distinct eigenvalues of A are 1, i, and −i when a = 1, and 1, 2, i, and −i when a = 2. In addition, the algebraic multiplicity of 1 as an eigenvalue of A is 5 or 6 according as a = 2 or a = 1; in both cases it is greater than n = 4. Let us also assume that the geometric multiplicity of the eigenvalue 1 is 4.

Assume first that a = 2. Under these conditions we use the Bai-Demmel algorithm to exchange the appropriate diagonal blocks so that a copy of the eigenvalue 1 is paired with the entry 2 in position (8,8). The resulting matrix is T_4, in which the trailing 2 × 2 block C is similar to [1 d; 0 1], so it is nonderogatory. If C is itself upper triangular, then T_4 is the desired matrix. Otherwise we perform Step 5: reduce C to upper triangular form by an orthogonal similarity and apply it to the last two rows and columns of T_4 to obtain T_5. Then T_5 is a real Schur form of A with two nonderogatory blocks of size 2 × 2 and one block of size 4 × 4 on the diagonal. The geometric multiplicity of 1 as an eigenvalue of T_5(1:4, 1:4) is two; thus we have a desired Schur form for a = 2.

Assume now that a = 1. Since in this case the entries in positions (7,7) and (8,8) are both equal to 1, there is no need to exchange these diagonal elements. We put T_2 = T_1 and go straight to the next step.
Step 3. Let X = T_2(3:8, 3:8) and let the columns of Q_1 be an orthonormal basis of Ker(X − I_6); Q_1 is a 6 × 4 matrix with orthonormal columns. We can complete it to a 6 × 6 orthogonal matrix Q̃_1 such that Q̃_1^T (X − I_6) Q̃_1 has its first four columns zero, with trailing block T_C = [1 r_32; 0 1]. Compute a bottom-up QR factorization C = Q_2 R and define Q_3 = diag(I_2, Q̃_1 Q̃_2) with Q̃_2 = diag(Q_2, I_2). In the resulting matrix T_3 we have r_21 r_12 ≠ 0 or r_21 r_32 ≠ 0 (that is, r_21 ≠ 0 and at least one of r_12 or r_32 is not zero), because otherwise null(T_3 − I) > 4. Thus T_3 is a real Schur form of A which satisfies conditions (i) and (ii) of Lemma 17.
Step 4. Deflate T_3(1:4, 1:4) and pay attention to Y = T_3(5:8, 5:8). This is a 4 × 4 real matrix whose only eigenvalue is 1, with algebraic multiplicity 4 and geometric multiplicity 2 because r_21 r_12 ≠ 0 or r_21 r_32 ≠ 0. We use the proof of Lemma 16 to get a real Schur form with two nonderogatory blocks of size 2 × 2 on the diagonal.
First, we define Z = Y − I_4, which is a nilpotent matrix, and notice that if S is a real Schur form of Z, then S + I_4 is a real Schur form of Y. Now we apply the method proposed in the proof of Lemma 16: observe that the first nonzero column of Z is the third one, so we use a Givens rotation in order to replace that column by a multiple of e_1. In the present case a permutation P_1 of the first and second rows and columns suffices. Next, we permute the second and third rows and columns (with a permutation P_2) to obtain Z_1. With the notation of the proof of Lemma 16, A_22 = [0 r_12; 0 0] and n = 2. Thus, if r_12 ≠ 0, then rank(A_22) = 1 = n − 1 and no further transformation is needed on Z_1, because it is upper triangular with 2 × 2 nonderogatory diagonal blocks. But if r_12 = 0, then rank(A_22) = 0 = n − 2 and one additional transformation is needed. In fact, as shown above and in the proof of Lemma 16, r_32 ≠ 0, and we can perform a Givens rotation on rows and columns two and three to place nonzero elements simultaneously in positions (1,2) and (3,4). Summarizing, there is an orthogonal matrix Q = Q_1 Q_3 Q_4, with Q_4 = I_4 ⊕ P_1 P_2 G, where G is an appropriate Givens rotation (the identity if r_12 ≠ 0), such that Q^T A Q is a real Schur form of the desired type.

5. Parameterized linear systems. We consider parameterized linear systems of the form (18) P(ω)x = b(ω), x = x(ω).
These types of systems appear when computing numerical solutions of differential equations arising in areas including electromagnetic scattering, wave propagation in porous media, and structural dynamics (see, for example, [6,9,14] and the references therein). The coefficient matrix in (18) is the matrix polynomial in (1), and b may be constant [11,14] or a (in general, nonlinear) function of the parameter ω [6,9]. For quadratic matrix polynomials, ω is either real or pure imaginary with |ω| ∈ I = [ω_l, ω_h], ω_l < ω_h [6,9,11,14], and the solution of (18) is to be computed for many values of the parameter ω. In particular, in [9] b(ω) is supposed to be analytic in I except at points ω where det P(ω) = 0; the solution x(ω) then inherits the same property. Whether we are interested in analytic solutions of (18) or in solutions for finitely many values of ω, reduced forms R(ω) of P(ω) can be used to convert system (18) into the simpler equivalent one (19) R(ω)y = c(ω), y = y(ω).
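Before passing to the reduced form R(ω), it is worth seeing that (18) is equivalent to a shifted linear system with the linearization itself. A minimal sketch for a monic quadratic (assuming one particular companion convention, which need not coincide with the paper's C_L(P)):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
P0, P1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
b = rng.standard_normal(n)
omega = 0.7
# direct solve of the quadratic system P(omega) x = b, with P monic
x_direct = np.linalg.solve(omega**2 * np.eye(n) + omega * P1 + P0, b)
# the same solution via the monic linearization omega*I - A:
# (omega I - A) z = [b; 0] forces P(omega) z2 = b for the bottom block z2
A = np.block([[np.zeros((n, n)), -P0],
              [np.eye(n),        -P1]])
z = np.linalg.solve(omega * np.eye(2 * n) - A, np.concatenate([b, np.zeros(n)]))
assert np.allclose(z[n:], x_direct)
```

Block elimination shows why this works: the second block row gives z1 = (ωI + P1)z2, and substituting into the first yields (ω²I + ωP1 + P0)z2 = b. Conjugating ωI − A by the similarity S of the previous sections replaces this shifted system by one with the companion matrix of the triangular R(ω), which is the simplification exploited below.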
We have shown in the previous sections a procedure to compute R(ω) from P(ω) without using unimodular transformations. We must show how to compute c(ω) so that systems (18) and (19) are equivalent. We will show a little more: how to obtain c(ω) from b(ω) so that the solution of (18) can be given explicitly in terms of b(ω) and the solution of (19). As a result we will give an explicit expression for the solution of (18) in terms of y(ω) and R(ω) for every ω ∈ C which is not an eigenvalue of P. For simplicity we consider the case where R(ω) is triangular. Let C_L(P) be the left companion matrix of P(ω). In computing R(ω) we first use the algorithm of section 3.1.2 (or, in the quadratic case, those of section 4 if needed) to compute a Schur form of C_L(P) satisfying the properties of Theorem 2 (i.e., each T_ii ∈ C^{ℓ×ℓ} is upper triangular and nonderogatory). Let T = Q^H C_L(P) Q be such a Schur form. Then we use Proposition 9 to obtain an nℓ × n matrix X = [x_1 x_2 · · · x_n] such that V = [X TX · · · T^{ℓ−1}X] is nonsingular and K_ℓ(T, [x_1 x_2 · · · x_i]) is T-invariant for 1 ≤ i ≤ n − 1 (notice that V is block-triangular because, following the proof of Proposition 9, X is of the form X = v_1 ⊕ v_2 ⊕ · · · ⊕ v_n with v_i of size ℓ × 1, i = 1, ..., n). Then it follows from Theorem 1 that V^{-1}TV = C_L(R) is the left companion matrix of a triangular matrix polynomial of degree ℓ. Thus, if S = QV, then S^{-1} C_L(P) S = C_L(R) for some upper triangular matrix polynomial R(ω). This is the matrix polynomial of system (19). Now we are going to find c(ω) so that the solution x(ω) of system (18) can be explicitly given in terms of R(ω) and the solution of (19) for that c(ω).
Let Y = [ Y_1^T ··· Y_ℓ^T ]^T and Z = [ Z_1 ··· Z_ℓ ] be the first n columns of S^{-1} and the last n rows of S, respectively; that is, Y = S^{-1}(:, 1:n) and Z = S((ℓ−1)n + 1 : ℓn, :). On substituting (ωI − C_L(R))^{-1}, Y, and Z in (20), we obtain an expression for x(ω) in terms of the resolvent of C_L(R) and b(ω). If we let c(ω) = Σ_{i=1}^{ℓ} ω^{i−1} Y_i b(ω) and solve (19) for y(ω), we obtain an explicit expression for the solution x(ω) of the parameterized linear system (18) in terms of y(ω), the blocks Z_i, and the products Y_i b(ω). The structure of S and that of the left companion matrix can be exploited to construct the last n rows of S and the first n columns of S^{-1} directly. For each value of ω, x(ω) can then be computed in O(ℓn^2 + ℓ^2 n^2) operations by precomputing Y_i b(ω) for every i and reusing these products to obtain the second term in x(ω).
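For intuition, here is a hedged numerical sketch of the same resolvent idea using an ordinary complex Schur form T = Q^H C_L(P) Q of the linearization in place of the paper's structure-preserving similarity S: after a one-time O((ℓn)^3) reduction, each value of ω costs only one triangular solve. The companion convention and the identity x(ω) = [0 ··· 0 I](ωI − C_L(P))^{-1}(e_1 ⊗ P_ℓ^{-1} b), which can be checked by multiplying the block rows of (ωI − C_L(P))w = e_1 ⊗ P_ℓ^{-1}b by ω^{i−1} and summing, are assumptions of this sketch, not formulas quoted from the paper.

```python
import numpy as np
from scipy.linalg import schur, solve_triangular

rng = np.random.default_rng(2)
n, ell = 3, 2
P = [rng.standard_normal((n, n)) for _ in range(ell + 1)]   # made-up P_0, P_1, P_2
b = rng.standard_normal(n)
Pl_inv = np.linalg.inv(P[ell])

# Assumed left companion matrix (coefficients in the last block column).
N = ell * n
C = np.zeros((N, N))
for i in range(1, ell):
    C[i*n:(i+1)*n, (i-1)*n:i*n] = np.eye(n)
for i in range(ell):
    C[i*n:(i+1)*n, -n:] = -Pl_inv @ P[i]

# One-time reduction: complex Schur form C = Q T Q^H, T upper triangular.
T, Q = schur(C.astype(complex), output='complex')

rhs = np.zeros(N, dtype=complex)
rhs[:n] = Pl_inv @ b                # e_1 tensor P_l^{-1} b

def solve_via_schur(w):
    # x(w) = last n entries of (w I - C)^{-1} rhs, one triangular solve per w
    v = solve_triangular(w * np.eye(N) - T, Q.conj().T @ rhs)
    return (Q @ v)[-n:]

w = 0.7
x_fast = solve_via_schur(w)
x_direct = np.linalg.solve(sum(w**i * P[i] for i in range(ell + 1)), b)
err = np.linalg.norm(x_fast - x_direct)
```

The paper's S = QV goes further than this sketch: it makes the reduced coefficient matrix an n × n triangular matrix polynomial R(ω), so the per-ω work drops from an (ℓn)-sized triangular solve to operations on n-sized blocks.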
6. Conclusions. Every matrix polynomial with nonsingular leading coefficient can be reduced to triangular form, while keeping the size, degree, and eigenstructure of the original matrix polynomial, by means of unimodular transformations. Since we do not have a practical way to compute these unimodular transformations, we have instead proposed a practical procedure that, starting from a Schur form of any linearization λI − A of a given n × n matrix polynomial of degree ℓ, consists of three steps:
1. Moving the diagonal elements (and the 2 × 2 diagonal blocks, in the real case) of the Schur form so as to obtain a new Schur form satisfying the properties of Theorem 2 or 3.
2. Using the obtained Schur form to construct a full column rank matrix X satisfying the conditions of Theorem 1 (X may be taken to have orthonormal columns).
3. Performing a structure preserving similarity transformation with S = K_ℓ(A, X) as in (2) so that S^{-1}AS is the left companion matrix of a monic triangular matrix polynomial of degree ℓ (only the last n columns of S^{-1}AS are needed).
We showed how to implement step 1 in a stable way so that the procedure reduces any quadratic matrix polynomial to triangular form. For ℓ > 2, however, we only discussed how to succeed with step 1 in the case when no eigenvalue has algebraic multiplicity larger than n.
Reduction to other simple forms, such as block-diagonal, block-triangular, or Hessenberg form, was also considered. In particular, it was shown that if a Hessenberg form of a linearization, when partitioned into ℓ × ℓ blocks, has unreduced diagonal blocks, then the matrix polynomial can be brought to Hessenberg form using steps 2 and 3 above (with the obvious substitutions of "Schur form" by "Hessenberg form" and "triangular matrix" by "Hessenberg matrix").