The Block Rational Arnoldi Method

Abstract. The block version of the rational Arnoldi method is a widely used procedure for generating an orthonormal basis of a block rational Krylov space. We study block rational Arnoldi decompositions associated with this method and prove an implicit Q theorem. We relate these decompositions to nonlinear eigenvalue problems. We show how to choose parameters to prevent a premature breakdown of the method and improve its numerical stability. We explain how rational matrix-valued functions are encoded in rational Arnoldi decompositions and how they can be evaluated numerically. Two different types of deflation strategies are discussed. Numerical illustrations using the MATLAB Rational Krylov Toolbox are included.

1. Introduction. Block rational Krylov spaces are closely connected to their polynomial counterparts. To introduce notation, we include a short review here. Block polynomial Krylov spaces are linear subspaces of C^{N×s} built with a matrix A ∈ C^{N×N} and a starting block vector b = [b_1, ..., b_s] ∈ C^{N×s} of maximal rank. The associated (classical) block Krylov space of order m+1 is defined as

(1.1)    K□_{m+1}(A, b) := blockspan{ b, Ab, ..., A^m b }.

We have adopted the square superscript notation from [28]. If there is no room for ambiguity, we sometimes omit (A, b) for brevity and just write K□_{m+1}. There exists an integer M ≤ N, called the invariance index of (A, b) (or block grade of b with respect to A; see [29]), such that K□_M(A, b) = K□_{M+1}(A, b) = K□_{M+2}(A, b) = ···, while the spaces of lower order are strictly nested. We define the dimension d of a block Krylov space K□_{m+1}(A, b) as the cardinality of a basis over C, that is, there exist d block vectors b_1, ..., b_d ∈ K□_{m+1} such that every v ∈ K□_{m+1} can be written in a unique way as v = α_1 b_1 + ··· + α_d b_d with scalars α_i ∈ C. Under the full rank assumption

(1.2)    rank [b, Ab, ..., A^m b] = (m+1)s,

the block Krylov space is of dimension d = (m+1)s^2 and there is a one-to-one correspondence between any block vector v ∈ K□_{m+1} and the matrix polynomial P(z) = Σ_{j=0}^m z^j C_j with coefficients C_j ∈ C^{s×s} such that

(1.3)    v = P(A) ∘ b := Σ_{j=0}^m A^j b C_j.

The circ (∘) notation has been attributed to Gragg in [34]. Given a polynomial q_m ∈ P_m such that q_m(A) is nonsingular, we define the (classical) block rational Krylov space associated with A, b, and q_m as Q_{m+1}(A, b) := q_m(A)^{-1} K□_{m+1}(A, b). Note that the polynomial q_m is implicit in our notation Q_{m+1} = Q_{m+1}(A, b). The roots ξ_1, ξ_2, ..., ξ_m ∈ C̄ := C ∪ {∞} of q_m are referred to as the poles of the block rational Krylov space. Both the dimension and the invariance index of a block rational Krylov space equal those of its polynomial counterpart.
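To make these notions concrete, the following Python/NumPy sketch (our own illustration, not part of the MATLAB toolbox; all names are hypothetical) builds the block Krylov matrix [b, Ab, ..., A^m b] for random data and verifies that its rank grows by s per block until the invariance index is reached:

```python
import numpy as np

def krylov_matrix(A, b, m):
    """Return the block Krylov matrix [b, A b, ..., A^m b]."""
    blocks = [b]
    for _ in range(m):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

rng = np.random.default_rng(0)
N, s = 8, 2
A = rng.standard_normal((N, N))
b = rng.standard_normal((N, s))

# generically, each block adds s new directions (full rank assumption (1.2)) ...
assert np.linalg.matrix_rank(krylov_matrix(A, b, 2)) == 3 * s
# ... until the space becomes A-invariant and the rank stalls at N
assert np.linalg.matrix_rank(krylov_matrix(A, b, 4)) == N
```

In exact arithmetic the rank stalls at N once the invariance index of (A, b) is reached, which is what the final assertion checks.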
The focus of this paper is on theoretical and algorithmic aspects of generating bases of block rational Krylov spaces. Convenient tools in this respect are so-called block rational Arnoldi decompositions (BRADs), which we introduce in Section 2. In Section 3 we provide an implicit Q theorem which characterizes the parameters that uniquely determine a block rational Krylov space. The elements of a block rational Krylov space are closely linked to rational functions: if Q_{m+1} is of full dimension (m+1)s^2, then there is a one-to-one correspondence between any block vector v ∈ Q_{m+1} and the rational matrix-valued function R(z) = P(z) q_m(z)^{-1}, with P ∈ P_m^{s×s}, such that

(1.4)    v = R(A) ∘ b := q_m(A)^{-1} (P(A) ∘ b).

We refer to such functions as RKFUNBs (block rational Krylov functions), generalizing the RKFUN concept [11] to the block case. In Section 4, we explain how RKFUNBs are uniquely encoded in block rational Arnoldi decompositions and how they can be evaluated efficiently via "rerunning" the decomposition. During the algorithmic construction of a block rational Krylov basis, an appropriate choice of so-called continuation matrices needs to be made, and this is explained in Section 5. Linear dependencies between the block Krylov vectors can lead to breakdowns of the rational Arnoldi method. Possible deflation strategies, which can circumvent such breakdowns, are described in Section 6. Finally, Section 7 contains numerical illustrations.
Notation. Finding an accessible notation for block Krylov methods is a challenge and varying notation is used in the literature. For clarity, we give a short summary of our notation. Throughout this work, scalars are denoted by lowercase Greek letters, vectors by lowercase Roman letters, matrices by uppercase Roman letters, and block vectors and block matrices by bold symbols. This notation has been chosen so that bold symbols imply block structure. An element of a block vector is a vector, whereas an element of a block matrix is a smaller matrix or a block vector. The linear space of scalar polynomials of degree at most m is denoted by P_m. Finally, C̄ := C ∪ {∞} denotes the extended complex plane.
2. Block rational Arnoldi decompositions. We aim to establish a correspondence between block rational Krylov spaces and a particular type of block matrix decomposition. A block matrix H_m = [H_{i,j}] ∈ C^{(m+1)s×ms} with blocks H_{i,j} ∈ C^{s×s} satisfying H_{i,j} = O_{s×s} for i > j+1 is called a block upper-Hessenberg matrix. A block upper-Hessenberg matrix H_m ∈ C^{(m+1)s×ms} is unreduced if all its subdiagonal blocks are nonsingular, i.e., H_{j+1,j} is nonsingular for j = 1, ..., m. The notion of being unreduced can be extended to a block upper-Hessenberg pencil as follows.
Definition 2.1. Let H_m, K_m ∈ C^{(m+1)s×ms} be block upper-Hessenberg matrices. We say that (H_m, K_m) is an unreduced block upper-Hessenberg pencil if, for every j = 1, ..., m, at least one of the matrices H_{j+1,j} and K_{j+1,j} is nonsingular.
We now define block rational Arnoldi decompositions as matrix equations which generalize the decompositions described in [9, 46, 47] to block rational Krylov spaces.

Definition 2.2. Let A ∈ C^{N×N} be a given matrix. A relation of the form

(2.1)    A V_{m+1} K_m = V_{m+1} H_m

is called a block rational Arnoldi decomposition (BRAD) if the following conditions are satisfied: (a) the block matrix V_{m+1} ∈ C^{N×(m+1)s} is of full column rank, (b) (H_m, K_m) is an unreduced block upper-Hessenberg pencil of size (m+1)s × ms, (c) µ_j K_{j+1,j} = ν_j H_{j+1,j} with scalars µ_j, ν_j ∈ C such that |µ_j| + |ν_j| ≠ 0 for j = 1, ..., m, and (d) the numbers ξ_j = µ_j/ν_j are outside the spectrum Λ(A) for j = 1, ..., m. The numbers ξ_1, ..., ξ_m ∈ C̄ are called the poles of the BRAD.
The block columns of V_{m+1} = [v_1, ..., v_{m+1}] blockspan the space of the decomposition, that is, the linear space of block vectors v = Σ_{j=1}^{m+1} v_j C_j with arbitrary coefficient matrices C_j ∈ C^{s×s}. If V_{m+1} has (m+1)s orthonormal columns, we have an orthonormal BRAD. Any BRAD can be transformed into an orthonormal BRAD with the same poles ξ_j by using a thin QR factorization V_{m+1} =: V'_{m+1} R, where V'_{m+1} ∈ C^{N×(m+1)s} has orthonormal columns, and R ∈ C^{(m+1)s×(m+1)s} is upper triangular and nonsingular. Setting K'_m := R K_m and H'_m := R H_m, we obtain an orthonormal BRAD A V'_{m+1} K'_m = V'_{m+1} H'_m that is equivalent to the original BRAD in the following sense.

Definition 2.3. Two BRADs (2.1) with the same matrix A ∈ C^{N×N} are equivalent if they blockspan the same space and have the same poles ξ_1, ..., ξ_m (not necessarily in this order).
This definition of BRAD equivalence generalizes [9, Definition 2.4] from standard rational Krylov spaces to the block case. It justifies that, without loss of generality, one may assume that BRADs are orthonormal.
In order to construct an orthonormal BRAD, Algorithm 2.1 can be used. This algorithm is a natural extension of the single-vector rational Arnoldi method, using thin QR factorizations to ensure orthonormality of the block vectors v_j, that is, v_i^* v_j = I_{s×s} if i = j and v_i^* v_j = O_{s×s} otherwise. This type of orthonormality is also enforced in the so-called "classical block method" for polynomial Krylov spaces; see [24, 38]. Our variant of the algorithm starts with the orthogonalization of the initial block vector b ∈ C^{N×s}, which we assume to be of maximal rank s. For each of the poles ξ_j, a quadruple (ν_j, µ_j, ρ_j, η_j) ∈ C^4 is chosen such that µ_j/ν_j = ξ_j and µ_j/ν_j ≠ η_j/ρ_j. A simple strategy is to set (ν_j, µ_j) = (1, ξ_j) for finite poles ξ_j, and (ν_j, µ_j) = (0, 1) when ξ_j = ∞. To guarantee that µ_j/ν_j ≠ η_j/ρ_j, we set (ρ_j, η_j) = (1, 0) when |ξ_j| > 1, and (ρ_j, η_j) = (0, 1) otherwise. The quotient η_j/ρ_j is referred to as the continuation root [10]. The computational core of Algorithm 2.1 is line 5, where a linear system (ν_j A − µ_j I) w_j = (ρ_j A − η_j I) V_j T_j is solved for w_j. The continuation matrix T_j ∈ C^{js×s} is used to select an appropriate element from Q_j = blockspan{v_1, ..., v_j} to form the right-hand side of the linear system. Different choices of continuation matrices T_j are discussed in Section 5. The block vector w_j is orthogonalized against the previous block vectors. For simplicity of exposition, Algorithm 2.1 uses the classical Gram-Schmidt procedure, but one could equally use the modified Gram-Schmidt procedure; see, e.g., [32, Chapter 19]. Furthermore, we remark that when s > 1, there are many possibilities for the implementation of the orthogonalization: for example, one can treat each block vector as a set of individual vectors and orthogonalize on a vector level as opposed to a block orthogonalization. Finally, the newly orthogonalized block vector w_j is made orthonormal by computing a thin QR factorization.
It is straightforward to verify that Algorithm 2.1 produces a BRAD by decomposing the for loop. Combining lines 6-8, we have w_j = V_{j+1} c_j, where c_j ∈ C^{(j+1)s×s} stacks the orthogonalization coefficients and the triangular factor C_{j+1,j}, and by line 5, (ν_j A − µ_j I) V_{j+1} c_j = (ρ_j A − η_j I) V_j T_j. Separating the terms containing A as a factor, we obtain A V_{j+1}(ν_j c_j − ρ_j T_j) = V_{j+1}(µ_j c_j − η_j T_j), where T_j is understood to be padded with an s×s zero block to match the dimensions of c_j.

Algorithm 2.1 Block rational Arnoldi method (RKToolbox: rat_krylov)
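The toolbox provides the MATLAB implementation; as a language-neutral illustration of the steps described above (orthogonalize b; for each pole choose (ν_j, µ_j, ρ_j, η_j); solve the shifted linear system; classical Gram-Schmidt; thin QR), here is a hedged Python/NumPy sketch with continuation strategy 'last' and no deflation. It is our own simplified stand-in, not the toolbox code:

```python
import numpy as np

def block_rational_arnoldi(A, b, poles):
    """Simplified block rational Arnoldi sketch: returns V, K, H with
    A @ V @ K = V @ H, V having orthonormal columns, and the given poles."""
    N, s = b.shape
    m = len(poles)
    V = np.zeros((N, (m + 1) * s))
    V[:, :s] = np.linalg.qr(b)[0]                  # line 1: orthonormalize b
    H = np.zeros(((m + 1) * s, m * s))
    K = np.zeros_like(H)
    for j, xi in enumerate(poles):
        # choose (nu, mu, rho, eta) with mu/nu = xi and mu/nu != eta/rho
        nu, mu = (1.0, xi) if np.isfinite(xi) else (0.0, 1.0)
        rho, eta = (1.0, 0.0) if abs(xi) > 1 else (0.0, 1.0)
        Vj = V[:, :(j + 1) * s]
        t = Vj[:, -s:]                             # continuation 'last': V_j T_j = v_j
        w = np.linalg.solve(nu * A - mu * np.eye(N), rho * (A @ t) - eta * t)
        c = Vj.conj().T @ w                        # classical Gram-Schmidt
        w = w - Vj @ c
        q, r = np.linalg.qr(w)                     # thin QR: w_j = v_{j+1} C_{j+1,j}
        V[:, (j + 1) * s:(j + 2) * s] = q
        cj = np.vstack([c, r])                     # coefficient column c_j
        Tpad = np.zeros(((j + 2) * s, s))          # T_j padded with a zero block
        Tpad[j * s:(j + 1) * s] = np.eye(s)
        col = slice(j * s, (j + 1) * s)
        K[:(j + 2) * s, col] = nu * cj - rho * Tpad
        H[:(j + 2) * s, col] = mu * cj - eta * Tpad
    return V, K, H

# quick check on random data (poles generically outside the spectrum)
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 12))
b = rng.standard_normal((12, 2))
V, K, H = block_rational_arnoldi(A, b, [3.0, np.inf, 0.5])
assert np.linalg.norm(A @ V @ K - V @ H) < 1e-8 * (1 + np.linalg.norm(H))
assert np.linalg.norm(V.T @ V - np.eye(V.shape[1])) < 1e-8
```

The returned matrices satisfy the BRAD relation A V K = V H with V orthonormal, and the subdiagonal blocks satisfy µ_j K_{j+1,j} = ν_j H_{j+1,j} by construction.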
3. The block rational implicit Q theorem. The implicit Q theorem, to be given in Theorem 3.3, states that an orthonormal BRAD (2.1) is essentially uniquely determined by the starting block vector v_1 and the ordered poles ξ_1, ..., ξ_m. To put this result in context, consider a standard (s = 1) polynomial Krylov space associated with the matrix A ∈ C^{N×N}. Given a matrix V_m ∈ C^{N×m} with orthonormal columns, the standard implicit Q theorem states that if H_m = V_m^* A V_m is an unreduced upper-Hessenberg matrix, then V_m is uniquely determined (up to unimodular column scaling) by its first column; see, e.g., [50, Chapter 2, Theorem 3.3]. This result is extended to standard (s = 1) rational Krylov spaces in [9, Theorem 3.2], using the notion of essentially equal rational Arnoldi decompositions (RADs); see [9, Definition 3.1]. The concept of essentially equal RADs extends to block rational Krylov spaces as follows.
Definition 3.1. Two orthonormal BRADs, namely A V_{m+1} K_m = V_{m+1} H_m and A V'_{m+1} K'_m = V'_{m+1} H'_m, are essentially equal if there exist a unitary block diagonal matrix D_{m+1} ∈ C^{(m+1)s×(m+1)s} and a nonsingular block upper-triangular matrix T_m ∈ C^{ms×ms} such that V'_{m+1} = V_{m+1} D_{m+1}, H'_m = D_{m+1}^* H_m T_m, and K'_m = D_{m+1}^* K_m T_m. Essentially equal orthonormal BRADs form an equivalence class and we call any of its elements essentially unique.
Two orthonormal BRADs may be equivalent but not essentially equal if the poles are ordered differently. The following lemma collects some useful results on unreduced block upper-Hessenberg matrices and BRADs.
Lemma 3.2. (i) If H_m ∈ C^{(m+1)s×ms} is an unreduced block upper-Hessenberg matrix, then it is of maximal rank ms.
(ii) Let (H_m, K_m) be the unreduced block upper-Hessenberg pencil of a BRAD; then each subdiagonal block H_{j+1,j} is either nonsingular or the zero matrix. (iii) Let (H_m, K_m) be the unreduced block upper-Hessenberg pencil of a BRAD; then the matrix αH_m − βK_m is of maximal rank ms for all α, β ∈ C satisfying |α| + |β| ≠ 0. (iv) Let L_m ∈ C^{(m+1)s×ms} be of maximal rank ms; then the dimension of the left null space of L_m is s. Moreover, if x, y ∈ C^{(m+1)s×s} are of maximal rank s and x^* L_m = O_{s×ms} and y^* L_m = O_{s×ms}, then there exists a nonsingular matrix M ∈ C^{s×s} such that x = y M.

Proof.
(i) Removing the first block row of H_m yields a block upper-triangular matrix with nonsingular diagonal blocks. Hence H_m has full column rank. (ii) Assume that H_{j+1,j} is singular, in which case K_{j+1,j} must be nonsingular.
The pencil satisfies a BRAD and so µ_j K_{j+1,j} = ν_j H_{j+1,j} with |µ_j| + |ν_j| ≠ 0. This is only possible when µ_j = 0 and hence H_{j+1,j} = O_{s×s}. (iii) This proof follows closely that of [10, Lemma 2.5]. Consider auxiliary scalars α~ = 1 and β~ ∈ C, β~ ∉ Λ(A), such that α~H_m − β~K_m is of rank ms. Multiplying the BRAD (2.1) by α~ and subtracting β~V_{m+1}K_m from both sides gives

(3.1)    (α~A − β~I) V_{m+1} K_m = V_{m+1} (α~H_m − β~K_m).

Clearly, the right-hand side of this equation is of rank ms, hence so is the left-hand side. As a consequence, K_m is of rank ms, which proves the claim for α = 0. If α ≠ 0, consider α~ = α and β~ = β in (3.1). There are two possible cases: either αH_m − βK_m is unreduced and so by (i) of maximal rank, or αH_m − βK_m is not unreduced, and so there exists an index j ∈ {1, ..., m} such that αH_{j+1,j} − βK_{j+1,j} is singular. Now either H_{j+1,j} is nonsingular or H_{j+1,j} = O_{s×s} by (ii). If H_{j+1,j} is nonsingular, then αH_{j+1,j} − βK_{j+1,j} being singular is equivalent to (α − βν_j/µ_j)H_{j+1,j} being singular, therefore β/α = µ_j/ν_j, i.e., equal to the jth pole, and hence αA − βI must be nonsingular. If H_{j+1,j} = O_{s×s}, then we would have β/α = ∞ and hence α = 0, contradicting our assumption. Finally, since αA − βI is nonsingular, and V_{m+1} and K_m are of full column rank, we conclude that αH_m − βK_m is of full column rank by (3.1). (iv) The first part follows directly from the rank-nullity theorem (see, e.g., [49, Theorem 3.17]). For the second part, assume that x, y ∈ C^{(m+1)s×s} are of maximal rank s and x^* L_m = O_{s×ms} and y^* L_m = O_{s×ms}. Then by definition, the columns of x and of y each form a basis of the left null space of L_m, and hence there exists a nonsingular matrix M ∈ C^{s×s} such that x = y M.
We are now in the position to state our implicit Q theorem.
Theorem 3.3. Let A V_{m+1} K_m = V_{m+1} H_m be an orthonormal BRAD (2.1). The orthonormal matrix V_{m+1} and the pencil (H_m, K_m) are essentially uniquely determined by v_1 (the first block column of V_{m+1}) and the poles ξ_1, ..., ξ_m (in this order).

Proof. Without loss of generality, we may assume that H_{j+1,j} is nonsingular and therefore µ_j/ν_j ≠ 0, for j = 1, ..., m. Otherwise, if H_{j+1,j} is singular for some j, the matrix K_{j+1,j} is nonsingular as (H_j, K_j) is an unreduced block upper-Hessenberg pencil, and so µ_j = 0 as ν_j H_{j+1,j} = µ_j K_{j+1,j}. In this case 0 = µ_j/ν_j ∉ Λ(A), so that A is nonsingular and we may instead consider the decompositions with the roles of H_m and K_m interchanged. Let

(3.2)    A V_{m+1} K_m = V_{m+1} H_m    and    A V'_{m+1} K'_m = V'_{m+1} H'_m

be two orthonormal BRADs with v'_1 = v_1 and the same ordered poles. By construction, for each pole µ_j/ν_j, the matrix L_m^{(ν_j/µ_j)} := (ν_j/µ_j) H_m − K_m has all but the leading j blocks of its jth block column equal to the zero matrix: the (j+1, j)th block of L_m^{(ν_j/µ_j)} equals µ_j^{-1}(ν_j H_{j+1,j} − µ_j K_{j+1,j}) = O_{s×s}. We now prove by induction on m, one block column at a time, that the two BRADs are essentially equal. Define D_{1,1} = I_{s×s}, as v'_1 = v_1 D_{1,1}, and consider the first block column of (3.2), giving (3.3). The block columns of V_{m+1} are block orthonormal; therefore pre-multiplying (3.3) by v_1^* yields (3.4). Substituting (3.4) into (3.3), rearranging, and right-multiplying by H_{2,1}^{-1} yields (3.5). Analogously, for the second BRAD we obtain (3.6). By rearranging (3.5) and (3.6), we find that the columns of v_2 and v'_2 span the same space, so that v'_2 = v_2 D_{2,2}. The matrix D_{2,2} is unitary as the block vectors v_2 and v'_2 have orthonormal columns.
Setting T_1 accordingly and D_2 = diag(D_{1,1}, D_{2,2}), we can write V'_2 = V_2 D_2 and H'_1 = D_2^* H_1 T_1. An analogous calculation shows that K'_1 = D_2^* K_1 T_1. Assume that for 2 ≤ j ≤ m there exist a unitary block diagonal matrix D_j = diag(D_{1,1}, ..., D_{j,j}) and a nonsingular block upper-triangular matrix T_{j−1} such that V'_j = V_j D_j, H'_{j−1} = D_j^* H_{j−1} T_{j−1}, and K'_{j−1} = D_j^* K_{j−1} T_{j−1}. Reading off the jth block column of (3.2) yields (3.7), where H_{:,j} is the jth block column of the matrix H_j. The columns of V_{j+1} are block orthonormal, and so (3.8) holds. Substituting (3.8) into (3.7) and rearranging for v_{j+1} yields (3.9). We can write the jth block column of L_j^{(ν_j/µ_j)} as

(3.10)    L_{:,j}^{(ν_j/µ_j)} = L_{j−1}^{(ν_j/µ_j)} z_{j−1} + q_j

with block vectors z_{j−1} ∈ C^{(j−1)s×s} and q_j ∈ C^{js×s}, where the columns of q_j are chosen so that they do not lie in the span of the columns of L_{j−1}^{(ν_j/µ_j)} (those components are in z_{j−1}), and so q_j^* L_{j−1}^{(ν_j/µ_j)} = O. By Lemma 3.2 (iii), the block matrix L_j^{(ν_j/µ_j)} is of full column rank. Furthermore, the block vector q_j has full column rank by (3.10), as the s columns of L_{:,j}^{(ν_j/µ_j)} (the jth block column) are linearly independent and are not contained in the span of the columns of L_{j−1}^{(ν_j/µ_j)}. Substituting (3.10) into (3.9) gives (3.11). We show that the first term on the right-hand side of (3.11) is O_{N×s} by substituting the first j−1 block columns of (3.2) with ν = ν_j and µ = µ_j; this leads to

(3.12)    v_{j+1} H_{j+1,j} = (I − V_j V_j^*) A^{(ν_j/µ_j)} V_j q_j,

and analogously

(3.13)    v'_{j+1} H'_{j+1,j} = (I − V'_j V'^*_j) A^{(ν_j/µ_j)} V'_j q'_j.

By the induction hypothesis V'_j = V_j D_j, and so, using T_{j−1} from the induction hypothesis and q'^*_j L'^{(ν_j/µ_j)}_{j−1} = O, the block vectors D_j q'_j and q_j and the block matrix L_{j−1}^{(ν_j/µ_j)} are of maximal rank. By Lemma 3.2 (iv), there exists a nonsingular matrix M ∈ C^{s×s} such that D_j q'_j = q_j M. Combining equations (3.12) and (3.13), we obtain v'_{j+1} H'_{j+1,j} = v_{j+1} H_{j+1,j} M, and so the columns of the block vectors v_{j+1} and v'_{j+1} span the same space. Furthermore, v_{j+1}^* v_{j+1} = v'^*_{j+1} v'_{j+1} = I_{s×s}, so there exists a unitary matrix D_{j+1,j+1} ∈ C^{s×s} such that v'_{j+1} = v_{j+1} D_{j+1,j+1}. This relation can be written as v_{j+1}(D_{j+1,j+1} H'_{j+1,j} − H_{j+1,j} M) = O_{N×s}, and so H'_{j+1,j} = D_{j+1,j+1}^* H_{j+1,j} M. It remains to find the jth block column T_{:,j} of the nonsingular matrix
T_j such that H'_j = D_{j+1}^* H_j T_j and K'_j = D_{j+1}^* K_j T_j. Rearranging and substituting (3.10) and its primed analogue into D_j q'_j = q_j M yields a relation which can be rearranged for the unknown block column. Finally, substituting our results into (3.8), we obtain T_{:,j}.
Substituting (3.7) then yields the leading blocks of the jth column; the subdiagonal blocks follow from (3.14), and so H'_j = D_{j+1}^* H_j T_j. An analogous result can be found for K_j. Therefore V'_{j+1} = V_{j+1} D_{j+1}, H'_j = D_{j+1}^* H_j T_j, and K'_j = D_{j+1}^* K_j T_j, so the two BRADs are essentially equal, thereby completing the induction.
In Section 2 we showed how any BRAD can be transformed into an orthonormal BRAD using a thin QR factorization. By the implicit Q theorem, these two decompositions are essentially equal.
4. Rational matrix-valued functions and RKFUNB objects. Let us recall from the introduction that, under assumption (1.2), there is a one-to-one correspondence between any vector R(A) ∘ b ∈ Q_{m+1}, with the circ operator defined in (1.3), and a rational matrix-valued function R(z) given in (1.4). Our aim is to show how rational matrix-valued functions are encoded in BRADs. The following lemma (Lemma 4.1) will be useful.

Proof. (i) This result follows from the definition of the circ operator in (1.3) and the fact that p(A) commutes with q_m(A)^{-1} and all matrix powers of A. For the right-hand sides of the remaining parts we compute analogously, where for the second equality we use part (i) of the lemma and the fact that there exists a polynomial p(z) such that q_m(A)^{-1} = p(A).
Given a BRAD A V_{m+1} K_m = V_{m+1} H_m, we can identify each block column of V_{m+1} = [v_1, ..., v_{m+1}] with a rational matrix-valued function R_j(z) such that v_{j+1} = R_j(A) ∘ b. Using Lemma 4.1 (i) and (iii), this identification can be rewritten so that both A and b are formally "isolated" from the decomposition using the circ operator; hence it is meaningful to consider the functions R_j(z) independently of (A, b). Moreover, these functions are fully encoded in the matrix pencil (H_m, K_m). To give a more insightful characterization of the functions R_j(z), it is helpful to transform a BRAD into an essentially equal BRAD where the subdiagonal blocks of the upper-Hessenberg matrices are multiples of the identity (and therefore commute with all other square matrices of appropriate size).

Definition 4.2. A BRAD is normalized if the subdiagonal blocks of the matrix pencil (H_m, K_m) are multiples of the identity.
By the definition of a BRAD A V_{m+1} K_m = V_{m+1} H_m, the subdiagonal blocks satisfy ν_j H_{j+1,j} = µ_j K_{j+1,j} for j = 1, ..., m, with at least one of the two matrices being nonsingular. We define Z_m := diag(Z_1, ..., Z_m) with Z_j := µ_j H_{j+1,j}^{-1} if H_{j+1,j} is nonsingular, and Z_j := ν_j K_{j+1,j}^{-1} otherwise. Setting K'_m := K_m Z_m and H'_m := H_m Z_m, we obtain a normalized BRAD A V_{m+1} K'_m = V_{m+1} H'_m which is essentially equal to the original BRAD (by the implicit Q theorem). The following theorem gives a recursive characterization of the functions R_j(z) associated with a normalized BRAD.

Theorem 4.3. Let a normalized BRAD with block upper-Hessenberg pencil (H_m, K_m) having subdiagonal blocks (µ_j I_{s×s}, ν_j I_{s×s}), j = 1, ..., m, and block matrix V_{m+1} = [v_1, ..., v_{m+1}] be given. Then, for j = 1, ..., m,

(4.2)    R_j(z) = (µ_j − ν_j z)^{-1} Σ_{i=1}^{j} R_{i−1}(z) (z K_{i,j} − H_{i,j}),    with R_0(z) ≡ I_{s×s}.

Proof. Reading off the jth block column of the normalized BRAD yields A(Σ_{i=1}^{j} v_i K_{i,j} + ν_j v_{j+1}) = Σ_{i=1}^{j} v_i H_{i,j} + µ_j v_{j+1} for j = 1, ..., m. Multiplying on the left by (Iµ_j − Aν_j)^{-1}, which is guaranteed to exist by the choice of the pole µ_j/ν_j, gives v_{j+1} = (Iµ_j − Aν_j)^{-1} Σ_{i=1}^{j} (A v_i K_{i,j} − v_i H_{i,j}). Making use of the circ operator, we can write this as (4.2).

We can also characterize the functions R_j(z) in terms of block determinants. Given an s×s matrix X = [X], the associated block determinant is defined as det□(X) := X. Further, the block determinant of a block upper-Hessenberg matrix X whose subdiagonal block elements are multiples of the identity is defined via the expansion (4.3) along the last block column, where M_{i,n} is obtained by removing the ith block row and nth block column of X. This block determinant is a special form of a quasideterminant; see, e.g., [25, 26].
In our case, the assumption of block upper-Hessenberg structure with structured subdiagonal blocks simplifies the following derivations significantly and is sufficient for our purposes. We have the following theorem.
Theorem 4.4. Let a normalized BRAD with block upper-Hessenberg pencil (H_m, K_m) having subdiagonal blocks (µ_j I_{s×s}, ν_j I_{s×s}), j = 1, ..., m, and block matrix V_{m+1} = [v_1, ..., v_{m+1}] be given. Then, for j = 1, ..., m,

(4.4)    R_j(z) = q_j(z)^{-1} det□(z K_j − H_j),    where q_j(z) = Π_{ℓ=1}^{j} (µ_ℓ − zν_ℓ).

Proof. Multiplying (4.2) on the left by q_{j−1}(A) yields (4.5). It is clear that (4.5) is the same as (4.4) for j = 1. Assume that (4.4) is true for j = 1, ..., n−1 and consider j = n. Substituting every occurrence of v_i on the right-hand side of (4.5) by (4.4), and using Lemma 4.1 (i) and (iv), it remains to show that the term in the last square brackets corresponds to det□(zK_n − H_n), from which the required statement follows. Indeed, the term in square brackets can be identified with the expansion given in (4.3) by setting X_{i,n} = (zK_{i,n} − H_{i,n}) and det□(M_{i,n}) = (−1)^{n−i} q_{n−1}(z) q_{i−1}(z)^{-1} det□(zK_{i−1} − H_{i−1}). The latter formula follows from the fact that each M_{i,n} has an (i−1) × (i−1) leading normalized block upper-Hessenberg part with block determinant det□(zK_{i−1} − H_{i−1}), and an (n−i) × (n−i) block upper-triangular part along the diagonal with block determinant (−1)^{n−i} q_{n−1}(z) q_{i−1}(z)^{-1}.
We have shown that a block matrix pencil (H_m, K_m) encodes a recursion of rational matrix-valued functions. If we fix R_0 ≡ I_{s×s}, then all functions R_j(z) are specified. Given a set of matrix coefficients D_0, D_1, ..., D_m ∈ C^{s×s}, which we collect in a block matrix D := [D_0, D_1, ..., D_m] for convenience,

R(z) := Σ_{j=0}^{m} R_j(z) D_j

defines a rational matrix-valued function which we refer to as an RKFUNB (short for block rational Krylov function). The function R(z) is represented by the triplet (H_m, K_m, D), a block extension of the RKFUN objects introduced in [7].
The RKFUNB representation R(z) ≡ (H_m, K_m, D) allows for an efficient numerical evaluation procedure. Note that evaluating R(z) at a scalar z ∈ C such that q_m(z) ≠ 0 is equivalent to evaluating R(A) ∘ b for A = z I_{s×s} and b = I_{s×s}. More generally, we consider evaluating R(A') ∘ b' for A' ∈ C^{N'×N'} with ξ_1, ..., ξ_m ∉ Λ(A') and b' ∈ C^{N'×s}. We start by computing {R_j(A') ∘ b'}_{j=0}^{m} by rerunning the block rational Arnoldi method without computing the orthogonalization coefficients, but reusing the quantities provided in (H_m, K_m). This is shown in Algorithm 4.2, which constructs a new decomposition A' W_{m+1} K_m = W_{m+1} H_m. Note that this decomposition is not necessarily a BRAD, as W_{m+1} might fail to be of full column rank. Finally, the evaluated RKFUNB is obtained by the summation R(A') ∘ b' = Σ_{j=0}^{m} (R_j(A') ∘ b') D_j. This evaluation procedure is implemented in the Rational Krylov Toolbox since version 2.8 as a method of the rkfunb class. Given an RKFUNB represented as a MATLAB object R, the user can type R(A,b) to obtain R(A) ∘ b. We will show an application of this procedure in Section 7.1.

5. Continuation matrices.
As before, we work with the rank assumption (1.2). At each iteration j = 1, 2, ..., m of the block rational Arnoldi method (Algorithm 2.1), we have available a BRAD

(5.1)    A V_j K_{j−1} = V_j H_{j−1},
which we would like to extend by an orthonormalized version of the block vector w_j = (ν_j A − µ_j I)^{-1} (ρ_j A − η_j I) V_j T_j. To select an appropriate block element V_j T_j ∈ Q_j(A, v_1) for the extension, a continuation matrix T_j ∈ C^{js×s} of rank s needs to be chosen. By Theorem 4.3, V_j T_j can be written in terms of a rational matrix-valued function R as V_j T_j = R(A) ∘ v_1. Let us investigate the case of an avoidable breakdown of the orthogonalization procedure with a finite pole ξ_j = µ_j/ν_j, where there exist a vector x ∈ C^s \ {0} and another function S such that (5.2) holds, that is, the columns of the block vector w_j can be linearly combined into a vector already contained in the span of V_j, hence not expanding this space by s new directions. Left-multiplying (5.2) by the nonsingular matrix q_{j−1}(A)(A − ξ_j I), it can be rewritten in an equivalent polynomial form. Comparing the coefficients of the independent variable z results in a system of equations which can be rewritten as a polynomial (nonlinear) eigenvalue problem [30, 51]. The factor (ρ_j ξ_j − η_j) appearing there is nonzero by our assumption ξ_j = µ_j/ν_j ≠ η_j/ρ_j (see Algorithm 2.1). Consequently, given a block vector V_j T_j = R(A) ∘ v_1, an avoidable breakdown will occur if ξ_j is an eigenvalue of this polynomial eigenvalue problem. Throughout the literature, the most commonly used continuation matrix is such that V_j T_j = v_j, i.e.,

(5.3)    T_j = e_j ⊗ I_{s×s},

where e_j denotes the jth canonical unit vector. With this continuation strategy, which always uses the last computed block basis vector and is henceforth referred to as strategy 'last', one can read off the "forbidden poles" (which would lead to an avoidable breakdown) as the eigenvalues of the matrix pencil in (5.1).
Proof. Consider (5.1) in the "scalar" form, merge the last two columns, and rearrange to obtain the stated pencil.

Theorem 5.1 provides an easy way to check whether a given pole ξ_j should be avoided when the continuation strategy 'last', according to (5.3), is used. If all poles ξ_j are pairwise distinct, another strategy is to use T_j = e_1 ⊗ I_{s×s}, i.e., V_j T_j = v_1. This strategy, called 'first', allows for parallelism in the construction of the block Krylov basis vectors, but even for the case s = 1 it can lead to severe numerical instabilities [10].
In practice we are often in the situation where the pole sequence ξ_1, ξ_2, ..., ξ_m is given a priori and we wish to have a continuation strategy that avoids breakdowns and numerical instabilities. For the single-vector (s = 1) case, Ruhe [47] introduced a continuation vector designed to prevent breakdowns caused by the cancellation of a common root in the numerator and denominator of the rational function underlying a rational Krylov vector. We now extend this strategy, referred to as 'ruhe', to the block case: again we assume that at iteration j of Algorithm 2.1 we have a BRAD (5.1) which we would like to extend by a new block vector using the pole ξ_j = µ_j/ν_j ≠ ∞. We start with computing a full QR factorization

(5.4)    ν_j H_{j−1} − µ_j K_{j−1} = Q_j R_{j,j−1}.

The involved matrices are of size ν_j H_{j−1} − µ_j K_{j−1} ∈ C^{js×(j−1)s}, Q_j ∈ C^{js×js}, and R_{j,j−1} ∈ C^{js×(j−1)s}. Note that the subscripts refer to the iteration index and not the matrix sizes. By Lemma 3.2 (iii) we know that R_{j−1,j−1}, the upper (j−1)s × (j−1)s part of R_{j,j−1}, is nonsingular.
Multiplying the BRAD by ν_j η_j, subtracting µ_j η_j V_j K_{j−1} from both sides and rearranging yields

(5.5)    (ν_j A − µ_j I) V_j η_j K_{j−1} = η_j V_j (ν_j H_{j−1} − µ_j K_{j−1}).

Similarly, multiplying the BRAD by ρ_j µ_j, subtracting ρ_j ν_j A V_j H_{j−1} from both sides and rearranging yields

(5.6)    (ν_j A − µ_j I) V_j ρ_j H_{j−1} = ρ_j A V_j (ν_j H_{j−1} − µ_j K_{j−1}).
Subtracting (5.5) from (5.6) and rearranging gives the decomposition

(ν_j A − µ_j I) V_j (ρ_j H_{j−1} − η_j K_{j−1}) = (ρ_j A − η_j I) V_j (ν_j H_{j−1} − µ_j K_{j−1}).

Substituting the QR factorization (5.4) on the right-hand side and inserting the identity, we obtain a relation in terms of W_j := V_j Q_j. The matrix R_{j,j−1} has s more rows than columns, and so its last s rows must be zero. Multiplying on the right by the inverse of R_{j−1,j−1} leads to

(5.7)    (ν_j A − µ_j I) V_j (ρ_j H_{j−1} − η_j K_{j−1}) R_{j−1,j−1}^{-1} = (ρ_j A − η_j I) W_{j−1},

where W_{j−1} collects the first (j−1)s columns of W_j = V_j Q_j. Our aim is to find a continuation combination V_j T_j such that multiplying by (ν_j A − µ_j I)^{-1}(ρ_j A − η_j I) expands the space. By (5.7) we know that any vector in the span of W_{j−1} does not enlarge the span of W_j = V_j Q_j, and so we should choose vectors from the orthogonal complement of W_{j−1} in the span of V_j. This can be achieved by selecting the last s columns of Q_j as the continuation matrix. Hence, the 'ruhe' continuation strategy extended to the block case becomes

(5.8)    T_j = e_j ⊗ I_{s×s} if ξ_j = ∞, and T_j = Q_j(:, end − s + 1 : end) otherwise.

In Section 7.2 we will provide a numerical illustration of the benefits of the continuation strategy 'ruhe' over the more commonly used strategy 'last'.
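Assuming the quantities above, the computational core of strategy 'ruhe' is one full QR factorization (5.4) per iteration, from which the last s columns of Q_j serve as continuation matrix. A hypothetical NumPy sketch (synthetic pencil, not toolbox code):

```python
import numpy as np

def ruhe_continuation(H, K, nu, mu, s):
    """Return T_j = Q_j(:, end-s+1:end) from the full QR factorization (5.4)
    of nu*H_{j-1} - mu*K_{j-1} (finite-pole branch of strategy 'ruhe')."""
    Q, _ = np.linalg.qr(nu * H - mu * K, mode='complete')
    return Q[:, -s:]

# small synthetic pencil of size js x (j-1)s (values are hypothetical)
rng = np.random.default_rng(0)
s, j = 2, 4
H = np.triu(rng.standard_normal((j * s, (j - 1) * s)), -s)
K = np.triu(rng.standard_normal((j * s, (j - 1) * s)), -s)
T = ruhe_continuation(H, K, 1.0, 3.0, s)
# T has orthonormal columns and is orthogonal to range(nu*H - mu*K),
# which is exactly the property exploited in (5.7)
assert np.linalg.norm(T.conj().T @ T - np.eye(s)) < 1e-12
assert np.linalg.norm(T.conj().T @ (1.0 * H - 3.0 * K)) < 1e-10
```

The orthogonality of T to the range of ν_j H_{j−1} − µ_j K_{j−1} is what guarantees that the selected continuation combination leaves the span of W_{j−1}.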
Remark 5.2. It is worth noting that, when constructing a rational Krylov basis using Algorithm 2.1 with a pole ξ_j = ξ_{j−1} repeated from the previous iteration, the continuation strategies 'ruhe' and 'last' are essentially equivalent. With ξ_{j−1} = µ_{j−1}/ν_{j−1} we have that µ_{j−1} K_{j,j−1} = ν_{j−1} H_{j,j−1} in (5.1). Hence, the final s rows of the matrix ν_j H_{j−1} − µ_j K_{j−1} in (5.4) will be zero and therefore the last s columns of Q_j must be of the form e_j ⊗ U for some unitary matrix U ∈ C^{s×s}, which can be chosen as I_{s×s}. This is the same continuation matrix as (5.3).
6. Deflation. So far we have worked with the full rank assumption (1.2), but in practice exact rank deficiencies may occur in the Krylov matrix [b, Ab, ..., A^j b], or the basis vectors w_j computed in line 5 of Algorithm 2.1 might be such that [V_j, w_j] is badly conditioned. If the (nearly) dependent vectors in w_j are not removed, the matrix C_{j+1,j} computed in line 8 of the algorithm is (nearly) singular, resulting in a (near) breakdown of the orthonormalization procedure. The removal of the (nearly) linearly dependent vectors from w_j is known as deflation.
Deflation can be implemented by a straightforward modification of Algorithm 2.1. Assume that the block vector w_j in line 8 has s_j ≤ s columns. We can compute a thin SVD or a column-pivoted QR factorization of w_j which, in both cases, exposes a trailing factor of size (s_j − s_{j+1}) × s_j corresponding to the s_j − s_{j+1} (nearly) dependent directions, together with a unitary matrix P_j ∈ C^{s_j×s_j}. Retaining only the s_{j+1} significant directions yields the new block vector v_{j+1}, and the block columns of the residual matrix S_m = [s_1, ..., s_m] collect the discarded components. This type of fat decomposition, used by Gutknecht in [28] for block polynomial Krylov spaces, ensures that the diagonal blocks H_{j,j} and K_{j,j} are square. The sequence s_1, s_2, ..., s_{m+1} is monotonically nonincreasing and at least one of the subdiagonal blocks H_{j+1,j} or K_{j+1,j} is of maximal rank s_{j+1}. This means that, given s_1, one can infer the complete block structure of the pencil (H_m, K_m) by calculating the ranks of the subdiagonal blocks.
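A minimal sketch of the rank decision itself (our own SVD-based variant with a hypothetical tolerance, not the toolbox implementation):

```python
import numpy as np

def deflate(w, tol=1e-10):
    """Split the orthogonalized block vector w_j into an orthonormal factor q
    with s_{j+1} columns and a coefficient block c, discarding directions
    whose singular values fall below a (hypothetical) relative tolerance."""
    U, sv, Vh = np.linalg.svd(w, full_matrices=False)
    k = int(np.sum(sv > tol * max(sv[0], 1.0)))   # retained rank s_{j+1}
    q = U[:, :k]                                   # new orthonormal block vector
    c = sv[:k, None] * Vh[:k]                      # coefficient block C_{j+1,j}
    return q, c

# a rank-deficient block vector: second column is a multiple of the first
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 1))
w = np.hstack([x, 2.0 * x])
q, c = deflate(w)
assert q.shape == (8, 1)          # one direction deflated
assert np.allclose(q @ c, w)      # retained factors still reproduce w
```

Here the deflated block vector has s_{j+1} = 1 < s_j = 2 columns, mirroring the nonincreasing sequence s_1, s_2, ... described above.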
In the second, thin type of decomposition, all the subdiagonal blocks H_{j+1,j} and K_{j+1,j} are square.
The MATLAB Rational Krylov Toolbox implements both types of decompositions in the rat_krylov function; see [8] for details.

7. Examples.
In this section we provide numerical illustrations using the MATLAB Rational Krylov Toolbox version 2.8. All the experiments were performed with the 64-bit version of MATLAB 2018a on a machine equipped with an Intel i7-6700 processor running at 3.40 GHz.

7.1. Vector autoregression via RKFUNB.
In Section 4 we introduced the RKFUNB framework, which provides a Krylov-based representation of rational matrix-valued functions; an implementation is provided in the MATLAB Rational Krylov Toolbox, available for download from http://rktoolbox.org. Matrix-valued functions have applications in modeling multivariate time series, with one particular example being vector autoregression (VAR). Consider the realization of a multivariate time series y = [y_1, ..., y_s] ∈ R^{N×s} (s ≪ N) such that y is of maximal rank s. Assuming that this time series is generated by a VAR(p) process with mean zero, it can be written as

y_t = y_{t−1} C_1 + ··· + y_{t−p} C_p + ε_t,
where y_t refers to the t-th row of y, C_1, ..., C_p ∈ R^{s×s}, and ε_t is a white noise process [39, Chapter 2]. The coefficients C_1, ..., C_p can be estimated by solving the least squares problem

(7.1)    min over C_1, ..., C_p of Σ_t ‖y_t − Σ_{i=1}^{p} y_{t−i} C_i‖_2^2.

Let us define suitable matrices A and D and the seminorm ‖y‖_D := ‖Dy‖_2. Then (7.1) can be written more concisely as the minimization of ‖P(A) ∘ y‖_D, where the minimization on the right-hand side is over all suitably normalized matrix polynomials P ∈ P_p^{s×s}. Using the block rational Arnoldi method, we can generate a block basis V_p = [v_1, ..., v_p] which blockspans K□_p(A, y) and is such that y = v_1 R with nonsingular R ∈ R^{s×s}, and V_p^* D V_p = I_{ps×ps}. Using Lemma 4.1 (ii)-(iii), we can as well perform the minimization (7.1) using the block basis V_p over polynomials P'(z) = R P(z) R^{-1}, and the minimizer P'(A) ∘ v_1 is naturally represented as an RKFUNB object. Single-step predictions can now be obtained by repeatedly applying P'(A) to the time series data. The intimate connection between univariate autoregressive modeling and polynomial Krylov spaces is well known, with [17] being one of the key references on this topic. The connection between the multivariate case and block Krylov methods is natural. Here we want to demonstrate the use of block Krylov spaces on a time series example taken from [39, Example 3.2.3]. The s = 3 time series under consideration, stored in the matrix y, correspond to first-order differences of the logarithms of quarterly seasonally adjusted West German fixed investment, disposable income, and consumption expenditures. The aim is to forecast these time series using a VAR(2) model. Figure 7.1 shows a snippet of MATLAB code performing a one-step VAR(2) prediction using the RKFUNB class of the Rational Krylov Toolbox; the resulting predictions are shown in Figure 7.2.
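For readers without MATLAB, the following hedged Python/NumPy sketch reproduces the essence of the least squares estimation (7.1) on synthetic data (the coefficient matrices and sample size are invented for illustration and are not taken from [39]):

```python
import numpy as np

# Synthetic sanity check (hypothetical data, not the West German series):
# the VAR(p) least squares problem (7.1) solved directly via numpy's lstsq.
rng = np.random.default_rng(0)
s, p, T = 3, 2, 2000
C = [0.4 * np.eye(s), 0.2 * np.eye(s)]      # "true" coefficients (assumed)
y = np.zeros((T, s))
y[:p] = rng.standard_normal((p, s))
for t in range(p, T):                        # simulate y_t = y_{t-1}C_1 + y_{t-2}C_2 + eps_t
    y[t] = sum(y[t - i - 1] @ C[i] for i in range(p)) \
           + 0.01 * rng.standard_normal(s)

# stack lagged rows [y_{t-1}, y_{t-2}] and regress onto y_t
X = np.hstack([y[p - i - 1:T - i - 1] for i in range(p)])
B, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
C1_hat, C2_hat = B[:s], B[s:]
```

With enough samples the least squares estimates recover the generating coefficients up to statistical error; this is the same minimization that the RKFUNB approach expresses through the block basis V_p.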
7.2. Block continuation strategies. We now consider the INLET problem from the Oberwolfach Model Reduction Benchmark Collection [1], an active control model of a supersonic engine inlet; see also [35]. There are two nonsymmetric matrices A, E ∈ R^{N×N}, a block vector b ∈ R^{N×2}, and a row vector c^T ∈ R^{1×N}. (The data for the time series example in Section 7.1 is available from http://www.jmulti.de/data_imtsa.html under the filename e1.dat.) In Table 7.1, cond denotes the condition number κ(W_{m+1}D) of the block basis before orthogonalization, where W_{m+1} = [R, w_1, w_2, . . ., w_m] with the quantities computed in lines 1 and 5 of Algorithm 2.1, and D is a diagonal matrix chosen such that κ(W_{m+1}D) is (approximately) minimized. Furthermore, orth = ∥V_{m+1}^* V_{m+1} − I_{(m+1)s×(m+1)s}∥_2 and space = ∥V_{m+1}(V_{m+1}^* V̂_{m+1}) − V̂_{m+1}∥_2, where V_{m+1} is the rational Krylov basis computed in double precision and V̂_{m+1} is the basis computed in quadruple precision via the Multiprecision Computing Toolbox [3].
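The quantities cond and orth can be reproduced in a small stand-alone experiment. A minimal sketch, assuming column equilibration (scaling every column of W to unit norm, a classical heuristic which by a result of van der Sluis approximately minimizes κ(WD) over diagonal D) and measuring orthogonality after a single orthogonalization; Householder QR stands in here for the Gram–Schmidt process of Algorithm 2.1:

```python
import numpy as np

rng = np.random.default_rng(1)

# a badly column-scaled stand-in for the block basis W_{m+1}
W = rng.standard_normal((100, 6)) * np.logspace(0, 8, 6)

# column equilibration: scale every column to unit 2-norm; this
# approximately minimizes kappa(W D) over diagonal matrices D
D = np.diag(1.0 / np.linalg.norm(W, axis=0))
cond_before = np.linalg.cond(W)
cond_after = np.linalg.cond(W @ D)

# orthogonality measure orth = || V^* V - I ||_2 after one orthogonalization
V, _ = np.linalg.qr(W @ D)
orth = np.linalg.norm(V.T @ V - np.eye(V.shape[1]), 2)
```

A well-scaled basis keeps the condition number of the orthogonalization problem small, which is exactly the effect the continuation strategies aim for.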
Table 7.1 shows that the condition number of the block basis being orthogonalized is smallest with the continuation strategy 'ruhe', and that the computed block vectors are closer to being orthonormal after a single orthogonalization step. The 13th pole in ξ^(2) results in a near-breakdown and an inaccurate basis when the continuation strategy 'last' is used, while the strategy 'ruhe' still performs robustly.

8. Conclusions and future work. In view of the favourable numerical stability observed with the continuation strategy 'ruhe', which we have extended to the block case in Section 5, we believe this should be the default strategy for the (block) rational Arnoldi method. It is now the default choice in the Rational Krylov Toolbox. While the RKFUNB framework is applicable to vector autoregressive modeling, as demonstrated in Section 7.1, we currently do not know how to obtain rational models of vector autoregression with moving averages (VARMA). This would require a block version of the RKFIT pole relocation strategy developed in [11].
The connections between rational Krylov methods and nonlinear eigenvalues of rational matrix-valued functions might open several research directions.For example, it does not seem to be understood how these nonlinear eigenvalues relate to the eigenvalues of the matrix A in the block case, and a generalization of a convergence theory such as that in [5] is currently lacking.An example illustrating the different convergence behavior of single-vector and block rational Ritz values for a Wilkinson matrix can be found in the example collection of the RKToolbox.

v = α_1 b_1 + · · · + α_d b_d with scalar coefficients α_k ∈ C. Under the assumption that the (m+1)s columns of [b, Ab, . . ., A^m b] are linearly independent, (1.2) holds.

Scalars are denoted by lowercase Greek letters, e.g., α, β ∈ C. Vectors are denoted by lowercase Latin letters, e.g., b ∈ C^N and v_k ∈ C^N. Matrices are denoted by uppercase Latin letters and their elements by the corresponding lowercase letters, e.g., A = [a_ij] ∈ C^{N×N}. Block vectors are denoted by bold lowercase Latin letters and their columns are vectors, e.g., b = [b_1, . . ., b_s] ∈ C^{N×s}. Block matrices are denoted by bold uppercase Latin letters and their elements are matrices, e.g., A = [A_ij] and H_m = [H_ij], or block vectors, e.g., V_m = [v_1, . . ., v_m].

Here Q_j^(k) corresponds to the columns to keep, and Q_j^(d) to the columns to deflate. The matrix R_j^(k) is of upper trapezoidal form and R_j^(d) should have small norm. From here, we have two options: (i) We define the next basis block vector v_{j+1} := Q_j^(k) and set C_{j+1,j} := R_j^(k) P_j^* ∈ C^{s_{j+1}×s_j}. When combined into an approximate BRAD A V_{m+1} K_m = V_{m+1} H_m + S_m, the block upper-Hessenberg matrices H_m and K_m have block sizes s_j that shrink with each deflation.
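The keep/deflate split described in option (i) can be sketched with a column-pivoted Gram–Schmidt QR factorization: columns whose diagonal entry of R exceeds a tolerance are kept, the rest are flagged for deflation. The function name pivoted_qr_split is hypothetical, and this sketch is not the toolbox implementation:

```python
import numpy as np

def pivoted_qr_split(W, tol):
    """Modified Gram-Schmidt QR with column pivoting (illustrative sketch).

    Returns (Q_keep, Q_defl, R, perm) with W[:, perm] = [Q_keep, Q_defl] @ R.
    Columns whose diagonal entry of R falls below tol are flagged for
    deflation, mimicking the keep/deflate split of option (i)."""
    W = np.array(W, dtype=float, copy=True)
    n, m = W.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    perm = np.arange(m)
    for j in range(m):
        # pivoting: move the remaining column of largest norm to position j
        piv = j + int(np.argmax(np.linalg.norm(W[:, j:], axis=0)))
        W[:, [j, piv]] = W[:, [piv, j]]
        R[:, [j, piv]] = R[:, [piv, j]]
        perm[[j, piv]] = perm[[piv, j]]
        # normalize the pivot column (already orthogonal to Q[:, :j])
        R[j, j] = np.linalg.norm(W[:, j])
        if R[j, j] > 0:
            Q[:, j] = W[:, j] / R[j, j]
        # orthogonalize the remaining columns against the new basis vector
        R[j, j + 1:] = Q[:, j] @ W[:, j + 1:]
        W[:, j + 1:] -= np.outer(Q[:, j], R[j, j + 1:])
    k = int(np.sum(np.abs(np.diag(R)) > tol))
    return Q[:, :k], Q[:, k:], R, perm
```

Because the pivot at each step is the remaining column of largest norm, the diagonal of R is non-increasing, so thresholding its entries yields a contiguous keep/deflate split.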