The RKFIT algorithm for nonlinear rational approximation

The RKFIT algorithm outlined in [M. Berljafa and S. Güttel, Generalized rational Krylov decompositions with an application to rational approximation, SIAM J. Matrix Anal. Appl., 2015] is a Krylov-based approach for solving nonlinear rational least squares problems. This paper puts RKFIT into a general framework, allowing for its extension to nondiagonal rational approximants and to families of approximants sharing a common denominator. Furthermore, we derive a strategy for the degree reduction of the approximants, as well as methods for their conversion to partial fraction form, their efficient evaluation, and root-finding. We also discuss commonalities and differences of RKFIT and the popular vector fitting algorithm. A MATLAB implementation of RKFIT is provided, and numerical experiments, including the fitting of a MIMO dynamical system and an optimization problem related to exponential integration, demonstrate its applicability.

1. Introduction. Rational approximation problems arise in many areas of engineering and scientific computing. A prominent example is that of system identification and model order reduction, where calculated or measured frequency responses of dynamical systems are approximated by (low-order) rational functions [20,26,2,23,19]. Some other areas where rational approximants play an important role are analog filter design [7], time stepping methods [43], transparent boundary conditions [28,17], and iterative methods in numerical linear algebra (see, e.g., [32,42,18,27,33]). Here we focus on discrete rational approximation in the least squares (LS) sense.
In its simplest form the weighted rational LS problem is the following: given data pairs {(λ_i, f_i)}_{i=1}^N with pairwise distinct λ_i, and positive weights {w_i}_{i=1}^N, find a rational function r of type (m, m), that is, with numerator and denominator of degree at most m, such that

(1.1)    Σ_{i=1}^N w_i |f_i − r(λ_i)|² → min.

The weights can be used to assign varying relevance to the data points. For example, when the function values f_i are known to be perturbed by white Gaussian noise, then the w_i can be chosen inversely proportional to the variance.
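For illustration, the objective in (1.1) can be evaluated in a few lines of numpy; the sketch below is ours (not part of the RKToolbox), with p and q given as coefficient arrays in MATLAB/numpy ordering (highest degree first).

```python
import numpy as np

def weighted_misfit(lam, f, w, p, q):
    """Weighted LS misfit sum_i w_i |f_i - r(lam_i)|^2 for r = p/q,
    with p, q polynomial coefficient arrays (highest degree first)."""
    r = np.polyval(p, lam) / np.polyval(q, lam)
    return np.sum(w * np.abs(f - r) ** 2)

# Data sampled exactly from r(z) = 1/(1+z) gives zero misfit.
lam = np.linspace(0.0, 1.0, 5)
f = 1.0 / (1.0 + lam)
w = np.ones_like(lam)
print(weighted_misfit(lam, f, w, p=[1.0], q=[1.0, 1.0]))  # ~0
```
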
Even in their simplest form (1.1), rational LS problems are challenging. Finding a rational function r = p_m/q_m in (1.1) corresponds to a nonlinear minimization problem, as the denominator q_m is generally unknown, and solutions may depend discontinuously on the data, be nonunique, or even nonexistent. An illustrative example inspired by Braess [10, p. 109] is to take fixed m ≥ 1 and N > 2m and let

(1.2)    λ_1 = 0, f_1 = 1, and f_i = 0 for i = 2, . . ., N.

Then the sequence of rational functions r_j(z) = 1/(1 + jz) makes the misfit in (1.1) arbitrarily small as j → ∞, but the f_i do not correspond to values of a type (m, m) rational function (there are too many roots). Hence a rational LS solution does not exist. If, however, the data f_i are slightly perturbed to f_i = r_j(λ_i) for an arbitrarily large j, then of course r_j itself is an LS solution of (1.1).
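This degeneration is easy to observe numerically. The following sketch (ours; unit weights, m = 1, and an arbitrary choice of distinct nodes λ_i) evaluates the misfit of r_j on the data (1.2) for growing j:

```python
import numpy as np

# Data (1.2): lam_1 = 0 with f_1 = 1, all other f_i = 0 (unit weights).
N = 7
lam = np.linspace(0.0, 1.0, N)
f = np.zeros(N); f[0] = 1.0

def misfit_rj(j):
    """Misfit of r_j(z) = 1/(1 + j z) on the data (1.2)."""
    r = 1.0 / (1.0 + j * lam)
    return np.sum(np.abs(f - r) ** 2)

for j in [1, 10, 1000, 10**6]:
    print(j, misfit_rj(j))   # decreases toward 0, but no minimizer exists
```
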
A very common approach for solving (1.1) approximately is linearization. Consider again the data (1.2) and the problem of finding polynomials p_m and q_m of degree at most m such that

(1.3)    Σ_{i=1}^N w_i |f_i q_m(λ_i) − p_m(λ_i)|² → min.

This problem has a trivial solution with p_m ≡ q_m ≡ 0, and to exclude it we need some normalization like, for example, a "pointwise" condition q_m(0) = 1. Under this condition the linear problem (1.3) is guaranteed to have a nontrivial solution (p_m, q_m), but the solution is clearly not unique; since f_i = 0 for i ≥ 2, any admissible denominator polynomial q_m with q_m(0) = 1 corresponds to a minimal solution with p_m ≡ 0. On the other hand, for the normalization condition q_m(1) = 1, the polynomials q_m(z) = z and p_m ≡ 0 solve (1.3) with zero misfit. This example shows that linearized rational LS problems can have nonunique solutions, and these may depend on the normalization conditions. With both normalization conditions, the rational function r = p_m/q_m with (p_m, q_m) obtained from solving the linearized problem (1.3) may yield an arbitrarily large (or even infinite) misfit for the nonlinear problem (1.1).
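The nonuniqueness can be verified directly: on the data (1.2), once q_m(0) = 1 is fixed, the residuals f_i q_m(λ_i) − p_m(λ_i) do not depend on q_m at all (because λ_1 = 0 and f_i = 0 for i ≥ 2), while under q_m(1) = 1 the pair q_m(z) = z, p_m ≡ 0 attains zero misfit. A small numpy check (ours; unit weights, m = 1):

```python
import numpy as np

# Data (1.2) with m = 1, N = 7: lam_1 = 0, f_1 = 1, f_i = 0 otherwise.
lam = np.linspace(0.0, 1.0, 7)
f = np.zeros(7); f[0] = 1.0

def lin_misfit(p, q):
    """Linearized misfit sum_i |f_i*q(lam_i) - p(lam_i)|^2 (unit weights)."""
    return np.sum(np.abs(f * np.polyval(q, lam) - np.polyval(p, lam)) ** 2)

# Under q(0) = 1 the misfit is independent of q: nonunique minimizers.
print(lin_misfit([0.0], [1.0, 1.0]))    # q(z) = z + 1
print(lin_misfit([0.0], [-5.0, 1.0]))   # q(z) = -5z + 1, same misfit

# Under q(1) = 1, the pair q(z) = z, p = 0 attains zero misfit.
print(lin_misfit([0.0], [1.0, 0.0]))
```
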
The pitfalls of nonlinear and linearized rational approximation problems have not prevented the development of algorithms for their solution. An interesting overview of algorithms for the nonlinear problem based on repeated linearization, such as Wittmeyer's algorithm, is given in [3]. Robust solution methods for the linearized problem using a regularized SVD are discussed in [22,21].
The aim of this paper is to present and analyze an extension of the RKFIT algorithm initially outlined in [6]. RKFIT is an iterative method for solving rational LS problems more general than (1.1). For given matrices {A, F} ⊂ C^{N×N} and a vector b ∈ C^N, RKFIT attempts to find a rational function r such that

(1.4)    ‖F b − r(A)b‖₂ / ‖F b‖₂ → min.

Note that this problem contains (1.1) as a special case with A = diag(λ_1, . . ., λ_N), F = diag(f_1, . . ., f_N), and b = [√w_1, . . ., √w_N]^T. For RKFIT the matrices A and F are not required to be diagonal. In many applications F is a matrix function of A or an approximation thereof, i.e., F = f(A) or F ≈ f(A).
A main contribution of this work compared to [6] is the extension of RKFIT to nondiagonal approximants, i.e., allowing one to compute rational functions of the general type (m + k, m) with k ≥ −m. Further, we extend RKFIT to rational approximation problems involving a family of matrices {F^[j]}_{j=1}^ℓ ⊂ C^{N×N} and a block of vectors B ∈ C^{N×n}.
Downloaded 09/22/17 to 130.88.240.95. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
More precisely, we seek a family of rational functions {r^[j]}_{j=1}^ℓ of type (m + k, m), all sharing a common denominator q_m, such that the relative misfit is minimal, i.e.,

(1.5)    Σ_{j=1}^ℓ ‖D^[j][F^[j]B − r^[j](A)B]‖_F² / Σ_{j=1}^ℓ ‖D^[j]F^[j]B‖_F² → min.

The matrices F^[j] may, for instance, correspond to values of a parameter-dependent matrix function like F^[j] = exp(−t_j A), and in section 6 we consider an application of such a problem. The matrices D^[j] act as elementwise weights, whereas the vectors in B can be viewed as spectral weights relative to the eigenpairs of A.
To summarize our terminology, here is a list of the data in problem (1.5):
  A : interpolation node matrix of size N × N,
  F^[j] : interpolation data matrices of size N × N,
  D^[j] : elementwise weight matrices of size N × N,
  B : block of spectral weight vectors, an N × n matrix,
  r^[j] : rational functions sharing the same denominator q_m,
  (m + k, m) : type of the rational functions r^[j], with k ≥ −m.
We show how rational Krylov techniques can be used to tackle problems of the form (1.5). The outgrowth of this work is a new MATLAB implementation of RKFIT, which is part of the Rational Krylov Toolbox [5] available online.¹ One particularity of RKFIT is its ease of use. For example, with ℓ = 1 and the matrices A, F, B and a vector of initial poles xi being defined in MATLAB, the user simply calls [xi, r, misfit] = rkfit(F, A, B, xi) to obtain a rational function r represented as a MATLAB object of class RKFUN, which stands for rational Krylov function. The toolbox implements several RKFUN methods, for example, the evaluation of r at scalar arguments or as a matrix function; the commands r(z) and r(A,B) evaluate r(z) and r(A)B, respectively (where A and B can be different from the matrices used for the construction of r). The conversion of an RKFUN to partial fraction form (the residue command), root-finding (roots), and easy-to-use plotting (ezplot) are provided as well. The use of MATLAB's object-oriented programming capabilities for these purposes is inspired by the Chebfun system [14].
Alongside the extension of RKFIT to nondiagonal approximants in section 2, another contribution of this paper is Theorem 2.2, which shows that RKFIT solves (1.4) exactly if F is a rational matrix function of type (m + k, m). In section 3 we propose a procedure for automatically decreasing the degree parameters m and k, thereby reducing possible deficiencies in the rational approximants. That section also contains Theorem 3.1, which relates the roots of an RKFUN to the eigenvalues of a matrix pencil. Based on this theorem, we present a new procedure to obtain good starting guesses for RKFIT after a degree reduction has been performed.
We point out that initially, in sections 2 and 3, we only consider problem (1.4), which is a special case of (1.5) for a single rational function (ℓ = 1) and a single vector B = b (n = 1). The generalization to the full problem (1.5) is discussed in section 4. In section 5 we develop a new approach for the efficient evaluation of the RKFUNs produced by RKFIT. We also show how to compute the roots of RKFUNs and how to convert them into partial fraction form. Numerical examples are given in section 6, including the fitting of a multiple-input/multiple-output (MIMO) dynamical system and a new pole optimization approach for exponential integration. An appendix discusses the connections of RKFIT to other approximation algorithms, in particular, the popular vector fitting (VFIT) method [26].
2. The RKFIT algorithm. The nondiagonal version of the RKFIT algorithm considered here aims to find a rational function r = p_{m+k}/q_m of type (m + k, m) which solves problem (1.4). As the denominator q_m is not known and (1.4) depends nonlinearly on it, RKFIT tries to iteratively improve a starting guess for q_m by solving a linearized problem at each iteration. Once a satisfactory q_m is obtained, the linear part p_{m+k} is easily found.
The method is succinctly described in Algorithm 2.1. Different from the basic version presented in [6], it makes use of two linear spaces in C^N, a search space S and a target space T, both of which are (rational) Krylov spaces. Given a matrix A ∈ C^{N×N}, a (so-called) starting vector b ∈ C^N, an integer m ≥ 0, and a nonzero polynomial q_m ∈ P_m with roots disjoint from the spectrum of A, we define the associated rational Krylov space of order m as

Q_{m+1}(A, b, q_m) := q_m(A)^{-1} K_{m+1}(A, b),

where K_{m+1}(A, b) = span{b, Ab, . . ., A^m b} is a (polynomial) Krylov space. The roots of q_m are called poles of the rational Krylov space, and they are denoted by ξ_1, ξ_2, . . ., ξ_m. For convenience, we sometimes refer to q_m itself as the poles of the rational Krylov space. If deg(q_m) < m, then m − deg(q_m) of the poles are set to ∞, and we refer to them as formal (multiple) roots of q_m. If all poles are set to ∞, we obtain the polynomial Krylov space K_{m+1}(A, b).

Algorithm 2.1. High-level description of RKFIT.
1. Choose an initial nonzero q_m ∈ P_m with roots disjoint from the spectrum of A.
2. repeat
3.   Compute an orthonormal basis V_{m+1} of the search space S := Q_{m+1}(A, b, q_m).
4.   Define the target space T of dimension m + k + 1 (see the discussion below).
5.   Solve v̂ = argmin_{v ∈ S, ‖v‖₂ = 1} ‖(I − P_T)F v‖₂.
6.   Let q̂_m ∈ P_m be such that v̂ = q̂_m(A)q_m(A)^{-1}b.
7.   Set q_m := q̂_m (pole relocation).
8. until the misfit is small enough or a maximum number of iterations is reached.
9. Compute the approximant r = p_{m+k}/q_m by LS projection of F b onto T.
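The search space in line 3 can be built naively from the definition Q_{m+1}(A, b, q_m) = q_m(A)^{-1}K_{m+1}(A, b); the following numpy sketch (ours, and no substitute for the rational Arnoldi algorithm discussed below, which avoids forming the ill-conditioned Krylov sequence) illustrates the construction for finite poles:

```python
import numpy as np

def rational_krylov_basis(A, b, poles):
    """Orthonormal basis of Q_{m+1}(A, b, q_m) = q_m(A)^{-1} K_{m+1}(A, b),
    built directly from the definition; poles = roots of q_m, assumed
    finite and disjoint from the spectrum of A."""
    N = A.shape[0]
    v = b.astype(complex)
    for xi in poles:                        # v = q_m(A)^{-1} b, one solve per pole
        v = np.linalg.solve(A - xi * np.eye(N), v)
    V = np.column_stack([np.linalg.matrix_power(A, j) @ v
                         for j in range(len(poles) + 1)])
    return np.linalg.qr(V)[0]               # orthonormalize the Krylov sequence

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
b = rng.standard_normal(8)
V = rational_krylov_basis(A, b, [4.0 + 1.0j, 4.0 - 1.0j])
print(np.allclose(V.conj().T @ V, np.eye(3)))   # orthonormal columns
```
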
By P_T in line 5 of Algorithm 2.1 we denote the orthogonal projection onto T. The essence of Algorithm 2.1 is the relocation of poles in line 7. Since with any polynomial q̂_m ∈ P_m we can associate a vector v̂ = q̂_m(A)q_m(A)^{-1}b ∈ S, and the other way around, we may identify q̂_m, the improvement of q_m, by looking for the corresponding vector v̂ ∈ S. Theorem 2.2 below, a consequence of Lemma 2.1, provides insight into the RKFIT pole relocation, i.e., lines 5-7 of Algorithm 2.1.

Lemma 2.1. Let q_m, q̂_m ∈ P_m be nonzero polynomials with roots disjoint from the spectrum of A ∈ C^{N×N}. Fix −m ≤ k ∈ Z, and let b ∈ C^N be such that 2m + k < M(A, b). Assume that F = p_{m+k}(A)q̂_m(A)^{-1} for some p_{m+k} ∈ P_{m+k}. Define S and T as in lines 3 and 4 of Algorithm 2.1, respectively, and let V_{m+1} be an orthonormal basis of S. Then the matrix (I − P_T)F V_{m+1} has a nullspace of dimension ∆m + 1 if and only if ∆m is the largest integer such that p_{m+k}/q̂_m is of type (m + k − ∆m, m − ∆m).

Proof. Any v ∈ S is of the form v = p̂_m(A)q_m(A)^{-1}b with some p̂_m ∈ P_m, and hence F v = p_{2m+k}(A)[(q̂_m q_m)(A)]^{-1}b with p_{2m+k} = p_{m+k}p̂_m; this vector has a unique representation in terms of p_{2m+k}/(q̂_m q_m) since 2m + k < M. Assume that F v ∈ T. In this case we also have the representation F v = p̃_{m+k}(A)q_m(A)^{-1}b, with a uniquely determined p̃_{m+k} ∈ P_{m+k}. By the uniqueness of the rational representations we conclude that p_{2m+k}/(q̂_m q_m) = p̃_{m+k}/q_m or, equivalently, that q̂_m divides p_{2m+k} = p_{m+k}p̂_m. Hence, the poles of p_{m+k−∆m}/q̂_{m−∆m} ≡ p_{m+k}/q̂_m must be roots of p̂_m. The other ∆m roots of p̂_m can be chosen freely, giving rise to the (∆m + 1)-dimensional subspace

(2.1)    N = K_{∆m+1}(A, q̂_{m−∆m}(A)q_m(A)^{-1}b)

whose elements v are such that F v ∈ T. Hence, ∆m + 1 is the dimension of the nullspace of (I − P_T)F V_{m+1}.
Theorem 2.2. Let q_m, q̂_m, F, A, b, m, k, S, and T be as defined in Lemma 2.1, and assume in addition that p_{m+k} and q̂_m are coprime and either deg(p_{m+k}) = m + k or deg(q̂_m) = m, i.e., ∆m = 0. Then the minimizer in line 5 of Algorithm 2.1 is unique up to scaling and given by v̂ = γ q̂_m(A)q_m(A)^{-1}b with some γ ≠ 0.

The theorem asserts that if F b = p_{m+k}(A)q̂_m(A)^{-1}b and ∆m = 0, then the "roots" of v̂ = γ q̂_m(A)q_m(A)^{-1}b match the unknown poles q̂_m, and the next approximate poles become q_m := q̂_m. Hence RKFIT identifies the exact poles within one iteration independently of the starting guess q_m. If ∆m > 0, the exact m − ∆m poles are also found, but additional ∆m superfluous poles at arbitrary locations are present as well. In section 3 we develop a procedure for automatically reducing the denominator degree m by ∆m and adapting k. Comments regarding the convergence of RKFIT when dealing with noisy data (and roundoff) or when F b cannot be exactly represented as r(A)b for a rational function r of type (m + k, m) are included in section A.5 of the appendix.
In the remaining part of this section we discuss line-by-line how Algorithm 2.1 can be implemented using rational Krylov techniques.These considerations are also important for developments in the forthcoming sections.
• Line 3: An orthonormal basis V_{m+1} ∈ C^{N×(m+1)} for the search space S = R(V_{m+1}) can be obtained with the rational Arnoldi algorithm which, given A, b, and q_m, constructs a decomposition of the form

(2.2)    A V_{m+1}K_m = V_{m+1}H_m,

where (H_m, K_m) is an (m + 1) × m upper Hessenberg pencil and the quotients {h_{j+1,j}/k_{j+1,j}}_{j=1}^m of its subdiagonal entries are the (formal) roots of q_m, i.e., the poles of the rational Krylov space S. A decomposition of the form (2.2) is called a rational Arnoldi decomposition (RAD). For details of the rational Arnoldi algorithm and properties of RADs we refer the reader to [4,6,35,37,38].

• Line 4: Since T is itself a rational Krylov space, we can compute an orthonormal basis V̂_{m+k+1} for T using once again the rational Arnoldi algorithm. A computationally more economic alternative is to reuse (2.2). Indeed, if k = 0, we simply have T = S. Otherwise, S has to be either expanded (if k > 0) or compressed (if k < 0) to get T:
- In the case of superdiagonal approximants (k > 0), T = Q_{m+k+1}(A, b, q_m) is the rational Krylov space of dimension m + k + 1 with m poles being the roots of q_m and additional k poles at infinity. In order to get an orthonormal basis for Q_{m+k+1}(A, b, q_m), we expand (2.2) into A V_{m+k+1}K_{m+k} = V_{m+k+1}H_{m+k} by performing k additional polynomial steps with the rational Krylov algorithm. Let us, for convenience, label by V̂_{m+k+1} := V_{m+k+1} the orthonormal basis for T when k ≥ 0.
- In the case of subdiagonal approximants (k < 0), S has to be compressed. Using a sequence of Givens rotations in a QZ fashion (as explained in [39, p. 495] or [4, section 5.2]), we get unitary matrices Q_{m+1} and Z_m transforming (2.2) into an equivalent decomposition with transformed orthonormal basis Ṽ_{m+1} whose leading part is an RAD with the m + k poles of interest. An orthonormal basis for T is then given by truncating Ṽ_{m+1} to its first m + k + 1 columns Ṽ_{m+k+1}, and we set V̂_{m+k+1} := Ṽ_{m+k+1}.

• Line 5: With the orthonormal basis V̂_{m+k+1} of T at hand, the orthogonal projection is P_T = V̂_{m+k+1}V̂*_{m+k+1}, and we can set

S := (I − P_T)F V_{m+1} ∈ C^{N×(m+1)};

a solution of the minimization problem in line 5 is then given by v̂ = V_{m+1}c, where c is a right singular vector of S corresponding to a smallest singular value σ_min.
• Lines 6-7: What we need in line 3 of the next iteration as input for the rational Arnoldi algorithm are the poles of the rational Krylov space that is being constructed, that is, the roots of q̂_m. Let Q_{m+1} be unitary with first column Q_{m+1}e_1 = c; then the roots of q̂_m are the eigenvalues of the m × m pencil

(2.5)    ([Q*_{m+1}H_m]_{2:m+1, :}, [Q*_{m+1}K_m]_{2:m+1, :})

(see [6, section 5] for details).
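The following self-contained numpy sketch (ours) mimics lines 5-7 on diagonal surrogate data with k = 0, where T = S: when F is exactly a rational matrix function of type (m, m), the matrix S = (I − P_T)F V_{m+1} is singular, as predicted by Lemma 2.1.

```python
import numpy as np

# Diagonal surrogate: F = p(A) q(A)^{-1} is exactly of type (2, 2).
N, m = 12, 2
lam = np.linspace(1.0, 2.0, N)
A, b = np.diag(lam), np.ones(N)

true_poles = [-1.0, -2.0]                # roots of the unknown denominator
guess_poles = [-0.5, -3.0]               # current (wrong) guess q_m

p_vals = np.polyval([1.0, 1.0, 1.0], lam)             # numerator p, deg m
F = np.diag(p_vals / np.polyval(np.poly(true_poles), lam))

def basis(poles):
    """Orthonormal basis of Q_{m+1}(A, b, q) for diagonal A."""
    v = b / np.polyval(np.poly(poles), lam)           # q(A)^{-1} b entrywise
    return np.linalg.qr(np.column_stack([v * lam**j for j in range(m + 1)]))[0]

V = basis(guess_poles)                   # search space S; for k = 0, T = S
S = (np.eye(N) - V @ V.T) @ (F @ V)
sigma = np.linalg.svd(S, compute_uv=False)
print(sigma)                             # smallest singular value is ~0
```
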
• Line 9: The approximant r of type (m + k, m) is computed by LS approximation of F b from the target rational Krylov space T. More precisely, if V̂_{m+k+1} is an orthonormal basis for T, then the approximant r is represented by a coefficient vector c ∈ C^{m+k+1} such that r(A)b = ‖b‖₂ V̂_{m+k+1}c. The coefficient vector is given by c = V̂*_{m+k+1}F b/‖b‖₂. Computing the coefficient vector c at each iteration does not significantly increase the computational complexity because F b needs to be computed only once. The availability of c also enables the cheap evaluation of the relative misfit (1.5), which allows us to stop the RKFIT iteration when a desired tolerance ε_tol is achieved.

3. Tuning degree parameters m and k. In some applications, one may want to construct a rational function of sufficiently small misfit without knowing the required degree parameters m and k in advance. In such situations one can try to fit the data with high enough (for instance, the maximal one is willing to use) degree parameters and then, after RKFIT has found a sufficiently good approximant, reduce m and k without deteriorating much the approximation accuracy. In this section we present a strategy for performing this reduction.
We assume to have at hand an (m + k, m) approximant r such that ‖F b − r(A)b‖₂ ≤ ‖F b‖₂ε_tol. We then propose the following three-step procedure. These steps are discussed in the following three subsections for the case that F is a rational matrix function, while in subsection 3.4 we provide a numerical illustration. In subsection 3.5 we discuss the case when F is not a rational matrix function. This is followed by another numerical illustration in subsection 3.6.

3.1. Reducing the denominator degree m. Assume that F is a rational matrix function. Our reduction procedure for m is based on Lemma 2.1, which asserts that a defect ∆m + 1 of the matrix S = (I − P_T)F V_{m+1} corresponds to F being of type (m − ∆m + k, m − ∆m). Due to numerical roundoff, we work with the numerical rank of S relative to a given tolerance ‖F b‖₂ε_tol (with, e.g., ε_tol = 10^{−15}). More precisely, we reduce m by the largest integer ∆m ≤ min{m, m + k} such that

σ_{m+1−∆m} ≤ ‖F b‖₂ ε_tol,

where σ_1 ≥ · · · ≥ σ_{m+1} are the singular values of S.
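In code, the selection of ∆m amounts to counting trailing singular values below the tolerance; a sketch (ours, with the tol argument standing for ‖F b‖₂ε_tol):

```python
import numpy as np

def choose_dm(sigma, m, k, tol):
    """Reduction amount: with sigma_1 >= ... >= sigma_{m+1} the singular
    values of S = (I - P_T) F V_{m+1}, the defect is the number of them
    below tol, and m is reduced by dm <= min(m, m+k) (defect = dm + 1)."""
    count = int(np.sum(np.asarray(sigma) <= tol))    # defect dm + 1
    return max(0, min(count - 1, m, m + k))

# Three trailing near-zero singular values (defect 3) give dm = 2.
sigma = [1.0, 0.5, 0.1, 1e-16, 2e-17, 3e-18]         # here m = 5, k = 0
print(choose_dm(sigma, m=5, k=0, tol=1e-15))          # 2
```
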
3.2. Finding a lower-degree approximant. If ∆m ≥ 1, then m needs to be reduced, and a new approximant of lower numerator and denominator degree is required. As seen in the proof of Lemma 2.1, the ∆m + 1 linearly independent vectors spanning N all share as the greatest common divisor (GCD) of their numerators the polynomial q̂_{m−∆m}, and its roots should be used as poles of the reduced-degree rational approximant. The following theorem shows how these roots can be obtained from the pencil (H_m, K_m) in (2.2).

Theorem 3.1. Let (2.2) be an RAD for Q_{m+1}(A, b, q_m) with m + 1 < M(A, b), and let the r_j ≡ V_{m+1}c_j for j = 1, . . ., ∆m + 1 be linearly independent. Assume that the numerators of the r_j share as GCD a polynomial of degree m − ∆m. Let X ∈ C^{(m+1)×(m+1)} be a nonsingular matrix with Xe_j = c_j for j = 1, . . ., ∆m + 1. Assume further that K̆ is nonsingular. Then the roots of the GCD are the eigenvalues of the (m − ∆m) × (m − ∆m) generalized eigenproblem (H̆, K̆).
Let us now assume that λ is a root of multiplicity 2, and note that we may take K̆ = I_{m−∆m}. Differentiating the scalar RAD with respect to λ, and considering the last m − ∆m columns of the resulting relation, shows that r′(λ) is a generalized eigenvector for the eigenvalue λ of (H̆, K̆), which is hence of multiplicity two or greater. The proof for roots of higher multiplicity follows the same argument.
Remark 3.2. The assumption that K̆ is nonsingular is used in the proof of Theorem 3.1 for the case of repeated roots only. We conjecture that this assumption can also be removed when there are multiple roots, and that it follows from the fact that the numerators of {r_j}_{j=1}^{∆m+1} have as GCD a polynomial of degree m − ∆m.

3.3. Numerator degree-revealing basis. We now assume that the denominator degree m := m − ∆m has already been reduced and a new approximant r of type (m + k, m) such that ‖F b − r(A)b‖₂ ≤ ‖F b‖₂ε_tol has been found. Reducing the numerator degree is a linear problem, and we can guarantee that the misfit stays below ε_tol after the reduction.

Let T = K_{m+k+1}(A, q_m(A)^{-1}b) be the final target space such that r(A)b ∈ T, and let V_j be an orthonormal basis for K_j(A, q_m(A)^{-1}b) for j = 1, . . ., m + k + 1. As the vectors in V_j have ascending numerator degree, this basis reveals the degree of r(A)b by looking at the trailing expansion coefficients c ∈ C^{m+k+1} satisfying r(A)b = ‖b‖₂V_{m+k+1}c. The degree of the numerator of r can therefore be reduced to m + k − ∆k, where ∆k is the maximal integer such that the trailing ∆k coefficients of c are negligible relative to the tolerance (or ∆k = 0 if no such integer exists). The last ∆k components of c may hence be truncated.

3.4. Example: Degree reduction for a rational matrix function. The table on the left shows the number ∆m + 1 of singular values of (I − P_T)F V_{m+1} below the tolerance ‖F b‖₂ε_tol = 10^{−15} for different choices of m and k. For the choice (m + k, m) = (3, 9), for instance, we obtain ∆m = 2, and hence the reduced type is (1, 7). In this case m is not fully reduced because k was chosen too small. For the choice (m + k, m) = (8, 6) we obtain ∆m = 3, giving the reduced type (5, 3). The roots of the GCD are −1 and −3 ± i2.32 × 10^{−7}. With these three finite poles and another two poles at infinity, the type (5, 3) approximant r produces a relative misfit 7.02 × 10^{−17}. The expansion coefficients c_Q of r in the orthonormal rational basis are listed to the right of the table. They indicate that the last two poles at infinity are actually superfluous, and r is of type at most (3, 3). Only the expansion of r in the orthonormal polynomial basis, as explained in subsection 3.3, reveals that r is of type (1, 3). The coefficients c_K in this polynomial basis are also given.
3.5. General F. The following lemma extends Lemma 2.1 to the case when F is not necessarily a rational matrix function.

Lemma 3.3. Let q_m, A, b, m, k, S, T, and V_{m+1} be as in Lemma 2.1. Assume that for F ∈ C^{N×N} we have found a rational approximant r = p_{m+k}/q_m of type (m + k, m) such that ‖F b − r(A)b‖₂ ≤ ‖F b‖₂ε_tol. If the matrix (I − P_T)F V_{m+1} has ∆m + 1 singular values smaller than ‖F b‖₂ε_tol, then there exists a (∆m + 1)-dimensional subspace N_g ⊆ S, containing b, such that

min_{v ∈ N_g, ‖v‖₂ = 1} ‖(I − P_T)F v‖₂ ≤ ‖F b‖₂ε_tol.

Proof. Consider a thin SVD of the matrix (I − P_T)F V_{m+1}. The subspace spanned by V_{m+1} times the right singular vectors corresponding to the ∆m + 1 smallest singular values has the asserted property; note also that ‖(I − P_T)F b‖₂ ≤ ‖F b − r(A)b‖₂ ≤ ‖F b‖₂ε_tol since r(A)b ∈ T.
Recall that if F is a rational matrix function, then the space N_g defined in Lemma 3.3 corresponds to the exact nullspace N = K_{∆m+1}(A, q̂_{m−∆m}(A)q_m(A)^{-1}b) defined in (2.1), where the (numerators of the) rational functions share as GCD the polynomial q̂_{m−∆m}. In the general case N_g is only a subspace of the larger rational Krylov space S, and the rational functions present in N_g do not necessarily share a common denominator. However, for every v = p̂_m(A)q_m(A)^{-1}b ∈ N_g the vector F p̂_m(A)q_m(A)^{-1}b is well approximated in the 2-norm by some vector p(A)q_m(A)^{-1}b, with p ∈ P_{m+k}. This suggests that the polynomials p̂_m corresponding to vectors v ∈ N_g share an approximate GCD (see, e.g., [8]) whose roots approximate the poles of a "good" rational approximation r(A)b for F b. We therefore propose to use the same reduction procedure as suggested by Theorem 3.1.
As there is no guarantee that after reduction RKFIT will be able to find an approximant of relative misfit below ε_tol, the use of a safety parameter ε_safe is recommended. More precisely, we reduce m by the largest integer ∆m ≤ min{m, m + k} such that

σ_{m+1−∆m} ≤ ‖F b‖₂ ε_tol ε_safe,

where σ_1 ≥ · · · ≥ σ_{m+1} are the singular values of S. By default we use ε_safe = 0.1.

3.6. Example: Degree reduction for a nonrational matrix function. The table on the left shows the number ∆m + 1 of singular values of (I − P_T)F V_{m+1} below ‖F b‖₂ε_tol ε_safe = 10^{−5} for different choices of m and k. For the choice (m + k, m) = (9, 10) we obtain ∆m = 4, implying the reduced type (5, 6). The choice (m + k, m) = (11, 6) is reduced down to (9, 4) as ∆m = 2. Representing this new approximant in the numerator degree-revealing basis allows for a further reduction to type (5, 4). The table on the right visualizes how many RKFIT iterations are required after reduction to reobtain an approximant of misfit below ε_tol = 10^{−4}, using the approximate GCD strategy for selecting the poles for the restart of RKFIT. Note that in most instances the misfit remains acceptable after the reduction, while otherwise only one further RKFIT iteration is needed to obtain an acceptable approximation. This shows the benefit of the developed reduction strategy. (Another example is given in section 6.1.)

4. Extensions. In this section we discuss extensions of RKFIT for solving problems of the more general form (1.5).

4.1. Family of rational functions. In order to tackle the more general problem (1.5) of finding a family {r^[j]}_{j=1}^ℓ of rational functions with a common denominator, we only need to modify line 5 in Algorithm 2.1 so that the minimization runs over all ℓ terms simultaneously. Once again, a solution is v̂ = V_{m+1}c, where c is a right singular vector corresponding to a smallest singular value of the matrix S ∈ C^{ℓN×(m+1)} obtained by stacking S^[1], . . ., S^[ℓ] on top of each other, where

(4.1)    S^[j] := (I − P_T)D^[j]F^[j]V_{m+1}.

The rational approximants {r^[j]}_{j=1}^ℓ may be represented by coefficient vectors c^[j] defined as in section 2, one for each F^[j]b. The remaining parts of RKFIT, with the exception of the degree-reduction strategy, are unaffected. In order to make sure that all of {r^[j]}_{j=1}^ℓ share the same denominator, the reduction of m should be based on the singular values of S, and not of the individual S^[j]. The numerator reduction can be performed for each r^[j] individually.
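The stacking in (4.1) is what enforces the common denominator: a single right singular vector must make all blocks small simultaneously. A toy numpy check (ours), where the blocks share a common null direction by construction:

```python
import numpy as np

def common_denominator_vector(S_blocks):
    """Right singular vector for the smallest singular value of the
    stacked matrix built from the blocks S^[j] in (4.1); one vector c
    must make all blocks S^[j] c small at once."""
    _, _, Vh = np.linalg.svd(np.vstack(S_blocks))
    return Vh.conj().T[:, -1]

# Toy check: blocks constructed to annihilate a common direction c0.
rng = np.random.default_rng(2)
c0 = np.array([0.6, 0.8, 0.0])
P = np.eye(3) - np.outer(c0, c0)         # projector with P @ c0 = 0
blocks = [rng.standard_normal((5, 3)) @ P for _ in range(3)]
c = common_denominator_vector(blocks)
print(abs(c @ c0))                       # ~1: the common direction is found
```
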

4.2. Block case. Let us now consider the block case n > 1, in which B ∈ C^{N×n}. Problem (1.5) can be recast in the single-vector form by working with the matrices

(4.4)    𝐀 = I_n ⊗ A, 𝐃^[j] = I_n ⊗ D^[j], 𝐅^[j] = I_n ⊗ F^[j], and b = vec(B),

where I_n ⊗ X = blkdiag(X, . . ., X). Since these block-diagonal matrices act columnwise on vec(B), for n = 1 we recover the single-column case considered so far, with b = vec(B).

Our implementation [5] supports the case n > 1 and takes advantage of the structure present in (4.4) so that only {D^[j], F^[j]}_{j=1}^ℓ and A are stored, while 𝐃^[j], 𝐅^[j], and 𝐀 are never constructed explicitly. In fact, D^[j], F^[j], and A are not explicitly needed either, as all that is required is the ability to compute D^[j]x, F^[j]x, and Ax for arbitrary x ∈ C^N, as well as the ability to solve shifted linear systems (A − ξI)x = v.

4.3. Avoiding complex arithmetic. If {D^[j], F^[j]}_{j=1}^ℓ, A, and B are real-valued and the set of starting poles {ξ_j}_{j=1}^m is closed under complex conjugation, we can use the "real version" of the rational Arnoldi algorithm and avoid complex arithmetic; see [36]. The matrix S in (4.1) is then guaranteed to be real-valued, and the generalized eigenproblem (2.5) is real-valued as well. This guarantees that the relocated poles appear in complex-conjugate pairs. For more details we refer the reader to [4, section 6.1.4].

5. Working with RKFUNs. An RKFUN representation of r consists of the RAD pencil (H_m, K_m) from (2.2) together with the coefficient vector c; the basis functions of the underlying rational Krylov space satisfy a scalar recurrence (5.1) determined by the pencil. Note that (5.1) is equivalent to a rescaled recurrence for any scalars µ, ν, ρ, η ∈ C such that µη ≠ νρ. By taking µ/ν ≡ h_{j+1,j}/k_{j+1,j}, we can compute the basis functions successively, where γ_j = ρh_{j+1,j} − ηk_{j+1,j} for j = 1, 2, . . ., d.
We have overloaded the feval function in MATLAB for RKFUN objects to implement this evaluation procedure. The function can be invoked by typing either r(z) or r(A, B) (cf. section 1).

Root-finding. The roots of an RKFUN can be computed as the eigenvalues of an associated root-finding pencil (cf. Theorem 3.1). If k < 0, then among the d eigenvalues there are −k infinite eigenvalues or, numerically, eigenvalues of large modulus. In our implementation of roots in the RKToolbox [5] we hence sort the roots by their magnitudes and return only the m + k smallest ones.

Conversion to partial fraction form. Here we only consider the case k ≤ 0, i.e., d = m, and pairwise distinct finite poles ξ_1, . . ., ξ_m; generalizations are discussed in section 7. The conversion of a type (m + k, m) rational function r into partial fraction form can be achieved by transforming the RAD A V_{m+1}K_m = V_{m+1}H_m in such a way that it reveals the residues. We aim to transform the latter RAD into

(5.2)    A W_{m+1}K̃_m = W_{m+1}H̃_m,

where W_{m+1}e_1 = v_1. One then easily verifies that the columns of W_{m+1} satisfy w_{j+1} = (A − ξ_j I)^{-1}v_1. This conversion is achieved via left- and right-multiplication of the pencil (H_m, K_m) by invertible matrices, given in Algorithm 5.1.

Algorithm 5.1. Conversion to partial fraction form (RKToolbox [5]: residue).
Input: Upper Hessenberg pencil (H_m, K_m) with finite distinct poles.
Output: Invertible matrices L_{m+1} and R_m representing the conversion.
The algorithm consists of four parts. The first corresponds to lines 1-3, and it transforms the pencil so that the lower m × m part matches that of (5.2). The matrix [0 I_m]K_m is invertible since it is upper triangular with no zero elements on the diagonal (there are no infinite poles), and hence R_m is well defined in line 1. The second part corresponds to lines 4-5, and it zeroes the first row of K_m. The third part, line 6, takes care of the first row of H_m, setting all its elements to one. After this transformation, as the fourth part, we rescale the columns. The transformation of V_{m+1} into the partial fraction basis W_{m+1} has condition number cond(L_{m+1}), which can be arbitrarily bad, in particular if some of the poles ξ_j are close together. Our implementation residue in the RKToolbox [5] therefore supports the use of the MATLAB variable precision arithmetic as well as the use of the Advanpix Multiprecision Toolbox [1].
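For pairwise distinct finite poles and a strictly proper r (deg p < deg q) there is also the classical closed-form route to the residues, σ_i = p(ξ_i)/q′(ξ_i), which can serve as a cross-check of the pencil-based conversion; a numpy sketch (ours, with q monic given by its roots):

```python
import numpy as np

def residues(p, q_roots):
    """Residues sigma_i = p(xi_i)/q'(xi_i) of r = p/q at pairwise
    distinct finite poles xi_i, with q monic given by its roots and
    deg p < deg q (strictly proper r)."""
    dq = np.polyder(np.poly(q_roots))
    return np.array([np.polyval(p, xi) / np.polyval(dq, xi) for xi in q_roots])

# r(z) = 1/((z-1)(z-2)) = -1/(z-1) + 1/(z-2)
print(residues([1.0], [1.0, 2.0]))      # [-1.  1.]
```
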
6. Numerical experiments. In the following we demonstrate RKFIT with numerical experiments. MATLAB files for reproducing these experiments are part of the RKToolbox [5], among other examples (including those in [6]). Additionally, an RKFIT-based method for computing perfectly matched layers for Helmholtz problems on nonhomogeneous media has been developed and tested in [17].
6.1. MIMO dynamical system. We consider a model for the transfer function of the MIMO system ISS 1R taken from [11]. There are three input and three output channels, giving ℓ = 9 functions to be fitted. We use N = 2 × 561 sampling points λ_j given in [11], appearing in complex-conjugate pairs on the range ±i[10^{−2}, 10^3]. The data are closed under complex conjugation, and hence we can apply the real version of RKFIT discussed in subsection 4.3. The relocated poles are plotted in Figure 6.1(c). The misfit achieved with the new 56 (respectively, 54) poles corresponds to iteration 5. As this misfit is still below ε_tol, no further RKFIT iterations are required.
For the second experiment we compare RKFIT with the vector fitting code VFIT [13,24,26] for two different choices of initial poles, and with different normalization conditions for VFIT. (We briefly review VFIT in subsection A.2 of the appendix.) The results are reported in Figure 6.1(d). Here we search for type (m − 1, m) approximants with m = 56, do not enforce the poles to be stable, and do not perform any further degree reductions. The solid convergence curves are obtained with initial poles of the form −ξ/100 ± iξ, with the ξ logarithmically spaced on [10^{−2}, 10^3]. This is regarded as a good initial guess in the literature. The dashed curves result when using as initial poles the eigenvalues of a real random matrix. In both cases RKFIT outperforms VFIT, independently of the normalization condition used by VFIT. Depending on the 56 initial poles, RKFIT requires either 4 or 6 iterations. This has to be compared to Figure 6.1(b), where the 56 poles selected by our reduction strategy immediately gave a misfit below ε_tol so that no further iterations were required. This validates our approximate GCD strategy for choosing the poles after degree reduction.

6.2. Pole optimization for exponential integration. Let us consider the problem of solving a linear constant-coefficient initial-value problem

K u′(t) + L u(t) = 0,  u(0) = u_0,
at several time points t 1 , . . ., t .Problems like this arise, for example, after spacediscretization of parabolic PDEs via finite differences or finite elements, in which case K and L are large sparse matrices.Assuming that K is invertible, the exact solutions u(t j ) are given as u(t j ) = exp(−t j K −1 L)u 0 , and a popular approach for approximating u(t j ) is to use rational functions r [j] of the form constructed so that r [j] (K −1 L)u 0 ≈ u(t j ).Note that the poles of r [j] do not depend on t j and we have the evaluation of which amounts to the solution of m decoupled linear systems.Such fixed-pole approximants have great computational advantage, in particular in combination with direct solvers (the LU factorization of ξ i K − L can be used for all t j ) and on parallel computers.The correct design of the pole-residue pairs (ξ i , σ [j] i ) is closely related to the scalar rational approximation of e −tz , a problem which has received considerable attention in the literature [34,32,42,18,9].Let us assume L is Hermitian positive semidefinite, and K is Hermitian positive definite, and let us define the vector K-norm In order to use RKFIT for finding poles ξ 1 , . . ., ξ m of the rational functions r [j]  such that the right-hand side (6.1) of the inequality is small for all j = 1, . . ., , we propose a surrogate approach similar to that in [9].Let A = diag(λ 1 , . . ., λ N ) be a diagonal matrix with "sufficiently dense" eigenvalues on λ ≥ 0. In this example we take N = 500 logspaced eigenvalues on the interval [10 −6 , 10 6 ].Further, we define = 41 logspaced time points t j on the interval [10 −1 , 10 1 ], and the matrices F [j] = exp(−t j A).We also define b = [1 . . 
.1] T to assign equal weight to each eigenvalue of A and then run RKFIT for finding a family of type (m − 1, m) rational functions r [j] with m = 12 so that absmisfit and hence a small misfit implies that all r [j] are accurate uniform approximants for e −t j λ on the eigenvalues Λ(A).If these eigenvalues are dense enough on λ ≥ 0, one can expect the upper error bound (6.1) to be tight.Figure 6.2(a) shows the convergence of RKFIT, starting from an initial guess of m = 12 poles at infinity (iteration 0 corresponds to the absolute misfit of the linearized rational approximation problem).We find that RKFIT attains its smallest absolute misfit of ≈ 3.44 × 10 −3 after 6 iterations.From iteration 7 onwards the misfit slightly oscillates about the stagnation level.To evaluate the quality of the common-pole rational approximants for all = 41 time points t j , we perform an experiment similar to that in [42, Figure 6.1] by approximating u(t j ) = exp(−t j L)u 0 and comparing the result to the MATLAB function expm.Here, L ∈ R 2401×2401 is a finite-difference discretization of the scaled 2D Laplace operator −0.02∆ on the domain [−1, 1] 2 with homogeneous Dirichlet boundary condition, and u 0 corresponds to the discretization of u 0 (x, y) = (1 − x 2 )(1 − y 2 )e x on that domain.Figure 6.2(b) shows the error u(t j ) − r [j] (L)u 0 2 for each time point t j (solid curve with circles), together with the approximate upper error bound exp(−t j A)b − r [j] (A)b ∞ (dotted curve).We see that the error is approximately uniform and smaller than 6.21 × 10 −5 over the whole time interval [10 −1 , 10 1 ].The m = 12 poles of the rational functions r [j] are shown in Figure 6.2(c) (circles).
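The surrogate data above take a few lines to set up, and the same sketch shows why the fixed-pole family is cheap to evaluate: all $\ell = 41$ approximants reuse the same $m$ "solves" with $A$, only the residues change with $t_j$. This is a Python/NumPy illustration with hypothetical residue data, since the actual $\sigma_i^{[j]}$ come out of the RKFIT run; the function name `eval_family` is ours, not part of any toolbox:

```python
import numpy as np

# Surrogate data from the text: N = 500 logspaced eigenvalues on [1e-6, 1e6],
# ell = 41 logspaced time points on [1e-1, 1e1], F[j] = exp(-t_j A), b = ones.
lam = np.logspace(-6, 6, 500)       # Lambda(A), where A = diag(lam)
t = np.logspace(-1, 1, 41)
b = np.ones(lam.size)               # equal weight for every eigenvalue of A
F = np.exp(-np.outer(t, lam))       # row j holds the diagonal of exp(-t_j A)

def eval_family(xi, sigma, lam, b):
    """Evaluate r[j](A) b for all j at once, with A = diag(lam).
    xi: the m common poles; sigma: (ell, m) array of residues sigma_i^[j].
    The m solves (A - xi_i I)^{-1} b are computed once and reused."""
    solves = np.array([b / (lam - x) for x in xi])   # shape (m, N)
    return sigma @ solves                            # shape (ell, N)
```

With $A$ diagonal each solve is a pointwise division; for the matrix problem the same structure applies, with one LU factorization of $\xi_i K - L$ per pole reused for every time point.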
Another approach for obtaining a family of rational approximants is to use contour integration [42]. Applying an $m$-point quadrature rule to the Cauchy integral on a contour $\Gamma$ enclosing the positive real axis, one obtains a family of rational functions $r^{[j]}$ whose poles are the quadrature points $\xi_i \in \Gamma$ and whose residues $\sigma_i$ depend on $t_j$. As has already been pointed out in [42], such quadrature-based approximants tend to be good only for a small range of parameters $t_j$. In Figure 6

A.5. Convergence. To date, no complete convergence analyses for VFIT and RKFIT are available. Both algorithms have the property that if a rational function is fitted with sufficiently many nodes, then in the absence of rounding errors this function is recovered exactly; see [30, Corollary III.1] and our Theorem 2.2. Some further work is available for VFIT. In [30, section IV], and subsequently in [41], a degree $m = 2$ example is constructed where the VFIT fixed-point iteration is repellent and hence diverges, independently of the starting guess for the poles. Furthermore, it is known that VFIT does not necessarily satisfy first-order optimality conditions for the nonlinear LS problem upon convergence to a fixed point [41]. In our numerical experiments we typically observe that RKFIT reduces the fitting error more efficiently than VFIT; however, oscillations around a stagnation level may still occur (see, e.g., Figure 6.2(a)). Furthermore, we observed that for the example specified in [41, Table I], RKFIT exhibits an oscillatory behavior similar to that of VFIT.
Despite a few constructed examples of nonconvergence, VFIT has been used successfully by the IEEE community for various (large-scale) rational fitting problems. We have argued and demonstrated with (scalar) examples that RKFIT is more robust and typically converges faster than VFIT. Additionally, unlike VFIT, RKFIT is equipped with an automated degree reduction procedure. Therefore, we believe that RKFIT may be a useful algorithm for the IEEE community. For nonscalar approximation problems where $A$ and $F$ are not necessarily diagonalizable, we are currently not aware of an algorithm similar to RKFIT.
(polynomial) Krylov space $\mathcal{K}_{m+1}(A, b) := \{ p_m(A)\, b : p_m \in \mathcal{P}_m \}$ as a special case of a rational Krylov space. The smallest integer $m$ such that $\mathcal{K}_m(A, b) = \mathcal{K}_{m+1}(A, b)$ is denoted by $M \equiv M(A, b)$ and called the invariance index of $(A, b)$.
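To make this definition concrete: the invariance index equals the rank of the full Krylov matrix $[b, Ab, \ldots, A^{n-1}b]$, i.e., the degree of the minimal polynomial of $A$ with respect to $b$. A small NumPy sketch (illustrative only, not part of the paper's toolbox):

```python
import numpy as np

def invariance_index(A, b):
    """Smallest M with K_M(A, b) = K_{M+1}(A, b), computed as the
    rank of the Krylov matrix [b, Ab, ..., A^{n-1} b]."""
    n = b.size
    cols = [b]
    for _ in range(n - 1):
        cols.append(A @ cols[-1])
    return np.linalg.matrix_rank(np.column_stack(cols))
```

For example, for $A = \operatorname{diag}(1, 1, 2)$ and $b = [1, 1, 1]^T$ the index is $M = 2$, since $(A - I)(A - 2I)\, b = 0$ while $b$ and $Ab$ are linearly independent.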

Figure 3.2 illustrates our reduction strategy for the function $F = A + A^2$, where $A = \operatorname{tridiag}(-1, 2, -1) \in \mathbb{R}^{N \times N}$ and $N = 150$. The vector $b$ is chosen as $b = e_1$. The poles of the search space are obtained after three RKFIT iterations with all initial poles at infinity.
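The test matrices of this illustration can be reproduced as follows (Python/NumPy sketch; dense storage is used for brevity, although sparse storage would be natural for tridiagonal $A$):

```python
import numpy as np

# Setup from the text: A = tridiag(-1, 2, -1) of size N = 150,
# F = A + A^2, and starting vector b = e_1.
N = 150
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
F = A + A @ A
b = np.zeros(N)
b[0] = 1.0
```

Since $F$ is a polynomial in the symmetric matrix $A$, it is itself symmetric, and a type $(2, 0)$ rational function (a polynomial) reproduces $F b$ exactly, which is what the reduction strategy should detect.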

5. Working with rational functions. After the RKFIT algorithm has terminated, a rational function $r$ of type $(m + k, m)$ is represented by the pencil $(H_d, K_d)$, satisfying $A V_{d+1} K_d = V_{d+1} H_d$ with $d := \max\{m, m + k\}$, and with the coefficients $c = V_{d+1}^* F b / \|b\|_2$. We now show how to perform computations with such an RKFUN representation $r \equiv (H_d, K_d, c)$. One obtains a decomposition

(5.1) $\quad A W_{d+1} K_d = W_{d+1} H_d$

by rerunning the rational Arnoldi algorithm with the starting vector $W_{d+1} e_1 = b$.
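The rerunning step can be made concrete as follows. Reading the $j$-th column of $A W K = W H$ and isolating the trailing basis vector gives $(h_{j+1,j} I - k_{j+1,j} A)\, w_{j+1} = A W_j k_j - W_j h_j$, after which $r(A)b = \|b\|_2\, W_{d+1} c$. The sketch below (Python/NumPy, dense linear algebra, illustrative function name `rkfun_eval`) is a minimal version of this idea, not the toolbox implementation:

```python
import numpy as np

def rkfun_eval(H, K, c, A, b):
    """Evaluate r(A)b from the RKFUN representation (H, K, c).
    H, K are (d+1) x d; column j of A W K = W H is solved for the
    next basis vector w_{j+1}, starting from w_1 = b/||b||_2."""
    n, d = b.size, H.shape[1]
    W = np.zeros((n, d + 1), dtype=complex)
    W[:, 0] = b / np.linalg.norm(b)
    for j in range(d):
        # (h_{j+1,j} I - k_{j+1,j} A) w_{j+1} = A W_j k_j - W_j h_j
        rhs = A @ (W[:, : j + 1] @ K[: j + 1, j]) - W[:, : j + 1] @ H[: j + 1, j]
        M = H[j + 1, j] * np.eye(n) - K[j + 1, j] * A
        W[:, j + 1] = np.linalg.solve(M, rhs)
    return np.linalg.norm(b) * (W @ c)
```

In the polynomial special case (all poles at infinity, $K = [\,I_d;\, 0\,]$) this recurrence reduces to the standard Arnoldi recursion.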
The evaluation is then performed via feval(r, A, b) or simply r(A, b).

5.2. Root-finding. For finding the roots of $r$, we recall that $r(A)\, b = \|b\|_2\, V_{d+1} c = p_d(A) q_m(A)^{-1} b$. Let us assume that $c$ is not a scalar multiple of $e_1$; otherwise $r(A)\, b = c_1 b$, i.e., $r$ is a constant and has no roots. Define $P = I_{d+1} - 2 u u^*$, where $u = (\gamma c - e_1)/\|\gamma c - e_1\|_2$ and $\gamma \in \mathbb{C}$ is a unimodular scalar such that $\gamma e_1^* c$ is real and nonnegative. It follows from [6, Theorem 4.4] that the roots of $p_d$ are the eigenvalues of the $d \times d$ pencil

$$\big( [\,0 \ \ I_d\,] P H_d, \ [\,0 \ \ I_d\,] P K_d \big).$$
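This recipe is short to implement. The sketch below (Python/NumPy/SciPy, illustrative function name `rkfun_roots`) normalizes $c$ first, which changes $r$ only by a scalar factor and therefore leaves the roots unchanged, so that the Householder reflector $P$ exactly swaps $e_1$ and $\gamma c / \|c\|_2$:

```python
import numpy as np
from scipy.linalg import eig

def rkfun_roots(H, K, c):
    """Roots of the numerator p_d of an RKFUN (H, K, c): the eigenvalues
    of the d x d pencil ([0 I]PH, [0 I]PK), with P the Householder
    matrix built from c as described in the text."""
    d = H.shape[1]
    c = c / np.linalg.norm(c)                 # rescaling does not move the roots
    gamma = np.conj(c[0]) / abs(c[0]) if c[0] != 0 else 1.0
    u = gamma * c - np.eye(d + 1)[:, 0]       # u = gamma*c - e_1, then normalized
    u = u / np.linalg.norm(u)
    P = np.eye(d + 1) - 2.0 * np.outer(u, u.conj())
    # [0 I_d] P selects the last d rows of P
    return eig(P[1:, :] @ H, P[1:, :] @ K, right=False)
```

As a sanity check, in the polynomial special case ($K = [\,I_d;\, 0\,]$, $H$ the Hessenberg matrix from the Arnoldi process, and $c = V_{d+1}^* p(A) b / \|b\|_2$), the computed pencil eigenvalues reproduce the roots of $p$.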
$[\,0 \ \ I_m\,] K_m$ in lines 7-8 to recover $I_m$. The process corresponds to transforming the original $H_m$ and $K_m$ as $H_m := L_{m+1} H_m R_m$ and $K_m := L_{m+1} K_m R_m$, and the rational Krylov basis $V_{m+1}$ is transformed accordingly as $W_{m+1} = V_{m+1} L_{m+1}^{-1}$. Given a coefficient representation $r(A)\, b = \|b\|_2\, V_{m+1} c_{m+1}$ in the basis $V_{m+1}$, we arrive at the partial fraction expansion

$$r(A)\, b = \|b\|_2\, W_{m+1} d_{m+1} = d_0 b + \sum_{j=1}^{m} d_j (A - \xi_j I)^{-1} b,$$

with residues $d_{m+1} = L_{m+1} c_{m+1} = [\,d_0 \ d_1 \ \ldots \ d_m\,]^T$.
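As a scalar illustration of what this conversion produces (not the pencil-based transformation itself): for simple poles $\xi_j$ and a strictly proper $r = p/q$ with monic $q(z) = \prod_j (z - \xi_j)$, the residues are $d_j = p(\xi_j) / \prod_{i \ne j} (\xi_j - \xi_i)$, and the constant term $d_0$, the value of $r$ at infinity, vanishes. A direct NumPy sketch (the function name `residues` is ours):

```python
import numpy as np

def residues(num, poles):
    """Residues d_j of r(z) = p(z) / prod_j (z - xi_j), simple poles only.
    'num' holds the coefficients of p in np.polyval ordering."""
    d = []
    for j, xj in enumerate(poles):
        qprime = np.prod([xj - xi for i, xi in enumerate(poles) if i != j])
        d.append(np.polyval(num, xj) / qprime)
    return np.array(d)

# r(z) = z / ((z + 1)(z + 2)) = -1/(z + 1) + 2/(z + 2)
print(residues([1, 0], [-1.0, -2.0]))   # -> [-1.  2.]
```

The pencil-based procedure in the text achieves the same decomposition without ever forming the (possibly ill-conditioned) polynomial coefficients explicitly.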