The Structured Condition Number of a Differentiable Map between Matrix Manifolds, with Applications

Abstract. We study the structured condition number of differentiable maps between smooth matrix manifolds, extending previous results to maps that are only \BbbR-differentiable for complex manifolds. We present algorithms to compute the structured condition number. As special cases of smooth manifolds, we analyze automorphism groups, and Lie and Jordan algebras associated with a scalar product. For such manifolds, we derive a lower bound on the structured condition number that is cheaper to compute than the structured condition number itself. We provide numerical comparisons between the structured and unstructured condition numbers for the principal matrix logarithm and principal matrix square root of matrices in automorphism groups, as well as for the map between matrices in automorphism groups and their polar decomposition. We show that our lower bound can be used as a good estimate for the structured condition number when the matrix argument is well conditioned. We show that the structured and unstructured condition numbers can differ by many orders of magnitude, thus motivating the development of algorithms preserving structure.

that cond(log, X) = e^a > 1, but if we restrict f to \scrM, the real symplectic group, then cond_\scrM(log, X) = a/sinh(a) < 1 (the same result holds when \scrM is the complex symplectic group or \scrM is the conjugate symplectic group). Hence we conclude that cond_\scrM(f, X) \ll cond(f, X) is possible since for X = diag(e^a, e^{-a}), the ratio cond_\scrM(log, X)/cond(log, X) = a/(e^a sinh a) decays exponentially as a \rightarrow \infty. This simple example shows that the lower bound on cond_\scrM(f, X) obtained in section 3 can be attained. In section 4, we illustrate through numerical experiments the quality of the lower and upper bounds derived in section 3 for the matrix logarithm and matrix square root of matrices in automorphism groups as well as for the unitary polar factor of polar decompositions. The experiments show that our lower bound on cond_\scrM(f, X) tends to be much sharper than our upper bound, and that the former, when combined with a backward error, provides a good approximation to the forward error \|f(Y) - f(X)\|.

2. Structured and unstructured condition numbers of matrix functions.
We start with a brief summary of the theory and algorithms for the unstructured condition number of a matrix function before discussing the structured case. Because the relative condition number of f at an n \times n matrix X, denoted by rcond(f, X), can be written in terms of the absolute condition number cond(f, X) as rcond(f, X) = cond(f, X) \|X\|/\|f(X)\| (see [10, Chap. 3]), we concentrate on absolute condition numbers. Here and throughout, \BbbF = \BbbR or \BbbC.

2.1. Unstructured condition number.
Let f be a differentiable endofunction of \BbbF^{n\times n}. The unstructured condition number cond(f, X) in (1.1) can be expressed in terms of the Fréchet derivative of f at X, which is an \BbbF-linear map L_f(X, \cdot) : \BbbF^{n\times n} \rightarrow \BbbF^{n\times n} such that

(2.1) f(X + E) - f(X) - L_f(X, E) = o(\|E\|)

for all E \in \BbbF^{n\times n}. When the Fréchet derivative of f at X exists, it is unique. In that case, we have [10, Thm. 3.1]

cond(f, X) = max_{E \not= 0} \|L_f(X, E)\|/\|E\| =: \|L_f(X)\|.
An explicit formula for the Fréchet derivative is not always available, so we assume that we have a numerical method to evaluate L_f(X, E) for a given E. Since the Fréchet derivative is linear in E, applying the vec operator, which stacks the columns of a matrix into one long vector [16], to L_f(X, E) gives

(2.2) vec(L_f(X, E)) = K_f(X) vec(E),

where K_f(X) \in \BbbF^{n^2 \times n^2} is called the Kronecker form of the Fréchet derivative. When the canonical bases on \BbbF^{n^2} and \BbbF^{n\times n} are chosen (as we will implicitly do throughout the paper) and \BbbF^{n\times n} is identified with \BbbF^{n^2} through the mapping represented by vec, the matrix K_f(X) represents the derivative at X of the function f. If we specialize to the Frobenius norm, then

(2.3) cond(f, X) = max_{E \not= 0} \|K_f(X) vec(E)\|_2/\|vec(E)\|_2 = \|K_f(X)\|_2,

where we use the fact that for A \in \BbbF^{n\times n}, \|A\|_F = \|vec(A)\|_2. The problem of computing cond(f, X) in the Frobenius norm then reduces to finding the 2-norm of K_f(X). Note that the latter matrix can be constructed explicitly by forming one column at a time using (2.2), that is,

(2.4) K_f(X) e_{i+(j-1)n} = vec(L_f(X, e_i e_j^T)), i = 1: n, j = 1: n,

where e_k is a vector of appropriate dimension with the kth entry equal to one and zero everywhere else. Constructing K_f(X) this way costs O(n^5) operations, assuming that L_f(X, E) can be computed in O(n^3) operations. Note that, based on (2.1), L_f(X, E) can be approximated by the finite difference (f(X + tE) - f(X))/t for a small value of t. We refer the reader to [10, Chap. 3] on how to choose t. When f is a matrix function in the linear algebra sense (see [10, Chap. 1] for a precise definition), the formula

f([X E; 0 X]) = [f(X) L_f(X, E); 0 f(X)]

also holds [10] and yields a useful tool to compute Fréchet derivatives. However, in this paper we consider a much wider class of functions f, and hence this expression is not always true.
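The column-by-column construction (2.4), combined with the finite-difference approximation of L_f(X, E) just described, can be sketched as follows. This is an illustrative NumPy translation (the paper's experiments use MATLAB); the function names and the heuristic choice of the increment t are ours.

```python
import numpy as np
from scipy.linalg import logm

def frechet_fd(f, X, E, t=None):
    # Forward finite difference L_f(X, E) ~ (f(X + tE) - f(X))/t, cf. (2.1);
    # t balances truncation against rounding error (a common heuristic).
    if t is None:
        t = np.sqrt(np.finfo(float).eps) * \
            np.linalg.norm(X, "fro") / np.linalg.norm(E, "fro")
    return (f(X + t * E) - f(X)) / t

def kronecker_form(f, X):
    # Build K_f(X) one column at a time via (2.4): column i + (j-1)n is
    # vec(L_f(X, e_i e_j^T)), where vec stacks columns (Fortran order).
    n = X.shape[0]
    K = np.zeros((n * n, n * n), dtype=complex)
    for j in range(n):
        for i in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            K[:, i + j * n] = frechet_fd(f, X, E).reshape(-1, order="F")
    return K

# cond(f, X) in the Frobenius norm is then the 2-norm of K_f(X), cf. (2.3)
X = np.diag([2.0, 0.5])
K = kronecker_form(logm, X)
cond = np.linalg.norm(K, 2)
```

For this diagonal X the exact value is cond(log, X) = 1/0.5 = 2, which the finite-difference approximation matches to several digits.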
A lower bound for cond(f, X) can be computed by the power method applied to K_f(X)^* K_f(X), which, for a nonzero matrix E_0 \in \BbbF^{n\times n}, constructs the iterates [15]

(2.5) W_{k+1} = L_f(X, E_k), E_{k+1} = L_f^{adj}(X, W_{k+1}), \gamma_{k+1} = \|E_{k+1}\|_F/\|W_{k+1}\|_F,

with \gamma_k such that \gamma_k \leq \|L_f(X)\|_F and \gamma_k \rightarrow \|L_f(X)\|_F as k \rightarrow \infty [5, Chap. 7]. In (2.5), L_f^{adj} is the adjoint of L_f and is given by L_f^{adj}(X, W) = L_{\bar f}(X^*, W), where \bar f(z) := \overline{f(\bar z)}. The computation of the kth iteration in (2.5) costs O(n^3) operations, and just a few iterations are usually needed for an accurate bound.
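A minimal sketch of this power iteration, assuming only routines that apply L_f and its adjoint. The illustration uses f(X) = X^2, for which L_f(X, E) = XE + EX and the adjoint map is W \mapsto X^*W + WX^*; these formulas are standard, the code itself is ours.

```python
import numpy as np

def power_method(apply_L, apply_Ladj, n, iters=50, seed=0):
    # Power iteration on K_f(X)^* K_f(X) without forming K_f(X), as in (2.5):
    # each gamma is a lower bound on ||L_f(X)||_F, converging to it.
    rng = np.random.default_rng(seed)
    E = rng.standard_normal((n, n))
    gamma = 0.0
    for _ in range(iters):
        W = apply_L(E)                       # W_{k+1} = L_f(X, E_k)
        E = apply_Ladj(W)                    # E_{k+1} = L_f^adj(X, W_{k+1})
        gamma = np.linalg.norm(E, "fro") / np.linalg.norm(W, "fro")
        E /= np.linalg.norm(E, "fro")        # normalize to avoid over/underflow
    return gamma

# f(X) = X^2: L_f(X, E) = XE + EX, adjoint W -> X^* W + W X^*
X = np.diag([2.0, 0.5])
gamma = power_method(lambda E: X @ E + E @ X,
                     lambda W: X.conj().T @ W + W @ X.conj().T, 2)
```

Here the Kronecker form is diag(4, 5/2, 5/2, 1), so the iteration converges to 4.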
Having fixed a convenient map to express the isomorphism between n \times n complex matrices and real vectors with 2n^2 entries, the Fréchet derivative can then be represented in the canonical basis as a 2n^2 \times 2n^2 real matrix K_f^{(\BbbR)}(X) such that

(2.6) vec(\rho(L_f(X, E))) = K_f^{(\BbbR)}(X) vec(\rho(E)), where

(2.7) \rho : \BbbC^{n\times n} \rightarrow \BbbR^{n\times 2n}, \rho(A) = [Re(A) Im(A)].

Now when \BbbF = \BbbC, it also makes sense to consider maps f : \BbbC^{n\times n} \rightarrow \BbbC^{n\times n} that satisfy the less demanding assumption of real Fréchet differentiability, that is, L_f(X, E) in (2.1) is real linear (i.e., L_f(X, E + \alpha F) = L_f(X, E) + \alpha L_f(X, F) for all \alpha \in \BbbR). Again, K_f^{(\BbbR)}(X) can be explicitly formed by computing its 2n^2 columns as vec(\rho(L_f(X, e_i e_j^T))), vec(\rho(L_f(X, i e_i e_j^T))), i = 1: n, j = 1: n, where i = \surd{-1} denotes the imaginary unit. If f is not only real but also complex Fréchet differentiable, then, by the Cauchy--Riemann equations, K_f^{(\BbbR)} has the form (we henceforth occasionally omit the dependence on X for notational simplicity)

(2.8) K_f^{(\BbbR)} = [Re K_f, -Im K_f; Im K_f, Re K_f].

Let unvec be the inverse of the vec operator, which in this case is defined from \BbbF^{2n^2} to \BbbF^{n\times 2n}. After applying vec \circ \rho^{-1} \circ unvec to (2.6), the Cauchy--Riemann structure (2.8) implies that K_f^{(\BbbR)} is unitarily similar to the block diagonal matrix K_f \oplus \overline{K_f}, where \otimes denotes the Kronecker product used in the similarity transformation, so that \|K_f^{(\BbbR)}\|_2 = \|K_f\|_2. This shows that the theory is coherent in the sense that the computation of the unstructured condition number is independent of whether the real or complex Kronecker form of the Fréchet derivative is considered.
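The map \rho and the block structure forced by the Cauchy--Riemann equations can be checked numerically. The sketch below (ours, for illustration) builds a real Kronecker form with the structure (2.8) from a complex matrix and verifies that the 2-norms agree, as claimed.

```python
import numpy as np

def rho(A):
    # rho : C^{n x n} -> R^{n x 2n}, A -> [Re(A)  Im(A)], as in (2.7)
    return np.hstack([A.real, A.imag])

def real_kronecker(K):
    # For a complex-Frechet-differentiable f, the real Kronecker form has
    # the block structure [[Re K, -Im K], [Im K, Re K]] of (2.8).
    return np.block([[K.real, -K.imag], [K.imag, K.real]])

rng = np.random.default_rng(1)
K = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
KR = real_kronecker(K)
# KR is unitarily similar to K (+) conj(K), so its singular values are
# those of K, each repeated twice; in particular the 2-norms agree.
norm_real = np.linalg.norm(KR, 2)
```

This numerical coincidence of \|K_f^{(\BbbR)}\|_2 and \|K_f\|_2 is exactly the coherence property stated above.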
Example 2.1. Let X \in \BbbC^{n\times n} be nonsingular and let f : X \mapsto U associate to X the unitary factor U of its polar decomposition X = UH. The map f is real differentiable but not complex differentiable [19]. For n = 1 and x = r > 0, a direct computation gives L_f(x, e) = i Im(e)/r, so f has for real Fréchet derivative the Kronecker form [0, 0; 0, 1/r], which is not of the form (2.8), revealing that f is not complex Fréchet differentiable.
2.2. Structured condition number. Suppose now that g : \scrM \rightarrow \scrN is a differentiable map, where \scrM, \scrN \subseteq \BbbF^{n\times n} are smooth square matrix manifolds. We consider in particular three classes of smooth manifolds:
1. real submanifolds of the n^2-dimensional real vector space \BbbR^{n\times n} (e.g., orthogonal matrices, real symplectic matrices),
2. complex submanifolds of the n^2-dimensional complex vector space \BbbC^{n\times n} (e.g., complex orthogonal matrices, complex symplectic matrices), and
3. real submanifolds of the 2n^2-dimensional real vector space \BbbC^{n\times n} (e.g., Hermitian matrices, unitary matrices, conjugate symplectic matrices).
These submanifolds are often associated with, respectively, real bilinear forms, complex bilinear forms, and sesquilinear forms. We use the expression ``square matrix manifold'' to mean any of the three classes of submanifolds of square matrices described above. The choice of considering square matrices is for the sake of simplicity in the exposition; we emphasize, though, that our theory can be easily extended to rectangular matrices.
We need to distinguish between the field in which the entries of the matrices are allowed to lie (i.e., the ambient field), denoted by \BbbF as in the previous sections, and the field on which the ambient vector space is built (i.e., the base field), which we instead denote by \BbbK. We will also assume that \BbbK is a (possibly not proper) subfield of \BbbF. To be more concrete, \BbbK = \BbbF (= \BbbR or \BbbC, respectively) in either case 1 or case 2 above, while case 3 of a real submanifold of \BbbC^{n\times n} is special in the sense that \BbbC = \BbbF \not= \BbbK = \BbbR. We will need to deal with case 2 carefully when g is only \BbbR-differentiable since in this case it will be necessary to identify \scrM with \rho(\scrM) \subseteq \BbbR^{n\times 2n} (see Remark 2.2 below).
Clearly, dg_X in (2.11) is a \BbbK-linear operator [17, Exerc. 3.7], and it comes equipped with an induced norm,

\|dg_X\| := max_{0 \not= E \in T_X\scrM} \|dg_X(E)\|/\|E\|.

Remark 2.2. When \scrM \subseteq \BbbC^{n\times n} is a complex submanifold, there is freedom in the choice of the base field. Indeed, any complex submanifold is also a real submanifold, although the converse does not always hold. If g is only real differentiable, then it is necessary to pick \BbbK = \BbbR, thus (implicitly) identifying the complex submanifold \scrM with the real submanifold \rho(\scrM) =: \scrM_\BbbR \subseteq \BbbR^{n\times 2n}, which has base field \BbbR and is such that X \in \scrM if and only if \rho(X) = [Re(X) Im(X)] \in \scrM_\BbbR.

A version of the following theorem appears in [3, pp. 4--5] and [20, Thms. 3 and 4]. Note, however, that our setting is more general in the sense that real submanifolds of complex matrices (the case \BbbK \not= \BbbF) are not analyzed in [3] or [20].

Theorem 2.3. Let g : \scrU \rightarrow \scrV be a \BbbK-differentiable map between two open subsets \scrU \subset \scrM and \scrV \subset \scrN, where \scrM and \scrN are smooth square matrix manifolds of \BbbF^{n\times n} with base field \BbbK. Then for X \in \scrU it holds that

lim_{\epsilon \rightarrow 0} sup_{Y \in \scrU, 0 < \|Y - X\| \leq \epsilon} \|g(Y) - g(X)\|/\|Y - X\| = \|dg_X\|.

Proof. Suppose X, Y \in \scrU and let \gamma : \BbbK \rightarrow \scrU be any curve such that \gamma(0) = X, \gamma(\epsilon) = Y. Letting E = \gamma'(0) \in T_X\scrU, we get Y = X + \epsilon E + o(\epsilon) by using the definition of derivative along a curve, and by using the definition of differential we get g(Y) = g(X) + \epsilon dg_X(E) + o(\epsilon). Then

\|g(Y) - g(X)\|/\|Y - X\| = (\|\epsilon dg_X(E)\| + o(\epsilon))/(\|\epsilon E\| + o(\epsilon)) \rightarrow \|dg_X(E)\|/\|E\| as \epsilon \rightarrow 0,

and taking the supremum over all tangent directions E \in T_X\scrM yields the statement.

Now if f|_\scrM denotes the restriction to the manifold \scrM of the map f : \BbbF^{n\times n} \rightarrow \BbbF^{n\times n}, then, by Theorem 2.3 applied to g = f|_\scrM, it follows that for the structured condition number in (1.3), we have that

(2.12) cond_\scrM(f, X) = \|d(f|_\scrM)_X\|.
We now give more details about how the structured condition number (2.12) can be computed, distinguishing between the possible situations depending on \BbbF, \BbbK, and the differentiability properties of f, as summarized in Table 1. (Note that, when f is only real differentiable, we always take \BbbK = \BbbR as the base field.)

2.2.1. When the ambient and base fields are the same. Suppose that \BbbF = \BbbK and that the map f : \BbbF^{n\times n} \rightarrow \BbbF^{n\times n} is \BbbK-Fréchet differentiable (and so is the restriction of f to the manifold \scrM). The uniqueness of the differential and the uniqueness of the Fréchet derivative imply that for any E \in T_X\scrM,

(2.13) d(f|_\scrM)_X(E) = L_f(X, E).

Since T_X\scrM is a \BbbK-linear subspace of \BbbF^{n\times n}, the linear nature of E \in T_X\scrM is then encoded by

(2.14) vec(E) = By,

where B \in \BbbF^{n^2 \times p} is a matrix of full rank giving (in essence) a basis for T_X\scrM and y \in \BbbK^p, with p = dim_\BbbK T_X\scrM, being a vector of parameters; see section 3 for examples of how to construct B. Applying the vec operator to (2.13) and using (2.2), (2.14) yields (recalling that here and throughout we are implicitly picking the canonical bases on both \BbbK^{n\times n} and \BbbK^n)

vec(d(f|_\scrM)_X(E)) = K_f(X)By, y = B^+ vec(E),

where B^+ denotes the Moore--Penrose pseudoinverse of B. Hence, from (2.12) and using the Frobenius norm, we find that

cond_\scrM(f, X) = max_{y \not= 0} \|K_f(X)By\|_2/\|By\|_2.

A similar equation holds for the structured condition number in any entrywise p-norm, \|X\|_p := (\sum_{ij} |x_{ij}|^p)^{1/p}. However, for concreteness, we only report results in the Frobenius norm (p = 2). Observe now that

(2.15) \|K_f(X)B\|_2/\|B\|_2 \leq cond_\scrM(f, X) \leq \|K_f(X)B\|_2 \|B^+\|_2.

Moreover, if B can be chosen to have orthonormal columns, i.e., B^*B = I_p, then the lower and upper bounds in (2.15) are equal so that

(2.16) cond_\scrM(f, X) = \|K_f(X)B\|_2.

If instead \|B\|_2 \|B^+\|_2 \gg 1, the upper bound in (2.15) is loose and we should use the unstructured condition number cond(f, X) as an upper bound for cond_\scrM(f, X).
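When a basis matrix B with orthonormal columns is available, the structured condition number reduces to a single matrix 2-norm, as discussed around (2.15). A small self-contained illustration (ours): \scrM is the flat manifold of symmetric 2 \times 2 matrices and f(X) = X^2, for which K_f(X) = X^T \otimes I + I \otimes X.

```python
import numpy as np

def structured_cond(K, B):
    # cond_M(f, X) = ||K_f(X) B||_2 when B has orthonormal columns spanning
    # vec(T_X M); for general full-rank B, (2.15) gives the two-sided bound
    # ||K B||_2 / ||B||_2 <= cond_M(f, X) <= ||K B||_2 ||B^+||_2.
    return np.linalg.norm(K @ B, 2)

X = np.diag([2.0, 0.5])                         # a symmetric argument
K = np.kron(X.T, np.eye(2)) + np.kron(np.eye(2), X)   # K_f(X) for f(X) = X^2
s = 1 / np.sqrt(2)
B = np.array([[1.0, 0, 0],                      # orthonormal vec-basis of
              [0, 0, s],                        # symmetric matrices:
              [0, 0, s],                        # e11, e22, (e12 + e21)/sqrt(2)
              [0, 1.0, 0]])
c_struct = structured_cond(K, B)
c_unstruct = np.linalg.norm(K, 2)
```

For this example the structured and unstructured values coincide (both equal 4), in line with Davies' result, recalled in section 3.1, that they agree for symmetric X.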
To compute \|K_f(X)B\|_2, we need to characterize T_X\scrM, find its dimension p := dim_\BbbK T_X\scrM over the base field \BbbK, and then construct a matrix B \in \BbbF^{n^2 \times p} such that for any E \in T_X\scrM, vec(E) = By for some y \in \BbbK^p. If we assume that we have a numerical method to compute L_f(X, E) for a given X \in \scrM and E \in T_X\scrM, then we can compute the columns of K_f(X)B using

(2.17) K_f(X)Be_k = vec(L_f(X, unvec(Be_k))), k = 1: p,

where the inverse vec operator unvec is in this case defined from \BbbF^{n^2} to \BbbF^{n\times n}. Note that (2.17) reduces to (2.4) when \BbbK = \BbbF and X \in \scrM = \BbbF^{n\times n} = T_X\scrM so that B = I_{n^2} in (2.14).
2.2.2. When the ambient and base fields differ. Applying \rho and the vec operator to (2.13) yields

vec(\rho(d(f|_\scrM)_X(E))) = K_f^{(\BbbR)}(X)B_\BbbR y, vec(\rho(E)) = B_\BbbR y,

and since B_\BbbR = B_\BbbR B_\BbbR^+ B_\BbbR, we have that by (2.12)

cond_\scrM(f, X) = max_{y \not= 0} \|K_f^{(\BbbR)}(X)B_\BbbR y\|_2/\|B_\BbbR y\|_2.

The lower and upper bounds on cond_\scrM(f, X) follow:

(2.18) \|K_f^{(\BbbR)}(X)B_\BbbR\|_2/\|B_\BbbR\|_2 \leq cond_\scrM(f, X) \leq \|K_f^{(\BbbR)}(X)B_\BbbR\|_2 \|B_\BbbR^+\|_2.

Now if B_\BbbR has orthonormal columns, then

(2.19) cond_\scrM(f, X) = \|K_f^{(\BbbR)}(X)B_\BbbR\|_2.

If f is complex Fréchet differentiable, one may also consider the complex Kronecker form together with an orthonormalization \widetilde B of a complex basis matrix for T_X\scrM, but in general only the inequality

(2.20) cond_\scrM(f, X) \leq \|K_f(X)\widetilde B\|_2

holds. Depending on the manifold and the map, a strict inequality may hold in (2.20) (see Examples 3.3 and 3.4). This shows that, when \scrM is not a complex submanifold, and even for holomorphic maps f, only the real Fréchet derivative is linked to the structured condition number.

Algorithm 2.6. Given (i) any algorithm to compute the Fréchet derivative of f : \BbbF^{n\times n} \rightarrow \BbbF^{n\times n}, (ii) X \in \scrM, where \scrM is a smooth manifold of \BbbF^{n\times n} with base field \BbbK, and (iii) either B \in \BbbF^{n^2 \times p} such that for any E \in T_X\scrM, vec(E) = By for some y \in \BbbK^p if \BbbK = \BbbF, or B_\BbbR \in \BbbR^{2n^2 \times p} such that for any E \in T_X\scrM, vec(\rho(E)) = B_\BbbR y for some y \in \BbbR^p if \BbbK \not= \BbbF, this algorithm computes cond_\scrM(f, X) in the Frobenius norm.

Forming the n^2 \times p matrix K = K_f(X)B via (2.17) costs O(pn^3) operations, assuming that L_f(X, E) can be computed in O(n^3) operations, and the cost of computing the 2-norm of K in step 6 is O(p^2 n^2), so Algorithm 2.6 costs O(pn^3 + p^2 n^2) operations, which is high, in particular when p = O(n^2). As in the unstructured case, once B is known, we can use the power method to obtain a lower bound for \|K_f(X)B\|_2, which corresponds to a lower bound for cond_\scrM(f, X) when B has orthonormal columns.
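The power-method lower bound just mentioned can be sketched as follows; this is an illustrative NumPy version (ours) of the idea behind Algorithm 2.7, assuming routines that apply L_f and its adjoint, and a B with orthonormal columns so that the limit is cond_\scrM(f, X).

```python
import numpy as np

def structured_power_method(apply_L, apply_Ladj, B, iters=50, seed=0):
    # Power iteration on (K_f(X) B)^* (K_f(X) B): each gamma satisfies
    # gamma <= ||K_f(X) B||_2, with convergence in the limit.
    rng = np.random.default_rng(seed)
    n2, p = B.shape
    n = round(np.sqrt(n2))
    y = rng.standard_normal(p)
    gamma = 0.0
    for _ in range(iters):
        E = (B @ y).reshape(n, n, order="F")               # unvec(B y)
        W = apply_L(E)                                     # L_f(X, E)
        z = B.conj().T @ apply_Ladj(W).reshape(-1, order="F")
        gamma = np.linalg.norm(z) / np.linalg.norm(W, "fro")
        y = z / np.linalg.norm(z)
    return gamma

# same toy setting as before: f(X) = X^2 restricted to symmetric matrices
X = np.diag([2.0, 0.5])
s = 1 / np.sqrt(2)
B = np.array([[1.0, 0, 0], [0, 0, s], [0, 0, s], [0, 1.0, 0]])
gamma = structured_power_method(lambda E: X @ E + E @ X,
                                lambda W: X @ W + W @ X, B)
```

Only matrix-level applications of L_f and its adjoint are needed, which is what makes this approach cheaper than forming K_f(X)B explicitly.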
Analogous remarks hold for \BbbK \not = \BbbF , except that the factor 2 in the sizes of K (\BbbR ) f and B \BbbR leads to higher constants in front of the asymptotic complexities.
Algorithm 2.7. Given the same input as Algorithm 2.6, this algorithm uses the power method to compute \gamma such that \gamma \leq \|K_f(X)B\|_2 for \BbbK = \BbbF or \gamma \leq \|K_f^{(\BbbR)}(X)B_\BbbR\|_2 for \BbbK \not= \BbbF.
1 Choose a nonzero starting vector z_0 \in \BbbK^p.
2 for k = 0: \infty
Unless B (or B_\BbbR) has a special structure that can be exploited in steps 4 and 11 (or steps 6 and 13), the cost of Algorithm 2.7 is O(kpn^2) operations, where k is the number of iterations and we assume that p \geq n and that L_f(X, E_k) and L_f^{adj}(X, W_{k+1}) can be computed in O(n^3) operations.

3. Application: Matrix manifolds associated with scalar products. We illustrate the technique presented in section 2.2 on smooth square matrix manifolds \scrM associated with a scalar product on \BbbF^n, that is, a nondegenerate bilinear or sesquilinear form defined by any nonsingular matrix M via, for x, y \in \BbbF^n,

\langle x, y \rangle_M = x^T M y for real or complex bilinear forms, \langle x, y \rangle_M = x^* M y for sesquilinear forms.
For any matrix A \in \BbbF^{n\times n}, there exists a unique A^\star \in \BbbF^{n\times n}, called the adjoint of A with respect to \langle \cdot, \cdot \rangle_M, given by A^\star = M^{-1}A^T M for bilinear forms and A^\star = M^{-1}A^* M for sesquilinear forms. There are three classes of structured matrices associated with \langle \cdot, \cdot \rangle_M: a Jordan algebra \BbbJ_M, a Lie algebra \BbbL_M, and an automorphism group \BbbG_M, defined by

\BbbJ_M = \{A : A^\star = A\}, \BbbL_M = \{A : A^\star = -A\}, \BbbG_M = \{A : A^\star = A^{-1}\}.

For several special choices of M, a specific nomenclature exists to indicate these sets. For example, for real bilinear forms (\BbbF = \BbbR) and
\bullet M = I: \BbbJ_I, \BbbL_I, and \BbbG_I are the sets of symmetric, skew-symmetric, and orthogonal matrices, respectively;
\bullet M = \Sigma_{p,q} := I_p \oplus -I_q with p + q = n: \BbbJ_{\Sigma_{p,q}}, \BbbL_{\Sigma_{p,q}}, and \BbbG_{\Sigma_{p,q}} correspond to the classes of pseudosymmetric, pseudoskew-symmetric, and pseudo-orthogonal matrices, respectively;
\bullet M = J := \bigl[ 0 -I_{n/2}; I_{n/2} 0 \bigr] with n even: \BbbJ_J, \BbbL_J, and \BbbG_J correspond to the classes of skew-Hamiltonian, Hamiltonian, and symplectic matrices, respectively;
\bullet M = R, the identity with its columns reversed: \BbbJ_R, \BbbL_R, and \BbbG_R correspond to the classes of persymmetric, perskew-symmetric, and perplectic matrices, respectively.
When M defines a real (respectively, complex) bilinear form, the three matrix classes defined above are real (respectively, complex) smooth manifolds; when M defines a sesquilinear form, they are real smooth submanifolds of the 2n^2-dimensional real vector space \BbbC^{n\times n}. This is immediate for \BbbJ_M and \BbbL_M, as they are \BbbK-linear subspaces, and hence they are (flat) smooth manifolds. Automorphism groups are not vector subspaces, but they are known to be smooth manifolds, as we now show.
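The adjoint A^\star = M^{-1}A^TM (bilinear) or M^{-1}A^*M (sesquilinear) and the resulting membership tests can be coded directly; a small sketch (ours), in which the Hamiltonian matrix H and the diagonal symplectic matrix S are illustrative choices:

```python
import numpy as np

def adjoint(A, M, sesq=False):
    # A^star = M^{-1} A^T M (bilinear) or M^{-1} A^* M (sesquilinear)
    At = A.conj().T if sesq else A.T
    return np.linalg.solve(M, At @ M)

def in_jordan(A, M, **kw):   # Jordan algebra:      A^star =  A
    return np.allclose(adjoint(A, M, **kw), A)

def in_lie(A, M, **kw):      # Lie algebra:         A^star = -A
    return np.allclose(adjoint(A, M, **kw), -A)

def in_group(A, M, **kw):    # automorphism group:  A^star = A^{-1}
    return np.allclose(adjoint(A, M, **kw) @ A, np.eye(A.shape[0]))

J = np.array([[0.0, -1.0], [1.0, 0.0]])   # n = 2 case of [0 -I; I 0]
H = np.array([[1.0, 2.0], [3.0, -1.0]])   # J @ H is symmetric: Hamiltonian
S = np.diag([2.0, 0.5])                   # diagonal real symplectic matrix
```

Such cheap checks are convenient for validating the tangent-space bases constructed in the subsections below.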
Theorem 3.1. The automorphism group \BbbG M is a real submanifold of \BbbF n\times n . Furthermore, when M defines a complex bilinear form, \BbbG M is also a complex submanifold of \BbbC n\times n .
Proof. The first part of the statement is [21, Thm. 7.17]. For the second part, note that \BbbG_M is the set of solutions of the quadratic matrix equation M = X^T M X. Since the latter is equivalent to either n^2 complex polynomial equations or 2n^2 real polynomial equations, it follows that \BbbG_M is both a complex algebraic variety and a real algebraic variety. Recall that a complex (respectively, real) algebraic variety is a complex (respectively, real) manifold if and only if it does not contain singular points, i.e., points such that the rank of the Jacobian is locally smaller than the codimension of the variety cut out by the aforementioned polynomial equations. It now suffices to observe that (in the canonical bases of \BbbC^{n\times n} as a complex and a real vector space, respectively), by the Cauchy--Riemann equations, \scrJ = Re \scrJ + i Im \scrJ is the Jacobian of \BbbG_M as a complex algebraic variety if and only if \scrJ_\BbbC = \bigl[ Re \scrJ -Im \scrJ; Im \scrJ Re \scrJ \bigr] is the Jacobian of \BbbG_M as a real algebraic variety. The statement then follows by observing that \scrJ_\BbbC is unitarily similar to \scrJ \oplus \overline{\scrJ}.

Now suppose that X belongs to an automorphism group, Lie algebra, or Jordan algebra. Suppose, moreover, that f(X) is a matrix function in the linear algebra sense (see [10, Chap. 1] for a precise definition). It is shown in [11, Thm. 3.1] that, for bilinear forms,

(3.1) f(X^\star) = f(X)^\star

holds for all matrix functions f, assuming that f is defined at X and X^\star. For sesquilinear forms, (3.1) holds when, for example, the function f has a convergent power series representation f(X) = \sum_{k=0}^\infty \alpha_k X^k with \alpha_k \in \BbbR. Assuming that f is defined at the indicated arguments and that f satisfies (3.1), structure-preservation results follow; in particular, under suitable assumptions on M, the factors of the polar decomposition X = UH of X \in \BbbG_M again belong to the structured classes associated with \langle \cdot, \cdot \rangle_M [11]. Note that the factor U is a ``generalized matrix function'' in the sense of [6].
The existence of structured singular value decompositions (see, e.g., [4], [22]) can be used to derive conditions on f such that other generalized matrix functions preserve structure. A precise statement is beyond our scope here and left for future research.
3.1. Computation of cond_{\BbbS_M}(f, X) when \BbbS_M is a Jordan or Lie algebra. Davies [2] showed that for bilinear forms with M = \pm M^T and M^T M = I, i.e., when the scalar product is orthosymmetric and unitary [18], cond(f, X) = cond_{\BbbS_M}(f, X) for
\bullet all functions f and either X \in \BbbJ_I or X \in \BbbL_I (i.e., for symmetric and skew-symmetric matrices X), and
\bullet even functions f and X \in \BbbL_M.
Davies also showed that equality between structured and unstructured condition numbers for X \in \BbbJ_M does not always hold, but the ratio cond(f, X)/cond_\scrM(f, X) is bounded in terms of n for all functions f. On the other hand, the ratio is unbounded for odd functions and X \in \BbbL_M (see [2, Tab. 6.1]). For sesquilinear forms with M = \pm M^T \in \BbbR^{n\times n} and M^T M = I, Davies showed that cond(f, X) = cond_\scrM(f, X) for Jordan algebras and that equality also holds for Lie algebras when f is an odd or even function.
In what follows, we show how to compute or approximate cond_\scrM(f, X) when it differs from cond(f, X). We make the simplifying assumption that M = \mu M^T \in \BbbR^{n\times n} with \mu = \pm 1 (which is true in essentially all the cases of practical interest), but we do not assume that M is orthogonal, as in [2]. The techniques we use are similar to those in [2].
It follows directly from (2.10) that the tangent space of any vector subspace is the vector subspace itself. Hence, for any X \in \BbbS_M with \BbbS_M \in \{\BbbJ_M, \BbbL_M\},

T_X \BbbS_M = \BbbS_M.

To construct a basis for T_X \BbbS_M, we start with bilinear forms and then use the results to construct a basis when the scalar product is a sesquilinear form. Write \BbbS_M = \{E : E^\star = sE\}, with s = 1 for \BbbJ_M and s = -1 for \BbbL_M, and let P denote the vec-permutation matrix, defined by P vec(A) = vec(A^T). In vectorized form, the membership condition E^\star = sE reads

(3.8) (\mu P - sI_{n^2})(I_n \otimes M) vec(E) = 0.

Since P has eigenvalues 1 with multiplicity n(n+1)/2 and -1 with multiplicity n(n-1)/2, and M is nonsingular, we have that rank\bigl((M^T \otimes I_n)P - s(I_n \otimes M)\bigr) = rank\bigl((\mu P - sI_{n^2})(I_n \otimes M)\bigr) = n(n - \mu s)/2, so that the dimension of the null space of (\mu P - sI_{n^2})(I_n \otimes M) is n(n + \mu s)/2. Hence, from this and on using (3.8) and vec(E) = B_{\BbbS_M} y, it follows that

(3.9) range(B_{\BbbS_M}) = null\bigl((P - \mu sI_{n^2})(I_n \otimes M)\bigr).
If we define D_{\BbbS_M} \in \BbbR^{n^2 \times p} by

(3.10) D_{\BbbS_M} = [\widetilde D_{\mu s} \; \v I] if \mu s = 1, D_{\BbbS_M} = \widetilde D_{\mu s} if \mu s = -1,

where \widetilde D_{\mu s} \in \BbbR^{n^2 \times n(n-1)/2} has for columns the n(n-1)/2 unit vectors (e_{(i-1)n+j} + \mu s e_{(j-1)n+i})/\surd 2, 1 \leq i < j \leq n, and \v I \in \BbbR^{n^2 \times n} has for columns the vectors e_{(i-1)n+i}, i = 1, \ldots, n, then D_{\BbbS_M} has full rank and orthonormal columns, and from (3.5)--(3.6) we have that (P - \mu sI_{n^2})D_{\BbbS_M} = 0. From this and (3.9), it follows that

(3.11) B_{\BbbS_M} = (I_n \otimes M^{-1})D_{\BbbS_M}.

Note that if M is orthogonal then B_{\BbbS_M} = \mu(I_n \otimes M)D_{\BbbS_M} has orthonormal columns and its construction is essentially computation free. In this case,

(3.12) cond_{\BbbS_M}(f, X) = \|K_f(X)(I_n \otimes M)D_{\BbbS_M}\|_2,

and we refer the reader to section 2.2 for the computation or approximation of cond_{\BbbS_M}(f, X). Davies showed in [2, Thm. 3.3] a formula that, in our notation, involves an n^2 \times n^2 matrix and is less compact than (3.12), since D_{\BbbS_M} has p = n(n + \mu s)/2 < n^2 columns for n > 1. When M is not orthogonal, then, unless it has a special structure that somehow allows the use of computational tricks, orthogonalizing the columns of B_{\BbbS_M}, or equivalently computing B_{\BbbS_M}^+, costs O(n^6) operations. However, instead of orthonormalizing B_{\BbbS_M} or computing its pseudoinverse, we can use (2.15) to obtain lower and upper bounds for cond_\scrM(f, X) that are cheaper to compute than cond_\scrM(f, X). We note that \|B_{\BbbS_M}\|_2 \leq \|M^{-1}\|_2 and \|B_{\BbbS_M}^+\|_2 \leq \|M\|_2. By exploiting the special structure of B_{\BbbS_M} in (3.11), approximating \|K_f(X)B_{\BbbS_M}\|_2 using Algorithm 2.7 is inexpensive, since products with B_{\BbbS_M} and B_{\BbbS_M}^* reduce to multiplications by M^{-1} or M^{-*} and cheap scatter/gather operations with D_{\BbbS_M}.

3.1.2. Sesquilinear forms or complex bilinear forms with \BbbR-differentiable map. When either the scalar product is a sesquilinear form or a complex bilinear form but we are dealing with a map that is only real differentiable, the ambient and base fields differ (see section 2.2.2).
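Returning to the bilinear case, the matrix D_{\BbbS_M} built from the unit vectors above, and the corresponding basis B_{\BbbS_M} = (I_n \otimes M^{-1})D_{\BbbS_M} of (3.11), can be assembled directly; a sketch with hypothetical helper names (ours), shown for Hamiltonian matrices, i.e., M = J, \mu = -1, s = -1, hence \mu s = 1:

```python
import numpy as np

def basis_D(n, mus):
    # Orthonormal columns spanning { vec(A) : A^T = mus * A }, mus = +1 or -1:
    # off-diagonal pairs (e_{(i-1)n+j} + mus e_{(j-1)n+i})/sqrt(2) and, when
    # mus = +1, the diagonal unit vectors e_{(i-1)n+i} (0-based below).
    cols = []
    for j in range(n):
        for i in range(j):
            v = np.zeros(n * n)
            v[i + j * n] = 1 / np.sqrt(2)     # entry (i, j)
            v[j + i * n] = mus / np.sqrt(2)   # entry (j, i)
            cols.append(v)
    if mus == 1:
        for i in range(n):
            v = np.zeros(n * n)
            v[i + i * n] = 1.0
            cols.append(v)
    return np.column_stack(cols)

def basis_B(n, mus, M):
    # B_{S_M} = (I_n kron M^{-1}) D, consistent with the orthogonal-M case
    # stated above; (I kron M^{-1}) vec(A) = vec(M^{-1} A).
    return np.kron(np.eye(n), np.linalg.inv(M)) @ basis_D(n, mus)

J = np.array([[0.0, -1.0], [1.0, 0.0]])
B = basis_B(2, 1, J)   # vec-basis of 2x2 Hamiltonian matrices (p = 3)
```

Since J is orthogonal, B has orthonormal columns, and each column unvectorizes to a matrix F with J^{-1}F^TJ = -F.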
The latter case is simple to deal with since, under our assumptions on M, the matrix B_{\BbbS_M} in (3.11) is real, and on using the map \rho in (2.7), we have that vec(\rho(E)) = (I_2 \otimes B_{\BbbS_M})z for some real vector z, so that we may take B_\BbbR = I_2 \otimes B_{\BbbS_M}. For sesquilinear forms, the real and imaginary parts of E \in \BbbJ_M satisfy, respectively, the real Jordan and Lie algebra conditions. The matrices B_{\BbbJ_M} and B_{\BbbL_M} are as in (3.11), so that the 2n^2 \times n^2 matrix B_\BbbR = B_{\BbbJ_M} \oplus B_{\BbbL_M} forms a basis over the base field \BbbR for T_X \BbbJ_M (i.e., p = n^2). Similarly, we find that B_\BbbR = B_{\BbbL_M} \oplus B_{\BbbJ_M} forms a basis for T_X \BbbL_M.

3.2. Computation of cond_\scrM(f, X) when \scrM is an automorphism group. The next lemma provides an explicit characterization of T_X \BbbG_M.
Lemma 3.2. Let \BbbG_M be the automorphism group of a scalar product \langle \cdot, \cdot \rangle_M on \BbbF^n and let X \in \BbbG_M. Then the tangent space at X to \BbbG_M is given by

T_X \BbbG_M = X\BbbL_M := \{XF : F \in \BbbL_M\},

where \BbbL_M is the Lie algebra associated with \langle \cdot, \cdot \rangle_M.
Proof. We only prove the lemma for bilinear forms, the proof for sesquilinear forms being analogous. We start by showing that T_X \BbbG_M \subseteq X\BbbL_M. By definition, E \in T_X \BbbG_M is equivalent to the existence of a smooth curve \gamma(t) \in \BbbG_M satisfying \gamma(0) = X, \gamma'(0) = E. Now \gamma(t) \in \BbbG_M is equivalent to \gamma(t)^T M \gamma(t) = M. By differentiating the latter equation and evaluating at t = 0, we obtain \gamma'(0)^T M \gamma(0) + \gamma(0)^T M \gamma'(0) = 0.
Substituting \gamma(0) = X and \gamma'(0) = E gives E^T M X + X^T M E = 0. Defining F := X^{-1}E (note that X \in \BbbG_M implies that X is nonsingular), we can rewrite the latter as F^\star = -F, that is, F \in \BbbL_M; a dimension count then gives T_X \BbbG_M = X\BbbL_M.

As a consequence, for (real or complex) bilinear forms a basis for vec(T_X \BbbG_M) is given by B_{\BbbG_M} = (I_n \otimes X)B_{\BbbL_M}. For complex bilinear forms (\BbbF = \BbbC) and a real differentiable, but not complex differentiable, f (so that we pick \BbbK = \BbbR), a real basis B_\BbbR is obtained by applying \rho to the columns of B_{\BbbG_M} and of iB_{\BbbG_M}. Similarly, for sesquilinear forms we also have \BbbF = \BbbC \not= \BbbR = \BbbK, and a real basis B_\BbbR is obtained by applying \rho to XF for F ranging over a (real) basis of \BbbL_M. Now if M and X are not orthogonal or unitary, then B_{\BbbG_M} does not have orthonormal columns, and orthonormalizing these columns, or computing the pseudoinverse, can cost as much as O(n^6) operations. As for the Lie and Jordan algebra cases, we can use (2.15) to obtain lower and upper bounds for cond_{\BbbG_M}(f, X) that are cheaper to compute than cond_{\BbbG_M}(f, X). Note that \|B_{\BbbG_M}\|_2 \leq \|M^{-1}\|_2 \|X\|_2 and \|B_{\BbbG_M}^+\|_2 \leq \|M\|_2 \|X^{-1}\|_2. Then, for the case of (real or complex) bilinear forms, (2.15) yields the following lower and upper bounds on the structured condition number:

(3.26) \|K_f(X)B_{\BbbG_M}\|_2/(\|M^{-1}\|_2 \|X\|_2) \leq cond_{\BbbG_M}(f, X) \leq \|M\|_2 \|X^{-1}\|_2 \|K_f(X)B_{\BbbG_M}\|_2.

If M is orthogonal, then \|X^{-1}\| = \|X\| for both the 2-norm and the Frobenius norm, since X^{-1} = X^\star = M^{-1}X^TM, so that \|X\| = \kappa(X)^{1/2}, where \kappa(X) := \|X\|\|X^{-1}\|. In this case the bounds in (3.26) become

(3.27) \|K_f(X)B_{\BbbG_M}\|_2/\|X\|_2 \leq cond_{\BbbG_M}(f, X) \leq \kappa_2(X)\|K_f(X)B_{\BbbG_M}\|_2/\|X\|_2.

In particular, if X is well conditioned, i.e., \kappa_F(X) \approx 1, then \|K_f(X)B_{\BbbG_M}\|_2/\|f(X)\|_F offers a good estimate of the relative structured condition number by (3.27). Hence, the quality of the bounds in (3.26) is likely to be influenced by the condition number of X.
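Lemma 3.2's characterization T_X \BbbG_M = \{XF : F \in \BbbL_M\} can be checked numerically: with F Hamiltonian, the curve \gamma(t) = X exp(tF) stays in the real symplectic group and \gamma'(0) = XF. A sketch (ours; the matrix B_L below is a hand-coded vec-basis of 2 \times 2 Hamiltonian matrices):

```python
import numpy as np
from scipy.linalg import expm

J = np.array([[0.0, -1.0], [1.0, 0.0]])
X = np.diag([2.0, 0.5])                    # X^T J X = J: real symplectic
F = np.array([[1.0, 2.0], [3.0, -1.0]])    # J F symmetric, so F in L_J
G = X @ expm(0.3 * F)                      # point on the curve gamma(t)
residual = np.linalg.norm(G.T @ J @ G - J)

# basis of vec(T_X G_J) = (I kron X) * (vec-basis of Hamiltonian matrices);
# Hamiltonian F = [[a, b], [c, -a]] vectorizes to (a, c, b, -a)
B_L = np.array([[1.0, 0, 0],
                [0, 0, 1.0],
                [0, 1.0, 0],
                [-1.0, 0, 0]])
B_G = np.kron(np.eye(2), X) @ B_L
```

Every column of B_G unvectorizes to a matrix E satisfying the tangency condition E^TJX + X^TJE = 0 derived in the proof, and the curve residual is of the order of the unit roundoff.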
Using Lemma A.1 and (2.18), and by an argument analogous to the one for bilinear forms, we obtain the following lower and upper bounds on the structured condition number for automorphism groups associated with sesquilinear forms and complex bilinear forms when f is only \BbbR-differentiable:

(3.28) \|K_f^{(\BbbR)}(X)B_\BbbR\|_2/(\|M^{-1}\|_2 \|X\|_2) \leq cond_{\BbbG_M}(f, X) \leq \|M\|_2 \|X^{-1}\|_2 \|K_f^{(\BbbR)}(X)B_\BbbR\|_2.

Example 3.3. We apply our theory to the 2 \times 2 real diagonal symplectic matrix

(3.29) X = diag(e^a, e^{-a}), a > 0.
Note that X is in three different automorphism groups \BbbG_J:
1. the real symplectic group, associated with the real bilinear form defined by J = \bigl[ 0 -1; 1 0 \bigr];
2. the complex symplectic group, associated with the complex bilinear form defined by J; and
3. the conjugate symplectic group, associated with the sesquilinear form defined by J.
For each of these groups, a basis B_{\BbbG_J} of the tangent space T_X \BbbG_J can be constructed as in section 3.2. These bases will be needed in the computation of the structured condition numbers and their lower and upper bounds.
As differentiable map f, we consider the principal matrix logarithm [10]. Since X has no eigenvalues on \BbbR^-, log X^{-1} = -log X, and since X \in \BbbG_J, we have from section 3 that log X \in \BbbL_J. Indeed, log X is Hamiltonian. To compute the unstructured condition number, we construct K_log(X) one column at a time using K_log(X)e_{i+2j-2} = vec(L_log(X, e_i e_j^T)), i, j = 1, 2, as in (2.4). Because X and e_i e_i^T commute, we have that L_log(X, e_i e_i^T) = X^{-1} e_i e_i^T, i = 1, 2 [10, Prob. 3.8]. Since X + e_i e_j^T is triangular, we can use the explicit expression for the matrix function of a triangular matrix in [10, p. 84] and the definition of the Fréchet derivative in (2.1) to get an expression for L_log(X, e_i e_j^T). We find that

K_log(X) = diag(e^{-a}, a/sinh(a), a/sinh(a), e^{a}), cond(log, X) = \|K_log(X)\|_2 = e^{a},

showing that the unstructured absolute and relative condition numbers increase rapidly with a. For the real and complex symplectic groups, we find that

cond_{\BbbG_J}(log, X) = a/sinh(a).

Similarly, for the conjugate symplectic group, we have that cond_{\BbbG_J}(log, X) = a/sinh(a), where K_log^{(\BbbR)}(X) = K_log(X) \oplus K_log(X). Hence, for all three symplectic groups, the ratio cond_{\BbbG_J}(log, X)/cond(log, X) = a/(e^a sinh a) decays exponentially as a \rightarrow \infty. The lower and upper bounds in both (3.26) and (3.28) yield

a/sinh(a) \leq cond_{\BbbG_J}(log, X) \leq e^{2a} a/sinh(a),

showing that for this particular matrix and function f, the lower bound is attained, whereas the upper bound is larger than cond(log, X). Also, for this particular choice of f and X, and the conjugate symplectic group, equality holds in (2.20); that is, cond_{\BbbG_J}(f, X) = \|K_f(X)\widetilde B_{\BbbG_J}\|_2 for some orthonormalization \widetilde B_{\BbbG_J} of the basis B_{\BbbG_J} for the conjugate symplectic group, but this is not always the case, as we next illustrate.
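Example 3.3 can be reproduced numerically at, say, a = 1. The sketch below (ours) builds K_log(X) from the Daleckii--Krein divided-difference formula (exact for diagonal arguments), orthonormalizes a tangent basis of the real symplectic group by QR, and recovers cond(log, X) = e^a and cond_{\BbbG_J}(log, X) = a/sinh(a).

```python
import numpy as np

a = 1.0
lam = np.array([np.exp(a), np.exp(-a)])
X = np.diag(lam)

def dd(x, y):
    # divided difference of log: f'(x) on the diagonal, else the quotient
    return 1.0 / x if x == y else (np.log(x) - np.log(y)) / (x - y)

# for diagonal X, K_log(X) is diagonal in the vec ordering i + (j-1)n
K = np.diag([dd(lam[i], lam[j]) for j in range(2) for i in range(2)])
cond_unstr = np.linalg.norm(K, 2)            # = e^a

# tangent vectors X F with F = [[alpha, beta], [gamma, -alpha]] Hamiltonian
F_basis = [np.array([[1.0, 0.0], [0.0, -1.0]]),
           np.array([[0.0, 1.0], [0.0, 0.0]]),
           np.array([[0.0, 0.0], [1.0, 0.0]])]
B = np.column_stack([(X @ F).reshape(-1, order="F") for F in F_basis])
Q, _ = np.linalg.qr(B)                       # orthonormalized tangent basis
cond_str = np.linalg.norm(K @ Q, 2)          # = a / sinh(a) at a = 1
ratio = cond_str / cond_unstr                # = a / (e^a sinh(a))
```

The computed ratio matches the exponentially decaying expression a/(e^a sinh a) discussed above.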
Example 3.4. Let us consider f(X) = X^2, X as in (3.29) with a = log 2, and the conjugate symplectic group as the manifold. Since f is complex differentiable, we can compute K_f^{(\BbbC)}(X) \equiv K_f(X) and find that K_f^{(\BbbC)}(X) = diag(4, 5/2, 5/2, 1). Now, for an orthonormalization \widetilde B_{\BbbG_J} of the 4 \times 4 matrix B_{\BbbG_J} for the conjugate symplectic group, we find that \|K_f^{(\BbbC)}(X)\widetilde B_{\BbbG_J}\|_2 = 4, whereas cond_{\BbbG_J}(f, X) = \surd(257/17) \approx 3.89, and hence the inequality is strict in (2.20) for that particular case, so that cond_{\BbbG_J}(f, X) \not= \|K_f^{(\BbbC)}(X)\widetilde B_{\BbbG_J}\|_2.

Example 3.5. Finally, consider the map f that associates to X the unitary factor of its polar decomposition X = UH. This map is real differentiable but not complex differentiable. For the matrix X in (3.29), we have that U = I_2. Following [19, Cor. 3.12], we can compute the Kronecker form of the Fréchet derivative of the unitary factor analytically. Indeed, using [19, eq. (3.13)], in which \circ denotes the Schur product, we find for our choice of X that the unstructured condition number is cond(f, X) = 1/cosh(a). For the manifold of real symplectic matrices, we find that cond_{\BbbG_J}(f, X) = 1/cosh(a); hence, cond_{\BbbG_J}(f, X)/cond(f, X) = 1. For complex perturbations, the Kronecker form of the real Fréchet derivative is an 8 \times 8 matrix K_f^{(\BbbR)}(X), and for the complex symplectic group cond_{\BbbG_J}(f, X)/cond(f, X) = 1/(e^a cosh(a)), which decays exponentially with a.
Finally, for the real manifold of conjugate symplectic matrices \BbbG_J, it follows from [19] that K_f^{(\BbbR)}(X) is again an 8 \times 8 matrix, and it can be readily proved that cond_{\BbbG_J}(f, X)/cond(f, X) = 1/(e^a cosh(a)), which again decays exponentially with a.
4. Numerical experiments. The purpose of this section is to compare the structured and unstructured condition numbers numerically and to illustrate the quality of the lower and upper bounds on the structured condition number for automorphism groups of real and complex bilinear forms and sesquilinear forms displayed in (3.26) and (3.28) since these bounds are cheaper to compute than cond \scrM (f, X). All our experiments are performed with MATLAB R2017a, for which the unit roundoff is u \approx 1.1 \times 10 - 16 .
We consider the maps f_1(X) = log X, f_2(X) = X^{1/2}, and f_3(X) = U, where log X is the principal logarithm of X, X^{1/2} is the principal square root of X, and U is the unitary factor in the polar decomposition X = UH of X. Both f_1 and f_2 are complex differentiable, but f_3 is only real differentiable. Algorithms 2.6 and 2.7 require an algorithm to compute L_f(X, E) for a given E. For the logarithm, we use the MATLAB function logm_frechet_pade from Higham's Matrix Function Toolbox [8]. For the matrix square root, L_{f_2} = L_{f_2}(X, E) is the solution to the Sylvester equation X^{1/2}L_{f_2} + L_{f_2}X^{1/2} = E [10, p. 134], which can be computed with the function sylvsol from [8]. For the unitary polar factor, the Fréchet derivative can be obtained either from the explicit formulas in [19] or from the framework for derivatives of any generalized matrix function [6]. We prefer the latter approach for efficiency and numerical stability.
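Since the Fréchet derivative of the square root satisfies the Sylvester equation X^{1/2}L + LX^{1/2} = E [10, p. 134], it can be computed with any Sylvester solver; a sketch (ours) using SciPy's solve_sylvester in place of the MATLAB function sylvsol, checked against a finite difference:

```python
import numpy as np
from scipy.linalg import sqrtm, solve_sylvester

def frechet_sqrt(X, E):
    # L solves X^{1/2} L + L X^{1/2} = E, obtained by differentiating
    # (X^{1/2})^2 = X in the direction E
    S = sqrtm(X)
    return solve_sylvester(S, S, E)

# quick consistency check against a forward finite difference
X = np.array([[4.0, 1.0], [0.0, 9.0]])
E = np.array([[0.0, 1.0], [1.0, 0.0]])
t = 1e-7
L = frechet_sqrt(X, E)
err = np.linalg.norm(L - (sqrtm(X + t * E) - sqrtm(X)) / t)
```

The finite-difference discrepancy is of the order of the truncation error O(t), which is how such derivative codes are typically sanity-checked.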
For our numerical experiments, we use Algorithm 2.6 and report the relative unstructured and structured condition numbers, i.e., rcond(f, X) = cond(f, X)\| X\| F /\| f (X)\| F and rcond \BbbG M (f, X) = cond \BbbG M (f, X)\| X\| F /\| f (X)\| F , rather than the absolute ones, as we will be varying the condition number of X. The upper and lower bounds in (3.26) and (3.28) multiplied by \| X\| F /\| f (X)\| F provide upper and lower bounds for rcond \BbbG M (f, X). These bounds and condition numbers are reported in Figure 1 for the principal logarithm of real perplectic, real symplectic, complex symplectic, and conjugate symplectic matrices of increasing condition numbers. In Figure 2, we report the same quantities for the principal square root of real pseudo-orthogonal matrices with p = q = 5, real pseudo-orthogonal matrices with p = 1, q = 9, complex pseudo-orthogonal matrices with p = q = 5, and pseudo-unitary matrices with p = q = 5. Figure 3 compares the relative unstructured/structured condition numbers for the map f 3 as well as the lower and upper bounds on rcond \BbbG M (f, X). All plots show that the unstructured condition number can be much larger than the structured one, in particular when the argument X of the matrix function has a large condition number. When the latter is not too large, the lower bound in (3.26) or (3.28) offers a good approximation to cond \BbbG M (f, X).
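For reference, computing a relative condition number from an explicitly assembled Kronecker form, the basic idea behind this kind of experiment, can be sketched as follows. Since SciPy exposes a Fr\'echet derivative only for the matrix exponential (expm_frechet), this sketch of ours uses f = exp purely for illustration rather than the logarithm or square root treated in the experiments.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))

# Build the n^2 x n^2 Kronecker form of L_exp(X, .) one column at a time,
# applying the Frechet derivative to each unit matrix E_ij.
K = np.zeros((n * n, n * n))
for k in range(n * n):
    E = np.zeros((n, n))
    E.flat[k] = 1.0
    K[:, k] = expm_frechet(X, E, compute_expm=False).ravel()

# Absolute condition number = largest singular value of K; the relative
# condition number rescales by ||X||_F / ||f(X)||_F.
abs_cond = np.linalg.norm(K, 2)
rel_cond = abs_cond * np.linalg.norm(X, 'fro') / np.linalg.norm(expm(X), 'fro')
print(rel_cond)
```

By linearity of the Fr\'echet derivative, K applied to any vectorized direction E reproduces vec(L f (X, E)), which gives a cheap consistency check on the assembled matrix.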
Our numerical experiments also suggest that for most of the real automorphism groups we consider (see Figure 1(a)--(b), Figure 2(a), and Figure 3(a)) the lower bound in (3.26) is a good approximation to cond \BbbG M (f, X) even for badly conditioned matrices X. An explanation of why (3.26) is a good approximation for many, but not all, groups is left for future research; here, we note that more insight could be gained by studying the angle between the dominant singular vector of the matrix B and the vector y that achieves the maximum in the definition of a structured condition number. The plots show that the upper bound ubcond(f, X) is in general not sharp, in particular when \kappa 2 (X) is large, as expected from the analysis in section 3.2. Our experiments show that cond(f, X) is usually a better upper bound on cond \BbbG M (f, X) than the upper bound in (3.26) or (3.28), but not always, as plot (a) in Figure 1 indicates.
Let A = X 1/2 be the principal square root of X and let \widehat A be the computed square root of X. The backward error of \widehat A is \| E\| F , where E = \widehat A 2 - X is the unique matrix satisfying \widehat A = (X + E) 1/2 . Then, on using (1.2), we get the following approximate upper bound on the relative error:

(4.1) err( \widehat A) = \| \widehat A - X 1/2 \| F /\| X 1/2 \| F \lesssim cond(sqrt, X) \| \widehat A 2 - X\| F /\| X\| F .

When X \in \BbbG M and X 1/2 is computed with a structure preserving algorithm such as those derived in [11], cond(sqrt, X) can be replaced by cond \BbbG M (sqrt, X) in (4.1), yielding a sharper upper bound. This is illustrated in Figure 4 for symplectic and pseudo-orthogonal matrices.

Fig. 2. Structured and unstructured relative condition numbers, and lower and upper bounds on the structured condition number, for the principal square root of 10 \times 10 randomly generated real pseudo-orthogonal matrices with p = q = 5 in plot (a), real pseudo-orthogonal matrices with p = 1, q = 9 in plot (b), complex pseudo-orthogonal matrices with p = q = 5 in plot (c), and pseudo-unitary matrices with p = q = 5 in plot (d), with increasing condition number \kappa 2 (X).

To obtain the computed square root \widehat A, we use the structure preserving, cubically convergent coupled iteration (4.2),
where Y k , Z k \in \BbbG M and Y k \rightar X 1/2 [11, sect. 6]. For the relative error err( \widehat A) in (4.1), we use funmx from Higham's Matrix Function Toolbox [8] to compute X 1/2 in extended precision. The relative errors are plotted as ``\ast '' in Figure 4, and the test matrices are sorted by increasing values of err( \widehat A). We compare these relative errors with the unstructured bound in (4.1). The structured bounds in Figure 4 correspond to (4.1) with cond(sqrt, X) replaced by cond \BbbG M (sqrt, X), and the approximate bound corresponds to (4.1) with cond(sqrt, X) replaced by the lower bound on cond \BbbG M (sqrt, X) displayed in (3.26) or (3.28) (so the approximate bound is not a strict bound). The plot on the right-hand side shows that even when lbcond(f, X) is an order of magnitude smaller than the relative structured condition number rcond \BbbG M (sqrt, X) (which happens for pseudo-unitary matrices with p \not = q), the product of the lower bound on the relative structured condition number and the relative backward error, i.e., lbcond(f, X) \times \| \widehat A 2 - X\| F /\| X\| F , offers a good approximation to err( \widehat A).
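The way such an error bound is assembled can be sketched in Python. The snippet below is an illustration under our own assumptions, not the experimental setup of the paper: a symmetric positive definite test matrix rather than a matrix from an automorphism group, a simulated computed square root \widehat A, and the exact Kronecker form (I \otimes A + A \otimes I)^{-1} of L_sqrt(X, \cdot ) in place of the estimates used in the experiments.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)            # well-conditioned SPD test matrix

A = sqrtm(X).real                      # reference principal square root
A_hat = A + 1e-8 * rng.standard_normal((n, n))   # simulated computed root

# Relative backward and forward errors of A_hat:
# A_hat = (X + E)^{1/2} with E = A_hat^2 - X.
back_err = np.linalg.norm(A_hat @ A_hat - X, 'fro') / np.linalg.norm(X, 'fro')
fwd_err = np.linalg.norm(A_hat - A, 'fro') / np.linalg.norm(A, 'fro')

# From A L + L A = E, the Kronecker form of L_sqrt(X, .) is
# (I kron A + A kron I)^{-1}; A is symmetric, so both vec conventions agree.
K_sqrt = np.linalg.inv(np.kron(np.eye(n), A) + np.kron(A, np.eye(n)))
rcond_sqrt = (np.linalg.norm(K_sqrt, 2)
              * np.linalg.norm(X, 'fro') / np.linalg.norm(A, 'fro'))

bound = rcond_sqrt * back_err          # first-order bound in the spirit of (4.1)
print(fwd_err, bound)
```

To first order the forward error cannot exceed this bound, and Figure 4 shows how much sharper the bound becomes when the structured condition number is used instead of the unstructured one.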

Concluding remark.
We emphasize that in this work we have focused on manifolds embedded in a larger Euclidean space, which is the natural theoretical setting for algorithms working on the entries of a structured matrix. Several, but not all, algorithms in numerical linear algebra are in this category. A common alternative approach is to work via the atlas of a manifold, which is, for example, a standard paradigm to represent the manifold of semiseparable matrices. Extending the theory presented in this paper to include the study of computational problems based on parametrizations of manifolds, a setting where the condition number now also depends on the choice of an atlas, is an interesting line of research left for future work.

Fig. 4. Bounds on the relative error err( \widehat A) = \| \widehat A - X 1/2 \| F /\| X 1/2 \| F for 50 randomly generated 10 \times 10 symplectic matrices X in plot (a) and for 50 randomly generated 10 \times 10 pseudo-orthogonal matrices X in plot (b), where \widehat A is an approximate square root of X computed by the structure preserving iteration (4.2). (Panel (b): pseudo-unitary X with p = 1, q = 9, and \kappa 2 (X) = 10^5.)
Acknowledgments. We are grateful for the questions and comments received after talks related to this research: they led to the addition of clarifying remarks to this paper. We are also grateful to Nick Higham for reading a preliminary draft and giving various suggestions. We thank the referees for carefully reading the paper and providing several suggestions to improve the presentation.