A MULTIGRID APPROACH TO SDP RELAXATIONS OF SPARSE POLYNOMIAL OPTIMIZATION PROBLEMS

Abstract. We propose a multigrid approach for the global optimization of polynomial optimization problems with sparse support. The problems we consider arise from the discretization of infinite dimensional optimization problems, such as PDE optimization problems, boundary value problems, and some global optimization applications. In many of these applications, the level of discretization can be used to obtain a hierarchy of optimization models that capture the underlying infinite dimensional problem at different degrees of fidelity. This approach, inspired by multigrid methods, has been successfully used for decades to solve large systems of linear equations. However, multigrid methods are difficult to apply to semidefinite programming (SDP) relaxations of polynomial optimization problems. The main difficulty is that the information between grids is lost when the original problem is approximated via an SDP relaxation. Despite the loss of information, we develop a multigrid approach and propose prolongation operators to relate the primal and dual variables of the SDP relaxation between lower and higher levels in the hierarchy of discretizations. We develop sufficient conditions for the operators to be useful in practice. Our conditions are easy to verify, and we discuss how they can be used to reduce the complexity of infeasible interior point methods. Our preliminary results highlight two promising advantages of following a multigrid approach compared to a pure interior point method: the percentage of problems that can be solved to a high accuracy is much greater, and the time necessary to find a solution can be reduced significantly, especially for large scale problems.

1. Introduction. Exploiting sparsity with specialized semidefinite programming (SDP) relaxations has had a huge impact on the application of SDP relaxations to realistic polynomial optimization problems. Indeed, when using the classical Lasserre hierarchy, it is only possible to solve problems with a few variables, but by exploiting the sparsity present in many applications, it is possible to solve problems with several hundred variables [18, 36]. In this paper, we argue that many applications have additional structure that can be exploited to a similar effect. In particular, many large scale polynomial optimization problems have their origins in the discretization of an infinite dimensional model. The resulting finite dimensional model is sparse but has a large number of degrees of freedom. Optimization models that fit this class are boundary value problems [28], optimization with PDEs [12, 4], optimal control [3, 8], and Markov decision processes [13], among others. Despite the progress made in the last decade, it is still not possible to solve realistic instances of the models arising in these applications using sparse SDP relaxations. The main contribution of this paper is to show how to take advantage of both the sparse and the hierarchical structures present in many applications. Our theoretical results suggest that under appropriate conditions we should expect significant improvements in computational complexity. Our numerical results further support this claim, and we show that a multigrid approach can improve the robustness and reduce the time required to solve large scale polynomial optimization problems.
Our approach is inspired by multigrid methods. When solving a system of linear equations, and in some optimization problems, it is widely accepted that if a multigrid method is applicable, then it is often the best numerical method to use [5, 33]. For examples of the multigrid approach to various optimization problems we refer the interested reader to [5, 10, 26, 39] for PDE optimization, [15] for convex optimization problems in image processing, and [13] for Markov decision processes. The core principle of multigrid methods is to construct a coarse model of the original (high resolution/fine) model and use the information obtained from solving the coarse model to improve the current solution. This approach works extremely well when the coarse and fine models share a common structure. Additionally, based on the intuition that the coarse model is a global approximation to the fine model (as opposed to only using local information to construct a search direction), the hope is that multigrid methods can potentially be applied to global optimization problems too. Motivated by the potential numerical improvements and the fact that the coarse model retains global information about the model, we develop the multigrid principle for SDP relaxations of polynomial optimization problems (POP). In particular, we propose a multigrid framework for the SDP relaxation of the following POP:

(1)   min_{x ∈ R^n} F_n(x) = Σ_k f_k(x_k),

where f_k : R^p → R are p-dimensional polynomial functions of degree d_k, the index k runs over 0, 1, ..., n_1 − 1, n_2 + 1, ..., n − p + 1, x_k = (x_k, x_{k+1}, ..., x_{k+p−1}), and n_1, n_2, n are positive integers such that n_1 + p + 1 ≤ n_2 ≤ n − p + 1. Note that the problem is sparse in the sense that every variable only appears together with p − 1 of its neighbors. In our numerical experiments we typically have n_1 = 2 and n_2 = n − p. The principal technical difficulty of applying multigrid to a (sparse or otherwise) SDP relaxation of (1) is that the information among the variables is lost through the relaxation process.
In this paper, we take the first steps towards addressing this issue. We show that despite the loss of information, it is still possible to obtain useful information from the coarse SDP relaxation and to construct a good approximation to the solution of the fine SDP relaxation. We do not propose a new hierarchy for polynomial problems; rather, we show how the hierarchy of discretizations can be exploited within infeasible interior point methods.
A multigrid approach in the context of sparse SDP relaxations was used in [21, 22] to solve finite difference approximations to optimal control problems and PDE problems. However, the authors used the standard multigrid operators to interpolate between the variables in the original space, whereas we work directly with the primal/dual variables of the SDP relaxation. The SDP variables contain much more information than just the solution to (1). The additional information can be put to good use in the next level of the hierarchy. This advantage is reflected in our numerical and theoretical results. In particular, we can solve bigger problems with our approach than by using SDP relaxations as a black box. Another related approach is the application of multigrid methods to SDP relaxations of combinatorial optimization problems [20]. That approach is specific to the particular SDP relaxations appearing in graph problems and is not applicable to the general SDP relaxations of (1) that we consider in this paper.
The rest of the paper is structured as follows: section 2 defines the notation used, and in section 3 we review sparse relaxations for unconstrained problems developed in [36]. In section 4 we study the characteristics of the relaxation when applied to problem (1), and section 5 defines the hierarchy at different levels and the prolongation operators. Section 5 also establishes improvements in the worst case complexity of an infeasible interior point method when our assumptions are satisfied. Finally, in section 6 we report results from our numerical experiments.
2. Notation and preliminaries. Given a real-valued polynomial function f : R^n → R of degree d, we denote the monomial x_1^{α_1} x_2^{α_2} ... x_n^{α_n} by x^α and denote its coefficient by b_α, where α ∈ N^n and N is the set of nonnegative integers. Letting Γ^n_d = {α ∈ N^n : Σ_i α_i ≤ d}, any polynomial of degree at most d can be written as f(x) = Σ_{α ∈ Γ^n_d} b_α x^α. For a set Φ ⊆ {1, 2, ..., n}, let A^Φ_d = {α ∈ Γ^n_d : α_i = 0 if i ∉ Φ}, and let u(x, A^Φ_d) be a column vector with the monomials x^α for α ∈ A^Φ_d. The number of elements of A^Φ_d is (|Φ| + d)!/(|Φ|! d!) and will be denoted by g(|Φ|, d), where |Φ| corresponds to the number of elements in the set Φ. For any matrix Q ∈ R^{r_1×r_2}, [Q]_{i,j} will correspond to the element in position (i, j), and Q ⪰ 0 (Q ≻ 0) will mean that Q is positive semidefinite (positive definite). The matrix I ∈ R^{r×r} will represent the identity matrix, and its size will be understood from the context. The notation so far is standard, and we refer the interested reader to [36] for more details and examples. Below we introduce two definitions that are specific to this paper.
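The count g(|Φ|, d) of monomials of total degree at most d in |Φ| variables equals the binomial coefficient (|Φ| + d choose d). The following sketch (ours, not from the paper) cross-checks this identity by direct enumeration:

```python
from math import comb
from itertools import product

def g(num_vars: int, d: int) -> int:
    """Number of monomials x^alpha with total degree at most d."""
    return comb(num_vars + d, d)

def count_by_enumeration(num_vars: int, d: int) -> int:
    """Brute-force count over all exponent vectors alpha with sum(alpha) <= d."""
    return sum(1 for alpha in product(range(d + 1), repeat=num_vars)
               if sum(alpha) <= d)
```

For example, g(2, 2) = 6 corresponds to the monomials 1, x_1, x_2, x_1^2, x_1 x_2, x_2^2.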

3. Sparse POP relaxations.
In our work we will use the relaxations formulated in [36] to find an approximate solution for problem (1).In this section, we briefly describe the so-called sparse relaxations for unconstrained problems.
Consider the unconstrained POP for the function f(x) = Σ_{α ∈ Γ^n_d} b_α x^α with even degree d, given by

(2)   f_min := min_{x ∈ R^n} f(x).

In [16] Lasserre developed a hierarchy of SDP relaxations for polynomial minimization and proved that under certain assumptions, it is possible to extract an approximate solution for problems like (2) by solving these relaxations (see [11] for details). This hierarchy can be obtained by viewing (2) as a moment problem or by trying to compute a sum-of-squares decomposition of f(x) − f_min (see, for example, [29, 30] for more details). However, if the number of variables (or constraints in the general case) is large, the resulting semidefinite program can be too large to be solved in practice. For the case when the number of elements in the support of the objective function is small, Waki et al. [36] developed a sparse relaxation that reduces the number of variables and constraints in the SDP relaxation. Although this methodology was originally a heuristic, Lasserre proved in [17] that the optimal value of this new hierarchy converges to the optimal value of the polynomial problem under some additional technical assumptions.
In order to define the sparse hierarchy, Waki et al. [36] used the structure of the so-called correlative sparsity pattern (CSP) matrix. If R is the CSP matrix, then [R]_{i,j} is nonzero if there exists a monomial with variables x_i and x_j which has a nonzero coefficient in the objective function, and [R]_{i,j} is zero otherwise. If R is sparse, then problem (2) is called correlatively sparse. Associated to the CSP matrix is the CSP graph G(V, E). The node set is V = {1, 2, ..., n} and E = {{i, j} : i, j ∈ V, [R]_{i,j} = ⋆, i < j}, where ⋆ denotes any nonzero real number. The idea is to generate a set of support sets for the polynomial function using the maximal cliques of this graph. However, finding the maximal cliques of a graph is in general NP-hard (see, for example, [2]), and for this reason the sparse relaxations are defined using the maximal cliques of a chordal extension of the CSP graph. We refer the reader to [1, 9] for more information about chordal extensions and algorithms to find maximal cliques in chordal graphs.
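For the banded objectives studied later (every monomial couples at most p consecutive variables), the CSP pattern and the maximal cliques of the band chordal extension can be sketched as follows; the function names and 0-based variable indexing are our own illustration:

```python
def csp_matrix(n, supports):
    """CSP pattern: [R]_{i,j} is True iff variables i and j appear
    together in the support of some monomial of the objective."""
    R = [[False] * n for _ in range(n)]
    for sup in supports:
        for i in sup:
            for j in sup:
                R[i][j] = True
    return R

# Chained objective with p = 2: each monomial support is {k, k+1}.
n, p = 5, 2
supports = [{k, k + 1} for k in range(n - 1)]
R = csp_matrix(n, supports)

# Maximal cliques of the band chordal extension (cf. Lemma 5): the
# windows {l, ..., l + p - 1}.
cliques = [set(range(l, l + p)) for l in range(n - p + 1)]
```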
Let {Φ_l}_{l=1}^m be the maximal cliques of a chordal extension of G(V, E), and note that zz^T ⪰ 0 for any real vector z. Then, adding the constraints u(x, A^{Φ_l}_w) u(x, A^{Φ_l}_w)^T ⪰ 0 (l = 1, 2, ..., m) to problem (2), we obtain the following equivalent problem:

(3)   min f(x)  s.t.  u(x, A^{Φ_l}_w) u(x, A^{Φ_l}_w)^T ⪰ 0,  l = 1, 2, ..., m,

where w ≥ d/2 is a degree that denotes the relaxation order. Note that the left-hand side of constraint l is a square matrix containing monomials x^α; let y = {y_α}. If each monomial x^α is replaced with the real variable y_α, the wth sparse SDP relaxation is given by

(4)   min Σ_α b_α y_α  s.t.  M^{Φ_l}_w(y) ⪰ 0,  l = 1, 2, ..., m,

where the matrix M^{Φ_l}_w(y) is called the wth moment matrix for the variables indexed by the set A^{Φ_l}_w. Note that this SDP relaxation admits a strict interior point (see Theorem 3.1 in [27]).
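The substitution of monomials x^α by variables y_α in (4) can be illustrated by building the index pattern of a small moment matrix; exponent tuples stand in for the indices of the y variables (a sketch under our own ordering conventions):

```python
from itertools import product

def basis(num_vars, w):
    """Exponent vectors alpha with |alpha| <= w, in graded order."""
    alphas = [a for a in product(range(w + 1), repeat=num_vars) if sum(a) <= w]
    return sorted(alphas, key=lambda a: (sum(a), a))

def moment_matrix_indices(num_vars, w):
    """Entry (i, j) of the moment matrix corresponds to the variable
    y_{alpha_i + alpha_j}, i.e., the moment of x^{alpha_i} * x^{alpha_j}."""
    b = basis(num_vars, w)
    return [[tuple(ai + aj for ai, aj in zip(u, v)) for v in b] for u in b]

M = moment_matrix_indices(2, 1)  # p = 2 variables, relaxation order w = 1
```

Under this ordering the basis is (1, x_2, x_1), the matrix is 3 × 3 = g(2, 1) × g(2, 1), and M[1][2] = (1, 1) indexes the moment variable of the monomial x_1 x_2.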

4. Sparse POP relaxations for problem (1). In this section we analyze the sparse relaxation for problem (1) and derive connections between the variables and constraints that will be useful in the multigrid setting.
Let R ∈ R^{n×n} be the CSP matrix for (1), and let G(V, E) be the associated CSP graph. The CSP matrix is a band symmetric matrix with bandwidth equal to p − 1. This follows from the fact that the polynomials f_k in (1) are functions of {x_l, x_{l+1}, ..., x_{l+p−1}} for l = 1, 2, ..., n − p + 1, and therefore for any α ∈ supp(F_n), if α_i > 0 and α_j > 0, then |i − j| ≤ p − 1. Given that the graph G is not necessarily chordal, we will consider as a chordal extension of G the graph G(V, E'), where E ⊆ E' = {{i, j} : i, j ∈ V, |i − j| ≤ p − 1}. The CSP matrix for this chordal extension is the band matrix

(5)   [R']_{i,j} = ⋆ if |i − j| ≤ p − 1, and [R']_{i,j} = 0 otherwise.

The next lemma establishes that G(V, E') is indeed a chordal graph and identifies its maximal cliques.

Lemma 5. If the CSP matrix is given by (5), then the associated CSP graph G(V, E') is chordal, and the maximal cliques are given by Φ_l = {l, l + 1, ..., l + p − 1} for l = 1, 2, ..., n − p + 1.
Proof. The graph G(V, E') is chordal because it is an interval graph with intervals [l, l + p − 1] (see [35] for a definition of interval graphs). Using the definition of maximal cliques it is straightforward to show that the sets Φ_l for l = 1, 2, ..., n − p + 1 are indeed the maximal cliques of G(V, E').
Let d (even) be the degree of F_n(x) in (1), and write F_n(x) = Σ_{α ∈ supp(F_n)} b^n_α x^α. Writing the sparse relaxation of (1) in standard primal SDP form yields problems (6) and (7), where the matrices A^{(l,α)} can be deduced from the definition of the moment matrix M^{Φ_l}_w(y) (l = 1, 2, ..., n − p + 1), and C is a matrix with a one in position (1, 1) and zeros everywhere else, corresponding to the monomial of degree zero. Due to Lemma 5 all the moment matrices in (6) and the matrices S_l have the same dimension g(p, w) × g(p, w). Letting X = (X_1, X_2, ..., X_{n−p+1}), the dual of problem (7) is problem (8). The matrices in the constraints of the relaxations satisfy important properties that will be used in the proofs. We illustrate two of these properties with an example and summarize them along with other properties in Lemma 7.
Example 6. Suppose that Φ_l = {l, l + 1} (i.e., p = 2) and w = 1. The six matrices in the above equation are independent of the clique Φ_i. This means that the matrices multiplying the monomials x_i, x_{i+1}, x_i^2, x_i x_{i+1}, x_{i+1}^2 in this equation will be the same matrices multiplying the monomials x_{i+1}, x_{i+2}, x_{i+1}^2, x_{i+1} x_{i+2}, x_{i+2}^2 in the equation for i + 1, respectively. Also, monomials x^α with α_k > 0 for k ≤ i − 1 or k ≥ i + 2 do not belong to the equation, which means they have a zero matrix coefficient.

(a) From the equation above we can deduce that relation (9) holds for any 1 ≤ i, j ≤ g(p, w) (recall that u(x, A^{Φ_l}_w) is a vector with g(|Φ_l|, w) = g(p, w) elements). We will prove the first part of the statement; the second part can be shown following the same reasoning. Note that if the vectors u are ordered according to some polynomial ordering (see [7] for more on orderings), then (9) implies that to prove (a) we need to show that the following two conditions are true: if x^α = x^{γ_i} x^{γ_j}, then x^{α+} = x^{γ_i^+} x^{γ_j^+} (i.e., A^{(l,α)}_{i,j} = A^{(l+1,α+)}_{i,j} = 1); and if x^α ≠ x^{γ_i} x^{γ_j} for any i, j, we can conclude that A^{(l,α)}_{i,j} = 0 for all i, j.
(c) For any α ≠ γ, according to (9), if A^{(l,α)} has a nonzero element in position (i, j), then A^{(l,γ)} must have a zero in position (i, j). This implies that Σ_{α ∈ H} A^{(l,α)} is a g(p, w) × g(p, w) matrix of ones and zeros.

5. Lower dimensional SDP relaxations. In this section, we will define fine and coarse level problems and relate their corresponding hierarchies of sparse SDP relaxations. The coarse level model has the same structure as the fine level model, but it has fewer dimensions. We will define prolongation operators for the primal and dual variables of the SDP relaxation. The aim of the prolongation operators is to transfer information from the lower dimensional coarse model to the higher dimensional fine model. In the multigrid literature this operation is called prolongation, and we adopt the same terminology here. We will study the properties of these operators and establish theoretical results. In particular we will derive conditions that guarantee that the prolongated solution is within ε of the true solution (where ε > 0 is a user-specified parameter). The conditions only include information from the coarse model and thus are easy to compute in practice (Corollary 14). We show that when these conditions are satisfied for a low tolerance ε, the complexity of infeasible interior point methods is expected to improve (Theorem 15).
Consider the following problem for 0 ≤ t ≤ n_2 − n_1 − p:

(10)   min_{x ∈ R^{n−t}} F_{n−t}(x).

Note that t = 0 corresponds to the original problem (which we call the fine problem or the problem at the fine level); models for t ≥ 1 are lower dimensional problems. We will refer to lower dimensional models as coarse problems or problems at the coarse level. Let (y^{n−t}, S^{n−t}, X^{n−t}) denote variables in the coarse level (t ≥ 1) and fine level (t = 0) spaces. If the order of the relaxation w is fixed, a sparse SDP relaxation SDP_t, given by (11) and its dual (12), can be constructed for different values of t. Note that all the properties in Lemma 7 remain valid for any fixed t (the underlying POP has the same structure as (1)), and the sets Φ_l are the same as in Lemma 5. The next example illustrates other important properties relating the coefficients of the monomials for the relaxations at different levels (coarse and fine); Lemma 9 formalizes these properties.
Note that the monomials x_i^{α_i} x_{i+1}^{α_{i+1}} have the same coefficients in both levels for i = 1, 2, 3, and the coefficient of the monomial x_4^2 at level t = 1 is the same as the coefficient of x_5^2 at level t = 0.
Lemma 9. For any t such that 0 ≤ t ≤ n_2 − n_1 − p, the SDP models in (11) and (12) satisfy the following.

(a) Suppose that no function f_k contains the monomial x^α. Using (10) with t = 0 and t = t̄, we have that the coefficients b^n_α and b^{n−t̄}_α are determined by the corresponding functions f_k, respectively, from which the result follows.
(b) Suppose that no function f_k contains the monomial x^α. Setting t = t̄ in (10), we observe that the coefficient b^{n−t̄}_α is determined by functions that do not contain the monomial x^α. Then, setting t = t̄ + 1 in (10), it can be deduced that the coefficient b^{n−t̄−1}_α is determined by the same functions, from which the statement follows.
(c) Similarly to (b), the statement in (c) follows by noticing that b^{n−t̄}_α and b^{n−t̄}_{α−} are determined by the same functions.

5.1. Analysis for models F_n and F_{n−1}. In this subsection we will analyze the relation between levels n and n − 1 (t = 0 and t = 1). The goal is to define prolongation operators that allow the transformation of any point in the coarse level semidefinite program into the fine level semidefinite program. It is not computationally advantageous to consider a coarse model with n − 1 components, since the dimensionality reduction is too small. In section 6 we will show how to use the operators obtained from this simple case in a recursive manner to obtain much coarser models with, for example, n/2 components.
© 2018 SIAM. Published by SIAM under the terms of the Creative Commons 4.0 license.

Action of the primal prolongation operators on the coarse primal variables. We denote the prolongation operators from the coarse level by P_y and P_S for the variables y^{n−1} and S^{n−1}, respectively. If y^n = P_y y^{n−1} and S^n = P_S S^{n−1}, then these operators prolongate the variables to the fine level following (13) and (14), where 2 ≤ i_0 ≤ n − p − 1. These operators are linear and depend on the integer i_0. This means that we can construct different operators by selecting different values of i_0. To gain intuition on how the primal operators work, note that the fine primal semidefinite program has one more matrix variable than the coarse primal semidefinite program. Suppose that the additional matrix at the fine level is the i_0th matrix. Then P_S computes the i_0th matrix as an average of the (i_0 − 1)th and i_0th coarse matrices. Likewise, the operator P_y assumes that the additional variables y^n_α for the fine primal relaxation correspond to the monomials x^α with α ∈ B_{i_0}. Obviously, the operators depend on the choice of i_0, and we discuss how to choose i_0 in practice in subsection 6.1.
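To make the averaging behind (14) concrete, here is a minimal sketch (ours, not the paper's implementation) in which the blocks S^{n−1}_l are dense symmetric matrices stored as nested lists; block i_0 of the fine point is the average of coarse blocks i_0 − 1 and i_0, and the remaining blocks are copied or shifted up:

```python
def prolong_S(S_coarse, i0):
    """Primal prolongation P_S (sketch, 1-based block index i0): the
    fine level gains one block; block i0 is the average of coarse
    blocks i0-1 and i0, earlier blocks are copied, later ones shift up.
    Averages of positive semidefinite matrices stay positive
    semidefinite, which matches part (a) of Theorem 10."""
    def avg(A, B):
        return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    fine = list(S_coarse[:i0 - 1])                         # blocks 1 .. i0-1
    fine.append(avg(S_coarse[i0 - 2], S_coarse[i0 - 1]))   # new block i0
    fine.extend(S_coarse[i0 - 1:])                         # blocks shift up
    return fine
```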
For any feasible points of the coarse problems, the following theorem characterizes the feasibility of the prolongated primal variables at the fine level. For y^i, S^i, X^i we define the primal residuals (R^i_l) and dual residuals (r^i_α) as in (15).

Theorem 10. Let y^{n−1}, S^{n−1} be feasible points for the coarse primal problem (11) for t = 1. If y^n = P_y(y^{n−1}), S^n = P_S S^{n−1} are defined according to (13) and (14), respectively, for some admissible i_0, then (a) S^n ⪰ 0, and (b) the residual matrices R^n_l defined in (15) are either zero or bounded in norm by g(p, w) ε_1.

Proof. (a) S^n_l is positive semidefinite for l = 1, 2, ..., n − p + 1 because the point S^{n−1} is feasible in the coarse relaxation, and hence the coarse matrices are positive semidefinite.
(b) To calculate the feasibility of the primal fine problem, we use Lemma 7(e) to write the feasibility constraints of the n − t relaxation as in (16). We can now evaluate five different cases for the residual constraints R^n_l. We will only show the details of two of these cases, which illustrate how to use the properties of the problem; the remaining three cases can be proved using a similar argument. Given that y^{n−1} and S^{n−1} are feasible points for the coarse primal relaxation, when the variable S^{n−1}_l appears it will be replaced according to the constraints in (11) for t = 1, and the variables y^n_α and S^n_l will be replaced by the operators (13) and (14), respectively.
where (17) was used to go from the third to the fourth equality.
Since every α belongs to one of the sets B_k, using Lemma 7(c) we can conclude from the previous five cases that the norms of the constraints of the fine primal relaxation are either zero or smaller than g(p, w) ε_1.
We note that if ε_1 → 0, then R^n_l → 0. We also note that ε_1 is easy to calculate from coarse information only.
Action of the dual prolongation operators on the coarse dual variables. We now turn our attention to the relationship between the coarse and fine dual variables. As the fine dual relaxation contains one more matrix variable than the coarse relaxation, the operator works in the same fashion as P_S in the primal case (i.e., the additional matrix is calculated as an average of the coarse matrices). We perform an analysis similar to that in the primal case.
Let P_X be the prolongation operator for the variable X^{n−1}. As in the case of the primal operators, the dual prolongation depends on an integer j_0, which allows us to define different operators. The operator X^n = P_X X^{n−1} is given in (18). The feasibility of the dual prolongation is proven below.
Theorem 11. Let X^{n−1} be a feasible point for the coarse dual problem (12) for t = 1. If X^n = P_X X^{n−1} is defined according to (18) for some admissible j_0, then X^n ⪰ 0 and, for any α ∈ F_n, the residual r^n_α defined in (15) is either zero or bounded by g(p, w) p ε_2.

Proof. (a) As in the primal case, the matrices in P_X X^{n−1} are positive semidefinite because they are linear combinations of positive semidefinite matrices (X^{n−1} is feasible for the coarse dual relaxation).
(b) Once more we will divide the proof into different cases to calculate the dual residual r^n_α (α ∈ F_n), and we will make use of the following fact: if α ∈ B_l, then (19) is a consequence of Lemma 7(b). Given that X^{n−1} is feasible for the coarse dual relaxation, when the coefficient b^{n−1}_α appears it will be replaced according to the constraints in (12) for t = 1, and the variable X^n_l will be replaced by the operator (18). We will show the details for two of the five cases to illustrate how to use the properties of the problem; the remaining cases can be deduced following a similar procedure.
To find the upper bound for the constraints that are not equal to zero (Cases 4 and 5), let Z_k ∈ R^{g(p,w)×g(p,w)} be such that ‖Z_k‖ ≤ ε_2. Note that A^{(k,α)} is a matrix of zeros and ones with at most g(p, w)^2 − 1 elements different from zero. Then, using the triangle and Cauchy–Bunyakovskii–Schwarz inequalities, we obtain a bound of the form g(p, w) ε_2 for each term. Note that for the cases where the constraints are not zero, the number of terms in the summation does not exceed p. Since the differences of the relevant coarse blocks are bounded by ε_2 for l = j_0, j_0 + 1, ..., j_0 + p − 1 and α ∈ F_n, using the previous inequality it is easy to see that the constraints that are not zero are less than g(p, w) p ε_2.
As in the primal case, we note that if ε_2 → 0, then |r^n_α| → 0, and that ε_2 is easy to calculate from coarse information only.
Note that the primal prolongations were defined for some i_0 (with 2 ≤ i_0 ≤ n − p − 1), and the dual prolongation was defined for some j_0 (with n_1 + p + 1 ≤ j_0 ≤ n_2 − 2), but these numbers do not need to be the same. The constant j_0 for the dual prolongation has to be selected from a bounded set that depends on n_1 and n_2, but i_0 does not. This is due to the fact that the constraints of the primal relaxation depend only on the sets Φ_l (which do not depend on any particular value of n_1 or n_2), while the constraints of the dual relaxation depend directly on the coefficients of F_n, which, given the structure of problem (1), are a function of n_1 and n_2. The case i_0 = j_0 allows us to characterize the duality gap, leading to the following theorem.

Theorem 12. Let y^{n−1}, S^{n−1}, X^{n−1} be feasible points for the coarse primal and dual problems (11) and (12) for t = 1. If y^n = P_y y^{n−1}, S^n = P_S S^{n−1}, X^n = P_X X^{n−1} are defined as in (13), (14), and (18) for some i_0 = j_0, then the fine level duality gap is bounded in terms of ε_1 and ε_2, where ε_1, ε_2 are defined in Theorem 10 and Theorem 11, respectively.
Proof. Using the constraints in (11) for t = 1, and properties (e) (with l = i_0) and (d) of Lemma 7, the bound follows from the definition of ε_1 in Theorem 10 and Lemma 7(c): replacing X^n by P_X X^{n−1} and S^n by P_S S^{n−1}, and using the upper bound on ‖S^{n−1}_{i_0−1} − S^{n−1}_{i_0}‖, yields the last inequality.

All the bounds calculated so far in this section depend on the terms ε_1 and ε_2 defined in Theorem 10 and Theorem 11. It is straightforward to see that if the goal is to obtain a feasible point (y^n, S^n, X^n) for the fine problem, it is enough to have a feasible coarse point such that ε_1 and ε_2 are zero. In the next corollary, we formalize this idea by giving conditions for a coarse point (y^{n−1}, S^{n−1}, X^{n−1}) to provide a prolongated point that is ε-optimal.
(a) If there exist admissible i_0 and j_0, then it is possible to prolongate the coarse variables using (13), (14), and (18) to obtain a nearly feasible fine point. (b) If in addition i_0 = j_0 and (y^{n−1}, S^{n−1}, X^{n−1}) are optimal points with zero duality gap for the coarse problem, then Σ_{k=1}^{n−p+1} ⟨X^n_k, S^n_k⟩ ≤ ε_2/(g(p, w) p).

5.2. Exploiting multigrid structure in infeasible interior point methods (IPMs). The complexity (in terms of number of iterations) of infeasible IPMs depends on the feasibility of the initial points and the associated duality gap. In light of the results of Theorem 10, Theorem 11, and Theorem 12, it is reasonable to expect that if a solution of the coarse level is prolongated and used as an initial point to solve the fine level model with an infeasible IPM, then the complexity will again depend on the quantities ε_1 and ε_2 defined in Theorem 10 and Theorem 11, and on the duality gap of the coarse solution. In this subsection we will use the results of the algorithm proposed in [31] to show that its complexity can be improved as long as the values of ε_1, ε_2, and the coarse duality gap are small. Thus the proposed approach is reminiscent of one-way multigrid methods; i.e., we start at the bottom with the coarsest model, and then use the solution of the coarse model to initialize the solution of the model one level up. Our results in the next section will show that this approach can yield significant benefits.
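The one-way scheme can be sketched as a simple driver loop; `solve` and `prolong` are hypothetical placeholders for the interior point solve and the operators (13), (14), (18), not actual solver APIs:

```python
def one_way_multigrid(levels, solve, prolong):
    """One-way multigrid (sketch): solve the coarsest model cold, then
    repeatedly prolongate the solution and warm-start the solver one
    level up.  `levels` is ordered from coarsest to finest."""
    point = solve(levels[0], start=None)            # cold start at the bottom
    for model in levels[1:]:
        point = solve(model, start=prolong(point))  # warm start one level up
    return point
```

The scheme never returns to a coarser level, which is what distinguishes it from V-cycle multigrid.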
If the infeasible IPM proposed in [31] is used to solve the SDP relaxation at level t = 0 with feasible or nearly feasible starting points (y^{n,0}, S^{n,0}, X^{n,0}) ∈ N(γ, τ_0), then it will terminate in at most O(√N ln(ε_0/ε)) iterations (see Theorem 3.7 in [31]), where ε is the user-specified solution accuracy, N = g(p, w)(n − p + 1), and ε_0 depends on the initial iterates X^{n,0}_k, S^{n,0}_k. Let y^{n−1}, S^{n−1}, X^{n−1} be a feasible point for the coarse problem, and use (13), (14), and (18) with i_0 = j_0 to calculate the fine level point as y^{n,0} = P_y(y^{n−1}), S^{n,0} = P_S(S^{n−1}), and X^{n,0} = P_X(X^{n−1}). Then, using Theorem 10, Theorem 11, and Corollary 13, and setting µ equal to the normalized coarse duality gap, we obtain a bound in terms of µ, ε_1, and ε_2. If (y^{n,0}, S^{n,0}, X^{n,0}) ∈ N(γ, τ_0), this inequality shows how smaller values of µ, ε_1, and ε_2 can reduce the maximum number of iterations needed to achieve a solution with tolerance ε (smaller values of µ are expected if the initial coarse point is close to the coarse solution). Although it is not possible to guarantee that any prolongated solution of the coarse level will belong to N(γ, τ_0) for some γ ∈ (0, 1), the next result shows that if ε_1 and ε_2 are small enough, and the coarse point is close to the infeasible central path of the coarse relaxation, then the prolongated point remains close to the infeasible central path of the fine relaxation.

Theorem 15. Under the assumptions of Theorem 12, if X^{n−1}_k ≻ 0 and S^{n−1}_k ≻ 0 (k = 1, 2, ..., n − p) and µ > 0, then τ_0 > 0 and ρ(X^{n,0}, S^{n,0}, τ_0) is bounded by a combination of ε_1 and ε_2 with constants w_1, w_2 < ∞ that depend only on the parameters n, p and the order of the relaxation w.
Proof. Replacing S^{n,0} and X^{n,0} by the prolongated coarse solutions, note that λ_i(X^{n,0}_{i_0} S^{n,0}_{i_0}) > 0 for all i because X^{n,0}_{i_0} and S^{n,0}_{i_0} are positive definite (see Corollary 7.6.2 in [14]), and therefore ⟨X^{n,0}_{i_0}, S^{n,0}_{i_0}⟩ = Tr(X^{n,0}_{i_0} S^{n,0}_{i_0}) = Σ_i λ_i(X^{n,0}_{i_0} S^{n,0}_{i_0}) > 0. From this it follows that τ_0 > (1/N)(N − g(p, w))µ > 0. To prove the bound for ρ(X^{n,0}, S^{n,0}, τ_0), first replace X^{n,0}_{i_0} and S^{n,0}_{i_0} by the prolongated coarse points to obtain the matrix Q_{i_0}, with ‖Q_{i_0}‖ ≤ µ + g(p, w) ε_1 ε_2 (here we used the fact that, under the assumptions of Theorem 12, the relevant coarse differences are bounded by ε_2). Also, using the Bauer–Fike theorem (see Theorem 6.3.2 in [14]) we can deduce (28). Let g = g(p, w). Using (28) and the bounds for Q_k, the claim follows up to a factor N^{1/2} g, where (27) was used to replace µ − τ_0 by Tr(Q_{i_0}).

6. Numerical experiments. In this section we present numerical experiments to evaluate the performance of the prolongation operators. We implement two types of tests. The first set of experiments aims to illustrate how close a prolongated solution is to optimality. The second applies the operators in a one-way multigrid fashion along with an infeasible IPM to solve the resulting problems. The goal of this second test is to compare a basic multigrid method with a pure interior point algorithm.
To generate the problems, we use the package SparsePOP version 3.0 [37], which is an implementation of the algorithm in [36], and to solve the SDP relaxations, we use the infeasible IPM implemented in SDPT3 version 4.0 [34]. In order to obtain problems with a unique solution, we perturb every polynomial by adding a small linear term (see [36]). If d is the degree of F_n, then the order of each relaxation is taken as w = d/2. We also note that for the problems considered in this section, the maximal cliques created by SparsePOP are identical to those described in section 4.
The infeasibility measures and gap for (y^i, S^i, X^i) at level n = i are defined as follows:
• Primal feasibility: p_feas.
• Dual feasibility: d_feas.
On the main diagonal, the block matrices X^i and S^i contain the matrices X^i_1, X^i_2, ..., X^i_{i−p+1} and S^i_1, S^i_2, ..., S^i_{i−p+1}, respectively, while C^i contains i − p + 1 copies of the matrix C, and b^i = {b^i_α}_{α ∈ F^i}. If ε is the given tolerance level, then SDPT3 will stop when p_feas, d_feas, and gap are all less than ε.
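The gap criterion sums trace inner products over the diagonal blocks; below is a minimal unnormalized sketch (ours; SDPT3's actual normalization is omitted):

```python
def block_gap(Xs, Ss):
    """Duality gap surrogate: sum over blocks of <X_k, S_k> = Tr(X_k S_k)
    for symmetric blocks stored as dense nested lists."""
    gap = 0.0
    for X, S in zip(Xs, Ss):
        gap += sum(X[i][j] * S[i][j]
                   for i in range(len(X)) for j in range(len(X)))
    return gap
```

For symmetric matrices the entrywise sum Σ_{i,j} X_ij S_ij coincides with Tr(X S), which is why the code never forms the matrix product.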
Our set of problems includes two classical test functions for global optimization: the Broyden tridiagonal function [24] and the generalized Rosenbrock function. The second set of test problems follows the approach in [27] to solve problems that arise when a finite difference method is used to solve nonlinear boundary value problems. The boundary value problem of finding a function x(t) such that

f(t, x(t), x'(t), x''(t)) = 0,  x(a) = x_a,  x(b) = x_b,  a ≤ t ≤ b,

can be solved numerically by uniformly discretizing the domain with step size h, using central differences to approximate the derivatives of x(t), and then solving the resulting system of polynomial equations

f(t_k, x_k, (x_{k+1} − x_{k−1})/(2h), (x_{k+1} − 2x_k + x_{k−1})/h^2) = 0,  k = 1, 2, ..., n.

This system can be solved by minimizing the sum of the squares of the functions f_k. We use this approach to solve nine nonlinear boundary value problems for the case where all problem data are given by polynomials. To formulate the constrained problems as in (1), the constraints for the first and last variables are eliminated by introducing them into the objective function. Note that for the boundary value problems the function F_n does not follow exactly the structure of (1). However, when the number of points in the grid is large (i.e., h is small) the difference between the functions goes to zero and the formulation follows the same structure. The nine nonlinear differential equation problems we considered are given in Table 2.
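The discretize-and-square construction can be sketched for the simplified case x'' = f(t, x) (we drop the x' dependence for brevity, and grid values are plain floats rather than SDP variables):

```python
def bvp_residuals(x_inner, xa, xb, h, f):
    """Central-difference residuals of x'' = f(t, x) on a uniform grid
    with Dirichlet values xa, xb; x_inner holds the interior values."""
    x = [xa] + list(x_inner) + [xb]
    res = []
    for k in range(1, len(x) - 1):
        xpp = (x[k + 1] - 2 * x[k] + x[k - 1]) / h ** 2
        res.append(xpp - f(k * h, x[k]))
    return res

def objective(x_inner, xa, xb, h, f):
    """Sum of squared residuals: the polynomial objective handed to
    the sparse SDP relaxation."""
    return sum(r * r for r in bvp_residuals(x_inner, xa, xb, h, f))
```

For the toy problem x'' = 2, x(0) = 0, x(1) = 1, the exact solution x(t) = t^2 makes the objective vanish on the grid.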
6.1. Prolongated variables results. In section 5 we discussed how the prolongation operators can be defined for a single level, i.e., t = 1. If t > 1 in the relaxation of the coarse problem (10), we can prolongate points in the coarse level space n − t into level n by applying the one-level operators t times. In order to allow for t > 1 we start by selecting i_0 ∈ {n_1 + p + 1, n_1 + p + 2, ..., n_2 − t − 1} and use it to define the prolongation operators applied to the points (y^{n−t}, S^{n−t}), and select j_0 ∈ {2, 3, ..., n − t − p} for the prolongation of X^{n−t} to obtain the points one level up: (y^{n−t+1}, S^{n−t+1}, X^{n−t+1}). Then the process is repeated for the points at level n − t + 1 by selecting i_0 ∈ {n_1 + p + 1, n_1 + p + 2, ..., n_2 − t}, j_0 ∈ {2, 3, ..., n − t − p + 1} for the prolongation of (y^{n−t+1}, S^{n−t+1}) and X^{n−t+1}, respectively, and so on until level n is reached.
Using the process described above we now specify prolongation operators for more than one level. In the case of the dual operator (18), Theorem 11 suggests that the integer j_0 should be selected such that ε_2 is small. We found that for our problems, selecting j_0 = ⌊(n − t − p + 1)/2⌋ (i.e., selecting the middle) returned small values for ε_2. From this observation, our strategy consists of prolongating X^{n−t} using (18) t consecutive times, setting j_0 = ⌊(n − t − p + 1)/2⌋ for all the prolongations. If X^n is obtained following the previous description, we can write it as
(29) X^n_l = X^{n−t}_l if l ∈ {1, 2, ..., j_0 − 1}, X^{n−t}_{j_0} if l ∈ {j_0, j_0 + 1, ..., j_0 + t − 1}, X^{n−t}_{l−t} if l ∈ {j_0 + t, ..., n − p + 1}.
It is possible to take two different approaches for the primal variables. The first is a linear operator similar to the one used for the dual variables. The prolongation for S^{n−t} follows the same criterion as the one used for the dual variable using j_0. The prolongated primal variable y^n is defined analogously in (30), with i_0 = j_0 = ⌊(n − t − p + 1)/2⌋. We found this approach effective for the Broyden tridiagonal and generalized Rosenbrock problems. Consider the solution for the generalized Rosenbrock function; in this case the optimal polynomial variables extracted after solving the SDP relaxation are x_1 ≈ −1 and x_j ≈ 1 for j ≥ 2 independent of the number n in F_n, and therefore y_{α^−} − y_α ≈ 0 for any α ∈ supp(F_n) with i ≥ 2 (note that the primal variables y_α replace the monomials x^α to obtain the SDP relaxation; then we expect that |x^{α^−} − x^α| should be approximately equal to |y_{α^−} − y_α|, which determines ε_1 in Theorem 10). The Broyden tridiagonal test problem also has a constant solution for most of its variables, and therefore the prolongation works well for this problem too. In general we expect this kind of operator to work well when the difference between the variables in the original formulation is small or constant. For the boundary value problems, the use of the linear operators for the primal coarse
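In block form, applying (18) t consecutive times with a fixed j_0 amounts to copying one coarse block. The minimal sketch below shows this pattern using 0-based Python indexing; the block-duplication structure is assumed from our reading of (29), so it is an illustration rather than the paper's implementation.

```python
def prolong_dual(X_blocks, t, j0=None):
    """Prolongate a list of coarse dual blocks t levels up by repeating
    block j0 an extra t times; all other blocks keep their relative
    order (blocks after j0 are shifted by t)."""
    if j0 is None:
        j0 = len(X_blocks) // 2           # pick the middle block
    return X_blocks[:j0 + 1] + [X_blocks[j0]] * t + X_blocks[j0 + 1:]
```

The coarse level has n − t − p + 1 blocks, so the prolongated list has the n − p + 1 blocks expected at the fine level.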
variables gave prolongated points at fine levels with large values of p_feas for many of the problems. This is due to the fact that the difference between consecutive variables (see Theorem 10(b)) is only small when the corresponding points in the grid are close enough, which is what makes the linear operator useful. Although the difference between the variables of the solution in the polynomial space should decrease as the number of points in the grid increases (assuming convergence of the method), in our experiments the value of n used was not large enough to make the linear operator useful. To address this issue, we defined a second prolongation for the primal variable that directly follows the ideas of multigrid theory (see, for example, [6]). Let y^{n_c} be a point for the coarse level n_c of the primal relaxation of a nonlinear differential equation with boundary conditions x_0 and x_{n_c+1}. If e_j ∈ N^n is the unit vector with a one in position j, then the nonlinear prolongation y^n = P^{nl}_y(y^{n_c}) from level n_c to level n is calculated as
(31) y^n_α = (y^n_{e_1})^{α_1} (y^n_{e_2})^{α_2} ··· (y^n_{e_n})^{α_n}, α ∈ F_n,
where y^n_{e_{2j}} = y^{n_c}_{e_j} and y^n_{e_{2j−1}} = (y^{n_c}_{e_{j−1}} + y^{n_c}_{e_j})/2, with y^{n_c}_{e_0} = x_0 and y^{n_c}_{e_{n_c+1}} = x_{n_c+1}. The operator calculates the first moment fine variables (y^n_{e_i}) by using a linear interpolation of the first moment coarse variables (y^{n_c}_{e_i}), and then calculates the rest of the moments in a nonlinear way. It is not difficult to see that this operator gives feasible points for any y^{n_c} in the sense that Σ_{α∈F_n} A_{(l,α)} [P^{nl}_y(y^{n_c})]_α + C ⪰ 0 for l = 1, 2, ..., n − p + 1. Therefore, there is no need to define an operator for the variable S^{n_c}, and hence we take S^n_l = Σ_{α∈F_n} A_{(l,α)} [P^{nl}_y(y^{n_c})]_α + C. Note that in practice it may not be known which of the two prolongations to use. Since both prolongations are computationally inexpensive, the best approach is to calculate both and use the one that provides the least error.
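A sketch of the nonlinear operator (31) for a dyadic grid (n = 2n_c + 1): the first moments are interpolated linearly, with the boundary values included at the ends, and every higher moment is rebuilt as a product of powers. The function name and the dictionary representation of y are our own choices.

```python
import numpy as np

def prolong_nonlinear(y1_coarse, x0, xend, alphas):
    """Nonlinear prolongation of the primal moments from a coarse grid
    with nc interior points to a fine grid with n = 2*nc + 1 points.
    `alphas` is the fine-level support: each alpha is a length-n tuple
    of exponents."""
    yc = np.concatenate(([x0], y1_coarse, [xend]))   # include boundary values
    fine = []
    for j in range(len(yc) - 1):
        fine.append(yc[j])
        fine.append(0.5 * (yc[j] + yc[j + 1]))       # midpoint interpolation
    fine.append(yc[-1])
    y1_fine = np.array(fine[1:-1])                   # drop the boundary entries
    # every moment is a product of powers of the first moments
    return {a: float(np.prod(y1_fine ** np.array(a))) for a in alphas}
```

Because the output is the moment vector of a single point, the associated moment matrices are rank-one and positive semidefinite, which is the feasibility property noted above.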
In our first set of experiments we evaluate the performance of the prolongated solutions in terms of feasibility and gaps at the fine level. For n = 1000 we prolongate the solutions of the coarse model with n/2 variables and calculate feasibility and optimality measures of the new points (the SDPT3 tolerance was set to 10^−7). Table 1 shows the results for the Broyden tridiagonal and the generalized Rosenbrock functions using the linear operators (29) and (30). The first three columns of each row contain the results when the prolongated variables are used in the fine level n, and the last three columns contain the information {p_feas, d_feas, gap} for the coarse model (n_c = n/2). In both problems the prolongated solutions give points with infeasibility and duality gap no greater than 10^−4. The same exercise was done for the boundary value problems but using the nonlinear operator (31) and size n + 1 = 500 (see Table 2). In this case the coarse model has n_c = (n − 1)/2 variables. The results show how the prolongated solutions can provide good initial points to use, for example, as initial guesses for an algorithm to solve the fine level. This idea will be explored in the next subsection. Note also that SDPT3 cannot solve some of the coarse problems for the differential equations to the desired accuracy (10^−7). We will also address this point in the next subsection.

6.2. A multigrid approach to solve the fine problems. In our final set of experiments we use a one-way multigrid approach to solve large scale boundary value problems. In the previous subsection we observed that even when the size of the problem is moderate, SDPT3 could not solve many of the relaxations for nonlinear differential equations to the required tolerance. We overcome this problem for a large number of cases using a multigrid approach in conjunction with an infeasible IPM (in our experiments we use SDPT3). We describe the method in Algorithm 1. Let IPM(A, b, C, y^0, S^0, X^0, ε) be a function that uses an infeasible IPM to solve the SDP problem with parameters A, b, C, using the initial point (y^0, S^0, X^0). This function returns a solution (y, S, X) that satisfies the required error tolerance ε. If no initial point is given, we write IPM(A, b, C, [ ], [ ], [ ], ε) (in which case we use the default initialization procedure of the algorithm). Also, let P_{X,j_0} be the prolongation operator defined by (29) for some n_1 + p + 1 ≤ j_0 ≤ n_2 − 2, and let P^{nl}_y be the nonlinear operator defined by (31). The parameter L indicates the total number of levels, including the fine level, that are going to be used. If the goal is to solve a problem with n variables, the method uses L − 1 coarse levels with n_i = n_{i+1}/2 variables at level i (n_L = n). Then the first coarse level (n_1 variables) is solved to the accuracy [tol]_1 with no initial point provided, and the prolongated solution is used as a starting point to solve the second coarse level (n_2 variables) to an accuracy of [tol]_2. The process is repeated until level L, corresponding to the fine level, is solved to an accuracy of [tol]_L.
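The structure of this one-way scheme can be sketched as a short driver. Here `build_sdp`, `solve_ipm`, and `prolong` are placeholders for the relaxation construction, the infeasible IPM call, and the prolongation operators, not actual solver APIs.

```python
def one_way_multigrid(build_sdp, solve_ipm, prolong, n, L, tols):
    """One-way multigrid driver in the spirit of Algorithm 1.

    build_sdp(n)                     -> (A, b, C) for the level with n variables
    solve_ipm(A, b, C, start, tol)   -> solution point; start may be None
    prolong(point, n_coarse, n_fine) -> prolongated starting point
    tols                             -> list of L tolerances, coarsest first."""
    sizes = [n]
    for _ in range(L - 1):
        sizes.append(sizes[-1] // 2)  # n_i = n_{i+1} / 2
    sizes.reverse()                   # coarsest ... finest (n_L = n)

    point = None                      # default IPM initialization at level 1
    for level, (ni, tol) in enumerate(zip(sizes, tols)):
        A, b, C = build_sdp(ni)
        point = solve_ipm(A, b, C, point, tol)
        if level < L - 1:             # warm-start the next finer level
            point = prolong(point, ni, sizes[level + 1])
    return point
```

Each coarse solve is cheap relative to the fine solve, and the prolongated point replaces the default IPM initialization at every level but the first.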
There is a trade-off between solving the coarse levels to a high accuracy and the CPU time used to achieve it. The coarse solution must provide meaningful information to the fine level. However, computing a highly accurate coarse solution may not be the most efficient use of CPU time; even an exact solution to the coarse model is at best an approximation for the fine model. We found that starting with an accuracy of 10^−4 for the coarse level with the fewest variables, and slowly increasing the accuracy as the number of variables increases, is a good rule of thumb. We use Algorithm 1 in conjunction with an infeasible IPM (SDPT3) to solve the SDP relaxation for the nine nonlinear differential equations to an accuracy of 10^−7 (i.e., [tol]_L = 10^−7). Table 3 reports, for each differential equation, how many of the 99 relaxations were solved by each method. SeDuMi improves over the SDPT3 results by solving 356 relaxations. Using the multigrid approach Multi_2 we are able to solve 682 relaxations, with problems 4, 5, and 9 having the worst results with only 10, 76, and 16 relaxations solved. Using more than two levels, we can match or increase the number of relaxations solved by Multi_2, with a total of 799 relaxations solved (except in problem 7, where we saw a small difference of six additional relaxations solved by Multi_2). In particular, for problems 4 and 5 Multi_{L≥2} can solve 97 of the 99 relaxations. We attribute the improvement to the fact that with more levels the algorithm can find a better solution to prolongate, because smaller problems are easier to solve to a higher tolerance. We also tried two level experiments with 10^−7 accuracy for the coarse level, but the results did not improve with respect to Multi_2 due to the inability of SDPT3 to solve the relaxation to that accuracy.
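One simple way to realize this rule of thumb is a geometric tolerance schedule from the coarsest to the finest level; the function and its default values are our illustration, not prescribed by the paper.

```python
def tolerance_schedule(L, coarse_tol=1e-4, fine_tol=1e-7):
    """Tolerances [tol]_1, ..., [tol]_L, tightening geometrically from
    the loose coarse tolerance to the target fine tolerance."""
    if L == 1:
        return [fine_tol]
    ratio = (fine_tol / coarse_tol) ** (1.0 / (L - 1))
    return [coarse_tol * ratio**i for i in range(L)]
```

For example, with L = 4 levels this produces the schedule 10^−4, 10^−5, 10^−6, 10^−7.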
Multi_{L≥2} performed well, except for problem 9, where it was only able to solve 27 relaxations. Upon further investigation we found that for problem 9 the finite difference scheme used in this paper is not a consistent approximation of the underlying differential equation. However, the resulting POP shares the same polynomial structure as the other problems considered in this paper. We therefore included this problem in our numerical experiments to see if the inconsistency of the discretization scheme has a large effect. Nevertheless, it is important to remark that Multi_{L≥2} was able to solve 55 and 94 relaxations to an accuracy of 10^−6 and 10^−4, respectively, while SDPT3 solved 6 and 12 relaxations, and SeDuMi 9 and 99 relaxations, to the same respective accuracies.
When comparing how many models are solved as a function of the size (see Figure 1), we observe that as n becomes larger, both SDPT3 and SeDuMi have difficulties solving the sparse relaxations.In contrast, the multigrid approach with more than two levels is able to solve almost all the problems (except problem 9) independent of the size.
To investigate the reasons for the increase in performance of the multigrid approach, we report the condition number of the Schur-complement matrix for the last iteration performed by SDPT3 and Multi_{L≥2} (see [34] for more on the Schur-complement matrix). In [32], it was shown how exploiting sparsity for SDP problems using chordal completion and maximal clique decomposition approaches (like those used by [36] and this paper) may lead to SDP problems that are primal degenerate. As a consequence, the numerical experiments in [32] show an increase in the condition number of the Schur-complement matrix when IPMs are used to solve the sparse problem, compared with the condition number of the Schur-complement matrix of the IPM applied to the nonsparse semidefinite program. Table 4 shows, for each problem, in how many of the 99 relaxations SDPT3 had a larger condition number for the last iteration than Multi_{L≥2} ("# CN_SDPT3 > CN_Multi_{L≥2}"), the average of the ratio between the condition number of the last iteration using SDPT3 and the condition number of the last iteration using Multi_{L≥2} ("mean CN_SDPT3/CN_Multi_{L≥2}"), and the minimum and maximum ratios. With the exception of problem 9, for most of the problems the condition number is larger for the pure IPM compared with Multi_{L≥2}, which could explain the results of the multigrid approach. However, in some cases the condition number was larger for the multigrid approach, yet SDPT3 could not solve the problem and Multi_{L≥2} could. More research is needed to determine whether this is the only reason for the superior performance of the multigrid method, and exactly how the approach helps in the case of SDP problems with degenerate solutions.
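For reference, the quantity reported in Table 4 can be computed as the condition number of the Schur-complement matrix with entries M_ij = trace(A_i X A_j S^{-1}), the matrix an infeasible IPM factorizes at every iteration under the HKM direction. The single-block sketch below is a simplification of what SDPT3 assembles blockwise.

```python
import numpy as np

def schur_complement_cond(A, X, S):
    """Condition number of the Schur-complement matrix
    M_ij = trace(A_i X A_j S^{-1}) at the current iterate (X, S) for
    constraint matrices A = [A_1, ..., A_m] (single-block sketch)."""
    Sinv = np.linalg.inv(S)
    m = len(A)
    M = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            M[i, j] = np.trace(A[i] @ X @ A[j] @ Sinv)
    return np.linalg.cond(M)
```

Near a degenerate solution this matrix becomes increasingly ill-conditioned, which is the effect the comparison in Table 4 is designed to expose.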
In the final set of experiments, we use Algorithm 1 with the same settings as in Multi_2 and Multi_{L≥2}, but we change the 10^−7 tolerance at the fine level. In particular, we set [tol]_L in Algorithm 1 equal to the maximum of the feasibility and duality gap measures for the fine model when it is solved by SDPT3. We repeat this exercise using the maximum of the feasibility and duality gap measures for the fine model obtained by SeDuMi (since SeDuMi uses different feasibility measures than SDPT3, we calculate p_feas, d_feas, and gap using the solutions reported by the solver so that they match the SDPT3 criteria). For these experiments we only considered the cases where SDPT3 and SeDuMi achieved at least a 10^−4 tolerance. We show the results in three tables based on the number of variables n: small size (Table 5, where n = 20, 30, ..., 100), medium size (Table 6, where n = 110, 120, ..., 500), and large size (Table 7, where n = 510, 520, ..., 1000). Each table is divided into two parts: the first part shows the comparison with SDPT3, and the second part shows the comparison with SeDuMi; the second part has the same structure as the first. For each of the nine differential equations, the first part shows the number of relaxations solved by SDPT3 to at least 10^−4 accuracy ("# solved SDPT3"), how many of those models were also solved by Multi_2 and Multi_{L≥2} to the same accuracy obtained by SDPT3 ("# solved Multi_2," "# solved Multi_{L≥2}"), the percentage of the relaxations solved to the same accuracy where the time spent by SDPT3 was larger than the multigrid time ("% faster Multi_2," "% faster Multi_{L≥2}"), and, in the last two rows, the average of the ratio between the CPU time required by SDPT3 and the CPU time required by the multigrid approach.

Our theoretical results suggest that if some easy-to-check conditions are satisfied, then we should expect a significant reduction in complexity by integrating multigrid ideas with infeasible IPMs. Our numerical results back up our theoretical analysis. In particular, we showed how the multigrid approach can improve the robustness of IPMs and reduce solution times, especially for medium to large problems and when high accuracy solutions are required. This work suggests some interesting directions for future research. Using a basic one-way multigrid approach, we observed substantial improvements in practical applications. Therefore, it would be interesting to implement a full multigrid algorithm that may include V- and W-cycles. Another obvious extension is to study the constrained cases such as in [21, 23], where we would expect similar patterns to those described in this work. We concentrated on SDP relaxations arising from global optimization problems, particularly the solution of boundary value problems; however, similar ideas are applicable in other settings such as moment relaxations of optimal control problems [19]. Finally, in our work we used the sparse polynomial relaxation developed in [36]. Recently a sparse relaxation was proposed in [38] with very promising theoretical and numerical results. The application of multigrid and the development of prolongation operators for this new hierarchy are also likely to lead to improvements.
, and then using (a). (e) This equality is obtained by replacing F by ∪_{k=1}^{n} B_k and then eliminating the zero matrices according to (b).

Fig. 1. Comparison of relaxations solved by SDPT3, SeDuMi, and the multigrid algorithm as a function of the size of the problem (n) for the nonlinear differential equations.

Table 1
Feasibility and gaps of projected variables for Broyden tridiagonal and generalized Rosenbrock functions (n = 1000).

Table 2
Feasibility and gaps of projected variables for the nonlinear differential equations (n + 1 = 500).

Table 3
Comparison between relaxations solved by SDPT3, SeDuMi, and the multigrid approach for the nonlinear differential equations. The total number of relaxations per differential equation is 99.

Table 4
Condition number of the Schur-complement matrix for the last iteration at the fine level using SDPT3 and Multi_{L≥2} for the nonlinear differential equations.