Optimizing hypergraph-based polynomials modeling job-occupancy in queueing with redundancy scheduling

We investigate two classes of multivariate polynomials with variables indexed by the edges of a uniform hypergraph and coefficients depending on certain patterns of unions of edges. These polynomials arise naturally in modeling job-occupancy in some queueing problems with a redundancy scheduling policy. The question, posed by Cardinaels, Borst and van Leeuwaarden (arXiv:2005.14566, 2020), is to decide whether their global minimum over the standard simplex is attained at the uniform probability distribution. By exploiting symmetry properties of these polynomials we can give a positive answer for the first class and partial results for the second one, where we in fact show a stronger convexity property of these polynomials over the simplex.


Introduction
We consider the minimization of a class of polynomials over the standard simplex. These polynomials have their variables labelled by the edges of a complete uniform hypergraph and their coefficients are defined in terms of certain cardinality patterns of unions of edges. They arise naturally within the modelling of job-occupancy in some queueing problems with redundancy scheduling policies [3]. The question is whether these polynomials attain their minimum value at the barycenter of the standard simplex, which corresponds to showing optimality of the uniform distribution for the underlying queueing problem. This paper is devoted to this question.
We now introduce the classes of polynomials of interest. Given integers n, L ≥ 2 we set V = [n] = {1, . . . , n} and E = {e ⊆ V : |e| = L}, so that (V, E) can be seen as the complete L-uniform hypergraph on n elements. We set m := |E| = \binom{n}{L}, where we omit the explicit dependence on n, L to simplify notation, and we let ∆_m := {x ∈ R^m : x_e ≥ 0 (e ∈ E), Σ_{e∈E} x_e = 1} denote the standard simplex in R^m. The elements of ∆_m correspond to probability vectors on m items and the barycenter x* = (1/m)(1, . . . , 1) of ∆_m corresponds to the uniform probability vector.
Given an integer d ≥ 2 we consider the following m-variate polynomial in the variables x = (x_e : e ∈ E), which is a main player in the paper:

p_d(x) = Σ_{(e_1,...,e_d)∈E^d} x_{e_1} · · · x_{e_d} / |e_1 ∪ · · · ∪ e_d|.    (1)

So p_d is homogeneous with degree d. We are interested in the following optimization problem p*_d := min_{x∈∆_m} p_d(x), asking to minimize the polynomial p_d over the simplex ∆_m. Our main objective is to show that the global minimum is attained at the uniform probability vector x*. The following is the main result of the paper.

Theorem 1. For any integers n, L, d ≥ 2 the global minimum of the polynomial p_d from (1) over the standard simplex ∆_m is attained at the barycenter x* = (1/m)(1, . . . , 1) of ∆_m.

There is a second related class of polynomials of interest, also homogeneous with degree d:

f_d(x) = Σ_{(e_1,...,e_d)∈E^d} Π_{i=1}^{d} x_{e_i} / |e_1 ∪ · · · ∪ e_i|.    (2)

Note that for d = 1 both polynomials coincide: p_1 = f_1 and, for d = 2, we have f_2 = (1/L) p_2. Here too the question is whether the minimum of f_d over the standard simplex ∆_m is attained at the uniform probability vector x*.

Question 1. Given integers n, d, L ≥ 2, is it true that the polynomial f_d(x) in (2) attains its minimum over ∆_m at the barycenter x* of ∆_m?
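As a quick numerical illustration, p_d can be evaluated by brute force for small parameters. The sketch below assumes the form p_d(x) = Σ_{(e_1,...,e_d)∈E^d} x_{e_1}···x_{e_d}/|e_1∪···∪e_d| (coefficients given by the reciprocal of the cardinality of the union, consistent with the relation f_2 = (1/L)p_2); exact rational arithmetic avoids rounding issues.

```python
from itertools import combinations, product
from fractions import Fraction

def p_d(x, n, L, d):
    """Evaluate p_d(x) = sum over d-tuples of edges of x_{e1}...x_{ed}/|e1 u ... u ed|.

    x: dict mapping each L-subset (frozenset) of {1,...,n} to its weight.
    (Assumed form of p_d; a sketch, not the paper's verbatim definition.)
    """
    E = [frozenset(e) for e in combinations(range(1, n + 1), L)]
    total = Fraction(0)
    for tup in product(E, repeat=d):
        union = frozenset().union(*tup)
        term = Fraction(1, len(union))
        for e in tup:
            term *= x[e]
        total += term
    return total

n, L, d = 4, 2, 3
E = [frozenset(e) for e in combinations(range(1, n + 1), L)]
m = len(E)

# Value at the barycenter of the simplex (m = 6 edges).
x_star = {e: Fraction(1, m) for e in E}
val_star = p_d(x_star, n, L, d)

# Compare with a vertex of the simplex (all mass on one edge): there only the
# tuple (e1, ..., e1) contributes, with coefficient 1/|e1| = 1/L.
x_vertex = {e: Fraction(0) for e in E}
x_vertex[E[0]] = Fraction(1)
assert p_d(x_vertex, n, L, d) == Fraction(1, L)
assert val_star < p_d(x_vertex, n, L, d)   # the barycenter does better
```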
As noted above the answer is positive for d = 2 and n, L ≥ 2. As a further partial result we give a positive answer for the case of degree d = 3 and edge size L = 2.
Theorem 2. For d = 3 and L = 2 the global minimum of the polynomial f_d from (2) over the standard simplex ∆_m is attained at the barycenter x* = (1/m)(1, . . . , 1) of ∆_m.
As we will mention in the next section the above question about the polynomials f_d, posed by the authors of [3] (in the case L = 2), is motivated by its relevance to a problem in queueing theory. The polynomials f_d can be seen as a variant of the polynomials p_d and, as mentioned above, both classes coincide (up to scaling) for degree d = 2. For the polynomials p_d we can give a full answer and show that they indeed attain their minimum at the barycenter of ∆_m. The analysis of the polynomials f_d is technically much more involved and we have only partial results so far. In both cases the key ingredient is showing that the polynomials are convex, i.e., that they have positive semidefinite Hessians. It turns out that the Hessian of the polynomial p_d enters in some way as a component of the Hessian of the polynomial f_d. This forms a natural motivation for the study of the polynomials p_d, though they also form a natural class of symmetric polynomials that are interesting in their own right.
Exploiting symmetry plays a central role in our proofs. Indeed the key idea is to show that the polynomials are convex, which, combined with their symmetry properties, implies that the global minimum is attained at the barycenter of the simplex. For this we show that their Hessian matrices are positive semidefinite at each point of the simplex, which we do by again exploiting their symmetry structure and links to Terwilliger algebras.
Symmetry is a widely used ingredient in optimization, in particular in semidefinite optimization and algebraic questions involving polynomials. We mention a few landmark examples as background information. Symmetry can indeed be used to formulate equivalent, more compact reformulations for semidefinite programs. The underlying mathematical fact is Wedderburn-Artin theory, which shows that matrix *-algebras can be block-diagonalized (see Theorem 3 below). An early well-known example is the linear programming reformulation from [18] for the Lovász theta number of Hamming graphs, showing the link to the Delsarte bound and Bose-Mesner algebras of Hamming schemes [5,6]. Symmetry is used more generally to give tractable reformulations for the semidefinite bounds arising from the next levels of Lasserre's hierarchy in [19] (which gives the explicit block-diagonalization for the Terwilliger algebra of Hamming schemes, see Theorem 4 below) and, e.g., in [9], [10], [11], [12]. For more examples and a broad exposition about the use of symmetry in semidefinite programming we refer, e.g., to [1,4] and further references therein. Symmetry is also a crucial ingredient in the study of algebraic questions about polynomials, like representations in terms of sums of squares, and in polynomial optimization. We refer to [8] for a broad exposition and, e.g., to [17] (for compact reformulations of Lasserre relaxations of symmetric polynomial optimization problems), [16] (for methods to reduce the number of variables in programs involving symmetric polynomials), and the recent works [13,14] (which consider symmetric polynomials with variables indexed by the k-subsets of a ground set, as in our case, and uncover links with the theory of flag algebras by Razborov [15]).
Using the notation p_d(x) = Σ_{e=(e_1,...,e_d)∈E^d} c_e x_{e_1} · · · x_{e_d} from relation (8) below for the polynomial p_d, we show in Figure 1 the three possible patterns for pairs of edges e = (e_1, e_2) and the corresponding coefficients c_e. In the same way, we can decompose p_d as p_d(x) = Σ_k (1/k) q_{d,k}(x), where the summand q_{d,k}(x) := Σ_{e∈E^d : |e_1∪···∪e_d|=k} x_{e_1} · · · x_{e_d} is a summation over all d-tuples of edges with a given pattern, depending on the cardinality k of their union. For the case d = 3 we need to consider the values k = 2, 3, 4, 5, 6; as an illustration we show in Figure 2 all the possible patterns of triplets of edges e = (e_1, e_2, e_3) and the corresponding coefficients c_e that contribute to the summands q_{3,k}.

Organization of the paper. In the rest of this section we first indicate in Section 1.1 how the polynomials f_d naturally arise within a problem of queueing theory with redundancy scheduling policies. After that we present in Section 1.2 the main ideas of the proofs, which rely heavily on exploiting symmetry properties of the polynomials. This involves in particular using the Terwilliger algebra of the binary Hamming cube, so we include some preliminaries about these Terwilliger algebras in Section 1.3.
In Section 2 we give the full proof for Theorem 1 showing that the polynomials p d attain their global minimum at the barycenter of the simplex and, in Section 3, we investigate the second class of polynomials f d . We prove several properties of these polynomials, which we use to show Theorem 2. We also present a range of values of (n, d, L) for which the polynomials f d are indeed convex and thus Question 1 has a positive answer.
Some notation. Throughout we let I, J denote the identity matrix and the all-ones matrix, whose size should be clear from the context. When we want to specify the size we let I_n (resp., J_n) denote the n×n identity matrix (resp., all-ones matrix) and, given two integers n, m ≥ 1, J_{m,n} denotes the m×n all-ones matrix. For a symmetric matrix A the notation A ⪰ 0 means that A is positive semidefinite. Given two matrices A, B ∈ R^{n×n} we let A • B ∈ R^{n×n} denote their Hadamard product, with entries (A • B)_{ij} = A_{ij} B_{ij}. For a sequence α ∈ N^n we set |α| = Σ_{i=1}^{n} α_i and, for an integer d ∈ N, we set N^n_d = {α ∈ N^n : |α| = d}. Given a vector x ∈ R^n and α ∈ N^n we set x^α = x_1^{α_1} · · · x_n^{α_n}. Throughout we let u_1, . . . , u_m denote the standard basis of R^m, where all entries of u_i are 0 except its i-th entry which is equal to 1. We let Sym(n) denote the set of permutations of the set V = [n].

Motivation
Our motivation for the study of the polynomials p_d and f_d comes from their relevance to a problem in queueing theory. The question whether they attain their minimum at the uniform probability distribution was posed to us by the authors of [3], who use a positive answer to this question to establish a result about the asymptotic behaviour of the job occupancy in a parallel-servers system with redundancy scheduling in the light-traffic regime. In what follows we will give only a high level sketch of this connection and we refer to the paper [3] for a detailed exposition. We also refer to [3] for an extended review of the relevant literature.
A crucial mechanism that has been considered to improve the performance of parallel-servers systems in queueing theory is redundancy scheduling. The key feature of this policy is that several replicas are created for each arriving job, which are then assigned to distinct servers; as soon as the first of these replicas completes (or enters) service on a server, the remaining ones are stopped. The underlying idea is that sending replicas of the same job to several servers will increase the chance of having shorter queueing times. This however must be weighed against the risk of wastage of capacity. An important question is thus to assess the impact of redundancy scheduling policies. While most papers in the literature on redundancy scheduling assume that the set of servers to which the replicas are sent is selected uniformly at random, the paper [3] considers the case when the set of servers is selected according to a given probability distribution and it investigates what is the impact of this probability distribution on the performance of the system. It is shown there that while the impact remains relatively limited in the heavy-traffic regime, the system occupancy is much more sensitive to the selected probability distribution in the light-traffic regime.
We will now only introduce a few elements of the model considered in [3], so that we can make the link to the polynomials studied in this paper. We keep our presentation high level and refer to [3] for details. The setting is as follows. There are n parallel servers, with average speed µ. Jobs arrive as a Poisson process of rate nλ for some λ > 0. When a job arrives, L replicas of it are created that are sent, with probability x_e, to a subset e ⊆ [n] of L servers. Here, L ≥ 2 is an integer and x = (x_e)_{e∈E} is a probability distribution on the set E = {e ⊆ [n] : |e| = L} of possible collections of L servers. As noted in [3] this can be seen as selecting an edge e ∈ E with probability x_e in the uniform hypergraph (V = [n], E) (with edge size L).
An important performance parameter is the system occupancy at time t, which is represented by a vector (e_1, . . . , e_M) ∈ E^M, where M = M(t) is the total number of jobs present in the system and e_i ∈ E is the collection of servers to which the replicas of the i-th longest job in the system have been assigned. Under suitable stability conditions, and assuming each server has the same speed µ and the service requirements of the jobs are independent and exponentially distributed with unit mean, the stationary distribution of the occupancy for this edge selection is given by

π(e_1, . . . , e_M) = C Π_{i=1}^{M} (nλ x_{e_i}) / (µ |e_1 ∪ · · · ∪ e_i|)

for some constant C > 0 ([7], see relation (3) in [3]). Following [3], let Q_λ(x) be a random variable with the stationary distribution of the system occupancy when the edge selection is given by the probability vector x = (x_e)_{e∈E}. It then follows that, for any integer d ≥ 1, the probability that d jobs are present in the system is given by

P{Q_λ(x) = d} = C (nλ/µ)^d Σ_{(e_1,...,e_d)∈E^d} Π_{i=1}^{d} x_{e_i} / |e_1 ∪ · · · ∪ e_i|.

Hence, P{Q_λ(x) = 0} = C and P{Q_λ(x) = d} = C (nλ/µ)^d f_d(x) (see relation (11) in [3]). Therefore, P{Q_λ(x) = d} is the polynomial f_d(x) up to a scalar multiple. In [3] the light-traffic regime is considered, i.e., the limit λ ↓ 0, in the case L = 2. By doing a Taylor expansion in λ one can see that the leading-order behaviour of the mean system occupancy E[Q_λ(x)] as λ ↓ 0 is governed by the polynomials f_d(x) (see relation (13) in [3]). Therefore, with x* = (1, . . . , 1)/|E| denoting the uniform probability vector, if the polynomial f_d attains its minimum over ∆_m at the uniform distribution x*, then E[Q_λ(x*)] ≤ E[Q_λ(x)] for all x ∈ ∆_m when λ is small enough. This indicates that in the light-traffic regime the system occupancy is minimized when selecting uniformly at random the assignments to the servers of the job replicas. This thus motivates Question 1 of showing that the polynomial f_d attains its minimum over the probability simplex at the uniform point x*.

Sketch of proof
Here we give a sketch of proof for our main results. We start with indicating the main steps for proving Theorem 1, dealing with the class of polynomials p_d, and after that we briefly indicate how to deal with the polynomials f_d. A first easy observation is that in order to show that the polynomial p_d attains its minimum at the barycenter of the standard simplex ∆_m it suffices to show that p_d is convex over ∆_m. This follows from a symmetry argument, namely we exploit the fact that the polynomial p_d is invariant under the permutations of the edge set E that are induced by permutations of [n].

Lemma 1. Assume the polynomial p_d is convex over ∆_m. Then its global minimum over ∆_m is attained at the barycenter x* = (1/m)(1, . . . , 1) of ∆_m.

Proof. The key fact we use is that the polynomial p_d enjoys the following symmetry property: for any tuple (e_1, . . . , e_d) ∈ E^d and any σ ∈ Sym(n) we have c_{(σ(e_1),...,σ(e_d))} = c_{(e_1,...,e_d)}, since |σ(e_1) ∪ · · · ∪ σ(e_d)| = |e_1 ∪ · · · ∪ e_d|. Any σ ∈ Sym(n) induces a permutation of E by setting σ(e) = {σ(i) : i ∈ e} for e ∈ E. In turn, σ acts on ∆_m by setting σ(x) = (x_{σ(e)})_{e∈E} for x = (x_e)_{e∈E} ∈ ∆_m. We now observe that p_d is invariant under this action of permutations σ ∈ Sym(n). Indeed, for any σ ∈ Sym(n), we have p_d(σ(x)) = Σ_{(e_1,...,e_d)∈E^d} c_{(e_1,...,e_d)} x_{σ(e_1)} · · · x_{σ(e_d)} = p_d(x). Let x ∈ ∆_m be a global minimizer of p_d. For any permutation σ ∈ Sym(n) the permuted point σ(x) belongs to ∆_m and p_d(x) = p_d(σ(x)) holds. Hence, for the full symmetrization x* := (1/n!) Σ_{σ∈Sym(n)} σ(x) of x, we have x* ∈ ∆_m and p_d(x*) ≤ (1/n!) Σ_{σ∈Sym(n)} p_d(σ(x)) = p_d(x), where the inequality holds since p_d is convex over ∆_m. This shows that x* is again a global minimizer of p_d in ∆_m. It suffices now to observe that, by construction, x* = (1/m)(1, . . . , 1), since the action of Sym(n) is transitive on E.
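The symmetrization step in this argument can be checked numerically: averaging any point of the simplex over the orbit of Sym(n) lands at the barycenter, and for an invariant convex polynomial the value does not increase. A sketch for p_2 with n = 4, L = 2 (the form p_2(x) = Σ_{e,f} x_e x_f/|e ∪ f| is an assumption of this sketch):

```python
from itertools import combinations, permutations
from fractions import Fraction

n, L = 4, 2
E = [frozenset(e) for e in combinations(range(1, n + 1), L)]
m = len(E)

def p2(x):
    # Assumed form: p_2(x) = sum_{e,f in E} x_e x_f / |e u f|.
    return sum(Fraction(1, len(e | f)) * x[e] * x[f] for e in E for f in E)

# A non-uniform point of the simplex.
x = {e: Fraction(i + 1, m * (m + 1) // 2) for i, e in enumerate(E)}
assert sum(x.values()) == 1

# Average x over its Sym(n)-orbit: sigma acts by x_e -> x_{sigma(e)}.
avg = {e: Fraction(0) for e in E}
perms = list(permutations(range(1, n + 1)))
for sigma in perms:
    s = dict(zip(range(1, n + 1), sigma))
    for e in E:
        avg[e] += x[frozenset(s[v] for v in e)]
avg = {e: v / len(perms) for e, v in avg.items()}

# Full symmetrization lands at the barycenter, and (by convexity) the value drops.
assert all(v == Fraction(1, m) for v in avg.values())
assert p2(avg) <= p2(x)
```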
Therefore we are left with the task of showing that the polynomial p_d is convex over the simplex ∆_m or, equivalently, that its Hessian matrix is positive semidefinite over ∆_m. This forms the core technical part of the proof. Here is a rough sketch of our proof technique.
A first step is to express the Hessian matrix as a matrix polynomial, involving a collection of matrices M γ ; see Lemma 4. The next step is to show that each of the matrices M γ appearing in this decomposition of the Hessian is positive semidefinite. For this, we reduce to the task of showing that a certain set of well-structured matrices are positive semidefinite, see Lemmas 5 and 6. This last task is done by showing that these matrices lie in the Terwilliger algebra of the Hamming cube, which enables us to exploit its explicitly known block-diagonalization. The proof is then concluded by using an integral representation argument, see Section 2.3.
The treatment for the polynomials f_d has the same starting point: the polynomial f_d is invariant under any permutation of the edge set E induced by permutations of [n] and thus it suffices to show that f_d is convex in order to conclude that it attains its global minimum at the barycenter of the simplex (i.e., the analogue of Lemma 1 holds for f_d). After that we again express the Hessian matrix H(f_d) as a matrix polynomial, involving a collection of matrices Q_γ; see Lemma 9. Hence here too the task boils down to showing that each of these matrices Q_γ is positive semidefinite. This task turns out to be considerably more difficult than for the matrices M_γ which occurred in the analysis of the polynomial p_d. As a first step toward the analysis of the matrices Q_γ we give a recursive reformulation for them, which also makes apparent how the matrices M_γ enter their definition (namely as a factor of a Hadamard product in the definition of Q_γ); see Lemma 12. Based on this we can show that the matrices Q_γ are indeed positive semidefinite in the case d = 3 and L = 2, thus showing Theorem 2; see Section 3.2.

Preliminaries on the Terwilliger algebra
As mentioned above we need to exploit the symmetry structure of the polynomial p_d in order to show that its Hessian matrix is positive semidefinite. A crucial ingredient will be that the Hessian matrix can be decomposed into matrices that (after some reduction steps) all lie in the Terwilliger algebra of the binary Hamming cube. We begin with introducing the definition of the Terwilliger algebra A_n of the binary Hamming cube on n elements.
Definition 1 (Terwilliger algebra of the binary Hamming cube). Let P_n denote the collection of all subsets of the set V = [n]. For every triple of nonnegative integers i, j, t we define the 2^n × 2^n matrix D^t_{i,j}, indexed by P_n, with entries

(D^t_{i,j})_{S,T} := 1 if |S| = i, |T| = j, |S ∩ T| = t, and (D^t_{i,j})_{S,T} := 0 otherwise,

for sets S, T ∈ P_n. Then the Terwilliger algebra of the binary Hamming cube, denoted by A_n, is defined as the (real) span of all these matrices: A_n := span_R {D^t_{i,j} : i, j, t ≥ 0}. It is easy to see that A_n is a matrix *-algebra, i.e., A_n is closed under taking linear combinations, matrix multiplications and transposition. One way to see this is by realizing that the matrices D^t_{i,j} are exactly the indicator matrices of the orbits of pairs in P_n × P_n under the element-wise action of the symmetric group Sym(n).
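These properties are easy to verify by computer for a small n. The sketch below builds the zero-one matrices D^t_{i,j} for n = 3 and checks that transposition maps D^t_{i,j} to D^t_{j,i} and that every product of two basis matrices again lies in their span (closure under multiplication):

```python
from itertools import combinations
import numpy as np

def terwilliger_basis(n):
    """Zero-one matrices D^t_{i,j} indexed by all subsets of {0,...,n-1}:
    (D^t_{i,j})_{S,T} = 1 iff |S| = i, |T| = j, |S & T| = t."""
    subsets = []
    for k in range(n + 1):
        subsets += [frozenset(c) for c in combinations(range(n), k)]
    N = len(subsets)
    basis = {}
    for a, S in enumerate(subsets):
        for b, T in enumerate(subsets):
            key = (len(S), len(T), len(S & T))
            basis.setdefault(key, np.zeros((N, N)))[a, b] = 1.0
    return basis

B = terwilliger_basis(3)

# Transposition: (D^t_{i,j})^T = D^t_{j,i}.
assert all(np.array_equal(M.T, B[(j, i, t)]) for (i, j, t), M in B.items())

# Closure under multiplication: each product lies in the span of the basis.
A = np.stack([M.ravel() for M in B.values()])          # basis matrices as rows
for M1 in B.values():
    for M2 in B.values():
        P = (M1 @ M2).ravel()
        coeff, _, _, _ = np.linalg.lstsq(A.T, P, rcond=None)
        assert np.allclose(A.T @ coeff, P)
```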
Theorem 3 (Artin-Wedderburn). Let A be a matrix *-algebra. Then there exist nonnegative integers d and m_1, . . . , m_d and a *-algebra isomorphism ϕ : A → ⊕_{k=1}^{d} C^{m_k×m_k}. The important property here is that ϕ is an algebra isomorphism. Hence we know that this isomorphism maintains positive semidefiniteness: for any matrix A ∈ A, we have A ⪰ 0 ⟺ ϕ(A) ⪰ 0. Moreover, the matrix ϕ(A) is block-diagonal, with d diagonal blocks of sizes m_1, . . . , m_d. This is a crucial property which can be exploited in order to get a more efficient way of encoding positive semidefiniteness of matrices in A.
The explicit block-diagonalization of the Terwilliger algebra A n was given by Schrijver [19].
Theorem 4 (Schrijver [19]). The Terwilliger algebra A_n can be block-diagonalized into ⌊n/2⌋ + 1 blocks, of sizes m_k = n − 2k + 1 for k = 0, . . . , ⌊n/2⌋. The algebra isomorphism ϕ sends the matrix Σ_{i,j,t} x^t_{i,j} D^t_{i,j} ∈ A_n to the direct sum of the matrices B_k = (Σ_t β^t_{i,j,k} x^t_{i,j})_{i,j=k}^{n−k} for k = 0, 1, . . . , ⌊n/2⌋. Here, for any nonnegative integers i, j, t, k, we set

β^t_{i,j,k} := Σ_u (−1)^{u−t} \binom{u}{t} \binom{n−2k}{u−k} \binom{n−k−u}{i−u} \binom{n−k−u}{j−u}.

In particular, a matrix Σ_{i,j,t} x^t_{i,j} D^t_{i,j} ∈ A_n is positive semidefinite if and only if all the blocks B_k (0 ≤ k ≤ ⌊n/2⌋) are positive semidefinite.

Proof of Theorem 1

In this section we give the proof of Theorem 1. As a warm-up we start with the special case when the degree is d = 2 and the edge size is L = 2, where we can easily show that the polynomial p_2 is convex. After that we proceed to the general case. We follow the steps as sketched above: first we express the Hessian matrix of p_d as a matrix polynomial and we indicate some reductions that lead to the task of showing that a set of well-structured matrices are positive semidefinite. After that we show the positive semidefiniteness of these matrices by exploiting a link to the Terwilliger algebra of the binary Hamming cube.

The case d = 2 and L = 2
Here we consider the polynomial

p_2(x) = Σ_{e,f∈E} x_e x_f / |e ∪ f|,

where E = {e ⊆ [n] : |e| = 2}. We show that the polynomial p_2 is convex over the standard simplex or, equivalently, that its Hessian matrix is positive semidefinite over ∆_m. Here, the Hessian matrix of p_2 is, up to a factor 2, the matrix indexed by E, renamed M, with entries

M_{e,f} := 1/|e ∪ f|  for e, f ∈ E.    (6)

Consider the matrices A_2, A_3, A_4 indexed by E, with entries (A_s)_{e,f} := 1 if |e ∪ f| = s and (A_s)_{e,f} := 0 otherwise, for s = 2, 3, 4. Then, we have A_2 = I and A_2 + A_3 + A_4 = J. Clearly we can express the matrix M as a linear combination of these matrices:

M = (1/2) A_2 + (1/3) A_3 + (1/4) A_4 = (1/12)(A_3 + 2I) + (1/12) I + (1/4) J.

We can now conclude that M ⪰ 0 (and thus the polynomial p_2 is convex) in view of the next lemma, which claims that A_3 + 2I ⪰ 0.

Lemma 2. We have A_3 + 2I = Γ Γ^T ⪰ 0, where Γ is the \binom{n}{2} × n matrix with entries Γ_{f,i} = |{i} ∩ f| for f ∈ E and i ∈ [n]. Indeed, (Γ Γ^T)_{e,f} = |e ∩ f|, which is equal to 2 if e = f, to 1 if |e ∪ f| = 3, and to 0 otherwise.
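This decomposition lends itself to a direct numerical check. The sketch below assumes that the matrix M in (6) has entries 1/|e ∪ f| (so the Hessian of p_2 is 2M, consistent with the value λ_min(M_0) = 1/12 quoted in Section 3.2); A_3 is then the adjacency matrix of the line graph of K_n, whose smallest eigenvalue is −2:

```python
from itertools import combinations
import numpy as np

n = 6
E = [frozenset(e) for e in combinations(range(n), 2)]
m = len(E)

# (A_s)_{e,f} = 1 iff |e u f| = s  (s = 2: e = f; 3: e, f share a vertex; 4: disjoint).
def A(s):
    return np.array([[1.0 if len(e | f) == s else 0.0 for f in E] for e in E])

A2, A3, A4 = A(2), A(3), A(4)
assert np.array_equal(A2, np.eye(m)) and np.array_equal(A2 + A3 + A4, np.ones((m, m)))

# Assumed form of (6): M_{e,f} = 1/|e u f|, i.e. M = (1/2)A2 + (1/3)A3 + (1/4)A4.
M = np.array([[1.0 / len(e | f) for f in E] for e in E])
assert np.allclose(M, A2 / 2 + A3 / 3 + A4 / 4)

# A3 is the adjacency matrix of the line graph of K_n, so lambda_min(A3) = -2, and
# M = (1/12)(A3 + 2I) + (1/12)I + (1/4)J is positive definite with lambda_min = 1/12.
assert np.isclose(np.min(np.linalg.eigvalsh(A3)), -2.0)
assert np.isclose(np.min(np.linalg.eigvalsh(M)), 1.0 / 12.0)
```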
Note that the matrices A_2 = I, A_3, A_4 generate the Bose-Mesner algebra of the Johnson scheme J(n, 2), with length n and weight 2, and thus the matrix M belongs to this Bose-Mesner algebra (see [6] for details on the Johnson scheme). For arbitrary degree d ≥ 3 and edge size L = 2 one could proceed to show that the Hessian matrix of p_d is positive semidefinite by using a similar symmetry reduction based on the Bose-Mesner algebra of the Johnson scheme J(p, 2) for suitable values of p. However, for general edge size L ≥ 3 we will need to use a richer algebra, namely the Terwilliger algebra of the Hamming cube. Hence we will treat in the rest of the section the general case d ≥ 2 and L ≥ 2.

Computing the Hessian matrix of p d
In this section we indicate how to compute the Hessian matrix of the polynomial

p_d(x) = Σ_{e=(e_1,...,e_d)∈E^d} c_e x_{e_1} · · · x_{e_d},    (8)

where we set c_e := 1/|e_1 ∪ · · · ∪ e_d| and, as before, E = {e ⊆ V = [n] : |e| = L} with L ≥ 2. We begin with getting the explicit coefficients of the polynomial p_d expressed in the standard monomial basis. The basic fact we will now use is that the parameter c_{(e_1,...,e_d)} depends only on the set of distinct edges e_i that are present in the tuple (e_1, . . . , e_d) ∈ E^d and not on their multiplicities. To formalize this, recall m = |E| and label the edges as e_1, . . . , e_m. For a tuple e = (e_{i_1}, . . . , e_{i_d}) ∈ E^d define the vector α(e) ∈ N^m by letting, for ℓ ∈ [m], α(e)_ℓ be the number of indices among i_1, . . . , i_d that are equal to ℓ. Then we have c_e = c_{e′} whenever α(e) = α(e′), and |α(e)| = d so that α(e) ∈ N^m_d. This justifies the following definition. For α ∈ N^m_d, consider a d-tuple e ∈ E^d such that α(e) = α and define c_α := c_e.
Lemma 3. We have p_d(x) = Σ_{α∈N^m_d} (d!/α!) c_α x^α, where α! := α_1! · · · α_m!.

Proof. Using the definition of the coefficients c_α, we can rewrite p_d as

p_d(x) = Σ_{e∈E^d} c_e x^{α(e)} = Σ_{α∈N^m_d} |{e ∈ E^d : α(e) = α}| c_α x^α = Σ_{α∈N^m_d} (d!/α!) c_α x^α.

Here, for this last equality, we use the multinomial theorem, which claims (x_1 + · · · + x_m)^d = Σ_{α∈N^m_d} (d!/α!) x^α or, equivalently, that the number of d-tuples e ∈ E^d for which α(e) = α is equal to d!/α!.
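The counting fact behind the last equality is easy to verify exhaustively for small sizes (here with m generic items standing in for the edges):

```python
from itertools import product
from math import factorial

m, d = 4, 3  # small illustrative sizes

def alpha(tup):
    """Exponent vector of a d-tuple: alpha_l = number of entries equal to l."""
    return tuple(tup.count(i) for i in range(m))

counts = {}
for tup in product(range(m), repeat=d):
    counts[alpha(tup)] = counts.get(alpha(tup), 0) + 1

def multinomial(a):
    """d!/a! = d!/(a_1! ... a_m!)."""
    num = factorial(sum(a))
    for ai in a:
        num //= factorial(ai)
    return num

# Each alpha with |alpha| = d is hit by exactly d!/alpha! tuples,
# and the counts add up to m^d (the multinomial theorem).
assert all(c == multinomial(a) for a, c in counts.items())
assert sum(counts.values()) == m ** d
```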
We now proceed to compute the Hessian matrix of p d .
Lemma 4. The Hessian matrix of the polynomial p_d is given by

H(p_d)(x) = d(d−1) Σ_{γ∈N^m_{d−2}} ((d−2)!/γ!) x^γ M_γ, where (M_γ)_{i,j} := c_{γ+u_i+u_j} for i, j ∈ [m],    (12)

and the vectors u_1, . . . , u_m ∈ R^m form the standard basis of R^m.
Proof. The partial derivatives of p_d are

∂p_d/∂x_i (x) = d Σ_{β∈N^m_{d−1}} ((d−1)!/β!) c_{β+u_i} x^β  for i ∈ [m].

Similarly we see that

∂²p_d/∂x_i∂x_j (x) = d(d−1) Σ_{γ∈N^m_{d−2}} ((d−2)!/γ!) c_{γ+u_i+u_j} x^γ  for i, j ∈ [m].

This concludes the proof.
Hence, if we can show that the matrices M γ in (12) are all positive semidefinite then it follows directly that the Hessian matrix of p d is positive semidefinite on the standard simplex.
We next observe how to further simplify the matrices M_γ. For γ ∈ N^m, define its support as the set S_γ = {e ∈ E : γ_e ≥ 1} and let W_γ = ∪_{e∈S_γ} e denote the subset of elements of V = [n] that are covered by some edge in S_γ. Then, for any i, j ∈ [m], the support of γ + u_i + u_j is the set S_γ ∪ {e_i, e_j} and we have (M_γ)_{i,j} = c_{γ+u_i+u_j} = 1/|W_γ ∪ e_i ∪ e_j|. Hence the matrix M_γ depends only on the set W_γ (and not on the specific choice of γ). This justifies defining the matrices

M_W := ( 1/|W ∪ e ∪ e′| )_{e,e′∈E}  for W ⊆ V.    (13)

If d = 2 then there is only one matrix to check, namely the matrix M_∅ (for W = ∅). Note that the matrix M_∅ coincides with the matrix in (6), so we already know it is positive semidefinite when L = 2. However, if d ≥ 3, then one needs to check all the matrices of the form M_W in (13). Now comes the last reduction, useful to link these matrices M_W to the Terwilliger algebra, which consists in removing duplicate rows and columns. Set p := |W| and U := V \ W, so that |U| = n − p. In addition let

F := {f ⊆ U : |f| ≤ L}    (14)

denote the collection of subsets of U with size at most L. Now we consider the matrix M_p, which is indexed by F, with entries

(M_p)_{f,f′} := 1/(p + |f ∪ f′|)  for f, f′ ∈ F.    (15)

Note that for p = 0 the matrix M_0 coincides with the matrix M_∅ in (13) (and with the matrix in (6)). The next lemma links the matrices M_W in (13) and M_p in (15).

Lemma 6. Let W ⊆ V with |W| = p, and consider the matrices M_W in (13) and M_p in (15). The following assertions are equivalent: (i) M_W ⪰ 0; (ii) M_p ⪰ 0.

Proof. Partition E as E = E_0 ∪ E_1 ∪ · · · ∪ E_L, where E_i := {e ∈ E : |e \ W| = i}. With respect to this partition of its index set, the matrix M_W has the block-form M_W = (M^{i,j}_W)_{i,j=0}^{L}, where the block M^{i,j}_W has its rows indexed by E_i and its columns by E_j. Note that, if two edges e, e′ ∈ E satisfy e \ W = e′ \ W, then the two rows of M_W indexed by e and e′ coincide: for any f ∈ E we have |W ∪ e ∪ f| = |W ∪ e′ ∪ f|. In fact, after removing these duplicate rows (and columns) and keeping only one copy for each subset of U = V \ W, we obtain a matrix which coincides with the matrix M_p in (15).
Indeed, the above matrix is indexed by the set F in (14) and its block-form is with respect to the partition F = F_0 ∪ F_1 ∪ · · · ∪ F_L, where F_i := {f ⊆ U : |f| = i}. So the block M^{i,j}_p has its rows indexed by F_i, its columns indexed by F_j, and its entries are

(M^{i,j}_p)_{f,f′} = 1/(p + i + j − |f ∩ f′|)  for f ∈ F_i, f′ ∈ F_j.    (16)

As the matrices M_p arise from the matrices M_W by removing duplicate rows and columns, it is clear that the matrices M_W are positive semidefinite if and only if the same holds for the matrices M_p. This concludes the proof.

The general case d ≥ 2 and L ≥ 2
In Section 1.3 we gave preliminary results on the Terwilliger algebra, which we will now use to prove that the matrices M_p in (15) are positive semidefinite. Fix 0 ≤ p ≤ n and consider the matrix M_p in (15) (with blocks as in (16)). We start with observing that M_p belongs to the Terwilliger algebra A_{n−p}. This is clear since relation (16) provides the explicit correspondence between the blocks M^{i,j}_p of M_p and the generating matrices D^t_{i,j} of the algebra A_{n−p}: after setting

x^t_{i,j} := 1/(p + i + j − t),    (17)

we have M^{i,j}_p = Σ_t x^t_{i,j} D^t_{i,j}. Applying the block-diagonalization of Theorem 4 (with n − p in place of n) to M_p yields the matrices

B_k := ( Σ_t β^t_{i,j,k} x^t_{i,j} )_{i,j}  for k = 0, . . . , ⌊(n−p)/2⌋.    (18)

Then, in view of Theorem 4, we know that M_p ⪰ 0 if and only if B_k ⪰ 0 for all 0 ≤ k ≤ ⌊(n−p)/2⌋. In what follows p, k are fixed integers. We now proceed to show that B_k ⪰ 0.
To simplify the notation we introduce parameters a(i) and b(u, i), defined for any integers i, u. Note that we may omit the obvious bounding conditions on i and u since the corresponding parameters are zero if these conditions are not satisfied; for instance, a(i) = 0 if i < k and b(u, i) = 0 if u > i. We now give an integral reformulation for the entries of the matrix B_k from (18). It is based on the fact that

1/s = ∫_0^1 z^{s−1} dz  for any integer s ≥ 1,    (20)

which permits to give an integral reformulation for the scalars x^t_{i,j} in (17). This simple but powerful fact will be very useful to show B_k ⪰ 0.
where we define the function g(u, z) = z p−1 ( 1−z z ) u for z ∈ (0, 1]. Proof. First we use the expressions of β t i,j,k in (19) and of x t i,j in (17) and we exchange the summations in t and u to obtain min{i,j} (21) Now we use (20), which gives the following integral representation Using this integral representation (and the binomial theorem for the equality marked (*) below) we can reformulate the inner summation appearing in (21) as follows: This concludes the proof.
We can now proceed to show that the matrices B k in (18) are positive semidefinite.
Lemma 8. We have B k 0.
Proof. We use Lemma 7 to reformulate the matrix B_k. First, note that in the result of Lemma 7, since b(u, i) b(u, j) = 0 if u > min{i, j}, we may extend the summation on u from 0 ≤ u ≤ min{i, j} to 0 ≤ u ≤ n − p. This yields a representation of the form B_k = ∫_0^1 Σ_{u=0}^{n−p} g(u, z) H(u, z, k) dz, which implies B_k ⪰ 0. Here we used the fact that, for any u ∈ [0, n − p], the function g(u, z) is nonnegative on (0, 1] and that the matrix H(u, z, k) is positive semidefinite for any z ∈ (0, 1], since it is the outer product of the vector (h(u, z, i))_i with itself.
Therefore we have shown that the matrices B_k are positive semidefinite, and thus the following result.

Corollary 1. For any 0 ≤ p ≤ n the matrix M_p in (15) is positive semidefinite.

In view of Lemmas 5 and 6 we can conclude that the polynomial p_d is convex on ∆_m, which concludes the proof of Theorem 1.

Investigating the polynomials f d
Here we consider the second class of polynomials f_d from (2), namely

f_d(x) = Σ_{e=(e_1,...,e_d)∈E^d} Π_{i=1}^{d} x_{e_i} / |e_1 ∪ · · · ∪ e_i|.

We address Question 1, which asks whether f_d attains its minimum value on the simplex ∆_m at the barycenter of ∆_m. Here too this question has a positive answer if one can show that f_d is convex over ∆_m. This follows since the analogue of Lemma 1 extends easily to the polynomial f_d. We conjecture that convexity holds in general.

Conjecture 1. For any integers n, L, d ≥ 2 the polynomial f_d is convex over the simplex ∆_m.
For degree d = 2, we have f_2 = (1/L) p_2 and thus we know from the proof of Theorem 1 that f_2 is convex. We will prove in Section 3.2 that Conjecture 1 holds for degree d = 3 and edge size L = 2 and, in Section 3.3 and Appendix A, we will give a range of values for (n, L, d) that were numerically tested and support Conjecture 1.
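The identities p_1 = f_1 and f_2 = (1/L) p_2 can be verified numerically for small parameters. The sketch below assumes the forms p_d(x) = Σ x_{e_1}···x_{e_d}/|e_1∪···∪e_d| and f_d(x) = Σ Π_i x_{e_i}/|e_1∪···∪e_i| (consistent with these identities), using exact rational arithmetic:

```python
from itertools import combinations, product
from fractions import Fraction

def f_d(x, E, d):
    """Assumed form: f_d(x) = sum over (e1,...,ed) of prod_i x_{e_i}/|e1 u ... u e_i|."""
    total = Fraction(0)
    for tup in product(E, repeat=d):
        term, union = Fraction(1), frozenset()
        for e in tup:
            union = union | e
            term = term * x[e] / len(union)
        total += term
    return total

def p_d(x, E, d):
    """Assumed form: p_d(x) = sum over (e1,...,ed) of x_{e1}...x_{ed}/|e1 u ... u ed|."""
    total = Fraction(0)
    for tup in product(E, repeat=d):
        term = Fraction(1, len(frozenset().union(*tup)))
        for e in tup:
            term *= x[e]
        total += term
    return total

n, L = 4, 2
E = [frozenset(e) for e in combinations(range(n), L)]
m = len(E)
x = {e: Fraction(i + 1, m * (m + 1) // 2) for i, e in enumerate(E)}  # a simplex point

# Sanity checks from the text: f_1 = p_1 and f_2 = (1/L) p_2.
assert f_d(x, E, 1) == p_d(x, E, 1)
assert f_d(x, E, 2) == Fraction(1, L) * p_d(x, E, 2)

# For d = 3, L = 2 the barycenter beats this particular (non-uniform) x.
x_star = {e: Fraction(1, m) for e in E}
assert f_d(x_star, E, 3) <= f_d(x, E, 3)
```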
In what follows we begin in Section 3.1 with giving a polynomial matrix decomposition for the Hessian of f d and then a recursive reformulation for it, making also apparent some links to the Hessian of p d . From this we see that convexity of f d follows if we can show that a family of well-structured matrices Q γ are positive semidefinite (see Lemmas 9 and 12). We can complete this task in the case d = 3 and L = 2 (see Section 3.2). However, understanding the general case is technically involved and would require developing new tools for exploiting the symmetry structure present in the matrices Q γ (which is now not captured by the Terwilliger algebra). This goes beyond the scope of this paper and we leave it for further research.

Computing the Hessian of f d
We begin with expressing the polynomial f_d in the standard monomial basis:

f_d(x) = Σ_{α∈N^m_d} b_α x^α, where we set b_α := Σ_{e∈E^d : α(e)=α} Π_{i=1}^{d} 1/|e_1 ∪ · · · ∪ e_i|.

Next we compute the Hessian of f_d and we give a matrix polynomial reformulation for it.
Lemma 9. The Hessian of the polynomial f_d is given by

H(f_d)(x) = Σ_{γ∈N^m_{d−2}} x^γ Q_γ,

where as before u_1, . . . , u_m form the standard basis of R^m and we define the symmetric m × m matrix Q_γ with entries

(Q_γ)_{i,j} := (γ_i + 1)(γ_j + 1 + δ_{i,j}) b_{γ+u_i+u_j}  for i, j ∈ [m].    (24)

Proof. Direct verification.
We now give a recursive reformulation for the coefficients of the polynomial f_d and for its Hessian matrix, which may possibly be helpful for a proof of Conjecture 1. Observe that the inner summation S in (27) does not depend on the choice of the sequence e such that α(e) = α; thus we may consider it fixed, denoted as (e_{i_1}, . . . , e_{i_d}). Since there are d!/α! possible choices for selecting this sequence, using relation (27) we can reformulate b_α accordingly. Next we pull out the last factor of the product; in the equality marked (*), we use the fact that α_k of the elements in the multiset {i_1, . . . , i_d} are equal to k. Summarizing, we have shown the recursive reformulation of Lemma 10. We now proceed to give a recursive reformulation for the matrices Q_γ in (24). First we reformulate them using scaled parameters, which satisfy a recursive relation; indeed, this follows from Lemma 10. The resulting decomposition of Q_γ (Lemma 12, proof by direct verification) involves the matrices M_γ introduced in (12).
Proof. Combining Lemmas 10 and 12 we obtain which shows the claim.

The polynomial f d in the case d = 3, L = 2
Here we show that the polynomial f_d is convex in the case d = 3 and L = 2. In view of Lemma 9 it suffices to show that the matrix Q_γ is positive semidefinite for any γ ∈ N^m_1. Up to symmetry it suffices to show that Q_γ ⪰ 0 for γ = u_1. In view of Lemma 12 we have a decomposition of Q_{u_1} whose first summand involves M_{u_1}, where M_{u_1} = M_2 is as defined in relation (15). We have seen in the previous section that the matrix M_{u_1} is positive semidefinite (see Corollary 1). Hence it suffices now to show that Q_0 + R_{u_1} ⪰ 0. By definition, the entries of Q_0 (case γ = 0) are given by (24) with γ = 0. Using this we see that it suffices to show that a certain matrix B, indexed by E, is positive semidefinite. The main result of this section is the next lemma, which implies that the polynomial f_3 is convex for L = 2 and thus settles Conjecture 1 for the case d = 3, L = 2.

Lemma 13. The matrix B is positive semidefinite.
Before proceeding to the proof, let us make a few observations. Note that B can be decomposed as B = M_0 + R, where M_0 = M_∅ has been shown earlier to be positive semidefinite (recall Corollary 1, or note that M_0 is the matrix M from (6) as we are in the case L = 2). On the other hand, the matrix R is not positive semidefinite. In fact, R has rank 2 and it has a negative eigenvalue. One can infer from the results in Section 2.1 that λ_min(M_0) = 1/12, while one can check that λ_min(R) < −1/12 = −0.0833 . . . when n ≥ 6 (see Table 1). Hence in general one cannot argue that B ⪰ 0 by simply looking at the smallest eigenvalues of its summands M_0 and R.
In the rest of the section we prove Lemma 13. To fix ideas let e_1 be the edge {1, 2} and, to simplify notation, set p = n − 2 and q = \binom{n−2}{2}. Then the index set of B can be partitioned into {e_1} ∪ I_1 ∪ I_2 ∪ I_0, setting I_k = {{k, i} : 3 ≤ i ≤ n} for k = 1, 2, and I_0 = {{i, j} : 3 ≤ i < j ≤ n}. So |I_1| = |I_2| = p and |I_0| = q. With respect to this partition one can verify that the matrix B has a block-form with blocks indexed by {e_1}, I_1, I_2 and I_0.
Here the matrix M is the matrix from (6), replacing n by p = n − 2, and Γ = Γ_p is the \binom{p}{2} × p matrix from Lemma 2, whose (f, i)-th entry is |{i} ∩ f|.
We now proceed to show that the matrix B is positive semidefinite. Note that its lower right diagonal block indexed by the set I_0 is positive semidefinite (since M ⪰ 0). Our strategy is now to 'eliminate' the borders indexed by the sets {e_1}, I_1 and I_2 successively, one by one, by taking Schur complements, until reaching a final matrix whose positive semidefiniteness can be seen directly. To do the Schur complement operations we will need to invert matrices of the form aI + bJ. The next lemma indicates how to do that; its proof is straightforward and thus omitted.
Lemma 14. For a, b ∈ R such that a ≠ 0 and a + pb ≠ 0, the matrix aI_p + bJ_p is nonsingular with inverse

(aI_p + bJ_p)^{−1} = (1/a) ( I_p − (b/(a + pb)) J_p ).

Step 1: We take a first Schur complement with respect to the upper left corner of B (indexed by e_1) and call B_1 the resulting matrix.

Step 2: We now take the Schur complement with respect to the upper left corner of B_1 (indexed by I_1), where we use Lemma 14 to invert it, and call B_2 the resulting matrix.

Step 3: We invert the top left block of B_2 (indexed by I_2) via Lemma 14 and take the third and final Schur complement with respect to this block in B_2, obtaining the matrix B_3. It is now clear that B_3 ⪰ 0. In turn, this implies that B_2 ⪰ 0 and thus B ⪰ 0, which concludes the proof of Lemma 13.
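The inverse formula of Lemma 14 is easy to confirm numerically; the identity (aI + bJ)(1/a)(I − b/(a+pb) J) = I follows from J² = pJ:

```python
import numpy as np

def inv_aI_bJ(a, b, p):
    """Inverse of a*I_p + b*J_p per Lemma 14 (valid when a != 0 and a + p*b != 0):
    (a I + b J)^{-1} = (1/a)(I - b/(a + p b) J)."""
    I, J = np.eye(p), np.ones((p, p))
    return (1.0 / a) * (I - (b / (a + p * b)) * J)

a, b, p = 2.0, -0.3, 5        # here a + p*b = 0.5 != 0
M = a * np.eye(p) + b * np.ones((p, p))
assert np.allclose(M @ inv_aI_bJ(a, b, p), np.eye(p))
assert np.allclose(inv_aI_bJ(a, b, p) @ M, np.eye(p))
```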

Some numerical justification for convexity of f d
We have carried out some numerical experiments for a range of values of d, L, n and verified that the matrices Q_γ are positive semidefinite for all γ ∈ N^m_{d−2} in these cases. Hence for these values the polynomial f_d is convex and Conjecture 1 holds. Recall from Lemma 12 that the matrix Q_γ can be decomposed as a sum involving the matrices M_γ, B_γ and R_γ. By the results in Section 2 we already know that the matrix M_γ is positive semidefinite. Hence it now suffices to show that the matrix B_γ + R_γ is positive semidefinite for each γ ∈ N^m_{d−2}. In Tables 1-10 in Appendix A we provide information about the minimum eigenvalues of the matrices Q_γ, B_γ and R_γ for different values of n, d and L. In each case we consider the possible different cases for selecting γ ∈ N^m_{d−2} up to symmetry; the different instances of γ are indicated in the column labeled γ. For instance, for d = 3, L = 2 there is only one possibility, say γ = u_1 corresponding to edge e_1 = {1, 2} (see Tables 1 and 2).
In all cases we find that Q_γ is positive semidefinite (in fact, positive definite). As mentioned above for the case d = 3, in general this cannot be deduced by considering its summands separately, since R_γ has a negative smallest eigenvalue and λ_min(B_γ) + λ_min(R_γ) < 0 from a certain n onwards (which depends on d and L). In addition we observe that λ_min(B_γ) stays constant from a certain n on, while λ_min(R_γ) keeps decreasing.
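Convexity of f_d can also be probed directly for small parameters by assembling the Hessian in the monomial basis and checking its eigenvalues at sample points of the simplex. A sketch for d = 3, L = 2, n = 5, assuming the form f_d(x) = Σ_{(e_1,...,e_d)} Π_i x_{e_i}/|e_1 ∪ · · · ∪ e_i| (the Hessian is computed from scratch here, not via the matrices Q_γ):

```python
from itertools import combinations, product
import numpy as np

n, L, d = 5, 2, 3
E = [frozenset(e) for e in combinations(range(n), L)]
m = len(E)
idx = {e: i for i, e in enumerate(E)}

# Collect f_d as a dict {exponent vector: coefficient}.
terms = {}
for tup in product(E, repeat=d):
    c, union = 1.0, frozenset()
    alpha = np.zeros(m, dtype=int)
    for e in tup:
        union = union | e
        c /= len(union)
        alpha[idx[e]] += 1
    terms[tuple(alpha)] = terms.get(tuple(alpha), 0.0) + c

def hessian(x):
    """Hessian of f_d at x, by term-wise second differentiation of the monomials."""
    H = np.zeros((m, m))
    for alpha, c in terms.items():
        a = np.array(alpha)
        for i in np.nonzero(a)[0]:
            for j in range(m):
                aa = a.copy()
                aa[i] -= 1
                if aa[j] == 0:
                    continue
                factor = a[i] * aa[j]
                aa[j] -= 1
                H[i, j] += c * factor * np.prod(x ** aa)
    return H

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.random(m)
    x /= x.sum()                               # random point of the simplex
    assert np.min(np.linalg.eigvalsh(hessian(x))) >= -1e-8
```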