Decoding from Pooled Data: Sharp Information-Theoretic Bounds

Consider a population consisting of n individuals, each of whom has one of d types (e.g., their blood type, in which case d = 4). We are allowed to query this database by specifying a subset of the population, and in response we observe a noiseless histogram (a d-dimensional vector of counts) of the types of the pooled individuals. This measurement model arises in practical situations such as the pooling of genetic data, and may also be motivated by privacy considerations. We are interested in the number of queries one needs to unambiguously determine the type of each individual. In this paper, we study this information-theoretic question under the random, dense setting where in each query, a random subset of individuals of size proportional to n is chosen. This makes the problem a particular example of a random constraint satisfaction problem (CSP) with a "planted" solution. We establish almost matching upper and lower bounds on the minimum number of queries m such that there is no solution other than the planted one with probability tending to 1 as n → ∞. Our proof relies on the computation of the exact "annealed free energy" of this model in the thermodynamic limit, which corresponds to the exponential rate of decay of the expected number of solutions to this planted CSP. As a by-product of the analysis, we show an identity of independent interest relating the Gaussian integral over the space of Eulerian flows of a graph to its spanning tree polynomial.


Introduction
Constraint satisfaction problems (CSPs) have been the object of intense study in recent years in probability theory, computer science, information theory, and statistical physics. For certain families of CSPs, a deep understanding has begun to emerge regarding the number of solutions as a function of problem size, as well as the algorithmic feasibility of finding solutions when they exist (see, e.g., [COMV09, COF14, COHH16, COP16, DSS15, DSS16, SSZ16]). Consider in particular a planted random constraint satisfaction problem with n variables that take their values in the discrete set {1, …, d}, with d ≥ 2, and with m clauses drawn uniformly at random under the constraint that they are all satisfied by a pre-specified assignment, which is referred to as the planted solution. It is of interest to determine the properties of the set of all solutions of the CSP as n and m grow to infinity at some relative rate. Two questions are of particular importance: (1) how large should m be so that the planted solution is the unique solution to the CSP? and (2) given that it is unique, how large should m be so that it is recoverable by a "tractable" algorithm? Significant progress has been made on these questions, often initiated by insights from statistical physics and followed by a growing body of rigorous mathematical investigation. The emerging

Problem and motivation
The setting Let {h_a}_{1≤a≤m} be a collection of d-dimensional arrays with non-negative integer entries. For an assignment τ : {1, …, n} → {1, …, d} of the n variables, and given a realization of m random subsets S_a ⊂ {1, …, n}, the constraints of the HQP are given by h_a = h_a(τ) for all 1 ≤ a ≤ m, with h_a(τ) := (|S_a ∩ τ^{-1}(r)|)_{1≤r≤d}. We let τ* : {1, …, n} → {1, …, d} be a planted assignment; i.e., we set h_a := h_a(τ*) for all a for some realization of the sets {S_a}, and consider the problem of recovering the map τ* given the observation of the arrays {h_a}_{1≤a≤m}. This problem can be viewed informally as that of decoding a discrete high-dimensional signal consisting of categorical variables from a set of measurements formed by pooling together the variables belonging to a subset of the signal. It is useful to think of the n variables as each describing the type or category of an individual in a population of size n, where each individual has exactly one type among d. For instance, the categories may represent blood types or some other discrete feature such as ethnicity or age group. Then, the observation h_a is the histogram of types of the subpopulation S_a. We let π = (1/n)(|τ*^{-1}(1)|, …, |τ*^{-1}(d)|) denote the vector of proportions of assigned values; i.e., the empirical distribution of the categories.
We consider here a model in which each variable participates in a given constraint independently and with probability α ∈ (0, 1). Thus, the sets {S_a}_{1≤a≤m} are independent draws of a random set S where Pr(i ∈ S) = α independently for each i ∈ {1, …, n}. We are thus in the "dense regime" where E[|S|] = αn; i.e., the number of variables participating in each constraint (the degree of each factor in the CSP) is linear in n.
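For concreteness, the measurement model just described can be simulated as follows. This is a minimal sketch: the toy sizes (n, d, α, m) and all function names are our own illustrative choices, not part of the formal model.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, alpha, m = 12, 3, 0.5, 5          # toy sizes (illustrative assumptions)
tau_star = rng.integers(0, d, size=n)   # planted assignment tau*: individuals -> types

def histogram(pool, tau, d):
    """h_a(tau): counts of each type among the pooled individuals."""
    h = np.zeros(d, dtype=int)
    for i in pool:
        h[tau[i]] += 1
    return h

# Each individual joins a pool independently with probability alpha (dense regime).
pools = [np.flatnonzero(rng.random(n) < alpha) for _ in range(m)]
observations = [histogram(S, tau_star, d) for S in pools]

# The planted assignment satisfies all m histogram constraints by construction.
assert all(np.array_equal(h, histogram(S, tau_star, d))
           for h, S in zip(observations, pools))
```

Each observed histogram sums to the size of its pool, so the decoder, who chooses the pools, always knows |S_a|.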
Motivation This model is inspired by practical problems in which a data analyst can only assay certain summary statistics involving a moderate or large number of participants. This may be done for privacy reasons, or it may be inherent in the data-collection process (see, e.g., [SBC+02, HLB+01]). For example, in DNA assays, the pooling of allele measurements across multiple strands of DNA is necessary given the impracticality of separately analyzing individual strands. Thus the data consist of a frequency spectrum of alleles; a "histogram" in our language. In the privacy-related situation, one may take the viewpoint of an attacker whose goal is to gain granular knowledge of the database from coarse measurements, or that of a guard who wishes to prevent this scenario from happening. It is then natural to ask how many histogram queries it takes to exactly determine the category of each individual.
Related problems Note that the case d = 2 of the HQP can be seen as a compressed sensing problem with a binary sensing matrix and a binary signal. While the bulk of the literature in the field of compressed sensing is devoted to the case in which both the signal of interest and the sensing matrix are real-valued, the binary case has also been considered, notably in relation to Code Division Multiple Access (CDMA) [Zig04, Tan02] and Group Testing [DH06, MT11]: in the latter, one observes the logical "OR" of subsets of the entries of the signal. In the case of categorical variables with d ≥ 3 categories, it is natural to consider measurements consisting of histograms of the categories in the pooled sub-population. In the literature on compressed sensing one commonly considers the setting where the sensing matrices have i.i.d. entries with finite second moment, and the signal has an arbitrary empirical distribution of its entries. It has been established that, under the scaling m = κn, whereas the success of message-passing algorithms requires κ > κ_BP [BLM15], the information-theoretic threshold is κ_IT = 0 in the discrete signal case [WV09, DJM13], indicating that uniqueness of the solution occurs at a finer scale, m = o(n). Here we consider the HQP with arbitrary d, for which the exact scaling for investigating uniqueness is m = γ n/log n with finite γ > 0, and provide tight bounds on the information-theoretic threshold.
Prior work on HQP The study of this problem for generic values of d was initiated in [WHLC16] in the two settings where the sets {S_a} are deterministic and random. They showed in both cases, with a simple counting argument, that under the condition that π is the uniform distribution, if m < (log d / (d−1)) · n/log n then the set of collected histograms does not uniquely determine the planted assignment τ* (with high probability in the random case). On the other hand, for the deterministic setting, they provided a querying strategy that recovers τ* provided that m > c_0 · n/log n, where c_0 is an absolute constant independent of d. For the random setting, and under the condition that the sets S_a are of average size n/2, they proved via a first moment bound that m > c_1 · n/log n, with c_1 also a constant independent of d, suffices to uniquely determine τ*, although no algorithm was proposed in this setting.
In the above results, there is a gap, both information-theoretic and algorithmic, between the upper and lower bounds, depending on the dimension d. Intuitively, the upper bounds should also depend on d, since the decoding problem becomes easier (or at least no harder) for large d, for the simple reason that if it is possible to determine the categories of the population for d = 2, then one can proceed by dichotomy for larger d: merge the d groups into two super-groups, identify which individuals belong to each of the two super-groups, and then recurse. We attempt to fill the information-theoretic gap in the random setting by providing tighter upper and lower bounds on the number of queries m necessary and sufficient to uniquely determine the planted assignment τ* with high probability, which depend on the dimension d and on π, with explicit constants. In a sequel paper, we consider the algorithmic aspect of the problem and provide a Belief Propagation-based algorithm that recovers the planted assignment if m ≥ κ*(π, d) · n for a specific threshold κ*(π, d), and fails otherwise, indicating the putative existence of a statistical-computational gap in the random setting.
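The dichotomy argument above can be sketched concretely: recover one bit of each individual's type per round, using a solver for the binary case as a black box. In the sketch below, the oracle `binary_decode`, the function names, and the toy data are our own illustrative assumptions; a perfect oracle stands in for an actual d = 2 decoder.

```python
from math import ceil, log2

def recover_by_dichotomy(n, d, binary_decode):
    """Recover each of n types in {0, ..., d-1} from ceil(log2(d)) binary rounds.
    binary_decode(split) is an assumed oracle for the d = 2 problem: given a map
    split: type -> {0, 1} merging the d types into two super-groups, it returns
    the super-group of every individual."""
    bits = max(1, ceil(log2(d)))
    recovered = [0] * n
    for j in range(bits):
        split = lambda t, j=j: (t >> j) & 1   # super-group = j-th bit of the type
        membership = binary_decode(split)     # one round of binary decoding
        for i, b in enumerate(membership):
            recovered[i] |= b << j
    return recovered

# Demo with a perfect oracle that knows the planted assignment (toy data).
tau_star = [0, 3, 1, 2, 2, 3, 0, 1]
oracle = lambda split: [split(t) for t in tau_star]
assert recover_by_dichotomy(len(tau_star), 4, oracle) == tau_star
```

Only ceil(log2 d) binary rounds are needed, which is why the d = 2 case controls the general problem up to logarithmic factors.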

Main result
Let ∆_{d−1} be the (d−1)-dimensional simplex and H(x) = −Σ_{r=1}^d x_r log x_r, for x ∈ ∆_{d−1}, the Shannon entropy function. We write τ ∼ π to indicate that τ is a random assignment drawn from the uniform distribution over maps τ : {1, …, n} → {1, …, d} with empirical distribution π.
Theorem 1. Let n ≥ 2 be an integer, m = γ n/log n, γ > 0, α ∈ (0, 1), and π ∈ ∆_{d−1} with entries bounded away from 0 and 1. Let E be the event that τ* is not the unique satisfying assignment to the HQP: (i) if γ < γ_low, then Pr(E) → 1 as n → ∞. (ii) On the other hand, let π_[•] be the vector of order statistics of π; if γ > γ_up, then Pr(E) → 0 as n → ∞.
Remarks and special cases: • The resulting bounds do not depend on α as long as it is fixed and bounded away from 0 and 1. Its contribution to the problem is sub-dominant and vanishes as n → ∞ under the scaling considered here.
• The number k in the expression of γ_up can be interpreted as the number of connected components of a graph on d vertices that depends on the overlap structure of the two assignments τ and τ*, and induces "maximum confusion" between them. This will become clear in later sections.
The proof of the above Theorem occupies the rest of the manuscript.
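For concreteness, the entropy H and the order statistics π_[•] appearing in Theorem 1 can be computed as follows; this is a minimal sketch with an illustrative distribution π of our own choosing.

```python
import math

def shannon_entropy(x):
    """H(x) = -sum_r x_r log x_r (natural log, with 0 log 0 = 0)."""
    return -sum(p * math.log(p) for p in x if p > 0)

pi = [0.5, 0.3, 0.2]            # an illustrative distribution on d = 3 types
d = len(pi)

# H is maximized by the uniform distribution, where it equals log d.
assert abs(shannon_entropy([1.0 / d] * d) - math.log(d)) < 1e-12
assert shannon_entropy(pi) < math.log(d)

# pi_[.]: the vector of order statistics of pi, as used in the upper bound.
pi_ordered = sorted(pi, reverse=True)
assert pi_ordered[0] >= pi_ordered[-1]
```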

Main ideas of the proof
Our main contribution is the second part of Theorem 1, which establishes an upper bound on the uniqueness threshold of the random CSP with histogram constraints, the HQP. The proof uses the first moment method to upper bound the probability of existence of a non-planted solution. Since we are in a planted model, the analysis of the first moment ends up bearing many similarities with a second moment computation in a purely random (non-planted) model. Although second moment computations often require approximations, for the HQP it turns out that we are able to compute the exact annealed free energy of the model in the thermodynamic limit. That is, letting Z be the number of solutions of the CSP, we show that the limit F(γ) := lim_{n→∞} (1/n) log E[Z − 1] exists and we compute its value exactly. Then the value of the threshold γ_up is obtained by locating the first point at which F becomes negative. Together with the fact that F is a monotone function, which will become clear once F is computed, it follows that for any γ > γ_up, E[Z − 1] decays exponentially with n when the latter is sufficiently large. This general strategy has been successfully pursued for a range of CSPs, such as K-SAT, NAE-SAT, and Independent Set, most of which are Boolean. For larger domain sizes, in order to carry out the second moment method one needs fine control of the overlap structure between the planted and a candidate solution. This control is at the core of the difficulty that arises in any second moment computation. To obtain such control, researchers have often imposed additional assumptions, at the cost of a weakening of the resulting bounds. For example, existing proofs for Graph Coloring and similar problems assume certain balancedness conditions (the overlap matrix needs to be close to doubly stochastic), without which the annealed free energy cannot be computed [AN05, AM04, COEH16, BCOH+16, BMNN16]; this yields results that fall somewhat short of the bounds that the second moment method could achieve in principle [DMO12]. In the present problem, due to its rich combinatorial structure, we are able to obtain unconditional control of the overlap structure, for any domain size d, and compute the exact annealed free energy.
Concretely, computing the function F requires tight control of the "collision probability" of two non-equal assignments τ1 and τ2. This is the probability that the random histograms h(τ1) and h(τ2) formed from a random draw of a pool S coincide. The collision probability roughly measures the correlation strength between the two assignments. Specifically, we will be interested in the collision probabilities of the pairs (τ*, τ), where τ* is the planted assignment and τ is any candidate assignment. Its decay reveals how long an assignment τ "survives" as a satisfying assignment to the HQP as n → ∞. The study of these collision probabilities requires the evaluation of certain Gaussian integrals over the space of Eulerian flows of a weighted graph on d vertices that is defined based on the overlap structure of τ and τ*. We prove a family of identities that relate these integrals to certain combinatorial polynomials in the weights of the graph: the spanning tree and spanning forest polynomials. We believe that these identities are of independent interest beyond the problem studied in this paper. Once these collision probabilities are controlled, the computation of F(γ) per se requires the analysis of a certain sequence of optimization problems. We show that the sequence of maximum values converges to a finite limit that yields the value of the annealed free energy.
On the other hand, the proof of the first part of Theorem 1 is straightforward: it is an extension of a standard counting argument used in [ZKMZ13] and [WHLC16]. The argument goes as follows: if m is too small, then the number of possible histograms one could potentially observe is exponentially smaller than the number of assignments of n variables that agree with π. Therefore, when the planted assignment τ* is drawn at random, there will exist at least one τ ≠ τ* that satisfies the constraints of the CSP with overwhelming probability. We begin with this argument in the next section and then turn to the more challenging computation of the upper bound.

Proof of Theorem 1
Notation We denote vectors in R^d by bold lower-case letters, e.g., x, and matrices in R^{d×d} by bold lower-case underlined letters, e.g., x. We denote the coordinates of such vectors and matrices by x_r and x_rs, respectively. Matrices that act either as linear operators on the space R^{d×d} or that are functions of elements of this space are written in bold upper-case letters, e.g., M_x and L(x), for x ∈ R^{d×d}. These choices will be clear from the context. We may write x/y to indicate coordinate-wise division. Additionally, for two d × d matrices a, b ∈ R^{d×d}, a ⊙ b ∈ R^{d×d} is their Hadamard product. We let 1 ∈ R^d be the all-ones vector.

The first part of Theorem 1: the lower bound
Let m = γ n/log n with γ > 0. The number of potential histograms one could possibly observe in a single query with pool size |S_a| is f(|S_a|, d), the number of d-tuples of non-negative integers summing to |S_a|; in particular, f(|S_a|, d) ≤ (n + 1)^{d−1}. Since the queries are independent, the number of collections of histograms {h_a}_{1≤a≤m} one could potentially observe in m queries is ∏_{a=1}^m f(|S_a|, d). On the other hand, by Stirling's formula, the number of possible assignments that agree with π is at least C(π) n^{−(d−1)/2} exp(H(π)n), for some constant C(π) > 0 depending on π. Now, the probability that τ* is the unique satisfying assignment of the CSP with constraints given by the random histograms {h_a(τ*)}_{1≤a≤m}, averaged over the random choice of τ* ∼ π, is at most the ratio of the former count to the latter. If γ < γ_low, this last quantity tends to 0 as n → ∞. This concludes the proof of the first assertion of the theorem.
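The counting underlying this argument can be checked numerically for small parameters. In the sketch below, the toy values of n, d, m are our own, and f(s, d) is the number of d-part compositions of s, i.e., the stars-and-bars count of histograms of a pool of known size s.

```python
from math import comb, factorial

def num_histograms(s, d):
    """f(s, d): histograms of a pool of size s, i.e. d non-negative counts summing to s."""
    return comb(s + d - 1, d - 1)

def num_assignments(counts):
    """Number of assignments with prescribed type counts (multinomial coefficient)."""
    out = factorial(sum(counts))
    for c in counts:
        out //= factorial(c)
    return out

# Toy check of the counting argument: with too few queries, the number of
# observable histogram collections is smaller than the number of assignments,
# so some two assignments must produce identical observations.
n, d, m = 30, 3, 4
counts = [10, 10, 10]                          # uniform pi
assert num_histograms(n, d) <= (n + 1) ** (d - 1)
assert num_assignments(counts) > num_histograms(n, d) ** m
```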

The second part of Theorem 1 : the upper bound
We use a first moment method to show that when γ is greater than γ_up, the only assignment satisfying the HQP is τ*, with high probability. Let Z be the number of satisfying assignments to the HQP. The planted assignment τ* is obviously a solution, so we always have Z ≥ 1. Recall the definition (2) of the annealed free energy.
Theorem 2. Let m = γ n/log n with γ > 0. The limit (2) exists for all γ > 0, and we compute its value exactly.
We can deduce from Theorem 2 the smallest value of γ past which F(γ) becomes negative. In particular, we see that F is a decreasing function of γ that crosses the horizontal axis at γ_up. From this result it is easy to prove the second assertion of Theorem 1. By averaging over τ* and applying Markov's inequality, we have Pr(E) = Pr(Z ≥ 2) ≤ E[Z − 1]. For γ > γ_up, it is clear that F(γ) < 0. Let 0 < ε < |F(γ)|/2; then there is an integer n_0(ε) ≥ 0 such that for all n ≥ n_0(ε), E[Z − 1] ≤ exp(n(F(γ) + ε)), which tends to 0 as n → ∞. Now it remains to prove Theorem 2, and this represents the main technical thrust of our paper.

Collisions, overlaps, and the first moment
Preliminaries We begin by presenting the main quantities to be analyzed in our application of the first moment method. Next, we write the collision probability, Pr_S(h(τ) = h(τ*)), for fixed τ and τ*, in a convenient form. Let us first define the overlap matrix, µ(τ, τ*) = (µ_rs)_{1≤r,s≤d} ∈ Z_+^{d×d}, of τ and τ*, by µ_rs = |τ^{-1}(r) ∩ τ*^{-1}(s)|. Remark that h(τ) = h(τ*) if and only if |S ∩ τ^{-1}(r)| = |S ∩ τ*^{-1}(r)| for all r ∈ {1, …, d}. Since the collection of sets {τ^{-1}(r)}_{1≤r≤d} forms a partition of {1, …, n}, and similarly with τ*, we have a corresponding equality of events. Therefore, the probability that two assignments τ and τ* collide on a random pool S, meaning that their histograms formed on the pool S coincide, can be written as a sum where the outer sum is over all arrays of integers ν = (ν_rs)_{1≤r,s≤d} such that 0 ≤ ν_rs ≤ µ_rs for all r, s. We see from this expression that the collision probability of τ and τ* only depends on the overlap matrix µ(τ, τ*). We henceforth denote the probability in equation (5) by q(µ), where we drop the dependency on τ and τ*. Remark that τ = τ* if and only if their overlap matrix µ is diagonal. Thus, we can rewrite the expected number of solutions as in (6), where the sum is over all non-diagonal arrays µ = (µ_rs)_{1≤r,s≤d} with non-negative integer entries that sum to n, and (n choose µ) = n! / ∏_{r,s} µ_rs!.
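As an illustration, the overlap matrix µ(τ, τ*) and a Monte Carlo estimate of the collision probability q(µ) can be computed as follows. This is a sketch with toy data; the function names and the estimator are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

def overlap_matrix(tau1, tau2, d):
    """mu(tau1, tau2): mu_rs = #{i : tau1(i) = r and tau2(i) = s}."""
    mu = np.zeros((d, d), dtype=int)
    for a, b in zip(tau1, tau2):
        mu[a, b] += 1
    return mu

def collision_prob_mc(tau1, tau2, d, alpha=0.5, trials=2000):
    """Monte Carlo estimate of Pr_S(h(tau1) = h(tau2)) over random pools S."""
    tau1, tau2 = np.asarray(tau1), np.asarray(tau2)
    n, hits = len(tau1), 0
    for _ in range(trials):
        S = rng.random(n) < alpha                 # each i joins S with prob. alpha
        hits += np.array_equal(np.bincount(tau1[S], minlength=d),
                               np.bincount(tau2[S], minlength=d))
    return hits / trials

tau_star = [0, 0, 1, 1, 2, 2]
tau      = [0, 0, 1, 2, 2, 1]   # differs from tau* by swapping types 1 <-> 2 on two items
mu = overlap_matrix(tau, tau_star, 3)
assert mu.sum() == len(tau_star)
assert np.any(mu != np.diag(np.diag(mu)))   # tau != tau*  <=>  mu is not diagonal
p = collision_prob_mc(tau, tau_star, 3)
assert 0.0 <= p <= 1.0
```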
The rest of the proof From here, the proof of Theorem 2 roughly breaks into three parts: (i) One needs tight asymptotic control on the collision probability q(µ) when any subset of the entries of µ becomes large. This will be achieved via the Laplace method (see, e.g., [DB70]). The outcome of this analysis is an asymptotic estimate that exhibits two different speeds of decay, polynomial or exponential, depending on the "balancedness" of µ as its entries become large. This notion of balancedness, namely that µ must have equal row- and column-sums, is specific to the histogram setting and departs from the usual "double stochasticity" that arises in other, more classical problems such as Graph Coloring, and Community Detection under the stochastic block model [AN05, AM04, COEH16, BCOH+16, BMNN16]. As we will explain in the next section, configurations (τ, τ*) with an unbalanced overlap matrix have an exponentially decaying collision probability, i.e., they exhibit weak correlation, and disappear very early on as n → ∞ under the scaling m = γ n/log n. On the other hand, those configurations with balanced overlap exhibit a slow decay of correlation: their collision probability decays only polynomially, and these are the last surviving configurations in expression (6) as n → ∞.
(ii) Understanding the above-mentioned polynomial decay of q(µ) requires the evaluation of a multivariate Gaussian integral (a by-product of the above analysis) over the space of constraints of the array ν in (5); the latter being the space of Eulerian flows on the graph on d vertices whose edges are weighted by the (large) entries of µ. We show that this integral, properly normalized, evaluates to the inverse square root of the spanning tree (or forest) polynomial of this graph. This identity seems to be new, to the best of our knowledge, and may be of independent interest. We therefore provide two very different proofs of it, each highlighting different combinatorial aspects.
(iii) Lastly, armed with these estimates, we show the existence of, and compute the exact value of, the annealed free energy of the model in the thermodynamic limit, thereby completing the proof of Theorem 2. This last part requires the analysis of a certain optimization problem involving an entropy term and an "energy" term accounting for the correlations discussed above. Here we can exactly characterize the maximizing configurations for large n, and this allows the computation of the value of F(γ). We note once more that this situation contrasts with the more traditional case of Graph Coloring, where we lack a rigorous understanding of the maximizing configurations of the second moment, except when certain additional constraints are imposed on their overlap matrix.

Bounding the collision probabilities
Here we provide tight asymptotic bounds on the collision probabilities q(µ) defined in (5). Consider the following subspace of R^{d×d}, which will play a key role in the analysis: F := {x ∈ R^{d×d} : x1 = x^⊺1}, the space of matrices with equal row- and column-sums. This is a linear subspace of dimension d² − d + 1. Let G = (V, E) be an undirected graph on d vertices where we allow up to two parallel edges between each pair of vertices, i.e., V = {1, …, d}. Recalling that ⊙ represents the Hadamard product, we let ϑ be defined through the variational formula (9), where for two d × d matrices a, b, M_G(a, b) is the d × d matrix with entries a_rs if (r, s) ∈ E and b_rs otherwise. By strong duality (see, e.g., [BV04, Roc70]), the function (9) can be written in a more transparent form, in which φ*_µ is the Legendre-Fenchel transform of a (convex) function defined on R^d. We may note that since φ*_µ is convex on R^d, ϑ is a continuous function of its first argument. Before we state our bounds on the collision probability, we recall the following concept from algebraic graph theory. Define the spanning tree polynomial of G as T_G(z) = (1/nst(G)) Σ_T ∏_{(r,s)∈T} z_rs, where the sum is over all spanning trees T of G, and nst(G) is the number of spanning trees of G. In cases where G is not connected, we define the polynomial P_G(z) = ∏_{l=1}^{ncc(G)} T_{G_l}(z), where G_l is the l-th connected component of G, and we denote by ncc(G) the number of connected components of G. This polynomial may be interpreted as the generating polynomial of spanning forests of G having exactly ncc(G) trees. The polynomials T_G and P_G are multi-affine, homogeneous of degree d − 1 for T_G (when G is connected) and d − ncc(G) for P_G, and do not depend on the diagonal entries {z_rr : 1 ≤ r ≤ d}. Furthermore, letting z_rs = 1 for all r ≠ s, we have P_G(z) = T_G(z) = 1. We now provide tight asymptotic bounds on the collision probability q(µ) when a subset E of the entries of µ becomes large.
Theorem 3. Let ε ∈ (0, 1). There exist two constants 0 < c_u < c_l depending on ε, d, and α such that for all n sufficiently large, and all µ ∈ {0, …, n}^{d×d} with µ_rs ≥ εn if and only if (r, s) ∈ E, the collision probability q(µ) satisfies matching upper and lower bounds, with rates ϑ_u(µ) and ϑ_l(µ), respectively.
Let us now expand on the above result and derive some special cases and corollaries. First, we see that the collision probabilities can decay at two different speeds, polynomial or exponential, in the entries of the overlap matrix µ, depending on whether ϑ_u(µ) (and/or ϑ_l(µ)) is zero or strictly negative. Second, the apparent gap in the exponential decay of q(µ) in the above characterization is artificial; one can make ϑ_u and ϑ_l equal by taking µ_rs = 0 for all (r, s) ∉ E. Alternatively, they can be made arbitrarily close to each other under an appropriate limit: assume for simplicity that µ_rs = n w_rs > 0 for all (r, s) ∈ E, for some w ∈ [0, 1]^{d×d}. For (r, s) ∉ E, we have µ_rs < εn, and the last step is justified by the continuity of ϑ(•, w). The same argument holds for ϑ_l(µ). Denoting the limiting function under this operation by ϑ(w), we obtain that ϑ can be seen as the exponential rate of decay of q(µ). The reason ϑ_u and ϑ_l cannot (in general) be replaced by ϑ in Theorem 3 is that all control on the constants c_u and c_l is lost when ε → 0. Next, we identify the cases where this exponential decay is non-vacuous.
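The spanning tree polynomial T_G introduced above can be evaluated by brute force for small graphs. The sketch below uses our own naming, ignores parallel edges for simplicity, and checks the normalization T_G(1) = 1.

```python
from itertools import combinations
from math import prod

def is_spanning_tree(vertices, edges):
    """|V|-1 given edges form a spanning tree iff they are acyclic (union-find check)."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for r, s in edges:
        a, b = find(r), find(s)
        if a == b:
            return False      # adding this edge would create a cycle
        parent[a] = b
    return True

def spanning_tree_poly(vertices, edge_list, z):
    """T_G(z) = (1/nst(G)) * sum over spanning trees T of prod_{(r,s) in T} z[(r,s)]."""
    trees = [T for T in combinations(edge_list, len(vertices) - 1)
             if is_spanning_tree(vertices, T)]
    return sum(prod(z[e] for e in T) for T in trees) / len(trees)

# Triangle on 3 vertices: nst = 3, and the normalization gives T_G(1) = 1.
V, E3 = [0, 1, 2], [(0, 1), (0, 2), (1, 2)]
assert abs(spanning_tree_poly(V, E3, {e: 1.0 for e in E3}) - 1.0) < 1e-12
```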
Lemma 4. Let α ∈ (0, 1) and µ ∈ R_+^{d×d}. We have (i) ϑ(µ) = 0 if and only if µ ∈ F, and (ii) an analogous characterization for ϑ_u. Now we specialize Theorem 3 to the case where the entries of the overlap matrix are either zero or grow proportionally to n. From Theorem 3 and Lemma 4, we deduce a key corollary on the convergence of the properly rescaled logarithm of the collision probabilities.
We see that the assignments τ such that µ(τ, τ*) ∈ F exhibit a much stronger correlation with τ* than those for which this overlap matrix does not belong to F, and will hence survive much longer as n → ∞.
Proof of Lemma 4. Let µ, ν ∈ R_+^{d×d} with µ ≠ 0. Let α ∈ (0, 1), and let G = (V, E) denote a graph on d vertices. The function ϕ_µ defined in (8) is strictly convex on the support of µ, i.e., on the subspace induced by the non-zero coordinates of µ, so it admits a unique minimizer on the closed convex set {x ∈ [0, 1]^{d×d} : M_G(x ⊙ µ, ν) ∈ F} intersected with that subspace. Let x* be this minimizer. By differentiating the associated Lagrangian, the entries of x* admit explicit expressions for all (r, s) ∈ E (recall that µ_rs > 0 for all such (r, s)), where the vector λ ∈ R^d is the unique solution, up to global shifts, of the system of equations (10). The claims of the lemma follow directly from the system of equations (10) and the fact that the non-negative function ϕ_µ vanishes if and only if x*_rs = α for all (r, s) ∈ E: to show (i), we take ν = 0. It is clear from the equations that µ ∈ F if and only if λ = c1, c ∈ R, is a solution to the above equations; and this is equivalent to x*_rs = α whenever µ_rs > 0. This is in turn equivalent to ϑ(µ) = ϕ_µ(x*) = 0. The same strategy is employed to show (ii), in conjunction with the continuity of the function ν → ϑ(ν, µ) over a compact domain (the infimum defining ϑ_u is attained).
Proof of Corollary 5. Fix G = (V, E), let w ∈ (0, 1)^{d×d} with w_rs > 0 if and only if (r, s) ∈ E, and let n be an integer. For simplicity, assume that nw is an array of integer entries; the non-integer part introduces easily manageable error terms. Applying Theorem 3 with ε = min_{(r,s)∈E} w_rs, we obtain the stated estimate for n large. Moreover, since w_rs = 0 for (r, s) ∉ E, and by homogeneity of the polynomial, the claimed limit follows when w ∈ F. Otherwise, the decay is exponential at rate ϑ(w).

A Gaussian integral
One important step in proving Theorem 3 (specifically, for obtaining the polynomial decay part of q(µ)) is the following identity relating the Gaussian integral on a linear space F(G), defined based on a graph G, to the spanning tree/forest polynomial of G. We denote by K_d the complete graph on d vertices where every pair of distinct vertices is connected by two parallel edges.
Proposition 6. Let G = (V, E) be a graph on d vertices, where self-loops and up to two parallel edges are allowed. For any array of positive real numbers (w_rs)_{(r,s)∈E}, the stated identity holds. In the case where G is the complete graph K_d, the spanning tree polynomial reads T_{K_d}(w) = (1/d^{d−2}) Σ_T ∏_{(r,s)∈T} w_rs, where the sum is over all spanning trees of K_d. The pre-factor in the last expression comes from Cayley's formula for the number of spanning trees of the complete graph. We will show that it suffices to prove Proposition 6 in the case G = K_d in order to establish it for any graph G. We were not able to locate this identity in the literature. To illuminate the combinatorial mechanisms behind it, we provide what appear to be two very different proofs of it. The first is "direct" and purely combinatorial, and views F(G) as the space of Eulerian flows of the graph G. The second is slightly indirect and mainly analytic: it relates the above Gaussian integral to the characteristic polynomial of the Laplacian matrix of G, and then invokes the Principal Minors Matrix Tree theorem (see, e.g., [Cha82]).
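The second proof route rests on the (weighted) Matrix-Tree theorem. The following sketch verifies numerically, on a small random weight matrix of our own choosing (parallel edges ignored for simplicity), that any principal minor of the weighted Laplacian equals the sum over spanning trees of the product of edge weights.

```python
import numpy as np
from itertools import combinations
from math import prod

rng = np.random.default_rng(2)
d = 4

# Symmetric positive edge weights on the complete graph K_d (toy data).
W = rng.random((d, d)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)

# Weighted Laplacian: L_rr = sum_s w_rs, L_rs = -w_rs for r != s.
L = np.diag(W.sum(axis=1)) - W

# Matrix-Tree theorem: deleting any one row and the same column gives
# det = sum over spanning trees of the product of edge weights.
minor = np.linalg.det(L[1:, 1:])

edges = list(combinations(range(d), 2))
def spans(T):
    """Union-find check that the d-1 edges in T form a spanning tree."""
    parent = list(range(d))
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for r, s in T:
        a, b = find(r), find(s)
        if a == b:
            return False
        parent[a] = b
    return True

tree_sum = sum(prod(W[r, s] for r, s in T)
               for T in combinations(edges, d - 1) if spans(T))
assert abs(minor - tree_sum) < 1e-8
# Cayley's formula: K_4 has 4^(4-2) = 16 spanning trees.
assert sum(spans(T) for T in combinations(edges, d - 1)) == 16
```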

Computing the annealed free energy
In this section we establish the existence of F(γ) and compute its value for all γ > 0. For 1 ≤ k ≤ d, let D_k denote the set of binary matrices X ∈ {0, 1}^{k×d} such that each column of X contains exactly one non-zero entry and each row contains at least one non-zero entry. The elements of D_k represent partitions of the set {1, …, d} into k non-empty subsets.
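The set D_k can be enumerated directly for small k and d; its cardinality is the number of surjections from {1, …, d} onto {1, …, k}, i.e., k! times a Stirling number of the second kind. This brute-force sketch (names are ours) illustrates the definition.

```python
from itertools import product

def enumerate_Dk(k, d):
    """Binary k x d matrices with exactly one 1 per column and at least one per row;
    equivalently, surjections {1,...,d} -> {1,...,k} (ordered k-partitions of d types)."""
    mats = []
    for cols in product(range(k), repeat=d):   # row index of the single 1 in each column
        if len(set(cols)) == k:                # every row is hit at least once
            mats.append([[int(cols[j] == i) for j in range(d)] for i in range(k)])
    return mats

# |D_k| = number of surjections = k! * S(d, k) (Stirling numbers of the second kind).
assert len(enumerate_Dk(2, 3)) == 6    # 2! * S(3, 2) = 2 * 3
assert len(enumerate_Dk(3, 3)) == 6    # 3! * S(3, 3) = 6 * 1
assert len(enumerate_Dk(1, 3)) == 1
```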
Proposition 7. Let m = γ n/log n with γ > 0 fixed, for all n ≥ 2. Then the limit defining F(γ) exists and admits an explicit variational expression. Moreover, the inner minimization problem in this expression can be solved explicitly (Lemma 8). Theorem 2 follows from Proposition 7 and Lemma 8. We begin with the proof of the latter and devote the next subsection to the lengthier proof of the former.
Proof of Lemma 8. We start with an arbitrary partition of π into k groups, and define a sequence of operations on the set of k-partitions of π that strictly decreases H(Xπ) at each step and, irrespective of the starting point, always converges to π^{(k)}. Starting with an arbitrary k-partition, write down the groups from left to right in decreasing order of the total weight of each group. Initially, every group is marked incomplete. Then we perform the following operations: 1. Start with the rightmost incomplete group.
2. If it has more than one element, transfer the largest element to the leftmost group. This strictly decreases the entropy, since the heaviest group gets heavier and the lightest group gets lighter. Repeat this step until the rightmost group has exactly one element, and then move to the next step.
3. Consider this (now singleton) group. If there is no element to its left that is lighter than it, mark the group as complete. Else, swap this element with the lightest element to its left, and then mark it complete. Then go back to step 1.
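Step 2 of this procedure can be illustrated on a toy partition. The sketch below (our own naming and data) performs one transfer and checks that the entropy of the group weights strictly decreases, as claimed.

```python
import math

def H(x):
    """Shannon entropy of a vector of group weights."""
    return -sum(p * math.log(p) for p in x if p > 0)

def transfer(groups):
    """Step 2 of the procedure: move the largest element of the rightmost
    (lightest) group to the leftmost (heaviest) group."""
    groups = sorted((sorted(g, reverse=True) for g in groups), key=sum, reverse=True)
    groups[0].append(groups[-1].pop(0))
    return groups

pi_groups = [[0.35, 0.1], [0.25, 0.2, 0.1]]       # a 2-partition of pi (toy data)
before = H([sum(g) for g in pi_groups])
pi_groups = transfer(pi_groups)
after = H([sum(g) for g in pi_groups])
assert after < before   # the heaviest group got heavier: the entropy strictly drops
```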

Proof of Proposition 7
Let m = γ n/log n. Recall from equation (6) that E[Z − 1] is a sum over all arrays µ ∈ Z_+^{d×d} such that 1^⊺µ1 = n and 1 ≤ Σ_{r≠s} µ_rs. Since the sum defining E[Z − 1] is larger than its maximum term and smaller than the maximum term times (n + 1)^{d²}, we only need to understand the convergence of the sequence of rescaled maximum terms. If this sequence converges, we obtain the limit of F_n, since (1/n) log (n choose nπ) → H(π) by Stirling's formula. Next, we show that the above limit indeed exists. By Corollary 5, the function ψ is the point-wise limit of the sequence of functions {ψ_n}_{n≥2} on the simplex of d × d arrays. Next, we use the following lemma, which states that any non-diagonal sequence of maximizers {µ^{(n)}}_n of ψ_n is such that Σ_{r≠s} µ^{(n)}_rs grows proportionally to n.
Lemma 9. For all n ≥ 2, let µ^{(n)} be a maximizer of ψ_n as above; then Σ_{r≠s} µ^{(n)}_rs ≥ c_0 n for some constant c_0 > 0.
By Lemma 9, which we prove at the end of the current argument, we can safely restrict the set of candidate maximizers to those µ such that Σ_{r≠s} µ_rs ≥ c_0 n for some fixed but small c_0 > 0. From here, and by the change of variables µ = nw, mere point-wise convergence suffices to interchange lim inf and sup. Now we present a matching upper bound for lim sup F_n. For ε > 0, let {G_l}_{l=1}^k denote the connected components of the graph G_n, k = ncc(G_n). Also, for w an array of positive entries, let ncc_ε(w) denote the number of connected components of the graph G(w, ε) = (V, E(w, ε)), V = {1, …, d}, E(w, ε) = {(r, s) : r ≠ s, w_rs > ε}. We will also write ncc(w) for ncc_0(w). Let w^{(n)} = µ^{(n)}/n for all n ≥ 2, where µ^{(n)} is defined in Lemma 9. By Theorem 3, we have a corresponding estimate for n sufficiently large. Since the relevant polynomial factor is bounded below by ε^d independently of n, for n sufficiently large the last inequality is obtained by Stirling's formula and taking a supremum over all w. By Lemma 4, ϑ_ε(w) = 0 if and only if M_G(αw, x) ∈ F for some x ∈ [0, 1]^{d×d} such that 0 ≤ x_rs ≤ ε for all (r, s) ∉ E, where G = (V, E) is the graph whose edges are {(r, s) : w_rs ≥ ε}. This constrains the supremum to be achieved in the space of such w for n sufficiently large. Moreover, this condition implies in particular that ‖w1 − w^⊺1‖_{ℓ∞} ≤ 2dα^{−1}ε, where ‖•‖_{ℓ∞} is the ℓ∞ norm of a vector in R^d. Consequently, this yields the upper bound (15) as n → ∞, for all ε > 0.
Next, we argue that as ε → 0, the right-hand side of the above inequality converges to the claimed supremum, thereby establishing the existence of the limit lim F_n along with its precise value. Since the function ε → ncc_ε(w) is non-decreasing for any fixed w, the limit of the right-hand side of (15) as ε → 0 exists by monotone convergence. The limit can be decomposed over partitions, where {V_l}_{l=1}^k ranges over all partitions of the set {1, …, d} into k non-empty subsets, and G_l(w) = (V_l, {(r, s) ∈ V_l × V_l : w_rs > ε}) for all 1 ≤ l ≤ k. Letting ε < c_0, the range of the outer-most maximum becomes 1 ≤ k ≤ d − 1. By concavity of the entropy, the constraint that the graphs G_l(w) must be connected can be safely removed from the maximization problem without changing its maximum value, since it is automatically satisfied. Thus, the inner-most optimization problem is that of a continuous function on a closed and bounded domain that shrinks with ε; its value is therefore a continuous function of ε. Hence, by sending ε to 0, in conjunction with the lower bound (14), we conclude that the limit exists and equals the claimed expression. As a final step, we make this expression a bit more explicit. As argued previously, the supremum in (16) can be decomposed such that one first takes the maximum of ψ(w) over all w such that w_rs = 0 for all (r, s) ∈ V_l × V_{l′}, l ≠ l′, where {V_l}_{1≤l≤k} is a fixed partition of {1, …, d} into non-empty subsets, then maximizes over all such partitions, then over all 1 ≤ k ≤ d − 1, where the constraint c_0 ≤ Σ_{r≠s} w_rs ≤ 1 is not active for c_0 small enough, and hence can be removed. Let w be in the above constraint set. Then H(w) = −Σ_{l=1}^k Σ_{(r,s)∈V_l×V_l} w_rs log w_rs, and this is maximized at the array that is uniform on each block, with maximum value expressed in terms of the matrix X ∈ {0, 1}^{k×d}, X_{l,r} = 1 if and only if r ∈ V_l. Note that D_k is the set of all such matrices (each one corresponding to a partition {V_l} of {1, …, d}). Finally, by maximizing over all possible partitions, and using (11), we get the claimed formula. This completes the proof of Proposition 7, except for the proof of Lemma 9, which we provide below.
Proof of Lemma 9. We first prove the statement with an extra logarithmic factor, and then remove that factor. We proceed by contradiction, by showing that if the statement were not true, then the expected number of non-planted solutions E[Z − 1] would vanish as n → ∞ for any γ > 0, which contradicts our lower bound of Theorem 1. Indeed, with q_max = max{q(µ) : 1 ≤ Σ_{r≠s} µ_rs} < 1, we obtain that E[Z − 1] vanishes for all γ > 0, and this contradicts the fact that below γ_low there are exponentially many distinct satisfying assignments. Now let us assume, towards a contradiction, that the remaining claim fails for the maximizers µ^(n); we construct a sequence of points that have a higher objective value than µ^(n). Instead of working with convergent subsequences, we may as well assume that {µ^(n)} is convergent. Let E_∞ := {(r, s) : lim_n µ^(n)_rs / n > 0} and E_n := {(r, s) : µ^(n)_rs ≥ εn}, for all n and some ε > 0 sufficiently small. Let k_n = ncc(G_n) be the number of connected components of the graph G_n = ({1, …, d}, E_n), and define k_∞ and G_∞ analogously from E_∞. Observe that E_∞ and E_n are both non-empty sets, hence k_∞, k_n ≤ d − 1 for all n. Now we consider an arbitrary partition of the set of vertices {1, …, d} into k_∞ subsets {V_l}_{1≤l≤k_∞}, and let G be the graph on d vertices with edge set ∪_{l=1}^{k_∞} V_l × V_l; i.e., G is the union of k_∞ complete connected components. Finally, let v^(n) := nw for all n, with w the block-uniform candidate maximizer associated with this partition. Recall that this construction provides one of the candidate maximizers of the annealed free energy (see (17)). Observe that v^(n) satisfies all the constraints satisfied by µ^(n), and additionally, v^(n) ∈ F.
Therefore, by Corollary 5, we have the stated lower bound on ψ_n(v^(n)/n); recall that the function ψ_n is defined in (12). On the other hand, to study the asymptotics of ψ_n(µ^(n)/n), we apply Theorem 3 with n replaced by Σ_{r≠s} µ^(n)_rs (which grows to infinity). The first term in the resulting bound follows from Stirling's formula and the fact that µ^(n)_rs / n → 0 for all r ≠ s; the second term follows from a similar limiting estimate. Next, we argue based on these estimates that ψ_n(v^(n)/n) > ψ_n(µ^(n)/n) for all n large enough. First, the term involving ϑ_u in the upper bound on ψ_n(µ^(n)/n) can be dropped, since it is always non-negative. By direct computation (we already showed this in (18)), the difference reduces to an entropy comparison involving p ∈ ∆_{k_∞−1} with p_l = Σ_{r∈V_l} π_r for all 1 ≤ l ≤ k_∞. We show that the right-hand side of this equality is strictly positive: we use Jensen's inequality for the concave function x ↦ log x, together with the fact that Σ_{r∈V_l} π_r² ≤ p_l Σ_{r∈V_l} π_r = p_l² for all l. Moreover, since all coordinates of π are strictly positive, equality holds if and only if π_r = p_l for all l and r ∈ V_l, which implies that the partition must be trivial; i.e., k_∞ = d. Recall that this does not happen, since E_∞ is non-empty.
On the other hand, by taking ε sufficiently small (smaller than all the limits in the definition of E_∞), any edge in E_∞ will eventually (and permanently from then on) belong to E_n. Therefore the number of connected components of G_n does not exceed that of G_∞: k_n ≤ k_∞ for n sufficiently large. We conclude that ψ_n(v^(n)/n) > ψ_n(µ^(n)/n) for all n large enough. Therefore µ^(n) is not always a maximizer of ψ_n, which is the desired contradiction.

Proof of Theorem 3
Our proof is based on Laplace's method from asymptotic analysis: when the entries of µ are large, the sum defining q(µ) is dominated by its largest term, corrected by a sub-exponential factor which is represented by a Gaussian integral (see, e.g., [DB70] for the univariate case). Since we are in a multivariate situation, the asymptotics of q depend on which subset of the entries of µ is large. Our approach is inspired by [AN05]. Recall the expression of q(µ) for µ ∈ Z^{d×d}_+. The graph G will be used to record which entries of µ grow to infinity linearly in n, and which do not. We split the sum defining q into a double sum, one part involving the large terms (denoted A in what follows), and the rest. Let x*(ν, µ) be the optimal solution of the optimization problem defining ϑ(ν, µ).
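As a sanity check on this mechanism, the following small numerical sketch (our own illustration, not from the paper) applies Laplace's method to the univariate sum Σ_k C(m, k) x^k, whose exact value (1 + x)^m is available for comparison: the sum is approximated by its largest term, located at k* ≈ mt with t = x/(1 + x), times the Gaussian correction √(2π m t(1 − t)).

```python
import math

def laplace_approx_binomial_sum(m, x):
    """Approximate log of S = sum_k C(m,k) x^k by Laplace's method:
    the largest term of the sum times a Gaussian correction factor."""
    t = x / (1.0 + x)                 # dominant term sits at k* ~ m*t
    k_star = round(m * t)
    log_peak = (math.lgamma(m + 1) - math.lgamma(k_star + 1)
                - math.lgamma(m - k_star + 1) + k_star * math.log(x))
    # curvature of the log-summand at the peak gives variance m*t*(1-t)
    log_corr = 0.5 * math.log(2 * math.pi * m * t * (1 - t))
    return log_peak + log_corr

m, x = 2000, 0.7
exact = m * math.log1p(x)             # log of the exact value (1+x)^m
approx = laplace_approx_binomial_sum(m, x)
print(exact, approx)                  # the two agree up to O(1/m) corrections
```

The relative error of the log decays as m grows, which is the univariate shadow of the multivariate estimate used for A.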
Before stating our asymptotic approximation result for A, we record an important lemma on the boundedness of the entries of x*(ν, µ), with bounds depending only on ε and α.
Lemma 10. Let G be fixed as above, α ∈ (0, 1) and ε ∈ (0, 1). There exist two constants 0 < c_l ≤ c_u < 1, depending only on d, α and ε, such that the following holds: for all integers n ≥ 1, the entries of x*(ν, µ) all lie in [c_l, c_u]. Therefore, the entries of x* can effectively be treated as constants throughout the rest of the proof. We now state our asymptotic estimate for A.
By the above proposition, we obtain an asymptotic expression for q. The estimate (ignoring the term P_G(µ)) can be interpreted as the expected value of the function e^{−ϑ(ν,µ)} under the law of the random array ν in which each entry ν_rs for (r, s) ∉ E is independently binomial with parameters µ_rs and α. From here, the bounds claimed in Theorem 3 follow immediately.
Proof of Proposition 11. We will show that A(ν′, µ) is comparable, up to the factors made explicit below, to the Gaussian integral of e^{−Σ_{(r,s)∈E} z_rs²/(2µ_rs)} dz.
Then the result follows by applying Proposition 6 to evaluate the Gaussian integral. We proceed by showing the upper and lower bounds separately.
Here, H(x_rs) = −x_rs log x_rs − (1 − x_rs) log(1 − x_rs); the summand in A(ν′, µ) is bounded accordingly, depending on whether or not ν belongs to Ω. The function φ_µ is smooth. Based on Lemma 10, all the entries of x* will be treated as constants. If ν ∈ Ω, then (ν_rs − µ_rs x*_rs)² / (x*_rs (1 − x*_rs) µ_rs) ≤ ℓ_µ(ν) ≤ C(µ)² for all (r, s) ∈ E. Then we have (20). The second term in the sum in (20) is bounded by our choice of C(µ), which satisfies the condition C(µ) = o(µ_rs^{1/2}) for (r, s) ∈ E. On the other hand, let S(µ) denote the Gaussian sum, with ℓ_µ defined in (19); we deliberately omit the dependence of S on ν′ from the notation, as it is inessential. The first sum in (20) is upper bounded by S(µ). Therefore the upper bound holds, with some c_u depending on ε and d, and ε_n → 0 as n → ∞. We now turn our attention to the lower bound, deferring the analysis of the Gaussian sum S(µ) to a subsequent paragraph.
Bounding the Gaussian sum. Here we approximate S by a continuous Gaussian integral, and prove that the two agree up to multiplicative constants, where the symbol "≍" hides constants depending on G, ε, d and α as n → ∞. For ν ∈ F(G) an array of integers such that 0 ≤ ν_rs ≤ µ_rs, let T(ν) := ν + (C ∩ F(G)), where C = [−1/2, 1/2]^E and the sum is understood in the Minkowski sense; T(ν) is a "tile" of side 1 centered at ν. Two crucial facts are (i) T(ν) and T(ν′) have disjoint interiors when ν ≠ ν′, and (ii) T(ν) ⊂ F(G). Now, for a fixed ν, let z ∈ T(ν). For (r, s) ∈ E, we have ν_rs − 1/2 ≤ z_rs ≤ ν_rs + 1/2 and (ν_rs − 1/2)/µ_rs − x*_rs ≤ z_rs/µ_rs − x*_rs ≤ (ν_rs + 1/2)/µ_rs − x*_rs. By integrating both sides of the resulting inequality over T(ν) in the variable z, and summing over all ν with integer entries such that M_G(ν, ν′) ∈ F, we obtain a bound in which vol denotes the volume with respect to the dim(F(G))-dimensional Lebesgue measure. Since M_G(x* ⊙ µ, ν′) ∈ F, we have ν − x* ⊙ µ ∈ F for all ν over which we sum. Moreover, since the tiles T(ν) have mutually disjoint interiors, and their union lies in F(G), the left-hand side is upper bounded (in fact with equality) by the corresponding integral over F(G). Here, to get sharper constants, one could apply a theorem of Vaaler [Vaa79], which states that the volume of any linear subspace intersected with the cube C is at least 1; i.e., vol(C ∩ F(G)) ≥ 1. This yields the upper bound. As for the reverse inequality, slightly more care is needed in constructing the approximation. For a given ν, let Ω_+ = {(r, s) : ν_rs ≥ x*_rs µ_rs + 1/2} and Ω_− = {(r, s) : ν_rs ≤ x*_rs µ_rs − 1/2}.
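The tiling comparison rests on an elementary one-dimensional fact: a Gaussian sum over the integer lattice, where each integer owns a unit tile, is squeezed between shifted Gaussian integrals, and the ratio of sum to integral tends to 1 as the variance grows. A minimal numerical sketch (ours, for illustration only):

```python
import math

def gaussian_lattice_sum(sigma, trunc=50):
    """Sum of exp(-k^2 / (2 sigma^2)) over integers k, truncated far in the tails."""
    lo, hi = -int(trunc * sigma), int(trunc * sigma)
    return sum(math.exp(-k * k / (2 * sigma ** 2)) for k in range(lo, hi + 1))

for sigma in (2.0, 5.0, 20.0):
    S = gaussian_lattice_sum(sigma)
    I = math.sqrt(2 * math.pi) * sigma   # the continuous Gaussian integral
    print(sigma, S / I)                  # ratio is 1 up to exp(-2 pi^2 sigma^2) terms
```

By Poisson summation the discrepancy is of order e^{−2π²σ²}, far smaller than the polynomial factors tracked in the proof.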
On the other hand, (ν_rs − x*_rs µ_rs)² ≤ (ν_rs − x*_rs µ_rs + 1/2)² when (r, s) ∈ Ω_+, and (ν_rs − x*_rs µ_rs)² ≤ (ν_rs − x*_rs µ_rs − 1/2)² when (r, s) ∈ Ω_−. After integrating over T(ν) and summing over all ν ∈ Z^E such that ν − x* ⊙ µ ∈ F, we obtain the reverse bound, which completes the comparison.

Proof of Lemma 10. Recall that x* is the unique minimizer of the function φ_µ, and that the entries of x* admit the expressions (23). Our claim reduces to the boundedness of the differences |λ*_r − λ*_s| for all (r, s) ∈ E, independently of n, µ, ν, r and s. We will shortly prove the inequality (24), where κ(α) = 1/α² + 1/(1 − α)². Assuming it holds, the Cauchy-Schwarz inequality yields the desired bound, since εn ≤ µ_rs ≤ n for all (r, s) ∈ E, and we would then be done. Now, inequality (24) follows from convexity considerations. Let φ be the function being maximized in (23). By concavity of φ, we have a bound in terms of its gradient and Hessian. Substituting in the expression of φ(λ*), the term Σ_{(r,s)∉E} ν_rs(λ*_r − λ*_s) cancels out on both sides. Now we bound the logarithm by the positive part: log(α + (1 − α)e^x) ≤ x_+ = max{0, x} for all x ∈ R and α ∈ (0, 1), so that we almost obtain a quadratic polynomial inequality. We make this a genuine quadratic inequality by an additional elementary approximation, valid for all x ∈ R and α ∈ (0, 1), which is easy to check by verifying that the discriminants of the resulting quadratics (one for x ≥ 0 and one for x < 0) are negative. Inequality (26) then implies (24).

Two proofs of Proposition 6

We first reduce the proof to the case G = K_d by a limiting argument. Let G = (V, E) be a graph on d vertices. If G is not connected, then the constraints defining the space F(G) decouple across the connected components of G, and so does the integrand exp(−½ Σ_{(r,s)∈E} x_rs²/w_rs); therefore the Gaussian integral factors across the connected components of G.
Hence, we may assume that G is connected. Now, if the proposition holds for the complete graph, then sending w_rs → 0 for all (r, s) ∉ E yields the identity for G, up to a constant c(G) > 0 that depends only on G. On the other hand, to identify this constant, we may set w_rs = 1 for all (r, s) ∈ E to clear out the constants: since ∫_{F(G)} e^{−½ Σ_{(r,s)∈E} x_rs²} dx = (2π)^{dim(F(G))/2}, the constant is determined. It thus remains to prove the proposition for the complete graph.

A combinatorial proof
We proceed by adopting a combinatorial view of the structure of the space F. This will lead us to a very special basis of F in which the computations become tractable. (Background on the concepts used in this construction can be found in [Big97].) We first orient K_d in such a way that every pair of distinct vertices is connected by two parallel edges pointing in opposite directions; there are d(d − 1) (oriented) edges in total. The subgraphs whose edges are weighted by an array x ∈ F are then called Eulerian: at each vertex, the total weight of the incoming edges equals that of the outgoing edges. An important property of Eulerian graphs is that they can be decomposed into a superposition of cycles. In particular, fix a spanning tree T* of K_d (the tree uses only one edge, if any, between each pair of vertices, and ignores their orientation). Every edge e ∉ T* can be identified with the oriented cycle C_e in the graph which consists of the oriented edge e and the unique path between the endpoints of e in the tree T* (with the directions of the edges on the path flipped if necessary). Let χ_e ∈ {0, ±1}^{d(d−1)} be the indicator vector of the cycle C_e. Since a cycle is Eulerian, the vector χ_e, when folded into a d × d matrix, belongs to F. Furthermore, the family {χ_e : e ∉ T*} is linearly independent, since each cycle C_e contains at least one edge (namely e) that is contained in no other cycle C_{e′}, e′ ≠ e. There are exactly d(d − 1) − (d − 1) = (d − 1)² off-tree edges in K_d, and this number coincides with the dimension of F. Therefore B = {χ_e : e ∉ T*} is a basis of F, which we henceforth call a fundamental cycle basis of F.
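This construction can be checked numerically. The sketch below is our own (the helper `chi` and the edge ordering are assumptions, not the paper's notation): it builds the fundamental cycle vectors χ_e for the star tree rooted at vertex 0, verifies that each folded vector is an Eulerian flow, and confirms that the (d − 1)² vectors are linearly independent.

```python
import numpy as np

d = 4
edges = [(r, s) for r in range(d) for s in range(d) if r != s]  # oriented, doubled
idx = {e: i for i, e in enumerate(edges)}

tree = {(0, j) for j in range(1, d)}        # star spanning tree T* rooted at 0
offtree = [e for e in edges if e not in tree]

def chi(r, s):
    """Fundamental cycle vector of the off-tree edge e = (r, s): the edge e itself,
    plus the path s -> 0 -> r in the star tree, with signs for orientation."""
    v = np.zeros(len(edges))
    v[idx[(r, s)]] = 1.0
    if s != 0:
        v[idx[(0, s)]] -= 1.0               # tree edge (0, s) traversed backwards
    if r != 0:
        v[idx[(0, r)]] += 1.0               # tree edge (0, r) traversed forwards
    return v

B = np.array([chi(r, s) for (r, s) in offtree])

# each folded cycle vector is Eulerian: in-weight equals out-weight at every vertex
for row in B:
    X = np.zeros((d, d))
    for (r, s), i in idx.items():
        X[r, s] = row[i]
    assert np.allclose(X.sum(axis=1), X.sum(axis=0))

# the (d-1)^2 cycle vectors are linearly independent, hence a basis of F
assert B.shape == ((d - 1) ** 2, d * (d - 1))
assert np.linalg.matrix_rank(B) == (d - 1) ** 2
```

The rank check reflects the dimension count d(d − 1) − (d − 1) = (d − 1)² from the text.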
Let P ∈ {0, ±1}^{(d−1)²×d(d−1)} be the matrix whose rows are indexed by the off-tree edges of the graph, and whose e-th row is equal to χ_e. The matrix P can be regarded as the cycle-edge incidence matrix of the graph K_d: an entry (e, e′) is non-zero if and only if e′ ∈ C_e.
Let M ∈ R^{d(d−1)×d(d−1)} be the diagonal matrix with the entries w_rs, r ≠ s, on the diagonal. Then, by a change of variables,

∫_F e^{−Σ_{r≠s} x_rs²/(2w_rs)} dx = Det(P P⊺)^{1/2} ∫_{R^{(d−1)²}} e^{−z⊺(P M^{−1} P⊺)z/2} dz = (2π)^{(d−1)²/2} Det(P P⊺)^{1/2} Det(P M^{−1} P⊺)^{−1/2}.

Now it remains to show that Det(P M^{−1} P⊺) = Σ_T Π_{(r,s)∉T} w_rs^{−1}, where the sum is over all spanning trees of K_d. This will finish the proof, since we would then have Det(P P⊺) = nst(K_d) = 2^{d−1} d^{d−2} by Cayley's formula for the number of spanning trees in the complete graph (the factor 2^{d−1} accounting for the doubled edges).
We expand the determinant using the Cauchy-Binet formula. Let D = M^{−1/2}, and let E be the set of edges of K_d. If T ≠ T*, we split the edges selected by E ∖ T into those that belong to T* and those that do not. Each column of P D[ : , E ∖ T ] corresponding to an edge outside T ∪ T* contains only one non-zero entry (since such an off-tree edge is contained in only one cycle of B). Therefore all such edges (columns), together with the corresponding cycles (the rows of their non-zero entries), can be successively eliminated from the determinant. Notice that this operation drastically reduces the size of the problem: the common size k of the sets T ∖ T* and T* ∖ T is anywhere between 0 and d − 1 at most. Now we will show that the remaining determinant equals ± Π_{(r,s)∈T*∖T} w_rs^{−1/2}, using a peeling argument slightly more delicate than the one previously applied. Observe that P D[ T ∖ T*, T* ∖ T ] is a k × k cycle-edge incidence matrix, with the k edges in T ∖ T* indexing the rows and the k edges in T* ∖ T indexing the columns, in which a row indexed by e records the edges e′ ∈ T* ∖ T that participate in the cycle C_e. So far, the spanning tree T* was arbitrary. To continue, we choose T* to be the star tree rooted at vertex 1 (see Figure 1, left). This choice simplifies the combinatorial argument to come, because the fundamental cycle basis B is then composed of triangles rooted at vertex 1.
Crucially, this is where the assumption G = K_d is needed: it ensures the existence of a star spanning tree. Figure 1 (right) illustrates the remaining edges after the first elimination procedure. Since T is a tree, each row and each column of the matrix P D[ T ∖ T*, T* ∖ T ] contains at least one non-zero entry (by Lemma 12). Furthermore, T* being the star graph, each cycle C_e ∈ B is a triangle rooted at vertex 1, so each row of the above matrix contains at most two non-zero entries. This is simply because one of the three edges that compose the triangle C_e, namely e, is not selected by the set T* ∖ T that indexes the columns of the matrix. See Figure 1, right (any blue edge has at most two adjacent red edges).
Furthermore, if every row contained exactly two non-zero entries, then by the pigeonhole principle (since |T ∖ T*| = |T* ∖ T|) there would exist three edges among the remaining ones that form a cycle C (see Figure 2, left). However, T is a tree, so this cannot happen. Therefore there must exist at least one row of the matrix with exactly one non-zero entry; i.e., there must exist an edge e ∈ T ∖ T* such that C_e = {e, e₁, e₂} ∈ B with e₁ ∈ T* ∖ T and e₂ ∈ T ∩ T*. See Figure 2. Hence, we can eliminate this row and its corresponding column from the determinant; this corresponds to eliminating (dashing) the edges e and e₁ in the right panel of the figure. Applying this argument iteratively allows us to peel off all the edges and the cycles they belong to (see Figure 3), so that the determinant evaluates to the claimed product. This completes the proof.
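The resulting identity Det(P M^{−1} P⊺) = Σ_T Π_{(r,s)∉T} w_rs^{−1}, with the sum over spanning trees of the doubled K_d (parallel edges counted as distinct), can be checked by brute force for small d. The sketch below is our own code, with an assumed edge ordering; it compares both sides for random weights and also checks Det(P P⊺) = 2^{d−1} d^{d−2}.

```python
import itertools
import numpy as np

d = 3
edges = [(r, s) for r in range(d) for s in range(d) if r != s]  # doubled, oriented
idx = {e: i for i, e in enumerate(edges)}
rng = np.random.default_rng(0)
w = rng.uniform(0.5, 2.0, size=len(edges))

tree = {(0, j) for j in range(1, d)}          # star tree T* rooted at vertex 0
offtree = [e for e in edges if e not in tree]

P = np.zeros((len(offtree), len(edges)))      # fundamental cycle-edge incidence
for i, (r, s) in enumerate(offtree):
    P[i, idx[(r, s)]] = 1.0
    if s != 0:
        P[i, idx[(0, s)]] -= 1.0
    if r != 0:
        P[i, idx[(0, r)]] += 1.0

lhs = np.linalg.det(P @ np.diag(1.0 / w) @ P.T)

def is_spanning_tree(T):
    """d-1 oriented edges span a tree of the doubled K_d iff their undirected
    supports are pairwise distinct and acyclic (checked by union-find)."""
    parent = list(range(d))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for (r, s) in T:
        a, b = find(r), find(s)
        if a == b:
            return False                      # cycle (includes a doubled pair)
        parent[a] = b
    return True

rhs = 0.0
for T in itertools.combinations(edges, d - 1):
    if is_spanning_tree(T):
        out = set(edges) - set(T)
        rhs += float(np.prod([1.0 / w[idx[e]] for e in out]))

print(lhs, rhs)                               # the two sides agree
print(np.linalg.det(P @ P.T))                 # equals 2^(d-1) * d^(d-2)
```

For d = 3 the doubled complete graph has 2² · 3 = 12 spanning trees, matching the Cayley-type count in the text.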
Proof of Lemma 12. Since the entries of the diagonal matrix D are assumed strictly positive, we may take D to be the identity matrix without loss of generality. Assume first that the complement of S contains a cycle, with indicator vector ξ ∈ {0, ±1}^{d(d−1)}. Since B is a fundamental cycle basis, there exists x ∈ R^{(d−1)²} ∖ {0} such that ξ = Σ_{e∉T*} x_e χ_e = P⊺x. Since S selects no edge of the cycle indicated by ξ, it is clear that x⊺P[ : , S ] = 0, and this settles one direction. As for the other direction, let x ∈ R^{(d−1)²} ∖ {0} lie in the null space of (P[ : , S ])⊺. The vector P⊺x records the weights of an Eulerian subgraph of K_d (this vector belongs to F when folded into a d × d matrix). The condition (P[ : , S ])⊺x = 0 implies that this Eulerian subgraph involves no edges from S. In particular, any cycle of this subgraph (and there always exists one) lies in the complement of S. This completes the proof.

An analytic proof
This proof contrasts with the previous, purely combinatorial approach in that it is mainly analytic. It relies on an interpolation argument: the Gaussian integral over F is expressed as the limit, as a parameter δ tends to zero, of another parameterized Gaussian integral I(δ). This latter integral can, on the other hand, be written in closed form by relating it to the characteristic polynomial of a Laplacian matrix. The Principal Minors Matrix-Tree Theorem is then invoked to finish the argument; this final step is the only place where combinatorics appears. (This proof approach was suggested to us by Andrea Sportiello.) Incidentally, this proof can be carried out for an arbitrary graph G; there is no need to reduce to the complete case. For δ > 0, the additional Gaussian term in I(δ) gradually concentrates the mass of the integral on F as δ becomes small, and we have the following limiting statement.

Lemma 13. As δ → 0, I(δ) converges, after suitable normalization, to the Gaussian integral over F.

Combined with the closed form of Lemma 14 below, the limit evaluates to (2d)^{d−1} T(w): it singles out the rooted spanning forests with exactly one root, i.e., rooted spanning trees, from the characteristic polynomial, and there are d ways of choosing the root of a spanning tree. This leads exactly to the desired identity.

Proof of Lemma 13. We decompose R^{d×d} into the direct sum F ⊕ F⊥. It is easy to see that F⊥ = {z = λ1⊺ − 1λ⊺ : λ ∈ R^d}, which is a (d − 1)-dimensional space. For x ∈ R^{d×d}, let y ∈ R^{d×d} be its orthogonal projection onto F, and let z = x − y. Therefore (x − x⊺)1 = (z − z⊺)1 = 2z1 = 2(dλ − (1⊺λ)1). For δ > 0, I(δ) then factors as a double integral over y and z, with integrand e^{−½ Σ_{r,s} (y_rs + z_rs)²/w_rs} e^{−(2/δ²) ‖z1‖²_{ℓ₂}} dy dz.
Proof of Lemma 14. Let δ > 0. We linearize the quadratic term ‖(x − x⊺)1‖²_{ℓ₂} in I(δ) by writing the corresponding Gaussian factor as the Fourier transform of another Gaussian, for all x ∈ R^{d×d}. Then, by Fubini's theorem, the integral takes a closed form in which L(w) ∈ R^{d×d} is the Laplacian matrix of the weighted graph G.
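The combinatorial ingredient of this proof, the (Principal Minors) Matrix-Tree Theorem, can be checked numerically on a small weighted complete graph. In the sketch below (our own, with brute-force enumeration), any (d − 1) × (d − 1) principal minor of the weighted Laplacian equals the spanning tree polynomial T(w), and det(L + δI)/δ → d · T(w) as δ → 0, which is the mechanism by which the limit singles out rooted spanning trees.

```python
import itertools
import numpy as np

d = 5
rng = np.random.default_rng(1)
W = rng.uniform(0.5, 2.0, size=(d, d))
W = (W + W.T) / 2                       # symmetric edge weights
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W          # weighted Laplacian of K_d

# spanning tree polynomial T(w) by brute force over (d-1)-edge subsets
edges = [(r, s) for r in range(d) for s in range(d) if r < s]
def spans(T):
    parent = list(range(d))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for (r, s) in T:
        a, b = find(r), find(s)
        if a == b:
            return False                # acyclic with d-1 edges => spanning tree
        parent[a] = b
    return True
T_w = sum(float(np.prod([W[r, s] for (r, s) in T]))
          for T in itertools.combinations(edges, d - 1) if spans(T))

# matrix-tree theorem: a (d-1)x(d-1) principal minor of L equals T(w)
minor = np.linalg.det(L[1:, 1:])
print(minor, T_w)

# principal-minors version: det(L + delta*I)/delta -> d * T(w) as delta -> 0
delta = 1e-8
print(np.linalg.det(L + delta * np.eye(d)) / delta, d * T_w)
```

The factor d in the limit is exactly the "d ways of choosing the root of a spanning tree" invoked in the argument above.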

Discussion
Our main result, Theorem 1, leaves a gap of essentially a factor of two between γ_low and γ_up. This is a limitation of the methods employed. In particular, the upper bound is likely to be loose, due to the lack of concentration of the random variable Z about its mean: there may exist a non-trivial interval inside [γ_low, γ_up] where Z is typically close to 1 while its expectation is exponentially large. A sharper bound could be obtained by computing E[|Z − 1|^{1/n}], or, perhaps less ambitiously, E[|Z − 1|^β] for some 0 < β < 1. The first quantity would correspond to the free energy of the model in the limit; the quantity |Z − 1|^{1/n} is believed to concentrate for large n, so taking its logarithm before or after averaging would lead to the same outcome. In a different vein, the "sparse" regime, where the sets S_a are of constant size k (exactly or on average), would also be of interest. Here, the relevant scaling is the one where m is proportional to n. The lower bound argument extends easily and yields a bound of order n H(π)/((d − 1) log k). As for the upper bound, one could in principle follow the same first-moment strategy, but our analysis breaks down in a quite serious fashion, in that none of our asymptotic estimates holds true in this regime.
where, for two d × d matrices a and b, M_G(a, b) denotes the d × d matrix with entries a_rs if (r, s) ∈ E and b_rs otherwise. The quantity A will be approximated using the Laplace method. Recall from the expressions (8) and (9) that φ_µ(x) = Σ_{(r,s)∈E} µ_rs D(x_rs ‖ α), and ϑ(ν, µ) = min { φ_µ(x) : x ∈ [0, 1]^{d×d}, M_G(x ⊙ µ, ν) ∈ F }.

If the identity

∫_F e^{−½ Σ_{r,s} x_rs²/w_rs} dx = (2π)^{((d−1)²+d)/2} (Π_{r,s} w_rs)^{1/2} T(w)^{−1/2}, for all w ∈ R^{d×d}_+, where T = T_{K_d},

holds, then taking the limit w_rs → 0 for all (r, s) ∉ E yields the corresponding formula for ∫_{F(G)} e^{−½ Σ_{(r,s)∈E} x_rs²/w_rs} dx. For a matrix A of size n × m, I ⊆ {1, …, n} and J ⊆ {1, …, m}, we denote by A[I, J] the matrix of size |I| × |J| whose rows and columns are indexed by I and J respectively; if I = {1, …, n}, we write A[ : , J], and likewise for the column indices. Then we have

Det(P M^{−1} P⊺) = Σ_{S⊆E, |S|=(d−1)²} Det(P D[ : , S ])². (27)

Now we use the following key lemma, which we prove later.

Lemma 12. Assuming the diagonal entries of the (diagonal) matrix D are positive, the matrix P D[ : , S ] is singular if and only if the graph spanned by the complement S̄ = E ∖ S of S in K_d contains a cycle.

Since there are exactly (d − 1) edges left unchosen by S, this lemma implies that these edges must form a spanning tree in order for the corresponding term to contribute to the sum in (27). Hence Det(P M^{−1} P⊺) = Σ_{T : spanning tree} Det(P D[ : , E ∖ T ])². Fix a spanning tree T of K_d, and observe that if T = T*, then the edges that generate the cycles in the fundamental cycle basis B are exactly the ones selected by E ∖ T. In other words, each row and each column of P D[ : , E ∖ T ] contains exactly one non-zero entry (P[ : , E ∖ T ] is a signed permutation matrix), hence Det(P D[ : , E ∖ T ]) = ± Π_{(r,s)∉T} w_rs^{−1/2}.

Figure 1. Left: the graph K_d, where the star tree T* is highlighted in red. Right: remaining edges in red and blue after the first elimination procedure (violet edges were removed).

Figure 2. Left: an impossible situation, in which there remains a cycle C from which no edge was eliminated in the first step. Right: a possible situation, in which there exists a fundamental cycle C_e = (e, e₁, e₂) with one edge in T* only, one edge in T only, and one edge in their intersection.

Figure 3. An illustration of the peeling process. "Wedges" with one edge in T only and the other in T* only are eliminated successively until no edges remain. Violet edges were eliminated in the first step.