Consider a population consisting of $n$ individuals, each of whom has one of $d$ types (e.g., blood types, in which case $d=4$). We are allowed to query this population by specifying a subset of it, and in response we observe a noiseless histogram (a $d$-dimensional vector of counts) of types of the pooled individuals. This measurement model arises in practical situations such as pooling of genetic data and may also be motivated by privacy considerations. We are interested in the number of queries one needs to unambiguously determine the type of each individual. We study this information-theoretic question under the random, dense setting where in each query, a random subset of individuals of size proportional to $n$ is chosen. This makes the problem a particular example of a random constraint satisfaction problem (CSP) with a “planted” solution. We establish upper and lower bounds on the minimum number of queries $m$ such that there is no solution other than the planted one with probability tending to one as $n\to\infty$. The bounds are nearly matching. Our proof relies on the computation of the exact “annealed free energy” of this model in the thermodynamic limit, which corresponds to an exponential rate of decay of the expected number of solutions to this planted CSP. As a by-product of the analysis, we derive an identity of independent interest relating the Gaussian integral over the space of Eulerian flows of a graph to its spanning tree polynomial.
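To make the measurement model concrete, the following is a minimal sketch, not the authors' method. It plants a random type assignment, draws random pools of a fixed size proportional to $n$, records the noiseless histograms, and counts by brute force how many assignments agree with every histogram. The function names, the parameter `alpha` (pool size as a fraction of $n$), and the fixed-size uniform sampling scheme are assumptions made for illustration. The enumeration is exponential in $n$ and only feasible for very small populations.

```python
import itertools
import random


def histogram(types, pool, d):
    """Return the d-dimensional vector of type counts for the pooled individuals."""
    counts = [0] * d
    for i in pool:
        counts[types[i]] += 1
    return tuple(counts)


def count_consistent(n, d, pools, observed):
    """Count assignments in {0, ..., d-1}^n that reproduce every observed histogram."""
    total = 0
    for candidate in itertools.product(range(d), repeat=n):
        if all(histogram(candidate, pool, d) == h
               for pool, h in zip(pools, observed)):
            total += 1
    return total


def simulate(n=8, d=2, m=6, alpha=0.5, seed=0):
    """Plant a random assignment, query m random pools, and count the solutions.

    Assumption: each pool is a uniformly random subset of size round(alpha * n).
    """
    rng = random.Random(seed)
    planted = [rng.randrange(d) for _ in range(n)]
    k = round(alpha * n)
    pools = [rng.sample(range(n), k) for _ in range(m)]
    observed = [histogram(planted, pool, d) for pool in pools]
    return planted, count_consistent(n, d, pools, observed)


if __name__ == "__main__":
    planted, num_solutions = simulate()
    print("planted assignment:", planted)
    print("assignments consistent with all queries:", num_solutions)
```

The planted assignment is always consistent, so the count is at least one. The question studied in the abstract is how large $m$ must be for that count to equal exactly one with probability tending to one as $n\to\infty$.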
