Abstract

Relational joins are at the core of relational algebra, which in turn is the core of the standard database query language SQL. As their evaluation is expensive and very often dominated by the output size, it is an important task for database query optimizers to compute estimates on the size of joins and to find good execution plans for sequences of joins. We study these problems from a theoretical perspective, both in the worst-case model and in an average-case model where the database is chosen according to a known probability distribution. In the former case, our first key observation is that the worst-case size of a query is characterized by the fractional edge cover number of its underlying hypergraph, a combinatorial parameter previously known to provide an upper bound. We complete the picture by proving a matching lower bound and by showing that there exist queries for which the join-project plan suggested by the fractional edge cover approach may be substantially better than any join plan that does not use intermediate projections. On the other hand, we show that in the average-case model, every join-project plan can be turned into a plan containing no projections in such a way that the expected time to evaluate the plan increases only by a constant factor independent of the size of the database. Not surprisingly, the key combinatorial parameter in this context is the maximum density of the underlying hypergraph. We show how to make effective use of this parameter to eliminate the projections.
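
To make the worst-case bound concrete, the following minimal sketch checks it on the triangle query R(a,b) ⋈ S(b,c) ⋈ T(a,c). It assumes the standard form of the bound: if the weights x_e form a fractional edge cover of the query hypergraph (every attribute receives total weight at least 1), then the join has at most prod_e |R_e|^{x_e} tuples. The helper names, the brute-force join, and the 3-by-3 example instance are illustrative assumptions and are not taken from the article.

```python
from itertools import product
from math import prod


def is_fractional_edge_cover(edges, weights, tol=1e-9):
    """Return True if every attribute is covered with total weight >= 1."""
    attrs = {a for e in edges.values() for a in e}
    return all(
        sum(weights[name] for name, e in edges.items() if a in e) >= 1 - tol
        for a in attrs
    )


def size_bound(relations, weights):
    """Upper bound prod_e |R_e|^{x_e} for a fractional edge cover x."""
    return prod(len(rel) ** weights[name] for name, rel in relations.items())


def natural_join_size(edges, relations):
    """Brute force: count attribute assignments consistent with every relation."""
    attrs = sorted({a for e in edges.values() for a in e})
    domain = sorted({v for rel in relations.values() for t in rel for v in t})
    count = 0
    for values in product(domain, repeat=len(attrs)):
        row = dict(zip(attrs, values))
        if all(
            tuple(row[a] for a in edges[name]) in relations[name]
            for name in edges
        ):
            count += 1
    return count


# Hypothetical instance: each relation is the full 3 x 3 grid (9 tuples).
edges = {"R": ("a", "b"), "S": ("b", "c"), "T": ("a", "c")}
weights = {"R": 0.5, "S": 0.5, "T": 0.5}  # total weight 3/2
k = 3
full = {(i, j) for i in range(k) for j in range(k)}
relations = {"R": full, "S": full, "T": full}

covered = is_fractional_edge_cover(edges, weights)
bound = size_bound(relations, weights)
actual = natural_join_size(edges, relations)
print(covered, bound, actual)
```

On this instance the join size meets the bound exactly (27 tuples against 9^{3/2} = 27), which illustrates the kind of tightness the matching lower bound is about. A practical implementation would compute the optimal weights with a linear programming solver instead of fixing them by hand.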

Keywords

  1. fractional edge cover
  2. linear programming
  3. join
  4. database query
  5. query plan

MSC codes

  1. 68P15
  2. 68Q25

Published In

SIAM Journal on Computing
Pages: 1737 - 1767
ISSN (online): 1095-7111

History

Submitted: 19 December 2011
Accepted: 5 June 2013
Published online: 22 August 2013
