Abstract

We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high-dimensional Euclidean space, we want to construct a space-efficient data structure that allows us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L1 norm and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, together with search algorithms that run in time nearly linear or nearly quadratic in the dimension. (Depending on the case, the extra factors are polylogarithmic in the size of the database.)
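
To make the problem statement concrete, the following minimal sketch spells out the three distance measures mentioned above and a linear-scan baseline. It is not the data structure constructed in the article; the function names and the approximation parameter `eps` (a candidate counts as "nearly closest" if its distance is within a factor 1 + eps of the true minimum) are assumptions introduced here for illustration only.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def l2(u: Vector, v: Vector) -> float:
    """Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def l1(u: Vector, v: Vector) -> float:
    """L1 (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))


def hamming(u: Vector, v: Vector) -> int:
    """Hamming distance: number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def nearest(database: Sequence[Vector], query: Vector, dist: Distance) -> Vector:
    """Exact nearest neighbor by linear scan over all n database vectors."""
    return min(database, key=lambda p: dist(p, query))


def is_approx_nearest(candidate: Vector, database: Sequence[Vector],
                      query: Vector, dist: Distance, eps: float) -> bool:
    """True if candidate is within a factor (1 + eps) of the closest vector.

    The (1 + eps) criterion is an assumption used here to formalize
    'nearly closest'.
    """
    best = dist(nearest(database, query, dist), query)
    return dist(candidate, query) <= (1 + eps) * best


if __name__ == "__main__":
    db = [(0, 0), (3, 4), (2, 1)]
    q = (1, 0)
    print(nearest(db, q, l2))                         # exact answer by scanning
    print(is_approx_nearest((2, 1), db, q, l2, 0.5))  # accepted for a loose eps
```

The linear scan takes time proportional to the database size times the dimension for every query; the article's aim is to replace that dependence on the database size with polylogarithmic factors, at the cost of a data structure of polynomial size.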

MSC codes

  1. 68Q25

Keywords

  1. nearest neighbor search
  2. data structures
  3. random projections

References

1.
P. K. Agarwal and J. Matoušek, Ray shooting and parametric search, in Proceedings of the 24th Annual ACM Symposium on Theory of Computing, Victoria, Canada, 1992, pp. 517–526.
2.
Michael Molloy, The probabilistic method, Algorithms Combin., Vol. 16, Springer, Berlin, 1998, 1–35
3.
S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions, in Proceedings of the 5th Annual ACM‐SIAM Symposium on Discrete Algorithms, Arlington, VA, 1994, pp. 573–582.
4.
J. S. Beis and D. G. Lowe, Shape indexing using approximate nearest‐neighbor search in high‐dimensional spaces, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Japan, 1997, pp. 1000–1006.
5.
Kenneth Clarkson, A randomized algorithm for closest‐point queries, SIAM J. Comput., 17 (1988), 830–847
6.
K. Clarkson, An algorithm for approximate closest‐point queries, in Proceedings of the 10th Annual ACM Symposium on Computational Geometry, Stony Brook, New York, 1994, pp. 160–164.
7.
S. Cost and S. Salzberg, A weighted nearest neighbor algorithm for learning with symbolic features, Machine Learning, 10 (1993), pp. 57–67.
8.
T. M. Cover and P. E. Hart, Nearest neighbor pattern classification, IEEE Trans. Inform. Theory, 13 (1967), pp. 21–27.
9.
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, Indexing by latent semantic analysis, J. Amer. Soc. Inform. Sci., 41 (1990), pp. 391–407.
10.
L. Devroye, T. Wagner, Nearest neighbor methods in discrimination, Handbook of Statist., Vol. 2, North‐Holland, Amsterdam, 1982, 193–197
11.
David Dobkin, Richard Lipton, Multidimensional searching problems, SIAM J. Comput., 5 (1976), 181–186
12.
Danny Dolev, Yuval Harari, Nathan Linial, Noam Nisan, Michal Parnas, Neighborhood preserving hashing and approximate queries, ACM, New York, 1994, 251–259
13.
D. Dolev, Y. Harari, and M. Parnas, Finding the neighborhood of a query in a dictionary, in Proceedings of the 2nd Israel Symposium on the Theory of Computing and Systems, IEEE Computer Society Press, Los Alamitos, CA, 1993, pp. 33–42.
14.
R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, New York, 1973.
15.
T. Figiel, J. Lindenstrauss, V. Milman, The dimension of almost spherical sections of convex bodies, Acta Math., 139 (1977), 53–94
16.
M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, Query by image and video content: The QBIC system, IEEE Computer, 28 (1995), pp. 23–32.
17.
U. Feige, D. Peleg, P. Raghavan, and E. Upfal, Computing with unreliable information, in Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, MD, 1990, pp. 128–137.
18.
A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic, Dordrecht, The Netherlands, 1991.
19.
O. Goldreich and L. Levin, A hard‐core predicate for all one‐way functions, in Proceedings of the 21st Annual ACM Symposium on Theory of Computing, Seattle, WA, 1989, pp. 25–32.
20.
T. Hastie and R. Tibshirani, Discriminant adaptive nearest neighbor classification, in 1st International ACM Conference on Knowledge Discovery and Data Mining, ACM, New York, 1995.
21.
P. Indyk and R. Motwani, Approximate nearest neighbors: Towards removing the curse of dimensionality, in Proceedings of the 30th Annual ACM Symposium on Theory of Computing, Dallas, TX, 1998, pp. 604–613.
22.
P. Indyk, R. Motwani, P. Raghavan, and S. Vempala, Locality‐preserving hashing in multidimensional spaces, in Proceedings of the 29th Annual ACM Symposium on Theory of Computing, El Paso, TX, 1997, pp. 618–625.
23.
W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into Hilbert space, Contemp. Math., 26 (1984), pp. 189–206.
24.
J. Kleinberg, Two algorithms for nearest‐neighbor search in high dimensions, in Proceedings of the 29th Annual ACM Symposium on Theory of Computing, El Paso, TX, 1997, pp. 599–608.
25.
Bruno Codenotti, Gianna Del Corso, Giovanni Manzini, Matrix rank and communication complexity, Linear Algebra Appl., 304 (2000), 193–200
26.
Nathan Linial, Eran London, Yuri Rabinovich, The geometry of graphs and some of its algorithmic applications, Combinatorica, 15 (1995), 215–245
27.
Nathan Linial, Ori Sasson, Non‐expansive hashing, ACM, New York, 1996, 509–518
28.
Jiří Matoušek, Reporting points in halfspaces, IEEE Comput. Soc. Press, Los Alamitos, CA, 1991, 207–215
29.
S. Meiser, Point location in arrangements of hyperplanes, Inform. and Comput., 106 (1993), 286–303
30.
K. Mulmuley, Computational Geometry: An Introduction Through Randomized Algorithms, Prentice Hall, Englewood Cliffs, NJ, 1993.
31.
A. Pentland, R. W. Picard, and S. Sclaroff, Photobook: tools for content‐based manipulation of image databases, in Proceedings of the SPIE Conference on Storage and Retrieval of Image and Video Databases II, San Jose, CA, SPIE Proceedings 2185, Bellingham, WA, 1994, pp. 34–37.
32.
G. Mihlin, R. Piotrovskiĭ, V. Frumkin, Contraction of word codes in automatic text processing, Naučn.‐Tehn. Informacija (VINITI) Ser. 2. Informacionnye Processy i Sistemy, (1974), 0–028–30, 40
33.
A. W. M. Smeulders and R. Jain, Proceedings of the 1st Workshop on Image Databases and Multi‐Media Search, World Sci. Ser. Software Engrg. and Knowledge Engrg. 8, World Scientific, River Edge, NJ, 1996.
34.
V. N. Vapnik and A. Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory Probab. Appl., 16 (1971), pp. 264–280.
35.
A. C. Yao and F. F. Yao, A general approach to d‐dimension geometric queries, in Proceedings of the 17th Annual ACM Symposium on Theory of Computing, Providence, RI, 1985, pp. 163–168.

Published In

SIAM Journal on Computing
Pages: 457 - 474
ISSN (online): 1095-7111

History

Published online: 27 July 2006
