Abstract

The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. In this paper, we propose a new acceleration for exact k-means that gives the same answer as the standard algorithm but is much faster in practice. Like Elkan's accelerated algorithm [8], our algorithm avoids distance computations by using distance bounds and the triangle inequality. Our algorithm keeps a single novel lower bound per point on point-center distances, which allows it to skip the innermost k-means loop 80% of the time or more in our experiments. On datasets of low and medium dimension (e.g., up to 50 dimensions), our algorithm is much faster than other methods, including methods based on low-dimensional indexes such as k-d trees. It also has two further advantages: it is very simple to implement, and its memory overhead is very small, much smaller than that of other accelerated algorithms.

Published In

Proceedings of the 2010 SIAM International Conference on Data Mining
Pages: 130-140
Editors: Srinivasan Parthasarathy, The Ohio State University, Columbus, Ohio, Bing Liu, University of Illinois – Chicago, Chicago, Illinois, Bart Goethals, University of Antwerp, Antwerpen, Belgium, Jian Pei, Simon Fraser University, Burnaby, British Columbia, Canada, and Chandrika Kamath, Lawrence Livermore National Laboratory, Livermore, California
ISBN (Print): 978-0-898717-03-7
ISBN (Online): 978-1-61197-280-1

History

Published online: 18 December 2013

Authors

Greg Hamerly
Department of Computer Science, Baylor University
