Proceedings of the 2017 SIAM International Conference on Data Mining

A Dual-Tree Algorithm for Fast k-means Clustering With Large k

Abstract

k-means is a widely used clustering algorithm, but for k clusters and a dataset size of N, each iteration of Lloyd's algorithm costs O(kN) time. This is problematic because increasingly, applications of k-means involve both large N and large k, and there are no accelerated variants that handle this situation. To this end, we propose a dual-tree algorithm that gives the exact same results as standard k-means; when using cover trees, we bound the single-iteration runtime of the algorithm as O(N + k log k), under some assumptions. To our knowledge these are the first sub-O(kN) bounds for exact Lloyd iterations. The algorithm performs competitively in practice, especially for large N and k in low dimensions. Further, the algorithm is tree-independent, so any type of tree may be used.
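For reference, the O(kN) cost mentioned above comes from the assignment step of a standard Lloyd iteration, where every one of the N points is compared against every one of the k centroids. The following is a minimal pure-Python sketch of that brute-force baseline only; it is not the dual-tree algorithm proposed in the paper, and the names `squared_distance` and `lloyd_iteration` are illustrative assumptions rather than anything taken from the authors' code.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_iteration(
    points: Sequence[Point], centroids: Sequence[Point]
) -> Tuple[List[List[float]], List[int]]:
    """One brute-force Lloyd iteration (hypothetical helper for illustration).

    Returns the updated centroids and the cluster index assigned to each point.
    """
    k = len(centroids)
    dim = len(centroids[0])
    assignments: List[int] = []
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k

    # Assignment step: N points times k centroids = O(kN) distance evaluations.
    for p in points:
        best = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
        assignments.append(best)
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]

    # Update step: each centroid moves to the mean of its assigned points.
    # An empty cluster keeps its previous centroid (one common convention).
    new_centroids = [
        [s / counts[j] for s in sums[j]] if counts[j] > 0 else list(centroids[j])
        for j in range(k)
    ]
    return new_centroids, assignments


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(1.0, 1.0), (9.0, 1.0)]
new_centroids, assignments = lloyd_iteration(points, centroids)
```

Any accelerated method that claims to be exact, as the abstract does for the dual-tree approach, would have to return the same centroids and assignments as this baseline while performing fewer distance evaluations.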

Published In

Proceedings of the 2017 SIAM International Conference on Data Mining
Pages: 300 - 308
Editors: Nitesh Chawla, University of Notre Dame, Notre Dame, Indiana, USA and Wei Wang, University of California, Los Angeles, California, USA
ISBN (Online): 978-1-61197-497-3

History

Published online: 9 June 2017
