Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms

Analysis of Ward's Method

Abstract

We study Ward's method for the hierarchical k-means problem. This popular greedy heuristic is based on the complete linkage paradigm: Starting with all data points as singleton clusters, it successively merges two clusters to form a clustering with one cluster less. The pair of clusters is chosen to (locally) minimize the k-means cost of the clustering in the next step.
Complete linkage algorithms are very popular for hierarchical clustering problems, yet their theoretical properties have been studied relatively little. For the Euclidean k-center problem, Ackermann et al. [1] show that the k-clustering in the hierarchy computed by complete linkage has a worst-case approximation ratio of Θ(log k). If the data lies in ℝ^d for constant dimension d, the guarantee improves to O(1) [23], but the O-notation hides a linear dependence on d. Complete linkage for k-median or k-means has not been analyzed so far.
In this paper, we show that Ward's method computes a 2-approximation with respect to the k-means objective function if the optimal k-clustering is well separated. If additionally the optimal clustering also satisfies a balance condition, then Ward's method fully recovers the optimum solution. These results hold in arbitrary dimension. We accompany our positive results with a lower bound of Ω((3/2)^d) for data sets in ℝ^d that holds if no separation is guaranteed, and with lower bounds when the guaranteed separation is not sufficiently strong. Finally, we show that Ward produces an O(1)-approximative clustering for one-dimensional data sets.
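
The greedy merge rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`centroid`, `merge_cost`, `kmeans_cost`, `ward`) are hypothetical, and the brute-force search over all pairs is chosen for readability rather than speed. It assumes the standard identity that merging clusters A and B raises the k-means cost by |A||B|/(|A|+|B|) · ‖μ_A − μ_B‖², so the cheapest merge is the one that locally minimizes the cost of the next clustering.

```python
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    dim = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / len(cluster) for i in range(dim))


def merge_cost(a, b):
    """Increase in k-means cost caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def kmeans_cost(clusters):
    """Sum of squared distances of all points to their cluster centroid."""
    total = 0.0
    for c in clusters:
        mu = centroid(c)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in c)
    return total


def ward(points, k):
    """Start from singletons and greedily merge until k clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        # Pick the pair whose merge increases the k-means cost the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append(merged)
    return clusters
```

Calling `ward(points, k)` for decreasing k yields the levels of the hierarchy; each level could be compared against an optimal k-means solution using `kmeans_cost`.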

Published In

Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms
Pages: 2939 - 2957
Editor: Timothy M. Chan, University of Illinois at Urbana-Champaign, USA
ISBN (Online): 978-1-61197-548-2

History

Published online: 2 January 2019

Notes

* This research was supported by ERC Starting Grant 306465 (BeyondWorstCase).
