Proceedings of the 2020 SIAM International Conference on Data Mining

Maximizing diversity over clustered data

Abstract

Maximum diversity is the problem of selecting a diverse set of high-quality objects from a collection. It is a fundamental problem with a wide range of applications, e.g., in Web search. Diversity under a uniform or partition matroid constraint naturally captures useful cardinality or budget requirements, and admits simple approximation algorithms [5]. When applied to clustered data, however, popular algorithms such as iterative object selection and local search lose their approximation guarantees for maximum intra-cluster diversity, because they fail to optimize the objective globally. We propose an algorithm that greedily adds a pair of objects at a time instead of a single object, and that attains a constant approximation factor over clustered data. We further extend the algorithm to the case of a monotone submodular quality function and to a partition matroid constraint. We also devise a technique that makes our algorithm scalable, and along the way we obtain a modification that gives better solutions in practice while maintaining the approximation guarantee in theory. Our algorithm achieves excellent performance compared to strong baselines on a mix of synthetic and real-world datasets.
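To make the idea of adding a pair of objects at a time concrete, the following is a minimal sketch of the generic farthest-pair greedy heuristic for plain max-sum diversity under a cardinality constraint. It is not the algorithm of the paper: it ignores the cluster structure, the quality function, and the partition matroid extension. The names `greedy_pairs`, `points`, `k`, and `dist` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def greedy_pairs(points, k, dist):
    """Select k indices by repeatedly adding the farthest remaining pair.

    Illustrative only: a generic pair-greedy heuristic for max-sum
    diversity, with no clusters and no quality function (assumption).
    """
    remaining = set(range(len(points)))
    selected = []
    # Add two objects per step while at least two slots are still open.
    while len(selected) + 1 < k and len(remaining) >= 2:
        i, j = max(
            combinations(sorted(remaining), 2),
            key=lambda p: dist(points[p[0]], points[p[1]]),
        )
        selected += [i, j]
        remaining -= {i, j}
    # For odd k, fill the last slot with an arbitrary remaining object.
    if len(selected) < k and remaining:
        selected.append(min(remaining))
    return selected


if __name__ == "__main__":
    pts = [0, 1, 5, 10]  # toy one-dimensional data
    print(greedy_pairs(pts, 2, lambda a, b: abs(a - b)))
```

Each step scans all remaining pairs, so this naive version takes quadratic time per step; it is meant to show the selection rule, not to scale.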

Published In

Proceedings of the 2020 SIAM International Conference on Data Mining
Pages: 649 - 657
Editors: Carlotta Domeniconi, George Mason University, USA and Nitesh Chawla, University of Notre Dame
ISBN (Online): 978-1-611976-23-6

History

Published online: 26 March 2020

Notes

This work was supported by the Academy of Finland project “Active knowledge discovery in graphs (AGRA)” (313927), the EC H2020 RIA project “SoBigData” (871042), and the Wallenberg AI, Autonomous Systems and Software Program (WASP).
