Proceedings of the 2010 SIAM International Conference on Data Mining

Mining Top-K Patterns from Binary Datasets in presence of Noise


The discovery of patterns in binary datasets has many applications, e.g., in electronic commerce, TCP/IP networking, and Web usage logging. Still, it is a challenging task in several respects: handling overlapping vs. non-overlapping patterns, coping with the presence of noise, and extracting only the most important patterns.
In this paper we formalize the problem of discovering the Top-K patterns from binary datasets in the presence of noise as the minimization of a novel cost function. In accordance with the Minimum Description Length (MDL) principle, the proposed cost function favors succinct pattern sets that describe the input data, possibly only approximately.
We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
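The abstract does not spell out the cost function or the search procedure, so the following is only a simplified illustration of the general idea, not PaNDa itself. It assumes that a pattern is a (rows, columns) rectangle of the binary matrix, that the cost of a pattern set is the total number of rows and columns listed across all patterns plus the number of cells where the union of the patterns disagrees with the data (uncovered 1s and wrongly covered 0s), and that a plain greedy loop picks at most K patterns from a fixed candidate list. The names `cost` and `greedy_top_k`, the set-of-cells representation, and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: a binary matrix is the set of (row, col)
# cells equal to 1; a pattern is a (rows, cols) pair whose cross product
# it claims is all ones.
Cell = Tuple[int, int]
Pattern = Tuple[FrozenSet[int], FrozenSet[int]]


def cost(data: Set[Cell], patterns: List[Pattern]) -> int:
    """Simplified MDL-style cost (an assumption): description size + noise."""
    covered: Set[Cell] = set()
    size = 0
    for rows, cols in patterns:
        size += len(rows) + len(cols)  # entries needed to list the pattern
        covered |= {(r, c) for r in rows for c in cols}
    # Noise = 1s left uncovered (false negatives) plus
    # 0s wrongly covered (false positives).
    noise = len(data ^ covered)
    return size + noise


def greedy_top_k(
    data: Set[Cell], candidates: List[Pattern], k: int
) -> Tuple[List[Pattern], int]:
    """Repeatedly add the candidate that lowers the cost most, up to k patterns."""
    chosen: List[Pattern] = []
    best = cost(data, chosen)
    while len(chosen) < k:
        scored = [
            (cost(data, chosen + [p]), i)
            for i, p in enumerate(candidates)
            if p not in chosen
        ]
        if not scored:
            break
        new_cost, i = min(scored)  # ties broken by candidate order
        if new_cost >= best:       # no remaining candidate reduces the cost
            break
        chosen.append(candidates[i])
        best = new_cost
    return chosen, best


def pat(rows, cols) -> Pattern:
    return (frozenset(rows), frozenset(cols))


# Toy data: a 4x4 block of 1s missing cell (3, 3), plus an isolated 1 at (5, 5).
data = ({(r, c) for r in range(4) for c in range(4)} - {(3, 3)}) | {(5, 5)}
candidates = [
    pat(range(4), range(4)),  # the full block, tolerating one missing cell
    pat(range(3), range(4)),  # an exact but smaller sub-block
    pat([5], [5]),            # the isolated cell on its own
]

chosen, total = greedy_top_k(data, candidates, k=3)
print(len(chosen), total)  # prints "1 10"
```

In this toy matrix the loop keeps the large block despite its missing cell and does not spend a separate pattern on the isolated 1 at (5, 5): accepting both as noise is cheaper than describing them, which is the kind of trade-off an MDL-style cost is meant to capture.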

Published In

Proceedings of the 2010 SIAM International Conference on Data Mining
Pages: 165 - 176
Editors: Srinivasan Parthasarathy, The Ohio State University, Columbus, Ohio, Bing Liu, University of Illinois – Chicago, Chicago, Illinois, Bart Goethals, University of Antwerp, Antwerpen, Belgium, Jian Pei, Simon Fraser University, Burnaby, British Columbia, Canada, and Chandrika Kamath, Lawrence Livermore National Laboratory, Livermore, California
ISBN (Print): 978-0-898717-03-7
ISBN (Online): 978-1-61197-280-1


Published online: 18 December 2013