DisClose: Discovering colossal closed itemsets via a memory efficient compact row-tree
Main Authors: | , , |
---|---|
Format: | Book Chapter |
Language: | English |
Published: | Springer Berlin Heidelberg, 2013 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/51446/ http://irep.iium.edu.my/51446/1/DisClose_2013.pdf http://irep.iium.edu.my/51446/4/51446-DisClose_Discovering_Colossal_Closed_Itemsets-SCOPUS.pdf |
Summary: | A recent focus in itemset mining has been the discovery of frequent itemsets from high-dimensional datasets. Because running time grows exponentially as the average row length increases, most conventional algorithms become impractical for mining such datasets. Unfortunately, large-cardinality itemsets are likely to be more informative than small-cardinality itemsets in this type of dataset. This paper proposes an approach, termed DisClose, to extract large-cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a Compact Row-Tree data structure to represent itemsets during the search process. Large-cardinality itemsets are enumerated first, followed by smaller ones. In addition, we use a minimum cardinality threshold to further reduce the search space. Experimental results show that DisClose can extract colossal closed itemsets from the tested datasets, even at low support thresholds. The algorithm discovers closed itemsets directly, without needing to check whether each new closed itemset has previously been found. |
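For readers unfamiliar with row-oriented search, the sketch below illustrates the general idea in Python. It is a naive row-enumeration example written for illustration, not the Compact Row-Tree or the DisClose algorithm, whose details are not given in this record. It relies on two standard facts: the intersection of any group of rows is a closed itemset, and adding another row can only shrink that intersection, so a branch can be pruned once it drops below a minimum cardinality threshold. The function name `colossal_closed_itemsets` and the parameters `min_sup` and `min_card` are hypothetical. Unlike DisClose, this sketch does check for previously found itemsets (via a dictionary), and it sorts results by cardinality afterwards rather than enumerating large itemsets first.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def colossal_closed_itemsets(
    rows: List[Set[str]], min_sup: int, min_card: int
) -> List[Tuple[Itemset, int]]:
    """Naive row-enumeration sketch (hypothetical helper, NOT DisClose's Compact Row-Tree).

    Returns every closed itemset with support >= min_sup and at least
    min_card items, as (itemset, support) pairs, largest itemsets first.
    Assumes min_card >= 1.
    """
    found: Dict[Itemset, int] = {}

    def support(itemset: Itemset) -> int:
        # Number of rows that contain every item of the itemset.
        return sum(1 for row in rows if itemset <= row)

    def extend(start: int, current: Itemset, depth: int) -> None:
        # `current` is the intersection of the `depth` rows chosen so far,
        # so it is always a closed itemset whose support is at least `depth`.
        if depth >= min_sup and current not in found:
            # Explicit duplicate check: DisClose reportedly avoids this step.
            found[current] = support(current)
        for i in range(start, len(rows)):
            shrunk = current & rows[i]
            # Minimum cardinality pruning: adding more rows can only shrink
            # the intersection, so a too-small itemset is never extended.
            if len(shrunk) >= min_card:
                extend(i + 1, shrunk, depth + 1)

    for i, row in enumerate(rows):
        if len(row) >= min_card:
            extend(i + 1, frozenset(row), 1)

    # Report large-cardinality itemsets first, ties broken alphabetically.
    return sorted(found.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0])))


if __name__ == "__main__":
    data = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"a", "b", "d", "e"}, {"c", "d", "e"}]
    for itemset, sup in colossal_closed_itemsets(data, min_sup=2, min_card=2):
        print(sorted(itemset), sup)
```

On the four-row example, the sketch lists {a, b, c}, {a, b, d} and {a, b, e} (support 2) before {a, b} (support 3) and the 2-item sets {c, d}, {c, e} and {d, e}. Raising `min_card` to 3 leaves only the three 3-item sets. The cost of this kind of search grows with the number of row combinations rather than item combinations, which is why row-oriented search suits datasets with few rows and very many columns.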