An efficient algorithm to discover large and frequent itemset in high dimensional data
Main Author: | |
---|---|
Format: | Monograph |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/70312/ http://irep.iium.edu.my/70312/1/FRGS_Closing_Report.pdf |
Summary: | The current trend in data collection involves a small number of observations with a very large number of variables, known as high dimensional data. Mining these data produces an explosive number of smaller itemsets that are less important than colossal (large) ones. As Frequent Itemset Mining moves towards mining colossal itemsets, it is important to understand the challenges involved in order to formulate a better method: one that runs faster, scales better and produces useful, interesting knowledge. To this end, this research proposed two new algorithms, RARE and RARE II, which mine colossal closed itemsets. Both algorithms apply a minimum cardinality threshold to limit the search space, and both use a closure computation method that does not require storing previously discovered itemsets for duplicate checking. These approaches reduced both the memory and the time the algorithms need to finish mining tasks. RARE searches the rowset lattice in a breadth-first manner, which results in fewer itemset intersections than the state-of-the-art algorithms CARPENTER and IsTa. RARE II further reduces itemset intersections by evaluating only the closed rowsets when mining the next closed itemsets. Although the different thresholds used by CARPENTER and IsTa make direct comparison difficult, RARE and RARE II proved better: they finished mining all closed itemsets in less time, whereas CARPENTER and IsTa discovered only a fraction of the closed itemsets, over a much longer running time, before running out of memory. |
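The summary names the ideas behind RARE (breadth-first search of the rowset lattice, a minimum cardinality threshold, and duplicate-free closure computation without a store of earlier results) but not the procedure itself. As a rough illustration only, the Python sketch below shows a generic breadth-first row-enumeration approach to closed itemset mining built from those ideas. It is not RARE or RARE II: the function name `closed_itemsets_bfs`, the parameter `min_size`, and the duplicate check (report an itemset only when the rowset that generated it equals its full support set) are assumptions made for this example. It relies on the fact that every intersection of rows is a closed itemset.

```python
from typing import FrozenSet, List, Tuple


# Hypothetical helper name; not the RARE / RARE II implementation.
def closed_itemsets_bfs(
    rows: List[FrozenSet[str]], min_size: int
) -> List[Tuple[FrozenSet[str], FrozenSet[int]]]:
    """Enumerate rowsets breadth-first and return (closed itemset, support rowset)
    pairs whose itemset has at least `min_size` items."""
    results: List[Tuple[FrozenSet[str], FrozenSet[int]]] = []
    # Level 1: every single row is a rowset; its itemset is the row itself.
    frontier = [((i,), row) for i, row in enumerate(rows) if len(row) >= min_size]
    while frontier:
        next_frontier = []
        for rowset, itemset in frontier:
            # Full support set: every row containing all items of the itemset.
            support = frozenset(j for j, row in enumerate(rows) if itemset <= row)
            # Assumed duplicate check: a closed itemset has exactly one support set,
            # so emitting only when the generating rowset equals it reports each
            # closed itemset once without storing earlier results.
            if support == frozenset(rowset):
                results.append((itemset, support))
            # Extend only with higher-indexed rows so each rowset is built once.
            for j in range(rowset[-1] + 1, len(rows)):
                candidate = itemset & rows[j]
                # Adding rows can only shrink the intersection, so a candidate
                # below the minimum cardinality (and all its extensions) is pruned.
                if len(candidate) >= min_size:
                    next_frontier.append((rowset + (j,), candidate))
        frontier = next_frontier
    return results


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("abd"), frozenset("acd")]
    for items, support in closed_itemsets_bfs(data, min_size=2):
        print(sorted(items), sorted(support))
```

On the three rows `abc`, `abd` and `acd` with `min_size=2`, the sketch reports six closed itemsets (the three rows themselves plus `ab`, `ac` and `ad`), while the single item `a` is pruned by the cardinality threshold. Recomputing each support set by scanning the rows trades extra time for not keeping previously found itemsets in memory; the actual algorithms may make this trade differently.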