An efficient algorithm to discover large and frequent itemset in high dimensional data

The current trend of data collection involves a small number of observations with a very large number of variables, known as high dimensional data. Mining these data produces an explosive number of smaller itemsets which are less important than colossal (large) ones. As the trend in Frequent Itemset...


Bibliographic Details
Main Author: Zulkurnain, Nurul Fariza
Format: Monograph
Language: English
Published: 2019
Subjects: TK7885 Computer engineering
Online Access: http://irep.iium.edu.my/70312/
http://irep.iium.edu.my/70312/1/FRGS_Closing_Report.pdf
id iium-70312
recordtype eprints
spelling iium-70312 2019-12-01T00:05:58Z http://irep.iium.edu.my/70312/ An efficient algorithm to discover large and frequent itemset in high dimensional data Zulkurnain, Nurul Fariza TK7885 Computer engineering
2019-01-29 Monograph NonPeerReviewed application/pdf en http://irep.iium.edu.my/70312/1/FRGS_Closing_Report.pdf Zulkurnain, Nurul Fariza (2019) An efficient algorithm to discover large and frequent itemset in high dimensional data. Project Report. UNSPECIFIED. (Unpublished)
repository_type Digital Repository
institution_category Local University
institution International Islamic University Malaysia
building IIUM Repository
collection Online Access
language English
topic TK7885 Computer engineering
description The current trend in data collection involves a small number of observations with a very large number of variables, known as high-dimensional data. Mining such data produces an explosive number of small itemsets, which are less important than colossal (large) ones. As Frequent Itemset Mining moves towards mining colossal itemsets, it is important to understand the challenges in order to formulate a better method that runs faster, scales better and produces useful and interesting knowledge. For this reason, this research proposes two new algorithms, RARE and RARE II, which mine colossal closed itemsets. Both algorithms apply a minimum cardinality threshold to limit the search space, and a closure computation method that does not require storing previously discovered itemsets for duplicate checking. These approaches reduce both the memory and the time the algorithms need to finish mining tasks. RARE searches the rowset lattice in a breadth-first manner, which results in fewer itemset intersections than in other state-of-the-art algorithms, CARPENTER and IsTa. RARE II reduces itemset intersections further by evaluating only the closed rowsets in order to mine the next closed itemsets. Although the different thresholds used in CARPENTER and IsTa make direct comparison difficult, RARE and RARE II performed better: they finished mining all closed itemsets in less time, whereas CARPENTER and IsTa discovered only a fraction of the closed itemsets, over a much longer time, before running out of memory.
format Monograph
author Zulkurnain, Nurul Fariza
title An efficient algorithm to discover large and frequent itemset in high dimensional data
publishDate 2019
url http://irep.iium.edu.my/70312/
http://irep.iium.edu.my/70312/1/FRGS_Closing_Report.pdf
first_indexed 2023-09-18T21:39:49Z
last_indexed 2023-09-18T21:39:49Z
_version_ 1777413064931409920
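
The description above mentions three ideas: breadth-first search over rowsets, a minimum cardinality threshold that limits the search space, and a closure computation that avoids storing previously discovered itemsets for duplicate checking. The Python sketch below illustrates how these ideas can fit together on a toy dataset. It is not the RARE or RARE II algorithm from the report, whose details are not given in this record. The function name, parameters and data are hypothetical, and the duplicate check is a swapped-in technique: a prefix-preserving closure test of the kind used in LCM-style miners, applied here to rows instead of items.

```python
from collections import deque


def mine_closed_itemsets(rows, min_support=1, min_cardinality=1):
    """Breadth-first row enumeration of closed itemsets (illustrative sketch).

    rows: iterable of item collections, one per observation (hypothetical input).
    Returns a list of (itemset, supporting_rowset) pairs, both frozensets.
    """
    rows = [frozenset(r) for r in rows]
    n = len(rows)
    universe = frozenset().union(*rows) if rows else frozenset()

    def closure(itemset):
        # Closed rowset: every row that contains all items of the itemset.
        return frozenset(t for t in range(n) if itemset <= rows[t])

    results = []
    # Each queue entry is (closed rowset, its common itemset, last row added).
    queue = deque([(closure(universe), universe, -1)])
    while queue:
        rowset, itemset, core = queue.popleft()
        if len(rowset) >= max(min_support, 1) and len(itemset) >= min_cardinality:
            results.append((itemset, rowset))
        for r in range(core + 1, n):
            if r in rowset:
                continue
            new_items = itemset & rows[r]
            # Intersections only shrink, so a branch that falls below the
            # minimum cardinality can never recover and is pruned.
            if len(new_items) < min_cardinality:
                continue
            new_rows = closure(new_items)
            # Prefix-preserving test: if the closure pulls in an earlier row
            # that the parent lacks, this itemset is generated on another
            # branch, so it is skipped here. No table of seen itemsets needed.
            if any(t < r and t not in rowset for t in new_rows):
                continue
            queue.append((new_rows, new_items, r))
    return results


# Toy data: 4 observations over the items a, b, c, d (made up for illustration).
toy_rows = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"}, {"c", "d"}]
for items, support_rows in mine_closed_itemsets(toy_rows, min_support=2, min_cardinality=2):
    print("".join(sorted(items)), "support =", len(support_rows))
```

On this toy input with min_support=2 and min_cardinality=2, the sketch reports the four closed itemsets abc, abd, cd and ab. Lowering both thresholds returns all eight closed itemsets, each exactly once, which is what the prefix-preserving test is meant to guarantee.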