An efficient algorithm to discover large and frequent itemset in high dimensional data

The current trend of data collection involves a small number of observations with a very large number of variables, known as high dimensional data. Mining these data produces an explosive number of smaller itemsets which are less important than colossal (large) ones. As the trend in Frequent Itemset...

Bibliographic Details
Main Author:	Zulkurnain, Nurul Fariza
Format:	Monograph
Language:	English
Published:	2019
Subjects:	TK7885 Computer engineering
Online Access:	http://irep.iium.edu.my/70312/ http://irep.iium.edu.my/70312/1/FRGS_Closing_Report.pdf

id	iium-70312
recordtype	eprints
spelling	iium-70312 2019-12-01T00:05:58Z http://irep.iium.edu.my/70312/ 2019-01-29 Monograph NonPeerReviewed application/pdf en http://irep.iium.edu.my/70312/1/FRGS_Closing_Report.pdf Zulkurnain, Nurul Fariza (2019) An efficient algorithm to discover large and frequent itemset in high dimensional data. Project Report. UNSPECIFIED. (Unpublished)
repository_type	Digital Repository
institution_category	Local University
institution	International Islamic University Malaysia
building	IIUM Repository
collection	Online Access
language	English
topic	TK7885 Computer engineering
description	The current trend in data collection involves a small number of observations with a very large number of variables, known as high-dimensional data. Mining these data produces an explosive number of small itemsets, which are less important than colossal (large) ones. As Frequent Itemset Mining moves towards mining colossal itemsets, it is important to understand the challenges in order to formulate a better method that runs faster, scales better and produces useful and interesting knowledge. For this reason, this research proposes two new algorithms, RARE and RARE II, which mine colossal closed itemsets. Both algorithms apply a minimum cardinality threshold to limit the search space and a closure computation method that does not require storing previously discovered itemsets for duplicate checking. These approaches reduce both the memory and the time the algorithms need to finish mining tasks. RARE searches the rowset lattice in a breadth-first manner, which results in fewer itemset intersections than other state-of-the-art algorithms, CARPENTER and IsTa. RARE II further reduces itemset intersections by evaluating only the closed rowsets in order to mine the next closed itemsets. Although the different thresholds used in CARPENTER and IsTa make direct comparison difficult, RARE and RARE II performed better: they finished mining all closed itemsets in less time, whereas CARPENTER and IsTa discovered only a fraction of the closed itemsets, took much longer, and then ran out of memory.
format	Monograph
author	Zulkurnain, Nurul Fariza
title	An efficient algorithm to discover large and frequent itemset in high dimensional data
publishDate	2019
url	http://irep.iium.edu.my/70312/ http://irep.iium.edu.my/70312/1/FRGS_Closing_Report.pdf
first_indexed	2023-09-18T21:39:49Z
last_indexed	2023-09-18T21:39:49Z
_version_	1777413064931409920
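
The description mentions three ideas: a breadth-first search over rowsets, a minimum cardinality threshold, and a closure check that needs no store of earlier results. The following is a minimal Python sketch of those ideas only. It is not the RARE or RARE II algorithm from the report, whose details are not given in this record; the function name closed_itemsets and the parameters min_card and min_sup are hypothetical, and the search is a plain enumeration meant for tiny inputs.

```python
def closed_itemsets(rows, min_card=1, min_sup=1):
    """Enumerate closed itemsets by breadth-first search over rowsets.

    Illustrative sketch only (assumed names, not the report's algorithm).
    rows     : list of item collections, one per observation
    min_card : minimum number of items an itemset must contain
    min_sup  : minimum number of rows that must contain the itemset
    """
    rows = [frozenset(r) for r in rows]
    n = len(rows)
    # Level 1: each single row whose itemset meets the cardinality threshold.
    level = [((i,), rows[i]) for i in range(n) if len(rows[i]) >= min_card]
    results = []
    while level:
        next_level = []
        for rowset, items in level:
            # Closure check: the rowset is closed iff it is exactly the set
            # of rows that contain `items`. Each closed itemset passes this
            # test for one rowset only, so no list of earlier results is
            # needed to avoid duplicates.
            containing = tuple(i for i in range(n) if items <= rows[i])
            if containing == rowset and len(rowset) >= min_sup:
                results.append((items, len(rowset)))
            # Extend with higher-numbered rows so each rowset is built once.
            for j in range(rowset[-1] + 1, n):
                common = items & rows[j]
                # Adding rows can only shrink the intersection, so a branch
                # below the cardinality threshold can be dropped for good.
                if len(common) >= min_card:
                    next_level.append((rowset + (j,), common))
        level = next_level
    return results


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"}]
    for items, support in closed_itemsets(data, min_card=2):
        print(sorted(items), support)
```

Raising min_card prunes the search earlier, because any rowset whose shared itemset falls below the threshold is never extended.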

Similar Items