Datasets Size: Effect on Clustering Results

Recent advances in the way we capture and store data pose a serious challenge for data analysis. This has given wider acceptance to data mining, an interdisciplinary field that applies algorithms to stored data with a view to discovering hidden knowledge. Most people who keep records,...


Bibliographic Details
Main Authors: Raheem, Ajiboye Adeleke; Ruzaini, Abdullah Arshah; Hongwu, Qin
Format: Conference or Workshop Item
Language: English
Published: 2013
Subjects: QA76 Computer software
Online Access: http://umpir.ump.edu.my/id/eprint/5007/
http://umpir.ump.edu.my/id/eprint/5007/1/22-UMP.pdf
id ump-5007
recordtype eprints
spelling ump-5007
2018-05-18T02:50:35Z
http://umpir.ump.edu.my/id/eprint/5007/
2013-04-20
Conference or Workshop Item
PeerReviewed
application/pdf
en
http://umpir.ump.edu.my/id/eprint/5007/1/22-UMP.pdf
Raheem, Ajiboye Adeleke and Ruzaini, Abdullah Arshah and Hongwu, Qin (2013) Datasets Size: Effect on Clustering Results. In: 3rd International Conference on Software Engineering & Computer Systems (ICSECS - 2013), 20-22 August 2013, Universiti Malaysia Pahang. pp. 1-9. (Unpublished)
repository_type Digital Repository
institution_category Local University
institution Universiti Malaysia Pahang
building UMP Institutional Repository
collection Online Access
language English
topic QA76 Computer software
description Recent advances in the way we capture and store data pose a serious challenge for data analysis. This has given wider acceptance to data mining, an interdisciplinary field that applies algorithms to stored data with a view to discovering hidden knowledge. Most people who keep records, however, have yet to reap the benefits of this tool, owing to the general notion that a large dataset is required to guarantee reliable results. This may not hold in all cases. In this paper, we propose a research technique that applies descriptive algorithms to numeric datasets of varied sizes. We modeled each subset of our data using the EM clustering algorithm; two different numbers of partitions (k) were estimated and used for each experiment. The clustering results were validated using an external evaluation measure to determine their level of correctness. The approach reveals the implications of dataset size for the clusters formed and the impact of the estimated number of partitions.
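The procedure the description outlines (EM clustering on subsets of different sizes, two values of k per subset, then an external validity check) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the synthetic two-class 1-D data, the subset sizes, the choice of k = 2 and k = 3, the use of purity as the external measure, and every function name (make_data, em_cluster_1d, purity, run_experiment) are hypothetical and do not come from the paper.

```python
import math
import random
from collections import Counter


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_cluster_1d(xs, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM and return hard labels."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start (an assumption): means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    start_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, 1e-6)
    variances = [start_var] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs))
            variances[j] = max(spread / nj, 1e-6)  # floor avoids collapse
            weights[j] = nj / n
    # Hard assignment: the component with the largest responsibility.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def purity(true_labels, cluster_labels):
    """External measure: share of points in their cluster's majority class."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in clusters.values())
    return majority / len(true_labels)


def make_data(n, seed=0):
    """Synthetic stand-in data: two classes centred at 0 and 6, unit spread."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    xs = [rng.gauss(6.0 * label, 1.0) for label in labels]
    return xs, labels


def run_experiment(sizes=(50, 200, 1000), ks=(2, 3)):
    """Cluster prefixes of one dataset at each size with each k; score purity."""
    xs, ys = make_data(max(sizes))
    results = {}
    for n in sizes:
        for k in ks:
            results[(n, k)] = purity(ys[:n], em_cluster_1d(xs[:n], k))
    return results


if __name__ == "__main__":
    for (n, k), score in sorted(run_experiment().items()):
        print(f"n={n:5d} k={k} purity={score:.3f}")
```

Purity rewards over-partitioning, so scores obtained with different k are not directly comparable; a chance-corrected measure such as the adjusted Rand index may be a better choice when comparing across k.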
format Conference or Workshop Item
author Raheem, Ajiboye Adeleke
Ruzaini, Abdullah Arshah
Hongwu, Qin
title Datasets Size: Effect on Clustering Results