Datasets Size: Effect on Clustering Results
The recent advancement in the way we capture and store data poses a serious challenge for data analysis. This has given wider acceptance to data mining, an interdisciplinary field that applies algorithms to stored data with a view to discovering hidden knowledge. Most people who keep records,...
Main Authors: | Raheem, Ajiboye Adeleke; Ruzaini, Abdullah Arshah; Hongwu, Qin |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2013 |
Subjects: | QA76 Computer software |
Online Access: | http://umpir.ump.edu.my/id/eprint/5007/ http://umpir.ump.edu.my/id/eprint/5007/1/22-UMP.pdf |
id | ump-5007 |
---|---|
recordtype | eprints |
spelling | ump-5007; 2018-05-18T02:50:35Z; http://umpir.ump.edu.my/id/eprint/5007/; 2013-04-20; Conference or Workshop Item; PeerReviewed; application/pdf; en; http://umpir.ump.edu.my/id/eprint/5007/1/22-UMP.pdf; Raheem, Ajiboye Adeleke and Ruzaini, Abdullah Arshah and Hongwu, Qin (2013) Datasets Size: Effect on Clustering Results. In: 3rd International Conference on Software Engineering & Computer Systems (ICSECS - 2013), 20-22 August 2013, Universiti Malaysia Pahang. pp. 1-9. (Unpublished) |
repository_type | Digital Repository |
institution_category | Local University |
institution | Universiti Malaysia Pahang |
building | UMP Institutional Repository |
collection | Online Access |
language | English |
topic | QA76 Computer software |
description | The recent advancement in the way we capture and store data poses a serious challenge for data analysis. This has given wider acceptance to data mining, an interdisciplinary field that applies algorithms to stored data with a view to discovering hidden knowledge. Most people who keep records, however, are yet to reap the benefits of this tool because of the general notion that a large dataset is required to guarantee reliable results. This may not be applicable in all cases. In this paper, we propose a research technique that applies descriptive algorithms to numeric datasets of varied sizes. We modeled each subset of our data using the EM clustering algorithm; two different numbers of partitions (k) were estimated and used for each experiment. The clustering results were validated using an external evaluation measure to determine their level of correctness. The approach unveils the implication of dataset size for the clusters formed and the impact of the estimated number of partitions. |
format | Conference or Workshop Item |
author | Raheem, Ajiboye Adeleke; Ruzaini, Abdullah Arshah; Hongwu, Qin |
author_sort | Raheem, Ajiboye Adeleke |
title | Datasets Size: Effect on Clustering Results |
publishDate | 2013 |
url | http://umpir.ump.edu.my/id/eprint/5007/ http://umpir.ump.edu.my/id/eprint/5007/1/22-UMP.pdf |
first_indexed | 2023-09-18T22:00:05Z |
last_indexed | 2023-09-18T22:00:05Z |
_version_ | 1777414340094197760 |
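
The abstract describes clustering subsets of different sizes with EM, trying two values of k, and scoring the result with an external evaluation measure. A minimal sketch of that kind of experiment is given below. It is not the authors' code or data. The synthetic two-group data, the subset sizes, the values k = 2 and k = 3, the quantile-based initialisation, and the choice of purity as the external measure are all assumptions made for illustration; the record does not name the dataset or the measure used.

```python
import math
import random
from collections import Counter


def make_data(n, seed=0):
    """Synthetic stand-in data (assumption): two 1-D Gaussian groups."""
    rng = random.Random(seed)
    xs, labels = [], []
    for i in range(n):
        group = i % 2  # alternate groups so every prefix is balanced
        xs.append(rng.gauss(10.0 * group, 1.0))
        labels.append(group)
    return xs, labels


def em_cluster(xs, k, iters=50):
    """Fit a 1-D Gaussian mixture by EM and return a hard label per point."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start (assumption): means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, 1e-6)
    variances = [spread] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def purity(true_labels, pred_labels):
    """External measure: share of points in their cluster's majority class."""
    per_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        per_cluster.setdefault(p, Counter())[t] += 1
    majority = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return majority / len(true_labels)


def run_experiment(sizes=(50, 200, 1000), ks=(2, 3), seed=0):
    """Cluster subsets of several sizes with each k; return purity scores."""
    xs, labels = make_data(max(sizes), seed)
    scores = {}
    for n in sizes:
        for k in ks:
            pred = em_cluster(xs[:n], k)
            scores[(n, k)] = purity(labels[:n], pred)
    return scores


if __name__ == "__main__":
    for (n, k), score in sorted(run_experiment().items()):
        print(f"n={n:5d} k={k} purity={score:.3f}")
```

Purity tends to rise as k grows, because splitting a group into smaller clusters cannot make those clusters less pure. When comparing different values of k, a measure that corrects for chance, such as the adjusted Rand index, may be a better choice.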