Density subspace clustering: a case study on perception of the required skill

Bibliographic Details
Main Author:	Sembiring, Rahmat Widia
Format:	Thesis
Language:	English
Published:	2014
Subjects:	QA76 Computer software
Online Access:	http://umpir.ump.edu.my/id/eprint/9449/ http://umpir.ump.edu.my/id/eprint/9449/1/CD8255.pdf

id	ump-9449
recordtype	eprints
repository_type	Digital Repository
institution_category	Local University
institution	Universiti Malaysia Pahang
building	UMP Institutional Repository
collection	Online Access
language	English
topic	QA76 Computer software
description	This research aims to develop an improved model for subspace clustering based on density connection. The research starts from the problem that data may be hidden in different subspaces. As dimensionality increases, the farthest neighbour of a data point is expected to be almost as close as its nearest neighbour for a wide range of data distributions and distance functions. The identified problems are therefore how to avoid the curse of dimensionality in multidimensional data and how to identify clusters in different subspaces of such data. Developing an improved model for subspace clustering based on density connection is thus important, as are elaborating and testing the model on educational data and, in particular, ensuring that it can be used to justify the required skills for higher learning institutions. Subspace clustering is framed as a search technique for grouping data or attributes into different clusters. Grouping is done to identify the level of data density and to identify outliers or irrelevant data, so that each cluster exists in a separate subset. This thesis proposes a subspace clustering method based on density connection, named DAta MIning subspace clusteRing Approach (DAMIRA), an improved subspace clustering algorithm based on density connection. The main idea is that, within each cluster, every data point has a minimum number of neighbouring points, i.e. the data density must exceed a certain threshold. In the early stage, the research estimates density dimensions, and the results are used as input to determine the initial clusters based on density connection, using the DBSCAN algorithm. Each dimension is then tested with the proposed subspace clustering algorithm to investigate whether its data have a relationship with the data in another cluster. If they do, they are classified as a subspace.
All data in the subspace clusters are then tested again with DBSCAN to re-examine their density, until a pure subspace cluster is finally found. The study used multidimensional data, including benchmark datasets and real datasets. The real datasets come from education, particularly students' perceptions of industrial training, and from industry, concerning required skills. To verify the quality of the clustering obtained with the proposed technique, we compared it with DBSCAN, FIRES, INSCY, and SUBCLU. DAMIRA established a very large number of clusters for each dataset, while FIRES and INSCY showed a strong tendency to fail to produce clusters in each subspace. SUBCLU and DAMIRA left no data in the real datasets un-clustered, so the perceptions derived from the clustering results provide more accurate information. On the glass and liver datasets, DAMIRA's clustering time is more than 20 times longer than that of FIRES, INSCY, and SUBCLU, whereas on the job satisfaction dataset DAMIRA has a shorter clustering time than SUBCLU and INSCY. For larger and more complex data, DAMIRA is more efficient than SUBCLU but still less efficient than FIRES, INSCY, and DBSCAN. DAMIRA clustered all of the data, while INSCY achieved lower coverage than FIRES. On the F1 measure, SUBCLU outperforms FIRES, INSCY, and DAMIRA. This study presents DAMIRA, an improved model for subspace clustering based on density connection, to cope with the challenges of clustering in educational data mining. The method can be used to justify perceptions of the required skills for higher learning institutions.
format	Thesis
author	Sembiring, Rahmat Widia
title	Density subspace clustering: a case study on perception of the required skill
publishDate	2014
url	http://umpir.ump.edu.my/id/eprint/9449/ http://umpir.ump.edu.my/id/eprint/9449/1/CD8255.pdf
first_indexed	2023-09-18T22:08:02Z
last_indexed	2023-09-18T22:08:02Z
spelling	ump-9449 2018-11-07T03:21:41Z http://umpir.ump.edu.my/id/eprint/9449/ Density subspace clustering: a case study on perception of the required skill Sembiring, Rahmat Widia QA76 Computer software
2014-01 Thesis NonPeerReviewed application/pdf en http://umpir.ump.edu.my/id/eprint/9449/1/CD8255.pdf Sembiring, Rahmat Widia (2014) Density subspace clustering: a case study on perception of the required skill. PhD thesis, Universiti Malaysia Pahang. http://iportal.ump.edu.my/lib/item?id=chamo:83654&theme=UMP2
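The abstract's motivating claim is that as dimensionality grows, the farthest neighbour of a point becomes almost as close as its nearest neighbour. This claim can be checked numerically. The Python sketch below is illustrative only and is not taken from the thesis. It assumes uniformly distributed points in the unit hypercube and Euclidean distance, and the function name `relative_contrast` is hypothetical. It measures (d_max - d_min) / d_min from one query point. Values shrinking toward zero mean that distances concentrate.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """(d_max - d_min) / d_min for Euclidean distances from one random query
    point to n_points random points, all uniform in the unit hypercube.
    Uniform data is an illustrative assumption, not the thesis's data."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(500, dim), 3))
```

Under this uniform assumption, the spread of the distances stays roughly constant while their typical size grows like the square root of the dimension. As a result, the contrast shrinks as dimensions are added, and a single full-space distance threshold separates dense and sparse regions poorly.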
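The density criterion described in the abstract is the core-point rule of DBSCAN: a point needs a minimum number of neighbours, so that local density exceeds a threshold. The dimension-by-dimension testing resembles the bottom-up search used by SUBCLU. The following Python sketch shows that general technique under stated assumptions. It is not DAMIRA, because the abstract does not specify DAMIRA's density-dimension estimation step or its exact subspace test. The names `dbscan`, `bottom_up_subspaces`, `eps`, `min_pts`, and `max_dim` are hypothetical, and distances are Euclidean on the projected coordinates. For simplicity, DBSCAN is rerun on the full dataset for each candidate subspace rather than only on the points of lower-dimensional clusters.

```python
import math
from itertools import combinations

NOISE = -1  # label for points that belong to no cluster


def region_query(data, idx, dims, eps):
    """Indices of points within eps of data[idx], using only the coordinates in dims."""
    p = [data[idx][d] for d in dims]
    return [j for j, q in enumerate(data)
            if math.dist(p, [q[d] for d in dims]) <= eps]


def dbscan(data, dims, eps, min_pts):
    """Plain DBSCAN on the projection of data onto dims.

    Returns one label per point: a cluster id (0, 1, ...) or NOISE.
    """
    labels = [None] * len(data)
    cluster = 0
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbours = region_query(data, i, dims, eps)
        if len(neighbours) < min_pts:   # density below threshold: not a core point
            labels[i] = NOISE
            continue
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:      # border point: join the cluster, do not expand
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(data, j, dims, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbours)
        cluster += 1
    return labels


def has_cluster(labels):
    return any(label != NOISE for label in labels)


def bottom_up_subspaces(data, eps, min_pts, max_dim=None):
    """Apriori-style (SUBCLU-like) subspace search built on DBSCAN.

    Starts from single dimensions and only tests a (k+1)-dimensional
    subspace if all of its k-dimensional projections contain a cluster.
    Returns {subspace tuple: DBSCAN labels} for subspaces with a cluster.
    """
    n_dims = len(data[0])
    max_dim = n_dims if max_dim is None else max_dim
    found = {}
    level = []
    for d in range(n_dims):
        labels = dbscan(data, (d,), eps, min_pts)
        if has_cluster(labels):
            found[(d,)] = labels
            level.append((d,))
    k = 1
    while level and k < max_dim:
        survivors = sorted({d for subspace in level for d in subspace})
        next_level = []
        for cand in combinations(survivors, k + 1):
            if all(sub in found for sub in combinations(cand, k)):
                labels = dbscan(data, cand, eps, min_pts)
                if has_cluster(labels):
                    found[cand] = labels
                    next_level.append(cand)
        level = next_level
        k += 1
    return found


if __name__ == "__main__":
    # Hypothetical toy data: dense along dims 0 and 1, spread out along dim 2.
    data = [(0.01 * i, 0.01 * i, float(i)) for i in range(10)]
    result = bottom_up_subspaces(data, eps=0.05, min_pts=3)
    print(sorted(result))  # [(0,), (0, 1), (1,)]
```

The pruning step relies on the monotonicity of density-connected clusters. If a set of points forms a cluster in a subspace, those points remain dense in every lower-dimensional projection, because projected distances can only shrink. Candidates with a projection that holds no cluster can therefore be skipped.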
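The abstract compares the methods by coverage and by the F1 measure but does not define either. The definitions used here are common in subspace-clustering evaluation but are an assumption rather than something taken from the thesis. Coverage is the fraction of objects that belong to at least one found cluster. F1 maps each hidden (ground-truth) cluster to the found cluster with which it has the highest F1 score, then averages those best scores. The Python sketch treats clusters as sets of object ids, and the function names are hypothetical.

```python
def coverage(found_clusters, n_objects):
    """Fraction of the n_objects that lie in at least one found cluster."""
    covered = set().union(*found_clusters)
    return len(covered) / n_objects


def f1_score(hidden, found):
    """F1 of one found cluster with respect to one hidden cluster (sets of ids)."""
    tp = len(hidden & found)
    if tp == 0:
        return 0.0
    precision = tp / len(found)
    recall = tp / len(hidden)
    return 2 * precision * recall / (precision + recall)


def f1_measure(hidden_clusters, found_clusters):
    """Mean over hidden clusters of the best F1 achieved by any found cluster."""
    if not hidden_clusters:
        return 0.0
    best = [max((f1_score(h, f) for f in found_clusters), default=0.0)
            for h in hidden_clusters]
    return sum(best) / len(best)


if __name__ == "__main__":
    hidden = [{0, 1, 2, 3}, {4, 5, 6}]   # toy ground truth over 10 objects
    found = [{0, 1, 2}, {4, 5, 6, 7}]    # toy clustering result
    print(coverage(found, 10))                    # 0.7
    print(round(f1_measure(hidden, found), 3))    # 0.857
```

Under these definitions, full coverage does not imply a high F1 score. A method can place every object in some cluster even when those clusters overlap poorly with the hidden clusters.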

Similar Items