New discrimination procedure of location model for handling large categorical variables
The location model is a predictive discriminant rule that classifies new observations into one of two predefined groups based on a mixture of continuous and categorical variables. The ability of the location model to classify new observations correctly is highly dependent on the...
Main Authors: | Hashibah Hamid; Long, Mei Mei; Sharipah Soaad Syed Yahaya |
---|---|
Format: | Article |
Language: | English |
Published: | Penerbit Universiti Kebangsaan Malaysia, 2017 |
Online Access: | http://journalarticle.ukm.my/11115/ http://journalarticle.ukm.my/11115/1/20%20Hashibah%20Hamid.pdf |
id |
ukm-11115 |
---|---|
recordtype |
eprints |
spelling |
ukm-11115 2017-12-18T08:28:46Z http://journalarticle.ukm.my/11115/
New discrimination procedure of location model for handling large categorical variables. Hashibah Hamid; Long, Mei Mei; Sharipah Soaad Syed Yahaya. Penerbit Universiti Kebangsaan Malaysia, 2017-06. Article, PeerReviewed, application/pdf, en. http://journalarticle.ukm.my/11115/1/20%20Hashibah%20Hamid.pdf
Hashibah Hamid, Long, Mei Mei and Sharipah Soaad Syed Yahaya (2017) New discrimination procedure of location model for handling large categorical variables. Sains Malaysiana, 46 (6). pp. 1001-1010. ISSN 0126-6039 http://www.ukm.my/jsm/malay_journals/jilid46bil6_2017/KandunganJilid46Bil6_2017.html |
repository_type |
Digital Repository |
institution_category |
Local University |
institution |
Universiti Kebangsaan Malaysia |
building |
UKM Institutional Repository |
collection |
Online Access |
language |
English |
description |
The location model is a predictive discriminant rule that classifies new observations into one of two predefined groups based on a mixture of continuous and categorical variables. Its ability to classify new observations correctly depends heavily on the number of multinomial cells generated by the categorical variables. A preliminary investigation in this study shows that the location model based on maximum likelihood estimation has a high misclassification rate, up to 45% on average, when dealing with more than six categorical variables, across all 36 data sets tested. The model predicts poorly when the number of categorical variables is large, even with a large sample size. To reduce the misclassification rate, a new strategy is embedded in the discriminant rule by introducing nonlinear principal component analysis (NPCA) into the classical location model (cLM), mainly to handle the large number of categorical variables. The strategy is investigated on simulated and real data sets, with the misclassification rate estimated by the leave-one-out method. The numerical results show the feasibility of the proposed model: its misclassification rate is dramatically lower than that of the cLM in all 18 data settings. An application to a real data set demonstrates a significant improvement, with results comparable to the best of the methods compared. Overall, the findings show that the proposed model extends the applicability of the location model, which was previously limited to six categorical variables for acceptable performance. The study demonstrates that the proposed model with the new discrimination procedure can serve as an alternative for mixed-variable classification problems, primarily when the number of categorical variables is large. |
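The dependence on the number of multinomial cells, and the leave-one-out estimate of the misclassification rate, can be illustrated with a minimal sketch. It assumes every categorical variable is binary, so that b variables give 2^b cells; the abstract does not state this, and the function names (`multinomial_cells`, `mean_obs_per_cell`, `leave_one_out_error`) are hypothetical, not taken from the article. The classifier is passed in as a generic callable, so this is not the authors' cLM or NPCA procedure.

```python
from typing import Callable, Hashable, List, Sequence


def multinomial_cells(n_binary: int) -> int:
    """Cells formed by n_binary binary variables (assumption: 2**b)."""
    if n_binary < 0:
        raise ValueError("n_binary must be non-negative")
    return 2 ** n_binary


def mean_obs_per_cell(n_obs: int, n_binary: int, n_groups: int = 2) -> float:
    """Average number of observations available per cell and group."""
    return n_obs / (n_groups * multinomial_cells(n_binary))


def leave_one_out_error(
    X: Sequence,
    y: Sequence[Hashable],
    fit_predict: Callable[[List, List, object], Hashable],
) -> float:
    """Leave-one-out misclassification rate.

    fit_predict(train_X, train_y, x) must train on the given data and
    return the predicted label for the single held-out observation x.
    """
    if len(X) != len(y) or len(X) == 0:
        raise ValueError("X and y must be non-empty and of equal length")
    errors = 0
    for i in range(len(X)):
        # Hold out observation i and train on the remaining ones.
        train_X = list(X[:i]) + list(X[i + 1:])
        train_y = list(y[:i]) + list(y[i + 1:])
        if fit_predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


if __name__ == "__main__":
    # With a fixed sample size, cells multiply and each one gets sparser.
    for b in (2, 6, 10):
        print(b, multinomial_cells(b), mean_obs_per_cell(256, b))
```

Under this assumption, each added binary variable doubles the number of cells whose parameters must be estimated, which is one way to read the abstract's remark that performance degrades beyond six categorical variables even with a large sample.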
format |
Article |
author |
Hashibah Hamid; Long, Mei Mei; Sharipah Soaad Syed Yahaya |
title |
New discrimination procedure of location model for handling large categorical variables |
publisher |
Penerbit Universiti Kebangsaan Malaysia |
publishDate |
2017 |
url |
http://journalarticle.ukm.my/11115/ http://journalarticle.ukm.my/11115/1/20%20Hashibah%20Hamid.pdf |
first_indexed |
2023-09-18T19:59:22Z |
last_indexed |
2023-09-18T19:59:22Z |
_version_ |
1777406745518276608 |