New discrimination procedure of location model for handling large categorical variables

The location model proposed in the past is a predictive discriminant rule that can classify new observations into one of two predefined groups based on mixtures of continuous and categorical variables. The ability of the location model to discriminate new observations correctly is highly dependent on the...

Bibliographic Details
Main Authors: Hashibah Hamid; Long, Mei Mei; Sharipah Soaad Syed Yahaya
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2017
Online Access: http://journalarticle.ukm.my/11115/
http://journalarticle.ukm.my/11115/1/20%20Hashibah%20Hamid.pdf
Citation: Hashibah Hamid, Long, Mei Mei and Sharipah Soaad Syed Yahaya (2017) New discrimination procedure of location model for handling large categorical variables. Sains Malaysiana, 46 (6). pp. 1001-1010. ISSN 0126-6039 http://www.ukm.my/jsm/malay_journals/jilid46bil6_2017/KandunganJilid46Bil6_2017.html
Publication Date: 2017-06 (peer reviewed)
Institution: Universiti Kebangsaan Malaysia
Repository: UKM Institutional Repository
Description: The previously proposed location model is a predictive discriminant rule that classifies new observations into one of two predefined groups based on a mixture of continuous and categorical variables. Its ability to classify new observations correctly depends heavily on the number of multinomial cells created by the categorical variables. A preliminary investigation in this study shows that the location model with maximum likelihood estimation has a high misclassification rate, up to 45% on average, when dealing with more than six categorical variables, across all 36 data sets tested. The model performs badly with a large number of categorical variables even when the sample size is large. To reduce the misclassification rate, a new strategy is embedded in the discriminant rule by introducing nonlinear principal component analysis (NPCA) into the classical location model (cLM), mainly to handle the large number of categorical variables. The strategy is investigated on simulated and real data sets, with the misclassification rate estimated by the leave-one-out method. The numerical results show the feasibility of the proposed model: the misclassification rate decreases dramatically compared with the cLM in all 18 data settings. A practical application to a real data set demonstrates a significant improvement, with results comparable to the best of the methods compared. Overall, the findings show that the proposed model extends the applicability of the location model, which was previously limited to six categorical variables for acceptable performance. The study concludes that the proposed model, with its new discrimination procedure, can be used as an alternative for mixed-variable classification, primarily when facing a large number of categorical variables.
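The dependence on the number of multinomial cells can be illustrated with a small sketch. It assumes every categorical variable is binary, so that b variables create 2^b cells per group; the record itself does not state the number of levels, and the function names and the sample size of 100 per group are hypothetical, chosen only for illustration.

```python
def count_cells(levels):
    """Number of multinomial cells for categorical variables
    with the given numbers of levels (one entry per variable)."""
    cells = 1
    for k in levels:
        cells *= k
    return cells


def cell_index(binary_values):
    """Map a tuple of 0/1 values to a cell number in 1..2**b.
    Assumes binary variables; the first variable is the lowest bit."""
    return 1 + sum(v << j for j, v in enumerate(binary_values))


if __name__ == "__main__":
    n_per_group = 100  # hypothetical sample size per group
    for b in (2, 4, 6, 8, 10):
        cells = count_cells([2] * b)
        # Average number of observations available per cell
        print(f"b={b:2d}  cells={cells:5d}  avg obs per cell={n_per_group / cells:.2f}")
```

Under these assumptions the average number of observations per cell falls below two once b exceeds six, which is consistent with the difficulty the description reports for cell-wise maximum likelihood estimation.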