Performance analysis of machine learning algorithms for missing value imputation

Data mining requires a pre-processing stage in which the data are prepared, cleaned, integrated, transformed, reduced and discretized to ensure their quality. Missing values are a universal problem in many research domains and are commonly encountered during data cleaning. Missing values usua...

Bibliographic Details
Main Authors: Zainal Abidin, Nadzurah, Ismail, Amelia Ritahani, Emran, Nurul Akmar
Format: Article
Language: English
Published: Science and Information Organization 2018
Subjects: T10.5 Communication of technical information
Online Access: http://irep.iium.edu.my/65381/
http://irep.iium.edu.my/65381/1/65381_Performance%20analysis%20of%20machine%20learning.pdf
http://irep.iium.edu.my/65381/2/65381_Performance%20analysis%20of%20machine%20learning_SCOPUS.pdf
http://irep.iium.edu.my/65381/3/65381_Performance%20analysis%20of%20machine%20learning_WOS.pdf
id iium-65381
recordtype eprints
spelling iium-65381 2018-08-03T00:53:46Z http://irep.iium.edu.my/65381/ Performance analysis of machine learning algorithms for missing value imputation. Zainal Abidin, Nadzurah; Ismail, Amelia Ritahani; Emran, Nurul Akmar. T10.5 Communication of technical information. Science and Information Organization 2018. Article. PeerReviewed.
Zainal Abidin, Nadzurah and Ismail, Amelia Ritahani and Emran, Nurul Akmar (2018) Performance analysis of machine learning algorithms for missing value imputation. International Journal of Advanced Computer Science and Applications, 9 (6). pp. 442-447. ISSN 2158-107X E-ISSN 2156-5570 http://thesai.org/Downloads/Volume9No6/Paper_60-Performance_Analysis_of_Machine_Learning_Algorithms.pdf DOI 10.14569/IJACSA.2018.090660
repository_type Digital Repository
institution_category Local University
institution International Islamic University Malaysia
building IIUM Repository
collection Online Access
language English
topic T10.5 Communication of technical information
description Data mining requires a pre-processing stage in which the data are prepared, cleaned, integrated, transformed, reduced and discretized to ensure their quality. Missing values are a universal problem in many research domains and are commonly encountered during data cleaning. A missing value occurs when no value is stored for a variable in an observation. Missing values have an undesirable effect on analysis results, especially when they lead to biased parameter estimates. Data imputation is a common way to deal with missing values: substitutes for the missing values are derived through statistical or machine learning techniques. Examining the strengths and limitations of these techniques is therefore important for understanding their characteristics. In this paper, the performance of three machine learning classifiers (K-Nearest Neighbors (KNN), Decision Tree, and Bayesian Networks) is compared in terms of data imputation accuracy. The results show that, among the three classifiers, the Bayesian network has the most promising performance. © 2015 The Science and Information (SAI) Organization Limited.
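As an illustration of what "data imputation accuracy" can mean for one of the compared classifiers, the sketch below imputes categorical values with a simple K-Nearest Neighbors vote and scores the result against deliberately hidden entries. It is not the authors' implementation: the record does not describe their setup, so the Hamming distance, the majority vote, the function names and the toy weather table are all assumptions made for this example.

```python
from collections import Counter


def hamming(a, b, skip):
    """Count mismatching attributes, ignoring column `skip` and missing (None) entries."""
    return sum(
        1
        for i, (x, y) in enumerate(zip(a, b))
        if i != skip and x is not None and y is not None and x != y
    )


def knn_impute(rows, k=3):
    """Fill each None with the majority value among the k nearest rows
    that have that column observed (assumed scheme, not the paper's)."""
    filled = [list(r) for r in rows]
    for r_idx, row in enumerate(rows):
        for c_idx, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                other
                for j, other in enumerate(rows)
                if j != r_idx and other[c_idx] is not None
            ]
            if not donors:
                continue  # no observed value to borrow; leave the gap
            donors.sort(key=lambda other: hamming(row, other, c_idx))
            votes = Counter(d[c_idx] for d in donors[:k])
            filled[r_idx][c_idx] = votes.most_common(1)[0][0]
    return filled


def imputation_accuracy(complete, masked, imputed):
    """Fraction of hidden cells whose imputed value equals the true value."""
    hidden = [
        (i, j)
        for i, row in enumerate(masked)
        for j, v in enumerate(row)
        if v is None
    ]
    if not hidden:
        return 1.0
    hits = sum(1 for i, j in hidden if imputed[i][j] == complete[i][j])
    return hits / len(hidden)


# Hypothetical toy table: outlook, temperature, play.
complete = [
    ["sunny", "hot", "no"],
    ["sunny", "hot", "no"],
    ["sunny", "mild", "no"],
    ["rainy", "cool", "yes"],
    ["rainy", "cool", "yes"],
    ["rainy", "mild", "yes"],
]

# Hide two known values so the imputation can be scored.
masked = [list(r) for r in complete]
masked[0][2] = None
masked[4][0] = None

imputed = knn_impute(masked, k=3)
accuracy = imputation_accuracy(complete, masked, imputed)
print(imputed[0], imputed[4], accuracy)
```

The same masking-and-scoring loop could wrap a decision tree or a Bayesian network in place of `knn_impute`, which is the kind of comparison the description refers to.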
format Article
author Zainal Abidin, Nadzurah
Ismail, Amelia Ritahani
Emran, Nurul Akmar
title Performance analysis of machine learning algorithms for missing value imputation
publisher Science and Information Organization
publishDate 2018
url http://irep.iium.edu.my/65381/
http://irep.iium.edu.my/65381/1/65381_Performance%20analysis%20of%20machine%20learning.pdf
http://irep.iium.edu.my/65381/2/65381_Performance%20analysis%20of%20machine%20learning_SCOPUS.pdf
http://irep.iium.edu.my/65381/3/65381_Performance%20analysis%20of%20machine%20learning_WOS.pdf
first_indexed 2023-09-18T21:32:46Z
last_indexed 2023-09-18T21:32:46Z