Comparison of feature selection techniques in classifying stroke documents

The volume of digital biomedical literature keeps growing, which makes it difficult for most researchers to manage and retrieve the information they need from the Internet. Applying text classification to biomedical literature is one of the solutions...


Bibliographic Details
Main Authors: Nur Syaza Izzati, Mohd Rafei; Rohayanti, Hassan; Saedudin, R. D. Rohmat; Anis Farihan, Mat Raffei; Zalmiyah, Zakaria; Shahreen, Kasim
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2019
Subjects: QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/25304/
http://umpir.ump.edu.my/id/eprint/25304/1/18512-34084-1-PB_2.pdf
id ump-25304
recordtype eprints
spelling ump-25304 2019-07-22T09:00:29Z http://umpir.ump.edu.my/id/eprint/25304/
Institute of Advanced Engineering and Science 2019-06 Article PeerReviewed pdf en cc_by_nc_4 http://umpir.ump.edu.my/id/eprint/25304/1/18512-34084-1-PB_2.pdf
Nur Syaza Izzati, Mohd Rafei and Rohayanti, Hassan and Saedudin, R. D. Rohmat and Anis Farihan, Mat Raffei and Zalmiyah, Zakaria and Shahreen, Kasim (2019) Comparison of feature selection techniques in classifying stroke documents. Indonesian Journal of Electrical Engineering and Computer Science, 14 (3). pp. 1244-1250. ISSN 2502-4752
http://ijeecs.iaescore.com/index.php/IJEECS/article/view/18512
http://doi.org/10.11591/ijeecs.v14.i3.pp1244-1250
repository_type Digital Repository
institution_category Local University
institution Universiti Malaysia Pahang
building UMP Institutional Repository
collection Online Access
language English
topic QA75 Electronic computers. Computer science
description The volume of digital biomedical literature keeps growing, which makes it difficult for most researchers to manage and retrieve the information they need from the Internet. Applying text classification to biomedical literature is one way to address this problem, but the high dimensionality of the data is a common issue in text classification. The aim of this research is therefore to compare techniques for selecting relevant features when classifying biomedical text abstracts. The research focuses on Pearson's Correlation and Information Gain as feature selection techniques for reducing this high dimensionality. To this end, we conduct and evaluate several experiments on a dataset of 100 abstracts of stroke documents retrieved from the PubMed database. The dataset underwent text pre-processing, a crucial step before the feature selection phase, in which Information Gain and Pearson's Correlation were applied. A Support Vector Machine classifier was used to evaluate and compare the effectiveness of the two feature selection techniques. On this dataset, Information Gain outperformed Pearson's Correlation by 3.3%. This research aims to extract meaningful features from a subset of stroke documents that can be used in various applications, especially in diagnosing stroke.
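
The two feature-scoring measures named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, labels, and the function names (`information_gain`, `pearson`, `rank_terms`) are hypothetical, each term is assumed to be a binary present/absent feature, and the Support Vector Machine evaluation step is left out. It only shows how each measure scores a term against the class label so that low-scoring terms can be dropped.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature, labels):
    """IG = H(Y) - sum over v of P(X=v) * H(Y | X=v), for a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def pearson(feature, labels):
    """Pearson correlation coefficient between a numeric feature and numeric labels."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def rank_terms(docs, labels, score):
    """Score every vocabulary term as a binary feature and sort best-first."""
    vocab = sorted({term for doc in docs for term in doc})
    scored = []
    for term in vocab:
        column = [1 if term in doc else 0 for doc in docs]
        scored.append((term, score(column, labels)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus (already tokenized); 1 = stroke abstract, 0 = other.
docs = [
    {"ischemic", "stroke", "patient"},
    {"stroke", "thrombolysis", "patient"},
    {"diabetes", "insulin", "patient"},
    {"diabetes", "glucose", "patient"},
]
labels = [1, 1, 0, 0]

ig_ranking = rank_terms(docs, labels, information_gain)
# Rank by absolute correlation so strongly negative terms also count as informative.
pc_ranking = rank_terms(docs, labels, lambda f, y: abs(pearson(f, y)))

print("Information Gain:", ig_ranking)
print("Pearson (absolute):", pc_ranking)
```

In this toy corpus a term that appears in every document, such as "patient", carries no information about the class under either measure, so it would be the first candidate for removal when reducing dimensionality.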
format Article
author Nur Syaza Izzati, Mohd Rafei
Rohayanti, Hassan
Saedudin, R. D. Rohmat
Anis Farihan, Mat Raffei
Zalmiyah, Zakaria
Shahreen, Kasim
title Comparison of feature selection techniques in classifying stroke documents
publisher Institute of Advanced Engineering and Science
publishDate 2019
url http://umpir.ump.edu.my/id/eprint/25304/
http://umpir.ump.edu.my/id/eprint/25304/1/18512-34084-1-PB_2.pdf
first_indexed 2023-09-18T22:38:47Z
last_indexed 2023-09-18T22:38:47Z
_version_ 1777416774674808832