Enhancing latent semantic analysis (LSA) using tagging algorithm in retrieving Malay documents / Afiqah Bazlla Md Soom
Main Author: | Md Soom, Afiqah Bazlla |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2018 |
Subjects: | Analysis Algorithms |
Online Access: | http://ir.uitm.edu.my/id/eprint/26905/ http://ir.uitm.edu.my/id/eprint/26905/1/TM_AFIQAH%20BAZLLA%20MD%20SOOM%20CS%2018_5.pdf |
id | uitm-26905 |
---|---|
recordtype | eprints |
timestamp | 2019-12-16T06:35:07Z |
status | NonPeerReviewed |
citation | Md Soom, Afiqah Bazlla (2018) Enhancing latent semantic analysis (LSA) using tagging algorithm in retrieving Malay documents / Afiqah Bazlla Md Soom. Masters thesis, Universiti Teknologi MARA. |
repository_type | Digital Repository |
institution_category | Local University |
institution | Universiti Teknologi MARA |
building | UiTM Institutional Repository |
collection | Online Access |
language | English |
topic | Analysis Algorithms |
description | The Latent Semantic Analysis (LSA) algorithm is a mathematical approach that uses Singular Value Decomposition (SVD) to discover important associations between terms and terms, between terms and documents, and between documents and documents. LSA then uses cosine similarity to measure the similarity between the query word and the terms, as well as between the query and the documents. This approach appears to work well when each term has only one meaning and each meaning is represented by only one term. In Malay, however, many terms have multiple meanings, and many single meanings are represented by multiple terms. If such terms are treated as a single word, the search engine retrieves irrelevant documents, which reduces its effectiveness. To investigate the enhancement of LSA with a tagging algorithm (LSAT) for retrieving Malay documents, eight experiments were conducted. The first experiment compares the time taken to extract the normal term list and the tagged term list, the total number of terms in each list, and the time taken to create the term-document matrix. The next six experiments record the results of the LSA and LSAT search engines using different dimension and threshold values. The last experiment compares the LSAT results with previous work on LSA using the same test collection. The outcomes indicate that, compared with LSA retrieval without the tagging algorithm, the tagging algorithm improves recall by up to 4%, precision by up to 16%, and F-measure by up to approximately 7%. The research also provides fundamental analyses to help other Information Retrieval (IR) developers select the dimension and threshold values for LSA-based retrieval. |
format | Thesis |
author | Md Soom, Afiqah Bazlla |
title | Enhancing latent semantic analysis (LSA) using tagging algorithm in retrieving Malay documents / Afiqah Bazlla Md Soom |
publishDate | 2018 |
url | http://ir.uitm.edu.my/id/eprint/26905/ http://ir.uitm.edu.my/id/eprint/26905/1/TM_AFIQAH%20BAZLLA%20MD%20SOOM%20CS%2018_5.pdf |
first_indexed | 2023-09-18T23:17:34Z |
last_indexed | 2023-09-18T23:17:34Z |
_version_ | 1777419214596866048 |
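
The record's description says that treating a word with several meanings as one term leads LSA to retrieve irrelevant documents, and its first experiment compares a normal term list with a tagged term list. The record does not describe the tagging algorithm itself, so the sketch below only assumes that some hypothetical tagger has already rewritten each token as `word/TAG`. It shows how such tags split one surface form into separate rows of a raw-frequency term-document matrix. The helper name `build_term_doc_matrix`, the placeholder tokens and the tag labels `/A` and `/B` are illustrative assumptions, not the thesis's method.

```python
from collections import Counter


def build_term_doc_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a raw-frequency term-document matrix (rows = terms, columns = documents).

    Tokens are used exactly as given, so an untagged token "w" forms one row,
    while tagged tokens "w/A" and "w/B" form two separate rows.
    """
    counts = [Counter(doc) for doc in docs]               # term frequencies per document
    terms = sorted({tok for doc in docs for tok in doc})  # the (normal or tagged) term list
    matrix = [[c[t] for c in counts] for t in terms]      # Counter returns 0 for absent terms
    return terms, matrix


# Hypothetical example: "/A" and "/B" stand in for two senses of the same surface word "w".
plain = [["w", "x"], ["w", "y"]]
tagged = [["w/A", "x"], ["w/B", "y"]]

plain_terms, plain_m = build_term_doc_matrix(plain)
tagged_terms, tagged_m = build_term_doc_matrix(tagged)
print(plain_terms, plain_m)    # ['w', 'x', 'y'] [[1, 1], [1, 0], [0, 1]]
print(tagged_terms, tagged_m)  # ['w/A', 'w/B', 'x', 'y'] [[1, 0], [0, 1], [1, 0], [0, 1]]
```

Without tags, the row for `w` links both documents; with tags, each sense gets its own row, so the two documents no longer share that term.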
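
The retrieval step outlined in the description (SVD of the term-document matrix, then cosine similarity against the query) can be sketched as follows. The sketch truncates the SVD to `k` dimensions, folds the query into that space as q_k = S_k^-1 U_k^T q, and keeps documents whose cosine score reaches a threshold. The function name `lsa_retrieve`, the use of NumPy, raw-frequency weighting and this particular folding-in formula are assumptions for illustration. The thesis may weight terms or compare vectors differently.

```python
import numpy as np


def lsa_retrieve(term_doc: np.ndarray, query: np.ndarray, k: int, threshold: float):
    """Rank documents by cosine similarity to a query in a k-dimensional LSA space.

    term_doc: terms x documents matrix A; query: term-frequency vector over the same terms.
    Returns (document index, score) pairs with score >= threshold, best first.
    """
    # Thin SVD: A = U diag(s) Vt, with singular values in descending order.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep at most k dimensions, skipping near-zero singular values to avoid dividing by zero.
    k = min(k, int(np.sum(s > 1e-10)))
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]

    # Fold the query into the reduced space: q_k = S_k^-1 U_k^T q.
    q_k = (u_k.T @ query) / s_k
    # Column j of Vt_k is document j expressed in the same reduced space.
    docs_k = vt_k.T

    q_norm = np.linalg.norm(q_k)
    results = []
    for j, d in enumerate(docs_k):
        denom = q_norm * np.linalg.norm(d)
        score = float(q_k @ d / denom) if denom > 0 else 0.0
        if score >= threshold:
            results.append((j, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]], dtype=float)
# The query equals document 2; with k equal to the rank of A, document 2 scores about 1.0.
print(lsa_retrieve(A, A[:, 2], k=3, threshold=0.5))
```

`k` and `threshold` correspond to the dimension and threshold values that the description says were varied across the LSA and LSAT experiments. Lowering the threshold typically returns more documents, which tends to raise recall at the cost of precision.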
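
The reported gains are stated in terms of recall, precision and F-measure. The following is a minimal sketch of how these are commonly computed for a single query, assuming binary relevance judgements and the balanced F-measure (F1, the harmonic mean of precision and recall). The record does not say which F variant or which averaging over queries the thesis used, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(retrieved: set[int], relevant: set[int]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one query using binary relevance."""
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


# 4 documents retrieved, 3 of them relevant, out of 6 relevant documents in total.
p, r, f = precision_recall_f1({1, 2, 3, 4}, {2, 3, 4, 7, 8, 9})
print(p, r, f)  # 0.75 0.5 0.6
```

Because F1 is a harmonic mean, it is pulled toward the smaller of precision and recall.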