A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium

Research on the content of Malay manuscripts in Information Technology is limited, especially for statistical approaches as compared to rule-based approaches. This research proposes a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents.

Bibliographic Details
Main Authors: Abu Seman, Muhamad Sadry, Wan Mamat, Wan Ali @ Wan Yusoff, Noordin, Mohamad Fauzan, Othman, Roslina
Format: Monograph
Language: English
Published: 2019
Subjects: T Technology (General), Z665 Library Science. Information Science, ZA4450 Databases
Online Access: http://irep.iium.edu.my/73052/
http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf
id iium-73052
recordtype eprints
spelling iium-73052 2019-12-01T03:57:12Z 2019-07-01 Monograph NonPeerReviewed application/pdf en Abu Seman, Muhamad Sadry and Wan Mamat, Wan Ali @ Wan Yusoff and Noordin, Mohamad Fauzan and Othman, Roslina (2019) A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium. Research Report. UNSPECIFIED. (Unpublished)
repository_type Digital Repository
institution_category Local University
institution International Islamic University Malaysia
building IIUM Repository
collection Online Access
language English
topic T Technology (General)
Z665 Library Science. Information Science
ZA4450 Databases
description Research on the content of Malay manuscripts in Information Technology is limited, especially for statistical approaches as compared to rule-based approaches. This research proposes a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents. It assesses the quality scores obtained with a prevalent statistical model, Statistical Model Transliteration (SMT), for Jawi-Roman transliteration, and follows an exploratory approach. The data were extracted from three Malay manuscripts acquired from ISTAC: Bidāyat al-Mubtadī bi-Faḍlillāh al-Muhdī, Kashf al-Asrār and Hujjat al-Balighah, giving a total of 3,420 rows of data transliterated into old Jawi, modern Jawi and Roman form. The SMT output is evaluated with the Bilingual Evaluation Understudy (BLEU) score and the word error rate. The findings show that the E-Jawi.net word error rate on the initial data is 55.8% for old Jawi-Roman and 32.42% for modern Jawi-Roman. Hence, the research opted for human experts to develop a quality corpus for SMT consisting of multiple transliterations of the manuscript contents in modern Jawi and Roman. Significantly, the model is dependent on a quality parallel corpus.
format Monograph
author Abu Seman, Muhamad Sadry
Wan Mamat, Wan Ali @ Wan Yusoff
Noordin, Mohamad Fauzan
Othman, Roslina
title A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
publishDate 2019
url http://irep.iium.edu.my/73052/
http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf
first_indexed 2023-09-18T21:43:35Z
last_indexed 2023-09-18T21:43:35Z
_version_ 1777413301258420224
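
The description reports word error rate as one of the two quality scores used to evaluate the transliteration output. As a minimal sketch of how such a score is commonly computed (this is the standard word-level edit-distance definition, not necessarily the exact procedure used in the report), the Python function below divides the number of word substitutions, insertions and deletions by the length of the reference. The function name `word_error_rate` and the sample strings are hypothetical and are not taken from the manuscripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical romanised strings, for illustration only.
    expert = "maka kata segala ulama"
    system = "maka kata sekala ulama"
    print(f"WER: {word_error_rate(expert, system):.2%}")
```

Under this definition, a score such as the 32.42% reported for modern Jawi-Roman would correspond to roughly one word-level edit for every three words of the expert reference.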