A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
Research on Malay manuscript content in Information Technology is limited, especially for statistical approaches as compared with rule-based approaches. This research aims to propose a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents. This rese...
Main Authors: | Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina |
---|---|
Format: | Monograph |
Language: | English |
Published: | 2019 |
Subjects: | T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases |
Online Access: | http://irep.iium.edu.my/73052/ http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf |
id |
iium-73052 |
---|---|
recordtype |
eprints |
spelling |
iium-73052; 2019-12-01T03:57:12Z; http://irep.iium.edu.my/73052/; 2019-07-01; Monograph; NonPeerReviewed; application/pdf; en; http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf; Abu Seman, Muhamad Sadry and Wan Mamat, Wan Ali @ Wan Yusoff and Noordin, Mohamad Fauzan and Othman, Roslina (2019) A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium. Research Report. UNSPECIFIED. (Unpublished) |
repository_type |
Digital Repository |
institution_category |
Local University |
institution |
International Islamic University Malaysia |
building |
IIUM Repository |
collection |
Online Access |
language |
English |
topic |
T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases |
description |
Research on Malay manuscript content in Information Technology is limited, especially for statistical approaches as compared with rule-based approaches. This research aims to propose a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents. It assesses the quality scores obtained with a prevalent statistical model, Statistical Model Transliteration (SMT), for Jawi-Roman transliteration, using an exploratory approach. The data were extracted from three Malay manuscripts acquired from ISTAC: Bidāyat al-Mubtadī bi-Faḍlillāh al-Muhdī, Kashf al-Asrār and Hujjat al-Balighah, for a total of 3,420 rows of data transliterated into old Jawi, modern Jawi and Roman form. The Bilingual Evaluation Understudy (BLEU) score and word error rate are used to evaluate the SMT output. The findings show that the E-Jawi.net word error rate on the initial data is 55.8% for old Jawi-Roman and 32.42% for modern Jawi-Roman. Hence, the research opted for human experts to develop a quality corpus for SMT consisting of multiple transliterations of the manuscript contents in modern Jawi and Roman. Significantly, the model depends on a quality parallel corpus. |
format |
Monograph |
author |
Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina |
title |
A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium |
publishDate |
2019 |
url |
http://irep.iium.edu.my/73052/ http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf |
first_indexed |
2023-09-18T21:43:35Z |
last_indexed |
2023-09-18T21:43:35Z |