A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium

Research on the content of Malay manuscripts within Information Technology is limited, especially research using statistical rather than rule-based approaches. This research aims to propose a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents. This rese...


Bibliographic Details
Main Authors:	Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina
Format:	Monograph
Language:	English
Published:	2019
Subjects:	T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases
Online Access:	http://irep.iium.edu.my/73052/ http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf

id	iium-73052
recordtype	eprints
spelling	iium-73052 2019-12-01T03:57:12Z http://irep.iium.edu.my/73052/ 2019-07-01 Monograph NonPeerReviewed application/pdf en http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf Abu Seman, Muhamad Sadry and Wan Mamat, Wan Ali @ Wan Yusoff and Noordin, Mohamad Fauzan and Othman, Roslina (2019) A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium. Research Report. UNSPECIFIED. (Unpublished)
repository_type	Digital Repository
institution_category	Local University
institution	International Islamic University Malaysia
building	IIUM Repository
collection	Online Access
language	English
topic	T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases
spellingShingle	T Technology (General); Z665 Library Science. Information Science; ZA4450 Databases; Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina; A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
description	Research on the content of Malay manuscripts within Information Technology is limited, especially research using statistical rather than rule-based approaches. This research aims to propose a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents. It assesses the quality scores obtained with a prevalent statistical model, Statistical Model Transliteration (SMT), for Jawi-Roman transliteration, and takes an exploratory approach. The data were extracted from three Malay manuscripts, Bidāyat al-Mubtadī bi-Faḍlillāh al-Muhdī, Kashf al-Asrār and Hujjat al-Balighah, acquired from ISTAC, giving a total of 3,420 rows of data transliterated into old Jawi, modern Jawi and Roman form. The Bilingual Evaluation Understudy (BLEU) score and word error rate are used to evaluate the SMT output. The findings show that, on the initial data, the E-Jawi.net word error rate is 55.8% for old Jawi-Roman and 32.42% for modern Jawi-Roman. Hence, the research opted for human experts to develop a quality corpus for SMT consisting of multiple transliterations of the manuscript contents in modern Jawi and Roman. Significantly, the model is dependent on a quality parallel corpus.
format	Monograph
author	Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina
author_facet	Abu Seman, Muhamad Sadry; Wan Mamat, Wan Ali @ Wan Yusoff; Noordin, Mohamad Fauzan; Othman, Roslina
author_sort	Abu Seman, Muhamad Sadry
title	A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
title_short	A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
title_full	A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
title_fullStr	A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
title_full_unstemmed	A model for Islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
title_sort	model for islamic istilahnet in malay manuscripts for big data analytics and linguistics consortium
publishDate	2019
url	http://irep.iium.edu.my/73052/ http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf
first_indexed	2023-09-18T21:43:35Z
last_indexed	2023-09-18T21:43:35Z
_version_	1777413301258420224
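
The description field in this record names word error rate as one of the two quality scores used to evaluate transliteration output. As a rough illustration of that metric only, the following Python sketch computes word error rate as the word-level edit distance between a reference and a system output, divided by the number of reference words. The function name and the sample strings are hypothetical and are not taken from the report; the report's own tooling and tokenisation are not described in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace. The report may
    tokenise differently; this is only an illustrative sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical strings: one of four reference words is wrong.
    print(word_error_rate("kitab ini bernama hidayah",
                          "kitab ini bernam hidayah"))
```

Read this way, a word error rate of 55.8% means that a little over half of the reference words would need to be substituted, deleted or inserted to turn the system output into the reference transliteration.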


Similar Items