Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation

Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e., A-Z), identical to that of English. The written language uses the character set as building blocks to...


Bibliographic Details
Main Authors: Shah, Asadullah, Saidin, Aznan Zuhid, Taha, Imad, Zeki, Akram M.
Format: Conference or Workshop Item
Language: English
Published: 2011
Subjects: PL Languages and literatures of Eastern Asia, Africa, Oceania; PL5101 Malay
Online Access: http://irep.iium.edu.my/2933/
http://irep.iium.edu.my/2933/1/Poster-asadullah_aznan.ppt
id iium-2933
recordtype eprints
spelling iium-2933 2014-12-09T02:47:42Z http://irep.iium.edu.my/2933/ 2011-07 Conference or Workshop Item PeerReviewed application/pdf en http://irep.iium.edu.my/2933/1/Poster-asadullah_aznan.ppt
Shah, Asadullah and Saidin, Aznan Zuhid and Taha, Imad and Zeki, Akram M. (2011) Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation. In: 12th Conference of the Pacific Association for Computational Linguistics (PACLING 2011), 19 - 21 July 2011, IIUM. (Unpublished) http://kict.iium.edu.my/pacling/index.html
repository_type Digital Repository
institution_category Local University
institution International Islamic University Malaysia
building IIUM Repository
collection Online Access
language English
topic PL Languages and literatures of Eastern Asia, Africa, Oceania
PL5101 Malay
description Bahasa Melayu (the Malay language) is spoken in Malaysia and in many neighbouring countries. It has a rich literature and deep cultural roots. Bahasa Melayu uses the Roman character set (i.e., A-Z), identical to that of English. The written language uses this character set as building blocks for words, phrases and sentences, which together with punctuation marks and special signs make up documents. This paper presents the results of a preliminary investigation of Malay text documents. For this purpose, articles written in Malay on various topics were scanned, covering approximately 31 thousand characters in total. Preliminary observations indicate that, on average, the character “A” accounts for 19% of the text, “N” for 10%, “E” for 9% and “I” for 8%. These four characters have the highest frequencies of occurrence in the whole character set, and they are expected to remain the most frequent characters in further investigation. Furthermore, the results indicate that character frequencies in Bahasa Melayu text are very close to those of Bahasa Indonesia but differ from those of English. The investigation also indicates that Bahasa Melayu and Bahasa Indonesia share a close phonetic structure with each other, but not with English, although all three languages use the same character set.
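The kind of count summarised in the description can be illustrated with a short Python sketch. This record does not describe the authors' actual procedure, so everything here is an assumption for illustration: the function name `letter_frequencies`, the choice to ignore case and to count only the letters A-Z, and the sample sentence, which is not taken from the paper's corpus.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of the text as a percentage.

    Assumptions (not taken from the paper): case is ignored, and only
    the 26 letters A-Z are counted; digits, spaces and punctuation
    are skipped.
    """
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders letters from most to least frequent.
    return {ch: 100.0 * n / total for ch, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical sample sentence, used only to exercise the function.
    sample = "Bahasa Melayu ialah bahasa kebangsaan Malaysia"
    for ch, pct in list(letter_frequencies(sample).items())[:4]:
        print(f"{ch}: {pct:.1f}%")
```

Run over a larger body of Malay text, a count like this would give percentages that could be set beside the figures reported above; whether punctuation and digits belong in the denominator is a design choice the record leaves open.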
format Conference or Workshop Item
author Shah, Asadullah
Saidin, Aznan Zuhid
Taha, Imad
Zeki, Akram M.
author_sort Shah, Asadullah
title Frequencies determination of characters for Bahasa Melayu: results of preliminary investigation
publishDate 2011
url http://irep.iium.edu.my/2933/
http://irep.iium.edu.my/2933/1/Poster-asadullah_aznan.ppt
first_indexed 2023-09-18T20:10:37Z
last_indexed 2023-09-18T20:10:37Z
_version_ 1777407453045981184