An Arabic Script Recognition System

A system for recognizing machine-printed Arabic script is proposed. The Arabic script is shared by three languages: Arabic, Urdu and Farsi. The three languages have a decent amount of vocabulary in common, which compounds the difficulty of identification. Therefore, in an ideal scenario...

Bibliographic Details
Main Authors:	Alginahi, Yasser M.; Mudassar, Mohammed; M. Nomani, Kabir
Format:	Article
Language:	English
Published:	KSII 2015
Subjects:	QA76 Computer software
Online Access:	http://umpir.ump.edu.my/id/eprint/10823/ http://umpir.ump.edu.my/id/eprint/10823/1/fskkp-2015-nomani-Arabic%20Script%20Recognition%20System.pdf

id	ump-10823
recordtype	eprints
spelling	ump-10823 2018-09-25T08:44:23Z KSII 2015-09-02 Article PeerReviewed application/pdf en Alginahi, Yasser M. and Mudassar, Mohammed and M. Nomani, Kabir (2015) An Arabic Script Recognition System. KSII Transactions on Internet and Information Systems, 9 (9). pp. 3701-3720. ISSN 1976-7277 DOI: 10.3837/tiis.2015.09.023 http://dx.doi.org/10.3837/tiis.2015.09.023
repository_type	Digital Repository
institution_category	Local University
institution	Universiti Malaysia Pahang
building	UMP Institutional Repository
collection	Online Access
language	English
topic	QA76 Computer software
description	A system for recognizing machine-printed Arabic script is proposed. The Arabic script is shared by three languages: Arabic, Urdu and Farsi. The three languages have a decent amount of vocabulary in common, which compounds the difficulty of identification. Ideally, therefore, the script must not only be differentiated from other scripts; the language written in it must also be recognized. The recognition process involves separating Arabic-script documents from Latin, Han and other-script documents using horizontal and vertical projection profiles, and then identifying the language. Identification mainly involves extracting connected components, which are subjected to a Principal Component Analysis (PCA) transformation to extract uncorrelated features. The traditional K-Nearest Neighbours (KNN) algorithm is then used for recognition. Experiments were carried out by varying the number of principal components and the number of connected components extracted per document to find the combination that gives the best accuracy. An accuracy of 100% is achieved with 18 or more connected components and 15 principal components. The proposed system would play a vital role in the automatic archiving of multilingual documents and in the selection of the appropriate Arabic script in multilingual Optical Character Recognition (OCR) systems.
format	Article
author	Alginahi, Yasser M.; Mudassar, Mohammed; M. Nomani, Kabir
author_sort	Alginahi, Yasser M.
title	An Arabic Script Recognition System
title_sort	arabic script recognition system
publisher	KSII
publishDate	2015
url	http://umpir.ump.edu.my/id/eprint/10823/ http://umpir.ump.edu.my/id/eprint/10823/1/fskkp-2015-nomani-Arabic%20Script%20Recognition%20System.pdf
first_indexed	2023-09-18T22:10:52Z
last_indexed	2023-09-18T22:10:52Z
_version_	1777415018236608512
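
The description field names two of the techniques the article relies on: horizontal and vertical projection profiles, and K-Nearest Neighbours classification. The Python sketch below illustrates both on toy data only. It is not the authors' implementation: the function names, the tiny binary image, the two-dimensional feature vectors and the labels are all hypothetical, and the connected-component extraction and PCA steps are omitted (the feature vectors are assumed to be already reduced).

```python
from collections import Counter
from math import dist


def projection_profiles(image):
    """Return (horizontal, vertical) ink counts for a binary image (1 = ink)."""
    horizontal = [sum(row) for row in image]        # one count per pixel row
    vertical = [sum(col) for col in zip(*image)]    # one count per pixel column
    return horizontal, vertical


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: dist(pair[0], query),      # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3x4 binary image, used only to show the profile computation.
image = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
horizontal, vertical = projection_profiles(image)

# Hypothetical, already-reduced feature vectors with made-up language labels.
train_features = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.9, 1.1)]
train_labels = ["Arabic", "Arabic", "Urdu", "Urdu"]
predicted = knn_predict(train_features, train_labels, (0.1, 0.2), k=3)
```

In a fuller pipeline the profiles would feed the script-separation stage and the PCA-transformed connected components would replace the toy feature vectors; the abstract does not state the value of k or the distance metric, so k=3 and Euclidean distance here are assumptions.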

Similar Items