Bangla speech-to-text conversion using SAPI

Bibliographic Details
Main Authors: Sultana, Shaheena; Akhand, M. A. H.; Das, Prodip Kumar; Rahman, M. M. Hafizur
Format: Conference or Workshop Item
Language: English
Published: 2012
Subjects:
Online Access: http://irep.iium.edu.my/24980/
http://irep.iium.edu.my/24980/1/1164C.pdf
Description
Summary: Speech is the most natural form of communication and interaction between humans, whereas text and symbols are the most common form of transaction in computer systems. Interest in conversion between speech and text is therefore growing steadily for speech-oriented human-computer interaction. Microsoft Corporation developed the Speech Application Program Interface (SAPI) for speech-related work in its Windows operating systems; it includes features for only eight languages, including English. The aim of this study is to investigate Speech-to-Text (STT) conversion using SAPI for the Bangla language. Bangla is an important language with a rich heritage; UNESCO declared 21 February International Mother Language Day to honor the language martyrs of 1952 in Bangladesh. We configured SAPI to match pronunciations from continuous Bangla speech against a precompiled SAPI grammar file, and SAPI returned the Bangla words in English characters when a match occurred. These words were then used to fetch the corresponding Bangla words from a database, returning the words in true Bangla characters and completing the sentences. Listing several English spellings for a particular Bangla word in the SAPI grammar file was found to overcome tone variation between speakers as well as pronunciation variation across language communities, and was shown to improve the overall performance of the system. An experimental study of the technique was carried out on a newspaper article, and the recognition rate was approximately 78% on average. Although the achieved performance is promising for STT-related studies, we identified several elements that could be improved and might give better accuracy. The approach of this study should also be helpful for Speech-to-Text conversion and similar tasks in other languages.
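
The lookup step described in the summary (romanized recognizer output mapped back to Bangla script, with several romanized spellings per Bangla word) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the table `ROMAN_TO_BANGLA`, its entries, and the function `to_bangla` are hypothetical, a plain dictionary stands in for the database, and the SAPI recognition step itself is not shown.

```python
# Hypothetical stand-in for the database: several romanized spellings
# (as a recognizer might return them) map to the same Bangla word.
# The entries are illustrative only and are not taken from the paper.
ROMAN_TO_BANGLA = {
    "ami": "আমি",
    "aami": "আমি",
    "bangla": "বাংলা",
    "bangala": "বাংলা",
    "bhalobashi": "ভালোবাসি",
    "valobasi": "ভালোবাসি",
}


def to_bangla(recognized_tokens):
    """Map romanized tokens to Bangla words and join them into a sentence.

    Tokens with no entry in the table are skipped (an assumption; the
    summary does not say how unmatched words are handled).
    """
    words = []
    for token in recognized_tokens:
        bangla = ROMAN_TO_BANGLA.get(token.lower())
        if bangla is not None:
            words.append(bangla)
    return " ".join(words)


if __name__ == "__main__":
    # Two speakers romanized differently still yield the same sentence.
    print(to_bangla(["ami", "bangla", "bhalobashi"]))
    print(to_bangla(["aami", "bangala", "valobasi"]))
```

Keeping the spelling variants as separate keys that point to one Bangla word mirrors the idea in the summary of tolerating pronunciation differences without changing the recognizer itself.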