Universality and diversity of cultural-influenced speech emotion recognition system

Bibliographic Details
Main Authors: Kamaruddin, Norhaslinda; Abdul Rahman, Abdul Wahab; Mazlan, Muhammad Jaliluddin; Norzilan, Norul Ayny
Format: Article
Language: English
Published: Medwell Journals 2016
Online Access: http://irep.iium.edu.my/55698/
http://irep.iium.edu.my/55698/1/55698_Universality%20and%20diversity.pdf
http://irep.iium.edu.my/55698/2/55698_Universality%20and%20diversity_SCOPUS.pdf
Description
Summary: Culture refers to the cumulative knowledge, beliefs, values and concepts that are accepted by a group of people. Such information is shared and inherited from previous generations so that one can blend into and be accepted by a society. Different cultural groups communicate in distinct and unique ways, so the underlying emotional content is interpreted more accurately within a homogeneous group. However, universality of cultural-influenced speech can be observed when speakers from different cultural groups interact with one another, especially with the advancement of communication technology. In this study, two cultural-influenced speech datasets, representing American (NTU-American) and European (Netherland EmoSpeech) speakers, are employed to investigate their similarities and dissimilarities in terms of heterogeneous listeners' perception of the underlying emotional content. The Mel Frequency Cepstral Coefficient (MFCC) feature extraction method and a Multi-Layer Perceptron (MLP) classifier are coupled to recognise four emotions: anger, happiness, sadness and neutral, the last acting as the emotionless state. The experimental results show that the proposed approach yielded an accuracy two times better than chance guessing. Moreover, the Netherland EmoSpeech dataset obtained accuracy comparable to that of the established NTU-American dataset, demonstrating that the data is satisfactory for speech emotion recognition purposes.
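
The phrase "two times better than chance guessing" can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the four emotion classes are balanced and that "two times better" means twice the uniform-guessing rate. Neither assumption is stated in this record, which reports no accuracy figures, and the function names are hypothetical.

```python
# Emotion labels as listed in the summary.
# Assumption: the classes are balanced (not stated in the record).
EMOTIONS = ("anger", "happiness", "sadness", "neutral")


def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over balanced classes."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    return 1.0 / n_classes


def times_chance(accuracy: float, n_classes: int) -> float:
    """Ratio of a given accuracy to the uniform-guessing accuracy."""
    return accuracy / chance_accuracy(n_classes)


baseline = chance_accuracy(len(EMOTIONS))  # 1/4 for four classes
implied = 2 * baseline                     # "two times chance" under the assumptions above

print(f"chance level: {baseline:.2%}, two times chance: {implied:.2%}")
# prints "chance level: 25.00%, two times chance: 50.00%"
```

Under these assumptions the claim corresponds to an accuracy of about 50%. The exact figures for each dataset should be taken from the full article.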