Performance Evaluation of Learning Classifiers of Children Emotions using Feature Combinations in the Presence of Noise

Abstract—Recognition of emotion-based utterances from speech has been performed in a number of languages and utilized in various applications. This paper makes use of a corpus of spoken utterances recorded in Urdu with different emotions from normal and special children. The performance of learning classifiers is evaluated with prosodic and spectral features, and with their combinations, considering children with autism spectrum disorder (ASD) as noise in terms of classification accuracy. The experimental results reveal that the prosodic features show significant classification accuracy in comparison with the spectral features for ASD children under different classifiers, whereas combinations of prosodic features show substantial accuracy for ASD children with the J48 and rotation forest classifiers. Pitch and formant express considerable classification accuracy in combination with MFCC and LPCC for special (ASD) children with different classifiers.


I. INTRODUCTION AND RELATED WORK
In the modern era of human-computer interaction, speech emotion recognition (SER) is a field of vast concern. SER has a great influence on human behavior and is a key point in building relations. Different emotions have their own characteristics that make them memorable in their own way [1]. Furthermore, "EmoChildRu" has been introduced [2] as the first child emotion database created to recognize speech and voice emotions from children's behavior. Two child emotional speech examination probes are reported in the context of the corpus: adult listeners and automated listeners. The automatic classification results are essentially the same as human perception, although the precision is below 55% for both, demonstrating the difficulty of child emotion recognition from speech under natural conditions.

To improve the condition of people with ASD, several CAL procedures have been implemented. The authors in [3] describe an investigation of varied CAL strategies applied to enhance the everyday life conditions of such individuals, and review the CAL strategies involved in various applications improving the communication, behavioral and social abilities of such special children. In [4], it was noted that it is not easy to observe mental and emotional conditions in autism spectrum condition (ASC) aspects. A technique was proposed in [5] to recognize their speech independently of the speaker. In this technique, MEL and BARK scales and Equivalent Rectangular Bandwidth in filter space, along with gamma-tone features, were utilized at the front end, whereas at the back end, Fuzzy C-Means (FCM), Multivariate Hidden Markov Models (MHMM) and Vector Quantization (VQ) approaches were applied. Individual words and short sentences in the Tamil language were used to evaluate the performance of the three variants. The data of two speakers were tested against the features of eight speakers. A prominent real-situation database was organized in [6] to detect fear through the feature classifier "interjection" in speech during extreme emotional and real-world emergencies. MFCCs along with a Support Vector Machine with variant interjections were utilized to categorize speech emotions. In [7], the Urdu language was used to recognize emotions of primary-age children. The authors used 3 different prosodic features with 5 different classifiers and four emotions, and reported that the J48 classifier achieved the highest accuracy. A deep architecture was utilized in [8], which uses a convolutional network for extracting domain-shared features and a long short-term memory network for classifying emotions using domain-specific features. A complete cross-corpora exploration across various speech emotion domains reveals that transferable features give gains ranging from 4.3% to 18.4% in speech emotion recognition. The fundamental goal of Deep Neural Networks (DNNs) in this context is to perceive human emotions from a speech signal. The Mel-frequency Cepstral Coefficient (MFCC) is selected as one of the most frequently used speech features extracted from raw audio data. In the next step, the DNN is fed the extracted speech features to train the system. Also, a hand-crafted database was presented to improve the utilization of the system [9].

The work related to recognition, classification, and emotion detection for children with ASD is still an open topic of research, while researchers are now more concerned with helping these children by making them realize the emotions in the real world. Under this scope, an autism-based game "emotify" has been developed [10]. It comprises two levels of difficulty and attempts to teach children about the neutral, anger, sadness and happiness emotions. At the second level, children are helped in expressing their feelings, which are then evaluated and examined. Machine learning approaches are exploited to develop a multilingual emotion recognition system. This paper evaluates the performance of learning classifiers when dealing with prosodic and spectral features and their combinations, considering special (ASD) children as noise, in terms of classification accuracy.
II. LEARNING CLASSIFIERS AND FEATURES
In daily routine conversations, prosodic features play a vital role [11]. The parameters used in expressed speech to perceive the feelings of users are speech rate, length, pitch, formant, intensity, Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) [12]. Two spectral features (MFCC and LPCC), three prosodic features (intensity, pitch, and formant), and their potential combinations are utilized in this research.
• Pitch: Pitch correlates with the fundamental frequency of the speech signal. Every speech frame is analyzed and statistical values are computed throughout the sample; these values [13] give a clear picture of the properties of the audio parameters.
• Intensity: Intensity encodes prosodic information and reflects the expression of emotion in spoken utterances [12].
• Formant: A formant is a resonant frequency component of speech which gives quantifiable results for the consonants and vowels of the speech signal [12].
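The prosodic features above can be estimated directly from the waveform. The following is a minimal sketch, not the extraction pipeline the authors actually used: pitch is estimated per frame by autocorrelation, and intensity as RMS energy in dB. A synthetic 200 Hz tone stands in for a voiced utterance.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate pitch (Hz) of one frame via the autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # search lags within [fmin, fmax]
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def intensity_db(frame, ref=1.0):
    """Frame intensity as RMS energy in dB relative to `ref`."""
    rms = np.sqrt(np.mean(frame ** 2))
    return 20 * np.log10(rms / ref + 1e-12)

# Synthetic 200 Hz tone at the paper's 48 kHz sampling rate.
sr = 48000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
frames = frame_signal(tone, frame_len=2048, hop=512)
f0 = pitch_autocorr(frames[0], sr)   # close to 200 Hz
```

Formant tracking (e.g. via LPC root-finding) is omitted here; in practice tools such as Praat or openSMILE compute all three features.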
Four learning classifiers were used in the experimental framework, and their performance was evaluated with spectral and prosodic features and their combinations. A comprehensive description of the learning classifiers can be found in [14]. The classifiers are:
• J48: A family of decision tree algorithms used to build the feature vector for different examples. The classes for newly produced events are learned on the basis of the training examples. With the support of the tree-grouping algorithm, the underlying distribution of the data becomes readily interpretable [15].
• Multi-Layer Perceptron (MLP): A class of artificial neural network that contains at least three layers. The first is the input layer and the last is the output layer; in between are hidden layers, and different MLPs can have various numbers of hidden layers [16].
• Rotation forest: Rotation forest [15] randomly eliminates a subset of classes, performs bootstrapping on the remaining data, then applies PCA and builds independent decision trees on the rotated features.
• LogitBoost classifier: Boosting [16] works on the principle that a set of weak learners can be combined to create a strong classifier. LogitBoost assigns higher weights to misclassified instances.
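As a rough illustration of training and comparing the four classifiers, the sketch below uses scikit-learn stand-ins rather than the Weka implementations the paper's names suggest: a C4.5-like decision tree for J48, a random forest for rotation forest (no PCA rotation), and log-loss gradient boosting for LogitBoost. The data is synthetic; the paper's actual feature matrix is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Approximate stand-ins for the four classifiers named in the paper.
classifiers = {
    "J48 (C4.5-like tree)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "rotation forest (RF stand-in)": RandomForestClassifier(n_estimators=50, random_state=0),
    "LogitBoost (GB stand-in)": GradientBoostingClassifier(random_state=0),
}

# Synthetic stand-in: 200 samples (mirroring the corpus size), 4 emotion classes.
X, y = make_classification(n_samples=200, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# 5-fold cross-validated accuracy per classifier.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
```

A true rotation forest would additionally split the feature set into subsets and rotate each via PCA before growing every tree, per the description above.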

III. CORPUS COLLECTION AND RECORDING SPECIFICATION
The corpus has been collected from both categories of children (normal and with ASD) in Urdu and comprises 200 samples, equally divided between the two cases. As per the research methodology, ASD children have been considered as noise in the experimental framework. The recordings were made under standard conditions with a Signal-to-Noise Ratio ≥ 45 dB. The Microsoft Windows 7 sound recorder was utilized to record the emotion-based spoken utterances of normal and special (ASD) children. The configuration is 16-bit, mono, PCM with a sampling rate of 48 KHz, microphone noise and sensitivity of 54 dB±2 dB and 2.2 W respectively, a 3.5 mm stereo jack, and a cable length of 1.8 m. The spoken utterance was chosen to be: a) semantically neutral, b) simple to analyze, c) consistent with any situation presented, and d) of comparable meaning in every dialect. The sentence was "Mujhe Khelna Hai" ("I have to play").
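The recording configuration above (16-bit mono PCM at 48 kHz) can be produced and verified with Python's standard-library `wave` module. This sketch writes a short synthetic clip in that format and reads it back; the filename `utterance.wav` is a stand-in, not from the paper.

```python
import wave
import struct
import math

SR = 48000  # the paper's 48 kHz sampling rate

# A 0.1 s synthetic 200 Hz tone stands in for a recorded utterance.
samples = [int(12000 * math.sin(2 * math.pi * 200 * n / SR))
           for n in range(SR // 10)]

with wave.open("utterance.wav", "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(SR)     # 48 kHz
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

with wave.open("utterance.wav", "rb") as r:
    # Verify the corpus recording configuration.
    assert (r.getnchannels(), r.getsampwidth(), r.getframerate()) == (1, 2, SR)
    pcm = struct.unpack("<%dh" % r.getnframes(), r.readframes(r.getnframes()))
```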

IV. EXPERIMENTAL RESULTS AND DISCUSSION
The performance of learning classifiers is evaluated in the experimental framework for normal and special children, making use of prosodic and spectral features and their combinations on spoken utterances recorded in Urdu. The corpus comprises 200 spoken utterance samples in different emotions, equally distributed between normal and special children. The experimental framework classifies inter- and intra-feature combinations with four different classifiers (logit boost, MLP, J48 and rotation forest) under the following feature configurations: 1) separate prosodic and spectral features, and 2) combinations of the three prosodic features (intensity, formant, and pitch) with the two spectral features (LPCC and MFCC). The objective of the proposed framework is to identify the behavior of the four classifiers on a single feature, or on different combinations of spectral and prosodic features, over spoken utterances of special (ASD) and normal children in terms of classification accuracy. The experimental results for both corpora, taken from normal and special (ASD, treated as noise) children in Urdu, follow.
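The combination experiments can be organized as a loop over feature subsets. The sketch below is a hypothetical harness under assumed block names and dimensions (the paper does not specify feature dimensionalities): each pairwise combination of feature blocks is concatenated and scored with a J48-like decision tree via cross-validation. Random data stands in for the real corpus features.

```python
from itertools import combinations
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical per-utterance feature blocks; dimensions are assumptions.
# 200 samples mirror the corpus size; four emotion labels.
blocks = {"pitch": rng.normal(size=(200, 2)),
          "intensity": rng.normal(size=(200, 2)),
          "formant": rng.normal(size=(200, 3)),
          "MFCC": rng.normal(size=(200, 13)),
          "LPCC": rng.normal(size=(200, 12))}
y = rng.integers(0, 4, size=200)

# Score every pairwise feature combination, as in the paper's
# inter/intra combination experiments.
results = {}
for a, b in combinations(blocks, 2):
    X = np.hstack([blocks[a], blocks[b]])
    clf = DecisionTreeClassifier(random_state=0)  # J48-like stand-in
    results[(a, b)] = cross_val_score(clf, X, y, cv=5).mean()
```

With random features the accuracies hover around chance (~0.25 for four classes); on the real corpus the loop would be repeated for each of the four classifiers.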

A. Prosodic Features
Pitch demonstrates great precision in portraying the states of children under all four classifiers, classifying special children more precisely than normal children (Table I). The classification accuracy of rotation forest with the prosodic feature pitch on the spoken utterances of ASD (noisy) children was significantly better than that of the other classifiers. Intensity also shows good classification accuracy for ASD children for all classifiers except rotation forest. All learning classifiers demonstrate higher classification accuracy with formant for ASD children in comparison with normal children.

B. Spectral Features
MFCC achieves significant accuracy with the MLP and logit boost classifiers in the case of normal children; on the other hand, only rotation forest shows considerable classification accuracy for ASD children. LPCC achieves very good accuracy only with the logit boost classifier, for both normal and special children. This outcome suggests that in any study involving LPCC, the logit boost classifier ought to be utilized.

C. Inter Combination of Prosodic and Spectral Features
The prosodic feature pitch shows significant accuracy in combination with the two other prosodic features. In combination with intensity, pitch yields considerable classification accuracy with logit boost and J48 for ASD (special) children. Pitch with formant performs well in classifying special (ASD) children with all classifiers except logit boost. In combination with intensity and formant, MLP and rotation forest show significant classification accuracy in comparison with the other two classifiers for ASD children, while J48 has comparable accuracy in classifying special and normal children. Table III provides the results of the learning classifiers with combinations of prosodic and spectral features for classifying the noisy spoken utterances in terms of classification accuracy. The most significant results are observed for the combination of LPCC with pitch and formant in classifying the noisy (special children) utterances, whereas pitch and formant with MFCC also show substantial classification accuracy for special (noisy) children. Other combinations, such as intensity with LPCC and intensity with MFCC, also provide considerable classification results.

V. CONCLUSION
In this paper, the performance of learning classifiers has been evaluated considering prosodic and spectral features and their combinations for children with ASD in terms of classification accuracy. The experimental framework comprises four different classifiers with different inter- and intra-combinations of prosodic and spectral features. The experiments were conducted on 200 samples taken equally from normal children and children with ASD, the latter being considered as noise. The conclusions of the experimental results are:
• The spectral features show significant classification accuracy in combination with the prosodic features (pitch and formant) under the rotation forest and J48 classifiers.
• Separate analysis of the spectral and prosodic features reveals that the classification accuracy of the prosodic features is considerably better than that of the spectral features.
• The intra-feature combinations of the spectral features with pitch and formant demonstrate better classification accuracy across the different classifiers.
www.etasr.com  Samad et al.: Performance Evaluation of Learning Classifiers of Children Emotions using Feature …

Fig. 1. Prosodic features

Fig. 3. Inter and intra combinations of features

TABLE I. CLASSIFICATION ACCURACY FOR PROSODIC FEATURES

D. Intra Combination of Prosodic and Spectral Features
In Intensity-MFCC, MLP performs better in classifying the normal children, while logit boost is considerably good in classifying the noisy spoken utterances of special children. In this case, rotation forest and J48 perform averagely. Similarly, in the Intensity-LPCC combination, MLP is good and accurate. Nevertheless, logit boost more accurately classifies the normal children's spoken utterances. Rotation forest and J48 again perform averagely.