Reduced Feature Set for Emotion Based Spoken Utterances of Normal and Special Children Using Multivariate Analysis and Decision Trees

Abstract: This paper uses multivariate data analysis and decision tree methods to reduce the feature set for the speech of normal and special children in four emotions: anger, happiness, neutral and sadness. Ten features were extracted by an algorithm implemented in a previous study to classify the speech emotions of normal and special children. In the current study, the best features are selected using principal component analysis (PCA), factor analysis and a decision tree. PCA is applied step by step to reduce the feature set by identifying collinear variables. The resulting reduced feature sets are applicable to both normal and special children samples. Experimental results show that PCA yields a feature set comprising pitch, intensity, formant, LPCC and rate of acceleration. Factor analysis provides three feature sets, of which the one comprising RASTA-PLP, MFCC, ZCR and intensity gives the best result. The decision tree yields a feature set comprising energy, pitch and LPCC.


INTRODUCTION
An emotion recognition system identifies the emotional state from voice [1] and is therefore called speech emotion recognition (SER). SER has four modules: input, feature extraction, feature selection and classification of emotions [2]. Early research studies used prosodic features, particularly pitch, intensity and duration. Currently, low-level descriptor (LLD) features such as shimmer, jitter, harmonic-to-noise ratio (HNR) and cepstrum are used extensively [3,4]. LPCC and MFCC have also been included in the speech feature set [5]. In [6], 40 depressed patients and 40 control subjects took part in a study of speech feature analysis. Characteristics of depressed patients were identified using ANOVA, and the results were linked to a Gaussian mixture model (GMM) and a support vector machine (SVM). The psychometric properties of the autism spectrum disorder comorbid for children (ASD-CC) scale were developed and evaluated in [7], where confirmatory factor analysis (CFA) was used to examine the factor structure of the Korean version of the ASD-CC.
In [8], ten features were extracted: frequency, pitch, intensity, rate of acceleration, formant frequencies, log power, log energy, rate of zero passages (zero-crossing rate, ZCR), Mel frequency cepstrum coefficients (MFCC) and linear prediction cepstrum coefficients (LPCC). Frequency extraction starts with loading the speech signal and converting the analog signal into numeric data. After loading, the maximum and minimum frequencies are set and the fast Fourier transform (FFT) of the windowed signal is computed as shown in (1). The cepstrum is then calculated and the frequency is extracted as shown in (2). For pitch extraction, signal acquisition and signal processing are the same as in frequency extraction. The FFT is applied to the processed signal, and the discrete Fourier transform (DFT) of the FFT signal is taken as in (3). The log of the FFT is then calculated. After that, the real cepstrum is calculated as the absolute value of the filtered log of the DFT, as shown in (4). Finally, the real cepstrum pitch is extracted as given in (5).
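The cepstral pitch steps described above can be sketched as follows. This is a minimal illustration in Python, not the implementation from [8]: the function names, the periodic Hamming window, the small constant `eps` inside the logarithm, the 50 to 400 Hz search range and the synthetic pulse-train frame are all assumptions made for the example, and a naive DFT stands in for the FFT so that the sketch needs only the standard library.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, eps=1e-6):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # Periodic Hamming window (assumed; the source does not name the window).
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * t / n))
        for t, s in enumerate(frame)
    ]
    # eps keeps the logarithm finite at empty bins (assumed value).
    log_mag = [math.log(abs(v) + eps) for v in dft(windowed)]
    # log_mag is real and even for real input, so the inverse DFT
    # reduces to a cosine sum.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(n)
    ]


def cepstral_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Pitch in Hz from the largest cepstral peak within [f_min, f_max]."""
    ceps = real_cepstrum(frame)
    q_lo = int(fs / f_max)  # shortest pitch period, in samples
    q_hi = int(fs / f_min)  # longest pitch period, in samples
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return fs / q_peak


# Synthetic voiced frame: one pulse every 32 samples at 8 kHz (250 Hz).
fs = 8000
frame = [1.0 if t % 32 == 0 else 0.0 for t in range(512)]
pitch_hz = cepstral_pitch(frame, fs)
```

For this synthetic frame the strongest cepstral peak falls at a quefrency of 32 samples, so `pitch_hz` comes out at 250 Hz. Real speech would need framing, voicing detection and a search range suited to children's voices, none of which is shown here.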
For intensity, the base-10 logarithm of the DFT magnitude is filtered as in (6). Converting the magnitude into decibels and transposing the result gives the intensity, as shown in (7), where M = magnitude, Y = filtered signal, x = input signal, i = intensity and mag2db = magnitude to decibels.
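The magnitude-to-decibel step can be illustrated with a short Python sketch. It assumes that `mag2db` denotes the usual conversion 20·log10(M); the frame-wise RMS magnitude, the non-overlapping framing, the `floor` constant and the function name `frame_intensity_db` are assumptions for illustration and do not reproduce the filtering in (6).

```python
import math


def mag2db(magnitude):
    """Convert a linear magnitude to decibels: 20 * log10(magnitude)."""
    return 20.0 * math.log10(magnitude)


def frame_intensity_db(signal, frame_len, floor=1e-12):
    """Intensity contour: RMS magnitude of each non-overlapping frame, in dB."""
    contour = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # floor avoids log10(0) on silent frames (assumed value).
        contour.append(mag2db(max(rms, floor)))
    return contour


# A constant-amplitude signal of 0.1 has an RMS of 0.1, i.e. about -20 dB.
contour = frame_intensity_db([0.1] * 400, frame_len=100)
```

Each of the four frames in this example evaluates to roughly -20 dB, which is a quick sanity check on the conversion.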
For the rate of acceleration, processing of the signal involves the time instant calculation given in (8). The derivative of speed gives the velocity in (9). The gradient of the velocity with a step of 0.01 yields the acceleration in (10). Finally, the average rate of acceleration is calculated by taking the mean of the acceleration. For extracting the formant frequencies, preprocessing involves setting the number of coefficients according to the rule of thumb for formant estimation given in (11). The linear prediction coefficients are then calculated as in (12), and the frequencies are calculated using (13). The imaginary part of the roots gives the formant frequencies.
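The gradient-and-mean computation for the rate of acceleration can be sketched as below. The text does not make clear which quantity is differentiated in (8) to (10), so this Python example takes a generic contour sampled every 0.01 units; the function names and the central-difference gradient with one-sided end points are assumptions.

```python
def gradient(values, step):
    """Numerical gradient: central differences inside, one-sided at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    grad = [0.0] * n
    grad[0] = (values[1] - values[0]) / step
    grad[-1] = (values[-1] - values[-2]) / step
    for i in range(1, n - 1):
        grad[i] = (values[i + 1] - values[i - 1]) / (2.0 * step)
    return grad


def mean_acceleration(contour, step=0.01):
    """Average rate of acceleration of a contour sampled every `step` units."""
    velocity = gradient(contour, step)
    acceleration = gradient(velocity, step)
    return sum(acceleration) / len(acceleration)


# Quadratic contour s(t) = t^2 has a true acceleration of 2 everywhere.
step = 0.01
contour = [(i * step) ** 2 for i in range(101)]
avg_acc = mean_acceleration(contour, step)
```

On the quadratic test contour the estimate is slightly below 2 because the one-sided differences at the two ends are less accurate than the central differences in the interior.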
For MFCC extraction, preprocessing involves setting the analysis frame duration and frame shift. It is followed by setting the pre-emphasis coefficient alpha = 0.97, the number of filter bank channels M = 20, and the lower and upper frequency limits. Then the Hertz-to-Mel warping function is calculated. The DCT matrix routine is applied and the magnitude of the spectrum is calculated as in (19). The filter bank is applied to the unique part of the magnitude spectrum. Finally, cepstral liftering gives the MFCC (20),
where dctm = discrete cosine transform matrix, N = number of coefficients, M = number of filter bank channels, L = length of channel and CL = cepstrum lifter.
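The last stages of the MFCC chain (Mel warping, DCT matrix and cepstral lifter) can be sketched in Python as follows. The formulas are common textbook forms and are assumed rather than taken from (19) and (20); the function names, the 13 coefficients and the lifter length of 22 are illustrative choices. Pre-emphasis, framing and construction of the triangular filter bank are omitted, so the example starts from log filter bank energies.

```python
import math


def hz_to_mel(f_hz):
    """Hertz-to-mel warping (a common formula; assumed, not from the source)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def dct_matrix(n_coeffs, n_channels):
    """Type-II DCT matrix with n_coeffs rows and n_channels columns."""
    scale = math.sqrt(2.0 / n_channels)
    return [
        [scale * math.cos(math.pi * n * (m + 0.5) / n_channels)
         for m in range(n_channels)]
        for n in range(n_coeffs)
    ]


def cepstral_lifter(n_coeffs, lifter_len):
    """Sinusoidal lifter weights 1 + (L / 2) * sin(pi * n / L)."""
    return [1.0 + 0.5 * lifter_len * math.sin(math.pi * n / lifter_len)
            for n in range(n_coeffs)]


def mfcc_from_log_energies(log_energies, n_coeffs=13, lifter_len=22):
    """DCT of log filter bank energies followed by cepstral liftering."""
    dctm = dct_matrix(n_coeffs, len(log_energies))
    cepstra = [sum(w * e for w, e in zip(row, log_energies)) for row in dctm]
    lifter = cepstral_lifter(n_coeffs, lifter_len)
    return [c * l for c, l in zip(cepstra, lifter)]


# M = 20 channels as in the text; a flat log spectrum has energy only in c0.
coeffs = mfcc_from_log_energies([1.0] * 20)
```

A flat set of log energies is a convenient check: every coefficient except the zeroth vanishes, because the higher DCT basis vectors sum to zero across the channels.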
For LPCC extraction, preprocessing involves estimating the exponent of the next higher power of two for the signal size. Then the number of prediction paths 'p' is set and the linear prediction coefficients are calculated. The Fourier transform is applied to X-lpc according to the number of shifts N, as given in (21). The logarithm of the LPC spectrum is taken and the LPC coefficients are then converted back to spectra. The number of cepstra is then set, and the first and second derivatives of the LPCC features are estimated to obtain the coefficient values in (22).
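As an illustration of how cepstral coefficients follow from linear prediction, the Python sketch below uses the autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion. This is a swapped-in, commonly used route and not the FFT-based procedure of (21) and (22); the function names, the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and the test signal are assumptions, and the first and second derivative features are not computed.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / A(z)."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left at zero
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -acc / n - (a[n] if n <= p else 0.0)
    return c[1:]


# Decaying exponential 0.5**t is the impulse response of 1 / (1 - 0.5 z^-1).
frame = [0.5 ** t for t in range(60)]
a = levinson_durbin(autocorrelation(frame, 2), order=2)
lpcc = lpc_to_cepstrum(a, n_ceps=4)
```

For this single-pole test signal the recovered coefficients are close to [1, -0.5, 0], and the cepstral values follow the known closed form 0.5^n / n, which makes the recursion easy to check by hand.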
After extracting these features, speech emotion recognition of normal and special children (SERNSC) is implemented. To make the algorithm run efficiently, dimension reduction is a valuable approach. Its advantage is that it helps to discover groupings of features with which the algorithm runs with improved accuracy [9]. Estimation of a projection subspace basis is suggested in [10]; for the reduction it uses a generalized hyperbolic mixture (HMMDR) fit. This method is well accepted along with discriminant analysis, model-based classification and clustering analysis. Two SDR techniques are demonstrated in [11], where the relationship between partial least squares (PLS) and principal component regression (PCR) is explained. Dimensionality reduction by joining features is one of the best strategies proposed so far [12]. Sparse partial least squares regression (SPLSR) is investigated in depth in [13]. The recognition rate of SPLSR is reported to reach 79.23%, which is superior to the other dimensionality reduction methods compared. In this paper, feature reduction is presented using

Fig. 1. PCA: (a) eigenvalues [remainder of caption not recoverable].

Fig. 2. Factor analysis [remainder of caption not recoverable].

TABLE III. RESULTS