Early Detection of Parkinson’s and Alzheimer’s Diseases Using the VOT_Mean Feature

Alzheimer’s disease (AD) and Parkinson’s disease (PD) are two of the most common neurological diseases in the world. Several studies have been conducted on the identification of these diseases using speech and laryngeal disorders. Those symptoms can appear even at the early stages of AD and PD, but not in very specific and prominent ways. Voice Onset Time (VOT) is an acoustic property of stop consonants that is commonly discussed in studies of phonetic perception. In this study, the VOT_Mean feature was explored to identify AD and PD early using the /pa/, /ka/, and /ta/ syllables of the diadochokinetic (DDK) task. VOT_Mean was calculated as the average of two VOT values (VOT_1 and VOT_2), corresponding to the second and the penultimate VOT measurement cycles. Experimental tests were performed on Tunisian Arabic and Spanish databases for the early detection of AD and PD, respectively. The results showed a very high significance of VOT_Mean for the early detection of AD and PD. Moreover, the best results were achieved using the XGBoost (XGBT) algorithm as a classifier on the VOT_Mean feature.
Keywords: Alzheimer’s disease (AD); Parkinson's disease (PD); early detection; neurological disorders; VOT_Mean; DDK; Tunisian Arabic database; Spanish database


I. INTRODUCTION
Speech disturbances caused by impairments of the speech mechanism, particularly of muscle control, are grouped under the single term dysarthria. These disturbances result from disorders of the basic motor processes involved in the production of speech [1]. The difficulties in verbal production caused by dysarthria are due to weakness of the speech muscles, lack of coordination, and paralysis. Dysarthria results from damage to the peripheral or central nervous system. These lesions can appear from birth, as in the cases of cerebral palsy and muscular dystrophy, or later in life, engendered by different conditions that can disrupt the nervous system, including injuries, Parkinson's disease (PD), cerebral moods, multiple sclerosis, Huntington's disease, etc. The first studies related to the perceptual characteristics of dysarthria were carried out in [2][3]. A total of 212 patients having symptoms of joint dysarthria with different neurological disorders, such as Parkinsonism, amyotrophic lateral sclerosis, chorea and cerebellar ataxia, bulbar palsy, dystonia, and pseudobulbar palsy, were examined, identifying 38 distinct characteristics of speech and categorizing them into seven modalities: articulation, respiration, pitch, prosody, loudness, resonance, and vocal quality. These studies specified the characteristics of each neurogenic group in addition to those shared by more than one group. Once the perceptual characteristics of each neurogenic group were identified and grouped, the characteristics of dysarthric speech were classified into the following dysarthria types: hyperkinetic, ataxic, unilateral upper motor neuron, mixed, flaccid, and hypokinetic. Hypokinetic dysarthria is a type of dysarthria primarily associated with alterations in the functions of the control circuits of the basal ganglia. These disorders involve the interactions that occur between motor and cognitive functions in the hippocampus.
Hypokinetic dysarthria has several aetiologies. Idiopathic PD is a prototypical and common disease linked to this type of dysarthria. With an estimated prevalence of 1.5% in people aged over 65 years, PD is considered to be the second most common neurodegenerative disorder [4], following Alzheimer's disease (AD) [5][6]. Moreover, cognitive impairment can be observed in 80% of people with PD [7]. The epidemiological study in [8] showed that 80% of the patients with PD developed cognitive impairment, while 50% of them had PD dementia. The risk of developing dementia was higher in people with PD than in healthy ones [9]. Additionally, PD dementia contributes to the increased burden of care in addition to morbidity or mortality [10]. Thus, it is important to detect Parkinson's Disease Dementia (PDD) early.
Further studies are needed to identify, evaluate, and distinguish PDD indices from other types of dementia, such as AD. Although the discriminating characteristics of PDD manifest as impaired executive function in cognitive function tests and problems with free recall in memory tests, patients with PD without dementia may also have diminished executive functions, such as worsened verbal fluency [11]. However, problems with memory and executive functions can also be evidenced in AD and other types of cognitive impairment at the early stage, complicating their distinction based on executive functions. Additionally, there may still be a significant overlap between the cognitive impairments of AD and PD patients, although the former primarily suffer from deterioration of the cortical profile (i.e., memory and language ability) and the latter have defects in the subcortical profile (i.e., executive functions and visuospatial ability) [12]. Moreover, similar cognitive disorders have been reported for AD and 26% of PDD cases [13]. Furthermore, 50% of PD patients without dementia exhibited amyloid accumulation [14], which is a widely known characteristic of AD. It should be noted that greater patient age and later onset of PD increase the risk of dementia [15].
PD can lead to motor limitations affecting the speech production process, causing several disorders that are broadly categorized into three main dimensions: linguistic, prearticulatory (resonance and phonatory characteristics), and prelinguistic (prosodic fluctuations and articulatory changes) [16]. In this context, sound production from the physical point of view is referred to as prearticulatory, keeping in mind only the characteristics linked to the phonation process at the level of modulations integrated by the vocal tract (such as a resonant structure invariant in time) and phonation. The prelinguistic dimension refers to variations in the prosody of speech, rendered as variations in f0 and intensity, as well as articulatory variations appearing due to changes in the coordination frequency, the position of articulators, and as a result of their interconnection with excitement. Thus, it describes the mechanisms of speech production as forms varying in time. The linguistic dimension is linked to changes in the expected acoustic content of speech, mainly due to substitutions and/or repetitions. To sum up, the linguistic dimension is related to the expressive capacities of the language, while the prelinguistic and prearticulatory dimensions are related to the speech [16].
AD involves the whole process of the first pathological variations in the brain before its symptoms manifest as dementia [17]. AD includes patients with dementia, patients with Mild Cognitive Impairment (MCI), and asymptomatic persons having positive AD biomarkers [16]. Brain variations occur when specific neurons are deteriorated or destroyed. The progression of symptoms varies more quickly when the diagnosis of AD is made at an early stage [17]. Language issues are considered to be one of the most specific symptoms of AD, and manifest as an unavoidable and direct outcome of cognitive impairment [18]. Primary Progressive Aphasia (PPA), defined as a clinical syndrome characterized by a progressive language disorder with a neurodegenerative etiology, could be a manifestation mode of several neurodegenerative deficiencies such as AD [19]. Communication and language problems are evident in interactions between neurologists and patients, and interactional feedback can be explored to distinguish cognitive difficulties caused by neurodegenerative or functional memory impairments [20]. Patients with AD have shown poor performance in different linguistic tests [21], as they have more difficulty in naming tests, where they often repeat the same ideas, use simpler forms of speech, and use empty or longer pauses. Moreover, people with AD show monotonous prosody and less cohesive, informative, and consistent speech compared to control groups [22].
Methods based on the automatic processing of the voice signal from its forms have been recently explored for the detection of neurodegenerative impairments. The traditional and common features of AD detection using voice signals are linear and considered to be the easiest to interpret clinically [23][24][25][26][27][28]. Other innovative and current techniques have explored non-linear features [29]. Both types of features are primary indicators of expressive forms of language based on the tasks performed [24]. Besides the feature extraction process, where mathematical models and/or statistical analysis are used to quantify the characteristics of speech, there is the classification process based on the exploration of machine learning techniques. Different types of classifiers have been constructed to perform the classification procedure, such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and K-Nearest Neighbour (KNN). Deep learning appears as a more sophisticated learning technique making possible the advance on automatic AD detection from the voice signal.
The DDK task is defined as a clinical test explored in the evaluation of the articulatory system's functional capacities [30]. Its concept is that motor deficits in the vocal abilities of people suffering from PD appear more strongly in circumstances requiring motor execution and planning over long sequences of motor production (i.e., repeating the same sequence several times or in cases where a given fragment appears in a sentence context). Correcting PD results leads to simplifying the articulation at the cost of ease of processing [31], indicating important signs on speech production from a motor point of view. Fine articulatory precision is required for the DDK test, including the alternating production of syllable sequences, as well as the ability to quickly change articulators between consecutive segments. This is usually performed by asking the patient to produce combinations comprising a voiceless consonant and vowels with velar, bilabial, and alveolar places of articulation. More precisely, the subjects are usually asked to constantly repeat the sequence of syllables /pa/-/ta/-/ka/ for about 10 seconds as quickly and clearly as possible. This test needs fast movements of the articulators using the soft palate (back of the mouth), the lips (the front), and the tip of the tongue (middle) continuously and sequentially. Regardless of its simplicity, this task reveals some signs of the speaker's ability to assess subphonemic durations or syllable-to-syllable stability and produce the speech at an appropriate rate [31]. The syllable rates of the DDK test are used in this sense to assess the patient's ability to quickly alternate speech movements [30]. The test was also used to evaluate imprecise consonantal coordination determining the VOT, typically measured as the duration between the initial burst and the vowel onset [32].
Moreover, in [33], it was proposed that the imprecisions of articulations on stop consonants were mainly due to low-frequency frictional noise or spirantization substituting stop gaps as a consequence of reduced closure. In this sense, the change in VOT for voiceless (/p/, /t/, /k/) and voiced (/b/, /d/, /g/) stops was identified as an indicator for the presence of PD [33][34][35][36][37]. Furthermore, VOT measurements were explored and compared for analyzing the speech production [p, t, k] of bilingual adults. VOT values were higher in Brazilian Portuguese (BP) than in English [38]. In [39], the differences between the VOT duration measures  [42][43][44], while the VOTs in Lebanese Arabic were shorter compared to other languages [45].
This study adopted the VOT_Mean measure as an indicator to identify AD on a Tunisian Arabic dataset and PD on a Spanish dataset, using different machine learning algorithms and the syllables /pa/, /ta/, and /ka/, which begin with voiceless consonants, for the DDK task. VOT_Mean was measured by taking the average value of the second (VOT_1) and the penultimate (VOT_2) VOT measurement cycles.

A. Methods
The proposed method was composed of five phases: feature extraction, feature selection, feature scaling, data distribution, and classification.

1) Feature Extraction
The VOT measurements were carried out manually for the /pa/, /ta/, and /ka/ syllables using the Praat software. As shown in Figure 1, VOT_Mean was calculated by averaging the values of VOT_1 and VOT_2, which corresponded to the second and the penultimate VOT measurement cycles. The second cycle was selected because the speaker's speech energy was insufficient during the first VOT cycle. The penultimate cycle was selected due to its sufficient energy. The cycle durations were measured in ms. The VOT_Mean values were consistent with the production of the stop consonant.
Fig. 1. The extraction of VOT_1 and VOT_2 using Praat software.
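The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `vot_mean`, the input list of per-cycle VOT durations, and the requirement of at least four cycles (so that the second and penultimate cycles are distinct) are hypothetical and not taken from the study, where the values were measured manually in Praat.

```python
def vot_mean(vot_cycles_ms):
    """Average the second and the penultimate VOT cycles, in ms.

    vot_cycles_ms: VOT durations of consecutive repetitions of one
    syllable, in the order they were produced (hypothetical input).
    """
    # Assumption: at least four cycles, so VOT_1 and VOT_2 are distinct.
    if len(vot_cycles_ms) < 4:
        raise ValueError("need at least four VOT cycles")
    vot_1 = vot_cycles_ms[1]   # second cycle
    vot_2 = vot_cycles_ms[-2]  # penultimate cycle
    return (vot_1 + vot_2) / 2


# Made-up durations for five repetitions of one syllable.
print(vot_mean([10.0, 20.0, 25.0, 30.0, 12.0]))  # 25.0
```

The first and last cycles are deliberately ignored, mirroring the selection of the second and penultimate cycles in the text.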

2) Feature Selection
In this phase, Principal Component Analysis (PCA) was used due to its high performance in feature extraction and dimensionality reduction. This approach was based on the assumption that features with high variance included most of the information relating to certain classes. Furthermore, it permitted the conversion of a set of observations of correlated variables into a smaller set of linearly uncorrelated variables by exploiting an orthogonal transformation.
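To make the idea of projecting onto the direction of highest variance concrete, the following minimal sketch extracts only the first principal component with power iteration on the sample covariance matrix. It is a pure-Python illustration, not the implementation used in the study; the function name, the start vector, and the toy data are assumptions.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the highest-variance axis."""
    n, d = len(rows), len(rows[0])
    # Center each feature (column) on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Assumed start vector; it must not be orthogonal to the solution.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project the centered samples onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data lying exactly on the line y = x / 2.
direction, scores = first_principal_component(
    [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]])
print(direction, scores)
```

A full PCA would also return the remaining components, each orthogonal to the previous ones; a library routine is the more practical choice for that.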

3) Feature Scaling
This step was applied to the samples remaining after the feature selection. MinMax scaling was used to scale the samples between -1 and 1. The objective of this phase was to restrict the values of the concerned features to this range, avoiding large variations. Thus, the computational cost could be decreased, improving the model's performance.
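A minimal sketch of min-max scaling to the range [-1, 1] is given below. The function name and the handling of a constant feature (mapped to the midpoint of the range) are assumptions made for illustration.

```python
def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly map values so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant feature is mapped to the midpoint.
        return [(low + high) / 2 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


print(min_max_scale([10.0, 20.0, 30.0]))  # [-1.0, 0.0, 1.0]
```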

4) Data Distribution
The explored database was split into training and test sets using the train_test_split function of the Python scikit-learn library. As shown in Table I, 20% of the database was used for testing, while 80% was used for training. This distribution reduced computation time for the first cycles of modeling.
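The 80/20 partition can be illustrated with a standard-library stand-in for train_test_split. This sketch is not the scikit-learn function itself; the function name `split_train_test`, the fixed seed, and the rounding of the test-set size are assumptions.

```python
import random


def split_train_test(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out a fraction of them for testing."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Forty placeholder samples with alternating 0/1 labels.
data = list(range(40))
target = [i % 2 for i in data]
x_train, x_test, y_train, y_test = split_train_test(data, target)
print(len(x_train), len(x_test))  # 32 8
```

With small clinical datasets, a stratified split that preserves the class proportions in both sets is often preferred; the sketch above does not stratify.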

1) Tunisian Arabic Dataset for Alzheimer's Disease
The dataset included numerous vocal recordings of male and female individuals, of whom 20 suffered from AD and 20 were healthy controls (HC). Each group was equally divided between the two sexes. Each speaker was requested to pronounce the sequence of syllables /pa/, /ka/, /ta/, or /pataka/ in the Tunisian Arabic dialect at a comfortable pitch and intensity. The goal behind the repetition of each syllable was the preservation of high reliability and quality in speech production for each participant. The number of repetitions was linked to the ability of each participant to repeat the requested sequence, so the duration of the recordings differed. All samples were recorded at 16-bit resolution and a 22 kHz sample rate, and saved in WAV format. The extracted features were saved in CSV format. Healthy participants' samples were labeled with "0", whereas those of participants suffering from AD were labeled with "1". The mean age of the participants was 76 years, with a standard deviation of 8.17 years. The youngest and the oldest participants were 61 and 88 years old, respectively. The mean age of the healthy participants was 74.125 years, with a standard deviation of 5.79 years.

2) Spanish Dataset for Parkinson's Disease
This dataset was composed of voice recordings from 50 people suffering from PD and 50 HC. Each group consisted of 25  Additionally, all participants were diagnosed and labeled by expert neurologists using the UPDRS and H&Y scales [46]. The samples were recorded at 16-bit resolution and a 44.1 kHz sample rate using a dynamic omnidirectional microphone.

3) Data Analysis
The SPSS software was utilized for performing Multivariate Analysis of Variance (MANOVA) on the repeated tests of VOT_Mean data. As shown in Table II, VOT_Mean demonstrated a very highly significant effect for the AD and PD groups ([F(1, 63)=44.773; p<0.001] and [F(1, 296)=94.346; p<0.001], respectively), indicating that the VOT_Mean values of the patient groups differed significantly from those of the HC groups. The sums of squares and the mean squares of VOT_Mean between and within groups are also presented in Table II.
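To show how an F statistic of the form F(df1, df2) relates to the between-group and within-group sums of squares and mean squares, the sketch below computes a one-way ANOVA for a single feature. It is a simplified stand-in for the MANOVA run in SPSS, and the function name and input values are made up for illustration, not taken from either dataset.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group means versus the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: samples versus their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Hypothetical VOT_Mean values (ms) for a control and a patient group.
f_value, df1, df2 = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f"F({df1}, {df2}) = {f_value}")  # F(1, 4) = 13.5
```

With two groups, df1 is 1 and df2 is the total number of samples minus 2, which is the pattern behind a statistic reported as F(1, N-2).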

4) Evaluation Metrics
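Different evaluation metrics were explored for verifying the predictive performance of the different models. The accuracy and the F-measure offered the opportunity to measure the capacity of differentiation of one class among others, even if they were unbalanced. Nevertheless, accuracy in such a case can lead to misleading results. The precision, sensitivity, and F-measure metrics were expressed as:
\[ \text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Sensitivity} = \frac{tp}{tp + fn}, \qquad F = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \]
where tp, fp, and fn represent true positive, false positive, and false negative, respectively. Accuracy and specificity were determined as:
\[ \text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}, \qquad \text{Specificity} = \frac{tn}{tn + fp} \]
where tn represents true negative. Moreover, the Matthews Correlation Coefficient (MCC) was used to assess the quality of the binary classification. This metric was expressed as:
\[ \text{MCC} = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}} \]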
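The metrics named in this subsection can be computed directly from the four confusion-matrix counts, as in the following sketch based on their standard definitions. The function name, the returned dictionary keys, and the choice to return 0.0 when a denominator is zero are assumptions; the counts in the usage line are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from counts."""
    def ratio(num, den):
        # Assumption: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts: 4 true positives, 1 false positive,
# 4 true negatives, 1 false negative.
print(binary_metrics(tp=4, fp=1, tn=4, fn=1))
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, so it stays informative when the classes are unbalanced.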

III. RESULTS AND DISCUSSION
A. Results
Table III shows the results of using different classifiers on the Tunisian Arabic dataset to detect AD. The best performance in all terms was achieved by XGBT, followed by RF. The XGBT algorithm had 92% precision, 92% accuracy, 92% sensitivity, 92% F1-measure, and 0.84 MCC. The RF algorithm had 90% precision, 90% sensitivity, 90% accuracy, 92% F1-measure, and 0.80 MCC.
Table IV shows the results of using different classifiers on the Spanish dataset for detecting PD. The best results were achieved by using the XGBT algorithm on the /ka/ syllable. The XGBT algorithm achieved the best average for precision (92%), followed by RF (88%). Regarding accuracy, the best average value was reached by XGBT (92%), followed by the RF (90%). The best sensitivity was achieved by XGBT (92%), followed by RF (90%). Concerning the F1, the best average values were achieved by XGBT (92%), followed by RF (87%), while the best MCC was achieved by XGBT (0.88), followed by RF (0.73). Overall, it can be concluded that XGBT was the best classifier on both AD and PD, in terms of results.

B. Discussion
This study aimed to examine whether subtle early symptoms of AD and PD exist in the acoustic signal of Arabic and Spanish speech, respectively, by comparing the VOT_Mean feature between the HC, AD, and PD groups. In [47], it was noted that the overall variation in VOT production could be revealed in individuals with moderate AD. Also, relying on results from various measurements on older people, there was an expectation that people at this stage of AD can show an overall shift toward smaller VOT values. According to this study's results, the differences in VOT_Mean values were statistically highly significant in people with AD or PD compared to a healthy control group, on both the Arabic and Spanish databases. To the best of our knowledge, no published studies have tested VOT_Mean measurements in people with AD in Arabic. Nevertheless, some studies have been performed on Arabic databases using the VOT measure for other tasks, such as testing the impact of contrast on F0 (CF0) caused by consonants in Lebanese Arabic [48]. Testing VOT measures on English [48] showed that there was no statistical significance in the differences of people with AD for the early detection of this disease. For the early detection of PD in Spanish, it was found that the VOT_Mean measure produced better results with the /ka/ syllable than with /ta/ or /pa/. This is in agreement with [49], which showed that the proposed VOT was accurate for estimating its limits with the /ka/ syllable for healthy and PD-affected individuals. Moreover, the highest discrimination ability (94.4% with the leave-one-out method and 92.2% with 10-fold cross-validation) was achieved using the PD detection approach proposed in that study with the /ka/ syllable. Furthermore, good results were obtained on the early detection of AD and PD using the VOT_Mean measurement, suggesting that this approach is effective for both languages explored.

IV. CONCLUSION
This paper studied the early detection of Alzheimer's and Parkinson's diseases using the VOT_Mean feature of the DDK task. The early detection test for AD was performed on a Tunisian Arabic dataset using the /pa/, /ta/, and /ka/ syllables, and the best results were obtained using the XGBT algorithm (precision=92%, accuracy=92%, sensitivity=92%, F1=92%, MCC=0.84). The early detection test for PD was carried out on a Spanish dataset using the /pa/, /ta/, and /ka/ syllables separately, and the best average results (precision=92%, accuracy=92%, sensitivity=92%, F1=92%, MCC=0.88) were reached with XGBT on the /ka/ syllable. Moreover, the very high significance of the VOT_Mean feature was noted concerning the early detection of AD and PD. Future work should test the VOT_Mean feature on the early detection of PD on Arabic databases, as there is no such database to date. Furthermore, the proposed model for PD detection should be tested on other syllable sequences, such as /pata/, /pakata/, etc. Regarding the early detection of AD, it is important to test the suggested approach on datasets in different languages.