A Novel Approach on Speaker Gender Identification and Verification Using DWT First Level Energy and Zero Crossing
Received: 18 August 2022 | Revised: 3 September 2022 | Accepted: 5 September 2022 | Online: 15 December 2022
Corresponding author: S. Saadi
Abstract
The aim of this work is to find a new criterion whose range of values can be used to determine the gender of a speaker. The Discrete Wavelet Transform (DWT) with the Daubechies db7 mother wavelet is applied, and the energy and zero crossings are computed from the first level of the DWT. The values of the criterion are then computed for both genders and compared with the fundamental frequency of speech for both genders, for the same signal or sentence. The criterion takes a limited range of values, close to the fundamental frequency range of the same speaker, from which gender can be determined. It was tested on several databases of male and female speakers, with different sentences repeated by the same person or by both genders, and it gives acceptable results that merit further work.
Keywords:
Speaker gender, DWT, Energy, Zero crossing
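The abstract does not give the exact formula for the criterion, so the following is only a minimal sketch of the two ingredients it names: the energy and the zero-crossing count of first-level DWT coefficients. All function names are hypothetical. The Haar filter pair is used here as a stand-in because its taps are known exactly; it is not the db7 wavelet of the paper. For db7, the 14-tap analysis filters would have to come from a wavelet library (for example, PyWavelets exposes them as `pywt.Wavelet('db7').dec_lo` and `dec_hi`), and the tap ordering and periodic boundary handling below are assumptions.

```python
import math

# Haar analysis filters, used as a stand-in (assumption): the paper uses db7.
HAAR_LO = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HAAR_HI = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def dwt_level1(signal, lo=HAAR_LO, hi=HAAR_HI):
    """One DWT level: filter with periodic extension, then keep every 2nd sample."""
    n = len(signal)
    approx, detail = [], []
    for k in range(0, n - n % 2, 2):
        approx.append(sum(lo[j] * signal[(k + j) % n] for j in range(len(lo))))
        detail.append(sum(hi[j] * signal[(k + j) % n] for j in range(len(hi))))
    return approx, detail


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def zero_crossings(coeffs):
    """Number of sign changes between consecutive coefficients."""
    return sum(1 for a, b in zip(coeffs, coeffs[1:]) if a * b < 0)


def tone(freq_hz, fs=8000, duration_s=0.1):
    """Synthetic sine tone, a crude stand-in for a voiced speech frame."""
    n = int(fs * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]


if __name__ == "__main__":
    # 120 Hz and 220 Hz are illustrative pitch values, not taken from the paper.
    for label, f0 in (("lower pitch", 120), ("higher pitch", 220)):
        approx, detail = dwt_level1(tone(f0))
        print(label, energy(approx), zero_crossings(approx))
```

In this toy setting the zero-crossing count of the first-level approximation grows with the pitch of the tone, which is the kind of relationship to the fundamental frequency that the abstract describes. How the paper combines energy and zero crossings into a single criterion is not stated here and is left open.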
License
Copyright (c) 2022 A. Amraoui, S. Saadi
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.