Voice Activity Detection Using Weighted K-Means Thresholding Algorithm


Alimi Sheriff
Yussuff I. O. Abayomi

Abstract

Voice activity detection (VAD) separates speech segments from silent segments of an audio signal. It is valuable in many speech-processing applications, such as speech recognition and speaker verification, because it improves both performance and system efficiency. In this study, K-means, a clustering algorithm, was extended into a thresholding algorithm termed K-means weighted thresholding and used to discriminate speech segments from silent segments of an audio signal. The signal was fragmented into frames of 2048 samples, and the spectral power of each frame served as input for computing the threshold value with the extended K-means algorithm; any frame whose spectral power is greater than or equal to the threshold is considered part of the voiced segments, while any other frame is tagged as silent. The implemented VAD system achieved outstanding performance, with a true acceptance rate (sensitivity) of 100%, a false acceptance rate of 0.025%, a true rejection rate (specificity) of 100%, a false rejection rate (miss rate) of 0%, and a classification accuracy of 99.97%.
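The pipeline described above can be sketched in a few lines of NumPy: frame the signal, compute per-frame spectral power, cluster the powers with a 1-D 2-means, and derive a single threshold from the two centroids. The paper's exact weighting scheme is not reproduced here; using the cluster-size-weighted mean of the two centroids as the threshold is an assumption made for illustration, as are all function names.

```python
import numpy as np

def frame_signal(x, frame_len=2048):
    # Split the signal into non-overlapping frames of frame_len samples,
    # discarding any trailing partial frame.
    n = len(x) // frame_len
    return x[:n * frame_len].reshape(n, frame_len)

def spectral_power(frames):
    # Per-frame power computed from the magnitude spectrum.
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spec.sum(axis=1) / frames.shape[1]

def kmeans_weighted_threshold(powers, iters=50):
    # 1-D 2-means on the frame powers, initialized at the extremes.
    # The threshold is the cluster-size-weighted mean of the two
    # centroids -- an assumed weighting; the paper's scheme may differ.
    c = np.array([powers.min(), powers.max()], dtype=float)
    for _ in range(iters):
        labels = np.abs(powers[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = powers[labels == k].mean()
    w = np.array([(labels == 0).sum(), (labels == 1).sum()], dtype=float)
    return (w * c).sum() / w.sum()

def vad(x, frame_len=2048):
    # True marks a voiced frame (power >= threshold), False a silent one.
    p = spectral_power(frame_signal(x, frame_len))
    return p >= kmeans_weighted_threshold(p)
```

On a toy signal with four low-level-noise frames followed by four sinusoidal frames, the mask flags only the tone frames as voiced, since their spectral power falls in the upper cluster.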


Article Details

Section: Articles

How to Cite

[1]
Alimi Sheriff and Yussuff I. O. Abayomi, "Voice Activity Detection Using Weighted K-Means Thresholding Algorithm", IJITEE, vol. 14, no. 4, pp. 1-7, Mar. 2025, doi: 10.35940/ijitee.D1051.14040325.
