Mapping the Sound of India: Machine Learning-Based Regional Classification of Folk Songs
Abstract
Preserving Indian folk music in digital repositories poses significant challenges because robust classification systems capable of capturing its linguistic, instrumental, and acoustic diversity are lacking. As a cornerstone of India’s intangible cultural heritage, this music risks marginalisation and loss unless systematic, scalable methods are employed to identify and preserve it. This research develops an automated, multimodal framework for regional classification of Indian folk music, enabling structured archiving and improved accessibility. To achieve this, a machine learning pipeline was designed that integrates Whisper for speech recognition and regional language identification, YAMNet for detecting traditional instruments, and Librosa for extracting acoustic features such as MFCCs, chroma, and spectral descriptors. Together, these tools provide a comprehensive representation of each song’s linguistic, instrumental, and rhythmic content. The curated dataset spans folk music from linguistically rich regions of India, including Marathi, Punjabi, and Urdu Qawwali traditions as well as dialects of Uttar Pradesh and Bihar. Seven supervised learning algorithms were trained and evaluated, including Random Forest, Support Vector Machine, and Gradient Boosting; simpler classifiers such as K-Nearest Neighbours, Naive Bayes, and Logistic Regression were also tested. A hybrid ensemble combining Random Forest, SVM, and Gradient Boosting through soft voting achieved a classification accuracy of 99%, demonstrating the effectiveness of ensemble learning with multimodal features in capturing nuanced differences among regional folk genres. This work addresses a critical gap in scalable, automated tools for preserving folk music and highlights the potential of artificial intelligence in safeguarding endangered cultural assets.
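To make the pipeline concrete, the sketch below shows how multimodal features of the kind described in the abstract could be assembled and passed to a soft-voting ensemble. It is a minimal illustration under stated assumptions, not the authors’ released code: the Whisper "base" checkpoint, the public YAMNet model on TensorFlow Hub, the hyperparameters, the subset of language codes, and the helper names `extract_features` and `train_ensemble` are all assumptions introduced here for clarity.

```python
# Minimal sketch of a multimodal folk-song classification pipeline.
# Assumes: openai-whisper, tensorflow_hub, librosa, scikit-learn, numpy.
import numpy as np
import librosa
import whisper
import tensorflow_hub as hub
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.svm import SVC

whisper_model = whisper.load_model("base")              # speech / language ID (assumed size)
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")  # general-purpose audio tagger

def extract_features(path):
    """Combine acoustic, language, and instrument cues for one recording."""
    y, sr = librosa.load(path, sr=16000, mono=True)

    # Acoustic descriptors (Librosa): MFCCs, chroma, spectral centroid.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean(axis=1)

    # Language identification (Whisper): probabilities over language tokens.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio).to(whisper_model.device)
    _, lang_probs = whisper_model.detect_language(mel)
    # Illustrative subset of codes (Hindi, Marathi, Punjabi, Urdu).
    lang_vec = np.array([lang_probs.get(code, 0.0) for code in ("hi", "mr", "pa", "ur")])

    # Instrument cues (YAMNet): mean class scores over the clip.
    scores, _, _ = yamnet(y.astype(np.float32))
    instrument_vec = scores.numpy().mean(axis=0)

    return np.concatenate([mfcc, chroma, centroid, lang_vec, instrument_vec])

def train_ensemble(X, y):
    """Soft-voting ensemble of Random Forest, SVM, and Gradient Boosting."""
    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
            ("svm", SVC(kernel="rbf", probability=True, random_state=42)),
            ("gb", GradientBoostingClassifier(random_state=42)),
        ],
        voting="soft",  # average the predicted class probabilities
    )
    ensemble.fit(X, y)
    return ensemble
```

Soft voting averages the class-probability estimates of the three base learners, which is why the SVM must be constructed with `probability=True`; the specific estimator settings above are placeholders rather than the configuration reported in the study.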
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
M. A. Al-Waisy, N. F. Hamed, and H. Al-Bayati, “Comparative analysis of deep learning and gradient boosting models for audio genre classification,” arXiv preprint arXiv:2401.04737, Jan. 2024. [Online]. Available: https://arxiv.org/abs/2401.04737
S. Choi, S. Lee, and S. Kim, “Deep learning-based music emotion recognition: A survey,” Electronics, vol. 10, no. 13, p. 1572, 2021, doi: http://doi.org/10.3390/electronics10131572.
A. Radford et al., “Robust speech recognition via large-scale weak supervision,” arXiv preprint, arXiv:2212.04356, 2022. [Online]. Available: https://arxiv.org/abs/2212.04356
L. J. M. Raboy and A. Taparugssanagorn, “Verse1-Chorus-Verse2 Structure: A Stacked Ensemble Approach for Enhanced Music Emotion Recognition,” Appl. Sci., vol. 14, no. 13, art. no. 5761, 2024, doi: http://doi.org/10.3390/app14135761.
G. Douzas, F. Bacao, and F. Last, “Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE,” Information Sciences, vol. 465, pp. 1–20, Oct. 2018, doi: http://doi.org/10.1016/j.ins.2018.06.056.
T.-Y. Lin et al., “Focal loss for dense object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, 2020, doi: http://doi.org/10.1109/TPAMI.2018.2858826.
Y. Meng, “S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification,” arXiv preprint, arXiv:2402.10139, 2024. [Online]. Available: https://arxiv.org/abs/2402.10139
Y. Liu, Q. Zhang, and W. Chen, “Music Genre Classification Using Ensemble Learning with Subcomponent-level Attention,” arXiv preprint, arXiv:2412.15602, 2024. [Online]. Available: https://arxiv.org/abs/2412.15602
Y. Meng, “Comparative Study of CNN, VGG16, and XGBoost for Music Genre Classification with Mel-spectrogram and MFCC Features,” arXiv preprint, arXiv:2401.04737, 2024. [Online]. Available: https://arxiv.org/abs/2401.04737
M. Chaudhury, A. Karami, and M. A. Ghazanfar, “Large-Scale Music Genre Analysis and Classification Using Machine Learning with Apache Spark,” Electronics, vol. 11, no. 16, art. no. 2567, 2022, doi: http://doi.org/10.3390/electronics11162567.
Owodeyi, “Dissecting the genre of Nigerian music with machine learning models,” J. King Saud Univ. – Comput. Inf. Sci., vol. 34, no. 8, pp. 6266–6279, 2022, doi: http://doi.org/10.1016/j.jksuci.2021.07.009.
Y. Li, L. Zhang, and Y. Zhao, “Comparison of CNN+GRU, CRNN, and Hybrid Models for Music Emotion Recognition with Data Augmentation,” Sensors, vol. 24, no. 7, p. 2201, 2024, doi: http://doi.org/10.3390/s24072201.
audEERING, “openSMILE 3.0: Audio Feature Extraction Toolkit,” 2023. [Online]. Available: https://www.audeering.com/research/opensmile/
Google Research, “Fine-tuning YAMNet for Instrument Detection: Practical Recommendations,” Data Science Stack Exchange, 2023. [Online]. Available: https://datascience.stackexchange.com/questions/129556/detection-of-musical-instruments-using-yamnet