Multi-Modal Emotion Recognition Feature Extraction and Data Fusion Methods Evaluation
Abstract
Research into emotion detection is important because a wide range of fields can benefit from it, including healthcare, intelligent customer service, and education. Compared with unimodal approaches, multimodal emotion recognition (MER) integrates several modalities, such as text, facial expressions, and voice, to achieve better accuracy and robustness. This article surveys MER from its early development to the present day, focusing on its relevance, challenges, and approaches. We examine several datasets, including IEMOCAP and MELD, and compare their features and shortcomings. The literature review covers recent developments in deep learning approaches, particularly fusion strategies such as early, late, and hybrid fusion. Among the shortcomings identified are data redundancy, complex feature extraction, and the difficulty of real-time detection. Our proposed technique improves emotion recognition accuracy by using deep learning for feature extraction combined with a hybrid fusion approach. By addressing existing limitations, this study aims to guide future research and advance the field of MER. The main body of this work examines various data fusion strategies, reviews new methodologies in multimodal emotion recognition, and identifies open problems and research needs.
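To make the distinction between early, late, and hybrid fusion concrete, the following Python sketch shows where each strategy combines the modalities. It is an illustration under stated assumptions, not the authors' proposed model: the per-modality feature vectors are random toy data, `make_classifier` is a hypothetical stand-in for a trained network (a fixed random linear layer followed by softmax), and the four-class label set, the embedding sizes, the equal late-fusion weights, and the hybrid mixing weight `alpha` are all assumptions chosen for the example.

```python
import math
import random
from typing import Callable, Dict, List, Optional

Vector = List[float]
N_CLASSES = 4  # assumption: four emotion classes (e.g. angry, happy, neutral, sad)


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def make_classifier(dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a trained model: fixed random linear layer + softmax."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, dim ** -0.5) for _ in range(N_CLASSES)] for _ in range(dim)]

    def classify(x: Vector) -> Vector:
        logits = [sum(x[i] * w[i][c] for i in range(dim)) for c in range(N_CLASSES)]
        return softmax(logits)

    return classify


def early_fusion(feats: Dict[str, Vector], joint_clf) -> Vector:
    """Feature-level fusion: concatenate modality features in a fixed order, classify once."""
    joint = [v for m in sorted(feats) for v in feats[m]]
    return joint_clf(joint)


def late_fusion(feats: Dict[str, Vector], clfs, weights: Optional[Dict[str, float]] = None) -> Vector:
    """Decision-level fusion: classify each modality separately, then weight-average the probabilities."""
    mods = sorted(feats)
    if weights is None:
        weights = {m: 1.0 for m in mods}
    total = sum(weights[m] for m in mods)
    probs = {m: clfs[m](feats[m]) for m in mods}
    return [sum(weights[m] * probs[m][c] for m in mods) / total for c in range(N_CLASSES)]


def hybrid_fusion(feats: Dict[str, Vector], joint_clf, clfs, alpha: float = 0.5) -> Vector:
    """Hybrid fusion: blend the early-fusion and late-fusion outputs with weight alpha."""
    early = early_fusion(feats, joint_clf)
    late = late_fusion(feats, clfs)
    return [alpha * e + (1 - alpha) * l for e, l in zip(early, late)]


# Toy usage with random "embeddings" for one utterance (sizes are assumptions).
rng = random.Random(0)
dims = {"audio": 64, "face": 128, "text": 256}
feats = {m: [rng.gauss(0.0, 1.0) for _ in range(d)] for m, d in dims.items()}
clfs = {m: make_classifier(d, seed=i) for i, (m, d) in enumerate(sorted(dims.items()))}
joint_clf = make_classifier(sum(dims.values()), seed=99)

for name, p in [("early", early_fusion(feats, joint_clf)),
                ("late", late_fusion(feats, clfs)),
                ("hybrid", hybrid_fusion(feats, joint_clf, clfs))]:
    print(f"{name:>6}: {[round(v, 3) for v in p]} -> class {p.index(max(p))}")
```

In this sketch, early fusion lets a single model see cross-modal feature interactions but needs every modality at once, while late fusion keeps the modalities independent, so a missing modality can be handled by setting its weight to zero. The hybrid variant here simply blends the two outputs with a fixed `alpha`; in practice the combination could also be learned rather than fixed.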
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.