Authors
Mahmood Ahmad
Torres Ruiz Miguel Jesús
Ahmad Zainab
Quintero Téllez Rolando
Title An Efficient Approach for Code-Mixed Emotion Classification Applying Machine Learning
Type Journal
Subtype JCR
Description IEEE Access
Abstract Emotion classification aims to find and extract all possible emotions from a piece of text that best represent the author’s state of mind. The task of emotion classification is still considered challenging for under-resourced languages, especially in the case of code-mixing, which is not a standardized language on its own. The widespread use of social media has led to the emergence of code-mixed language, which later gained attention from researchers due to its extensive usage. Emotion classification is an important problem with a range of applications, from healthcare and e-learning to social media. While some work has been done on code-mixed emotion classification, very few studies have focused on code-mixed emotion classification for English and Roman Urdu. Previously, researchers attempted to solve the problem of code-mixed multi-label emotion classification using code-mixed English and Roman Urdu, but the results were relatively low (e.g., Micro F1 = 0.67), indicating that there is still a need for improvement in this area. In this study, we mainly aim to solve two complex tasks: i) code-mixed multi-label emotion classification and ii) code-mixed multi-class emotion classification. Our contribution lies in utilizing classical machine learning methods with three distinct multi-label and multi-class classification approaches: i) One-Versus-Rest (OvR), ii) Label Powerset (LP), and iii) Binary Relevance (BR), along with two distinct feature extraction techniques. First, we employ content-based methods using TF-IDF at the word unigram level and experiment with various feature sets ranging from 500 to 3000 features. Second, we use context-based methods by leveraging SBERT-based models for embeddings to capture semantic meanings. Finally, we apply a state-of-the-art Generative AI-based approach, utilizing a quantized version of LLaMa, which is fine-tuned for evaluation.
We conducted over 2,000 experiments, and the best results were obtained using classical machine learning (Micro F1 = 0.9142 for multi-label classification and Micro F1 = 0.9238 for multi-class classification) with the combination of the Binary Relevance approach in a context-based setting for both tasks, which indicates that Binary Relevance is an optimized approach for breaking complex multi-label, multi-class tasks into easier ones, especially when the language is difficult enough on its own.
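As a rough illustration of the problem-transformation ideas named in the abstract, the sketch below shows how Binary Relevance and Label Powerset reshape multi-label targets, and how a Micro F1 score pools counts across labels. It is a minimal pure-Python sketch, not the authors' implementation: the label names, the toy samples, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: each sample carries a set of emotion labels.
LABELS = ["anger", "joy", "sadness"]
samples = [
    {"joy"},
    {"anger", "sadness"},
    {"joy"},
    set(),
]

def binary_relevance_targets(label_sets, labels):
    """Binary Relevance: one independent 0/1 target vector per label."""
    return {lab: [int(lab in s) for s in label_sets] for lab in labels}

def label_powerset_targets(label_sets):
    """Label Powerset: each distinct label combination becomes one class."""
    return [tuple(sorted(s)) for s in label_sets]

def micro_f1(true_sets, pred_sets):
    """Micro F1: pool TP/FP/FN over all samples and labels, then compute F1."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

br = binary_relevance_targets(samples, LABELS)  # three binary problems
lp = label_powerset_targets(samples)            # one multi-class problem
```

Under Binary Relevance, each entry of `br` would be handed to its own binary classifier, whereas under Label Powerset a single multi-class classifier would predict the tuples in `lp`. Any feature representation (for example TF-IDF vectors or sentence embeddings) could sit in front of either transformation.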
Notes DOI 10.1109/ACCESS.2025.3598754
Place Piscataway, NJ
Country United States
Pages 166973-166986
Vol. / Ch. v. 13
Start 2025-08-13
End
ISBN/ISSN