Resumen |
Background: The notion of hope is characterized as an optimistic expectation or anticipation of favorable outcomes. In the age of extensive social media usage, research has primarily focused on monolingual techniques, and the Urdu and Arabic languages have not been addressed. Purpose: This study addresses joint multilingual hope speech detection in the Urdu, English, and Arabic languages using a transfer learning paradigm. We developed a new multilingual dataset named Posi-Vox-2024 and employed a joint multilingual technique to design a universal classifier for multilingual dataset. We explored the fine-tuned BERT model, which demonstrated a remarkable performance in capturing semantic and contextual information. Method: The framework includes (1) preprocessing, (2) data representation using BERT, (3) fine-tuning, and (4) classification of hope speech into binary (‘hope’ and ‘not hope’) and multiclass (realistic, unrealistic, and generalized hope) categories. Results: Our proposed model (BERT) demonstrated benchmark performance to our dataset, achieving 0.78 accuracy in binary classification and 0.66 in multi-class classification, with a 0.04 and 0.08 performance improvement over the baselines (Logistic Regression, in binary class 0.75 and multi class 0.61), respectively. Conclusion: Our findings will be applied to improve automated systems for detecting and promoting supportive content in English, Arabic and Urdu on social media platforms, fostering positive online discourse. This work sets new benchmarks for multilingual hope speech detection, advancing existing knowledge and enabling future research in underrepresented languages. © 2024 National Research University, Higher School of Econoimics. All rights reserved. |