SABER

Authors
Hafeez Momina
Hussain Nisar
Qasim Amna
Zain Muhammad
Mehak Gull
Kolesnikova Olga
Sidorov Grigori
Gelbukh Alexander

Title	Sarcasm Detection in Roman Urdu Text: A Comprehensive Study Using Machine Learning and Large Language Model
Type	Conference
Subtype	Proceedings
Description	24th Mexican International Conference on Artificial Intelligence, MICAI 2025
Abstract	Sarcasm detection is important in sentiment analysis and social media analysis because the literal meaning of a sarcastic statement does not agree with its intended sentiment. This paper focuses on sarcasm detection in Roman Urdu. The dataset, originally introduced for Urdu sarcasm detection, was transliterated into Roman Urdu using systematic phonetic mapping. We evaluated twelve popular machine learning models with GloVe and Word2Vec embeddings and fine-tuned several large language models (LLMs), including LLaMA 2 (7B), LLaMA 3 (8B), and Mistral. Experimental results show that Gradient Boosting and Support Vector Machines performed best among the machine learning models, with F1-scores of 96.62% and 95.32%, respectively, using GloVe-based embeddings. Among the LLMs, LLaMA 3 achieved the best result of all classifiers, with an F1-score of 97.15%, and ranked first among the nine evaluated models, followed by Mistral (96.32%) and LLaMA 2 (95.43%). This work highlights the potential of advanced sarcasm detection for Roman Urdu and compares the performance of traditional machine learning techniques with state-of-the-art LLMs. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
Notes	DOI 10.1007/978-3-032-09037-9_20; Lecture Notes in Computer Science, vol. 16221 LNCS
Location	Guanajuato
Country	Mexico
Pages	245-254
Vol. / Ch.
Start	2025-11-03
End
ISBN/ISSN	9783032090362