Authors
Hafeez Momina
Hussain Nisar
Qasim Amna
Zain Muhammad
Mehak Gull
Kolesnikova Olga
Sidorov Grigori
Gelbukh Alexander
Title Sarcasm Detection in Roman Urdu Text: A Comprehensive Study Using Machine Learning and Large Language Model
Type Conference
Subtype Proceedings
Description 24th Mexican International Conference on Artificial Intelligence, MICAI 2025
Abstract Sarcasm detection is important in sentiment analysis and social media analysis, because the literal meaning of a sarcastic text does not agree with its intended sentiment. This paper addresses sarcasm detection in Roman Urdu. The dataset was originally introduced for Urdu sarcasm detection and was transliterated into Roman Urdu using a systematic phonetic mapping. We evaluated twelve popular machine-learning models with GloVe and Word2Vec embeddings and fine-tuned several large language models (LLMs) such as LLaMA 2 (7B), LLaMA 3 (8B), and Mistral. Experimental results show that Gradient Boosting and Support Vector Machines were the best-performing machine-learning models, with F1-scores of 96.62% and 95.32% respectively, using GloVe-based embeddings. Among the LLMs, LLaMA 3 achieved the best result of all classifiers, and the best among all nine evaluated models, with an F1-score of 97.15%, followed by Mistral with 96.32% and LLaMA 2 with 95.43%. This work highlights the potential of Roman Urdu for advanced sarcasm detection and compares the performance of traditional machine-learning techniques with that of state-of-the-art LLMs. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
Notes DOI 10.1007/978-3-032-09037-9_20 Lecture Notes in Computer Science, v. 16221 LNCS
Place Guanajuato
Country Mexico
Pages 245-254
Vol. / Ch.
Start 2025-11-03
End
ISBN/ISSN 9783032090362