Abstract
Detecting offensive language is a challenging task in online content moderation. In this work, we compare four general-purpose, easily accessible LLMs (LLaMA 2 7B, LLaMA 3 8B, Mistral 8B, and GPT-4o mini 8B) on a Roman Urdu-English code-mixed dataset for offensive language classification. We fine-tune all models with QLoRA (Quantized Low-Rank Adaptation), a framework that enables memory-efficient optimization by combining 4-bit quantization of the base model with trainable low-rank adapters. Experimental results show that LLaMA 3 8B performs best with an F1-score of 96.78, followed by GPT-4o mini with 94.34, Mistral with 92.16, and LLaMA 2 7B with 90.02. Our findings demonstrate the benefits of QLoRA for adapting modern LLMs to low-resource multilingual tasks, offering a path toward scalable, lightweight toxic content detection systems.
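As a concrete illustration of the QLoRA fine-tuning setup described above, the following is a minimal sketch using the Hugging Face transformers, peft, and bitsandbytes libraries. The checkpoint id, LoRA rank, target modules, and binary label set are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal QLoRA sketch: 4-bit quantized base model + low-rank adapters.
# All hyperparameters below are assumptions for illustration only.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          BitsAndBytesConfig)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint id

# 4-bit NF4 quantization of the frozen base weights: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,  # offensive vs. non-offensive (assumed binary task)
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; only these are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a small fraction of total weights
```

In this setup only the adapter weights are updated while the quantized base model stays frozen, which is what makes fine-tuning 7B-8B models feasible in limited GPU memory.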