Authors
Naseeb Amna
Zain Muhammad
Hussain Nisar
Qasim Amna
Sidorov Grigori
Gelbukh Alexander
Title Machine Learning- and Deep Learning-Based Multi-Model System for Hate Speech Detection on Facebook
Type Journal
Sub-type CONACYT
Description Algorithms
Abstract Hate speech is a complex topic that transcends language, culture, and even social spheres. Recently, the spread of hate speech on social media sites like Facebook has added a new layer of complexity to the issue of online safety and content moderation. This study seeks to minimize this problem by developing an Arabic script-based tool for automatically detecting hate speech in Roman Urdu, an informal script used most commonly for South Asian digital communications. Roman Urdu is relatively complex as there are no standardized spellings, leading to syntactic variations, which increases the difficulty of hate speech detection. To tackle this problem, we adopt a holistic strategy using a combination of six machine learning (ML) and four Deep Learning (DL) models, a dataset from Facebook comments, which was preprocessed (tokenization, stopwords removal, etc.), and text vectorization (TF-IDF, word embeddings). The ML algorithms used in this study are LR, SVM, RF, NB, KNN, and GBM. We also use deep learning architectures like CNN, RNN, LSTM, and GRU to increase the accuracy of the classification further. It is proven by the experimental results that deep learning models outperform the traditional ML approaches by a significant margin, with CNN and LSTM achieving accuracies of 95.1% and 96.2%, respectively. As far as we are aware, this is the first work that investigates QLoRA for fine-tuning large models for the task of offensive language detection in Roman Urdu. © 2025 by the authors.
Notes DOI 10.3390/a18060331
Place Basel
Country Switzerland
No. of pages Article number 331
Vol. / Ch. v. 18 no. 6
Start 2025-06-01
End
ISBN/ISSN