SABER

Autores
Kolesnikova Olga
Sidorov Grigori
Abdullah

Título	Detection of Biased Phrases in the Wiki Neutrality Corpus for Fairer Digital Content Management Using Artificial Intelligence
Tipo	Revista
Sub-tipo	CONACYT
Descripción	Big Data and Cognitive Computing
Resumen	Detecting biased language in large-scale corpora, such as the Wiki Neutrality Corpus, is essential for promoting neutrality in digital content. This study systematically evaluates a range of machine learning (ML) and deep learning (DL) models for the detection of biased and pre-conditioned phrases. Conventional classifiers, including Extreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), are compared with advanced neural architectures such as Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory (LSTM) networks, and Generative Adversarial Networks (GANs). A novel hybrid architecture is proposed, integrating DistilBERT, LSTM, and GANs within a unified framework. Extensive experimentation with intermediate variants DistilBERT + LSTM (without GAN) and DistilBERT + GAN (without LSTM) demonstrates that the fully integrated model consistently outperforms all alternatives. The proposed hybrid model achieves a cross-validation accuracy of 99.00%, significantly surpassing traditional baselines such as XGBoost (96.73%) and LightGBM (96.83%). It also exhibits superior stability, statistical significance (paired t-tests), and favorable trade-offs between performance and computational efficiency. The results underscore the potential of hybrid deep learning models for capturing subtle linguistic bias and advancing more objective and reliable automated content moderation systems. © 2025 Elsevier B.V., All rights reserved.
Observaciones	DOI 10.3390/bdcc9070190
Lugar	Basel
País	Suiza
No. de páginas	Article number 190
Vol. / Cap.	v. 9 no. 7
Inicio	2025-07-21
Fin
ISBN/ISSN