Abstract
This study evaluates how five indicators of dataset complexity affect the performance of 24 machine learning (ML) and deep learning (DL) classifiers across eight publicly available agriculture-related datasets. The indicators were cardinality (320-13,611 instances), dimensionality (7-35 features), class imbalance (Imbalance Ratio [IR] = 1-109.9), number of classes (2-40), and feature types (numeric and ordinal). Performance measures, including sensitivity, specificity, balanced accuracy (BA), precision, F1-score, and the Matthews Correlation Coefficient (MCC), were derived from confusion matrices generated via a 10-fold cross-validation procedure. Macro- and weighted-averages were included as overall measures. Nonparametric tests (Friedman-Nemenyi, p < 0.05; Cliff's delta) were performed on weighted-average sensitivity and BA. Across 192 analyses, the ensembles (GBM, XGBoost, RF) and C5.0 significantly outperformed the other classifiers on 5 out of 8 datasets, achieving values greater than 0.91. Artificial Neural Networks (ANNs) proved ineffective for tabular data (BA < 0.50). Extreme imbalance (White Wine: IR = 109.9) degraded classifier performance, mainly for distance-based and probabilistic classifiers (MCC < 0.34), and even the ensembles only partially mitigated the bias (BA < 0.65). High dimensionality (Date Fruits: 34 features) favored LDA and RF (BA >= 0.93). Conversely, the largest multiclass problem (Soybean Cultivars: 40 classes) yielded the best performance for IBk (BA = 0.87). Sixty paired comparisons confirmed significant differences (p < 0.00001) and strong effects (delta = -0.57 to 0.18) between the ensembles and the underperforming classifiers, confirming that dimensionality, IR, and the number of classes directly determine classifier performance. To the best of our knowledge, this is the first large-scale comparison of 24 ML/DL classifiers on eight agricultural datasets.
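For reference, the per-class measures cited above follow the standard confusion-matrix definitions; a minimal sketch is given below for the binary case, with true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as assumed notation, and with macro- and weighted-averages obtained by aggregating these per-class values across classes.

\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{BA} = \frac{\text{Sensitivity} + \text{Specificity}}{2}
\]
\[
\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\]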