Authors
Acevedo Sánchez Gerardo
Alarcón Paredes Antonio
Yáñez Márquez Cornelio
Title Effect of agriculture-related dataset complexity on classical machine learning and deep learning classifiers performance
Type Journal
Subtype JCR
Description Computers and Electronics in Agriculture
Abstract This study evaluates how five indicators of dataset complexity affect the performance of 24 machine learning (ML) and deep learning (DL) classifiers across eight publicly available agriculture-related datasets. The indicators were cardinality (320-13,611 instances), dimensionality (7-35 features), class imbalance (Imbalance Ratio [IR] = 1-109.9), number of classes (2-40 classes), and feature types (numeric and ordinal). Performance measures, including sensitivity, specificity, balanced accuracy (BA), precision, F1-score, and Matthews Correlation Coefficient (MCC), were derived from confusion matrices generated via a 10-fold cross-validation procedure. Macro and weighted averages were included as overall measures. Nonparametric tests (Friedman-Nemenyi, p < 0.05; Cliff's delta) were performed for weighted-average sensitivity and BA. Across 192 analyses, ensembles (GBM, XGBoost, RF) and C5.0 significantly outperformed other classifiers on 5 out of 8 datasets, achieving values greater than 0.91. Artificial Neural Networks (ANN) were ineffective on tabular data (BA < 0.50). Extreme imbalance (White Wine: IR = 109.9) degraded classifier performance, mainly for distance-based and probabilistic classifiers (MCC < 0.34); even the ensembles only partially mitigated the bias (BA < 0.65). High dimensionality (Date Fruits: 34 features) favored LDA and RF (BA >= 0.93). Conversely, on the large multiclass dataset (Soybean Cultivars: 40 classes), IBk achieved higher performance (BA = 0.87). Sixty paired comparisons confirmed significant differences (p < 0.00001) and strong effects (delta = -0.57 to 0.18) between ensembles and underperforming classifiers, confirming that dimensionality, IR, and the number of classes directly determine performance. To the best of our knowledge, this is the first large-scale comparison of 24 ML/DL classifiers on eight agricultural datasets.
Notes DOI 10.1016/j.compag.2025.110941
Place London
Country United Kingdom
No. of pages Article number 110941
Vol. / Chap. v. 239 part A
Start 2025-12-01
End
ISBN/ISSN