Abstract
We present the results of applying several machine learning algorithms to predict hot-spot and hot-region residues at the protein-protein interfaces between the polypeptide chains of protein complexes. The dataset consisted of twenty-nine bone morphogenetic proteins (BMPs) obtained from the Protein Data Bank (PDB). The training features were selected from biochemical and biophysical properties of each interface amino acid: B-factor, hydrophobicity index, prevalence score, accessible surface area (ASA), conservation score, and ground-state energy computed with density functional theory (DFT). We also applied parallel CPU/GPU hardware acceleration during preprocessing to reduce the execution time of the ASA and DFT calculations. We evaluated the classifiers with several performance metrics. The random forest classifier performed best, reaching an average of 90% in both the true positive rate and the true negative rate.

© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
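The reported figure combines two per-class metrics. The sketch below shows how they are defined. The true positive rate (TPR) is the fraction of actual hot-spot residues predicted as hot spots. The true negative rate (TNR) is the fraction of non-hot-spot residues predicted as non-hot spots. It is a minimal illustration, not the authors' evaluation code. The label encoding (1 = hot spot, 0 = not) and the example residues are hypothetical.

```python
def tpr_tnr(y_true, y_pred):
    """Return (TPR, TNR) for binary residue labels.

    Assumed (hypothetical) encoding: 1 = hot-spot residue, 0 = non-hot-spot residue.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hot spots found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hot spots missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-hot spots rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tpr = tp / (tp + fn)  # sensitivity / recall of the positive class
    tnr = tn / (tn + fp)  # specificity
    return tpr, tnr


# Hypothetical interface: 10 hot spots and 10 non-hot spots,
# with one error in each class.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 9 + [1]
print(tpr_tnr(y_true, y_pred))  # (0.9, 0.9)
```

Both rates are reported because hot spots are usually a small minority of interface residues. Reporting plain accuracy alone could hide a classifier that rarely predicts the positive class.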