Recent development of machine learning-based methods for the prediction of defensin family and subfamily
Keywords: defensins, sequence analysis, bioinformatics, classification, machine learning, feature selection
Nearly all living species possess host defense peptides called defensins, which are crucial for innate immunity. These peptides work by activating the immune system, which kills microbes directly or indirectly and thereby protects the host. Numerous peptide-based drugs are currently being evaluated in preclinical and clinical trials. Although experimental methods can precisely identify the defensin peptide family and subfamily, these approaches are often time-consuming and costly. In contrast, machine learning (ML) methods can effectively exploit protein sequence information without knowledge of a protein’s three-dimensional structure, which makes them well suited to large-scale identification. To date, several ML methods have been developed for the in silico identification of the defensin peptide family and subfamily. A summary of the advantages and disadvantages of the existing methods is therefore needed to guide the development and improvement of new computational models for this task. With this goal in mind, we first provide a comprehensive survey of six state-of-the-art computational approaches for predicting the defensin peptide family and subfamily. We cover several important aspects, including dataset quality, feature encoding methods, feature selection schemes, ML algorithms, cross-validation methods and web server availability/usability. Moreover, we discuss the limitations of existing methods and future perspectives for improving prediction performance and model interpretability. The insights and suggestions from this review are anticipated to serve as valuable guidance for researchers developing more robust and useful predictors.
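To make the notion of sequence-based feature encoding concrete, the following is a minimal sketch of amino acid composition (AAC), a widely used descriptor that maps a peptide of any length to a fixed 20-dimensional vector without requiring structural information. The abstract does not state which encodings the six surveyed methods use, so AAC is chosen here purely for illustration; the function name `aac_encode` and the toy peptide are hypothetical.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aac_encode(sequence: str) -> List[float]:
    """Return the 20-dimensional amino acid composition of a peptide.

    Each entry is the fraction of standard residues in the sequence that
    match the corresponding letter of AMINO_ACIDS. Non-standard characters
    are ignored (an assumption of this sketch, not a rule from the source).
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy peptide for illustration only, not a real defensin sequence.
features = aac_encode("ACCKRC")
```

A vector like `features` could then be passed to any standard classifier and evaluated with cross-validation; which classifier and validation scheme suit the task best is the kind of question the survey addresses.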
Copyright (c) 2022 Phasit Charoenkwan, Nalini Schaduangrat, S. M. Hasan Mahmud, Orawit Thinnukool, Watshara Shoombuatong
This work is licensed under a Creative Commons Attribution 4.0 International License.