Recent development of machine learning-based methods for the prediction of defensin family and subfamily

Authors

  • Phasit Charoenkwan, Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200. ORCID: https://orcid.org/0000-0002-8161-6856
  • Nalini Schaduangrat, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700. ORCID: https://orcid.org/0000-0002-0842-8277
  • S. M. Hasan Mahmud, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700; Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka 1229, Bangladesh. ORCID: https://orcid.org/0000-0002-6828-3559
  • Orawit Thinnukool, Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200. ORCID: https://orcid.org/0000-0002-1664-0059
  • Watshara Shoombuatong, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700. Phone: +66 2 441 4371; Fax: +66 2 441 4380; E-mail: watshara.sho@mahidol.ac.th. ORCID: https://orcid.org/0000-0002-3394-8709

DOI:

https://doi.org/10.17179/excli2022-4913

Keywords:

defensins, sequence analysis, bioinformatics, classification, machine learning, feature selection

Abstract

Nearly all living species possess host defense peptides called defensins, which are crucial for innate immunity. These peptides act by activating the immune system, which kills microbes directly or indirectly and thereby protects the host. Numerous peptide-based drugs are currently being evaluated in preclinical and clinical trials. Although experimental methods can precisely identify the defensin peptide family and subfamily, they are often time-consuming and costly. In contrast, machine learning (ML) methods can make effective use of protein sequence information without knowledge of a protein’s three-dimensional structure, which makes them well suited to large-scale identification. To date, several ML methods have been developed for the in silico identification of the defensin peptide family and subfamily. A summary of the advantages and disadvantages of these existing methods is therefore needed to provide useful suggestions for developing and improving computational models for this task. With this goal in mind, we first provide a comprehensive survey of six state-of-the-art computational approaches for predicting the defensin peptide family and subfamily. We cover several important aspects, including dataset quality, feature encoding methods, feature selection schemes, ML algorithms, cross-validation methods and web server availability/usability. Moreover, we discuss the limitations of existing methods and future perspectives for improving prediction performance and model interpretability. The insights and suggestions from this review are anticipated to serve as valuable guidance for researchers developing more robust and useful predictors.

Author Biography

Watshara Shoombuatong, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700. Phone: +66 2 441 4371; Fax: +66 2 441 4380; E-mail: watshara.sho@mahidol.ac.th

I am highly motivated to design and develop cutting-edge computational algorithms, models and pipelines to address a range of challenging problems in drug discovery and development.

Published

2022-05-05

How to Cite

Charoenkwan, P., Schaduangrat, N., Mahmud, S. M. H., Thinnukool, O., & Shoombuatong, W. (2022). Recent development of machine learning-based methods for the prediction of defensin family and subfamily. EXCLI Journal, 21, 757–771. https://doi.org/10.17179/excli2022-4913

Section

Review articles
