Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins
Keywords: thermophilic protein, bioinformatics, classification, machine learning, feature representation, feature selection
Thermophilic proteins (TPPs) are important in basic research and in the food industry because they maintain a thermodynamically stable fold at extremely high temperatures. The rapid identification of novel TPPs from protein sequences using computational models is therefore highly desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, have been developed for in silico prediction of TPPs. It is therefore worthwhile to revisit these methods and summarize their advantages and disadvantages, so that new computational approaches can achieve more accurate prediction of TPPs. With this goal in mind, we comprehensively investigate fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review on the development of ML-based methods for in silico prediction of TPPs. These TPP predictors can be classified into two groups according to the interpretability of the ML algorithms employed (i.e., computational black-box methods and computational white-box methods). We also conducted a comparative study of several currently available TPP predictors on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will help researchers select the TPP predictor best suited to their purposes and provide useful perspectives for the development of more effective and accurate TPP predictors.
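To make the notion of a "feature encoding scheme" concrete, the sketch below computes amino acid composition (AAC), a widely used sequence-derived encoding that maps a protein sequence to a 20-dimensional vector of residue frequencies. This is an illustrative assumption only: the abstract does not state which encodings the reviewed predictors use, and the function name `amino_acid_composition` and the toy sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional amino acid composition of a protein sequence.

    Each entry is the fraction of residues equal to that amino acid, in the
    order given by AMINO_ACIDS. Non-standard symbols (e.g. X, B, Z) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any benchmark dataset.
    features = amino_acid_composition("MKVLAAGIVK")
    for aa, value in zip(AMINO_ACIDS, features):
        if value > 0:
            print(f"{aa}: {value:.2f}")
```

A vector of this kind could then be passed to any standard classifier; the choice of classifier, and whether it is a black-box or white-box method, is a separate design decision.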
Copyright (c) 2022 Phasit Charoenkwan, Nalini Schaduangrat, Md Mehedi Hasan, Mohammad Ali Moni, Pietro Lió, Watshara Shoombuatong
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish in this journal agree to the following terms:
- The authors keep the copyright and grant the journal the right of first publication under the terms of the Creative Commons Attribution license, CC BY 4.0. This license permits unrestricted use, distribution and reproduction in any medium, provided that the original work is properly cited.
- The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.
- Because the advice and information in this journal are believed to be true and accurate at the time of publication, neither the authors, the editors, nor the publisher accept any legal responsibility for any errors or omissions presented in the publication. The publisher makes no guarantee, express or implied, with respect to the material contained herein.
- The authors can enter into additional contracts for the non-exclusive distribution of the journal's published version by citing the initial publication in this journal (e.g. publishing in an institutional repository or in a book).