Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins

Authors

  • Phasit Charoenkwan Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200 https://orcid.org/0000-0002-8161-6856
  • Nalini Schaduangrat Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700 https://orcid.org/0000-0002-0842-8277
  • Md Mehedi Hasan Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA https://orcid.org/0000-0003-4952-0739
  • Mohammad Ali Moni School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, the University of Queensland, St Lucia, QLD 4072, Australia https://orcid.org/0000-0003-0756-1006
  • Pietro Lió Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK https://orcid.org/0000-0002-0540-5053
  • Watshara Shoombuatong Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700. Phone: +66 2 441 4371; Fax: +66 2 441 4380; E-mail: watshara.sho@mahidol.ac.th https://orcid.org/0000-0002-3394-8709

DOI:

https://doi.org/10.17179/excli2022-4723

Keywords:

thermophilic protein, bioinformatics, classification, machine learning, feature representation, feature selection

Abstract

Thermophilic proteins (TPPs) are critical for basic research and the food industry because they maintain a thermodynamically stable fold at extremely high temperatures. The rapid identification of novel TPPs from protein sequences using computational models is therefore highly desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, have been developed for the in silico prediction of TPPs. It is therefore worth revisiting these methods and summarizing their advantages and disadvantages to guide the development of new computational approaches that achieve more accurate TPP prediction. With this goal in mind, we comprehensively investigate fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review of the development of ML-based methods for in silico prediction of TPPs. These TPP predictors can be classified into two groups according to the interpretability of the ML algorithms they employ: computational black-box methods and computational white-box methods. We also conducted a comparative analysis of several currently available TPP predictors on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will help researchers select the TPP predictor most suitable for their purposes and will provide useful perspectives for the development of more effective and accurate TPP predictors.
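ML classifiers need a fixed-length numeric input, so each variable-length protein sequence is first converted by a feature encoding scheme. As a hedged illustration only, the sketch below computes amino acid composition (AAC), one simple sequence-based encoding: the fraction of each of the 20 standard residues in the sequence. It is not taken from any specific predictor discussed in the review, and the function name `aac_features` and the choice to ignore non-standard letters are assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every sequence maps to
# feature vectors whose positions mean the same thing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(sequence: str) -> list[float]:
    """Hypothetical helper: amino acid composition of a protein sequence.

    Returns a 20-dimensional vector of residue fractions. Non-standard
    letters (e.g. X, B, Z) are skipped and excluded from the total
    (an assumption of this sketch).
    """
    seq = sequence.upper()
    # Count only standard residues.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or entirely non-standard input: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vec = aac_features("MKTAYIAKQR")
    print(len(vec), round(sum(vec), 6))
```

For any sequence containing at least one standard residue, the resulting vector has 20 entries that sum to 1, so it can be passed directly to a standard classifier; richer encodings (e.g. dipeptide composition or physicochemical properties) follow the same sequence-to-vector pattern.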

Author Biography

Watshara Shoombuatong, Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700. Phone: +66 2 441 4371; Fax: +66 2 441 4380; E-mail: watshara.sho@mahidol.ac.th

I am highly motivated to design and develop cutting-edge computational algorithms, models and pipelines to address a range of challenging problems in drug discovery and development.

Published

2022-03-02

How to Cite

Charoenkwan, P., Schaduangrat, N., Hasan, M. M., Moni, M. A., Lió, P., & Shoombuatong, W. (2022). Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI Journal, 21, 554–570. https://doi.org/10.17179/excli2022-4723

Section

Review articles
