<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">EXCLI J</journal-id>
      <journal-title>EXCLI Journal</journal-title>
      <issn pub-type="epub">1611-2156</issn>
      <publisher>
        <publisher-name>Leibniz Research Centre for Working Environment and Human Factors</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">2023-6410</article-id>
      <article-id pub-id-type="doi">10.17179/excli2023-6410</article-id>
      <article-id pub-id-type="pii">Doc915</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Review article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Empirical comparison and analysis of machine learning-based approaches for druggable protein identification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Shoombuatong</surname>
            <given-names>Watshara</given-names>
          </name>
          <xref ref-type="corresp" rid="COR1">&#x0002a;</xref>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Schaduangrat</surname>
            <given-names>Nalini</given-names>
          </name>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Nikom</surname>
            <given-names>Jaru</given-names>
          </name>
          <xref ref-type="aff" rid="A2">2</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700</aff>
      <aff id="A2">
        <label>2</label>Research Methodology and Data Analytics Program, Faculty of Science &#x26; Technology, Prince of Songkla University, Pattani, Thailand, 94000</aff>
      <author-notes>
        <corresp id="COR1">*To whom correspondence should be addressed: Watshara Shoombuatong, Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700; Phone: +66 2 441 4371, Fax: +66 2 441 4380, E-mail: <email>watshara.sho@mahidol.ac.th</email></corresp>
      </author-notes>
      <pub-date pub-type="epub">
        <day>29</day>
        <month>08</month>
        <year>2023</year>
      </pub-date>
      <pub-date pub-type="collection">
        <year>2023</year>
      </pub-date>
      <volume>22</volume>
      <fpage>915</fpage>
      <lpage>927</lpage>
      <history>
        <date date-type="received">
          <day>27</day>
          <month>07</month>
          <year>2023</year>
        </date>
        <date date-type="accepted">
          <day>15</day>
          <month>08</month>
          <year>2023</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Copyright &#xA9; 2023 Shoombuatong et al.</copyright-statement>
        <copyright-year>2023</copyright-year>
        <license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
          <p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/) You are free to copy, distribute and transmit the work, provided the original author and source are credited.</p>
        </license>
      </permissions>
      <self-uri xlink:href="https://www.excli.de/vol22/excli2023-6410.pdf">This article is available from https://www.excli.de/vol22/excli2023-6410.pdf</self-uri>
      <abstract><p>Efficiently and precisely identifying drug targets is crucial for developing and discovering potential medications. While conventional experimental approaches can accurately pinpoint these targets, they are time-consuming and not easily adaptable to high-throughput processes. On the other hand, computational approaches, particularly those utilizing machine learning (ML), offer an efficient means to accelerate the prediction of druggable proteins based solely on their primary sequences. Recently, several state-of-the-art computational methods have been developed for predicting and analyzing druggable proteins. These computational methods differ widely in terms of benchmark datasets, feature extraction schemes, ML algorithms, evaluation strategies and webserver&#x2F;software usability. Thus, our objective is to reexamine these computational approaches and conduct a comprehensive assessment of their strengths and weaknesses across multiple aspects. In this study, we deliver the first comprehensive survey of the state-of-the-art computational approaches for <italic>in silico</italic> prediction of druggable proteins. First, we provide information regarding the existing benchmark datasets and the types of ML methods employed. Second, we investigate the effectiveness of these computational methods in druggable protein identification for each benchmark dataset. Third, we summarize the important features used in this field and the existing webserver&#x2F;software. Finally, we address the present constraints of the existing methods and offer valuable guidance to the scientific community in designing and developing novel prediction models. We anticipate that this comprehensive review will provide crucial information for the development of more accurate and efficient druggable protein predictors.</p></abstract>
      <kwd-group>
        <kwd>druggable proteins</kwd>
        <kwd>sequence analysis</kwd>
        <kwd>bioinformatics</kwd>
        <kwd>machine learning</kwd>
        <kwd>deep learning</kwd>
        <kwd>ensemble learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>Introduction</title><p>Druggable proteins belong to large protein families identified as suitable drug targets. These proteins exhibit the ability to bind with high affinity to small drug-like molecules, leading to desirable therapeutic effects (Liu and Altman, 2014[<xref ref-type="bibr" rid="R25">25</xref>]; Owens, 2007[<xref ref-type="bibr" rid="R32">32</xref>]). Approximately 60 &#x25; of projects in the drug discovery domain lead to failure due to the target being considered undruggable (Sakharkar et al., 2007[<xref ref-type="bibr" rid="R35">35</xref>]). Therefore, the advancement in a drug discovery project, where the precise identification of drug targets is essential, depends on the druggability of a protein (Overington et al., 2006[<xref ref-type="bibr" rid="R31">31</xref>]). Analyzing the three-dimensional structure of a protein through experimental methods leads to a lengthy development cycle (Sakharkar et al., 2007[<xref ref-type="bibr" rid="R35">35</xref>]). Although traditional experimental approaches are capable of accurately identifying drug targets, they are labor-intensive and not easily adaptable for high-throughput applications. Computational approaches that rely solely on the primary sequences of proteins can serve as a valuable supplement to experimental methods, enabling swift characterization and prediction of druggable proteins. The continuous discovery of novel proteins through next-generation sequencing opens up vast opportunities to identify potential druggable candidates that remain unexplored. Therefore, the accurate and rapid identification of druggable proteins from an extensive pool of sequenced proteins is of utmost importance in the quest for developing new drugs (Lindsay, 2005[<xref ref-type="bibr" rid="R23">23</xref>]). 
</p><p>Over the last few decades, numerous attempts have been made to develop data-driven machine learning (ML)-based computational approaches to further the identification and characterization of a variety of potential proteins and peptides in tandem with the experimental techniques (Charoenkwan et al., 2023[<xref ref-type="bibr" rid="R5">5</xref>][<xref ref-type="bibr" rid="R8">8</xref>]; Hasan et al., 2021[<xref ref-type="bibr" rid="R14">14</xref>]; Qiang et al., 2020[<xref ref-type="bibr" rid="R33">33</xref>]; Rao et al., 2018[<xref ref-type="bibr" rid="R34">34</xref>]; Wang et al., 2019[<xref ref-type="bibr" rid="R41">41</xref>]; Wei et al., 2018[<xref ref-type="bibr" rid="R44">44</xref>]; Xie et al., 2021[<xref ref-type="bibr" rid="R46">46</xref>]). In this field, there are ten existing state-of-the-art computational approaches, including DrugMiner (Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]), Sun&#x27;s method (Sun et al., 2018[<xref ref-type="bibr" rid="R38">38</xref>]), GA-Bagging-SVM (Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]), DrugHybrid&#x5F;BS (Gong et al., 2021[<xref ref-type="bibr" rid="R13">13</xref>]), XGB-DrugPred (Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]), Iraji&#x27;s method (Iraji et al., 2022[<xref ref-type="bibr" rid="R16">16</xref>]), Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]), SPIDER (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]), QuoteTarget (Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]), and DrugFinder (Zhang et al., 2023[<xref ref-type="bibr" rid="R48">48</xref>]). Table 1<xref ref-type="fig" rid="T1">(Tab. 
1)</xref> (References in Table 1: Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]; Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]; Gong et al., 2021[<xref ref-type="bibr" rid="R13">13</xref>]; Iraji et al., 2022[<xref ref-type="bibr" rid="R16">16</xref>]; Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]; Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]; Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]; Sun et al., 2018[<xref ref-type="bibr" rid="R38">38</xref>]; Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]; Zhang et al., 2023[<xref ref-type="bibr" rid="R48">48</xref>]) summarizes these ten existing predictors in terms of benchmark datasets, feature extraction schemes, ML strategies, evaluation methods, and webserver availability. Furthermore, the timelines of the existing computational approaches and webserver&#x2F;software availability are summarized in Figure 1<xref ref-type="fig" rid="F1">(Fig. 1)</xref>. </p><p>In this article, we deliver the first comprehensive survey of the existing state-of-the-art predictors. Specifically, we cover several important aspects, including benchmark datasets, feature extraction schemes, ML strategies, evaluation methods, and webserver availability. First, we summarized all benchmark datasets and the three types of ML methods used for the construction and evaluation of the existing state-of-the-art approaches. Second, we investigated the effectiveness of these computational approaches for each benchmark dataset, considering both cross-validation and independent tests. Third, we provided a summary of the important features used in this field and the availability of existing webserver&#x2F;software. 
Finally, we discussed the current limitations of the existing methods and provided useful guidance to researchers who are interested in developing a more accurate and robust approach in future studies.</p></sec>
    <sec sec-type="materials|methods">
      <title>Materials and Methods</title><sec><title>Overall framework of druggable protein identification using machine learning methods</title><p>The ML framework of druggable protein identification is summarized in Figure 2<xref ref-type="fig" rid="F2">(Fig. 2)</xref>. As can be seen, there are five main stages (Charoenkwan et al., 2021[<xref ref-type="bibr" rid="R3">3</xref>], 2022[<xref ref-type="bibr" rid="R7">7</xref>]; Hongjaisee et al., 2019[<xref ref-type="bibr" rid="R15">15</xref>]). The first stage is to prepare the benchmark training and independent test datasets. The training datasets are used for model training and optimization, while the independent test datasets are used for validating the generalizability and reliability of the models. The second stage is to represent protein sequences as fixed-length feature vectors (Qiang et al., 2020[<xref ref-type="bibr" rid="R33">33</xref>]; Wei et al., 2018[<xref ref-type="bibr" rid="R44">44</xref>]). The third stage involves training and optimization of the prediction model based on several ML frameworks. In the fourth stage, the trained prediction models are evaluated using well-known performance evaluation strategies, such as k-fold cross-validation and independent tests (Arif et al., 2020[<xref ref-type="bibr" rid="R1">1</xref>]; Manavalan et al., 2018[<xref ref-type="bibr" rid="R29">29</xref>]). Finally, the selected prediction models are implemented as an online webserver.</p></sec><sec><title>Construction of training and independent test datasets</title><p>To date, four benchmark datasets have been used to develop the ten existing state-of-the-art computational approaches: Jamali2016 (Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]), Sun2018 (Sun et al., 2018[<xref ref-type="bibr" rid="R38">38</xref>]), Yu2022 (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]), and Chen2022 (Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]). 
Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref> (References in Table 2: Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]; Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]; Sun et al., 2018[<xref ref-type="bibr" rid="R38">38</xref>]; Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]) provides details of these datasets. The Jamali2016 dataset was established by Jamali et al. (2016[<xref ref-type="bibr" rid="R17">17</xref>]). This dataset consisted of 1,224 positives and 1,319 negatives. In the Jamali2016 dataset, the positive samples were derived from proteins that are able to interact with drugs, while the negative samples were derived from proteins that cannot be deemed as drug targets. The Jamali2016 dataset was selected to develop six druggable protein predictors (i.e., DrugMiner (Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]), GA-Bagging-SVM (Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]), DrugHybrid&#x5F;BS (Gong et al., 2021[<xref ref-type="bibr" rid="R13">13</xref>]), XGB-DrugPred (Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]), Iraji&#x27;s method (Iraji et al., 2022[<xref ref-type="bibr" rid="R16">16</xref>]), and DrugFinder (Zhang et al., 2023[<xref ref-type="bibr" rid="R48">48</xref>])). The Sun2018 dataset was introduced by Sun et al. (2018[<xref ref-type="bibr" rid="R38">38</xref>]) and comprises two sub-datasets, a small dataset and a large dataset. The positive samples for the small dataset were obtained directly from the Jamali2016 dataset (1,224 positives), while the positive samples for the large dataset were obtained from the targets of experimental small molecules in DrugBank (5,503 positives). The negative samples for the small and large datasets consisted of 1,235 and 5,498 samples, respectively, derived from Swiss-Prot (Boeckmann et al., 2003[<xref ref-type="bibr" rid="R2">2</xref>]). 
The Yu2022 dataset was proposed by Yu et al. (2022[<xref ref-type="bibr" rid="R47">47</xref>]), who used the Jamali2016 dataset as the training dataset and utilized the DrugBank 5.0 database (Wishart et al., 2018[<xref ref-type="bibr" rid="R45">45</xref>]) along with Kim&#x27;s study (Kim et al., 2017[<xref ref-type="bibr" rid="R18">18</xref>]) to create an independent test dataset containing 224 positives and 237 negatives. The Yu2022 dataset was employed to develop two druggable protein predictors (i.e., Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]) and SPIDER (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>])). The last benchmark dataset in this field, Chen2022, was collected from the DrugBank 5.0 database (Wishart et al., 2018[<xref ref-type="bibr" rid="R45">45</xref>]) and the Therapeutic Target Database (TTD) (Wang et al., 2020[<xref ref-type="bibr" rid="R42">42</xref>]). The BLAST tool was used to exclude redundant samples; E-value thresholds of 0.001, 1, and 10 resulted in datasets with (positives, negatives) of (11,803, 7,900), (9,389, 5,941), and (5,330, 3,078), respectively.</p></sec><sec><title>State-of-the-art computational approaches for druggable protein identification</title><p>Based on the types of ML methods employed, the existing computational approaches listed in Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref> can be categorized into three groups. The first group is developed based on single ML methods, such as neural network (NN), random forest (RF), and eXtreme gradient boosting (XGB). The second group is developed based on ensemble learning methods, such as bagging and stacking strategies; and the third group is developed based on deep learning (DL) methods, such as convolutional neural network (CNN) and recurrent neural network (RNN).</p><p>As can be noticed in Table 1<xref ref-type="fig" rid="T1">(Tab. 
1)</xref>, four of the ten existing computational approaches were designed using single ML methods, including DrugMiner (Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]), Sun&#x27;s method (Sun et al., 2018[<xref ref-type="bibr" rid="R38">38</xref>]), XGB-DrugPred (Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]), and DrugFinder (Zhang et al., 2023[<xref ref-type="bibr" rid="R48">48</xref>]). DrugMiner was introduced by Jamali et al. (2016[<xref ref-type="bibr" rid="R17">17</xref>]) and is considered the first sequence-based predictor designed for discriminating druggable proteins from non-druggable proteins. In this method, three feature descriptors, consisting of amino acid composition (AAC), dipeptide composition (DPC), and physicochemical properties (PCP), were used to represent druggable proteins as fixed-length feature vectors. Then, Jamali et al. combined these three feature descriptors and represented each sequence as a 443-D feature vector. The Relief method was then used to select <italic>m</italic> out of the 443 features. A high accuracy (ACC) of 0.921 was achieved by using NN in conjunction with the top-130 informative features. XGB-DrugPred was developed based on three well-known feature descriptors (i.e., grouped dipeptide composition (GDPC), reduced amino acid alphabet (RAAA), and pseudo amino acid segmentation (S-PseAAC)). Each feature descriptor was then optimized using a combination of recursive feature elimination (RFE) and XGB. After the feature optimization, the top-73, top-17, and top-36 informative features from RAAA, GDPC, and S-PseAAC, respectively, were determined and integrated to generate the final feature vector. These final feature vectors were used to train and test ET, RF, and XGB models. A high ACC of 0.949 was achieved by using XGB. DrugFinder was developed by Zhang et al. (2023[<xref ref-type="bibr" rid="R48">48</xref>]). Zhang et al. 
performed experiments with several ML methods (i.e., XGB, RF, support vector machine (SVM), naive Bayes (NB), and k-nearest neighbors (KNN)) and feature encoding schemes (i.e., Seq2Vec, Prot&#x5F;T5&#x5F;Xl&#x5F;Uniref50 (T5), position-specific scoring matrix (PSSM), and Prot&#x5F;Bert&#x5F;BFD). Among the four feature encoding schemes, the T5 model was then selected to perform the feature optimization process. The optimal model of Zhang&#x27;s study, which achieved a cross-validation ACC of 0.950, was obtained by combining XGB with the top-1500 informative features.</p><p>A limitation of single ML methods is that their performance is often not satisfactory enough for practical applications. Ensemble learning methods therefore aim to integrate heterogeneous weak ML models into a single hybrid model with better overall performance. As shown in Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>, three computational approaches employed ensemble learning methods to construct their prediction models, including GA-Bagging-SVM (Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]), DrugHybrid&#x5F;BS (Gong et al., 2021[<xref ref-type="bibr" rid="R13">13</xref>]), and SPIDER (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]). Specifically, GA-Bagging-SVM and DrugHybrid&#x5F;BS were developed based on the bagging strategy, while only SPIDER was developed based on the stacking strategy. The construction of GA-Bagging-SVM and DrugHybrid&#x5F;BS with the bagging strategy involves three main steps: feature representation, selection of important features, and final model construction. Taking GA-Bagging-SVM as an example, first, three feature descriptors (i.e., pseudo amino acid composition (PAAC), DPC, and reduced sequence (RS)) were used to represent druggable proteins. The PAAC, DPC, and RS descriptors were defined as 23-D, 400-D, and 163-D feature vectors, respectively. 
Second, the genetic algorithm (GA) was employed to optimize the original feature vector. Finally, multiple SVM classifiers were integrated to develop a hybrid model using the bagging algorithm. The highest ACC and Matthews correlation coefficient (MCC) of 0.934 and 0.871, respectively, were attained by using the top-143 informative features. SPIDER, in contrast, is a stacked ensemble learning model. Specifically, SPIDER involves two levels of learning, where the classifiers developed at the first and second levels are called the base-classifiers and the meta-classifier, respectively. In the first step, 60 base-classifiers were created by using six different ML methods, each in conjunction with ten feature encodings. In the second step, all the base-classifiers were employed to generate 60 probabilistic features. These features were represented as a 60-dimensional (60-D) feature vector and used for the construction of the stacked model.</p><p>DL is a cutting-edge technique that has been successfully utilized in the field of bioinformatics and computational biology (Charoenkwan et al., 2021[<xref ref-type="bibr" rid="R6">6</xref>]; Rao et al., 2018[<xref ref-type="bibr" rid="R34">34</xref>]; Wang et al., 2019[<xref ref-type="bibr" rid="R41">41</xref>]; Xie et al., 2021[<xref ref-type="bibr" rid="R46">46</xref>]). Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref> shows that three computational approaches employed DL methods to construct their prediction models, including Iraji&#x27;s method (Iraji et al., 2022[<xref ref-type="bibr" rid="R16">16</xref>]), Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]), and QuoteTarget (Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]). 
Among these three druggable protein predictors, Iraji&#x27;s method was the first to apply a DL method (Iraji et al., 2022[<xref ref-type="bibr" rid="R16">16</xref>]). Iraji et al. created two prediction models using PCPs. In the first prediction model, each protein sequence is encoded into a fixed-length feature vector based on the autocovariance method. Six PCPs, including polarity, hydrophilicity, hydrophobicity, polarizability, net charge index of side chain, and solvent-accessible surface area, were applied in this step. As a result, each protein sequence is represented with a 180-D feature vector. The deep stacked sparse auto-encoders (DSSAEs) network then determines the important features among the 180 features and translates them into a 30-D feature vector. In the second prediction model, a deep CNN was fed the output of the DSSAEs.</p></sec><sec><title>Performance evaluation measures</title><p>To date, k-fold cross-validation and independent tests have been widely used for the performance evaluation of the existing druggable protein predictors. In the case of the 10-fold cross-validation test, the dataset is divided into 10 sub-datasets. For the 1<sup>st</sup> iteration, one of the 10 sub-datasets is treated as the 1<sup>st</sup> testing dataset, while the remaining nine sub-datasets are employed to train the 1<sup>st</sup> prediction model. The prediction results of the 1<sup>st</sup> prediction model are then evaluated on the 1<sup>st</sup> testing dataset. This process is repeated 10 times, so that each sub-dataset serves as the testing dataset exactly once. The final performance is obtained by averaging the performance over the 10 individual prediction results. To assess the predictive ability of the existing druggable protein predictors, seven commonly used performance metrics were employed. 
These include ACC, F1, MCC, sensitivity (Sn), specificity (Sp), area under the receiver operating characteristic curve (AUC), and precision (PRE) (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R4">4</xref>][<xref ref-type="bibr" rid="R9">9</xref>]; Mandrekar, 2010[<xref ref-type="bibr" rid="R30">30</xref>]; Ullah et al., 2021[<xref ref-type="bibr" rid="R39">39</xref>]). They are defined as follows:</p><p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-i-001" ></inline-graphic></p><p>Here, TP and TN represent the numbers of true positives and true negatives, respectively, while FP and FN represent the numbers of false positives and false negatives, respectively (Lai et al., 2019[<xref ref-type="bibr" rid="R19">19</xref>]; Lv et al., 2020[<xref ref-type="bibr" rid="R28">28</xref>], 2021[<xref ref-type="bibr" rid="R27">27</xref>]; Su et al., 2018[<xref ref-type="bibr" rid="R37">37</xref>]).</p></sec></sec>
    <sec sec-type="discussion">
      <title>Results and Discussion</title><sec><title>Comparative assessment and analysis</title><p>Among the four benchmark datasets, Jamali2016 (Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]), Yu2022 (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]), and Chen2022 (Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]) are commonly used for developing druggable protein predictors (Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref>). In this section, we assessed and analyzed the performance of all available druggable protein predictors based on each benchmark dataset. </p></sec><sec><title>Performance evaluation on the Jamali2016 dataset</title><p>Jamali et al. (2016[<xref ref-type="bibr" rid="R17">17</xref>]) created the Jamali2016 dataset containing 1,224 positives and 1,319 negatives (Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref>). Six state-of-the-art druggable protein predictors, including DrugMiner (Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]), GA-Bagging-SVM (Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]), DrugHybrid&#x5F;BS (Gong et al., 2021[<xref ref-type="bibr" rid="R13">13</xref>]), XGB-DrugPred (Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]), Iraji&#x27;s method (Iraji et al., 2022[<xref ref-type="bibr" rid="R16">16</xref>]), and DrugFinder (Zhang et al., 2023[<xref ref-type="bibr" rid="R48">48</xref>]), were built and evaluated based on this benchmark dataset using the 5-fold and 10-fold cross-validation tests. The performance comparison results on the Jamali2016 dataset are summarized in Table 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref>. The prediction performance of these six druggable protein predictors was taken directly from two studies (i.e., Iraji et al. (2022[<xref ref-type="bibr" rid="R16">16</xref>]) and Zhang et al. (2023[<xref ref-type="bibr" rid="R48">48</xref>])). 
The highest ACC of 0.983 was achieved by Iraji&#x27;s method, while DrugHybrid&#x5F;BS and DrugFinder performed well with the second and third highest ACC of 0.966 and 0.950, respectively. In addition, the Sn and Sp of Iraji&#x27;s method were higher than those of the compared methods. </p><p>These results indicate that Iraji&#x27;s method achieved superior predictive performance on the Jamali2016 dataset.</p></sec><sec><title>Performance evaluation on the Yu2022 dataset</title><p>Yu et al. (2022[<xref ref-type="bibr" rid="R47">47</xref>]) constructed the Yu2022 dataset by treating the Jamali2016 dataset as the training dataset and employing the DrugBank 5.0 database (Wishart et al., 2018[<xref ref-type="bibr" rid="R45">45</xref>]) and Kim&#x27;s study (Kim et al., 2017[<xref ref-type="bibr" rid="R18">18</xref>]) to construct the independent test dataset. The final training dataset of this benchmark dataset consisted of 1,224 positives and 1,319 negatives, while its independent test dataset consisted of 224 positives and 237 negatives (Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref>). Only two druggable protein predictors, Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]) and SPIDER (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]), were developed and assessed based on this benchmark dataset in terms of cross-validation and independent tests. The prediction performance of these two druggable protein predictors was taken directly from the literature (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]). As can be seen in Table 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref>, cross-validation results reveal that SPIDER achieved the highest ACC, Sn, MCC, and F-score of 0.919, 0.895, 0.839, and 0.914, respectively. In terms of the independent test results, SPIDER still demonstrated better performance across almost all performance metrics (i.e., ACC, Sn, MCC, and F-score). 
Thus, the cross-validation and independent test results on the Yu2022 dataset indicate that SPIDER is an accurate and stable druggable protein predictor. </p></sec><sec><title>Performance evaluation on the Chen2022 datasets</title><p>Chen et al. (2023[<xref ref-type="bibr" rid="R12">12</xref>]) constructed the Chen2022 dataset from the DrugBank 5.0 database (Wishart et al., 2018[<xref ref-type="bibr" rid="R45">45</xref>]) and the Therapeutic Target Database (TTD) (Wang et al., 2020[<xref ref-type="bibr" rid="R42">42</xref>]). From this benchmark dataset, Chen et al. created multiple datasets based on the E-value. Among the several datasets in the study of Chen et al. (2023[<xref ref-type="bibr" rid="R12">12</xref>]), two datasets, namely All-Pfam and App-Pfam, were used to develop and assess three druggable protein predictors, which include GA-Bagging-SVM (Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]), Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]), and QuoteTarget (Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]). The prediction performance of these three druggable protein predictors was taken directly from the literature (Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]). The performance comparison results are recorded in Table 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref>. It can be observed that QuoteTarget outperformed GA-Bagging-SVM and Yu&#x27;s method in terms of ACC, Sn, Sp, MCC, and F1 on both the All-Pfam and App-Pfam datasets. Specifically, QuoteTarget achieved the highest MCC of 0.900 and 0.840 on the All-Pfam and App-Pfam datasets, respectively. 
Meanwhile, the MCC values of GA-Bagging-SVM and Yu&#x27;s method on the All-Pfam and App-Pfam datasets were 0.410, 0.250 and 0.500, 0.650, respectively.</p></sec><sec><title>Mechanistic interpretation of the models</title><p>Analyzing important features can provide a better understanding of druggable protein identification. Among the existing studies, DrugHybrid&#x5F;BS (Gong et al., 2021[<xref ref-type="bibr" rid="R13">13</xref>]), Iraji&#x27;s method (Iraji et al., 2022[<xref ref-type="bibr" rid="R16">16</xref>]), Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]), SPIDER (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]), and XGB-DrugPred (Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]) have made efforts to determine the optimal feature sets and to understand the models&#x27; output. For example, in the study of SPIDER, the genetic algorithm (GA) in conjunction with self-assessment-report (SAR) (Charoenkwan et al., 2019[<xref ref-type="bibr" rid="R11">11</xref>]) was used to filter informative features and construct the optimal feature set. In addition, the Shapley Additive exPlanations (SHAP) method (Li et al., 2021[<xref ref-type="bibr" rid="R20">20</xref>]; Lundberg and Lee, 2017[<xref ref-type="bibr" rid="R26">26</xref>]; Wei et al., 2021[<xref ref-type="bibr" rid="R43">43</xref>]) was used to analyze feature importance. In particular, positive and negative SHAP values indicate contributions towards the prediction of druggable and non-druggable proteins, respectively. Charoenkwan et al. (2022[<xref ref-type="bibr" rid="R10">10</xref>]) reported that LR-RSsecond, LR-DPC, SVM-AAC, SVM-RSpolar, and PLS-RScharge were the top five important features in terms of SHAP value. Their analysis showed that LR-RSsecond, LR-DPC, SVM-AAC, and SVM-RSpolar had positive SHAP values, indicating that they contribute to the prediction of druggable proteins. 
As a result, for a new unknown sample, if the value of LR-RSsecond is very low, the sample will likely be classified as a non-druggable protein; otherwise, it will likely be classified as a druggable protein.</p></sec><sec><title>Webserver and code availability</title><p>To date, numerous studies have mentioned that developing webservers plays an important role in helping experimental researchers carry out their analyses (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R4">4</xref>], 2023[<xref ref-type="bibr" rid="R5">5</xref>][<xref ref-type="bibr" rid="R8">8</xref>]; Li et al., 2021[<xref ref-type="bibr" rid="R20">20</xref>]). However, only two existing computational approaches (i.e., DrugMiner (Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]) and SPIDER (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>])) were deployed as webservers, while five existing studies (i.e., GA-Bagging-SVM (Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]), XGB-DrugPred (Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]), Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]), QuoteTarget (Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]), and DrugFinder (Zhang et al., 2023[<xref ref-type="bibr" rid="R48">48</xref>])) provided their source code (Table 6<xref ref-type="fig" rid="T6">(Tab. 6)</xref>; References in Table 6: Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]; Chen et al., 2023[<xref ref-type="bibr" rid="R12">12</xref>]; Jamali et al., 2016[<xref ref-type="bibr" rid="R17">17</xref>]; Lin et al., 2019[<xref ref-type="bibr" rid="R22">22</xref>]; Sikander et al., 2022[<xref ref-type="bibr" rid="R36">36</xref>]; Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]; Zhang et al., 2023[<xref ref-type="bibr" rid="R48">48</xref>]). 
Please note that, among these five studies, the source code of XGB-DrugPred is not accessible at the stated address (<ext-link ext-link-type="uri" xlink:href="https:&#47;&#47;github.com&#47;wangphd0&#47;drug">https:&#47;&#47;github.com&#47;wangphd0&#47;drug</ext-link>). In contrast, the DrugMiner webserver is publicly available at <ext-link ext-link-type="uri" xlink:href="http:&#47;&#47;www.drugminer.org&#47;">http:&#47;&#47;www.drugminer.org&#47;</ext-link>. DrugMiner was developed using NN in conjunction with the top 130 informative features, but its evaluation was based solely on the cross-validation test, which limits confidence in its practical applicability. On the other hand, SPIDER was evaluated using both cross-validation and independent tests, and its webserver is publicly available at <ext-link ext-link-type="uri" xlink:href="http:&#47;&#47;pmlabstack.pythonanywhere.com&#47;SPIDER">http:&#47;&#47;pmlabstack.pythonanywhere.com&#47;SPIDER</ext-link>. The cross-validation and independent test ACC values for SPIDER were 0.919 and 0.907, respectively (Table 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref>). Overall, it can be concluded that SPIDER outperformed the other existing approaches in terms of predictive accuracy.</p></sec></sec>
    <sec>
      <title>Current Limitations and Future Improvements</title><p>In this section, we discuss the current limitations of the ten existing state-of-the-art predictors and provide guidance to the scientific community for designing and developing more accurate, robust, and stable models for in silico prediction of druggable proteins. First, data redundancy is one of the most important factors in model development (Charoenkwan et al., 2021[<xref ref-type="bibr" rid="R3">3</xref>]; Wei et al., 2018[<xref ref-type="bibr" rid="R44">44</xref>]). The current training datasets used to develop the existing methods contained redundant samples. Thus, the existing methods might not provide stable and robust performance in some cases. To improve the stability and robustness of the models, it is desirable to construct a high-quality dataset by removing redundant samples using the CD-HIT tool (Li and Godzik, 2006[<xref ref-type="bibr" rid="R21">21</xref>]). Second, the interpretability of the existing methods remains unsatisfactory. As mentioned above, a few existing methods, including Yu&#x27;s method (Yu et al., 2022[<xref ref-type="bibr" rid="R47">47</xref>]) and SPIDER (Charoenkwan et al., 2022[<xref ref-type="bibr" rid="R10">10</xref>]), achieved impressive performance in both the cross-validation and independent tests. However, these methods cannot directly provide a better understanding of druggable proteins (Liou et al., 2015[<xref ref-type="bibr" rid="R24">24</xref>]; Vasylenko et al., 2015[<xref ref-type="bibr" rid="R40">40</xref>]). Recently, Charoenkwan et al. (2023[<xref ref-type="bibr" rid="R5">5</xref>][<xref ref-type="bibr" rid="R8">8</xref>]) introduced a novel propensity score representation learning (PSR) method for the identification and analysis of several proteins and peptides. The PSR method is capable of generating the propensities of amino acids and dipeptides in a supervised manner. 
Additionally, PSR-derived propensity scores are able to elucidate the relationship between proteins&#x2F;peptides and their essential physicochemical properties. In the future, we are motivated to employ the PSR method for developing an interpretable druggable protein predictor. Last, a webserver that can predict druggable proteins based on sequence information will greatly facilitate large-scale identification. To date, numerous attempts have been made to develop more accurate and stable druggable protein predictors. However, most of them have not been deployed as webservers or stand-alone software, which limits their utilization. We therefore recommend developing more online webservers to serve community-wide efforts in identifying new druggable proteins.</p></sec>
    <sec sec-type="conclusions">
      <title>Conclusions</title><p>In this study, we provide the first comprehensive survey of the state-of-the-art computational approaches for <italic>in silico</italic> prediction of druggable proteins. Specifically, we discussed the advantages and disadvantages of these approaches, considering a variety of aspects that are important for developing an efficient and stable prediction model. These aspects include benchmark datasets along with feature extraction schemes, ML strategies, evaluation methods, and webserver availability. Among the state-of-the-art computational approaches, the experimental results demonstrated that SPIDER provided more reliable performance in terms of both the cross-validation and independent test results. In addition, this approach has been deployed as a user-friendly webserver, accessible at http:&#x2F;&#x2F;pmlabstack.pythonanywhere.com&#x2F;SPIDER. Although QuoteTarget, Yu&#x27;s method, and Iraji&#x27;s method can achieve strong performance, their utilization for large-scale identification is limited. Based on our comparative analysis, the SPIDER approach can be regarded as the best computational approach in terms of prediction performance and usability.</p></sec>
    <sec>
      <title>Declaration</title><sec><title>Ethical statement</title><p>This review paper does not include animal or human experiments.</p></sec><sec><title>Conflicts of interest</title><p>The authors declare no conflict of interest.</p></sec><sec><title>Author contributions statement</title><p>WS: Project administration, supervision, designing the study, formal analysis, visualization, investigation, preparation of the manuscript, revision of the manuscript. NS: Revision of the manuscript. JN: Preparation of the manuscript. All authors reviewed and approved the manuscript.</p></sec><sec><title>Acknowledgments</title><p>This work was fully supported by Mahidol University and Faculty of Medical Technology, Mahidol University.</p></sec><sec><title>Funding</title><p>This project is funded by the National Research Council of Thailand and Mahidol University (N42A660380), and Specific League Funds from Mahidol University.</p></sec></sec>
  </body>
  <back>
    <ref-list>
      <ref id="R1">
        <label>1</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Arif</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Ali</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Ahmad</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Kabir</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Ali</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Hayat</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>Pred-BVP-Unb: Fast prediction of bacteriophage Virion proteins using un-biased multi-perspective properties with recursive feature elimination</article-title>
          <source>Genomics</source>
          <year>2020</year>
          <volume>112</volume>
          <fpage>1565</fpage>
          <lpage>1574</lpage>
        </citation>
      </ref>
      <ref id="R2">
        <label>2</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Boeckmann</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Bairoch</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Apweiler</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Blatter</surname>
              <given-names>M-C</given-names>
            </name>
            <name>
              <surname>Estreicher</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Gasteiger</surname>
              <given-names>E</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003</article-title>
          <source>Nucl Acids Res</source>
          <year>2003</year>
          <volume>31</volume>
          <issue>1</issue>
          <fpage>365</fpage>
          <lpage>370</lpage>
        </citation>
      </ref>
      <ref id="R3">
        <label>3</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Anuwongcharoen</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Hasan</surname>
              <given-names>MM</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>In silico approaches for the prediction and analysis of antiviral peptides: a review</article-title>
          <source>Curr Pharm Des</source>
          <year>2021</year>
          <volume>27</volume>
          <fpage>2180</fpage>
          <lpage>2188</lpage>
        </citation>
      </ref>
      <ref id="R4">
        <label>4</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Chiangjong</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Moni</surname>
              <given-names>MA</given-names>
            </name>
            <name>
              <surname>Lio&#x2019;</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>SCMTHP: A new approach for identifying and characterizing of tumor-homing peptides using estimated propensity scores of amino acids</article-title>
          <source>Pharmaceutics</source>
          <year>2022</year>
          <volume>14</volume>
          <issue>1</issue>
          <fpage>122</fpage>
        </citation>
      </ref>
      <ref id="R5">
        <label>5</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Chumnanpuen</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Schaduangrat</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Oh</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>PSRQSP: An effective approach for the interpretable prediction of quorum sensing peptide using propensity score representation learning</article-title>
          <source>Comput Biol Med</source>
          <year>2023</year>
          <volume>158</volume>
          <fpage>106784</fpage>
        </citation>
      </ref>
      <ref id="R6">
        <label>6</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Hasan</surname>
              <given-names>MM</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides</article-title>
          <source>Bioinformatics</source>
          <year>2021</year>
          <volume>37</volume>
          <fpage>2556</fpage>
          <lpage>2562</lpage>
        </citation>
      </ref>
      <ref id="R7">
        <label>7</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Hasan</surname>
              <given-names>MM</given-names>
            </name>
            <name>
              <surname>Moni</surname>
              <given-names>MA</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>StackDPPIV: A novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides</article-title>
          <source>Methods</source>
          <year>2022</year>
          <volume>204</volume>
          <fpage>189</fpage>
          <lpage>198</lpage>
        </citation>
      </ref>
      <ref id="R8">
        <label>8</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Pipattanaboon</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Hasan</surname>
              <given-names>MM</given-names>
            </name>
            <name>
              <surname>Moni</surname>
              <given-names>MA</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>PSRTTCA: A new approach for improving the prediction and characterization of tumor T cell antigens using propensity score representation learning</article-title>
          <source>Comput Biol Med</source>
          <year>2023</year>
          <volume>152</volume>
          <fpage>106368</fpage>
        </citation>
      </ref>
      <ref id="R9">
        <label>9</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Schaduangrat</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Moni</surname>
              <given-names>MA</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins</article-title>
          <source>Comput Biol Med</source>
          <year>2022</year>
          <volume>146</volume>
          <fpage>105704</fpage>
        </citation>
      </ref>
      <ref id="R10">
        <label>10</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Schaduangrat</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Moni</surname>
              <given-names>MA</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
          </person-group>
          <article-title>Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework</article-title>
          <source>iScience</source>
          <year>2022</year>
          <volume>25</volume>
          <issue>9</issue>
          <fpage>104883</fpage>
        </citation>
      </ref>
      <ref id="R11">
        <label>11</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Schaduangrat</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Piacham</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>iQSP: a sequence-based tool for the prediction and analysis of quorum sensing peptides via Chou&#x2019;s 5-steps rule and informative physicochemical properties</article-title>
          <source>Int J Mol Sci</source>
          <year>2019</year>
          <volume>21</volume>
          <issue>1</issue>
          <fpage>75</fpage>
        </citation>
      </ref>
      <ref id="R12">
        <label>12</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Chen</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Gu</surname>
              <given-names>Z</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Deng</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Lai</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Pei</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>QuoteTarget: A sequence&#x2010;based transformer protein language model to identify potentially druggable protein targets</article-title>
          <source>Protein Sci</source>
          <year>2023</year>
          <volume>32</volume>
          <issue>2</issue>
          <fpage>e4555</fpage>
        </citation>
      </ref>
      <ref id="R13">
        <label>13</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Gong</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Liao</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Zou</surname>
              <given-names>Q</given-names>
            </name>
          </person-group>
          <article-title>DrugHybrid&#x5F;BS: Using hybrid feature combined with bagging-SVM to predict potentially druggable proteins</article-title>
          <source>Front Pharmacol</source>
          <year>2021</year>
          <volume>12</volume>
          <fpage>771808</fpage>
        </citation>
      </ref>
      <ref id="R14">
        <label>14</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Hasan</surname>
              <given-names>MM</given-names>
            </name>
            <name>
              <surname>Alam</surname>
              <given-names>MA</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Deng</surname>
              <given-names>H-W</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Kurata</surname>
              <given-names>H</given-names>
            </name>
          </person-group>
          <article-title>NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <issue>6</issue>
          <fpage>bbab167</fpage>
        </citation>
      </ref>
      <ref id="R15">
        <label>15</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Hongjaisee</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Carraway</surname>
              <given-names>TS</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
          </person-group>
          <article-title>HIVCoR: A sequence-based tool for predicting HIV-1 CRF01&#x5F;AE coreceptor usage</article-title>
          <source>Comput Biol Chem</source>
          <year>2019</year>
          <volume>80</volume>
          <fpage>419</fpage>
          <lpage>432</lpage>
        </citation>
      </ref>
      <ref id="R16">
        <label>16</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Iraji</surname>
              <given-names>MS</given-names>
            </name>
            <name>
              <surname>Tanha</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Habibinejad</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>Druggable protein prediction using a multi-canal deep convolutional neural network based on autocovariance method</article-title>
          <source>Comput Biol Med</source>
          <year>2022</year>
          <volume>151</volume>
          <fpage>106276</fpage>
        </citation>
      </ref>
      <ref id="R17">
        <label>17</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Jamali</surname>
              <given-names>AA</given-names>
            </name>
            <name>
              <surname>Ferdousi</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Razzaghi</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Safdari</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Ebrahimie</surname>
              <given-names>E</given-names>
            </name>
          </person-group>
          <article-title>DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins</article-title>
          <source>Drug Discov Today</source>
          <year>2016</year>
          <volume>21</volume>
          <fpage>718</fpage>
          <lpage>724</lpage>
        </citation>
      </ref>
      <ref id="R18">
        <label>18</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Kim</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Jo</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Han</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Park</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>H</given-names>
            </name>
          </person-group>
          <article-title>In silico re-identification of properties of drug target proteins</article-title>
          <source>BMC Bioinformatics</source>
          <year>2017</year>
          <volume>18</volume>
          <fpage>35</fpage>
          <lpage>44</lpage>
        </citation>
      </ref>
      <ref id="R19">
        <label>19</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lai</surname>
              <given-names>H-Y</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>Z-Y</given-names>
            </name>
            <name>
              <surname>Su</surname>
              <given-names>Z-D</given-names>
            </name>
            <name>
              <surname>Su</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Ding</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>W</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>iProEP: a computational predictor for predicting promoter</article-title>
          <source>Mol Ther Nucleic Acids</source>
          <year>2019</year>
          <volume>17</volume>
          <fpage>337</fpage>
          <lpage>346</lpage>
        </citation>
      </ref>
      <ref id="R20">
        <label>20</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Li</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Guo</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Jin</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Xiang</surname>
              <given-names>D</given-names>
            </name>
            <name>
              <surname>Song</surname>
              <given-names>J</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Porpoise: a new approach for accurate prediction of RNA pseudouridine sites</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <issue>6</issue>
          <fpage>bbab245</fpage>
        </citation>
      </ref>
      <ref id="R21">
        <label>21</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Li</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Godzik</surname>
              <given-names>A</given-names>
            </name>
          </person-group>
          <article-title>Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences</article-title>
          <source>Bioinformatics</source>
          <year>2006</year>
          <volume>22</volume>
          <fpage>1658</fpage>
          <lpage>1659</lpage>
        </citation>
      </ref>
      <ref id="R22">
        <label>22</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lin</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Yu</surname>
              <given-names>B</given-names>
            </name>
          </person-group>
          <article-title>Accurate prediction of potential druggable proteins based on genetic algorithm and Bagging-SVM ensemble classifier</article-title>
          <source>Artif Intell Med</source>
          <year>2019</year>
          <volume>98</volume>
          <fpage>35</fpage>
          <lpage>47</lpage>
        </citation>
      </ref>
      <ref id="R23">
        <label>23</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lindsay</surname>
              <given-names>MA</given-names>
            </name>
          </person-group>
          <article-title>Finding new drug targets in the 21st century</article-title>
          <source>Drug Discov Today</source>
          <year>2005</year>
          <volume>10</volume>
          <fpage>1683</fpage>
          <lpage>1687</lpage>
        </citation>
      </ref>
      <ref id="R24">
        <label>24</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Liou</surname>
              <given-names>Y-F</given-names>
            </name>
            <name>
              <surname>Vasylenko</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Yeh</surname>
              <given-names>C-L</given-names>
            </name>
            <name>
              <surname>Lin</surname>
              <given-names>W-C</given-names>
            </name>
            <name>
              <surname>Chiu</surname>
              <given-names>S-H</given-names>
            </name>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>SCMMTP: identifying and characterizing membrane transport proteins using propensity scores of dipeptides</article-title>
          <source>BMC Genomics</source>
          <year>2015</year>
          <volume>16</volume>
          <fpage>1</fpage>
          <lpage>14</lpage>
        </citation>
      </ref>
      <ref id="R25">
        <label>25</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Liu</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Altman</surname>
              <given-names>R</given-names>
            </name>
          </person-group>
          <article-title>Identifying druggable targets by protein microenvironments matching: application to transcription factors</article-title>
          <source>CPT Pharmacometrics Syst Pharmacol</source>
          <year>2014</year>
          <volume>3</volume>
          <issue>1</issue>
          <fpage>e93</fpage>
        </citation>
      </ref>
      <ref id="R26">
        <label>26</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Lundberg</surname>
              <given-names>SM</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>S-I</given-names>
            </name>
          </person-group>
          <article-title>A unified approach to interpreting model predictions</article-title>
          <year>2017</year>
          <conf-name>NIPS&#x27;17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Dec</conf-name>
          <publisher-loc>Red Hook, NY</publisher-loc>
          <publisher-name>Curran Associates Inc.</publisher-name>
          <fpage>4768</fpage>
          <lpage>4777</lpage>
        </citation>
      </ref>
      <ref id="R27">
        <label>27</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lv</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Dao</surname>
              <given-names>F-Y</given-names>
            </name>
            <name>
              <surname>Guan</surname>
              <given-names>Z-X</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>Y-W</given-names>
            </name>
            <name>
              <surname>Lin</surname>
              <given-names>H</given-names>
            </name>
          </person-group>
          <article-title>Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <issue>4</issue>
          <fpage>bbaa255</fpage>
        </citation>
      </ref>
      <ref id="R28">
        <label>28</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lv</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>Z-M</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>S-H</given-names>
            </name>
            <name>
              <surname>Tan</surname>
              <given-names>J-X</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Lin</surname>
              <given-names>H</given-names>
            </name>
          </person-group>
          <article-title>Evaluation of different computational methods on 5-methylcytosine sites identification</article-title>
          <source>Brief Bioinform</source>
          <year>2020</year>
          <volume>21</volume>
          <fpage>982</fpage>
          <lpage>995</lpage>
        </citation>
      </ref>
      <ref id="R29">
        <label>29</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Shin</surname>
              <given-names>TH</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>G</given-names>
            </name>
          </person-group>
          <article-title>PVP-SVM: sequence-based prediction of phage virion proteins using a support vector machine</article-title>
          <source>Front Microbiol</source>
          <year>2018</year>
          <volume>9</volume>
          <fpage>476</fpage>
        </citation>
      </ref>
      <ref id="R30">
        <label>30</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Mandrekar</surname>
              <given-names>JN</given-names>
            </name>
          </person-group>
          <article-title>Receiver operating characteristic curve in diagnostic test assessment</article-title>
          <source>J Thorac Oncol</source>
          <year>2010</year>
          <volume>5</volume>
          <fpage>1315</fpage>
          <lpage>1316</lpage>
        </citation>
      </ref>
      <ref id="R31">
        <label>31</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Overington</surname>
              <given-names>JP</given-names>
            </name>
            <name>
              <surname>Al-Lazikani</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Hopkins</surname>
              <given-names>AL</given-names>
            </name>
          </person-group>
          <article-title>How many drug targets are there&#x3F;</article-title>
          <source>Nat Rev Drug Discov</source>
          <year>2006</year>
          <volume>5</volume>
          <fpage>993</fpage>
          <lpage>996</lpage>
        </citation>
      </ref>
      <ref id="R32">
        <label>32</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Owens</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Determining druggability</article-title>
          <source>Nat Rev Drug Discov</source>
          <year>2007</year>
          <volume>6</volume>
          <issue>3</issue>
          <fpage>187</fpage>
        </citation>
      </ref>
      <ref id="R33">
        <label>33</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Qiang</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Zhou</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Ye</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Du</surname>
              <given-names>P-F</given-names>
            </name>
            <name>
              <surname>Su</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Wei</surname>
              <given-names>L</given-names>
            </name>
          </person-group>
          <article-title>CPPred-FL: a sequence-based predictor for large-scale identification of cell-penetrating peptides by feature representation learning</article-title>
          <source>Brief Bioinform</source>
          <year>2020</year>
          <volume>21</volume>
          <issue>1</issue>
          <fpage>11</fpage>
          <lpage>23</lpage>
        </citation>
      </ref>
      <ref id="R34">
        <label>34</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Rao</surname>
              <given-names>RSP</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>D</given-names>
            </name>
            <name>
              <surname>M&#xF8;ller</surname>
              <given-names>IM</given-names>
            </name>
          </person-group>
          <article-title>CarbonylDB: a curated data-resource of protein carbonylation sites</article-title>
          <source>Bioinformatics</source>
          <year>2018</year>
          <volume>34</volume>
          <fpage>2518</fpage>
          <lpage>2520</lpage>
        </citation>
      </ref>
      <ref id="R35">
        <label>35</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Sakharkar</surname>
              <given-names>MK</given-names>
            </name>
            <name>
              <surname>Sakharkar</surname>
              <given-names>KR</given-names>
            </name>
            <name>
              <surname>Pervaiz</surname>
              <given-names>S</given-names>
            </name>
          </person-group>
          <article-title>Druggability of human disease genes</article-title>
          <source>Int J Biochem Cell Biol</source>
          <year>2007</year>
          <volume>39</volume>
          <fpage>1156</fpage>
          <lpage>1164</lpage>
        </citation>
      </ref>
      <ref id="R36">
        <label>36</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Sikander</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Ghulam</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Ali</surname>
              <given-names>F</given-names>
            </name>
          </person-group>
          <article-title>XGB-DrugPred: computational prediction of druggable proteins using eXtreme gradient boosting and optimized features set</article-title>
          <source>Sci Rep</source>
          <year>2022</year>
          <volume>12</volume>
          <issue>1</issue>
          <fpage>5505</fpage>
        </citation>
      </ref>
      <ref id="R37">
        <label>37</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Su</surname>
              <given-names>Z-D</given-names>
            </name>
            <name>
              <surname>Huang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>Z-Y</given-names>
            </name>
            <name>
              <surname>Zhao</surname>
              <given-names>Y-W</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>D</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>W</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>iLoc-lncRNA: predict the subcellular location of lncRNAs by incorporating octamer composition into general PseKNC</article-title>
          <source>Bioinformatics</source>
          <year>2018</year>
          <volume>34</volume>
          <fpage>4196</fpage>
          <lpage>4204</lpage>
        </citation>
      </ref>
      <ref id="R38">
        <label>38</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Sun</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Lai</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Pei</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Analysis of protein features and machine learning algorithms for prediction of druggable proteins</article-title>
          <source>Quant Biol</source>
          <year>2018</year>
          <volume>6</volume>
          <fpage>334</fpage>
          <lpage>343</lpage>
        </citation>
      </ref>
      <ref id="R39">
        <label>39</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Ullah</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Han</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Hadi</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Song</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Yu</surname>
              <given-names>D-J</given-names>
            </name>
          </person-group>
          <article-title>PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <issue>6</issue>
          <fpage>bbab278</fpage>
        </citation>
      </ref>
      <ref id="R40">
        <label>40</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Vasylenko</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Liou</surname>
              <given-names>Y-F</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>H-A</given-names>
            </name>
            <name>
              <surname>Charoenkwan</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Huang</surname>
              <given-names>H-L</given-names>
            </name>
            <name>
              <surname>Ho</surname>
              <given-names>S-Y</given-names>
            </name>
          </person-group>
          <article-title>SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method</article-title>
          <source>BMC Bioinformatics</source>
          <year>2015</year>
          <volume>16</volume>
          <issue>Suppl 1</issue>
          <fpage>S8</fpage>
        </citation>
      </ref>
      <ref id="R41">
        <label>41</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Wang</surname>
              <given-names>D</given-names>
            </name>
            <name>
              <surname>Liang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>D</given-names>
            </name>
          </person-group>
          <article-title>Capsule network for protein post-translational modification site prediction</article-title>
          <source>Bioinformatics</source>
          <year>2019</year>
          <volume>35</volume>
          <fpage>2386</fpage>
          <lpage>2394</lpage>
        </citation>
      </ref>
      <ref id="R42">
        <label>42</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Wang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Zhou</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>Z</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Therapeutic target database 2020: enriched resource for facilitating research and early development of targeted therapeutics</article-title>
          <source>Nucl Acids Res</source>
          <year>2020</year>
          <volume>48</volume>
          <issue>D1</issue>
          <fpage>D1031</fpage>
          <lpage>D1041</lpage>
        </citation>
      </ref>
      <ref id="R43">
        <label>43</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Wei</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>He</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Malik</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Su</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Cui</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Manavalan</surname>
              <given-names>B</given-names>
            </name>
          </person-group>
          <article-title>Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <issue>4</issue>
          <fpage>bbaa275</fpage>
        </citation>
      </ref>
      <ref id="R44">
        <label>44</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Wei</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Zhou</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Song</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Su</surname>
              <given-names>R</given-names>
            </name>
          </person-group>
          <article-title>ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides</article-title>
          <source>Bioinformatics</source>
          <year>2018</year>
          <volume>34</volume>
          <fpage>4007</fpage>
          <lpage>4016</lpage>
        </citation>
      </ref>
      <ref id="R45">
        <label>45</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Wishart</surname>
              <given-names>DS</given-names>
            </name>
            <name>
              <surname>Feunang</surname>
              <given-names>YD</given-names>
            </name>
            <name>
              <surname>Guo</surname>
              <given-names>AC</given-names>
            </name>
            <name>
              <surname>Lo</surname>
              <given-names>EJ</given-names>
            </name>
            <name>
              <surname>Marcu</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Grant</surname>
              <given-names>JR</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>DrugBank 5.0: a major update to the DrugBank database for 2018</article-title>
          <source>Nucl Acids Res</source>
          <year>2018</year>
          <volume>46</volume>
          <issue>D1</issue>
          <fpage>D1074</fpage>
          <lpage>D1082</lpage>
        </citation>
      </ref>
      <ref id="R46">
        <label>46</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Xie</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Dai</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Leier</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Marquez-Lago</surname>
              <given-names>TT</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy</article-title>
          <source>Brief Bioinform</source>
          <year>2021</year>
          <volume>22</volume>
          <issue>3</issue>
          <fpage>bbaa125</fpage>
        </citation>
      </ref>
      <ref id="R47">
        <label>47</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Yu</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Xue</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Jing</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Luo</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>The applications of deep learning algorithms on in silico druggable proteins identification</article-title>
          <source>J Adv Res</source>
          <year>2022</year>
          <volume>41</volume>
          <fpage>219</fpage>
          <lpage>231</lpage>
        </citation>
      </ref>
      <ref id="R48">
        <label>48</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Zhang</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Wan</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>T</given-names>
            </name>
          </person-group>
          <article-title>DrugFinder: Druggable protein identification model based on pre-trained models and evolutionary information</article-title>
          <source>Algorithms</source>
          <year>2023</year>
          <volume>16</volume>
          <issue>6</issue>
          <fpage>263</fpage>
        </citation>
      </ref>
    </ref-list>
  </back>
  <floats-wrap>
    <fig id="T1" position="float">
      <label>Table 1</label>
      <caption><title>Summary of existing methods and tools for prediction of druggable proteins</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-t-001" />
    </fig>
    <fig id="T2" position="float">
      <label>Table 2</label>
      <caption><title>A summary of three benchmark datasets used in the existing methods</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-t-002" />
    </fig>
    <fig id="T3" position="float">
      <label>Table 3</label>
      <caption><title>Performance comparison of DrugMiner, GA-Bagging-SVM, DrugHybrid&#x5F;BS, XGB-DrugPred, Iraji&#x27;s method, and DrugFinder on the Jamali2016 dataset</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-t-003" />
    </fig>
    <fig id="T4" position="float">
      <label>Table 4</label>
      <caption><title>Performance comparison of Yu&#x27;s method and SPIDER on the Yu2022 dataset</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-t-004" />
    </fig>
    <fig id="T5" position="float">
      <label>Table 5</label>
      <caption><title>Performance comparison of Yu&#x27;s method and SPIDER on the Yu2022 dataset</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-t-005" />
    </fig>
    <fig id="T6" position="float">
      <label>Table 6</label>
      <caption><title>Summary of web server&#x2F;source code availability for druggable protein identification</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-t-006" />
    </fig>
    <fig id="F1" position="float">
      <label>Figure 1</label>
      <caption><title>Timeline of the existing state-of-the-art predictors (A) and webserver&#x2F;software availability (B)</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-g-001" />
    </fig>
    <fig id="F2" position="float">
      <label>Figure 2</label>
      <caption><title>The general machine learning framework of the prediction of druggable proteins</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-22-915-g-002" />
    </fig>
  </floats-wrap>
</article>