﻿<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="review-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">EXCLI J</journal-id>
      <journal-title>EXCLI Journal</journal-title>
      <issn pub-type="epub">1611-2156</issn>
      <publisher>
        <publisher-name>Leibniz Research Centre for Working Environment and Human Factors</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">2017-911</article-id>
      <article-id pub-id-type="doi">10.17179/excli2017-911</article-id>
      <article-id pub-id-type="pii">Doc72</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Review article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Data mining for the identification of metabolic syndrome status</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Worachartcheewan</surname>
            <given-names>Apilak</given-names>
          </name>
          <xref ref-type="corresp" rid="COR1">&#x0002a;</xref>
          <xref ref-type="aff" rid="A1">1</xref>
          <xref ref-type="aff" rid="A2">2</xref>
          <xref ref-type="aff" rid="A3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Schaduangrat</surname>
            <given-names>Nalini</given-names>
          </name>
          <xref ref-type="aff" rid="A3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Prachayasittikul</surname>
            <given-names>Virapong</given-names>
          </name>
          <xref ref-type="aff" rid="A4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Nantasenamat</surname>
            <given-names>Chanin</given-names>
          </name>
          <xref ref-type="aff" rid="A3">3</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand</aff>
      <aff id="A2">
        <label>2</label>Department of Clinical Chemistry, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand</aff>
      <aff id="A3">
        <label>3</label>Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand</aff>
      <aff id="A4">
        <label>4</label>Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand</aff>
      <author-notes>
        <corresp id="COR1">*To whom correspondence should be addressed: Apilak Worachartcheewan, Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand; Telephone: +66 2 441 4371 ext. 2720, Fax: +66 2 441 4380, E-mail: <email>apilak.woa@mahidol.edu</email></corresp>
      </author-notes>
      <pub-date pub-type="epub">
        <day>10</day>
        <month>01</month>
        <year>2018</year>
      </pub-date>
      <pub-date pub-type="collection">
        <year>2018</year>
      </pub-date>
      <volume>17</volume>
      <fpage>72</fpage>
      <lpage>88</lpage>
      <history>
        <date date-type="received">
          <day>24</day>
          <month>10</month>
          <year>2017</year>
        </date>
        <date date-type="accepted">
          <day>19</day>
          <month>12</month>
          <year>2017</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Copyright &#xA9; 2018 Worachartcheewan et al.</copyright-statement>
        <copyright-year>2018</copyright-year>
        <license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
          <p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/). You are free to copy, distribute and transmit the work, provided the original author and source are credited.</p>
        </license>
      </permissions>
      <self-uri xlink:href="http://www.excli.de/vol17/Worachartcheewan_10012018_proof.pdf">This article is available from http://www.excli.de/vol17/Worachartcheewan_10012018_proof.pdf</self-uri>
      <abstract><p>Metabolic syndrome (MS) is a condition characterized by a cluster of metabolic abnormalities: central obesity (e.g. waist circumference or body mass index), hypertension (e.g. systolic or diastolic blood pressure), hyperglycemia (e.g. fasting plasma glucose) and dyslipidemia (e.g. triglyceride and high-density lipoprotein cholesterol). MS is also associated with the development of type 2 diabetes mellitus (DM) and cardiovascular disease (CVD), so rapid identification of MS is needed to help prevent these diseases. Herein, we review the use of data mining approaches for MS identification. We also present the concept of the quantitative population-health relationship (QPHR), defined as the elucidation&#x2F;understanding of the relationship between health parameters and health status. QPHR modeling uses data mining techniques such as artificial neural network (ANN), support vector machine (SVM), principal component analysis (PCA), decision tree (DT), random forest (RF) and association analysis (AA) to construct predictive models for MS characterization. The DT method has been found to outperform other data mining techniques in identifying MS status. Moreover, the AA technique has proved useful for discovering in-depth combinations of health parameters that frequently occur together, from which rules of MS development can be derived. This review highlights the potential benefits of data mining as a rapid identification tool for classifying MS.</p></abstract>
      <kwd-group>
        <kwd>metabolic syndrome</kwd>
        <kwd>health parameters</kwd>
        <kwd>diabetes mellitus</kwd>
        <kwd>cardiovascular diseases</kwd>
        <kwd>data mining</kwd>
        <kwd>QPHR</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>Introduction</title><p>Over the past century, advances in science and technology have brought about substantial changes in the development of countries, economies, societies and the environment, and have improved quality of life. However, these advances have also perturbed individual and population lifestyles, the environment, culture, socioeconomic conditions and community networks. As a result, populations are exposed to internal and external risk factors that can cause pathological conditions and, ultimately, disease (Figure 1<xref ref-type="fig" rid="F1">(Fig. 1)</xref>). These risk factors include infection by pathogenic microorganisms (e.g. bacteria, fungi, parasites and viruses), free radicals, carcinogens, toxic compounds, pollutants and genetic abnormalities. Moreover, changes in lifestyle and diet, together with physical inactivity, have contributed to metabolic abnormalities. Such risk factors can lead to diseases such as metabolic syndrome, cardiovascular diseases, diabetes mellitus, cerebrovascular diseases, foodborne diseases, infectious diseases and cancer. Health parameters therefore offer an opportunity to explore the health status of individuals and populations in relation to biochemical changes in the body.</p><p>Interestingly, metabolic syndrome (MS) has been implicated in the development of type 2 diabetes mellitus (DM) (WHO, 2008[<xref ref-type="bibr" rid="R43">43</xref>]) and cardiovascular disease (CVD) (WHO, 2007[<xref ref-type="bibr" rid="R42">42</xref>]). MS is defined as a cluster of metabolic abnormalities, notably central obesity (e.g. waist circumference (WC) or body mass index (BMI)), dyslipidemia (e.g. triglyceride (TG) and high-density lipoprotein cholesterol (HDL-C)), hyperglycemia (e.g. fasting plasma glucose (FPG)) and hypertension (e.g. systolic or diastolic blood pressure (SBP or DBP)) (Alberti et al., 2009[<xref ref-type="bibr" rid="R1">1</xref>]).</p><p>The global number of people with DM was estimated at 150 million in 2000 and was projected to rise rapidly to 220 million by 2010 and to 360 million by 2030 (Amos et al., 1997[<xref ref-type="bibr" rid="R2">2</xref>]; WHO, 2008[<xref ref-type="bibr" rid="R43">43</xref>]). Similarly, CVD prevalence has been projected to increase from 17.5 million in 2005 to 20 million in 2015 (WHO, 2007[<xref ref-type="bibr" rid="R42">42</xref>]). Rapid identification of MS is therefore urgently needed to help prevent the development of type 2 DM and CVD.</p><p>Criteria for identifying MS have been developed by several organizations, the first of which were published by the World Health Organization (WHO) in 1999 (WHO, 1999[<xref ref-type="bibr" rid="R45">45</xref>]). Other criteria for defining MS have been proposed by the European Group for the Study of Insulin Resistance (EGIR) (Balkau and Charles, 1999[<xref ref-type="bibr" rid="R3">3</xref>]), the National Cholesterol Education Program Adult Treatment Panel III (NCEP ATPIII) (NCEP ATPIII, 2001[<xref ref-type="bibr" rid="R28">28</xref>]) and the International Diabetes Federation (IDF) (Alberti et al., 2009[<xref ref-type="bibr" rid="R1">1</xref>]). The criteria of the different organizations are summarized in Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>.</p><p>Geographical location, ethnicity and race, as well as social and dietary behaviors, may influence the development of obesity, hypertension and diabetes. According to the IDF criteria, central obesity (i.e. WC or BMI) is the first criterion, followed by two or more further metabolic abnormalities. 
The IDF criteria use BMI in place of WC because the two are significantly correlated (Ryan et al., 2008[<xref ref-type="bibr" rid="R35">35</xref>]). The WHO cut-off for obesity is BMI &#x2265; 30 kg&#x2F;m<sup>2</sup>. However, this value is not appropriate for identifying obesity in Asian populations, possibly because of differences in anthropometry, race&#x2F;ethnicity, body fat percentage, and social and dietary behaviors. The cut-offs were therefore redefined as a new standard by the Steering Committee of the Regional Office for the Western Pacific Region of WHO (WPRO), the International Association for the Study of Obesity and the International Obesity Taskforce, whereby overweight individuals have a BMI &#x2265; 23 kg&#x2F;m<sup>2</sup> and obese individuals a BMI &#x2265; 25 kg&#x2F;m<sup>2</sup> (WHO, 2000[<xref ref-type="bibr" rid="R44">44</xref>]). This lower obesity cut-off of 25 kg&#x2F;m<sup>2</sup> for Asian populations reflects their high morbidity and mortality from diabetes mellitus and cardiovascular disease at lower degrees of central obesity, i.e. at correspondingly lower WC and BMI (WHO, 2000[<xref ref-type="bibr" rid="R44">44</xref>]). The new BMI cut-off has been used successfully to identify obesity in the Chinese (Ko et al., 2001[<xref ref-type="bibr" rid="R15">15</xref>]), Japanese (Morimoto et al., 2008[<xref ref-type="bibr" rid="R23">23</xref>]), Korean (Oh et al., 2004[<xref ref-type="bibr" rid="R31">31</xref>]), Taiwanese (Pan et al., 2004[<xref ref-type="bibr" rid="R32">32</xref>]) and Thai (Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R47">47</xref>]) populations, and it also serves as the first criterion for MS identification. 
Under the WHO and EGIR criteria (Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>), the first component is an abnormal glucose level or an abnormal insulin level, respectively, and individuals who also have two or more further metabolic abnormalities are identified as having MS. Under the NCEP ATPIII criteria, individuals with three or more MS components are defined as having MS (Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>), whereas the IDF criteria require central obesity as the first component, followed by two or more further abnormalities.</p></sec>
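<p>The counting rules summarized above can be illustrated with a short Python sketch. This is a hypothetical example, not the procedure used in the cited studies. The record layout and the names <monospace>Subject</monospace>, <monospace>metabolic_flags</monospace>, <monospace>ms_ncep_atp3</monospace> and <monospace>ms_idf_bmi</monospace> are invented for illustration. The cut-offs are assumed values that should be replaced by those listed in Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>: WC &#x3E; 102&#x2F;88 cm in men&#x2F;women, TG &#x2265; 150 mg&#x2F;dL, HDL-C &#x3C; 40&#x2F;50 mg&#x2F;dL in men&#x2F;women, SBP &#x2265; 130 or DBP &#x2265; 85 mmHg, FPG &#x2265; 110 mg&#x2F;dL for NCEP ATPIII and &#x2265; 100 mg&#x2F;dL for IDF, and the Asian BMI cut-off of 25 kg&#x2F;m<sup>2</sup> as the central-obesity criterion. Only the NCEP ATPIII-style and IDF-style rules are sketched.</p>
<preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Health parameters of one individual (hypothetical record layout)."""
    sex: str    # "M" or "F"
    wc: float   # waist circumference, cm
    bmi: float  # body mass index, kg/m^2
    sbp: float  # systolic blood pressure, mmHg
    dbp: float  # diastolic blood pressure, mmHg
    tg: float   # triglyceride, mg/dL
    hdl: float  # HDL-C, mg/dL
    fpg: float  # fasting plasma glucose, mg/dL


def metabolic_flags(s, fpg_cutoff):
    """Flag the four non-obesity abnormalities (assumed, illustrative cut-offs)."""
    hdl_cutoff = 40.0 if s.sex == "M" else 50.0
    return {
        "high_tg": s.tg >= 150.0,
        "low_hdl": s.hdl < hdl_cutoff,
        "high_bp": s.sbp >= 130.0 or s.dbp >= 85.0,
        "high_fpg": s.fpg >= fpg_cutoff,
    }


def ms_ncep_atp3(s):
    """NCEP ATPIII-style rule: MS if 3 or more of the 5 components are abnormal."""
    wc_cutoff = 102.0 if s.sex == "M" else 88.0
    n_abnormal = sum(metabolic_flags(s, fpg_cutoff=110.0).values())
    n_abnormal += s.wc > wc_cutoff  # a bool adds 0 or 1
    return n_abnormal >= 3


def ms_idf_bmi(s, bmi_cutoff=25.0):
    """IDF-style rule with BMI as the central-obesity criterion:
    obesity is required first, then 2 or more further abnormalities."""
    if s.bmi < bmi_cutoff:
        return False
    return sum(metabolic_flags(s, fpg_cutoff=100.0).values()) >= 2


if __name__ == "__main__":
    subject = Subject(sex="F", wc=85.0, bmi=24.0, sbp=135.0, dbp=80.0,
                      tg=180.0, hdl=45.0, fpg=98.0)
    print(ms_ncep_atp3(subject))  # True: high TG, low HDL-C, high BP
    print(ms_idf_bmi(subject))    # False: BMI below the obesity cut-off
```
]]></preformat>
<p>The IDF-style rule requires central obesity before any other component is counted, so the same individual can be classified differently by the two rules. In the usage example, a woman with high TG, low HDL-C and high blood pressure but a BMI below 25 kg&#x2F;m<sup>2</sup> meets the NCEP ATPIII-style rule but not the IDF-style rule.</p>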
    <sec>
      <title>Overview of Data Mining for Assessment of Health Status</title><sec><title>Concepts of data mining</title><p>Data mining is the process of analyzing large pools of data in order to summarize them and extract knowledge and insight, by searching for previously unknown patterns, classes, clusters and relationships in the data set (Han and Kamber, 2001[<xref ref-type="bibr" rid="R9">9</xref>]). According to the Cross-Industry Standard Process for Data Mining (CRISP-DM), established in 1996, data mining comprises six steps: business understanding, data understanding, data preparation, modeling, evaluation and deployment. CRISP-DM was designed as a comprehensive data mining methodology and process model applicable to everyone, from novices to experts in the field (Shearer, 2000[<xref ref-type="bibr" rid="R36">36</xref>]). Knowledge Discovery in Databases (KDD) is a closely related concept. The two processes are similar, but strictly speaking data mining is one step of the KDD process. The KDD process comprises data selection, data preprocessing, data transformation, data mining, and interpretation&#x2F;evaluation of the model, followed by use of the discovered knowledge (Fayyad et al., 1996[<xref ref-type="bibr" rid="R7">7</xref>]).</p><p>A typical data set, formatted as a spreadsheet or CSV text file, consists of patients&#x2F;individuals (rows) and of health parameters and class labels (columns). 
Health parameters are the independent variables X<sub>1<italic>i</italic></sub>, X<sub>2<italic>i</italic></sub>, &#x2026;, X<sub><italic>ni</italic></sub> that describe the characteristics of each patient&#x2F;individual <italic>i</italic>. The class label is the dependent variable Y<sub><italic>i</italic></sub> of that sample (Nisbet et al., 2009[<xref ref-type="bibr" rid="R29">29</xref>]), as shown in Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref>.</p><p>Before model construction, quantitative independent variables are scaled by normalization or standardization so that variables measured on different scales can be compared (Nantasenamat et al., 2009[<xref ref-type="bibr" rid="R25">25</xref>], 2010[<xref ref-type="bibr" rid="R26">26</xref>]).</p><p>Normalization rescales each independent variable to the range 0 to 1 according to the following equation:</p><p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-i-001" ></inline-graphic></p><p>where <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-i-002" ></inline-graphic> is the normalized value, <italic>x</italic><italic><sub>ij</sub></italic> is the value of interest, <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-i-003" ></inline-graphic> is the minimum value and <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-i-004" ></inline-graphic> is the maximum value of variable <italic>j</italic>.</p><p>Standardization rescales each independent variable to zero mean and unit variance using the following equation:</p><p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-i-005" ></inline-graphic></p><p>where <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-i-006" ></inline-graphic> is the standardized value, <italic>x</italic><italic><sub>ij</sub></italic> is the value of each sample, <italic>x&#x305;</italic><italic><sub>j</sub></italic> is the mean of variable <italic>j</italic> and <italic>N</italic> is the sample size of the data set.</p><p>Alternatively, the original quantitative data (without normalization or standardization) and qualitative data can be used directly to build predictive models.</p><p>To construct a predictive model, the data set is typically divided into two subsets: (1) a training set, used to train machine learning algorithms to recognize patterns and generate models, and (2) a testing set, used to evaluate the model. Testing can be performed with internal or external testing sets. Cross-validation is an internal approach that divides the data set into <italic>n</italic> equal parts. Each part in turn serves as the testing set while the remaining parts form the training set, until every part has been used once for testing. Several <italic>n</italic>-fold cross-validation schemes have been used to evaluate predictive models. For large data sets, 10-fold cross-validation is common. For example, 500 subjects are separated into 10 equal parts, so that in each round 50 samples form the testing set and 450 the training set. In contrast, leave-one-out cross-validation is employed for data sets containing a small number of samples; here the number of folds equals the number of samples. In addition, models can be validated using an external testing set consisting of data not used in model construction (Nantasenamat et al., 2009[<xref ref-type="bibr" rid="R25">25</xref>], 2010[<xref ref-type="bibr" rid="R26">26</xref>]).</p><p>Machine learning methods are categorized into two groups: supervised and unsupervised learning. 
In supervised learning, the dependent variable is assigned as a numerical value or class label, and machine learning algorithms learn to classify or predict it. Unsupervised learning, in contrast, is performed directly on the data set to find clusters without using dependent variables (Nantasenamat et al., 2009[<xref ref-type="bibr" rid="R25">25</xref>], 2010[<xref ref-type="bibr" rid="R26">26</xref>]; Nantasenamat and Prachayasittikul, 2015[<xref ref-type="bibr" rid="R27">27</xref>]; Prachayasittikul et al., 2015[<xref ref-type="bibr" rid="R33">33</xref>]). Examples of data mining techniques used for supervised and unsupervised learning are displayed in Figure 1<xref ref-type="fig" rid="F1">(Fig. 1)</xref>. In supervised learning, techniques such as multiple linear regression (MLR), partial least squares (PLS), ANN and SVM are used to construct regression models for numerical outputs, whereas DT, AA, RF, ANN and SVM are used to generate classification models for class-label outputs. In unsupervised learning, PCA, hierarchical cluster analysis (HCA), self-organizing maps (SOM), <italic>k</italic>NN clustering and AA are applied to cluster or classify data without assigned outputs. This helps to understand the distribution of each cluster and to identify similar or dissimilar groups in the data (Nantasenamat et al., 2010[<xref ref-type="bibr" rid="R26">26</xref>]; Nantasenamat and Prachayasittikul, 2015[<xref ref-type="bibr" rid="R27">27</xref>]; Prachayasittikul et al., 2015[<xref ref-type="bibr" rid="R33">33</xref>]). Each technique has advantages and disadvantages. ANN and SVM, for example, are non-linear but black-box methods, whereas MLR is a simple technique that is limited when the number of features is very large. The choice of technique should therefore take into account the type of data and the need to interpret which parameters are significantly related to the output. 
Data mining can be applied in science and health at every level, from small molecules, chemical polymers and biological macromolecules up to the population level (Isarankura-Na-Ayudhya, 2009[<xref ref-type="bibr" rid="R12">12</xref>]).</p></sec><sec><title>Data mining for medical&#x2F;clinical applications</title><p>Medical&#x2F;clinical databases are large collections of patient&#x2F;individual information, such as patient history, physiological and biochemical parameters and diseases, collected in hospital or laboratory systems. Revealing the relationships within such data is needed to obtain new medical&#x2F;clinical knowledge. Advances in computing have enabled the development of new methods and tools for analyzing large quantities of data. Data mining has been applied to medical&#x2F;clinical data to discover patterns and build predictive models (Iavindrasana et al., 2009[<xref ref-type="bibr" rid="R11">11</xref>]; Koh and Tan, 2005[<xref ref-type="bibr" rid="R16">16</xref>]; Lee et al., 2000[<xref ref-type="bibr" rid="R19">19</xref>]; Obenshain, 2004[<xref ref-type="bibr" rid="R30">30</xref>]; Ting et al., 2009[<xref ref-type="bibr" rid="R40">40</xref>]; Yoo et al., 2012[<xref ref-type="bibr" rid="R51">51</xref>]) that help physicians make decisions on the diagnosis, prognosis and treatment of patients. In addition, data mining has been successfully applied to identify and model the relationships between health parameters and diseases such as cancer, cerebrovascular disease, diabetes mellitus, food-borne diseases, heart diseases, hypertension, hyperlipidemia, ischemic heart disease, inflammatory bowel disease and metabolic syndrome, as shown in Table 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref> (references in Table 3: Nahar et al., 2011[<xref ref-type="bibr" rid="R24">24</xref>]; Yeh et al., 2011[<xref ref-type="bibr" rid="R50">50</xref>]; Quentin-Trautvetter et al., 2002[<xref ref-type="bibr" rid="R34">34</xref>]; Su et al., 2006[<xref ref-type="bibr" rid="R37">37</xref>]; Thakur et al., 2010[<xref ref-type="bibr" rid="R39">39</xref>]; Lee et al., 2000[<xref ref-type="bibr" rid="R19">19</xref>]; Chang et al., 2011[<xref ref-type="bibr" rid="R5">5</xref>]; Wei et al., 2012[<xref ref-type="bibr" rid="R41">41</xref>]; Tantimongcolwat et al., 2008[<xref ref-type="bibr" rid="R38">38</xref>]; Firouzi et al., 2007[<xref ref-type="bibr" rid="R8">8</xref>]; Karimi-Alavijeh et al., 2016[<xref ref-type="bibr" rid="R13">13</xref>]; Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R46">46</xref>], 2013[<xref ref-type="bibr" rid="R48">48</xref>], 2015[<xref ref-type="bibr" rid="R49">49</xref>]).</p></sec><sec><title>Health parameters</title><p>Health parameters are important variables for assessing health status and for the proper diagnosis of diseases. These parameters are stored in medical databases and are obtained when individuals undergo health check-ups and&#x2F;or assessment of disease conditions. Generally, a physician uses blood chemistry and physical examination together with health history and interview to evaluate the health status of a patient. However, delayed diagnosis may increase morbidity and mortality. Advances in computational technology can help physicians diagnose diseases rapidly and find patterns that reveal risk factors for disease development. As mentioned above, medical databases collect large amounts of data that can be used to evaluate the health status of individuals. Managing such data requires powerful computational tools. 
In particular, machine learning approaches, namely data mining, are applied to health parameters to discover patterns and construct predictive models of diseases. Applied to biomedical databases, data mining enables rapid and automatic diagnosis of MS, supporting treatment and prevention for individuals with risk factors for disease development.</p></sec><sec><title>Statistical analysis</title><p>Predictive models are evaluated using the following statistical parameters: accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) (Kuo et al., 2001[<xref ref-type="bibr" rid="R17">17</xref>]) and Matthews correlation coefficient (MCC) (Matthews, 1975[<xref ref-type="bibr" rid="R21">21</xref>]). They are calculated using the following equations:</p><p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-i-007" ></inline-graphic></p><p>where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives (over-predictions) and FN is the number of false negatives (missed predictions). MCC is 0 for a random assignment and 1.0 for a perfect prediction (Matthews, 1975[<xref ref-type="bibr" rid="R21">21</xref>]). Its full range is &#x2212;1 to 1, with &#x2212;1 indicating complete disagreement between prediction and observation.</p></sec></sec>
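<p>The scaling, cross-validation and evaluation steps described in this section can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions rather than the implementation used in the cited studies:</p>
<list list-type="bullet">
<list-item><p>The function names are hypothetical.</p></list-item>
<list-item><p>Standardization assumes the sample standard deviation (denominator <italic>N</italic> &#x2212; 1).</p></list-item>
<list-item><p>Folds are formed after a seeded random shuffle.</p></list-item>
<list-item><p>MCC is set to 0 when its denominator is zero, which is a common convention.</p></list-item>
</list>
<p>The metrics follow their standard definitions:</p>
<list list-type="bullet">
<list-item><p>accuracy = (TP &#x2B; TN)&#x2F;(TP &#x2B; TN &#x2B; FP &#x2B; FN)</p></list-item>
<list-item><p>sensitivity = TP&#x2F;(TP &#x2B; FN)</p></list-item>
<list-item><p>specificity = TN&#x2F;(TN &#x2B; FP)</p></list-item>
<list-item><p>PPV = TP&#x2F;(TP &#x2B; FP)</p></list-item>
<list-item><p>NPV = TN&#x2F;(TN &#x2B; FN)</p></list-item>
<list-item><p>MCC = (TP &#xD7; TN &#x2212; FP &#xD7; FN)&#x2F;&#x221A;[(TP &#x2B; FP)(TP &#x2B; FN)(TN &#x2B; FP)(TN &#x2B; FN)]</p></list-item>
</list>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random
import statistics


def min_max_normalize(values):
    """Rescale one variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant variable: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale one variable to zero mean and unit variance (sample SD, N - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every sample is used for testing exactly once; k == n gives leave-one-out.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread any remainder
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield train, test
        start += size


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp, tn, fp, fn):
    """Evaluation statistics for a binary (e.g. MS vs non-MS) classifier."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


if __name__ == "__main__":
    print(min_max_normalize([18.5, 25.0, 31.5]))  # [0.0, 0.5, 1.0]
    print(standardize([1, 2, 3]))                  # [-1.0, 0.0, 1.0]
    # 500 subjects, 10 folds: each testing set holds 50 samples
    print({len(test) for _, test in k_fold_indices(500, 10)})  # {50}
    print(classification_metrics(tp=45, tn=40, fp=10, fn=5))
```
]]></preformat>
<p>In a complete workflow, the scaling parameters (minimum and maximum, or mean and standard deviation) would be estimated on the training set of each fold and then applied to its testing set. This ensures that no information from the testing data enters model construction.</p>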
    <sec>
      <title>Quantitative Population-Health Relationship (QPHR)</title><p>We previously termed the use of data mining techniques to assess the health status of a population from its health parameters the quantitative population-health relationship (QPHR) (Worachartcheewan et al., 2013[<xref ref-type="bibr" rid="R48">48</xref>]). QPHR uses data mining to elucidate the relationship between the physical and biochemical parameters of populations&#x2F;patients and their diseases.</p><p>Data mining has been used to extract and explore knowledge from large amounts of data in clinical&#x2F;medical settings. A variety of data mining techniques, including SVM, ANN, MLR, PCA, SOM, DT and AA, have been used to construct predictive models of diseases (Chang et al., 2011[<xref ref-type="bibr" rid="R5">5</xref>]; Firouzi et al., 2007[<xref ref-type="bibr" rid="R8">8</xref>]; Kim et al., 2012[<xref ref-type="bibr" rid="R14">14</xref>]; Lee et al., 2000[<xref ref-type="bibr" rid="R19">19</xref>]; Nahar et al., 2011[<xref ref-type="bibr" rid="R24">24</xref>]; Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R47">47</xref>][<xref ref-type="bibr" rid="R46">46</xref>], 2013[<xref ref-type="bibr" rid="R48">48</xref>], 2015[<xref ref-type="bibr" rid="R49">49</xref>]; Yeh et al., 2011[<xref ref-type="bibr" rid="R50">50</xref>]). In addition, data mining has been employed to generate quantitative structure-activity&#x2F;property relationship (QSAR&#x2F;QSPR) models. These models give insight into the correlations between physicochemical descriptors and biological&#x2F;chemical properties (Nantasenamat et al., 2009[<xref ref-type="bibr" rid="R25">25</xref>], 2010[<xref ref-type="bibr" rid="R26">26</xref>]; Nantasenamat and Prachayasittikul, 2015[<xref ref-type="bibr" rid="R27">27</xref>]; Prachayasittikul et al., 2015[<xref ref-type="bibr" rid="R33">33</xref>]).</p><p>QPHR models are used to discover unknown or hidden parameters associated with the progression of diseases. 
QPHR models serve the clinical aims of diagnosis, prevention and health promotion in populations&#x2F;patients. Applied to medical&#x2F;clinical data, QPHR models can also identify important risk factors of diseases and classify individuals at risk of developing those diseases. The QPHR procedure is illustrated in Figure 2<xref ref-type="fig" rid="F2">(Fig. 2)</xref>.</p><p>The concepts of QSAR&#x2F;QSPR and QPHR are similar, as both are used to construct predictive models. QSAR&#x2F;QSPR models predict biological&#x2F;chemical properties (Nantasenamat et al., 2009[<xref ref-type="bibr" rid="R25">25</xref>], 2010[<xref ref-type="bibr" rid="R26">26</xref>]; Nantasenamat and Prachayasittikul, 2015[<xref ref-type="bibr" rid="R27">27</xref>]), and QPHR models predict diseases (Worachartcheewan et al., 2013[<xref ref-type="bibr" rid="R48">48</xref>]). In QSAR&#x2F;QSPR models, quantum chemical and molecular descriptors are related to bioactivities to find relationships between physicochemical properties and activity. In QPHR models, health parameters (from physiological and blood chemistry testing) are instead related to diseases to discover patterns or elucidate the relationships between them (Table 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref>).</p><p>QPHR models could easily be adapted to identify the development of other diseases, supporting diagnosis, prevention and health promotion in populations&#x2F;patients.</p><p>In this review, examples of QPHR investigations of MS identification are described. Figure 1<xref ref-type="fig" rid="F1">(Fig. 1)</xref> illustrates how data mining techniques are applied to discover important health parameters and risk factors associated with MS and related diseases. It also shows how these techniques are used to construct classification&#x2F;prediction models for screening and assessing health status, thereby improving individual well-being and population health.</p><p>MS has attracted attention as a risk factor for DM and CVD. It arises mainly from abnormalities in protein, carbohydrate and lipid metabolism. According to the MS criteria, central obesity (BMI&#x2F;WC), hypertension (SBP or DBP), dyslipidemia (TG and HDL-C) and hyperglycemia (FPG) are the integral components that define MS (Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>). Other factors involved in MS, such as genes, socioeconomic status, behavior and dietary intake, have also been identified. Data mining has further revealed in-depth combinations of health parameters that frequently occur together. Applying medical data mining to the classification of MS is therefore important for early detection, before individuals with high risk factors develop DM and CVD.</p><sec><title>MS classification using various machine learning approaches</title><p>Data mining has been employed to identify MS using various approaches such as ANN, SVM, RT, DT and PCA (Worachartcheewan et al., 2013[<xref ref-type="bibr" rid="R48">48</xref>]; de Edelenyi et al., 2008[<xref ref-type="bibr" rid="R6">6</xref>]). In addition, the AA technique is used to discover combinations of MS metabolic abnormalities that frequently occur together. 
The resulting association rules agree with previous studies in involving high levels of TG, FPG and BP and a low level of HDL-C (Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R47">47</xref>][<xref ref-type="bibr" rid="R46">46</xref>]; Lee et al., 2008[<xref ref-type="bibr" rid="R18">18</xref>]). As noted above, the application of data mining to assess health status via health parameters was termed QPHR by Worachartcheewan et al. (2013[<xref ref-type="bibr" rid="R48">48</xref>]). QPHR uses health parameters to identify health status or diseases, providing insight into the relationship between an individual&#x27;s health parameters and the development of diseases. Several machine learning techniques, namely ANN, SVM, DT and PCA, have previously been employed to correlate health parameters with MS status (Worachartcheewan et al., 2013[<xref ref-type="bibr" rid="R48">48</xref>]). Among the reported MS classification models, DT was the best QPHR method. It correctly classified MS and non-MS in more than 99 &#x25; of cases, outperforming ANN and SVM, which displayed accuracies of more than 98 &#x25; and 91 &#x25;, respectively (Worachartcheewan et al., 2013[<xref ref-type="bibr" rid="R48">48</xref>]). PCA, used for clustering analysis, displayed distinct MS and non-MS groups. AA gave rules describing abnormal MS components that frequently occur together (Worachartcheewan et al., 2013[<xref ref-type="bibr" rid="R48">48</xref>]). In addition, an in-depth analysis of MS component combinations was performed using AA to discover which metabolic abnormalities of MS components frequently occur together. 
For AA, the quantitative data were stratified into qualitative data using the WHO and IDF criteria for metabolic abnormalities. The resulting combinations of MS components corresponded to those reported in previous studies, and the association rules obtained could be used to define MS (Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R47">47</xref>][<xref ref-type="bibr" rid="R46">46</xref>]; Lee et al., 2008[<xref ref-type="bibr" rid="R18">18</xref>]). This work was conducted in a Thai population.</p><p>DT has also been applied to identify MS components in urban and rural Korean populations (Kim et al., 2012[<xref ref-type="bibr" rid="R14">14</xref>]), with MS defined by the modified National Cholesterol Education Program Adult Treatment Panel III criteria. In the urban population, DT identified the combinations high TG &#x2B; high SBP, high TG &#x2B; low HDL-C and high WC &#x2B; high SBP &#x2B; high FPG for MS, whereas in the rural population the combinations were TG &#x2B; SBP &#x2B; WC and SBP &#x2B; WC &#x2B; FPG. Similar patterns of MS component combinations were highlighted in our previous studies (Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R47">47</xref>][<xref ref-type="bibr" rid="R46">46</xref>]).</p><p>DT analysis is considered a robust data mining technique for constructing predictive models of MS status, with reported accuracies of 73.90&#x25; (Kim et al., 2012[<xref ref-type="bibr" rid="R14">14</xref>]) and 99.86&#x25; (Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R46">46</xref>], 2013[<xref ref-type="bibr" rid="R48">48</xref>]). SVM has yielded accuracies of 75.70&#x25; (Karimi-Alavijeh et al., 2016[<xref ref-type="bibr" rid="R13">13</xref>]) and 91.98&#x25; (Worachartcheewan et al., 2013[<xref ref-type="bibr" rid="R48">48</xref>]). 
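</p>
<p>Accuracy figures such as these are derived from the confusion matrix of predicted versus actual MS status. MS and non-MS groups are often unequal in size, so sensitivity, specificity and the Matthews correlation coefficient (MCC; Matthews, 1975[<xref ref-type="bibr" rid="R21">21</xref>]) are useful companions to accuracy. The following minimal sketch computes these measures. The label vectors are toy values, not results from the cited studies, and the function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def confusion_counts(actual, predicted, positive="MS"):
    """Count TP, TN, FP and FN for a binary MS / non-MS classification."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Toy labels for eight individuals (illustrative only).
actual = ["MS", "MS", "MS", "non-MS", "non-MS", "non-MS", "non-MS", "non-MS"]
predicted = ["MS", "MS", "non-MS", "non-MS", "non-MS", "non-MS", "non-MS", "MS"]
counts = confusion_counts(actual, predicted)
print(counts)  # (2, 4, 1, 1)
print(classification_metrics(*counts))
```
</preformat>
<p>For the toy labels the sketch gives an accuracy of 0.75 but an MCC of 7&#x2F;15 (about 0.47). This illustrates that a respectable accuracy can coexist with a markedly lower correlation between predicted and actual classes.</p>
<p>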
Moreover, a CHAID decision tree identified MS with an accuracy of 71.80&#x25;, and WC, TG, HDL-C and FPG were significant health parameters for predicting MS (Miller et al., 2014[<xref ref-type="bibr" rid="R22">22</xref>]).</p><p>AA has also been used to find patterns among MS-related diseases. Applying AA to MS and DM patients in a Taiwanese population, Chan et al. (2008[<xref ref-type="bibr" rid="R4">4</xref>]) discovered relationships between these diseases: MS was associated with liver disease, whereas DM was associated with oral diseases such as dental caries, pulpitis, acute gingivitis and periodontosis. Such rules describing relations between diseases can support diagnosis and help prevent illness in patients.</p><p>The same technique was used to explore association rules between MS and lifestyle (Huang, 2013[<xref ref-type="bibr" rid="R10">10</xref>]). Individuals with a BMI &#x3E;27 kg&#x2F;m<sup>2</sup> and&#x2F;or who engaged in vigorous physical exercise less than once a week were predisposed to MS.</p><p>In addition, ANN and multiple logistic regression have been employed to identify MS in patients treated with second-generation antipsychotics (SGAs) (Lin et al., 2010[<xref ref-type="bibr" rid="R20">20</xref>]). The ANN and logistic regression models achieved accuracies of 88.3&#x25; and 83.6&#x25;, respectively, and WC, BMI, DBP and gender were important variables for identifying MS in patients undergoing SGA treatment.</p><p>In a French population, de Edelenyi et al. (2008[<xref ref-type="bibr" rid="R6">6</xref>]) identified factors and combinations of factors associated with MS, using RF to predict MS status. 
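</p>
<p>A random forest, such as the one applied by de Edelenyi et al., aggregates many decision trees. Each tree is grown on a bootstrap sample of individuals, with a random subset of candidate variables considered for splitting, and the forest classifies by majority vote. The standard-library sketch below is a deliberately simplified illustration, not the implementation used in the cited studies. It assumes depth-one trees (decision stumps) chosen by Gini impurity. It also uses the number of times each variable is selected for a split as a crude importance score, whereas RF software usually grows deep trees and reports permutation- or impurity-based importance. The variables, function names and toy records are hypothetical.</p>
<preformat preformat-type="code">
```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(rows, labels, features):
    """Depth-one tree: (feature, threshold, label_if_ge, label_if_below) with lowest weighted Gini."""
    majority = Counter(labels).most_common(1)[0][0]
    best, best_score = None, float("inf")
    for f in features:
        for t in sorted({r[f] for r in rows}):
            ge = [y for r, y in zip(rows, labels) if r[f] >= t]
            below = [y for r, y in zip(rows, labels) if not r[f] >= t]
            score = (len(ge) * gini(ge) + len(below) * gini(below)) / len(labels)
            if best_score > score:
                best_score = score
                ge_label = Counter(ge).most_common(1)[0][0] if ge else majority
                below_label = Counter(below).most_common(1)[0][0] if below else majority
                best = (f, t, ge_label, below_label)
    return best


def fit_forest(rows, labels, n_trees=25, m_features=2, seed=0):
    """Grow stumps on bootstrap samples, each restricted to a random feature subset."""
    rng = random.Random(seed)
    all_features = list(rows[0].keys())
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap: sample with replacement
        feats = rng.sample(all_features, m_features)    # random subset (one split per stump)
        forest.append(best_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(ge if row[f] >= t else below for f, t, ge, below in forest)
    return votes.most_common(1)[0][0]


def split_frequency(forest):
    """Crude importance score: how often each variable was chosen for a split."""
    return Counter(f for f, _, _, _ in forest)


# Hypothetical toy records (TG and HDL-C in mg/dL, WC in cm); not study data.
rows = [
    {"tg": 210, "hdl_c": 35, "wc": 96}, {"tg": 180, "hdl_c": 42, "wc": 88},
    {"tg": 250, "hdl_c": 38, "wc": 101}, {"tg": 165, "hdl_c": 45, "wc": 92},
    {"tg": 95, "hdl_c": 55, "wc": 78}, {"tg": 120, "hdl_c": 48, "wc": 85},
    {"tg": 80, "hdl_c": 60, "wc": 74}, {"tg": 140, "hdl_c": 39, "wc": 91},
]
labels = ["MS"] * 4 + ["non-MS"] * 4

print(best_stump(rows, labels, ["tg", "hdl_c", "wc"]))  # ('tg', 165, 'MS', 'non-MS')
forest = fit_forest(rows, labels)
print(split_frequency(forest))
print(predict(forest, {"tg": 230, "hdl_c": 34, "wc": 99}))
```
</preformat>
<p>In the toy data TG separates the two classes perfectly, so a single stump fitted to all records splits on TG at 165 mg&#x2F;dL. The bootstrap and random feature subsets are what make the forest differ from that single stump.</p>
<p>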
In the study by de Edelenyi et al., dietary and genetic parameters were used as independent variables, with MS or non-MS class as the dependent variable. The important variables identified by RF included plasma concentrations of palmitoleic acid, gamma-linolenic acid (GLA) and linoleic acid, as well as three single-nucleotide polymorphisms (SNPs): APOB rs512535, LTA rs915654 and ACACB rs4766587. The model correctly predicted MS status in 71.4&#x25; of cases. Regarding the interpretation of these parameters, palmitoleic acid was significantly higher in MS than in non-MS individuals, while the APOB rs512535 A&#x3E;G and ACACB rs4766587 A&#x3E;G polymorphisms were correlated with the development of MS. Worachartcheewan et al. (2015[<xref ref-type="bibr" rid="R49">49</xref>]) also used RF to explore important health parameters and identify MS. TG ranked first among the health parameters associated with MS, and MS was classified with an accuracy &#x3E;98&#x25;. These results are consistent with a previous study (Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R46">46</xref>]).</p><p>These examples illustrate how data mining techniques are applied to classify or identify MS, to reveal patterns of MS component combinations, and to derive rules linking metabolic abnormalities and related diseases to MS.</p><p>In summary, this review has focused on MS as a risk factor for DM and CVD. MS is associated with abnormalities in protein, carbohydrate and lipid metabolism, and central obesity (BMI&#x2F;WC), BP, dyslipidemia (TG and HDL-C) and hyperglycemia (FPG) are the components used to define MS. Data mining has also uncovered other factors correlated with MS, such as genes, socioeconomic status, behavior and diet. 
In addition, data mining has enabled in-depth analysis of components that occur frequently together. Because classifying MS supports the early detection of DM and CVD, data mining could be recommended for identifying MS during an individual&#x27;s health assessment.</p><p>A summary of examples employing data mining for the classification of MS is presented in Table 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref> (References in Table 5: de Edelenyi et al., 2008[<xref ref-type="bibr" rid="R6">6</xref>]; Karimi-Alavijeh et al., 2016[<xref ref-type="bibr" rid="R13">13</xref>]; Kim et al., 2012[<xref ref-type="bibr" rid="R14">14</xref>]; Chan et al., 2008[<xref ref-type="bibr" rid="R4">4</xref>]; Huang, 2013[<xref ref-type="bibr" rid="R10">10</xref>]; Lin et al., 2010[<xref ref-type="bibr" rid="R20">20</xref>]; Worachartcheewan et al., 2010[<xref ref-type="bibr" rid="R46">46</xref>], 2013[<xref ref-type="bibr" rid="R48">48</xref>], 2015[<xref ref-type="bibr" rid="R49">49</xref>]; Miller et al., 2014[<xref ref-type="bibr" rid="R22">22</xref>]). These studies used data mining to identify patterns or combinations of MS components and to deduce rules for the metabolic abnormalities associated with MS.</p></sec></sec>
    <sec sec-type="conclusions">
      <title>Conclusion</title><p>This review is the first of its kind to summarize the use of data mining for assessing MS status and for in-depth discovery of MS components. It describes data mining techniques as a rapid tool for classifying individuals into MS and non-MS categories. Complementary knowledge gained from association analysis provides pertinent information on frequently co-occurring parameters for defining MS, and decision tree analysis offers insights into the rules that lead to MS or non-MS classification. The topics covered in this article represent an exciting and growing area in which various machine learning techniques offer useful insights for unravelling the mechanistic basis of MS.</p><p>Data mining has been shown to distinguish MS from non-MS and could potentially be employed as a rapid identification tool for classifying MS. Furthermore, association rule analysis was able to discover important rules for defining MS. In addition, DT has been shown to be a robust machine learning approach for classifying MS and therefore holds great potential for assessing an individual&#x27;s risk of MS.</p></sec>
    <sec>
      <title>Acknowledgements</title><p>This research project is supported by the Office of the Higher Education Commission and Mahidol University under the National Research Universities Initiative and the research grant of Mahidol University (B.E. 2556-2558).</p></sec>
  </body>
  <back>
    <ref-list>
      <ref id="R1">
        <label>1</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Alberti</surname>
              <given-names>KG</given-names>
            </name>
            <name>
              <surname>Eckel</surname>
              <given-names>RH</given-names>
            </name>
            <name>
              <surname>Grundy</surname>
              <given-names>SM</given-names>
            </name>
            <name>
              <surname>Zimmet</surname>
              <given-names>PZ</given-names>
            </name>
            <name>
              <surname>Cleeman</surname>
              <given-names>JI</given-names>
            </name>
            <name>
              <surname>Donato</surname>
              <given-names>KA</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Harmonizing the metabolic syndrome: a joint interim statement of the International Diabetes Federation Task Force on epidemiology and prevention; National Heart, Lung, and Blood Institute; American Heart Association; World Heart Federation; International Atherosclerosis Society; and International Association for the Study of Obesity</article-title>
          <source>Circulation</source>
          <year>2009</year>
          <volume>120</volume>
          <fpage>1640</fpage>
          <lpage>1645</lpage>
        </citation>
      </ref>
      <ref id="R2">
        <label>2</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Amos</surname>
              <given-names>AF</given-names>
            </name>
            <name>
              <surname>McCarty</surname>
              <given-names>DJ</given-names>
            </name>
            <name>
              <surname>Zimmet</surname>
              <given-names>P</given-names>
            </name>
          </person-group>
          <article-title>The rising global burden of diabetes and its complications: estimates and projections to the year 2010</article-title>
          <source>Diabet Med</source>
          <year>1997</year>
          <volume>14</volume>
          <issue>Suppl 5</issue>
          <fpage>S1</fpage>
          <lpage>85</lpage>
        </citation>
      </ref>
      <ref id="R3">
        <label>3</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Balkau</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Charles</surname>
              <given-names>MA</given-names>
            </name>
          </person-group>
          <article-title>Comment on the provisional report from the WHO consultation. European Group for the Study of Insulin Resistance (EGIR)</article-title>
          <source>Diabet Med</source>
          <year>1999</year>
          <volume>16</volume>
          <fpage>442</fpage>
          <lpage>443</lpage>
        </citation>
      </ref>
      <ref id="R4">
        <label>4</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Chan</surname>
              <given-names>C-L</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>C-W</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>B-J</given-names>
            </name>
          </person-group>
          <article-title>Discovery of association rules in metabolic syndrome related diseases</article-title>
          <year>2008</year>
          <conf-name>IEEE International Joint Conference on Neural Networks IJCNN 2008</conf-name>
          <publisher-loc>Piscataway NJ</publisher-loc>
          <publisher-name>IEEE</publisher-name>
          <fpage>856</fpage>
          <lpage>862</lpage>
        </citation>
      </ref>
      <ref id="R5">
        <label>5</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Chang</surname>
              <given-names>C-D</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>C-C</given-names>
            </name>
            <name>
              <surname>Jiang</surname>
              <given-names>BC</given-names>
            </name>
          </person-group>
          <article-title>Using data mining techniques for multi-diseases prediction modeling of hypertension and hyperlipidemia by common risk factors</article-title>
          <source>Expert Syst Appl</source>
          <year>2011</year>
          <volume>38</volume>
          <fpage>5507</fpage>
          <lpage>5513</lpage>
        </citation>
      </ref>
      <ref id="R6">
        <label>6</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>de Edelenyi</surname>
              <given-names>FS</given-names>
            </name>
            <name>
              <surname>Goumidi</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Bertrais</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Phillips</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>MacManus</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Roche</surname>
              <given-names>H</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Prediction of the metabolic syndrome status based on dietary and genetic parameters, using Random Forest</article-title>
          <source>Genes Nutr</source>
          <year>2008</year>
          <volume>3</volume>
          <fpage>173</fpage>
          <lpage>176</lpage>
        </citation>
      </ref>
      <ref id="R7">
        <label>7</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Fayyad</surname>
              <given-names>U</given-names>
            </name>
            <name>
              <surname>Piatetsky-Shapiro</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Smyth</surname>
              <given-names>P</given-names>
            </name>
          </person-group>
          <article-title>From data mining to knowledge discovery in database</article-title>
          <source>Commun ACM</source>
          <year>1996</year>
          <volume>39</volume>
          <fpage>21</fpage>
          <lpage>26</lpage>
        </citation>
      </ref>
      <ref id="R8">
        <label>8</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Firouzi</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Rashidi</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Hashemi</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Kangavari</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Bahari</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Daryani</surname>
              <given-names>NE</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>A decision tree-based approach for determining low bone mineral density in inflammatory bowel disease using WEKA software</article-title>
          <source>Eur J Gastroenterol Hepatol</source>
          <year>2007</year>
          <volume>19</volume>
          <fpage>1075</fpage>
          <lpage>1081</lpage>
        </citation>
      </ref>
      <ref id="R9">
        <label>9</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Han</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Kamber</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <source>Data mining: concepts and techniques</source>
          <year>2001</year>
          <publisher-loc>San Francisco, CA</publisher-loc>
          <publisher-name>Morgan Kaufmann Publ</publisher-name>
        </citation>
      </ref>
      <ref id="R10">
        <label>10</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Huang</surname>
              <given-names>YC</given-names>
            </name>
          </person-group>
          <article-title>The application of data mining to explore association rules between metabolic syndrome and lifestyles</article-title>
          <source>HIM J</source>
          <year>2013</year>
          <volume>42</volume>
          <issue>3</issue>
          <fpage>29</fpage>
          <lpage>36</lpage>
        </citation>
      </ref>
      <ref id="R11">
        <label>11</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Iavindrasana</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Cohen</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Depeursinge</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Muller</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Meyer</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Geissbuhler</surname>
              <given-names>A</given-names>
            </name>
          </person-group>
          <article-title>Clinical data mining: a review</article-title>
          <source>Yearb Med Inform</source>
          <year>2009</year>
          <fpage>121</fpage>
          <lpage>133</lpage>
        </citation>
      </ref>
      <ref id="R12">
        <label>12</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Isarankura-Na-Ayudhya</surname>
              <given-names>C</given-names>
            </name>
          </person-group>
          <source>Protein engineering: innovation in developing biomolecules of the century</source>
          <year>2009</year>
          <publisher-loc>Nonthaburi, Thailand</publisher-loc>
          <publisher-name>Process Color Design &#x26; Printing Ltd Partnership</publisher-name>
        </citation>
      </ref>
      <ref id="R13">
        <label>13</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Karimi-Alavijeh</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Jalili</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Sadeghi</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>Predicting metabolic syndrome using decision tree and support vector machine methods</article-title>
          <source>ARYA Atheroscler</source>
          <year>2016</year>
          <volume>12</volume>
          <fpage>146</fpage>
          <lpage>152</lpage>
        </citation>
      </ref>
      <ref id="R14">
        <label>14</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Kim</surname>
              <given-names>TN</given-names>
            </name>
            <name>
              <surname>Kim</surname>
              <given-names>JM</given-names>
            </name>
            <name>
              <surname>Won</surname>
              <given-names>JC</given-names>
            </name>
            <name>
              <surname>Park</surname>
              <given-names>MS</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>SK</given-names>
            </name>
            <name>
              <surname>Yoon</surname>
              <given-names>SH</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>A decision tree-based approach for identifying urban-rural differences in metabolic syndrome risk factors in the adult Korean population</article-title>
          <source>J Endocrinol Invest</source>
          <year>2012</year>
          <volume>35</volume>
          <fpage>847</fpage>
          <lpage>852</lpage>
        </citation>
      </ref>
      <ref id="R15">
        <label>15</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Ko</surname>
              <given-names>GT</given-names>
            </name>
            <name>
              <surname>Tang</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Chan</surname>
              <given-names>JC</given-names>
            </name>
            <name>
              <surname>Sung</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Wu</surname>
              <given-names>MM</given-names>
            </name>
            <name>
              <surname>Wai</surname>
              <given-names>HP</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Lower BMI cutoff value to define obesity in Hong Kong Chinese: an analysis based on body fat assessment by bioelectrical impedance</article-title>
          <source>Br J Nutr</source>
          <year>2001</year>
          <volume>85</volume>
          <fpage>239</fpage>
          <lpage>242</lpage>
        </citation>
      </ref>
      <ref id="R16">
        <label>16</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Koh</surname>
              <given-names>HC</given-names>
            </name>
            <name>
              <surname>Tan</surname>
              <given-names>G</given-names>
            </name>
          </person-group>
          <article-title>Data mining applications in healthcare</article-title>
          <source>J Healthc Inf Manag</source>
          <year>2005</year>
          <volume>19</volume>
          <fpage>64</fpage>
          <lpage>72</lpage>
        </citation>
      </ref>
      <ref id="R17">
        <label>17</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Kuo</surname>
              <given-names>WJ</given-names>
            </name>
            <name>
              <surname>Chang</surname>
              <given-names>RF</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>DR</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>CC</given-names>
            </name>
          </person-group>
          <article-title>Data mining with decision trees for diagnosis of breast tumor in medical ultrasonic images</article-title>
          <source>Breast Cancer Res Treat</source>
          <year>2001</year>
          <volume>66</volume>
          <fpage>51</fpage>
          <lpage>57</lpage>
        </citation>
      </ref>
      <ref id="R18">
        <label>18</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lee</surname>
              <given-names>CM</given-names>
            </name>
            <name>
              <surname>Huxley</surname>
              <given-names>RR</given-names>
            </name>
            <name>
              <surname>Woodward</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Zimmet</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Shaw</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Cho</surname>
              <given-names>NH</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>The metabolic syndrome identifies a heterogeneous group of metabolic component combinations in the Asia-Pacific region</article-title>
          <source>Diabetes Res Clin Pract</source>
          <year>2008</year>
          <volume>81</volume>
          <fpage>377</fpage>
          <lpage>380</lpage>
        </citation>
      </ref>
      <ref id="R19">
        <label>19</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lee</surname>
              <given-names>IN</given-names>
            </name>
            <name>
              <surname>Liao</surname>
              <given-names>SC</given-names>
            </name>
            <name>
              <surname>Embrechts</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>Data mining techniques applied to medical information</article-title>
          <source>Med Inform Internet Med</source>
          <year>2000</year>
          <volume>25</volume>
          <fpage>81</fpage>
          <lpage>102</lpage>
        </citation>
      </ref>
      <ref id="R20">
        <label>20</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Lin</surname>
              <given-names>CC</given-names>
            </name>
            <name>
              <surname>Bai</surname>
              <given-names>YM</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>JY</given-names>
            </name>
            <name>
              <surname>Hwang</surname>
              <given-names>TJ</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>TT</given-names>
            </name>
            <name>
              <surname>Chiu</surname>
              <given-names>HW</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Easy and low-cost identification of metabolic syndrome in patients treated with second-generation antipsychotics: artificial neural network and logistic regression models</article-title>
          <source>J Clin Psychiatry</source>
          <year>2010</year>
          <volume>71</volume>
          <fpage>225</fpage>
          <lpage>234</lpage>
        </citation>
      </ref>
      <ref id="R21">
        <label>21</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Matthews</surname>
              <given-names>BW</given-names>
            </name>
          </person-group>
          <article-title>Comparison of the predicted and observed secondary structure of T4 phage lysozyme</article-title>
          <source>Biochim Biophys Acta</source>
          <year>1975</year>
          <volume>405</volume>
          <fpage>442</fpage>
          <lpage>451</lpage>
        </citation>
      </ref>
      <ref id="R22">
        <label>22</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Miller</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Fridline</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Liu</surname>
              <given-names>P-Y</given-names>
            </name>
            <name>
              <surname>Marino</surname>
              <given-names>D</given-names>
            </name>
          </person-group>
          <article-title>Use of CHAID decision trees to formulate pathways for the early detection of metabolic syndrome in young adults</article-title>
          <source>Comput Math Methods Med</source>
          <year>2014</year>
          <volume>2014</volume>
          <fpage>1</fpage>
          <lpage>7</lpage>
        </citation>
      </ref>
      <ref id="R23">
        <label>23</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Morimoto</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Nishimura</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Suzuki</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Matsudaira</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Taki</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Tsujino</surname>
              <given-names>D</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Low prevalence of metabolic syndrome and its components in rural Japan</article-title>
          <source>Tohoku J Exp Med</source>
          <year>2008</year>
          <volume>216</volume>
          <fpage>69</fpage>
          <lpage>75</lpage>
        </citation>
      </ref>
      <ref id="R24">
        <label>24</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Nahar</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Tickle</surname>
              <given-names>KS</given-names>
            </name>
            <name>
              <surname>Ali</surname>
              <given-names>AB</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>YP</given-names>
            </name>
          </person-group>
          <article-title>Significant cancer prevention factor extraction: an association rule discovery approach</article-title>
          <source>J Med Syst</source>
          <year>2011</year>
          <volume>35</volume>
          <fpage>353</fpage>
          <lpage>367</lpage>
        </citation>
      </ref>
      <ref id="R25">
        <label>25</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Isarankura-Na-Ayudhya</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Naenna</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
          </person-group>
          <article-title>A practical overview of quantitative structure-activity relationship</article-title>
          <source>EXCLI J</source>
          <year>2009</year>
          <volume>8</volume>
          <fpage>74</fpage>
          <lpage>88</lpage>
        </citation>
      </ref>
      <ref id="R26">
        <label>26</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Isarankura-Na-Ayudhya</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
          </person-group>
          <article-title>Advances in computational methods to predict the biological activity of compounds</article-title>
          <source>Expert Opin Drug Discov</source>
          <year>2010</year>
          <volume>5</volume>
          <fpage>633</fpage>
          <lpage>654</lpage>
        </citation>
      </ref>
      <ref id="R27">
        <label>27</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
          </person-group>
          <article-title>Maximizing computational tools for successful drug discovery</article-title>
          <source>Expert Opin Drug Discov</source>
          <year>2015</year>
          <volume>10</volume>
          <fpage>321</fpage>
          <lpage>329</lpage>
        </citation>
      </ref>
      <ref id="R28">
        <label>28</label>
        <citation citation-type="journal">
          <collab>NCEP ATP III</collab>
          <article-title>Expert panel on detection, evaluation, and treatment of high blood cholesterol in adults. Executive summary of the third report of the national cholesterol education program (NCEP) expert panel on detection, evaluation, and treatment of high blood cholesterol in adults (adult treatment panel III)</article-title>
          <source>JAMA</source>
          <year>2001</year>
          <volume>285</volume>
          <fpage>2486</fpage>
          <lpage>2497</lpage>
        </citation>
      </ref>
      <ref id="R29">
        <label>29</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Nisbet</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Elder</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Miner</surname>
              <given-names>G</given-names>
            </name>
          </person-group>
          <source>Handbook of statistical analysis and data mining applications</source>
          <year>2009</year>
          <publisher-loc>Amsterdam</publisher-loc>
          <publisher-name>Elsevier</publisher-name>
        </citation>
      </ref>
      <ref id="R30">
        <label>30</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Obenshain</surname>
              <given-names>MK</given-names>
            </name>
          </person-group>
          <article-title>Application of data mining techniques to healthcare data</article-title>
          <source>Infect Control Hosp Epidemiol</source>
          <year>2004</year>
          <volume>25</volume>
          <fpage>690</fpage>
          <lpage>695</lpage>
        </citation>
      </ref>
      <ref id="R31">
        <label>31</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Oh</surname>
              <given-names>SW</given-names>
            </name>
            <name>
              <surname>Shin</surname>
              <given-names>SA</given-names>
            </name>
            <name>
              <surname>Yun</surname>
              <given-names>YH</given-names>
            </name>
            <name>
              <surname>Yoo</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Huh</surname>
              <given-names>BY</given-names>
            </name>
          </person-group>
          <article-title>Cut-off point of BMI and obesity-related comorbidities and mortality in middle-aged Koreans</article-title>
          <source>Obes Res</source>
          <year>2004</year>
          <volume>12</volume>
          <fpage>2031</fpage>
          <lpage>2040</lpage>
        </citation>
      </ref>
      <ref id="R32">
        <label>32</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Pan</surname>
              <given-names>WH</given-names>
            </name>
            <name>
              <surname>Flegal</surname>
              <given-names>KM</given-names>
            </name>
            <name>
              <surname>Chang</surname>
              <given-names>HY</given-names>
            </name>
            <name>
              <surname>Yeh</surname>
              <given-names>WT</given-names>
            </name>
            <name>
              <surname>Yeh</surname>
              <given-names>CJ</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>WC</given-names>
            </name>
          </person-group>
          <article-title>Body mass index and obesity-related metabolic disorders in Taiwanese and US whites and blacks: implications for definitions of overweight and obesity for Asians</article-title>
          <source>Am J Clin Nutr</source>
          <year>2004</year>
          <volume>79</volume>
          <fpage>31</fpage>
          <lpage>39</lpage>
        </citation>
      </ref>
      <ref id="R33">
        <label>33</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
            <name>
              <surname>Worachartcheewan</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Songtawee</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Simeon</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
          </person-group>
          <article-title>Computer-aided drug design of bioactive natural products</article-title>
          <source>Curr Top Med Chem</source>
          <year>2015</year>
          <volume>15</volume>
          <fpage>1780</fpage>
          <lpage>1800</lpage>
        </citation>
      </ref>
      <ref id="R34">
        <label>34</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Quentin-Trautvetter</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Devos</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Duhamel</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Beuscart</surname>
              <given-names>R</given-names>
            </name>
          </person-group>
          <article-title>Assessing association rules and decision trees on analysis of diabetes data from the DiabCare program in France</article-title>
          <source>Stud Health Technol Inform</source>
          <year>2002</year>
          <volume>90</volume>
          <fpage>557</fpage>
          <lpage>561</lpage>
        </citation>
      </ref>
      <ref id="R35">
        <label>35</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Ryan</surname>
              <given-names>MC</given-names>
            </name>
            <name>
              <surname>Fenster Farin</surname>
              <given-names>HM</given-names>
            </name>
            <name>
              <surname>Abbasi</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Reaven</surname>
              <given-names>GM</given-names>
            </name>
          </person-group>
          <article-title>Comparison of waist circumference versus body mass index in diagnosing metabolic syndrome and identifying apparently healthy subjects at increased risk of cardiovascular disease</article-title>
          <source>Am J Cardiol</source>
          <year>2008</year>
          <volume>102</volume>
          <fpage>40</fpage>
          <lpage>46</lpage>
        </citation>
      </ref>
      <ref id="R36">
        <label>36</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Shearer</surname>
              <given-names>C</given-names>
            </name>
          </person-group>
          <article-title>The CRISP-DM model: the new blueprint for data mining</article-title>
          <source>J Data Warehousing</source>
          <year>2000</year>
          <volume>5</volume>
          <fpage>13</fpage>
          <lpage>22</lpage>
        </citation>
      </ref>
      <ref id="R37">
        <label>37</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Su</surname>
              <given-names>C-T</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>C-H</given-names>
            </name>
            <name>
              <surname>Hsu</surname>
              <given-names>K-H</given-names>
            </name>
            <name>
              <surname>Chiu</surname>
              <given-names>W-K</given-names>
            </name>
          </person-group>
          <article-title>Data mining for the diagnosis of type II diabetes from three-dimensional body surface anthropometrical scanning data</article-title>
          <source>Comput Math Appl</source>
          <year>2006</year>
          <volume>51</volume>
          <fpage>1075</fpage>
          <lpage>1092</lpage>
        </citation>
      </ref>
      <ref id="R38">
        <label>38</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Tantimongcolwat</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Naenna</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Isarankura-Na-Ayudhya</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Embrechts</surname>
              <given-names>MJ</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
          </person-group>
          <article-title>Identification of ischemic heart disease via machine learning analysis on magnetocardiograms</article-title>
          <source>Comput Biol Med</source>
          <year>2008</year>
          <volume>38</volume>
          <fpage>817</fpage>
          <lpage>825</lpage>
        </citation>
      </ref>
      <ref id="R39">
        <label>39</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Thakur</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Olafsson</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>J-S</given-names>
            </name>
            <name>
              <surname>Hurburgh</surname>
              <given-names>CR</given-names>
            </name>
          </person-group>
          <article-title>Data mining for recognizing patterns in foodborne disease outbreaks</article-title>
          <source>J Food Eng</source>
          <year>2010</year>
          <volume>97</volume>
          <fpage>213</fpage>
          <lpage>227</lpage>
        </citation>
      </ref>
      <ref id="R40">
        <label>40</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Ting</surname>
              <given-names>SL</given-names>
            </name>
            <name>
              <surname>Shum</surname>
              <given-names>CC</given-names>
            </name>
            <name>
              <surname>Kwok</surname>
              <given-names>SK</given-names>
            </name>
            <name>
              <surname>Tsang</surname>
              <given-names>AHC</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>WB</given-names>
            </name>
          </person-group>
          <article-title>Data mining in biomedicine: current applications and further directions for research</article-title>
          <source>J Software Eng Appl</source>
          <year>2009</year>
          <volume>2</volume>
          <fpage>150</fpage>
          <lpage>159</lpage>
        </citation>
      </ref>
      <ref id="R41">
        <label>41</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Wei</surname>
              <given-names>CK</given-names>
            </name>
            <name>
              <surname>Su</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Yang</surname>
              <given-names>MC</given-names>
            </name>
          </person-group>
          <article-title>Application of data mining on the development of a disease distribution map of screened community residents of Taipei county in Taiwan</article-title>
          <source>J Med Syst</source>
          <year>2012</year>
          <volume>36</volume>
          <fpage>2021</fpage>
          <lpage>2027</lpage>
        </citation>
      </ref>
      <ref id="R42">
        <label>42</label>
        <citation citation-type="web">
          <collab>WHO</collab>
          <article-title>Cardiovascular diseases</article-title>
          <year>2007</year>
          <access-date>15 January 2009</access-date>
          <publisher-loc>Geneva</publisher-loc>
          <publisher-name>WHO</publisher-name>
          <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://www.who.int/mediacentre/factsheets/fs317/en/index.html">http://www.who.int/mediacentre/factsheets/fs317/en/index.html</ext-link></comment>
        </citation>
      </ref>
      <ref id="R43">
        <label>43</label>
        <citation citation-type="web">
          <collab>WHO</collab>
          <article-title>Diabetes</article-title>
          <year>2008</year>
          <access-date>15 January 2009</access-date>
          <publisher-loc>Geneva</publisher-loc>
          <publisher-name>WHO</publisher-name>
          <comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://www.who.int/mediacentre/factsheets/fs312/en/index.html">http://www.who.int/mediacentre/factsheets/fs312/en/index.html</ext-link></comment>
        </citation>
      </ref>
      <ref id="R44">
        <label>44</label>
        <citation citation-type="book">
          <collab>WHO</collab>
          <source>International Association for the Study of Obesity, International Obesity Taskforce. The Asia-Pacific perspective: redefining obesity and its treatment</source>
          <year>2000</year>
          <publisher-loc>Sydney</publisher-loc>
          <publisher-name>Health Communications</publisher-name>
        </citation>
      </ref>
      <ref id="R45">
        <label>45</label>
        <citation citation-type="book">
          <collab>WHO</collab>
          <source>World Health Organization consultation, definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: Diagnosis and classification of diabetes mellitus</source>
          <year>1999</year>
          <publisher-loc>Geneva</publisher-loc>
          <publisher-name>World Health Organization</publisher-name>
        </citation>
      </ref>
      <ref id="R46">
        <label>46</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Worachartcheewan</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Isarankura-Na-Ayudhya</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Pidetcha</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
          </person-group>
          <article-title>Identification of metabolic syndrome using decision tree analysis</article-title>
          <source>Diabetes Res Clin Pract</source>
          <year>2010</year>
          <volume>90</volume>
          <fpage>e15</fpage>
          <lpage>e18</lpage>
        </citation>
      </ref>
      <ref id="R47">
        <label>47</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Worachartcheewan</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Isarankura-Na-Ayudhya</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Pidetcha</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
          </person-group>
          <article-title>Lower BMI cutoff for assessing the prevalence of metabolic syndrome in Thai population</article-title>
          <source>Acta Diabetol</source>
          <year>2010</year>
          <volume>47</volume>
          <issue>Suppl 1</issue>
          <fpage>S91</fpage>
          <lpage>S96</lpage>
        </citation>
      </ref>
      <ref id="R48">
        <label>48</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Worachartcheewan</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Isarankura-Na-Ayudhya</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
          </person-group>
          <article-title>Quantitative population-health relationship (QPHR) for assessing metabolic syndrome</article-title>
          <source>EXCLI J</source>
          <year>2013</year>
          <volume>12</volume>
          <fpage>569</fpage>
          <lpage>583</lpage>
        </citation>
      </ref>
      <ref id="R49">
        <label>49</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Worachartcheewan</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Shoombuatong</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Pidetcha</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Nopnithipat</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Prachayasittikul</surname>
              <given-names>V</given-names>
            </name>
            <name>
              <surname>Nantasenamat</surname>
              <given-names>C</given-names>
            </name>
          </person-group>
          <article-title>Predicting metabolic syndrome using the random forest method</article-title>
          <source>Sci World J</source>
          <year>2015</year>
          <volume>2015</volume>
          <fpage>581501</fpage>
        </citation>
      </ref>
      <ref id="R50">
        <label>50</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Yeh</surname>
              <given-names>D-Y</given-names>
            </name>
            <name>
              <surname>Cheng</surname>
              <given-names>C-H</given-names>
            </name>
            <name>
              <surname>Chen</surname>
              <given-names>Y-W</given-names>
            </name>
          </person-group>
          <article-title>A predictive model for cerebrovascular disease using data mining</article-title>
          <source>Expert Syst Appl</source>
          <year>2011</year>
          <volume>38</volume>
          <fpage>8970</fpage>
          <lpage>8977</lpage>
        </citation>
      </ref>
      <ref id="R51">
        <label>51</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Yoo</surname>
              <given-names>I</given-names>
            </name>
            <name>
              <surname>Alafaireet</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Marinov</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Pena-Hernandez</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Gopidi</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Chang</surname>
              <given-names>JF</given-names>
            </name>
            <etal />
          </person-group>
          <article-title>Data mining in healthcare and biomedicine: a survey of the literature</article-title>
          <source>J Med Syst</source>
          <year>2012</year>
          <volume>36</volume>
          <fpage>2431</fpage>
          <lpage>2448</lpage>
        </citation>
      </ref>
    </ref-list>
  </back>
  <floats-wrap>
    <fig id="T1" position="float">
      <label>Table 1</label>
      <caption><title>Criteria for defining metabolic syndrome</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-t-001" />
    </fig>
    <fig id="T2" position="float">
      <label>Table 2</label>
      <caption><title>Typical data set format for data mining</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-t-002" />
    </fig>
    <fig id="T3" position="float">
      <label>Table 3</label>
      <caption><title>Examples of applications of data mining to medical/clinical data</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-t-003" />
    </fig>
    <fig id="T4" position="float">
      <label>Table 4</label>
      <caption><title>The concept of QPHR and QSAR/QSPR models</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-t-004" />
    </fig>
    <fig id="T5" position="float">
      <label>Table 5</label>
      <caption><title>Summary of studies identifying MS using data mining techniques</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-t-005" />
    </fig>
    <fig id="F1" position="float">
      <label>Figure 1</label>
      <caption><title>Risk factors for developing diseases and applications of data mining techniques for assessing health status. AA: association rule analysis, ANN: artificial neural network, DT: decision tree analysis, HCA: hierarchical cluster analysis, <italic>k</italic>NN: <italic>k</italic>-nearest neighbor, MLR: multiple linear regression, PCA: principal component analysis, PLS: partial least squares, RF: random forest, SOM: self-organizing map, SVM: support vector machine</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-g-001" />
    </fig>
    <fig id="F2" position="float">
      <label>Figure 2</label>
      <caption><title>Schematic representation of the QPHR models</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-17-72-g-002" />
    </fig>
  </floats-wrap>
</article>