<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">EXCLI J</journal-id>
      <journal-title>EXCLI Journal</journal-title>
      <issn pub-type="epub">1611-2156</issn>
      <publisher>
        <publisher-name>Leibniz Research Centre for Working Environment and Human Factors</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">2015-302</article-id>
	  <article-id pub-id-type="doi">10.17179/excli2015-302</article-id>
      <article-id pub-id-type="pii">Doc1232</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Original article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Kumar</surname>
            <given-names>Manish</given-names>
          </name>
          <xref ref-type="corresp" rid="COR1">&#x0002a;</xref>
          <xref ref-type="aff" rid="A1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="A1">
        <label>1</label>Department of Computer Science and Engineering, Indian School of Mines, Dhanbad, Jharkhand, India</aff>
      <author-notes>
        <corresp id="COR1">*To whom correspondence should be addressed: Manish Kumar, Department of Computer Science and Engineering, Indian School of Mines, Dhanbad, Jharkhand, India, E-mail: <email>manishkumar@cse.ism.ac.in</email></corresp>
      </author-notes>
      <pub-date pub-type="epub">
        <day>15</day>
        <month>12</month>
        <year>2015</year>
      </pub-date>
      <pub-date pub-type="collection">
        <year>2015</year>
      </pub-date>
      <volume>14</volume>
      <fpage>1232</fpage>
	  <lpage>1255</lpage>
      <history>
        <date date-type="received">
          <day>01</day>
          <month>05</month>
          <year>2015</year>
        </date>
        <date date-type="accepted">
          <day>19</day>
          <month>11</month>
          <year>2015</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>Copyright &#xA9; 2015 Kumar</copyright-statement>
        <copyright-year>2015</copyright-year>
        <license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
          <p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/) You are free to copy, distribute and transmit the work, provided the original author and source are credited.</p>
        </license>
      </permissions>
      <self-uri xlink:href="http://www.excli.de/vol14/Kumar_15122015_proof.pdf">This article is available from http://www.excli.de/vol14/Kumar_15122015_proof.pdf</self-uri>
      <abstract><p>Multiple sequence alignment (MSA) is one of the most fundamental operations in biological sequence analysis. The core of the MSA problem is to determine the most biologically plausible alignment of a set of protein or DNA sequences. In this paper, an alignment method for MSA based on a genetic algorithm is proposed. Two kinds of genetic operators, crossover and mutation, were defined and implemented in the proposed method in order to study the evolution of the population and the quality of the aligned sequences. The proposed method is assessed on a protein benchmark dataset, BAliBase, by comparing its results with those obtained by other alignment algorithms such as SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8. Experiments on a wide range of data show that the proposed algorithm performs much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality.</p></abstract>
      <kwd-group>
        <kwd>bioinformatics</kwd>
        <kwd>multiple sequence alignment</kwd>
        <kwd>genetic algorithm</kwd>
        <kwd>crossover operator</kwd>
        <kwd>mutation operator</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>Introduction</title><p>The alignment of three or more biological sequences, such as protein, DNA or RNA sequences (Auyeung and Melcher, 2005[<xref ref-type="bibr" rid="R3">3</xref>]; Wei et al., 2013[<xref ref-type="bibr" rid="R66">66</xref>]), is known as multiple sequence alignment (Hamidi et al., 2013[<xref ref-type="bibr" rid="R21">21</xref>]). Sequence alignment is one of the standard techniques in bioinformatics for revealing the relationships among collections of evolutionarily or structurally related proteins. </p><p>Sequence alignment is used extensively to improve predictions of the secondary and tertiary structure of protein and RNA sequences, which in turn are used in drug design, and to estimate the distance between organisms. In MSA, the main effort is to find the optimal alignment for a group of biological sequences. Previous research has produced several reliable and efficient techniques for aligning multiple sequences, including evolutionary algorithms such as the genetic algorithm (GA) (Peng et al., 2011[<xref ref-type="bibr" rid="R51">51</xref>]), hidden Markov models (HMM) (Eddy, 1998[<xref ref-type="bibr" rid="R13">13</xref>]) and simulated annealing, a generic probabilistic metaheuristic for global optimization (Kirkpatrick et al., 1983[<xref ref-type="bibr" rid="R32">32</xref>]). </p><p>Sequence similarity, a subfield of sequence analysis, is one of the most widely studied topics in bioinformatics. The available molecular sequence data are a rich resource that can teach us about the structure, function and evolution of biological macromolecules. The main objective of an MSA is to align sequences so that the alignment reflects the biological relationship between the input sequences, but developing a reliable MSA program is never easy. In general, the MSA problem can be stated as follows: N sequences are supplied as input, together with a predetermined scoring scheme for finding the best matches among the letters (every sequence consists of a series of letters).
Although this definition is simple, it still requires several choices before it is complete, such as the selection of the input sequences, the comparison model and the optimization of that model. Several issues in the alignment of protein sequences are reported in the literature (Aniba et al., 2010[<xref ref-type="bibr" rid="R1">1</xref>]; Pop and Salzberg, 2008[<xref ref-type="bibr" rid="R53">53</xref>]; Sellers, 1984[<xref ref-type="bibr" rid="R55">55</xref>]). First, the protein families described in sequence databases have complex multi-domain architectures with large unstructured regions. Second, new sequences selected by automatic methods contain a considerable number of sequence errors (Yonghua et al., 2004[<xref ref-type="bibr" rid="R70">70</xref>]; Wen and Tan, 1996[<xref ref-type="bibr" rid="R68">68</xref>]). </p><p>Various methods can be used to solve the MSA problem, such as iterative (Mohsen et al., 2007[<xref ref-type="bibr" rid="R39">39</xref>]), classical and progressive algorithms (Kupis and Mandziuk, 2007[<xref ref-type="bibr" rid="R33">33</xref>]). All these algorithms are based on global or local alignment (Wei et al., 2013[<xref ref-type="bibr" rid="R66">66</xref>]; Changjin and Tewfik, 2009[<xref ref-type="bibr" rid="R8">8</xref>]; Ankit and Huang, 2008[<xref ref-type="bibr" rid="R2">2</xref>]) techniques. The global alignment technique aligns the sequences from end to end, whereas the local alignment technique first identifies a substring within a string and then tries to align it with the target string.</p><p>In general, local alignment is preferred for sequence alignment, but it sometimes creates problems because it carries the additional challenge of identifying the regions of similarity.
The Smith-Waterman algorithm (Haoyue et al., 2009[<xref ref-type="bibr" rid="R22">22</xref>]) and the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970[<xref ref-type="bibr" rid="R43">43</xref>]) are the dynamic programming based approaches most commonly used for local and global alignment, respectively. The dynamic programming (DP) (Zhimin and Zhong, 2013[<xref ref-type="bibr" rid="R72">72</xref>]) approach is considered a good alignment option for no more than two sequences. It should be noted that MSA is a combinatorial problem (NP-hard) (Kececioglu and Starrett, 2004[<xref ref-type="bibr" rid="R30">30</xref>]), and as the number of sequences increases the computational effort becomes prohibitive. Feng and Doolittle (1987[<xref ref-type="bibr" rid="R15">15</xref>]) proposed a progressive alignment algorithm (tree-based algorithm), which uses the method of Needleman and Wunsch and constructs an evolutionary tree (Bhattacharjee et al., 2006[<xref ref-type="bibr" rid="R4">4</xref>]) to determine the relationship between sequences. Progressive alignment algorithms operate by following the branching order of a guide tree and thus often get trapped in local optima (Naznin et al., 2012[<xref ref-type="bibr" rid="R42">42</xref>]). To avoid such local optima, the literature suggests using either a stochastic or an iterative procedure (Mohsen et al., 2007[<xref ref-type="bibr" rid="R39">39</xref>]; Gotoh, 1982[<xref ref-type="bibr" rid="R19">19</xref>]).
</p><p>From various studies in the literature (Devereux et al., 1984[<xref ref-type="bibr" rid="R11">11</xref>]; Jagadamba et al., 2011[<xref ref-type="bibr" rid="R27">27</xref>]; Nguyen and Yi, 2011[<xref ref-type="bibr" rid="R45">45</xref>]; Katoh et al., 2005[<xref ref-type="bibr" rid="R29">29</xref>]; Pei and Grishin, 2007[<xref ref-type="bibr" rid="R50">50</xref>]; Li et al., 2004[<xref ref-type="bibr" rid="R35">35</xref>]; Ma et al., 2002[<xref ref-type="bibr" rid="R37">37</xref>]; Pearson, 2000[<xref ref-type="bibr" rid="R49">49</xref>]), it can be concluded that none of the existing algorithms is accurate enough to provide an optimal alignment for all datasets. As a result, iterative algorithms (Mohsen et al., 2007[<xref ref-type="bibr" rid="R39">39</xref>]) using iterative refinement strategies (Gotoh, 1982[<xref ref-type="bibr" rid="R19">19</xref>]), Hidden Markov Models (Eddy, 1998[<xref ref-type="bibr" rid="R13">13</xref>]) or Genetic Algorithms (Peng et al., 2011[<xref ref-type="bibr" rid="R51">51</xref>]) were developed to construct more reliable and efficient multiple alignments. These methods have shown their superiority in aligning distantly related sequences for a variety of datasets (Blackshields et al., 2006[<xref ref-type="bibr" rid="R5">5</xref>]; Thompson et al., 1999[<xref ref-type="bibr" rid="R62">62</xref>]). However, some accuracy is still lost when distantly related sequences are considered. </p><p>The discussion above clearly indicates that none of the methods listed can provide an accurate or meaningful alignment in all possible situations, whatever their individual advantages or disadvantages. Progressive alignment methods are known to be very fast and deterministic, but they suffer from the problem that an error made in an initial alignment is propagated to the other sequences and cannot be corrected afterwards.
However, this problem does not exist for iterative methods. In general, iterative methods are much slower than progressive methods and are used where the best possible alignment, rather than the computational cost, is of prime importance. </p><p>Evolutionary algorithms such as the genetic algorithm, which are based on natural selection, are used to implement iterative methods. Such algorithms have an edge over others in that they are independent of the type of scoring function, so different objective functions can easily be tested without much alteration to the alignment procedure. Evolutionary algorithms can also take advantage of low-cost clusters and multi-core processors, because they are easily parallelized, which matches the current trend. </p><p>In this study, a genetic algorithm (Pengfei et al., 2010[<xref ref-type="bibr" rid="R52">52</xref>]) is considered for experimental analysis. The main advantage of using a GA for the MSA problem is that it does not require any problem-specific algorithm to solve a given problem. The only requirement of a GA is a fitness function (Dongardive and Abraham, 2012[<xref ref-type="bibr" rid="R12">12</xref>]) for the analysis and evaluation of solutions. Because GA is a highly implicitly parallel technique, it can be used to solve various large-scale and real-time problems such as the travelling salesman problem (Zhang and Wong, 1997[<xref ref-type="bibr" rid="R71">71</xref>]; Ulder et al., 1991[<xref ref-type="bibr" rid="R63">63</xref>]). Short sequences can be aligned manually, but longer sequences require an algorithm for successful alignment. Progressive alignment techniques, such as those based on dynamic programming (DP), suffer from early convergence to local optima and hence cannot be used for the alignment of longer sequences.
Since this research work is based on sequences of larger length (see Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>), a GA is preferred over DP.</p><p>The expected importance of protein sequences in the near future (Thompson et al., 2011[<xref ref-type="bibr" rid="R61">61</xref>]) motivated the author to consider the MSA of protein sequences in this research work. To date, sequence homology is considered the main method for predicting protein structure and function along with evolutionary history (Kimura, 1980[<xref ref-type="bibr" rid="R31">31</xref>]). In recent years, the tools (Gelly et al., 2011[<xref ref-type="bibr" rid="R16">16</xref>]) for MSA of protein sequences have improved. Various related studies have confirmed that further improvement in the alignment of protein sequences is only possible by combining sequence alignment with known protein structures. Better alignment performance for protein sequences can be expected from properly utilizing the phylogenetic relationships among sequences (Cai et al., 2000[<xref ref-type="bibr" rid="R7">7</xref>]). </p><p>Literature studies (Wong et al., 2000[<xref ref-type="bibr" rid="R69">69</xref>]; Taylor, 2000[<xref ref-type="bibr" rid="R58">58</xref>]; Razmara et al., 2009[<xref ref-type="bibr" rid="R54">54</xref>]; Mott, 2005[<xref ref-type="bibr" rid="R41">41</xref>]) indicate that a number of challenges remain in aligning protein sequences. First, misaligned or poorly aligned locally conserved regions within the sequences are the foremost challenge. Second, motifs found in natively disordered regions are often misaligned. Third, the protein sequences found in various databases across the globe contain a large number of alignment errors (Loytynoja and Goldman, 2008[<xref ref-type="bibr" rid="R36">36</xref>]).
</p><p>On the basis of the literature survey (Devereux et al., 1984[<xref ref-type="bibr" rid="R11">11</xref>]; Jagadamba et al., 2011[<xref ref-type="bibr" rid="R27">27</xref>]; Nguyen and Yi, 2011[<xref ref-type="bibr" rid="R45">45</xref>]; Razmara et al., 2009[<xref ref-type="bibr" rid="R54">54</xref>]; Mott, 2005[<xref ref-type="bibr" rid="R41">41</xref>]) and in order to test the feasibility of the proposed approach, a comparison study was made between the proposed method and some existing methods such as SAGA (Notredame and Higgins, 1996[<xref ref-type="bibr" rid="R46">46</xref>]), MSA-GA (Gondro and Kinghorn, 2007[<xref ref-type="bibr" rid="R18">18</xref>]), RBT-GA (Taheri and Zomaya, 2009[<xref ref-type="bibr" rid="R57">57</xref>]), CLUSTALX (Thompson et al., 1997[<xref ref-type="bibr" rid="R59">59</xref>]), CLUSTALW (Thompson et al., 1994[<xref ref-type="bibr" rid="R60">60</xref>]), HMMT (Eddy, 1995[<xref ref-type="bibr" rid="R14">14</xref>]), PRRP (Gotoh, 1996[<xref ref-type="bibr" rid="R19">19</xref>]), PILEUP8 (Devereux et al., 1984[<xref ref-type="bibr" rid="R11">11</xref>]) and DIALIGN (Morgenstern et al., 1996[<xref ref-type="bibr" rid="R40">40</xref>]) by calculating the corresponding BAliscore. Some of these methods are iterative and some are progressive. Each has its own advantages and disadvantages in terms of speed, time, convergence, robustness and ability to align sequences of different lengths. The factors that led the author to select these methods for the experimental study are described in the paragraph that follows. </p><p>SAGA, MSA-GA and RBT-GA are GA-based methods. SAGA has a larger time complexity but does not suffer from the problem of local minima. RBT is an iterative algorithm for sequence alignment using a DP table.
CLUSTALW is an example of the progressive approach and can be used to sort out the local optimality problem of progressive alignment. It is the most popular, accurate and practical method in the category of hierarchical methods. CLUSTAL W and CLUSTAL X are the most widely used programs for MSA. They are very fast, easy to handle and capable of aligning medium-sized datasets. The alignments produced by these methods are of sufficient quality and do not require manual editing or adjustment. HMMT is based on the simulated annealing method. PRRP is a robust global alignment program based on a progressive and iterative approach. PIMA (Smith and Smith, 1992[<xref ref-type="bibr" rid="R56">56</xref>]) uses local dynamic programming to align only the most conserved motifs. DIALIGN (Morgenstern et al., 1996[<xref ref-type="bibr" rid="R40">40</xref>]) uses a local alignment approach that constructs the MSA from segment-to-segment comparisons rather than residue-to-residue comparisons.</p><p>T-Coffee (Notredame et al., 2000[<xref ref-type="bibr" rid="R47">47</xref>]) is able to make very accurate alignments of very divergent proteins, but only for small sets of sequences, and is therefore not considered in this experimental study. This method is also often trapped in local minima and has a high computational cost compared with the other methods mentioned above. MAFFT (Katoh et al., 2005[<xref ref-type="bibr" rid="R29">29</xref>]) is very fast and can align hundreds to thousands of sequences. Its alignment accuracy is quite similar to that of CLUSTAL. This method is also not considered in the proposed research work, because the dataset and the fitness measure used by this algorithm are entirely different from those used in this experimental approach.</p><p>The rest of the paper is organized as follows.
The next section describes the relevant preliminaries on alignment, sequence alignment, MSA, GA, BAliBase and the PAM matrix, followed by the proposed approach section, which describes the concepts underlying this research work. The experimental setup required to validate and observe the results is discussed in the section after that. The second-to-last section presents the detailed results over different datasets. Finally, the concluding section presents the final considerations. </p></sec>
    <sec>
      <title>Preliminaries</title><p>This section gives a detailed account of the basic concepts behind the terms used in the paper: alignment, sequence alignment, multiple sequence alignment, gaps, genetic algorithm, BAliBase and the PAM matrix.</p><sec><title>Alignment </title><p>An alignment is an arrangement of two or more biological sequences that shows where the sequences are similar and where they differ. An alignment is said to be optimal if it exposes the greatest similarity and the least dissimilarity between the sequences. </p></sec><sec><title>Sequence alignment </title><p>Sequence alignment is a way of arranging biological sequences so as to identify regions of similarity that may be a result of structural, functional or evolutionary relationships between the sequences (Hicks et al., 2011[<xref ref-type="bibr" rid="R23">23</xref>]). In bioinformatics, the aligned sequences of DNA, RNA or protein are represented as the rows of a matrix. Gaps are inserted at some positions in the sequences so that each column contains as many similar characters as possible.</p><p>Sequence alignment aims to infer clues about an unknown sequence from the biological characteristics of the matched sequence. One of the most challenging aspects of sequence alignment is its repetitive and time-consuming alignment matrix computation (Weiwei and Sanzheng, 2000[<xref ref-type="bibr" rid="R67">67</xref>]). </p></sec><sec><title>Multiple sequence alignment </title><p>With reference to Figure 1<xref ref-type="fig" rid="F1">(Fig. 1)</xref>, multiple sequence alignment (MSA) can be defined as the optimal alignment of three or more sequences, with or without inserted gaps (Loytynoja and Goldman, 2008[<xref ref-type="bibr" rid="R36">36</xref>]). It plays an important role in sequence analysis and can also be used to judge and identify the similarity between DNA, RNA or protein sequences.
With these features, MSA has proved to be an important tool for predicting the function and&#x2F;or structure (Layeb and Deneche, 2007[<xref ref-type="bibr" rid="R34">34</xref>]) of an unknown protein sequence. </p><p>An MSA can be obtained by inserting gaps &#x201C;-&#x201D; at suitable places such that no column of the alignment contains only gap characters. The insertion of gaps yields sequences of equal length in the resulting alignment. </p><p>Note 1: Consider input strings N<sub>1</sub>, N<sub>2</sub>, ..., N<sub>p</sub>; an MSA maps them to strings M<sub>1</sub>, M<sub>2</sub>, ..., M<sub>p</sub>, where</p><p>1. &#x7C;M<sub>1</sub>&#x7C; &#x3D; &#x7C;M<sub>2</sub>&#x7C; &#x3D; ... &#x3D; &#x7C;M<sub>p</sub>&#x7C;</p><p>2. M<sub>i</sub> with all &#x201C;-&#x201D; gap characters removed is equal to N<sub>i</sub>.</p><p>3. No column contains only gap characters.</p><p>Various measures exist for evaluating an MSA. </p></sec><sec><title>Gaps </title><p>To obtain the best resulting alignment, gaps are permitted within the sequences, together with a user-defined mechanism for penalizing them. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. </p><p>The values of the gap penalties depend on the choice of substitution matrix, such as PAM250 (Dayhoff et al., 1978[<xref ref-type="bibr" rid="R10">10</xref>]) (refer to the PAM matrix section), PAM350 or BLOSUM, which are used for the sequence alignment of proteins. A substitution matrix assigns a score to every possible pair of aligned residues, and the gap penalties must be balanced against these values. A high gap penalty restricts the appearance of gaps within the alignment, whereas a gap penalty that is too low allows gaps to appear everywhere in the alignment.</p></sec><sec><title>Genetic algorithm </title><p>A genetic algorithm is a type of iterative algorithm that allows an efficient and robust search.
In the search process, a genetic algorithm starts from an initial state (population) in the solution space and, in every search step, produces a new and usually better set of solutions. At each stage, the GA moves towards a better solution, which may reduce the chance of getting trapped in a local extremum (Michalewicz, 1992[<xref ref-type="bibr" rid="R38">38</xref>]). Genetic algorithms are capable of handling large and complex problems (Jong, 1998[<xref ref-type="bibr" rid="R28">28</xref>]). Some applications of genetic algorithms to the MSA problem can be found in (Goldberg, 1987[<xref ref-type="bibr" rid="R17">17</xref>]; Grefenstette and Fitzpatrick, 1985[<xref ref-type="bibr" rid="R20">20</xref>]; Holland, 1975[<xref ref-type="bibr" rid="R25">25</xref>]; Hillsdale and Lawrence, 1987[<xref ref-type="bibr" rid="R24">24</xref>]; Buckles et al., 1990[<xref ref-type="bibr" rid="R6">6</xref>]). These references explain the GA approach and its ability to produce optimal solutions to the MSA problem for protein sequences. In addition, genetic algorithms have various merits that can be exploited for the prediction, alignment and classification of protein, DNA and RNA sequences and for their structural and behavioral study (Dandekar and Argos, 1992[<xref ref-type="bibr" rid="R9">9</xref>]; Unger and Moult, 1993[<xref ref-type="bibr" rid="R64">64</xref>]; van Batenburg et al., 1995[<xref ref-type="bibr" rid="R65">65</xref>]). </p><p>The major elements of a genetic algorithm are a representation of the solution space, a fitness function, reproduction, crossover and mutation. In every step of the GA, the genetic operators are applied to the solution space in order to produce new and better individuals for the coming generations. The search terminates when no further improvement is observed in a generation compared with the previous one, or when a predefined condition is met.
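As a hedged illustration of the elements listed above, the following minimal Python sketch shows one possible generic GA loop. The function name evolve, the parameters max_generations and patience, the choice of the fitter half of the population as parents and the elitism step are all assumptions made for illustration; they are not taken from the proposed method.
<preformat>
```python
import random


def evolve(population, fitness, crossover, mutate,
           max_generations=50, patience=10, rng=None):
    """Generic GA loop (illustrative sketch, not the paper's exact method)."""
    rng = rng or random.Random(0)
    best = max(population, key=fitness)
    stale = 0  # consecutive generations without improvement
    for _ in range(max_generations):
        # Reproduction: keep the fitter half as parents (an assumption).
        parents = sorted(population, key=fitness, reverse=True)
        parents = parents[:max(2, len(parents) // 2)]
        children = []
        while len(population) > len(children):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Elitism (an assumption): the best individual so far always survives.
        population = [best] + children[:len(population) - 1]
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion, 0
        else:
            stale += 1
        if stale >= patience:  # no further improvement: stop early
            break
    return best
```
</preformat>
The caller supplies the representation (the contents of population) together with the fitness, crossover and mutate callables, so the same loop can be reused with different scoring functions.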
</p></sec><sec><title>BAliBase </title><p>The BAliBase dataset is considered the standard benchmark for the alignment of protein sequences. It consists of protein sequences of variable length, organized in 218 sets of sequences taken from different sources. The sequences are differentiated based on their similarity and their structure in the PDB database (Neshich et al., 1998[<xref ref-type="bibr" rid="R44">44</xref>]). To evaluate the quality of an obtained alignment, BAliBase defines two scores, namely the SP score and the TC score.</p></sec><sec><title>PAM Matrix </title><p>A point accepted mutation (PAM) is the replacement of a single amino acid in the primary structure of a protein by another amino acid. This definition does not cover every point mutation in the DNA of an organism; in particular, silent mutations and lethal mutations are not considered point accepted mutations.</p><p>PAM matrices are amino acid substitution matrices that encode the evolutionary change recorded at the amino acid level. A PAM matrix is constructed so that it can be used to compare two sequences that are a specific number of PAM units apart. For example, the PAM120 score matrix is used to compare sequences that are 120 PAM units apart.</p></sec></sec>
    <sec>
      <title>Proposed Approach</title><p>This section details the proposed approach and the parameters on which it is based.</p><sec><title>Representation and initial generation </title><p>In the proposed approach, the initial population is generated randomly. Each initially generated sequence is padded with randomly placed gap symbols so that its length equals that of the longest sequence in the set. Gaps are also inserted within the sequences, subject to the constraint that the total number of gaps does not exceed 25 &#x25; of the length of the longest sequence. After initialization, the solutions are recombined and then mutated to produce new individuals over a defined number of generations (iterations), which is 50 in this experimental study.</p></sec><sec><title>Scoring function </title><p>This section introduces a formal definition of the sum-of-pairs score of a multiple sequence alignment, which is used to calculate fitness.</p><p>Proteins or genes perform the same function because of their similar sequences. DNA stores all the genetic information of an organism, while proteins act as the building blocks of all cells. Proteins are linear chains of 20 amino acids, which are denoted as:</p><p>E,P,A,C,G,Q,V,M,T,R,K,W,Y,D,N,H,S,F,L and I. </p><p>Similarly, DNA is represented by four nucleotides, namely A, C, G and T. Protein and DNA sequences are therefore usually represented as strings over a small alphabet of letters. Here, a sum of scores based on the fitness function is calculated for the protein sequences. Obtaining the best alignment depends on the scoring criteria used to build that alignment.
Therefore, a scoring scheme known as the sum-of-pairs score, together with the match column score, is adopted to calculate the alignment score between two characters within a column (Otman et al., 2012[<xref ref-type="bibr" rid="R48">48</xref>]).</p><p>For the experiment, the gap penalty is defined as follows, where J is the set of amino acid symbols:</p><p>J &#x3D; &#x7B;E, P, A, C, G, Q, V, M, T, R, K, W, Y, D, N, H, S, F, L, I&#x7D;</p><p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-i-001" ></inline-graphic></p><p>Equation (1) states that</p><p>If p &#x2208; J and q &#x3D; - then the gap penalty is taken as 2.</p><p>If p &#x3D; - and q &#x2208; J then the gap penalty is taken as 3.</p><p>If p &#x3D; - and q &#x3D; - then the gap penalty is taken as 1.</p><p>If p &#x2208; J and q &#x2208; J then the score is taken from the PAM 250 matrix (Dayhoff et al., 1978[<xref ref-type="bibr" rid="R10">10</xref>]), which is available online.</p><p>The gap penalties stated in equation 1 are user defined and remain fixed for the complete set of experiments. The penalties for gap opening and gap extension are not the same.</p></sec><sec><title>Fitness evaluation </title><p>To judge the quality of different alignments based on their scores, a fitness function is proposed, which is defined in equation 2.</p><p>The PAM 250 matrix is used as the scoring matrix to calculate the scores of the different alignments. </p><p>In the experiment, the fitness is calculated as:</p><p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-i-002" ></inline-graphic></p><p>where </p><p>n &#x3D; number of sequences, <italic>l</italic><italic><sub>i</sub></italic> &#x3D; first sequence, <italic>l</italic><italic><sub>j</sub></italic> &#x3D; second sequence </p><p>The score of each column in an alignment is obtained by summing the scores of each pair of symbols in that column.
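This column-wise scoring can be sketched in Python as follows, under stated assumptions: the names pair_score and sum_of_pairs are hypothetical, the subst argument is a caller-supplied stand-in for a PAM 250 lookup, and the gap penalties 2, 3 and 1 are subtracted, since the text does not state the sign with which they enter the score.
<preformat>
```python
from itertools import combinations

GAP = "-"


def pair_score(p, q, subst):
    """Score one aligned pair, following the four cases listed for Equation (1).

    Assumption: gap penalties are subtracted; the source does not state the sign.
    """
    if p != GAP and q != GAP:
        return subst(p, q)  # residue vs residue: substitution matrix lookup
    if p != GAP:
        return -2           # residue aligned over a gap
    if q != GAP:
        return -3           # gap aligned over a residue
    return -1               # gap aligned over a gap


def sum_of_pairs(alignment, subst):
    """Sum pair scores over every column and every pair of rows."""
    total = 0
    for column in zip(*alignment):  # rows are assumed to have equal length
        for p, q in combinations(column, 2):
            total += pair_score(p, q, subst)
    return total
```
</preformat>
In practice, subst would look up the PAM 250 value for the pair (p, q); any symmetric function of two residues can be passed in to try the sketch.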
The overall alignment score is then calculated using equations 1 and 2, and should be as large as possible.</p></sec><sec><title>Selection strategies description </title><p>The selection method used in this research is as follows:</p><p>The individuals in the mating pool are sorted according to their fitness, and the best individuals are then selected two at a time for crossover.</p></sec><sec><title>Child generation </title><p>To generate a child population of 100 individuals in every generation, two genetic operators, crossover and mutation, are used in the experimental study. They are described in detail below.</p></sec><sec><title>Crossover</title><p>Crossover is performed on two strings of biological sequences by randomly selecting a cut point and, with a predefined probability, swapping the strings from that point onward.</p></sec><sec><title>Crossover operator I</title><p>As shown in Figure 2<xref ref-type="fig" rid="F2">(Fig. 2)</xref>, this operator first chooses a column at random in the parent alignments and defines a cut point there. It then forms two new offspring (children) by interchanging the parts of the parents on either side of the cut. Gaps may be added to the resulting offspring in this operation. </p></sec><sec><title>Crossover operator II</title><p>As in operator I, and as described in Figure 3<xref ref-type="fig" rid="F3">(Fig. 3)</xref>, this operator also chooses a point in the given parent alignment and cuts the alignment at that point. It then produces child alignments by swapping the parts of the parent alignments and inserting gaps at the required positions. </p></sec><sec><title>Mutation </title><p>After crossover, the strings are passed on for mutation (Otman et al., 2012[<xref ref-type="bibr" rid="R48">48</xref>]). Mutation prevents the algorithm from being trapped in a local minimum.
It distributes genetic information randomly among individuals and helps to recover lost genetic material. The mutation operation involves randomly flipping a few bits in a chromosome. For example, the string 00100100 might be mutated in its second position to yield 01100100. Mutation can occur with a very small probability at each bit position in a string.</p><p>The mutation operators described below are used exclusively in this experimental study. Because mutation operators serve to regain lost genetic material, they are applied here with a very low probability of 0.01 to improve the overall quality of the aligned sequences. When the sequences are subjected to mutation, nucleotides are flipped or swapped within the sequences to improve the overall score of the alignment, which ultimately results in high-quality solutions. Moving a nucleotide to a different position in a sequence may improve the alignment quality, because swapping or flipping can bring matching nucleotides into the same row or column. All the defined mutation operators are applied one by one to check which of them gives the best score. The operator that gives the highest score is retained, and the rest are discarded for that particular set of sequences (dataset). </p><p>The mutation operators are selected at random, with a very small probability of 0.01, to solve a given problem. When a randomly selected mutation operator fails to give an optimal result, a different operator from the defined set is selected and applied to the problem. 
All the proposed mutation operators for the experimental analysis are described below.</p></sec><sec><title>Exchange mutation operator</title><p>This mutation operator is illustrated in Figure 4<xref ref-type="fig" rid="F4">(Fig. 4)</xref>, in which the positions of two randomly chosen nucleotides (positions 4 and 6) are exchanged.</p></sec><sec><title>Reverse mutation operator</title><p>This mutation operator is illustrated in Figure 5<xref ref-type="fig" rid="F5">(Fig. 5)</xref>. Here, a subsequence S is taken that is bounded by two randomly chosen positions, 2 and 5. The order of the nucleotides in this subsequence is then reversed.</p></sec><sec><title>Position mutation operator</title><p>In this mutation operator, three nucleotides are randomly chosen at different, not necessarily successive, positions, for example 2 &#x3C; 4 &#x3C; 6. The nucleotide at position 2 takes position 4, the one at position 4 takes position 6, and the one at position 6 takes position 2. Figure 6<xref ref-type="fig" rid="F6">(Fig. 6)</xref> demonstrates this process. </p></sec><sec><title>Inverse mutation operator</title><p>In Figure 7<xref ref-type="fig" rid="F7">(Fig. 7)</xref>, the chromosome is divided into two sections. All nucleotides in each section are copied and placed in inverse order in the same section of the child.</p></sec><sec><title>New generation</title><p>For the next generation, a 60-40 &#x25; parent-child selection scheme based on fitness score is implemented. 
That is, 60 &#x25; of the parent population and 40 &#x25; of the child population are used to produce the next population.</p><p>Other combinations, such as 40-60 &#x25; or 50-50 &#x25; parent-child populations, were also considered, but they did not improve the overall quality of the solution and were therefore not adopted. Likewise, 100 &#x25; crossover and 100 &#x25; mutation operations were considered along with the 40-60 &#x25; and 50-50 &#x25; parent-child populations, but these combinations did not change the overall quality of the solutions produced. Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref> presents the parameter analysis for the 60-40 &#x25;, 40-60 &#x25; and 50-50 &#x25; parent-child combinations, along with the results for 100 &#x25; crossover and 100 &#x25; mutation. It can be observed that the computation time for the 60-40 &#x25; parent-child scheme is the lowest compared with the 40-60 &#x25; and 50-50 &#x25; combinations or any other scheme listed in Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref>. The average computation time reported in Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref> is the time taken to perform the experiments on each dataset. However, no comparison of computation time with the methods listed in Tables 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref>, 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref> and 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref> was made. 
This is because no such data are available in the literature for this type of comparison.</p></sec><sec><title>Termination condition</title><p>The termination condition used for the experiment is as follows: </p><p>In the experimental study, the results were tested for a maximum of 50 iterations (generations); the experiment was therefore terminated after 50 iterations, as the improvement in alignment quality beyond that point was negligible.</p></sec></sec>
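    <sec><title>Illustrative sketch of the pairwise scoring scheme</title><p>The four cases of equation 1 and the column-wise summation described under Fitness evaluation can be illustrated with a minimal Python sketch. It is not the C implementation used in the experiments. It assumes that the gap penalties are subtracted from the score and that residue pairs missing from the table score 0, and it uses a small made-up table (TOY_SCORES) in place of the real PAM 250 matrix. The names pair_score and sum_of_pairs are hypothetical.</p><preformat><![CDATA[
```python
from itertools import combinations

GAP = "-"

# Hypothetical toy substitution scores standing in for PAM 250 (NOT real values).
TOY_SCORES = {("A", "A"): 2, ("A", "G"): 1, ("G", "G"): 5}


def pair_score(p, q, subst):
    """Score one pair of symbols following the four cases of equation 1.

    Assumption: penalties are subtracted, so they appear as negative scores.
    """
    if p != GAP and q == GAP:
        return -2  # residue over gap: penalty 2
    if p == GAP and q != GAP:
        return -3  # gap over residue: penalty 3
    if p == GAP and q == GAP:
        return -1  # gap over gap: penalty 1
    # Two residues: symmetric lookup; unknown pairs score 0 (assumption).
    return subst.get((p, q), subst.get((q, p), 0))


def sum_of_pairs(alignment, subst):
    """Sum pair scores over every column and every pair of rows i < j."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for column in zip(*alignment):
        # combinations keeps row order, so p always comes from the upper row.
        for p, q in combinations(column, 2):
            total += pair_score(p, q, subst)
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AG-", "A-G", "-GG"], TOY_SCORES))
```
]]></preformat><p>Passing the substitution table as an argument keeps the sketch independent of any particular matrix, so a full PAM 250 table could be supplied in the same dictionary form.</p></sec>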
    <sec sec-type="methods">
      <title>Algorithm for the Proposed Method</title><p>Step 1 : Initialize the population x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub>. </p><p>Step 2 : Set the number of columns, Column(N) &#x3D; 1.2 &#xD7; n<sub>max</sub>. Gaps (-) may be placed in the sequences for proper alignment. </p><p>Step 3 : Compute fitness.</p><p>Step 4 : Select individuals for genetic operations. Two genetic operators, crossover and mutation, are used with probabilities of 0.8 &#x25; and 0.01 &#x25;, respectively.</p><p>Step 5 : Perform crossover by randomly choosing one of the defined crossover operators. </p><p>Step 6 : Apply all of the defined mutation operators one by one, in random order.</p><p>Step 7 : Check the quality of all four solutions and choose the best of the four in terms of score.</p><p>Step 8 : Generate the new population and evaluate its fitness.</p><p>Step 9 : Stop if the solution quality is sufficient or the maximum number of iterations (50) is reached.</p></sec>
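    <sec><title>Illustrative sketch of the mutation steps</title><p>Steps 6 and 7 above can be sketched in Python as follows. This is one reading of the four mutation operators described earlier (exchange, reverse, position and inverse), applied to a single gapped sequence, and not the author's C code. Positions are 0-based here, whereas the figures use 1-based positions. The function names, the random choice of positions and the score callback are assumptions made for illustration.</p><preformat><![CDATA[
```python
import random


def exchange(seq, i, j):
    """Exchange mutation: swap the symbols at positions i and j."""
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return "".join(s)


def reverse(seq, i, j):
    """Reverse mutation: reverse the segment from i to j, inclusive."""
    i, j = min(i, j), max(i, j)
    return seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]


def position(seq, i, j, k):
    """Position mutation: the symbol at i moves to j, j to k, and k to i."""
    s = list(seq)
    s[j], s[k], s[i] = seq[i], seq[j], seq[k]
    return "".join(s)


def inverse(seq, cut):
    """Inverse mutation: split at cut and reverse each section in place."""
    return seq[:cut][::-1] + seq[cut:][::-1]


def best_mutant(seq, score, rng):
    """Apply each operator once and keep the best of the four results.

    `score` is any callable returning a number (hypothetical fitness);
    `seq` must contain at least three symbols.
    """
    n = len(seq)
    i, j, k = sorted(rng.sample(range(n), 3))
    candidates = [
        exchange(seq, i, j),
        reverse(seq, i, j),
        position(seq, i, j, k),
        inverse(seq, rng.randrange(1, n)),
    ]
    return max(candidates, key=score)


if __name__ == "__main__":
    # 0-based indices 3 and 5 correspond to positions 4 and 6 in Fig. 4.
    print(exchange("ABCDEFG", 3, 5))
```
]]></preformat><p>Every operator only rearranges symbols, so the residue and gap content of a sequence is unchanged by mutation in this sketch; only the placement differs.</p></sec>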
    <sec>
      <title>Experimental Set Up</title><p>This section gives an overview of the parameters and the system components used for the experiment.</p><sec><title>Parameters setting for the experiment</title><p>The population size was set to 100 individuals and the maximum number of generations (iterations) to 50, with a crossover probability of 0.8 &#x25; and a mutation rate of 0.01 &#x25;. The scoring matrix used for the experiment is PAM 250 for all protein sequences. A population size of 100 means that in each generation&#x2F;iteration the algorithm produces 100 children with the help of the proposed genetic operators. Among these 100 children, the two best, based on their scores, are selected as parents for the next generation.</p></sec><sec><title>System components</title><p>The main objective of this research is to observe the role of the proposed crossover and mutation operators in solving the MSA problem for protein sequences, in terms of the quality and scores of the aligned sequences. Here, the quality of an alignment is judged by the score it obtains. In this study, the experiments for the proposed approach were performed using a genetic algorithm implemented in C on an Intel Core 2 Duo processor (2.53 GHz) with 2 GB RAM running Linux.</p></sec></sec>
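    <sec><title>Illustrative sketch of the parent-child replacement</title><p>The population settings above, combined with the 60-40 &#x25; parent-child scheme described under New generation, can be sketched in Python as follows. The sketch assumes that the fittest members of each group are the ones retained, which the text implies but does not state in full. The function name next_generation and its parameters are hypothetical, and this is not the author's C code.</p><preformat><![CDATA[
```python
def next_generation(parents, children, fitness, parent_percent=60):
    """Build the next population from the fittest parents and children.

    The population size is kept equal to len(parents). `fitness` is any
    callable returning a number; higher is assumed to be better.
    """
    size = len(parents)
    n_parents = size * parent_percent // 100   # e.g. 60 of 100
    n_children = size - n_parents              # e.g. 40 of 100
    if n_children > len(children):
        raise ValueError("not enough children to fill the population")
    best_parents = sorted(parents, key=fitness, reverse=True)[:n_parents]
    best_children = sorted(children, key=fitness, reverse=True)[:n_children]
    return best_parents + best_children


if __name__ == "__main__":
    # Toy individuals: plain integers whose fitness is their own value.
    old = list(range(10))
    new = list(range(100, 110))
    print(next_generation(old, new, fitness=lambda x: x))
```
]]></preformat><p>Changing parent_percent to 40 or 50 gives the 40-60 &#x25; and 50-50 &#x25; alternatives that were also examined in Table 2<xref ref-type="fig" rid="T2">(Tab. 2)</xref>.</p></sec>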
    <sec sec-type="discussion">
      <title>Results and Discussion</title><p>This section details the experimental methodology followed in this work and presents and discusses the results obtained with the proposed method. </p><p>For all the tests, the different crossover and mutation operators are chosen at random, with equal probability of selection, within each generation. To test the proposed approach, experiments were carried out on datasets of different lengths (ref. 1, ref. 2 and ref. 3) from the BAliBase database (refer to Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>). These datasets were chosen because results for other related algorithms on them are reported in the literature (Devereux et al., 1984[<xref ref-type="bibr" rid="R11">11</xref>]; Jagadamba et al., 2011[<xref ref-type="bibr" rid="R27">27</xref>]; Nguyen and Yi, 2011[<xref ref-type="bibr" rid="R45">45</xref>]; Razmara et al., 2009[<xref ref-type="bibr" rid="R54">54</xref>]; Mott, 2005[<xref ref-type="bibr" rid="R41">41</xref>]). For every experiment, the alignments were performed with the proposed method and compared with the methods described in the literature stated earlier. </p><p>To evaluate the proposed approach, the algorithm was executed for 50 independent runs (iterations) on 30 datasets (the sum of all datasets in Tables 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref>, 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref> and 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref>), and the best, average and worst scores were calculated. Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref> lists the best, average and worst scores for the different datasets with their corresponding BAliscores. Because the fitness score depends on the level of similarity among the residues in the sequences, the scores can be either positive or negative. 
Note that if the residues among the compared sequences are similar, only a small number of gaps (&#x201C;-&#x201D;) is needed to align the sequences properly. On the other hand, if the majority of the residues are dissimilar, a large number of gaps is needed.</p><p>To analyze the quality and accuracy of the solutions produced by the proposed approach, we used BAliscore, an open-source program from the BAliBase benchmark. BAliscore rates a solution (multiple sequence alignment) between 0.0 and 1.0. A score of 1.0 indicates that the solution is identical to the manually created reference alignment. With the proposed approach we were unable to obtain a score equal to 1 (see Tables 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref>, 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref> and 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref>). A score of 0 indicates that nothing matches the reference alignment. This can be observed for some of the datasets in Table 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref> (reference 3). A score between 0 and 1 indicates that some part matches the reference alignment. The closer the score is to 1, the better the alignment for a given dataset. A comparison of the different methods over the different datasets is given in Tables 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref>, 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref> and 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref>. These tables show that the proposed method achieves better scores than the other methods on most datasets. Figures 8<xref ref-type="fig" rid="F8">(Fig. 8)</xref>, 9<xref ref-type="fig" rid="F9">(Fig. 9)</xref>, 10<xref ref-type="fig" rid="F10">(Fig. 10)</xref>, 11<xref ref-type="fig" rid="F11">(Fig. 11)</xref> and 12<xref ref-type="fig" rid="F12">(Fig. 
12)</xref> show comparative results between the proposed method and the other methods discussed earlier in the literature review. Figures 13<xref ref-type="fig" rid="F13">(Fig. 13)</xref>, 14<xref ref-type="fig" rid="F14">(Fig. 14)</xref> and 15<xref ref-type="fig" rid="F15">(Fig. 15)</xref> compare the average scores of the different methods and indicate the superiority of the proposed approach over the others.</p><p>To evaluate the overall performance of the proposed method, the average score over all test cases was computed (bottom of Tables 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref>, 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref> and 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref>). The average scores suggest that the proposed method performs better than all the other methods considered. The scores are calculated on the standard BAliBase datasets. Boldface entries in the tables indicate the best scores among the methods.</p><sec><title>Performance of the proposed method with Ref. 1 </title><p>The 14 datasets of reference 1 shown in Table 3<xref ref-type="fig" rid="T3">(Tab. 3)</xref> differ in length and in their sequences (refer to Table 1<xref ref-type="fig" rid="T1">(Tab. 1)</xref>). The BAliscores of the proposed approach were compared with those of CLUSTAL W, MSA-GA, MSA-GA w&#x2F; prealign and SAGA. The comparison in Figures 8<xref ref-type="fig" rid="F8">(Fig. 8)</xref> and 9<xref ref-type="fig" rid="F9">(Fig. 9)</xref> shows that, out of 14 test cases, the proposed method outperformed the other methods in 11, and in the remaining three its results were very close to the best.</p></sec><sec><title>Performance of the proposed method with Ref. 
3 </title><p>In this experimental study, eleven test cases were considered from reference 3; the proposed method gives the better solution for 9 of these 11 test cases. Only RBT-GA on the 1wit dataset and PRRP on the 1r69 dataset perform better than the proposed method. The results are provided in Table 4<xref ref-type="fig" rid="T4">(Tab. 4)</xref> and Figures 10<xref ref-type="fig" rid="F10">(Fig. 10)</xref>, 11<xref ref-type="fig" rid="F11">(Fig. 11)</xref> and 14<xref ref-type="fig" rid="F14">(Fig. 14)</xref>.</p></sec><sec><title>Performance of the proposed method with Ref. 2</title><p>As detailed in Table 5<xref ref-type="fig" rid="T5">(Tab. 5)</xref> and Figures 12<xref ref-type="fig" rid="F12">(Fig. 12)</xref> and 15<xref ref-type="fig" rid="F15">(Fig. 15)</xref>, five datasets from ref. 2 were used to evaluate the proposed approach against standard methods such as CLUSTAL X, SB-PIMA, HMMT, ML-PIMA and PILEUP8. Experiments on the BAliBase 2.0 benchmark showed that the proposed method is more efficient than the methods it was compared with.</p></sec><sec><title>Performance characterization of proposed algorithm</title><p>Two components, namely the proposed genetic operators and random population initialization, play an important role in making the proposed algorithm perform better than other algorithms. Two sets of experiments were designed to investigate the performance of the proposed algorithm. In the first case, a different approach to population initialization (different from the proposed scheme) is adopted. Here, the proposed algorithm was run with a randomly generated population constructed with the help of a guide tree. 
In the second case, a hill-climbing approach (Huiying and Zheng, 2013[<xref ref-type="bibr" rid="R26">26</xref>]) was used for searching instead of the proposed algorithm, starting from the same random initial population used in this work. The fitness evaluation scheme remains the same as discussed in the proposed approach section. A total of fifteen BAliBase datasets (five each from ref. 1, 2 and 3) were considered for the experiments. Each dataset was run with the proposed algorithm (in the two cases stated above) for fifty iterations. The best BAliBase scores were recorded, and the proposed algorithm with random initial population generation outperformed guide tree initialization on all the datasets. An average improvement of 9.72 &#x25; was recorded with the randomly generated population. Similarly, the proposed algorithm showed an average improvement of 7.23 &#x25; over the hill-climbing approach. These results indicate that the proposed algorithm, with a randomly generated initial population and the proposed genetic operators, performs better than the alternatives tested. The detailed experimental results are available in Table 6<xref ref-type="fig" rid="T6">(Tab. 6)</xref>.</p></sec></sec>
    <sec sec-type="conclusions">
      <title>Conclusion</title><p>Multiple sequence alignment is a well-known problem in bioinformatics, yet it remains a challenging task. Arranging molecular sequences within an alignment to find similarities and differences among them is not easy, owing to the size of the sequences and of the search space. Because of its ability to handle complex, large-scale problems, a genetic algorithm is a suitable approach to the multiple sequence alignment problem. In this paper, a novel approach has been developed that uses a genetic algorithm to perform multiple sequence alignment. The aim of the study is to judge the efficiency of the proposed approach by comparing it with different algorithms over standard datasets. To evaluate the efficiency and feasibility of the proposed approach, benchmark datasets from BAliBase 2.0 are considered, because most of the methods discussed in this paper use BAliBase datasets to assess the quality of multiple sequence alignments. When compared with the other methods listed in (Notredame and Higgins, 1996[<xref ref-type="bibr" rid="R46">46</xref>]; Gondro and Kinghorn, 2007[<xref ref-type="bibr" rid="R18">18</xref>]; Taheri and Zomaya, 2009[<xref ref-type="bibr" rid="R57">57</xref>]; Thompson et al., 1997[<xref ref-type="bibr" rid="R60">60</xref>]; Eddy, 1995[<xref ref-type="bibr" rid="R14">14</xref>]; Gotoh, 1996[<xref ref-type="bibr" rid="R19">19</xref>]; Devereux et al., 1984[<xref ref-type="bibr" rid="R11">11</xref>]; Morgenstern et al., 1996[<xref ref-type="bibr" rid="R40">40</xref>]), the proposed method improves the overall quality of the alignment. The experimental results show an increase in alignment quality, which can be observed in the scores for the different datasets. 
It was also observed that the proposed method gives unsatisfactory results in some test cases. From the above discussion, we conclude that the approach adopted in this paper gives better results than the other methods in most of the test cases.</p></sec>
  </body>
  <back>
    <ref-list>
      <ref id="R1">
        <label>1</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Aniba</surname>
              <given-names>MR</given-names>
            </name>
            <name>
              <surname>Poch</surname>
              <given-names>O</given-names>
            </name>
            <name>
              <surname>Thompson</surname>
              <given-names>JD</given-names>
            </name>
          </person-group>
          <article-title>Issues in bioinformatics benchmarking: the case study of multiple sequence alignment</article-title>
          <source>Nucleic Acids Res</source>
          <year>2010</year>
          <volume>38</volume>
          <fpage>7353</fpage>
          <lpage>7363</lpage>
        </citation>
      </ref>
      <ref id="R2">
        <label>2</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Ankit</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Huang</surname>
              <given-names>X</given-names>
            </name>
          </person-group>
          <article-title>Pairwise statistical significance of local sequence alignment using substitution matrices with sequence-pair-specific distance</article-title>
          <source>Proc Int Conf Inform Technol</source>
          <year>2008</year>
          <fpage>94</fpage>
          <lpage>99</lpage>
        </citation>
      </ref>
      <ref id="R3">
        <label>3</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Auyeung</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Melcher</surname>
              <given-names>U</given-names>
            </name>
          </person-group>
          <article-title>Evaluations of protein sequence alignments using structural information</article-title>
          <source>Int Conf Inform Technol: Coding and Computing</source>
          <year>2005</year>
          <volume>2</volume>
          <fpage>748</fpage>
          <lpage>749</lpage>
        </citation>
      </ref>
      <ref id="R4">
        <label>4</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Bhattacharjee</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Sultana</surname>
              <given-names>KZ</given-names>
            </name>
            <name>
              <surname>Shams</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <article-title>Dynamic and parallel approaches to optimal evolutionary tree construction</article-title>
          <source>Can Conf Electr Comp Engin</source>
          <year>2006</year>
          <fpage>119</fpage>
          <lpage>112</lpage>
        </citation>
      </ref>
      <ref id="R5">
        <label>5</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Blackshields</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Wallace</surname>
              <given-names>IM</given-names>
            </name>
            <name>
              <surname>Larkin</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Higgins</surname>
              <given-names>DG</given-names>
            </name>
          </person-group>
          <article-title>Analysis and comparison of benchmarks for multiple sequence alignment</article-title>
          <source>In Silico Biol</source>
          <year>2006</year>
          <volume>6</volume>
          <fpage>321</fpage>
          <lpage>339</lpage>
        </citation>
      </ref>
      <ref id="R6">
        <label>6</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Buckles</surname>
              <given-names>BP</given-names>
            </name>
            <name>
              <surname>Petry</surname>
              <given-names>FE</given-names>
            </name>
            <name>
              <surname>Kuester</surname>
              <given-names>RL</given-names>
            </name>
          </person-group>
          <article-title>Schema survival rates and heuristic search in genetic algorithms</article-title>
          <year>1990</year>
          <conf-name>Proc Tools Artificial Intelligence</conf-name>
          <publisher-loc>Los Alamitos CA</publisher-loc>
          <publisher-name>IEEE Comput Soc Press</publisher-name>
        </citation>
      </ref>
      <ref id="R7">
        <label>7</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Cai</surname>
              <given-names>L</given-names>
            </name>
            <name>
              <surname>Juedes</surname>
              <given-names>D</given-names>
            </name>
            <name>
              <surname>Liakhovitch</surname>
              <given-names>E</given-names>
            </name>
          </person-group>
          <article-title>Evolutionary computation techniques for multiple sequence alignment</article-title>
          <source>Proc CEC</source>
          <year>2000</year>
          <fpage>829</fpage>
          <lpage>835</lpage>
        </citation>
      </ref>
      <ref id="R8">
        <label>8</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Changjin</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Tewfik</surname>
              <given-names>AH</given-names>
            </name>
          </person-group>
          <article-title>Heuristic reusable dynamic programming: efficient updates of local sequence alignment</article-title>
          <source>IEEE&#x2F;ACM Trans Comput Biol Bioinform</source>
          <year>2009</year>
          <volume>6</volume>
          <fpage>570</fpage>
          <lpage>582</lpage>
        </citation>
      </ref>
      <ref id="R9">
        <label>9</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Dandekar</surname>
              <given-names>T</given-names>
            </name>
            <name>
              <surname>Argos</surname>
              <given-names>P</given-names>
            </name>
          </person-group>
          <article-title>Potential of genetic algorithms in protein folding and protein engineering simulations</article-title>
          <source>Protein Eng</source>
          <year>1992</year>
          <volume>5</volume>
          <fpage>637</fpage>
          <lpage>645</lpage>
        </citation>
      </ref>
      <ref id="R10">
        <label>10</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Dayhoff</surname>
              <given-names>MO</given-names>
            </name>
            <name>
              <surname>Schwartz</surname>
              <given-names>RM</given-names>
            </name>
            <name>
              <surname>Orcutt</surname>
              <given-names>BC</given-names>
            </name>
          </person-group>
          <article-title>A model of evolutionary change in proteins</article-title>
          <source>Atlas Protein Sequence Structure</source>
          <year>1978</year>
          <volume>5</volume>
          <fpage>345</fpage>
          <lpage>351</lpage>
        </citation>
      </ref>
      <ref id="R11">
        <label>11</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Devereux</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Haeberli</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Smithies</surname>
              <given-names>O</given-names>
            </name>
          </person-group>
          <article-title>A comprehensive set of sequence analysis programs for the VAX</article-title>
          <source>Nucleic Acids Res</source>
          <year>1984</year>
          <volume>12</volume>
          <fpage>387</fpage>
          <lpage>395</lpage>
        </citation>
      </ref>
      <ref id="R12">
        <label>12</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Dongardive</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Abraham</surname>
              <given-names>S</given-names>
            </name>
          </person-group>
          <article-title>Finding consensus by sequence evolution: An application of differential evolution</article-title>
          <source>World Congress on Information and Communication Technologies</source>
          <year>2012</year>
          <fpage>248</fpage>
          <lpage>253</lpage>
        </citation>
      </ref>
      <ref id="R13">
        <label>13</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Eddy</surname>
              <given-names>S</given-names>
            </name>
          </person-group>
          <article-title>Profile hidden Markov models</article-title>
          <source>Bioinformatics</source>
          <year>1998</year>
          <volume>14</volume>
          <fpage>755</fpage>
          <lpage>763</lpage>
        </citation>
      </ref>
      <ref id="R14">
        <label>14</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Eddy</surname>
              <given-names>SR</given-names>
            </name>
          </person-group>
          <article-title>Multiple alignment using hidden Markov models</article-title>
          <source>Proc Int Conf Intell Syst Mol Biol</source>
          <year>1995</year>
          <volume>3</volume>
          <fpage>114</fpage>
          <lpage>120</lpage>
        </citation>
      </ref>
      <ref id="R15">
        <label>15</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Feng</surname>
              <given-names>DF</given-names>
            </name>
            <name>
              <surname>Doolittle</surname>
              <given-names>RF</given-names>
            </name>
          </person-group>
          <article-title>Progressive sequence alignment as a prerequisite to correct phylogenetic trees</article-title>
          <source>J Mol Evol</source>
          <year>1987</year>
          <volume>25</volume>
          <fpage>351</fpage>
          <lpage>360</lpage>
        </citation>
      </ref>
      <ref id="R16">
        <label>16</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Gelly</surname>
              <given-names>JC</given-names>
            </name>
            <name>
              <surname>Joseph</surname>
              <given-names>AP</given-names>
            </name>
            <name>
              <surname>Srinivasan</surname>
              <given-names>N</given-names>
            </name>
            <name>
              <surname>Brevern</surname>
              <given-names>AG</given-names>
            </name>
          </person-group>
          <article-title>iPBA: a tool for protein structure comparison using sequence alignment strategies</article-title>
          <source>Nucleic Acids Res</source>
          <year>2011</year>
          <volume>39</volume>
          <fpage>18</fpage>
          <lpage>23</lpage>
        </citation>
      </ref>
      <ref id="R17">
        <label>17</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Goldberg</surname>
              <given-names>DE</given-names>
            </name>
          </person-group>
          <article-title>Simple genetic algorithms and the minimal, deceptive problem</article-title>
          <source>Genetic Algorithms and Simulated Annealing</source>
          <year>1987</year>
          <fpage>74</fpage>
          <lpage>78</lpage>
        </citation>
      </ref>
      <ref id="R18">
        <label>18</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Gondro</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Kinghorn</surname>
              <given-names>BP</given-names>
            </name>
          </person-group>
          <article-title>A simple genetic algorithm for multiple sequence alignment</article-title>
          <source>Genet Mol Res</source>
          <year>2007</year>
          <volume>6</volume>
          <fpage>964</fpage>
          <lpage>982</lpage>
        </citation>
      </ref>
      <ref id="R19">
        <label>19</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Gotoh</surname>
              <given-names>O</given-names>
            </name>
          </person-group>
          <article-title>An improved algorithm for matching biological sequences</article-title>
          <source>J Mol Biol</source>
          <year>1982</year>
          <volume>162</volume>
          <fpage>705</fpage>
          <lpage>708</lpage>
        </citation>
      </ref>
      <ref id="R20">
        <label>20</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Grefenstette</surname>
              <given-names>JJ</given-names>
            </name>
            <name>
              <surname>Fitzpatrick</surname>
              <given-names>JM</given-names>
            </name>
          </person-group>
          <article-title>Genetic search with approximate function evaluations</article-title>
          <source>Proc Int Conf Genetic Algorithms Appl</source>
          <year>1985</year>
          <fpage>112</fpage>
          <lpage>120</lpage>
        </citation>
      </ref>
      <ref id="R21">
        <label>21</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Hamidi</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Naghibzadeh</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Sadri</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Protein multiple sequence alignment based on secondary structure similarity</article-title>
          <source>International Conference on Advances in Computing, Communications and Informatics</source>
          <year>2013</year>
          <fpage>1224</fpage>
          <lpage>1229</lpage>
        </citation>
      </ref>
      <ref id="R22">
        <label>22</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Haoyue</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Dingyu</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Zhang</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Cangzhi</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Conserved secondary structure prediction for similar highly group of related RNA sequences</article-title>
          <source>Control and Decision Conference</source>
          <year>2009</year>
          <fpage>5158</fpage>
          <lpage>5163</lpage>
        </citation>
      </ref>
      <ref id="R23">
        <label>23</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Hicks</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Wheeler</surname>
              <given-names>DA</given-names>
            </name>
            <name>
              <surname>Plon</surname>
              <given-names>SE</given-names>
            </name>
            <name>
              <surname>Kimmel</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed</article-title>
          <source>Hum Mutat</source>
          <year>2011</year>
          <volume>32</volume>
          <fpage>661</fpage>
          <lpage>668</lpage>
        </citation>
      </ref>
      <ref id="R24">
        <label>24</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Hillsdale</surname>
              <given-names>NJ</given-names>
            </name>
            <name>
              <surname>Lawrence</surname>
              <given-names>E</given-names>
            </name>
          </person-group>
          <article-title>Genetic algorithms and classifier systems: foundations and future directions</article-title>
          <source>Genetic Algorithms and Their Applications: Proc 2nd Int Conf Genetic Algorithms</source>
          <year>1987</year>
          <fpage>82</fpage>
          <lpage>89</lpage>
        </citation>
      </ref>
      <ref id="R25">
        <label>25</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Holland</surname>
              <given-names>JH</given-names>
            </name>
          </person-group>
          <source>Adaptation in natural and artificial systems</source>
          <year>1975</year>
          <publisher-loc>Ann Arbor, MI</publisher-loc>
          <publisher-name>Univ. Michigan Press</publisher-name>
        </citation>
      </ref>
      <ref id="R26">
        <label>26</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Huiying</surname>
              <given-names>X</given-names>
            </name>
            <name>
              <surname>Zheng</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <article-title>Hill-climbing genetic algorithm optimization in cognitive radio decision engine</article-title>
          <source>IEEE Int Conf Commun Technol</source>
          <year>2013</year>
          <fpage>17</fpage>
          <lpage>19</lpage>
        </citation>
      </ref>
      <ref id="R27">
        <label>27</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Jagadamba</surname>
              <given-names>PVSL</given-names>
            </name>
            <name>
              <surname>Babu</surname>
              <given-names>MSP</given-names>
            </name>
            <name>
              <surname>Rao</surname>
              <given-names>AA</given-names>
            </name>
          </person-group>
          <article-title>An improved algorithm for multiple sequence alignment using particle swarm optimization</article-title>
          <year>2011</year>
          <conf-name>IEEE 2nd International Conference on Software Engineering and Service Science (ICSESS)</conf-name>
          <fpage>544</fpage>
          <lpage>547</lpage>
        </citation>
      </ref>
      <ref id="R28">
        <label>28</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>De Jong</surname>
              <given-names>K</given-names>
            </name>
          </person-group>
          <article-title>Learning with genetic algorithms: An overview</article-title>
          <source>Machine learning 3</source>
          <year>1988</year>
          <publisher-loc>Hingham, MA</publisher-loc>
          <publisher-name>Kluwer</publisher-name>
          <fpage>121</fpage>
          <lpage>138</lpage>
        </citation>
      </ref>
      <ref id="R29">
        <label>29</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Katoh</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Kuma</surname>
              <given-names>K</given-names>
            </name>
            <name>
              <surname>Toh</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Miyata</surname>
              <given-names>T</given-names>
            </name>
          </person-group>
          <article-title>MAFFT version 5: Improvement in accuracy of multiple sequence alignment</article-title>
          <source>Nucleic Acids Res</source>
          <year>2005</year>
          <volume>33</volume>
          <fpage>511</fpage>
          <lpage>518</lpage>
        </citation>
      </ref>
      <ref id="R30">
        <label>30</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Kececioglu</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Starrett</surname>
              <given-names>D</given-names>
            </name>
          </person-group>
          <source>Aligning alignments exactly</source>
          <year>2004</year>
          <publisher-name>RECOMB</publisher-name>
        </citation>
      </ref>
      <ref id="R31">
        <label>31</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Kimura</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences</article-title>
          <source>J Mol Evol</source>
          <year>1980</year>
          <volume>16</volume>
          <fpage>111</fpage>
          <lpage>120</lpage>
        </citation>
      </ref>
      <ref id="R32">
        <label>32</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Kirkpatrick</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Gelatt</surname>
              <given-names>CD</given-names>
              <suffix>Jr</suffix>
            </name>
            <name>
              <surname>Vecchi</surname>
              <given-names>MP</given-names>
            </name>
          </person-group>
          <article-title>Optimization by simulated annealing</article-title>
          <source>Science</source>
          <year>1983</year>
          <volume>220</volume>
          <fpage>671</fpage>
          <lpage>680</lpage>
        </citation>
      </ref>
      <ref id="R33">
        <label>33</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Kupis</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Mandziuk</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Evolutionary-progressive method for multiple sequence alignment</article-title>
          <year>2007</year>
          <conf-name>IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology</conf-name>
          <fpage>291</fpage>
          <lpage>297</lpage>
        </citation>
      </ref>
      <ref id="R34">
        <label>34</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Layeb</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Deneche</surname>
              <given-names>AH</given-names>
            </name>
          </person-group>
          <article-title>Multiple sequence alignment by immune artificial system</article-title>
          <year>2007</year>
          <conf-name>IEEE/ACS International Conference on Computer Systems and Applications</conf-name>
          <fpage>336</fpage>
          <lpage>342</lpage>
        </citation>
      </ref>
      <ref id="R35">
        <label>35</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Li</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Ma</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Kisman</surname>
              <given-names>D</given-names>
            </name>
            <name>
              <surname>Tromp</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>PatternHunter II: highly sensitive and fast homology search</article-title>
          <source>J Bioinform Comput Biol</source>
          <year>2004</year>
          <volume>2</volume>
          <fpage>417</fpage>
          <lpage>439</lpage>
        </citation>
      </ref>
      <ref id="R36">
        <label>36</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>L&#xF6;ytynoja</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Goldman</surname>
              <given-names>N</given-names>
            </name>
          </person-group>
          <article-title>Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis</article-title>
          <source>Science</source>
          <year>2008</year>
          <volume>320</volume>
          <fpage>1632</fpage>
          <lpage>1635</lpage>
        </citation>
      </ref>
      <ref id="R37">
        <label>37</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Ma</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Tromp</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>M</given-names>
            </name>
          </person-group>
          <article-title>PatternHunter: faster and more sensitive homology search</article-title>
          <source>Bioinformatics</source>
          <year>2002</year>
          <volume>18</volume>
          <fpage>440</fpage>
          <lpage>445</lpage>
        </citation>
      </ref>
      <ref id="R38">
        <label>38</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Michalewicz</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <source>Genetic Algorithms + Data Structures = Evolution Programs</source>
          <year>1992</year>
          <publisher-loc>New York</publisher-loc>
          <publisher-name>Springer-Verlag</publisher-name>
        </citation>
      </ref>
      <ref id="R39">
        <label>39</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Mohsen</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Balaji</surname>
              <given-names>P</given-names>
            </name>
            <name>
              <surname>Devavrat</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Mayank</surname>
              <given-names>S</given-names>
            </name>
          </person-group>
          <article-title>Iterative scheduling algorithms</article-title>
          <year>2007</year>
          <conf-name>IEEE INFOCOM Proc</conf-name>
        </citation>
      </ref>
      <ref id="R40">
        <label>40</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Morgenstern</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Dress</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Werner</surname>
              <given-names>T</given-names>
            </name>
          </person-group>
          <article-title>Multiple DNA and protein sequence alignment based on segment-to-segment comparison</article-title>
          <source>Proc Natl Acad Sci USA</source>
          <year>1996</year>
          <volume>93</volume>
          <fpage>12098</fpage>
          <lpage>12103</lpage>
        </citation>
      </ref>
      <ref id="R41">
        <label>41</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Mott</surname>
              <given-names>R</given-names>
            </name>
          </person-group>
          <source>Alignment: statistical significance</source>
          <year>2005</year>
          <publisher-name>Encyclopedia of Life Sciences</publisher-name>
        </citation>
      </ref>
      <ref id="R42">
        <label>42</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Naznin</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Sarker</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Essam</surname>
              <given-names>D</given-names>
            </name>
          </person-group>
          <article-title>Progressive alignment method using genetic algorithm for multiple sequence alignment</article-title>
          <source>IEEE Transactions on Evolutionary Computation</source>
          <year>2012</year>
          <volume>16</volume>
          <fpage>615</fpage>
          <lpage>631</lpage>
        </citation>
      </ref>
      <ref id="R43">
        <label>43</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Needleman</surname>
              <given-names>SB</given-names>
            </name>
            <name>
              <surname>Wunsch</surname>
              <given-names>CD</given-names>
            </name>
          </person-group>
          <article-title>A general method applicable to the search for similarities in the amino acid sequence of two proteins</article-title>
          <source>J Mol Biol</source>
          <year>1970</year>
          <volume>48</volume>
          <fpage>443</fpage>
          <lpage>453</lpage>
        </citation>
      </ref>
      <ref id="R44">
        <label>44</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Neshich</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Togawa</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Vilella</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Honig</surname>
              <given-names>B</given-names>
            </name>
          </person-group>
          <article-title>STING (Sequence to and withIn graphics). PDB viewer</article-title>
          <source>Protein Data Bank Quart Newslett</source>
          <year>1998</year>
          <volume>85</volume>
          <fpage>6</fpage>
          <lpage>7</lpage>
        </citation>
      </ref>
      <ref id="R45">
        <label>45</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Nguyen</surname>
              <given-names>KD</given-names>
            </name>
            <name>
              <surname>Yi</surname>
              <given-names>P</given-names>
            </name>
          </person-group>
          <article-title>An improved scoring method for protein residue conservation and multiple sequence alignment</article-title>
          <source>IEEE Transactions on NanoBioscience</source>
          <year>2011</year>
          <volume>10</volume>
          <fpage>275</fpage>
          <lpage>285</lpage>
        </citation>
      </ref>
      <ref id="R46">
        <label>46</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Notredame</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Higgins</surname>
              <given-names>DG</given-names>
            </name>
          </person-group>
          <article-title>SAGA: Sequence alignment by genetic algorithm</article-title>
          <source>Nucleic Acids Res</source>
          <year>1996</year>
          <volume>24</volume>
          <fpage>1515</fpage>
          <lpage>1524</lpage>
        </citation>
      </ref>
      <ref id="R47">
        <label>47</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Notredame</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Higgins</surname>
              <given-names>DG</given-names>
            </name>
            <name>
              <surname>Heringa</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>T-Coffee: A novel method for fast and accurate multiple sequence alignment</article-title>
          <source>J Mol Biol</source>
          <year>2000</year>
          <volume>302</volume>
          <fpage>205</fpage>
          <lpage>217</lpage>
        </citation>
      </ref>
      <ref id="R48">
        <label>48</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Otman</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Jaafar</surname>
              <given-names>A</given-names>
            </name>
            <name>
              <surname>Tajani</surname>
              <given-names>C</given-names>
            </name>
          </person-group>
          <article-title>Analyzing the performance of mutation operators to solve the travelling salesman problem</article-title>
          <source>Int J Emerging Sciences</source>
          <year>2012</year>
          <volume>2</volume>
          <fpage>61</fpage>
          <lpage>67</lpage>
        </citation>
      </ref>
      <ref id="R49">
        <label>49</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Pearson</surname>
              <given-names>WR</given-names>
            </name>
          </person-group>
          <article-title>Flexible sequence similarity searching with the FASTA3 program package</article-title>
          <source>Methods Mol Biol</source>
          <year>2000</year>
          <volume>132</volume>
          <fpage>185</fpage>
          <lpage>219</lpage>
        </citation>
      </ref>
      <ref id="R50">
        <label>50</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Pei</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Grishin</surname>
              <given-names>N</given-names>
            </name>
          </person-group>
          <article-title>PROMALS: towards accurate multiple sequence alignments of distantly related proteins</article-title>
          <source>Bioinformatics</source>
          <year>2007</year>
          <volume>23</volume>
          <fpage>802</fpage>
          <lpage>808</lpage>
        </citation>
      </ref>
      <ref id="R51">
        <label>51</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Peng</surname>
              <given-names>Y</given-names>
            </name>
            <name>
              <surname>Dong</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Zheng</surname>
              <given-names>H</given-names>
            </name>
          </person-group>
          <article-title>Research on genetic algorithm based on pyramid model</article-title>
          <year>2011</year>
          <conf-name>2nd International Symposium on Intelligence Information Processing and Trusted Computing</conf-name>
          <fpage>83</fpage>
          <lpage>86</lpage>
        </citation>
      </ref>
      <ref id="R52">
        <label>52</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Pengfei</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Xuezhi</surname>
              <given-names>W</given-names>
            </name>
            <name>
              <surname>Yingshi</surname>
              <given-names>H</given-names>
            </name>
          </person-group>
          <article-title>The enhanced genetic algorithms for the optimization design</article-title>
          <year>2010</year>
          <volume>7</volume>
          <conf-name>3rd International Conference on Biomedical Engineering and Informatics</conf-name>
          <fpage>2990</fpage>
          <lpage>2994</lpage>
        </citation>
      </ref>
      <ref id="R53">
        <label>53</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Pop</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Salzberg</surname>
              <given-names>SL</given-names>
            </name>
          </person-group>
          <article-title>Bioinformatics challenges of new sequencing technology</article-title>
          <source>Trends Genet</source>
          <year>2008</year>
          <volume>24</volume>
          <fpage>142</fpage>
          <lpage>149</lpage>
        </citation>
      </ref>
      <ref id="R54">
        <label>54</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Razmara</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Deris</surname>
              <given-names>SB</given-names>
            </name>
            <name>
              <surname>Parvizpour</surname>
              <given-names>S</given-names>
            </name>
          </person-group>
          <article-title>Text-based protein structure modeling for structure comparison</article-title>
          <year>2009</year>
          <conf-name>International Conference of Soft Computing and Pattern Recognition</conf-name>
          <fpage>490</fpage>
          <lpage>496</lpage>
        </citation>
      </ref>
      <ref id="R55">
        <label>55</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Sellers</surname>
              <given-names>PH</given-names>
            </name>
          </person-group>
          <article-title>Pattern recognition in genetic sequences by mismatch density</article-title>
          <source>Bull Math Biol</source>
          <year>1984</year>
          <volume>46</volume>
          <fpage>501</fpage>
          <lpage>514</lpage>
        </citation>
      </ref>
      <ref id="R56">
        <label>56</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Smith</surname>
              <given-names>RF</given-names>
            </name>
            <name>
              <surname>Smith</surname>
              <given-names>TF</given-names>
            </name>
          </person-group>
          <article-title>Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling</article-title>
          <source>Protein Eng</source>
          <year>1992</year>
          <volume>5</volume>
          <fpage>35</fpage>
          <lpage>41</lpage>
        </citation>
      </ref>
      <ref id="R57">
        <label>57</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Taheri</surname>
              <given-names>J</given-names>
            </name>
            <name>
              <surname>Zomaya</surname>
              <given-names>AY</given-names>
            </name>
          </person-group>
          <article-title>RBT-GA: A novel metaheuristic for solving the multiple sequence alignment problem</article-title>
          <source>BMC Genomics</source>
          <year>2009</year>
          <volume>10</volume>
          <fpage>1</fpage>
          <lpage>11</lpage>
        </citation>
      </ref>
      <ref id="R58">
        <label>58</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Taylor</surname>
              <given-names>WR</given-names>
            </name>
          </person-group>
          <article-title>Protein structure comparison using SAP</article-title>
          <source>Methods Mol Biol</source>
          <year>2000</year>
          <volume>143</volume>
          <fpage>19</fpage>
          <lpage>32</lpage>
        </citation>
      </ref>
      <ref id="R59">
        <label>59</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Thompson</surname>
              <given-names>JD</given-names>
            </name>
            <name>
              <surname>Gibson</surname>
              <given-names>TJ</given-names>
            </name>
            <name>
              <surname>Plewniak</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Jeanmougin</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Higgins</surname>
              <given-names>DG</given-names>
            </name>
          </person-group>
          <article-title>The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools</article-title>
          <source>Nucleic Acids Res</source>
          <year>1997</year>
          <volume>25</volume>
          <fpage>4876</fpage>
          <lpage>4882</lpage>
        </citation>
      </ref>
      <ref id="R60">
        <label>60</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Thompson</surname>
              <given-names>JD</given-names>
            </name>
            <name>
              <surname>Higgins</surname>
              <given-names>DG</given-names>
            </name>
            <name>
              <surname>Gibson</surname>
              <given-names>TJ</given-names>
            </name>
          </person-group>
          <article-title>CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</article-title>
          <source>Nucleic Acids Res</source>
          <year>1994</year>
          <volume>22</volume>
          <fpage>4673</fpage>
          <lpage>4680</lpage>
        </citation>
      </ref>
      <ref id="R61">
        <label>61</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Thompson</surname>
              <given-names>JD</given-names>
            </name>
            <name>
              <surname>Linard</surname>
              <given-names>B</given-names>
            </name>
            <name>
              <surname>Lecompte</surname>
              <given-names>O</given-names>
            </name>
            <name>
              <surname>Poch</surname>
              <given-names>O</given-names>
            </name>
          </person-group>
          <article-title>A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives</article-title>
          <source>PLoS ONE</source>
          <year>2011</year>
          <volume>6</volume>
          <issue>3</issue>
          <fpage>e18093</fpage>
        </citation>
      </ref>
      <ref id="R62">
        <label>62</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Thompson</surname>
              <given-names>JD</given-names>
            </name>
            <name>
              <surname>Plewniak</surname>
              <given-names>F</given-names>
            </name>
            <name>
              <surname>Poch</surname>
              <given-names>O</given-names>
            </name>
          </person-group>
          <article-title>A comprehensive comparison of multiple sequence alignment programs</article-title>
          <source>Nucleic Acids Res</source>
          <year>1999</year>
          <volume>27</volume>
          <fpage>2682</fpage>
          <lpage>2690</lpage>
        </citation>
      </ref>
      <ref id="R63">
        <label>63</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Ulder</surname>
              <given-names>NLJ</given-names>
            </name>
            <name>
              <surname>Aarts</surname>
              <given-names>EHL</given-names>
            </name>
            <name>
              <surname>Bandelt</surname>
              <given-names>HJ</given-names>
            </name>
            <name>
              <surname>Van Laarhoven</surname>
              <given-names>PJM</given-names>
            </name>
            <name>
              <surname>Pesch</surname>
              <given-names>E</given-names>
            </name>
          </person-group>
          <article-title>Genetic local search algorithms for the traveling salesman problem</article-title>
          <year>1991</year>
          <volume>496</volume>
          <conf-name>Proc 1st Workshop PPSN</conf-name>
          <fpage>109</fpage>
          <lpage>116</lpage>
        </citation>
      </ref>
      <ref id="R64">
        <label>64</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Unger</surname>
              <given-names>R</given-names>
            </name>
            <name>
              <surname>Moult</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Genetic algorithms for protein folding simulations</article-title>
          <source>J Mol Biol</source>
          <year>1993</year>
          <volume>231</volume>
          <fpage>75</fpage>
          <lpage>81</lpage>
        </citation>
      </ref>
      <ref id="R65">
        <label>65</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>van Batenburg</surname>
              <given-names>FHD</given-names>
            </name>
            <name>
              <surname>Gultyaev</surname>
              <given-names>AP</given-names>
            </name>
            <name>
              <surname>Pleij</surname>
              <given-names>CWA</given-names>
            </name>
          </person-group>
          <article-title>An APL programmed genetic algorithm for the prediction of RNA secondary structure</article-title>
          <source>J Theor Biol</source>
          <year>1995</year>
          <volume>174</volume>
          <fpage>269</fpage>
          <lpage>280</lpage>
        </citation>
      </ref>
      <ref id="R66">
        <label>66</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Wei-C</surname>
              <given-names>C</given-names>
            </name>
            <name>
              <surname>Yu</surname>
              <given-names>JC</given-names>
            </name>
            <name>
              <surname>Chien</surname>
              <given-names>CC</given-names>
            </name>
            <name>
              <surname>Der</surname>
              <given-names>TL</given-names>
            </name>
            <name>
              <surname>Jan</surname>
              <given-names>MH</given-names>
            </name>
          </person-group>
          <article-title>Optimizing a map reduce module of preprocessing high-throughput DNA sequencing data</article-title>
          <year>2013</year>
          <conf-name>IEEE International Conference on Big Data</conf-name>
          <fpage>6</fpage>
          <lpage>9</lpage>
        </citation>
      </ref>
      <ref id="R67">
        <label>67</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Weiwei</surname>
              <given-names>G</given-names>
            </name>
            <name>
              <surname>Sanzheng</surname>
              <given-names>Q</given-names>
            </name>
          </person-group>
          <article-title>Multithreaded implementation of a biomolecular sequence alignment algorithm-software&#x2F;information technology</article-title>
          <year>2000</year>
          <volume>1</volume>
          <conf-name>Canadian Conference on Electrical and Computer Engineering</conf-name>
          <fpage>494</fpage>
          <lpage>498</lpage>
        </citation>
      </ref>
      <ref id="R68">
        <label>68</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Wen</surname>
              <given-names>WC</given-names>
            </name>
            <name>
              <surname>Tan</surname>
              <given-names>HT</given-names>
            </name>
          </person-group>
          <article-title>Statistical characterization of error sequences and its applications to error control</article-title>
          <year>1996</year>
          <volume>2</volume>
          <conf-name>Proceedings of Digital Signal Processing Applications</conf-name>
          <fpage>625</fpage>
          <lpage>629</lpage>
        </citation>
      </ref>
      <ref id="R69">
        <label>69</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Wong</surname>
              <given-names>WC</given-names>
            </name>
            <name>
              <surname>Maurer-Stroh</surname>
              <given-names>S</given-names>
            </name>
            <name>
              <surname>Eisenhaber</surname>
              <given-names>F</given-names>
            </name>
          </person-group>
          <article-title>More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology</article-title>
          <source>PLoS Comput Biol</source>
          <year>2010</year>
          <volume>6</volume>
          <issue>7</issue>
          <fpage>e1000867</fpage>
        </citation>
      </ref>
      <ref id="R70">
        <label>70</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Yonghua</surname>
              <given-names>H</given-names>
            </name>
            <name>
              <surname>Bin</surname>
              <given-names>M</given-names>
            </name>
            <name>
              <surname>Kaizhong</surname>
              <given-names>Z</given-names>
            </name>
          </person-group>
          <article-title>SPIDER: software for protein identification from sequence tags with de novo sequencing error</article-title>
          <year>2004</year>
          <conf-name>Proceedings of Computational Systems Bioinformatics Conference</conf-name>
          <fpage>206</fpage>
          <lpage>215</lpage>
        </citation>
      </ref>
      <ref id="R71">
        <label>71</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Zhang</surname>
              <given-names>E</given-names>
            </name>
            <name>
              <surname>Wong</surname>
              <given-names>AKC</given-names>
            </name>
          </person-group>
          <article-title>A genetic algorithm for multiple molecular sequence alignment</article-title>
          <source>Comput Applicat Biosci</source>
          <year>1997</year>
          <volume>13</volume>
          <fpage>565</fpage>
          <lpage>581</lpage>
        </citation>
      </ref>
      <ref id="R72">
        <label>72</label>
        <citation citation-type="journal">
          <person-group>
            <name>
              <surname>Zhimin</surname>
              <given-names>Zh</given-names>
            </name>
            <name>
              <surname>Zhong</surname>
              <given-names>WC</given-names>
            </name>
          </person-group>
          <article-title>Dynamic programming for protein sequence alignment</article-title>
          <source>Int BioScience Bio Technol</source>
          <year>2013</year>
          <fpage>5</fpage>
        </citation>
      </ref>
    </ref-list>
  </back>
  <floats-wrap>
    <fig id="T1" position="float">
      <label>Table 1</label>
      <caption><title>Summary of the test results of proposed method</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-t-001" />
    </fig>
    <fig id="T2" position="float">
      <label>Table 2</label>
      <caption><title>Comparison of average computation times (s) over Ref. 1, 2, 3, 4 and 5</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-t-002" />
    </fig>
    <fig id="T3" position="float">
      <label>Table 3</label>
      <caption><title>Experimental results with Ref. 1 datasets of BAliBase 2.0</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-t-003" />
    </fig>
    <fig id="T4" position="float">
      <label>Table 4</label>
      <caption><title>Experimental results with Ref. 3 datasets of BAliBase 2.0</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-t-004" />
    </fig>
    <fig id="T5" position="float">
      <label>Table 5</label>
      <caption><title>Experimental results with Ref. 2 datasets of BAliBase 2.0</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-t-005" />
    </fig>
    <fig id="T6" position="float">
      <label>Table 6</label>
      <caption><title>Performance evaluation of the proposed algorithm with hill climbing approach and randomly generated population through guide tree</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-t-006" />
    </fig>
    <fig id="F1" position="float">
      <label>Figure 1</label>
      <caption><title>Example of a multiple sequence alignment</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-001" />
    </fig>
    <fig id="F2" position="float">
      <label>Figure 2</label>
      <caption><title>One point crossover I</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-002" />
    </fig>
    <fig id="F3" position="float">
      <label>Figure 3</label>
      <caption><title>One point crossover II</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-003" />
    </fig>
    <fig id="F4" position="float">
      <label>Figure 4</label>
      <caption><title>Exchange mutation operator</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-004" />
    </fig>
    <fig id="F5" position="float">
      <label>Figure 5</label>
      <caption><title>Reverse mutation operator</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-005" />
    </fig>
    <fig id="F6" position="float">
      <label>Figure 6</label>
      <caption><title>Position mutation operator</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-006" />
    </fig>
    <fig id="F7" position="float">
      <label>Figure 7</label>
      <caption><title>Inverse mutation operator</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-007" />
    </fig>
    <fig id="F8" position="float">
      <label>Figure 8</label>
      <caption><title>Bar graph comparison result of scores between proposed and other methods over Ref. 1</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-008" />
    </fig>
    <fig id="F9" position="float">
      <label>Figure 9</label>
      <caption><title>Bar graph comparison result of scores between proposed and other methods over Ref. 1</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-009" />
    </fig>
    <fig id="F10" position="float">
      <label>Figure 10</label>
      <caption><title>Bar graph comparison result of scores between proposed and other methods over Ref. 3</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-010" />
    </fig>
    <fig id="F11" position="float">
      <label>Figure 11</label>
      <caption><title>Bar graph comparison result of scores between proposed and other methods over Ref. 3</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-011" />
    </fig>
    <fig id="F12" position="float">
      <label>Figure 12</label>
      <caption><title>Bar graph comparison result of scores between proposed and other methods over Ref. 2</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-012" />
    </fig>
    <fig id="F13" position="float">
      <label>Figure 13</label>
      <caption><title>Average score comparison between proposed and other methods over Ref. 1</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-013" />
    </fig>
    <fig id="F14" position="float">
      <label>Figure 14</label>
      <caption><title>Average score comparison between proposed and other methods over Ref. 3</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-014" />
    </fig>
    <fig id="F15" position="float">
      <label>Figure 15</label>
      <caption><title>Average score comparison between proposed and other methods over Ref. 2</title></caption>
      <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EXCLI-14-1232-g-015" />
    </fig>
  </floats-wrap>
</article>