Tuesday, April 2, 2019
Sequence Alignment and Dynamic Programming
Introduction

Sequence alignment

Sequence alignment is a standard method for comparing two or more sequences by looking for a series of individual characters or character patterns that are in the same order in the sequences [1]. It is also a way of arranging two or more sequences of characters to recognize regions of similarity [2].

Importance of sequence alignment

Sequence alignment is significant because in biomolecular sequences (DNA, RNA, or protein), high sequence similarity usually implies important functional or structural similarity, which is the first step of many biological analyses [3]. Sequence alignment can also address significant questions such as detecting gene sequences that cause disease or susceptibility to disease, identifying changes in gene sequences that cause evolution, finding relationships between various gene sequences that can indicate common ancestry [4], detecting functionally important sites, and demonstrating mutation events [5].

Analysis of the alignment can reveal important information. If the proteins are involved in similar processes, it is feasible to identify the parts of the sequences that are likely to be important for the function. Random mutations accumulate more easily in the parts of a protein sequence that are not essential for its function. In the parts of the sequence that are essential for the function, hardly any mutations are accepted, because nearly all changes in such regions destroy the function [6]. Moreover, sequence alignment is important for assigning function to unknown proteins [7]. An alignment of two residues in a protein alignment implies that those residues perform similar roles in the two different proteins [8].

Methods

The main purpose of sequence alignment methods is to find the maximum degree of similarity and the minimum evolutionary distance.
Generally, computational approaches to sequence alignment problems can be divided into two categories: global alignments and local alignments. Global alignments traverse the entire length of all query sequences and match as many characters as possible from end to end. These methods are most useful when the sequences have approximately the same size or are similar. The alignment is performed from the beginning to the end of the sequences to find the best possible alignment. Local alignments, on the other hand, find local regions with a high level of similarity. They are more useful for sequences that are suspected to contain regions of similarity within their larger sequence context [9].

Pairwise sequence alignment is used to find the regions of similarity between two sequences. As the number of sequences increases, comparing each and every sequence to every other may become impossible. So we need multiple sequence alignment, where all similar sequences can be compared in one single figure or table. The basic idea is that the sequences are aligned on top of each other, so that a coordinate system is set up in which each row is the sequence for one protein and each column is the same position in each sequence [10].

There are many different approaches to, and implementations of, sequence alignment. These include dynamic programming, heuristic algorithms (BLAST and FASTA similarity searching), probabilistic methods, dot-matrix methods, progressive methods, ClustalW, MUSCLE, T-Coffee, and DIALIGN.

Dynamic programming

Dynamic programming (DP) is a problem-solving method for a class of problems that can be solved by dividing them into simpler sub-problems. It finds the alignment by assigning scores to matches and mismatches (scoring matrices). This method is widely used in sequence alignment problems [11]. However, when there are more than two sequences, multi-dimensional dynamic programming is infeasible because of its large storage and computational requirements [16].

Dynamic programming algorithms use gap penalties to increase the biological meaning of an alignment [9]. There are different gap penalties, such as linear gap, constant gap, gap open, and gap extension. The gap score is a penalty given to the alignment when there is an insertion or deletion. During evolution there may be cases where a run of consecutive gaps appears along the sequence, so the linear gap penalty would not be appropriate for the alignment. Therefore, a gap opening penalty and a gap extension penalty were introduced for consecutive gaps. The gap opening penalty is applied at the start of the gap, and each gap position that follows it is given a gap extension penalty, which is smaller than the opening penalty. Different gap penalty functions require different dynamic programming algorithms [12].

A substitution matrix is also used to score alignments. The most commonly used predefined scoring matrices for sequence alignment are PAM (Point Accepted Mutation) and BLOSUM (Blocks Substitution Matrix).

Two algorithms, Smith-Waterman for local alignment and Needleman-Wunsch for global alignment, are based on dynamic programming. The Needleman-Wunsch algorithm requires the alignment score for a pair of residues to be greater than or equal to zero. No gap penalty is required, and the score cannot decrease between two cells of a pathway. Smith-Waterman requires a gap penalty to work efficiently.
Its residue alignment score may be positive or negative, and the score can increase, decrease, or stay level between two cells of a pathway [13].

Sequence Alignment Problems

For an n-character sequence s and an m-character sequence t, we construct an (n+1) x (m+1) matrix F.

Global alignment: F(i, j) = score of the best alignment of s[1..i] with t[1..j]
Local alignment: F(i, j) = score of the best alignment of a suffix of s[1..i] and a suffix of t[1..j]

There are three steps in the sequence alignment algorithms:

Initialization
In the initialization phase, we assign values to the first row and column of the alignment matrix. The next step of the algorithm depends on these values.

Fill
In the fill stage, the entire matrix is filled with scores from top to bottom and left to right, with values that depend on the gap penalties and the scoring matrix.

Trace back
For each F(i, j), we save a pointer to the cell that produced the best score. For global alignment, we follow the pointers back from F(n, m) to F(0, 0) to recover the alignment. For local alignment, we look for the maximum value of F(i, j), which can be anywhere in the matrix. We trace pointers back from that cell and stop when we reach a cell with value 0.

Local alignment with scoring matrix

After creating and initializing the alignment matrix (F) and the trace back matrix, the score F(i, j) of every cell is calculated as follows:

    for i = 1 to n:
        for j = 1 to m:
            left_score = F[i][j-1] - gap
            diagonal_score = F[i-1][j-1] + PAM250(s_i, t_j)
            up_score = F[i-1][j] - gap
            scores = [0, left_score, diagonal_score, up_score]
            F[i][j] = max(scores)

We should also keep a reference to the chosen predecessor of each cell in order to perform backtracking:

            traceback_matrix[i][j] = scores.index(F[i][j])

After filling the F matrix, we find the best alignment score and the optimal end point by finding the highest-scoring cell, the maximum of F(i, j) over all i and j.
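The fill, highest-scoring-cell, and trace back steps described above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: simple_score (match +2, mismatch -1) is a hypothetical stand-in for a real substitution matrix such as PAM250, the gap penalty is linear, and the function name smith_waterman is chosen here for illustration only.

```python
def simple_score(a, b):
    # Hypothetical stand-in for a substitution matrix such as PAM250.
    return 2 if a == b else -1


def smith_waterman(s, t, score=simple_score, gap=1):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_s, aligned_t).
    """
    n, m = len(s), len(t)
    # First row and column stay at 0 for local alignment.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Pointer codes: 0 = stop, 1 = left, 2 = diagonal, 3 = up.
    trace = [[0] * (m + 1) for _ in range(n + 1)]
    # best_score starts at 0 here, the score of the empty alignment.
    best_score, best_i, best_j = 0, 0, 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            scores = [
                0,
                F[i][j - 1] - gap,                            # left
                F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # diagonal
                F[i - 1][j] - gap,                            # up
            ]
            F[i][j] = max(scores)
            trace[i][j] = scores.index(F[i][j])
            if F[i][j] > best_score:
                best_score, best_i, best_j = F[i][j], i, j

    # Trace back from the highest-scoring cell until a cell with score 0.
    aligned_s, aligned_t = [], []
    i, j = best_i, best_j
    while F[i][j] > 0:
        move = trace[i][j]
        if move == 2:
            aligned_s.append(s[i - 1])
            aligned_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif move == 1:
            aligned_s.append("-")
            aligned_t.append(t[j - 1])
            j -= 1
        else:
            aligned_s.append(s[i - 1])
            aligned_t.append("-")
            i -= 1
    return best_score, "".join(reversed(aligned_s)), "".join(reversed(aligned_t))
```

For example, smith_waterman("TTTACGTAAA", "CCCACGTGGG") picks out the shared region ACGT and ignores the unrelated flanking characters, which is the behaviour that distinguishes local from global alignment.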
To track the highest-scoring cell, best_score is initialized to -1 and updated as each cell is filled:

    if F[i][j] > best_score:
        best_score = F[i][j]
        i_maximum_score, j_maximum_score = i, j

To recover the optimal alignment, we trace back from position (i_maximum_score, j_maximum_score), terminating the trace back when we reach a cell with score 0.

The time and space complexity of this algorithm is O(mn), where n is the length of sequence s and m is the length of sequence t.

Local alignment with affine gap penalty

For this problem, there are a gap opening penalty and a gap extension penalty. The gap opening penalty is applied at the start of the gap, and each gap position that follows it is given a gap extension penalty.

Initialization

There are four different matrices: up_score, left_score, m_score, and trace_back.

Filling matrix

    for i = 1 to n:
        up_score[i][0] = -gap_opening_penalty - (i-1)*gap_extension_penalty
    for j = 1 to m:
        left_score[0][j] = -gap_opening_penalty - (j-1)*gap_extension_penalty
    for i = 1 to n:
        for j = 1 to m:
            up_score[i][j] = max(up_score[i-1][j] - gap_extension_penalty,
                                 m_score[i-1][j] - gap_opening_penalty)
            left_score[i][j] = max(left_score[i][j-1] - gap_extension_penalty,
                                   m_score[i][j-1] - gap_opening_penalty)
            scores = [left_score[i-1][j-1], m_score[i-1][j-1], up_score[i-1][j-1], 0]
            m_score[i][j] = BLOSUM62(s_i, t_j) + max(scores)

Here up_score holds alignments that end with a vertical move (a gap in t) and left_score those that end with a horizontal move (a gap in s), matching the use of up and left in the previous problem. The 0 entry in scores allows a local alignment to start at any cell.

We find the highest-scoring cell, the position of that cell, and the best alignment by following the same steps as in the previous problem.

The time and space complexity of this algorithm is O(mn).

Global alignment with constant gap penalty

In this case every gap receives a fixed score, regardless of the gap length.

    for i = 1 to n:
        alignment_matrix[i][0] = -gap_penalty
    for j = 1 to m:
        alignment_matrix[0][j] = -gap_penalty
    for i = 1 to n:
        for j = 1 to m:
            scores = [alignment_matrix[i][j-1] - gap_penalty,
                      alignment_matrix[i-1][j] - gap_penalty,
                      alignment_matrix[i-1][j-1] + BLOSUM62(s_i, t_j)]
            alignment_matrix[i][j] = max(scores)

alignment_matrix[n][m] holds the optimal alignment score. Note that this single-matrix fill charges gap_penalty for every gap symbol in the interior of the matrix; to charge each gap exactly once regardless of its length, the three-matrix affine recurrences can be used with gap_extension_penalty = 0.

The time and space complexity of this algorithm is O(mn), where n is the length of sequence s and m is the length of sequence t.

Global alignment with scoring matrix

In this problem there is a linear gap penalty: each inserted or deleted symbol is charged g. As a result, if the length of the gap is L, the total gap penalty is the product of the two, gL.

    for i = 1 to n:
        alignment_matrix[i][0] = -i*gap_penalty
    for j = 1 to m:
        alignment_matrix[0][j] = -j*gap_penalty
    for i = 1 to n:
        for j = 1 to m:
            scores = [alignment_matrix[i][j-1] - gap_penalty,
                      alignment_matrix[i-1][j] - gap_penalty,
                      alignment_matrix[i-1][j-1] + BLOSUM62(s_i, t_j)]
            alignment_matrix[i][j] = max(scores)

alignment_matrix[n][m] holds the optimal alignment score.

The time and space complexity of this algorithm is O(mn), where n is the length of sequence s and m is the length of sequence t.

Global alignment with scoring matrix and affine gap penalty

There are four different matrices: up_score, left_score, m_score, and trace_back.

Filling matrix

    for i = 1 to n:
        up_score[i][0] = -gap_opening_penalty - (i-1)*gap_extension_penalty
    for j = 1 to m:
        left_score[0][j] = -gap_opening_penalty - (j-1)*gap_extension_penalty
    for i = 1 to n:
        for j = 1 to m:
            up_score[i][j] = max(up_score[i-1][j] - gap_extension_penalty,
                                 m_score[i-1][j] - gap_opening_penalty)
            left_score[i][j] = max(left_score[i][j-1] - gap_extension_penalty,
                                   m_score[i][j-1] - gap_opening_penalty)
            m_score[i][j] = BLOSUM62(s_i, t_j) + max(m_score[i-1][j-1],
                                                     left_score[i-1][j-1],
                                                     up_score[i-1][j-1])
    maximum_alignment_score = max(m_score[n][m], left_score[n][m], up_score[n][m])

For the global case, m_score[0][0] is 0 and every other boundary cell not set above is treated as negative infinity, so that an alignment cannot start anywhere except the top-left corner.

The time and space complexity of this algorithm is O(mn), where n is the length of sequence s and m is the length of sequence t.

The above algorithms require too much time for searching large databases, so they cannot be used for that purpose. There are several methods to overcome this problem.

Heuristic Method

A heuristic method is an algorithm that gives only an approximate result for a problem. Sometimes we are not able to formally prove that its solution actually solves the problem, but since heuristic methods are much faster than exact algorithms, they are commonly used.
FASTA is a heuristic method for sequence alignment. The main idea of this method is to choose regions of the two sequences that have some degree of similarity, and to use dynamic programming to compute a local alignment in these regions. The disadvantage of these methods is a significant loss of sensitivity. Parallelization is a possible solution to this problem [14].

Parallel algorithm

In [15], a parallel method is introduced to reduce the running time of the dynamic programming algorithm for pairwise sequence alignment. The time consumption of the sequential algorithm depends mainly on the computation of the score matrix. The computation of F(i,j) can be started only when F(i-1,j-1), F(i-1,j), and F(i,j-1) have acquired their values. Consequently, the score matrix can be computed in order of anti-diagonals, and the values in the same anti-diagonal can be calculated simultaneously (Figure 1).

Figure 1. Computing the score matrix in a parallel manner. The values of the marked cells can be computed simultaneously.

There are two models for applying the parallel method that improve the performance of the pairwise alignment algorithm.

Pipeline model: Each row of the score matrix is computed successively by a processor, which blocks itself until the required values in the row above are computed.

Anti-diagonal model: From the top-left corner to the bottom-right corner of the score matrix, all processors compute concurrently along an anti-diagonal of the matrix. Each idle processor selects a cell from the current anti-diagonal and computes its value. When all values in the current anti-diagonal are computed, the computation moves on to the next anti-diagonal.

In the algorithm based on the pipeline model, the score matrix is partitioned into several blocks by column and several bands by row.
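The dependency structure described above, in which F(i,j) needs only F(i-1,j-1), F(i-1,j), and F(i,j-1), can be illustrated with a small Python sketch of the anti-diagonal ordering. It is an illustration under stated assumptions, not the method of [15]: it runs sequentially, uses a global alignment with a linear gap penalty for concreteness, and the names simple_score, fill_row_major, and fill_by_antidiagonals are hypothetical. All values of one anti-diagonal are computed before any of them is written back, which shows that they do not depend on each other and could be handed to separate processors.

```python
def simple_score(a, b):
    # Hypothetical stand-in for a substitution matrix.
    return 2 if a == b else -1


def init_matrix(n, m, gap):
    # Global alignment boundary: i or j leading gaps cost i*gap or j*gap.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * gap
    for j in range(1, m + 1):
        F[0][j] = -j * gap
    return F


def cell_value(F, s, t, i, j, score, gap):
    # Depends only on the left, diagonal, and upper neighbours.
    return max(
        F[i][j - 1] - gap,
        F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
        F[i - 1][j] - gap,
    )


def fill_row_major(s, t, score=simple_score, gap=1):
    # Usual sequential order: top to bottom, left to right.
    n, m = len(s), len(t)
    F = init_matrix(n, m, gap)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = cell_value(F, s, t, i, j, score, gap)
    return F


def fill_by_antidiagonals(s, t, score=simple_score, gap=1):
    # Anti-diagonal d contains the interior cells with i + j == d.
    n, m = len(s), len(t)
    F = init_matrix(n, m, gap)
    for d in range(2, n + m + 1):
        cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
        # Every value below reads only anti-diagonals d-1 and d-2, so this
        # list could be computed by different processors at the same time.
        values = [cell_value(F, s, t, i, j, score, gap) for i, j in cells]
        for (i, j), value in zip(cells, values):
            F[i][j] = value
    return F
```

Both functions produce the same matrix; only the order in which cells are visited differs. In a real parallel implementation, the computation of values would be distributed over the available processors, with a synchronization point at the end of each anti-diagonal.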
All the bands of the partitioned score matrix are distributed to multiple processors, and each processor computes the blocks in its own band simultaneously.

By applying the parallel algorithm, the time complexity is O(n) when n processors are used [15].

Progressive Method

For solving multiple sequence alignment problems, the most commonly used algorithm is the progressive method. This algorithm consists of three main stages. The first stage compares all the sequences with each other and produces similarity scores (a distance matrix). This stage can be parallelized. The second stage groups the most similar sequences together, using the similarity scores and a clustering method such as Neighbor-Joining, to create a guide tree. Finally, the third stage sequentially aligns the most similar sequences and groups of sequences until all the sequences are aligned. Before alignment with a pairwise dynamic programming algorithm, groups of aligned sequences are converted into profiles. A profile represents the character frequencies for each column in an alignment. In the final stage, aligning groups of sequences requires the trace back information from a full pairwise alignment [17].

ClustalW

This algorithm, which has become the most popular for multiple sequence alignment, implements the progressive method. For N sequences of length L, the time complexity of this method is O(N^4 + L^2) and the space complexity is O(N^2 + L^2) [18].

Conclusion

By comparing the different methods for pairwise sequence alignment and multiple sequence alignment, we can conclude that parallel algorithms that implement the pipeline model or the anti-diagonal model are effective for performing pairwise sequence alignments. Algorithms that implement the progressive method, such as ClustalW, are effective for solving multiple sequence alignment problems.

References

[1] Robert F. Murphy, Computational Biology, Carnegie Mellon University, www.cmu.edu/bio//LecturesPart03.ppt
[2] http://en.wikipedia.org/wiki/Sequence_alignment
[3] Dan Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology (Cambridge University Press, 1997).
[4] http://cs.calvin.edu/activities/blasted/intro03.html
[5] http://www.embl.de/seqanal/courses/commonCourseContent/commonMsaExercises.html
[6] Per Kraulis, Stockholm Bioinformatics Center, SBC, http://www.avatar.se/molbioinfo2001/seqali-why.html
[7] http://iitb.vlab.co.in/?sub=41brch=118sim=656cnt=1
[8] Andreas D. Baxevanis, B. F. Francis Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins
[9] http://amrita.vlab.co.in/?sub=3brch=274sim=1433cnt=1
[10] David S. Moss, Sibila Jelaska, Sandor Pongor, Essays in Bioinformatics, ISBN 1-58603-539-8
[11] http://amrita.vlab.co.in/?sub=3brch=274sim=1431cnt=1
[12] Burr Settles, Sequence Alignment, IBS Summer Research Program 2008, http://pages.cs.wisc.edu/bsettles/ibs08/lectures/02-alignment.pdf
[13] Aoife McLysaght, Biological Sequence Comparison/Database Homology Searching, The University of Dublin, http://www.maths.tcd.ie/lily/pres2/sld001.htm
[14] Rapid alignment methods: FASTA and BLAST, http://www.cs.helsinki.fi/bioinformatiikka/mbi/courses/07-08/itb/slides/itb0708_slides_83-116.pdf
[15] Yang Chen, Songnian Yu, Ming Ling, Parallel Sequence Alignment Algorithm For Clustering System, School of Computer Engineering and Science, Shanghai University
[16] Heitor S. Lopes, Carlos R. Erig Lima, Guilherme L. Moritz, A Parallel Algorithm for Large-Scale Multiple Sequence Alignment, Bioinformatics Laboratory/CPGEI, Federal University of Technology Paraná
[17] Scott Lloyd, Quinn O. Snell, Accelerated large-scale multiple sequence alignment
[18] Kridsadakorn Chaichoompu, Surin Kittitornkun, and Sissades Tongsima, MT-ClustalW: Multithreading Multiple Sequence Alignment