S.A.P.

SEQUENCES ALIGNMENT PROGRAM

Created by Ariadna Escalona and Anna Travesa

This program uses a dynamic programming algorithm (the Needleman-Wunsch global alignment algorithm) to align any two protein sequences given in fasta format. It lets the user choose among four substitution matrices (PAM30, PAM70, BLOSUM62 and BLOSUM80) and returns the optimal alignment, together with its score, in fasta format.

Script


The program reads the two sequences and stores each one in a vector in which each position holds one amino acid. At the same time, it builds a hash from the values of the chosen substitution matrix, so that the score for any pair of amino acids can be looked up directly. The matrix should be chosen according to how closely related the two sequences are: PAM30 or BLOSUM80 for closely related sequences, and PAM70 or BLOSUM62 for more distantly related ones.
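The two input steps above (sequences into vectors, matrix into a hash) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function names `read_fasta` and `read_matrix` are hypothetical, the matrix is assumed to be in the common whitespace-separated text layout (a header row of residue letters, then one row per residue), and `TOY_MATRIX` is a made-up three-letter matrix, not PAM or BLOSUM.

```python
def read_fasta(text):
    """Return the residues of the first record in a fasta string as a list."""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if residues:
                break  # a second header: only the first record is used
            continue   # skip the header line of the first record
        residues.extend(line.upper())
    return residues


def read_matrix(text):
    """Parse a whitespace-separated substitution matrix into a dict keyed by (a, b)."""
    scores = {}
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue              # skip blank lines and comments
        if columns is None:
            columns = fields      # header row: the column residue letters
            continue
        row, values = fields[0], fields[1:]
        for col, value in zip(columns, values):
            scores[(row, col)] = int(value)
    return scores


# Hypothetical toy matrix, used only to illustrate the layout.
TOY_MATRIX = """
# toy matrix, NOT a real PAM or BLOSUM matrix
   A  C  G
A  2 -1  0
C -1  3 -2
G  0 -2  2
"""

seq = read_fasta(">seq1 example\nACG\nCA\n")
sub = read_matrix(TOY_MATRIX)
```

A dict keyed by residue pairs plays the role of the hash described above: each substitution score is a single lookup such as `sub[("A", "C")]`.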

Next, an alignment matrix is built. Its first cell is set to zero, and every other cell holds a partial alignment score calculated with the following recurrence:

$$
S_{i,j} = \max \begin{cases}
S_{i-1,j-1} + s(i,j) \\
S_{i-1,j} + g \\
S_{i,j-1} + g
\end{cases}
$$

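where $S_{i,j}$ is the score in the cell at row $i$ and column $j$, $s(i,j)$ is the substitution matrix score for the i-th amino acid of sequence 1 and the j-th amino acid of sequence 2, and $g$ is the gap penalty (a negative value, since it is added to the score).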
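As a worked illustration of the recurrence, assume a gap penalty $g = -8$ and a substitution score $s(1,1) = 4$ (both values are hypothetical, chosen only for the arithmetic). Filling the first row and column from the zero in the first cell gives $S_{1,0} = S_{0,1} = -8$, and the first inner cell is then:

$$
S_{1,1} = \max \begin{cases}
S_{0,0} + s(1,1) = 0 + 4 = 4 \\
S_{0,1} + g = -8 + (-8) = -16 \\
S_{1,0} + g = -8 + (-8) = -16
\end{cases} = 4
$$

The maximum comes from the diagonal cell, so that is the origin recorded for $S_{1,1}$.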
These three options correspond to the three cells from which a cell can be reached: the first comes from the diagonal cell and aligns two amino acids, the second comes from the cell above and aligns an amino acid of sequence 1 with a gap, and the third comes from the cell to the left and aligns an amino acid of sequence 2 with a gap. Cells in the first column can only be reached from above, so they are filled in with $S_{i-1,j} + g$; cells in the first row can only be reached from the left, so they are filled in with $S_{i,j-1} + g$. For every other cell all three options are possible, so the program calculates the three candidate scores, keeps only the maximum, and records which cell it came from: 'D' (diagonal), 'U' (up) or 'L' (left).
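The filling step can be sketched in Python as below. This is an illustrative sketch rather than the program's actual code: `fill_matrix` and `TOY_SUB` are hypothetical names, `TOY_SUB` is a made-up +1/-1 scoring scheme standing in for a real substitution matrix, and the tie-breaking rule (prefer 'D', then 'U', then 'L') is an assumption, since the text does not say how ties are resolved.

```python
def fill_matrix(seq1, seq2, sub, gap):
    """Fill the score matrix S and the pointer matrix for a global alignment.

    seq1 indexes the rows (i) and seq2 the columns (j); sub maps (a, b) to a
    substitution score; gap is the (negative) gap penalty g.
    """
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]   # S[0][0] = 0
    ptr = [[None] * (m + 1) for _ in range(n + 1)]  # 'D', 'U' or 'L' per cell

    # First column: only reachable from the cell above.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
        ptr[i][0] = "U"
    # First row: only reachable from the cell to the left.
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
        ptr[0][j] = "L"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1] + sub[(seq1[i - 1], seq2[j - 1])], "D"),
                (score[i - 1][j] + gap, "U"),
                (score[i][j - 1] + gap, "L"),
            ]
            # max() returns the first maximal candidate, so ties prefer D, then U, then L.
            score[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])
    return score, ptr


# Hypothetical toy scoring: +1 for identical residues, -1 otherwise.
TOY_SUB = {(a, b): (1 if a == b else -1) for a in "ACGT" for b in "ACGT"}

score, ptr = fill_matrix("AC", "A", TOY_SUB, -2)
```

The optimal global alignment score is the value in the last cell, `score[len(seq1)][len(seq2)]`.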
The last step is to trace the path back from the last cell of the matrix, following the recorded directions until the first cell is reached. If a cell contains 'D', the program aligns the two amino acids and prints '*' if they are identical, ':' if their value in the chosen substitution matrix is greater than 0, and ' ' otherwise.
If a cell contains 'U', the i-th amino acid of sequence 1 is aligned with a gap; if it contains 'L', the j-th amino acid of sequence 2 is aligned with a gap. In both cases the program prints ' '.
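A sketch of this traceback in Python follows. It assumes a pointer matrix in which `ptr[i][j]` holds 'D', 'U' or 'L' as described above, with sequence 1 on the rows; the function name `traceback` and the use of '-' as the gap character are assumptions, not taken from the original program. The usage example builds a small pointer matrix by hand.

```python
def traceback(seq1, seq2, ptr, sub):
    """Follow the pointers from the last cell back to (0, 0).

    Returns the two aligned rows and the match line: '*' for identical
    residues, ':' for a positive substitution score, ' ' otherwise and
    for gaps. The gap character '-' is an assumption.
    """
    top, middle, bottom = [], [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        step = ptr[i][j]
        if step == "D":          # align the i-th and j-th residues
            a, b = seq1[i - 1], seq2[j - 1]
            top.append(a)
            bottom.append(b)
            if a == b:
                middle.append("*")
            elif sub[(a, b)] > 0:
                middle.append(":")
            else:
                middle.append(" ")
            i, j = i - 1, j - 1
        elif step == "U":        # i-th residue of sequence 1 against a gap
            top.append(seq1[i - 1])
            bottom.append("-")
            middle.append(" ")
            i -= 1
        else:                    # "L": j-th residue of sequence 2 against a gap
            top.append("-")
            bottom.append(seq2[j - 1])
            middle.append(" ")
            j -= 1
    # The path was followed backwards, so reverse each row.
    return "".join(reversed(top)), "".join(reversed(middle)), "".join(reversed(bottom))


# Hand-built pointer matrix for seq1 = "AC" (rows) and seq2 = "A" (columns).
example_ptr = [[None, "L"],
               ["U", "D"],
               ["U", "U"]]
rows = traceback("AC", "A", example_ptr, {("A", "A"): 1, ("C", "A"): -1})
```

Here `rows` is `("AC", "* ", "A-")`: the two A's are identical and marked '*', and C is aligned with a gap.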
The program returns this alignment in fasta format with its corresponding score.
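The exact output layout is not given here, so the following is only a hypothetical sketch of writing an alignment as two fasta records followed by the score. The function names, the 60-character line width and the `Score:` line are all assumptions made for illustration.

```python
def to_fasta(name, aligned, width=60):
    """Format one aligned sequence (gaps included) as a fasta record."""
    lines = [">" + name]
    for start in range(0, len(aligned), width):
        lines.append(aligned[start:start + width])
    return "\n".join(lines)


def alignment_report(name1, top, name2, bottom, score):
    """Two fasta records followed by the alignment score (hypothetical layout)."""
    return "\n".join([to_fasta(name1, top), to_fasta(name2, bottom), "Score: " + str(score)])


report = alignment_report("seq1", "AC", "seq2", "A-", -1)
```

Keeping the gap characters inside the fasta records means both aligned sequences have the same length and can be read back position by position.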

Program


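The web version of the program asks the user to:
  1. Paste sequence 1 in fasta format.
  2. Paste sequence 2 in fasta format.
  3. Choose the substitution matrix.
  4. Choose the gap penalty, either from the values offered or by entering another one.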


The program can also be downloaded to run at the Linux terminal, and it is available in html format as well.


For questions or further information, send an email to the authors:

anna.travesa01@campus.upf.edu
ariadna.escalona01@campus.upf.edu


Universitat Pompeu Fabra 2002