HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Why doesn't the Needleman-Wunsch algorithm find solutions that begin with a gap?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
whytheneedlemanwithgapalgorithmdoesnsolutionsthatfind

Problem

I have implemented in C++ the Needleman-Wunsch algorithm for pairwise sequence alignment using the following scores:

  • +1 for match (regardless of base)



  • -1 gap penalty



  • -1 for a mismatch.



Given two sequences of length $m$ and $n$ the algorithm then generates the $mn$ search space row by row as per the algorithm's specifications and backtracks generating all solutions of maximum score. Given two sequences in which the first two letters overlap ie:

GCATGCU


and

GATTACA


the algorithm succesfully generates all the alignments of maximum score. Howeever, given two sequences of the kind:

ACTGCUGCTA


and

UGCTAATCGU


the algorithm fails to find any of the solutions that begin with a gap in sequence 1 or 2, thus failing to find the optimal solution as well:

ACTGCUGCTA-----
-----UGCTAATCGU


Once I add the letter A to the second sequence the algorithm once again has no problem finding the optimal solution (which in this case begins with a match and not a gap)

ACTGCUGCTA-----
A----UGCTAATCGU


Could someone explain to me why the algorithm doesn't find this subset of solutions and perhaps how I can modify it to find these solutions as well?

Solution

Your optimal alignment seems to be an optimal local alignment, the best substring match. Needleman-Wunsch is for global alignment. With the simple program by Eddy that you can find on the internet I have determined a global score of 0. This is better than the gobal score you get: -5 (for 10 gaps and 5 matches).

Sequence X: GCATGCU
Sequence Y: GATTACA
Scoring system: 1 for match; -1 for mismatch; -1 for gap

Dynamic programming matrix:
             G    A    T    T    A    C    A 
        0   -1   -2   -3   -4   -5   -6   -7 
   G   -1    1    0   -1   -2   -3   -4   -5 
   C   -2    0    0   -1   -2   -3   -2   -3 
   A   -3   -1    1    0   -1   -1   -2   -1 
   T   -4   -2    0    2    1    0   -1   -2 
   G   -5   -3   -1    1    1    0   -1   -2 
   C   -6   -4   -2    0    0    0    1    0 
   U   -7   -5   -3   -1   -1   -1    0    0 

Alignment:
X: GCA-TGCU
Y: G-ATTACA
Optimum alignment score: 0

Code Snippets

Sequence X: GCATGCU
Sequence Y: GATTACA
Scoring system: 1 for match; -1 for mismatch; -1 for gap

Dynamic programming matrix:
             G    A    T    T    A    C    A 
        0   -1   -2   -3   -4   -5   -6   -7 
   G   -1    1    0   -1   -2   -3   -4   -5 
   C   -2    0    0   -1   -2   -3   -2   -3 
   A   -3   -1    1    0   -1   -1   -2   -1 
   T   -4   -2    0    2    1    0   -1   -2 
   G   -5   -3   -1    1    1    0   -1   -2 
   C   -6   -4   -2    0    0    0    1    0 
   U   -7   -5   -3   -1   -1   -1    0    0 

Alignment:
X: GCA-TGCU
Y: G-ATTACA
Optimum alignment score: 0

Context

StackExchange Computer Science Q#50035, answer score: 4

Revisions (0)

No revisions yet.