patternMinor
Why doesn't the Needleman-Wunsch algorithm find solutions that begin with a gap?
Viewed 0 times
whytheneedlemanwithgapalgorithmdoesnsolutionsthatfind
Problem
I have implemented in C++ the Needleman-Wunsch algorithm for pairwise sequence alignment using the following scores:
Given two sequences of length $m$ and $n$ the algorithm then generates the $mn$ search space row by row as per the algorithm's specifications and backtracks generating all solutions of maximum score. Given two sequences in which the first two letters overlap ie:
and
the algorithm succesfully generates all the alignments of maximum score. Howeever, given two sequences of the kind:
and
the algorithm fails to find any of the solutions that begin with a gap in sequence 1 or 2, thus failing to find the optimal solution as well:
Once I add the letter A to the second sequence the algorithm once again has no problem finding the optimal solution (which in this case begins with a match and not a gap)
Could someone explain to me why the algorithm doesn't find this subset of solutions and perhaps how I can modify it to find these solutions as well?
- +1 for match (regardless of base)
- -1 gap penalty
- -1 for a mismatch.
Given two sequences of length $m$ and $n$ the algorithm then generates the $mn$ search space row by row as per the algorithm's specifications and backtracks generating all solutions of maximum score. Given two sequences in which the first two letters overlap ie:
GCATGCUand
GATTACAthe algorithm succesfully generates all the alignments of maximum score. Howeever, given two sequences of the kind:
ACTGCUGCTAand
UGCTAATCGUthe algorithm fails to find any of the solutions that begin with a gap in sequence 1 or 2, thus failing to find the optimal solution as well:
ACTGCUGCTA-----
-----UGCTAATCGUOnce I add the letter A to the second sequence the algorithm once again has no problem finding the optimal solution (which in this case begins with a match and not a gap)
ACTGCUGCTA-----
A----UGCTAATCGUCould someone explain to me why the algorithm doesn't find this subset of solutions and perhaps how I can modify it to find these solutions as well?
Solution
Your optimal alignment seems to be an optimal local alignment, the best substring match. Needleman-Wunsch is for global alignment. With the simple program by Eddy that you can find on the internet I have determined a global score of 0. This is better than the gobal score you get: -5 (for 10 gaps and 5 matches).
Sequence X: GCATGCU
Sequence Y: GATTACA
Scoring system: 1 for match; -1 for mismatch; -1 for gap
Dynamic programming matrix:
G A T T A C A
0 -1 -2 -3 -4 -5 -6 -7
G -1 1 0 -1 -2 -3 -4 -5
C -2 0 0 -1 -2 -3 -2 -3
A -3 -1 1 0 -1 -1 -2 -1
T -4 -2 0 2 1 0 -1 -2
G -5 -3 -1 1 1 0 -1 -2
C -6 -4 -2 0 0 0 1 0
U -7 -5 -3 -1 -1 -1 0 0
Alignment:
X: GCA-TGCU
Y: G-ATTACA
Optimum alignment score: 0Code Snippets
Sequence X: GCATGCU
Sequence Y: GATTACA
Scoring system: 1 for match; -1 for mismatch; -1 for gap
Dynamic programming matrix:
G A T T A C A
0 -1 -2 -3 -4 -5 -6 -7
G -1 1 0 -1 -2 -3 -4 -5
C -2 0 0 -1 -2 -3 -2 -3
A -3 -1 1 0 -1 -1 -2 -1
T -4 -2 0 2 1 0 -1 -2
G -5 -3 -1 1 1 0 -1 -2
C -6 -4 -2 0 0 0 1 0
U -7 -5 -3 -1 -1 -1 0 0
Alignment:
X: GCA-TGCU
Y: G-ATTACA
Optimum alignment score: 0Context
StackExchange Computer Science Q#50035, answer score: 4
Revisions (0)
No revisions yet.