snippetcsharpMinor
How to improve this Needleman-Wunsch implementation in C#?
Viewed 0 times
thisneedlemanimprovehowimplementationwunsch
Problem
I split my implementation of this sequence alignment algorithm in three methods. Where NeedlemanWunsch-method makes use of the ScoringFunction and the Traceback methods. Further I decided to go with two matices. One matrix is for the scoring, the second contains data to make the traceback easier.
```
public static int ScoreFunction(char a,char b,int matchScore,int mismatchScore)
{
if (a == b)
{
return matchScore;
}
else
{
return mismatchScore;
}
}
public static string[] NeedlemanWunsch(string sequenceA, string sequenceB, int gapPenalty, int matchScore, int mismatchScore)
{
#region Initialize
int[,] matrix = new int[sequenceA.Length + 1, sequenceB.Length + 1];
char[,] tracebackMatrix = new char[sequenceA.Length + 1, sequenceB.Length + 1];
matrix[0, 0] = 0;
for (int i = 1; i < sequenceA.Length + 1;i++)
{
matrix[i,0] = matrix[i-1,0] + gapPenalty;
tracebackMatrix[i,0] = 'L';
}
for (int i = 1; i < sequenceB.Length + 1; i++)
{
matrix[0, i] = matrix[0 , i - 1] + gapPenalty;
tracebackMatrix[0,i] = 'U';
}
#endregion
#region Scoring
for (int i = 1; i < sequenceA.Length + 1;i++)
{
for (int j = 1; j < sequenceB.Length + 1;j++)
{
int diagonal = matrix[i - 1, j - 1] + ScoreFunction(sequenceA[i-1],sequenceB[j-1],matchScore,mismatchScore);
int links = matrix[i - 1, j] + gapPenalty;
int oben = matrix[i, j - 1] + gapPenalty;
matrix[i, j] = Math.Max(oben,Math.Max(links, di
```
public static int ScoreFunction(char a,char b,int matchScore,int mismatchScore)
{
if (a == b)
{
return matchScore;
}
else
{
return mismatchScore;
}
}
public static string[] NeedlemanWunsch(string sequenceA, string sequenceB, int gapPenalty, int matchScore, int mismatchScore)
{
#region Initialize
int[,] matrix = new int[sequenceA.Length + 1, sequenceB.Length + 1];
char[,] tracebackMatrix = new char[sequenceA.Length + 1, sequenceB.Length + 1];
matrix[0, 0] = 0;
for (int i = 1; i < sequenceA.Length + 1;i++)
{
matrix[i,0] = matrix[i-1,0] + gapPenalty;
tracebackMatrix[i,0] = 'L';
}
for (int i = 1; i < sequenceB.Length + 1; i++)
{
matrix[0, i] = matrix[0 , i - 1] + gapPenalty;
tracebackMatrix[0,i] = 'U';
}
#endregion
#region Scoring
for (int i = 1; i < sequenceA.Length + 1;i++)
{
for (int j = 1; j < sequenceB.Length + 1;j++)
{
int diagonal = matrix[i - 1, j - 1] + ScoreFunction(sequenceA[i-1],sequenceB[j-1],matchScore,mismatchScore);
int links = matrix[i - 1, j] + gapPenalty;
int oben = matrix[i, j - 1] + gapPenalty;
matrix[i, j] = Math.Max(oben,Math.Max(links, di
Solution
Some general notes:
-
The TraceBack method does string concatenation in a tight loop. This is a big f***ing nono. Use a StringBuilder instead, and you will notice a considerable speedup. You have to understand that .net Strings are immutable. This means that a statement like
-
If you do not have a compelling reason to use 2d Arrays (
-
It may be worthwhile to convert tracebackMatrix from chars (16 bits) to bytes (8 bits)
-
The TraceBack method does string concatenation in a tight loop. This is a big f***ing nono. Use a StringBuilder instead, and you will notice a considerable speedup. You have to understand that .net Strings are immutable. This means that a statement like
strA += strB does not update strA, it creates a third string and updates the reference to strA. This means concatenation in a tight loop is very expansive. -
If you do not have a compelling reason to use 2d Arrays (
var bla = new int[5,5]), use jagged arrays (var bla = new int[5][5]). Most of the time, the .net JIT compiler can drop the bounds checks for jagged arrays, but not for 2D arrays.-
It may be worthwhile to convert tracebackMatrix from chars (16 bits) to bytes (8 bits)
Context
StackExchange Code Review Q#41652, answer score: 5
Revisions (0)
No revisions yet.