patternpythonMinor
Find the longest common subsequence algorithm - low speed
Viewed 0 times
thelowsubsequencealgorithmspeedfindlongestcommon
Problem
I've designed an algorithm to find the longest common subsequence.
These are steps:
-
Pick the first letter in the first string.
-
Look for it in the second string and if its found, Add that letter to
compare the length of
and if its greater, asign its value to
-
Return to the first string and pick the next letter and repeat the
previous step again, But this time start searching from
-
Repeat this process until there is no letter in the first string to
pick. At the end the value of
Subsequence.
This is an example:
Repeat the previous steps until reaching the end of the first string. In the end the value of
The complexity of this algorithm is \$\theta(n*m)\$.
I implemented it on two methods. The second one is using a hash table, but after implementation I found it's much slower compared to the first algorithm. I can't understand why.
The first algorithm:
```
import time
def lcs(xstr, ystr):
if not (xstr and ystr): return # if string is empty
lcs = [''] # longest common subsequence
lcslen = 0 # length of longest common subsequence so far
for i in xrange(len(xstr)):
cs = '' # common subsequenc
These are steps:
-
Pick the first letter in the first string.
-
Look for it in the second string and if its found, Add that letter to
common_subsequence and store its position in index, Otherwisecompare the length of
common_subsequence with the length of lcsand if its greater, asign its value to
lcs.-
Return to the first string and pick the next letter and repeat the
previous step again, But this time start searching from
indexth letter-
Repeat this process until there is no letter in the first string to
pick. At the end the value of
lcs is the Longest CommonSubsequence.
This is an example:
X=A, B, C, B, D, A, B
Y=B, D, C, A, B, A- Pick
Ain the first string.
- Look for
AinY.
- Now that there is an
Ain the second string, append it tocommon_subsequence.
- Return to the first string and pick the next letter that is
B.
- Look for
Bin the second string this time starting from the position ofA.
- There is a
BafterAso append B tocommon_subsequence.
- Now pick the next letter in the first string that is
C. There isn't aCnext toBin the second string. So assign the value of common_subsequence tolcsbecause its length is greater than the length oflcs.
Repeat the previous steps until reaching the end of the first string. In the end the value of
lcs is the Longest Common Subsequence.The complexity of this algorithm is \$\theta(n*m)\$.
I implemented it on two methods. The second one is using a hash table, but after implementation I found it's much slower compared to the first algorithm. I can't understand why.
The first algorithm:
```
import time
def lcs(xstr, ystr):
if not (xstr and ystr): return # if string is empty
lcs = [''] # longest common subsequence
lcslen = 0 # length of longest common subsequence so far
for i in xrange(len(xstr)):
cs = '' # common subsequenc
Solution
Firstly, your algorithm is incorrect try:
Secondly your algorithm is
Your second version attempts to optimize the process of searching through the string by using a list of pre-calculated positions. This means you don't have to scan through all the positions in the string with different characters. However, you lose the ability to skip all the position before your starting index. For long strings with few distinct characters, you end up losing out.
lcs("AAAABCC","AAAACCB"), the LCS should be "AAAACC", but your algorithm finds "AAAAB".Secondly your algorithm is
O(n^2m) not O(nm). Since you don't elaborate as to why you think your algorithm is theta(n*m) I can't really guess where your analysis has gone wrong.Your second version attempts to optimize the process of searching through the string by using a list of pre-calculated positions. This means you don't have to scan through all the positions in the string with different characters. However, you lose the ability to skip all the position before your starting index. For long strings with few distinct characters, you end up losing out.
Context
StackExchange Code Review Q#20337, answer score: 3
Revisions (0)
No revisions yet.