HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Find the longest common subsequence algorithm - low speed

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thelowsubsequencealgorithmspeedfindlongestcommon

Problem

I've designed an algorithm to find the longest common subsequence.

These are steps:

-
Pick the first letter in the first string.

-
Look for it in the second string and if its found, Add that letter to
common_subsequence and store its position in index, Otherwise
compare the length of common_subsequence with the length of lcs
and if its greater, asign its value to lcs.

-
Return to the first string and pick the next letter and repeat the
previous step again, But this time start searching from indexth letter

-
Repeat this process until there is no letter in the first string to
pick. At the end the value of lcs is the Longest Common
Subsequence.

This is an example:

X=A, B, C, B, D, A, B‬‬  
‫‪Y=B, D, C, A, B, A‬‬


  • Pick A in the first string.



  • Look for A in Y.



  • Now that there is an A in the second string, append it to common_subsequence.



  • Return to the first string and pick the next letter that is B.



  • Look for B in the second string this time starting from the position of A.



  • There is a B after A so append B to common_subsequence.



  • Now pick the next letter in the first string that is C. There isn't a C next to B in the second string. So assign the value of common_subsequence to lcs because its length is greater than the length of lcs.



Repeat the previous steps until reaching the end of the first string. In the end the value of lcs is the Longest Common Subsequence.

The complexity of this algorithm is \$\theta(n*m)\$.

I implemented it on two methods. The second one is using a hash table, but after implementation I found it's much slower compared to the first algorithm. I can't understand why.

The first algorithm:

```
import time
def lcs(xstr, ystr):
if not (xstr and ystr): return # if string is empty
lcs = [''] # longest common subsequence
lcslen = 0 # length of longest common subsequence so far
for i in xrange(len(xstr)):
cs = '' # common subsequenc

Solution

Firstly, your algorithm is incorrect try:

lcs("AAAABCC","AAAACCB"), the LCS should be "AAAACC", but your algorithm finds "AAAAB".

Secondly your algorithm is O(n^2m) not O(nm). Since you don't elaborate as to why you think your algorithm is theta(n*m) I can't really guess where your analysis has gone wrong.

Your second version attempts to optimize the process of searching through the string by using a list of pre-calculated positions. This means you don't have to scan through all the positions in the string with different characters. However, you lose the ability to skip all the position before your starting index. For long strings with few distinct characters, you end up losing out.

Context

StackExchange Code Review Q#20337, answer score: 3

Revisions (0)

No revisions yet.