HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Longest common subsequence length and backtracking the string

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
thelengthsubsequenceandlongeststringcommonbacktracking

Problem

Problem: Given two sequences, print the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, ‘”acefg”, .. etc are subsequences of “abcdefg”. So a string of length n has 2^n different possible subsequences.

Examples:
LCS for input Sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3.
LCS for input Sequences “AGGTAB” and “GXTXAYB” is “GTAB” of length 4.

Python 3 code:

def calculate_lcs_length(a,b):
    a_len = len(a)
    b_len = len(b)
    dp = []
    for i in range(a_len + 1):
        dp.append([0 for j in range(b_len + 1)])
    for i in range(1, a_len + 1):
        for j in range(1, b_len + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j - 1])
    max_length = dp[a_len][b_len]
    return dp, max_length

def get_path(a, b, dp, i, j):
    if i == 0 or j == 0:
        return ""
    if a[i-1] == b[j-1]:
        return get_path(a, b, dp, i-1, j-1) + a[i-1]
    else:
        if dp[i-1][j] > dp[i][j-1]:
            return get_path(a, b, dp, i-1, j)
        else:
            return get_path(a, b, dp, i, j-1)

if __name__ == "__main__":
    a = "ABCDGH"
    b = "AEDFHR"
    dp, max_length = calculate_lcs_length(a,b)
    lcs_str = get_path(a, b, dp, len(a), len(b))
    print(lcs_str)


Output: ADH

I wonder if I could use one single method (without using recursion) to get the length and string both.

Is this code can be more reader friendly. I am not asking for one liner or complex optimization improve.

Reference: Longest common subsequence problem, From Wikipedia

Solution

Yes, you can easily convert the get_path function to an iterative version.

def get_path(a, b, dp, i, j): 
    seq = ""
    while(i != 0 and j != 0): 
        if a[i-1] == b[j-1]:
            i-=1
            j-=1
            seq += a[i]
        else:
            if dp[i-1][j] > dp[i][j-1]:
                i-=1
            else:
                j-=1
    return seq[::-1]


And now you can merge this function with calculate_lcs_length into one if you want.

I guess you have read this, but I just want to remind you that all described optimizations are still applicable to your code.

Code Snippets

def get_path(a, b, dp, i, j): 
    seq = ""
    while(i != 0 and j != 0): 
        if a[i-1] == b[j-1]:
            i-=1
            j-=1
            seq += a[i]
        else:
            if dp[i-1][j] > dp[i][j-1]:
                i-=1
            else:
                j-=1
    return seq[::-1]

Context

StackExchange Code Review Q#162534, answer score: 4

Revisions (0)

No revisions yet.