HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Python 3: Finding common patterns in pairs of strings

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
patternscommonpythonfindingstringspairs

Problem

I have written a piece of code that finds common patterns in two strings. These patterns have to be in the same order, so for example "I am a person" and "A person I am" would only match "person". The code is crude, all characters, including whitespace and punctuation marks, receive the same treatment. The longest patterns are matched first.

The main function then returns two lists (one per string) of tuples with two elements. The first element is 1 if a substring has been matched, else 0. The second element is the substring.

So the format of returned lists will be like this:

[(1, 'I '), (0, 'am a person\n')]
[(1, 'I '), (0, 'can see\n')]


Now to the question -- what do you think about my code? I somehow feel that it's not quite top-notch, and I'm not an experienced coder. Any suggestions on coding style or the algorithms? I hope the code is reasonably clear.

The find_raw_patterns function returns lists with alternating integers and strings, so the only thing find_common_patterns does in addition to calling find_raw_patterns, is to arrange the lists' elements in two-element-tuples.

The function longest_common_substring is copied directly from Wikibooks, and I'm not very concerned about that function.

Code:

```
def longest_common_substring(S1, S2):
M = [[0]*(1+len(S2)) for i in range(1+len(S1))]
longest, x_longest = 0, 0
for x in range(1,1+len(S1)):
for y in range(1,1+len(S2)):
if S1[x-1] == S2[y-1]:
M[x][y] = M[x-1][y-1] + 1
if M[x][y]>longest:
longest = M[x][y]
x_longest = x
else:
M[x][y] = 0
return S1[x_longest-longest: x_longest]

def find_common_patterns(s1, s2):
arranged1 = []
arranged2 = []
(ptr1, ptr2) = find_raw_patterns(s1, s2) #ptr - pattern
for i in range(len(ptr1) - 1):
if type(ptr1[i]) == int:
arranged1.append((ptr1[i], ptr1[i+1]))
for i in range(len(ptr2)

Solution

You can simplify the two functions quite a bit. For find_raw_patterns you can use partition instead of manually splitting the strings up, and simplify the formation of the list.

def find_raw_patterns(s1, s2): # used recursively
    if s1 == '' or s2 == '':
        return [], []
    com = longest_common_substring(s1, s2)
    if len(com) < 2:
        return ([0, s1], [0, s2])
    s1_bef, _, s1_aft = s1.partition(com)
    s2_bef, _, s2_aft = s2.partition(com)
    before = find_raw_patterns(s1_bef, s2_bef)
    after = find_raw_patterns(s1_aft, s2_aft)
    return (before[0] + [1, com] + after[0],
            before[1] + [1, com] + after[1])


For find_common_patterns, you can use list indices to get the even and odd values rather than checking the type of each value, and use zip to get a list of pair tuples.

def find_common_patterns(s1, s2):
    (ptr1, ptr2) = find_raw_patterns(s1, s2) #ptr - pattern
    return (zip(ptr1[::2], ptr1[1::2]),
            zip(ptr2[::2], ptr2[1::2]))


But I think you can also combine the two functions with a minor adjustment to put the found values into paired tuples.

def find_common_patterns(s1, s2): # used recursively
    if s1 == '' or s2 == '':
        return [], []
    com = longest_common_substring(s1, s2)
    if len(com) < 2:
        return ([(0, s1)], [(0, s2)])
    s1_bef, _, s1_aft = s1.partition(com)
    s2_bef, _, s2_aft = s2.partition(com)
    before = find_common_patterns(s1_bef, s2_bef)
    after = find_common_patterns(s1_aft, s2_aft)
    return (before[0] + [(1, com)] + after[0],
            before[1] + [(1, com)] + after[1])

Code Snippets

def find_raw_patterns(s1, s2): # used recursively
    if s1 == '' or s2 == '':
        return [], []
    com = longest_common_substring(s1, s2)
    if len(com) < 2:
        return ([0, s1], [0, s2])
    s1_bef, _, s1_aft = s1.partition(com)
    s2_bef, _, s2_aft = s2.partition(com)
    before = find_raw_patterns(s1_bef, s2_bef)
    after = find_raw_patterns(s1_aft, s2_aft)
    return (before[0] + [1, com] + after[0],
            before[1] + [1, com] + after[1])
def find_common_patterns(s1, s2):
    (ptr1, ptr2) = find_raw_patterns(s1, s2) #ptr - pattern
    return (zip(ptr1[::2], ptr1[1::2]),
            zip(ptr2[::2], ptr2[1::2]))
def find_common_patterns(s1, s2): # used recursively
    if s1 == '' or s2 == '':
        return [], []
    com = longest_common_substring(s1, s2)
    if len(com) < 2:
        return ([(0, s1)], [(0, s2)])
    s1_bef, _, s1_aft = s1.partition(com)
    s2_bef, _, s2_aft = s2.partition(com)
    before = find_common_patterns(s1_bef, s2_bef)
    after = find_common_patterns(s1_aft, s2_aft)
    return (before[0] + [(1, com)] + after[0],
            before[1] + [(1, com)] + after[1])

Context

StackExchange Code Review Q#21532, answer score: 3

Revisions (0)

No revisions yet.