
Either or case in Python and Regex

Submitted by: @import:stackexchange-codereview

Problem

I have a small module that gets the lemma of a word and its plural form. It then searches a list of sentences for those that contain both words (singular or plural) in either order. I have it working, but I was wondering if there is a more elegant way to build this expression.

Note: this is Python 2.

import re

words = (("cell",), ("wolf", "wolves"))
string1 = "(?:" + "|".join(words[0]) + ")"
string2 = "(?:" + "|".join(words[1]) + ")"
# Try both orders: word 1 ... word 2, or word 2 ... word 1
pat = ".+".join((string1, string2)) + "|" + ".+".join((string2, string1))
# pat is now: "(?:cell).+(?:wolf|wolves)|(?:wolf|wolves).+(?:cell)"


Then the search:

pat = re.compile(pat)
for sentence in sentences:
    if len(pat.findall(sentence)) != 0:
        print sentence+'\n'


Alternatively, would this be a good solution?

words = (("cell",), ("wolf", "wolves"))
for sentence in sentences:
    sentence = sentence.lower()
    if any(word in sentence for word in words[0]) and any(word in sentence for word in words[1]):
        print sentence

Solution

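You could use findall with a pattern like (cell)|(wolf|wolves), which has one capturing group per word, and check that every group matched somewhere in the sentence. With several groups, findall returns one tuple per match, with an empty string in the slot of each group that did not take part in that match, so zip(*matches) regroups the results by word: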

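import re

words = (("cell",), ("wolf", "wolves"))
# One capturing group per word: "(cell)|(wolf|wolves)"
pat = "|".join("({0})".format("|".join(forms)) for forms in words)
regex = re.compile(pat)
for sentence in sentences:
    # One tuple per match; only the group that matched is non-empty
    matches = regex.findall(sentence)
    # Guard against no matches: all() over an empty zip() is True
    if matches and all(any(groupmatches) for groupmatches in zip(*matches)):
        print sentence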
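
The same idea can be wrapped in a small reusable helper. This is only a sketch, not part of the original answer: the names build_pattern and mentions_all are hypothetical, and the use of re.escape, \b word boundaries, and re.IGNORECASE are assumptions added here so that "cell" does not match inside "excellent" and capitalised words still count. It assumes at least one word group is given, and it should run on both Python 2 and Python 3.

```python
import re


def build_pattern(word_groups):
    """Compile a regex with one capturing group per word group."""
    alternatives = (
        "({0})".format("|".join(re.escape(form) for form in forms))
        for forms in word_groups
    )
    # \b on both sides restricts matches to whole words.
    return re.compile(
        r"\b(?:{0})\b".format("|".join(alternatives)), re.IGNORECASE
    )


def mentions_all(regex, sentence):
    """Return True if every capturing group matched at least once."""
    # Each match comes from exactly one alternative, so lastindex
    # is the number of the group (word) that was found.
    seen = set(match.lastindex for match in regex.finditer(sentence))
    return len(seen) == regex.groups


regex = build_pattern((("cell",), ("wolf", "wolves")))
sentences = ["The wolf ate the cell.", "Wolves howl.", "An excellent cell."]
hits = [s for s in sentences if mentions_all(regex, s)]
# hits == ["The wolf ate the cell."]
```

Collecting match.lastindex into a set avoids the empty-match corner case of zip(*matches), since a sentence with no matches simply yields an empty set.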

Context

StackExchange Code Review Q#36922, answer score: 3
