Regex, match the most informative pattern

Submitted by: @import:stackexchange-codereview

Problem

I have a function that is designed to parse an utterance (or typed-in string) and identify the intent as a yes or no answer.

There are many ways of saying "yes" to something, and similarly for "no" or "unsure"; however, some of these phrases may include sub-phrases that have alternative meanings. For example: "certainly" vs. "certainly not".

I can't guarantee that "not" will always refer to a negative statement either, for example "sure" vs. "not sure"; "I do not know"; or even "why not!" (the latter could be taken as an affirmative).

Since I am interested only in phrases and sub-phrases (not words within words), I realised that the way to identify the most informative phrase is to attempt to match the phrases in descending order of length. Also, if the larger phrase matches, I don't want its sub-phrases to be matched separately. Fortunately, there is a feature of regular expressions that can do this quite easily.

/\b(a b c|a b|a c|b c|a|b|c)\b/ where 'a', 'b', and 'c' are words, will prefer the first listed phrase that matches at a given position over the others. Therefore, /\b(a b c|a b|a c|b c|a|b|c)\b/.matches("a b c a") == ["a b c", "a"] as opposed to ["a", "b c", "a"] or another combination.

(Note that for this to work, the 'or'ed phrases need to be in descending order of length. The regex engine tries the alternatives from left to right and accepts the first one that succeeds, so a longer phrase must come before any of its sub-phrases.)
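As a quick check of the ordering rule, the sketch below runs the same alternation in both orders over the string "a b c a". The `AlternationOrderDemo` class and its `FindPhrases` helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Returns the text of every match of 'pattern' in 'input', in order.
    public static string[] FindPhrases(string pattern, string input) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Longest phrases first: the three-word phrase is matched as a whole.
        string[] longestFirst = FindPhrases(@"\b(a b c|a b|a c|b c|a|b|c)\b", "a b c a");
        Console.WriteLine(string.Join(", ", longestFirst)); // a b c, a

        // Shortest phrases first: the single words win and the longer
        // alternatives are never reached.
        string[] shortestFirst = FindPhrases(@"\b(a|b|c|a b|a c|b c|a b c)\b", "a b c a");
        Console.WriteLine(string.Join(", ", shortestFirst)); // a, b, c, a
    }
}
```

The second call shows why the sort matters: with the short alternatives listed first, the engine never gets as far as trying the longer phrases.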

So here's a test function of the idea in C#:

```
public enum Affirmation { No=0, Yes, Unsure, NoAnswer }
public Affirmation getAffirmation(string utterance)
{
    string[] yesses = new string[] { "yes", "yep", "y", "yeah", "certainly", "sure", "why not", "agree", "affirmative" };
    string[] nos = new string[] { "no", "nope", "nah", "nup", "negative", "n", "certainly not", "yeah nah" };
    string[] unsures = new string[] { "unsure", "not sure", "i don't know", "maybe", "come again", "i do not know" };
    string[] responses = yesses.Concat(nos).Concat(unsures).OrderByDescending(w => w.Length).ToArray();
    Re
```

Solution

I cannot say much about the limits of the pattern; the only one that I have encountered so far is the length of the group name.

IEnumerable<Affirmation> affirmations = matches.Cast<Match>().Select(m => {
    if (yesses.Contains(m.Value.ToLower())) { return Affirmation.Yes; }
    else if (nos.Contains(m.Value.ToLower())) { return Affirmation.No; }
    else { return Affirmation.Unsure; }
});


You can, however, make the lookup faster by using a HashSet<string>. Its Contains method is an O(1) operation on average, unlike the array's O(n) linear search.

Additionally, you should instantiate it with StringComparer.OrdinalIgnoreCase so that you don't have to call ToLower/ToUpper on every lookup.
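A minimal sketch of that lookup, assuming the phrase lists from the question. The `PhraseLookup` class and its `Classify` method are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public enum Affirmation { No = 0, Yes, Unsure, NoAnswer }

public static class PhraseLookup
{
    // Case-insensitive sets: Contains needs no ToLower/ToUpper call.
    private static readonly HashSet<string> Yesses = new HashSet<string>(
        new[] { "yes", "yep", "y", "yeah", "certainly", "sure", "why not", "agree", "affirmative" },
        StringComparer.OrdinalIgnoreCase);

    private static readonly HashSet<string> Nos = new HashSet<string>(
        new[] { "no", "nope", "nah", "nup", "negative", "n", "certainly not", "yeah nah" },
        StringComparer.OrdinalIgnoreCase);

    // Maps a phrase that the regex has already matched to its category.
    // As in the original lambda, anything that is neither a yes nor a no
    // is treated as Unsure.
    public static Affirmation Classify(string matchedPhrase)
    {
        if (Yesses.Contains(matchedPhrase)) { return Affirmation.Yes; }
        if (Nos.Contains(matchedPhrase)) { return Affirmation.No; }
        return Affirmation.Unsure;
    }
}
```

With sets like these, the lambda body could shrink to a call such as `PhraseLookup.Classify(m.Value)`.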

Another possibility would be to use named groups, one per category, and capture the expressions there, so that the group that matched tells you the category directly and you don't have to look the value up in the collections.
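One way the named-group idea might look is sketched below. It rests on several assumptions: the phrase lists are shortened, only the first match in the utterance is used, and `NamedGroupClassifier`, `BuildPattern` and `Classify` are hypothetical names. Each phrase is wrapped in a group named after its category, and the alternatives are still sorted by descending length across all categories, so "certainly not" is tried before "certainly". .NET allows the same group name to appear more than once in a pattern, which is what makes this interleaving possible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public enum Affirmation { No = 0, Yes, Unsure, NoAnswer }

public static class NamedGroupClassifier
{
    // Shortened phrase lists, keyed by the group name used in the pattern.
    private static readonly Dictionary<string, string[]> Phrases = new Dictionary<string, string[]>
    {
        ["yes"] = new[] { "yes", "certainly", "sure", "why not" },
        ["no"] = new[] { "no", "certainly not", "yeah nah" },
        ["unsure"] = new[] { "unsure", "not sure", "i do not know" },
    };

    private static readonly Regex Pattern = BuildPattern();

    // Builds \b(?:(?<no>certainly\ not)|...|(?<no>no))\b, longest phrase first.
    private static Regex BuildPattern()
    {
        var alternatives = Phrases
            .SelectMany(kv => kv.Value.Select(p => new { Group = kv.Key, Phrase = p }))
            .OrderByDescending(x => x.Phrase.Length)
            .Select(x => $"(?<{x.Group}>{Regex.Escape(x.Phrase)})");
        return new Regex(@"\b(?:" + string.Join("|", alternatives) + @")\b", RegexOptions.IgnoreCase);
    }

    // Classifies the first recognised phrase; the group that captured it
    // identifies the category, so no collection lookup is needed.
    public static Affirmation Classify(string utterance)
    {
        Match m = Pattern.Match(utterance);
        if (!m.Success) { return Affirmation.NoAnswer; }
        if (m.Groups["yes"].Success) { return Affirmation.Yes; }
        if (m.Groups["no"].Success) { return Affirmation.No; }
        return Affirmation.Unsure;
    }
}
```

Putting all "yes" phrases in one group and all "no" phrases in another would be simpler, but it would lose the global length ordering: "certainly" in the first group could then win over "certainly not" in the second.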

Code Snippets

IEnumerable<Affirmation> affirmations = matches.Cast<Match>().Select(m => {
    if (yesses.Contains(m.Value.ToLower())) { return Affirmation.Yes; }
    else if (nos.Contains(m.Value.ToLower())) { return Affirmation.No; }
    else { return Affirmation.Unsure; }
});

Context

StackExchange Code Review Q#150252, answer score: 2
