
Negative Lookbehind Regex

Submitted by: @import:stackexchange-codereview

Problem

I have the following code, which attempts to match all strings like "*SOMESTRING" (which can include numeric values) but not "*SOMESTRING*". For this I am using a negative lookahead, shown below. In the sample input, *SEX and *AN01ZORA should match, and *PCCL* should not.

using System;
using System.Text.RegularExpressions;

string s = "   if 'L,....' MDC = '13' Then " +
           "  if 'B,960.' SEX NOT = '2' AND *SEX NOT = '3' Then " +
           " DRG = 960Z (UNGROUPABLE) " +
           "    GoTo MDC FldErr " +
           "Else if 'B,N01.' SRG IN TABLE(*AN01ZORA) Then " +
           "  if '.,N01.' *PCCL* > 2 Then ";
// Match *WORD tokens; the negative lookahead rejects any *WORD* token.
Regex rr = new Regex(@"(?i)(?!\*\w+\*)\*\w+");
MatchCollection mc = rr.Matches(s);
foreach (Match m in mc)
    Console.WriteLine(m.Value); // originally m.ToString().Dump() (LINQPad)



Output:
*SEX
*AN01ZORA

This seems to produce the correct output, but it feels nasty and somehow wrong. Is this approach right, and what could I do to make the regex better?

Solution

Your regex is overly complicated, I must admit. At every asterisk, the negative lookahead first scans the whole word to rule out the negative case, and only then does the main pattern scan the same word again to find the (nearly) positive match.

I think the trick you are missing is the word-boundary anchor. Consider the following regex:

\*\w+\b


This looks for an asterisk, followed by one or more word characters, and then a (zero-length) word boundary. However, it matches *SOME both on its own and inside *SOME*, since the \b falls just before the trailing asterisk. The negative lookahead is useful after the word boundary. Consider the following:
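A minimal C# sketch of that intermediate step, assuming .NET's System.Text.RegularExpressions; the sample string is made up for illustration. On its own, \*\w+\b still accepts the start of *PCCL*, because the word boundary sits between "L" and the closing asterisk.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Word boundary alone: an asterisk, one or more word characters, then \b.
var wordBoundaryOnly = new Regex(@"\*\w+\b");

// Hypothetical sample input with one wanted and one unwanted token.
string sample = "SRG IN TABLE(*AN01ZORA) and *PCCL* > 2";

string[] found = wordBoundaryOnly.Matches(sample)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Both tokens match: \b is satisfied between "L" and the closing "*".
Console.WriteLine(string.Join(", ", found)); // *AN01ZORA, *PCCL
```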

\*\w+\b(?!\*)


This looks for *SOME, where SOME is a complete word that is not followed by an asterisk.
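To see why the \b has to come before the lookahead, here is a small sketch (same .NET assumption, hypothetical sample text; MatchAll is an illustrative helper) comparing the pattern with a variant that drops the word boundary. Without \b the engine backtracks: \w+ gives up the final "L", the next character is then "L" rather than "*", the lookahead succeeds, and a bogus *PCC match is reported.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: run a pattern and collect the matched text.
static string[] MatchAll(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string sample = "AND *SEX NOT = '3' ... *PCCL* > 2";

// The word must end (\b) and must not be followed by "*".
string[] withBoundary = MatchAll(@"\*\w+\b(?!\*)", sample);

// Without \b, backtracking lets \w+ stop early, producing "*PCC".
string[] withoutBoundary = MatchAll(@"\*\w+(?!\*)", sample);

Console.WriteLine(string.Join(", ", withBoundary));    // *SEX
Console.WriteLine(string.Join(", ", withoutBoundary)); // *SEX, *PCC
```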

Here's a little demonstration ....
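A C# sketch of such a demonstration, assuming .NET's System.Text.RegularExpressions and reusing the question's input string unchanged; Run is an illustrative local helper. It runs the question's pattern and the simplified one side by side. They agree on this input; that is a check on this sample, not a proof of equivalence in general.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The question's input, unchanged.
string s = "   if 'L,....' MDC = '13' Then " +
           "  if 'B,960.' SEX NOT = '2' AND *SEX NOT = '3' Then " +
           " DRG = 960Z (UNGROUPABLE) " +
           "    GoTo MDC FldErr " +
           "Else if 'B,N01.' SRG IN TABLE(*AN01ZORA) Then " +
           "  if '.,N01.' *PCCL* > 2 Then ";

// The question's pattern and the simplified word-boundary pattern.
var original = new Regex(@"(?i)(?!\*\w+\*)\*\w+");
var simplified = new Regex(@"\*\w+\b(?!\*)");

string[] Run(Regex r) => r.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();

string[] fromOriginal = Run(original);
string[] fromSimplified = Run(simplified);

// Both print "*SEX, *AN01ZORA"; *PCCL* is rejected by each pattern.
Console.WriteLine(string.Join(", ", fromOriginal));
Console.WriteLine(string.Join(", ", fromSimplified));
```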

Edit: Note that there is no need for the case-insensitive switch ((?i)), because your pattern contains no literal letters; \w already matches both upper- and lower-case characters.
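A quick sketch to check the point about (?i), assuming .NET's System.Text.RegularExpressions and a hypothetical mixed-case sample: the word-boundary pattern returns the same matches with and without the switch.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical mixed-case sample to see whether (?i) changes anything.
string sample = "*sex *Sex *SEX *pccl*";

string[] Run(string pattern) =>
    Regex.Matches(sample, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] plain = Run(@"\*\w+\b(?!\*)");
string[] ignoreCase = Run(@"(?i)\*\w+\b(?!\*)");

// \w covers upper- and lower-case letters, so both lists are identical.
Console.WriteLine(string.Join(", ", plain));        // *sex, *Sex, *SEX
Console.WriteLine(plain.SequenceEqual(ignoreCase)); // True
```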

Code Snippets

\*\w+\b(?!\*)

Context

StackExchange Code Review Q#54795, answer score: 5
