HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Fighting f̶i̶r̶e̶ regex with f̶i̶r̶e̶ regex

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withregexfighting

Problem

So someone requested a Regex Builder / Assistant for Rubberduck and since my CS class had been covering the topic only recently I was thinking to myself: "This sounds interesting, you should do this".

Long story short, a day of designing and a weekend of implementation (and some more polishing) as well as a few reviews later Rubberduck has a fully working Regex Assistant (which even supports i18n).

It can be given a VBA Regular Expression pattern and tells you what it does in a manner similar to other well-known assistants (regex101, expresso, ...)

This means it can parse a given syntactically correct Regex and recognize the separate Atoms and Quantifiers. It analyzes said Atoms and builds a structure that allows displaying useful information about the pattern.

This is accomplished by parsing the expression into a Tree-Structure.

Any Regular Expression is built by either the smallest possible unit you have (an Atom) or other regular expressions. This definition allows us to "abuse" the syntax into building our Design.

We basically need to handle two (well three) things to represent any arbitrary Regular Expression.

  • Atoms



  • Regular Expressions



  • Quantifiers



to be fully correct, Quantifiers actually belong in cahoots with Atoms, but they're.. different, so we'll treat them differently.

This sets the stage for our "data holders":

Atom.cs

```
using Rubberduck.RegexAssistant.i18n;
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Rubberduck.RegexAssistant
{
public interface IAtom : IDescribable
{
string Specifier { get; }
}

internal class CharacterClass : IAtom
{
public static readonly string Pattern = @"(?.*?)(? _characterSpecifiers;
public IList CharacterSpecifiers { get { return _characterSpecifiers; } }
private readonly string _specifier;

public CharacterClass(string specifier)
{
Match m = Matcher.Match(specifier);

Solution

Validation

A public method should validate its method parameters. It doesn't matter if it is only used inside one project like RubberDuck or if it is used by other projects or developers.

public CharacterClass(string specifier)
{
    Match m = Matcher.Match(specifier);


here Matcher.Match() would throw an ArgumentNullException which I would throw as well if specifier would be null but if you throw it at your own validation you wouldn't expose that you use a Regex.

The same is true for the constructors of Group and Literal.

A bigger issue will come in RegularExpression.Parse(string) because here you are exposing internal details of your implementation. If the passed in specifier == null the stacktrace would contain that you have called a method named TryParseAsAtom() and that the exception had been thrown by calling the Regex.Match() method.

Regex

I don't know how often the Regexes are used but you should consider to use the overloaded constructor Regex(string, RegexOptions) so you could use the Compiled enum for the RegexOptions to have the regex compiled which will be faster if called often.

CharacterClass

public static readonly string Pattern = @"(?.*?)(? _characterSpecifiers;
public IList CharacterSpecifiers { get { return _characterSpecifiers; } }
private readonly string _specifier;


Instead of adding a private backing field for your properties you could simply have a private set; which would read nicer like so

public static readonly string Pattern = @"(?.*?)(? CharacterSpecifiers { get; private set; }
private readonly string _specifier;


If you are using C# 6 then you could get rid of the private set; all together which would make it nicer.

Maybe I would let the public IList CharacterSpecifiers { get { return _characterSpecifiers; } } stay with a backing field but would return either as a ReadOnlyCollection like so

public ReadOnlyCollection CharacterSpecifiers { get { return _characterSpecifiers.AsReadOnly(); } }


or at least would return a new List like so

public IList CharacterSpecifiers { get { return new List(_characterSpecifiers); } }


This prevents that items of the IList could be changed.

While we are speaking about properties, you should stick to one style. Right now you have single lined properties like above and also multi lined properties like

public string Specifier
{
    get
    {
        return _specifier;
    }
}


Equals()

I avoid the is operator if I later on cast the object. The is operator just tries to cast the object to the desired type and returns true if it can be casted. So a softcast using as and a null check is doing the same but involves less casting like so

public override bool Equals(object obj)
{
    var item = obj as CharacterClass
    if (item != null)
    {
        return item._specifier.Equals(_specifier);
    }
    return false;
}


Quantifier constructor

if expression.Length == 0 you should return early following with the check expression.Length == 1 and returning early as well. This saves you one horizontal identation level which makes your code more readable like so

```
public Quantifier(string expression)
{
if (expression.Length == 0)
{
Kind = QuantifierKind.None;
MaximumMatches = 1;
MinimumMatches = 1;
return;
}
if (expression.Length == 1)
{
switch (expression[0])
{
case '*':
MinimumMatches = 0;
MaximumMatches = int.MaxValue;
Kind = QuantifierKind.Wildcard;
break;
case '+':
MinimumMatches = 1;
MaximumMatches = int.MaxValue;
Kind = QuantifierKind.Wildcard;
break;
case '?':
MinimumMatches = 0;
MaximumMatches = 1;
Kind = QuantifierKind.Wildcard;
break;
default:
throw new ArgumentException("Passed Quantifier String was not an allowed Quantifier");
}
return;
}

Kind = QuantifierKind.Expression;
Match m = Matcher.Match(expression);
if (!m.Success)
{
throw new ArgumentException(string.Format("Cannot extract a Quantifier from the expression {1}", expression));
}
int minimum;
// shouldn't ever happen
if (!int.TryParse(m.Groups["min"].Value, out minimum))
{
throw new ArgumentException("Cannot Parse Quantifier Expression into Range");
}
MinimumMatches = minimum;

string maximumString = m.Groups["max"].Value; // drop the comma
if (maximumString.Length > 1)
{
int maximum;
// shouldn't ever happen
if (!int.TryParse(maximumString.Substring(1), out maximum))
{
throw new ArgumentException("Cannot Parse Quantifier Expression into Range");
}
MaximumMatches = maximum;
}
else

Code Snippets

public CharacterClass(string specifier)
{
    Match m = Matcher.Match(specifier);
public static readonly string Pattern = @"(?<!\\)\[(?<expression>.*?)(?<!\\)\]";
private static readonly Regex Matcher = new Regex("^" + Pattern + "$");

private readonly bool _inverseMatching;
public bool InverseMatching { get { return _inverseMatching; } }
private readonly IList<string> _characterSpecifiers;
public IList<string> CharacterSpecifiers { get { return _characterSpecifiers; } }
private readonly string _specifier;
public static readonly string Pattern = @"(?<!\\)\[(?<expression>.*?)(?<!\\)\]";
private static readonly Regex Matcher = new Regex("^" + Pattern + "$");

public bool InverseMatching { get; private set; }
public IList<string> CharacterSpecifiers { get; private set; }
private readonly string _specifier;
public ReadOnlyCollection<string> CharacterSpecifiers { get { return _characterSpecifiers.AsReadOnly(); } }
public IList<string> CharacterSpecifiers { get { return new List<string>(_characterSpecifiers); } }

Context

StackExchange Code Review Q#136411, answer score: 5

Revisions (0)

No revisions yet.