HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

I think I might be having a regex nightmare

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
mightthinkhavingregexnightmare

Problem

I've been working on the Rubberduck (an add-in for the VBA Editor /"VBE"), specifically here the VBA.Parser namespace. Here's how I ended up implementing the syntax part.

[ComVisible(false)]
public interface ISyntax
{
    /// 
    /// Parses an instruction into a syntax node, if possible.
    /// 
    /// The fully-qualified scope of the specified instruction, when the instruction is publicly scoped..
    /// The fully-qualified scope of the specified instruction, when the instruction is locally scoped.
    /// An instruction.
    /// 
    /// Returns a node representing the specified instruction, 
    /// or null if specified instruction can't be parsed.
    /// 
    SyntaxTreeNode Parse(string publicScope, string localScope, Instruction instruction);

    bool IsMatch(string publicScope, string localScope, Instruction instruction, out SyntaxTreeNode node);

    /// 
    /// Gets a value indicating whether syntax is specific to a particular parent node.
    /// 
    /// 
    /// Implementations with this member set to true will not be considered as part of the general grammar.
    /// 
    bool IsChildNodeSyntax { get; }

    SyntaxType Type { get; }
}


Now, armed with this... thing:

WARNING

Faint of heart, beware. The mere sight of these regex patterns is known to have caused nausea in at least, uh, one, somewhat documented case.

```
[ComVisible(false)]
public static class VBAGrammar
{
private static string IdentifierSyntax { get { return @"(?(?:[a-zA-Z][a-zA-Z0-9_])|(?:\[[a-zA-Z0-9_]\]))"; } }
private static string ReferenceSyntax { get { return @"(?:(?(?:(?:(?[a-zA-Z][a-zA-Z0-9_]))\.))?" + IdentifierSyntax + ")"; } }

///
/// Finds all implementations of in the Rubberduck assembly.
///
///
public static IEnumerable GetGrammarSyntax()
{
return Assembly.GetExecutingAssembly()
.GetTypes()
.Where(type => type.BaseType == typeof(SyntaxBase))

Solution

Regex-nightmare

By looking at the returned value of IdentifierDeclarationSyntax

(?(?:(?(?:[a-zA-Z][a-zA-Z0-9_]*)|(?:\[[a-zA-Z0-9_]*\]))(?[%&@!#$])?(?\((?(([0-9]+)\,?\s?)*|([0-9]+\sTo\s[0-9]+\,?\s?)+)\))?(?\sAs(\s(?New))?\s(?:(?(?:(?:(?[a-zA-Z][a-zA-Z0-9_]*))\.)*)?(?(?:[a-zA-Z][a-zA-Z0-9_]*)|(?:\[[a-zA-Z0-9_]*\]))))?)(?:\,\s)?)+


and the fact that you couldn't make it fool proof in at least 6 months (oldest question I have found), I would say yes you have a nightmare. If you have a bug, debugging this won't be fun.

General

-
here you don't need a readonly bool, you should use an auto property with a private setter

[ComVisible(false)]
public class ParameterNode : SyntaxTreeNode
{
    public ParameterNode(Instruction instruction, string scope, Match match)
    : base(instruction, scope, match, new[] {new IdentifierNode(instruction, scope, match)})
    {
        IsImplicitByRef = !match.Groups["by"].Success;
    }

    public IdentifierNode Identifier { get { return ChildNodes.OfType().Single(); } }

    public bool IsImplicitByRef { get; private set; }
}


This also applies to SyntaxBase.Type and SyntaxBase.IsChildNodeSyntax

-
In the VBAGrammar.GetGrammarSyntax() you are calling unnecessary .ToList().

-
Empty xml documentation should be removed.

Code Snippets

(?<declarations>(?:(?<identifier>(?:[a-zA-Z][a-zA-Z0-9_]*)|(?:\[[a-zA-Z0-9_]*\]))(?<specifier>[%&@!#$])?(?<array>\((?<size>(([0-9]+)\,?\s?)*|([0-9]+\sTo\s[0-9]+\,?\s?)+)\))?(?<as>\sAs(\s(?<initializer>New))?\s(?:(?<reference>(?:(?:(?<library>[a-zA-Z][a-zA-Z0-9_]*))\.)*)?(?<identifier>(?:[a-zA-Z][a-zA-Z0-9_]*)|(?:\[[a-zA-Z0-9_]*\]))))?)(?:\,\s)?)+
[ComVisible(false)]
public class ParameterNode : SyntaxTreeNode
{
    public ParameterNode(Instruction instruction, string scope, Match match)
    : base(instruction, scope, match, new[] {new IdentifierNode(instruction, scope, match)})
    {
        IsImplicitByRef = !match.Groups["by"].Success;
    }

    public IdentifierNode Identifier { get { return ChildNodes.OfType<IdentifierNode>().Single(); } }

    public bool IsImplicitByRef { get; private set; }
}

Context

StackExchange Code Review Q#73798, answer score: 7

Revisions (0)

No revisions yet.