HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcsharpModerate

Convert Sql LIKE to Regex

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
convertregexlikesql

Problem

I have a bit of code that converts a Sql Like expression to a regex expression for the purposes of a Linq to objects Like extension method. For some time I have been using this conversion.

This conversion replaces all "%" with ".?" which works for contains patterns but is over matching for starts with or ends with patterns. So the conversion of %abc to .?abc is capturing both "abcdef" and "123abcdef".

I've amended the conversion algorithm to account for starts with and ends with LIKE expressions.

Here is my code:

internal static string ConvertLikeToRegex(string pattern)
{
    // Turn "off" all regular expression related syntax in the pattern string. 
    StringBuilder builder = new StringBuilder(Regex.Escape(pattern));

    // these are needed because the .*? replacement below at the begining or end of the string is not
    // accounting for cases such as LIKE '%abc' or LIKE 'abc%'
    bool startsWith = pattern.StartsWith("%") && !pattern.EndsWith("%");
    bool endsWith = !pattern.StartsWith("%") && pattern.EndsWith("%");

    // this is a little tricky
    // ends with in like is '%abc'
    // in regex it's 'abc

My in // so need to tanspose if (startsWith) { builder.Replace("%", "", 0, 1); builder.Append("$"); } // same but inverse here if (endsWith) { builder.Replace("%", "", pattern.Length - 1, 1); builder.Insert(0, "^"); } /* Replace the SQL LIKE wildcard metacharacters with the * equivalent regular expression metacharacters. */ builder.Replace("%", ".*?").Replace("_", "."); /* The previous call to Regex.Escape actually turned off * too many metacharacters, i.e. those which are recognized by * both the regular expression engine and the SQL LIKE * statement ([...] and [^...]). Those metacharacters have * to be manually unescaped here. */ builder.Replace(@"\[", "[").Replace(@"\]", "]").Replace(@"\^", "^"); return builder.ToString(); }


My in

Solution

I think you're overcomplicating this and your code still doesn't work correctly. The LIKE pattern bcd shouldn't match abcde, but it does with your code.

What you should do is to always add ^ at the start and $ at the end.

This means the following conversions:

  • bcd^bcd$



  • %bcd^.*?bcd$



  • bcd%^bcd.*?$



  • %bcd%^.?bcd.?$



In the cases where the pattern starts with %, the ^ is not necessary (and similarly for $ and % at the end), but it also doesn't do any harm.

Context

StackExchange Code Review Q#36861, answer score: 10

Revisions (0)

No revisions yet.