snippetcsharpModerate
Convert Sql LIKE to Regex
Viewed 0 times
convertregexlikesql
Problem
I have a bit of code that converts a Sql Like expression to a regex expression for the purposes of a Linq to objects
This conversion replaces all "%" with ".?" which works for contains patterns but is over matching for starts with or ends with patterns. So the conversion of
I've amended the conversion algorithm to account for starts with and ends with LIKE expressions.
Here is my code:
My in
Like extension method. For some time I have been using this conversion.This conversion replaces all "%" with ".?" which works for contains patterns but is over matching for starts with or ends with patterns. So the conversion of
%abc to .?abc is capturing both "abcdef" and "123abcdef".I've amended the conversion algorithm to account for starts with and ends with LIKE expressions.
Here is my code:
internal static string ConvertLikeToRegex(string pattern)
{
// Turn "off" all regular expression related syntax in the pattern string.
StringBuilder builder = new StringBuilder(Regex.Escape(pattern));
// these are needed because the .*? replacement below at the begining or end of the string is not
// accounting for cases such as LIKE '%abc' or LIKE 'abc%'
bool startsWith = pattern.StartsWith("%") && !pattern.EndsWith("%");
bool endsWith = !pattern.StartsWith("%") && pattern.EndsWith("%");
// this is a little tricky
// ends with in like is '%abc'
// in regex it's 'abc
My in
// so need to tanspose
if (startsWith)
{
builder.Replace("%", "", 0, 1);
builder.Append("$");
}
// same but inverse here
if (endsWith)
{
builder.Replace("%", "", pattern.Length - 1, 1);
builder.Insert(0, "^");
}
/* Replace the SQL LIKE wildcard metacharacters with the
* equivalent regular expression metacharacters. */
builder.Replace("%", ".*?").Replace("_", ".");
/* The previous call to Regex.Escape actually turned off
* too many metacharacters, i.e. those which are recognized by
* both the regular expression engine and the SQL LIKE
* statement ([...] and [^...]). Those metacharacters have
* to be manually unescaped here. */
builder.Replace(@"\[", "[").Replace(@"\]", "]").Replace(@"\^", "^");
return builder.ToString();
}My in
Solution
I think you're overcomplicating this and your code still doesn't work correctly. The LIKE pattern
What you should do is to always add
This means the following conversions:
In the cases where the pattern starts with
bcd shouldn't match abcde, but it does with your code.What you should do is to always add
^ at the start and $ at the end.This means the following conversions:
bcd→^bcd$
%bcd→^.*?bcd$
bcd%→^bcd.*?$
%bcd%→^.?bcd.?$
In the cases where the pattern starts with
%, the ^ is not necessary (and similarly for $ and % at the end), but it also doesn't do any harm.Context
StackExchange Code Review Q#36861, answer score: 10
Revisions (0)
No revisions yet.