patterncsharpMinor
Implementation of String.prototype.replace
Viewed 0 times
implementationstringprototypereplace
Problem
With just a brief look over I can think of a large amount of changes that would improve the clarity but I would love to see how someone else would refactor this mess.
Spec
Code
``
.Matches(format)
.Cast()
.Select(m => (Capture)m.Groups[1])
.ToList();
if (tokens.Count == 0)
{
return (value, state) => format;
}
var appendMethod = typeof(StringBuilder).GetMethod("Append", new[] { typeof(string) });
var toStringMethod = typeof(StringBuilder).GetMethod("ToString", new Type[0]);
var substringMethod = typeof(string).GetMethod("Substring", new[] { typeof(int) });
var substringWithLengthMethod = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });
var zero = Expression.Constant(0);
var one = Expression.Constant(1);
var dollarSign = Expression.Constant("$");
var valueVariable = Expression.Parameter(typeof(string), "value");
var stateVariable = Expression.Parameter(typeof(RegExpParser.MatchState), "state");
var sbVariable = Expression.Variable(typeof(StringBuilder), "sb");
var capturesProp = Expression.Property(stateVariable, "captures");
var capturesLengthProp = Expression.Property(capturesProp, "Length");
var endIndexProp = Expression.Property(stateVariable, "endIndex");
var inputProp = Expression.Property(stateVariable, "input");
var appendDollarSign = Expression.Call(sbVariable, appendMethod, Expression.Constant("$"));
var getMatchedSubstring = Expression.ArrayIndex(capturesProp, zero);
var appendMatchedSubstring = Expression.Call(sbVariable, appendMethod, getMatchedSubstring);
var getBeforeSubstring = Expression.Call(inputProp, substringWithLengthMethod, zero, Expression.Subtract(endIndexProp, one));
var appendBeforeSubstring = Expression.Call(sbVariable, appendMethod, getBeforeSubstring);
var getAfterSu
Spec
Code
``
static Func MakeReplacer(string format)
{
var tokens =
new Regex(@"[^\$]*(\$[\$&']|\$\d{1,2})").Matches(format)
.Cast()
.Select(m => (Capture)m.Groups[1])
.ToList();
if (tokens.Count == 0)
{
return (value, state) => format;
}
var appendMethod = typeof(StringBuilder).GetMethod("Append", new[] { typeof(string) });
var toStringMethod = typeof(StringBuilder).GetMethod("ToString", new Type[0]);
var substringMethod = typeof(string).GetMethod("Substring", new[] { typeof(int) });
var substringWithLengthMethod = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });
var zero = Expression.Constant(0);
var one = Expression.Constant(1);
var dollarSign = Expression.Constant("$");
var valueVariable = Expression.Parameter(typeof(string), "value");
var stateVariable = Expression.Parameter(typeof(RegExpParser.MatchState), "state");
var sbVariable = Expression.Variable(typeof(StringBuilder), "sb");
var capturesProp = Expression.Property(stateVariable, "captures");
var capturesLengthProp = Expression.Property(capturesProp, "Length");
var endIndexProp = Expression.Property(stateVariable, "endIndex");
var inputProp = Expression.Property(stateVariable, "input");
var appendDollarSign = Expression.Call(sbVariable, appendMethod, Expression.Constant("$"));
var getMatchedSubstring = Expression.ArrayIndex(capturesProp, zero);
var appendMatchedSubstring = Expression.Call(sbVariable, appendMethod, getMatchedSubstring);
var getBeforeSubstring = Expression.Call(inputProp, substringWithLengthMethod, zero, Expression.Subtract(endIndexProp, one));
var appendBeforeSubstring = Expression.Call(sbVariable, appendMethod, getBeforeSubstring);
var getAfterSu
Solution
My first instinct was “why do you construct all these expressions, only to compile it and return just the function — why not just use lambdas directly”... but after a second thought it seems that this wouldn’t be any clearer at all, so I think using expressions here is actually really nice :)
Things I notice that could be improved:
Instead of
I think you could write
Notice that I’ve changed the
Now you don’t need to care about the index anymore and can get rid of that variable:
The next thing I notice is:
You don’t need to remove the leading zero. It’s not going to throw
That’s all I notice in the first method. I cannot claim to fully understand the code that makes use of
As for the second method...
Is it just me, or is all of that just a roundabout way of saying
Having too many local variables that are used only once does not always improve readability. The reader cannot tell instantly that the variable is used only once, so it is not obvious which variables they need to keep in their head while reading. This is why I inlined
In the
Looking at it again, however, I’m beginning to wonder why you are populating this
Also, this is a significant performance problem:
You are searching the same string for the same substring many times even when you know it’s not there. If
The same may apply to the regex-matching part of the block depending on what
I believe this is a serious bug:
I think you want this to say
Imagine the client wants to replace all instances of
Things I notice that could be improved:
Instead of
var tokens =
new Regex(@"[^\$]*(\$[\%%CODEBLOCK_0%%amp;`']|\$\d{1,2})")
.Matches(format)
.Cast()
.Select(m => (Capture)m.Groups[1])
.ToList();I think you could write
var items =
Regex.Matches(format, @"([^\$]*)(\$[\%%CODEBLOCK_1%%amp;`']|\$\d{1,2})")
.Cast()
.Select(m => new { TextBefore = m.Groups[1].Value,
Token = m.Groups[2].Value });Notice that I’ve changed the
Regex object instantiation to a static method call; it makes little difference, I just think it’s more readable. I’ve also removed the call to ToList() because it seems you are only iterating over it, you are not manipulating the list, so there is no need for this extra list instantiation.Now you don’t need to care about the index anymore and can get rid of that variable:
foreach (var item in items)
{
if (item.TextBefore.Length > 0)
{
e = Expression.Call(e, appendMethod, Expression.Constant(item.TextBefore));
}
// ... etc.The next thing I notice is:
var n = token.Value.Substring(1);
if (n.Length == 2 && n[0] == '0')
{
n = n.Substring(1);
}
var i = int.Parse(n) - 1;You don’t need to remove the leading zero. It’s not going to throw
int.Parse off (it doesn’t make it mean octal if that’s what you thought). So you can just write:var i = int.Parse(item.Token.Substring(1)) - 1;That’s all I notice in the first method. I cannot claim to fully understand the code that makes use of
RegExpParser.MatchState, but it looks clean, if that means anything :)As for the second method...
var nullArg = (IDynamic)environment.Null;
var cArgs = new List();
foreach (var c in state.captures)
{
if (c == null)
{
cArgs.Add(environment.Null);
continue;
}
cArgs.Add(c == null ? nullArg : environment.CreateString(c));
}Is it just me, or is all of that just a roundabout way of saying
var cArgs = state.captures
.Select(c => c == null ? (IDynamic)environment.Null
: environment.CreateString(c))
.ToList();Having too many local variables that are used only once does not always improve readability. The reader cannot tell instantly that the variable is used only once, so it is not obvious which variables they need to keep in their head while reading. This is why I inlined
nullArg. If you disagree, of course you can change it back to a local variable.In the
if (!regExpObj.Flags.Contains("g")) block you seem to have a bit of duplicated code. It seems to me that it can easily be refactored:var globalFlag = regExpObj.Flags.Contains("g");
var index = globalFlag ? (int)regExpObj.Get("lastIndex").ConvertToNumber().BaseValue : 0;
do
{
var r = matcher(s, index);
if (!r.success)
{
index++;
continue;
}
matches.Add(
Tuple.Create(
index,
r.matchState.endIndex - index,
makeReplacer(index, r.matchState)
)
);
if (!globalFlag)
break;
index = r.matchState.endIndex;
regExpObj.Put("lastIndex", environment.CreateNumber(index), false);
}
while (index < s.Length);Looking at it again, however, I’m beginning to wonder why you are populating this
matches list; you are not returning it or otherwise passing it anywhere. It seems that you actually only need the resulting string, not the list of matches, right? If that is the case, then personally I would probably just generate the string and not bother with the list of matches. That also gets rid of the problem I see of using tuples: the names Item1, Item2, Item3 are really not very readable.Also, this is a significant performance problem:
var resultIndex = s.IndexOf(searchValue, index);
if (resultIndex < 0)
{
index++;
continue;
}You are searching the same string for the same substring many times even when you know it’s not there. If
resultIndex < 0 you can stop:var resultIndex = s.IndexOf(searchValue, index);
if (resultIndex < 0)
break;The same may apply to the regex-matching part of the block depending on what
matcher does. If it only matches a regular expression anchored to the beginning of the string, then of course your existing code is correct. If it matches anywhere in the string, the same optimisation should be made there.I believe this is a serious bug:
index = resultIndex + 1;I think you want this to say
index = resultIndex + searchValue.Length;Imagine the client wants to replace all instances of
aa with x and the input string contains aaaa. If I read the code at the Code Snippets
var tokens =
new Regex(@"[^\$]*(\$[\$&`']|\$\d{1,2})")
.Matches(format)
.Cast<Match>()
.Select(m => (Capture)m.Groups[1])
.ToList();var items =
Regex.Matches(format, @"([^\$]*)(\$[\$&`']|\$\d{1,2})")
.Cast<Match>()
.Select(m => new { TextBefore = m.Groups[1].Value,
Token = m.Groups[2].Value });foreach (var item in items)
{
if (item.TextBefore.Length > 0)
{
e = Expression.Call(e, appendMethod, Expression.Constant(item.TextBefore));
}
// ... etc.var n = token.Value.Substring(1);
if (n.Length == 2 && n[0] == '0')
{
n = n.Substring(1);
}
var i = int.Parse(n) - 1;var i = int.Parse(item.Token.Substring(1)) - 1;Context
StackExchange Code Review Q#1699, answer score: 2
Revisions (0)
No revisions yet.