HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Implementation of String.prototype.replace

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
implementationstringprototypereplace

Problem

With just a brief look over I can think of a large amount of changes that would improve the clarity but I would love to see how someone else would refactor this mess.

Spec

Code

``
static Func MakeReplacer(string format)
{
var tokens =
new Regex(@"[^\$]*(\$[\$&
']|\$\d{1,2})")
.Matches(format)
.Cast()
.Select(m => (Capture)m.Groups[1])
.ToList();

if (tokens.Count == 0)
{
return (value, state) => format;
}

var appendMethod = typeof(StringBuilder).GetMethod("Append", new[] { typeof(string) });
var toStringMethod = typeof(StringBuilder).GetMethod("ToString", new Type[0]);
var substringMethod = typeof(string).GetMethod("Substring", new[] { typeof(int) });
var substringWithLengthMethod = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });
var zero = Expression.Constant(0);
var one = Expression.Constant(1);
var dollarSign = Expression.Constant("$");
var valueVariable = Expression.Parameter(typeof(string), "value");
var stateVariable = Expression.Parameter(typeof(RegExpParser.MatchState), "state");
var sbVariable = Expression.Variable(typeof(StringBuilder), "sb");
var capturesProp = Expression.Property(stateVariable, "captures");
var capturesLengthProp = Expression.Property(capturesProp, "Length");
var endIndexProp = Expression.Property(stateVariable, "endIndex");
var inputProp = Expression.Property(stateVariable, "input");
var appendDollarSign = Expression.Call(sbVariable, appendMethod, Expression.Constant("$"));
var getMatchedSubstring = Expression.ArrayIndex(capturesProp, zero);
var appendMatchedSubstring = Expression.Call(sbVariable, appendMethod, getMatchedSubstring);
var getBeforeSubstring = Expression.Call(inputProp, substringWithLengthMethod, zero, Expression.Subtract(endIndexProp, one));
var appendBeforeSubstring = Expression.Call(sbVariable, appendMethod, getBeforeSubstring);
var getAfterSu

Solution

My first instinct was “why do you construct all these expressions, only to compile it and return just the function — why not just use lambdas directly”... but after a second thought it seems that this wouldn’t be any clearer at all, so I think using expressions here is actually really nice :)

Things I notice that could be improved:

Instead of

var tokens =
    new Regex(@"[^\$]*(\$[\%%CODEBLOCK_0%%amp;`']|\$\d{1,2})")
    .Matches(format)
    .Cast()
    .Select(m => (Capture)m.Groups[1])
    .ToList();


I think you could write

var items =
    Regex.Matches(format, @"([^\$]*)(\$[\%%CODEBLOCK_1%%amp;`']|\$\d{1,2})")
    .Cast()
    .Select(m => new { TextBefore = m.Groups[1].Value,
                       Token = m.Groups[2].Value });


Notice that I’ve changed the Regex object instantiation to a static method call; it makes little difference, I just think it’s more readable. I’ve also removed the call to ToList() because it seems you are only iterating over it, you are not manipulating the list, so there is no need for this extra list instantiation.

Now you don’t need to care about the index anymore and can get rid of that variable:

foreach (var item in items)
{
    if (item.TextBefore.Length > 0)
    {
        e = Expression.Call(e, appendMethod, Expression.Constant(item.TextBefore));
    }

    // ... etc.


The next thing I notice is:

var n = token.Value.Substring(1);
if (n.Length == 2 && n[0] == '0')
{
    n = n.Substring(1);
}
var i = int.Parse(n) - 1;


You don’t need to remove the leading zero. It’s not going to throw int.Parse off (it doesn’t make it mean octal if that’s what you thought). So you can just write:

var i = int.Parse(item.Token.Substring(1)) - 1;


That’s all I notice in the first method. I cannot claim to fully understand the code that makes use of RegExpParser.MatchState, but it looks clean, if that means anything :)

As for the second method...

var nullArg = (IDynamic)environment.Null;
var cArgs = new List();
foreach (var c in state.captures)
{
    if (c == null)
    {
        cArgs.Add(environment.Null);
        continue;
    }
    cArgs.Add(c == null ? nullArg : environment.CreateString(c));
}


Is it just me, or is all of that just a roundabout way of saying

var cArgs = state.captures
                 .Select(c => c == null ? (IDynamic)environment.Null
                                        : environment.CreateString(c))
                 .ToList();


Having too many local variables that are used only once does not always improve readability. The reader cannot tell instantly that the variable is used only once, so it is not obvious which variables they need to keep in their head while reading. This is why I inlined nullArg. If you disagree, of course you can change it back to a local variable.

In the if (!regExpObj.Flags.Contains("g")) block you seem to have a bit of duplicated code. It seems to me that it can easily be refactored:

var globalFlag = regExpObj.Flags.Contains("g");
var index = globalFlag ? (int)regExpObj.Get("lastIndex").ConvertToNumber().BaseValue : 0;
do
{
    var r = matcher(s, index);
    if (!r.success)
    {
        index++;
        continue;
    }
    matches.Add(
        Tuple.Create(
            index,
            r.matchState.endIndex - index,
            makeReplacer(index, r.matchState)
        )
    );

    if (!globalFlag)
        break;

    index = r.matchState.endIndex;
    regExpObj.Put("lastIndex", environment.CreateNumber(index), false);
}
while (index < s.Length);


Looking at it again, however, I’m beginning to wonder why you are populating this matches list; you are not returning it or otherwise passing it anywhere. It seems that you actually only need the resulting string, not the list of matches, right? If that is the case, then personally I would probably just generate the string and not bother with the list of matches. That also gets rid of the problem I see of using tuples: the names Item1, Item2, Item3 are really not very readable.

Also, this is a significant performance problem:

var resultIndex = s.IndexOf(searchValue, index);
if (resultIndex < 0)
{
    index++;
    continue;
}


You are searching the same string for the same substring many times even when you know it’s not there. If resultIndex < 0 you can stop:

var resultIndex = s.IndexOf(searchValue, index);
if (resultIndex < 0)
    break;


The same may apply to the regex-matching part of the block depending on what matcher does. If it only matches a regular expression anchored to the beginning of the string, then of course your existing code is correct. If it matches anywhere in the string, the same optimisation should be made there.

I believe this is a serious bug:

index = resultIndex + 1;


I think you want this to say

index = resultIndex + searchValue.Length;


Imagine the client wants to replace all instances of aa with x and the input string contains aaaa. If I read the code at the

Code Snippets

var tokens =
    new Regex(@"[^\$]*(\$[\$&`']|\$\d{1,2})")
    .Matches(format)
    .Cast<Match>()
    .Select(m => (Capture)m.Groups[1])
    .ToList();
var items =
    Regex.Matches(format, @"([^\$]*)(\$[\$&`']|\$\d{1,2})")
    .Cast<Match>()
    .Select(m => new { TextBefore = m.Groups[1].Value,
                       Token = m.Groups[2].Value });
foreach (var item in items)
{
    if (item.TextBefore.Length > 0)
    {
        e = Expression.Call(e, appendMethod, Expression.Constant(item.TextBefore));
    }

    // ... etc.
var n = token.Value.Substring(1);
if (n.Length == 2 && n[0] == '0')
{
    n = n.Substring(1);
}
var i = int.Parse(n) - 1;
var i = int.Parse(item.Token.Substring(1)) - 1;

Context

StackExchange Code Review Q#1699, answer score: 2

Revisions (0)

No revisions yet.