patternjavaMinor
Map-reduce implementation for splitting strings
Viewed 0 times
mapsplittingreduceforimplementationstrings
Problem
I have been changing this code and I don't get to make it much better. I changed a little bit the structure, reimplemeted a new function for splitting Strings which is more efficient, etc. I have been tested with MR-Unit (it's part of map-reduce code).
I'm testing the code with 1.5 millions calls. It takes about 35 seconds on my computer, but in the real environment, it could be called with much more data, so, any optimization could be great. I'm worry about a little part of the code which I call around 7 times each iteration.
The parameters of my function are a map with the values what I want to replace, and another string which is a expression. It could be something like hard-code (I won't have to do any processing) or a expression like
My idea now, since it's map-reduce code, is to do some of this code out of the mapper, and it should execute just once. The code could be more complex but I would only have once for the matchers and the split. I don't know if that could improve the performance.
```
private static final Pattern PATTERN = Pattern
.compile("\\$\\{.+?\\}");
private static final Pattern PATTERN_DOLLAR = Pattern
.compile("^.\\$.$");
public static String replaceVariables(final String expression,
final Map vars) {
String tmpExp = expression;
Matcher matcher = PATTERN.matcher(tmpExp);
while (matcher.find()) {
final String group = matcher.group();
//${4} --> 4, ${2,8} --> 2,8
final String prop = group.substring(2, group.length() - 1);
// If the property has a comma, special case.
final String[] props = split(prop, ',');
//I get the value from the Map
String sValue = vars.get(props[0]);
if (sValue != null) {
//Special case, I could write ${0,3}, field 0, only the first 3 characters.
if (props.length > 1) {
I'm testing the code with 1.5 millions calls. It takes about 35 seconds on my computer, but in the real environment, it could be called with much more data, so, any optimization could be great. I'm worry about a little part of the code which I call around 7 times each iteration.
The parameters of my function are a map with the values what I want to replace, and another string which is a expression. It could be something like hard-code (I won't have to do any processing) or a expression like
${0} or something more complex like ${0}_${3}. My idea now, since it's map-reduce code, is to do some of this code out of the mapper, and it should execute just once. The code could be more complex but I would only have once for the matchers and the split. I don't know if that could improve the performance.
```
private static final Pattern PATTERN = Pattern
.compile("\\$\\{.+?\\}");
private static final Pattern PATTERN_DOLLAR = Pattern
.compile("^.\\$.$");
public static String replaceVariables(final String expression,
final Map vars) {
String tmpExp = expression;
Matcher matcher = PATTERN.matcher(tmpExp);
while (matcher.find()) {
final String group = matcher.group();
//${4} --> 4, ${2,8} --> 2,8
final String prop = group.substring(2, group.length() - 1);
// If the property has a comma, special case.
final String[] props = split(prop, ',');
//I get the value from the Map
String sValue = vars.get(props[0]);
if (sValue != null) {
//Special case, I could write ${0,3}, field 0, only the first 3 characters.
if (props.length > 1) {
Solution
What about the following code. I've improved it by not using regular expressions (except one call of
With the following example values:
%%CODEBLOCK_1%%
I have a benchmark of (1 million replaces):
%%CODEBLOCK_2%%
So it's a bit more than three times faster than your implementation.
And the more values like
Input:
%%CODEBLOCK_3%%
Benchmark:
%%CODEBLOCK_4%%
So, here it is more than six times faster!
Probably, it can be much more improved. But I just wanted to show you a possible direction so far!
String.split, perhaps you already improved this by your own function split).private final static char EXPR_VAR = '
With the following example values:
final String expression = "Hello! This is a ${1} ${0,2} of foo bla.";
final Map vars = new HashMap<>();
vars.put("0", "Foo");
vars.put("1", "funny");
I have a benchmark of (1 million replaces):
// Original (yours): 1872 ms
// Improved (mine): 500 ms
So it's a bit more than three times faster than your implementation.
And the more values like {1,3} are in the input expression, the faster it is in relation:
Input:
final String expression = "${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ";
Benchmark:
// Original: 17455 ms
// Improved: 2855 ms
So, here it is more than six times faster!
Probably, it can be much more improved. But I just wanted to show you a possible direction so far!;
private final static char EXPR_START = '{';
private final static char EXPR_END = '}';
private final static char EXPR_SPLIT = ',';
public static String replaceVariables(final String expression, final Map vars) {
if (expression == null) {
throw new IllegalArgumentException("The expression may not be null!");
}
if (vars == null) {
throw new IllegalArgumentException("The vars map may not be null!");
}
int firstIndex = expression.indexOf(EXPR_VAR);
if (firstIndex == -1) {
// nothing to replace, just return the expression
return expression;
}
final StringBuffer sb = new StringBuffer();
String tmpExp = expression;
int lastIndex;
String group;
String parts[];
while (firstIndex != -1) {
// check if char after '
With the following example values:
%%CODEBLOCK_1%%
I have a benchmark of (1 million replaces):
%%CODEBLOCK_2%%
So it's a bit more than three times faster than your implementation.
And the more values like {1,3} are in the input expression, the faster it is in relation:
Input:
%%CODEBLOCK_3%%
Benchmark:
%%CODEBLOCK_4%%
So, here it is more than six times faster!
Probably, it can be much more improved. But I just wanted to show you a possible direction so far! is '{'
if (tmpExp.charAt(firstIndex + 1) != EXPR_START) {
continue;
}
// find ending sign '}'
lastIndex = tmpExp.indexOf(EXPR_END, firstIndex);
if (lastIndex > -1) {
// complete pattern "${...}" found, append previous chars
sb.append(tmpExp.substring(0, firstIndex));
// get value inside pattern
group = tmpExp.substring(firstIndex + 2, lastIndex);
if (group.indexOf(EXPR_SPLIT) > -1) {
// we have a pattern like "${xxxxx,xxxxxx}"
parts = split(group, EXPR_SPLIT);
if (vars.containsKey(parts[0])) {
sb.append(vars.get(parts[0]).substring(0, Integer.valueOf(parts[1])));
} else {
throw new IllegalArgumentException("Key [" + parts[0] + "] not found in variable map!");
}
} else {
// we have a pattern like "${xxxxx}"
if (vars.containsKey(group)) {
sb.append(vars.get(group));
} else {
throw new IllegalArgumentException("Key [" + group + "] not found in variable map!");
}
}
// cut off previous chars from expression
tmpExp = tmpExp.substring(lastIndex + 1);
} else {
// assuming, that if no right parenthesis is found, there is nothing
// more to replace in this expression. So just get out of here!
return sb.append(tmpExp).toString();
}
// find next pattern
firstIndex = tmpExp.indexOf(EXPR_VAR);
}
// append rest of expression and return it as String
return sb.append(tmpExp).toString();
}With the following example values:
%%CODEBLOCK_1%%
I have a benchmark of (1 million replaces):
%%CODEBLOCK_2%%
So it's a bit more than three times faster than your implementation.
And the more values like
{1,3} are in the input expression, the faster it is in relation:Input:
%%CODEBLOCK_3%%
Benchmark:
%%CODEBLOCK_4%%
So, here it is more than six times faster!
Probably, it can be much more improved. But I just wanted to show you a possible direction so far!
Code Snippets
private final static char EXPR_VAR = '$';
private final static char EXPR_START = '{';
private final static char EXPR_END = '}';
private final static char EXPR_SPLIT = ',';
public static String replaceVariables(final String expression, final Map<String, String> vars) {
if (expression == null) {
throw new IllegalArgumentException("The expression may not be null!");
}
if (vars == null) {
throw new IllegalArgumentException("The vars map may not be null!");
}
int firstIndex = expression.indexOf(EXPR_VAR);
if (firstIndex == -1) {
// nothing to replace, just return the expression
return expression;
}
final StringBuffer sb = new StringBuffer();
String tmpExp = expression;
int lastIndex;
String group;
String parts[];
while (firstIndex != -1) {
// check if char after '$' is '{'
if (tmpExp.charAt(firstIndex + 1) != EXPR_START) {
continue;
}
// find ending sign '}'
lastIndex = tmpExp.indexOf(EXPR_END, firstIndex);
if (lastIndex > -1) {
// complete pattern "${...}" found, append previous chars
sb.append(tmpExp.substring(0, firstIndex));
// get value inside pattern
group = tmpExp.substring(firstIndex + 2, lastIndex);
if (group.indexOf(EXPR_SPLIT) > -1) {
// we have a pattern like "${xxxxx,xxxxxx}"
parts = split(group, EXPR_SPLIT);
if (vars.containsKey(parts[0])) {
sb.append(vars.get(parts[0]).substring(0, Integer.valueOf(parts[1])));
} else {
throw new IllegalArgumentException("Key [" + parts[0] + "] not found in variable map!");
}
} else {
// we have a pattern like "${xxxxx}"
if (vars.containsKey(group)) {
sb.append(vars.get(group));
} else {
throw new IllegalArgumentException("Key [" + group + "] not found in variable map!");
}
}
// cut off previous chars from expression
tmpExp = tmpExp.substring(lastIndex + 1);
} else {
// assuming, that if no right parenthesis is found, there is nothing
// more to replace in this expression. So just get out of here!
return sb.append(tmpExp).toString();
}
// find next pattern
firstIndex = tmpExp.indexOf(EXPR_VAR);
}
// append rest of expression and return it as String
return sb.append(tmpExp).toString();
}final String expression = "Hello! This is a ${1} ${0,2} of foo bla.";
final Map<String, String> vars = new HashMap<>();
vars.put("0", "Foo");
vars.put("1", "funny");// Original (yours): 1872 ms
// Improved (mine): 500 msfinal String expression = "${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ${1} ${0,2} ";// Original: 17455 ms
// Improved: 2855 msContext
StackExchange Code Review Q#37487, answer score: 6
Revisions (0)
No revisions yet.