patternjavaModerate
What would be preferred aesthetically and performance wise?
Viewed 0 times
whatwisepreferredaestheticallywouldperformanceand
Problem
Which one of these two would you prefer writing in your code?
This:
Or this:
Which one looks cleaner (in my opinion the second one but it obstructs you from commenting every line if you wanted to) and which one should perform better? (my guess is that there is no difference since regex doesn't create a new string everytime and it does everything internally if I am correct)
Additional Information:
Using one
Here is a short compilable example:
I have tried merging some of the
This:
tweet = tweet.replaceAll("@\\w+|#\\w+|\\bRT\\b", "");
tweet = tweet.replaceAll("\n", " ");
tweet = tweet.replaceAll("[^\\p{L}\\p{N} ]+", " ");
tweet = tweet.replaceAll(" +", " ").trim();Or this:
tweet = tweet.replaceAll("@\\w+|#\\w+|\\bRT\\b", "")
.replaceAll("\n", " ")
.replaceAll("[^\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ")
.trim();Which one looks cleaner (in my opinion the second one but it obstructs you from commenting every line if you wanted to) and which one should perform better? (my guess is that there is no difference since regex doesn't create a new string everytime and it does everything internally if I am correct)
Additional Information:
Using one
.replaceAll() is not possible because if the content that I want to be removed is not removed in this order then the output will be wrong.Here is a short compilable example:
public class StringTest {
public static void main(String args[]) {
String text = "RT @AshStewart09: Vote for Lady Gaga for \"Best Fans\""
+ " at iHeart Awards\n"
+ "\n"
+ "RT!!\n"
+ "\n"
+ "My vote for #FanArmy goes to #LittleMonsters #iHeartAwards\n"
+ "for ART!"
+ "htt… è, é, ê, ë asdf324 ah_";
System.out.println("Before: " + text + "\n");
text = text.replaceAll("@\\w+|#\\w+|\\bRT\\b", "")
.replaceAll("\n", " ")
.replaceAll("[^\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ").trim();
System.out.println("After: " + text + "\n");
}
}I have tried merging some of the
.replaceAll() but the output was always wrong if I did the operations in any other order than this. In the end I want to be left with just the words of the tweet and nothing else. Bare in mind that this isSolution
I agree mostly with @Vogel612's answer, the second version is more concise and there is no need to litter variables all over the place as is done in the first version.
Another point is that code should be self-documenting, which @Vogel612 also suggests, but else you can still add a comment behind the code lines.
My review
I wanted to talk about a more interesting point, as this question seems to be a subproblem of a bigger question of yours, which involves creating a high throughput program that does something with tweets.
Perhaps this is a premature optimization given that I do not know how well your program performs, but your program may very well hit a threshold limit because of the way you handle regexes.
You should consider what a
For every provided regex it creates a pattern and matches and replaces the occurences with the replacement.
It is also important to look into
Here you can see, that because of four regex replaces, you create four times a
There are two things you can do:
Another point is that code should be self-documenting, which @Vogel612 also suggests, but else you can still add a comment behind the code lines.
My review
I wanted to talk about a more interesting point, as this question seems to be a subproblem of a bigger question of yours, which involves creating a high throughput program that does something with tweets.
Perhaps this is a premature optimization given that I do not know how well your program performs, but your program may very well hit a threshold limit because of the way you handle regexes.
You should consider what a
String.replaceAll does:public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}For every provided regex it creates a pattern and matches and replaces the occurences with the replacement.
It is also important to look into
Pattern.replaceAll now:public String replaceAll(String replacement) {
reset();
boolean result = find();
if (result) {
StringBuffer sb = new StringBuffer();
do {
appendReplacement(sb, replacement);
result = find();
} while (result);
appendTail(sb);
return sb.toString();
}
return text.toString();
}Here you can see, that because of four regex replaces, you create four times a
StringBuffer, which is synchronized, and a String object, which can count up if you are processing tons of tweets.There are two things you can do:
- Create a single regex that captures what you want to do, thus needing only one
replaceAllcall.
- Go really into micro-optimizing and get rid of the regexes in a whole, operate on a
char[]and only store the resulting character array.
Code Snippets
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}public String replaceAll(String replacement) {
reset();
boolean result = find();
if (result) {
StringBuffer sb = new StringBuffer();
do {
appendReplacement(sb, replacement);
result = find();
} while (result);
appendTail(sb);
return sb.toString();
}
return text.toString();
}Context
StackExchange Code Review Q#47465, answer score: 15
Revisions (0)
No revisions yet.