Comparing strings with different newlines
Problem
Recently, I was burdened with the task of finding a bug. It turned out the problem was strings from different systems containing different newlines: two strings with the same "text" but different newlines are still not equal. For example, "new\nline" (Unix flavor) and "new\r\nline" (Windows flavor) are not equal.
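To see the mismatch concretely, here is a small illustrative program (the class name `NewlineMismatchDemo` is made up for this example). `String.equals` compares the exact character sequences, so the extra `'\r'` in the Windows-style string is enough to make the comparison fail:

```java
public class NewlineMismatchDemo {
    public static void main(String[] args) {
        String unix = "new\nline";      // LF only
        String windows = "new\r\nline"; // CR followed by LF

        // Same visible text, but the Windows string carries one extra '\r' character
        System.out.println(unix.equals(windows));                      // prints false
        System.out.println(unix.length() + " vs " + windows.length()); // prints 8 vs 9
    }
}
```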
Since the code will be dealing with both types of newlines, I wrote a method to test for equality independent of the newline type. The code treats "\n", "\r", "\r\n" and "\n\r" the same, even though "\n\r" isn't really used as a newline.
Now that the code is done, I would like your opinion on it. What do you think of the variable and method names (I know I could have chosen better ones)? Is there a way to optimize the code or make it more readable?
```
public class StringUtils {
    public static final char LF = '\n';
    public static final char CR = '\r';

    public static boolean equalsIgnoreNewlineTwirks(String str, String other){
        if (str == null || other == null){
            return false;
        }
        if (str == other){
            return true;
        }
        char[] s1 = str.toCharArray();
        char[] s2 = other.toCharArray();
        int index1 = 0, index2 = 0;
        while (true){
            // Stop as soon as either array is exhausted; equal only if both are
            boolean oob1 = index1 >= s1.length, oob2 = index2 >= s2.length;
            if (oob1 | oob2){
                return oob1 & oob2;
            }
            char ch1 = s1[index1], ch2 = s2[index2];
            if (ch1 != ch2){
                // A mismatch is only allowed if both characters are newline characters
                if (ch1 != LF && ch1 != CR) return false;
                if (ch2 != LF && ch2 != CR) return false;
                // Skip the second character of a two-character newline
                if (index1 + 1 < s1.length && isCRAndLF(s1[index1], s1[index1 + 1])){
                    index1++;
                }
                if (index2 + 1 < s2.length && isCRAndLF(s2[index2], s2[index2 + 1])){
                    index2++;
                }
            }
            index1++; index2++;
        }
    }

    // True if the two characters form a two-character newline: "\r\n" or "\n\r"
    private static boolean isCRAndLF(char ch1, char ch2){
        return (ch1 == CR && ch2 == LF) || (ch1 == LF && ch2 == CR);
    }
}
```
Solution
The code looks quite complicated at first glance, but on reading it, it turns out to be straightforward. If you want to make the code shorter, you can just do this:
```
public static boolean equalsIgnoreNewlineStyle(String s1, String s2) {
    return s1 != null && s2 != null && normalizeLineEnds(s1).equals(normalizeLineEnds(s2));
}

private static String normalizeLineEnds(String s) {
    // Turn "\r\n" into "\n" first, then any remaining lone '\r' into '\n'
    return s.replace("\r\n", "\n").replace('\r', '\n');
}
```
Concerning running time and GC stress, your code is probably better. Use a benchmark to see how much better it is.
The word "twirks" sounded negative to me, so I replaced it with "style".
Since the two strings have equal rights and are treated the same, neither of them should be called the "other" one.
You should not make the constants public, since there is no need to do that.
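Following up on the running-time remark, here is a minimal sketch of a comparison that walks both strings with `charAt` instead of copying them with `toCharArray` or building normalized strings. The class name `LineEndComparator` and the helpers `isNewline` and `skipNewline` are hypothetical, and the sketch has not been benchmarked. It assumes the question's rule that "\n", "\r", "\r\n" and "\n\r" each count as one line break; note that the replace-based version above does not treat "\n\r" that way (it turns "a\n\rb" into "a\n\nb", i.e. two breaks). It also handles one case that, as far as I can trace, the question's loop gets wrong: for "a\r\nb" versus "a\rb", the matching '\r' characters are consumed as ordinary characters, the leftover '\n' is then compared with 'b', and the method returns false. Here, whenever both sides are at a newline character, one whole line break is consumed on each side.

```java
public final class LineEndComparator {
    private LineEndComparator() {
        // static helpers only
    }

    private static boolean isNewline(char c) {
        return c == '\n' || c == '\r';
    }

    // Returns the index just after the line break that starts at position i.
    // Two different newline characters in a row ("\r\n" or "\n\r") form one break;
    // otherwise the break is a single '\n' or '\r'.
    private static int skipNewline(String s, int i) {
        char first = s.charAt(i);
        if (i + 1 < s.length()) {
            char next = s.charAt(i + 1);
            if (isNewline(next) && next != first) {
                return i + 2;
            }
        }
        return i + 1;
    }

    // Compares s1 and s2 character by character, treating every line break as equal
    // regardless of its style. No arrays or intermediate strings are created.
    public static boolean equalsIgnoreNewlineStyle(String s1, String s2) {
        if (s1 == null || s2 == null) {
            return false; // same null handling as the question and the answer
        }
        int i = 0;
        int j = 0;
        while (i < s1.length() && j < s2.length()) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(j);
            if (isNewline(c1) && isNewline(c2)) {
                // Both sides are at a line break: consume one whole break on each side
                i = skipNewline(s1, i);
                j = skipNewline(s2, j);
            } else if (c1 == c2) {
                i++;
                j++;
            } else {
                return false;
            }
        }
        // Equal only if both strings were consumed completely
        return i == s1.length() && j == s2.length();
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreNewlineStyle("a\r\nb", "a\rb")); // prints true
        System.out.println(equalsIgnoreNewlineStyle("a\n\nb", "a\nb")); // prints false
    }
}
```

Pairing is greedy, so a run such as "\n\r\n" is read as one "\n\r" break followed by a lone "\n"; if "\n\r" should not count as a single break, dropping the `next != first` pairing for that order would be the place to change it.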
Context
StackExchange Code Review Q#140048, answer score: 14