Comparing strings with different newlines

Submitted by: @import:stackexchange-codereview

Problem

Recently, I was burdened with the task of finding a bug. It turned out that the problem was strings from different systems containing different newlines. Two strings with the same "text" but different newlines do not compare as equal, e.g. "new\nline" (Unix flavor) and "new\r\nline" (Windows flavor).

Since the code will be dealing with both types of newlines, I wrote a method that tests for equality independent of newline type. The code treats "\n", "\r", "\r\n" and "\n\r" the same, even though "\n\r" isn't really used as a newline.

Now that the code is done, I would like your opinion on it. What do you think of the variable and method names (I know I could have chosen better ones)? Is there a way to optimize the code or make it more readable?

```
public class StringUtils {
    public static final char LF = '\n';
    public static final char CR = '\r';

    public static boolean equalsIgnoreNewlineTwirks(String str, String other){
        if (str == null || other == null){
            return false;
        }
        if (str == other){
            return true;
        }

        char[] s1 = str.toCharArray();
        char[] s2 = other.toCharArray();
        int index1 = 0, index2 = 0;
        while (true){
            boolean oob1 = index1 >= s1.length, oob2 = index2 >= s2.length;
            if (oob1 | oob2){
                return oob1 & oob2;
            }

            char ch1 = s1[index1], ch2 = s2[index2];
            if (ch1 != ch2){
                if (ch1 != LF && ch1 != CR) return false;
                if (ch2 != LF && ch2 != CR) return false;

                if (index1 + 1 < s1.length && isCRAndLF(s1[index1], s1[index1 + 1])){
                    index1++;
                }
                if (index2 + 1 < s2.length && isCRAndLF(s2[index2], s2[index2 + 1])){
                    index2++;
                }
            }

            index1++; index2++;
        }
    }

    private static boolean isCRAndLF(char ch1, char ch2){
        return
```

Solution

The code looks quite complicated at first glance, but it turns out to be straightforward once you read it. If you want shorter code, you can just do this:

public static boolean equalsIgnoreNewlineStyle(String s1, String s2) {
    return s1 != null && s2 != null && normalizeLineEnds(s1).equals(normalizeLineEnds(s2));
}

private static String normalizeLineEnds(String s) {
    return s.replace("\r\n", "\n").replace('\r', '\n');
}
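
To see how the shorter version behaves, here is a small self-contained sketch that repeats the two methods and runs them on a few inputs. The class name `NewlineCompare` and the `main` method are only there for illustration. Note one difference from the original method: this version leaves "\n\r" as two newlines, so it is not treated like a single "\n".

```java
final class NewlineCompare {
    private NewlineCompare() {}

    // Same logic as the two methods above, repeated so the sketch compiles on its own.
    static boolean equalsIgnoreNewlineStyle(String s1, String s2) {
        return s1 != null && s2 != null
                && normalizeLineEnds(s1).equals(normalizeLineEnds(s2));
    }

    private static String normalizeLineEnds(String s) {
        // Replace "\r\n" first, so a Windows newline collapses to one "\n" rather than two.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreNewlineStyle("new\nline", "new\r\nline")); // true
        System.out.println(equalsIgnoreNewlineStyle("new\rline", "new\nline"));   // true
        System.out.println(equalsIgnoreNewlineStyle("new\n\rline", "new\nline")); // false: "\n\r" becomes "\n\n"
        System.out.println(equalsIgnoreNewlineStyle(null, null));                 // false
    }
}
```

If "\n\r" must also count as a single newline, one more `replace` step would be needed before the others; whether that matters depends on the data you actually receive.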


Concerning running time and GC stress, your code is probably better, since the short version builds intermediate strings. Use a benchmark to see how much better it is.

The word "twirks" sounded negative to me, so I replaced it with "style".

Since the two strings have equal rights and are treated the same, neither of them should be called the "other" one.

You should not make the constants public, since there is no need to do that.
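
Putting these points together, a loop-based variant might look like the sketch below. It is not the original method: it is a rewrite that assumes the same semantics as the shorter version ("\r\n" and "\r" each count as one "\n", while "\n\r" counts as two newlines), and the class name `NewlineAwareEquals` is hypothetical. It keeps the constants private, names the parameters symmetrically, and avoids creating intermediate strings.

```java
final class NewlineAwareEquals {
    private static final char LF = '\n';
    private static final char CR = '\r';

    private NewlineAwareEquals() {}

    static boolean equalsIgnoreNewlineStyle(String s1, String s2) {
        if (s1 == null || s2 == null) {
            return false;
        }
        int i1 = 0, i2 = 0;
        while (i1 < s1.length() && i2 < s2.length()) {
            char c1 = s1.charAt(i1);
            char c2 = s2.charAt(i2);
            // Read CR or CRLF as a single LF; for CRLF, skip the LF that follows.
            if (c1 == CR) {
                c1 = LF;
                if (i1 + 1 < s1.length() && s1.charAt(i1 + 1) == LF) {
                    i1++;
                }
            }
            if (c2 == CR) {
                c2 = LF;
                if (i2 + 1 < s2.length() && s2.charAt(i2 + 1) == LF) {
                    i2++;
                }
            }
            if (c1 != c2) {
                return false;
            }
            i1++;
            i2++;
        }
        // Equal only if both strings were consumed completely.
        return i1 == s1.length() && i2 == s2.length();
    }
}
```

Whether this is measurably faster than the `replace`-based version is exactly what the benchmark should tell you.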

Context

StackExchange Code Review Q#140048, answer score: 14
