HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Number aware string sorting with comparator

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sortingnumberwithstringcomparatoraware

Problem

I have this class for use in sorting strings such that if strings have a number in the same position it will order the numbers in increasing order.

Alphabetical gives:

  • file1



  • file10



  • file2



What I'm calling "number aware" string sorting should give:

  • file1



  • file2



  • file10



Here is what I have using a regex split from there.

The code seems to work. Any cases where I could run into problem? If not any suggestions on making it simpler or more efficient.

import java.util.Comparator;

public class NumberAwareStringComparator implements Comparator{
     public int compare(String s1, String s2) {

            String[] s1Parts = s1.split("(? s2.length()){
                return 1;
            }else{
                return 0;
            }
        }
}

Solution

Exceptions should be reserved for exceptional situations, and should be avoided if possible. The fundamental reason you have to deal with NumberFormatException is that after splitting, you don't know whether each part contains a number or a non-number.

Here's a strategy that always compares non-digits to non-digits, and numbers to numbers.

import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberAwareStringComparator implements Comparator {
    public static final NumberAwareStringComparator INSTANCE =
        new NumberAwareStringComparator();

    private static final Pattern PATTERN = Pattern.compile("(\\D*)(\\d*)");

    private NumberAwareStringComparator() {
    }

    public int compare(CharSequence s1, CharSequence s2) {
        Matcher m1 = PATTERN.matcher(s1);
        Matcher m2 = PATTERN.matcher(s2);

        // The only way find() could fail is at the end of a string
        while (m1.find() && m2.find()) {
            // matcher.group(1) fetches any non-digits captured by the
            // first parentheses in PATTERN.
            int nonDigitCompare = m1.group(1).compareTo(m2.group(1));
            if (0 != nonDigitCompare) {
                return nonDigitCompare;
            }

            // matcher.group(2) fetches any digits captured by the
            // second parentheses in PATTERN.
            if (m1.group(2).isEmpty()) {
                return m2.group(2).isEmpty() ? 0 : -1;
            } else if (m2.group(2).isEmpty()) {
                return +1;
            }

            BigInteger n1 = new BigInteger(m1.group(2));
            BigInteger n2 = new BigInteger(m2.group(2));
            int numberCompare = n1.compareTo(n2);
            if (0 != numberCompare) {
                return numberCompare;
            }
        }

        // Handle if one string is a prefix of the other.
        // Nothing comes before something.
        return m1.hitEnd() && m2.hitEnd() ? 0 :
               m1.hitEnd()                ? -1 : +1;
    }
}


Since strings of digits (such as those representing dates, like 20131212123456.log) can overflow an int, I've used java.math.BigInteger.

Also, since the code works just as well with CharSequence as with String, I've generalized the type to Comparator.

Code Snippets

import java.math.BigInteger;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberAwareStringComparator implements Comparator<CharSequence> {
    public static final NumberAwareStringComparator INSTANCE =
        new NumberAwareStringComparator();

    private static final Pattern PATTERN = Pattern.compile("(\\D*)(\\d*)");

    private NumberAwareStringComparator() {
    }

    public int compare(CharSequence s1, CharSequence s2) {
        Matcher m1 = PATTERN.matcher(s1);
        Matcher m2 = PATTERN.matcher(s2);

        // The only way find() could fail is at the end of a string
        while (m1.find() && m2.find()) {
            // matcher.group(1) fetches any non-digits captured by the
            // first parentheses in PATTERN.
            int nonDigitCompare = m1.group(1).compareTo(m2.group(1));
            if (0 != nonDigitCompare) {
                return nonDigitCompare;
            }

            // matcher.group(2) fetches any digits captured by the
            // second parentheses in PATTERN.
            if (m1.group(2).isEmpty()) {
                return m2.group(2).isEmpty() ? 0 : -1;
            } else if (m2.group(2).isEmpty()) {
                return +1;
            }

            BigInteger n1 = new BigInteger(m1.group(2));
            BigInteger n2 = new BigInteger(m2.group(2));
            int numberCompare = n1.compareTo(n2);
            if (0 != numberCompare) {
                return numberCompare;
            }
        }

        // Handle if one string is a prefix of the other.
        // Nothing comes before something.
        return m1.hitEnd() && m2.hitEnd() ? 0 :
               m1.hitEnd()                ? -1 : +1;
    }
}

Context

StackExchange Code Review Q#37192, answer score: 8

Revisions (0)

No revisions yet.