
Fast regex to extract strings before and after a time


Problem

I want to extract Text1 and Text2 by splitting a string on the time that separates them:

Text1 10:24:02 Text2


Of the two working regexes below, which is faster?

Regex 1

String regex1= "([0-9]{2}):([0-9]{2}):([0-9]{2})";


Regex 2

String regex2= "[0-9][0-9]:[0-9][0-9]:[0-9][0-9]";


This is the code I've used:

String s = "Text1 10:24:02 Text2";
String[] split = s.split(regex1); // behaves the same with regex2
System.out.println(split[0]);
System.out.println(split[1]);

Solution

Regex 2 is faster, but probably not for the reason that you expect.

You can easily answer questions like these by writing a benchmark. Here is an example:

public static long splitTime(String regex, String text) {
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        String[] split = text.split(regex);
    }
    long end = System.currentTimeMillis();
    return end - start;
}

public static void main(String[] args) {
    String regex1 = "([0-9]{2}):([0-9]{2}):([0-9]{2})";
    String regex2 = "[0-9][0-9]:[0-9][0-9]:[0-9][0-9]";
    String s="Text1 10:24:02 Text2";

    // Warm up the loops
    for (int i = 0; i < 2000; i++) {
        splitTime(regex1, s);
        splitTime(regex2, s);
    }

    long time1 = 0, time2 = 0;
    for (int i = 0; i < 2000; i++) {
        time1 += splitTime(regex1, s);
        time2 += splitTime(regex2, s);
        time2 += splitTime(regex2, s);
        time1 += splitTime(regex1, s);
    }

    System.out.println("Regex 1: " + time1);
    System.out.println("Regex 2: " + time2);
}


The JIT compiler can skew timings, since code runs slower until it has been compiled and optimized. For fairness, I've warmed up the loop by running both regexes without timing them first. I've also interleaved the calls to splitTime in case the order makes a difference.
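As a possible refinement of the timing helper, here is a minimal sketch that uses System.nanoTime() for finer resolution and accumulates the split results so the loop has an observable effect. The class name SplitTimer, the method splitNanos, and the sink field are hypothetical names introduced for illustration only.

```java
public class SplitTimer {
    // Hypothetical accumulator: summing the result lengths gives the loop
    // an observable effect, so the work is harder to optimize away.
    static long sink = 0;

    // Returns the elapsed time in nanoseconds for `iterations` calls to split.
    static long splitNanos(String regex, String text, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += text.split(regex).length;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String regex = "[0-9][0-9]:[0-9][0-9]:[0-9][0-9]";
        long nanos = splitNanos(regex, "Text1 10:24:02 Text2", 1000);
        System.out.println("1000 splits took " + nanos + " ns (sink = " + sink + ")");
    }
}
```

The figures quoted in this answer come from the currentTimeMillis version above, not from this variant.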

I found that Regex 1 is slower than Regex 2 by about 5%.

However, Regex 1 has some capturing parentheses. If you remove them,

String regex0 = "[0-9]{2}:[0-9]{2}:[0-9]{2}";


then you get a result that is 16% faster than Regex 2.
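Separately from the choice of regex, String.split(String) compiles a pattern like this one on every call. If the split runs many times, compiling once and reusing the Pattern may matter more than the differences measured above. This is a sketch under that assumption and was not part of the benchmark; the class name TimeSplit and the method splitAroundTime are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class TimeSplit {
    // Compiled once and reused across calls; no capturing groups are needed
    // because only the text around the time is wanted.
    private static final Pattern TIME =
            Pattern.compile("[0-9]{2}:[0-9]{2}:[0-9]{2}");

    // Returns the pieces of `text` on either side of each hh:mm:ss match.
    static String[] splitAroundTime(String text) {
        return TIME.split(text);
    }

    public static void main(String[] args) {
        String[] parts = splitAroundTime("Text1 10:24:02 Text2");
        System.out.println(Arrays.toString(parts));
    }
}
```

Note that the surrounding spaces stay attached to the pieces ("Text1 " and " Text2"), exactly as with s.split(regex) in the question; call trim() on each piece if they are unwanted.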


Context

StackExchange Code Review Q#67194, answer score: 4
