
Split camel cased/snake cased String

Submitted by: @import:stackexchange-codereview

Problem

I want to implement a method that, given a camel-cased or underscored String, returns a list of the separate words that make up that String.
Examples:

  • ISomeCamelCasedString -> {I, Some, Camel, Cased, String}
  • UNDERSCORED_STRING -> {UNDERSCORED, STRING}
  • camelCased_and_UNDERSCORED -> {camel, Cased, and, UNDERSCORED}

My approach is as follows: I insert a space character between each pair of adjacent words that should be separated, and then split the result into a list using StringTokenizer.

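import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public static List<String> split(String string) {
    StringBuilder separatedWords = new StringBuilder();

    for (int i=0; i<string.length(); i++) {
        char c = string.charAt(i);
        if (i > 0) {
            char previousC = string.charAt(i-1);

            if ((!Character.isLowerCase(c) && !Character.isUpperCase(previousC)) //  UpperCamelCase, UPPER_DASHED
                    || !Character.isLetterOrDigit(previousC)                    // lower_dashed
                    || (i < string.length()-1 && Character.isUpperCase(c) && Character.isLowerCase(string.charAt(i+1)))) { // ISome -> I Some
                separatedWords.append(' ');
            }
        }
        // keep letters and digits; '_' and other separators are dropped
        if (Character.isLetterOrDigit(c)) {
            separatedWords.append(c);
        }
    }
    List<String> tokens = new ArrayList<String>();

    StringTokenizer tokenizer = new StringTokenizer(separatedWords.toString());

    while(tokenizer.hasMoreTokens()) {
        tokens.add(tokenizer.nextToken());
    }
    return tokens;
}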


As you can see, it looks quite complicated. How can I improve this?

Solution

My first impression:

  • The implementation looks long and complicated for something that sounds simple
  • You've provided some example cases to verify correctness, which is a very good thing

My first reaction is to turn the example cases into proper unit tests, then replace the implementation with a different approach, and finally make any tests that break pass again.

The unit tests, straightforward from your examples:

// imports assume JUnit 4
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import org.junit.Test;

@Test
public void testCamelCased() {
    assertEquals(Arrays.asList("I", "Some", "Camel", "Cased", "String"), split("ISomeCamelCasedString"));
}

@Test
public void testSnakeCased() {
    assertEquals(Arrays.asList("UNDERSCORED", "STRING"), split("UNDERSCORED_STRING"));
}

@Test
public void testMixed() {
    assertEquals(Arrays.asList("camel", "Cased", "and", "UNDERSCORED"), split("camelCased_and_UNDERSCORED"));
}


Next, I thought this could be simplified by splitting on a clever regex. I'm no regex whiz, but Stack Overflow has plenty of them, and I found a suitable thread there about splitting camelCase into its words. Adapting it to also split on underscores was relatively easy, though the result is probably not perfect:

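import java.util.ArrayList;
import java.util.List;

private static final String RE_CAMELCASE_OR_UNDERSCORE =
        "(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])|_";

public static List<String> split(String string) {
    List<String> words = new ArrayList<String>();
    for (String word : string.split(RE_CAMELCASE_OR_UNDERSCORE)) {
        // skip the empty strings the regex can leave behind
        if (!word.isEmpty()) {
            words.add(word);
        }
    }
    return words;
}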
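To see where the empty elements come from, it helps to apply the same regex without the filtering loop. The class name RawSplitDemo and the method rawSplit below are illustrative only. In UNDERSCORED_STRING the _ alternative matches first, and then the zero-width (?<!(^|[A-Z]))(?=[A-Z]) alternative matches again right before S, because an underscore is neither the start of the input nor an uppercase letter; the empty string between those two matches ends up in the array.

```java
import java.util.Arrays;

class RawSplitDemo {
    // Same regex as RE_CAMELCASE_OR_UNDERSCORE in the answer.
    private static final String RE_CAMELCASE_OR_UNDERSCORE =
            "(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])|_";

    // Splits without filtering, so any empty elements stay visible.
    static String[] rawSplit(String string) {
        return string.split(RE_CAMELCASE_OR_UNDERSCORE);
    }

    public static void main(String[] args) {
        // '_' matches, then the zero-width branch matches again just before 'S'
        System.out.println(Arrays.toString(rawSplit("UNDERSCORED_STRING")));
        // prints [UNDERSCORED, , STRING]
        System.out.println(Arrays.toString(rawSplit("camelCased_and_UNDERSCORED")));
        // prints [camel, Cased, and, , UNDERSCORED]
        System.out.println(Arrays.toString(rawSplit("ISomeCamelCasedString")));
        // prints [I, Some, Camel, Cased, String]
    }
}
```

Pure camel case never triggers the problem; only an underscore directly followed by an uppercase letter does.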


This is probably not perfect, but it's a lot shorter than the original, and it's easier to see how it works. If somebody can figure out how to write RE_CAMELCASE_OR_UNDERSCORE so that it doesn't produce empty elements, the method can be shortened to simply:

return Arrays.asList(string.split(RE_CAMELCASE_OR_UNDERSCORE));
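One possible regex for that, offered as a sketch and not taken from the original answer: make both zero-width alternatives refuse to match right after an underscore, and consume runs of underscores with _+. The names NoEmptySplit and RE_CAMELCASE_OR_SNAKE are hypothetical. It assumes the input has no leading underscore (a leading "_" would still produce an empty first element, while trailing empty strings are already dropped by String.split), and like the original it only treats ASCII A-Z/a-z as letter case.

```java
import java.util.Arrays;
import java.util.List;

class NoEmptySplit {
    // Hypothetical variant of RE_CAMELCASE_OR_UNDERSCORE:
    //   _+                        one or more underscores
    //   (?<=[^_A-Z])(?=[A-Z])     a capital after a non-capital, non-underscore char
    //   (?<=[^_])(?=[A-Z][a-z])   a capital starting a lowercase run (S in ISome),
    //                             except right after an underscore
    // Assumes no leading underscore in the input.
    private static final String RE_CAMELCASE_OR_SNAKE =
            "_+|(?<=[^_A-Z])(?=[A-Z])|(?<=[^_])(?=[A-Z][a-z])";

    static List<String> split(String string) {
        return Arrays.asList(string.split(RE_CAMELCASE_OR_SNAKE));
    }

    public static void main(String[] args) {
        System.out.println(split("ISomeCamelCasedString"));      // [I, Some, Camel, Cased, String]
        System.out.println(split("UNDERSCORED_STRING"));         // [UNDERSCORED, STRING]
        System.out.println(split("camelCased_and_UNDERSCORED")); // [camel, Cased, and, UNDERSCORED]
        System.out.println(split("parseHTTPResponse"));          // [parse, HTTP, Response]
    }
}
```

Because neither lookaround branch can fire directly after an underscore, the only way to split there is the _+ branch itself, so no empty string is left between two adjacent matches.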


PS: this_is_usually_called_snake_cased, not "underscored".


Context

StackExchange Code Review Q#62500, answer score: 8
