Split camel cased/snake cased String
Problem
I want to implement a method which, given some camel-cased or underscored String, will return a list of the separate words that make up this String.
Examples:
- ISomeCamelCasedString -> {I, Some, Camel, Cased, String}
- UNDERSCORED_STRING -> {UNDERSCORED, STRING}
- camelCased_and_UNDERSCORED -> {camel, Cased, and, UNDERSCORED}
My approach to solve this is as follows: I add a space character between each two words that should be separated and then divide them into a list using StringTokenizer.

    public static List<String> split(String string) {
        StringBuilder separatedWords = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (i > 0) {
                char previousC = string.charAt(i - 1);
                if ((!Character.isLowerCase(c) && !Character.isUpperCase(previousC)) // UpperCamelCase, UPPER_DASHED
                        || !Character.isLetterOrDigit(previousC) // lower_dashed
                        || (i [...]
        [...]
        List<String> tokens = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(separatedWords.toString());
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

As you can see, it looks quite complicated. How can I improve this?
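The space-insertion approach described in the question can be sketched roughly as follows. This is not the asker's original code: the class name `SpaceSeparatedSplitter` and the two boundary rules are assumptions chosen so that the three examples above come out as listed, and every character that is neither a letter nor a digit is assumed to be a pure separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SpaceSeparatedSplitter {

    // Sketch only: writes a space at each assumed word boundary,
    // then lets StringTokenizer split on whitespace.
    static List<String> split(String string) {
        StringBuilder separatedWords = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Assumption: underscores and other symbols only separate words.
                separatedWords.append(' ');
                continue;
            }
            if (i > 0 && Character.isUpperCase(c)) {
                char previous = string.charAt(i - 1);
                // camel|Cased: upper case right after a lower-case letter or digit.
                boolean afterLowerOrDigit =
                        Character.isLowerCase(previous) || Character.isDigit(previous);
                // I|Some: upper case after upper case, but followed by lower case.
                boolean startsWordAfterUpperRun = Character.isUpperCase(previous)
                        && i + 1 < string.length()
                        && Character.isLowerCase(string.charAt(i + 1));
                if (afterLowerOrDigit || startsWordAfterUpperRun) {
                    separatedWords.append(' ');
                }
            }
            separatedWords.append(c);
        }
        List<String> tokens = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(separatedWords.toString());
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("ISomeCamelCasedString"));
        System.out.println(split("UNDERSCORED_STRING"));
        System.out.println(split("camelCased_and_UNDERSCORED"));
    }
}
```

Naming the two boundary rules as separate booleans keeps the condition readable, which is part of what makes a single combined `if` look complicated.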
Solution
My first impression:
- The implementation looks long and complicated for something that sounds simple
- You've provided some example cases to verify correctness, which is a very good thing
My first reaction is to embody the example test cases in proper unit tests, and then replace the implementation with a different approach, and make the tests I broke work again.
The unit tests, straightforward from your examples:
    @Test
    public void testCamelCased() {
        assertEquals(Arrays.asList("I", "Some", "Camel", "Cased", "String"), split("ISomeCamelCasedString"));
    }

    @Test
    public void testSnakeCased() {
        assertEquals(Arrays.asList("UNDERSCORED", "STRING"), split("UNDERSCORED_STRING"));
    }

    @Test
    public void testMixed() {
        assertEquals(Arrays.asList("camel", "Cased", "and", "UNDERSCORED"), split("camelCased_and_UNDERSCORED"));
    }

Next, I thought this could be simplified by splitting on some clever regex. I'm no regex wiz, but there are a lot of those on Stack Overflow, and I found a suitable thread there about splitting camelCase into its words. Adapting it to also split on underscores was relatively easy, resulting in this:

    private static final String RE_CAMELCASE_OR_UNDERSCORE =
            "(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])|_";

    public static List<String> split(String string) {
        List<String> words = new ArrayList<String>();
        for (String word : string.split(RE_CAMELCASE_OR_UNDERSCORE)) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

This is probably not perfect, but it is a lot shorter than the original, and it is easier to see how it works. If somebody can figure out how to adjust RE_CAMELCASE_OR_UNDERSCORE so that it doesn't produce empty elements, then the method can be shortened to simply:

    return Arrays.asList(string.split(RE_CAMELCASE_OR_UNDERSCORE));
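On the open question of a pattern that produces no empty elements, one possible sketch (not part of the original answer) swaps the negative lookbehinds for positive ones, so that no zero-width match can occur directly after an underscore. The pattern, the constant name `RE_WORD_BOUNDARY`, and the class name `RegexSplitter` are assumptions for illustration; the character classes are ASCII-only, and an input that starts with an underscore would still yield a leading empty string.

```java
import java.util.Arrays;
import java.util.List;

class RegexSplitter {

    // Assumed pattern, three alternatives:
    //   _+                        a run of underscores, consumed as the separator
    //   (?<=[a-z0-9])(?=[A-Z])    between a lower-case letter or digit and an upper-case letter
    //   (?<=[A-Z])(?=[A-Z][a-z])  after an upper-case letter, before an upper-case letter
    //                             that is followed by a lower-case one (I|Some)
    private static final String RE_WORD_BOUNDARY =
            "_+|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    static List<String> split(String string) {
        return Arrays.asList(string.split(RE_WORD_BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(split("ISomeCamelCasedString"));
        System.out.println(split("UNDERSCORED_STRING"));
        System.out.println(split("camelCased_and_UNDERSCORED"));
    }
}
```

Because both zero-width alternatives require a letter or digit immediately before the split point, the position right after an underscore never matches, which is where the empty elements came from for the three example inputs.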
PS: this_is_usually_called_snake_cased, not "underscored".
Code Snippets

    @Test
    public void testCamelCased() {
        assertEquals(Arrays.asList("I", "Some", "Camel", "Cased", "String"), split("ISomeCamelCasedString"));
    }

    @Test
    public void testSnakeCased() {
        assertEquals(Arrays.asList("UNDERSCORED", "STRING"), split("UNDERSCORED_STRING"));
    }

    @Test
    public void testMixed() {
        assertEquals(Arrays.asList("camel", "Cased", "and", "UNDERSCORED"), split("camelCased_and_UNDERSCORED"));
    }

    private static final String RE_CAMELCASE_OR_UNDERSCORE =
            "(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])|_";

    public static List<String> split(String string) {
        List<String> words = new ArrayList<String>();
        for (String word : string.split(RE_CAMELCASE_OR_UNDERSCORE)) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    return Arrays.asList(string.split(RE_CAMELCASE_OR_UNDERSCORE));

Context
StackExchange Code Review Q#62500, answer score: 8