patternjavaMinor
Function that shortens a String based on a term/abbreviation mapping
Viewed 0 times
functiontermabbreviationthatbasedmappingstringshortens
Problem
I have a function that takes a description as a
The shortening is done by checking if a word matches the key of the
private static HashMap map = new HashMap();
public static String shortenText(String input){
ArrayList listUpperCase = new ArrayList(Arrays.asList(new String(input).toUpperCase().split(" ")));
ArrayList list = new ArrayList(Arrays.asList(input.split(" ")));
//Remove any words that map to
for (int i=0;i")){
list.remove(i);
listUpperCase.remove(i);
}
}
}
//Replace words by their abbreviation
StringBuilder builder = new StringBuilder();
for (int i=0;i<list.size();i++){
StringBuilder tempBuilder = new StringBuilder();
String longestStringToShort="";
int count=0;
//Find abbreviations for
String and returns a shortened version of the description.The shortening is done by checking if a word matches the key of the
HashMap map and replaces the word by its abbreviation defined as the value in the HashMap. If the word cannot be found it in the HashMap it will not be replaced (it stays the same). If the value of a word is ` the word will be removed from the new description.
There can be multiple abbreviations for the same word.
Example:
AUTOMATIC GAIN CONTROL --> AGC
CONTROL --> CTRL
In any case the function should always take the abbreviation which takes the most words.
Consider this example mapping:
AUTOMATIC GAIN CONTROL --> AGC
CONTROL --> CTRL
IDENTIFIER --> ID
THE -->
The input String:
The Automatic Gain Control identifier value
will result in the output String:
AGC ID value
The String's in the HashMap are all uppercase and the abbreviation in the output String should also be uppercase. However, any word that was not found in the HashMap should stay the same as it was in the input String.
My function seems to do the task correctly but I'm not sure if it could have been written better and/or more efficiently.
``private static HashMap map = new HashMap();
public static String shortenText(String input){
ArrayList listUpperCase = new ArrayList(Arrays.asList(new String(input).toUpperCase().split(" ")));
ArrayList list = new ArrayList(Arrays.asList(input.split(" ")));
//Remove any words that map to
for (int i=0;i")){
list.remove(i);
listUpperCase.remove(i);
}
}
}
//Replace words by their abbreviation
StringBuilder builder = new StringBuilder();
for (int i=0;i<list.size();i++){
StringBuilder tempBuilder = new StringBuilder();
String longestStringToShort="";
int count=0;
//Find abbreviations for
Solution
Some observations (partly covered in @RoToRa's answer):
-
A placeholder
You may miss out that
In this case, it also removes the ambiguity (as unlikely as that may seem initially) that
-
Not particularly efficient uses for
-
Related to the previous point, your explicit concatentation of
-
Usage of
Alternative implementation (Java 8 features ahead)
You can then use a
Or in the form of an anonymous class:
This literally does what is required: if the lengths of both
Will give us the following desired iteration order:
With this
Unit-testing
In case this is not done yet, remember to unit test this method too! For example, using a combination of TestNG and Hamcrest matchers library to perform the required assertions:
```
@Test
public void testInput() {
String test = "There is the Automatic Automatic Gain Control Zookeeper Identifier";
assertThat(shorten(test + " " + test),
equalsTo("There is AUTO AGC ZK ID There is AUTO AGC ZK ID"));
// the middle whitespace is effectively trimmed away
assertThat(shorten("the the"), equalsTo(""));
assertThat(shorten("The Automatic Gai
-
A placeholder
String such as "" doesn't communicate at a glance whether a subsequence can be removed or not. Consider if you are scanning through your code:map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "");
map.put("Another Berry Can", "ABC");You may miss out that
"Another Bad Coral" needs to be removed, purely by looking at the length of the line. It's arguably easier to read as such:map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "");
map.put("Another Berry Can", "ABC");In this case, it also removes the ambiguity (as unlikely as that may seem initially) that
"Another Bad Coral" will be substituted with "".-
Not particularly efficient uses for
StringBuilder, especially when you builder.append(" " + aString) when you can do builder.apppend(' ').append(aString). Furthermore, you are also creating a StringBuilder for every tokenized word.-
Related to the previous point, your explicit concatentation of
" " characters forces you to repetitively do builder.deleteCharAt(0), which also leads to buggy derivation.-
Usage of
Map/HashMap can be enhanced so as to reduce the explicit looping that you are currently doing by tokenizing the input String. As you mentioned, you want to "takes the most words as the key in the HashMap", so if there's a way to actually iterate in that order from the Map...Alternative implementation (Java 8 features ahead)
You can then use a
TreeMap with a suitable Comparator implementation to persist and iterate with. For example:private static final Comparator KEY_COMPARATOR = (s1, s2) -> {
int diff = Integer.compare(s2.length(), s1.length());
return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
};Or in the form of an anonymous class:
private static final Comparator KEY_COMPARATOR = new Comparator() {
@Override
public int compare(String s1, String s2) {
int diff = Integer.compare(s2.length(), s1.length());
return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
}
};This literally does what is required: if the lengths of both
Strings are the same, we'll use a case-insensitive comparison between the both of them, else we simply rank the longer String to be greater than the other. This means that a Map such as:private static final Map MAPPER = getMapper();
private static Map getMapper() {
Map mapper = new TreeMap<>(KEY_COMPARATOR);
mapper.put("THE", "");
mapper.put("AUTOMATIC", "AUTO");
mapper.put("ZOOKEEPER", "ZK");
mapper.put("AUTOMATIC GAIN CONTROL", "AGC");
mapper.put("IDENTIFIER", "ID");
return mapper;
}Will give us the following desired iteration order:
// MAPPER.forEach((key, value) -> System.out.printf("%s => %s\n", key, value));
AUTOMATIC GAIN CONTROL => AGC
IDENTIFIER => ID
AUTOMATIC => AUTO
ZOOKEEPER => ZK
THE =>With this
Map implementation, the overall solution is half-complete. What is left is to replace any matching keys (as a subsequence) in the input String with the shortened value. For this, you can consider using regular expressions to perform the substitution on just one instance of StringBuilder:private static String shorten(String input) {
StringBuilder result = new StringBuilder(input);
MAPPER.forEach((key, value) -> {
Matcher matcher = Pattern.compile("(?i)\\b" + key + "\\b").matcher(result);
while (matcher.reset().find()) {
result.replace(matcher.start(), matcher.end(), value);
if (value.isEmpty() && result.length() != 0) {
result.deleteCharAt(Math.max(0, matcher.start() - 1));
}
}
});
return result.toString();
}- Use a case-insensitive (
(?i))Patterninstance to apply on theStringBuilderviaPattern.matcher(CharSequence). The\\bword boundary is important to make sure we only match on a complete expression demarcated by whitespaces.
- Loop through each match, taking care to
reset()theMatcherinstance first.
- Replace the contents of the
StringBuilderinstance using the resultsmatcher.start()andmatcher.end().
- For cases where we are removing text, we also need to remember to delete the previous character, or the first, so long as the
StringBuilderinstance is non-empty.
Unit-testing
In case this is not done yet, remember to unit test this method too! For example, using a combination of TestNG and Hamcrest matchers library to perform the required assertions:
```
@Test
public void testInput() {
String test = "There is the Automatic Automatic Gain Control Zookeeper Identifier";
assertThat(shorten(test + " " + test),
equalsTo("There is AUTO AGC ZK ID There is AUTO AGC ZK ID"));
// the middle whitespace is effectively trimmed away
assertThat(shorten("the the"), equalsTo(""));
assertThat(shorten("The Automatic Gai
Code Snippets
map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "<remove>");
map.put("Another Berry Can", "ABC");map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "");
map.put("Another Berry Can", "ABC");private static final Comparator<String> KEY_COMPARATOR = (s1, s2) -> {
int diff = Integer.compare(s2.length(), s1.length());
return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
};private static final Comparator<String> KEY_COMPARATOR = new Comparator<String>() {
@Override
public int compare(String s1, String s2) {
int diff = Integer.compare(s2.length(), s1.length());
return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
}
};private static final Map<String, String> MAPPER = getMapper();
private static Map<String, String> getMapper() {
Map<String, String> mapper = new TreeMap<>(KEY_COMPARATOR);
mapper.put("THE", "");
mapper.put("AUTOMATIC", "AUTO");
mapper.put("ZOOKEEPER", "ZK");
mapper.put("AUTOMATIC GAIN CONTROL", "AGC");
mapper.put("IDENTIFIER", "ID");
return mapper;
}Context
StackExchange Code Review Q#100565, answer score: 2
Revisions (0)
No revisions yet.