HiveBrain v1.2.0
Get Started
← Back to all entries
patternjavaMinor

Function that shortens a String based on a term/abbreviation mapping

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
functiontermabbreviationthatbasedmappingstringshortens

Problem

I have a function that takes a description as a String and returns a shortened version of the description.

The shortening is done by checking if a word matches the key of the HashMap map and replaces the word by its abbreviation defined as the value in the HashMap. If the word cannot be found it in the HashMap it will not be replaced (it stays the same). If the value of a word is ` the word will be removed from the new description.

There can be multiple abbreviations for the same word.

Example:


AUTOMATIC GAIN CONTROL --> AGC


CONTROL --> CTRL

In any case the function should always take the abbreviation which takes the most words.

Consider this example mapping:


AUTOMATIC GAIN CONTROL --> AGC


CONTROL --> CTRL


IDENTIFIER --> ID


THE -->


The input
String:


The Automatic Gain Control identifier value

will result in the output
String:


AGC ID value

The
String's in the HashMap are all uppercase and the abbreviation in the output String should also be uppercase. However, any word that was not found in the HashMap should stay the same as it was in the input String.

My function seems to do the task correctly but I'm not sure if it could have been written better and/or more efficiently.

``
private static HashMap map = new HashMap();

public static String shortenText(String input){

ArrayList listUpperCase = new ArrayList(Arrays.asList(new String(input).toUpperCase().split(" ")));
ArrayList list = new ArrayList(Arrays.asList(input.split(" ")));

//Remove any words that map to
for (int i=0;i")){
list.remove(i);
listUpperCase.remove(i);
}
}
}
//Replace words by their abbreviation
StringBuilder builder = new StringBuilder();
for (int i=0;i<list.size();i++){
StringBuilder tempBuilder = new StringBuilder();
String longestStringToShort="";
int count=0;
//Find abbreviations for

Solution

Some observations (partly covered in @RoToRa's answer):

-
A placeholder String such as "" doesn't communicate at a glance whether a subsequence can be removed or not. Consider if you are scanning through your code:

map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "");
map.put("Another Berry Can", "ABC");


You may miss out that "Another Bad Coral" needs to be removed, purely by looking at the length of the line. It's arguably easier to read as such:

map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "");
map.put("Another Berry Can", "ABC");


In this case, it also removes the ambiguity (as unlikely as that may seem initially) that "Another Bad Coral" will be substituted with "".

-
Not particularly efficient uses for StringBuilder, especially when you builder.append(" " + aString) when you can do builder.apppend(' ').append(aString). Furthermore, you are also creating a StringBuilder for every tokenized word.

-
Related to the previous point, your explicit concatentation of " " characters forces you to repetitively do builder.deleteCharAt(0), which also leads to buggy derivation.

-
Usage of Map/HashMap can be enhanced so as to reduce the explicit looping that you are currently doing by tokenizing the input String. As you mentioned, you want to "takes the most words as the key in the HashMap", so if there's a way to actually iterate in that order from the Map...

Alternative implementation (Java 8 features ahead)

You can then use a TreeMap with a suitable Comparator implementation to persist and iterate with. For example:

private static final Comparator KEY_COMPARATOR = (s1, s2) -> {
    int diff = Integer.compare(s2.length(), s1.length());
    return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
};


Or in the form of an anonymous class:

private static final Comparator KEY_COMPARATOR = new Comparator() {

    @Override
    public int compare(String s1, String s2) {
        int diff = Integer.compare(s2.length(), s1.length());
        return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
    }
};


This literally does what is required: if the lengths of both Strings are the same, we'll use a case-insensitive comparison between the both of them, else we simply rank the longer String to be greater than the other. This means that a Map such as:

private static final Map MAPPER = getMapper();

private static Map getMapper() {
    Map mapper = new TreeMap<>(KEY_COMPARATOR);
    mapper.put("THE", "");
    mapper.put("AUTOMATIC", "AUTO");
    mapper.put("ZOOKEEPER", "ZK");
    mapper.put("AUTOMATIC GAIN CONTROL", "AGC");
    mapper.put("IDENTIFIER", "ID");
    return mapper;
}


Will give us the following desired iteration order:

// MAPPER.forEach((key, value) -> System.out.printf("%s => %s\n", key, value));
AUTOMATIC GAIN CONTROL => AGC
IDENTIFIER => ID
AUTOMATIC => AUTO
ZOOKEEPER => ZK
THE =>


With this Map implementation, the overall solution is half-complete. What is left is to replace any matching keys (as a subsequence) in the input String with the shortened value. For this, you can consider using regular expressions to perform the substitution on just one instance of StringBuilder:

private static String shorten(String input) {
    StringBuilder result = new StringBuilder(input);
    MAPPER.forEach((key, value) -> {
        Matcher matcher = Pattern.compile("(?i)\\b" + key + "\\b").matcher(result);
        while (matcher.reset().find()) {
            result.replace(matcher.start(), matcher.end(), value);
            if (value.isEmpty() && result.length() != 0) {
                result.deleteCharAt(Math.max(0, matcher.start() - 1));
            }
        }
    });
    return result.toString();
}


  • Use a case-insensitive ((?i)) Pattern instance to apply on the StringBuilder via Pattern.matcher(CharSequence). The \\b word boundary is important to make sure we only match on a complete expression demarcated by whitespaces.



  • Loop through each match, taking care to reset() the Matcher instance first.



  • Replace the contents of the StringBuilder instance using the results matcher.start() and matcher.end().



  • For cases where we are removing text, we also need to remember to delete the previous character, or the first, so long as the StringBuilder instance is non-empty.



Unit-testing

In case this is not done yet, remember to unit test this method too! For example, using a combination of TestNG and Hamcrest matchers library to perform the required assertions:

```
@Test
public void testInput() {
String test = "There is the Automatic Automatic Gain Control Zookeeper Identifier";
assertThat(shorten(test + " " + test),
equalsTo("There is AUTO AGC ZK ID There is AUTO AGC ZK ID"));
// the middle whitespace is effectively trimmed away
assertThat(shorten("the the"), equalsTo(""));
assertThat(shorten("The Automatic Gai

Code Snippets

map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "<remove>");
map.put("Another Berry Can", "ABC");
map.put("Another Bored Cat", "ABC");
map.put("Another Bad Coral", "");
map.put("Another Berry Can", "ABC");
private static final Comparator<String> KEY_COMPARATOR = (s1, s2) -> {
    int diff = Integer.compare(s2.length(), s1.length());
    return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
};
private static final Comparator<String> KEY_COMPARATOR = new Comparator<String>() {

    @Override
    public int compare(String s1, String s2) {
        int diff = Integer.compare(s2.length(), s1.length());
        return diff == 0 ? s1.compareToIgnoreCase(s2) : diff;
    }
};
private static final Map<String, String> MAPPER = getMapper();

private static Map<String, String> getMapper() {
    Map<String, String> mapper = new TreeMap<>(KEY_COMPARATOR);
    mapper.put("THE", "");
    mapper.put("AUTOMATIC", "AUTO");
    mapper.put("ZOOKEEPER", "ZK");
    mapper.put("AUTOMATIC GAIN CONTROL", "AGC");
    mapper.put("IDENTIFIER", "ID");
    return mapper;
}

Context

StackExchange Code Review Q#100565, answer score: 2

Revisions (0)

No revisions yet.