patternjavaModerate
Printing the most used words from phrases
Viewed 0 times
theusedwordsprintingfromphrasesmost
Problem
I have a collection of phrases in a
I create a
Then what I do is print the top
Here is my code:
List. Each phrase is a String Array where each element in the array is a word.I create a
List> which holds the words as keys and the times used as value. Everything is sorted by value in descending order. Then what I do is print the top
X words that were used along with how many times they were used. What I want to know is if there is a better/simpler way of doing it and generally anything you want to add to make my code look or perform better. Here is my code:
public class WordCounting {
public static void printTopWords(final int numberOfWords, List phrases) {
List> wordsMap = entriesSortedByValues(wordCount(phrases));
Iterator entries = wordsMap.iterator();
int wordsCounter = 1;
while (entries.hasNext() && wordsCounter wordCount(List phrases) {
Map wordCounter = new TreeMap<>();
for (String[] strings : phrases) {
for (String string : strings) {
wordCounter.put(string, wordCounter.get(string) == null
? 1 : wordCounter.get(string) + 1);
}
}
return wordCounter;
}
static >
List> entriesSortedByValues(Map map) {
List> sortedEntries = new ArrayList<>(map.entrySet());
Collections.sort(sortedEntries,
new Comparator>() {
@Override
public int compare(Entry e1, Entry e2) {
return e2.getValue().compareTo(e1.getValue());
}
}
);
return sortedEntries;
}
}Solution
As others have pointed out the most obvious improvements already, I would like to talk about Java 8. Maybe you cannot use it yet, but I would recommend looking into it and this answer should proivde useful for anyone reading this.
My main focus point here will be designing the code such that it will logically do what it is supposed to do.
You have a method in which you take a
Issue 4 and 5 are seperated explicitely here, because that is what you ought to do. Every method should serve one purpose, and doing something and printing it is not one purpose.
Another improvement is that as input I want a structure consisting of
The code:
The output:
1: a - 3
2: b - 2
The explanation:
A few points that are worth to note:
My main focus point here will be designing the code such that it will logically do what it is supposed to do.
You have a method in which you take a
List as argument, and you want to return the top x occurences, that is all you want, in order that means:- Put all
Strings together in some structure.
- Keep track of how many occurences they have.
- Sort the list to have the words with the highest amount of occurences first.
- Return the top x occurences.
- Then decide what you want to do with it.
Issue 4 and 5 are seperated explicitely here, because that is what you ought to do. Every method should serve one purpose, and doing something and printing it is not one purpose.
Another improvement is that as input I want a structure consisting of
Strings, and not your (ugly) List, I will also deal with that. The code will be fully explained below. While we are at it, I'll also improve the error checking.The code:
public static Stream> getTopWords(final int topX, final Stream words) {
if (topX > comparator = Comparator.comparingLong(Map.Entry::getValue);
return words.collect(Collectors.groupingBy(i -> i, Collectors.counting()))
.entrySet().stream()
.sorted(comparator.reversed())
.limit(topX);
}List phrases = Arrays.asList(new String[]{"a", "b", "c"}, new String[]{"a", "a", "b", "d"});
List> topEntries = getTopWords(2, phrases.stream().flatMap(Arrays::stream))
.collect(Collectors.toList());
int counter = 1;
for (Map.Entry entry : topEntries) {
System.out.println(counter + ": " + entry.getKey() + " - " + entry.getValue());
counter++;
}The output:
1: a - 3
2: b - 2
The explanation:
- You have your
Listphrases first, you want to simply have an object that holds yourStrings. Here aStreamis a suitable object, because you only need to have a view on yourphrasesobject, there is no point in actually storing the new data. You do this by callingphrases.stream().flatMap(Arrays::stream).
- This will first turn your
Listin aStream.
- Then you use a method reference that describes the lambda
stringArray -> Arrays.stream(stringArray)to acquire aStream.
- Then with the
flatMapyou add all elements of the resultingStreamback into the originalStream.
- Then you start at the
getTopWordsmethod, which returns aStream>, again to offer the flexibility to do what you want with the results, they are not stored yet at the point where it gets returned.
- First I added some error checking.
- Then I obtain a
Comparator>that will compare what the entry is with the lowest number of occurences. This is done by using aComparator.comparingLongon the value of the entry, which is obtained with the method referenceMap.Entry::getValue.
- Then I start the chain of operations on the input
Stream:
- First group the results by their identity, which normally produces a
Map>.
- The trick here is that I also used a downstream
Collector, which counts the number of times the string occurs, hence it is calledCollectors.counting().
- At this point I have a
Mapdenoting the word and the number of occurences. It uses aLong, because this is whatCollectors.counting()returns.
- Then I obtain a
Set>and convert it into a stream.
- Then I call
sorted()on theStream>with the reversed comparator. This is done here, because type interference is not strong enough to useComparator.comparingLong(Map.Entry::getValue).reversed().
- Then I
limit()the stream by the top x elements.
- Now we have the
Stream>, and here we decide to collect it into aList>.
- Here we continue with your old logic of having a counter attached to it.
A few points that are worth to note:
- The explicit
comparator.reversed()is ugly, but neccessary to not result in type casting, which is even more ugly, it is a limitation of the current type interference. It might be only an issue in IDE's and the javac compiler might actually compile it though.
- The usage of
Map.Entryis pretty bloated, but our most reasonable option, besides creating aPairclass ourselves and usingPair. This will hopefully be easier if Java 9 includes tuples (which logically include pairs) as more or less first-class citizens.
- In the whole method
getTopWordswe end up storing all the entries in memory once, with theMap, I am pretty sure there are ways around that, but not worth the effort here, only optimize this if it becomes a real bottleneck.
- I was hoping to use the
Stream.forEach()method when processing the results, however this is not possible with the requirement that you want to have a counter. Again, more possibilities open up in Java 9 when we hopefully have `BiStre
Code Snippets
public static Stream<Map.Entry<String, Long>> getTopWords(final int topX, final Stream<String> words) {
if (topX < 1) {
throw new IllegalArgumentException("invalid value for topX: " + topX);
}
Objects.requireNonNull(words);
Comparator<Map.Entry<String, Long>> comparator = Comparator.comparingLong(Map.Entry::getValue);
return words.collect(Collectors.groupingBy(i -> i, Collectors.counting()))
.entrySet().stream()
.sorted(comparator.reversed())
.limit(topX);
}List<String[]> phrases = Arrays.asList(new String[]{"a", "b", "c"}, new String[]{"a", "a", "b", "d"});
List<Map.Entry<String, Long>> topEntries = getTopWords(2, phrases.stream().flatMap(Arrays::stream))
.collect(Collectors.toList());
int counter = 1;
for (Map.Entry<String, Long> entry : topEntries) {
System.out.println(counter + ": " + entry.getKey() + " - " + entry.getValue());
counter++;
}Context
StackExchange Code Review Q#47884, answer score: 12
Revisions (0)
No revisions yet.