Efficient top k words using MultiMap (Google Guava ext lib JAR)
Problem
Basic algorithm:
- Open the document/huge text file.
- Use a `Map` and a `List` to separate the words. \$O(K)\$ time complexity (TC), considering \$K\$ distinct words.
- If the word already exists in the list, increment its frequency count, else add the new word. Constant TC.
- Use a `Collection` to sort the list on the basis of frequency of usage. \$O(n \log n)\$ TC.
- Print the list, `Entry.getKey()` (the max-count word), and its `Entry.getValue()` (the max count).

Now, I tried to use a `Multimap` before sorting so that I can group words with the same frequency count. This drastically decreases the sorting complexity, as the number of elements to compare decreases. Is it efficient enough, or are there other ways to make it more efficient?
```
//Below code works fine (Please include lib files as per your IDE setting)
package Topk;
import java.io.*;
import java.util.*;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
class Words
{
int count=1;
String word;
Words(int count, String word)
{
this.count = count;
this.word = word;
}
}
public class TopK
{
public static void main(String argv[]) throws IOException
{
int Tcount = 0;
Map map = new TreeMap();
BufferedReader br = new BufferedReader(new FileReader("text.txt"));
String line;
while((line = br.readLine())!=null)
{
String[] words = line.split("\\W");
for(String word:words)
{
word = word.toLowerCase();
Tcount++;
if(word.equals(""))
continue;
insert(word,map);
}
}
Multimap mm = ArrayListMultimap.create();
Iterator itr = map.keySet().iterator();
while(itr.hasNext())
{
String key = (String) itr.next();
int tempi = (Integer) map.get(key);
mm.put(tempi, key);
}
// group words sharing the same count under a single key
HashMap fmap = new HashMap();
Iterator itr2 = mm.keySet().iterator();
while(itr2.hasNext())
{
Integer count = (Integer) itr2.next();
fmap.put(mm.get(count).toString(), count);
}
// sort the (words, count) entries by count, then display
List flist = new ArrayList(fmap.entrySet());
Collections.sort(flist, new Comparator() {
public int compare(Object a, Object b) {
return (Integer) ((Map.Entry) a).getValue() - (Integer) ((Map.Entry) b).getValue();
}
});
Map.Entry max = (Map.Entry) flist.get(flist.size() - 1);
System.out.println(flist);
System.out.println(max.getKey() + " => " + max.getValue());
System.out.println("Total words: " + Tcount);
}

private static void insert(String word, Map map)
{
Integer count = (Integer) map.get(word);
map.put(word, count == null ? 1 : count + 1);
}
}
```
Solution
Java/Java 8 tips
- `Words` is an unused class, and even if it were in use, class naming convention recommends the singular form rather than the plural. This is because you will have a `Word` instance or many `Word` instances, but "a `Words` instance" doesn't sound right. The only pluralized class names that I can recall off-hand are for utility classes, i.e. those that only provide a bunch of static methods.
- Since Java 7, the recommended approach for reading from a `Reader` source is *try-with-resources*, e.g.

```
try (BufferedReader reader = new BufferedReader(new FileReader("text.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // ...
    }
}
```

In Java 8, there is an even more convenient way of specifying how you want to read and process lines from a text file, which I will explain below...
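For instance, here is a small self-contained sketch of the same word-counting loop using try-with-resources (the `StringReader` input and the `countWords` helper name are made up for the example, not taken from the question's code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class ReadDemo {
    // Counts lowercase word occurrences from a reader;
    // try-with-resources closes the reader automatically, even on exceptions.
    static Map<String, Integer> countWords(BufferedReader source) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = source) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("\\W")) {
                    if (!word.isEmpty()) {
                        counts.put(word.toLowerCase(), counts.getOrDefault(word.toLowerCase(), 0) + 1);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("Be, or not to be"));
        System.out.println(countWords(reader)); // {be=2, not=1, or=1, to=1}
    }
}
```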
Your Guava-based approach
1. Read each `line` and split it into `words`.
2. Store a running count in the variable `Tcount`.
3. For each non-empty `word` in `words`, insert the lowercase form into a `map`.
4. Insertion is based on checking whether `word` exists in the `map`. If it does, increment the counter, else add `word => 1` to the `map`.
5. Create a `Multimap`, `mm`.
6. Loop through the keys of `map` to add the count-per-word pairing to `mm`.
7. Create another `HashMap`, `fmap`.
8. Loop through `mm` to add the `toString()` representation of each value as the key for `fmap`, with the key being the value for `fmap`.
9. Sort `fmap` into a `List<Entry<String, Integer>>` data structure. This is done by wrapping the `entrySet()` in a `List` and applying `Collections.sort()` on the new `List`.
10. Display the result.
- `Tcount` is better renamed as `totalCount` or `wordCount`, following the camelCase naming convention for variables and avoiding unnecessarily shortened variable names (unless they are really long).
- Your implementation for step 3, i.e. the `insert()` method, has a simpler implementation in Java 8 using `Map.merge()`:

```
map.merge(word, 1, (a, b) -> a + b);
```

What this means is to either add `word => 1` to the map if the word does not exist, or use the `BiFunction` lambda declaration to add the existing and new (i.e. `1`) values.
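A quick self-contained sketch of this `merge()` counting idiom (the sample words are made up):

```java
import java.util.Map;
import java.util.TreeMap;

public class MergeDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        // merge() inserts 1 for an unseen word, or applies the lambda
        // to add 1 to the existing count.
        for (String word : new String[] {"to", "be", "or", "not", "to", "be"}) {
            counts.merge(word, 1, (a, b) -> a + b);
        }
        System.out.println(counts); // {be=2, not=1, or=1, to=2}
    }
}
```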
- Steps 4 to 7 are... quite lengthy, and you can rely on some of Guava's built-in methods to perform the same operations:

```
Multimap<Integer, String> inverseMap = Multimaps.invertFrom(Multimaps.forMap(map),
        ArrayListMultimap.create());
```

  - You can use `Multimaps.forMap(Map)` to give you a `Multimap` wrapper over the original `Map` so that it can be used as an argument for...
  - `Multimaps.invertFrom`, which inverts the `key => value` mapping of a `Multimap` into a resulting `Multimap` instance, which you can declare as `ArrayListMultimap.create()`.
- Step 8 is quite an odd step, partly because I feel you have deconstructed a `Map` semantic into a sorted `List` of `Entry` instances. This can definitely be implemented in the form of a `TreeMap`, which allows you to sort on the keys while giving you the additional flexibility of retrieving with `Map`-friendly methods. After sorting the keys, you can simply use the largest key to `get()` from your `Multimap` as your result, instead of having to deal with a `List` data structure.
- Putting it together, steps 4 to 8 can probably be done as such:

```
private static <K, V extends Comparable<V>> Collection<K> getMaximumByValue(
        Map<K, V> map) {
    Multimap<V, K> inverseMap = Multimaps.invertFrom(Multimaps.forMap(map),
            ArrayListMultimap.create());
    return inverseMap.get(Ordering.natural().immutableSortedCopy(inverseMap.keys())
            .reverse().get(0));
}
```

A non-Guava-based approach
With Java 8, one can also learn to apply the new `Stream`-based processing techniques that can simplify most of the explicit looping.

For example, to read a file in as a `Stream` of lines using `Files.lines(Path)`:

```
try (Stream<String> lines = Files.lines(Paths.get(path))) {
    // perform further operations on lines, e.g. lines.collect()...
}
```

Processing each line into your `map`, i.e. step 3 above, can be done by:

- `flatMap()`-ing every line with a replacement `Stream` of words. The method reference to use here will be `Pattern::splitAsStream`.
- `filter()`-ing for non-empty `String`s, before performing a
- `collect()`, `groupingBy()`, and then `counting()` on the word occurrences:

```
private static final Pattern LINE_SPLITTER = Pattern.compile("\\W");

private static Map<String, Long> process(Stream<String> lines) {
    return lines.flatMap(LINE_SPLITTER::splitAsStream).filter(s -> !s.isEmpty())
            .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
}
```

The equivalent of using a Guava `Multimap` so that the values can be sorted is to apply another form of `groupingBy()` on the resulting `Map` from above:

```
private static <K, V extends Comparable<V>> Collection<K> getMaximumByValue(Map<K, V> map) {
    Map<V, List<K>> inverseMap = map.entrySet().stream()
            .collect(Collectors.groupingBy(Map.Entry::getValue,
                    Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    return inverseMap.get(Collections.max(inverseMap.keySet()));
}
```
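To see the stream-based pieces end to end, here is a self-contained sketch (the sample lines and the `TopWordsDemo` class name are made up; `process` and `getMaximumByValue` follow the snippets above):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TopWordsDemo {
    private static final Pattern LINE_SPLITTER = Pattern.compile("\\W");

    // Count lowercase word occurrences across all lines.
    static Map<String, Long> process(Stream<String> lines) {
        return lines.flatMap(LINE_SPLITTER::splitAsStream)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));
    }

    // Group keys by value, then return the keys sharing the maximum value.
    static <K, V extends Comparable<V>> Collection<K> getMaximumByValue(Map<K, V> map) {
        Map<V, List<K>> inverse = map.entrySet().stream()
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
        return inverse.get(Collections.max(inverse.keySet()));
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of("the quick brown fox", "jumps over the lazy dog");
        Map<String, Long> counts = process(lines);
        System.out.println(getMaximumByValue(counts)); // [the]
    }
}
```

Reading from a real file is just a matter of swapping `Stream.of(...)` for `Files.lines(Paths.get(path))` inside a try-with-resources block.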
Context
StackExchange Code Review Q#101695, answer score: 2