patternjavaMinor
Function to get top k words in a given String
Viewed 0 times
topwordsfunctiongetstringgiven
Problem
This is a two-part job:
I've tried to use regex to split but it's taking 10x more time when string is really long. The top
```
public static List getTopStrings(String input, int k)
{
if(k list = new ArrayList();
if(input==null || input.isEmpty() || k==0)
{
return list;
}
Set separators = new HashSet();
separators.add(' ');
separators.add(',');
separators.add('.');
separators.add(';');
separators.add(':');
separators.add('"');
separators.add('(');
separators.add(')');
separators.add('-');
separators.add('/');
separators.add('\\');
separators.add('\'');
separators.add('?');
separators.add('\n');
separators.add('\r');
separators.add('!');
separators.add('|');
separators.add('~');
separators.add('\'');
separators.add('[');
separators.add(']');
separators.add('{');
separators.add('}');
separators.add('&');
separators.add('%');
separators.add('$');
separators.add('#');
separators.add('@');
separators.add('*');
separators.add('=');
separators.add('+');
separators.add('>');
separators.add(' wordMap = new HashMap();
int count, wordStart=0;
for(int i=0;i keySet = new ArrayList();
keySet.addAll(wordMap.keySet());
if(keySet.size() minHeap = new PriorityQueue(k, new Comparator() {
- to identify a word
- to find the top
kwords
I've tried to use regex to split but it's taking 10x more time when string is really long. The top
k strings are found using a priority queue. I've created a min priority queue for first k element, if the frequency of the next word is greater than the min element. I then remove min and insert the new word. ```
public static List getTopStrings(String input, int k)
{
if(k list = new ArrayList();
if(input==null || input.isEmpty() || k==0)
{
return list;
}
Set separators = new HashSet();
separators.add(' ');
separators.add(',');
separators.add('.');
separators.add(';');
separators.add(':');
separators.add('"');
separators.add('(');
separators.add(')');
separators.add('-');
separators.add('/');
separators.add('\\');
separators.add('\'');
separators.add('?');
separators.add('\n');
separators.add('\r');
separators.add('!');
separators.add('|');
separators.add('~');
separators.add('\'');
separators.add('[');
separators.add(']');
separators.add('{');
separators.add('}');
separators.add('&');
separators.add('%');
separators.add('$');
separators.add('#');
separators.add('@');
separators.add('*');
separators.add('=');
separators.add('+');
separators.add('>');
separators.add(' wordMap = new HashMap();
int count, wordStart=0;
for(int i=0;i keySet = new ArrayList();
keySet.addAll(wordMap.keySet());
if(keySet.size() minHeap = new PriorityQueue(k, new Comparator() {
Solution
Initialize Set with values
Your method of initializing the hashset with values takes quite a bit of space. I would create a static field:
And then use it like this:
Extract code to methods
Your
Simplify Sorting
Your sorting mechanism using a queue seems overly complicated. If you are using Java 8, it could be reduced to three lines (or one really long one):
Misc
Your method of initializing the hashset with values takes quite a bit of space. I would create a static field:
public static final String[] SEPARATORS = new String[] { " ", ",", ".", ";" };And then use it like this:
public static final Set separatorsSet = new HashSet(Arrays.asList(SEPARATORS));Extract code to methods
Your
getTopStrings method currently does three things: extract words from string, count amount of duplicate words, sort the result of that by frequency. I would at least create separate methods for the first two and the third functionality.Simplify Sorting
Your sorting mechanism using a queue seems overly complicated. If you are using Java 8, it could be reduced to three lines (or one really long one):
// sort by value, reversed
Stream> sorted = wordMap.entrySet().stream().sorted(Map.Entry.comparingByValue((x,y) -> (y collected = sorted.map(entry -> entry.getKey()).collect(Collectors.toCollection(ArrayList::new));
// top k:
List topK = collected.subList(0, k);Misc
- in Java, it is customary to place curly brackets on the same line as the opening statements.
- use curly brackets even for one line statements for improved readability and to avoid future bugs.
- use more spaces, eg around
==, `
- declare variables in as small a scope as possible to increase readability.
Code Snippets
public static final String[] SEPARATORS = new String[] { " ", ",", ".", ";" };public static final Set<String> separatorsSet = new HashSet<String>(Arrays.asList(SEPARATORS));// sort by value, reversed
Stream<Entry<String, Integer>> sorted = wordMap.entrySet().stream().sorted(Map.Entry.comparingByValue((x,y) -> (y < x) ? -1 : ((x == y) ? 0 : 1))); // alternatively: use reversed and cast
// flatten map to list:
List<String> collected = sorted.map(entry -> entry.getKey()).collect(Collectors.toCollection(ArrayList::new));
// top k:
List<String> topK = collected.subList(0, k);Context
StackExchange Code Review Q#82264, answer score: 7
Revisions (0)
No revisions yet.