Creating a fast Android dictionary (word counts)

Problem

This is a follow-up to an earlier question of mine.

I am currently working on an application that computes various statistics. One task is to analyse a large number of sentences for their word counts.

The specifications are:

  • sentences are read from an SQLiteDatabase (up to 20k, with an average of about 15 words each)
  • transformation: split on whitespace (to get the words of each sentence)
  • transformation: toLowerCase (to minimize variations of words)
  • transformation: remove everything matching [^a-zA-Z] (for the same reason as above)
  • get word + count for the x most common words (not sure yet, maybe 10-15)
  • preserve a flag indicating whether the message was sent or received
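
The specification above can be sketched without Android or Guava, using only `java.util` collections. This is an illustration under stated assumptions, not the poster's implementation: the class `WordCountSketch` and its methods `words`, `count` and `top` are hypothetical names, Java 8 APIs such as `Map.merge` are assumed to be available, and one map per direction stands in for the sent/received flag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class WordCountSketch {

    /** Lower-cases the sentence, splits on whitespace and strips non-letters from each token. */
    static List<String> words(String sentence) {
        List<String> result = new ArrayList<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = token.replaceAll("[^a-z]", "");
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    /** Adds every word of the sentence to the given tally. */
    static void count(String sentence, Map<String, Integer> tally) {
        for (String word : words(sentence)) {
            tally.merge(word, 1, Integer::sum);
        }
    }

    /** Returns the x most frequent entries, highest count first, ties in alphabetical order. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> tally, int x) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(tally.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(x, entries.size())));
    }

    public static void main(String[] args) {
        // One tally per direction keeps the sent/received flag.
        Map<String, Integer> sent = new HashMap<>();
        Map<String, Integer> rcvd = new HashMap<>();
        count("Hello, hello WORLD!", sent);
        count("Hi there, world", rcvd);
        System.out.println(top(sent, 2));
        System.out.println(top(rcvd, 2));
    }
}
```

Sorting all entries is the simplest way to pick the top x. With at most a few thousand distinct words that is probably cheap compared with reading the rows, but that is a guess and not something measured here.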



What I'm looking for:

  • improvements to make the code run faster
  • alternative approaches for this task
  • (general hints to improve the task)



Current version with the suggested improvements made

```
import android.database.Cursor;
import android.database.sqlite.SQLiteStatement;
import com.google.common.base.CharMatcher;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import java.util.Iterator;
import java.util.regex.Pattern;

// fields

// Matches ASCII letters only; everything else is dropped from a token.
private static final CharMatcher pat_rep =
        CharMatcher.inRange('A', 'Z').or(CharMatcher.inRange('a', 'z')).precomputed();
private static final Pattern pat_split = Pattern.compile("\\s");
private HashMultiset<String> sent = HashMultiset.create();
private HashMultiset<String> rcvd = HashMultiset.create();
private Cursor c1;
private Cursor c2;

// start

c1 = db.rawQuery("select lower(DATA) as SENTENCE, SENT from MESSAGELIST", null);
while (c1.moveToNext()) {
    String[] words = pat_split.split(c1.getString(c1.getColumnIndex("SENTENCE")));
    // SENT flag: 1 = sent by me, anything else = received
    int from_me = c1.getInt(c1.getColumnIndex("SENT"));
    for (String in : words) {
        in = pat_rep.retainFrom(in);
        if (!in.equals("")) {
            if (from_me == 1) {
                sent.add(in);
            } else {
                rcvd.add(in);
            }
        }
    }
}

db.execSQL("create temp table if not exists WORDS (WORD varchar, SENT integer, CNT integer)");
SQLiteStatement ins = db.compileStatement("insert into WORDS values (?, ?, ?)");
db.beginTransaction();

Iterator<Multiset.Entry<String>> i = sent.entrySet().
// ... (the posted snippet ends here)
```

Solution

Just some ideas.

Maybe you could use batching?

You may also be able to save some time by iterating over sent.entrySet() instead of looking up each count separately.
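
To illustrate the difference, here is a minimal sketch that uses a plain `java.util.Map` as a stand-in for the Guava multiset. The class `EntrySetSketch` and the method `describe` are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntrySetSketch {

    /** Formats every word with its count in a single pass over entrySet(). */
    static String describe(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        // Each entry already carries key and value, so no additional
        // counts.get(word) lookup (and no second hash computation) is needed.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hello", 2);
        counts.put("world", 1);
        System.out.print(describe(counts));
    }
}
```

With Guava the same idea would presumably be a loop over `sent.entrySet()` that reads `getElement()` and `getCount()` from each `Multiset.Entry<String>` and binds both values to the insert statement.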

Split on [^a-zA-Z], as you later throw the non-letters away anyway.
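
A sketch of that single-pass split, using only `java.util.regex`. The class `SplitSketch` and the constant `NON_LETTERS` are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class SplitSketch {
    // Any run of characters outside a-z / A-Z acts as a separator.
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    /** Splits a sentence into lower-case words consisting of ASCII letters only. */
    static List<String> words(String sentence) {
        List<String> result = new ArrayList<>();
        for (String word : NON_LETTERS.split(sentence.toLowerCase(Locale.ROOT))) {
            if (!word.isEmpty()) { // a leading separator yields one empty token
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! Don't stop"));
    }
}
```

Note that this is not exactly equivalent to splitting on whitespace and then stripping non-letters: "don't" becomes the two words "don" and "t" here, whereas the two-step approach yields "dont". Whether that matters depends on the statistics you want.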

Can't you use Java 5 for-each loops such as the following?

for (String in : sent) {...}

I guess clearBindings is unnecessary, since you always overwrite every binding.

Make all fields private. Always (unless you have a very good reason not to). At least I hope that pat_rep etc. are fields.

Split your method. Shorter methods are easier to read and to optimize.

Context

StackExchange Code Review Q#60579, answer score: 3
