
Improving mongodb read throughput for tiny database

Submitted by: @import:stackexchange-dba

Problem

My understanding is that MongoDB should perform essentially as if it were reading records from memory when the working set is small. I wrote a simple MongoDB test program that inserts a single record, with an indexed primary key and one other field, into a collection and then uses findOne to read that field by its key.

With many threads I get a read throughput of only ~14K/s on my 2-core laptop. That beats, say, MySQL, but it still seems awfully low given that a Java HashMap gives me nearly ~2 million reads/s. Shouldn't I get performance comparable to a completely in-memory map? What else does MongoDB really have to do for a read-only workload on a tiny database? Do I need to change any of the MongoDB default settings?
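For comparison, an in-memory baseline like the HashMap figure above could be measured with a multithreaded harness along these lines. This is a minimal sketch, not the asker's code: `MapReadThroughput` and `measure` are hypothetical names, the thread count and duration are arbitrary, and it swaps the synchronized counter used in the listing below for a `LongAdder` fed by per-thread tallies, so that the counter itself is less likely to become the bottleneck.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class MapReadThroughput {

    /** Runs `threads` workers that read `key` until `millis` ms elapse; returns successful reads per second. */
    static double measure(Map<String, String> map, String key, int threads, long millis)
            throws InterruptedException {
        LongAdder reads = new LongAdder();   // low-contention counter shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                long local = 0;              // per-thread tally, no sharing inside the hot loop
                while (System.nanoTime() < deadline) {
                    if (map.get(key) != null) {
                        local++;
                    }
                }
                reads.add(local);            // publish once per thread
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 5_000, TimeUnit.MILLISECONDS);
        return reads.sum() * 1000.0 / millis;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> map = new ConcurrentHashMap<>();
        map.put("key", "some value");        // one "document", as in the question
        double rate = measure(map, "key", 8, 1_000);
        System.out.printf("in-memory map: %.0f reads/s%n", rate);
    }
}
```

The absolute number depends heavily on the machine and JIT, so it is only useful as an order-of-magnitude reference point.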

I have just one "document", which has a string primary key and a small string field, "some value".

Here is the test code I put together; it gives me 15-20K reads/s on a 2-core machine. You need the org.json and MongoDB Java driver jars to run it.

```java
import java.net.UnknownHostException;
import java.text.DecimalFormat;
import java.util.concurrent.ScheduledThreadPoolExecutor;

import org.json.JSONException;
import org.json.JSONObject;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.DuplicateKeyException;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;
import com.mongodb.util.JSON;

@SuppressWarnings("javadoc")
public class MongoSmallWorkingSetRead {
    private static ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(8);

    private static long initTime = System.currentTimeMillis();
    private static int count = 0;

    private static synchronized int incrCount() {
        return ++count;
    }

    private static synchronized int getCount() {
        return count;
    }

    private static synchronized void reset() {
        count = 0;
        initTime = System.currentTimeMillis();
    }

    private static void testReadRate(String dbName
```

Solution

Your code has a major mistake: it creates a new MongoClient in each runnable. A MongoClient manages a connection pool, so even large applications usually treat it as a singleton. Make it a global (static) variable, initialize it once in main, and reuse it in every runnable. This is perfectly fine, since MongoClient is thread-safe.
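A minimal sketch of that pattern follows. To keep it runnable without a server, it assumes a hypothetical `PooledClient` class standing in for `MongoClient` (and a hypothetical `findValue` standing in for `findOne`); `SharedClientSketch` and `runReads` are likewise illustrative names. The client is created once in `main`, stored in a static field, and every task reuses it. With the real driver you would construct the `MongoClient` at the same point and have each task query through that single instance.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class SharedClientSketch {

    /** Hypothetical stand-in for MongoClient: expensive to build, safe to share between threads. */
    static final class PooledClient {
        static final AtomicInteger INSTANCES = new AtomicInteger();
        private final Map<String, String> data = Map.of("key", "some value");

        PooledClient() {
            INSTANCES.incrementAndGet();   // a real client would open its connection pool here
        }

        String findValue(String key) {    // hypothetical stand-in for collection.findOne(...)
            return data.get(key);
        }
    }

    /** Initialized once in main and reused by every task; tasks never build their own client. */
    static PooledClient client;

    static long runReads(int threads, int readsPerThread) throws InterruptedException {
        AtomicLong hits = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < readsPerThread; i++) {
                    if ("some value".equals(client.findValue("key"))) {
                        hits.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        client = new PooledClient();      // create the shared client exactly once
        long hits = runReads(8, 10_000);
        System.out.println(hits + " reads, " + PooledClient.INSTANCES.get() + " client instance(s)");
    }
}
```

The `INSTANCES` counter exists only to show that the tasks never construct a client of their own; submitting tasks to the executor after the field is assigned guarantees the workers see the initialized client.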

Another thing to keep in mind: although the single document is certainly in the working set and therefore in RAM, your application still has to communicate with mongod. Your query is translated into MongoDB's wire protocol and sent to the server, where it is executed and the matching documents are identified (in this case only one, though the server cannot know that before executing the query). The result is then sent back to the client and translated from the wire protocol into Java objects. This is inevitably slower than a plain native map access inside the same JVM, which involves no network round trip and no match conditions.
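To make the cost of that round trip concrete, here is a toy request/response exchange over loopback TCP, timed against a direct map lookup. It is a hypothetical illustration, not MongoDB's wire protocol: `RoundTripSketch`, `startServer`, `remoteReads` and the line-based format are invented for the example, and it skips query parsing, BSON encoding and index matching, so it understates what mongod actually does per read. Timings vary by machine, and the JIT may partly optimize the local loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Map;

public class RoundTripSketch {

    /** Toy server: answers each key line with its value line (hypothetical format, NOT MongoDB's protocol). */
    static ServerSocket startServer(Map<String, String> data) throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String key;
                while ((key = in.readLine()) != null) {
                    out.println(data.getOrDefault(key, ""));   // decode, "execute", encode, send
                }
            } catch (IOException ignored) {
                // client disconnected or server socket closed
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    /** Performs n lookups, each a full request/response round trip; returns the last reply. */
    static String remoteReads(int port, String key, int n) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            s.setTcpNoDelay(true);
            String reply = null;
            for (int i = 0; i < n; i++) {
                out.println(key);        // encode and send the "query"
                reply = in.readLine();   // block until the "result" arrives, then decode it
            }
            return reply;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> data = Map.of("key", "some value");
        int n = 20_000;
        try (ServerSocket server = startServer(data)) {
            long t0 = System.nanoTime();
            remoteReads(server.getLocalPort(), "key", n);
            long t1 = System.nanoTime();
            int hits = 0;
            for (int i = 0; i < n; i++) {
                if (data.get("key") != null) hits++;   // same lookup, no round trip
            }
            long t2 = System.nanoTime();
            System.out.printf("remote: %.1f us/read, local: %.3f us/read (%d local hits)%n",
                    (t1 - t0) / 1e3 / n, (t2 - t1) / 1e3 / n, hits);
        }
    }
}
```

Even on loopback, each remote read pays for a write, a context switch to the server thread, and a blocking read of the reply, which is where most of the gap to a HashMap lookup comes from in this toy.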

Finally, let's do some maths:

which, even without removing the overhead of the unnecessary MongoClients, is pretty fast in my book.
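Using only the figures quoted in the question, and not necessarily the calculation the answer intended, aggregate throughput $R$ converts into an average interval between completed reads $t = 1/R$:

$$
\frac{1\,\text{s}}{14\,000} \approx 71\,\mu\text{s},\qquad
\frac{1\,\text{s}}{20\,000} = 50\,\mu\text{s},\qquad
\frac{1\,\text{s}}{2\,000\,000} = 0.5\,\mu\text{s},\qquad
\frac{2\,000\,000}{14\,000} \approx 143
$$

So a complete client-server round trip costs on the order of tens of microseconds, roughly two orders of magnitude more than an in-process map lookup. With several threads reading concurrently, each individual read takes longer than this average interval.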

Context

StackExchange Database Administrators Q#136177, answer score: 2
