patternMinor
Sharded key-value store using MongoDB
Viewed 0 times
mongodbvaluestoreshardedusingkey
Problem
Would like to set up a key-value store that is sharded across multiple machines.
We are currently using MongoDB, is there a reason why we shouldn't use MongoDB for this purpose?
We also use Redis, however for this use case, we would like to use the hard drive and Redis is in-RAM only.
We are currently using MongoDB, is there a reason why we shouldn't use MongoDB for this purpose?
We also use Redis, however for this use case, we would like to use the hard drive and Redis is in-RAM only.
Solution
There are a number of reasons not to use MongoDB as a pure key-value store, and there are some reasons to consider it. Mongo is optimized as a document store - it indexes all the fields in a document, and has rich primitives for JSON objects and hierarchies. You can use it as a key-value store, but the single-threaded nature means you won't be getting good performance out of your hardware. Storing simple blobs removes a number of the benefits of Mongo. Mongo has algorithms where it splits data chunks as you insert, which can create lag. Monogo's system for re-partitioning is cumbersome, as well. The benefit of a key-value system is it should be really simple and really fast, so you can scale up and keep server and management costs down.
Other systems are more tuned for key-value use. You mention Redis, one of the best key-value stores, but the repartitioning/clustering in Redis is still alpha-level, and there is a requirement of DRAM. Some people build their own shard layers and partitioning layers on Redis - this is very common among some of the larger Chinese social networks.
Cassandra is sometimes used as a key-value store. This isn't the best use of Cassandra, as Cassandra's "super column families" provide rich indexing. Cassandra isn't as fast as databases written in C like Redis and Mongo, but does have strong clustering capabilities.
One store you should strongly consider in this area is Aerospike. Aerospike has very flexible cluster management - adding a single node by just bringing it up - as well as support for both DRAM and SSD/Flash - and easy replication for HA. It's in use at very high levels of scale by advertising platform companies who need huge key value stores. Aerospike has a free version that supports node sizes to 200G.
CoucheBase (was MemBase) is another system to look at for key-value use. It provides some clustering primitives, and is focused more around in-memory use.
Other systems are more tuned for key-value use. You mention Redis, one of the best key-value stores, but the repartitioning/clustering in Redis is still alpha-level, and there is a requirement of DRAM. Some people build their own shard layers and partitioning layers on Redis - this is very common among some of the larger Chinese social networks.
Cassandra is sometimes used as a key-value store. This isn't the best use of Cassandra, as Cassandra's "super column families" provide rich indexing. Cassandra isn't as fast as databases written in C like Redis and Mongo, but does have strong clustering capabilities.
One store you should strongly consider in this area is Aerospike. Aerospike has very flexible cluster management - adding a single node by just bringing it up - as well as support for both DRAM and SSD/Flash - and easy replication for HA. It's in use at very high levels of scale by advertising platform companies who need huge key value stores. Aerospike has a free version that supports node sizes to 200G.
CoucheBase (was MemBase) is another system to look at for key-value use. It provides some clustering primitives, and is focused more around in-memory use.
Context
StackExchange Database Administrators Q#41043, answer score: 8
Revisions (0)
No revisions yet.