HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

What is the modern way to partition PostgreSQL across machines, when the data is "naturally partitionable"

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlpartitionthewhatnaturallywaymachinesacrosspartitionabledata

Problem

After several years of dwelling into the "NoSQL" space, now I have a problem that is quite "relational" in its nature.
Today I see data stores with quite different eyes than before. Things like Riak have spoiled me in a way that I can no more tolerate single points of failure, "down for maintenance" etc. Of course, (or I hope), I havent' lost my sanity totally. This is a a personal project that doesn't quite (or yet) have extremely high requirements.

Most of the sharding solutions don't give me what I want (at least on a glimpse), probably because my problem is quite "easy" to solve. At least on conceptual level (ignoring the restraints that RDBMs themselves bring to the table).

-
I have a small amount of "shared" data, which can be duplicated freely. It doesn't have requirements of hard consistency. This can be stored in a dynamo-like database and will scale infinitely. But I still would like to go with a single database if possible.

-
I have lots of "per-user" data. That is - lots of users, with every user having data of absolutely reasonable size, really fit to be stored on a single PostgreSQL node. We are talking about 10s of thousands records at maximum.

-
I never need to query cross-user and I don't need cross-user atomicity.

This sounds extremely easy to achieve. At least when I'm looking at it with my "NoSQL eyes".

Here are my naive starter ideas:

-
At the very extreme, I could just serialize the whole user as a single Key/Value in Riak. Of course, constant de/serialization of several megabytes of data will be slow and that's why I'm considering using PostgreSQL. Lots of Riak K/Vs is a no-go, as I need atomicity/transactions within each user's data.

-
I could use an SQLite database per user, and use something like GlusterFS for the redundancy/availability. This is probably the solution I'm going to choose if I can't find something equally good using PostgreSQL. Pros: Can down/up scale really well; Cons: I'd prefer having PostgreSQL's types and

Solution

Postgres-XL is attempting to solve this as of 2014. They're aiming directly at big data on PostgreSQL, and they have developers from Stado onboard.

Context

StackExchange Database Administrators Q#30274, answer score: 5

Revisions (0)

No revisions yet.