snippetsqlMinor
PostgreSQL 11: how to partition 1000 tenants
Viewed 0 times
postgresqlpartitiontenants1000how
Problem
Setup
We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.
Problem
1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?
More details
Currently, the separation between our tenants is via an
Here are a few facts about the event table:
Possible solutions
Partionining by
Pros:
Cons:
Partitioning by timestamp:
Pros:
Cons:
We have a multi-tenant app that has about 1000 customers more or less. When a customer churns we delete all their data after a period. We have a few tables that are pretty big and we're considering using partitioning to split them per customer.
Problem
1000 tenants (customers) are a lot of partitions - is it reasonable to do this on PostgreSQL?
More details
Currently, the separation between our tenants is via an
account_id column on all tables in the DB. There are a few tables that are pretty big. For example, there is an event table (the one I'm interested in partitioning) that contains audit logs and other events for everything that happens in our app.Here are a few facts about the event table:
- It contains about 300M rows + a few of composite/partial indexes.
- The count of events by
account_idis very uneven, 5% accounts have 50% of the data.
- There is a timestamp field and a few others (JSONB,
author_id, etc..)
- Write operations: inserts mostly and deletes (per
account_id). Deletes can potentially be millions of rows. No updates. Deletes of big accounts are rare and not a big performance issue at the moment.
- Selects are either for a specific event (by
account_id+id) or all events in a given period of time. Period of time is not always set.account_idis always present in the query.
Possible solutions
Partionining by
account_id:Pros:
- Deletes will be fast because of
DROP TABLE.
- Queries should also be pretty decent since all queries contains
WHERE account_id = 123
Cons:
- Are 1000 partitions too much for Postgres?
- Uneven distribution of events per account creating a few super big partitions and a few small ones.
Partitioning by timestamp:
Pros:
- Recent data is usually mostly accessed and will make queries with a timestamp faster.
- More predictable/even distribution of events.
Cons:
- Deletion of a single account might touch a lot of partitions - not a big concern.
- Always needs to contain a f
Solution
Partitioning is mostly about speeding up deletes and sequential scans.
-
If your biggest pain is the big deletes by
-
If your biggest pain is getting rid of all data older than a certain date, range partitioning by time would be the solution.
-
If you have may queries that have to perform sequential scans of large tables, but have a
You have to decide if any of these things is painful enough for you to consider partitioning. Partitioning doesn't come for free: it will increase query planning time and sometimes execution time as well.
Partitioning won't make index scans faster, often the opposite is the case. Only partition if you expect a real benefit from it.
1000 partitions is almost too many to be efficient. You might consider having separate partitions for the bigger accounts and bundling the rest, perhaps using a default partition.
-
If your biggest pain is the big deletes by
account_id, list partitioning on that column would be the best solution.-
If your biggest pain is getting rid of all data older than a certain date, range partitioning by time would be the solution.
-
If you have may queries that have to perform sequential scans of large tables, but have a
WHERE condition on account_id or a time constraint, partitioning on that WHERE condition would allow PostgreSQL to perform the sequential scan only on some of the partitions.You have to decide if any of these things is painful enough for you to consider partitioning. Partitioning doesn't come for free: it will increase query planning time and sometimes execution time as well.
Partitioning won't make index scans faster, often the opposite is the case. Only partition if you expect a real benefit from it.
1000 partitions is almost too many to be efficient. You might consider having separate partitions for the bigger accounts and bundling the rest, perhaps using a default partition.
Context
StackExchange Database Administrators Q#235108, answer score: 5
Revisions (0)
No revisions yet.