patternsqlMinor
PostgreSQL time series table with composite partitions
Viewed 0 times
postgresqlwithtimepartitionscompositeseriestable
Problem
I am investigating how I might structure a PostgreSQL table to store a large amount of time stamped data that also needs to be portioned by another field.
My expected data structure will be something like:
Events will be written into the table at a fast rate, possibly as high as 1000 per second, stored for around 6 months, and exported to cheaper storage. (perhaps pgdump files to an S3 bucket). So it would make sense to use declarative portioning on the event time, using pg_partman to create and manage new partition tables each week or so.
However, there is a strong requirement to run queries on the data by the object_sha, and my concern is that if the data is portioned only by timestamp, then the most recent partition table will be under heavy I/O load and might not keep up, so it appears logical to me that the data should also be partitioned by the prefix on object_sha, (perhaps on the first one or two hex digits), as that way reads and writes will be evenly distributed across many tables.
My questions are:
My expected data structure will be something like:
CREATE TABLE event (
event_time timestamp not null,
object_sha char(64) not null, ; sha256 as hex digits
username text not null, ; actual name not a foreign key
payload jsonb not null ; many other fields, not indexed
);Events will be written into the table at a fast rate, possibly as high as 1000 per second, stored for around 6 months, and exported to cheaper storage. (perhaps pgdump files to an S3 bucket). So it would make sense to use declarative portioning on the event time, using pg_partman to create and manage new partition tables each week or so.
However, there is a strong requirement to run queries on the data by the object_sha, and my concern is that if the data is portioned only by timestamp, then the most recent partition table will be under heavy I/O load and might not keep up, so it appears logical to me that the data should also be partitioned by the prefix on object_sha, (perhaps on the first one or two hex digits), as that way reads and writes will be evenly distributed across many tables.
My questions are:
- The pg_partman documentation says Sub-partitioning with multiple levels is supported, but it is of very limited use in PostgreSQL and provides next to NO PERFORMANCE BENEFIT – Why is that? Is that advice up to date and correct, as I can’t see how it would be a bad idea.
- All the queries that I plan to run will always have
order by event_time desc limit 50or suchlike included and most of the time will be satisfied by recent events in one or two recent tables. Is the PostgreSQL query engine smart enough to limit the query to the most recent table, and only look in older tables if it does not find enough results?
- Given that my front end application will kn
Solution
my concern is that if the data is portioned only by timestamp, then the most recent partition table will be under heavy I/O load and might not keep up
I think your fundamental premise is flawed. It is the job of your storage layer (RAID controller, LVM, etc.) to balance the IO load over the available underlying storage hardware by using striping; and with modern systems they are good at it. Trying to do it in PostgreSQL with partitioning (and different tablespaces for the different partitions) is the road to madness. I think this answers your first question: assuming your storage layer is vaguely competent, there is no additional performance benefit to be had.
All the queries that I plan to run will always have order by event_time desc limit 50 or suchlike included and most of the time will be satisfied by recent events in one or two recent tables.
As long as there is an index leading with "event_time" and that is also the range partition key, yes it will start with the newest partitions. If there is no such index, then it will not start with the newest ones and stop once it reaches the LIMIT (even though I would argue that this is when it is most important to start with the newest ones). Of course "suchlike" might hide a multitude of sins...
For your 3rd question, I don't think it matters much with declarative partitioning, though it would have been important under the older inheritance-based partitioning. But if you can arrange for different partitions to take their turn in the "hot seat" that could make a big difference in cache effectiveness, and having the clients aware of the partitioning scheme would probably be necessary (but not sufficient) to make that happen.
I think your fundamental premise is flawed. It is the job of your storage layer (RAID controller, LVM, etc.) to balance the IO load over the available underlying storage hardware by using striping; and with modern systems they are good at it. Trying to do it in PostgreSQL with partitioning (and different tablespaces for the different partitions) is the road to madness. I think this answers your first question: assuming your storage layer is vaguely competent, there is no additional performance benefit to be had.
All the queries that I plan to run will always have order by event_time desc limit 50 or suchlike included and most of the time will be satisfied by recent events in one or two recent tables.
As long as there is an index leading with "event_time" and that is also the range partition key, yes it will start with the newest partitions. If there is no such index, then it will not start with the newest ones and stop once it reaches the LIMIT (even though I would argue that this is when it is most important to start with the newest ones). Of course "suchlike" might hide a multitude of sins...
For your 3rd question, I don't think it matters much with declarative partitioning, though it would have been important under the older inheritance-based partitioning. But if you can arrange for different partitions to take their turn in the "hot seat" that could make a big difference in cache effectiveness, and having the clients aware of the partitioning scheme would probably be necessary (but not sufficient) to make that happen.
Context
StackExchange Database Administrators Q#322077, answer score: 2
Revisions (0)
No revisions yet.