patternsqlMinor
Better database for "keep always the 5 latest entries per ID and delete older"?
Viewed 0 times
thelatestperdeletekeepalwaysbetterdatabaseforand
Problem
I have a PostgreSQL database with a
There are several million entries in this table and a couple of thousand changes per day. Every day I will scan the table and keep the last 5 entries per
The
My question is rather: Is a standard PostgreSQL table the best approach for this problem?
Would PostgreSQL table partitioning help here? I know partitioning is used to easily throw away big chunks of data that are older than X days, but partitioning by
Are there NoSQL databases that would be much faster because they store the data differently?
Are there other PostgreSQL tricks that help me to store the data differently and more optimized for the use case of the daily purging (SELECTs are not an issue, it gets rarely queried)?
An exclusive lock for an 1 hour per week would be acceptable. There are about 1 million different
history table where I store an fooID (not a primary key serial but a text), an attribute target and the current timestamp whenever the target for this fooID changes:CREATE TABLE history (
fooId text not null,
target text not null,
updated_at timestamp not null default default now()
);There are several million entries in this table and a couple of thousand changes per day. Every day I will scan the table and keep the last 5 entries per
fooId and delete any older entries.The
DELETE ... WHERE id in ... rank() over (partiton by nr order by created_at... query is not my question, that works. Just takes awful long.My question is rather: Is a standard PostgreSQL table the best approach for this problem?
Would PostgreSQL table partitioning help here? I know partitioning is used to easily throw away big chunks of data that are older than X days, but partitioning by
fooId would seem to create too many partitions in my case.Are there NoSQL databases that would be much faster because they store the data differently?
Are there other PostgreSQL tricks that help me to store the data differently and more optimized for the use case of the daily purging (SELECTs are not an issue, it gets rarely queried)?
An exclusive lock for an 1 hour per week would be acceptable. There are about 1 million different
fooIDs so no chance of having a separate table per fooID.Solution
One option I can think of is to delete the oldest row as soon as you insert a new one. That will only require a very quick lookup limited to at most 6 rows rather then going through all rows at a time.
To do that efficiently you need a unique key on the table:
Then create a trigger that only keeps the 5 most recent rows for a
On my laptop, a simple test setup with 5 million rows (1 million different fooid values) showed a trigger overhead that was less then a millisecond:
QUERY PLAN
----------------------------------------------------------------------------------------------------
Insert on stuff.history (cost=0.00..0.01 rows=1 width=76) (actual time=0.062..0.062 rows=0 loops=1)
Buffers: shared hit=8
-> Result (cost=0.00..0.01 rows=1 width=76) (actual time=0.017..0.017 rows=1 loops=1)
Output: nextval('history_id_seq'::regclass), '1'::text, 'new stuff'::text, now()
Buffers: shared hit=1
Planning time: 0.024 ms
Trigger remove_last_trigger: time=0.438 calls=1
Execution time: 0.524 ms
How much overhead that is in your system and whether or not that is acceptable for you I don't know, but "thousand changes per day" doesn't sound like a very busy system.
To do that efficiently you need a unique key on the table:
create table history
(
id serial primary key, -- to make a lookup on a single row efficient
fooid text not null,
target text not null,
updated_at timestamp not null default now()
);
-- to make finding the oldest row for one fooid efficient
create index on history(fooid, updated_at);Then create a trigger that only keeps the 5 most recent rows for a
fooid:create or replace function remove_last()
returns trigger
as
$
begin
with ranked as (
select id, row_number() over (partition by fooid order by updated_at) as rn
from history
where id <> new.id
and fooid = new.fooid
)
delete from history
where id in (select id
from ranked
where rn >= 5);
return new;
end;
$
language plpgsql;
create trigger remove_last_trigger
after insert on history
for each row execute procedure remove_last();On my laptop, a simple test setup with 5 million rows (1 million different fooid values) showed a trigger overhead that was less then a millisecond:
insert into history (fooid, target) values ('1', 'new stuff');QUERY PLAN
----------------------------------------------------------------------------------------------------
Insert on stuff.history (cost=0.00..0.01 rows=1 width=76) (actual time=0.062..0.062 rows=0 loops=1)
Buffers: shared hit=8
-> Result (cost=0.00..0.01 rows=1 width=76) (actual time=0.017..0.017 rows=1 loops=1)
Output: nextval('history_id_seq'::regclass), '1'::text, 'new stuff'::text, now()
Buffers: shared hit=1
Planning time: 0.024 ms
Trigger remove_last_trigger: time=0.438 calls=1
Execution time: 0.524 ms
How much overhead that is in your system and whether or not that is acceptable for you I don't know, but "thousand changes per day" doesn't sound like a very busy system.
Code Snippets
create table history
(
id serial primary key, -- to make a lookup on a single row efficient
fooid text not null,
target text not null,
updated_at timestamp not null default now()
);
-- to make finding the oldest row for one fooid efficient
create index on history(fooid, updated_at);create or replace function remove_last()
returns trigger
as
$$
begin
with ranked as (
select id, row_number() over (partition by fooid order by updated_at) as rn
from history
where id <> new.id
and fooid = new.fooid
)
delete from history
where id in (select id
from ranked
where rn >= 5);
return new;
end;
$$
language plpgsql;
create trigger remove_last_trigger
after insert on history
for each row execute procedure remove_last();insert into history (fooid, target) values ('1', 'new stuff');Context
StackExchange Database Administrators Q#186041, answer score: 3
Revisions (0)
No revisions yet.