snippetsqlModerate
AWS Aurora PostgreSQL Serverless: How do you pre-warm the shared buffer after scaling?
Viewed 0 times
postgresqlaftertheyousharedscalingserverlesswarmaurorapre
Problem
I'm using
It appears as though scaling clears the shared buffer, so right when we want to crank out the performance, we are forced to face-plant with an I/O bottleneck. After we get warmed up, we see a great performance improvement.
However, if we run back-to-back once scaled, the second run goes much faster.
While I haven't seen anything specific on whether the shared buffer gets cleared on scaling, I'm almost positive that it is.
Aurora Serverless is currently using
I found this post that works great for PostgreSQL when restarting the server or recovering from a crash.
We know ahead of time when our peak is and tell RDS to scale right before, so we have a window of time where we could prepare.
What are my options?
AWS Aurora PostgreSQL Serverless with autoscaling.It appears as though scaling clears the shared buffer, so right when we want to crank out the performance, we are forced to face-plant with an I/O bottleneck. After we get warmed up, we see a great performance improvement.
However, if we run back-to-back once scaled, the second run goes much faster.
While I haven't seen anything specific on whether the shared buffer gets cleared on scaling, I'm almost positive that it is.
Aurora Serverless is currently using
PostgreSQL 10.14 and it supports the pg_prewarm extension. It looks like the newest documentation suggests that prewarm supports auto pre-warm after a server restart, but this is serverless and a version that doesn't appear to mention auto pre-warming in the documentation.I found this post that works great for PostgreSQL when restarting the server or recovering from a crash.
- If we could at least retain the contents of the shared buffer of the lower ACU node after scaling, that'd be fine.
- If we could pre-warm exactly what needs to be in memory ahead of time, that would be awesome!
- There are certain tables that are quite large and we would want to selectively pre-warm the pieces that we want.
pg_prewarmsupportsfirst_blockandlast_blockblock numbers for a table/index, but how would one know what values to put in there?
We know ahead of time when our peak is and tell RDS to scale right before, so we have a window of time where we could prepare.
What are my options?
Solution
My answer is not specific to AWS Aurora PostgreSQL Serverless, but for Postgres in general.
Simple alternative
In your related comment, you hinted that you only need rows from the last 24 hours. So you could (without involving pg_prewarm) simply:
If
Since you don't actually want to retrieve any data when prewarming, you could use
Same effect.
You can verify success with
If there is enough cache memory available, you should now only see "shared hit" buffers. Like:
.. where you'd see mostly "read" with mostly cold cache. Like:
Basically, just run expected queries, and the cache will be populated accordingly.
If you still want to use
Working with block numbers only makes sense if your table is (mostly) physically clustered on the assumed column
You can get the block number of a row from its
To get the block number of the first row that's younger than 24 hours:
You get something like
Since we left
The function call is with "mixed notation" (mix of "positional" and "named notation"). See:
Simple alternative
In your related comment, you hinted that you only need rows from the last 24 hours. So you could (without involving pg_prewarm) simply:
SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';If
created_at is indexed, and the predicate is selective enough, relevant blocks of table and index are prewarmed.Since you don't actually want to retrieve any data when prewarming, you could use
PERFORM in a DO statement:DO
$BEGIN
PERFORM * FROM public.tbl WHERE created_at > now() - interval '24h';
END$;Same effect.
You can verify success with
EXPLAIN (ANALYZE, BUFFERS):EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';If there is enough cache memory available, you should now only see "shared hit" buffers. Like:
Buffers: shared hit=123456.. where you'd see mostly "read" with mostly cold cache. Like:
Buffers: shared hit=143 read=153689Basically, just run expected queries, and the cache will be populated accordingly.
pg_prewarm()If you still want to use
pg_prewarm() with block numbers, you can do that, too. Allows more options, like picking which cache to populate (operating system or database buffer cache) or some other finesse. The additional module has to be installed first, once per database:CREATE EXTENSION pg_prewarm;Working with block numbers only makes sense if your table is (mostly) physically clustered on the assumed column
created_at. That would be the case with a read-only (read-mostly) table where new rows with the current timestamp are appended at the end of the table.You can get the block number of a row from its
ctid. See:- How do I decompose ctid into page and row numbers?
To get the block number of the first row that's younger than 24 hours:
SELECT ctid
FROM public.tbl
WHERE created_at > now() - interval '24h'
ORDER BY created_at
LIMIT 1;You get something like
(5759,1). 5759 is the block number. Then you can:SELECT pg_prewarm('public.tbl'::regclass, first_block => 5759)Since we left
last_block at its default NULL, everything "through the last block in the relation" will be prewarmed. (Not the index, though. You can prewarm that, too.)The function call is with "mixed notation" (mix of "positional" and "named notation"). See:
- Functions with variable number of input parameters
Code Snippets
SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';DO
$$BEGIN
PERFORM * FROM public.tbl WHERE created_at > now() - interval '24h';
END$$;EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';CREATE EXTENSION pg_prewarm;SELECT ctid
FROM public.tbl
WHERE created_at > now() - interval '24h'
ORDER BY created_at
LIMIT 1;Context
StackExchange Database Administrators Q#295140, answer score: 10
Revisions (0)
No revisions yet.