HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

AWS Aurora PostgreSQL Serverless: How do you pre-warm the shared buffer after scaling?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlaftertheyousharedscalingserverlesswarmaurorapre

Problem

I'm using AWS Aurora PostgreSQL Serverless with autoscaling.
It appears as though scaling clears the shared buffer, so right when we want to crank out the performance, we are forced to face-plant with an I/O bottleneck. After we get warmed up, we see a great performance improvement.
However, if we run back-to-back once scaled, the second run goes much faster.
While I haven't seen anything specific on whether the shared buffer gets cleared on scaling, I'm almost positive that it is.

Aurora Serverless is currently using PostgreSQL 10.14 and it supports the pg_prewarm extension. It looks like the newest documentation suggests that prewarm supports auto pre-warm after a server restart, but this is serverless and a version that doesn't appear to mention auto pre-warming in the documentation.

I found this post that works great for PostgreSQL when restarting the server or recovering from a crash.

  • If we could at least retain the contents of the shared buffer of the lower ACU node after scaling, that'd be fine.



  • If we could pre-warm exactly what needs to be in memory ahead of time, that would be awesome!



  • There are certain tables that are quite large and we would want to selectively pre-warm the pieces that we want. pg_prewarm supports first_block and last_block block numbers for a table/index, but how would one know what values to put in there?



We know ahead of time when our peak is and tell RDS to scale right before, so we have a window of time where we could prepare.

What are my options?

Solution

My answer is not specific to AWS Aurora PostgreSQL Serverless, but for Postgres in general.
Simple alternative

In your related comment, you hinted that you only need rows from the last 24 hours. So you could (without involving pg_prewarm) simply:

SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';


If created_at is indexed, and the predicate is selective enough, relevant blocks of table and index are prewarmed.

Since you don't actually want to retrieve any data when prewarming, you could use PERFORM in a DO statement:

DO
$BEGIN
   PERFORM * FROM public.tbl WHERE created_at > now() - interval '24h';
END$;


Same effect.

You can verify success with EXPLAIN (ANALYZE, BUFFERS):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';


If there is enough cache memory available, you should now only see "shared hit" buffers. Like:

Buffers: shared hit=123456

.. where you'd see mostly "read" with mostly cold cache. Like:

Buffers: shared hit=143 read=153689

Basically, just run expected queries, and the cache will be populated accordingly.
pg_prewarm()

If you still want to use pg_prewarm() with block numbers, you can do that, too. Allows more options, like picking which cache to populate (operating system or database buffer cache) or some other finesse. The additional module has to be installed first, once per database:

CREATE EXTENSION pg_prewarm;


Working with block numbers only makes sense if your table is (mostly) physically clustered on the assumed column created_at. That would be the case with a read-only (read-mostly) table where new rows with the current timestamp are appended at the end of the table.

You can get the block number of a row from its ctid. See:

  • How do I decompose ctid into page and row numbers?



To get the block number of the first row that's younger than 24 hours:

SELECT ctid
FROM   public.tbl
WHERE  created_at > now() - interval '24h'
ORDER  BY created_at
LIMIT  1;


You get something like (5759,1). 5759 is the block number. Then you can:

SELECT pg_prewarm('public.tbl'::regclass, first_block => 5759)


Since we left last_block at its default NULL, everything "through the last block in the relation" will be prewarmed. (Not the index, though. You can prewarm that, too.)

The function call is with "mixed notation" (mix of "positional" and "named notation"). See:

  • Functions with variable number of input parameters

Code Snippets

SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';
DO
$$BEGIN
   PERFORM * FROM public.tbl WHERE created_at > now() - interval '24h';
END$$;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.tbl WHERE created_at > now() - interval '24h';
CREATE EXTENSION pg_prewarm;
SELECT ctid
FROM   public.tbl
WHERE  created_at > now() - interval '24h'
ORDER  BY created_at
LIMIT  1;

Context

StackExchange Database Administrators Q#295140, answer score: 10

Revisions (0)

No revisions yet.