HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Azure Postgresql hitting 100% memory usage without easy way to restart the server

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlhittingwithouttherestartway100azureusageeasy

Problem

I have recently noticed an issue when using Azure Database for PostgreSQL where my memory usage keeps growing, and upon hitting 100%, the server will stop responding.

This database server is used exclusively for development, so it has a lot of short-lived connections that are frequently closed forcibly (as people restart their application to fix an error here or there.)

In the logs, before this happens, there are two patterns I can see, autovacuum errors:

2018-05-22 11:16:13 UTC-5ae5085b.20-LOG:  CreateProcess call failed: No error (error code 1455)
2018-05-22 11:16:13 UTC-5ae5085b.20-LOG:  could not fork autovacuum worker process: No error
2018-05-22 11:16:14 UTC-5ae5085b.20-LOG:  CreateProcess call failed: A blocking operation was interrupted by a call to WSACancelBlockingCall.

 (error code 1455)


And what appears to be a memory usage dump, followed by a warning that a request failed:

```
TopMemoryContext: 143584 total in 6 blocks; 68072 free (43 chunks); 75512 used
TopTransactionContext: 8192 total in 1 blocks; 7960 free (0 chunks); 232 used
CFuncHash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
Type information cache: 24472 total in 2 blocks; 2840 free (0 chunks); 21632 used
Record information cache: 24576 total in 2 blocks; 15072 free (5 chunks); 9504 used
Operator lookup cache: 24576 total in 2 blocks; 10976 free (5 chunks); 13600 used
TableSpace cache: 8192 total in 1 blocks; 2312 free (0 chunks); 5880 used
MessageContext: 8192 total in 1 blocks; 6968 free (0 chunks); 1224 used
Operator class cache: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
smgr relation table: 24576 total in 2 blocks; 8896 free (4 chunks); 15680 used
TransactionAbortContext: 32768 total in 1 blocks; 32728 free (0 chunks); 40 used
Portal hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used
PortalHeapMemory: 1024 total in 1 blocks; 672 free (0 chunks); 3

Solution

AWS RDS has multiple options in a default parameter group that are calculated from DBInstanceClassMemory variable, for example max_connections parameter which might be {DBInstanceClassMemory/31457280}, which for 8GB server would calculate to 810241024*1024/31457280=273.

This allows the parameters to scale up and down while changing the database instance RAM size.

I have no idea how is it configured on Azure, but you also need to make sure that those parameters are reasonably configured for your RAM. For the start I'd check which parameters are dependent on DBInstanceClassMemory on AWS RDS and then set those parameters the same on Azure (calculate manually if there's no similar mechanism to automatically scale them).

Try to set them so the query below will return value smaller than available memory for Postgres:

select
    pg_size_pretty(
        shared_buffers::bigint*block_size
        + max_connections*work_mem*1024
        + autovacuum_max_workers*(
            case when autovacuum_work_mem=-1
            then maintenance_work_mem
            else autovacuum_work_mem
            end
        )*1024
    ) as estimated_max_ram_usage
from (
    select
    (select setting::bigint from pg_settings where name='block_size') as block_size,
    (select setting::bigint from pg_settings where name='shared_buffers') as shared_buffers,
    (select setting::bigint from pg_settings where name='max_connections') as max_connections,
    (select setting::bigint from pg_settings where name='work_mem') as work_mem,
    (select setting::bigint from pg_settings where name='autovacuum_max_workers') as autovacuum_max_workers,
    (select setting::bigint from pg_settings where name='autovacuum_work_mem') as autovacuum_work_mem,
    (select setting::bigint from pg_settings where name='maintenance_work_mem') as maintenance_work_mem
) as _
;


But you can still kill the server, for example with a pg_restore or reindex with too many parallel workers (they can use up to maintenance_work_mem per worker) or too complicated queries (they can use up to work_mem per query execution node, and there can be multiple in a single query).

There's no single setting for the server to tell it to use no more than X RAM.

Code Snippets

select
    pg_size_pretty(
        shared_buffers::bigint*block_size
        + max_connections*work_mem*1024
        + autovacuum_max_workers*(
            case when autovacuum_work_mem=-1
            then maintenance_work_mem
            else autovacuum_work_mem
            end
        )*1024
    ) as estimated_max_ram_usage
from (
    select
    (select setting::bigint from pg_settings where name='block_size') as block_size,
    (select setting::bigint from pg_settings where name='shared_buffers') as shared_buffers,
    (select setting::bigint from pg_settings where name='max_connections') as max_connections,
    (select setting::bigint from pg_settings where name='work_mem') as work_mem,
    (select setting::bigint from pg_settings where name='autovacuum_max_workers') as autovacuum_max_workers,
    (select setting::bigint from pg_settings where name='autovacuum_work_mem') as autovacuum_work_mem,
    (select setting::bigint from pg_settings where name='maintenance_work_mem') as maintenance_work_mem
) as _
;

Context

StackExchange Database Administrators Q#209194, answer score: 3

Revisions (0)

No revisions yet.