HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlModerate

PostgreSQL could not fork autovacuum worker process: Cannot allocate memory

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlcannotforkprocesscouldworkerautovacuumallocatememorynot

Problem

Running into some issues every few days with postgres crashing and entering recovery mode. The logs from postgres look like so

```
... Lots of this for 5-10minutes

2015-09-24 10:07:27 GMT LOG: could not fork autovacuum worker process: Cannot allocate memory
2015-09-24 10:07:28 GMT LOG: could not fork autovacuum worker process: Cannot allocate memory
2015-09-24 10:07:29 GMT LOG: could not fork autovacuum worker process: Cannot allocate memory
2015-09-24 10:07:30 GMT LOG: could not fork autovacuum worker process: Cannot allocate memory
2015-09-24 10:07:32 GMT LOG: server process (PID 16244) was terminated by signal 9: Killed
2015-09-24 10:07:32 GMT DETAIL: Failed process was running: SELECT 1
2015-09-24 10:07:32 GMT LOG: terminating any other active server processes
2015-09-24 10:07:32 GMT WARNING: terminating connection because of crash of another server process
2015-09-24 10:07:32 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-09-24 10:07:32 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-09-24 10:07:32 GMT WARNING: terminating connection because of crash of another server process
2015-09-24 10:07:32 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

.... for some time repeats this log:

2015-09-24 10:07:33 GMT HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-09-24 10:07:33 GMT WARNING: terminating connection because of crash of another server process
2015-09-24 10:07:33 GMT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memor

Solution

Hardware problems...

This log entry:


The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.

can be a result of at least two different underlying problems. The first one is a bad executable or failing hardware - this is why I suggested moving your database to a better place (be it whatever).

You are currently on a Digital Ocean droplet, which (as I just checked) is a virtual private server. This, at least to me, does not necessarily mean that it is a separate piece of hardware - and if it is not, there are chances that other users are also affected and the problem is handled quickly by the provider. Hopefully, the configuration there excludes the possibility of foreign systems having an adverse effect on yours.

That much about clouds and shared hosting :) As you see from the comments above, the more probable root of your issue is something you can solve.

... or memory handling issues?

The second (and I think more common) reason of the error is memory pressure. If you are short on memory (which can be the case, see the calculation below), the operating system may decide to kill some process to allocate memory for some other processes. If memory overcommit is allowed for the OS, the chances for this is much higher than without it.

See what the PostgreSQL documentation has to say about this:


In Linux 2.4 and later, the default virtual memory behavior is not
optimal for PostgreSQL. Because of the way that the kernel implements
memory overcommit, the kernel might terminate the PostgreSQL
postmaster (the master server process) if the memory demands of either
PostgreSQL or another process cause the system to run out of virtual
memory.


If this happens, you will see a kernel message that looks like this
(consult your system documentation and configuration on where to look
for such a message):

Out of Memory: Killed process 12345 (postgres).




This indicates that the postgres process has been terminated due to
memory pressure. Although existing database connections will continue
to function normally, no new connections will be accepted. To recover,
PostgreSQL will need to be restarted.

Further down, it describes how to change this. What is interesting and important that you cannot fully disallow the OOM killer - this is important for keeping the OS running as long as possible. So you set the overcommit behaviour to strict by

sysctl -w vm.overcommit_memory=2


(or editing sysctl.conf and reloading it via sysctl).

Alternatively, you can set the target score of the postmaster process to the lowest possible value, therefore making it highly unlikely to be chosen when the OOM killer looks for a victim. This should be done in a root-owned startup script - editing the already used one seems appropriate. This is what you need:

echo -1000 > /proc/self/oom_score_adj


Check the linked documentation page for further details, there are smaller details to be observed with each solution.

It's good to know that the OOM killer wakes up only when the physical memory and the swap space are both exhausted. A cheap way out is increasing the swap space - however, relying on it is usually too slow for normal database operation. Depending on your use case, it might be the solution, however.

Note that for both approaches you need root access to the OS.

An approach that may work without root access

If hardware problems can be excluded as the root cause, and you don't have root access, you can still work around the problem. This will not give a bulletproof solution, but can decrease the possibility of the issue reoccurring.

Let's quickly check how much memory you original setup uses:

max_connections = 250
shared_buffers = 768MB
temp_buffers = 8MB
# work_mem = 1MB        # a commented-out value means it is at the default - 
                        # in 9.4 it is 4MB


You have 2 GB of physical memory.

Let's count how much is used (calculating the worst-case scenario):

  • shared_buffers is taken always: 768MB



  • work_mem and temp_buffers are allocated per session (that is, connection), and max_connections is 250: (4MB + 8MB) * 250 = 3000MB



Of course, it is highly unlikely that all connections use up all this space. And as you state in a comment, you don't use more than 70 connections at a time, that lowers the number to 840MB

  • maintenance_work_mem and (optionally) autovacuum_work_mem can consume some more. You seem to have them at their default value, that is 64MB.



All these add up to 1672MB. What is left for everything else is 2048MB - 1672MB = 376MB. Checking how much is needed for a linux server installation, I took Ubuntu as an example. The documentation says that 192MiB should be enough for a minimal setup - this way, you could survive with your settings. Apparently, there are othe

Code Snippets

Out of Memory: Killed process 12345 (postgres).
sysctl -w vm.overcommit_memory=2
echo -1000 > /proc/self/oom_score_adj
max_connections = 250
shared_buffers = 768MB
temp_buffers = 8MB
# work_mem = 1MB        # a commented-out value means it is at the default - 
                        # in 9.4 it is 4MB

Context

StackExchange Database Administrators Q#116027, answer score: 12

Revisions (0)

No revisions yet.