HiveBrain v1.2.0

FATAL 53300: Remaining connection slots are reserved for non-replication superuser connections

Submitted by: @import:stackexchange-dba

Problem

I have a PostgreSQL 12.1 database system (I'll refer to it as PGSQL) running on a remotely hosted VM server (Windows Server 2019). We upgraded the server OS and PGSQL a couple of months ago. Everything has been running more-or-less normally since then, until this morning when I started receiving the above-mentioned database error in pretty much every one of our in-house applications that connect to this PGSQL instance.

To check the connections, I ran SELECT * FROM pg_stat_activity;, which returned 103 rows. My postgresql.conf file has max_connections = 100, so that makes sense, but what doesn't make sense is that, of those 103 connections, 90+ of them are listed as idle with query text of DISCARD ALL. All of these show as being executed by the same, non-superuser account from the server's own internal address. However, several of the connections show a query_start date value from a month or more ago.
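For a situation like the one described, a query along these lines (the DISCARD ALL filter matches the sessions reported above; adjust it to your own symptoms) narrows pg_stat_activity down to the idle sessions that are holding connection slots:

```sql
-- List the idle sessions occupying slots: who owns them, where they came
-- from, how stale they are, and confirm they are plain "idle" rather than
-- the more problematic "idle in transaction".
SELECT pid, usename, client_addr, state, query, query_start,
       now() - query_start AS age
FROM pg_stat_activity
WHERE state = 'idle'
  AND query = 'DISCARD ALL'
ORDER BY query_start;
```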

Now, many of the applications we have in place are unfortunately built with hard-coded credentials (I have a lot of "clean-up" work to do on the code for these applications that I inherited) and are generally executed from shortcuts pointing to an "Application" share on the server that hosts the PGSQL database, so none of this looks particularly "suspicious". I tried to kill the processes by running SELECT pg_cancel_backend(pid); with one of the pid values from the previous query, but requerying pg_stat_activity still shows the same record in the result set (all of the values appear to be exactly the same, from what I can tell).

Perhaps I'm not using the correct function to terminate these "hung" processes, but I could not figure out how to clear out these connections individually. Because I needed to get our production environment back to a usable state, I ended up just stopping and restarting the PGSQL service on the server, which did clear out all of those old DISCARD ALL statements, but I'm curious whether there's something I could have done instead.

Solution

Regardless of whatever issues you might have, pg_cancel_backend cancels the backend's current query. But an idle backend doesn't have a current query, which is what makes it idle. You want pg_terminate_backend.
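Concretely, pg_terminate_backend() sends SIGTERM to the backend process and ends the session even when it is idle, whereas pg_cancel_backend() only interrupts a query that is currently executing. A sketch of how you could have cleared these sessions without a restart (the DISCARD ALL filter is specific to the symptoms in this question):

```sql
-- Terminate every idle session whose last statement was DISCARD ALL.
-- pg_terminate_backend() returns true for each backend it signals.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND query = 'DISCARD ALL'
  AND pid <> pg_backend_pid();  -- never terminate your own session
```

Note that terminating backends this way requires superuser, or membership in pg_signal_backend, or ownership of the target sessions.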

More fundamentally, it seems like you have an application problem or a connection pooler problem, but you haven't described your setup in enough detail for us to give concrete advice.

If you just want to kick the can down the road a bit further, you can increase max_connections. Having a large number of idle connections (not "idle in transaction"!) is not desirable, but is also not a huge problem.
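If you go that route, one way to raise the limit (200 here is an arbitrary example value; size it to your workload and available memory) is:

```sql
-- Persist the new limit in postgresql.auto.conf (requires superuser).
-- max_connections is not reloadable: the change only takes effect
-- after a full server restart, not after pg_reload_conf().
ALTER SYSTEM SET max_connections = 200;
```

Keep in mind that each allowed connection reserves server resources, so a pooler is usually the better long-term fix than a very large max_connections.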


"Prior to upgrading our PGSQL server to v12.1, we ran v9.4 for a number of years and never once encountered this issue."

So 0 vs 1? That doesn't sound like a sample size on which you can draw a conclusion. Maybe 9.4 just got rebooted for other reasons often enough to keep the leak under control. Or maybe someone increased the max_connections to a large enough value that it was never reached between reboots, but that increase didn't survive the upgrade.

Context

StackExchange Database Administrators Q#264921, answer score: 4
