patternsqlMajor
pgBouncer works great but occasionally becomes unavailable
Viewed 0 times
pgbouncerbutgreatworksoccasionallybecomesunavailable
Problem
I'm running pgBouncer in front of a busy postgres 9 database.
For most of the time it works fine. But every few hours I'll get an error email from my application with an exception from psycopg2:
OperationalError('could not connect to server: Cannot assign requested address
Is the server running on host "neo-hulk" and accepting
TCP/IP connections on port 6432?')
This is a python app with a bunch of celery workers running tasks. When those errors arrive I check the pgbouncer db and the pool size is within limits.
After some experimentation I've set the pool max size to 400 and pool size to 200. pool mode is "session" (requests are mostly auto-commit, almost no transactions).
What makes pgBouncer 'vanish' like that? its only for short periods of time (and in total we're talking about a tiny amount of requests compared to the sheer volume of requests its handing) but those requests that fail are important.
Thanks!
For most of the time it works fine. But every few hours I'll get an error email from my application with an exception from psycopg2:
OperationalError('could not connect to server: Cannot assign requested address
Is the server running on host "neo-hulk" and accepting
TCP/IP connections on port 6432?')
This is a python app with a bunch of celery workers running tasks. When those errors arrive I check the pgbouncer db and the pool size is within limits.
After some experimentation I've set the pool max size to 400 and pool size to 200. pool mode is "session" (requests are mostly auto-commit, almost no transactions).
What makes pgBouncer 'vanish' like that? its only for short periods of time (and in total we're talking about a tiny amount of requests compared to the sheer volume of requests its handing) but those requests that fail are important.
Thanks!
Solution
The "Cannot assign requested address" part in the error message comes from the kernel TCP stack. When encountered intermittently, this typically means that the space of available sockets is exhausted because of too much sockets in wait state (
The range of socket ports can be output by
You may check the result of
counters related to wait states.
If the total number of TCP sockets is close to
As a first workaround, the TCP ports range may be extended:
If it's not satisfactory, check the
They're defined as (from
tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
Enable fast recycling of TIME_WAIT sockets. Enabling this
option is not recommended since this causes problems when work‐
ing with NAT (Network Address Translation).
tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
Allow to reuse TIME_WAIT sockets for new connections when it is
safe from protocol viewpoint. It should not be changed without
advice/request of technical experts.
Personally I had success with
TIME_WAIT, or less probably FIN_WAIT_1 or FIN_WAIT_2)The range of socket ports can be output by
cat /proc/sys/net/ipv4/ip_local_port_range. The default value on a stock Linux kernel is generally 32768 61000.You may check the result of
netstat -ton|grep WAIT on the client(s) and on the pgBouncer's host when the system is busy. The -o flag will show the timeout counters related to wait states.
If the total number of TCP sockets is close to
61000-32768=28232 then exhaustion of this range is likely your problem. Since a closed socket spends 60 seconds in TIME_WAIT state in normal condition, if a client host connects more than 28232 times in one minute, new connections will fail with the mentioned error until ports are freed.As a first workaround, the TCP ports range may be extended:
# echo "1025 65535" >/proc/sys/net/ipv4/ip_local_port_rangeIf it's not satisfactory, check the
tcp_tw_recycle and tcp_tw_reuse flags, also tunable through /proc/sys/net/ipv4 and sysctl.They're defined as (from
man tcp):tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
Enable fast recycling of TIME_WAIT sockets. Enabling this
option is not recommended since this causes problems when work‐
ing with NAT (Network Address Translation).
tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
Allow to reuse TIME_WAIT sockets for new connections when it is
safe from protocol viewpoint. It should not be changed without
advice/request of technical experts.
Personally I had success with
tcp_tw_recycle when faced with this problem with a MySQL client app, but don't take this as a recommendation, my understanding of TCP being superficial at best.Code Snippets
# echo "1025 65535" >/proc/sys/net/ipv4/ip_local_port_rangeContext
StackExchange Database Administrators Q#59650, answer score: 22
Revisions (0)
No revisions yet.