How to understand the PostgreSQL discrepancy between connection pool recommendation ((2 * n_cores) + n_disks) and support for 100s of connections?
Problem
From PostgreSQL docs:
- Number of Database Connections -> How to Find the Optimal Database Connection Pool Size:
  "for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count)"
- Tuning Your PostgreSQL Server -> max_connections:
  "Generally, PostgreSQL on good hardware can support a few hundred connections."
For me - not an experienced DBA - there's a discrepancy somewhere in here, especially looking at the offerings of some DB-as-a-Service providers.
For example, at this time Amazon RDS's largest machine (db.r3.8xlarge) has 32 vCPUs, which according to the first formula would run optimally with roughly 100 connections in the pool, given many disks. Wouldn't it, though, run very badly with the "few hundred connections" from the second recommendation?
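For concreteness, the arithmetic behind that figure; the effective spindle count here is a pure assumption (RDS abstracts the storage away), chosen only to show how a pool of about 100 falls out of the formula:

    core_count = 32                   # vCPUs on db.r3.8xlarge
    effective_spindle_count = 36      # assumption: not published for RDS-managed storage
    pool_size = core_count * 2 + effective_spindle_count
    print(pool_size)                  # 100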
Even more extreme is the discrepancy for another DBaaS provider, who proposes a 2 core server with 500 concurrent connections. How could this possibly work well?
If I'm misunderstanding something, please let me know. Many thanks!
Solution
"Can support" != "optimal throughput".
You can use lots of connections, but it's slower.
If you use fewer connections and queue work, you get the same amount of work done in a smaller time.
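As a minimal sketch of what "fewer connections plus a queue" can look like in application code, assuming psycopg2 and an illustrative DSN: the pool is sized roughly from the formula above, and extra tasks simply wait in the executor's queue instead of opening more connections.

    import os
    from concurrent.futures import ThreadPoolExecutor
    from psycopg2.pool import ThreadedConnectionPool

    # Pool sized roughly from (core_count * 2) + effective_spindle_count;
    # the "+ 1" stands in for the spindle term and is an assumption.
    POOL_SIZE = 2 * os.cpu_count() + 1

    pool = ThreadedConnectionPool(1, POOL_SIZE, "dbname=app user=app")  # DSN is illustrative

    def run_query(sql):
        conn = pool.getconn()
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql)
                return cur.fetchall()
        finally:
            pool.putconn(conn)

    # 1000 tasks, but never more than POOL_SIZE concurrent connections;
    # the remainder queue up instead of becoming extra sessions.
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as ex:
        results = list(ex.map(run_query, ["SELECT 1"] * 1000))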
Even more extreme is the discrepancy for another DBaaS provider, who proposes a 2 core server with 500 concurrent connections. How could this possibly work well?
Either they're using a connection pooling frontend like PgBouncer in transaction pooling mode, or it won't work well.
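For what it's worth, a minimal pgbouncer.ini sketch of that transaction-pooling setup; the database entry, pool size, and client limit are illustrative values, not recommendations:

    [databases]
    ; the entry name and target server are illustrative
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    ; release the server connection at the end of each transaction
    pool_mode = transaction
    ; how many application clients the pooler will accept
    max_client_conn = 500
    ; how many real PostgreSQL connections it actually opens per database/user pair
    default_pool_size = 20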
People like big numbers though, so they'll give you big numbers.
They're actually hurting performance by doing so. PostgreSQL has some costs that scale linearly with max_connections, so even if the connections are not used it still has a performance impact. Additionally, even idle connections have some further housekeeping costs.
If the connections are actively working, then you also have contention on system resources and on internal locks.
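One way to see how much of your connection count is real work at any given moment is to group pg_stat_activity by state; a small sketch using psycopg2 with an illustrative connection string:

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")   # illustrative DSN
    with conn, conn.cursor() as cur:
        # Sessions grouped by state: 'active', 'idle', 'idle in transaction', ...
        cur.execute(
            "SELECT state, count(*) FROM pg_stat_activity "
            "GROUP BY state ORDER BY count(*) DESC"
        )
        for state, n in cur.fetchall():
            print(state, n)
    conn.close()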
I routinely run into people who're having PostgreSQL performance issues - and who try to solve them by adding more connections, more workers in their application, etc. Especially people running queuing systems. It's surprisingly hard to persuade them that lowering the number of workers will make the system go faster, and that their original performance issues stemmed from having too many in the first place.
Context
StackExchange Database Administrators Q#82558, answer score: 12
Revisions (0)
No revisions yet.