HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Diagnosing high write throughput on RDS Postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
throughputpostgreshighwritediagnosingrds

Problem

I'm using Postgres 9.6.1 on RDS, and seeing orders-of-magnitude higher write throughput than I expect to see. I'm seeing a WriteThroughput around 40,000,000 Bytes/Second. I've looked at the following to try to track down where it is coming from:

I installed pg_stat_statements and have been periodically running:

SELECT sum(shared_blks_dirtied) from pg_stat_statements


And it looks like my queries are only dirtying at most about 30 blocks per second. A block is 8kB, right? So that's only 240 kB per second. (I also checked temp blocks and local blocks and I'm barely dirtying them or writing them at all. I also confirmed that I'm not hitting the limit of unique statements that pg_stat_statements will track).

I'm also looking at my logs, and I see checkpoints running every 5 minutes about this size:

LOG:  checkpoint complete: wrote 8538 buffers (0.4%); 0 transaction log file(s) added, 0 removed, 3 recycled; write=269.825 s, sync=0.021 s, total=269.921 s; sync files=2349, longest=0.010 s, average=0.000 s; distance=39599 kB, estimate=39599 kB


So that's approximately 40 MB / 5 minutes = 133 kB per second, which is the same order of magnitude I'm seeing in pg_stat_statements.

So I'm a little confused... am I doing the math wrong, or is there somewhere else I should be looking to see what's generating all that write throughput?

One other thing to mention: I have a lot of tables and schemas: approximately 25,000 schemas and 2 million total relations. (And I create / drop new schemas a couple times a minute generally). I've found that autovacuum can't keep up with this, even with very aggressive configuration, so I am manually vacuuming and analyzing tables based on my write patterns, and I've adjusted the autovacuum thresholds upwards so that in practice my manual vacuuming almost always gets there before autovacuum runs on it. I've confirmed that my manual vacuums do show up in pg_stat-statements, so I'd assume that any writes they generate a

Solution

It turns out that the vast majority of my write throughput was coming from the stats collector. Because of the very high number of relations in my database, the stats data is unusually large. I was able to diagnose the problem by temporarily clearing the stats:

SELECT pg_stat_reset()


Which led to an immediate, dramatic drop in my write throughput. To solve the issue going forward, I've redirected my stats directory to a ram disk instead of to the hard drive, using this setting: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.FeatureSupport.RamDisk

The end result is that CloudWatch is now reporting around ~400 KB/s of throughput instead of 40 MB/s.

Code Snippets

SELECT pg_stat_reset()

Context

StackExchange Database Administrators Q#189347, answer score: 8

Revisions (0)

No revisions yet.