HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PostgreSQL Initial Database Size

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlsizedatabaseinitial

Problem

There are 2 parts to my question.

  • Is there a way of specifying the initial size of a database in PostgreSQL?



  • If there isn't, how do you deal with fragmentation when the database grows over time?



I've recently migrated from MSSQL to Postgres, and one of the things we did in the MSSQL world when creating a database was to specify the initial size of the database and transaction log. This reduced fragmentation and increased performance, especially if the "normal" size of the database is known beforehand.

The performance of my database drops as the size grows. For example, the workload I'm putting it through normally takes 10 minutes. As the database grows, this time increases. Doing a VACUUM, VACUUM FULL and VACUUM FULL ANALYSE do not appear to solve the issue. What does solve the performance problem is stopping the database, de-fragmenting the drive and then doing a VACUUM FULL ANALYSE takes the performance of my test back to the original 10 minutes. This leads me to suspect that fragmentation is what's causing me pain.

I've not been able to find any reference to reserving tablespace/database space in Postgres. Either I'm using the wrong terminology and thus finding nothing, or there is a different way of mitigating filesystem fragmentation in Postgres.

Any pointers?

The Solution

The supplied answers helped confirm what I'd begun to suspect. PostgreSQL stores the database across multiple files and this is what allows the database to grow without worry of fragmentation. The default behaviour is to pack these files to the brim with table data, which is good for tables that rarely change but is bad for tables that a frequently updated.

PostgreSQL utilizes MVCC to provide concurrent access to table data. Under this scheme, each update creates a new version of the row that was updated (this could be via time stamp or version number, who knows?). The old data is not immediately deleted, but marked for deletion. The actual deletion occurs when a VACUUM operati

Solution

There is another thing in play that hasn't entered your equations yet: HOT update. Related answers:

  • PostgreSQL unexplained table bloat



  • Are regular VACUUM ANALYZE still recommended under 9.1?



  • What is the meaning of n_live_tup and n_dead_tup in pg_stat_user_tables



  • How to speed up a slow UPDATE query



Setting FILLFACTOR to as low as 20 does seem excessive. It bloats the table to up to five times its size. If HOT updates work, you shouldn't have to go that low - normally.

There are exceptions: HOT updates can only reuse dead tuples from previous transactions, not from the same or concurrent ones. Therefore, heavy concurrent load or long transactions repeatedly updating the same rows can warrant such a low (or even lower) setting.

If you have big updates, changing large portions of the table at once, you might want to split them up in a couple of chunks, ideally only changing as many rows at once as fit locally on the data page. But that's hard to estimate and regulate.

Note that HOT updates only work when the changed columns are not involved in indexes in any way (neither as data nor as condition in a partial index). You might be blocking HOT updates with indexes on updated columns. If those are expendable, you might get better overall performance without them.

Finally, you can set autovacuum parameters per table. You could target heavily updated tables with aggressive settings allowing a somewhat tighter packing of rows than only FILLFACTOR 20.

Context

StackExchange Database Administrators Q#21014, answer score: 6

Revisions (0)

No revisions yet.