patternsqlModerate
PostgreSQL: When should I use fillfactor < 100 for tables on SSD?
Viewed 0 times
postgresqltablesssdfillfactor100forshouldusewhen
Problem
PostgreSQL documentation says that choosing smaller
"This gives UPDATE a chance to place the updated copy of a row on the
same page as the original, which is more efficient than placing it on
a different page."
Indeed this could improve UPDATE performance if cluster was stored on regular HDD drives, but mine is on SSD. As far as I understand setting
Do you have a counterexample for my case?
fillfactor lets table reserve space for future updates on the same page and"This gives UPDATE a chance to place the updated copy of a row on the
same page as the original, which is more efficient than placing it on
a different page."
Indeed this could improve UPDATE performance if cluster was stored on regular HDD drives, but mine is on SSD. As far as I understand setting
fillfactor < 100 will not give me any performance benefit but only waste space on a (relatively) expensive storage.Do you have a counterexample for my case?
Solution
You're addressing the performance problems that arise from random access, which as you point out is solved well enough by eliminating the cost of random access altogether with an SSD. However, postgres does offer "Heap Only Tuple (HOT) updates" (some notes here and here and here), which may benefit you from having a fillfactor less than 100% regardless of the physical medium in the case that you make updates moderately to frequently (the more frequent, the more the savings) and those updates do not affect any indexed columns.
Imagine if the fillfactor were exactly 100% and over time your table's data fills exactly one page. On the next update (assume it's of just one row), a copy of the updated row is made to the next page and the original row is left on the current page; if the transaction is committed the pre-update version of that row is marked as dead to be removed by a vacuum and the post-update version is marked as live. Now, let's say that update only affected a column "data_value" for a specific PK value, like
With HOT updates, as long as you don't update an indexed column, the database can avoid having to update index entries by keeping the post-update tuple on the same page as the pre-update table, which can speed up your updates. On top of that, HOT updates provide a bit of automatic maintenance and keep the page freer of those dead rows, reducing the need for vacuuming and auto-vacuuming.
Also, even in the case of data access, you might still benefit from having your data localized. Even if a random page read from an SSD takes a negligible amount of time, having to perform many reads is still (many * negligible time) more expensive than just reading the minimum amount of pages required to hold the data.
Imagine if the fillfactor were exactly 100% and over time your table's data fills exactly one page. On the next update (assume it's of just one row), a copy of the updated row is made to the next page and the original row is left on the current page; if the transaction is committed the pre-update version of that row is marked as dead to be removed by a vacuum and the post-update version is marked as live. Now, let's say that update only affected a column "data_value" for a specific PK value, like
update table_name set data_value = 1 where pk = 3. In this case, the heap location of the data changed pages, so the index supporting the PK constraint has to be updated to reflect that change. Regardless of the physical medium, that's a potential performance hit. Moreover, that dead tuple is left lingering until vacuum reclaims the space.With HOT updates, as long as you don't update an indexed column, the database can avoid having to update index entries by keeping the post-update tuple on the same page as the pre-update table, which can speed up your updates. On top of that, HOT updates provide a bit of automatic maintenance and keep the page freer of those dead rows, reducing the need for vacuuming and auto-vacuuming.
Also, even in the case of data access, you might still benefit from having your data localized. Even if a random page read from an SSD takes a negligible amount of time, having to perform many reads is still (many * negligible time) more expensive than just reading the minimum amount of pages required to hold the data.
Context
StackExchange Database Administrators Q#56735, answer score: 11
Revisions (0)
No revisions yet.