patternsqlCritical
VACUUM returning disk space to operating system
Viewed 0 times
vacuumspacedisksystemoperatingreturning
Problem
VACUUM usually does not return disk space to operating system, except in some special cases.From the docs:
The standard form of
VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.The question is: how can this database state when
one or more pages at the end of a table become entirely free be achieved? This can be done via VACUUM FULL, but I haven't got enough space to implement it. So are there any other possibilities?Solution
To return space to the OS, use
Selects "full" vacuum, which can reclaim more space, but takes much
longer and exclusively locks the table. This method also requires
extra disk space, since it writes a new copy of the table and doesn't
release the old copy until the operation is complete. Usually this
should only be used when a significant amount of space needs to be
reclaimed from within the table.
Bold emphasis mine.
Plain
You can get empty pages at the end of the physical file when you
There are also special settings that might prevent
Prepare empty pages at the end of a table for testing
The system column
We can work with that and prepare a table by deleting all rows from the last page:
Now, the last page is empty. This ignores concurrent writes. Either you are the only one writing to that table or you need to to take a write lock to prevent race conditions.
The query is optimized to identify qualifying rows quickly. The second number of a
db<>fiddle here (reusing a simple table from a different case.)
Old sqlfiddle
Tools to measure row / table size:
Disk full
You need wiggle room on disk for any of these operations. There is also the community tool
Requires free disk space twice as large as the target table(s) and indexes.
As a last resort, you can run a dump/restore cycle. That removes all bloat from tables and indexes, too. Closely related question:
The answer over there is pretty radical. If your situation allows for it (no foreign keys or other references preventing row deletions), and no concurrent access to the table), you can just:
Dump the table to disk connecting from a remote computer with plenty of disk space (
From a remote shell, dump table data:
In a pg session,
From remote shell, restore to same table:
It is now free of any dead rows or bloat.
But maybe you can have that simpler?
-
Can you make enough space on disk by deleting (moving) unrelated files?
-
Can you
-
Can you run
Whatever you do, don't be rash. If in doubt, backup everything to a secure location first.
VACUUM FULL. While being at it, I suppose you run VACUUM FULL ANALYZE. I quote the manual:FULLSelects "full" vacuum, which can reclaim more space, but takes much
longer and exclusively locks the table. This method also requires
extra disk space, since it writes a new copy of the table and doesn't
release the old copy until the operation is complete. Usually this
should only be used when a significant amount of space needs to be
reclaimed from within the table.
Bold emphasis mine.
CLUSTER achieves that, too, as a collateral effect.Plain
VACUUM does not normally achieve your goal ("one or more pages at the end of a table entirely free"). It does not reorder rows and only prunes empty pages from the physical end of the file when the opportunity arises - like your quote from the manual reveals.You can get empty pages at the end of the physical file when you
INSERT a batch of rows and DELETE them before other tuples get appended. Or it can happen by coincidence if enough rows are deleted.There are also special settings that might prevent
VACUUM FULL from reclaiming space. See:- How to optimize a table for a very high-frequency updates?
Prepare empty pages at the end of a table for testing
The system column
ctid represents the physical position of a row. You need to understand that column:- How do I decompose ctid into page and row numbers?
We can work with that and prepare a table by deleting all rows from the last page:
DELETE FROM tbl t
USING (
SELECT (split_part(ctid::text, ',', 1) || ',0)')::tid AS min_tid
, (split_part(ctid::text, ',', 1) || ',65535)')::tid AS max_tid
FROM tbl
ORDER BY ctid DESC
LIMIT 1
) d
WHERE t.ctid BETWEEN d.min_tid AND d.max_tid;Now, the last page is empty. This ignores concurrent writes. Either you are the only one writing to that table or you need to to take a write lock to prevent race conditions.
The query is optimized to identify qualifying rows quickly. The second number of a
tid is the tuple index stored as unsigned int2, and 65535 is the maximum for that type (2^16 - 1), so that's the safe upper bound.db<>fiddle here (reusing a simple table from a different case.)
Old sqlfiddle
Tools to measure row / table size:
- Measure the size of a PostgreSQL table row
Disk full
You need wiggle room on disk for any of these operations. There is also the community tool
pg_repack as replacement for VACUUM FULL / CLUSTER. It avoids exclusive locks but needs free space to work with as well. The manual:Requires free disk space twice as large as the target table(s) and indexes.
As a last resort, you can run a dump/restore cycle. That removes all bloat from tables and indexes, too. Closely related question:
- I need to run VACUUM FULL with no available disk space
The answer over there is pretty radical. If your situation allows for it (no foreign keys or other references preventing row deletions), and no concurrent access to the table), you can just:
Dump the table to disk connecting from a remote computer with plenty of disk space (
-a for --data-only):From a remote shell, dump table data:
pg_dump -h -p -t mytbl -a mydb > db_mytbl.sqlIn a pg session,
TRUNCATE the table:-- drop all indexes and constraints here for best performance
TRUNCATE mytbl;From remote shell, restore to same table:
psql -h -p mydb -f db_mytbl.sql
-- recreate all indexes and constraints hereIt is now free of any dead rows or bloat.
But maybe you can have that simpler?
-
Can you make enough space on disk by deleting (moving) unrelated files?
-
Can you
VACUUM FULL smaller tables first, one by one, thereby freeing up enough disk space?-
Can you run
REINDEX TABLE or REINDEX INDEX to free disk space from bloated indexes?Whatever you do, don't be rash. If in doubt, backup everything to a secure location first.
Code Snippets
DELETE FROM tbl t
USING (
SELECT (split_part(ctid::text, ',', 1) || ',0)')::tid AS min_tid
, (split_part(ctid::text, ',', 1) || ',65535)')::tid AS max_tid
FROM tbl
ORDER BY ctid DESC
LIMIT 1
) d
WHERE t.ctid BETWEEN d.min_tid AND d.max_tid;pg_dump -h <host_name> -p <port> -t mytbl -a mydb > db_mytbl.sql-- drop all indexes and constraints here for best performance
TRUNCATE mytbl;psql -h <host_name> -p <port> mydb -f db_mytbl.sql
-- recreate all indexes and constraints hereContext
StackExchange Database Administrators Q#37028, answer score: 58
Revisions (0)
No revisions yet.