patternsqlMinor
Adding nullable column to table costs more than 10 minutes
Viewed 0 times
columnthanmoreaddingnullableminutescoststable
Problem
I have problems to add a new column on a table.
I tried to run it a couple of times, but after more than 10 minutes running, I decided to cancel the query because of lock time.
Useful information:
I found interesting information about the way PostgreSQL manages nullable columns (via HeapTupleHeader).
My first guess is that because this table already has 32 nullable columns with 8-bits
So adding a new nullable column could need an update of HeapTupleHeader on every row to add a new 8-bits
So I tried to alter one of the nullable columns (which is not really nullable in reality) in order to decrease to 31 the number of nullable columns, to check if my guess could be true.
Unfortunately, this alter also takes very long time, more than 5 minutes, so I also aborted it.
Do you have an idea of what could cause this performance cost?
I tried to run it a couple of times, but after more than 10 minutes running, I decided to cancel the query because of lock time.
ALTER TABLE mytable ADD mycolumn VARCHAR(50);Useful information:
- PostgreSQL version: 9.1
- Number of rows: ~ 250K
- Number of columns: 38
- Number of nullable columns: 32
- Number of constraints: 5 (1 PK, 3 FK, 1 UNIQUE)
- Number of indexes: 1
- OS type: Debian Squeeze 64
I found interesting information about the way PostgreSQL manages nullable columns (via HeapTupleHeader).
My first guess is that because this table already has 32 nullable columns with 8-bits
MAXALIGN, HeapTupleHeader is 4 Bytes length (not verified, and I don't know how to do so).So adding a new nullable column could need an update of HeapTupleHeader on every row to add a new 8-bits
MAXALIGN, which could cause performance issues.So I tried to alter one of the nullable columns (which is not really nullable in reality) in order to decrease to 31 the number of nullable columns, to check if my guess could be true.
ALTER TABLE mytable ALTER myothercolumn SET NOT NULL;Unfortunately, this alter also takes very long time, more than 5 minutes, so I also aborted it.
Do you have an idea of what could cause this performance cost?
Solution
There are a couple of misunderstandings here:
The null bitmap is not part of the heap tuple header. Per documentation:
There is a fixed-size header (occupying 23 bytes on most machines),
followed by an optional null bitmap ...
Your 32 nullable columns are unsuspicious for two reasons:
-
The null bitmap is added per row, and only if there is at least one actual
-
The null bitmap is allocated after the heap tuple header and followed by an optional OID and then row data. The start of an OID or row data is indicated by
Note that t_hoff must be a multiple of MAXALIGN.
-
There is one free byte after the heap tuple header, which occupies 23 bytes. So the null bitmap for rows up to 8 columns effectively comes at no additional cost. With the 9th column in the table,
To display control information of a PostgreSQL database cluster (incl.
I updated instructions in the related answer you quoted.
All that aside, even if your
The growing number of entries in
A few shots in the dark
Check for possible table bloat, try a gentle
I would start by inspecting possible issues with indexes, triggers, foreign key or other constraints, especially those involving the column. Especially a corrupted index might be involved? Try
Try running the command in the night or whenever there is not much load.
A brute-force method would be to stop access to the server, then try again:
Without being able to pin it down, upgrading to the current version or the upcoming 9.4 in particular might help. There have been several improvements for big tables and for locking details. But if there is something broken in your DB, you should probably figure that out first.
The null bitmap is not part of the heap tuple header. Per documentation:
There is a fixed-size header (occupying 23 bytes on most machines),
followed by an optional null bitmap ...
Your 32 nullable columns are unsuspicious for two reasons:
-
The null bitmap is added per row, and only if there is at least one actual
NULL value in the row. Nullable columns have no direct impact, only actual NULL values do. If the null bitmap is allocated, it's always allocated completely (all or nothing). The actual size of the null bitmap is 1 bit per column, rounded up to the next byte. Per current souce code:#define BITMAPLEN(NATTS) (((int)(NATTS) + 7) / 8)-
The null bitmap is allocated after the heap tuple header and followed by an optional OID and then row data. The start of an OID or row data is indicated by
t_hoff in the header. Per comment source code:Note that t_hoff must be a multiple of MAXALIGN.
-
There is one free byte after the heap tuple header, which occupies 23 bytes. So the null bitmap for rows up to 8 columns effectively comes at no additional cost. With the 9th column in the table,
t_hoff is advanced another MAXALIGN (typically 8) bytes to provide for another 64 columns. So the next border would be at 72 columns.To display control information of a PostgreSQL database cluster (incl.
MAXALIGN), example for a typical installation of Postgres 9.3 on a Debian machine:sudo /usr/lib/postgresql/9.3/bin/pg_controldata /var/lib/postgresql/9.3/mainI updated instructions in the related answer you quoted.
All that aside, even if your
ALTER TABLE statement triggers a whole table rewrite (which it probably does, changing a data type), 250K are really not that much and would be a matter of seconds on any halfway decent machine (unless rows are unusually big). 10 minutes or more indicate a completely different problem. Your statement is waiting to get a lock on the table, most likely.The growing number of entries in
pg_stat_activity means more open transactions - indicates concurrent access on the table (most likely) that has to wait for the operation to finish.A few shots in the dark
Check for possible table bloat, try a gentle
VACUUM mytable or a more aggressive VACUUM FULL mytable - which might encounter the same concurrency issues, since this form also acquires an exclusive lock. You could try pg_repack instead ...I would start by inspecting possible issues with indexes, triggers, foreign key or other constraints, especially those involving the column. Especially a corrupted index might be involved? Try
REINDEX TABLE mytable; or DROP all of them and re-add them after ALTER TABLE in the same transaction.Try running the command in the night or whenever there is not much load.
A brute-force method would be to stop access to the server, then try again:
- Force drop db while others may be connected in postgresql
Without being able to pin it down, upgrading to the current version or the upcoming 9.4 in particular might help. There have been several improvements for big tables and for locking details. But if there is something broken in your DB, you should probably figure that out first.
Code Snippets
#define BITMAPLEN(NATTS) (((int)(NATTS) + 7) / 8)sudo /usr/lib/postgresql/9.3/bin/pg_controldata /var/lib/postgresql/9.3/mainContext
StackExchange Database Administrators Q#78405, answer score: 9
Revisions (0)
No revisions yet.