principlesqlModerate

What is "special" about PostgreSQL update vs delete+insert

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

postgresqlupdateinsertwhatdeletespecialabout

Problem

My understanding is that an update locks a tuple, marks it as deleted, and then adds a new tuple.

In other words, update = delete + insert.

Or so I had believe. But it appears there something fundamentally different about update from delete+insert in MVCC.

Setup:

CREATE TABLE example (a int PRIMARY KEY, b int);
INSERT INTO example VALUES (1, 1);

Method 1: Update

-- session A                          session B
BEGIN;
UPDATE example SET b = 2 WHERE a = 1;
                                      DELETE FROM example WHERE a = 1;
COMMIT;
-- now there are 0 rows in table example (1 row was deleted by session B)

Method 2: Delete and insert

-- session A                          session B
BEGIN;
DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);
                                      DELETE FROM example WHERE a = 1;
COMMIT;
-- now there is 1 row in table example (0 rows deleted by session B)

Thus

UPDATE example SET b = 2 WHERE a = 1;

is different from

DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);

How am I to understand the MVCC nature of update? Does the tuple has some sort of MVCC "identity" that is preserved during the update? What is it?

Solution

Yes, there is a difference between UPDATE and DELETE + INSERT.

Let's use the pageinspect extension to look at the tuples and the tuple headers.

If you want to repeat my experiment, you have to drop and re-create the table in between. Also, there may be additional flags (hint bits) if you selected the rows before examining them.

The meaning of infomask2 and infomask can be found in src/include/access/htup_details.h, see the quotations at the end of the answer.
After the UPDATE:

SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');

 lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask |            t_attrs            
----+--------+--------+--------+-------------+------------+-------------------------------
  1 | 380943 | 380944 | (0,2)  |       16386 |        256 | {"\\x01000000","\\x02000000"}
  2 | 380944 |      0 | (0,2)  |       32770 |      10240 | {"\\x01000000","\\x02000000"}
(2 rows)

-
The first tuple is the dead one. Its t_ctid has been changed to point to the updated version.

This is one of the key points, so let me expand on this: the ctid of a tuple is the combination of the block number and the “line pointer” (lp in the query result. t_ctid is normally redundant, but in this case it is used to point to the new row version. This is the link between the original tuple and the updated version.

t_infomask2 is 2 (the number of columns) plus HEAP_HOT_UPDATED, so this row received a HOT update (there was enough space in the block, and there is no index). t_infomask is HEAP_XMIN_COMMITTED (a hint bit).

-
The second tuple is the new version.

t_infomask2 is 2 plus HEAP_ONLY_TUPLE, so this is “heap-only tuple” that is only reachable via the updated ctid of the old version. t_infomask is HEAP_XMAX_INVALID (true, it is 0) plus HEAP_UPDATED (this is the updated version).

After the DELETE + INSERT:

SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');

 lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask |            t_attrs            
----+--------+--------+--------+-------------+------------+-------------------------------
  1 | 380958 | 380961 | (0,1)  |        8194 |        256 | {"\\x01000000","\\x02000000"}
  2 | 380961 |      0 | (0,2)  |           2 |       2048 | {"\\x01000000","\\x02000000"}
(2 rows)

-
Again, the first tuple is the dead one.

t_infomask2 is 2 plus HEAP_KEYS_UPDATED (this is a deleted or updated tuple), and t_infomask is HEAP_XMIN_COMMITTED (the tuple was valid before it was deleted).

-
The second tuple is the inserted one:

t_infomask2 is 2 plus, and t_infomask is HEAP_XMAX_INVALID (it is 0), so this is a new tuple.

Explanation of the observed difference:

In READ COMMITTED isolation level, a transaction always sees the latest committed version of a row. The DELETE in session B has to lock the row and is blocked by the UPDATE or DELETE in session A.

The documentation explains what happens when the lock is released:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.

In the case of the UPDATE there is a link between the old and the new row version, so PostgreSQL locks and deletes the new row version, while in the case of DELETE + INSERT there is no valid version of the row after the lock is gone, and nothing is deleted.

So while in many respects UPDATE and DELETE + INSERT are quite similar in PostgreSQL, they are not the same: in the second case, there is no connection between the deleted and the inserted row.
Appendix: the meaning of infomask and infomask2

t_infomask:
`/*
* information stored in t_infomask:
*/
#define HEAP_HASNULL 0x0001 / has null attribute(s) /
#define HEAP_HASVARWIDTH 0x0002 / has variable-width attribute(s) /
#define HEAP_HASEXTERNAL 0x0004 /* has

Code Snippets

SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');

 lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask |            t_attrs            
----+--------+--------+--------+-------------+------------+-------------------------------
  1 | 380943 | 380944 | (0,2)  |       16386 |        256 | {"\\x01000000","\\x02000000"}
  2 | 380944 |      0 | (0,2)  |       32770 |      10240 | {"\\x01000000","\\x02000000"}
(2 rows)

SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');

 lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask |            t_attrs            
----+--------+--------+--------+-------------+------------+-------------------------------
  1 | 380958 | 380961 | (0,1)  |        8194 |        256 | {"\\x01000000","\\x02000000"}
  2 | 380961 |      0 | (0,2)  |           2 |       2048 | {"\\x01000000","\\x02000000"}
(2 rows)

Context

StackExchange Database Administrators Q#278476, answer score: 17

Revisions (0)

No revisions yet.