principlesqlModerate
What is "special" about PostgreSQL update vs delete+insert
Viewed 0 times
postgresqlupdateinsertwhatdeletespecialabout
Problem
My understanding is that an update locks a tuple, marks it as deleted, and then adds a new tuple.
In other words, update = delete + insert.
Or so I had believe. But it appears there something fundamentally different about update from delete+insert in MVCC.
Setup:
Method 1: Update
Method 2: Delete and insert
Thus
is different from
How am I to understand the MVCC nature of update? Does the tuple has some sort of MVCC "identity" that is preserved during the update? What is it?
In other words, update = delete + insert.
Or so I had believe. But it appears there something fundamentally different about update from delete+insert in MVCC.
Setup:
CREATE TABLE example (a int PRIMARY KEY, b int);
INSERT INTO example VALUES (1, 1);Method 1: Update
-- session A session B
BEGIN;
UPDATE example SET b = 2 WHERE a = 1;
DELETE FROM example WHERE a = 1;
COMMIT;
-- now there are 0 rows in table example (1 row was deleted by session B)
Method 2: Delete and insert
-- session A session B
BEGIN;
DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);
DELETE FROM example WHERE a = 1;
COMMIT;
-- now there is 1 row in table example (0 rows deleted by session B)
Thus
UPDATE example SET b = 2 WHERE a = 1;
is different from
DELETE FROM example WHERE a = 1;
INSERT INTO example VALUES (1, 2);
How am I to understand the MVCC nature of update? Does the tuple has some sort of MVCC "identity" that is preserved during the update? What is it?
Solution
Yes, there is a difference between
Let's use the
If you want to repeat my experiment, you have to drop and re-create the table in between. Also, there may be additional flags (hint bits) if you selected the rows before examining them.
The meaning of
After the
-
The first tuple is the dead one. Its
This is one of the key points, so let me expand on this: the
-
The second tuple is the new version.
After the
-
Again, the first tuple is the dead one.
-
The second tuple is the inserted one:
Explanation of the observed difference:
In
The documentation explains what happens when the lock is released:
In the case of the
So while in many respects
Appendix: the meaning of
`/*
* information stored in t_infomask:
*/
#define HEAP_HASNULL 0x0001 / has null attribute(s) /
#define HEAP_HASVARWIDTH 0x0002 / has variable-width attribute(s) /
#define HEAP_HASEXTERNAL 0x0004 /* has
UPDATE and DELETE + INSERT.Let's use the
pageinspect extension to look at the tuples and the tuple headers.If you want to repeat my experiment, you have to drop and re-create the table in between. Also, there may be additional flags (hint bits) if you selected the rows before examining them.
The meaning of
infomask2 and infomask can be found in src/include/access/htup_details.h, see the quotations at the end of the answer.After the
UPDATE:SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');
lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask | t_attrs
----+--------+--------+--------+-------------+------------+-------------------------------
1 | 380943 | 380944 | (0,2) | 16386 | 256 | {"\\x01000000","\\x02000000"}
2 | 380944 | 0 | (0,2) | 32770 | 10240 | {"\\x01000000","\\x02000000"}
(2 rows)-
The first tuple is the dead one. Its
t_ctid has been changed to point to the updated version.This is one of the key points, so let me expand on this: the
ctid of a tuple is the combination of the block number and the “line pointer” (lp in the query result. t_ctid is normally redundant, but in this case it is used to point to the new row version. This is the link between the original tuple and the updated version.t_infomask2 is 2 (the number of columns) plus HEAP_HOT_UPDATED, so this row received a HOT update (there was enough space in the block, and there is no index). t_infomask is HEAP_XMIN_COMMITTED (a hint bit).-
The second tuple is the new version.
t_infomask2 is 2 plus HEAP_ONLY_TUPLE, so this is “heap-only tuple” that is only reachable via the updated ctid of the old version. t_infomask is HEAP_XMAX_INVALID (true, it is 0) plus HEAP_UPDATED (this is the updated version).After the
DELETE + INSERT:SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');
lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask | t_attrs
----+--------+--------+--------+-------------+------------+-------------------------------
1 | 380958 | 380961 | (0,1) | 8194 | 256 | {"\\x01000000","\\x02000000"}
2 | 380961 | 0 | (0,2) | 2 | 2048 | {"\\x01000000","\\x02000000"}
(2 rows)-
Again, the first tuple is the dead one.
t_infomask2 is 2 plus HEAP_KEYS_UPDATED (this is a deleted or updated tuple), and t_infomask is HEAP_XMIN_COMMITTED (the tuple was valid before it was deleted).-
The second tuple is the inserted one:
t_infomask2 is 2 plus, and t_infomask is HEAP_XMAX_INVALID (it is 0), so this is a new tuple.Explanation of the observed difference:
In
READ COMMITTED isolation level, a transaction always sees the latest committed version of a row. The DELETE in session B has to lock the row and is blocked by the UPDATE or DELETE in session A.The documentation explains what happens when the lock is released:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.In the case of the
UPDATE there is a link between the old and the new row version, so PostgreSQL locks and deletes the new row version, while in the case of DELETE + INSERT there is no valid version of the row after the lock is gone, and nothing is deleted.So while in many respects
UPDATE and DELETE + INSERT are quite similar in PostgreSQL, they are not the same: in the second case, there is no connection between the deleted and the inserted row.Appendix: the meaning of
infomask and infomask2t_infomask:`/*
* information stored in t_infomask:
*/
#define HEAP_HASNULL 0x0001 / has null attribute(s) /
#define HEAP_HASVARWIDTH 0x0002 / has variable-width attribute(s) /
#define HEAP_HASEXTERNAL 0x0004 /* has
Code Snippets
SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');
lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask | t_attrs
----+--------+--------+--------+-------------+------------+-------------------------------
1 | 380943 | 380944 | (0,2) | 16386 | 256 | {"\\x01000000","\\x02000000"}
2 | 380944 | 0 | (0,2) | 32770 | 10240 | {"\\x01000000","\\x02000000"}
(2 rows)SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask, t_attrs
FROM heap_page_item_attrs(get_raw_page('example', 0), 'example');
lp | t_xmin | t_xmax | t_ctid | t_infomask2 | t_infomask | t_attrs
----+--------+--------+--------+-------------+------------+-------------------------------
1 | 380958 | 380961 | (0,1) | 8194 | 256 | {"\\x01000000","\\x02000000"}
2 | 380961 | 0 | (0,2) | 2 | 2048 | {"\\x01000000","\\x02000000"}
(2 rows)Context
StackExchange Database Administrators Q#278476, answer score: 17
Revisions (0)
No revisions yet.