patternsqlModerate
Why can MySQL handle multiple updates concurrently and PostgreSQL can't?
Viewed 0 times
postgresqlwhycanhandlemysqlmultipleandupdatesconcurrently
Problem
Let's imagine you have a table with this definition:
And you have 50,000 rows in this table. Now for testing purposes you will run an update like this:
That statement will update 1,000 rows from the 50,000 (in this specific dataset).
If you run such a query in 30 different threads*, MySQL innodb will succeed and PostgreSQL will fail with lots of deadlocks.
Why?
* I'm comparing the latest version of MySQL innodb vs Postgres, this is a concurrent update case. Production cases: imagine 5000 stocks being updated with the latest price available constantly.
CREATE TABLE public.positions
(
id serial,
latitude numeric(18,12),
longitude numeric(18,12),
updated_at timestamp without time zone
)And you have 50,000 rows in this table. Now for testing purposes you will run an update like this:
update positions
set updated_at = now()
where latitude between 234.12 and 235.00;That statement will update 1,000 rows from the 50,000 (in this specific dataset).
If you run such a query in 30 different threads*, MySQL innodb will succeed and PostgreSQL will fail with lots of deadlocks.
Why?
* I'm comparing the latest version of MySQL innodb vs Postgres, this is a concurrent update case. Production cases: imagine 5000 stocks being updated with the latest price available constantly.
Solution
HISTORY LESSON
On Mar 13, 2013, Uber had switched from MySQL to PostgrsSQL.
Surprisingly, that love affair did not last very long.
On Jun 26, 2016, Uber had switched from PostgreSQL to MySQL.
Why the about face ???
YOUR ACTUAL QUESTION
Running an
There are two DDL System Identifiers in PostgreSQL that must be properly understood
With regard to ctids, please note that performing and
Please note the following diagram (courtesy of Uber Engineering)
PostgreSQL does not cohesively couple a
Therefore, doing an
SUPPLEMENTAL INFO
Someone once posted a discussion about this and how to get around it. Others take advantage of the exposure of the Systems Identifiers available to their WHERE clauses:
The default transaction isolation level is
WHAT ABOUT INNODB ???
InnoDB does not jump through hoops like PostgreSQL to perform an UPDATE. InnoDB does not compel you to write specialized SQL commands that harness rowid-type info to juice up an UPDATE.
InnoDB's default isolation is level is
In order for InnoDB to have this same bad behavior, you would have to run the REPLACE command (which is a mechanical DELETE and INSERT). PostgreSQL performs this operation automatically and there is nothing you can do about it (unless you are brave enough to get the source code and fix or improve the
On Mar 13, 2013, Uber had switched from MySQL to PostgrsSQL.
Surprisingly, that love affair did not last very long.
On Jun 26, 2016, Uber had switched from PostgreSQL to MySQL.
Why the about face ???
YOUR ACTUAL QUESTION
Running an
UPDATE is surprisingly micromanaged in PostgreSQL.There are two DDL System Identifiers in PostgreSQL that must be properly understood
ctid: The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row'sctidwill change if it is updated or moved byVACUUM FULL. Thereforectidis useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.
oid: The object identifier (object ID) of a row. This column is only present if the table was created using WITH OIDS, or if the default_with_oids configuration variable was set at the time. This column is of typeoid (same name as the column); see PostgreSQL Documentation on Object Identifier Types for more information about the type.
With regard to ctids, please note that performing and
UPDATE (DELETE/INSERT from a physical standpoint) or VACUUM FULL will change a row's ctid. This does not bode well for tables with many indexes. Why ? This fact was recently discovered (July 26, 2016) by the Data Engineers at Uber.Please note the following diagram (courtesy of Uber Engineering)
PostgreSQL does not cohesively couple a
PRIMARY KEY with non-unique indexes. Since an index references a row by ctid, a simple UPDATE (even on a non-indexed column) will change the ctid, resulting in the need to rewrite the ctid in every index in the table that references that changed row. This is not something new. PostgreSQL has always done this by design.Therefore, doing an
UPDATE on 1000 rows will perform 1000 DELETEs and 1000 INSERTs. As already mentioned, every index attached to the table must have their new row ctid values written in the BTREE index entries, replacing the old row ctid value. Since you have no indexes on the table, you are just experiencing the DELETEs and INSERTs generating the deadlocks you see.SUPPLEMENTAL INFO
Someone once posted a discussion about this and how to get around it. Others take advantage of the exposure of the Systems Identifiers available to their WHERE clauses:
Jun 04, 2002: ctid & updates (or speedy updates/deletes)
Jul 05, 2011: Delete completely duplicate rows in PostgreSQL (and keep only 1)
Nov 26, 2012: Detecting Table Rewrites with the ctid Column
May 27, 2014: How do I decompose ctid into page and row numbers?
The default transaction isolation level is
READ COMMITTED. Having indexes that move would provided better consistency of data under this isolation level, but it comes at the cost of the constant rewriting of ctid values in the indexes and well as the MongoDB-like behavior of deleting and inserting a whole new row just for a simple UPDATE. While other transaction isolation levels are possible (REPEATABLE READ, READ UNCOMMITTED, SERIALIZABLE), it may require some specialized SQL against ctid and oid values to maintain needed views of the data in any given transaction. Such specialized queries may be helpful for SELECTs but would be rather untrustworthy for UPDATEs and DELETEs since there is also a VACUUM daemon running and the possibility of ctid going obsolete would be huge. In that case, it would be wise to reference rows in queries by the oid rather that citd.WHAT ABOUT INNODB ???
InnoDB does not jump through hoops like PostgreSQL to perform an UPDATE. InnoDB does not compel you to write specialized SQL commands that harness rowid-type info to juice up an UPDATE.
InnoDB's default isolation is level is
REPEATABLE READ. This is much easier to maintain than READ COMMITTED along with the fact that InnoDB rowids never change.In order for InnoDB to have this same bad behavior, you would have to run the REPLACE command (which is a mechanical DELETE and INSERT). PostgreSQL performs this operation automatically and there is nothing you can do about it (unless you are brave enough to get the source code and fix or improve the
UPDATE process).Context
StackExchange Database Administrators Q#151813, answer score: 12
Revisions (0)
No revisions yet.