HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Why can MySQL handle multiple updates concurrently and PostgreSQL can't?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlwhycanhandlemysqlmultipleandupdatesconcurrently

Problem

Let's imagine you have a table with this definition:

CREATE TABLE public.positions
(
   id serial,
   latitude numeric(18,12),
   longitude numeric(18,12),
   updated_at timestamp without time zone
)


And you have 50,000 rows in this table. Now for testing purposes you will run an update like this:

update positions
set  updated_at = now()
where latitude between 234.12 and 235.00;


That statement will update 1,000 rows from the 50,000 (in this specific dataset).

If you run such a query in 30 different threads*, MySQL innodb will succeed and PostgreSQL will fail with lots of deadlocks.

Why?

* I'm comparing the latest version of MySQL innodb vs Postgres, this is a concurrent update case. Production cases: imagine 5000 stocks being updated with the latest price available constantly.

Solution

HISTORY LESSON

On Mar 13, 2013, Uber had switched from MySQL to PostgrsSQL.

Surprisingly, that love affair did not last very long.

On Jun 26, 2016, Uber had switched from PostgreSQL to MySQL.

Why the about face ???
YOUR ACTUAL QUESTION

Running an UPDATE is surprisingly micromanaged in PostgreSQL.

There are two DDL System Identifiers in PostgreSQL that must be properly understood

  • ctid : The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.



  • oid : The object identifier (object ID) of a row. This column is only present if the table was created using WITH OIDS, or if the default_with_oids configuration variable was set at the time. This column is of typeoid (same name as the column); see PostgreSQL Documentation on Object Identifier Types for more information about the type.



With regard to ctids, please note that performing and UPDATE (DELETE/INSERT from a physical standpoint) or VACUUM FULL will change a row's ctid. This does not bode well for tables with many indexes. Why ? This fact was recently discovered (July 26, 2016) by the Data Engineers at Uber.

Please note the following diagram (courtesy of Uber Engineering)

PostgreSQL does not cohesively couple a PRIMARY KEY with non-unique indexes. Since an index references a row by ctid, a simple UPDATE (even on a non-indexed column) will change the ctid, resulting in the need to rewrite the ctid in every index in the table that references that changed row. This is not something new. PostgreSQL has always done this by design.

Therefore, doing an UPDATE on 1000 rows will perform 1000 DELETEs and 1000 INSERTs. As already mentioned, every index attached to the table must have their new row ctid values written in the BTREE index entries, replacing the old row ctid value. Since you have no indexes on the table, you are just experiencing the DELETEs and INSERTs generating the deadlocks you see.
SUPPLEMENTAL INFO

Someone once posted a discussion about this and how to get around it. Others take advantage of the exposure of the Systems Identifiers available to their WHERE clauses:

  • Jun 04, 2002 : ctid & updates (or speedy updates/deletes)



  • Jul 05, 2011 : Delete completely duplicate rows in PostgreSQL (and keep only 1)



  • Nov 26, 2012 : Detecting Table Rewrites with the ctid Column



  • May 27, 2014 : How do I decompose ctid into page and row numbers?



The default transaction isolation level is READ COMMITTED. Having indexes that move would provided better consistency of data under this isolation level, but it comes at the cost of the constant rewriting of ctid values in the indexes and well as the MongoDB-like behavior of deleting and inserting a whole new row just for a simple UPDATE. While other transaction isolation levels are possible (REPEATABLE READ, READ UNCOMMITTED, SERIALIZABLE), it may require some specialized SQL against ctid and oid values to maintain needed views of the data in any given transaction. Such specialized queries may be helpful for SELECTs but would be rather untrustworthy for UPDATEs and DELETEs since there is also a VACUUM daemon running and the possibility of ctid going obsolete would be huge. In that case, it would be wise to reference rows in queries by the oid rather that citd.
WHAT ABOUT INNODB ???

InnoDB does not jump through hoops like PostgreSQL to perform an UPDATE. InnoDB does not compel you to write specialized SQL commands that harness rowid-type info to juice up an UPDATE.

InnoDB's default isolation is level is REPEATABLE READ. This is much easier to maintain than READ COMMITTED along with the fact that InnoDB rowids never change.

In order for InnoDB to have this same bad behavior, you would have to run the REPLACE command (which is a mechanical DELETE and INSERT). PostgreSQL performs this operation automatically and there is nothing you can do about it (unless you are brave enough to get the source code and fix or improve the UPDATE process).

Context

StackExchange Database Administrators Q#151813, answer score: 12

Revisions (0)

No revisions yet.