HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PostgreSQL DELETE query hangs (and so does EXPLAIN ANALYZE!)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlanalyzedeletequerydoesandexplainhangs

Problem

I'm trying to delete a bunch of rows from a table matching a query. The general form of my query is:

DELETE FROM mytable WHERE _id IN (SELECT _id FROM mytable WHERE ...);

The _id column is a SERIAL PRIMARY KEY.

If I run the inner SELECT query on its own it runs in about 1 second and returns about 100,000 rows. But when I add the DELETE call onto it, it seems to just sit and chug away. And chug away. I let it run for about a minute or so, and then cancelled it assuming no progress was being made.

I added EXPLAIN ANALYZE onto the front of it to see if I could check what was taking so long, but that call also hangs for a very long period of time.

Can anybody tell me what could be going on such that it's very quick to identify the rows I'd like to delete but takes an indeterminate amount of time to delete them?

Update: The full query is as follows.

DELETE FROM cards 
WHERE _id IN (
    SELECT _id
    FROM cards
    LEFT JOIN game_results ON game_results.card_id = cards._id 
    WHERE NOT available
    AND game_id IS NULL
)


A simplified version of the tables would be:

CREATE TABLE cards (_id INTEGER PRIMARY KEY, available BOOLEAN);
CREATE TABLE games (_id INTEGER PRIMARY KEY);
CREATE TABLE game_results (game_id INTEGER REFERENCES games, card_id INTEGER REFERENCES cards);


I'm sure this query would eventually finish running, I'm just surprised at how long it's taking given that retrieving all of the IDs to delete is so quick.

Solution:

The problem was three-fold!

-
I had one of the earlier (very slow!) queries running that had been triggered by an aborted script. The query was still running even though the script had been aborted. As such, operations like trying to drop indexes were locked waiting for the query to finish. Manually cancelling that query using pg_cancel_backend allowed me to experiment with dropping indexes and foreign key constraints.

-
Foreign key constraints seem to be the issue slowing everything down. The

Solution

Assuming that the columns are not nullable, the query can be simplified (without the self join) to using either NOT IN or NOT EXISTS:

DELETE FROM cards 
WHERE available = FALSE 
  AND _id NOT IN 
      ( SELECT card_id
        FROM game_results 
      ) ;

DELETE FROM cards AS c 
WHERE c.available = FALSE
  AND NOT EXISTS 
      ( SELECT * 
        FROM game_results AS gr
        WHERE gr.card_id = c._id
      ) ;


Still, deleting 100K rows can not be instantaneous. Performance may be affected by lack (or excess) of indexes, triggers, cascade deletes, etc.

You can use EXPLAIN (only, without ANALYZE) to get the execution plan of a DELETE. See EXPLAIN for details, especially if you want to try ANALYZE or other options of EXPLAIN:

Important: Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, use this approach:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Code Snippets

DELETE FROM cards 
WHERE available = FALSE 
  AND _id NOT IN 
      ( SELECT card_id
        FROM game_results 
      ) ;

DELETE FROM cards AS c 
WHERE c.available = FALSE
  AND NOT EXISTS 
      ( SELECT * 
        FROM game_results AS gr
        WHERE gr.card_id = c._id
      ) ;
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Context

StackExchange Database Administrators Q#166747, answer score: 8

Revisions (0)

No revisions yet.