
Is this DELETE ... WHERE EXISTS query plan O(n*2) or O(n^2)?

Submitted by: @import:stackexchange-dba

Problem

I'm trying to perform a common task: deleting duplicates from a table, with the aim of then adding a unique constraint.

```
CREATE TABLE IF NOT EXISTS item_identifier (
    pk       BIGSERIAL PRIMARY KEY,
    prefix   INTEGER NOT NULL,
    suffix   VARCHAR(1024) NOT NULL
);
CREATE INDEX temp_prefix_suffix_idx ON item_identifier (prefix, suffix);
```
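
For reference, the unique constraint I'm aiming for once the duplicates are gone would be something along these lines (the constraint name is illustrative):

```
ALTER TABLE item_identifier
    ADD CONSTRAINT item_identifier_prefix_suffix_key UNIQUE (prefix, suffix);
```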


I want to delete duplicates using a common query that can be found in many answers on this site. I think the rate of duplicates runs to about 1%, so there are not many to remove.

The index is there purely to help the de-duplication and will be dropped afterwards. Though, as you can see, it isn't even used by PostgreSQL!

There are 2,759,559,168 rows. The temp_prefix_suffix_idx index itself is ~100 GB. The CREATE INDEX took 12 hours, so I don't expect the DELETE to be quick. But from a 10% sample set I extrapolated that it would take 20 hours, and it has already taken 40 hours. That is probably still within the margin of error for my sampling method, but I am worried that the query will take quadratic rather than linear time because it isn't using the index.

The EXPLAIN output shows a Seq Scan on item_identifier a and a Seq Scan on item_identifier b:

```
EXPLAIN DELETE FROM item_identifier a
WHERE EXISTS
    (SELECT FROM item_identifier b
     WHERE a.prefix = b.prefix
     AND a.suffix = b.suffix
     AND a.pk > b.pk);
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Delete on item_identifier a  (cost=1168440103.12..1233224771.45 rows=0 width=0)
   ->  Merge Semi Join  (cost=1168440103.12..1233224771.45 rows=919853056 width=12)
         Merge Cond: ((a.prefix = b.prefix) AND ((a.suffix)::text = (b.suffix)::text))
         Join Filter: (a.pk > b.pk)
         ->  Sort  (cost=584220051.56..591118949.48 rows=2759559168 width=52)
               Sort Key: a.prefix, a.suffix
               ->  Seq Scan on item_identifier a  (cost=0.00..57175596.68 rows=2759559168 width=52)
```

Solution

That index is unlikely to be useful unless you have well over 100 GB of RAM. Otherwise, hitting a random, uncached index leaf page over 2.5 billion times will be a massive problem.

This deletion effort will be very frustrating, as PostgreSQL offers no way to introspect what is going on, and an interruption will cause all work done so far to be lost. So I would start out by creating a table (AS SELECT) that stores all the pks to delete. Once that is done, that is at least one chunk of work that can be reused if the next step fails. I would also store the system column item_identifier.ctid of each row, as well as the pk. It is not safe to use a ctid collected in one statement to identify rows in a later statement, but it should be safe enough to use it for ordering purposes.
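
A minimal sketch of that staging step, assuming the schema from the question (the table name dupes_to_delete and the column name row_ctid are illustrative):

```
-- Materialize the set of rows to delete once, so this work survives
-- a failure in the later DELETE step. The ctid is kept only as an
-- ordering hint, not as a reliable row identifier.
CREATE TABLE dupes_to_delete AS
SELECT a.pk, a.ctid AS row_ctid
FROM item_identifier a
WHERE EXISTS
    (SELECT FROM item_identifier b
     WHERE a.prefix = b.prefix
     AND a.suffix = b.suffix
     AND a.pk > b.pk);
```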

Planning estimates for deletions and updates are particularly bad, as they don't take into account the random IO needed to fetch each row in order to delete it. The top merge join, for example, is going to return rows ordered by (prefix, suffix), which likely means random ordering on ctid. So it will hop randomly all over the table to do the actual deletion, but the estimate does not account for that. Now it only has to do that for each row it deletes, but 1% of 2.7 billion is still not a small number. If the order of the pk matches the physical ordering of the table rows, the second plan might be better in that regard.
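
One way to act on that staging table, as a hedged sketch (batching is my extension of the idea above, not something spelled out here; the names and batch size are the illustrative ones from the previous sketch):

```
-- Each run deletes one batch, walking the staging table in rough
-- physical (ctid) order. Rows are identified by pk; row_ctid is used
-- only for ordering. Repeat until dupes_to_delete is empty.
WITH batch AS (
    DELETE FROM dupes_to_delete
    WHERE pk IN (SELECT pk
                 FROM dupes_to_delete
                 ORDER BY row_ctid
                 LIMIT 1000000)
    RETURNING pk
)
DELETE FROM item_identifier i
USING batch
WHERE i.pk = batch.pk;
```

Within a batch the planner may still visit rows in any order, but each batch covers a narrow ctid range, so the heap access stays localized; and an interruption now loses at most one batch of work rather than everything.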

How important this is depends, of course, on the relative speed of random versus sequential reads on your underlying storage.

Context

StackExchange Database Administrators Q#314052, answer score: 2
