patternsqlMinor
pgsql - Delete data from huge table where not in
Viewed 0 times
deletewherehugepgsqlfromdatatablenot
Problem
I need to delete some rows from one large table. The rows to delete shouldn't be in another table, example:
but my server can't handle such blunt query, because there are almost a million records in the LargeTable and few million records in the EvenLargerTable
How can I solve it?
DELETE FROM LargeTable WHERE id IS NOT IN (SELECT DISTINCT foreign_id from EvenLargerTable)but my server can't handle such blunt query, because there are almost a million records in the LargeTable and few million records in the EvenLargerTable
How can I solve it?
Solution
For large tables used for test of existence
It is hard to explain why it's happening without having actual data. But with simple playground we can see that
Playground:
Tests:
NOT EXISTS frequently works faster then NOT IN. So tryDELETE FROM LargeTable
WHERE NOT EXISTS (
SELECT *
FROM EvenLargerTable
WHERE EvenLargerTable.foreign_id = LargeTable.id);It is hard to explain why it's happening without having actual data. But with simple playground we can see that
NOT IN case does not used indexes to perform the operation:Playground:
-- drop table if exists a;
-- drop table if exists b;
create table a as select (random()*1000)::int as x from generate_series(1,10000);
create index idx_a on a(x);
create table b as select (random()*1000)::int*10 as x from generate_series(1,1000000);
create index idx_b on b(x);
analyse a;
analyse b;Tests:
nd@postgres=# explain (verbose) delete from a where a.x not in (select b.x from b);
╔══════════════════════════════════════════════════════════════════════════════════╗
║ QUERY PLAN ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║ Delete on nd.a (cost=0.00..129160170.00 rows=5000 width=6) ║
║ -> Seq Scan on nd.a (cost=0.00..129160170.00 rows=5000 width=6) ║
║ Output: a.ctid ║
║ Filter: (NOT (SubPlan 1)) ║
║ SubPlan 1 ║
║ -> Materialize (cost=0.00..23332.00 rows=1000000 width=4) ║
║ Output: b.x ║
║ -> Seq Scan on nd.b (cost=0.00..14425.00 rows=1000000 width=4) ║
║ Output: b.x ║
╚══════════════════════════════════════════════════════════════════════════════════╝
nd@postgres=# explain (verbose) delete from a where not exists (select * from b where a.x=b.x);
╔══════════════════════════════════════════════════════════════════════════════════╗
║ QUERY PLAN ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║ Delete on nd.a (cost=0.42..5005.91 rows=1 width=12) ║
║ -> Nested Loop Anti Join (cost=0.42..5005.91 rows=1 width=12) ║
║ Output: a.ctid, b.ctid ║
║ -> Seq Scan on nd.a (cost=0.00..145.00 rows=10000 width=10) ║
║ Output: a.ctid, a.x ║
║ -> Index Scan using idx_b on nd.b (cost=0.42..20.78 rows=999 width=10) ║
║ Output: b.ctid, b.x ║
║ Index Cond: (a.x = b.x) ║
╚══════════════════════════════════════════════════════════════════════════════════╝Code Snippets
DELETE FROM LargeTable
WHERE NOT EXISTS (
SELECT *
FROM EvenLargerTable
WHERE EvenLargerTable.foreign_id = LargeTable.id);-- drop table if exists a;
-- drop table if exists b;
create table a as select (random()*1000)::int as x from generate_series(1,10000);
create index idx_a on a(x);
create table b as select (random()*1000)::int*10 as x from generate_series(1,1000000);
create index idx_b on b(x);
analyse a;
analyse b;nd@postgres=# explain (verbose) delete from a where a.x not in (select b.x from b);
╔══════════════════════════════════════════════════════════════════════════════════╗
║ QUERY PLAN ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║ Delete on nd.a (cost=0.00..129160170.00 rows=5000 width=6) ║
║ -> Seq Scan on nd.a (cost=0.00..129160170.00 rows=5000 width=6) ║
║ Output: a.ctid ║
║ Filter: (NOT (SubPlan 1)) ║
║ SubPlan 1 ║
║ -> Materialize (cost=0.00..23332.00 rows=1000000 width=4) ║
║ Output: b.x ║
║ -> Seq Scan on nd.b (cost=0.00..14425.00 rows=1000000 width=4) ║
║ Output: b.x ║
╚══════════════════════════════════════════════════════════════════════════════════╝
nd@postgres=# explain (verbose) delete from a where not exists (select * from b where a.x=b.x);
╔══════════════════════════════════════════════════════════════════════════════════╗
║ QUERY PLAN ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║ Delete on nd.a (cost=0.42..5005.91 rows=1 width=12) ║
║ -> Nested Loop Anti Join (cost=0.42..5005.91 rows=1 width=12) ║
║ Output: a.ctid, b.ctid ║
║ -> Seq Scan on nd.a (cost=0.00..145.00 rows=10000 width=10) ║
║ Output: a.ctid, a.x ║
║ -> Index Scan using idx_b on nd.b (cost=0.42..20.78 rows=999 width=10) ║
║ Output: b.ctid, b.x ║
║ Index Cond: (a.x = b.x) ║
╚══════════════════════════════════════════════════════════════════════════════════╝Context
StackExchange Database Administrators Q#172688, answer score: 6
Revisions (0)
No revisions yet.