HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

speeding up a deletion in postgresql

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
speedingdeletionpostgresql

Problem

I came up with:

drop table if exists idtemp;
create temp table idtemp as 
      select documentid from taskflag where taskid='coref' and state = 2 
         order by statechanged asc limit howmany;
create unique index on idtemp(documentid);

-- trim taskflag to the first N docs ordered by coref.
delete from taskflag where documentid not in (select documentid from idtemp) ;


This is very slow when there are 120k records in taskflag and I'm keeping 10k.

Taskflag looks like:

\d taskflag
                Table "public.taskflag"
    Column    |            Type             | Modifiers 
--------------+-----------------------------+-----------
 documentid   | character varying(64)       | not null
 taskid       | character varying(64)       | not null
 state        | smallint                    | 
 statechanged | timestamp without time zone | 
Indexes:
    "taskflag_pkey" PRIMARY KEY, btree (documentid, taskid)
    "task_index2" btree (documentid)
    "task_index4" btree (taskid, state, statechanged)


Explain says:

QUERY PLAN                                    
----------------------------------------------------------------------------------
 Delete on taskflag  (cost=0.00..105811822.25 rows=223210 width=6)
   ->  Seq Scan on taskflag  (cost=0.00..105811822.25 rows=223210 width=6)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..449.00 rows=10000 width=146)
                 ->  Seq Scan on idtemp  (cost=0.00..184.00 rows=10000 width=146)
(6 rows)


Should I just arrange for the temp table to contain the ones I am keeping?

Solution

The easiest optimization would probably be to let the planner use a hash anti join, by rewriting the query as:

delete from taskflag where not exists
  (select 1 from idtemp where documentid=taskflag.documentid);


also you may need to ANALYZE the temp table immediately after populating it.

Code Snippets

delete from taskflag where not exists
  (select 1 from idtemp where documentid=taskflag.documentid);

Context

StackExchange Database Administrators Q#18808, answer score: 5

Revisions (0)

No revisions yet.