HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

What is the best approach to run a DELETE query on 92K of 4 million rows?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
million92ktherowswhatdeletequeryapproachrunbest

Problem

Options that have occurred to me:

-
One big delete statement. For example:

delete from x where (y = 'x' and x = 'y') or (y = 'a' and x = 'b')


-
Same as above, but chunk in smaller pieces, 10k at a time?

-
Create a stored procedure so all I'm sending are the the two query values.

Solution

92K out of 4M is still only 2.3% of your records. I'd suggest your first option, just throw what you want deleted at the database and let the database sort it out (if you give it to the database all at once, chances are it'll do it more efficiently than if you parcel it out). The only time I'd recommend chunking it is if your application is time-critical and you're worried about slowing it down (in which case the other solution is to do it off-hours).

If you were taking 20-30% or more of the records out of the table, I'd recommend dropping indexes, doing the delete, and recreating indexes, followed by an OPTIMIZE TABLE, but for 2% of the records in the table the index rebuild overhead isn't worth it (it's faster just to keep the indexes up-to-date while deleting the records).

That said, after you've done the delete, running OPTIMIZE TABLE might not be a bad idea - don't do that during peak hours though.

Context

StackExchange Database Administrators Q#8514, answer score: 9

Revisions (0)

No revisions yet.