HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Delete data from a billion row table based on where clause

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
frombilliondeletewherebasedrowdataclausetable

Problem

I need to delete about 400 million rows from a 1.3 billion row table based on an indexed date field.

The table is about 800 GB in size.

What is the most efficient way to delete the data without causing adverse effects? The table is under heavy use, meaning lots of inserts and updates (which don't affect the rows to be deleted).

There is a maintenance window, which would be fastest, but I may not be able to get a big enough maintenance window. On the flip side, I can take my time with the delete, so there is no rush.

Solution

Figured a better way to do this

  • Insert into new_tbl select * from old_tbl where start_date



-
During downtime window:

insert into new_tbl select * from old_tbl where id not in (select id from new_tbl ) and id not in ( select id from old_tbl)


-
Rename
old_tbl to old_tbl_drop and new_tbl to old_tbl.

  • Drop old_tbl_drop`.

Code Snippets

insert into new_tbl select * from old_tbl where id not in (select id from new_tbl ) and id not in ( select id from old_tbl)

Context

StackExchange Database Administrators Q#199523, answer score: 2

Revisions (0)

No revisions yet.