HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Deleting large amounts of cascading records with minimal locking

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
deletingwithrecordslargeamountslockingminimalcascading

Problem

I'm looking for a strategy to purge large amounts of logically deleted data from a MySQL database. The deleted records are in a relatively small table, but are referred to with many ON DELETE CASACDE records on several referring tables, which makes the deletion process very slow and time consuming, during which time some of the smaller but more frequently accessed tables are, apparently, locked (these are InnoDB tables).

Since these purges happen on regular intervals on a high-load database, I'm trying to figure out the best strategy to purge this data with minimal impact on other processes, i.e. with minimal locking of parent tables.

To give an example, I have a table structure similar to this:

+------------+    +------------+    +------------+    +-------------+
| accounts   |    | users      |    | messages   |    | attachments |
+------------+    +------------+    +------------+    +-------------+
| id         |    | id         |    | id         |    | id          |
| name       |    | account_id |    | user_id    |    | message_id  |
| is_deleted |    | name       |    +------------+    +-------------+
+------------+    +------------+

users.account_id REFERENCES accounts.id ON DELETE CASCADE
messages.user_id REFERENCES users.id ON DELETE CASCADE
attachments.message_id REFERENCES messages.id ON DELETE CASCADE


Assuming there are dozens of users per account, thousands of messages per user and potentially dozens attachments and other related data (e.g. tags, metadata etc.) per message - deleting an account will result in tens or hundreds of thousands of records to delete spanning multiple tables.

What would be a good strategy to take when purging data in such a case?

Solution

I had the same issue. Deleting in batches helps, but still tends to lock up my Percona cluster because of the delay in replicating the delete to other nodes.

In my case, most of the cascaded deletes did not exist in practice - there could in theory be records in those tables, but in practice there weren't for most of the tables. So what I did was query the schema to find which tables the cascade could affect, then for each item to delete:

  • SET FOREIGN_KEY_CHECKS=0;



  • Query the affected tables to find if there were any affected records; if so, delete them.



  • Delete the main record.



  • SET FOREIGN_KEY_CHECKS=1;



This meant that the actual locking was very much reduced. In my case this enabled me to delete about an order of magnitude faster.

This approach is not atomic, but won't leave the tables in a bad state if there's a crash.

Context

StackExchange Database Administrators Q#112387, answer score: 2

Revisions (0)

No revisions yet.