HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Copy millions of rows to another table in batches mySQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsmillionsbatchesmysqlanothertablecopy

Problem

Table A which is always getting updated (records being inserted or updated).
Table A contains millions of records.
I'd like to copy some of these records to a new table Table B.
Table A and Table B has exact same schema.

How can I copy records from Table A to Table B? I don't want to consider the data which keeps getting updated in Table A. I only want to copy the data which is there when I first queried Table A. I'm trying to copy data in batches. So everytime I query for a batch of 500 records from Table A and copy them to Table B. The next time I query Table A to get next 500 records using offset. There is no guarentee that the new set of records are exactly next batch of 500 records since Table A is always getting updated. The task is to be able to ensure we are fetching batches in sequential way and it guarentees we have exactly next 500 records.

INSERT INTO `Table B` FROM SELECT * FROM `Table A` WHERE ...


doesn't work. Because as I mentioned Table A has a lot of data and running this query timesout. It needs to be carried out in batches. Creating a temporary table would also require to copy it in batches.

I tried to use mySQL views but they also have the same problem. The view fetches data from the underlying table. If the underlying table gets updated the view fetches the updated data.

Solution

Have you considered Table Partitioning? It's seems that you are trying to do by hand what MySQL will do for you automatically.

In any event you will need to get your data into the new partitioned table or the old data into your archive table. I agree with Rick James - find a key (primary key would work well for this) and move small chunks using that key not limit and offset.

You did not say it you are going to do this often (in which case please read up on partitions) or this is a one time thing. If it's a one time thing you can even write a single script:

INSERT INTO foo SELECT * FROM bar WHERE id between 0 and 5000;
INSERT INTO foo SELECT * FROM bar WHERE id between 5001 and 10000;
INSERT INTO foo SELECT * FROM bar WHERE id between 10001 and 15000;
INSERT INTO foo SELECT * FROM bar WHERE id between 15001 and 20000;
...


You probably want to delete from the bar table and you'd like to be sure those rows were indeed copied to the foo table so you might instead try:

INSERT INTO foo SELECT * FROM bar WHERE id between 0 and 5000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
INSERT INTO foo SELECT * FROM bar WHERE id between 5001 and 10000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
INSERT INTO foo SELECT * FROM bar WHERE id between 10001 and 15000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
INSERT INTO foo SELECT * FROM bar WHERE id between 15001 and 20000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
...


If 5000 rows is still too big try 500, 100 or even 10. You can try different numbers before you write the whole script. Remember the data you move will not need to be moved again so you can start over at 0 without harm. This script can run for a long time; if you really want to have minimal impact put a SELECT sleep(1) after each INSERT. Remember in this example id really needs to be in some kind of index.

Good Luck,
Frank

Code Snippets

INSERT INTO foo SELECT * FROM bar WHERE id between 0 and 5000;
INSERT INTO foo SELECT * FROM bar WHERE id between 5001 and 10000;
INSERT INTO foo SELECT * FROM bar WHERE id between 10001 and 15000;
INSERT INTO foo SELECT * FROM bar WHERE id between 15001 and 20000;
...
INSERT INTO foo SELECT * FROM bar WHERE id between 0 and 5000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
INSERT INTO foo SELECT * FROM bar WHERE id between 5001 and 10000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
INSERT INTO foo SELECT * FROM bar WHERE id between 10001 and 15000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
INSERT INTO foo SELECT * FROM bar WHERE id between 15001 and 20000;
DELETE FROM bar WHERE id in (SELECT id FROM foo);
...

Context

StackExchange Database Administrators Q#302399, answer score: 2

Revisions (0)

No revisions yet.