snippetsqlMajor
How to update 10 million+ rows in MySQL single table as Fast as possible?
Viewed 0 times
fastmillionrowsupdatemysqlsinglepossiblehowtable
Problem
Using MySQL 5.6 with InnoDB storage engine for most of the tables. InnoDB buffer pool size is 15 GB and Innodb DB + indexes are around 10 GB. Server has 32GB RAM and is running Cent OS 7 x64.
I have one big table which contains around 10 millions + records.
I get an updated dump file from a remote server every 24 hours. The file is in csv format. I don't have control over that format. The file is ~750 MB. I tried inserting data to a MyISAM table row by row and it took 35 minutes.
I need to take only 3 values per line out of 10-12 from the file and update it in the database.
What's the best way to achieve something like this ?
I need to do this daily.
Currently Flow is like this:
Above operations takes around 30-40 minutes to complete and while doing this, there are other updates going on which gives me
Lock wait timeout exceeded; try restarting transaction
Update 1
data loading in new table using
Update 2
same update with join query
Clarifications from questions in comments:
I have one big table which contains around 10 millions + records.
I get an updated dump file from a remote server every 24 hours. The file is in csv format. I don't have control over that format. The file is ~750 MB. I tried inserting data to a MyISAM table row by row and it took 35 minutes.
I need to take only 3 values per line out of 10-12 from the file and update it in the database.
What's the best way to achieve something like this ?
I need to do this daily.
Currently Flow is like this:
- mysqli_begin_transaction
- Read Dump file line by line
- Update each record Line by Line.
- mysqli_commit
Above operations takes around 30-40 minutes to complete and while doing this, there are other updates going on which gives me
Lock wait timeout exceeded; try restarting transaction
Update 1
data loading in new table using
LOAD DATA LOCAL INFILE. In MyISAM it took 38.93 sec while in InnoDB it took 7 min 5.21 sec. Then I did:UPDATE table1 t1, table2 t2
SET
t1.field1 = t2.field1,
t1.field2 = t2.field2,
t1.field3 = t2.field3
WHERE t1.field10 = t2.field10
Query OK, 434914 rows affected (22 hours 14 min 47.55 sec)Update 2
same update with join query
UPDATE table1 a JOIN table2 b
ON a.field1 = b.field1
SET
a.field2 = b.field2,
a.field3 = b.field3,
a.field4 = b.field4
(14 hours 56 min 46.85 sec)Clarifications from questions in comments:
- About 6% of the rows in the table will be updated by the file, but sometimes it can be as much as 25%.
- There are indexes on the fields being updated. There are 12 indexes on the table, and 8 indexes include the update fields.
- It is not necessary to do the update in one transaction. It can take time but not more than 24 hours. I am looking to get it done in 1 hour without locking the whole table, as la
Solution
In light of all the things mentioned, it looks like the bottleneck is the join itself.
ASPECT #1 : Join Buffer Size
In all likelihood, your join_buffer_size is probably too low.
According to the MySQL Documentation on How MySQL Uses the Join Buffer Cache
We only store the used columns in the join buffer, not the whole rows.
This being the case, make the keys of the join buffer stay in RAM.
You have 10 million rows times 4 bytes for each key. That's about 40M.
Try bumping it up in the session to 42M (a little bigger than 40M)
If this does the trick, proceed to add this to
Restarting mysqld not required for new connections. Just run
ASPECT #2 : Join Operation
You could manipulate the style of the join operation by tweeking the optimizer
According to MySQL Documentation on Block Nested-Loop and Batched Key Access Joins
When BKA is used, the value of join_buffer_size defines how large the batch of keys is in each request to the storage engine. The larger the buffer, the more sequential access will be to the right hand table of a join operation, which can significantly improve performance.
For BKA to be used, the batched_key_access flag of the optimizer_switch system variable must be set to on. BKA uses MRR, so the mrr flag must also be on. Currently, the cost estimation for MRR is too pessimistic. Hence, it is also necessary for mrr_cost_based to be off for BKA to be used.
This same page recommends doing this:
ASPECT #3 : Writing Updates to Disk (OPTIONAL)
Most forget to increase the innodb_write_io_threads to write dirty pages out of the buffer pool faster.
You will have to restart MySQL for this change
GIVE IT A TRY !!!
ASPECT #1 : Join Buffer Size
In all likelihood, your join_buffer_size is probably too low.
According to the MySQL Documentation on How MySQL Uses the Join Buffer Cache
We only store the used columns in the join buffer, not the whole rows.
This being the case, make the keys of the join buffer stay in RAM.
You have 10 million rows times 4 bytes for each key. That's about 40M.
Try bumping it up in the session to 42M (a little bigger than 40M)
SET join_buffer_size = 1024 * 1024 * 42;
UPDATE table1 a JOIN table2 b
ON a.field1 = b.field1
SET
a.field2 = b.field2,
a.field3 = b.field3,
a.field4 = b.field4;If this does the trick, proceed to add this to
my.cnf[mysqld]
join_buffer_size = 42MRestarting mysqld not required for new connections. Just run
mysql> SET GLOBAL join_buffer_size = 1024 * 1024 * 42;ASPECT #2 : Join Operation
You could manipulate the style of the join operation by tweeking the optimizer
According to MySQL Documentation on Block Nested-Loop and Batched Key Access Joins
When BKA is used, the value of join_buffer_size defines how large the batch of keys is in each request to the storage engine. The larger the buffer, the more sequential access will be to the right hand table of a join operation, which can significantly improve performance.
For BKA to be used, the batched_key_access flag of the optimizer_switch system variable must be set to on. BKA uses MRR, so the mrr flag must also be on. Currently, the cost estimation for MRR is too pessimistic. Hence, it is also necessary for mrr_cost_based to be off for BKA to be used.
This same page recommends doing this:
mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';ASPECT #3 : Writing Updates to Disk (OPTIONAL)
Most forget to increase the innodb_write_io_threads to write dirty pages out of the buffer pool faster.
[mysqld]
innodb_write_io_threads = 16You will have to restart MySQL for this change
GIVE IT A TRY !!!
Code Snippets
SET join_buffer_size = 1024 * 1024 * 42;
UPDATE table1 a JOIN table2 b
ON a.field1 = b.field1
SET
a.field2 = b.field2,
a.field3 = b.field3,
a.field4 = b.field4;[mysqld]
join_buffer_size = 42Mmysql> SET GLOBAL join_buffer_size = 1024 * 1024 * 42;mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';[mysqld]
innodb_write_io_threads = 16Context
StackExchange Database Administrators Q#119621, answer score: 20
Revisions (0)
No revisions yet.