HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMajor

How to update 10 million+ rows in MySQL single table as Fast as possible?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fastmillionrowsupdatemysqlsinglepossiblehowtable

Problem

Using MySQL 5.6 with InnoDB storage engine for most of the tables. InnoDB buffer pool size is 15 GB and Innodb DB + indexes are around 10 GB. Server has 32GB RAM and is running Cent OS 7 x64.

I have one big table which contains around 10 millions + records.

I get an updated dump file from a remote server every 24 hours. The file is in csv format. I don't have control over that format. The file is ~750 MB. I tried inserting data to a MyISAM table row by row and it took 35 minutes.

I need to take only 3 values per line out of 10-12 from the file and update it in the database.

What's the best way to achieve something like this ?

I need to do this daily.

Currently Flow is like this:

  • mysqli_begin_transaction



  • Read Dump file line by line



  • Update each record Line by Line.



  • mysqli_commit



Above operations takes around 30-40 minutes to complete and while doing this, there are other updates going on which gives me


Lock wait timeout exceeded; try restarting transaction

Update 1

data loading in new table using LOAD DATA LOCAL INFILE. In MyISAM it took 38.93 sec while in InnoDB it took 7 min 5.21 sec. Then I did:

UPDATE table1 t1, table2 t2
SET 
t1.field1 = t2.field1,
t1.field2 = t2.field2,
t1.field3 = t2.field3
WHERE t1.field10 = t2.field10

Query OK, 434914 rows affected (22 hours 14 min 47.55 sec)


Update 2

same update with join query

UPDATE table1 a JOIN table2 b 
ON a.field1 = b.field1 
SET 
a.field2 = b.field2,
a.field3 = b.field3,
a.field4 = b.field4

(14 hours 56 min 46.85 sec)


Clarifications from questions in comments:

  • About 6% of the rows in the table will be updated by the file, but sometimes it can be as much as 25%.



  • There are indexes on the fields being updated. There are 12 indexes on the table, and 8 indexes include the update fields.



  • It is not necessary to do the update in one transaction. It can take time but not more than 24 hours. I am looking to get it done in 1 hour without locking the whole table, as la

Solution

In light of all the things mentioned, it looks like the bottleneck is the join itself.

ASPECT #1 : Join Buffer Size

In all likelihood, your join_buffer_size is probably too low.

According to the MySQL Documentation on How MySQL Uses the Join Buffer Cache


We only store the used columns in the join buffer, not the whole rows.

This being the case, make the keys of the join buffer stay in RAM.

You have 10 million rows times 4 bytes for each key. That's about 40M.

Try bumping it up in the session to 42M (a little bigger than 40M)

SET join_buffer_size = 1024 * 1024 * 42;
UPDATE table1 a JOIN table2 b 
ON a.field1 = b.field1 
SET 
a.field2 = b.field2,
a.field3 = b.field3,
a.field4 = b.field4;


If this does the trick, proceed to add this to my.cnf

[mysqld]
join_buffer_size = 42M


Restarting mysqld not required for new connections. Just run

mysql> SET GLOBAL join_buffer_size = 1024 * 1024 * 42;


ASPECT #2 : Join Operation

You could manipulate the style of the join operation by tweeking the optimizer

According to MySQL Documentation on Block Nested-Loop and Batched Key Access Joins


When BKA is used, the value of join_buffer_size defines how large the batch of keys is in each request to the storage engine. The larger the buffer, the more sequential access will be to the right hand table of a join operation, which can significantly improve performance.


For BKA to be used, the batched_key_access flag of the optimizer_switch system variable must be set to on. BKA uses MRR, so the mrr flag must also be on. Currently, the cost estimation for MRR is too pessimistic. Hence, it is also necessary for mrr_cost_based to be off for BKA to be used.

This same page recommends doing this:

mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';


ASPECT #3 : Writing Updates to Disk (OPTIONAL)

Most forget to increase the innodb_write_io_threads to write dirty pages out of the buffer pool faster.

[mysqld]
innodb_write_io_threads = 16


You will have to restart MySQL for this change

GIVE IT A TRY !!!

Code Snippets

SET join_buffer_size = 1024 * 1024 * 42;
UPDATE table1 a JOIN table2 b 
ON a.field1 = b.field1 
SET 
a.field2 = b.field2,
a.field3 = b.field3,
a.field4 = b.field4;
[mysqld]
join_buffer_size = 42M
mysql> SET GLOBAL join_buffer_size = 1024 * 1024 * 42;
mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';
[mysqld]
innodb_write_io_threads = 16

Context

StackExchange Database Administrators Q#119621, answer score: 20

Revisions (0)

No revisions yet.