HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to speed up update join query between 2 mysql tables?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tablesupdatequeryjoinmysqlbetweenhowspeed

Problem

alright long story short.,
i have powerful dedicated server.

Intel  I7-6700K -
64GB DDR4 2400 MHz
1x480GB   SSD


running mysql server along with nginx,php

innodb-ft-min-token-size = 1
innodb-ft-enable-stopword = 0

innodb_buffer_pool_size = 40G
max_connections = 2000

[deploy@ns540545 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         45G         11G        107M        6.4G         16G
Swap:          2.0G        1.4G        640M


it was expensive so i got another dedicated server for cost cutting lets call it
not-so powerful dedicated server

Intel  i3-2130  
8GB DDR3 1333 MHz   
2TB


running mysql server along with nginx,php

innodb-ft-min-token-size = 1
innodb-ft-enable-stopword = 0

innodb_buffer_pool_size = 4G
max_connections = 2000

[root@privateserver deploy]# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.7G        7.5G         73M         24M        150M         79M
Swap:           39G        7.8G         32G


i moved database from powerful server to not-so powerful server.

I can feel slight performance degradation while running simple queries which is fine, but this one query which used to take 2 minutes on powerful server now it takes around 26.6525 hours and counting on not-so powerful server.

UPDATE content a JOIN peers_data b ON a.hash = b.hash SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated


More info about tables which are exactly same on both the dedicated server

``
CREATE TABLE
peers_data (
id int(11) NOT NULL AUTO_INCREMENT,
hash char(40) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
seeders int(11) NOT NULL DEFAULT '0',
leechers int(11) NOT NULL DEFAULT '0',
is_updated int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (
hash),
UNIQUE KEY
id (id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE

Solution

#YOUR QUERY

Your query should be taking a long time. Why ??? Look at your query:

UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;


Please note the columns begin updated by the query in the content table:

  • seeders



  • leechers



  • is_updated



Which of these columns are indexed ?

  • seeders



  • leechers



What indexes do you have in the content table involving those columns ?

KEY `seeders` (`seeders`), <<<<-------------------------------- THIS ONE !!!
KEY `uploader_sub_category` (`uploader`,`sub_category`),
KEY `upload_date` (`upload_date`),
KEY `uploader_upload_date` (`uploader`,`upload_date`),
KEY `leechers` (`leechers`), <<<<------------------------------ THIS ONE !!!
KEY `size` (`size`),
KEY `uploader_seeders` (`uploader`,`seeders`), <<<<------------ THIS ONE !!!


What is going on under the hood ?

If you are changing the values for seeders and leeches, these three(3) indexes are having their leaf nodes reshuffled.

Even if the the majority of the values are not being changed, rows are being locked and copies of your data are being stockpiled in your undo logs (for the sake of MVCC). This results is additional disk I/O (ibdata1 should be growing)

#INNODB

The InnoDB Buffer Pool will be going through the "Perfect Storm". Why ?

Please note the InnoDB Architecture (Picture from Percona CTO Vadim Tkachenko)

Please note the Insert Buffer. Here is what the MySQL Documentation says about it:

The change buffer is a special data structure that caches changes to
secondary index pages when affected pages are not in the buffer pool.
The buffered changes, which may result from INSERT, UPDATE, or DELETE
operations (DML), are merged later when the pages are loaded into the
buffer pool by other read operations.

Unlike clustered indexes, secondary indexes are usually nonunique, and
inserts into secondary indexes happen in a relatively random order.
Similarly, deletes and updates may affect secondary index pages that
are not adjacently located in an index tree. Merging cached changes at
a later time, when affected pages are read into the buffer pool by
other operations, avoids substantial random access I/O that would be
required to read-in secondary index pages from disk.

Periodically, the purge operation that runs when the system is mostly
idle, or during a slow shutdown, writes the updated index pages to
disk. The purge operation can write disk blocks for a series of index
values more efficiently than if each value were written to disk
immediately.

Change buffer merging may take several hours when there are numerous
secondary indexes to update and many affected rows. During this time,
disk I/O is increased, which can cause a significant slowdown for
disk-bound queries. Change buffer merging may also continue to occur
after a transaction is committed. In fact, change buffer merging may
continue to occur after a server shutdown and restart (see Section
14.21.2, “Forcing InnoDB Recovery” for more information).

In memory, the change buffer occupies part of the InnoDB buffer pool.
On disk, the change buffer is part of the system tablespace, so that
index changes remain buffered across database restarts.

Here is where the "Perfect Storm" comes in: By default, InnoDB storage engine reserves up to 25% of the Buffer Pool for Change Buffering. All changes to your three(3) secondary indexes has to pile up in the InnoDB Bufffer Pool's Insert Buffer. When the actual index pages land in the Buffer Pool, the merge process will subsequently be pushed to disk (Note the Insert Buffer inside ibdata1), producing more disk I/O.

#SUGGESTIONS

###SUGGESTION #1

Get rid of those indexes. Why ? This will eliminate the change buffering needed for managing those indexes during your mass UPDATE.

Run the following query:

SELECT
    COUNT(1) rowcount,
    COUNT(DISTINCT seeders) seeders_count,
    COUNT(DISTINCT leechers) leechers_count
FROM content;


If seeders_count is less than 5% of rowcount, then the cardinality of seeders_count can disqualify the seeders from being of any use. Same going with leechers.

For that index uploader_seeders, run this query:

SELECT COUNT(1) uploader_seeders_count FROM
(SELECT DISTINCT uploader,seeders FROM content) A;


If uploader_seeders_count is less than 5% of rowcount (from previous query), then get rid of uploader_seeders index.

To get rid of those indexes, run this:

ALTER TABLE content DROP INDEX uploader_seeders,DROP INDEX seeders,DROP INDEX leechers;


###SUGGESTION #2

Change the Insert Buffer size to the max value, the run the query:

SET GLOBAL innodb_change_buffer_max_size = 50;
UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;
SET GLOBAL innodb_change_buffer_max_size = 25;


Also, set the Buffer Pool Size (innodb_buffer_pool_size) to 20G

Code Snippets

UPDATE content a JOIN peers_data b ON a.hash = b.hash
SET a.seeders = b.seeders, a.leechers = b.leechers, a.is_updated = b.is_updated;
KEY `seeders` (`seeders`), <<<<-------------------------------- THIS ONE !!!
KEY `uploader_sub_category` (`uploader`,`sub_category`),
KEY `upload_date` (`upload_date`),
KEY `uploader_upload_date` (`uploader`,`upload_date`),
KEY `leechers` (`leechers`), <<<<------------------------------ THIS ONE !!!
KEY `size` (`size`),
KEY `uploader_seeders` (`uploader`,`seeders`), <<<<------------ THIS ONE !!!
SELECT
    COUNT(1) rowcount,
    COUNT(DISTINCT seeders) seeders_count,
    COUNT(DISTINCT leechers) leechers_count
FROM content;
SELECT COUNT(1) uploader_seeders_count FROM
(SELECT DISTINCT uploader,seeders FROM content) A;
ALTER TABLE content DROP INDEX uploader_seeders,DROP INDEX seeders,DROP INDEX leechers;

Context

StackExchange Database Administrators Q#196715, answer score: 2

Revisions (0)

No revisions yet.