patternsqlMinor
Can I speed up a large MySQL/MariaDB join through partitioning or merging?
Viewed 0 times
canmergingjoinmysqllargethroughspeedpartitioningmariadb
Problem
I'm trying to run the following join query as part of a more complex one on MariaDB 10.1.26.
Both join fields are indexed. However, the join involves a full scan of
The sizes of the two tables are relatively large:
Could I somehow increase the qu
select distinct
project_commits.project_id,
date_format(created_at, '%x%v1') as week_commit
from project_commits
left join commits
on project_commits.commit_id = commits.id;Both join fields are indexed. However, the join involves a full scan of
project_commits and an index lookup on commits. This is corroborated by the output of EXPLAIN.+------+-------------+-----------------+--------+---------------+---------+---------+-------------------------------------+------------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------------+--------+---------------+---------+---------+-------------------------------------+------------+-----------------+
| 1 | SIMPLE | project_commits | ALL | NULL | NULL | NULL | NULL | 5417294109 | Using temporary |
| 1 | SIMPLE | commits | eq_ref | PRIMARY | PRIMARY | 4 | ghtorrent.project_commits.commit_id | 1 | |
+------+-------------+-----------------+--------+---------------+---------+---------+-------------------------------------+------------+-----------------+The sizes of the two tables are relatively large:
project_commits contains 5 billion rows and commits 847 million rows. Also the server's memory size is relatively small (16GB). This probably means that index lookups hit the (unfortunately magnetic) disk, and therefore performance takes a heavy hit. According to the output of pmonitor run on the generated temporary table, the query, which has already run for more than half a day, will take another 373 hours to complete./home/mysql/ghtorrent/project_commits#P#p0.MYD 6.68% ETA 373:38:11Could I somehow increase the qu
Solution
From the looks of the EXPLAIN plan, you are doing a full table scan on
From the looks of the query, I surmise there is a one-to-many relationship from
You are also using
Your query is saying:
Show me all
Instead, you might want the query to say:
Show me all
SUGGESTION #1 : Flip Table Order
SUGGESTION #2 : Use
SUGGESTION #3 : Add
If you are already have an index on
Adding an index will change
SUGGESTION #4 : Alter the
You could manipulate the style of the join operation by tweeking the optimizer
According to MySQL Documentation on Block Nested-Loop and Batched Key Access Joins
When BKA is used, the value of join_buffer_size defines how large the batch of keys is in each request to the storage engine. The larger the buffer, the more sequential access will be to the right hand table of a join operation, which can significantly improve performance.
For BKA to be used, the batched_key_access flag of the optimizer_switch system variable must be set to on. BKA uses MRR, so the mrr flag must also be on. Currently, the cost estimation for MRR is too pessimistic. Hence, it is also necessary for mrr_cost_based to be off for BKA to be used.
This same page recommends doing this:
GIVE IT A TRY !!!
Note : I have no idea what to expect from my suggestions. After all, your
4.573 Quintillion rows (5.4 Billion X 0.847 Billion) being looked up
Have fun and let us know what you find
project_commits.From the looks of the query, I surmise there is a one-to-many relationship from
commits to project_commits. The problem I see is that your query is gathering data as many-on-one.You are also using
LEFT JOIN. Your query is saying:
Show me all
project_commits that are connected to or orphaned from commitsInstead, you might want the query to say:
Show me all
project_commits that are connected to commitsSUGGESTION #1 : Flip Table Order
EXPLAIN select distinct
project_commits.project_id,
date_format(created_at, '%x%v1') as week_commit
from commits
left join project_commits
on commits.id = project_commits.commit_id;SUGGESTION #2 : Use
INNER JOINEXPLAIN select distinct
project_commits.project_id,
date_format(created_at, '%x%v1') as week_commit
from project_commits
inner join commits
on project_commits.commit_id = commits.id;SUGGESTION #3 : Add
created_at indexIf you are already have an index on
created_at or if you already have a compound index whose first column is created_at, skip this suggestion.ALTER TABLE `project_commits` ADD INDEX created_at_ndx (`created_at`);Adding an index will change
EXPLAIN plan to do an index scan instead of a table scanSUGGESTION #4 : Alter the
JOIN behaviorYou could manipulate the style of the join operation by tweeking the optimizer
According to MySQL Documentation on Block Nested-Loop and Batched Key Access Joins
When BKA is used, the value of join_buffer_size defines how large the batch of keys is in each request to the storage engine. The larger the buffer, the more sequential access will be to the right hand table of a join operation, which can significantly improve performance.
For BKA to be used, the batched_key_access flag of the optimizer_switch system variable must be set to on. BKA uses MRR, so the mrr flag must also be on. Currently, the cost estimation for MRR is too pessimistic. Hence, it is also necessary for mrr_cost_based to be off for BKA to be used.
This same page recommends doing this:
mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';GIVE IT A TRY !!!
Note : I have no idea what to expect from my suggestions. After all, your
LEFT JOIN is like an iterative Cartesian Join with the potential of make a temp table that is the following4.573 Quintillion rows (5.4 Billion X 0.847 Billion) being looked up
Have fun and let us know what you find
Code Snippets
EXPLAIN select distinct
project_commits.project_id,
date_format(created_at, '%x%v1') as week_commit
from commits
left join project_commits
on commits.id = project_commits.commit_id;EXPLAIN select distinct
project_commits.project_id,
date_format(created_at, '%x%v1') as week_commit
from project_commits
inner join commits
on project_commits.commit_id = commits.id;ALTER TABLE `project_commits` ADD INDEX created_at_ndx (`created_at`);mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';Context
StackExchange Database Administrators Q#213983, answer score: 6
Revisions (0)
No revisions yet.