HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Performance issues with groupwise MAX in JOIN clause

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
issueswithjoinmaxperformanceclausegroupwise

Problem

Problem

I have some assets in my application which are updated from time to time in an asynchronous fashion.

The example I'm gonna use here is Vehicles. There are two tables:

  • Vehicles: holds information about the vehicles themselves



  • VehicleUpdates: holds information about all updates that happened for that vehicle.



The relevant parts of the table structure are:

CREATE TABLE Vehicles (
id varchar(50) NOT NULL,
organizationId varchar(50) NOT NULL,
plate char(7) NOT NULL,
vehicleInfo json DEFAULT NULL,
createdAt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
updatedAt timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (
id),
UNIQUE KEY
unq_Vehicles_orgId_plate_idx (organizationId,plate) USING BTREE,
KEY
Vehicles_createdAt_idx (createdAt),
);

CREATE TABLE
VehicleUpdates (
id varchar(50) NOT NULL,
organizationId varchar(50) NOT NULL,
vehiclePlate char(7) NOT NULL,
status varchar(15) NOT NULL,
createdAt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
updatedAt timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (
id),
KEY
VehicleUpdates_orgId_vhclPlt_createdAt_idx (organizationId,vehiclePlate,createdAt) USING BTREE
);


Now I have a new requirement in which I must return the latest update information along side the vehicle information itself.

Groupwise MAX Solutions

After digging a little, I've found this blog article. I then tried to use the suggested "uncorrelated subquery" approach, since it's deemed the best one:

Uncorrelated subquery

SELECT vu1.*
FROM VehicleUpdates AS vu1
JOIN
( SELECT vehiclePlate, organizationId, MAX(createdAt) AS createdAt
FROM VehicleUpdates
GROUP BY organizationId, vehiclePlate
) AS vu2 USING (organizationId, vehiclePlate, createdAt);


This query has an average execution time of 275 ms in my production database.

I thought that was too slow, so I decided to g

Solution

I feel so stupid! Just found the problem.

For some reason, Vehicles and VehicleUpdates had different charsets (utf8mb4 and utf8, respectively) in production.

That's the reason why the EXPLAIN result for the "uncorrelated subquery" approach had a full table scan in one of its steps:

| "select_type" | "table"        | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                                             | "rows" | "filtered" | "Extra"
| PRIMARY       | v              | ALL    |                                            |                                            |           |                                                   | 14456  | 100        |
| PRIMARY       |      | ALL    |                                            |                                            |           |                                                   | 29289  | 100        | Using where
| PRIMARY       | vu1            | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "327"     | vu2.organizationId,vu2.vehiclePlate,vu2.createdAt | 1      | 100        | Using where
| DERIVED       | VehicleUpdates | range  | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "323"     |                                                   | 29289  | 100        | Using index for group-by


After converting VehicleUpdates to utf8mb4, the EXPLAIN result is:

| "select_type" | "table"        | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                                           | "rows" | "filtered" | "Extra"
| PRIMARY       | v              | ref    | Vehicles_orgId_status_idx                  | Vehicles_orgId_status_idx                  | "202"     | const                                           | 188    | 100        |
| PRIMARY       |      | ref    |                                 |                                 | "230"     | v.plate,v.organizationId                        | 10     | 100        |
| PRIMARY       | vu1            | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "234"     | v.organizationId,vu2.vehiclePlate,vu2.createdAt | 1      | 100        | Using where
| DERIVED       | VehicleUpdates | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "202"     | const                                           | 24090  | 100        | Using where; Using index


Likewise, the "LEFT JOIN" approach execution plan changed from:

| "select_type" | "table" | "type" | "possible_keys"                                        | "key"                                      | "key_len" | "ref"                               | "rows" | "filtered" | "Extra"
| SIMPLE        | v       | ref    | unq_Vehicles_orgId_plate_idx,Vehicles_orgId_status_idx | unq_Vehicles_orgId_plate_idx               | "202"     | const                               | 30     | 100        |
| SIMPLE        | vu1     | ALL    |                                                        |                                            |           |                                     | 263171 | 100        | Using where; Using join buffer (Block Nested Loop)
| SIMPLE        | vu2     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx             | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "173"     | vu1.organizationId,vu1.vehiclePlate | 10     | 10         | Using where; Not exists; Using index


To:

| "select_type" | "table" | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                               | "rows" | "filtered" | "Extra"
| SIMPLE        | v       | ref    | Vehicles_orgId_status_idx                  | Vehicles_orgId_status_idx                  | "202"     | const                               | 188    | 100        |
| SIMPLE        | vu1     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230"     | v.organizationId,v.plate            | 9      | 100        |
| SIMPLE        | vu2     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230"     | vu1.organizationId,vu1.vehiclePlate | 9      | 10         | Using where; Not exists; Using index


So, the performance now for the different queries are:

LEFT JOIN

SELECT v., vu1.
FROM Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1
ON v.plate = vu1.vehiclePlate
AND v.organizationId = vu1.organizationId
LEFT JOIN VehicleUpdates AS vu2
ON vu1.organizationId = vu2.organizationId
AND vu1.vehiclePlate = vu2.vehiclePlate
AND vu2.createdAt > vu1.createdAt
where v.organizationId = ''
AND vu2.id IS NULL
LIMIT 100;


Always runs bellow 50 ms.

Unc

Code Snippets

| "select_type" | "table"        | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                                             | "rows" | "filtered" | "Extra"
| PRIMARY       | v              | ALL    |                                            |                                            |           |                                                   | 14456  | 100        |
| PRIMARY       | <derived3>     | ALL    |                                            |                                            |           |                                                   | 29289  | 100        | Using where
| PRIMARY       | vu1            | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "327"     | vu2.organizationId,vu2.vehiclePlate,vu2.createdAt | 1      | 100        | Using where
| DERIVED       | VehicleUpdates | range  | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "323"     |                                                   | 29289  | 100        | Using index for group-by
| "select_type" | "table"        | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                                           | "rows" | "filtered" | "Extra"
| PRIMARY       | v              | ref    | Vehicles_orgId_status_idx                  | Vehicles_orgId_status_idx                  | "202"     | const                                           | 188    | 100        |
| PRIMARY       | <derived2>     | ref    | <auto_key1>                                | <auto_key1>                                | "230"     | v.plate,v.organizationId                        | 10     | 100        |
| PRIMARY       | vu1            | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "234"     | v.organizationId,vu2.vehiclePlate,vu2.createdAt | 1      | 100        | Using where
| DERIVED       | VehicleUpdates | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "202"     | const                                           | 24090  | 100        | Using where; Using index
| "select_type" | "table" | "type" | "possible_keys"                                        | "key"                                      | "key_len" | "ref"                               | "rows" | "filtered" | "Extra"
| SIMPLE        | v       | ref    | unq_Vehicles_orgId_plate_idx,Vehicles_orgId_status_idx | unq_Vehicles_orgId_plate_idx               | "202"     | const                               | 30     | 100        |
| SIMPLE        | vu1     | ALL    |                                                        |                                            |           |                                     | 263171 | 100        | Using where; Using join buffer (Block Nested Loop)
| SIMPLE        | vu2     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx             | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "173"     | vu1.organizationId,vu1.vehiclePlate | 10     | 10         | Using where; Not exists; Using index
| "select_type" | "table" | "type" | "possible_keys"                            | "key"                                      | "key_len" | "ref"                               | "rows" | "filtered" | "Extra"
| SIMPLE        | v       | ref    | Vehicles_orgId_status_idx                  | Vehicles_orgId_status_idx                  | "202"     | const                               | 188    | 100        |
| SIMPLE        | vu1     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230"     | v.organizationId,v.plate            | 9      | 100        |
| SIMPLE        | vu2     | ref    | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230"     | vu1.organizationId,vu1.vehiclePlate | 9      | 10         | Using where; Not exists; Using index

Context

StackExchange Database Administrators Q#229643, answer score: 2

Revisions (0)

No revisions yet.