HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL delivery query performance

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sqlqueryperformancedelivery

Problem

I need a "little" help with an SQL query (MySQL).

I have the following tables:

COURIERS table:

+------------+
| COURIER_ID |
+------------+


DELIVERIES table:

+-------------+------------+------------+
| DELIVERY_ID | COURIER_ID | START_DATE |
+-------------+------------+------------+


ORDERS table:

+----------+-------------+-------------+
| ORDER_ID | DELIVERY_ID | FINISH_DATE |
+----------+-------------+-------------+


COORDINATES table:

+-------------+-----+-----+------+
| DELIVERY_ID | LAT | LNG | DATE |
+-------------+-----+-----+------+


In the real database I have more columns in each table, but for this example the above columns are enough.

What do I need?

-
An SQL query that returns all couriers [COURIER_ID], their last
delivery [DELIVERY_ID] (based on last START_DATE), the
delivery's last coordinate [LAT and LNG] (based on last DATE) and the remaining orders count (total of orders of the last delivery that have no FINISH_DATE).

-
A courier can have no deliveries, in this case I want DELIVERY_ID =
NULL, LAT = NULL and LNG = NULL in the result.

-
A delivery can have no coordinates, in this case I want LAT = NULL
and LNG = NULL in the result.

What was I able to do?

``
SELECT c.
COURIER_ID,
d.
DELIVERY_ID,
r.
LAT,
r.
LNG,
(SELECT COUNT(DISTINCT
ORDER_ID)
FROM
ORDERS
WHERE
DELIVERY_ID = d.DELIVERY_ID
AND
FINISH_DATE IS NULL) AS REMAINING_ORDERS

FROM
COURIERS AS c
LEFT JOIN
DELIVERIES AS d USING (COURIER_ID)
LEFT JOIN
COORDINATES AS r ON r.DELIVERY_ID = d.DELIVERY_ID

WHERE (CASE WHEN
(SELECT MAX(
START_DATE)
FROM
DELIVERIES
WHERE
COURIER_ID = c.COURIER_ID) IS NULL THEN d.START_DATE IS NULL ELSE d.START_DATE =
(SELECT MAX(
START_DATE)
FROM
DELIVERIES
WHERE
COURIER_ID = c.COURIER_ID) END)
AND (CASE WHEN
(SELECT MAX(
DATE)
FROM

Solution

Distinct

In a properly designed database, this DISTINCT should not be needed:

(SELECT COUNT(DISTINCT `ORDER_ID`)
   FROM `ORDERS`


If multiple different orders can have the same ORDER_ID, we're gonna have a bad time.

Aliases

c, d, r... these are bad table aliases. Why? Because Mr. Maintainer has to go through your entire query to know what those mean, as they are not self-evident. Your table names are pretty short, but if you really need to shorten them further... perhaps something like crier, deliv, coord would be better.

Join syntax

This USING command:

FROM `COURIERS` AS c
LEFT JOIN `DELIVERIES` AS d USING (`COURIER_ID`)


USING is not standard SQL syntax, when it comes to joins. Sure, it can be a nice shorthand in MySQL, but generally it is preferred to use standard SQL syntax (usually more verbose) instead of depending on syntactic sugar provided by a particular build or version of a language. Like this:

FROM `COURIERS` AS c
LEFT JOIN `DELIVERIES` AS d 
ON c.`COURIER_ID` = d.`COURIER_ID`


Performance

It's a bit difficult to assess performance hiccups if not presented with the execution plan, or a sample dataset to reproduce the issue. If I had to guess, I would say you have too many subqueries in your CASE statements.

CASE statements are great and very fast when you use constants to compare; but when you use variable values, such as what you are doing... well... it's pretty slow. You're asking the SQL engine to run each subquery for each record essentially. It may find a way to optimize that's halfway efficient, but I believe you would be better using a set-based approach. Example:

WHERE (CASE WHEN
         (SELECT MAX(`START_DATE`)
          FROM `DELIVERIES`
          WHERE `COURIER_ID` = c.`COURIER_ID`) IS NULL THEN d.`START_DATE` IS NULL ELSE -- etc.


Instead:

-- this goes before your CASE statement
INSERT INTO TEMP TABLE DeliveryDates
    SELECT IFNULL( 
        MAX( d.`START_DATE`, NULL ) 
    )
WHERE d.`COURIER_ID` = c.`COURIER_ID`


Then:

WHERE (CASE WHEN
         (SELECT `START_DATE`
            FROM DeliveryDates) IS NULL THEN d.`START_DATE` IS NULL 
        ELSE d.`START_DATE`

Code Snippets

(SELECT COUNT(DISTINCT `ORDER_ID`)
   FROM `ORDERS`
FROM `COURIERS` AS c
LEFT JOIN `DELIVERIES` AS d USING (`COURIER_ID`)
FROM `COURIERS` AS c
LEFT JOIN `DELIVERIES` AS d 
ON c.`COURIER_ID` = d.`COURIER_ID`
WHERE (CASE WHEN
         (SELECT MAX(`START_DATE`)
          FROM `DELIVERIES`
          WHERE `COURIER_ID` = c.`COURIER_ID`) IS NULL THEN d.`START_DATE` IS NULL ELSE -- etc.
-- this goes before your CASE statement
INSERT INTO TEMP TABLE DeliveryDates
    SELECT IFNULL( 
        MAX( d.`START_DATE`, NULL ) 
    )
WHERE d.`COURIER_ID` = c.`COURIER_ID`

Context

StackExchange Code Review Q#61996, answer score: 4

Revisions (0)

No revisions yet.