patternsqlMinor
Nested aggregation and grouping on multiple columns in MySQL
Viewed 0 times
columnsgroupingnestedmysqlmultipleandaggregation
Problem
Consider the following
For a given category ID, I am attempting to retrieve a list containing the vendor with the lowest latest price for each subcategory. With "latest" I mean that vendors may have multiple prices for a given category ID/subcategory ID combination, so only the most recently inserted price for that category ID/subcategory ID/vendor ID should be used. If there's a tie between 2 or more vendor's prices, the lowest id should be used as the tie-breaker.
For example, with this data:
So, first find the most recent prices for every subcategory/vendor combination (row with
I would expect the following results for
Here's what I have so far. I feel like it's already starting to get out of hand and this doesn't even account for ties between 2 or more vendor's prices.
```
SELECT c.subcategory_id
product table (which is highly trimmed down):`id` int AUTO_INCREMENT
`category_id` int
`subcategory_id` int
`vendor_id` int
`price` decimal(6,2)
`inserted_at` timestampFor a given category ID, I am attempting to retrieve a list containing the vendor with the lowest latest price for each subcategory. With "latest" I mean that vendors may have multiple prices for a given category ID/subcategory ID combination, so only the most recently inserted price for that category ID/subcategory ID/vendor ID should be used. If there's a tie between 2 or more vendor's prices, the lowest id should be used as the tie-breaker.
For example, with this data:
id | category_id | subcategory_id | vendor_id | price | inserted_at
---------------------------------------------------------------------------
1 | 1 | 2 | 3 | 16.00 | 2015-07-23 04:00:00
2 | 1 | 1 | 2 | 9.00 | 2015-07-26 08:00:00
3 | 1 | 2 | 4 | 16.00 | 2015-08-02 10:00:00
4 | 1 | 1 | 1 | 7.00 | 2015-08-04 11:00:00
5 | 1 | 1 | 1 | 11.00 | 2015-08-09 16:00:00So, first find the most recent prices for every subcategory/vendor combination (row with
price=7.00 would be removed because it's not the most recent for that vendor in that subcategory). Then for subcategory 1 the lowest price would be 9 (so vendor_id = 2) and for subcategory 2 the lowest price is 16 (two vendors tie ()ids 3 and 4) so we choose the one with lowest vendor_id = 3).I would expect the following results for
category_id = 1:subcategory_id | vendor_id | price
----------------------------------
1 | 2 | 9.00
2 | 3 | 16.00Here's what I have so far. I feel like it's already starting to get out of hand and this doesn't even account for ties between 2 or more vendor's prices.
```
SELECT c.subcategory_id
Solution
This should work:
Test:
Explain:
I used the index recommendation from @ypercube.
SELECT
d.subcategory_id,
d.vendor_id,
MIN(d.price) AS price,
d.inserted_at
FROM product AS d
JOIN (SELECT
b.category_id,
b.subcategory_id,
b.vendor_id,
a.last_iat
FROM product AS b
JOIN (SELECT
a.category_id,
a.subcategory_id,
a.vendor_id,
a.price,
MAX(a.inserted_at) AS last_iat
FROM product AS a
GROUP BY a.category_id,a.subcategory_id,a.vendor_id
) AS a
ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
WHERE d.category_id=1
GROUP BY d.category_id,d.subcategory_id;Test:
mysql> SELECT
-> d.subcategory_id,
-> d.vendor_id,
-> MIN(d.price) AS price,
-> d.inserted_at
-> FROM product AS d
-> JOIN (SELECT
-> b.category_id,
-> b.subcategory_id,
-> b.vendor_id,
-> a.last_iat
-> FROM product AS b
-> JOIN (SELECT
-> a.category_id,
-> a.subcategory_id,
-> a.vendor_id,
-> a.price,
-> MAX(a.inserted_at) AS last_iat
-> FROM product AS a
-> GROUP BY a.category_id,a.subcategory_id,a.vendor_id
-> ) AS a
-> ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
-> GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
-> ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
-> WHERE d.category_id=1
-> GROUP BY d.category_id,d.subcategory_id;
+----------------+-----------+-------+---------------------+
| subcategory_id | vendor_id | price | inserted_at |
+----------------+-----------+-------+---------------------+
| 1 | 2 | 9.00 | 2015-07-26 08:00:00 |
| 2 | 3 | 16.00 | 2015-07-23 04:00:00 |
+----------------+-----------+-------+---------------------+
2 rows in set (0.00 sec)
mysql>Explain:
I used the index recommendation from @ypercube.
mysql> EXPLAIN SELECT d.subcategory_id, d.vendor_id, MIN(d.price) AS price, d.inserted_at FROM product AS d JOIN (SELECT b.category_id, b.subcategory_id, b.vendor_id, a.last_iat FROM product AS b JOIN (SELECT a.category_id, a.subcategory_id, a.vendor_id, a.price, MAX(a.inserted_at) AS last_iat FROM product AS a GROUP BY a.category_id,a.subcategory_id,a.vendor_id ) AS a ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id) GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at) WHERE d.category_id=1 GROUP BY d.category_id,d.subcategory_id;
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | d | ALL | q_ix | NULL | NULL | NULL | 5 | Using where; Using join buffer |
| 2 | DERIVED | | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 2 | DERIVED | b | ref | q_ix | q_ix | 15 | a.category_id,a.subcategory_id,a.vendor_id | 1 | Using where; Using index |
| 3 | DERIVED | a | index | NULL | q_ix | 19 | NULL | 5 | |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)
mysql>Code Snippets
SELECT
d.subcategory_id,
d.vendor_id,
MIN(d.price) AS price,
d.inserted_at
FROM product AS d
JOIN (SELECT
b.category_id,
b.subcategory_id,
b.vendor_id,
a.last_iat
FROM product AS b
JOIN (SELECT
a.category_id,
a.subcategory_id,
a.vendor_id,
a.price,
MAX(a.inserted_at) AS last_iat
FROM product AS a
GROUP BY a.category_id,a.subcategory_id,a.vendor_id
) AS a
ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
WHERE d.category_id=1
GROUP BY d.category_id,d.subcategory_id;mysql> SELECT
-> d.subcategory_id,
-> d.vendor_id,
-> MIN(d.price) AS price,
-> d.inserted_at
-> FROM product AS d
-> JOIN (SELECT
-> b.category_id,
-> b.subcategory_id,
-> b.vendor_id,
-> a.last_iat
-> FROM product AS b
-> JOIN (SELECT
-> a.category_id,
-> a.subcategory_id,
-> a.vendor_id,
-> a.price,
-> MAX(a.inserted_at) AS last_iat
-> FROM product AS a
-> GROUP BY a.category_id,a.subcategory_id,a.vendor_id
-> ) AS a
-> ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
-> GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
-> ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
-> WHERE d.category_id=1
-> GROUP BY d.category_id,d.subcategory_id;
+----------------+-----------+-------+---------------------+
| subcategory_id | vendor_id | price | inserted_at |
+----------------+-----------+-------+---------------------+
| 1 | 2 | 9.00 | 2015-07-26 08:00:00 |
| 2 | 3 | 16.00 | 2015-07-23 04:00:00 |
+----------------+-----------+-------+---------------------+
2 rows in set (0.00 sec)
mysql>mysql> EXPLAIN SELECT d.subcategory_id, d.vendor_id, MIN(d.price) AS price, d.inserted_at FROM product AS d JOIN (SELECT b.category_id, b.subcategory_id, b.vendor_id, a.last_iat FROM product AS b JOIN (SELECT a.category_id, a.subcategory_id, a.vendor_id, a.price, MAX(a.inserted_at) AS last_iat FROM product AS a GROUP BY a.category_id,a.subcategory_id,a.vendor_id ) AS a ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id) GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at) WHERE d.category_id=1 GROUP BY d.category_id,d.subcategory_id;
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | d | ALL | q_ix | NULL | NULL | NULL | 5 | Using where; Using join buffer |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 2 | DERIVED | b | ref | q_ix | q_ix | 15 | a.category_id,a.subcategory_id,a.vendor_id | 1 | Using where; Using index |
| 3 | DERIVED | a | index | NULL | q_ix | 19 | NULL | 5 | |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)
mysql>Context
StackExchange Database Administrators Q#111334, answer score: 6
Revisions (0)
No revisions yet.