HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Nested aggregation and grouping on multiple columns in MySQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
columnsgroupingnestedmysqlmultipleandaggregation

Problem

Consider the following product table (which is highly trimmed down):

`id` int AUTO_INCREMENT
`category_id` int
`subcategory_id` int
`vendor_id` int
`price` decimal(6,2)
`inserted_at` timestamp


For a given category ID, I am attempting to retrieve a list containing the vendor with the lowest latest price for each subcategory. With "latest" I mean that vendors may have multiple prices for a given category ID/subcategory ID combination, so only the most recently inserted price for that category ID/subcategory ID/vendor ID should be used. If there's a tie between 2 or more vendor's prices, the lowest id should be used as the tie-breaker.

For example, with this data:

id | category_id | subcategory_id | vendor_id | price | inserted_at
---------------------------------------------------------------------------
1  | 1           | 2              | 3         | 16.00 | 2015-07-23 04:00:00  
2  | 1           | 1              | 2         |  9.00 | 2015-07-26 08:00:00 
3  | 1           | 2              | 4         | 16.00 | 2015-08-02 10:00:00 
4  | 1           | 1              | 1         |  7.00 | 2015-08-04 11:00:00
5  | 1           | 1              | 1         | 11.00 | 2015-08-09 16:00:00


So, first find the most recent prices for every subcategory/vendor combination (row with price=7.00 would be removed because it's not the most recent for that vendor in that subcategory). Then for subcategory 1 the lowest price would be 9 (so vendor_id = 2) and for subcategory 2 the lowest price is 16 (two vendors tie ()ids 3 and 4) so we choose the one with lowest vendor_id = 3).

I would expect the following results for category_id = 1:

subcategory_id | vendor_id | price
----------------------------------
1              | 2         |  9.00
2              | 3         | 16.00


Here's what I have so far. I feel like it's already starting to get out of hand and this doesn't even account for ties between 2 or more vendor's prices.

```
SELECT c.subcategory_id

Solution

This should work:

SELECT
    d.subcategory_id,
    d.vendor_id,
    MIN(d.price) AS price,
    d.inserted_at
FROM product AS d
JOIN (SELECT
        b.category_id,
        b.subcategory_id,
        b.vendor_id,
        a.last_iat
    FROM product AS b 
    JOIN (SELECT
            a.category_id,
            a.subcategory_id,
            a.vendor_id,
            a.price,
            MAX(a.inserted_at) AS last_iat
        FROM product AS a
        GROUP BY a.category_id,a.subcategory_id,a.vendor_id
        ) AS a
        ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
    GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
    ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
WHERE d.category_id=1
GROUP BY d.category_id,d.subcategory_id;


Test:

mysql> SELECT
    ->     d.subcategory_id,
    ->     d.vendor_id,
    ->     MIN(d.price) AS price,
    ->     d.inserted_at
    -> FROM product AS d
    -> JOIN (SELECT
    ->         b.category_id,
    ->         b.subcategory_id,
    ->         b.vendor_id,
    ->         a.last_iat
    ->     FROM product AS b 
    ->     JOIN (SELECT
    ->             a.category_id,
    ->             a.subcategory_id,
    ->             a.vendor_id,
    ->             a.price,
    ->             MAX(a.inserted_at) AS last_iat
    ->         FROM product AS a
    ->         GROUP BY a.category_id,a.subcategory_id,a.vendor_id
    ->         ) AS a
    ->         ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
    ->     GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
    ->     ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
    -> WHERE d.category_id=1
    -> GROUP BY d.category_id,d.subcategory_id;
+----------------+-----------+-------+---------------------+
| subcategory_id | vendor_id | price | inserted_at         |
+----------------+-----------+-------+---------------------+
|              1 |         2 |  9.00 | 2015-07-26 08:00:00 |
|              2 |         3 | 16.00 | 2015-07-23 04:00:00 |
+----------------+-----------+-------+---------------------+
2 rows in set (0.00 sec)

mysql>


Explain:

I used the index recommendation from @ypercube.

mysql> EXPLAIN SELECT d.subcategory_id, d.vendor_id, MIN(d.price) AS price, d.inserted_at FROM product AS d JOIN (SELECT b.category_id, b.subcategory_id, b.vendor_id, a.last_iat FROM product AS b  JOIN (SELECT a.category_id, a.subcategory_id, a.vendor_id, a.price, MAX(a.inserted_at) AS last_iat FROM product AS a GROUP BY a.category_id,a.subcategory_id,a.vendor_id ) AS a ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id) GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at) WHERE d.category_id=1 GROUP BY d.category_id,d.subcategory_id;
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref                                        | rows | Extra                                        |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     |  | ALL   | NULL          | NULL | NULL    | NULL                                       |    4 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | d          | ALL   | q_ix          | NULL | NULL    | NULL                                       |    5 | Using where; Using join buffer               |
|  2 | DERIVED     |  | ALL   | NULL          | NULL | NULL    | NULL                                       |    4 | Using temporary; Using filesort              |
|  2 | DERIVED     | b          | ref   | q_ix          | q_ix | 15      | a.category_id,a.subcategory_id,a.vendor_id |    1 | Using where; Using index                     |
|  3 | DERIVED     | a          | index | NULL          | q_ix | 19      | NULL                                       |    5 |                                              |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)

mysql>

Code Snippets

SELECT
    d.subcategory_id,
    d.vendor_id,
    MIN(d.price) AS price,
    d.inserted_at
FROM product AS d
JOIN (SELECT
        b.category_id,
        b.subcategory_id,
        b.vendor_id,
        a.last_iat
    FROM product AS b 
    JOIN (SELECT
            a.category_id,
            a.subcategory_id,
            a.vendor_id,
            a.price,
            MAX(a.inserted_at) AS last_iat
        FROM product AS a
        GROUP BY a.category_id,a.subcategory_id,a.vendor_id
        ) AS a
        ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
    GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
    ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
WHERE d.category_id=1
GROUP BY d.category_id,d.subcategory_id;
mysql> SELECT
    ->     d.subcategory_id,
    ->     d.vendor_id,
    ->     MIN(d.price) AS price,
    ->     d.inserted_at
    -> FROM product AS d
    -> JOIN (SELECT
    ->         b.category_id,
    ->         b.subcategory_id,
    ->         b.vendor_id,
    ->         a.last_iat
    ->     FROM product AS b 
    ->     JOIN (SELECT
    ->             a.category_id,
    ->             a.subcategory_id,
    ->             a.vendor_id,
    ->             a.price,
    ->             MAX(a.inserted_at) AS last_iat
    ->         FROM product AS a
    ->         GROUP BY a.category_id,a.subcategory_id,a.vendor_id
    ->         ) AS a
    ->         ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
    ->     GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
    ->     ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
    -> WHERE d.category_id=1
    -> GROUP BY d.category_id,d.subcategory_id;
+----------------+-----------+-------+---------------------+
| subcategory_id | vendor_id | price | inserted_at         |
+----------------+-----------+-------+---------------------+
|              1 |         2 |  9.00 | 2015-07-26 08:00:00 |
|              2 |         3 | 16.00 | 2015-07-23 04:00:00 |
+----------------+-----------+-------+---------------------+
2 rows in set (0.00 sec)

mysql>
mysql> EXPLAIN SELECT d.subcategory_id, d.vendor_id, MIN(d.price) AS price, d.inserted_at FROM product AS d JOIN (SELECT b.category_id, b.subcategory_id, b.vendor_id, a.last_iat FROM product AS b  JOIN (SELECT a.category_id, a.subcategory_id, a.vendor_id, a.price, MAX(a.inserted_at) AS last_iat FROM product AS a GROUP BY a.category_id,a.subcategory_id,a.vendor_id ) AS a ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id) GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at) WHERE d.category_id=1 GROUP BY d.category_id,d.subcategory_id;
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref                                        | rows | Extra                                        |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL | NULL    | NULL                                       |    4 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | d          | ALL   | q_ix          | NULL | NULL    | NULL                                       |    5 | Using where; Using join buffer               |
|  2 | DERIVED     | <derived3> | ALL   | NULL          | NULL | NULL    | NULL                                       |    4 | Using temporary; Using filesort              |
|  2 | DERIVED     | b          | ref   | q_ix          | q_ix | 15      | a.category_id,a.subcategory_id,a.vendor_id |    1 | Using where; Using index                     |
|  3 | DERIVED     | a          | index | NULL          | q_ix | 19      | NULL                                       |    5 |                                              |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)

mysql>

Context

StackExchange Database Administrators Q#111334, answer score: 6

Revisions (0)

No revisions yet.