patternsqlMinor
Group only certain rows with GROUP BY
Viewed 0 times
rowsgroupwithcertainonly
Problem
Schema
I have the following set-up in MySQL database:
Problem
I need to select:
Expected results
Possible solution 1
Two queries with
`/ EXPLAIN /
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | items | ref | items_group_id_idx | items_group_id_idx | 5 | const | 3 | Using where |
| 2 | UNION | items | ALL | items_group_id_idx | NULL | NULL | NULL | 7 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | | ALL | NULL | N
I have the following set-up in MySQL database:
CREATE TABLE items (
id SERIAL,
name VARCHAR(100),
group_id INT,
price DECIMAL(10,2),
KEY items_group_id_idx (group_id),
PRIMARY KEY (id)
);
INSERT INTO items VALUES
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1, 40),
(5, 'Item E', 2, 50),
(6, 'Item F', 2, 60),
(7, 'Item G', 2, 70);Problem
I need to select:
- All items with
group_idthat hasNULLvalue, and
- One item from each group identified by
group_idhaving the lowest price.
Expected results
+----+--------+----------+-------+
| id | name | group_id | price |
+----+--------+----------+-------+
| 1 | Item A | NULL | 10.00 |
| 2 | Item B | NULL | 20.00 |
| 3 | Item C | NULL | 30.00 |
| 4 | Item D | 1 | 40.00 |
| 5 | Item E | 2 | 50.00 |
+----+--------+----------+-------+
Possible solution 1
Two queries with
UNION ALL:SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT id, name, MIN(price) FROM items
WHERE group_id IS NOT NULL
GROUP BY group_id;`/ EXPLAIN /
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | items | ref | items_group_id_idx | items_group_id_idx | 5 | const | 3 | Using where |
| 2 | UNION | items | ALL | items_group_id_idx | NULL | NULL | NULL | 7 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | | ALL | NULL | N
Solution
First, do not use either of your two queries. Both have a group by some column (
Second, the
Now, the problem of "groupwise-max" or "greatest-n-per-group" has many solutions (and even a tag, both at SO and here). There are two sub-problems, depending on whether ties can happen and what the wanted results are in those cases.
If you want all the tied rows, the solution with
An index on
GROUP BY group_id) and then select other columns, non-aggregated (SELECT id, name). This may give you wrong and unexpected results, despite that it may work in your tests, with some small sized table. Second, the
UNION ALL is not a problem. If the two subqueries perform efficiently, then the final union is ok, too. If you need a sort, the efficiency will depend on how that sort differs from the indexes used.Now, the problem of "groupwise-max" or "greatest-n-per-group" has many solutions (and even a tag, both at SO and here). There are two sub-problems, depending on whether ties can happen and what the wanted results are in those cases.
If you want all the tied rows, the solution with
GROUP BY inside a derived table is usually good. In your case, that you want just one row returned per group, another approach is easier to write and usually performs very well when there is a small number of group overall:SELECT id, name, price
FROM items
WHERE group_id IS NULL
UNION ALL
SELECT i.id, i.name, i.price
FROM
( SELECT DISTINCT group_id
FROM items
WHERE group_id IS NOT NULL
) AS di
JOIN
items AS i
ON i.id =
( SELECT id
FROM items
WHERE group_id = di.group_id
ORDER BY price, id -- order for resolving ties
LIMIT 1
)
ORDER BY
; -- final orderAn index on
(group_id, price, id) will be helpfulCode Snippets
SELECT id, name, price
FROM items
WHERE group_id IS NULL
UNION ALL
SELECT i.id, i.name, i.price
FROM
( SELECT DISTINCT group_id
FROM items
WHERE group_id IS NOT NULL
) AS di
JOIN
items AS i
ON i.id =
( SELECT id
FROM items
WHERE group_id = di.group_id
ORDER BY price, id -- order for resolving ties
LIMIT 1
)
ORDER BY
<some_columns> ; -- final orderContext
StackExchange Database Administrators Q#132242, answer score: 4
Revisions (0)
No revisions yet.