HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Group only certain rows with GROUP BY

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsgroupwithcertainonly

Problem

Schema

I have the following set-up in MySQL database:

CREATE TABLE items (
  id SERIAL,
  name VARCHAR(100),
  group_id INT,
  price DECIMAL(10,2),
  KEY items_group_id_idx (group_id),
  PRIMARY KEY (id)
);

INSERT INTO items VALUES 
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1,    40),
(5, 'Item E', 2,    50),
(6, 'Item F', 2,    60),
(7, 'Item G', 2,    70);


Problem

I need to select:

  • All items with group_id that has NULL value, and



  • One item from each group identified by group_id having the lowest price.



Expected results

+----+--------+----------+-------+
| id | name | group_id | price |
+----+--------+----------+-------+
| 1 | Item A | NULL | 10.00 |
| 2 | Item B | NULL | 20.00 |
| 3 | Item C | NULL | 30.00 |
| 4 | Item D | 1 | 40.00 |
| 5 | Item E | 2 | 50.00 |
+----+--------+----------+-------+


Possible solution 1

Two queries with UNION ALL:

SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT id, name, MIN(price) FROM items
WHERE group_id IS NOT NULL
GROUP BY group_id;


`/ EXPLAIN /
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | items | ref | items_group_id_idx | items_group_id_idx | 5 | const | 3 | Using where |
| 2 | UNION | items | ALL | items_group_id_idx | NULL | NULL | NULL | 7 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | | ALL | NULL | N

Solution

First, do not use either of your two queries. Both have a group by some column (GROUP BY group_id) and then select other columns, non-aggregated (SELECT id, name). This may give you wrong and unexpected results, despite that it may work in your tests, with some small sized table.

Second, the UNION ALL is not a problem. If the two subqueries perform efficiently, then the final union is ok, too. If you need a sort, the efficiency will depend on how that sort differs from the indexes used.

Now, the problem of "groupwise-max" or "greatest-n-per-group" has many solutions (and even a tag, both at SO and here). There are two sub-problems, depending on whether ties can happen and what the wanted results are in those cases.

If you want all the tied rows, the solution with GROUP BY inside a derived table is usually good. In your case, that you want just one row returned per group, another approach is easier to write and usually performs very well when there is a small number of group overall:

SELECT id, name, price 
FROM items
WHERE group_id IS NULL

UNION ALL

SELECT i.id, i.name, i.price 
FROM 
    ( SELECT DISTINCT group_id 
      FROM items
      WHERE group_id IS NOT NULL
    ) AS di
  JOIN 
    items AS i
  ON  i.id = 
    ( SELECT id
      FROM items 
      WHERE group_id = di.group_id
      ORDER BY price, id             -- order for resolving ties
      LIMIT 1
    ) 
ORDER BY
     ;                 -- final order


An index on (group_id, price, id) will be helpful

Code Snippets

SELECT id, name, price 
FROM items
WHERE group_id IS NULL

UNION ALL

SELECT i.id, i.name, i.price 
FROM 
    ( SELECT DISTINCT group_id 
      FROM items
      WHERE group_id IS NOT NULL
    ) AS di
  JOIN 
    items AS i
  ON  i.id = 
    ( SELECT id
      FROM items 
      WHERE group_id = di.group_id
      ORDER BY price, id             -- order for resolving ties
      LIMIT 1
    ) 
ORDER BY
    <some_columns> ;                 -- final order

Context

StackExchange Database Administrators Q#132242, answer score: 4

Revisions (0)

No revisions yet.