HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Aggregate discarding values in one column that haven't a match in another column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
columndiscardingmatchonethatanothervaluesaggregatehaven

Problem

Say, I have a table representing colored and labeled items inside numbered boxes.

Each box can not contain more than one item with a particular label, but items with the same label (and the same or a different color) may be unique in other boxes.

Oversimplifying, and using PostgreSQL, we can take the following table:

CREATE TABLE items (
    label character varying,
    color character varying,
    box_number integer
);
INSERT INTO items VALUES
  ('a','red',1),
  ('b','blue',1),
  ('c','blue',1),
  ('a','red',2),
  ('c','green',2),
  ('d','blue',2),
  ('b','red',3),
  ('d','green',3);


I want to know the label and the color of all the items inside the box number 3, but also all the box numbers where an item with the same label can be found. In other words, I'm trying to:

SELECT label, boxes
FROM (
  SELECT label, array_agg(DISTINCT box_number) AS boxes
  FROM items
  GROUP BY label
) AS sub1
WHERE 3 = ANY(boxes);


But I also need to return the color column, showing only the color of the item inside the box number 3.

For the example data, the output should be this:

label | color | boxes
------+-------+------
b | red | 1,3
d | green | 2,3

Here's an SQL Fiddle for the example.

Solution

You can use a correlated subquery:

SELECT label, color
    , (SELECT ARRAY (SELECT box_number FROM items WHERE label = i.label)) AS boxes
FROM   items i
WHERE  box_number = 3;


Or (more modern) with LEFT JOIN LATERAL:

SELECT i.label, i.color, b.boxes
FROM   items i
LEFT   JOIN LATERAL (
   SELECT ARRAY (SELECT box_number FROM items WHERE label = i.label) AS boxes
   ) b ON true
WHERE  box_number = 3;


  • What is the difference between LATERAL and a subquery in PostgreSQL?



Using an array constructor since that is faster than array_agg() for a single column in the SELECT list.

You could also just use a self-join and aggregate:

SELECT i.label, i.color, array_agg(b.box_number) AS boxes
FROM   items i
JOIN   items b USING (label)
WHERE  i.box_number = 3
GROUP  BY i.label, i.color;


I'd expect that to be a bit slower. Test with EXPLAIN ANALYZE.

SQL Fiddle.

We certainly don't need DISTINCT here, since your data definition says:


Each box can not contain more than one item with a particular label

.. which is surely enforced with a UNIQUE constraint (or equivalent) on (label, box_number)?

This would also automatically provide the second index to make this query fast - besides the obvious index on (box_number). Details:

  • Is a composite index also good for queries on the first field?

Code Snippets

SELECT label, color
    , (SELECT ARRAY (SELECT box_number FROM items WHERE label = i.label)) AS boxes
FROM   items i
WHERE  box_number = 3;
SELECT i.label, i.color, b.boxes
FROM   items i
LEFT   JOIN LATERAL (
   SELECT ARRAY (SELECT box_number FROM items WHERE label = i.label) AS boxes
   ) b ON true
WHERE  box_number = 3;
SELECT i.label, i.color, array_agg(b.box_number) AS boxes
FROM   items i
JOIN   items b USING (label)
WHERE  i.box_number = 3
GROUP  BY i.label, i.color;

Context

StackExchange Database Administrators Q#121095, answer score: 3

Revisions (0)

No revisions yet.