
Avoid duplication when using other table in WHERE and JOIN clause in Postgres

Submitted by: @import:stackexchange-dba

Problem

I have three tables: devices, shops, and device_shops_versions. Each device can have many rows in device_shops_versions.

device_shops_versions table:

+----+---------+-----------+---------+---------------------------+
| id | version | device_id | shop_id | updated_at                |
+----+---------+-----------+---------+---------------------------+
| 1  | 113     | 1         | 1       | 2014-05-05 17:03:25.04055 |
| 2  | 702     | 1         | 1       | 2015-05-05 17:03:25.04055 |
| 3  | 410     | 2         | 1       | 2014-05-30 09:29:44.88214 |
| 4  | 440     | 4         | 2       | 2013-06-30 08:28:42.98214 |
+----+---------+-----------+---------+---------------------------+


1) I want to return all devices whose device_id appears in device_shops_versions for a specific shop_id.

2) For every such device, I want to display only its most recent version, that is, the row in device_shops_versions with the latest updated_at.

I should have something like:

+----+--------------+---------+-----------+---------+--------------------------+
| id | device_model | version | device_id | shop_id | updated_at               |
+----+--------------+---------+-----------+---------+--------------------------+
| 1  | 'iphone'     | 702     | 1         | 1       | 2015-05-05 17:03:25.040  |
| 2  | 'test'       | 410     | 2         | 1       | 2014-05-30 09:29:44.882  |
+----+--------------+---------+-----------+---------+--------------------------+


My query looks like this, but I'm not sure it is correct. I would also like to know whether I can avoid referencing device_shops_versions twice:

SELECT *
FROM devices
INNER JOIN
  (SELECT device_shops_versions.device_id,
          MAX(device_shops_versions.updated_at)
   FROM device_shops_versions
   GROUP BY device_id ) dcv ON devices.id = dcv.device_id
WHERE devices.id IN
    (SELECT device_shops_versions.device_id
     FROM device_shops_versions
     WHERE device_shops_versions.shop_id = 1);

Solution

Use DISTINCT ON. It is considerably shorter, simpler, and probably also faster:

SELECT DISTINCT ON (device_id) *
FROM   device_shops_versions
WHERE  shop_id = 1
ORDER  BY device_id, updated_at DESC NULLS LAST;


Detailed explanation:

  • Select first row in each GROUP BY group?
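
To see which rows DISTINCT ON keeps, the query can be run against the sample rows from the question. The following is a minimal sketch rather than part of the original answer: the column types are assumptions (the question does not state them), and a temporary table is used so nothing permanent is created. PostgreSQL sorts the rows by the ORDER BY list and keeps the first row for each distinct device_id, which is why the leading ORDER BY expression has to match the DISTINCT ON expression.

```sql
-- Assumed column types; the question does not state them.
CREATE TEMP TABLE device_shops_versions (
    id         integer PRIMARY KEY,
    version    integer,
    device_id  integer,
    shop_id    integer,
    updated_at timestamp
);

-- Sample rows copied from the question.
INSERT INTO device_shops_versions (id, version, device_id, shop_id, updated_at) VALUES
    (1, 113, 1, 1, '2014-05-05 17:03:25.04055'),
    (2, 702, 1, 1, '2015-05-05 17:03:25.04055'),
    (3, 410, 2, 1, '2014-05-30 09:29:44.88214'),
    (4, 440, 4, 2, '2013-06-30 08:28:42.98214');

-- One row per device_id for shop 1: the row with the latest updated_at.
SELECT DISTINCT ON (device_id) *
FROM   device_shops_versions
WHERE  shop_id = 1
ORDER  BY device_id, updated_at DESC NULLS LAST;

-- Result:
--  id | version | device_id | shop_id | updated_at
--   2 |     702 |         1 |       1 | 2015-05-05 17:03:25.04055
--   3 |     410 |         2 |       1 | 2014-05-30 09:29:44.88214
```

NULLS LAST only matters if updated_at can be NULL. With DESC, PostgreSQL sorts NULL values first by default, so without it a row lacking a timestamp would be picked as the "latest" one.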



If your table is big, you have many rows per device_id, and you need to optimize performance, a recursive CTE might be faster still. Detailed instructions:

  • Optimize GROUP BY query to retrieve latest record per user
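
For orientation, the recursive approach could look roughly like the sketch below, adapted to this table. It is not taken from the original answer. It assumes device_id is never NULL, and the index (including its name) is a hypothetical addition that the technique relies on. Each step fetches the latest row for the next higher device_id, so the query touches one index entry per device instead of scanning every version row.

```sql
-- Hypothetical supporting index; the name and column order are assumptions.
CREATE INDEX device_shops_versions_shop_device_updated_idx
    ON device_shops_versions (shop_id, device_id, updated_at DESC NULLS LAST);

WITH RECURSIVE latest AS (
   (  -- parentheses are required because this branch has its own ORDER BY / LIMIT
   SELECT id, version, device_id, shop_id, updated_at
   FROM   device_shops_versions
   WHERE  shop_id = 1
   ORDER  BY device_id, updated_at DESC NULLS LAST
   LIMIT  1                           -- latest row of the smallest device_id
   )
   UNION ALL
   SELECT n.*
   FROM   latest l
   CROSS  JOIN LATERAL (
      SELECT v.id, v.version, v.device_id, v.shop_id, v.updated_at
      FROM   device_shops_versions v
      WHERE  v.shop_id = 1            -- repeat the filter in every step
      AND    v.device_id > l.device_id  -- jump to the next device
      ORDER  BY v.device_id, v.updated_at DESC NULLS LAST
      LIMIT  1
      ) n                             -- no row found ends the recursion
   )
SELECT *
FROM   latest
ORDER  BY device_id;
```

Whether this beats DISTINCT ON depends on the data distribution. It tends to pay off only with few distinct devices and many versions per device, so it is worth comparing both with EXPLAIN ANALYZE.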



Make it a subquery to integrate in a bigger query:

SELECT mv.id, d.device_name, u.user_name, mv.version, mv.device_id, mv.shop_id, mv.updated_at
FROM  (
   SELECT DISTINCT ON (device_id) *
   FROM   device_shops_versions
   WHERE  shop_id = 1
   ORDER  BY device_id, updated_at DESC NULLS LAST
   ) mv
JOIN   devices d ON d.id = mv.device_id
JOIN   users   u ON u.id = d.user_id
-- ORDER BY ???;  -- optional


Demo on rextester.

It's typically cheaper to join to more tables after excluding irrelevant rows.
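
As an illustration of that ordering, here is a sketch that produces the shape of result asked for in the question by joining only to devices. It assumes device_shops_versions exists as described in the question and that devices has just the two columns visible in the expected output (id and device_model); the column types and the temporary table are assumptions for the demo.

```sql
-- Assumed minimal devices table; only id and device_model appear in the question.
CREATE TEMP TABLE devices (
    id           integer PRIMARY KEY,
    device_model text
);

INSERT INTO devices (id, device_model) VALUES
    (1, 'iphone'),
    (2, 'test');

-- Reduce device_shops_versions to one row per device first, then join.
SELECT d.id, d.device_model, mv.version, mv.device_id, mv.shop_id, mv.updated_at
FROM  (
   SELECT DISTINCT ON (device_id) *
   FROM   device_shops_versions
   WHERE  shop_id = 1
   ORDER  BY device_id, updated_at DESC NULLS LAST
   ) mv
JOIN   devices d ON d.id = mv.device_id
ORDER  BY d.id;
```

The subquery filters by shop_id and collapses the versions before the join, so devices is only probed for the devices that actually appear in the result.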

Code Snippets

SELECT DISTINCT ON (device_id) *
FROM   device_shops_versions
WHERE  shop_id = 1
ORDER  BY device_id, updated_at DESC NULLS LAST;

SELECT mv.id, d.device_name, u.user_name, mv.version, mv.device_id, mv.shop_id, mv.updated_at
FROM  (
   SELECT DISTINCT ON (device_id) *
   FROM   device_shops_versions
   WHERE  shop_id = 1
   ORDER  BY device_id, updated_at DESC NULLS LAST
   ) mv
JOIN   devices d ON d.id = mv.device_id
JOIN   users   u ON u.id = d.user_id
-- ORDER BY ???;  -- optional

Context

StackExchange Database Administrators Q#165560, answer score: 3
