patternsqlModerate
Extract MIN and MAX values related to datetime values on Postgres 9+
Viewed 0 times
valuespostgresrelatedminmaxextractanddatetime
Problem
I am trying to query a table that has a collection of student grades recorded over time. I would like to produce a result set that grabs the student id, the first grade, first grade date, last grade, last grade date.
I think I need to use the
Is there an effective way to achieve the results below on PostgreSQL?
Database Example:
Results I'm aiming for:
I think I need to use the
MIN and MAX functions and some subqueries to achieve this, but I'm just not getting the results I need.Is there an effective way to achieve the results below on PostgreSQL?
Database Example:
user_id | grade | grade_date
1 | A | 01/05/2016
1 | B | 01/15/2016
1 | C | 01/31/2016
2 | A | 01/05/2016
2 | B | 01/15/2016
2 | C | 01/31/2016
3 | A | 01/05/2016
3 | B | 01/15/2016
3 | C | 01/31/2016
4 | A | 01/05/2016
4 | B | 01/15/2016
4 | C | 01/31/2016Results I'm aiming for:
user_id | first_grade | first_date | last_grade | last_date
1 | A | 01/05/2016 | C | 01/31/2016
2 | A | 01/05/2016 | C | 01/31/2016
3 | A | 01/05/2016 | C | 01/31/2016
4 | A | 01/05/2016 | C | 01/31/2016Solution
There are various ways to do it. An index on columns used in order/filter/join (user_id and grade_date + grade) will play an important role on a large table.
Performances must be tested with real data and table/index design.
Using a window function (
ROW_NUMBER gives each row a number from 1 to N by grade_date up and down and only the first one of each is kept (n=1).
Using subqueries:
Each subquery only keep the first row and returns it.
Using MIN and MAX:
It may generate duplicate lines if a user has more than 1 grade on a first or last date.
See SQL Fiddle.
Performances must be tested with real data and table/index design.
Using a window function (
ROW_NUMBER()):SELECT f.user_id, f.grade, f.grade_date, l.grade, l.grade_date
FROM (
SELECT user_id, grade, grade_date
, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY grade_date) as n
FROM data
) f
INNER JOIN (
SELECT user_id, grade, grade_date
, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY grade_date DESC) as n
FROM data
) l
ON f.user_id = l.user_id
AND f.n = 1 AND l.n = 1;ROW_NUMBER gives each row a number from 1 to N by grade_date up and down and only the first one of each is kept (n=1).
Using subqueries:
SELECT user_id
, ( SELECT grade FROM data
WHERE user_id = d.user_id
ORDER BY grade_date LIMIT 1
)
, ( SELECT grade_date FROM data
WHERE user_id = d.user_id
ORDER BY grade_date LIMIT 1
)
, ( SELECT grade FROM data
WHERE user_id = d.user_id
ORDER BY grade_date DESC LIMIT 1
)
, ( SELECT grade_date FROM data
WHERE user_id = d.user_id
ORDER BY grade_date DESC LIMIT 1
)
FROM (SELECT DISTINCT user_id FROM data) d
;Each subquery only keep the first row and returns it.
Using MIN and MAX:
SELECT d.user_id, mn.grade, mn.grade_date, mx.grade, mx.grade_date
FROM (
SELECT user_id, MIN(grade_date) as min_grade_date, MAX(grade_date) as max_grade_date
FROM data
GROUP BY user_id
) d
INNER JOIN data mn
ON mn.grade_date = d.min_grade_date AND mn.user_id = d.user_id
INNER JOIN data mx
ON mx.grade_date = d.max_grade_date AND mx.user_id = d.user_id
;It may generate duplicate lines if a user has more than 1 grade on a first or last date.
See SQL Fiddle.
Code Snippets
SELECT f.user_id, f.grade, f.grade_date, l.grade, l.grade_date
FROM (
SELECT user_id, grade, grade_date
, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY grade_date) as n
FROM data
) f
INNER JOIN (
SELECT user_id, grade, grade_date
, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY grade_date DESC) as n
FROM data
) l
ON f.user_id = l.user_id
AND f.n = 1 AND l.n = 1;SELECT user_id
, ( SELECT grade FROM data
WHERE user_id = d.user_id
ORDER BY grade_date LIMIT 1
)
, ( SELECT grade_date FROM data
WHERE user_id = d.user_id
ORDER BY grade_date LIMIT 1
)
, ( SELECT grade FROM data
WHERE user_id = d.user_id
ORDER BY grade_date DESC LIMIT 1
)
, ( SELECT grade_date FROM data
WHERE user_id = d.user_id
ORDER BY grade_date DESC LIMIT 1
)
FROM (SELECT DISTINCT user_id FROM data) d
;SELECT d.user_id, mn.grade, mn.grade_date, mx.grade, mx.grade_date
FROM (
SELECT user_id, MIN(grade_date) as min_grade_date, MAX(grade_date) as max_grade_date
FROM data
GROUP BY user_id
) d
INNER JOIN data mn
ON mn.grade_date = d.min_grade_date AND mn.user_id = d.user_id
INNER JOIN data mx
ON mx.grade_date = d.max_grade_date AND mx.user_id = d.user_id
;Context
StackExchange Database Administrators Q#132465, answer score: 10
Revisions (0)
No revisions yet.