patternsqlMinor
Split up interval into year slices
Viewed 0 times
slicesyearintointervalsplit
Problem
Being a novice Postgres user, I have a table
I need to split the range into one year windows in separate rows (from start year to the last year of interval) stored in database like this:
Using the operator
I avoided this question because I don't want to go for a stored procedure based approach. I am aware that
ad in my PostgreSQL 9.5 (x64) database with 87 rows. In addition to other columns, it has two columns 'start' and 'end' having date-time duration range like this:ID Start End
1 2003-06-07 00:00:00 2004-09-30 23:59:59I need to split the range into one year windows in separate rows (from start year to the last year of interval) stored in database like this:
ID Start
1_2003 2003-06-07 00:00:00 2003-12-31 23:59:59
1_2004 2004-01-01 00:00:00 2004-09-30 23:59:59Using the operator
||'_'|| and Extract() function, I am able to concatenate ID with year. Also, this question addresses how to split up in interval in weeks and this one shows how to do the same for days but none of them addresses how to split up an interval in years specifically.I avoided this question because I don't want to go for a stored procedure based approach. I am aware that
generate_series() returns a series from start and stop parameters but actually I am struggling to break the interval at last day of the year and then re-starting from first day of the year in next row. I would highly appreciate if someone could guide me to this?Solution
A couple of suggestions:
-
Use legal, lower-case, unquoted identifiers to save yourself a lot of confusion.
-
Your
-
Since you don't need a time component in
-
Why concatenate
-
Typically,
So, your table could look like this:
Proper test values:
Solution
Use
Note that
Result:
If you create a new table from the result, I suggest
Aside: This is not an operator:
-
Use legal, lower-case, unquoted identifiers to save yourself a lot of confusion.
end (and, to a lesser degree start) are reserved words.-
Your
id column seems to be numeric type like integer, not text.-
Since you don't need a time component in
start_day and end_day, the appropriate data type is date, not timestamp.-
Why concatenate
id and the year for a new "ID"? Add a second column year if you need it for a PK. Or don't add an additional column at all. It can cheaply be extracted from the new start_day on the fly. Generally, don't store data redundantly if you can avoid it.-
Typically,
timestamp ranges use an inclusive lower bound and an exclusive upper bound. Since timestamps can have fractional digits (up to 6 in Postgres) that is much cleaner. Your input 2003-12-31 23:59:59 would fail for 2003-12-31 23:59:59.123.So, your table could look like this:
CREATE TABLE ad (
id int PRIMARY KEY
, start_day date NOT NULL -- *inclusive* lower bound
, end_day date NOT NULL -- *exclusive* upper bound
CHECK (end_day > start_day) -- enforce legal input
);Proper test values:
INSERT INTO ad(id, start_day, end_day)
VALUES
(1, '2003-06-07', '2004-10-01') -- span 2 years (your example)
, (2, '2003-06-07', '2003-06-08') -- 1 day in same year
, (3, '2003-06-07', '2003-10-01') -- span 1 year
, (4, '2003-06-07', '2006-10-01'); -- span many yearsSolution
Use
generate_series() in a LATERAL join, based on start and end day, truncated to the year with date_trunc(). This produces one row per year with the new start date. Add a year and you have the new end date. Except for first and last row per id, where you substitute the proper start / end with GREATEST and LEAST respectively. Voilá.-- CREATE TABLE ad_year AS
SELECT ad.id
, extract('year' FROM y)::int AS year
, GREATEST(y , ad.start_day) AS start_day
, LEAST (y + interval '1 year', ad.end_day) AS end_day
FROM ad
, generate_series(date_trunc('year', start_day::timestamp) -- cast to ts here!
, date_trunc('year', end_day::timestamp)
, interval '1 year') y;Note that
date_trunc() returns timestamptz for timestamptz input and timestamp for timestamp input. For date input it defaults to timestamptz. Since you seem to be ignoring time zones, cast the date to timestamp explicitly (start_day::timestamp).Result:
id | year | start_day | end_day
----+------+---------------------+---------------------
1 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
1 | 2004 | 2004-01-01 00:00:00 | 2004-10-01 00:00:00
2 | 2003 | 2003-06-07 00:00:00 | 2003-06-08 00:00:00
3 | 2003 | 2003-06-07 00:00:00 | 2003-10-01 00:00:00
4 | 2003 | 2003-06-07 00:00:00 | 2004-01-01 00:00:00
4 | 2004 | 2004-01-01 00:00:00 | 2005-01-01 00:00:00
4 | 2005 | 2005-01-01 00:00:00 | 2006-01-01 00:00:00
4 | 2006 | 2006-01-01 00:00:00 | 2006-10-01 00:00:00
If you create a new table from the result, I suggest
(id, year) as primary key.Aside: This is not an operator:
||'_'|| (nor a cute little face). It's 2 concatenation operators || and a string literal '_'.Code Snippets
CREATE TABLE ad (
id int PRIMARY KEY
, start_day date NOT NULL -- *inclusive* lower bound
, end_day date NOT NULL -- *exclusive* upper bound
CHECK (end_day > start_day) -- enforce legal input
);INSERT INTO ad(id, start_day, end_day)
VALUES
(1, '2003-06-07', '2004-10-01') -- span 2 years (your example)
, (2, '2003-06-07', '2003-06-08') -- 1 day in same year
, (3, '2003-06-07', '2003-10-01') -- span 1 year
, (4, '2003-06-07', '2006-10-01'); -- span many years-- CREATE TABLE ad_year AS
SELECT ad.id
, extract('year' FROM y)::int AS year
, GREATEST(y , ad.start_day) AS start_day
, LEAST (y + interval '1 year', ad.end_day) AS end_day
FROM ad
, generate_series(date_trunc('year', start_day::timestamp) -- cast to ts here!
, date_trunc('year', end_day::timestamp)
, interval '1 year') y;Context
StackExchange Database Administrators Q#145996, answer score: 7
Revisions (0)
No revisions yet.