snippetModerate
How to get non-overlapping distinct intervals from a PostgreSQL table?
Viewed 0 times
postgresqldistinctnonintervalsoverlappinggethowfromtable
Problem
Using postgresql 9.6.
The table has user sessions and I need distinct non overlapping sessions printed.
sql fiddle
Now I need distinct non-overlapping user sessions, so it should give me:
The table has user sessions and I need distinct non overlapping sessions printed.
CREATE TABLE SESSIONS(
id serial NOT NULL PRIMARY KEY,
ctn INT NOT NULL,
day DATE NOT NULL,
f_time TIME(0) NOT NULL,
l_time TIME(0) NOT NULL
);
INSERT INTO SESSIONS(id, ctn, day, f_time, l_time)
VALUES
(1, 707, '2019-06-18', '10:48:25', '10:56:17'),
(2, 707, '2019-06-18', '10:48:33', '10:56:17'),
(3, 707, '2019-06-18', '10:53:17', '11:00:49'),
(4, 707, '2019-06-18', '10:54:31', '10:57:37'),
(5, 707, '2019-06-18', '11:03:59', '11:10:39'),
(6, 707, '2019-06-18', '11:04:41', '11:08:02'),
(7, 707, '2019-06-18', '11:11:04', '11:19:39');sql fiddle
id ctn day f_time l_time
1 707 2019-06-18 10:48:25 10:56:17
2 707 2019-06-18 10:48:33 10:56:17
3 707 2019-06-18 10:53:17 11:00:49
4 707 2019-06-18 10:54:31 10:57:37
5 707 2019-06-18 11:03:59 11:10:39
6 707 2019-06-18 11:04:41 11:08:02
7 707 2019-06-18 11:11:04 11:19:39Now I need distinct non-overlapping user sessions, so it should give me:
1. start_time: 10:48:25 end_time: 11:00:49 duration: 12min,24 sec
2. start_time: 11:03:59 end_time: 11:10:39 duration: 6min,40 sec
3. start_time: 11:11:04 end_time: 11:19:39 duration: 8min,35 secSolution
To solve this problem I did the following:
"Easy" explanation:
For this part I slightly added to the table definition provided by the OP. I'm firmly of the belief that DDL should be used to the maximum extent possible to "guide" the whole database programming process and could be much more powerful - an example of this would be SQL in
However, that's all very well and good, but we have to deal with PostgreSQL's 9.6 capabilities - the OP's version. My adjusted DDL for the "simple" explanation (see the entire fiddle here):
Indexes:
Just a point to note: do not use SQL keywords as table or column names -
The change to 'f_day' has no effect on the rest of the SQL as we don't take account of sessions which span midnight. Taking account of these can be done relatively easily by doing to the following (see fiddle).
Now the advent of
If a constraint to the second is unworkable - logins at the same time, you could possibly use
The fact remains that some simple DDL like this enormously simplifies your subsequent SQL (see discussion at the end of the "complex" explanation below).
You might also want to put
As for the sample data, I also added a few records for testing my solution as follows:
I'll go through my logic step by step - good for you perhaps, but also for me as it helps me to clarify my thinking and will ensure that the lessons which I have learnt from this exercise will stay with me - "I hear and I forget. I see and I remember. I do and I understand." - Confucius.
All of the following is included in the fiddle.
The first step is to use the
Result:
```
id ctn ft lt ovl
1 707 10:48:25 10:56:17 1
2 707 10:53:17 11:00:49 0
3 707 10:54:31 10:59:43 0
4 707 11:03:59 11:10:39 1
5 707 11:04:41 11:08:02 0
6 707 11:11:04 11:19:39 1
7 707 12:15:15 13:13:13 1
8 707 13:04:41 13:20:02 0
9 707 13:17:17 13:22:22 0
10 707 14:05:17 14:14:14 1
"Easy" explanation:
For this part I slightly added to the table definition provided by the OP. I'm firmly of the belief that DDL should be used to the maximum extent possible to "guide" the whole database programming process and could be much more powerful - an example of this would be SQL in
CHECK constraints - so far only provided by Firebird (example here) and H2 (see reference here).However, that's all very well and good, but we have to deal with PostgreSQL's 9.6 capabilities - the OP's version. My adjusted DDL for the "simple" explanation (see the entire fiddle here):
CREATE TABLE sessions
(
id serial NOT NULL PRIMARY KEY,
ctn INT NOT NULL,
f_day DATE NOT NULL,
f_time TIME(0) NOT NULL,
l_time TIME(0) NOT NULL,
CONSTRAINT ft_less_than_lt_ck CHECK (f_time < l_time),
CONSTRAINT ctn_f_day_f_time_uq UNIQUE (ctn, f_day, f_time),
CONSTRAINT ctn_f_day_l_time_uq UNIQUE (ctn, f_day, l_time)
-- could put in a DISTINCT somewhere if you don't have these constraints
-- maybe has TIME(2) - but see complex solution
);Indexes:
CREATE INDEX ctn_ix ON sessions USING BTREE (ctn ASC);
CREATE INDEX f_day_ix ON sessions USING BTREE (f_day ASC);
CREATE INDEX f_time_ix ON sessions USING BTREE (f_time ASC);Just a point to note: do not use SQL keywords as table or column names -
day is such a keyword! It can be confusing for debugging &c - it's simply not good practice. I've changed your original fieldname of day to f_day - note all lower and python case! Whatever you do, have a standard method of naming variables and stick to it - there are many coding standards documents out there.The change to 'f_day' has no effect on the rest of the SQL as we don't take account of sessions which span midnight. Taking account of these can be done relatively easily by doing to the following (see fiddle).
SELECT (f_day + f_time)::TIMESTAMP FROM sessions;Now the advent of
GENERATED columns, you don't even have to worry about this - just have a GENERATED field as above!If a constraint to the second is unworkable - logins at the same time, you could possibly use
TIME(2) (or 3..6) for ensuring uniqueness. If [you don't want | can't have] UNIQUE constraints, you could put in DISTINCT in your SQL for identical login and logout times - though this is unlikely.The fact remains that some simple DDL like this enormously simplifies your subsequent SQL (see discussion at the end of the "complex" explanation below).
You might also want to put
ctn and/or day in your DDL UNIQUE constraints as shown? I've also added what I think may be good indexes! You might also want to investigate the OVERLAPS operator?As for the sample data, I also added a few records for testing my solution as follows:
INSERT INTO sessions (id, ctn, day, f_time, l_time)
VALUES
( 1, 707, '2019-06-18', '10:48:25', '10:56:17'),
( 2, 707, '2019-06-18', '10:53:17', '11:00:49'),
( 3, 707, '2019-06-18', '10:54:31', '10:59:43'), -- record 3 is completely covered
-- by record 2
( 4, 707, '2019-06-18', '11:03:59', '11:10:39'),
( 5, 707, '2019-06-18', '11:04:41', '11:08:02'), -- GROUP 2 record 6 completely
-- covers record 7
( 6, 707, '2019-06-18', '11:11:04', '11:19:39'), -- GROUP 3
( 7, 707, '2019-06-18', '12:15:15', '13:13:13'),
( 8, 707, '2019-06-18', '13:04:41', '13:20:02'),
( 9, 707, '2019-06-18', '13:17:17', '13:22:22'), -- GROUP 4
(10, 707, '2019-06-18', '14:05:17', '14:14:14'); -- GROUP 5I'll go through my logic step by step - good for you perhaps, but also for me as it helps me to clarify my thinking and will ensure that the lessons which I have learnt from this exercise will stay with me - "I hear and I forget. I see and I remember. I do and I understand." - Confucius.
All of the following is included in the fiddle.
/**
So, the desired result is:
Interval 1 - start: 10:48:25 - end 11:00:49
Interval 2 - start: 11:03:59 - end 11:10:39
Interval 3 - start: 11:11:04 - end 11:19:39
Interval 4 - start: 12:15:15 - end 13:22:22
Interval 5 - start: 14:05:17 - end 14:14:14
**/The first step is to use the
LAG function (documentation) as follows:SELECT
s.id AS id, s.ctn AS ctn, s.f_time AS ft, s.l_time AS lt,
CASE
WHEN LAG(s.l_time) OVER () > f_time THEN 0
ELSE 1
END AS ovl
FROM sessions sResult:
```
id ctn ft lt ovl
1 707 10:48:25 10:56:17 1
2 707 10:53:17 11:00:49 0
3 707 10:54:31 10:59:43 0
4 707 11:03:59 11:10:39 1
5 707 11:04:41 11:08:02 0
6 707 11:11:04 11:19:39 1
7 707 12:15:15 13:13:13 1
8 707 13:04:41 13:20:02 0
9 707 13:17:17 13:22:22 0
10 707 14:05:17 14:14:14 1
Code Snippets
CREATE TABLE sessions
(
id serial NOT NULL PRIMARY KEY,
ctn INT NOT NULL,
f_day DATE NOT NULL,
f_time TIME(0) NOT NULL,
l_time TIME(0) NOT NULL,
CONSTRAINT ft_less_than_lt_ck CHECK (f_time < l_time),
CONSTRAINT ctn_f_day_f_time_uq UNIQUE (ctn, f_day, f_time),
CONSTRAINT ctn_f_day_l_time_uq UNIQUE (ctn, f_day, l_time)
-- could put in a DISTINCT somewhere if you don't have these constraints
-- maybe has TIME(2) - but see complex solution
);CREATE INDEX ctn_ix ON sessions USING BTREE (ctn ASC);
CREATE INDEX f_day_ix ON sessions USING BTREE (f_day ASC);
CREATE INDEX f_time_ix ON sessions USING BTREE (f_time ASC);SELECT (f_day + f_time)::TIMESTAMP FROM sessions;INSERT INTO sessions (id, ctn, day, f_time, l_time)
VALUES
( 1, 707, '2019-06-18', '10:48:25', '10:56:17'),
( 2, 707, '2019-06-18', '10:53:17', '11:00:49'),
( 3, 707, '2019-06-18', '10:54:31', '10:59:43'), -- record 3 is completely covered
-- by record 2
( 4, 707, '2019-06-18', '11:03:59', '11:10:39'),
( 5, 707, '2019-06-18', '11:04:41', '11:08:02'), -- GROUP 2 record 6 completely
-- covers record 7
( 6, 707, '2019-06-18', '11:11:04', '11:19:39'), -- GROUP 3
( 7, 707, '2019-06-18', '12:15:15', '13:13:13'),
( 8, 707, '2019-06-18', '13:04:41', '13:20:02'),
( 9, 707, '2019-06-18', '13:17:17', '13:22:22'), -- GROUP 4
(10, 707, '2019-06-18', '14:05:17', '14:14:14'); -- GROUP 5/**
So, the desired result is:
Interval 1 - start: 10:48:25 - end 11:00:49
Interval 2 - start: 11:03:59 - end 11:10:39
Interval 3 - start: 11:11:04 - end 11:19:39
Interval 4 - start: 12:15:15 - end 13:22:22
Interval 5 - start: 14:05:17 - end 14:14:14
**/Context
StackExchange Database Administrators Q#270952, answer score: 10
Revisions (0)
No revisions yet.