patternsqlMinor
Measure time duration between events, partitioned by hours
Viewed 0 times
eventshourstimebetweendurationmeasurepartitioned
Problem
I have the following table in PostgreSQL 9.4, which persists users log on/off events (log on is
Sample data:
I'd like to calculate the number of minutes each user was logged-on per hour:
This is my WIP version. It's not really elegant, it still has the hour hard coded in, ignores user_ids and assumes online-offline events are consecutive:
How to do this properly?
event_type 1, log off is event_type 0).CREATE TABLE user_online_offline_events (
id serial,
user_id int4,
event_type int4,
created_at timestamp
);Sample data:
INSERT INTO user_online_offline_events
(id, user_id, event_type, created_at)
VALUES (1, 123, 1, '2015-10-07 12:15:00'),
(2, 123, 0, '2015-10-07 12:25:00'),
(3, 123, 1, '2015-10-07 12:45:00'),
(4, 123, 0, '2015-10-07 13:10:00');I'd like to calculate the number of minutes each user was logged-on per hour:
| id | user_id | time | minutes logged on |
+----+---------+----------+-------------------+
| 1 | 123 | 12:00:00 | 30 |
| 2 | 123 | 13:00:00 | 10 |This is my WIP version. It's not really elegant, it still has the hour hard coded in, ignores user_ids and assumes online-offline events are consecutive:
select time, sum(minutes) / 60 as minutes from (
SELECT
date_trunc('hour', time) as time,
CASE
WHEN event_type = 0 AND lag(event_type, 1) OVER w = 1
THEN
extract(EPOCH FROM time - lag(time, 1) OVER w)
WHEN event_type = 0 AND lag(event_type, 1) OVER w ISNULL
THEN
extract(EPOCH FROM time - date_trunc('hour', created_at))
WHEN event_type = 1 AND lead(event_type, 1) OVER w ISNULL
THEN
extract(EPOCH FROM date_trunc('hour', time) + INTERVAL '1 hour' - time)
ELSE 0
END AS minutes
FROM user_online_offline_events
WHERE date_trunc('hour', time) = '2015-10-07 12:00:00'
WINDOW w AS ( ORDER BY time )
ORDER BY time
) m group by time;How to do this properly?
Solution
This is more sophisticated than I first understood. One way to handle it is with
fiddle - with extended data
Old sqlfiddle
Major points
Define the outer time frame in
Minutes per hours of the day in multiple days are summed up - if the outer time frame spans multiple days.
Hours without any online time are not in the result. If you need that use
Using the overlap operator
Resulting times are given as
Basic timestamp ranges are computed naively with the window function
You need to define exactly which corner cases can exist and how you want to handle them. There is a solution either way.
generate-series(), range types and associated functions and operators:SELECT user_id, hour, sum(upper(min_on) - lower(min_on)) AS minutes_on
FROM (
SELECT user_id, hour, u.range * h.range AS min_on
FROM (
SELECT hour, tsrange(hour, hour + interval '1h') AS range
FROM generate_series(timestamp '2015-10-07 00:00' -- defines range of interest
, timestamp '2015-10-08 03:00' -- cut off the rest
, interval '1h') hour
) h
JOIN (
SELECT user_id, event_type
, tsrange(created_at
, lead(created_at) OVER (PARTITION BY user_id ORDER BY created_at)
) AS range
FROM user_online_offline_events
-- add WHERE conditions to limit selection
-- careful with cutting off leading "on" / trailing "off" events
) u ON u.event_type = 1 AND u.range && h.range
) sub
GROUP BY user_id, hour
ORDER BY user_id, hour;fiddle - with extended data
Old sqlfiddle
Major points
Define the outer time frame in
generate_series() once. The function generates one row per hour.Minutes per hours of the day in multiple days are summed up - if the outer time frame spans multiple days.
Hours without any online time are not in the result. If you need that use
LEFT JOIN instead of JOINUsing the overlap operator
&& for ranges to identify matching hours and the intersection operator * to calculate actual overlaps.Resulting times are given as
interval. Since multiple days might be summed, the result can be more than one hour and a simple EXTRACT() would not capture that. You can extract minutes as integer number this way:EXTRACT('epoch' FROM sum(upper(min_on) - lower(min_on))) / 60 AS minutes_onBasic timestamp ranges are computed naively with the window function
lead(). This assumes that every log-on is followed by a log-off, which typically misses special cases:- Multiple log-on / log-off events for the same user in a row
- No matching log-on / log-off
You need to define exactly which corner cases can exist and how you want to handle them. There is a solution either way.
Code Snippets
SELECT user_id, hour, sum(upper(min_on) - lower(min_on)) AS minutes_on
FROM (
SELECT user_id, hour, u.range * h.range AS min_on
FROM (
SELECT hour, tsrange(hour, hour + interval '1h') AS range
FROM generate_series(timestamp '2015-10-07 00:00' -- defines range of interest
, timestamp '2015-10-08 03:00' -- cut off the rest
, interval '1h') hour
) h
JOIN (
SELECT user_id, event_type
, tsrange(created_at
, lead(created_at) OVER (PARTITION BY user_id ORDER BY created_at)
) AS range
FROM user_online_offline_events
-- add WHERE conditions to limit selection
-- careful with cutting off leading "on" / trailing "off" events
) u ON u.event_type = 1 AND u.range && h.range
) sub
GROUP BY user_id, hour
ORDER BY user_id, hour;EXTRACT('epoch' FROM sum(upper(min_on) - lower(min_on))) / 60 AS minutes_onContext
StackExchange Database Administrators Q#117256, answer score: 3
Revisions (0)
No revisions yet.