HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Measure time duration between events, partitioned by hours

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
eventshourstimebetweendurationmeasurepartitioned

Problem

I have the following table in PostgreSQL 9.4, which persists users log on/off events (log on is event_type 1, log off is event_type 0).

CREATE TABLE user_online_offline_events (
  id serial,
  user_id int4,
  event_type int4,
  created_at timestamp
);


Sample data:

INSERT INTO user_online_offline_events
       (id, user_id, event_type, created_at)
VALUES (1,  123,     1,         '2015-10-07 12:15:00'),
       (2,  123,     0,         '2015-10-07 12:25:00'),
       (3,  123,     1,         '2015-10-07 12:45:00'),
       (4,  123,     0,         '2015-10-07 13:10:00');


I'd like to calculate the number of minutes each user was logged-on per hour:

| id | user_id |     time | minutes logged on |
+----+---------+----------+-------------------+
|  1 |     123 | 12:00:00 |                30 |
|  2 |     123 | 13:00:00 |                10 |


This is my WIP version. It's not really elegant, it still has the hour hard coded in, ignores user_ids and assumes online-offline events are consecutive:

select time, sum(minutes) / 60 as minutes from (
  SELECT
    date_trunc('hour', time) as time,
    CASE
    WHEN event_type = 0 AND lag(event_type, 1) OVER w = 1
      THEN
        extract(EPOCH FROM time - lag(time, 1) OVER w)
    WHEN event_type = 0 AND lag(event_type, 1) OVER w ISNULL
      THEN
        extract(EPOCH FROM time - date_trunc('hour', created_at))
    WHEN event_type = 1 AND lead(event_type, 1) OVER w ISNULL
      THEN
        extract(EPOCH FROM date_trunc('hour', time) + INTERVAL '1 hour' - time)
    ELSE 0
    END AS minutes
  FROM user_online_offline_events
  WHERE date_trunc('hour', time) = '2015-10-07 12:00:00'
  WINDOW w AS ( ORDER BY time )
  ORDER BY time
) m group by time;


How to do this properly?

Solution

This is more sophisticated than I first understood. One way to handle it is with generate-series(), range types and associated functions and operators:

SELECT user_id, hour, sum(upper(min_on) - lower(min_on)) AS minutes_on
FROM  (
   SELECT user_id, hour, u.range * h.range AS min_on
   FROM  (
      SELECT hour, tsrange(hour, hour + interval '1h') AS range
      FROM   generate_series(timestamp '2015-10-07 00:00'  -- defines range of interest
                           , timestamp '2015-10-08 03:00'  -- cut off the rest
                           , interval  '1h') hour
      ) h
   JOIN  (
      SELECT user_id, event_type
           , tsrange(created_at
                   , lead(created_at) OVER (PARTITION BY user_id ORDER BY created_at)
                    ) AS range
      FROM   user_online_offline_events
      -- add WHERE conditions to limit selection
      -- careful with cutting off leading "on" / trailing "off" events
      ) u ON u.event_type = 1 AND u.range && h.range
   ) sub
GROUP  BY  user_id, hour
ORDER  BY  user_id, hour;


fiddle - with extended data

Old sqlfiddle
Major points

Define the outer time frame in generate_series() once. The function generates one row per hour.

Minutes per hours of the day in multiple days are summed up - if the outer time frame spans multiple days.

Hours without any online time are not in the result. If you need that use LEFT JOIN instead of JOIN

Using the overlap operator && for ranges to identify matching hours and the intersection operator * to calculate actual overlaps.

Resulting times are given as interval. Since multiple days might be summed, the result can be more than one hour and a simple EXTRACT() would not capture that. You can extract minutes as integer number this way:

EXTRACT('epoch' FROM sum(upper(min_on) - lower(min_on))) / 60 AS minutes_on


Basic timestamp ranges are computed naively with the window function lead(). This assumes that every log-on is followed by a log-off, which typically misses special cases:

  • Multiple log-on / log-off events for the same user in a row



  • No matching log-on / log-off



You need to define exactly which corner cases can exist and how you want to handle them. There is a solution either way.

Code Snippets

SELECT user_id, hour, sum(upper(min_on) - lower(min_on)) AS minutes_on
FROM  (
   SELECT user_id, hour, u.range * h.range AS min_on
   FROM  (
      SELECT hour, tsrange(hour, hour + interval '1h') AS range
      FROM   generate_series(timestamp '2015-10-07 00:00'  -- defines range of interest
                           , timestamp '2015-10-08 03:00'  -- cut off the rest
                           , interval  '1h') hour
      ) h
   JOIN  (
      SELECT user_id, event_type
           , tsrange(created_at
                   , lead(created_at) OVER (PARTITION BY user_id ORDER BY created_at)
                    ) AS range
      FROM   user_online_offline_events
      -- add WHERE conditions to limit selection
      -- careful with cutting off leading "on" / trailing "off" events
      ) u ON u.event_type = 1 AND u.range && h.range
   ) sub
GROUP  BY  user_id, hour
ORDER  BY  user_id, hour;
EXTRACT('epoch' FROM sum(upper(min_on) - lower(min_on))) / 60 AS minutes_on

Context

StackExchange Database Administrators Q#117256, answer score: 3

Revisions (0)

No revisions yet.