HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Counting entites based on status in rolling interval

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
countingintervalrollingstatusentitesbased

Problem

Given the following table:

CREATE TABLE status_log
(
  entity_id bigint NOT NULL,
  user_id bigint NOT NULL,
  status character varying(128),
  "timestamp" timestamp(6) without time zone
)


I want to query number of entities in time window (grouped by granularity - hours, days, months), specifically number of entities in status before window, number that change into status during window and number that leave status before window ends. My faulty/unoptimized query:

WITH statuses AS (
    SELECT
        status,
        "timestamp",
        lag(status) OVER w AS previous_status,
        lag("timestamp") OVER w AS previous_timestamp,
        lead(status) OVER w AS next_status,
        lead("timestamp") OVER w AS next_timestamp
FROM status_log
WINDOW w AS (PARTITION BY entity_id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
ORDER BY "timestamp" ASC
)
SELECT
    SUM(CASE WHEN previous_timestamp  'a'
AND next_timestamp <= date_trunc('hours', statuses.timestamp) + INTERVAL '1 hour'
THEN 1 ELSE 0 END) AS handled,
    date_trunc('hours', statuses.timestamp) AS interval
FROM statuses
WHERE statuses.timestamp BETWEEN '2016-07-24T00:00:00'::timestamp AND '2016-07-25T23:59:59.999999'::timestamp
GROUP BY date_trunc('hours', statuses.timestamp)
ORDER BY interval;


I think I have incoming and handled counts right. Is there any easier way than using window functions/precalculating entering entities?

UPDATE:

Trying a little bit different query now, still faulty and slow:

```
WITH predecesors AS (
SELECT lag(status) OVER w = 'a' AS valid, lag("timestamp") OVER w AS change, status, "timestamp"
FROM status_log
WINDOW w AS (PARTITION BY entity_id ORDER BY "timestamp" ASC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
), succesors AS (
SELECT status, "timestamp", lead(status) OVER w <> 'a' AS valid, lead("timestamp") OVER w AS change
FROM status_log
WINDOW w AS (PARTITION BY entity_id ORDER BY "timestamp" ASC ROWS BETWEEN 1 PRECEDING

Solution

Get previous status by self join by Row_Number and group by date_part:

WITH Prev_status AS (
   SELECT entity_id,
           user_id,
           status_id,
           row_number () over (entity_id, user_id
                               ORDER BY TIMESTAMP DESC) AS OrderNumber
   FROM Status_log
   WHERE status_log.timestamp BETWEEN '2016-06-25T00:00:00'::TIMESTAMP AND '2016-07-25T23:59:59.999999'::TIMESTAMP 
    ),

     Cur_status AS (
   SELECT entity_id,
           user_id,
           status_id,
           row_number () over (entity_id, user_id
                               ORDER BY TIMESTAMP DESC) NewOrderNumber
   FROM status_log sl
   WHERE status_log.timestamp BETWEEN '2016-06-25T00:00:00'::TIMESTAMP AND '2016-07-25T23:59:59.999999'::TIMESTAMP )

SELECT date_trunc('day',cs.timestamp) AS Date,
       date_part ('hour'cs.timestamp) AS HourInterval,
       SUM (CASE
                WHEN ps.status ='a' THEN 1
                ELSE 0
            END) ) AS Enter,
           SUM (CASE
                    WHEN cs.status ='a' THEN 1
                    ELSE 0
                END) ) AS Incoming,
               SUM (CASE WHEN ps.status = 'a'
                    AND cs.status <>'a' THEN 1 ELSE 0) AS Handeled

FROM Cur_status sc
JOIN Prev_status ps 
    ON cs.entity_id = ps.entity_id
        AND cs.user_id = cs.entity_id
        AND ps.OrderNumber = cs. NewOrdernumber +1

GROUP BY date_trunc('day',cs.timestamp) AS Date,
         date_part ('hour'cs.timestamp) AS HourInterval


This is core.

This script is prone to double or triple counting when same status for Entity/User posted several times during an hour.
Also one time window treated on its own and not aware of status in previous time interval.

If you share example of source data I can update my answer.

Code Snippets

WITH Prev_status AS (
   SELECT entity_id,
           user_id,
           status_id,
           row_number () over (entity_id, user_id
                               ORDER BY TIMESTAMP DESC) AS OrderNumber
   FROM Status_log
   WHERE status_log.timestamp BETWEEN '2016-06-25T00:00:00'::TIMESTAMP AND '2016-07-25T23:59:59.999999'::TIMESTAMP 
    ),

     Cur_status AS (
   SELECT entity_id,
           user_id,
           status_id,
           row_number () over (entity_id, user_id
                               ORDER BY TIMESTAMP DESC) NewOrderNumber
   FROM status_log sl
   WHERE status_log.timestamp BETWEEN '2016-06-25T00:00:00'::TIMESTAMP AND '2016-07-25T23:59:59.999999'::TIMESTAMP )


SELECT date_trunc('day',cs.timestamp) AS Date,
       date_part ('hour'cs.timestamp) AS HourInterval,
       SUM (CASE
                WHEN ps.status ='a' THEN 1
                ELSE 0
            END) ) AS Enter,
           SUM (CASE
                    WHEN cs.status ='a' THEN 1
                    ELSE 0
                END) ) AS Incoming,
               SUM (CASE WHEN ps.status = 'a'
                    AND cs.status <>'a' THEN 1 ELSE 0) AS Handeled

FROM Cur_status sc
JOIN Prev_status ps 
    ON cs.entity_id = ps.entity_id
        AND cs.user_id = cs.entity_id
        AND ps.OrderNumber = cs. NewOrdernumber +1

GROUP BY date_trunc('day',cs.timestamp) AS Date,
         date_part ('hour'cs.timestamp) AS HourInterval

Context

StackExchange Database Administrators Q#144949, answer score: 3

Revisions (0)

No revisions yet.