HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to sum up the distinct Total Time of an Event ignoring duplicate overlaps in Times?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
totaldistinctthehowduplicateoverlapstimesumignoringtimes

Problem

I have the following EventTimes table:

DROP TABLE IF EXISTS dbo.EventTimes;
CREATE TABLE dbo.EventTimes
(
  EventTimeKeyId INT IDENTITY(1,1) PRIMARY KEY,
  EventId INT NOT NULL,
  StartTime TIME NOT NULL,
  EndTime TIME NOT NULL
);


With the following data:

-- Event 1
INSERT INTO dbo.EventTimes (EventId, StartTime, EndTime)
VALUES 
  (1, '04:00:00', '14:00:00'), 
  (1, '06:00:00', '11:00:00'), 
  (1, '09:00:00', '12:00:00'), 
  (1, '13:00:00', '14:00:00'), -- Gap between this row and the next row
  (1, '02:30:00', '04:00:00'); -- Notice the half-hour on this one

-- Event 2
INSERT INTO dbo.EventTimes (EventId, StartTime, EndTime)
VALUES 
  (2, '00:00:00', '06:00:00'), -- Gap between this row and the next row
  (2, '09:00:00', '13:00:00'), 
  (2, '11:00:00', '15:00:00');


Notice:

  • The same Event can have two time ranges that overlap each other. E.g. same Event from 4 AM to 2 PM and and also from 6 AM to 11 AM.



  • There can also be gaps between the two time ranges. E.g. same Event from 1 PM to 2 PM and also from 2:30 PM to 4 PM.



End Goal:

I'm trying to calculate the TotalTime of a given Event ignoring the duplicate overlapping time. E.g. for the set of ranges of 9 AM to 1 PM and 11 AM to 3 PM, the distinct TotalTime should be 6 hours (9 AM to 3 PM). Conversely, I also don't want to count the time in the gaps between two time ranges. So for the set of ranges of 1 PM to 2 PM and 2:30 PM to 4 PM the TotalTime should be 2.5 hours. (Note these are just subsets of the full example above, and the final result should be the sum of all of these unique times per Event.)

The TotalTime should never exceed 24 hours, these times are all within a single day (as is the TIME data type).

Expected final results for the provided examples in the scripts above:

dbfiddle.uk for reference.

Other Info:

-
If it's easier to work with datetimes, feel free to alter the data types from TIME to DATETIME. I can convert the results back, no proble

Solution

All the complexity results from the overlapping intervals.

If we pack the intervals so there are no overlaps, the solution is a straightforward group and sum.

The following code uses Itzik Ben-Gan's interval packing solution:

Packing intervals is a classic T-SQL problem that involves packing groups of intersecting intervals into their respective continuous intervals. I set a challenge to myself to try and find an elegant solution that can achieve the task by using only one supporting index and a single scan of the data, and I found one.

-- Itzik Ben-Gan's interval packing solution
-- https://www.itprotoday.com/sql-server/new-solution-packing-intervals-problem
WITH 
    C1 AS
    (
        SELECT 
            *,
            prvend = MAX(ET.EndTime) OVER (
                PARTITION BY ET.EventId
                ORDER BY ET.StartTime, ET.EndTime
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
      FROM dbo.EventTimes AS ET
    ),
    C2 AS
    (
        SELECT 
            *,
            grp = SUM(isstart) OVER (
                PARTITION BY C1.EventId
                ORDER BY C1.StartTime, C1.EndTime
                ROWS UNBOUNDED PRECEDING)
        FROM C1
        CROSS APPLY
        (
            VALUES
            (
                CASE 
                    WHEN C1.StartTime <= C1.prvend 
                    THEN NULL 
                    ELSE 1 
                END
            )
        ) AS A (isstart)
    ),
    Packed AS
    (
        SELECT 
            C2.EventId, 
            StartTime = MIN(C2.StartTime), 
            EndTime = MAX(C2.EndTime)
        FROM C2
        GROUP BY 
            C2.EventId, 
            C2.grp
    )
-- Sum the packed intervals by EventId
SELECT 
    P.EventId, 
    TotalTime = SUM(DATEDIFF(MINUTE, P.StartTime, P.EndTime)) / 60.0
FROM Packed AS P
GROUP BY 
    P.EventId;


EventId
TotalTime

1
11.500000

2
12.000000

The initial sort can be avoided with an index on (EventId, StartTime, EndTime).

Code Snippets

-- Itzik Ben-Gan's interval packing solution
-- https://www.itprotoday.com/sql-server/new-solution-packing-intervals-problem
WITH 
    C1 AS
    (
        SELECT 
            *,
            prvend = MAX(ET.EndTime) OVER (
                PARTITION BY ET.EventId
                ORDER BY ET.StartTime, ET.EndTime
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
      FROM dbo.EventTimes AS ET
    ),
    C2 AS
    (
        SELECT 
            *,
            grp = SUM(isstart) OVER (
                PARTITION BY C1.EventId
                ORDER BY C1.StartTime, C1.EndTime
                ROWS UNBOUNDED PRECEDING)
        FROM C1
        CROSS APPLY
        (
            VALUES
            (
                CASE 
                    WHEN C1.StartTime <= C1.prvend 
                    THEN NULL 
                    ELSE 1 
                END
            )
        ) AS A (isstart)
    ),
    Packed AS
    (
        SELECT 
            C2.EventId, 
            StartTime = MIN(C2.StartTime), 
            EndTime = MAX(C2.EndTime)
        FROM C2
        GROUP BY 
            C2.EventId, 
            C2.grp
    )
-- Sum the packed intervals by EventId
SELECT 
    P.EventId, 
    TotalTime = SUM(DATEDIFF(MINUTE, P.StartTime, P.EndTime)) / 60.0
FROM Packed AS P
GROUP BY 
    P.EventId;

Context

StackExchange Database Administrators Q#333229, answer score: 3

Revisions (0)

No revisions yet.