HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

query optimization: time intervals

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
optimizationquerytimeintervals

Problem

In the main, I've got two kinds of time intervals:

presence time and absence time

absence time can be of different types (eg breaks, absences, special day and so on) and time intervals may overlap and / or intersect.

It is not for sure, that only plausible combinations of intervals exist in raw data, eg. overlapping presence-intervals don't make sense, but may exist. I've tried to identify resulting presence-time intervals in many ways now - for me, the most comfortable seems to be the follwing one.

```
;with "timestamps"
as
(
select
"id" = row_number() over ( order by "empId", "timestamp", "opening", "type" )
, "empId"
, "timestamp"
, "type"
, "opening"
from
(
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 1 as "type" from "worktime" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 2 as "type" from "break" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 3 as "type" from "absence" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
) as data
)
select
T1."empId"
, "starttime" = T1."timestamp"
, "endtime" = T2."timestamp"
from
"timestamps" as T1
left join "timestamps" as T2
on T2."empId" = T1."empId"
and T2."id" = T1."id" + 1
left join "timestamps" as RS
on RS."empId" = T2."empId"
and RS."id" <= T1."id"
group by
T1."empId", T1."timest

Solution

I can't answer your question as to the absolutely best way. But I can offer a different way of solving the problem, which may or may not be better. It has a reasonably flat execution plan, and I think it will perform well. (I'm eager to know so share the results!)

I apologize for using my own syntax style instead of yours--it helps query wizardry come to me when everything lines up in its usual place.

The query is available in a SqlFiddle. I threw in an overlap for EmpID 1 just to be sure I had that covered. If you eventually find that overlaps cannot occur in presence data, then you can remove the final query and the Dense_Rank calculations.

WITH Points AS (
  SELECT DISTINCT
    T.EmpID,
    P.TimePoint
  FROM
    (
      SELECT * FROM dbo.WorkTime
      UNION SELECT * FROM dbo.BreakTime
      UNION SELECT * FROM dbo.Absence
    ) T
    CROSS APPLY (VALUES (StartTime), (EndTime)) P (TimePoint)
), Groups AS (
  SELECT
    P.EmpID,
    P.TimePoint,
    Grp =
      Row_Number()
      OVER (PARTITION BY P.EmpID ORDER BY P.TimePoint, X.Which) / 2
  FROM
    Points P
    CROSS JOIN (VALUES (1), (2)) X (Which)
), Ranges AS (
  SELECT
    G.EmpID,
    StartTime = Min(G.TimePoint),
    EndTime = Max(G.TimePoint)
  FROM Groups G
  GROUP BY
    G.EmpID,
    G.Grp
  HAVING Count(*) = 2
), Presences AS (
  SELECT
    R.*,
    P.Present,
    Grp =
       Dense_Rank() OVER (PARTITION BY R.EmpID ORDER BY R.StartTime)
       - Dense_Rank() OVER (PARTITION BY R.EmpID, P.Present ORDER BY R.StartTime)
  FROM
    Ranges R
    CROSS APPLY (
      SELECT
        CASE WHEN EXISTS (
          SELECT *
          FROM dbo.WorkTime W
          WHERE
            R.EmpID = W.EmpID
            AND R.StartTime < W.EndTime
            AND W.StartTime < R.EndTime
        ) AND NOT EXISTS (
          SELECT *
          FROM dbo.BreakTime B
          WHERE
            R.EmpID = B.EmpID
            AND R.StartTime < B.EndTime
            AND B.StartTime < R.EndTime
        ) AND NOT EXISTS (
          SELECT *
          FROM dbo.Absence A
          WHERE
            R.EmpID = A.EmpID
            AND R.StartTime < A.EndTime
            AND A.StartTime < R.EndTime
        ) THEN 1 ELSE 0 END
    ) P (Present)
)
SELECT
  EmpID,
  StartTime = Min(StartTime),
  EndTime = Max(EndTime)
FROM Presences
WHERE Present = 1
GROUP BY
  EmpID,
  Grp
ORDER BY
  EmpID,
  StartTime;


Note: performance of this query would be improved you combined the three tables and added a column to indicate what kind of time it was: work, break, or absence.

And why all the CTEs, you ask? Because each one is forced by what I need to do to the data. There's an aggregate, or I need to put a WHERE condition on a windowing function or use it in a clause where windowing functions aren't allowed.

Now I'm going to go off and see if I can't think up another strategy for accomplishing this. :)

For amusement I include here a "diagram" I made to help solve the problem:

------------
   -----------------
                ---------------
                           -----------

    ---    ------   ------       ------------

----   ----      ---      -------


The three sets of dashes (separated by spaces) represent, in order: presence data, absence data, and the desired result.

Code Snippets

WITH Points AS (
  SELECT DISTINCT
    T.EmpID,
    P.TimePoint
  FROM
    (
      SELECT * FROM dbo.WorkTime
      UNION SELECT * FROM dbo.BreakTime
      UNION SELECT * FROM dbo.Absence
    ) T
    CROSS APPLY (VALUES (StartTime), (EndTime)) P (TimePoint)
), Groups AS (
  SELECT
    P.EmpID,
    P.TimePoint,
    Grp =
      Row_Number()
      OVER (PARTITION BY P.EmpID ORDER BY P.TimePoint, X.Which) / 2
  FROM
    Points P
    CROSS JOIN (VALUES (1), (2)) X (Which)
), Ranges AS (
  SELECT
    G.EmpID,
    StartTime = Min(G.TimePoint),
    EndTime = Max(G.TimePoint)
  FROM Groups G
  GROUP BY
    G.EmpID,
    G.Grp
  HAVING Count(*) = 2
), Presences AS (
  SELECT
    R.*,
    P.Present,
    Grp =
       Dense_Rank() OVER (PARTITION BY R.EmpID ORDER BY R.StartTime)
       - Dense_Rank() OVER (PARTITION BY R.EmpID, P.Present ORDER BY R.StartTime)
  FROM
    Ranges R
    CROSS APPLY (
      SELECT
        CASE WHEN EXISTS (
          SELECT *
          FROM dbo.WorkTime W
          WHERE
            R.EmpID = W.EmpID
            AND R.StartTime < W.EndTime
            AND W.StartTime < R.EndTime
        ) AND NOT EXISTS (
          SELECT *
          FROM dbo.BreakTime B
          WHERE
            R.EmpID = B.EmpID
            AND R.StartTime < B.EndTime
            AND B.StartTime < R.EndTime
        ) AND NOT EXISTS (
          SELECT *
          FROM dbo.Absence A
          WHERE
            R.EmpID = A.EmpID
            AND R.StartTime < A.EndTime
            AND A.StartTime < R.EndTime
        ) THEN 1 ELSE 0 END
    ) P (Present)
)
SELECT
  EmpID,
  StartTime = Min(StartTime),
  EndTime = Max(EndTime)
FROM Presences
WHERE Present = 1
GROUP BY
  EmpID,
  Grp
ORDER BY
  EmpID,
  StartTime;
------------
   -----------------
                ---------------
                           -----------

    ---    ------   ------       ------------

----   ----      ---      -------

Context

StackExchange Database Administrators Q#31012, answer score: 3

Revisions (0)

No revisions yet.