patternMinor
query optimization: time intervals
Viewed 0 times
optimizationquerytimeintervals
Problem
In the main, I've got two kinds of time intervals:
It is not for sure, that only plausible combinations of intervals exist in raw data, eg. overlapping presence-intervals don't make sense, but may exist. I've tried to identify resulting presence-time intervals in many ways now - for me, the most comfortable seems to be the follwing one.
```
;with "timestamps"
as
(
select
"id" = row_number() over ( order by "empId", "timestamp", "opening", "type" )
, "empId"
, "timestamp"
, "type"
, "opening"
from
(
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 1 as "type" from "worktime" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 2 as "type" from "break" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 3 as "type" from "absence" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
) as data
)
select
T1."empId"
, "starttime" = T1."timestamp"
, "endtime" = T2."timestamp"
from
"timestamps" as T1
left join "timestamps" as T2
on T2."empId" = T1."empId"
and T2."id" = T1."id" + 1
left join "timestamps" as RS
on RS."empId" = T2."empId"
and RS."id" <= T1."id"
group by
T1."empId", T1."timest
presence time and absence timeabsence time can be of different types (eg breaks, absences, special day and so on) and time intervals may overlap and / or intersect.It is not for sure, that only plausible combinations of intervals exist in raw data, eg. overlapping presence-intervals don't make sense, but may exist. I've tried to identify resulting presence-time intervals in many ways now - for me, the most comfortable seems to be the follwing one.
```
;with "timestamps"
as
(
select
"id" = row_number() over ( order by "empId", "timestamp", "opening", "type" )
, "empId"
, "timestamp"
, "type"
, "opening"
from
(
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 1 as "type" from "worktime" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 2 as "type" from "break" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
union all
select "empId", "timestamp", "type", case when "types" = 'starttime' then 1 else -1 end as "opening" from
( select "empId", "starttime", "endtime", 3 as "type" from "absence" ) as data
unpivot ( "timestamp" for "types" in ( "starttime", "endtime" ) ) as pvt
) as data
)
select
T1."empId"
, "starttime" = T1."timestamp"
, "endtime" = T2."timestamp"
from
"timestamps" as T1
left join "timestamps" as T2
on T2."empId" = T1."empId"
and T2."id" = T1."id" + 1
left join "timestamps" as RS
on RS."empId" = T2."empId"
and RS."id" <= T1."id"
group by
T1."empId", T1."timest
Solution
I can't answer your question as to the absolutely best way. But I can offer a different way of solving the problem, which may or may not be better. It has a reasonably flat execution plan, and I think it will perform well. (I'm eager to know so share the results!)
I apologize for using my own syntax style instead of yours--it helps query wizardry come to me when everything lines up in its usual place.
The query is available in a SqlFiddle. I threw in an overlap for EmpID 1 just to be sure I had that covered. If you eventually find that overlaps cannot occur in presence data, then you can remove the final query and the
Note: performance of this query would be improved you combined the three tables and added a column to indicate what kind of time it was: work, break, or absence.
And why all the CTEs, you ask? Because each one is forced by what I need to do to the data. There's an aggregate, or I need to put a WHERE condition on a windowing function or use it in a clause where windowing functions aren't allowed.
Now I'm going to go off and see if I can't think up another strategy for accomplishing this. :)
For amusement I include here a "diagram" I made to help solve the problem:
The three sets of dashes (separated by spaces) represent, in order: presence data, absence data, and the desired result.
I apologize for using my own syntax style instead of yours--it helps query wizardry come to me when everything lines up in its usual place.
The query is available in a SqlFiddle. I threw in an overlap for EmpID 1 just to be sure I had that covered. If you eventually find that overlaps cannot occur in presence data, then you can remove the final query and the
Dense_Rank calculations.WITH Points AS (
SELECT DISTINCT
T.EmpID,
P.TimePoint
FROM
(
SELECT * FROM dbo.WorkTime
UNION SELECT * FROM dbo.BreakTime
UNION SELECT * FROM dbo.Absence
) T
CROSS APPLY (VALUES (StartTime), (EndTime)) P (TimePoint)
), Groups AS (
SELECT
P.EmpID,
P.TimePoint,
Grp =
Row_Number()
OVER (PARTITION BY P.EmpID ORDER BY P.TimePoint, X.Which) / 2
FROM
Points P
CROSS JOIN (VALUES (1), (2)) X (Which)
), Ranges AS (
SELECT
G.EmpID,
StartTime = Min(G.TimePoint),
EndTime = Max(G.TimePoint)
FROM Groups G
GROUP BY
G.EmpID,
G.Grp
HAVING Count(*) = 2
), Presences AS (
SELECT
R.*,
P.Present,
Grp =
Dense_Rank() OVER (PARTITION BY R.EmpID ORDER BY R.StartTime)
- Dense_Rank() OVER (PARTITION BY R.EmpID, P.Present ORDER BY R.StartTime)
FROM
Ranges R
CROSS APPLY (
SELECT
CASE WHEN EXISTS (
SELECT *
FROM dbo.WorkTime W
WHERE
R.EmpID = W.EmpID
AND R.StartTime < W.EndTime
AND W.StartTime < R.EndTime
) AND NOT EXISTS (
SELECT *
FROM dbo.BreakTime B
WHERE
R.EmpID = B.EmpID
AND R.StartTime < B.EndTime
AND B.StartTime < R.EndTime
) AND NOT EXISTS (
SELECT *
FROM dbo.Absence A
WHERE
R.EmpID = A.EmpID
AND R.StartTime < A.EndTime
AND A.StartTime < R.EndTime
) THEN 1 ELSE 0 END
) P (Present)
)
SELECT
EmpID,
StartTime = Min(StartTime),
EndTime = Max(EndTime)
FROM Presences
WHERE Present = 1
GROUP BY
EmpID,
Grp
ORDER BY
EmpID,
StartTime;Note: performance of this query would be improved you combined the three tables and added a column to indicate what kind of time it was: work, break, or absence.
And why all the CTEs, you ask? Because each one is forced by what I need to do to the data. There's an aggregate, or I need to put a WHERE condition on a windowing function or use it in a clause where windowing functions aren't allowed.
Now I'm going to go off and see if I can't think up another strategy for accomplishing this. :)
For amusement I include here a "diagram" I made to help solve the problem:
------------
-----------------
---------------
-----------
--- ------ ------ ------------
---- ---- --- -------The three sets of dashes (separated by spaces) represent, in order: presence data, absence data, and the desired result.
Code Snippets
WITH Points AS (
SELECT DISTINCT
T.EmpID,
P.TimePoint
FROM
(
SELECT * FROM dbo.WorkTime
UNION SELECT * FROM dbo.BreakTime
UNION SELECT * FROM dbo.Absence
) T
CROSS APPLY (VALUES (StartTime), (EndTime)) P (TimePoint)
), Groups AS (
SELECT
P.EmpID,
P.TimePoint,
Grp =
Row_Number()
OVER (PARTITION BY P.EmpID ORDER BY P.TimePoint, X.Which) / 2
FROM
Points P
CROSS JOIN (VALUES (1), (2)) X (Which)
), Ranges AS (
SELECT
G.EmpID,
StartTime = Min(G.TimePoint),
EndTime = Max(G.TimePoint)
FROM Groups G
GROUP BY
G.EmpID,
G.Grp
HAVING Count(*) = 2
), Presences AS (
SELECT
R.*,
P.Present,
Grp =
Dense_Rank() OVER (PARTITION BY R.EmpID ORDER BY R.StartTime)
- Dense_Rank() OVER (PARTITION BY R.EmpID, P.Present ORDER BY R.StartTime)
FROM
Ranges R
CROSS APPLY (
SELECT
CASE WHEN EXISTS (
SELECT *
FROM dbo.WorkTime W
WHERE
R.EmpID = W.EmpID
AND R.StartTime < W.EndTime
AND W.StartTime < R.EndTime
) AND NOT EXISTS (
SELECT *
FROM dbo.BreakTime B
WHERE
R.EmpID = B.EmpID
AND R.StartTime < B.EndTime
AND B.StartTime < R.EndTime
) AND NOT EXISTS (
SELECT *
FROM dbo.Absence A
WHERE
R.EmpID = A.EmpID
AND R.StartTime < A.EndTime
AND A.StartTime < R.EndTime
) THEN 1 ELSE 0 END
) P (Present)
)
SELECT
EmpID,
StartTime = Min(StartTime),
EndTime = Max(EndTime)
FROM Presences
WHERE Present = 1
GROUP BY
EmpID,
Grp
ORDER BY
EmpID,
StartTime;------------
-----------------
---------------
-----------
--- ------ ------ ------------
---- ---- --- -------Context
StackExchange Database Administrators Q#31012, answer score: 3
Revisions (0)
No revisions yet.