HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Cluster date time values

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
valuestimedatecluster

Problem

I came up with this TSQL for SQL Server 2008:

DECLARE @TimeWindowInSeconds INT

SET @TimeWindowInSeconds = 10

IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
    DROP TABLE #Temp
CREATE TABLE #Temp
    (
      Class INT,
      DT DateTime
    )

INSERT INTO #Temp (Class, DT)
SELECT 1, '2014-11-05 10:55:00'
    UNION ALL
SELECT 1, '2014-11-05 10:55:10'
    UNION ALL
SELECT 1, '2014-11-05 11:55:10'
    UNION ALL
SELECT 2, '2014-11-05 10:55:11'
    UNION ALL
SELECT 2, '2014-11-05 13:56:10'

;WITH CTE1 AS
(
    SELECT
        Class,
        DT,
        ROW_NUMBER() OVER (ORDER BY Class, DT ) AS RowNumber
    FROM #Temp 
)
,CTE2 AS
(
    -- A is the successor 
    SELECT
        A.Class,
        A.RowNumber,
        B.RowNumber AS RowNumber1,
        A.DT,
        B.DT AS DT2,
        DATEDIFF(second, B.DT, A.DT) AS DifferenceInSeconds,
        CASE WHEN B.DT IS NULL THEN 1 END z
    FROM CTE1 AS A
    LEFT OUTER JOIN CTE1 AS B ON A.RowNumber = B.RowNumber + 1 AND A.Class = B.Class 
    AND DATEDIFF(second, B.DT, A.DT)  1
    )


This clusters datetime values inside groups (class) if they fall inside the same time window and if the cluster contains more than 1 entry. Please review the code.

Solution

I agree with the other feedback given here so far; I have two three suggestions of my own, two minor and one more significant.

Minor point (1)

Initial assignation to a variable can be done in one line, so instead of:

DECLARE @TimeWindowInSeconds INT

SET @TimeWindowInSeconds = 10


you could just write:

DECLARE @TimeWindowInSeconds INT = 10


Minor point (2)

I am not a fan of selecting columns in intermediate rowsets that are never used subsequently. They add clutter and unnecessarily make the reader spend time understanding or visualising them.

CTE1 selects just three columns, all of which are necessary for the join in CTE2: this is good.

However, CTE2 then selects seven columns, of which only A.Class, A.DT and the expression aliased as z are required by CTE3 or presented in the final output.

I suspect that you put these columns in whilst writing CTE2, because the others are all used in the join within that CTE. This is fine for work-in-progress, but these columns should be removed from the finished query. They don't need to be in the SELECT list just because they're used in the JOIN.

The bigger one

Your code uses both common table expressions (CTEs) and subqueries. For me, this is quite a painful inconsistency. CTEs were effectively an evolution of subqueries in the ANSI SQL standard; mixing both in one query feels to me like designing an aeroplane that has both propellers and a jet engine.

Aside: please don't read too far into that analogy: this answer is not about their relative performance, nor am I claiming that one is made redundant by the other. People still use prop planes, after all.

Unless performance is particularly important, and you know for certain that subqueries perform measurably better on your server, with your data, I recommend sticking with CTEs alone:

  • The syntax is easier to follow



  • They mimic step-by-step problem solving, walking your reader through the solution steadily



  • Well-chosen CTE names are tantamount to good comments, explaining your rationale



  • They can make code shorter because you can re-use them



In your particular case, I think the SubGroup subquery will need to become a separate CTE between CTE2 and CTE3 (you'd then join to that CTE in the current CTE3, using c.DT 1.

This also removes altogether the need to CAST and concatenate Class and Subgroup, since you're not including that concatenation in your result set.

Even if you didn't want to change the WHERE...IN ( subquery ) part, you could simplify things by adding an extra, final CTE in which you do the casting and concatenation of Class and Subgroup just once, give it an alias, and then do

WHERE alias IN ( SELECT alias FROM cte3 GROUP BY ... )

Code Snippets

DECLARE @TimeWindowInSeconds INT

SET @TimeWindowInSeconds = 10
DECLARE @TimeWindowInSeconds INT = 10
COUNT(*) OVER ( PARTITION BY Class, Subgroup ) as CountBy_Cls_Sgp
WHERE alias IN ( SELECT alias FROM cte3 GROUP BY ... )

Context

StackExchange Code Review Q#68978, answer score: 2

Revisions (0)

No revisions yet.