HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

how to calculate difference of first row and last row fields value in each group

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
lasteachgroupfieldsvaluedifferencefirstcalculatehowand

Problem

I have table with structure like this:

+-------+------------------+
| Value |       Date       |
+-------+------------------+
|    10 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 |
|    15 | 10/10/2010 10:30 |
|    15 | 10/10/2010 10:45 |
|    17 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 |
|    22 | 10/10/2010 11:30 |
|    30 | 10/10/2010 11:45 |
+-------+------------------+


currently i'm using group by to get min,max,avg to take hourly reports like this:

+-----+-----+-------+------------------+
| min | max |  avg  |       Date       |
+-----+-----+-------+------------------+
|  10 |  15 | 12.75 | 10/10/2010 10:00 |
|  17 |  30 | 21.75 | 10/10/2010 11:00 |
+-----+-----+-------+------------------+


How i can calculate differences of last and first rows value in each group to generate something like:

+-----+-----+-------+------+------------------+
| min | max |  avg  | diff |       Date       |
+-----+-----+-------+------+------------------+
|  10 |  15 | 12.75 |    5 | 10/10/2010 10:00 |
|  17 |  30 | 21.75 |   13 | 10/10/2010 11:00 |
+-----+-----+-------+------+------------------+


Thanks.

Solution

You are not showing the query you are using to obtain the results without diff. I'm assuming it is something like this:

SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, if Value is an int, like this, perhaps:
                      -- AVG(CAST(Value AS decimal(10,2))
  Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;


Also, you are not explaining what first and last mean. In this answer, it is assumed that first stands for earliest in the group (according to the Date value) and, similarly, last means latest in the group.

One way to throw in diff could be like this:

First, add two more aggregated columns, minDate and maxDate, to the original query:

SELECT
  min     = MIN(Value),
  max     = MAX(Value),
  avg     = AVG(Value),
  minDate = MIN(Date),
  maxDate = MAX(Date),
  Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;


Next, join the aggregated result set back to the original table on minDate and on maxDate (separately) to access the corresponding Values:

SELECT
  g.min,
  g.max,
  g.avg,
  diff = last.Value - first.Value,
  g.Date
FROM (
  SELECT
    min     = MIN(Value),
    max     = MAX(Value),
    avg     = AVG(Value),
    minDate = MIN(Date),
    maxDate = MAX(Date),
    Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
  GROUP BY
    DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
) g
INNER JOIN atable first ON first.Date = g.minDate
INNER JOIN atable last  ON last .Date = g.maxDate
;


Note that the above assumes the Date values (at least those that happen to be first or last in their corresponding hours) to be unique, or you would get more than one row for some of the hours in the output.

An alternative, if you are on SQL Server 2005 or later version, could be to use window aggregate functions MIN() OVER (...) and MAX() OVER (...) to calculate Values corresponding to either minDate or maxDate, before aggregating all the results similarly to how you are probably doing it now. Here's what specifically I'm talking about:

WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
)
, firstlast AS (
  SELECT
    Value,
    Date,
    GroupDate,
    FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END,
    LastValue  = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END
  FROM partitioned
)
SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, again, if Value is an int, cast it as a decimal or float
  diff = MAX(LastValue) - MIN(FirstValue),
  Date = GroupDate
FROM firstlast
GROUP BY
  GroupDate
;


As you can see, the first common table expression (CTE) merely returns all rows and adds a calculated column GroupDate, the one subsequently used for grouping/partitioning. So it essentially just assigns a name to the grouping expression, and that is done for better readability/maintainability of the entire query, as the column is later referenced more than once. This is what the first CTE produces:

+-------+------------------+------------------+
| Value |       Date       |    GroupDate     |
+-------+------------------+------------------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |
+-------+------------------+------------------+


The second CTE adds two more columns to the above result. It uses window aggregate functions MIN() OVER ... and MAX() OVER ... to match against Date, and where the match takes place, the corresponding Value is returned in a separate column, either FirstValue or LastValue:

+-------+------------------+------------------+------------+-----------+
| Value |       Date       |    GroupDate     | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |         10 |      NULL |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |       NULL |        15 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |         17 |      NULL |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |       NULL |      NULL |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |       NULL |      NULL |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        30 |
+-------+------------------+------------------+------------+-----------+


At this point,

Code Snippets

SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, if Value is an int, like this, perhaps:
                      -- AVG(CAST(Value AS decimal(10,2))
  Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;
SELECT
  min     = MIN(Value),
  max     = MAX(Value),
  avg     = AVG(Value),
  minDate = MIN(Date),
  maxDate = MAX(Date),
  Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
  DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;
SELECT
  g.min,
  g.max,
  g.avg,
  diff = last.Value - first.Value,
  g.Date
FROM (
  SELECT
    min     = MIN(Value),
    max     = MAX(Value),
    avg     = AVG(Value),
    minDate = MIN(Date),
    maxDate = MAX(Date),
    Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
  GROUP BY
    DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
) g
INNER JOIN atable first ON first.Date = g.minDate
INNER JOIN atable last  ON last .Date = g.maxDate
;
WITH partitioned AS (
  SELECT
    Value,
    Date,
    GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
  FROM atable
)
, firstlast AS (
  SELECT
    Value,
    Date,
    GroupDate,
    FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END,
    LastValue  = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END
  FROM partitioned
)
SELECT
  min  = MIN(Value),
  max  = MAX(Value),
  avg  = AVG(Value),  -- or, again, if Value is an int, cast it as a decimal or float
  diff = MAX(LastValue) - MIN(FirstValue),
  Date = GroupDate
FROM firstlast
GROUP BY
  GroupDate
;
+-------+------------------+------------------+
| Value |       Date       |    GroupDate     |
+-------+------------------+------------------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |
+-------+------------------+------------------+

Context

StackExchange Database Administrators Q#28482, answer score: 13

Revisions (0)

No revisions yet.