HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Selecting Distinct Rows but Counting All Rows

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distinctrowscountingallbutselecting

Problem

I have the following table:

----------------------------------------------
| ID | interestingData |      timestamp      |
----------------------------------------------
|  1 |       400       | 2016-01-23 17:01:00 |
----------------------------------------------
|  1 |       400       | 2016-01-24 17:01:00 |
----------------------------------------------
|  1 |       350       | 2016-01-25 17:01:00 |
----------------------------------------------
|  2 |       23        | 2016-01-23 17:01:00 | 
----------------------------------------------
|  2 |       34        | 2016-01-24 17:01:00 | 
----------------------------------------------
|  2 |       12        | 2016-01-25 17:01:00 | 
----------------------------------------------


Where our PK is (ID, timestamp). I'm attempting to determine a query that will give me the unique IDs and the latest interestingData for which interestingData exceeds a threshold. That would, of course, be done with:

SELECT DISTINCT ID
FROM table
WHERE interestingData > threshold
ORDER BY timestamp DESC;


However, I want the count of every occurrence where interestingData exceeded the threshold. My results table would ideally look like

------------------------------------------------------
| ID | interestingData | timestamp           | count |    
------------------------------------------------------
|  1 |        350      | 2016-01-25 17:01:00 |   3   |
------------------------------------------------------


Were my threshold 300. I am aware that if you want to pair something distinct with a set of data then a left outer join is going to be in order, but I'm not entirely sure how to go about it. This is the closest I can think of so far.

SELECT DISTINCT ID
FROM table t1
LEFT OUTER JOIN table t2 ON t1.ID = t2.table.ID
WHERE interestingData > 300
ORDER BY timestamp DESC


This gets me the distinct IDs and pairs them with the rest of the data as I need, but no provisions for getting the other parts of the re

Solution

If you want the interestingData and timestamp from the same row (the most recent row that exceeds the threshold), and if you want to include all rows that exceed the threshold even if some rows for that ID don't meet the threshold, then:

;WITH x AS 
(
  SELECT ID, interestingData, [timestamp], 
    [count] = COUNT(1) OVER (PARTITION BY ID),
    rn = ROW_NUMBER() 
      OVER (PARTITION BY ID ORDER BY [timestamp] DESC)
  FROM dbo.tablename
  WHERE interestingData > 300
)
SELECT ID, interestingData, [timestamp], [count]
  FROM x
  WHERE rn = 1;


Also, try to avoid data types and/or reserved keywords as column names. timestamp is not a great choice because (a) it's not very meaningful and (b) it requires square brackets in a lot of scenarios.

Code Snippets

;WITH x AS 
(
  SELECT ID, interestingData, [timestamp], 
    [count] = COUNT(1) OVER (PARTITION BY ID),
    rn = ROW_NUMBER() 
      OVER (PARTITION BY ID ORDER BY [timestamp] DESC)
  FROM dbo.tablename
  WHERE interestingData > 300
)
SELECT ID, interestingData, [timestamp], [count]
  FROM x
  WHERE rn = 1;

Context

StackExchange Database Administrators Q#127973, answer score: 7

Revisions (0)

No revisions yet.