patternsqlMinor
What are my highest activity streaks?
Viewed 0 times
streakswhatarehighestactivity
Problem
I have written the following query to figure out activity streaks on a per-user basis. I find it... Ugly... And would love to improve it!
Limitations
Those are explained as commented text at the very top of the query.
Problems
-
Can this be done without a loop?
-
Is there a better way to do this than #TempTables?
Bonus!
If it is possible to do this for multiple users, e.g., a top 10 or top 50, I'll award a 100 rep bounty to whoever solves it first!
```
/*
There are certain limitations to the usefulness of this query.
It cannot track if a user has just visited the site without "significant" activity
"Significant" would be defined as a Comment, Post, Suggest edit, Approve or Reject edit, Vote on suggested edit, Vote on a post
Also, Up-votes and Down-votes on Posts are anonymous, as such they cannot be included in the query
*/
DECLARE @UserId INT;
SET @UserId = ##UserId##;
-- Temp table to hold all relevant activity dates
CREATE TABLE #UserActiveDays
(
UserId INTEGER
, ActivityDate DATE
, Source VARCHAR(50)
);
INSERT INTO #UserActiveDays
(
UserId
, ActivityDate
, Source
)
SELECT DISTINCT
OwnerUserId
, CAST(CreationDate AS DATE)
, 'PostCreation'
FROM Posts
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
UserId
, CAST(CreationDate AS DATE)
, 'CommentCreation'
FROM Comments
WHERE UserId = @UserId
UNION
SELECT DISTINCT
UserId
, CAST(CreationDate AS DATE)
, 'PostHistoryCreation'
FROM PostHistory
WHERE UserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(CreationDate AS DATE)
, 'SuggestedEditCreation'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(ApprovalDate AS DATE)
, 'SuggestedEditApproval'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(RejectionDate AS DATE)
, 'SuggestedEditRejection'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNIO
Limitations
Those are explained as commented text at the very top of the query.
Problems
-
Can this be done without a loop?
-
Is there a better way to do this than #TempTables?
Bonus!
If it is possible to do this for multiple users, e.g., a top 10 or top 50, I'll award a 100 rep bounty to whoever solves it first!
```
/*
There are certain limitations to the usefulness of this query.
It cannot track if a user has just visited the site without "significant" activity
"Significant" would be defined as a Comment, Post, Suggest edit, Approve or Reject edit, Vote on suggested edit, Vote on a post
Also, Up-votes and Down-votes on Posts are anonymous, as such they cannot be included in the query
*/
DECLARE @UserId INT;
SET @UserId = ##UserId##;
-- Temp table to hold all relevant activity dates
CREATE TABLE #UserActiveDays
(
UserId INTEGER
, ActivityDate DATE
, Source VARCHAR(50)
);
INSERT INTO #UserActiveDays
(
UserId
, ActivityDate
, Source
)
SELECT DISTINCT
OwnerUserId
, CAST(CreationDate AS DATE)
, 'PostCreation'
FROM Posts
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
UserId
, CAST(CreationDate AS DATE)
, 'CommentCreation'
FROM Comments
WHERE UserId = @UserId
UNION
SELECT DISTINCT
UserId
, CAST(CreationDate AS DATE)
, 'PostHistoryCreation'
FROM PostHistory
WHERE UserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(CreationDate AS DATE)
, 'SuggestedEditCreation'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(ApprovalDate AS DATE)
, 'SuggestedEditApproval'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(RejectionDate AS DATE)
, 'SuggestedEditRejection'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNIO
Solution
General
There are a few issues in here that cause inaccurate data, as well as some inefficiencies.
SuggestedEdits requires a special section....
You have three sections for SuggestedEdits. One uses the Creation date, the others use the Approved and Rejected dates. The approval and rejection are not activities by the Owner of the suggested edit, but you are counting them as such.... Only the CreationDate indicates activity by the OwnerUserId in the SuggestedEdit.
The approval and rejection activity are recorded in the SuggestedEditVotes table (which you process from a different section).
Note, you are missing 'accepts an answer' which can be calculated from the Post/Votes combination.
As for your 'additional questions'....
Without the temp table
Yes, this can be done without the temp table, and without the loop.
You can do it all as a single query, if you use CTE expressions to hold the data.
With a CTE containing the 'activity' data, you can find the first day in a streak (Using a self-join, does the user have activity on the previous day, if they do not, then it's the first day of a streak)
Once you have identified the first day, you can also identify which activity belongs in which streak (the most recent start before the current activity must be the start...)
Finally, the streak is as long as the furthest date that belongs to each streak....
As it happens, this is efficient enough to be run for all users on the whole of Code Review.... the query is (Here in SEDE):
There are a few issues in here that cause inaccurate data, as well as some inefficiencies.
- When a post is created it is also registered in PostHistory (three times, actually) There is no need to check for post creation at all since you already check for the same activity in the history.
- You populate the Source column, but never use it. SQL is in large part dependent on data volume, and you are 'shifting' more data than you need to.
- Your VoteCreation section is OK, but you realize you only pull out bounties and favourites from that, right (VoteType 5 and 8)? Other votes are anonymized, and are not possible to find...
- Somehow you have 0-day streaks. How does that make sense?
SuggestedEdits requires a special section....
You have three sections for SuggestedEdits. One uses the Creation date, the others use the Approved and Rejected dates. The approval and rejection are not activities by the Owner of the suggested edit, but you are counting them as such.... Only the CreationDate indicates activity by the OwnerUserId in the SuggestedEdit.
The approval and rejection activity are recorded in the SuggestedEditVotes table (which you process from a different section).
Note, you are missing 'accepts an answer' which can be calculated from the Post/Votes combination.
As for your 'additional questions'....
Without the temp table
Yes, this can be done without the temp table, and without the loop.
You can do it all as a single query, if you use CTE expressions to hold the data.
With a CTE containing the 'activity' data, you can find the first day in a streak (Using a self-join, does the user have activity on the previous day, if they do not, then it's the first day of a streak)
Once you have identified the first day, you can also identify which activity belongs in which streak (the most recent start before the current activity must be the start...)
Finally, the streak is as long as the furthest date that belongs to each streak....
As it happens, this is efficient enough to be run for all users on the whole of Code Review.... the query is (Here in SEDE):
declare @userid integer = ##UserId:int?-1##;
with ActiveDays as (
select Distinct
UserId as UserId,
Convert(Date, CreationDate) as Day
from PostHistory
UNION
select Distinct
UserId as UserId,
Convert(Date, CreationDate) as Day
from Comments
UNION
select Distinct
OwnerUserId as UserId,
convert(Date, CreationDate) as Day
from SuggestedEdits
UNION
select distinct
Q.OwnerUserId as UserId,
Convert(Date, V.CreationDate) as Day
from Posts Q inner join Votes V
on Q.AcceptedAnswerId = V.PostId
and V.VoteTypeId = 1
UNION
select distinct
UserId as UserId,
Convert(Date, CreationDate) as Day
from SuggestedEditVotes
), StreakStarts as (
select UserId, Start.Day
from ActiveDays Start
where not exists (
select 1
from ActiveDays
where Start.UserId = UserId
and DateAdd(dd, -1, Start.Day) = Day
)
), StreakMerge as (
select A.UserId, A.Day, Max(S.Day) as Start
from ActiveDays as A inner join StreakStarts as S
on A.UserId = S.UserId
and A.Day >= S.Day
group by A.UserId, A.Day
), LongStreaks as (
select UserId, Start, 1 + max (DateDiff(dd, Start, Day)) as Streak
from StreakMerge
group by UserId, Start
)
Select U.Id as [User Link], U.DisplayName, Start, S.Streak
from Users as U inner join LongStreaks as S
on U.Id = S.UserId
where S.Streak > 1
and (@userid = -1 OR U.Id = @userid)
order by S.Streak descCode Snippets
declare @userid integer = ##UserId:int?-1##;
with ActiveDays as (
select Distinct
UserId as UserId,
Convert(Date, CreationDate) as Day
from PostHistory
UNION
select Distinct
UserId as UserId,
Convert(Date, CreationDate) as Day
from Comments
UNION
select Distinct
OwnerUserId as UserId,
convert(Date, CreationDate) as Day
from SuggestedEdits
UNION
select distinct
Q.OwnerUserId as UserId,
Convert(Date, V.CreationDate) as Day
from Posts Q inner join Votes V
on Q.AcceptedAnswerId = V.PostId
and V.VoteTypeId = 1
UNION
select distinct
UserId as UserId,
Convert(Date, CreationDate) as Day
from SuggestedEditVotes
), StreakStarts as (
select UserId, Start.Day
from ActiveDays Start
where not exists (
select 1
from ActiveDays
where Start.UserId = UserId
and DateAdd(dd, -1, Start.Day) = Day
)
), StreakMerge as (
select A.UserId, A.Day, Max(S.Day) as Start
from ActiveDays as A inner join StreakStarts as S
on A.UserId = S.UserId
and A.Day >= S.Day
group by A.UserId, A.Day
), LongStreaks as (
select UserId, Start, 1 + max (DateDiff(dd, Start, Day)) as Streak
from StreakMerge
group by UserId, Start
)
Select U.Id as [User Link], U.DisplayName, Start, S.Streak
from Users as U inner join LongStreaks as S
on U.Id = S.UserId
where S.Streak > 1
and (@userid = -1 OR U.Id = @userid)
order by S.Streak descContext
StackExchange Code Review Q#74822, answer score: 5
Revisions (0)
No revisions yet.