HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

What are my highest activity streaks?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
streakswhatarehighestactivity

Problem

I have written the following query to figure out activity streaks on a per-user basis. I find it... Ugly... And would love to improve it!
Limitations

Those are explained as commented text at the very top of the query.
Problems

-
Can this be done without a loop?

-
Is there a better way to do this than #TempTables?

Bonus!

If it is possible to do this for multiple users, e.g., a top 10 or top 50, I'll award a 100 rep bounty to whoever solves it first!

```
/*
There are certain limitations to the usefulness of this query.
It cannot track if a user has just visited the site without "significant" activity
"Significant" would be defined as a Comment, Post, Suggest edit, Approve or Reject edit, Vote on suggested edit, Vote on a post
Also, Up-votes and Down-votes on Posts are anonymous, as such they cannot be included in the query
*/

DECLARE @UserId INT;
SET @UserId = ##UserId##;

-- Temp table to hold all relevant activity dates
CREATE TABLE #UserActiveDays
(
UserId INTEGER
, ActivityDate DATE
, Source VARCHAR(50)
);

INSERT INTO #UserActiveDays
(
UserId
, ActivityDate
, Source
)
SELECT DISTINCT
OwnerUserId
, CAST(CreationDate AS DATE)
, 'PostCreation'
FROM Posts
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
UserId
, CAST(CreationDate AS DATE)
, 'CommentCreation'
FROM Comments
WHERE UserId = @UserId
UNION
SELECT DISTINCT
UserId
, CAST(CreationDate AS DATE)
, 'PostHistoryCreation'
FROM PostHistory
WHERE UserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(CreationDate AS DATE)
, 'SuggestedEditCreation'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(ApprovalDate AS DATE)
, 'SuggestedEditApproval'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNION
SELECT DISTINCT
OwnerUserId
, CAST(RejectionDate AS DATE)
, 'SuggestedEditRejection'
FROM SuggestedEdits
WHERE OwnerUserId = @UserId
UNIO

Solution

General

There are a few issues in here that cause inaccurate data, as well as some inefficiencies.

  • When a post is created it is also registered in PostHistory (three times, actually) There is no need to check for post creation at all since you already check for the same activity in the history.



  • You populate the Source column, but never use it. SQL is in large part dependent on data volume, and you are 'shifting' more data than you need to.



  • Your VoteCreation section is OK, but you realize you only pull out bounties and favourites from that, right (VoteType 5 and 8)? Other votes are anonymized, and are not possible to find...



  • Somehow you have 0-day streaks. How does that make sense?



SuggestedEdits requires a special section....

You have three sections for SuggestedEdits. One uses the Creation date, the others use the Approved and Rejected dates. The approval and rejection are not activities by the Owner of the suggested edit, but you are counting them as such.... Only the CreationDate indicates activity by the OwnerUserId in the SuggestedEdit.

The approval and rejection activity are recorded in the SuggestedEditVotes table (which you process from a different section).

Note, you are missing 'accepts an answer' which can be calculated from the Post/Votes combination.

As for your 'additional questions'....
Without the temp table

Yes, this can be done without the temp table, and without the loop.

You can do it all as a single query, if you use CTE expressions to hold the data.

With a CTE containing the 'activity' data, you can find the first day in a streak (Using a self-join, does the user have activity on the previous day, if they do not, then it's the first day of a streak)

Once you have identified the first day, you can also identify which activity belongs in which streak (the most recent start before the current activity must be the start...)

Finally, the streak is as long as the furthest date that belongs to each streak....

As it happens, this is efficient enough to be run for all users on the whole of Code Review.... the query is (Here in SEDE):

declare @userid integer = ##UserId:int?-1##;

with ActiveDays as (

    select Distinct
           UserId as UserId,
           Convert(Date, CreationDate) as Day
    from   PostHistory
    
  UNION
  
    select Distinct
           UserId as UserId,
           Convert(Date, CreationDate) as Day
    from   Comments
    
  UNION
  
    select Distinct
           OwnerUserId as UserId,
           convert(Date, CreationDate) as Day
    from SuggestedEdits
    
  UNION
  
    select distinct
           Q.OwnerUserId as UserId,
           Convert(Date, V.CreationDate) as Day
    from Posts Q inner join Votes V
      on Q.AcceptedAnswerId = V.PostId
      and V.VoteTypeId = 1
      
  UNION
  
     select distinct
            UserId as UserId,
            Convert(Date, CreationDate) as Day
     from SuggestedEditVotes
    
), StreakStarts as (

    select UserId, Start.Day
    from ActiveDays Start
    where not exists (
        select 1
        from ActiveDays
        where Start.UserId = UserId
          and DateAdd(dd, -1, Start.Day) = Day
    )
    
), StreakMerge as (

    select A.UserId, A.Day, Max(S.Day) as Start
    from ActiveDays as A inner join StreakStarts as S
      on A.UserId = S.UserId
      and A.Day >= S.Day
    group by A.UserId, A.Day
    
), LongStreaks as (

    select UserId, Start, 1 + max (DateDiff(dd, Start, Day)) as  Streak
    from StreakMerge
    group by UserId, Start
)

Select U.Id as [User Link], U.DisplayName, Start, S.Streak
from Users as U inner join LongStreaks as S
  on U.Id = S.UserId
where S.Streak > 1
  and (@userid = -1 OR U.Id = @userid)
order by S.Streak desc

Code Snippets

declare @userid integer = ##UserId:int?-1##;

with ActiveDays as (

    select Distinct
           UserId as UserId,
           Convert(Date, CreationDate) as Day
    from   PostHistory
    
  UNION
  
    select Distinct
           UserId as UserId,
           Convert(Date, CreationDate) as Day
    from   Comments
    
  UNION
  
    select Distinct
           OwnerUserId as UserId,
           convert(Date, CreationDate) as Day
    from SuggestedEdits
    
  UNION
  
    select distinct
           Q.OwnerUserId as UserId,
           Convert(Date, V.CreationDate) as Day
    from Posts Q inner join Votes V
      on Q.AcceptedAnswerId = V.PostId
      and V.VoteTypeId = 1
      
  UNION
  
     select distinct
            UserId as UserId,
            Convert(Date, CreationDate) as Day
     from SuggestedEditVotes
    
), StreakStarts as (

    select UserId, Start.Day
    from ActiveDays Start
    where not exists (
        select 1
        from ActiveDays
        where Start.UserId = UserId
          and DateAdd(dd, -1, Start.Day) = Day
    )
    
), StreakMerge as (

    select A.UserId, A.Day, Max(S.Day) as Start
    from ActiveDays as A inner join StreakStarts as S
      on A.UserId = S.UserId
      and A.Day >= S.Day
    group by A.UserId, A.Day
    
), LongStreaks as (

    select UserId, Start, 1 + max (DateDiff(dd, Start, Day)) as  Streak
    from StreakMerge
    group by UserId, Start
)

Select U.Id as [User Link], U.DisplayName, Start, S.Streak
from Users as U inner join LongStreaks as S
  on U.Id = S.UserId
where S.Streak > 1
  and (@userid = -1 OR U.Id = @userid)
order by S.Streak desc

Context

StackExchange Code Review Q#74822, answer score: 5

Revisions (0)

No revisions yet.