HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Aggregating Conditional Sums

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
aggregatingconditionalsums

Problem

I have recently forked/rewritten a SEDE query (here) that aims at figuring out where a site stands in terms of avid users and distribution of reputation scores, compared to a specific target (number of avid users in each major rep cluster).

declare @target_20k float
declare @target_10k float
declare @target_3k float
declare @target_2k float

set @target_20k = 3.0
set @target_10k = 10.0
set @target_3k = 80.0
set @target_2k = 120.0

/* only look at users that visited the site recently */
declare @lastActiveDays int
set @lastActiveDays = 60

select [20K], [10K], [3K], [2K], 
       [TotalSiteRep], 
       [UsersCount],
       round([TotalSiteRep]/cast([UsersCount] as float),2) [AvgRep],
       [AvidUsersRep],
       round([AvidUsersRep]/cast([TotalSiteRep] as float),2) [%AvidRep],
       [AvidUsersCount],
       round([AvidUsersCount]/cast([UsersCount] as float),2) [%AvidUsers],
       round([AvidUsersRep]/cast(AvidUsersCount as float),2) [AvgAvidRep]
from (

select round(sum([20K])/@target_20k, 2) [20K],
       round(sum([10K])/@target_10k,2) [10K],
       round(sum([3K])/@target_3k,2) [3K],
       round(sum([2K])/@target_2k,2) [2K],
       sum(Reputation) [TotalSiteRep],
       count(*) [UsersCount],
       sum(AvidRep) [AvidUsersRep],
       sum(AvidUser) [AvidUsersCount]
from (
  select Reputation,
    case when Reputation >= 20000 then 1 else 0 end [20K],
    case when Reputation >= 10000 then 1 else 0 end [10K],
    case when Reputation >= 3000 then 1 else 0 end [3K],
    case when Reputation >= 2000 then 1 else 0 end [2K],
    case when Reputation >= 150 then Reputation else 0 end [AvidRep],
    case when Reputation >= 150 then 1 else 0 end [AvidUser]
  from Users
  where datediff(d, LastAccessDate, getdate()) <= @lastActiveDays
) q
) r


I'm not crazy about the case when ... then .. else ... end part and about the sub-querying, but it seems to get the job done.

Is there a better way to aggregate conditional sums? (feel free to review everything else,

Solution

It's a good habit to end each statement with an explicit semicolon.

The innermost query q generates one row for each recent user. Since you have no interest in individual users, and only want counts and sums, you can collapse the two innermost queries q and r — both poorly named, by the way — into one. Use COUNT() for counting.

The lastActiveDays filtering is being done inefficiently:

where datediff(d, LastAccessDate, getdate()) <= @lastActiveDays


To accomplish that, the server would have to run datediff() on every single row in the Users table. Instead, you should calculate the cutoff date just once:

where LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)


(The cast truncates the result to midnight.)

I prefer to use Common Table Expressions for readability. Since you are not running a correlated subquery, a top-down layout is better than nesting.

The UserStats CTE below is roughly equivalent to subquery q. I've deferred the … / @target… divisions for readability and rearranged the columns in a more logical order. It could be written as just one SELECT, but I think that splitting it up with a CTE makes it easier to understand.

declare @target_20k float;
declare @target_10k float;
declare @target_3k float;
declare @target_2k float;

set @target_20k = 3.0;
set @target_10k = 10.0;
set @target_3k = 80.0;
set @target_2k = 120.0;

declare @lastActiveDays int;
set @lastActiveDays = 60;

with UserStats as (
  select
      count(case when Reputation >= 20000 then 1 else null end) [20KUsersCount],
      count(case when Reputation >= 10000 then 1 else null end) [10KUsersCount],
      count(case when Reputation >=  3000 then 1 else null end) [3KUsersCount],
      count(case when Reputation >=  2000 then 1 else null end) [2KUsersCount],
      count(case when Reputation >=   150 then 1 else null end) [AvidUsersCount],
      count(Id) [UsersCount],
      sum(case when Reputation >= 150 then Reputation else 0 end) [AvidUsersRep],
      sum(Reputation) [TotalSiteRep]
    from Users
    where LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)
)
select
    round([20KUsersCount] / @target_20k, 2) [20K],
    round([10KUsersCount] / @target_10k, 2) [10K],
    round([3KUsersCount]  / @target_3k,  2) [3K],
    round([2KUsersCount]  / @target_2k,  2) [2K],
    [TotalSiteRep],
    [UsersCount],
    round([TotalSiteRep] / cast([UsersCount] as float), 2) [AvgRep],
    [AvidUsersRep],
    round([AvidUsersRep] / cast([TotalSiteRep] as float), 2) [%AvidRep],
    [AvidUsersCount],
    round([AvidUsersCount] / cast([UsersCount] as float), 2) [%AvidUsers],
    round([AvidUsersRep] / cast(AvidUsersCount as float), 2) [AvgAvidRep]
  from UserStats;

Code Snippets

where datediff(d, LastAccessDate, getdate()) <= @lastActiveDays
where LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)
declare @target_20k float;
declare @target_10k float;
declare @target_3k float;
declare @target_2k float;

set @target_20k = 3.0;
set @target_10k = 10.0;
set @target_3k = 80.0;
set @target_2k = 120.0;

declare @lastActiveDays int;
set @lastActiveDays = 60;

with UserStats as (
  select
      count(case when Reputation >= 20000 then 1 else null end) [20KUsersCount],
      count(case when Reputation >= 10000 then 1 else null end) [10KUsersCount],
      count(case when Reputation >=  3000 then 1 else null end) [3KUsersCount],
      count(case when Reputation >=  2000 then 1 else null end) [2KUsersCount],
      count(case when Reputation >=   150 then 1 else null end) [AvidUsersCount],
      count(Id) [UsersCount],
      sum(case when Reputation >= 150 then Reputation else 0 end) [AvidUsersRep],
      sum(Reputation) [TotalSiteRep]
    from Users
    where LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)
)
select
    round([20KUsersCount] / @target_20k, 2) [20K],
    round([10KUsersCount] / @target_10k, 2) [10K],
    round([3KUsersCount]  / @target_3k,  2) [3K],
    round([2KUsersCount]  / @target_2k,  2) [2K],
    [TotalSiteRep],
    [UsersCount],
    round([TotalSiteRep] / cast([UsersCount] as float), 2) [AvgRep],
    [AvidUsersRep],
    round([AvidUsersRep] / cast([TotalSiteRep] as float), 2) [%AvidRep],
    [AvidUsersCount],
    round([AvidUsersCount] / cast([UsersCount] as float), 2) [%AvidUsers],
    round([AvidUsersRep] / cast(AvidUsersCount as float), 2) [AvgAvidRep]
  from UserStats;

Context

StackExchange Code Review Q#44353, answer score: 4

Revisions (0)

No revisions yet.