patternsqlMinor
Aggregating Conditional Sums
Viewed 0 times
aggregatingconditionalsums
Problem
I have recently forked/rewritten a SEDE query (here) that aims at figuring out where a site stands in terms of avid users and distribution of reputation scores, compared to a specific target (number of avid users in each major rep cluster).
I'm not crazy about the
Is there a better way to aggregate conditional sums? (feel free to review everything else,
declare @target_20k float
declare @target_10k float
declare @target_3k float
declare @target_2k float
set @target_20k = 3.0
set @target_10k = 10.0
set @target_3k = 80.0
set @target_2k = 120.0
/* only look at users that visited the site recently */
declare @lastActiveDays int
set @lastActiveDays = 60
select [20K], [10K], [3K], [2K],
[TotalSiteRep],
[UsersCount],
round([TotalSiteRep]/cast([UsersCount] as float),2) [AvgRep],
[AvidUsersRep],
round([AvidUsersRep]/cast([TotalSiteRep] as float),2) [%AvidRep],
[AvidUsersCount],
round([AvidUsersCount]/cast([UsersCount] as float),2) [%AvidUsers],
round([AvidUsersRep]/cast(AvidUsersCount as float),2) [AvgAvidRep]
from (
select round(sum([20K])/@target_20k, 2) [20K],
round(sum([10K])/@target_10k,2) [10K],
round(sum([3K])/@target_3k,2) [3K],
round(sum([2K])/@target_2k,2) [2K],
sum(Reputation) [TotalSiteRep],
count(*) [UsersCount],
sum(AvidRep) [AvidUsersRep],
sum(AvidUser) [AvidUsersCount]
from (
select Reputation,
case when Reputation >= 20000 then 1 else 0 end [20K],
case when Reputation >= 10000 then 1 else 0 end [10K],
case when Reputation >= 3000 then 1 else 0 end [3K],
case when Reputation >= 2000 then 1 else 0 end [2K],
case when Reputation >= 150 then Reputation else 0 end [AvidRep],
case when Reputation >= 150 then 1 else 0 end [AvidUser]
from Users
where datediff(d, LastAccessDate, getdate()) <= @lastActiveDays
) q
) rI'm not crazy about the
case when ... then .. else ... end part and about the sub-querying, but it seems to get the job done.Is there a better way to aggregate conditional sums? (feel free to review everything else,
Solution
It's a good habit to end each statement with an explicit semicolon.
The innermost query
The
To accomplish that, the server would have to run
(The cast truncates the result to midnight.)
I prefer to use Common Table Expressions for readability. Since you are not running a correlated subquery, a top-down layout is better than nesting.
The
The innermost query
q generates one row for each recent user. Since you have no interest in individual users, and only want counts and sums, you can collapse the two innermost queries q and r — both poorly named, by the way — into one. Use COUNT() for counting.The
lastActiveDays filtering is being done inefficiently:where datediff(d, LastAccessDate, getdate()) <= @lastActiveDaysTo accomplish that, the server would have to run
datediff() on every single row in the Users table. Instead, you should calculate the cutoff date just once:where LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)(The cast truncates the result to midnight.)
I prefer to use Common Table Expressions for readability. Since you are not running a correlated subquery, a top-down layout is better than nesting.
The
UserStats CTE below is roughly equivalent to subquery q. I've deferred the … / @target… divisions for readability and rearranged the columns in a more logical order. It could be written as just one SELECT, but I think that splitting it up with a CTE makes it easier to understand.declare @target_20k float;
declare @target_10k float;
declare @target_3k float;
declare @target_2k float;
set @target_20k = 3.0;
set @target_10k = 10.0;
set @target_3k = 80.0;
set @target_2k = 120.0;
declare @lastActiveDays int;
set @lastActiveDays = 60;
with UserStats as (
select
count(case when Reputation >= 20000 then 1 else null end) [20KUsersCount],
count(case when Reputation >= 10000 then 1 else null end) [10KUsersCount],
count(case when Reputation >= 3000 then 1 else null end) [3KUsersCount],
count(case when Reputation >= 2000 then 1 else null end) [2KUsersCount],
count(case when Reputation >= 150 then 1 else null end) [AvidUsersCount],
count(Id) [UsersCount],
sum(case when Reputation >= 150 then Reputation else 0 end) [AvidUsersRep],
sum(Reputation) [TotalSiteRep]
from Users
where LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)
)
select
round([20KUsersCount] / @target_20k, 2) [20K],
round([10KUsersCount] / @target_10k, 2) [10K],
round([3KUsersCount] / @target_3k, 2) [3K],
round([2KUsersCount] / @target_2k, 2) [2K],
[TotalSiteRep],
[UsersCount],
round([TotalSiteRep] / cast([UsersCount] as float), 2) [AvgRep],
[AvidUsersRep],
round([AvidUsersRep] / cast([TotalSiteRep] as float), 2) [%AvidRep],
[AvidUsersCount],
round([AvidUsersCount] / cast([UsersCount] as float), 2) [%AvidUsers],
round([AvidUsersRep] / cast(AvidUsersCount as float), 2) [AvgAvidRep]
from UserStats;Code Snippets
where datediff(d, LastAccessDate, getdate()) <= @lastActiveDayswhere LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)declare @target_20k float;
declare @target_10k float;
declare @target_3k float;
declare @target_2k float;
set @target_20k = 3.0;
set @target_10k = 10.0;
set @target_3k = 80.0;
set @target_2k = 120.0;
declare @lastActiveDays int;
set @lastActiveDays = 60;
with UserStats as (
select
count(case when Reputation >= 20000 then 1 else null end) [20KUsersCount],
count(case when Reputation >= 10000 then 1 else null end) [10KUsersCount],
count(case when Reputation >= 3000 then 1 else null end) [3KUsersCount],
count(case when Reputation >= 2000 then 1 else null end) [2KUsersCount],
count(case when Reputation >= 150 then 1 else null end) [AvidUsersCount],
count(Id) [UsersCount],
sum(case when Reputation >= 150 then Reputation else 0 end) [AvidUsersRep],
sum(Reputation) [TotalSiteRep]
from Users
where LastAccessDate >= cast(dateadd(day, -@lastActiveDays, getdate()) as Date)
)
select
round([20KUsersCount] / @target_20k, 2) [20K],
round([10KUsersCount] / @target_10k, 2) [10K],
round([3KUsersCount] / @target_3k, 2) [3K],
round([2KUsersCount] / @target_2k, 2) [2K],
[TotalSiteRep],
[UsersCount],
round([TotalSiteRep] / cast([UsersCount] as float), 2) [AvgRep],
[AvidUsersRep],
round([AvidUsersRep] / cast([TotalSiteRep] as float), 2) [%AvidRep],
[AvidUsersCount],
round([AvidUsersCount] / cast([UsersCount] as float), 2) [%AvidUsers],
round([AvidUsersRep] / cast(AvidUsersCount as float), 2) [AvgAvidRep]
from UserStats;Context
StackExchange Code Review Q#44353, answer score: 4
Revisions (0)
No revisions yet.