patternsqlMinor
Who are these lurkers?
Viewed 0 times
arewholurkersthese
Problem
Last night in The 2nd Monitor there was a discussion about postless users - I wanted to see how many there are, whether they're one-timers that showed up once, registered and never came back, and whether they vote at all.
So I whipped up this query on SEDE, showing voters and their post count, as well as other information:
Returns 26740 rows returned in 211 ms (cached). Is there any way it could be improved, performance or otherwise?
Postless Users query on SEDE.
So I whipped up this query on SEDE, showing voters and their post count, as well as other information:
with sub as (
select distinct
Users.Id
,Users.DisplayName
,Users.CreationDate
,Users.LastAccessDate
,datediff(day, cast(Users.CreationDate as date), cast(Users.LastAccessDate as date)) MembershipDays
,case when cast(Users.CreationDate as date) = cast(Users.LastAccessDate as date) then 1 else 0 end IsOneTimer
,sum(case when Posts.Id is null then 0 else 1 end) over (partition by Users.Id) PostCount
,Users.Reputation
,Users.UpVotes + Users.DownVotes TotalVotes
,Users.UpVotes
,Users.DownVotes
from Users
left join Posts on Users.Id = Posts.OwnerUserId
where Users.Id > 0
), agg as (
select
Id
,DisplayName
,CreationDate
,LastAccessDate
,IsOneTimer
,MembershipDays
,case when MembershipDays = 0 then TotalVotes else TotalVotes / cast(MembershipDays as decimal) end VotesPerDay
,PostCount
,Reputation
,TotalVotes
,UpVotes
,DownVotes
from sub
where TotalVotes > 0)
select
Id
,DisplayName
,CreationDate
,LastAccessDate
,IsOneTimer
,MembershipDays
,round(VotesPerDay, 3) VotesPerDay
,PostCount
,Reputation
,TotalVotes
,UpVotes
,DownVotes
from agg
order by
PostCount
,cast(VotesPerDay as int) desc
,cast(LastAccessDate as date) desc
,TotalVotes desc
,Reputation
,IsOneTimer
,CreationDateReturns 26740 rows returned in 211 ms (cached). Is there any way it could be improved, performance or otherwise?
Postless Users query on SEDE.
Solution
Output
It would be nice if the usernames were clickable. You can achieve that using the magic
It's weird that you
Implementation
The names of your CTEs,
Your
Suggested solution
It would be nice if the usernames were clickable. You can achieve that using the magic
[User Link] column. It might make the query fail to complete on a large site like Stack Overflow, though.It's weird that you
cast(VotesPerDay as int) and cast(LastAccessDate as date) for sorting. Why not sort by VotesPerDay (without rounding) and LastAccessDate directly?Implementation
The names of your CTEs,
sub and agg, don't mean anything to me.select distinct should be replaced with a smarter join wherever possible. If you look at the execution plan, you'll see that it estimates that there are 3 million rows, when there are actually just 26740 Code Review users. The estimated subtree cost is 1048. In comparison, my query below has an estimated subtree cost of 5.7 and 27580 rows for Code Review, and those numbers scale depending on the size of the site.Your
agg CTE has a TotalVotes > 0 as a WHERE condition. It would probably be more efficient to test for (UpVotes > 0 OR DownVotes > 0), because that has a possibility to take advantage of an index (though we don't know whether the Users table actually has such an index in SEDE).Suggested solution
WITH FilteredUsers AS (
SELECT *
, DATEDIFF(day, CAST(CreationDate AS DATE), CAST(LastAccessDate AS DATE)) AS MembershipDays
FROM Users
WHERE
Id > 0 AND
(UpVotes > 0 OR DownVotes > 0)
), PostCounts AS (
SELECT OwnerUserId
, COUNT(*) AS PostCount
FROM Posts
GROUP BY OwnerUserId
)
SELECT Id
, DisplayName
, CreationDate
, LastAccessDate
, CASE WHEN MembershipDays = 0 THEN 1 ELSE 0 END AS IsOneTimer
, MembershipDays
, ROUND((UpVotes + DownVotes) / CASE WHEN MembershipDays = 0 THEN 1 ELSE CAST(MembershipDays AS DECIMAL) END, 3) AS VotesPerDay
, COALESCE(PostCount, 0) AS PostCount
, Reputation
, UpVotes + DownVotes AS TotalVotes
, UpVotes
, DownVotes
FROM FilteredUsers
LEFT OUTER JOIN PostCounts
ON FilteredUsers.Id = PostCounts.OwnerUserId
ORDER BY PostCount
, VotesPerDay DESC
, LastAccessDate DESC
, TotalVotes DESC
, Reputation
, IsOneTimer
, CreationDate;Code Snippets
WITH FilteredUsers AS (
SELECT *
, DATEDIFF(day, CAST(CreationDate AS DATE), CAST(LastAccessDate AS DATE)) AS MembershipDays
FROM Users
WHERE
Id > 0 AND
(UpVotes > 0 OR DownVotes > 0)
), PostCounts AS (
SELECT OwnerUserId
, COUNT(*) AS PostCount
FROM Posts
GROUP BY OwnerUserId
)
SELECT Id
, DisplayName
, CreationDate
, LastAccessDate
, CASE WHEN MembershipDays = 0 THEN 1 ELSE 0 END AS IsOneTimer
, MembershipDays
, ROUND((UpVotes + DownVotes) / CASE WHEN MembershipDays = 0 THEN 1 ELSE CAST(MembershipDays AS DECIMAL) END, 3) AS VotesPerDay
, COALESCE(PostCount, 0) AS PostCount
, Reputation
, UpVotes + DownVotes AS TotalVotes
, UpVotes
, DownVotes
FROM FilteredUsers
LEFT OUTER JOIN PostCounts
ON FilteredUsers.Id = PostCounts.OwnerUserId
ORDER BY PostCount
, VotesPerDay DESC
, LastAccessDate DESC
, TotalVotes DESC
, Reputation
, IsOneTimer
, CreationDate;Context
StackExchange Code Review Q#147339, answer score: 4
Revisions (0)
No revisions yet.