HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Upvote distribution SEDE query

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
distributionsedequeryupvote

Problem

There is a 1-to-many relationship between the posts table and the votes table, and they join on posts.id = votes.postid. I would like to know how many posts have 1, 2, 3, 5, etc. votes.

Thanks to Waffles' most controversial posts query, it was trivial to write this:

declare @VoteStats table (PostId int, up int) 

set nocount on

insert @VoteStats
select
    PostId, 
    up = sum(case when VoteTypeId = 2 then 1 else 0 end)
from Votes
where VoteTypeId in (2,3)
group by PostId

set nocount off

select up, count(*)
from @VoteStats
group by up;


However, it runs in a full 22 seconds on the Stack Overflow data dump.

Am I overlooking an obvious way to speed this query up, perhaps avoiding the creation of the VoteStats table?

Solution

Not creating the VoteStats table makes a big difference. Gets it down to ~5.5 seconds.

set nocount on

select up, count(*)
from (
    select
        PostId,
        up = sum(case when VoteTypeId = 2 then 1 else 0 end)
    from Votes
    where VoteTypeId in (2,3)
    group by PostId
) a
group by up
order by up


http://data.stackexchange.com/stackoverflow/q/112108/

From the execution plan of the table variable version, it looks like the table scan and the sort on it is pretty expensive.

Code Snippets

set nocount on

select up, count(*)
from (
    select
        PostId,
        up = sum(case when VoteTypeId = 2 then 1 else 0 end)
    from Votes
    where VoteTypeId in (2,3)
    group by PostId
) a
group by up
order by up

Context

StackExchange Code Review Q#4635, answer score: 2

Revisions (0)

No revisions yet.