HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Follow up to Top Active Answerers on Stack Exchange site

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
answerersactivetopstackexchangefollowsite

Problem

This is a follow up to Top Active Answerers on a Stack Exchange site

Actual SEDE Query --> Top Active Answerers on site

Do I have a good understanding of the RANK function? It seems pretty simple.

Should I separate this into 2 queries or is having 2 rank columns an acceptable thing? If it is acceptable have I displayed it properly? or does it look messy?

--@DaysSinceActivity is the amount of days that you would like to scan for activity
--@NumberOfUsers is the number of users you want to return.

SELECT 
    TOP ##NumberOfUsers:int?1000##
    Users.Id AS [User Link],
    Users.DisplayName AS UserName,
    COUNT(Posts.Id) AS [Total Answers],
    RANK() OVER (ORDER BY (COUNT(Posts.Id)) DESC) AS [Total Answer Rank],
    CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2)) AS [Average Answer Score],
    RANK() OVER (ORDER BY (CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2))) DESC) AS [Avg Score Ranking]
FROM
    Posts
INNER JOIN
    Users ON Users.Id = OwnerUserId
WHERE 
    PostTypeId = 2 
    AND CommunityOwnedDate IS NULL 
    AND ClosedDate IS NULL
    AND Users.LastAccessDate > DATEADD(DAY, -##DaysSinceActivity:int?31##, GETDATE())
GROUP BY
    Users.Id, Users.DisplayName
HAVING
    COUNT(Posts.Id) > 10
ORDER BY
    [Average Answer Score] DESC


The Ranking on those two rows seems really redundant because they are calculated fields.

The DisplayName in the GROUP BY is because I didn't know where else to put that, when I removed it I got an error saying


Column 'Users.DisplayName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I will probably use it in the ORDER BY statement, I think I can get away with that.

Solution

This is a substantial improvement over the previous version. The code is consistent, I do not see any bugs, it's generally good.

There are two further opportunities for improvement that I can see.

-
I can tell you do this to get an accurate average, and round it to 2 decimal places. The concept is good:

CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2))


but, there's a trick with float that simplifies things a bit. Consider:

CAST(AVG(Score * 1.0) AS NUMERIC(6,2))


I know it is small, but the implicit cast makes things easier to read.

-
You duplicate calculations once for the base value, then again for the rank, and again in the having clause. These complex aggregates can be simplified by using a CTE. Calculate the raw data in one CTE and do the calculations in that, then, use that raw data as part of the outer select. Describing it is harder than just doing it. Consider the following query which produces the same results as yours:

WITH RawData AS (

    SELECT 
        Users.Id AS UserId,
        Users.DisplayName AS UserName,
        COUNT(Posts.Id) AS AnswerCount,
        CAST(AVG(1.0 * Score) AS NUMERIC(6,2)) AS AvgScore
    FROM
        Posts
      INNER JOIN
        Users ON Users.Id = OwnerUserId
    WHERE 
        PostTypeId = 2 
        AND CommunityOwnedDate IS NULL 
        AND ClosedDate IS NULL
        AND Users.LastAccessDate > DATEADD(DAY, -##DaysSinceActivity:int?31##, GETDATE())
    GROUP BY
        Users.Id, Users.DisplayName

)
SELECT 
    TOP ##NumberOfUsers:int?1000##
    UserId AS [User Link],
    UserName AS [UserName],
    AnswerCount AS [Total Answers],
    RANK() OVER (ORDER BY AnswerCount DESC) AS [Total Answer Rank],
    AvgScore AS [Average Answer Score],
    RANK() OVER (ORDER BY AvgScore DESC) AS [Avg Score Ranking]
FROM
    RawData
WHERE
    AnswerCount > 10
ORDER BY
    AvgScore DESC


Note how the complex aggregates happen in just one place.

Code Snippets

CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2))
CAST(AVG(Score * 1.0) AS NUMERIC(6,2))
WITH RawData AS (

    SELECT 
        Users.Id AS UserId,
        Users.DisplayName AS UserName,
        COUNT(Posts.Id) AS AnswerCount,
        CAST(AVG(1.0 * Score) AS NUMERIC(6,2)) AS AvgScore
    FROM
        Posts
      INNER JOIN
        Users ON Users.Id = OwnerUserId
    WHERE 
        PostTypeId = 2 
        AND CommunityOwnedDate IS NULL 
        AND ClosedDate IS NULL
        AND Users.LastAccessDate > DATEADD(DAY, -##DaysSinceActivity:int?31##, GETDATE())
    GROUP BY
        Users.Id, Users.DisplayName

)
SELECT 
    TOP ##NumberOfUsers:int?1000##
    UserId AS [User Link],
    UserName AS [UserName],
    AnswerCount AS [Total Answers],
    RANK() OVER (ORDER BY AnswerCount DESC) AS [Total Answer Rank],
    AvgScore AS [Average Answer Score],
    RANK() OVER (ORDER BY AvgScore DESC) AS [Avg Score Ranking]
FROM
    RawData
WHERE
    AnswerCount > 10
ORDER BY
    AvgScore DESC

Context

StackExchange Code Review Q#63385, answer score: 6

Revisions (0)

No revisions yet.