patternsqlMinor
Follow up to Top Active Answerers on Stack Exchange site
Viewed 0 times
answerersactivetopstackexchangefollowsite
Problem
This is a follow up to Top Active Answerers on a Stack Exchange site
Actual SEDE Query --> Top Active Answerers on site
Do I have a good understanding of the
Should I separate this into 2 queries or is having 2 rank columns an acceptable thing? If it is acceptable have I displayed it properly? or does it look messy?
The Ranking on those two rows seems really redundant because they are calculated fields.
The
Column 'Users.DisplayName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I will probably use it in the
Actual SEDE Query --> Top Active Answerers on site
Do I have a good understanding of the
RANK function? It seems pretty simple.Should I separate this into 2 queries or is having 2 rank columns an acceptable thing? If it is acceptable have I displayed it properly? or does it look messy?
--@DaysSinceActivity is the amount of days that you would like to scan for activity
--@NumberOfUsers is the number of users you want to return.
SELECT
TOP ##NumberOfUsers:int?1000##
Users.Id AS [User Link],
Users.DisplayName AS UserName,
COUNT(Posts.Id) AS [Total Answers],
RANK() OVER (ORDER BY (COUNT(Posts.Id)) DESC) AS [Total Answer Rank],
CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2)) AS [Average Answer Score],
RANK() OVER (ORDER BY (CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2))) DESC) AS [Avg Score Ranking]
FROM
Posts
INNER JOIN
Users ON Users.Id = OwnerUserId
WHERE
PostTypeId = 2
AND CommunityOwnedDate IS NULL
AND ClosedDate IS NULL
AND Users.LastAccessDate > DATEADD(DAY, -##DaysSinceActivity:int?31##, GETDATE())
GROUP BY
Users.Id, Users.DisplayName
HAVING
COUNT(Posts.Id) > 10
ORDER BY
[Average Answer Score] DESCThe Ranking on those two rows seems really redundant because they are calculated fields.
The
DisplayName in the GROUP BY is because I didn't know where else to put that, when I removed it I got an error sayingColumn 'Users.DisplayName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I will probably use it in the
ORDER BY statement, I think I can get away with that.Solution
This is a substantial improvement over the previous version. The code is consistent, I do not see any bugs, it's generally good.
There are two further opportunities for improvement that I can see.
-
I can tell you do this to get an accurate average, and round it to 2 decimal places. The concept is good:
but, there's a trick with float that simplifies things a bit. Consider:
I know it is small, but the implicit cast makes things easier to read.
-
You duplicate calculations once for the base value, then again for the rank, and again in the having clause. These complex aggregates can be simplified by using a CTE. Calculate the raw data in one CTE and do the calculations in that, then, use that raw data as part of the outer select. Describing it is harder than just doing it. Consider the following query which produces the same results as yours:
Note how the complex aggregates happen in just one place.
There are two further opportunities for improvement that I can see.
-
I can tell you do this to get an accurate average, and round it to 2 decimal places. The concept is good:
CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2))but, there's a trick with float that simplifies things a bit. Consider:
CAST(AVG(Score * 1.0) AS NUMERIC(6,2))I know it is small, but the implicit cast makes things easier to read.
-
You duplicate calculations once for the base value, then again for the rank, and again in the having clause. These complex aggregates can be simplified by using a CTE. Calculate the raw data in one CTE and do the calculations in that, then, use that raw data as part of the outer select. Describing it is harder than just doing it. Consider the following query which produces the same results as yours:
WITH RawData AS (
SELECT
Users.Id AS UserId,
Users.DisplayName AS UserName,
COUNT(Posts.Id) AS AnswerCount,
CAST(AVG(1.0 * Score) AS NUMERIC(6,2)) AS AvgScore
FROM
Posts
INNER JOIN
Users ON Users.Id = OwnerUserId
WHERE
PostTypeId = 2
AND CommunityOwnedDate IS NULL
AND ClosedDate IS NULL
AND Users.LastAccessDate > DATEADD(DAY, -##DaysSinceActivity:int?31##, GETDATE())
GROUP BY
Users.Id, Users.DisplayName
)
SELECT
TOP ##NumberOfUsers:int?1000##
UserId AS [User Link],
UserName AS [UserName],
AnswerCount AS [Total Answers],
RANK() OVER (ORDER BY AnswerCount DESC) AS [Total Answer Rank],
AvgScore AS [Average Answer Score],
RANK() OVER (ORDER BY AvgScore DESC) AS [Avg Score Ranking]
FROM
RawData
WHERE
AnswerCount > 10
ORDER BY
AvgScore DESCNote how the complex aggregates happen in just one place.
Code Snippets
CAST(AVG(CAST(Score AS float)) AS NUMERIC(6,2))CAST(AVG(Score * 1.0) AS NUMERIC(6,2))WITH RawData AS (
SELECT
Users.Id AS UserId,
Users.DisplayName AS UserName,
COUNT(Posts.Id) AS AnswerCount,
CAST(AVG(1.0 * Score) AS NUMERIC(6,2)) AS AvgScore
FROM
Posts
INNER JOIN
Users ON Users.Id = OwnerUserId
WHERE
PostTypeId = 2
AND CommunityOwnedDate IS NULL
AND ClosedDate IS NULL
AND Users.LastAccessDate > DATEADD(DAY, -##DaysSinceActivity:int?31##, GETDATE())
GROUP BY
Users.Id, Users.DisplayName
)
SELECT
TOP ##NumberOfUsers:int?1000##
UserId AS [User Link],
UserName AS [UserName],
AnswerCount AS [Total Answers],
RANK() OVER (ORDER BY AnswerCount DESC) AS [Total Answer Rank],
AvgScore AS [Average Answer Score],
RANK() OVER (ORDER BY AvgScore DESC) AS [Avg Score Ranking]
FROM
RawData
WHERE
AnswerCount > 10
ORDER BY
AvgScore DESCContext
StackExchange Code Review Q#63385, answer score: 6
Revisions (0)
No revisions yet.