patternsqlMinor
SE Data Explorer Query: Average score for questions and answers, by tag
Viewed 0 times
taganswersexplorerqueryquestionsaveragescoreforanddata
Problem
Being curious to see which tags on this site get the most attention, I developed a Stack Exchange Data Explorer query:
```
WITH CanonicalTags AS (
SELECT Id, TagName, Id AS AliasId, TagName AS Alias
FROM Tags
WHERE NOT EXISTS (
SELECT Id
FROM TagSynonyms
WHERE SourceTagName=TagName AND ApprovedByUserId IS NOT NULL
)
UNION
SELECT Tags.Id, Tags.TagName, Alias.Id AS AliasId, Alias.TagName
FROM Tags
INNER JOIN TagSynonyms
ON TargetTagName=TagName
INNER JOIN Tags AS Alias
ON Alias.TagName=SourceTagName
WHERE ApprovedByUserId IS NOT NULL
), TagQuestionStats AS (
SELECT Tags.Id AS Id
, Tags.TagName AS TagName
, AVG(CAST(Questions.Score AS DECIMAL)) AS AvgScore
, STDEV(Questions.Score) AS StdDevScore
, COUNT(Questions.Id) AS N
FROM CanonicalTags AS Tags
INNER JOIN PostTags
ON PostTags.TagId=Tags.AliasId
INNER JOIN Posts AS Questions
ON Questions.Id=PostTags.PostId
GROUP BY Tags.Id, Tags.TagName
), TagAnswerStats AS (
SELECT Tags.Id AS Id
, Tags.TagName AS TagName
, AVG(CAST(Answers.Score AS DECIMAL)) AS AvgScore
, STDEV(Answers.Score) AS StdDevScore
, COUNT(Answers.Id) AS N
FROM CanonicalTags AS Tags
INNER JOIN PostTags
ON PostTags.TagId=Tags.AliasId
INNER JOIN Posts AS Questions
ON Questions.Id=PostTags.PostId
INNER JOIN Posts AS Answers
ON Answers.ParentId=Questions.Id
WHERE
Answers.PostTypeId=2
GROUP BY Tags.Id, Tags.TagName
)
SELECT '' AS [TagName]
, ROUND(QStats.AvgScore, 2) AS [Avg Qst Score]
, ROUND(QStats.StdDevScore, 2) AS [Qst Score Std Dev]
, QStats.N AS [# Qst]
, ROUND(AStats.AvgScore, 2) AS [Avg Ans Score]
, ROUND(AStats.St
```
WITH CanonicalTags AS (
SELECT Id, TagName, Id AS AliasId, TagName AS Alias
FROM Tags
WHERE NOT EXISTS (
SELECT Id
FROM TagSynonyms
WHERE SourceTagName=TagName AND ApprovedByUserId IS NOT NULL
)
UNION
SELECT Tags.Id, Tags.TagName, Alias.Id AS AliasId, Alias.TagName
FROM Tags
INNER JOIN TagSynonyms
ON TargetTagName=TagName
INNER JOIN Tags AS Alias
ON Alias.TagName=SourceTagName
WHERE ApprovedByUserId IS NOT NULL
), TagQuestionStats AS (
SELECT Tags.Id AS Id
, Tags.TagName AS TagName
, AVG(CAST(Questions.Score AS DECIMAL)) AS AvgScore
, STDEV(Questions.Score) AS StdDevScore
, COUNT(Questions.Id) AS N
FROM CanonicalTags AS Tags
INNER JOIN PostTags
ON PostTags.TagId=Tags.AliasId
INNER JOIN Posts AS Questions
ON Questions.Id=PostTags.PostId
GROUP BY Tags.Id, Tags.TagName
), TagAnswerStats AS (
SELECT Tags.Id AS Id
, Tags.TagName AS TagName
, AVG(CAST(Answers.Score AS DECIMAL)) AS AvgScore
, STDEV(Answers.Score) AS StdDevScore
, COUNT(Answers.Id) AS N
FROM CanonicalTags AS Tags
INNER JOIN PostTags
ON PostTags.TagId=Tags.AliasId
INNER JOIN Posts AS Questions
ON Questions.Id=PostTags.PostId
INNER JOIN Posts AS Answers
ON Answers.ParentId=Questions.Id
WHERE
Answers.PostTypeId=2
GROUP BY Tags.Id, Tags.TagName
)
SELECT '' AS [TagName]
, ROUND(QStats.AvgScore, 2) AS [Avg Qst Score]
, ROUND(QStats.StdDevScore, 2) AS [Qst Score Std Dev]
, QStats.N AS [# Qst]
, ROUND(AStats.AvgScore, 2) AS [Avg Ans Score]
, ROUND(AStats.St
Solution
Your code is, for the most part, consistently styled, and predictably structured, which is nice. I am not a fan of the comma-on-the-new-line style of continuation. I prefer the comma at the end. But, at least you are consistent with this.
One inconsistency which I see is that for smaller selects you have multiple selected columns on a single line.
To be consistent with the other selects, as much as these are short statements, you should maintain consistency:
There is only one example of it in your code, but I really dislike multiple conditions on a single line:
This really should be:
The UNION condition is also very buried in there, and it has too much indentation ;-)
Finally, white-space is cheap in SQL, I really don't like the space-less
Right, that's the nit-picky stuff done.
Talking about the UNION, the statement could be rewritten as:
I prefer the more compact result, although it is debatable as to whether the "Outer" Join is a better solution than the UNION. Outer joins can lead to confusion. So can UNIONS and I prefer the outer join.
In the above query I have removed the
I also found the final 'assimilation' of all the data in the final select to have too many casts and data conversions. The actual logic was lost in the detail. I moved the complicating CASTS to the previous
Finally, the indexed
Putting this all together, I have the SQL:
This query supports this fork of your original question
One inconsistency which I see is that for smaller selects you have multiple selected columns on a single line.
SELECT Tags.Id, Tags.TagName, Alias.Id AS AliasId, Alias.TagName
....To be consistent with the other selects, as much as these are short statements, you should maintain consistency:
SELECT Tags.Id
, Tags.TagName
, Alias.Id AS AliasId
, Alias.TagName
....There is only one example of it in your code, but I really dislike multiple conditions on a single line:
WHERE SourceTagName=TagName AND ApprovedByUserId IS NOT NULLThis really should be:
WHERE SourceTagName=TagName
AND ApprovedByUserId IS NOT NULLThe UNION condition is also very buried in there, and it has too much indentation ;-)
Finally, white-space is cheap in SQL, I really don't like the space-less
arg=val expressions, spacing them is easy arg = val.Right, that's the nit-picky stuff done.
Talking about the UNION, the statement could be rewritten as:
SELECT IsNull(Main.TagName, Alias.TagName) AS TagName
, IsNull(Main.Id, Alias.Id ) AS Id
, Alias.Id AS AliasId
FROM Tags AS Alias
LEFT JOIN TagSynonyms
ON SourceTagName = Alias.TagName
LEFT JOIN Tags AS Main
ON TargetTagName = Main.TagNameI prefer the more compact result, although it is debatable as to whether the "Outer" Join is a better solution than the UNION. Outer joins can lead to confusion. So can UNIONS and I prefer the outer join.
In the above query I have removed the
Alias TagName since it is not used outside the sub-select.I also found the final 'assimilation' of all the data in the final select to have too many casts and data conversions. The actual logic was lost in the detail. I moved the complicating CASTS to the previous
with clauses.Finally, the indexed
ORDER BY has caught a number of people out in the past, as it is too easy to change columns and suddenly the data ordering is off, even though the query still runs. I always recommend named-column ordering.Putting this all together, I have the SQL:
WITH CanonicalTags AS (
SELECT IsNull(Main.TagName, Alias.TagName) AS TagName
, IsNull(Main.Id, Alias.Id ) AS Id
-- , Alias.TagName AS AliasName
, Alias.Id AS AliasId
FROM Tags AS Alias
LEFT JOIN TagSynonyms
ON SourceTagName = Alias.TagName
LEFT JOIN Tags AS Main
ON TargetTagName = Main.TagName
), TagQuestionStats AS (
SELECT Tags.Id AS Id
, Tags.TagName AS TagName
, CAST(AVG(CAST(Questions.Score AS DECIMAL)) AS DECIMAL(8,2)) AS AvgScore
, CAST(STDEV(Questions.Score) AS DECIMAL(8,2)) AS StdDevScore
, CAST(COUNT(Questions.Id) AS DECIMAL(8,2)) AS N
FROM CanonicalTags AS Tags
INNER JOIN PostTags
ON PostTags.TagId=Tags.AliasId
INNER JOIN Posts AS Questions
ON Questions.Id=PostTags.PostId
GROUP BY Tags.Id, Tags.TagName
), TagAnswerStats AS (
SELECT Tags.Id AS Id
, Tags.TagName AS TagName
, CAST(AVG(CAST(Answers.Score AS DECIMAL)) AS DECIMAL(8,2)) AS AvgScore
, CAST(STDEV(Answers.Score) AS DECIMAL(8,2)) AS StdDevScore
, CAST(COUNT(Answers.Id) AS DECIMAL(8,2)) AS N
FROM CanonicalTags AS Tags
INNER JOIN PostTags
ON PostTags.TagId=Tags.AliasId
INNER JOIN Posts AS Questions
ON Questions.Id=PostTags.PostId
INNER JOIN Posts AS Answers
ON Answers.ParentId=Questions.Id
WHERE
Answers.PostTypeId=2
GROUP BY Tags.Id, Tags.TagName
)
SELECT '' AS [TagName]
, QStats.AvgScore AS [Avg Qst Score]
, QStats.StdDevScore AS [Qst Score Std Dev]
, QStats.N AS [# Qst]
, AStats.AvgScore AS [Avg Ans Score]
, AStats.StdDevScore AS [Ans Score Std Dev]
, AStats.N AS [# Ans]
, ROUND(AStats.N / QStats.N, 2) AS [Avg # Ans/Qst]
FROM
TagQuestionStats AS QStats
LEFT OUTER JOIN TagAnswerStats AS AStats
ON AStats.Id=QStats.Id
ORDER BY [Avg Ans Score] DESC
, [Avg Qst Score] DESC
, [# Ans]
, [# Qst]
, [TagName];This query supports this fork of your original question
Code Snippets
SELECT Tags.Id, Tags.TagName, Alias.Id AS AliasId, Alias.TagName
....SELECT Tags.Id
, Tags.TagName
, Alias.Id AS AliasId
, Alias.TagName
....WHERE SourceTagName=TagName AND ApprovedByUserId IS NOT NULLWHERE SourceTagName=TagName
AND ApprovedByUserId IS NOT NULLSELECT IsNull(Main.TagName, Alias.TagName) AS TagName
, IsNull(Main.Id, Alias.Id ) AS Id
, Alias.Id AS AliasId
FROM Tags AS Alias
LEFT JOIN TagSynonyms
ON SourceTagName = Alias.TagName
LEFT JOIN Tags AS Main
ON TargetTagName = Main.TagNameContext
StackExchange Code Review Q#39410, answer score: 4
Revisions (0)
No revisions yet.