patternsqlMinor
Stack Exchange Data Explorer query to find upvoted questions with only downvoted answers
Viewed 0 times
stackexchangewithexplorerqueryquestionsanswersdownvotedfinddata
Problem
I have written a SQL query against the Stack Exchange Data Explorer that identifies answered questions with a positive score for which all the answers have a negative score (this was an answer to this Meta Stack Exchange post, though the code in that post is different from what I've posted below).
It's a fairly straightforward query; you can see it and run it on the Data Explorer (where you can also examine the schemata of the tables), and I have reproduced it below for your convenience.
While I of course would love to hear anything you have to say about this code, I am particularly interested in the following:
-
How do the performance characteristics of the HAVING clause here differ from instead using a WHERE clause of the form below?
My sense is that
It's a fairly straightforward query; you can see it and run it on the Data Explorer (where you can also examine the schemata of the tables), and I have reproduced it below for your convenience.
SELECT
q.Id AS [Post Link], -- this alias makes the Data Explorer do magic link rendering
MAX(q.Score) AS [Score],
COUNT(a.Id) AS [AnswerCount],
AVG(CAST(a.Score AS float)) AS [AvgAnswerScore]
FROM
Posts AS q
-- INNER JOIN => At least one answer
INNER JOIN Posts AS a
ON q.Id = a.ParentId
WHERE
-- Question has a positive score
q.Score >= 1
GROUP BY
q.Id
HAVING
-- Zero answers with a non-negative score
SUM((CASE WHEN a.Score >= 0 THEN 1 ELSE 0 END)) = 0
ORDER BY
[Score] DESC,
[AnswerCount] DESCWhile I of course would love to hear anything you have to say about this code, I am particularly interested in the following:
- Stylistic tips - coming from the perspective of a PEP8-adhering Python guy, the huge inhomogeneity in SQL coding styles makes me queasy.
MAX(q.Score)looks really stupid and is semantically wrong, since there's obviously only oneq.Scorefor eachq.Idsince they come from the same table, andq.Idis the primary key of that table. But puttingq.Scoreinto the GROUP BY also seems weird to me, since that's not semantically what I'm doing either. Is there a better way to approach this?
-
How do the performance characteristics of the HAVING clause here differ from instead using a WHERE clause of the form below?
q.Score >= 1
AND NOT EXISTS (
SELECT 1
FROM Posts AS a2
WHERE a2.ParentId = q.Id AND a2.Score >= 0
)My sense is that
HAVING SUM(...) = 0 is probably better, since, uh... there doesn't neeSolution
Answering your questions in reverse order…
Your
Your
HAVING clause is clearer than than an anti-join condition in the WHERE clause. It is also more efficient — tick the "Include execution plan" checkbox to see how much more complicated the anti-join is. You could do better, though: MAX(a.Score)
- Indent more. The
FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses are all subsidiary to the SELECT, and therefore should be at the next indentation level.
- Finishing the query with a semicolon is a good habit.
- Move comments to the right to be less obtrusive.
- I like to put commas before the selected columns. That convention makes it easier to add and remove columns.
- Prefer comparing with 0 instead of comparing with any other constant.
SELECT q.Id AS [Post Link] -- Alias makes Data Explorer do magic link rendering
, q.Score AS [Score]
, COUNT(a.Id) AS [AnswerCount]
, AVG(CAST(a.Score AS float)) AS [AvgAnswerScore]
FROM Posts AS q
INNER JOIN Posts AS a
ON q.Id = a.ParentId
WHERE
q.Score > 0 -- Question has a positive score
GROUP BY
q.Id, q.Score
HAVING
MAX(a.Score) < 0 -- Best answer has a negative score
ORDER BY
[Score] DESC,
[AnswerCount] DESC;
I'd like to point out: good job remembering to CAST(a.Score AS float) before taking the average. SQL Server is tricky, in that AVG() of ints is an int`.Code Snippets
SELECT q.Id AS [Post Link] -- Alias makes Data Explorer do magic link rendering
, q.Score AS [Score]
, COUNT(a.Id) AS [AnswerCount]
, AVG(CAST(a.Score AS float)) AS [AvgAnswerScore]
FROM Posts AS q
INNER JOIN Posts AS a
ON q.Id = a.ParentId
WHERE
q.Score > 0 -- Question has a positive score
GROUP BY
q.Id, q.Score
HAVING
MAX(a.Score) < 0 -- Best answer has a negative score
ORDER BY
[Score] DESC,
[AnswerCount] DESC;Context
StackExchange Code Review Q#80489, answer score: 2
Revisions (0)
No revisions yet.