HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Stack Exchange Data Explorer query to find upvoted questions with only downvoted answers

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
stackexchangewithexplorerqueryquestionsanswersdownvotedfinddata

Problem

I have written a SQL query against the Stack Exchange Data Explorer that identifies answered questions with a positive score for which all the answers have a negative score (this was an answer to this Meta Stack Exchange post, though the code in that post is different from what I've posted below).

It's a fairly straightforward query; you can see it and run it on the Data Explorer (where you can also examine the schemata of the tables), and I have reproduced it below for your convenience.

SELECT
  q.Id AS [Post Link], -- this alias makes the Data Explorer do magic link rendering
  MAX(q.Score) AS [Score],
  COUNT(a.Id) AS [AnswerCount],
  AVG(CAST(a.Score AS float)) AS [AvgAnswerScore]
FROM
  Posts AS q
  -- INNER JOIN => At least one answer
  INNER JOIN Posts AS a
    ON q.Id = a.ParentId
WHERE
  -- Question has a positive score
  q.Score >= 1
GROUP BY
  q.Id
HAVING
  -- Zero answers with a non-negative score
  SUM((CASE WHEN a.Score >= 0 THEN 1 ELSE 0 END)) = 0
ORDER BY
  [Score] DESC,
  [AnswerCount] DESC


While I of course would love to hear anything you have to say about this code, I am particularly interested in the following:

  • Stylistic tips - coming from the perspective of a PEP8-adhering Python guy, the huge inhomogeneity in SQL coding styles makes me queasy.



  • MAX(q.Score) looks really stupid and is semantically wrong, since there's obviously only one q.Score for each q.Id since they come from the same table, and q.Id is the primary key of that table. But putting q.Score into the GROUP BY also seems weird to me, since that's not semantically what I'm doing either. Is there a better way to approach this?



-
How do the performance characteristics of the HAVING clause here differ from instead using a WHERE clause of the form below?

q.Score >= 1
AND NOT EXISTS (
  SELECT 1
  FROM Posts AS a2
  WHERE a2.ParentId = q.Id AND a2.Score >= 0
)


My sense is that HAVING SUM(...) = 0 is probably better, since, uh... there doesn't nee

Solution

Answering your questions in reverse order…

Your HAVING clause is clearer than than an anti-join condition in the WHERE clause. It is also more efficient — tick the "Include execution plan" checkbox to see how much more complicated the anti-join is. You could do better, though: MAX(a.Score)

  • Indent more. The FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses are all subsidiary to the SELECT, and therefore should be at the next indentation level.



  • Finishing the query with a semicolon is a good habit.



  • Move comments to the right to be less obtrusive.



  • I like to put commas before the selected columns. That convention makes it easier to add and remove columns.



  • Prefer comparing with 0 instead of comparing with any other constant.



SELECT q.Id AS [Post Link] -- Alias makes Data Explorer do magic link rendering
     , q.Score AS [Score]
     , COUNT(a.Id) AS [AnswerCount]
     , AVG(CAST(a.Score AS float)) AS [AvgAnswerScore]
    FROM Posts AS q
        INNER JOIN Posts AS a
            ON q.Id = a.ParentId
    WHERE
        q.Score > 0        -- Question has a positive score
    GROUP BY
        q.Id, q.Score
    HAVING
        MAX(a.Score) < 0   -- Best answer has a negative score
    ORDER BY
        [Score] DESC,
        [AnswerCount] DESC;


I'd like to point out: good job remembering to
CAST(a.Score AS float) before taking the average. SQL Server is tricky, in that AVG() of ints is an int`.

Code Snippets

SELECT q.Id AS [Post Link] -- Alias makes Data Explorer do magic link rendering
     , q.Score AS [Score]
     , COUNT(a.Id) AS [AnswerCount]
     , AVG(CAST(a.Score AS float)) AS [AvgAnswerScore]
    FROM Posts AS q
        INNER JOIN Posts AS a
            ON q.Id = a.ParentId
    WHERE
        q.Score > 0        -- Question has a positive score
    GROUP BY
        q.Id, q.Score
    HAVING
        MAX(a.Score) < 0   -- Best answer has a negative score
    ORDER BY
        [Score] DESC,
        [AnswerCount] DESC;

Context

StackExchange Code Review Q#80489, answer score: 2

Revisions (0)

No revisions yet.