HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

1st questions with answers within X hours

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
1stwithanswersquestionshourswithin

Problem

I recently challenged @Hosch520 with writing a query as such:


Find first questions with answers posted within 24 hours

And he did great. I figured I would have a go at it too. Turns out I learned valuable things about data subsets, so here I am seeking review.

I did not specify whether to return only the first answer within 24 hours, or all answers within 24 hours, so we went about it a bit differently. The remarks at the top of my query give more details.

In any case, mine returns all answers posted to a new user's first question within 24 hours.

Link to query on SEDE.

```
/*
* Author: @Phrancis
* Title: 1st questions with answers within X hours
* Date: 2015-04-16
* Purpose:
* To query a new user's first question under the condition that
* the question received one or more answers within a certain number of
* hours (default 24), as well as all qualifying answers, and both users'
* name and current reputation. It also calculates the interval between
* question and answers.
*/

DECLARE @HoursElapsed AS INT;
SET @HoursElapsed = ##NumberOfHours:float?24##;
-- NumberOfHours: Hours elapsed between OP 1st question and answers following it "Decimals are allowed"

DECLARE @QuestionType INT;
DECLARE @AnswerType INT;
SET @QuestionType = 1;
SET @AnswerType = 2;

WITH AllFirstQuestions AS (
SELECT
OwnerUserId
, Id
FROM Posts
WHERE CreationDate IN (SELECT MIN(CreationDate) FROM Posts GROUP BY OwnerUserId)
AND PostTypeId = @QuestionType
)

SELECT
qUser.Id AS [User Link]
, qUser.Reputation AS [Q OP Rep]
, firstQ.Id AS [Post Link]
, q.CreationDate AS [Q Date]
, CONVERT( DECIMAL(10,2), DATEDIFF(MINUTE, q.CreationDate, a.CreationDate) /60.0 ) AS [Hours Elapsed]
, aUser.Id AS [User Link]
, aUser.Reputation AS [A OP Rep]
, a.Id AS [Post Link]
, a.CreationDate AS [A Date]

FROM Posts AS q
INNER JOIN AllFirstQuestions AS firstQ
ON q.Id = firstQ.Id
INNER JOIN Posts AS a
ON q.Id = a.ParentId
AND q.PostTypeId = @Quest

Solution

a.CreationDate <= (DATEADD(HOUR, @HoursElapsed, a.CreationDate))


This first condition of your WHERE clause either needs to be removed or needs some explanation.

For any non-negative value of @HoursElapsed, this will always return true. Perhaps, the intent was to use q.CreationDate as the third argument of the DATEADD function?

Otherwise, this condition just says:

x <= x + y


The <= can return true for any non-negative y value. No matter the value of x, x plus some non-negative y will always be some number larger than (or in the case of y=0, equal to) x.

I don't like your indention style. Specifically, in your CTE, I don't like that the AND in your WHERE clause gets the same indention level as the rest of the query. In your main query, I don't like that your INNER JOIN have the same indention level as the rest of the query (and the WHERE appears to have a space in front of it.

Importantly, any SQL query has up to eight primary clauses. WITH, SELECT, INTO, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. Everything else is a part of one of these clauses. These 8 clauses should have zero indentation relative to the query as a whole, and everything else should be indented by at least one level (and I personally prefer four spaces, but I know SEDE default is two spaces).

Our major offender here is the massive list of INNER JOIN subclauses (helper elves?) within the FROM clause.

Keeping in mind the comments I already made about the first part of your WHERE clause, we could simplify the entire WHERE clause into simply just this:

WHERE a.CreationDate BETWEEN q.CreationDate AND DATEADD(HOUR,@HoursElapsed,q.CreationDate)


Our FROM can be massively simplified.

First, there's no reason our CTE can't include the CreationDate. That completely eliminates the need for us to have the Posts table we labeled q. And that can clean up some other nastiness, like q.PostTypeId = @QuestionType (our CTE already narrows that dataset down to just questions).

Bearing in mind the recommended change to the CTE (just add CreationDate to the select list), I recommend a FROM that looks more like this:

FROM AllFirstQuestions q
    INNER JOIN Posts as a ON a.PostTypeId = @AnswerType AND a.ParentId = q.Id 
    INNER JOIN Users qUser ON qUser.Id = q.OwnerUserId
    INNER JOIN Users aUser ON aUser.Id = a.OwnerUserId


I'm guessing that a.PostTypeId = @AnswerType is probably unnecessary, but I don't know enough about SEDE.

I'm going to wager though that the Id column of the Users table is an indexed, auto-incrementing, non-null primary key. We don't have to include a AND alias.OwnerUserId IS NOT NULL on joining it. Any post with a NULL owner user id will be eliminated by virtue of the fact that there are no NULLs in the Id column of the Users table (and even if there were, null wouldn't join to null... I don't believe).

Finally, this is SEDE specific, but it doesn't make sense to allow the user to enter a decimal number for hours elapsed. You're saving it as an INT, so it's immediately converted to an INT and the decimal part is lost before you even do anything with it. Moreover, SQL's DATEADD (one place you use @HoursElapsed) takes an INT, so it doesn't make sense to even think about passing anything else here, and SQL's DATEDIFF (you compare result of this to @HoursElapsed) returns an INT, so it also doesn't make a whole lot of sense to compare to a floating point number here. I think you just need to not allow the user to enter decimal places, since they're 100% ignored no matter what. Allowing the user to enter decimal places gives the false impression that your query is doing something it's not.

Code Snippets

a.CreationDate <= (DATEADD(HOUR, @HoursElapsed, a.CreationDate))
WHERE a.CreationDate BETWEEN q.CreationDate AND DATEADD(HOUR,@HoursElapsed,q.CreationDate)
FROM AllFirstQuestions q
    INNER JOIN Posts as a ON a.PostTypeId = @AnswerType AND a.ParentId = q.Id 
    INNER JOIN Users qUser ON qUser.Id = q.OwnerUserId
    INNER JOIN Users aUser ON aUser.Id = a.OwnerUserId

Context

StackExchange Code Review Q#87278, answer score: 6

Revisions (0)

No revisions yet.