patternsqlMajor
Excuse me, how many of you are new here? Mind if I JOIN you?
Viewed 0 times
howyounewareherejoinmindmanyexcuse
Problem
I had perceived an uptick in new reviewers around here, and I wanted to see if the data backed that observation up. It's been a while since I've written any SQL, so this was a nice little exercise in using the Stack Exchange Data Explorer.
The query finds the first question and first answer from every user, and then groups those first posts by week. It turns out that I was right, but the growth of first questions is outpacing the growth of first answers.
The query feels repetitive and I'm not a huge fan of the
Is there a better way to write this?
The query finds the first question and first answer from every user, and then groups those first posts by week. It turns out that I was right, but the growth of first questions is outpacing the growth of first answers.
The query feels repetitive and I'm not a huge fan of the
Full Outer Join.Is there a better way to write this?
WITH FirstAnswers
AS (
SELECT
Users.Id UserId
,dateadd(week, datediff(week, 0, Convert(Date,Min(Posts.CreationDate))), 0) WeekOf
FROM Posts
INNER JOIN Users
ON Posts.OwnerUserId = Users.Id
WHERE PostTypeId = 2 --answer
AND Posts.CreationDate > '2011-Jan-01' --There was very little activity prior to this date. Including it skews the graph.
GROUP BY Users.Id
),
FirstQuestions
AS (
SELECT
Users.Id UserId
,dateadd(week, datediff(week, 0, Convert(Date,Min(Posts.CreationDate))), 0) WeekOf
FROM Posts
INNER JOIN Users
ON Posts.OwnerUserId = Users.Id
WHERE PostTypeId = 1 --question
AND Posts.CreationDate > '2011-Jan-01' --There was very little activity prior to this date. Including it skews the graph.
GROUP BY Users.Id
)
SELECT ISNULL(a.WeekOf,b.WeekOf) As WeekOf
, a.AnswerCount, b.QuestionCount
FROM (
SELECT WeekOf, Count(UserId) AnswerCount
FROM FirstAnswers
GROUP BY WeekOf
) a
FULL OUTER JOIN (
SELECT WeekOf, Count(UserId) QuestionCount
FROM FirstQuestions
GROUP BY WeekOf
)b
ON a.WeekOf = b.WeekOf
ORDER BY WeekOf
Solution
Technically, the full outer join is insufficient. If during a certain week there was not a single user who posted their first question or answer, the plot for that week needs to be 0. It would be misleading to plot a direct line from the previous week to the following week, skipping the quiet week. So, strictly speaking, you need to generate a series of all the weeks, to which you
First rewrite
All of the information you need is contained in the
The subselects in the query suggest that your CTEs are underdeveloped. In my first rewrite below, I've reorganized
Your function names are inconsistently capitalized:
Better solution
As it turns out, there is a much slicker way to avoid the join altogether by using a conditional
LEFT OUTER JOIN the FirstQuestions and FirstAnswers CTEs. However, generating a date series in SQL Server is a hassle compared to some developer-friendlier databases. I'm willing to pretend that there aren't weeks with no new user activity (actually demonstrably false for the week of 2011-01-10).First rewrite
All of the information you need is contained in the
Posts table. There is no need to join the Users table.The subselects in the query suggest that your CTEs are underdeveloped. In my first rewrite below, I've reorganized
FirstAnswers and FirstQuestions to perform the role of your subselects.a and q would make more meaningful table aliases than a and b.Your function names are inconsistently capitalized:
dateadd, datediff, Convert, Min, ISNULL. I believe that Convert(Date, …) isn't needed at all.WITH FirstPosts AS (
SELECT OwnerUserId As UserId
, PostTypeId
, DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
FROM Posts
WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
GROUP BY OwnerUserId, PostTypeId
), FirstAnswers AS (
SELECT WeekOf
, COUNT(UserId) AS UserCount
FROM FirstPosts
WHERE PostTypeId = 2 -- answer
GROUP BY WeekOf
), FirstQuestions AS (
SELECT WeekOf
, COUNT(UserId) AS UserCount
FROM FirstPosts
WHERE PostTypeId = 1 -- question
GROUP BY WeekOf
)
SELECT ISNULL(a.WeekOf, q.WeekOf) AS WeekOf
, ISNULL(a.UserCount, 0) AS AnswerCount
, ISNULL(q.UserCount, 0) AS QuestionCount
FROM FirstAnswers AS a
FULL OUTER JOIN FirstQuestions AS q
ON a.WeekOf = q.WeekOf
ORDER BY 1;Better solution
As it turns out, there is a much slicker way to avoid the join altogether by using a conditional
COUNT():WITH FirstPosts AS (
SELECT OwnerUserId As UserId
, PostTypeId
, DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
FROM Posts
WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
GROUP BY OwnerUserId, PostTypeId
)
SELECT WeekOf
, COUNT(CASE WHEN PostTypeId = 2 THEN UserId END) AS AnswerCount
, COUNT(CASE WHEN PostTypeId = 1 THEN UserId END) AS QuestionCount
FROM FirstPosts
GROUP BY WeekOf
ORDER BY WeekOf;Code Snippets
WITH FirstPosts AS (
SELECT OwnerUserId As UserId
, PostTypeId
, DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
FROM Posts
WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
GROUP BY OwnerUserId, PostTypeId
), FirstAnswers AS (
SELECT WeekOf
, COUNT(UserId) AS UserCount
FROM FirstPosts
WHERE PostTypeId = 2 -- answer
GROUP BY WeekOf
), FirstQuestions AS (
SELECT WeekOf
, COUNT(UserId) AS UserCount
FROM FirstPosts
WHERE PostTypeId = 1 -- question
GROUP BY WeekOf
)
SELECT ISNULL(a.WeekOf, q.WeekOf) AS WeekOf
, ISNULL(a.UserCount, 0) AS AnswerCount
, ISNULL(q.UserCount, 0) AS QuestionCount
FROM FirstAnswers AS a
FULL OUTER JOIN FirstQuestions AS q
ON a.WeekOf = q.WeekOf
ORDER BY 1;WITH FirstPosts AS (
SELECT OwnerUserId As UserId
, PostTypeId
, DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
FROM Posts
WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
GROUP BY OwnerUserId, PostTypeId
)
SELECT WeekOf
, COUNT(CASE WHEN PostTypeId = 2 THEN UserId END) AS AnswerCount
, COUNT(CASE WHEN PostTypeId = 1 THEN UserId END) AS QuestionCount
FROM FirstPosts
GROUP BY WeekOf
ORDER BY WeekOf;Context
StackExchange Code Review Q#97890, answer score: 22
Revisions (0)
No revisions yet.