HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Excuse me, how many of you are new here? Mind if I JOIN you?

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
howyounewareherejoinmindmanyexcuse

Problem

I had perceived an uptick in new reviewers around here, and I wanted to see if the data backed that observation up. It's been a while since I've written any SQL, so this was a nice little exercise in using the Stack Exchange Data Explorer.

The query finds the first question and first answer from every user, and then groups those first posts by week. It turns out that I was right, but the growth of first questions is outpacing the growth of first answers.

The query feels repetitive and I'm not a huge fan of the Full Outer Join.
Is there a better way to write this?

WITH FirstAnswers
AS (
SELECT
Users.Id UserId
,dateadd(week, datediff(week, 0, Convert(Date,Min(Posts.CreationDate))), 0) WeekOf
FROM Posts
INNER JOIN Users
ON Posts.OwnerUserId = Users.Id
WHERE PostTypeId = 2 --answer
AND Posts.CreationDate > '2011-Jan-01' --There was very little activity prior to this date. Including it skews the graph.
GROUP BY Users.Id
),

FirstQuestions
AS (
SELECT
Users.Id UserId
,dateadd(week, datediff(week, 0, Convert(Date,Min(Posts.CreationDate))), 0) WeekOf
FROM Posts
INNER JOIN Users
ON Posts.OwnerUserId = Users.Id
WHERE PostTypeId = 1 --question
AND Posts.CreationDate > '2011-Jan-01' --There was very little activity prior to this date. Including it skews the graph.
GROUP BY Users.Id
)

SELECT ISNULL(a.WeekOf,b.WeekOf) As WeekOf
, a.AnswerCount, b.QuestionCount
FROM (
SELECT WeekOf, Count(UserId) AnswerCount
FROM FirstAnswers
GROUP BY WeekOf
) a
FULL OUTER JOIN (
SELECT WeekOf, Count(UserId) QuestionCount
FROM FirstQuestions
GROUP BY WeekOf
)b
ON a.WeekOf = b.WeekOf
ORDER BY WeekOf

Solution

Technically, the full outer join is insufficient. If during a certain week there was not a single user who posted their first question or answer, the plot for that week needs to be 0. It would be misleading to plot a direct line from the previous week to the following week, skipping the quiet week. So, strictly speaking, you need to generate a series of all the weeks, to which you LEFT OUTER JOIN the FirstQuestions and FirstAnswers CTEs. However, generating a date series in SQL Server is a hassle compared to some developer-friendlier databases. I'm willing to pretend that there aren't weeks with no new user activity (actually demonstrably false for the week of 2011-01-10).

First rewrite

All of the information you need is contained in the Posts table. There is no need to join the Users table.

The subselects in the query suggest that your CTEs are underdeveloped. In my first rewrite below, I've reorganized FirstAnswers and FirstQuestions to perform the role of your subselects.

a and q would make more meaningful table aliases than a and b.

Your function names are inconsistently capitalized: dateadd, datediff, Convert, Min, ISNULL. I believe that Convert(Date, …) isn't needed at all.

WITH FirstPosts AS (
  SELECT OwnerUserId As UserId
       , PostTypeId
       , DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
    FROM Posts
    WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
    GROUP BY OwnerUserId, PostTypeId
), FirstAnswers AS (
  SELECT WeekOf
       , COUNT(UserId) AS UserCount
    FROM FirstPosts
    WHERE PostTypeId = 2 -- answer
    GROUP BY WeekOf
), FirstQuestions AS (
  SELECT WeekOf
       , COUNT(UserId) AS UserCount
    FROM FirstPosts
    WHERE PostTypeId = 1 -- question
    GROUP BY WeekOf
)
SELECT ISNULL(a.WeekOf, q.WeekOf) AS WeekOf
     , ISNULL(a.UserCount, 0) AS AnswerCount
     , ISNULL(q.UserCount, 0) AS QuestionCount
  FROM FirstAnswers AS a
    FULL OUTER JOIN FirstQuestions AS q
      ON a.WeekOf = q.WeekOf
  ORDER BY 1;


Better solution

As it turns out, there is a much slicker way to avoid the join altogether by using a conditional COUNT():

WITH FirstPosts AS (
  SELECT OwnerUserId As UserId
       , PostTypeId
       , DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
    FROM Posts
    WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
    GROUP BY OwnerUserId, PostTypeId
)
SELECT WeekOf
     , COUNT(CASE WHEN PostTypeId = 2 THEN UserId END) AS AnswerCount
     , COUNT(CASE WHEN PostTypeId = 1 THEN UserId END) AS QuestionCount
  FROM FirstPosts
  GROUP BY WeekOf
  ORDER BY WeekOf;

Code Snippets

WITH FirstPosts AS (
  SELECT OwnerUserId As UserId
       , PostTypeId
       , DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
    FROM Posts
    WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
    GROUP BY OwnerUserId, PostTypeId
), FirstAnswers AS (
  SELECT WeekOf
       , COUNT(UserId) AS UserCount
    FROM FirstPosts
    WHERE PostTypeId = 2 -- answer
    GROUP BY WeekOf
), FirstQuestions AS (
  SELECT WeekOf
       , COUNT(UserId) AS UserCount
    FROM FirstPosts
    WHERE PostTypeId = 1 -- question
    GROUP BY WeekOf
)
SELECT ISNULL(a.WeekOf, q.WeekOf) AS WeekOf
     , ISNULL(a.UserCount, 0) AS AnswerCount
     , ISNULL(q.UserCount, 0) AS QuestionCount
  FROM FirstAnswers AS a
    FULL OUTER JOIN FirstQuestions AS q
      ON a.WeekOf = q.WeekOf
  ORDER BY 1;
WITH FirstPosts AS (
  SELECT OwnerUserId As UserId
       , PostTypeId
       , DATEADD(week, DATEDIFF(week, 0, MIN(CreationDate)), 0) AS WeekOf
    FROM Posts
    WHERE CreationDate > '2011-01-01' -- There was very little activity prior to this date. Including it skews the graph.
    GROUP BY OwnerUserId, PostTypeId
)
SELECT WeekOf
     , COUNT(CASE WHEN PostTypeId = 2 THEN UserId END) AS AnswerCount
     , COUNT(CASE WHEN PostTypeId = 1 THEN UserId END) AS QuestionCount
  FROM FirstPosts
  GROUP BY WeekOf
  ORDER BY WeekOf;

Context

StackExchange Code Review Q#97890, answer score: 22

Revisions (0)

No revisions yet.