HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Retrieve high views per day questions with few answers

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
perwithanswersquestionshighviewsretrievefewday

Problem

I've been messing around with the SE Data Explorer and I've come up with a query that basically returns information about questions that have a high number of views per day, that have few or no answers.

SELECT 
      TOP 500
      Question.ViewCount,
      DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) ) AS Days ,    
      Question.Id AS [Post Link],
      (Question.ViewCount / DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) )) AS ViewsPerDay,
      Question.AnswerCount,
      (SELECT MAX (Answers.Score )
       FROM POSTS "Answers"
       WHERE Answers.ParentId = Question.Id
       AND   Answers.PostTypeId = 2
      ) "Highest Answer Score",
      Question.Tags
 FROM
      POSTS "Question"
WHERE Question.PostTypeId = 1 
AND   Question.acceptedanswerid is null
AND   DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) ) > 10
AND   (Question.ViewCount / DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) )) > 5
ORDER BY
    AnswerCount ASC,
    "Highest Answer Score" ASC,
    ViewsPerDay DESC


It's been a while since I used SQL directly in anger, so any feedback's welcome.

I particularly don't like the way DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) ) is repeated in both the SELECT and the WHERE clauses, so I'd like some way of aliasing the check if possible.

Solution

Overall

I think your query is pretty nicely written and returns useful information.

Use some variables to refer to repeated values, for example, the GETDATE() reference could just be declared as a variable at the top and reused throughout.

DECLARE @now DATETIME2 = GETDATE();


You could also use variables to eliminate some magic numbers, such as the Posts.PostTypeId for questions and answers.

You might want to add a field to indicate when the question was actually posted. A day count is all fair and well, but knowing the actual post date might be good information as well.

SELECT 
  /* ...*/ 
  Question.CreationDate AS DatePosted


You might want to rename the Days column to something a bit more descriptive, like DaysSinceFirstPosted or just DaysOld.

Use day instead of dd, they work the same but day makes the code easier to read.

You also don't need to convert datetime fields to date fields for calculations to work correctly, it's all factored in for you already, e.g.:

DATEDIFF(DAY, Question.CreationDate, @now) AS DaysSinceFirstPosted


ViewsPerDay would be better named as AverageViewsPerDay based on your calculations.

SELECT 
    /* ... */
      (SELECT MAX (Answers.Score)
       FROM POSTS "Answers"
       WHERE Answers.ParentId = Question.Id
       AND   Answers.PostTypeId = 2
      ) "Highest Answer Score",


This would make more sense as a normal table join rather than an inline subquery:

SELECT
    /* ... */
    MAX(Answers.Score) AS [HighestAnswerScore],
    /* ... */
FROM
    Posts AS [Question]
    LEFT JOIN Posts AS [Answers]
        ON Answers.ParentId = Question.Id
        AND Answers.PostTypeId = @answer
WHERE 
    /* ... */


If you plan on using quoted identifiers as you did, you would want to declare SET QUOTED_IDENTIFIER ON; at the top of your query. SE Data Explorer must have that set by default, yet it's not the case for some other databases. I personally find it more practical to use default square brackets for identifiers.

With all of the above applied, here is the new query:

DECLARE @now DATETIME2 = GETDATE();
DECLARE @question INT = 1;
DECLARE @answer INT = 2;
SELECT TOP 500
    Question.CreationDate AS QuestionPosted,
    Question.ViewCount AS QuestionViews,
    DATEDIFF(DAY, Question.CreationDate, @now) AS DaysSinceFirstPosted,    
    Question.Id AS [Post Link],
    Question.ViewCount / DATEDIFF(DAY, Question.CreationDate, @now) AS AverageViewsPerDay,
    Question.AnswerCount,
    MAX(Answers.Score) AS HighestAnswerScore,
    Question.Tags
FROM
    Posts AS [Question]
    LEFT JOIN Posts AS [Answers]
        ON Answers.ParentId = Question.Id
        AND Answers.PostTypeId = @answer
WHERE 
    Question.PostTypeId = @question
    AND Question.AcceptedAnswerId is null
    AND DATEDIFF(DAY, Question.CreationDate, @now) > 10
    AND (Question.ViewCount / DATEDIFF(DAY, Question.CreationDate, @now)) > 5
GROUP BY
    Question.CreationDate,
    Question.ViewCount,
    DATEDIFF(DAY, Question.CreationDate, @now),
    Question.Id,
    Question.ViewCount / DATEDIFF(DAY, Question.CreationDate, @now),
    Question.AnswerCount,
    Answers.Score,
    Question.Tags
ORDER BY
    AnswerCount ASC,
    HighestAnswerScore ASC,
    AverageViewsPerDay DESC;

Code Snippets

DECLARE @now DATETIME2 = GETDATE();
SELECT 
  /* ...*/ 
  Question.CreationDate AS DatePosted
DATEDIFF(DAY, Question.CreationDate, @now) AS DaysSinceFirstPosted
SELECT 
    /* ... */
      (SELECT MAX (Answers.Score)
       FROM POSTS "Answers"
       WHERE Answers.ParentId = Question.Id
       AND   Answers.PostTypeId = 2
      ) "Highest Answer Score",
SELECT
    /* ... */
    MAX(Answers.Score) AS [HighestAnswerScore],
    /* ... */
FROM
    Posts AS [Question]
    LEFT JOIN Posts AS [Answers]
        ON Answers.ParentId = Question.Id
        AND Answers.PostTypeId = @answer
WHERE 
    /* ... */

Context

StackExchange Code Review Q#135034, answer score: 3

Revisions (0)

No revisions yet.