patternsqlMinor
Retrieve high views per day questions with few answers
Viewed 0 times
perwithanswersquestionshighviewsretrievefewday
Problem
I've been messing around with the SE Data Explorer and I've come up with a query that basically returns information about questions that have a high number of views per day, that have few or no answers.
It's been a while since I used SQL directly in anger, so any feedback's welcome.
I particularly don't like the way
SELECT
TOP 500
Question.ViewCount,
DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) ) AS Days ,
Question.Id AS [Post Link],
(Question.ViewCount / DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) )) AS ViewsPerDay,
Question.AnswerCount,
(SELECT MAX (Answers.Score )
FROM POSTS "Answers"
WHERE Answers.ParentId = Question.Id
AND Answers.PostTypeId = 2
) "Highest Answer Score",
Question.Tags
FROM
POSTS "Question"
WHERE Question.PostTypeId = 1
AND Question.acceptedanswerid is null
AND DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) ) > 10
AND (Question.ViewCount / DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) )) > 5
ORDER BY
AnswerCount ASC,
"Highest Answer Score" ASC,
ViewsPerDay DESCIt's been a while since I used SQL directly in anger, so any feedback's welcome.
I particularly don't like the way
DATEDIFF(dd, CONVERT(date,Question.CreationDate), CONVERT(date, GETDATE()) ) is repeated in both the SELECT and the WHERE clauses, so I'd like some way of aliasing the check if possible.Solution
Overall
I think your query is pretty nicely written and returns useful information.
Use some variables to refer to repeated values, for example, the
You could also use variables to eliminate some magic numbers, such as the
You might want to add a field to indicate when the question was actually posted. A day count is all fair and well, but knowing the actual post date might be good information as well.
You might want to rename the
Use
You also don't need to convert datetime fields to date fields for calculations to work correctly, it's all factored in for you already, e.g.:
This would make more sense as a normal table join rather than an inline subquery:
If you plan on using quoted identifiers as you did, you would want to declare
With all of the above applied, here is the new query:
I think your query is pretty nicely written and returns useful information.
Use some variables to refer to repeated values, for example, the
GETDATE() reference could just be declared as a variable at the top and reused throughout.DECLARE @now DATETIME2 = GETDATE();You could also use variables to eliminate some magic numbers, such as the
Posts.PostTypeId for questions and answers.You might want to add a field to indicate when the question was actually posted. A day count is all fair and well, but knowing the actual post date might be good information as well.
SELECT
/* ...*/
Question.CreationDate AS DatePostedYou might want to rename the
Days column to something a bit more descriptive, like DaysSinceFirstPosted or just DaysOld.Use
day instead of dd, they work the same but day makes the code easier to read.You also don't need to convert datetime fields to date fields for calculations to work correctly, it's all factored in for you already, e.g.:
DATEDIFF(DAY, Question.CreationDate, @now) AS DaysSinceFirstPostedViewsPerDay would be better named as AverageViewsPerDay based on your calculations.SELECT
/* ... */
(SELECT MAX (Answers.Score)
FROM POSTS "Answers"
WHERE Answers.ParentId = Question.Id
AND Answers.PostTypeId = 2
) "Highest Answer Score",This would make more sense as a normal table join rather than an inline subquery:
SELECT
/* ... */
MAX(Answers.Score) AS [HighestAnswerScore],
/* ... */
FROM
Posts AS [Question]
LEFT JOIN Posts AS [Answers]
ON Answers.ParentId = Question.Id
AND Answers.PostTypeId = @answer
WHERE
/* ... */If you plan on using quoted identifiers as you did, you would want to declare
SET QUOTED_IDENTIFIER ON; at the top of your query. SE Data Explorer must have that set by default, yet it's not the case for some other databases. I personally find it more practical to use default square brackets for identifiers.With all of the above applied, here is the new query:
DECLARE @now DATETIME2 = GETDATE();
DECLARE @question INT = 1;
DECLARE @answer INT = 2;
SELECT TOP 500
Question.CreationDate AS QuestionPosted,
Question.ViewCount AS QuestionViews,
DATEDIFF(DAY, Question.CreationDate, @now) AS DaysSinceFirstPosted,
Question.Id AS [Post Link],
Question.ViewCount / DATEDIFF(DAY, Question.CreationDate, @now) AS AverageViewsPerDay,
Question.AnswerCount,
MAX(Answers.Score) AS HighestAnswerScore,
Question.Tags
FROM
Posts AS [Question]
LEFT JOIN Posts AS [Answers]
ON Answers.ParentId = Question.Id
AND Answers.PostTypeId = @answer
WHERE
Question.PostTypeId = @question
AND Question.AcceptedAnswerId is null
AND DATEDIFF(DAY, Question.CreationDate, @now) > 10
AND (Question.ViewCount / DATEDIFF(DAY, Question.CreationDate, @now)) > 5
GROUP BY
Question.CreationDate,
Question.ViewCount,
DATEDIFF(DAY, Question.CreationDate, @now),
Question.Id,
Question.ViewCount / DATEDIFF(DAY, Question.CreationDate, @now),
Question.AnswerCount,
Answers.Score,
Question.Tags
ORDER BY
AnswerCount ASC,
HighestAnswerScore ASC,
AverageViewsPerDay DESC;Code Snippets
DECLARE @now DATETIME2 = GETDATE();SELECT
/* ...*/
Question.CreationDate AS DatePostedDATEDIFF(DAY, Question.CreationDate, @now) AS DaysSinceFirstPostedSELECT
/* ... */
(SELECT MAX (Answers.Score)
FROM POSTS "Answers"
WHERE Answers.ParentId = Question.Id
AND Answers.PostTypeId = 2
) "Highest Answer Score",SELECT
/* ... */
MAX(Answers.Score) AS [HighestAnswerScore],
/* ... */
FROM
Posts AS [Question]
LEFT JOIN Posts AS [Answers]
ON Answers.ParentId = Question.Id
AND Answers.PostTypeId = @answer
WHERE
/* ... */Context
StackExchange Code Review Q#135034, answer score: 3
Revisions (0)
No revisions yet.