patternsqlMinor
Finding which users have the most questions with unaccepted answers
Viewed 0 times
themostwithanswersquestionsunacceptedfindingwhichusershave
Problem
I was looking at a few profiles this afternoon and noticed a few users had plenty of open questions. This led me to wonder which users actually had the highest amount of questions with unaccepted answers (both in size and percent). So with all the hoopla I kept seeing in the 2nd monitor about SEDE queries, I decided to try my hand at writing one.
with unanswered as (
select
posts.Id as PostId
from
posts
where posts.PostTypeId = 1 -- questions
and posts.OwnerUserId is not null -- user exists
and posts.AcceptedAnswerId is null -- no answer selected
and posts.ClosedDate is null -- still open
group by posts.Id
),
percentages as (
select
users.Id as UserId,
count(posts.Id) as Questions,
count(unanswered.PostId) as UnansweredQuestions,
(count(unanswered.PostId) * 100.0 / count(posts.Id)) as UnansweredPct
from
users left outer join
posts on users.Id = posts.OwnerUserId
left outer join
unanswered on posts.Id = unanswered.PostId
where
posts.PostTypeId = 1
group by
users.Id
)
select top ##MaxRowsToSelect:int?100##
UserId as [User Link],
UnansweredQuestions as [Unanswered Questions],
Questions,
round(UnansweredPct, 1) as [Unanswered %]
from percentages
where UnansweredPct > ##MinUnansweredPct:int?40##
and UnansweredQuestions > ##MinUnansweredQuestions:int?10##
order by UnansweredPct desc;Solution
This is essentially a good, clear and effective query. The only feedback I can offer is minor and somewhat subjective.
-
In the first CTE,
-
I'm not intimately familiar with SEDE but I think you can
-
I'm not a huge fan of your layout and indentation style. Wholly subjective (and I don't know if I'm in the minority or you are) but I usually start each new query clause (
-
I recommend aliasing all tables/rowsets for clarity, e.g.
-
Your final ordering, by
-
I am not a fan of using
-
In the first CTE,
unanswered, I personally would use distinct rather than grouping by posts.id. I group by when I am using an aggregate function; since there's no aggregate, distinct expresses the intention clearer to me. -
I'm not intimately familiar with SEDE but I think you can
inner join between users and posts in the percentages CTE. This is likely to be more performant, and again it expresses the intention clearer. The left outer join to unanswered still makes sense because you want to include the count of answered as well as unanswered posts. (Personally, I write left join rather than left outer join - less typing, totally equivalent).-
I'm not a huge fan of your layout and indentation style. Wholly subjective (and I don't know if I'm in the minority or you are) but I usually start each new query clause (
select, from, joins, where, etc) on a new line, and leave a blank line between each clause.-
I recommend aliasing all tables/rowsets for clarity, e.g.
...from users as U..., and referring to the alias e.g. select U.Id.... So long as you choose sensible aliases (never just A, B, C etc) this is clear and more compact. -
Your final ordering, by
UnansweredPct, doesn't guarantee the same order between re-runs (as two users could and do have the same value in that column). I'd order by more columns to ensure consistent ordering - it isn't the end of the world in this particular query, but in some cases consistency can be really important, so it's a good habit to have.-
I am not a fan of using
top in any "long term" SQL queries (in the context of my day job, any query which will be used in production rather than just ad-hoc code). Whether you consider your query to be the former or the latter is up to you ;-) My "production" approach would be to add a Row_Number() function, ordering by UnansweredPct (and other columns, see above), then restrict `where RowNoContext
StackExchange Code Review Q#96563, answer score: 3
Revisions (0)
No revisions yet.