HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Top users under age 30, sorted by age, with rank

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
agetoprankwithundersortedusers

Problem

In the pursuit of SQL enlightenment, I wrote a quick SEDE query to display Rep, Age and Overall Rank.

Sorted by Age, then Rank (Which is generated by Rep).

SELECT TOP 150 
ROW_NUMBER() OVER (ORDER BY Users.Reputation DESC) AS Rank,
  Users.Id as [User Link], 
  Users.Age,
  Users.Reputation as Rep
FROM Users
WHERE Users.Age IS NOT NULL AND Users.Reputation > 150 AND Users.Age <= 30
ORDER BY Users.Age ASC, Rank ASC;


I attempted formatting, but it feels wrong, and my (possibly) overly long WHERE statement seems incorrectly formatted as well.

What is a more optimal way to write this, if one does exist?

I've had parameters mentioned as an idea on improvement (age range and selection limitation).

Solution

A few things, some nitpicks, some UX, some probably-bugs. What Lyle's Mug has already stated is also part of my answer, but first things first:

UX

This query is quite unwieldy to use, because all the things you might want to play with are hardcoded.

SEDE allows using parameters, with a rather simple syntax:

##name:type[?if optional, default value]##


replacing all your magic numbers gets us to:

SELECT TOP ##x:int?150##
 -- ...
WHERE User.Age IS NOT NULL AND Users.Reputation > ##minrep:int?150## AND User.Age <= ##age:int?30##


Nitpicks:

Note that this already stretches out that single-line WHERE quite much. I really like queries to state WHERE conditions on separate lines, because that makes it easier to process them.

Additionally I strongly recommend a lightly different comma-placement when listing columns, namely the comma before the declaration.

This has the advantage of allowing removals without running into syntax errors each time.

Probably-Bugs:

-
The ordering you Limit By is ... skewed? strange? ... well you select the youngest users and limit according to age, when the purpose of such a query is most probably reputation based stats.

-
Off-By-One: WHERE Rep > 150 excludes users with 150 rep!

Then again it seems that the where clause about rep is moot anyways...

If you remove it, there's a way to cut away large parts of "inactive" userbase depending on reputation. Usually when querying that table, I explicitly exclude users with 1 and 101 rep. These two values are a strong indicator for non-activity.

My final version:

SELECT TOP ##x:int?150##
  ROW_NUMBER() OVER (ORDER BY Users.Reputation DESC) AS Rank
  , Users.Id as [User Link]
  , Users.Age
  , Users.Reputation as Rep
FROM Users
WHERE Users.Age <= ##age:int?30##
ORDER BY Rank ASC;


(also available on SEDE)

Code Snippets

##name:type[?if optional, default value]##
SELECT TOP ##x:int?150##
 -- ...
WHERE User.Age IS NOT NULL AND Users.Reputation > ##minrep:int?150## AND User.Age <= ##age:int?30##
SELECT TOP ##x:int?150##
  ROW_NUMBER() OVER (ORDER BY Users.Reputation DESC) AS Rank
  , Users.Id as [User Link]
  , Users.Age
  , Users.Reputation as Rep
FROM Users
WHERE Users.Age <= ##age:int?30##
ORDER BY Rank ASC;

Context

StackExchange Code Review Q#93317, answer score: 16

Revisions (0)

No revisions yet.