gotchasqlMinor
SEDE Query to Find... Weird Question/Protect Combinations
Viewed 0 times
questiocombinationsweirdqueryfindsedeprotect
Problem
I was interested in finding some of the weirdest question/protection combinations, basically out of curiosity.
I heuristically defined this as:
I'm... let's just say not an expert in SQL. Though this query works, it feels super inefficient (especially the 2x left join, it seems plausible that could be a single join).
Looking basically for any feedback. I used this site to format things, so hopefully it's more readable than the mess it was before I started formatting it.
You can play with it here if your curiosity is also too strong to resist ;)
```
DECLARE @postViews int
SET @postViews = ##postView##
SELECT [Post Link] = ph.PostId,
[User Link] = ph.UserId,
Protected.ProtectCount,
Unprotected.UnprotectedCount
FROM PostHistory ph
JOIN Posts p ON ph.PostID=p.id
LEFT JOIN
(SELECT count(UserID) AS ProtectCount,
PostID
FROM PostHistory ph
WHERE ph.PostHistoryTypeID=19
GROUP BY PostID) AS Protected ON Protected.PostID=ph.PostID
LEFT JOIN
(SELECT count(ph.UserID) AS UnprotectedCount,
PostID
FROM PostHistory ph
WHERE ph.PostHistoryTypeID=20
GROUP BY PostID) AS Unprotected ON Unprotected.PostID=ph.PostID
WHERE ph.UserID IN
(SELECT top ##NumbUsers## ph.UserId
FROM PostHistory ph
JOIN Posts p ON ph.PostID=p.id
WHERE ph.PostHistoryTypeId = 19
AND p.ViewCount < @postViews
/ exclude former moderators /
AND ph.UserID NOT IN (419,
1228,
1288,
19679,
23354,
50049,
59303,
102937,
106224,
246246,
-1)
GROUP BY ph.UserId
ORDER BY COUNT (ph.UserId) DESC)
AND ph.PostHistoryTypeId = 19
AND
I heuristically defined this as:
- Low view counts
- Not a moderator/SE employee
- From users who protect multiple questions
I'm... let's just say not an expert in SQL. Though this query works, it feels super inefficient (especially the 2x left join, it seems plausible that could be a single join).
Looking basically for any feedback. I used this site to format things, so hopefully it's more readable than the mess it was before I started formatting it.
You can play with it here if your curiosity is also too strong to resist ;)
```
DECLARE @postViews int
SET @postViews = ##postView##
SELECT [Post Link] = ph.PostId,
[User Link] = ph.UserId,
Protected.ProtectCount,
Unprotected.UnprotectedCount
FROM PostHistory ph
JOIN Posts p ON ph.PostID=p.id
LEFT JOIN
(SELECT count(UserID) AS ProtectCount,
PostID
FROM PostHistory ph
WHERE ph.PostHistoryTypeID=19
GROUP BY PostID) AS Protected ON Protected.PostID=ph.PostID
LEFT JOIN
(SELECT count(ph.UserID) AS UnprotectedCount,
PostID
FROM PostHistory ph
WHERE ph.PostHistoryTypeID=20
GROUP BY PostID) AS Unprotected ON Unprotected.PostID=ph.PostID
WHERE ph.UserID IN
(SELECT top ##NumbUsers## ph.UserId
FROM PostHistory ph
JOIN Posts p ON ph.PostID=p.id
WHERE ph.PostHistoryTypeId = 19
AND p.ViewCount < @postViews
/ exclude former moderators /
AND ph.UserID NOT IN (419,
1228,
1288,
19679,
23354,
50049,
59303,
102937,
106224,
246246,
-1)
GROUP BY ph.UserId
ORDER BY COUNT (ph.UserId) DESC)
AND ph.PostHistoryTypeId = 19
AND
Solution
One of the first things you should do if you have an arbitrary data set (as you do in your list of
Then you can just do an existence check in your query:
Please get into the habit of using the (optional) keyword
This bracketing style is a bit unusual:
It looks like LISP-style brackets. Most often in SQL either C# or Java style brackets are used (depending on the programming shop).
C#-style
Java-style
What does this number mean?
No easy way to tell. Consider something like:
Then it's easier to follow:
SEDE magic
You can get this valueat the top, or assign a default value for newbies like me:
The other one is a script variable so it can't be manipulated as easily, but can still be assigned a default value (and documented):
One more thing,
The
Everything combined (demo on SEDE)
UserIds) would be to extract it out of the WHERE clause. Just make a @TempTable variable. If the data set gets very large, consider a "physical" #TempTable.DECLARE @UsersToExclude TABLE (UserId INT PRIMARY KEY);
INSERT INTO @UsersToExclude (UserId)
VALUES
/*Some comments to explain these numbers would be good:*/
(419),
(1228),
(1288),
(19679),
(23354),
(50049),
(59303),
(102937),
(106224),
(246246),
(-1);Then you can just do an existence check in your query:
AND ph.UserID NOT IN (
SELECT usrExcl.UserID from @UsersToExclude as usrExcl
)Please get into the habit of using the (optional) keyword
AS to reference to table aliases. If not used it can be ambiguous whether it's supposed to be an alias or a query hint for something else, like NOLOCK.FROM PostHistory AS ph
JOIN Posts AS p ON ph.PostID = p.idThis bracketing style is a bit unusual:
LEFT JOIN
( SELECT foo
FROM bar
WHERE ...) AS Unprotected ON Unprotected.PostID=ph.PostIDIt looks like LISP-style brackets. Most often in SQL either C# or Java style brackets are used (depending on the programming shop).
C#-style
LEFT JOIN
(
SELECT foo
FROM bar
WHERE ...
)
AS Unprotected ON Unprotected.PostID=ph.PostIDJava-style
LEFT JOIN (
SELECT foo
FROM bar
WHERE ...
) AS Unprotected ON Unprotected.PostID=ph.PostIDWhat does this number mean?
WHERE ph.PostHistoryTypeID=19No easy way to tell. Consider something like:
DECLARE @QuestionProtected INT = (SELECT Id FROM PostHistoryTypes WHERE Name = 'Question Protected');Then it's easier to follow:
WHERE ph.PostHistoryTypeID = @QuestionProtectedSEDE magic
You can get this valueat the top, or assign a default value for newbies like me:
-- postViews: Number of post views
DECLARE @postViews INT = ##postViews:int?10000##;The other one is a script variable so it can't be manipulated as easily, but can still be assigned a default value (and documented):
WHERE
ph.UserID IN (
-- NumbUsers: Max number of users
SELECT TOP ##NumbUsers:int?5000##One more thing,
AND isnull(Protected.ProtectCount,0) != isnull(Unprotected.UnprotectedCount,0)The
!= operator is not SQL standard. Instead, use <>. Also, consider using COALESCE() instead of ISNULL() for some of the reasons listed here. Performance is right about the same, but it handles types better, usually (and can take more than 2 arguments, if needed).Everything combined (demo on SEDE)
-- postViews: Number of post views
DECLARE @postViews INT = ##postViews:int?10000##;
DECLARE @UsersToExclude TABLE (UserId INT PRIMARY KEY);
DECLARE @QuestionProtected INT = (SELECT Id FROM PostHistoryTypes WHERE Name = 'Question Protected');
INSERT INTO
@UsersToExclude (UserId)
VALUES
/*Some comments to explain these numbers would be good:*/
(419),
(1228),
(1288),
(19679),
(23354),
(50049),
(59303),
(102937),
(106224),
(246246),
(-1);
SELECT
[Post Link] = ph.PostId,
[User Link] = ph.UserId,
Protected.ProtectCount,
Unprotected.UnprotectedCount,
p.ViewCount
FROM
PostHistory AS ph
INNER JOIN
Posts AS p ON ph.PostID = p.id
LEFT JOIN (
SELECT
count(UserID) AS ProtectCount,
PostID
FROM PostHistory ph
WHERE ph.PostHistoryTypeID = @QuestionProtected
GROUP BY PostID
) AS Protected ON Protected.PostID = ph.PostID
LEFT JOIN (
SELECT
count(ph.UserID) AS UnprotectedCount,
PostID
FROM PostHistory ph
WHERE ph.PostHistoryTypeID=20
GROUP BY PostID
) AS Unprotected ON Unprotected.PostID = ph.PostID
WHERE
ph.UserID IN (
-- NumbUsers: Max number of users
SELECT TOP ##NumbUsers:int?5000##
ph.UserId
FROM PostHistory ph
JOIN Posts p ON ph.PostID=p.id
WHERE ph.PostHistoryTypeId = @QuestionProtected
AND p.ViewCount COALESCE(Unprotected.UnprotectedCount,0)
ORDER BY ph.UserId ASC;Code Snippets
DECLARE @UsersToExclude TABLE (UserId INT PRIMARY KEY);
INSERT INTO @UsersToExclude (UserId)
VALUES
/*Some comments to explain these numbers would be good:*/
(419),
(1228),
(1288),
(19679),
(23354),
(50049),
(59303),
(102937),
(106224),
(246246),
(-1);AND ph.UserID NOT IN (
SELECT usrExcl.UserID from @UsersToExclude as usrExcl
)FROM PostHistory AS ph
JOIN Posts AS p ON ph.PostID = p.idLEFT JOIN
( SELECT foo
FROM bar
WHERE ...) AS Unprotected ON Unprotected.PostID=ph.PostIDLEFT JOIN
(
SELECT foo
FROM bar
WHERE ...
)
AS Unprotected ON Unprotected.PostID=ph.PostIDContext
StackExchange Code Review Q#118203, answer score: 4
Revisions (0)
No revisions yet.