patternsqlMinor
The opposite of CodeLess Questions query
Viewed 0 times
thequestionsquerycodelessopposite
Problem
What's the opposite of CodeLess Questions?
Perhaps CodeOnly Answers? (Also known as code dumps.)
Inspired by the opposite question, here's the opposite query:
', Body) + LEN('') = LEN(Body)
AND ClosedDate IS NULL
ORDER BY CreationDate
Checking a couple of the results, this successfully finds the code only answers, and pretty fast.
I'm wondering if there's a better, cleaner approach that the hack with the
Perhaps CodeOnly Answers? (Also known as code dumps.)
Inspired by the opposite question, here's the opposite query:
SELECT Id AS [Post Link], Score, CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND Body LIKE '%'
AND CHARINDEX('', Body) + LEN('') = LEN(Body)
AND ClosedDate IS NULL
ORDER BY CreationDate
Checking a couple of the results, this successfully finds the code only answers, and pretty fast.
I'm wondering if there's a better, cleaner approach that the hack with the
CHARINDEX.Solution
If your aim is to find code dumps, then you can do better.
Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.
In particular,
answers that start with "I prefer:", or "Try:", or "Yes:",
and then followed by a code dump, won't be matched by the posted query,
but effectively they are still code dump suspects.
Similarly, answers that start with a code dump suspect and end with "etc...",
"and so on", or "?" are also code dump suspects.
Generally speaking, a code dump suspect looks like this:
What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.
In addition to adding these parameters to widen the search,
it's natural to include the text before and after in the query result,
to make it easier to judge code dump suspects at a glance.
Something like this:
', Body) - LEN('') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('', Body) - LEN('')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('
', Body) + LEN('') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
Disclaimer: the example posts linked in the following paragraphs are code dump suspects. Maybe they are code dumps, maybe not. Judge for yourselves.
In particular,
answers that start with "I prefer:", or "Try:", or "Yes:",
and then followed by a code dump, won't be matched by the posted query,
but effectively they are still code dump suspects.
Similarly, answers that start with a code dump suspect and end with "etc...",
"and so on", or "?" are also code dump suspects.
Generally speaking, a code dump suspect looks like this:
- Contains mostly code
- Too short (or no) introductory text
- Too short (or no) closing text
What is too short? There cannot be rule for that. The longer the text, the more false positives, and the shorter the text, the more code dumps missed. It's probably a good idea to make the text length before and after parameters of the query.
In addition to adding these parameters to widen the search,
it's natural to include the text before and after in the query result,
to make it easier to judge code dump suspects at a glance.
Something like this:
DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
Id AS [Post Link],
Score,
CHARINDEX('', Body) - 1 AS TextLenBeforeCode,
LEFT(Body, CHARINDEX('', Body) - 1) AS TextBeforeCode,
LEN(Body) - CHARINDEX('', Body) - LEN('') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('', Body) - LEN('')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('
', Body) > 0
AND CHARINDEX('', Body) - 1 <= @MaxTextLenBeforeCode
AND CHARINDEX('', Body) + LEN('') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDate
Code Snippets
DECLARE @MaxTextLenBeforeCode AS INT = ##MaxTextLenBeforeCode:int?0##;
DECLARE @MaxTextLenAfterCode AS INT = ##MaxTextLenAfterCode:int?0##;
SELECT
Id AS [Post Link],
Score,
CHARINDEX('<pre><code>', Body) - 1 AS TextLenBeforeCode,
LEFT(Body, CHARINDEX('<pre><code>', Body) - 1) AS TextBeforeCode,
LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>') AS TextLenAfterCode,
RIGHT(Body, LEN(Body) - CHARINDEX('</code></pre>', Body) - LEN('</code></pre>')) AS TextAfterCode,
CreationDate
FROM Posts
WHERE PostTypeId = 2 -- answers
AND CHARINDEX('<pre><code>', Body) > 0
AND CHARINDEX('<pre><code>', Body) - 1 <= @MaxTextLenBeforeCode
AND CHARINDEX('</code></pre>', Body) + LEN('</code></pre>') >= LEN(Body) - @MaxTextLenAfterCode
AND ClosedDate IS NULL
ORDER BY Score DESC, CreationDateContext
StackExchange Code Review Q#93138, answer score: 3
Revisions (0)
No revisions yet.