patternsqlMinor
Query for your Answer Progress
Viewed 0 times
answeryourqueryprogressfor
Problem
I was inspired to write a query that produced a graph mapping the perceived quality of your Stack Exchange answers over time.... in other words, "Are your answers getting better, or worse?"
Because answer scores are very 'spiky', it seems natural to calculate a sliding average with a configurable window, to show trends.
The result I have is:
And this is the SEDE query.
I am particularly interested to know whether the rolling average can be computed in a more SQL Server friendly way. There does not appear to be a performance problem, but the CTE and self join seem.... ugly.
Any other observations?
Because answer scores are very 'spiky', it seems natural to calculate a sliding average with a configurable window, to show trends.
The result I have is:
And this is the SEDE query.
with Rolling as (
Select Rank() over (order by CreationDate Asc) as Answer,
CreationDate,
Convert(float, Score) as Score
from Posts
where PostTypeId = 2
and OwnerUserId = ##UserId:int##
)
select Master.CreationDate, Master.Score as [Score], Avg(Slave.Score) as [RollingAvg]
from Rolling as Master,
Rolling as Slave
where Slave.Answer > Master.Answer - ##RollingAvg:int?10##
and Slave.Answer <= Master.Answer
group by Master.CreationDate, Master.Score
order by Master.CreationDateI am particularly interested to know whether the rolling average can be computed in a more SQL Server friendly way. There does not appear to be a performance problem, but the CTE and self join seem.... ugly.
Any other observations?
Solution
There is a simple way to calculate rolling averages in SQL, with a caveat.
The problem with that is that a window that includes up to n preceding rows is one row larger than the window in your original query. Unfortunately, SQL Server requires the window frame bound to be specified using an unsigned integer literal — it cannot be an expression.
That turns out to be the most annoying off-by-one error to fix:
It's arguable that your original formulation, calculating the window average the "hard" way, is better than constructing the SQL dynamically to be
Of course, if you are willing to redefine the meaning of the window size parameter, then the window aggregate function would be the way to go.
SELECT CreationDate
, Score
, avg(CAST(Score AS FLOAT)) OVER (
ORDER BY CreationDate
ROWS BETWEEN ##RollingAvg:INT?10## PRECEDING AND CURRENT ROW
) AS RollingAvg
FROM Posts
WHERE PostTypeId = 2 AND OwnerUserId = ##UserId:INT##;The problem with that is that a window that includes up to n preceding rows is one row larger than the window in your original query. Unfortunately, SQL Server requires the window frame bound to be specified using an unsigned integer literal — it cannot be an expression.
That turns out to be the most annoying off-by-one error to fix:
DECLARE @UserId AS INT = ##UserId##;
DECLARE @RollingAvg AS INT = ##RollingAvg:INT?10##;
DECLARE @Sql AS NVARCHAR(1000) = 'SELECT CreationDate
, Score
, avg(CAST(Score AS FLOAT)) OVER (
ORDER BY CreationDate
ROWS BETWEEN ' + CAST(@RollingAvg - 1 AS VARCHAR) + ' PRECEDING AND CURRENT ROW
) AS RollingAvg
FROM Posts
WHERE PostTypeId = 2 AND OwnerUserId = ' + CAST(@UserId AS VARCHAR);
EXEC sp_executesql @Sql;It's arguable that your original formulation, calculating the window average the "hard" way, is better than constructing the SQL dynamically to be
EXECuted.Of course, if you are willing to redefine the meaning of the window size parameter, then the window aggregate function would be the way to go.
Code Snippets
SELECT CreationDate
, Score
, avg(CAST(Score AS FLOAT)) OVER (
ORDER BY CreationDate
ROWS BETWEEN ##RollingAvg:INT?10## PRECEDING AND CURRENT ROW
) AS RollingAvg
FROM Posts
WHERE PostTypeId = 2 AND OwnerUserId = ##UserId:INT##;DECLARE @UserId AS INT = ##UserId##;
DECLARE @RollingAvg AS INT = ##RollingAvg:INT?10##;
DECLARE @Sql AS NVARCHAR(1000) = 'SELECT CreationDate
, Score
, avg(CAST(Score AS FLOAT)) OVER (
ORDER BY CreationDate
ROWS BETWEEN ' + CAST(@RollingAvg - 1 AS VARCHAR) + ' PRECEDING AND CURRENT ROW
) AS RollingAvg
FROM Posts
WHERE PostTypeId = 2 AND OwnerUserId = ' + CAST(@UserId AS VARCHAR);
EXEC sp_executesql @Sql;Context
StackExchange Code Review Q#61979, answer score: 5
Revisions (0)
No revisions yet.