HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query for your Answer Progress

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
answeryourqueryprogressfor

Problem

I was inspired to write a query that produced a graph mapping the perceived quality of your Stack Exchange answers over time.... in other words, "Are your answers getting better, or worse?"

Because answer scores are very 'spiky', it seems natural to calculate a sliding average with a configurable window, to show trends.

The result I have is:

And this is the SEDE query.

with Rolling as (
    Select Rank() over (order by CreationDate Asc) as Answer,
           CreationDate,
           Convert(float, Score) as Score
    from Posts
    where PostTypeId = 2
      and OwnerUserId = ##UserId:int##
)
select Master.CreationDate, Master.Score as [Score], Avg(Slave.Score) as [RollingAvg]
from Rolling as Master,
     Rolling as Slave
where Slave.Answer  > Master.Answer - ##RollingAvg:int?10##
  and Slave.Answer <= Master.Answer
group by Master.CreationDate, Master.Score
order by Master.CreationDate


I am particularly interested to know whether the rolling average can be computed in a more SQL Server friendly way. There does not appear to be a performance problem, but the CTE and self join seem.... ugly.

Any other observations?

Solution

There is a simple way to calculate rolling averages in SQL, with a caveat.

SELECT CreationDate
     , Score
     , avg(CAST(Score AS FLOAT)) OVER (
         ORDER BY CreationDate
         ROWS BETWEEN ##RollingAvg:INT?10## PRECEDING AND CURRENT ROW
       ) AS RollingAvg
    FROM Posts
    WHERE PostTypeId = 2 AND OwnerUserId = ##UserId:INT##;


The problem with that is that a window that includes up to n preceding rows is one row larger than the window in your original query. Unfortunately, SQL Server requires the window frame bound to be specified using an unsigned integer literal — it cannot be an expression.

That turns out to be the most annoying off-by-one error to fix:

DECLARE @UserId AS INT = ##UserId##;
DECLARE @RollingAvg AS INT = ##RollingAvg:INT?10##;

DECLARE @Sql AS NVARCHAR(1000) = 'SELECT CreationDate
     , Score
     , avg(CAST(Score AS FLOAT)) OVER (
         ORDER BY CreationDate
         ROWS BETWEEN ' + CAST(@RollingAvg - 1 AS VARCHAR) + ' PRECEDING AND CURRENT ROW
       ) AS RollingAvg
    FROM Posts
    WHERE PostTypeId = 2 AND OwnerUserId = ' + CAST(@UserId AS VARCHAR);

EXEC sp_executesql @Sql;


It's arguable that your original formulation, calculating the window average the "hard" way, is better than constructing the SQL dynamically to be EXECuted.

Of course, if you are willing to redefine the meaning of the window size parameter, then the window aggregate function would be the way to go.

Code Snippets

SELECT CreationDate
     , Score
     , avg(CAST(Score AS FLOAT)) OVER (
         ORDER BY CreationDate
         ROWS BETWEEN ##RollingAvg:INT?10## PRECEDING AND CURRENT ROW
       ) AS RollingAvg
    FROM Posts
    WHERE PostTypeId = 2 AND OwnerUserId = ##UserId:INT##;
DECLARE @UserId AS INT = ##UserId##;
DECLARE @RollingAvg AS INT = ##RollingAvg:INT?10##;

DECLARE @Sql AS NVARCHAR(1000) = 'SELECT CreationDate
     , Score
     , avg(CAST(Score AS FLOAT)) OVER (
         ORDER BY CreationDate
         ROWS BETWEEN ' + CAST(@RollingAvg - 1 AS VARCHAR) + ' PRECEDING AND CURRENT ROW
       ) AS RollingAvg
    FROM Posts
    WHERE PostTypeId = 2 AND OwnerUserId = ' + CAST(@UserId AS VARCHAR);

EXEC sp_executesql @Sql;

Context

StackExchange Code Review Q#61979, answer score: 5

Revisions (0)

No revisions yet.