HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Calculating Standard Deviation using SQL

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
deviationsqlstandardcalculatingusing

Problem

I have been reading a bit about statistics to improve the way I understand data. This is my attempt at calculating Standard Variance using data tables. I solved this simple question:

This is the code I used (I used SEDE to execute it):

CREATE TABLE #Numbers 
( 
  Number DECIMAL(10,4),
  Variance DECIMAL(10,4)
);
GO
INSERT INTO #Numbers (Number) VALUES
    (23),
    (37),
    (45),
    (49),
    (56),
    (63),
    (63),
    (70),
    (72),
    (82);
GO

DECLARE @Mean DECIMAL(10,4);
SET @Mean = (SELECT SUM (Number) / COUNT(Number) FROM #Numbers);
UPDATE #Numbers 
SET Variance = POWER( (@Mean - Number), 2);
DECLARE @StdVariance DECIMAL(10,4);
SET @StdVariance = (SELECT SQRT( SUM(Variance) / COUNT(Number) ) FROM #Numbers);
SELECT 
  @Mean AS [Mean],
  @StdVariance AS [Standard Variance];


Output:

(10 row(s) affected)

Mean    Standard Variance 
------- ----------------- 
56.0000 16.8701  

(10 row(s) affected)
(1 row(s) affected)


I'm less familiar with SQL Server than other RDBMS, so I want to make sure my use of temp tables and variables is done the "Transact-SQL Way". Is there a cleaner, smarter way to do this?

Solution

Assuming you want to reinvent the wheel…

You really shouldn't be using a data table as scratch space for calculations, even if it is a temporary table. You should also not need to rely on variables either — that is not idiomatic SQL. A Common Table Expression would solve both problems.

CREATE TABLE #Numbers
( 
  Number DECIMAL(10,4)
);

INSERT …;

WITH Mean AS (
    SELECT SUM(Number) / COUNT(Number) AS Mean
        FROM #Numbers
), Deviation AS (
    SELECT Mean, POWER(Number - Mean, 2) AS Error
        FROM #Numbers CROSS JOIN Mean
)
SELECT Mean, SQRT(SUM(Error) / COUNT(Error)) AS [Standard Deviation]
    FROM Deviation
    GROUP BY Mean;

Code Snippets

CREATE TABLE #Numbers
( 
  Number DECIMAL(10,4)
);

INSERT …;

WITH Mean AS (
    SELECT SUM(Number) / COUNT(Number) AS Mean
        FROM #Numbers
), Deviation AS (
    SELECT Mean, POWER(Number - Mean, 2) AS Error
        FROM #Numbers CROSS JOIN Mean
)
SELECT Mean, SQRT(SUM(Error) / COUNT(Error)) AS [Standard Deviation]
    FROM Deviation
    GROUP BY Mean;

Context

StackExchange Code Review Q#68788, answer score: 3

Revisions (0)

No revisions yet.