patternsqlMinor
Calculate median of values in T-SQL
Viewed 0 times
valuesmediancalculatesql
Problem
I've got a piece of code (stored procedure) written in T-SQL that inserts into a table several data from a temporal table as shown here:
As you can see, I calculate the average (mean) of all the values. Easy, considering the built-in function.
Now, what I want to do is to calculate the median. There isn't any function to calculate it, but as the median is equal to calculating the 50th percentile, this can be used. But I don't know how to do it.
Any help, please?
INSERT INTO [myschema].[CalculatedData](
Year,
Month,
Line,
Car,
Service,
Route,
Trip,
Stop,
Qty,
CalculatedMean
) SELECT
Year,
Month,
Line,
Car,
Service,
Route,
Trip,
Stop,
COUNT(*),
AVG(Duration),
FROM
@TableToCalculate
GROUP BY
Year,
Month,
Line,
Car,
Service,
Route,
Trip,
Stop;As you can see, I calculate the average (mean) of all the values. Easy, considering the built-in function.
Now, what I want to do is to calculate the median. There isn't any function to calculate it, but as the median is equal to calculating the 50th percentile, this can be used. But I don't know how to do it.
Any help, please?
Solution
You can find a survey and performance comparison of the main SQL Server methods in Aaron Bertrand's articles Best approaches for grouped median and What is the fastest way to calculate the median?
With suitable indexing, the method proposed by Peter Larsson often performs best, though the logic is a little trickier to follow. The following example is quoted from Aaron's first article, using a sales data set:
You might also be interested in reading my article about Calculating the Median with a Dynamic Cursor, which performs surprisingly well. I also have Improving the Row Numbering Median Solution showing how to improve the performance of Peter's solution further.
Related Q & A here:
With suitable indexing, the method proposed by Peter Larsson often performs best, though the logic is a little trickier to follow. The following example is quoted from Aaron's first article, using a sales data set:
SELECT d.SalesPerson, w.Median
FROM
(
SELECT SalesPerson, COUNT(*) AS y
FROM dbo.Sales
GROUP BY SalesPerson
) AS d
CROSS APPLY
(
SELECT AVG(0E + Amount)
FROM
(
SELECT z.Amount
FROM dbo.Sales AS z
WHERE z.SalesPerson = d.SalesPerson
ORDER BY z.Amount
OFFSET (d.y - 1) / 2 ROWS
FETCH NEXT 2 - d.y % 2 ROWS ONLY
) AS f
) AS w(Median);You might also be interested in reading my article about Calculating the Median with a Dynamic Cursor, which performs surprisingly well. I also have Improving the Row Numbering Median Solution showing how to improve the performance of Peter's solution further.
Related Q & A here:
- Using Percentile_Disc to compute Median/Interquartiles
Code Snippets
SELECT d.SalesPerson, w.Median
FROM
(
SELECT SalesPerson, COUNT(*) AS y
FROM dbo.Sales
GROUP BY SalesPerson
) AS d
CROSS APPLY
(
SELECT AVG(0E + Amount)
FROM
(
SELECT z.Amount
FROM dbo.Sales AS z
WHERE z.SalesPerson = d.SalesPerson
ORDER BY z.Amount
OFFSET (d.y - 1) / 2 ROWS
FETCH NEXT 2 - d.y % 2 ROWS ONLY
) AS f
) AS w(Median);Context
StackExchange Database Administrators Q#314075, answer score: 6
Revisions (0)
No revisions yet.