HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Calculate median of values in T-SQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
valuesmediancalculatesql

Problem

I've got a piece of code (stored procedure) written in T-SQL that inserts into a table several data from a temporal table as shown here:

INSERT INTO [myschema].[CalculatedData](
    Year, 
    Month,
    Line,
    Car,
    Service,
    Route,
    Trip,
    Stop,
    Qty,
    CalculatedMean
  ) SELECT
    Year,
    Month,
    Line,
    Car,
    Service,
    Route,
    Trip,
    Stop,
    COUNT(*),
    AVG(Duration),  
  FROM
    @TableToCalculate
  GROUP BY
    Year,
    Month,
    Line,
    Car,
    Service,
    Route,
    Trip,
    Stop;


As you can see, I calculate the average (mean) of all the values. Easy, considering the built-in function.

Now, what I want to do is to calculate the median. There isn't any function to calculate it, but as the median is equal to calculating the 50th percentile, this can be used. But I don't know how to do it.

Any help, please?

Solution

You can find a survey and performance comparison of the main SQL Server methods in Aaron Bertrand's articles Best approaches for grouped median and What is the fastest way to calculate the median?

With suitable indexing, the method proposed by Peter Larsson often performs best, though the logic is a little trickier to follow. The following example is quoted from Aaron's first article, using a sales data set:

SELECT d.SalesPerson, w.Median
FROM
(
 SELECT SalesPerson, COUNT(*) AS y
 FROM dbo.Sales
 GROUP BY SalesPerson
) AS d
CROSS APPLY
(
 SELECT AVG(0E + Amount)
 FROM
 (
   SELECT z.Amount
    FROM dbo.Sales AS z
    WHERE z.SalesPerson = d.SalesPerson
    ORDER BY z.Amount
    OFFSET (d.y - 1) / 2 ROWS
    FETCH NEXT 2 - d.y % 2 ROWS ONLY
 ) AS f
) AS w(Median);


You might also be interested in reading my article about Calculating the Median with a Dynamic Cursor, which performs surprisingly well. I also have Improving the Row Numbering Median Solution showing how to improve the performance of Peter's solution further.

Related Q & A here:

  • Using Percentile_Disc to compute Median/Interquartiles

Code Snippets

SELECT d.SalesPerson, w.Median
FROM
(
 SELECT SalesPerson, COUNT(*) AS y
 FROM dbo.Sales
 GROUP BY SalesPerson
) AS d
CROSS APPLY
(
 SELECT AVG(0E + Amount)
 FROM
 (
   SELECT z.Amount
    FROM dbo.Sales AS z
    WHERE z.SalesPerson = d.SalesPerson
    ORDER BY z.Amount
    OFFSET (d.y - 1) / 2 ROWS
    FETCH NEXT 2 - d.y % 2 ROWS ONLY
 ) AS f
) AS w(Median);

Context

StackExchange Database Administrators Q#314075, answer score: 6

Revisions (0)

No revisions yet.