HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

PERCENT_RANK does not distribute over 100

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distribute100doesnotoverpercent_rank

Problem

I'm trying to find out the position of each given record of a query in 0-100 scale. I use PERCENT_RANK ranking function this way:

select Term, Frequency, percent_rank() over (order by Frequency desc) * 100
from Words


But when I look at the results, instead of seeing a column that starts from 0 and goes up to 100, I see a column that starts from 0 and goes up to 37.xxxx.

Though BOL does not explicitly mention that the result is distributed over 0-100 scale, my understanding from the word percent made me use this ranking function.

What do I miss here?

Solution

This will happen if you have ties for the lowest frequency


The formula for figuring out the percentile rank is the following (rk
-1)/(rn -1) where rk equals the rank of the value and rn equals the count of the items.

The below shows an example along with a calculated column showing how the PERCENT_RANK is calculated.

As the highest RANK is 3 when ordered descending and there are 7 rows. The (rk -1)/(rn -1) is (3-1)/(7 -1) = 2/6 = 33.3%

SELECT *, 
        RANK() OVER (order by Frequency  desc) AS  Rank1,
        1e2 * (RANK() OVER (order by Frequency  desc) - 1)/(count(*) over() - 1) AS Calc1,
        100 * PERCENT_RANK() OVER (order by Frequency  desc) AS Percent_Rank1, 

        RANK() OVER (order by Frequency  asc) AS  Rank2,
        1e2 * (RANK() OVER (order by Frequency  ASC) - 1)/(count(*) over() - 1) AS  Calc2,
        100 * PERCENT_RANK() OVER (order by Frequency ) AS  Percent_Rank2
from (VALUES (1),(1),(1),(1),(1),(2),(3))V(Frequency )


+-----------+-------+------------------+------------------+-------+------------------+------------------+
| Frequency | Rank1 |      Calc1       |  Percent_Rank1   | Rank2 |      Calc2       |  Percent_Rank2   |
+-----------+-------+------------------+------------------+-------+------------------+------------------+
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         2 |     2 | 16.6666666666667 | 16.6666666666667 |     6 | 83.3333333333333 | 83.3333333333333 |
|         3 |     1 |                0 |                0 |     7 |              100 |              100 |
+-----------+-------+------------------+------------------+-------+------------------+------------------+

Code Snippets

SELECT *, 
        RANK() OVER (order by Frequency  desc) AS  Rank1,
        1e2 * (RANK() OVER (order by Frequency  desc) - 1)/(count(*) over() - 1) AS Calc1,
        100 * PERCENT_RANK() OVER (order by Frequency  desc) AS Percent_Rank1, 

        RANK() OVER (order by Frequency  asc) AS  Rank2,
        1e2 * (RANK() OVER (order by Frequency  ASC) - 1)/(count(*) over() - 1) AS  Calc2,
        100 * PERCENT_RANK() OVER (order by Frequency ) AS  Percent_Rank2
from (VALUES (1),(1),(1),(1),(1),(2),(3))V(Frequency )
+-----------+-------+------------------+------------------+-------+------------------+------------------+
| Frequency | Rank1 |      Calc1       |  Percent_Rank1   | Rank2 |      Calc2       |  Percent_Rank2   |
+-----------+-------+------------------+------------------+-------+------------------+------------------+
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         1 |     3 | 33.3333333333333 | 33.3333333333333 |     1 |                0 |                0 |
|         2 |     2 | 16.6666666666667 | 16.6666666666667 |     6 | 83.3333333333333 | 83.3333333333333 |
|         3 |     1 |                0 |                0 |     7 |              100 |              100 |
+-----------+-------+------------------+------------------+-------+------------------+------------------+

Context

StackExchange Database Administrators Q#144009, answer score: 9

Revisions (0)

No revisions yet.