patternsqlMinor
Select row at the nth percentile
Viewed 0 times
nththepercentileselectrow
Problem
I have two tables,
I have a simple query that performs a join on the two tables and returns a set of rows:
I need to find the nth percentile row from the returned set of rows, lets say I need to find 90%, 99% and 99.9% percentile row from the data.
I need to display the data in a form like this:
This is my first encounter to PostgreSQL. I am confused as to how should I proceed.
I was having a look at
table1 and table2. Let the two tables contain date, id and latency column.I have a simple query that performs a join on the two tables and returns a set of rows:
Select table1.date,(table2.latency - table1.latency) as ans from table1, table2
where table1.id = table2.id order by ans;I need to find the nth percentile row from the returned set of rows, lets say I need to find 90%, 99% and 99.9% percentile row from the data.
I need to display the data in a form like this:
date | percentile | ans
01-12-1995 | 90 | 0.001563
02-12-1999 | 99 | 0.0015
05-12-2000 | 99.9 | 0.012This is my first encounter to PostgreSQL. I am confused as to how should I proceed.
I was having a look at
PERCENT_RANK() function. Please guide me in the correct direction.Solution
Use the window function
Then select the segments you are interested in (corresponding to percentiles) and pick the row with the lowest value from it:
The Postgres-specific
Select first row in each GROUP BY group?
To get the
This algorithm picks the row at or above the exact value. You can add a line to the subquery with
Aside: I replaced the column name
ntile() in a subquery (requires Postgres 8.4 or later).Then select the segments you are interested in (corresponding to percentiles) and pick the row with the lowest value from it:
SELECT DISTINCT ON (segment)
the_date, to_char((segment - 1)/ 10.0, '99.9') AS percentile, ans
FROM (
SELECT t1.the_date
,ntile(1000) OVER (ORDER BY (t2.latency - t1.latency)) AS segment
,(t2.latency - t1.latency) AS ans
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
) sub
WHERE segment IN (601, 901, 991, 1000)
ORDER BY segment, ans;The Postgres-specific
DISTINCT ON comes in handy for the last step. Detailed explanation in this related answer on SO:Select first row in each GROUP BY group?
To get the
90, 99 and 99.9 percentile I picked the matching granularity with ntile(1000). And added a 60 percentile as per comment.This algorithm picks the row at or above the exact value. You can add a line to the subquery with
percent_rank() to get the exact relative rank of the select row in addition:percent_rank() OVER (ORDER BY (t2.latency - t1.latency)) AS pct_rankAside: I replaced the column name
date with the_date since I am in the habbit of avoiding reserved SQL key words as identifiers, even if Postgres would permit them.Code Snippets
SELECT DISTINCT ON (segment)
the_date, to_char((segment - 1)/ 10.0, '99.9') AS percentile, ans
FROM (
SELECT t1.the_date
,ntile(1000) OVER (ORDER BY (t2.latency - t1.latency)) AS segment
,(t2.latency - t1.latency) AS ans
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
) sub
WHERE segment IN (601, 901, 991, 1000)
ORDER BY segment, ans;percent_rank() OVER (ORDER BY (t2.latency - t1.latency)) AS pct_rankContext
StackExchange Database Administrators Q#55553, answer score: 5
Revisions (0)
No revisions yet.