HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Get 10th and 90th percentile by customer

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
percentilegetand90th10thcustomer

Problem

I have a table which contains customers and scores (based on different factors, irrelevant in this case; a customer can have multiple scores), which looks like this:

customer_id | score | score_giver_id
====================================
          1 | 100   | 1
          1 | 102   | 1
          1 | 101   | 1
          1 | 140   | 1
          2 | 131   | 3
          1 | 44    | 1
          3 | 223   | 1
          3 | 1     | 2
          3 | 201   | 1
          3 | 211   | 1
          3 | 231   | 1
          3 | 243   | 1


The score_giver_id is irrelevant, but I'd still like to fetch it.

In the example above, when getting the 50th percentile, grouped by customer_id, the result should be (I picked the 50th percentile in this example, because it illustrates what I want to do better):

customer_id | score | score_giver_id
====================================
          1 | 101   | 1
          2 | 131   | 3
          3 | 223   | 1


I used the method described here.

I need to get the value that is at the 10th percentile, respectively at the 90th percentile in PostgreSQL. I've seen that since 9.4 there is an ntile function, but I don't really understand how it works, what it does, and if it helps me.

I've found a nice snippet for MySQL, which works (even though there are some caveats), but I'd like to use built-in functions if available (for MySQL there are none, hence the snippet).

Solution

It seems you are after the percentile_disc() ordered-set aggregate function.

The documentation says the following about it:

percentile_disc(fraction) WITHIN GROUP (ORDER BY sort_expression)

discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction

The syntax is a bit strange for an aggregate, but using it is easy:

SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY score)
  FROM customer_score
 GROUP BY customer_id;


You define the column from which to take the percentile in the ORDER BY clause.

Code Snippets

SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY score)
  FROM customer_score
 GROUP BY customer_id;

Context

StackExchange Database Administrators Q#106440, answer score: 22

Revisions (0)

No revisions yet.