HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Bayesian ranking

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rankingbayesianstackoverflow

Problem

I have the following table in my database:

╔════════╦═════════════╦═══════════════╗
║  Name  ║ total_stars ║ total_reviews ║
╠════════╬═════════════╬═══════════════╣
║ Item A ║          27 ║             7 ║
║ Item B ║          36 ║             9 ║
║ Item C ║          27 ║             7 ║
║ Item D ║          30 ║             6 ║
║ Item E ║           0 ║             0 ║
║ Item F ║           0 ║             0 ║
║ Item F ║          15 ║             3 ║
╚════════╩═════════════╩═══════════════╝


I was looking at this article and trying to implement Bayesian rankings in PostgreSQL database.

The formula given for the rank is

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / 
(avg_num_votes + this_num_votes)


where:

  • avg_num_votes: The average number of votes of all items that have num_votes>0



  • avg_rating: The average rating of each item (again, of those that have num_votes>0)



  • this_num_votes: number of votes for this item



  • this_rating: the rating of this item



This is the query I came up with, but it is not working:

with avg_num_votes as (
      select AVG(total_reviews) 
           from business 
           where total_reviews != 0), 
       avg_rating as (
          select AVG(total_stars/total_reviews) 
          from business 
          where total_reviews != 0)
select * from business 
  order by ((avg_num_votes * avg_rating) + (total_stars)) / (avg_num_votes + total_reviews);


I am getting:

ERROR:  column "avg_num_votes" does not exist

Solution

Here

with avg_num_votes as (
      select AVG(total_reviews) 
           from business 
           where total_reviews != 0),


avg_num_votes is the name of the table. The column itself gets a nice auto-generated name (in this case, most probably avg), which is clearly not what you want and refer below. You can give a name to the column two ways:

with avg_num_votes as (
      select AVG(total_reviews) AS avg_num_votes
      ...


or

with avg_num_votes(avg_num_votes) as (
      select AVG(total_reviews) 
      ...


Choose whch one you prefer. The two only differ in readability.

But this is only a part of the problem. In the main query you have to refer the CTEs constructed with WITH. In order to get meaningful results, you have to join then on a column present in all tables - so, if you choose Name, then you have to do something like

with avg_num_votes(avg_num_votes) as (
      select AVG(total_reviews), Name
      from ...
      GROUP BY Name
      ...
select b.* 
from business b
    JOIN avg_num_votes v ON b.Name = v.Name
    JOIN avg_rating r ON b.Name = r.Name
...

Code Snippets

with avg_num_votes as (
      select AVG(total_reviews) 
           from business 
           where total_reviews != 0),
with avg_num_votes as (
      select AVG(total_reviews) AS avg_num_votes
      ...
with avg_num_votes(avg_num_votes) as (
      select AVG(total_reviews) 
      ...
with avg_num_votes(avg_num_votes) as (
      select AVG(total_reviews), Name
      from ...
      GROUP BY Name
      ...
select b.* 
from business b
    JOIN avg_num_votes v ON b.Name = v.Name
    JOIN avg_rating r ON b.Name = r.Name
...

Context

StackExchange Database Administrators Q#30795, answer score: 3

Revisions (0)

No revisions yet.