patternsqlMinor
Find friends of friends (recursively) efficiently using Postgresql
Viewed 0 times
postgresqlefficientlyrecursivelyusingfindfriends
Problem
Objective: Users submit their Contact Books, and then the application looks for connections between users, according to their Phone Number. Something like "6 Handshakes" idea (https://en.wikipedia.org/wiki/Six_degrees_of_separation).
Problem: Make this query performance close to real time. When the User submits his phone number and gets the full list of other phones, he may know. Plain list - without connections (graph vertices etc), but full, not paginated (this requirement is here because original goal is more complex).
Question: is it possible to achieve close-to-realtime performance with pure relational database, without Graph databases (Neo4j, etc), graph extensions (bitnine agensgraph) or workflow redesign? Any denormalization is possible, but to my understanding, it won't help.
Given:
I expect to receive the following connections for User with Phone === 1:
It is really difficult to estimate real-world connections numbers. But I was testing at least with:
If it is impossible to achieve this in general (using some tricky ORDER BYs or subqueries etc) - what metrics should be considered for example to understand that:
with 1,000,000 connections you need 128GB RAM instance to get 2 seconds response time
an
Problem: Make this query performance close to real time. When the User submits his phone number and gets the full list of other phones, he may know. Plain list - without connections (graph vertices etc), but full, not paginated (this requirement is here because original goal is more complex).
Question: is it possible to achieve close-to-realtime performance with pure relational database, without Graph databases (Neo4j, etc), graph extensions (bitnine agensgraph) or workflow redesign? Any denormalization is possible, but to my understanding, it won't help.
Given:
test=# select * from connections;
user_phone | friend_phone
------------+--------------
1 | 2
1 | 3
1 | 4
2 | 6
2 | 7
2 | 8
8 | 10
8 | 11
8 | 12
20 | 30
40 | 50
60 | 70I expect to receive the following connections for User with Phone === 1:
friend_phone
--------------
2
3
4
6
7
8
10
11
12
(9 rows)It is really difficult to estimate real-world connections numbers. But I was testing at least with:
- 10,000 Users (Phone Numbers)
- Each user was randomly assigned with 50-1000 Connections with pseudo-random other Users
- This resulted in about 1,000,000 Connections
If it is impossible to achieve this in general (using some tricky ORDER BYs or subqueries etc) - what metrics should be considered for example to understand that:
with 1,000,000 connections you need 128GB RAM instance to get 2 seconds response time
an
Solution
Just by looking at the explain plan, we see that the optimizer's estimate are totally off the grid:
And so on.
So I think you'll need to
Maybe, it will only help a little, but we need to get rid of this problem before trying to understand what's happening and how to optimize that query in your context.
- the sort node is estimated at 46170 rows whereas there are actually 637677 rows
- the index scan is estimated at 2270852 rows whereas there are actually 1722262 rows
And so on.
So I think you'll need to
vacuum analyze your tables connections and friends before getting a new plan.Maybe, it will only help a little, but we need to get rid of this problem before trying to understand what's happening and how to optimize that query in your context.
Context
StackExchange Database Administrators Q#271010, answer score: 2
Revisions (0)
No revisions yet.