patternsqlMinor

Find friends of friends (recursively) efficiently using Postgresql

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

postgresqlefficientlyrecursivelyusingfindfriends

Problem

Objective: Users submit their Contact Books, and then the application looks for connections between users, according to their Phone Number. Something like "6 Handshakes" idea (https://en.wikipedia.org/wiki/Six_degrees_of_separation).

Problem: Make this query performance close to real time. When the User submits his phone number and gets the full list of other phones, he may know. Plain list - without connections (graph vertices etc), but full, not paginated (this requirement is here because original goal is more complex).

Question: is it possible to achieve close-to-realtime performance with pure relational database, without Graph databases (Neo4j, etc), graph extensions (bitnine agensgraph) or workflow redesign? Any denormalization is possible, but to my understanding, it won't help.

Given:

test=# select * from connections;
 user_phone | friend_phone
------------+--------------
          1 |            2
          1 |            3
          1 |            4
          2 |            6
          2 |            7
          2 |            8
          8 |           10
          8 |           11
          8 |           12
         20 |           30
         40 |           50
         60 |           70

I expect to receive the following connections for User with Phone === 1:

friend_phone
--------------
            2
            3
            4
            6
            7
            8
           10
           11
           12
(9 rows)

It is really difficult to estimate real-world connections numbers. But I was testing at least with:

10,000 Users (Phone Numbers)

Each user was randomly assigned with 50-1000 Connections with pseudo-random other Users

This resulted in about 1,000,000 Connections

If it is impossible to achieve this in general (using some tricky ORDER BYs or subqueries etc) - what metrics should be considered for example to understand that:

with 1,000,000 connections you need 128GB RAM instance to get 2 seconds response time

an

Solution

Just by looking at the explain plan, we see that the optimizer's estimate are totally off the grid:

the sort node is estimated at 46170 rows whereas there are actually 637677 rows

the index scan is estimated at 2270852 rows whereas there are actually 1722262 rows

And so on.

So I think you'll need to vacuum analyze your tables connections and friends before getting a new plan.

Maybe, it will only help a little, but we need to get rid of this problem before trying to understand what's happening and how to optimize that query in your context.

Context

StackExchange Database Administrators Q#271010, answer score: 2

Revisions (0)

No revisions yet.