HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

How to most efficiently find out if a record has child records?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
efficientlyhowrecordshasrecordchildfindoutmost

Problem

I am writing a query that returns a single record from the parent table. I would like to also return in this query if it has any children. This is a one to many relationship.

parent:
 -parent_id
 -name

child:
-child_id
-name
-parent_id


My first instinct is to write the following query:

select name, (select count(child_id) from child c  where c.parent_id=p.parent_id) children
     from parent p
     where name like 'some name'


But I was wondering if there was a more efficient way to do this, since I don't actually care about the count, just whether or not it has children. Any pointers?

Solution

Methods

Aggregate Method

The popular way we'll call it the aggregate method. Note bool_or(child_id IS NOT NULL) also works but was not any faster.

SELECT parent_id, count(*)>1 AS has_children
FROM parent
LEFT OUTER JOIN children
  USING (parent_id)
GROUP BY parent_id;


LEFT JOIN LATERAL with limit

But you may also try this, with LEFT JOIN LATERAL() like this..

SELECT parent_id, has_children
FROM parent AS p
LEFT JOIN LATERAL (
  SELECT true
  FROM children AS c
  WHERE c.parent_id = p.parent_id
  FETCH FIRST ROW ONLY
) AS t(has_children)
  ON (true);


EXISTS

Just FYI, you can use CROSS JOIN LATERAL with EXISTS too (which is I believe how it's planned). We'll call it the EXISTS method.

SELECT parent_id, has_children
FROM parent AS p
CROSS JOIN LATERAL (
  SELECT EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
  )
) AS t(has_children);


Which is the same as,

SELECT parent_id, EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
) AS has_children
FROM parent AS p;


Benchmarks

Sample dataset

1000000 children, 2500 parents. Our sims get it done.

CREATE TABLE parent (
  parent_id int PRIMARY KEY
);
INSERT INTO parent
  SELECT x
  FROM generate_series(1,1e4,4) AS gs(x);
CREATE TABLE children (
  child_id int PRIMARY KEY,
  parent_id int REFERENCES parent
);
INSERT INTO children
  SELECT x, 1 + (x::int%1e4)::int/4*4
  FROM generate_series(1,1e6) AS gs(x);

VACUUM FULL ANALYZE children;
VACUUM FULL ANALYZE parent;


Results (pt1)

  • Aggregate method: 450ms,



  • LEFT JOIN LATERAL ( FETCH FIRST ROW ONLY ): 850ms



  • EXISTS method: 850ms



Results (adding an index and running again)

Now let's add an index

CREATE INDEX ON children (parent_id);
ANALYZE children;


Now the timing profile is totally different,

  • Aggregate method: 450ms,



  • LEFT JOIN LATERAL ( FETCH FIRST ROW ONLY ): 30ms



  • EXISTS method: 30ms

Code Snippets

SELECT parent_id, count(*)>1 AS has_children
FROM parent
LEFT OUTER JOIN children
  USING (parent_id)
GROUP BY parent_id;
SELECT parent_id, has_children
FROM parent AS p
LEFT JOIN LATERAL (
  SELECT true
  FROM children AS c
  WHERE c.parent_id = p.parent_id
  FETCH FIRST ROW ONLY
) AS t(has_children)
  ON (true);
SELECT parent_id, has_children
FROM parent AS p
CROSS JOIN LATERAL (
  SELECT EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
  )
) AS t(has_children);
SELECT parent_id, EXISTS(
    SELECT 
    FROM children AS c
    WHERE c.parent_id = p.parent_id
) AS has_children
FROM parent AS p;
CREATE TABLE parent (
  parent_id int PRIMARY KEY
);
INSERT INTO parent
  SELECT x
  FROM generate_series(1,1e4,4) AS gs(x);
CREATE TABLE children (
  child_id int PRIMARY KEY,
  parent_id int REFERENCES parent
);
INSERT INTO children
  SELECT x, 1 + (x::int%1e4)::int/4*4
  FROM generate_series(1,1e6) AS gs(x);

VACUUM FULL ANALYZE children;
VACUUM FULL ANALYZE parent;

Context

StackExchange Database Administrators Q#215212, answer score: 11

Revisions (0)

No revisions yet.