HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Understanding multiple table join with aggregation

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
understandingwithjoinmultipleaggregationtable

Problem

I have a basic question on how JOIN works on multiple tables. I want to count occurrences of Foreign Key in link1 & link2

CREATE TABLE main (
   id SERIAL PRIMARY KEY,
   name text NOT NULL
);

CREATE TABLE link1 (
   id SERIAL PRIMARY KEY,
   main_id integer NOT NULL,
   CONSTRAINT main_id_fk FOREIGN KEY (main_id) REFERENCES main (id)
);

-- link2 is similar to link1


SQL Fiddle

Why does the query below give a product of counts (rather than sum) when the count is non-zero in both columns.

SELECT main.id, COUNT(link1.main_id) + COUNT(link2.main_id)
FROM main
LEFT JOIN link1 ON main.id=link1.main_id
LEFT JOIN link2 ON main.id=link2.main_id
GROUP BY main.id

Solution

What you see is a "proxy cross join". Aggregate first, then join:

SELECT m.id, COALESCE(l1.ct, 0) + COALESCE(l2.ct, 0) AS total_ct
FROM   main m
LEFT   JOIN (
   SELECT main_id, count(*) AS ct
   FROM   link1
   GROUP  BY main_id
   ) l1 ON l1.main_id = m.id
LEFT   JOIN (
   SELECT main_id, count(*) AS ct
   FROM   link2
   GROUP  BY main_id
   ) l2 ON l2.main_id = m.id
ORDER BY m.id;


SQL Fiddle.

Do not multiply rows with multiple unqualified joins and count(DISTINCT ...) later to fix the mistake. It happens to work in this case since counting DISTINCT link1.id / link2.id coincides with the desired result, but it's needlessly expensive and error prone.

Detailed explanation and a couple of syntax variants in these related answers on SO:

  • Multiple array_agg() calls in a single query



  • Two SQL LEFT JOINS produce incorrect result

Code Snippets

SELECT m.id, COALESCE(l1.ct, 0) + COALESCE(l2.ct, 0) AS total_ct
FROM   main m
LEFT   JOIN (
   SELECT main_id, count(*) AS ct
   FROM   link1
   GROUP  BY main_id
   ) l1 ON l1.main_id = m.id
LEFT   JOIN (
   SELECT main_id, count(*) AS ct
   FROM   link2
   GROUP  BY main_id
   ) l2 ON l2.main_id = m.id
ORDER BY m.id;

Context

StackExchange Database Administrators Q#87249, answer score: 8

Revisions (0)

No revisions yet.