HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Removing duplicate rows with two or more left join tables

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowslefttablesremovingwithduplicatemorejointwo

Problem

With a query like this

SELECT a.id, a.name,
       COALESCE( json_agg(b.*), '[]'::json ),
       COALESCE( json_agg(c.*), '[]'::json ),
  FROM a
  LEFT JOIN b ON a.id = b.a_id
  LEFT JOIN c ON a.id = c.a_id
 GROUP BY a.id, a.name;


When executed, both c and b will be multiplied by each other and produce duplicated entries in the JSON array object.

I tried changing the query to use 2 subqueries instead, but I get all sorts of errors and warning, such as "Subquery must return only one column", etc.

I also tried using LEFT OUTER JOIN, but I guess I don't yet master how joining tables work as it only applies to b and c is still multiplied and contain duplicates.

Edit: using DISTINCT errors with "could not identify an equality operator for type json" on the COALESCE functions.

How can I fix this query and only aggregate unique rows?

Edit 2

I need to specify that both table b and c are actually VIEWs, and they both have at least one json_agg column as well, so I cannot just use json_agg(DISTINCT b.*). This would've been way too easy.

Edit 3

Here is a small snippet to reproduce the issue :

```
--DROP TABLE IF EXISTS tbl_a CASCADE;
--DROP TABLE IF EXISTS tbl_b CASCADE;
--DROP TABLE IF EXISTS tbl_c CASCADE;

CREATE TABLE tbl_a (
id bigserial NOT NULL,
name character varying(16),
CONSTRAINT "PK_tbl_a" PRIMARY KEY (id)
) WITH ( OIDS=FALSE );

CREATE TABLE tbl_b (
a_id bigint NOT NULL,
foo json NOT NULL DEFAULT '{}'::json,
CONSTRAINT "FK_tbl_b_a" FOREIGN KEY (a_id)
REFERENCES tbl_a (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
) WITH ( OIDS=FALSE );

CREATE TABLE tbl_c (
a_id bigint NOT NULL,
bar json NOT NULL DEFAULT '{}'::json,
CONSTRAINT "FK_tbl_c_a" FOREIGN KEY (a_id)
REFERENCES tbl_a (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
) WITH ( OIDS=FALSE );

INSERT INTO tbl_a (id,name) VALUES (1, 'Test');
INSERT INTO tbl_b (a_id, foo) VALUES (1, '{"foo":"Hello"}'::json), (1, '{"fo

Solution

I have found a solution. I am not sure if it is optimal, but it works.

SELECT tbl_a.id, tbl_a.name,
       COALESCE( ( SELECT json_agg(tbl_b.*)
                     FROM tbl_b 
                    WHERE tbl_b.a_id = tbl_a.id ), '{}'::json),
       COALESCE( ( SELECT json_agg(tbl_c.*)
                     FROM tbl_c 
                    WHERE tbl_c.a_id = tbl_a.id ), '{}'::json)
  FROM tbl_a;


Which correctly returns

id  name    coalesce                             coalesce
--  ------  -----------------------------------  ----------------------
 1  "Test"  "[{"a_id":1,"foo":{"foo":"World"}},  "[{"a_id":1,"bar":{"bar":"abc"}},
              {"a_id":1,"foo":{"foo":"Hello"}},    {"a_id":1,"bar":{"bar":"def"}}]"

Code Snippets

SELECT tbl_a.id, tbl_a.name,
       COALESCE( ( SELECT json_agg(tbl_b.*)
                     FROM tbl_b 
                    WHERE tbl_b.a_id = tbl_a.id ), '{}'::json),
       COALESCE( ( SELECT json_agg(tbl_c.*)
                     FROM tbl_c 
                    WHERE tbl_c.a_id = tbl_a.id ), '{}'::json)
  FROM tbl_a;
id  name    coalesce                             coalesce
--  ------  -----------------------------------  ----------------------
 1  "Test"  "[{"a_id":1,"foo":{"foo":"World"}},  "[{"a_id":1,"bar":{"bar":"abc"}},
              {"a_id":1,"foo":{"foo":"Hello"}},    {"a_id":1,"bar":{"bar":"def"}}]"

Context

StackExchange Database Administrators Q#130200, answer score: 6

Revisions (0)

No revisions yet.