patternMinor
Find duplicate records across one-to-many relationship tables
Viewed 0 times
findtablesduplicaterecordsonemanyacrossrelationship
Problem
Let's say I have two tables that have a one-to-many relationship: People and Attributes. I'm trying to find duplicates based on the first_name, last_name, and all attributes must match exactly.
Which gives us:
I'd like a query that will return IDs 1 and 2 from the People table. I can find duplicates within a single table:
But I'm having trouble joining the one-to-many table and detecting duplicates across both tables. How can I detect duplicates across tables that have a one-to-many relationship?
CREATE TABLE People (Id int,
first_name varchar(100),
last_name varchar(100));
CREATE TABLE Attributes (Id int,
person_id int,
field varchar(100),
field_value varchar(100));
INSERT INTO People VALUES (1, 'John', 'Smith');
INSERT INTO People VALUES (2, 'John', 'Smith');
INSERT INTO People VALUES (3, 'John', 'Smith');
INSERT INTO Attributes VALUES (1, 1, 'HairColor', 'Brown');
INSERT INTO Attributes VALUES (2, 1, 'EyeColor', 'Blue');
INSERT INTO Attributes VALUES (3, 2, 'HairColor', 'Brown');
INSERT INTO Attributes VALUES (4, 2, 'EyeColor', 'Blue');
INSERT INTO Attributes VALUES (5, 3, 'HairColor', 'Blonde');Which gives us:
id | first_name | last_name
----+------------+-----------
1 | John | Smith
2 | John | Smith
3 | John | Smith
id | person_id | field | field_value
----+-----------+-----------+-------------
1 | 1 | HairColor | Brown
2 | 1 | EyeColor | Blue
3 | 2 | HairColor | Brown
4 | 2 | EyeColor | Blue
5 | 3 | HairColor | BlondeI'd like a query that will return IDs 1 and 2 from the People table. I can find duplicates within a single table:
select first_name,last_name,count(*) from People
group by first_name,last_name having ( count(*) > 1 );But I'm having trouble joining the one-to-many table and detecting duplicates across both tables. How can I detect duplicates across tables that have a one-to-many relationship?
Solution
One way to do this (check the SQLfiddle):
and another:
Intersect has higher precedence than except/minus, at least in Postgres and SQL Server. Just to be safe, you can use parentheses, to ensure the precedence.
select
p1.id as id1,
p2.id as id2
from people p1
join people p2
on p1.first_name = p2.first_name
and p1.last_name = p2.last_name
and p1.id 2
) ;and another:
select
p1.id as id1,
p2.id as id2
from people p1
join people p2
on p1.first_name = p2.first_name
and p1.last_name = p2.last_name
and p1.id < p2.id
where not exists
( ( select field, field_value
from attributes a1
where a1.person_id = p1.id
union
select field, field_value
from attributes a2
where a2.person_id = p2.id
)
except
( select field, field_value
from attributes a1
where a1.person_id = p1.id
intersect
select field, field_value
from attributes a2
where a2.person_id = p2.id
)
) ;Intersect has higher precedence than except/minus, at least in Postgres and SQL Server. Just to be safe, you can use parentheses, to ensure the precedence.
Code Snippets
select
p1.id as id1,
p2.id as id2
from people p1
join people p2
on p1.first_name = p2.first_name
and p1.last_name = p2.last_name
and p1.id < p2.id
where not exists
( select 1
from
( select *
from attributes a1
where a1.person_id = p1.id
union all
select *
from attributes a2
where a2.person_id = p2.id
) g
group by field, field_value
having count(*) <> 2
) ;select
p1.id as id1,
p2.id as id2
from people p1
join people p2
on p1.first_name = p2.first_name
and p1.last_name = p2.last_name
and p1.id < p2.id
where not exists
( ( select field, field_value
from attributes a1
where a1.person_id = p1.id
union
select field, field_value
from attributes a2
where a2.person_id = p2.id
)
except
( select field, field_value
from attributes a1
where a1.person_id = p1.id
intersect
select field, field_value
from attributes a2
where a2.person_id = p2.id
)
) ;Context
StackExchange Database Administrators Q#91959, answer score: 4
Revisions (0)
No revisions yet.