HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Finding rows that are not duplicate

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsareduplicatethatfindingnot

Problem

I have two DB2 tables.

Table 1, called EmployeeFile

EmployeeFile has the following columns: EMP_ID, FNAME, LNAME, DOB

The data looks like this:

EMP_ID, FNAME, LNAME, DOB
 - GB2342, John,  Smith, 1970-12-10
 - L3243C, Jane,  Jones, 1969-11-09


Table 2, called OriginalData

OriginalData has the following columns: INTERNAL_ID, FNAME, LNAME, DOB.

The data looks like this:

INTERNAL_ID, FNAME, LNAME,    DOB
 - 6575678588,  Jane,  Jones,    1969-11-09
 - 1232145564,  Rob,   Mitchell, 1968-10-08
 - 3452345345,  John,  Smith,    1970-12-10
 - 6786478568,  John,  Smith,    1970-12-10


I want to show all rows of EmployeeFile for which the (FNAME, LNAME, DOB) combination occurs only once in OriginalData.

For example, if I ran the query on the example dataset shown above, it should return just the following row:

  • Jane, Jones, 1969-11-09



I don't want 'John Smith' because although he exists in EmployeeFile, his (FNAME, LNAME, DOB) combination occurs twice in OriginalData.

I don't want 'Rob Mitchell' because he doesn't exist in EmployeeFile.

Question

How do I do this?

The only way I can think of is immensely ugly: Concatenate (FNAME, LNAME, DOB) (with some delimiter like ~) from both tables, do a join on that column, do a group by having count(that concatenated column) = 1, then do another select on EmployeeFile, selecting only those rows where the (FNAME, LNAME, DOB) match the parsed substrings of that concatenated string!

Solution

Pretty straightforward:

select e.* 
from EmployeeFile e
inner join
(
  select FNAME, LNAME, DOB
  from OriginalData
  group by FNAME, LNAME, DOB
  having count(1) = 1
) o
on (e.FNAME, e.LNAME, e.DOB) = (o.FNAME, o.LNAME, o.DOB)

Code Snippets

select e.* 
from EmployeeFile e
inner join
(
  select FNAME, LNAME, DOB
  from OriginalData
  group by FNAME, LNAME, DOB
  having count(1) = 1
) o
on (e.FNAME, e.LNAME, e.DOB) = (o.FNAME, o.LNAME, o.DOB)

Context

StackExchange Database Administrators Q#206276, answer score: 5

Revisions (0)

No revisions yet.