HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Select rows where value of second column is not present in first column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowspresentcolumnwherevaluefirstsecondselectnot

Problem

I have this table structure:

id1 id2
 1   2 
 1   3
 1   4
 2   1
 2   5


I need to build a query to select id2 where id1 is not in id2. For example if id1 = 1 then only id2=3 and id2=4.

I tried this one:

SET @a=1;
SELECT DISTINCT x.id2 
 FROM tt x, tt y 
 WHERE x.id1=@a AND x.id1 != y.id2;


but it's not working.

Solution

For the simple case of a single given id1:

SELECT DISTINCT x.id2   -- DISTINCT only needed if there are dupes
FROM   tbl x
WHERE  x.id1 =  @a
AND    x.id2 <> @a;


Generally, there are basically four techniques to do what you are after. Both columns could just as well be in different tables, almost the same problem.
NOT EXISTS

SELECT DISTINCT x.id2 
FROM   tbl x
WHERE  
AND    NOT EXISTS (
    SELECT FROM tbl y
    WHERE  
    AND    y.id1 = x.id2
    );


LEFT JOIN / IS NULL

SELECT DISTINCT x.id2 
FROM   tbl x
LEFT   JOIN tbl y ON y.id1 = x.id2
                 AND 
WHERE  
AND    y.id1 IS NULL


EXCEPT

SELECT DISTINCT id2
FROM   tbl
WHERE  

EXCEPT ALL            -- ALL to make it faster - no dupes left after DISTINCT
SELECT tbl.id1
FROM   tbl
WHERE  


NOT IN

SELECT DISTINCT x.id2 
FROM   tbl x
WHERE  
AND    x.id2 NOT IN (
    SELECT DISTINCT id1
    FROM   tbl y
    WHERE  
    );


You'll have to test which one is fastest for you. Benchmarks disagree. It depends on data distribution and other details. NOT IN rarely wins. It's mostly one of the first two.

As @ypercube commented: if (id1, id2) is unique, the DISTINCT clause is not needed - except the one in the NOT IN sub-query, which is meant to help performance.

Related answer with more details for PostgreSQL on SO:

  • Select rows which are not present in other table

Code Snippets

SELECT DISTINCT x.id2   -- DISTINCT only needed if there are dupes
FROM   tbl x
WHERE  x.id1 =  @a
AND    x.id2 <> @a;
SELECT DISTINCT x.id2 
FROM   tbl x
WHERE  <some condition>
AND    NOT EXISTS (
    SELECT FROM tbl y
    WHERE  <some condition>
    AND    y.id1 = x.id2
    );
SELECT DISTINCT x.id2 
FROM   tbl x
LEFT   JOIN tbl y ON y.id1 = x.id2
                 AND <some condition for y>
WHERE  <some condition for x>
AND    y.id1 IS NULL
SELECT DISTINCT id2
FROM   tbl
WHERE  <some condition>

EXCEPT ALL            -- ALL to make it faster - no dupes left after DISTINCT
SELECT tbl.id1
FROM   tbl
WHERE  <some condition>
SELECT DISTINCT x.id2 
FROM   tbl x
WHERE  <some condition>
AND    x.id2 NOT IN (
    SELECT DISTINCT id1
    FROM   tbl y
    WHERE  <some condition>
    );

Context

StackExchange Database Administrators Q#16650, answer score: 9

Revisions (0)

No revisions yet.