HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlCritical

Best practice between using LEFT JOIN or NOT EXISTS

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
leftpracticejoinbetweenusingexistsnotbest

Problem

Is there a best practice between using a LEFT JOIN or a NOT EXISTS format?

What is benefit to using one over the other?

If none, which should be preferred?

SELECT *
FROM tableA A
LEFT JOIN tableB B
     ON A.idx = B.idx
WHERE B.idx IS NULL


SELECT *
FROM tableA A
WHERE NOT EXISTS
(SELECT idx FROM tableB B WHERE B.idx = A.idx)


I am using queries within Access against a SQL Server database.

Solution

The biggest difference is not in the join vs not exists, it is (as written), the SELECT *.

On the first example, you get all columns from both A and B, whereas in the second example, you get only columns from A.

In SQL Server, the second variant is slightly faster in a very simple contrived example:

Create two sample tables:

CREATE TABLE dbo.A
(
    A_ID INT NOT NULL
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
);

CREATE TABLE dbo.B
(
    B_ID INT NOT NULL
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
);
GO


Insert 10,000 rows into each table:

INSERT INTO dbo.A DEFAULT VALUES;
GO 10000

INSERT INTO dbo.B DEFAULT VALUES;
GO 10000


Remove every 5th row from the second table:

DELETE 
FROM dbo.B 
WHERE B_ID % 5 = 1;

SELECT COUNT(*) -- shows 10,000
FROM dbo.A;

SELECT COUNT(*) -- shows  8,000
FROM dbo.B;


Perform the two test SELECT statement variants:

SELECT *
FROM dbo.A
    LEFT JOIN dbo.B ON A.A_ID = B.B_ID
WHERE B.B_ID IS NULL;

SELECT *
FROM dbo.A
WHERE NOT EXISTS (SELECT 1
    FROM dbo.B
    WHERE b.B_ID = a.A_ID);


Execution plans:

The second variant does not need to perform the filter operation since it can use the left anti-semi join operator.

Code Snippets

CREATE TABLE dbo.A
(
    A_ID INT NOT NULL
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
);

CREATE TABLE dbo.B
(
    B_ID INT NOT NULL
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
);
GO
INSERT INTO dbo.A DEFAULT VALUES;
GO 10000

INSERT INTO dbo.B DEFAULT VALUES;
GO 10000
DELETE 
FROM dbo.B 
WHERE B_ID % 5 = 1;

SELECT COUNT(*) -- shows 10,000
FROM dbo.A;

SELECT COUNT(*) -- shows  8,000
FROM dbo.B;
SELECT *
FROM dbo.A
    LEFT JOIN dbo.B ON A.A_ID = B.B_ID
WHERE B.B_ID IS NULL;

SELECT *
FROM dbo.A
WHERE NOT EXISTS (SELECT 1
    FROM dbo.B
    WHERE b.B_ID = a.A_ID);

Context

StackExchange Database Administrators Q#121034, answer score: 94

Revisions (0)

No revisions yet.