HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query for records from T1 NOT in junction table T2

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
queryrecordsjunctionforfromnottable

Problem

I have a table containing the names and filesystem locations of C source-files in a table, a list of relevant categories for the programming principals demonstrated by the source files in a second table and then a junction table that is a one-to-many for the categories associated with each source-file. (table definitions below question)

What I need to do is get a list of file_ids from the source-file table that have no associated file_id entry in the junction table. I have done that using an except clause that relies on an inner join selecting records from the junction table where file_ids are present in the junction table, e.g.
select distinct file_id from c_file

except (

select distinct c_file.file_id from c_file

inner join c_file_topic_rel
on c_file_topic_rel.file_id = c_file.file_id

where exists (

select distinct file_id from c_file_topic_rel

)
)

order by file_id;


This works fine, but I'm no SQL master and I'm skeptical I'm taking a very long way around to get the records from the c_file table NOT in the junction table c_file_topic_rel. Is there a better way to go about isolating the records with no entry in the junction table? (there are about 15,000 total entries in the c_file table, if the size matters -- it's relatively small)

Table Definitions

The Default column is removed to prevent excess scrolling. If relevant, I'm happy to add the full definitions.

c_file table
` Table "public.c_file"
Column | Type | Collation | Nullable |
---------+-----------------------------+-----------+----------+
file_id | integer | | not null | ...
fname | character varying(64) | | not null |
dirname | character varying(128) | | not null |
mtime | timestamp without time zone | | not null | ...
created | timestamp

Solution

It seems you are over-complicating it and joining onto C-file multiple times. You can just use a NOT EXISTS

SELECT
  f.file_id
FROM c_file f
WHERE NOT EXISTS (SELECT 1
    FROM c_file_topic_rel r
    WHERE r.file_id = f.file_id
);


DISTINCT is not necessary on the outside as file_id is already unique. On the inner query we are using EXISTS so DISTINCT is pointless.

Other equivalent syntaxes (these should have the same performance)

SELECT
  f.file_id
FROM c_file f
EXCEPT
SELECT r.file_id
FROM c_file_topic_rel r;


SELECT
  f.file_id
FROM c_file f
WHERE f.file_id NOT IN (
    SELECT r.file_id
    FROM c_file_topic_rel r
);


I would advise not to ever use NOT IN as it can give incorrect results in the presence of nulls.

Another, usually slower, syntax

SELECT
  f.file_id
FROM c_file f
LEFT JOIN c_file_topic_rel r ON r.file_id = c.file_id
WHERE r.file_id IS NULL;

Code Snippets

SELECT
  f.file_id
FROM c_file f
WHERE NOT EXISTS (SELECT 1
    FROM c_file_topic_rel r
    WHERE r.file_id = f.file_id
);
SELECT
  f.file_id
FROM c_file f
EXCEPT
SELECT r.file_id
FROM c_file_topic_rel r;
SELECT
  f.file_id
FROM c_file f
WHERE f.file_id NOT IN (
    SELECT r.file_id
    FROM c_file_topic_rel r
);
SELECT
  f.file_id
FROM c_file f
LEFT JOIN c_file_topic_rel r ON r.file_id = c.file_id
WHERE r.file_id IS NULL;

Context

StackExchange Database Administrators Q#327904, answer score: 4

Revisions (0)

No revisions yet.