patternsqlMinor
Query for records from T1 NOT in junction table T2
Viewed 0 times
queryrecordsjunctionforfromnottable
Problem
I have a table containing the names and filesystem locations of C source-files in a table, a list of relevant categories for the programming principals demonstrated by the source files in a second table and then a junction table that is a one-to-many for the categories associated with each source-file. (table definitions below question)
What I need to do is get a list of
This works fine, but I'm no SQL master and I'm skeptical I'm taking a very long way around to get the records from the
Table Definitions
The
c_file table
` Table "public.c_file"
Column | Type | Collation | Nullable |
---------+-----------------------------+-----------+----------+
file_id | integer | | not null | ...
fname | character varying(64) | | not null |
dirname | character varying(128) | | not null |
mtime | timestamp without time zone | | not null | ...
created | timestamp
What I need to do is get a list of
file_ids from the source-file table that have no associated file_id entry in the junction table. I have done that using an except clause that relies on an inner join selecting records from the junction table where file_ids are present in the junction table, e.g.select distinct file_id from c_file
except (
select distinct c_file.file_id from c_file
inner join c_file_topic_rel
on c_file_topic_rel.file_id = c_file.file_id
where exists (
select distinct file_id from c_file_topic_rel
)
)
order by file_id;
This works fine, but I'm no SQL master and I'm skeptical I'm taking a very long way around to get the records from the
c_file table NOT in the junction table c_file_topic_rel. Is there a better way to go about isolating the records with no entry in the junction table? (there are about 15,000 total entries in the c_file table, if the size matters -- it's relatively small)Table Definitions
The
Default column is removed to prevent excess scrolling. If relevant, I'm happy to add the full definitions.c_file table
` Table "public.c_file"
Column | Type | Collation | Nullable |
---------+-----------------------------+-----------+----------+
file_id | integer | | not null | ...
fname | character varying(64) | | not null |
dirname | character varying(128) | | not null |
mtime | timestamp without time zone | | not null | ...
created | timestamp
Solution
It seems you are over-complicating it and joining onto
Other equivalent syntaxes (these should have the same performance)
I would advise not to ever use
Another, usually slower, syntax
C-file multiple times. You can just use a NOT EXISTSSELECT
f.file_id
FROM c_file f
WHERE NOT EXISTS (SELECT 1
FROM c_file_topic_rel r
WHERE r.file_id = f.file_id
);DISTINCT is not necessary on the outside as file_id is already unique. On the inner query we are using EXISTS so DISTINCT is pointless.Other equivalent syntaxes (these should have the same performance)
SELECT
f.file_id
FROM c_file f
EXCEPT
SELECT r.file_id
FROM c_file_topic_rel r;SELECT
f.file_id
FROM c_file f
WHERE f.file_id NOT IN (
SELECT r.file_id
FROM c_file_topic_rel r
);I would advise not to ever use
NOT IN as it can give incorrect results in the presence of nulls.Another, usually slower, syntax
SELECT
f.file_id
FROM c_file f
LEFT JOIN c_file_topic_rel r ON r.file_id = c.file_id
WHERE r.file_id IS NULL;Code Snippets
SELECT
f.file_id
FROM c_file f
WHERE NOT EXISTS (SELECT 1
FROM c_file_topic_rel r
WHERE r.file_id = f.file_id
);SELECT
f.file_id
FROM c_file f
EXCEPT
SELECT r.file_id
FROM c_file_topic_rel r;SELECT
f.file_id
FROM c_file f
WHERE f.file_id NOT IN (
SELECT r.file_id
FROM c_file_topic_rel r
);SELECT
f.file_id
FROM c_file f
LEFT JOIN c_file_topic_rel r ON r.file_id = c.file_id
WHERE r.file_id IS NULL;Context
StackExchange Database Administrators Q#327904, answer score: 4
Revisions (0)
No revisions yet.