patternsqlMinor
Self-JOIN with occurrence count
Viewed 0 times
withjoinoccurrencecountself
Problem
I use PostgreSQL 9.4 and I have a table called
I'd like to fetch all the certificates that have not been renewed yet, which means all the certificates that
I created the following query. It works, but I wonder if it can be optimized to be more efficient.
Specifically, my concern is the nested query that I believe is run once for each record in C.
I tried by self-joining the table but the resulting query may return duplicates (e.g. if a certificate has 2 or more renewals) and I feel like using
Any suggestion?
certificates containing the following columns:CREATE TABLE certificates (
id serial PRIMARY KEY,
common_name varchar(255) NOT NULL,
state varchar(255),
expires_on date,
);
CREATE INDEX index_certificates_on_common_name ON certificates (common_name);
CREATE INDEX index_certificates_on_expires_on ON certificates (expires_on);I'd like to fetch all the certificates that have not been renewed yet, which means all the certificates that
- don't have another certificate matching the
common_nameOR
- have another certificate matching
common_name, but thestateis not issued orexpires_onis past.
I created the following query. It works, but I wonder if it can be optimized to be more efficient.
SELECT
C.*
FROM
certificates as C
WHERE
C.expires_on - 30 certificates.expires_on
AND N.state = 'issued'
) = 0;Specifically, my concern is the nested query that I believe is run once for each record in C.
I tried by self-joining the table but the resulting query may return duplicates (e.g. if a certificate has 2 or more renewals) and I feel like using
DISTINCT or GROUP BY may not be the best option.SELECT C.id, N.id
FROM
certificates AS C LEFT JOIN certificates AS N ON C.common_name = N.common_name
WHERE
(C.expires_on - 30 C.expires_on));Any suggestion?
Solution
Two major improvements:
-
Make the first predicate sargable, so that an index can be used.
-
You second condition is that no qualifying row exists. So don't count, that would be expensive because Postgres has to scan all rows unconditionally. Use a
Ideally, you have at least two indexes:
-
on
-
A partial multicolumn index.
If
Answer to additional question in comment
If the number of days for tolerance is variable and stored as
And adapt the query accordingly:
Aside: A column like
SELECT *
FROM certificates c
WHERE c.expires_on -- sargable!
AND NOT EXISTS (
SELECT 1
FROM certificates
WHERE common_name = c.common_name
AND expires_on > c.expires_on
AND state = 'issued'
);-
Make the first predicate sargable, so that an index can be used.
-
You second condition is that no qualifying row exists. So don't count, that would be expensive because Postgres has to scan all rows unconditionally. Use a
NOT EXISTS anti-semi-join. Postgres can use an index again to quickly verify non-existence and can stop looking further as soon as the first qualifying row has been found. Typically much faster (depending on data distribution). There are several related techniques possible for this:- Select rows which are not present in other table
Ideally, you have at least two indexes:
-
on
(expires_on):CREATE INDEX ON certificates (expires_on);-
A partial multicolumn index.
CREATE INDEX foo ON certificates (common_name, expires_on DESC)
WHERE state = 'issued';If
state = 'issued' is true for most rows, then adding the condition to the index won't help.Answer to additional question in comment
If the number of days for tolerance is variable and stored as
integer with each row, use a functional index:CREATE INDEX ON certificates ((expires_on + expiring_days));And adapt the query accordingly:
SELECT *
FROM certificates c
WHERE expires_on + expiring_days c.expires_on
AND state = 'issued'
);Aside: A column like
state typically only allows a couple of distinct values. If so, it's a candidate for normalization: add a table certificate_state listing allowed values and use state_id int REFERENCES state in the main table. Just like outlined here:- Low cardinality fields: integer or string
Code Snippets
SELECT *
FROM certificates c
WHERE c.expires_on <= current_date + 30 -- sargable!
AND NOT EXISTS (
SELECT 1
FROM certificates
WHERE common_name = c.common_name
AND expires_on > c.expires_on
AND state = 'issued'
);CREATE INDEX ON certificates (expires_on);CREATE INDEX foo ON certificates (common_name, expires_on DESC)
WHERE state = 'issued';CREATE INDEX ON certificates ((expires_on + expiring_days));SELECT *
FROM certificates c
WHERE expires_on + expiring_days <= current_date -- can use index
AND NOT EXISTS (
SELECT 1
FROM certificates
WHERE common_name = c.common_name
AND expires_on > c.expires_on
AND state = 'issued'
);Context
StackExchange Database Administrators Q#141834, answer score: 4
Revisions (0)
No revisions yet.