snippetsqlMinor
How to index two tables for JOINed query optimisation
Viewed 0 times
tablesjoinedqueryoptimisationtwoforhowindex
Problem
I'm using PostgreSQL 9.5 on High Sierra.
Over the two tables:
request_logs - ~ 26K rows
response_logs - ~ 9K rows
I've the following query (with JOIN):
SELECT
req.uuid,
res.status,
req.method,
req.requesturi,
req.accessed,
req.payload reqpayload,
res.payload respayload,
COUNT(*) OVER() AS total_rows
FROM
request_logs req
LEFT OUTER JOIN response_logs res ON req.uuid = res.uuid
WHERE
req.accountid = 2 AND
req.requesturi not ilike '/v1/sessions%' AND
req.accessed BETWEEN “2018-01-01 15:04:05 +0000” and “2019-01-02 15:04:05+0000” AND
res.status IS NOT NULL AND
req.applicationid = 1
ORDER BY
accessed DESC LIMIT 1000
As I'm trying to optimise the query, I've experimented with different indexes:
Here's a list of what I tried:
Configuration 1:
1. request_log.uuid (pkey, unique)
2. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 260 ms
Configuration 2:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid
3. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 230 ms
Configuration 3:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid
3. request_log.accessed (timestampz)
4. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 230 ms
Configuration 4:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid
3. request_log.accessed (timestampz)
4. request_log.accountid
5. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 230 ms
Configuration 5:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid, request_log.accessed (combined)
3. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 240 ms
As visible from the result, indexing by
Maybe the bad performance is due to the JOIN?
Altogether, it seems quite slow and I try not to think what will happen when these tables contain millio
Over the two tables:
request_logs - ~ 26K rows
response_logs - ~ 9K rows
I've the following query (with JOIN):
SELECT
req.uuid,
res.status,
req.method,
req.requesturi,
req.accessed,
req.payload reqpayload,
res.payload respayload,
COUNT(*) OVER() AS total_rows
FROM
request_logs req
LEFT OUTER JOIN response_logs res ON req.uuid = res.uuid
WHERE
req.accountid = 2 AND
req.requesturi not ilike '/v1/sessions%' AND
req.accessed BETWEEN “2018-01-01 15:04:05 +0000” and “2019-01-02 15:04:05+0000” AND
res.status IS NOT NULL AND
req.applicationid = 1
ORDER BY
accessed DESC LIMIT 1000
As I'm trying to optimise the query, I've experimented with different indexes:
Here's a list of what I tried:
Configuration 1:
1. request_log.uuid (pkey, unique)
2. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 260 ms
Configuration 2:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid
3. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 230 ms
Configuration 3:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid
3. request_log.accessed (timestampz)
4. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 230 ms
Configuration 4:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid
3. request_log.accessed (timestampz)
4. request_log.accountid
5. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 230 ms
Configuration 5:
1. request_log.uuid (pkey, unique)
2. request_log.applicationid, request_log.accessed (combined)
3. response_log.uuid (pkey, unique, foreign key)
Response time avg. : 240 ms
As visible from the result, indexing by
applicationid (an int8) did help a little, while indexing by the timestampz accessed didn't help at all. Maybe the bad performance is due to the JOIN?
Altogether, it seems quite slow and I try not to think what will happen when these tables contain millio
Solution
For starters, get your
Query
Updated to your comment:
I only want to returns rows from
Your
Indexes
If (like you commented)
the only 'moving parts' in this query are the dates (from and to) that are compared with accessed
... and more than a few rows are excluded with the other predicates
on
And if a substantial percentage of rows is eliminated by
It may pay to append the columns
LEFT JOIN right and try to get rid of total rows (as discussed in comments):Query
Updated to your comment:
I only want to returns rows from
request_logs that have a parallel entry in response_logs with non nil status:SELECT q.uuid
, s.status
, q.method
, q.requesturi
, q.accessed
, q.payload reqpayload
, s.payload respayload
-- , COUNT(*) OVER() AS total_rows -- see below
FROM request_logs q
JOIN response_logs s USING (uuid) -- not LEFT JOIN
WHERE q.accountid = 2
AND q.requesturi NOT ILIKE '/v1/sessions%'
AND q.accessed BETWEEN '2018-01-01 15:04:05 +0' AND '2019-01-02 15:04:05+0'
AND q.applicationid = 1
AND s.status IS NOT NULL -- see below
ORDER BY q.accessed DESC
LIMIT 1001; -- see belowYour
LEFT [OUTER] JOIN burns down to a plain [INNER] JOIN. Related:- Query with LEFT JOIN not returning rows for count of 0
COUNT(*) OVER() AS total_rows is expensive for counts substantially bigger than LIMIT (and you are expecting "10m+ rows"). Maybe it's good enough to use LIMIT 1001, only use the first 1000 rows, check the row count and if it's 1001 then you know there are "more than 1000 matching rows". Related:- Best way to get result count before LIMIT was applied
Indexes
If (like you commented)
the only 'moving parts' in this query are the dates (from and to) that are compared with accessed
... and more than a few rows are excluded with the other predicates
on
applicationid, requesturi and accountid, then a partial index should help read performance (a lot):CREATE INDEX ON request_logs (uuid, accessed DESC)
WHERE accountid = 2
AND requesturi NOT ILIKE '/v1/sessions%'
AND applicationid = 1And if a substantial percentage of rows is eliminated by
status IS NOT NULL, also:CREATE INDEX ON response_logs (uuid)
WHERE status IS NOT NULL;It may pay to append the columns
status and payload as index columns if you can get index-only scans out of it. Some preconditions apply.Code Snippets
SELECT q.uuid
, s.status
, q.method
, q.requesturi
, q.accessed
, q.payload reqpayload
, s.payload respayload
-- , COUNT(*) OVER() AS total_rows -- see below
FROM request_logs q
JOIN response_logs s USING (uuid) -- not LEFT JOIN
WHERE q.accountid = 2
AND q.requesturi NOT ILIKE '/v1/sessions%'
AND q.accessed BETWEEN '2018-01-01 15:04:05 +0' AND '2019-01-02 15:04:05+0'
AND q.applicationid = 1
AND s.status IS NOT NULL -- see below
ORDER BY q.accessed DESC
LIMIT 1001; -- see belowCREATE INDEX ON request_logs (uuid, accessed DESC)
WHERE accountid = 2
AND requesturi NOT ILIKE '/v1/sessions%'
AND applicationid = 1CREATE INDEX ON response_logs (uuid)
WHERE status IS NOT NULL;Context
StackExchange Database Administrators Q#213327, answer score: 3
Revisions (0)
No revisions yet.