patternsqlModerate
Multicolumn indexes and OR clause
Viewed 0 times
andclauseindexesmulticolumn
Problem
I have a
I am basically querying for the messages the current user has either received or sent.
The value of the parameter for
I have created the following index:
I would like to know if the order of the fields matters for my use case bearing in mind the OR clause in my SQL query.
Can someone please help?
message table in my application and one of the most commonly executed SQL queries on this table is the following one:select * from message message0_
where message0_.sender_id=? or message0_.recipient_id=?
order by message0_.send_date asc;I am basically querying for the messages the current user has either received or sent.
The value of the parameter for
sender_id & recipient_id is the current user ID.I have created the following index:
CREATE INDEX ON message(recipient_id, sender_id);I would like to know if the order of the fields matters for my use case bearing in mind the OR clause in my SQL query.
Can someone please help?
Solution
For your use case, you actually need two different indexes:
Your multi-column indexed would be used by the
Depending on how your
You can then either use your query
Experimental check
Table definitions (simplified)
We populate the tables with some simulated data:
And, at this point, we check which are the execution plans that we get:
gives ...
And using
you get the following query plan:
```
QUERY PLAN
1 Sort (cost=538.87..539.37 rows=200 width=48) (actual time=0.387..0.399 rows=192 loops=1)
2 Sort Key: messages.send_date
3 Sort Method: quicksort Memory: 34kB
4 -> HashAggregate (cost=529.22..531.22 rows=200 width=48) (actual time=0.268..0.317 rows=192 loops=1)
5 Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
6 -> Append (cost=5.07..527.22 rows=200 width=48) (actual time=0.038..0.192 rows=192 loops=1)
7 -> Bitmap Heap Scan on messages (cost=5.07..262.55 rows=100 width=48) (actual time=0.038..0.094 rows=97 loops=1)
8 Recheck Cond: (recipient_id = 123)
9 Heap Blocks: exact=89
10 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.022..0.022 rows=97 loops=1)
11 Index Cond: (recipient_id = 123)
12 -> Bitmap Heap Scan on messages messages_1 (cost=5.19..262.67 rows=100 width=48) (actual time=0.033..0.085 rows=95 loops=1)
13 Recheck Cond: (sender
CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);Your multi-column indexed would be used by the
recipient_id = ?? part of the query, but it would not be used by the other (it can be used by other databases that know how to do Skip Scanning, such as Oracle, although it might not be extremely efficient). Depending on how your
PRIMARY KEY is for the message(s) table, you might not need one of them (the implicity index associated with the PK will do the job).You can then either use your query
as is, and let PostgreSQL do a BitmapOr, or convert the query to a UNION and avoid the OR. Timings should be about the same in both cases. If you are sure that there are not messages where both the sender_id and the recipient_id are the same (i.e.: nobody sends a message to him/herself, or if s/he does, you don't mind showing it twice), then a UNION ALL (which is slightly quicker) could be used instead with the same result. OR conditions tend not to be handled too well by most databases. This specific case, which is rather common and rather simple, is handled well by PostgreSQL.Experimental check
Table definitions (simplified)
CREATE TABLE users
(
user_id integer PRIMARY KEY,
user_name text
) ;
CREATE TABLE messages
(
sender_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
recipient_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
send_date timestamp NOT NULL DEFAULT now(),
message_text text,
PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;We populate the tables with some simulated data:
-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
user_id, 'user' || user_id AS user_name
FROM
generate_series(1, 1000) AS x(user_id) ;
-- Create (aprox.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
(random()*999+1)::integer AS sender_id,
(random()*999+1)::integer AS recipient_id,
send_date
FROM
generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date);
ANALYZE;And, at this point, we check which are the execution plans that we get:
-- Checking the query with OR
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;gives ...
QUERY PLAN
1 Sort (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
2 Sort Key: send_date
3 Sort Method: quicksort Memory: 34kB
4 -> Bitmap Heap Scan on messages (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
5 Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
6 Heap Blocks: exact=157
7 -> BitmapOr (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
8 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
9 Index Cond: (recipient_id = 123)
10 -> Bitmap Index Scan on messages_pkey (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
11 Index Cond: (sender_id = 123)
12 Planning time: 0.379 ms
13 Execution time: 0.527 msAnd using
UNION:-- Not using OR, but UNION-ing
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123
UNION
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date ;you get the following query plan:
```
QUERY PLAN
1 Sort (cost=538.87..539.37 rows=200 width=48) (actual time=0.387..0.399 rows=192 loops=1)
2 Sort Key: messages.send_date
3 Sort Method: quicksort Memory: 34kB
4 -> HashAggregate (cost=529.22..531.22 rows=200 width=48) (actual time=0.268..0.317 rows=192 loops=1)
5 Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
6 -> Append (cost=5.07..527.22 rows=200 width=48) (actual time=0.038..0.192 rows=192 loops=1)
7 -> Bitmap Heap Scan on messages (cost=5.07..262.55 rows=100 width=48) (actual time=0.038..0.094 rows=97 loops=1)
8 Recheck Cond: (recipient_id = 123)
9 Heap Blocks: exact=89
10 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.022..0.022 rows=97 loops=1)
11 Index Cond: (recipient_id = 123)
12 -> Bitmap Heap Scan on messages messages_1 (cost=5.19..262.67 rows=100 width=48) (actual time=0.033..0.085 rows=95 loops=1)
13 Recheck Cond: (sender
Code Snippets
CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);CREATE TABLE users
(
user_id integer PRIMARY KEY,
user_name text
) ;
CREATE TABLE messages
(
sender_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
recipient_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
send_date timestamp NOT NULL DEFAULT now(),
message_text text,
PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
user_id, 'user' || user_id AS user_name
FROM
generate_series(1, 1000) AS x(user_id) ;
-- Create (aprox.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
(random()*999+1)::integer AS sender_id,
(random()*999+1)::integer AS recipient_id,
send_date
FROM
generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date);
ANALYZE;-- Checking the query with OR
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;QUERY PLAN
1 Sort (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
2 Sort Key: send_date
3 Sort Method: quicksort Memory: 34kB
4 -> Bitmap Heap Scan on messages (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
5 Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
6 Heap Blocks: exact=157
7 -> BitmapOr (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
8 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
9 Index Cond: (recipient_id = 123)
10 -> Bitmap Index Scan on messages_pkey (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
11 Index Cond: (sender_id = 123)
12 Planning time: 0.379 ms
13 Execution time: 0.527 msContext
StackExchange Database Administrators Q#172907, answer score: 14
Revisions (0)
No revisions yet.