HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Multicolumn indexes and OR clause

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
andclauseindexesmulticolumn

Problem

I have a message table in my application and one of the most commonly executed SQL queries on this table is the following one:

select * from message message0_ 
 where message0_.sender_id=? or message0_.recipient_id=?
 order by message0_.send_date asc;


I am basically querying for the messages the current user has either received or sent.

The value of the parameter for sender_id & recipient_id is the current user ID.

I have created the following index:

CREATE INDEX ON message(recipient_id, sender_id);


I would like to know if the order of the fields matters for my use case bearing in mind the OR clause in my SQL query.

Can someone please help?

Solution

For your use case, you actually need two different indexes:

CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);


Your multi-column indexed would be used by the recipient_id = ?? part of the query, but it would not be used by the other (it can be used by other databases that know how to do Skip Scanning, such as Oracle, although it might not be extremely efficient).

Depending on how your PRIMARY KEY is for the message(s) table, you might not need one of them (the implicity index associated with the PK will do the job).

You can then either use your query as is, and let PostgreSQL do a BitmapOr, or convert the query to a UNION and avoid the OR. Timings should be about the same in both cases. If you are sure that there are not messages where both the sender_id and the recipient_id are the same (i.e.: nobody sends a message to him/herself, or if s/he does, you don't mind showing it twice), then a UNION ALL (which is slightly quicker) could be used instead with the same result.

OR conditions tend not to be handled too well by most databases. This specific case, which is rather common and rather simple, is handled well by PostgreSQL.

Experimental check

Table definitions (simplified)

CREATE TABLE users
(
    user_id integer PRIMARY KEY,
    user_name text 
) ;

CREATE TABLE messages
(
    sender_id    integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    recipient_id integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    send_date    timestamp NOT NULL DEFAULT now(),
    message_text text,
    PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;


We populate the tables with some simulated data:

-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
    user_id, 'user' || user_id AS user_name
FROM
    generate_series(1, 1000) AS x(user_id) ;

-- Create (aprox.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
    (random()*999+1)::integer AS sender_id,
    (random()*999+1)::integer AS recipient_id,
    send_date
FROM
    generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date); 

ANALYZE;


And, at this point, we check which are the execution plans that we get:

-- Checking the query with OR
EXPLAIN ANALYZE
SELECT * 
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;


gives ...

QUERY PLAN
1   Sort  (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
2     Sort Key: send_date
3     Sort Method: quicksort  Memory: 34kB
4     ->  Bitmap Heap Scan on messages  (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
5           Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
6           Heap Blocks: exact=157
7           ->  BitmapOr  (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
8                 ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
9                       Index Cond: (recipient_id = 123)
10                ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
11                      Index Cond: (sender_id = 123)
12  Planning time: 0.379 ms
13  Execution time: 0.527 ms


And using UNION:

-- Not using OR, but UNION-ing
EXPLAIN ANALYZE
SELECT * 
FROM messages
WHERE recipient_id = 123
UNION
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date ;


you get the following query plan:

```
QUERY PLAN
1 Sort (cost=538.87..539.37 rows=200 width=48) (actual time=0.387..0.399 rows=192 loops=1)
2 Sort Key: messages.send_date
3 Sort Method: quicksort Memory: 34kB
4 -> HashAggregate (cost=529.22..531.22 rows=200 width=48) (actual time=0.268..0.317 rows=192 loops=1)
5 Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
6 -> Append (cost=5.07..527.22 rows=200 width=48) (actual time=0.038..0.192 rows=192 loops=1)
7 -> Bitmap Heap Scan on messages (cost=5.07..262.55 rows=100 width=48) (actual time=0.038..0.094 rows=97 loops=1)
8 Recheck Cond: (recipient_id = 123)
9 Heap Blocks: exact=89
10 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.022..0.022 rows=97 loops=1)
11 Index Cond: (recipient_id = 123)
12 -> Bitmap Heap Scan on messages messages_1 (cost=5.19..262.67 rows=100 width=48) (actual time=0.033..0.085 rows=95 loops=1)
13 Recheck Cond: (sender

Code Snippets

CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);
CREATE TABLE users
(
    user_id integer PRIMARY KEY,
    user_name text 
) ;

CREATE TABLE messages
(
    sender_id    integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    recipient_id integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    send_date    timestamp NOT NULL DEFAULT now(),
    message_text text,
    PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;
-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
    user_id, 'user' || user_id AS user_name
FROM
    generate_series(1, 1000) AS x(user_id) ;

-- Create (aprox.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
    (random()*999+1)::integer AS sender_id,
    (random()*999+1)::integer AS recipient_id,
    send_date
FROM
    generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date); 

ANALYZE;
-- Checking the query with OR
EXPLAIN ANALYZE
SELECT * 
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;
QUERY PLAN
1   Sort  (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
2     Sort Key: send_date
3     Sort Method: quicksort  Memory: 34kB
4     ->  Bitmap Heap Scan on messages  (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
5           Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
6           Heap Blocks: exact=157
7           ->  BitmapOr  (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
8                 ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
9                       Index Cond: (recipient_id = 123)
10                ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
11                      Index Cond: (sender_id = 123)
12  Planning time: 0.379 ms
13  Execution time: 0.527 ms

Context

StackExchange Database Administrators Q#172907, answer score: 14

Revisions (0)

No revisions yet.