snippetsqlMinor
PostgreSQL How to optimize a query with ORDER BY and LIMIT 1?
Viewed 0 times
postgresqlorderwithlimitqueryoptimizehowand
Problem
I have the following PostgreSQL schema:
Its main use case is a typical producer/consumer scenario. Different users may insert an item in the database in a particular box for a particular user and each user can retrieve the topmost(this means the oldest) item in a box that is addressed to her/him. It more or less mimics the functionality of a queue on a database level.
More precisely, the most common operations are the following:
And retrieve commands based on a combination of either
and
The last two snippets are packed within a stored procedure.
I was wondering how to achieve best performance given this use case and knowing that the users will insert and retrieve somewhere between
EDIT
This is the
```
Limit (cost=23.07..23.07 rows=1 width=35)
-> Sort (cost=23.07..25.07 rows=799 width=35)
Sort Key: ARRIVAL
-> Seq Scan on Item i (cost=0.00..19.07 rows=799 width=35)
Filter: (((RECEIVER = 1) OR (RECEIVER IS NUL
CREATE TABLE User (
ID INTEGER PRIMARY KEY
);
CREATE TABLE BOX (
ID INTEGER PRIMARY KEY
);
CREATE SEQUENCE seq_item;
CREATE TABLE Item (
ID INTEGER PRIMARY KEY DEFAULT nextval('seq_item'),
SENDER INTEGER REFERENCES User(id),
RECEIVER INTEGER REFERENCES User(id),
INFO TEXT,
BOX_ID INTEGER REFERENCES Box(id) NOT NULL,
ARRIVAL TIMESTAMP
);Its main use case is a typical producer/consumer scenario. Different users may insert an item in the database in a particular box for a particular user and each user can retrieve the topmost(this means the oldest) item in a box that is addressed to her/him. It more or less mimics the functionality of a queue on a database level.
More precisely, the most common operations are the following:
INSERT INTO ITEM(SENDER, RECEIVER, INFO, BOX_ID, ARRIVAL)
VALUES (nsid, nrid, ncontent, nqid, ntime);And retrieve commands based on a combination of either
RECEIVER+SENDER or RECEIVER+BOX_ID:SELECT * INTO it FROM Item i WHERE (i.RECEIVER=? OR i.RECEIVER is NULL) AND
(i.BOX_ID=?) ORDER BY ARRIVAL LIMIT 1;
DELETE FROM Item i WHERE i.id=it.id;and
SELECT * INTO it FROM Item i WHERE (i.RECEIVER=? OR i.RECEIVER is NULL) AND
(i.SENDER=?) ORDER BY ARRIVAL LIMIT 1;
DELETE FROM Item i WHERE i.id=it.id;The last two snippets are packed within a stored procedure.
I was wondering how to achieve best performance given this use case and knowing that the users will insert and retrieve somewhere between
50,000 and 500,000 items (however, the database is never expected to contain more than 100,000 items at a given point)?EDIT
This is the
EXPLAIN I get with for the SELECT statements no indexes:```
Limit (cost=23.07..23.07 rows=1 width=35)
-> Sort (cost=23.07..25.07 rows=799 width=35)
Sort Key: ARRIVAL
-> Seq Scan on Item i (cost=0.00..19.07 rows=799 width=35)
Filter: (((RECEIVER = 1) OR (RECEIVER IS NUL
Solution
A btree index on
One on
Similarly, you would want another index on
Also, performance testing on a table with 800 rows is useless if the real table will have 100,000 rows.
(sender,arrival) could help. That would allow it to jump directly to the first-arrived message for a given sender. One on
(arrival,sender) is less likely to help. That allows you to jump to the first-sent message globally, but then you still have to walk along those messages until you hit one from the specified sender. If that particular sender is new, or only sends messages to people who keep up on their inbox, then you might have to walk through most of the index before you find a qualifying message. It might help in that you only have to walk the index, not the index and table combined, but that would still be a much smaller win than the proper index of (sender,arrival).Similarly, you would want another index on
(box_id, arrival), not on (arrival, box_id).Also, performance testing on a table with 800 rows is useless if the real table will have 100,000 rows.
Context
StackExchange Database Administrators Q#119760, answer score: 3
Revisions (0)
No revisions yet.