HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

PostgreSQL How to optimize a query with ORDER BY and LIMIT 1?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlorderwithlimitqueryoptimizehowand

Problem

I have the following PostgreSQL schema:

CREATE TABLE User (
    ID INTEGER PRIMARY KEY
);

CREATE TABLE BOX (
    ID INTEGER PRIMARY KEY 
);

CREATE SEQUENCE seq_item;

CREATE TABLE Item (
    ID INTEGER PRIMARY KEY DEFAULT nextval('seq_item'),
    SENDER INTEGER REFERENCES User(id),
    RECEIVER INTEGER REFERENCES User(id),
    INFO TEXT,
    BOX_ID INTEGER REFERENCES Box(id) NOT NULL,
    ARRIVAL TIMESTAMP
);


Its main use case is a typical producer/consumer scenario. Different users may insert an item in the database in a particular box for a particular user and each user can retrieve the topmost(this means the oldest) item in a box that is addressed to her/him. It more or less mimics the functionality of a queue on a database level.

More precisely, the most common operations are the following:

INSERT INTO ITEM(SENDER, RECEIVER, INFO, BOX_ID, ARRIVAL) 
VALUES (nsid, nrid, ncontent, nqid, ntime);


And retrieve commands based on a combination of either RECEIVER+SENDER or RECEIVER+BOX_ID:

SELECT * INTO it FROM Item i WHERE (i.RECEIVER=? OR i.RECEIVER is NULL) AND 
(i.BOX_ID=?) ORDER BY ARRIVAL LIMIT 1;
DELETE FROM Item i WHERE i.id=it.id;


and

SELECT * INTO it FROM Item i WHERE (i.RECEIVER=? OR i.RECEIVER is NULL) AND 
(i.SENDER=?) ORDER BY ARRIVAL LIMIT 1;
DELETE FROM Item i WHERE i.id=it.id;


The last two snippets are packed within a stored procedure.

I was wondering how to achieve best performance given this use case and knowing that the users will insert and retrieve somewhere between 50,000 and 500,000 items (however, the database is never expected to contain more than 100,000 items at a given point)?

EDIT

This is the EXPLAIN I get with for the SELECT statements no indexes:

```
Limit (cost=23.07..23.07 rows=1 width=35)
-> Sort (cost=23.07..25.07 rows=799 width=35)
Sort Key: ARRIVAL
-> Seq Scan on Item i (cost=0.00..19.07 rows=799 width=35)
Filter: (((RECEIVER = 1) OR (RECEIVER IS NUL

Solution

A btree index on (sender,arrival) could help. That would allow it to jump directly to the first-arrived message for a given sender.

One on (arrival,sender) is less likely to help. That allows you to jump to the first-sent message globally, but then you still have to walk along those messages until you hit one from the specified sender. If that particular sender is new, or only sends messages to people who keep up on their inbox, then you might have to walk through most of the index before you find a qualifying message. It might help in that you only have to walk the index, not the index and table combined, but that would still be a much smaller win than the proper index of (sender,arrival).

Similarly, you would want another index on (box_id, arrival), not on (arrival, box_id).

Also, performance testing on a table with 800 rows is useless if the real table will have 100,000 rows.

Context

StackExchange Database Administrators Q#119760, answer score: 3

Revisions (0)

No revisions yet.