patternsqlModerate
Order of columns in a compound index in PostgreSQL (and query order)
Viewed 0 times
postgresqlcompoundordercolumnsqueryandindex
Problem
I have a table with 50K rows. It is actually a PostGIS table.
The query has 4 parts (1 mandatory) (3 Optional)
I expect about 100 - 4,000 returned rows
If I create a compound index on the table, which column should I use first. The fine grained is probably the location (data is spread over the world). I currently have it as GIST index.
The other indexes would be BTREE.
My intuition says use fine grained, and course last. E.g. There are only about 12 file types, so that would be very big buckets for the index.
What do the PostgreSQL and PostGIS gurus (who know the internals of the system) say?
UPDATE:
Let me sharpen this question.
Tell me if any of these assumptions are wrong. (I am pretty new to the idea of compound indexes)
-
The query has 4 parts (1 mandatory) (3 Optional)
- intersection box (a geography rectangle) with 4 lat,long (I use st_intersects) [Mandatory]
- Date Range (min, max) on a date field
- File type (a set of up to 8 text values) currently using IN( .....) but I can make that a temp table if needed. I see a lot of people don't like IN.
- Country (a text value).
I expect about 100 - 4,000 returned rows
If I create a compound index on the table, which column should I use first. The fine grained is probably the location (data is spread over the world). I currently have it as GIST index.
The other indexes would be BTREE.
My intuition says use fine grained, and course last. E.g. There are only about 12 file types, so that would be very big buckets for the index.
What do the PostgreSQL and PostGIS gurus (who know the internals of the system) say?
UPDATE:
Let me sharpen this question.
- I don't want anyone to have to do the work I should do. I respect your time too much. So I will get to the explain analyze later.
- All I was looking for was some pointers and tips and guidelines.
- I read this excellent little posting: https://devcenter.heroku.com/articles/postgresql-indexes#managing-and-maintaining-indexes about indexes
- What I normally do is create 4 separate indexes (geo-box, country name, file_type, and date) but what want to see what a composite query would do.
Tell me if any of these assumptions are wrong. (I am pretty new to the idea of compound indexes)
- Order is important. Choose as the first index the one that will cut the rows down the most (in my case the location (geography) which is a simple polygon or multi-polygon would do the best).
- Sometimes queries will skip indexes. But if I create a compound query with key (#1, #2, #3, #4) then even if the user creates something that asks for #1, #3 the planner will still use the single composite query, since they order is maintained.
-
Solution
As a part of my work I maintain a fairly large PostgreSQL database (around 120gb on disk, several multi-million-row tables) and have collected a few tricks on how to speed up the queries. First some comments on your assumptions:
-
I would suggest against making a 4-way index. Try creating one and then check the size, they can get really huge. In my experience, four 1-key indexes have been almost as fast as a single 4-way index. A trick that does work nicely for some specific queries are partial indexes, ie something like this:
CREATE INDEX ON table_x (key1, key2, key3) WHERE some_x_column = 'XXXX';
I have created aliases in my .psqlrc-file with queries to help find what indexes to add or remove. Feel free to have a look at them over at GitHub: .psql
I use the :seq_scans and :bigtables a lot, and then \d table_name to get details about the table. Don't forget to reset the statistics after you've done some changes, select pg_stat_reset();
- Yes, order is important, but it's only the first one that is really different, the rest are second class indexes.
- I'm not sure it will always use both, my guess is that the query planner will use #1, then do something clever with the rest.
- I have no experience with GIST.
- Yes, add all indexes first, see what is used the most and what gives the best performance.
- I would sugest that you try both and measure what works best. Try rewriting the sql with different subqueries, maybe country and time in one, then join with the intersect-query. I have not noticed any performance problem with IN-clauses, as long as the IN-list is not thousands of elements long. My guess is that a few different queries specifically tuned depending on the input criteria available will give the best results.
-
I would suggest against making a 4-way index. Try creating one and then check the size, they can get really huge. In my experience, four 1-key indexes have been almost as fast as a single 4-way index. A trick that does work nicely for some specific queries are partial indexes, ie something like this:
CREATE INDEX ON table_x (key1, key2, key3) WHERE some_x_column = 'XXXX';
I have created aliases in my .psqlrc-file with queries to help find what indexes to add or remove. Feel free to have a look at them over at GitHub: .psql
I use the :seq_scans and :bigtables a lot, and then \d table_name to get details about the table. Don't forget to reset the statistics after you've done some changes, select pg_stat_reset();
Context
StackExchange Database Administrators Q#59922, answer score: 15
Revisions (0)
No revisions yet.