HiveBrain v1.2.0

Do I risk losing the benefits of indexing if I have an index on every column?

Submitted by: @import:stackexchange-dba

Problem

I use PostgreSQL 9.2. I have a table with ~5 million rows and 150 columns. The table does not change at all (I replace it once a year). Users query this table with all kinds of filters on any of the columns, e.g.

select * from table where C > 43 and H is not null;
select * from table where A is null and F < 1 and X > 2;


For performance I plan to create an index on every column of the table. A feeling in my stomach tells me to ask the experts first: is it good design, for the use case described above, to create an index on every column?

UPDATE:
I have to speculate about real use cases; I can't measure the exact queries yet, as this is still the design phase.

The server is well equipped with RAM and SSD storage, so queries are already "fast" now, and I can feel the effect of caching when I fire similar queries in sequence.

The columns are of types double, integer, timestamp and geometry (the geometry column explicitly gets a GiST index).

The queries will include from 1 to 10 columns. Usually ~6. Results will usually be <20k rows. Queries on a column will never relate to another column.

Thanks for all the explanations. What I will do:
* select 1/4th of the columns that I think will be most used and create indexes.
* wait for more testing/usage and start measuring/analysing the queries and use-cases then.

Thank you

Solution

When you say "for performance", you're taking a narrow view: "performance" as SELECT performance only, one query at a time, without considering caching and I/O contention issues.

Yes, having an index on every column might improve read performance. It'll certainly slow write performance (INSERT, UPDATE and DELETE), though that's not a concern for your DB. More subtly, more indexes means more contention for space in RAM to cache the indexes, which means there's a greater chance that a scan of any given index will result in slow disk reads.
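To make the cache-contention point concrete, PostgreSQL can report how big each index is and how often it is actually scanned. A sketch of such a check (assuming PostgreSQL; `bigtable` is a hypothetical table name):

```sql
-- Per-index size and usage for a hypothetical table named "bigtable".
-- Large indexes with idx_scan = 0 are pure overhead: they occupy cache
-- and disk but never help a query.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM   pg_stat_user_indexes
WHERE  relname = 'bigtable'
ORDER  BY pg_relation_size(indexrelid) DESC;
```

With 150 single-column indexes, the combined index footprint can easily exceed the table itself, which is exactly the RAM pressure described above.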

Then there's the fact that you can index more than just individual columns. It's often most productive to create multicolumn, partial, expression and/or descending indexes to best satisfy the needs of your queries. You simply cannot create every possible index. Here's a recent example of a real world index you'd never create without analysis of the queries that required it:

CREATE INDEX contfloattable_tag_and_timeseg
ON contfloattable(tagindex, (floor(extract(epoch FROM dateandtime) / 60 / 15) ));

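For illustration, the index variants mentioned above might look like this against the question's table (hypothetical table, index and column names; C, H, F, X are taken from the sample queries, `geom` stands in for the geometry column):

```sql
-- Partial: only index rows matching a common filter condition
CREATE INDEX mytable_c_where_h ON mytable (c) WHERE h IS NOT NULL;

-- Multicolumn: one index serving filters on x, or on x and f together
CREATE INDEX mytable_x_f ON mytable (x, f);

-- Descending: useful for ORDER BY ... DESC with LIMIT
CREATE INDEX mytable_ts_desc ON mytable (dateandtime DESC);

-- GiST, for the geometry column (hypothetical name "geom")
CREATE INDEX mytable_geom_gist ON mytable USING gist (geom);
```

Which of these is worth its maintenance and cache cost depends entirely on the real query mix, which is the point of the analysis step below.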

So: you could create an index on every column, but it's a bad idea. Use EXPLAIN and EXPLAIN ANALYZE, possibly via the auto_explain module, to analyse your workload and queries, and make indexing decisions based on how you actually query the tables. An index you don't use will hurt you in a read/write environment, and won't do any good in a read-only environment.
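A minimal sketch of that workflow, using one of the question's sample queries (the auto_explain settings shown are session-level examples; thresholds are illustrative, not recommendations):

```sql
-- Inspect how one query is actually executed, with real timings
EXPLAIN ANALYZE
SELECT * FROM mytable WHERE c > 43 AND h IS NOT NULL;

-- Or log plans of slow statements automatically for the whole session
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';  -- log anything slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts/times
```

If the plan shows a sequential scan that dominates the runtime, that column (or combination of columns) is an indexing candidate; if an index is present but never chosen, it is a candidate for removal.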

Don't view queries in isolation. Look at patterns. If query1 seems to want an index on col1 and query2 seems to want an index on both col1 and col2, don't bother creating an index on just col1. Create a composite index on col1, col2; it'll be almost as fast for a search on col1 alone and way better than maintaining two indexes and having them fight over cache space and disk I/O.
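Sketched in SQL, the pattern above looks like this (hypothetical table and column names):

```sql
-- One composite index instead of separate indexes on col1 and (col1, col2)
CREATE INDEX mytable_col1_col2 ON mytable (col1, col2);

-- Served well (col1 is the leading column):
--   SELECT ... FROM mytable WHERE col1 = 42;
--   SELECT ... FROM mytable WHERE col1 = 42 AND col2 > 7;
```

Note the column order matters: a B-tree index on (col1, col2) helps a filter on col1 alone, but not a filter on col2 alone.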

BTW, this sounds like it's more of an OLAP workload than an OLTP one, so you might want to consider schemas designed for OLAP workloads like star schema with fact and dimension tables.
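A star schema for this kind of data might be sketched as one wide fact table referencing narrow dimension tables (purely illustrative names; nothing in the question dictates this layout):

```sql
-- Dimension: small lookup table describing each measurement source
CREATE TABLE dim_source (
    source_id integer PRIMARY KEY,
    name      text NOT NULL
);

-- Fact: the large, append-only table users filter and aggregate over
CREATE TABLE fact_measurement (
    source_id   integer REFERENCES dim_source,
    measured_at timestamp NOT NULL,
    value       double precision
);
```

The payoff is that the fact table stays narrow and the descriptive attributes live in small, cheap-to-join dimension tables, rather than 150 columns side by side.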

Context

StackExchange Database Administrators Q#27949, answer score: 7
