HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlCritical

Is a composite index also good for queries on the first field?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fieldthequeriesalsofirstforcompositegoodindex

Problem

Let's say I have a table with fields A and B. I make regular queries on A+B, so I created a composite index on (A,B). Would queries on only A also be fully optimized by the composite index?

Additionally, I created an index on A, but Postgres still uses the composite index for queries on only A. If the previous answer is positive, I guess it doesn't really matter, but why does it select the composite index by default, if the single A index is available?

Solution

It certainly is. We discussed that in great detail under this related question:

  • Working of indexes in PostgreSQL



All of this applies to (default) B-tree indices, not other types.

Space is allocated in multiples of MAXALIGN, which is typically 8 bytes on a 64-bit OS or (much less common) 4 bytes on a 32-bit OS. If you are not sure, check pg_controldata. It also depends on data types of indexed columns (some require alignment padding) and actual content.

An index on, say, two integer columns (4 bytes each) typically ends up to be exactly as big as an index on just one, where another 4 bytes are lost to alignment padding.

Update for Postgres 13: The new index deduplication changes this. Repeated index values can be compressed. An index on (a,b) is often less duplicative, so it gains less from deduplication.

If a is a unique column, deduplication doesn't change much. (There can still be duplicative entries due to the MVCC model, only one visible to any transaction at the same time, so adding b can still have a minor effect.)

Apart from that, there is really no downside for the query planner to use an index on (a,b) - compared to an index on just (a). And it is generally preferable for multiple queries to use the same index. The chance for it (or parts of it) to reside in (fast) cache grows when shared.

If you already maintain an index on (a,b), then it doesn't make sense to create another index on just (a) - unless it is substantially smaller. The same is not true for (b,a) vs. (a). Follow the link in the first line for more on that.

Coming from the opposite direction, when you need an additional index like that on (a,b), then consider dropping an existing index on just (a) - if possible. Often not possible as that's the index of a PK or UNIQUE constraint. Since Postgres 11 you might get away with just appending b to the constraint definition with the INCLUDE clause instead. Details in the manual.

Or create the new index on (b,a) instead to cover queries on just b additionally. For only equality conditions the order of index expressions does not matter. It does, though, when involving range conditions. See:

  • Multicolumn index and performance



There are potential downsides to including additional columns in an index, even if that only uses space otherwise lost to alignment padding:

  • Whenever the additional column is updated, the index now needs an update, too, which might add cost to write operations and create more index bloat.



  • HOT updates (Heap Only Tuple) on the table are not possible while any index column is involved.



More on HOT updates:

  • Redundant data in update statements



How to measure object sizes:

  • Measure the size of a PostgreSQL table row

Context

StackExchange Database Administrators Q#27481, answer score: 128

Revisions (0)

No revisions yet.