HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

Advantage of using INCLUDE as against adding the column in INDEX for covering index

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thecolumnincludeaddingagainstusingforcoveringindexadvantage

Problem

Postgres docs state the following about Index-Only Scans and Covering-Indexes:

if you commonly run queries like

SELECT y FROM tab WHERE x = 'key';

the traditional approach to speeding up such queries would be to
create an index on x only. However, an index defined as

CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);

could handle these queries as index-only scans, because y can be
obtained from the index without visiting the heap.

Because column y is not part of the index's search key, it does not
have to be of a data type that the index can handle; it's merely
stored in the index and is not interpreted by the index machinery.
Also, if the index is a unique index, that is

CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);

the uniqueness condition applies to just column x, not to the
combination of x and y. (An INCLUDE clause can also be written in
UNIQUE and PRIMARY KEY constraints, providing alternative syntax for
setting up an index like this.)

Question 1: If the data type of y can be added in index and there is no uniqueness requirement then is there any advantage of using CREATE INDEX tab_x_y ON tab(x) INCLUDE (y) over CREATE INDEX tab_x_y ON tab(x, y) for queries like SELECT y FROM tab WHERE x = 'key';?

It's wise to be conservative about adding non-key payload columns to
an index, especially wide columns. If an index tuple exceeds the
maximum size allowed for the index type, data insertion will fail. In
any case, non-key columns duplicate data from the index's table and
bloat the size of the index, thus potentially slowing searches.

Question 2: Can someone explain with an example what wide columns mean?

Question 3: Can someone explain the below statement in context of INCLUDE(y). If INCLUDE supports index only scans then y will also have to be stored in index. Then how does the below statement not hold for INCLUDE(y).

In any case, non-key columns duplicate data from the index's table and
bloat the size of the index

Solution

In addition to Erwin's great answer, there is an additional advantage to using the INCLUDE syntax: documentation.

Imagine that you decide that you need an index on columns (a, b) of table tab. Now you find that there is already an index on (a, c). In this situation you have two options:

-
simply go ahead and create another index

-
if you know for sure that column c was only added to the index to support an index-only scan and is never used as a search condition, you can drop the old index and create a new one on (a, b, c), thus saving an index

Now it is usually difficult to determine that an index column is never used as a search condition, unless – well, unless it appears in the INCLUDE clause. In that case, you don't have to think twice and can replace the index on (a) INCLUDE (c) with one on (a, b) INCLUDE (c).

Context

StackExchange Database Administrators Q#313274, answer score: 6

Revisions (0)

No revisions yet.