patternMinor

Does a cluster index provide more benefits than pre-sorting the load file and creating non-cluster index?

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

sortingthefileprovidecreatingnonthanmoreprebenefits

Problem

Informix 11.70.TC4DE:

CREATE TABLE cluster_tbl
    (
fk_id INT,
data CHAR(2048)
    );

LOAD FROM "presorted.ld" INSERT INTO cluster_tbl;

CREATE UNIQUE CLUSTER INDEX cl_idx ON cluster_tbl(fk_id);

{###}

CREATE TABLE noncluster_tbl
    (
fk_id INT,
data CHAR(2048)
    );

LOAD FROM "presorted.ld" INSERT INTO noncluster_tbl;

CREATE UNIQUE INDEX ncl_idx ON noncluster_tbl(fk_id);

{###}

UPDATE STATISTICS;

It's my understanding that a cluster index is essentially the same as pre-sorting the load file, inserting it into the table and creating the index for the foreign key column.

I created two tables, each loaded with 2 Million rows of identically data, pre-sorted by the foreign key. However, one table has a cluster index on fk_id, the other table with a non-clustered index on fk_id.

Is there any difference in the tree structure or additional overhead in maintaining or accessing data from a table with a cluster index, versus the same table/data with a non-cluster index?

Solution

The order you insert data into a table does not guarantee the order of the data on this disk.

Even if the order happens to be as you wish, it's not structured in a way that the optimiser will know that it's searchable by that order. (It will be treated as un-ordered.)

When you have a non-clustered index, and you need to use fields not covered by the index itself, the index only contains a pointer to the data. This is similar to introducing an extra join; a quick one, but a definite extra step in loading that data.

In short, it should nearly always be advantageous to have a clustered index (excepting circumstances like repeatedly inserting data at random positions in the table).

Which is contrary to your experience...

So, what are the queries that you are running? It may be that something counter intuitive is happening, but it may still be explainable.

You've given us the schema, thanks.

Could you also include the test queries you are using?

And could you also include the methodology and results for measuring performance?

And, finally, if possible, the execution plans for each query?

Context

StackExchange Database Administrators Q#21000, answer score: 2

Revisions (0)

No revisions yet.