HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Oracle is not using a unique index for a long key

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
uniquelongusingforindexoraclenotkey

Problem

I have a table with 250K rows in my test database. (There are a few hundred millions in production, we can observe the same issue there.) The table has an nvarchar2(50) string identifier, not null, with a unique index on it (it's not the PK).

The identifiers are made up of a first part that has 8 different values in my test database (and about a thousand in production), then an @ sign, and finally a number, 1 to 6 digits long. For example there could be 50 thousand rows that start with 'ABCD_BGX1741F_2006_13_20110808.xml@', and it is followed by 50 thousand different numbers.

When I query for a single row based on its identifier, the cardinality is estimated as 1, the cost is very low, it works fine. When I query for more than one row with several identifiers in an IN expression or an OR expression, the estimations for the index are completely wrong, so a full table scan is used. If I force the index with a hint, it is very fast, the full table scan is actually executed an order of magnitude slower (and a lot more slower in production). So it is an optimizer problem.

As a test, I duplicated the table (in the same schema+tablespace) with the exact same DDL and exact same content. I recreated the unique index on the first table for good measure, and created the exact same index on the clone table. I did a DBMS_STATS.GATHER_SCHEMA_STATS('schemaname',estimate_percent=>100,cascade=>true);. You can even see that the index names are consecutive. So now the only difference between the two tables is that the first one was loaded in random order over a long time period, with blocks scattered on the disk (in a tablespace together with several other big tables), the second was loaded as one batched INSERT-SELECT. Other than that, I can't imagine any difference. (The original table has been shrinked since the last big deletion, and there hasn't been a single delete after that.)

Here are query plans for the sick and the clone table (The strings under the black brush are the

Solution

I found the solution! It is so beautiful and I actually learned a LOT about Oracle.

In one word: histograms.

I started reading a lot about how Oracle's CBO works and I stumbled upon histograms. I didn't fully understand so I took a look at the USER_HISTOGRAMS table, and voilá. There were several rows for the sick table, and practically nothing for the cloned table. For the sick table, there was one row for each of the 8 different identifier-starting-parts. And this is the key: they were cut off at 32 characters, before the @ sign. As I said, the first part of keys is highly repetitive, they become different after the @ sign.

It seems that histograms can be more powerful than the simple fact that a unique index always has a cardinality of 0 or 1 for a given value. When I was querying for 2+ rows, Oracle looked at the histogram, it thought that there could be tens of thousands of values for that identifier-starting-part, and it threw the CBO off course.

I deleted the histograms for that column in the old table and the problem went away!

More reading: https://blogs.oracle.com/optimizer/entry/how_do_i_drop_an_existing_histogram_on_a_column_and_stop_the_auto_stats_gathering_job_from_creating

Context

StackExchange Database Administrators Q#56130, answer score: 8

Revisions (0)

No revisions yet.