
Performance problems on SQL Server 2016 that go away after running two updates flipping a NULL column value

Submitted by: @import:stackexchange-dba

Problem

This is an abstract question, at least to begin with. But I am looking for theories on why I am experiencing what I am experiencing.

I have a table with a number of columns, one of which is an INT that allows NULL. Let's refer to it as NullableIntCol.

We have a few non-clustered indexes on the table, some of which include this column.

We then query the table in various ways, some with a WHERE NullableIntCol IS NULL predicate, and those queries may take 5-10 seconds to execute. Fair enough, we need to optimize something.
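
For illustration, the slow queries have roughly this shape (the TOP 1000 / ORDER BY details come up again in the solution below; SomeSortColumn is a placeholder):

SELECT TOP (1000) *
  FROM [dbo].[MyTable]
 WHERE [NullableIntCol] IS NULL
 ORDER BY [SomeSortColumn];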

In our test environment we have been able to replicate the issue, although there we are talking 1-3 seconds. But it is the exact same pattern.

Data-wise, the table has about 600,000 rows, ~350 MB worth of data.

After we played around with various combinations of indexes, etc., I discovered that if I run the following on the table:

BEGIN TRANSACTION;
-- Flip every NULL to a sentinel value...
UPDATE [dbo].[MyTable]
   SET [NullableIntCol] = -1
 WHERE [NullableIntCol] IS NULL;
-- ...then immediately flip them back, leaving the data unchanged.
UPDATE [dbo].[MyTable]
   SET [NullableIntCol] = NULL
 WHERE [NullableIntCol] = -1;
COMMIT TRANSACTION;


Suddenly, query performance is way, way better (50ms-200ms in our test environment).

This is the Query plan prior to executing the above.

https://www.brentozar.com/pastetheplan/?id=H1VCvT9uh

(Screenshots of the estimated and executed plans were embedded here.)

And this is the changed query plan after.

https://www.brentozar.com/pastetheplan/?id=HJXMu69_2

(Screenshots of the estimated and executed plans were embedded here.)

That is clearly a change, but I still lack the "why".

Furthermore, I can then restore the database to the state it was in before and replicate the issue again.

Initially I thought the above had, in effect, triggered a full rebuild of the indexes; however, rebuilding the indexes does not seem to have much effect, if any at all.
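
For reference, a full rebuild of all indexes on the table can be issued like this; note that a rebuild refreshes statistics only for the rebuilt indexes' own keys (with a full scan), not any auto-created column statistics, which may be why it made no difference here:

-- Rebuild every index on the table (this was tried and did not help).
ALTER INDEX ALL ON [dbo].[MyTable] REBUILD;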

So right now, I am out of theories on why I am seeing this behavior.

Update 2023-06-29

After running UPDATE STATISTICS [Scehma1].[Object1]; following the query above, it seems to go back to the fast behavior.

Solution

The difference between the two plans is that in the first (slow) one, SQL Server estimates that only 20 rows will satisfy the WHERE clause, so it decides to use an index seek followed by key lookups into the main table.
As it turns out, at least 1000 rows satisfy the clause, and this plan turns out to be very inefficient because of the 1000 lookups.

In the second plan, SQL Server correctly estimates that at least 1000 rows will have to be retrieved, so it decides to do a full table scan (and it finds the 1000 qualifying rows after reading just 2328 rows from the table).
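
To reproduce the comparison deliberately, a hedged sketch is to force each access path with a table hint and compare the I/O counts (the index name is a placeholder; INDEX(0) forces a scan):

SET STATISTICS IO ON;

-- Force the seek + key lookup path (hypothetical index name).
SELECT *
  FROM [dbo].[MyTable] WITH (INDEX([IX_MyTable_NullableIntCol]))
 WHERE [NullableIntCol] IS NULL;

-- Force the full scan path.
SELECT *
  FROM [dbo].[MyTable] WITH (INDEX(0))
 WHERE [NullableIntCol] IS NULL;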

SQL Server has a feature that automatically updates the statistics of a table when a high percentage of its rows are modified.
Since, as you say, approximately 90% of all rows contain the value NULL, the update of all those rows triggered the auto-update of the statistics on that table, allowing SQL Server to choose a better plan.
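
A hedged way to check this theory is to look at when each statistics object on the table was last updated and how many row modifications have accumulated since, via sys.dm_db_stats_properties (available on SQL Server 2016):

SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter   -- modifications since the last statistics update
  FROM sys.stats AS s
 CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
 WHERE s.object_id = OBJECT_ID(N'dbo.MyTable');

If the flip-flop UPDATE is what refreshed the statistics, last_updated should jump to the time of that transaction and modification_counter should drop back to 0.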

You can test this by issuing an UPDATE STATISTICS command instead of your UPDATE block:

UPDATE STATISTICS [dbo].[MyTable];


EDIT:
It's also possible that no statistics were available for the first query, and SQL Server created them after your updates.
If the AUTO_CREATE_STATISTICS option is on, the Query Optimizer creates statistics on individual columns used in a predicate when such statistics are not already available. So the updates could have triggered the creation of statistics on [NullableIntCol], and the following query would then find and use them.
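
As a sketch, you can verify both the database setting and whether such an auto-created statistics object exists (auto-created column statistics have auto_created = 1 and a name starting with _WA_Sys_):

-- Is automatic creation/updating of column statistics enabled here?
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
  FROM sys.databases
 WHERE database_id = DB_ID();

-- List the statistics objects on the table and the columns they cover.
SELECT s.name AS stats_name, s.auto_created, c.name AS column_name
  FROM sys.stats AS s
  JOIN sys.stats_columns AS sc
    ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
  JOIN sys.columns AS c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
 WHERE s.object_id = OBJECT_ID(N'dbo.MyTable');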

In this case, you could obtain the same result with:

CREATE STATISTICS statistics_name
ON [dbo].[MyTable] (NullableIntCol)
WITH FULLSCAN;


Update, after you posted the execution plans:

Please note that the original queries you posted had a TOP 1000 limit, which is absent from the query plans.
A limit on the total number of rows to be retrieved has a high impact on the plan the optimizer will choose, so you are comparing apples to oranges.

Also, a TOP 1000 is only meaningful if you specify an ORDER BY, which also has a big impact on the optimizer.

Nevertheless, the plans clearly show that 251959 rows satisfy your WHERE condition, out of a total of 569158. That's 44% of all rows.
No index lookup can be of any benefit when you have to retrieve such a large part of the table, so the best plan is always a table scan.
You could benefit from a covering index (one that includes all the columns you are selecting), but in your example you are selecting all the columns of the table, and such an index would be a duplicate of the whole table.

In such a scenario, SQL Server should never pick an index seek unless it wrongly estimates that the percentage of rows satisfying the WHERE condition is much smaller. Having appropriate, up-to-date statistics on Column4 and Column9 should avoid this.

But if you are applying an ORDER BY and a TOP 1000 limit, then an index on Column4, Column9, and any further columns specified in the ORDER BY could speed things up by allowing SQL Server to read just the first 1000 rows and then stop. This could be even faster if the index included all the columns needed by the SELECT.
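
A minimal sketch of such an index, assuming (hypothetically) equality predicates on Column4 and Column9, an ORDER BY on Column1, and a SELECT list that additionally needs only Column2 and Column3 (all names come from the anonymized plans or are placeholders):

CREATE NONCLUSTERED INDEX IX_MyTable_Column4_Column9_Column1
    ON [dbo].[MyTable] (Column4, Column9, Column1)  -- predicate columns first, then the sort column
    INCLUDE (Column2, Column3);                     -- remaining selected columns, to avoid lookups

With equality predicates on the leading key columns, SQL Server can seek, read rows already in Column1 order, and stop after the first 1000.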

Also note that if you need even a single column that is not included in the index, SQL Server will have to do a lookup into the table for every row. So including 12 columns is useless if you select 15; it just makes the index bigger and slower. Either include all the needed columns or don't include any.

Code Snippets

UPDATE STATISTICS [dbo].[MyTable];

CREATE STATISTICS statistics_name
ON [dbo].[MyTable] (NullableIntCol)
WITH FULLSCAN;

Context

StackExchange Database Administrators Q#328721, answer score: 2
