HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Identical tables, identical query, completely different execution times

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tablescompletelyqueryidenticaldifferenttimesexecution

Problem

We have two tables with identical columns and indexing (No indexing at all, basically). We run the same query, which in case of the original table takes 5 seconds to run; in the case of the new table we let it run for 30 minutes and then killed the query.

We updated statistics, but that had no result. We tried a rebuild of the new table to see if defragmenting it would help, but that had no effect either.

On a hunch, we then exported both tables to the same database, in order to see whether or not anything would change there, but we have the exact same results.

I'm kind of baffled how this could be. To make things even more interesting, the original table contains more data than the new table, which should theoretically mean that the new table should finish the query faster.

Does anyone have a possible explanation? I've worked for years as a DBA (though not for a few years now), and frankly, I'm baffled as to why this could possibly happen.

In response to some of the comments, here is the table definition in question:

```
CREATE TABLE [dbo].[Fact_SubscriptionDetail_Test](
[CustomerSellTo_Key] [int] NULL,
[CustomerContactSellTo_Key] [int] NULL,
[CustomerRefTo_Key] [int] NULL,
[WebAuthUser_Key] [int] NULL,
[Country_Key] [int] NULL,
[Date_Key] [int] NULL,
[SnapshotDate_Key] [int] NULL,
[Product_Key] [int] NULL,
[License_Key] [int] NULL,
[Subscription_Key] [int] NULL,
[SubscriptionTypeLostSeatsType_Key] [int] NULL,
[M_SubscriptionDetail_NewSeats] [int] NULL,
[M_SubscriptionDetail_WinbackSeats] [int] NULL,
[M_SubscriptionDetail_RenewedFlexSeats] [int] NULL,
[M_SubscriptionDetail_RenewedCommitSeats] [int] NULL,
[M_SubscriptionDetail_RenewedSeats] [int] NULL,
[M_SubscriptionDetail_ActiveSeatsEndMonth] [int] NULL,
[M_SubscriptionDetail_LostNonPaymentSeats] [int] NULL,
[M_SubscriptionDetail_LostGraceInactiveSeats] [int] NULL,
[M_SubscriptionDetail_LostOtherSeats] [int] NULL,
[M_SubscriptionDetail_LostSeats] [int] NULL,
[M_SubscriptionDetail_GrossBookings] [numeric

Solution

...the original table contains more data than the new table, which should theoretically mean that the new table should finish the query faster.

If the execution plans were the same, this would most likely be true, but they're not. The number of rows expected (and the distribution of the data according to the statistics) affects the strategy chosen by the query optimizer.
Small table

When the heap table contains 398,399 rows, the optimizer chooses a serial plan using nested loops for all join operations except those directly affecting the heap table; those joins employ a hash join.

The complexity of the query is such that cardinality (row count) estimates are likely to be inaccurate, so the nested loop strategy ends up being a disaster. The optimizer considered a parallel plan, but rejected it as more expensive than the serial nested loops option.
Larger table

When the heap table contains 1,750,640 rows, changes in cost estimations mean the optimizer assesses that a parallel plan using hash joins and optimized bitmap filters extensively will be a better strategy.

This plan shape is much more resilient to cardinality estimation errors. The hash join build inputs may spill to disk, but the worst case is much better than executing an entire subtree a huge number of times (using nested loops).
Solution 1

If you know that a parallel plan with bitmap filters is the best choice in general, you could use a Plan Guide to enforce this, or encourage such a plan with an OPTION (HASH JOIN, MERGE JOIN) query hint. You may not always get parallelism if the tables are small, but performance should still improve (and be more predictable in general).
Solution 2

You could also explore breaking the query into simpler sections, initially along the lines of the Common Table Expression boundaries. Materializing sensible-sized intermediate results into temporary tables means:

  • Each part of the query can be optimized separately



  • The optimizer has statistics and accurate cardinality information



  • Additional indexes can be added to the temporary tables if proven to be beneficial



  • Redundant CTE evaluations can be skipped

Context

StackExchange Database Administrators Q#135652, answer score: 4

Revisions (0)

No revisions yet.