HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Poor cardinality estimate disqualifies INSERT from minimal logging?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
estimatelogginginsertcardinalitypoordisqualifiesminimalfrom

Problem

Why is the second INSERT statement ~5x slower than the first?

From the amount of log data generated, I think that the second is not qualifying for minimal logging. However, the documentation in the Data Loading Performance Guide indicates that both inserts should be able to be minimally logged. So if minimal logging is the key performance difference, why is it that the second query does not qualify for minimal logging? What can be done to improve the situation?

Query #1: Inserting 5MM rows using INSERT...WITH (TABLOCK)

Consider the following query, which inserts 5MM rows into a heap. This query executes in 1 second and generates 64MB of transaction log data as reported by sys.dm_tran_database_transactions.

CREATE TABLE dbo.minimalLoggingTest (n INT NOT NULL)
GO
INSERT INTO dbo.minimalLoggingTest WITH (TABLOCK) (n)
SELECT n
-- Any table/view/sub-query that correctly estimates that it will generate 5MM rows
FROM dbo.fiveMillionNumbers
-- Provides greater consistency on my laptop, where other processes are running
OPTION (MAXDOP 1)
GO


Query #2: Inserting the same data, but SQL underestimates the # of rows

Now consider this very similar query, which operates on exactly the same data but happens to draw from a table (or complex SELECT statement with many joins in my actual production case) where the cardinality estimate is too low. This query executes in 5.5 seconds and generates 461MB of transaction log data.

CREATE TABLE dbo.minimalLoggingTest (n INT NOT NULL)
GO
INSERT INTO dbo.minimalLoggingTest WITH (TABLOCK) (n)
SELECT n
-- Any table/view/sub-query that produces 5MM rows but SQL estimates just 1000 rows
FROM dbo.fiveMillionNumbersBadEstimate
-- Provides greater consistency on my laptop, where other processes are running
OPTION (MAXDOP 1)
GO


Full script

See this Pastebin for a full set of scripts to generate the test data and execute either of these scenarios. Note that you must use a database that is in the SIMPLE recovery m

Solution

Why is it that the second query does not qualify for minimal logging?

Minimal logging is available for the second query, but the engine chooses not to use it at runtime.

There is a minimum threshold for INSERT...SELECT below which it chooses not to use the bulk load optimizations. There is a cost involved in setting up a bulk rowset operation, and bulk-inserting only a few rows would not result in efficient space utilization.

What can be done to improve the situation?

Use one of the many other methods (e.g. SELECT INTO) that does not have this threshold. Alternatively, you might be able to rewrite the source query in some way to boost the estimated number of rows/pages over the threshold for INSERT...SELECT.

See also Geoff's self-answer for more useful information.

Possibly interesting trivia: SET STATISTICS IO reports logical reads for the target table only when bulk loading optimizations are not used.

Context

StackExchange Database Administrators Q#116368, answer score: 7

Revisions (0)

No revisions yet.