gotchasqlMajor
Why does LEN() function badly underestimate cardinality in SQL Server 2014?
Viewed 0 times
whycardinalitysqlbadlyfunctiondoesserverlen2014underestimate
Problem
I have a table with a string column and a predicate that checks for rows with a certain length. In SQL Server 2014, I am seeing an estimate of 1 row regardless of the length I am checking for. This is yielding very poor plans because there are actually thousands or even millions of rows and SQL Server is choosing to put this table on the outer side of a nested loop.
Is there an explanation for the cardinality estimate of 1.0003 for SQL Server 2014 while SQL Server 2012 estimates 31,622 rows? Is there a good workaround?
Here is a short reproduction of the issue:
Here is a more complete script showing additional tests
I have also read the whitepaper on the SQL Server 2014 Cardinality Estimator, but didn't find anything there that clarified the situation.
Is there an explanation for the cardinality estimate of 1.0003 for SQL Server 2014 while SQL Server 2012 estimates 31,622 rows? Is there a good workaround?
Here is a short reproduction of the issue:
-- Create a table with 1MM rows of dummy data
CREATE TABLE #customers (cust_nbr VARCHAR(10) NOT NULL)
GO
INSERT INTO #customers WITH (TABLOCK) (cust_nbr)
SELECT TOP 1000000
CONVERT(VARCHAR(10),
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) AS cust_nbr
FROM master..spt_values v1
CROSS JOIN master..spt_values v2
GO
-- Looking for string of a certain length.
-- While both CEs yield fairly poor estimates, the 2012 CE is much
-- more conservative (higher estimate) and therefore much more likely
-- to yield an okay plan rather than a drastically understimated loop join.
-- 2012: 31,622 rows estimated, 900K rows actual
-- 2014: 1 row estimated, 900K rows actual
SELECT COUNT(*)
FROM #customers
WHERE LEN(cust_nbr) = 6
OPTION (QUERYTRACEON 9481) -- Optionally, use 2012 CE
GOHere is a more complete script showing additional tests
I have also read the whitepaper on the SQL Server 2014 Cardinality Estimator, but didn't find anything there that clarified the situation.
Solution
For the legacy CE, I see the estimate is for 3.16228 % of the rows – and that is a "magic number" heuristic used for column = literal predicates (there are other heuristics based on predicate construction – but the
Now as for the new CE behavior, it looks like this is now visible to the optimizer (which means we can use statistics). I went through the exercise of looking at the calculator output below, and you can look at the associated auto-generation of stats as a pointer:
Unfortunately the logic relies on an estimate of the number of distinct values, which is not adjusted for the effect of the
Possible workaround
You can get a trie-based estimate under both CE models by rewriting the
Information on Trace Flags used:
LEN wrapped around the column for the legacy CE results matches this guess-framework). You can see examples of this on a post on Selectivity Guesses in absence of Statistics by Joe Sack, and Constant-Constant Comparison Estimation by Ian Jose.-- Legacy CE: 31622.8 rows
SELECT COUNT(*)
FROM #customers
WHERE LEN(cust_nbr) = 6
OPTION ( QUERYTRACEON 9481); -- Legacy CE
GONow as for the new CE behavior, it looks like this is now visible to the optimizer (which means we can use statistics). I went through the exercise of looking at the calculator output below, and you can look at the associated auto-generation of stats as a pointer:
-- New CE: 1.00007 rows
SELECT COUNT(*)
FROM #customers
WHERE LEN(cust_nbr) = 6
OPTION ( QUERYTRACEON 2312 ); -- New CE
GO
-- View New CE behavior with 2363 (for supported option use XEvents)
SELECT COUNT(*)
FROM #customers
WHERE LEN(cust_nbr) = 6
OPTION (QUERYTRACEON 2312, QUERYTRACEON 2363, QUERYTRACEON 3604, RECOMPILE); -- New CE
GO
/*
Loaded histogram for column QCOL:
[tempdb].[dbo].[#customers].cust_nbr from stats with id 2
Using ambient cardinality 1e+006 to combine distinct counts:
999927
Combined distinct count: 999927
Selectivity: 1.00007e-006
Stats collection generated:
CStCollFilter(ID=2, CARD=1.00007)
CStCollBaseTable(ID=1, CARD=1e+006 TBL: #customers)
End selectivity computation
*/
EXEC tempdb..sp_helpstats '#customers';
--Check out AVG_RANGE_ROWS values (for example - plenty of ~ 1)
DBCC SHOW_STATISTICS('tempdb..#customers', '_WA_Sys_00000001_B0368087');
--That's my Stats name yours is subject to changeUnfortunately the logic relies on an estimate of the number of distinct values, which is not adjusted for the effect of the
LEN function.Possible workaround
You can get a trie-based estimate under both CE models by rewriting the
LEN as a LIKE:SELECT COUNT_BIG(*)
FROM #customers AS C
WHERE C.cust_nbr LIKE REPLICATE('_', 6);Information on Trace Flags used:
- 2363: shows a lot of information, including statistics being loaded.
- 3604: prints the output of DBCC commands to the messages tab.
Code Snippets
-- Legacy CE: 31622.8 rows
SELECT COUNT(*)
FROM #customers
WHERE LEN(cust_nbr) = 6
OPTION ( QUERYTRACEON 9481); -- Legacy CE
GO-- New CE: 1.00007 rows
SELECT COUNT(*)
FROM #customers
WHERE LEN(cust_nbr) = 6
OPTION ( QUERYTRACEON 2312 ); -- New CE
GO
-- View New CE behavior with 2363 (for supported option use XEvents)
SELECT COUNT(*)
FROM #customers
WHERE LEN(cust_nbr) = 6
OPTION (QUERYTRACEON 2312, QUERYTRACEON 2363, QUERYTRACEON 3604, RECOMPILE); -- New CE
GO
/*
Loaded histogram for column QCOL:
[tempdb].[dbo].[#customers].cust_nbr from stats with id 2
Using ambient cardinality 1e+006 to combine distinct counts:
999927
Combined distinct count: 999927
Selectivity: 1.00007e-006
Stats collection generated:
CStCollFilter(ID=2, CARD=1.00007)
CStCollBaseTable(ID=1, CARD=1e+006 TBL: #customers)
End selectivity computation
*/
EXEC tempdb..sp_helpstats '#customers';
--Check out AVG_RANGE_ROWS values (for example - plenty of ~ 1)
DBCC SHOW_STATISTICS('tempdb..#customers', '_WA_Sys_00000001_B0368087');
--That's my Stats name yours is subject to changeSELECT COUNT_BIG(*)
FROM #customers AS C
WHERE C.cust_nbr LIKE REPLICATE('_', 6);Context
StackExchange Database Administrators Q#120064, answer score: 20
Revisions (0)
No revisions yet.