How can row estimates be improved in order to reduce chances of spills to tempdb?
Problem
I notice that when there are spill-to-tempdb events (causing slow queries), the row estimates are often way off for a particular join. I've seen spill events occur with merge and hash joins, and they often increase the runtime 3x to 10x. This question concerns how to improve row estimates, under the assumption that doing so will reduce the chance of spill events.
The actual number of rows is 40k.
For this query, the plan shows a bad row estimate (11.3 rows):
select Value
from Oav.ValueArray
where ObjectId = (select convert(bigint, Value) NodeId
from Oav.ValueArray
where PropertyId = 3331
and ObjectId = 3540233
and Sequence = 2)
and PropertyId = 2840
option (recompile);
For this query, the plan shows a good row estimate (56k rows):
declare @a bigint = (select convert(bigint, Value) NodeId
from Oav.ValueArray
where PropertyId = 3331
and ObjectId = 3540233
and Sequence = 2);
select Value
from Oav.ValueArray
where ObjectId = @a
and PropertyId = 2840
option (recompile);
Can statistics or hints be added to improve the row estimates for the first case? I tried adding statistics with particular filter values (property = 2840), but either I could not get the combination correct, or perhaps it is being ignored because the ObjectId is unknown at compile time and it might be choosing an average over all ObjectIds.
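For illustration (not part of the original question), a filtered-statistics attempt of the kind described might look like the sketch below; the statistics name is hypothetical, and the optimizer may still ignore the statistics when ObjectId is unknown at compile time.
-- Illustrative sketch only: filtered statistics on ObjectId, restricted to PropertyId = 2840.
-- The statistics name is made up for this example.
CREATE STATISTICS ValueArray_Property2840_Stats
ON Oav.ValueArray (ObjectId)
WHERE PropertyId = 2840;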
Is there any mode where it would run the probe query first and then use that result to determine the row estimates, or must it fly blindly?
This particular property has many values (40k) on a few objects and zero on the vast majority. I would be happy with a hint where the max expected number of rows for a given join could be specified. This is a generally haunting problem, because some parameters may be determined dynamically as part of the join or would be better …
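As an illustrative aside (not part of the original question or answer), one pattern sometimes used to steer the optimizer toward a chosen maximum is a TOP with a variable combined with OPTIMIZE FOR; the 50000 figure and the variable names below are assumptions for the sketch.
-- Sketch only: the optimizer plans for @limit = 50000 rows (a row goal),
-- while the runtime value of @limit is the maximum bigint, so no rows are cut off.
declare @a bigint = (select convert(bigint, Value)
from Oav.ValueArray
where PropertyId = 3331
and ObjectId = 3540233
and Sequence = 2);
declare @limit bigint = 9223372036854775807;
select top (@limit) Value
from Oav.ValueArray
where ObjectId = @a
and PropertyId = 2840
option (optimize for (@limit = 50000));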
Solution
I won't comment on spills, tempdb or hints, because the query seems too simple to need that much consideration. I think SQL Server's optimizer will do its job quite well if there are indexes suited to the query.
And your splitting of the query into two is good, as it shows which indexes will be useful. The first part:
(select convert(bigint, Value) NodeId
from Oav.ValueArray
where PropertyId = 3331
and ObjectId = 3540233
and Sequence = 2)
needs an index on (PropertyId, ObjectId, Sequence) including the Value. I'd make it UNIQUE to be safe. The query would throw an error at runtime anyway if more than one row were returned, so it's good to ensure in advance that this can't happen, with a unique index:
CREATE UNIQUE INDEX
PropertyId_ObjectId_Sequence_UQ
ON Oav.ValueArray
(PropertyId, ObjectId, Sequence) INCLUDE (Value);
The second part of the query:
select Value
from Oav.ValueArray
where ObjectId = @a
and PropertyId = 2840
needs an index on (PropertyId, ObjectId) including the Value:
CREATE INDEX
PropertyId_ObjectId_IX
ON Oav.ValueArray
(PropertyId, ObjectId) INCLUDE (Value);
If efficiency is not improved, or these indexes are not used, or there are still differences in the row estimates, then there would be a need to look further into this query.
In that case, the conversions (needed because of the EAV design and the storing of different datatypes in the same columns) are a probable cause, and your solution of splitting the query into two parts (as @Aaron Bertrand and @Paul White comment) seems natural and the way to go. A redesign so that different datatypes are stored in their respective columns might be another.
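As a hedged follow-up sketch (not part of the original answer), the "indexes were not used" case can be checked after running the workload, using the index usage DMV; the index names match the ones created above.
-- Sketch: report seek/scan/lookup counts for the two new indexes.
select i.name as index_name,
s.user_seeks, s.user_scans, s.user_lookups, s.last_user_seek
from sys.indexes as i
left join sys.dm_db_index_usage_stats as s
on s.object_id = i.object_id
and s.index_id = i.index_id
and s.database_id = db_id()
where i.object_id = object_id(N'Oav.ValueArray')
and i.name in (N'PropertyId_ObjectId_Sequence_UQ', N'PropertyId_ObjectId_IX');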
Context
StackExchange Database Administrators Q#43599, answer score: 7
Revisions (0)
No revisions yet.