gotchasqlModerate
Why does NOLOCK make a scan with variable assignment slower?
Viewed 0 times
whyscanwithmakeslowerassignmentdoesnolockvariable
Problem
I'm fighting against NOLOCK in my current environment. One argument I've heard is that the overhead of locking slows down a query. So, I devised a test to see just how much this overhead might be.
I discovered that NOLOCK actually slows down my scan.
At first I was delighted, but now I 'm just confused. Is my test invalid somehow? Shouldn't NOLOCK actually allow a slightly faster scan? What's happening here?
Here's my script:
What I've tried that didn't work:
same results)
(same results)
I discovered that NOLOCK actually slows down my scan.
At first I was delighted, but now I 'm just confused. Is my test invalid somehow? Shouldn't NOLOCK actually allow a slightly faster scan? What's happening here?
Here's my script:
USE TestDB
GO
--Create a five-million row table
DROP TABLE IF EXISTS dbo.JustAnotherTable
GO
CREATE TABLE dbo.JustAnotherTable (
ID INT IDENTITY PRIMARY KEY,
notID CHAR(5) NOT NULL )
INSERT dbo.JustAnotherTable
SELECT TOP 5000000 'datas'
FROM sys.all_objects a1
CROSS JOIN sys.all_objects a2
CROSS JOIN sys.all_objects a3
/********************************************/
-----Testing. Run each multiple times--------
/********************************************/
--How fast is a plain select? (I get about 587ms)
DECLARE @trash CHAR(5), @dt DATETIME = SYSDATETIME()
SELECT @trash = notID --trash variable prevents any slowdown from returning data to SSMS
FROM dbo.JustAnotherTable
ORDER BY ID
OPTION (MAXDOP 1)
SELECT DATEDIFF(MILLISECOND,@dt,SYSDATETIME())
----------------------------------------------
--Now how fast is it with NOLOCK? About 640ms for me
DECLARE @trash CHAR(5), @dt DATETIME = SYSDATETIME()
SELECT @trash = notID
FROM dbo.JustAnotherTable (NOLOCK)
ORDER BY ID --would be an allocation order scan without this, breaking the comparison
OPTION (MAXDOP 1)
SELECT DATEDIFF(MILLISECOND,@dt,SYSDATETIME())What I've tried that didn't work:
- Running on different servers (same results, servers were 2016-SP1 and 2016-SP2, both quiet)
- Running on dbfiddle.uk on different versions (noisy, but probably
same results)
- SET ISOLATION LEVEL instead of hints (same results)
- Turning off lock escalation on the table (same results)
- Examining actual execution time of the scan in the actual query plan
(same results)
- Recompile h
Solution
NOTE: this might not be the type of answer you're looking for. But perhaps it will be helpful to other potential answerers as far as providing clues as where to start looking
When I run these queries under ETW tracing (using PerfView), I get the following results:
So the difference is 51ms. This is pretty dead on with your difference (~50ms). My numbers are slightly higher overall because of the profiler sampling overhead.
Finding the difference
Here's a side-by-side comparison showing that the 51ms difference is in the
Plain select is on the left at 332 ms, while the nolock version is on the right at 383 (51ms longer). You can also see that the two code paths differ in this way:
-
Plain
-
Using
This shows that there is some branching in the
Why does the
The nolock branch actually spends less time calling into
So what's the other 13ms (51ms total - 38ms accounted for so far) of overhead in
Interface dispatch
I thought this was more of a curiosity than anything, but the nolock version appears to incur some interface dispatch overhead by calling into the Windows API method
I should probably run this trace again with more iterations of the query, I think there are some small things, like hardware interrupts, that were not picked up by the 1ms sample rate of PerfView
Outside of that method, I noticed another small difference that causes the nolock version to run slower:
Releasing Locks
The nolock branch appears to more aggressively run the
The plain select is on top, at 12ms, while the nolock version is on the bottom at 26ms (14ms longer). You can also see in the "When" column that the code was executed more frequently during the sample. This may be an implementation detail of nolock, but it seems to introduce quite a bit of overhead for small samples.
There are lots of other small differences, but those are the big chunks.
When I run these queries under ETW tracing (using PerfView), I get the following results:
Plain - 608 ms
NOLOCK - 659 msSo the difference is 51ms. This is pretty dead on with your difference (~50ms). My numbers are slightly higher overall because of the profiler sampling overhead.
Finding the difference
Here's a side-by-side comparison showing that the 51ms difference is in the
FetchNextRow method in sqlmin.dll:Plain select is on the left at 332 ms, while the nolock version is on the right at 383 (51ms longer). You can also see that the two code paths differ in this way:
-
Plain
SELECTsqlmin!RowsetNewSS::FetchNextRowcalls
sqlmin!IndexDataSetSession::GetNextRowValuesInternal
-
Using
NOLOCKsqlmin!RowsetNewSS::FetchNextRowcalls
sqlmin!DatasetSession::GetNextRowValuesNoLockwhich calls either
sqlmin!IndexDataSetSession::GetNextRowValuesInternalor
kernel32!TlsGetValue
This shows that there is some branching in the
FetchNextRow method based on the isolation level / nolock hint.Why does the
NOLOCK branch take longer?The nolock branch actually spends less time calling into
GetNextRowValuesInternal (25ms less). But the code directly in GetNextRowValuesNoLock (not including methods it calls AKA the "Exc" column) runs for 63ms - which is the majority of the difference (63 - 25 = 38ms net increase in CPU time).So what's the other 13ms (51ms total - 38ms accounted for so far) of overhead in
FetchNextRow?Interface dispatch
I thought this was more of a curiosity than anything, but the nolock version appears to incur some interface dispatch overhead by calling into the Windows API method
kernel32!TlsGetValue via kernel32!TlsGetValueStub - a total of 17ms. The plain select appears to not go through the interface, so it never hits the stub, and only spends 6ms on TlsGetValue (a difference of 11ms). You can see this above in the first screenshot.I should probably run this trace again with more iterations of the query, I think there are some small things, like hardware interrupts, that were not picked up by the 1ms sample rate of PerfView
Outside of that method, I noticed another small difference that causes the nolock version to run slower:
Releasing Locks
The nolock branch appears to more aggressively run the
sqlmin!RowsetNewSS::ReleaseRows method, which you can see in this screenshot:The plain select is on top, at 12ms, while the nolock version is on the bottom at 26ms (14ms longer). You can also see in the "When" column that the code was executed more frequently during the sample. This may be an implementation detail of nolock, but it seems to introduce quite a bit of overhead for small samples.
There are lots of other small differences, but those are the big chunks.
Code Snippets
Plain - 608 ms
NOLOCK - 659 msContext
StackExchange Database Administrators Q#207422, answer score: 14
Revisions (0)
No revisions yet.