Worker threads issue: How to read this crash dump, and what to do about it?
Problem
Our production SQL Server froze up yesterday. We had to kill the Windows process to get it to restart.
Microsoft SQL Server 2016 (SP1-CU1) (KB3208177) - 13.0.4411.0 (X64)
Jan 6 2017 14:24:37
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows Server 2012 R2 Standard 6.3
(Build 9600: ) (Hypervisor)
The event viewer says this at the time of the crash:
New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%%. System Idle: 99%%.
I found this article (from 2010) that gives a few tips on how to read crash dumps. I looked at the crash dump from our server and found thousands of blocks that all look like this:
2376 Id: 6bc.84c8 Suspend: 0 Teb: 00007ff7f74d4000 Unfrozen
 # Child-SP         RetAddr          Call Site
00 000000b55938bcd8 00007fffd22e6d8e ntdll!NtSignalAndWaitForSingleObject+0xa
01 000000b55938bce0 00007fffc1944b99 KERNELBASE!SignalObjectAndWait+0xc8
02 000000b55938bd90 00007fffc19414dc sqldk!SOS_Scheduler::Switch+0x106
03 000000b55938c080 00007fffc4c5185f sqldk!SOS_Scheduler::SuspendNonPreemptive+0xd3
04 000000b55938c0c0 00007fffc4debd67 sqlmin!EventInternal::Wait+0x1e7
05 000000b55938c110 00007fffc4debb61 sqlmin!LockOwner::Sleep+0x49a
06 000000b55938c210 00007fffc4c57f1c sqlmin!lck_lockInternal+0xfd3
07 000000b55938cab0 00007fffc4c5ef3a sqlmin!GetLock+0x1d9
08 000000b55938cb80 00007fffc4ecc841 sqlmin!BTreeRow::AcquireLock+0x212
09 000000b55938cc90 00007fffc4c5f20b sqlmin!IndexRowScanner::AcquireNextRowLock+0xf6
0a 000000b55938ccd0 00007fffc4c774fa sqlmin!IndexDataSetSession::GetNextRowValuesInternal+0x12e6
0b 000000b55938cf60 00007fffc4cb9dc2 sqlmin!RowsetNewSS::FetchNextRow+0x1d9
0c 00
Solution
Community Wiki answer generated from comments on this and the previous question.
sp_BlitzErik: Threadpool waits can happen for reasons other than just long running queries. Number of connections, number of simultaneous queries (especially if going parallel), and number of background tasks can also contribute, and that's just within SQL.
Enable the Remote DAC, and run sp_WhoIsActive the next time it happens. You may have to use @show_sleeping_spids to see connection pooling issues.
Having too few CPUs for your workload, incorrect parallelism settings, and long blocking chains are common causes.
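To make the CPU point concrete: when the "max worker threads" option named in the event log message is left at its default of 0, SQL Server sizes the worker pool from the number of logical CPUs. The sketch below reproduces the documented default for 64-bit instances with up to 64 logical CPUs. The function name is made up for this example, larger machines are deliberately left out because the multiplier there depends on the version, and the value the server actually uses should be read from max_workers_count in sys.dm_os_sys_info rather than taken from this calculation.

```python
def default_max_worker_threads(logical_cpus: int) -> int:
    """Default worker-thread ceiling for a 64-bit instance when
    'max worker threads' is 0, for 1 to 64 logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if logical_cpus > 64:
        # Assumption: out of scope for this sketch, because the
        # multiplier for larger machines differs between versions.
        raise ValueError("this sketch only covers up to 64 logical CPUs")
    if logical_cpus <= 4:
        return 512
    # Each logical CPU beyond the fourth adds 16 workers.
    return 512 + (logical_cpus - 4) * 16
```

For example, default_max_worker_threads(8) gives 576, so an 8-CPU server with a few hundred blocked sessions plus parallel queries can plausibly run out of workers.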
Shanky: For dump analysis you should contact MS support, as normal blocking does not cause stack dumps.
Deadlocked schedulers are always a SQL Server bug, which should be reported to Microsoft.
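While waiting on a support case, a first pass over a dump with thousands of near-identical thread stacks can be done by grouping the threads by the call sites they share. In the stack quoted in the question, frames such as LockOwner::Sleep and lck_lockInternal suggest that the thread is waiting on a lock, and a count per distinct stack shows how many threads are in the same state. This is a minimal sketch, assuming the stacks have been saved as text in the layout shown in the question (a thread header line followed by numbered frame lines); the function names are hypothetical and not part of any debugger tooling.

```python
import re
from collections import Counter

# Thread header, e.g. "2376 Id: 6bc.84c8 Suspend: 0 Teb: ... Unfrozen".
THREAD_RE = re.compile(
    r"^[\s.#]*\d+\s+Id:\s+[0-9a-f]+\.[0-9a-f]+\s", re.IGNORECASE
)
# Frame line: frame number, Child-SP, RetAddr, then the call site.
FRAME_RE = re.compile(
    r"^\s*[0-9a-f]{2,}\s+[0-9a-f`]+\s+[0-9a-f`]+\s+(.+?)\s*$", re.IGNORECASE
)


def strip_offset(call_site):
    """Drop the '+0x...' offset so identical functions compare equal."""
    return call_site.split("+", 1)[0]


def group_stacks(dump_text, depth=8):
    """Count threads by their first `depth` call sites.

    Threads for which no frame line could be parsed are skipped.
    """
    counts = Counter()
    current = None
    for line in dump_text.splitlines():
        if THREAD_RE.match(line):
            if current:
                counts[tuple(current[:depth])] += 1
            current = []
            continue
        match = FRAME_RE.match(line)
        if match and current is not None:
            current.append(strip_offset(match.group(1)))
    if current:
        counts[tuple(current[:depth])] += 1
    return counts


def summarize(dump_text, top=5, depth=8):
    """Return one line per distinct stack, most common first."""
    return [
        f"{n} thread(s): " + " <- ".join(stack)
        for stack, n in group_stacks(dump_text, depth).most_common(top)
    ]
```

Calling summarize() on the saved text returns the most common stacks with their thread counts. If nearly every thread ends in the same lock-wait frames, that points toward a long blocking chain, which sp_WhoIsActive over the DAC can then confirm on the live server.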
Context
StackExchange Database Administrators Q#168481, answer score: 3