patternsqlMinor
Suddenly SQL server restarted with pending task count increased
Viewed 0 times
sqlwithincreasedsuddenlyrestartedpendingservercounttask
Problem
We are trying to understand the issue where SQL server was suddenly restarted on its own-
Its a SQL server 2012 SP4 -GDR applied with 40 CPu's HT enabled , thus total of 80 logical processors-
MAXDOP=8 , CTOP 5
From error logs we found
/**/
BEGIN stack Dump
Non- Yielding Scheduler /**/
Error messages just at the time of restart of SQL. Also there was no mini dump created as checked.
Yes we have quite some queries running at that time. Top 3 waits seen were:-
However we also noticed wait called
At 19:46
At 19:51
At 19:51 we started seeing restart happened.
Question is why there would be a threadpool wait if almost 2000 workers are still available from above calculation. Also why those pending task count 165 when queries have so many schedulers available to run and complete the request for bunch of queries waiting on CXPACKETS?
Edit: Updating my question for couple of waits also seen from XE
As i am reading info on above wait here https://www.sqlskills.com/help/waits/sos_memory_toplevelblockallocator/
Based on the blog there is a fix in SP3 CU but with TF T8075 : I am currently on SP4-GDR latest patch, do i still need to apply the TF, though i do not see any messages like
Not sure if it helps- i see total server memory dropping a gig now and then and then matching upto target server memory which is = max server memory (750GB)
Its a SQL server 2012 SP4 -GDR applied with 40 CPu's HT enabled , thus total of 80 logical processors-
MAXDOP=8 , CTOP 5
From error logs we found
/**/
BEGIN stack Dump
Non- Yielding Scheduler /**/
Error messages just at the time of restart of SQL. Also there was no mini dump created as checked.
Yes we have quite some queries running at that time. Top 3 waits seen were:-
1.TranLogIO
2.CXPACKET
3.PAGELATCH SHHowever we also noticed wait called
SOS WORKER as collected from system health XE which i believe is nothing else but THREADPOOL. Therefore i went further and analyse the query processing details from that system health and found at the time of non yielding scheduler errors below was loggedAt 19:46
maxworkers- 2944 workers created 789 oldest pending taskwait time 0 pending task 4At 19:51
maxworkers- 2944 workers created 982 oldest pending taskwait time 256987 pending task 165At 19:51 we started seeing restart happened.
Question is why there would be a threadpool wait if almost 2000 workers are still available from above calculation. Also why those pending task count 165 when queries have so many schedulers available to run and complete the request for bunch of queries waiting on CXPACKETS?
Edit: Updating my question for couple of waits also seen from XE
SOS_MEMORY_TOPLEVELBLOCKALLOCATORAs i am reading info on above wait here https://www.sqlskills.com/help/waits/sos_memory_toplevelblockallocator/
Based on the blog there is a fix in SP3 CU but with TF T8075 : I am currently on SP4-GDR latest patch, do i still need to apply the TF, though i do not see any messages like
Failed allocate pages: FAIL_PAGE_ALLOCATION 513 in error logNot sure if it helps- i see total server memory dropping a gig now and then and then matching upto target server memory which is = max server memory (750GB)
Solution
I would start by saying that best person to analyze dump in MS guy or person who knows about it. I would just try to point out some basics from the log you posted
At 19:46 maxworkers- 2944 workers created 789 oldest pending taskwait time 0 pending task 4
At 19:51 maxworkers- 2944 workers created 982 oldest pending taskwait time 256987 pending task 165
Please see that
Also note a thread is assigned a scheduler and it should run on that scheduler and that is why so many threads were waiting on this hung scheduler.
At 19:46 maxworkers- 2944 workers created 789 oldest pending taskwait time 0 pending task 4
At 19:51 maxworkers- 2944 workers created 982 oldest pending taskwait time 256987 pending task 165
Please see that
taskwait time 256987 and pending task 165. This means the scheduler was hung and 165 tasks were waiting on it to get a scheduler and run. In this case MS SQL Server was not able to get out of this hung scheduler scenario and waited for a while but again it decided that it would be best to restart itself to come out of this hung scheduler scenario and hence it restarted. Now for reason why it hang is beyond my ability to tell with the information you have posted.Also note a thread is assigned a scheduler and it should run on that scheduler and that is why so many threads were waiting on this hung scheduler.
Context
StackExchange Database Administrators Q#275895, answer score: 3
Revisions (0)
No revisions yet.