patternsqlMinor
Diagnosing a block in a process that EXECs several layers of stored procedures
Viewed 0 times
storedproceduresprocessblockexecsseveralthatlayersdiagnosing
Problem
Disclaimer: I am a developer. Please be nice to me. I am not the developer that is responsible for what follows. I try to be one of the good ones.
I have inherited a support ticket that involves a specific process causing a block on the client's SQL Server 2008R2 installation. I can trigger the block at any time on the client's server, but cannot reproduce it anywhere else. We even went so far as to create a virtual server with exactly the same hardware stats as the client's server, restored the exact same database to the exact same SQL Server setup, but no dice--can't replicate it. The process itself is ugly: a stored procedure is called, which then executes several other stored procedures, all using named transactions, some nested in cursors. The process follows this pseudocode:
Sorry...not sure how else to describe it.
There is no
When I look at the process in Activity Monitor, the Task State is
If I run
I have inherited a support ticket that involves a specific process causing a block on the client's SQL Server 2008R2 installation. I can trigger the block at any time on the client's server, but cannot reproduce it anywhere else. We even went so far as to create a virtual server with exactly the same hardware stats as the client's server, restored the exact same database to the exact same SQL Server setup, but no dice--can't replicate it. The process itself is ugly: a stored procedure is called, which then executes several other stored procedures, all using named transactions, some nested in cursors. The process follows this pseudocode:
sp_Outermost (named transaction Trans_Outermost)
sp_Nested1 (Trans_Nested1)
sp_Nested2 (Trans_Nested2)
sp_Nested3 (Trans_Nested3)
sp_Nested3_1 (Trans_Nested3_1)
sp_Nested3_1_1 (Trans_Nested3_1_1)
sp_Nested3_1_1_1 (Trans_Nested3_1_1_1)
sp_Nested3_1_1_1_1 (Trans_Nested3_1_1_1_1)Sorry...not sure how else to describe it.
There is no
TRY-CATCH logic in any of the stored procedures, though there is some "custom" error handling that involves GOTOs and setting an "error number" (more on that later). When I look at the process in Activity Monitor, the Task State is
RUNNING, Command is SELECT, and the Wait Type is ASYNC_NETWORK_IO.If I run
DBCC OPENTRAN or look at sys.dm_tran_session_transactions and sys.dm_tran_active_transactions, it lists the outermost transaction (Trans_Outermost) as being the open transaction. However, when I run a query against sys.dm_exec_connections and sys.dm_exec_sessions, I am informed that the query being executed is actually sp_Nested3_1_1_1. This is always the case. Further, running a query ganked from this answer, I see that the statement being waiteSolution
Q. So, my question is, how can I figure out what the root cause of this blocking is?
You may find it's one of the old standards:
But it's going to be difficult if not impossible to trace this down in Activity Monitor alone. If you can withstand making some basic modifications on the client server I think you should focus on capturing what is blocking there.
-
It can identify blocking chains and locks and output to a table. But first you need to work out the table definition and create the table (in this case we'll put it in master too but a special tools database would be best practice). Here's a snippet adapted from Kendra Little.
-
Exec sp_configure 'blocked process threshold (s)'. If it's set to 0 then set it to something like 15 (seconds) and run Reconfigure. This is usually safe, but the standard buyer beware warnings apply. Don't set it much lower than this and if you set it to 30s you might miss out on things because 30s is the ADO.NET default timeout.
-
Create an Agent job which runs sp_WhoIsActive @get_transaction_info = 1, @get_plans = 1, @find_block_leaders = 1, @destination_table = 'BLOCKED_PROCESS_REPORT'
-
Create an Agent WMI alert on, type WMI event alert, with a Query of SELECT * FROM BLOCKED_PROCESS_REPORT, and a Response that executes the above job.
-
Test it in two sessions (BEGIN TRAN, INSERT into table, then DELETE from table in the other session, and wait to see that your BLOCKED_PROCESS_REPORT table begins getting populated with data about 30s later).
Now you sit back and wait. Once the problem happens again you'll have a bunch of detailed information in BLOCKED_PROCESS_REPORT about what's blocking what,
in what order, and what locks are taken, and can go from there.
Remember to clean up these things once you're done.
You may find it's one of the old standards:
- A concurrency issue with the code.
- A maintenance job doing a rebuild.
- A user connecting directly to the database for reporting (or Excel!) causing blocking.
But it's going to be difficult if not impossible to trace this down in Activity Monitor alone. If you can withstand making some basic modifications on the client server I think you should focus on capturing what is blocking there.
- Put Adam Mechanic's sp_WhoIsActive on the system. If you don't have a database just for holding tools, best put this in master so you can run it anywhere.
-
It can identify blocking chains and locks and output to a table. But first you need to work out the table definition and create the table (in this case we'll put it in master too but a special tools database would be best practice). Here's a snippet adapted from Kendra Little.
DECLARE @destination_table VARCHAR(4000) ;
SET @destination_table = 'BLOCKED_PROCESS_REPORT' ;
DECLARE @schema VARCHAR(4000) ;
EXEC sp_WhoIsActive @get_transaction_info = 1, @get_plans = 1, @find_block_leaders = 1, @RETURN_SCHEMA = 1, @SCHEMA = @schema OUTPUT ;
SET @schema = REPLACE(@schema, '', @destination_table) ;
PRINT @schema
EXEC(@schema) ;
-
Exec sp_configure 'blocked process threshold (s)'. If it's set to 0 then set it to something like 15 (seconds) and run Reconfigure. This is usually safe, but the standard buyer beware warnings apply. Don't set it much lower than this and if you set it to 30s you might miss out on things because 30s is the ADO.NET default timeout.
-
Create an Agent job which runs sp_WhoIsActive @get_transaction_info = 1, @get_plans = 1, @find_block_leaders = 1, @destination_table = 'BLOCKED_PROCESS_REPORT'
-
Create an Agent WMI alert on, type WMI event alert, with a Query of SELECT * FROM BLOCKED_PROCESS_REPORT, and a Response that executes the above job.
-
Test it in two sessions (BEGIN TRAN, INSERT into table, then DELETE from table in the other session, and wait to see that your BLOCKED_PROCESS_REPORT table begins getting populated with data about 30s later).
Now you sit back and wait. Once the problem happens again you'll have a bunch of detailed information in BLOCKED_PROCESS_REPORT about what's blocking what,
in what order, and what locks are taken, and can go from there.
Remember to clean up these things once you're done.
Context
StackExchange Database Administrators Q#140046, answer score: 4
Revisions (0)
No revisions yet.