HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Major performance problems on our production SQL Server, how would I troubleshoot this?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thistroubleshootmajorproductionsqlwouldperformancehowserverproblems

Problem

This question is basically the a follow up question to this question:

Strange performance problem with SQL Server 2016

We now went productive with this system. Though another application database was added to this SQL Server since my last post.

these are the system stats:

  • 128 GB RAM (110GB Max memory for SQL Server)



  • 4 Cores @2.6 GHz



  • 10 GBit network connection



  • All the storage is SSD based



  • Program files, Log files, database files and tempdb are on separate


partitions of the server

  • Windows Server 2012 R2



  • VMware Version HPE-ESXi-6.0.0-Update3-iso-600.9.7.0.17



  • VMware Tools version 10.0.9, build 3917699



  • Microsoft SQL Server 2016 (SP1) (KB3182545) - 13.0.4001.0 (X64) Oct 28 2016 18:17:30 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 (Build 9600: ) (Hypervisor)



Our system now has major performance issues. Very High CPU usage and thread counts:

Wait stats of activity monitor (I know it is not very reliable)

Results of sp_blitzfirst:

Results of sp_configure:

Advanced server settings (unfortunalty only in german)

MAXDOP Setting was changed by me.

I'm aware that this propably isn't a issue with the SQL Server itself. Its propably an issue with either virtualization (vmware), network related (I already tested this) or the application itself. I just want to nail it down even further.

Would high ASYNC_NETWORK_IO result in a high thread count for the sqlserver process? I'd imagine it spwan many workers because threads can't be closed. Is that right?

I will provide any additional info you need. Thanks in advance for your support!

EDIT:

Result of sp_Blitz @OutputType = ‘markdown’, @CheckServerInfo = 1

Priority 1: Backup:

  • Backing Up to Same Drive Where Databases Reside - 5 backups done on drive E:\ in the last two weeks, where database files also live. This represents a serious risk if that array fails.



Priority 1: Reliability:

-
Last good DBCC CHECKDB over 2 week

Solution

As discussed the last time you asked this question, your top wait is ASYNC_NETWORK_IO. SQL Server is sitting around waiting for the machine on the other end of the pipe to digest the next row of query results.

I got this info from the waits stats results of sp_Blitz (thanks for pasting that in):


1 - ASYNC_NETWORK_IO - 225.9 hours of waits, 143.5 minutes average
wait time per hour, 0.2% signal wait, 2146022 waiting tasks, 378.9 ms
average wait time.

Don't go off troubleshooting CPU threads - that's not related. Focus on your primary wait type and things that would cause that wait type.

To troubleshoot this further, run sp_WhoIsActive or sp_BlitzFirst (disclaimer: I'm one of the authors of that) - both of which will list the queries that are running currently. Look at the wait info column, find the queries waiting for ASYNC_NETWORK_IO, and look at the apps & servers they're running from.

From there, you can try:

  • Checking to see if those app servers are underpowered (like if they're maxed out on CPU, or paging to disk) and tune them



  • Working with the app developers to see if they're doing row-by-row processing on the results (like for every row that comes back from SQL Server, the app goes off and does some processing before asking for the next row of results)



  • Working with the app developers to select less data (like less rows or less columns if they don't need all the data - sometimes you see this when folks accidentally do a SELECT * and bring back more data than they needed, or they ask for all the rows when they only really need the top 1000)



Update with sp_WhoIsActive - in the sp_WhoIsActive screenshot you posted, you've got a couple of queries that are waiting on ASYNC_NETWORK_IO. For those, refer to the above instructions.

In the remainder of the queries, look at the "status" column of sp_WhoIsActive - the majority of them are "sleeping." That means they're not working at all - they're waiting for the apps on the other end of the pipe to send their next command. They have transactions open (see the "open_tran_count" column) but there's nothing SQL Server can do to speed up a sleeping transaction. These queries have been open for over forty minutes (the first column in sp_WhoIsActive. They're just not doing anything anymore. You've gotta get those folks to commit their transactions and close their connections. This isn't a performance tuning issue.

Everything we're seeing here points to a scenario where we're waiting on the app.

Context

StackExchange Database Administrators Q#187747, answer score: 18

Revisions (0)

No revisions yet.