HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlModerate

Strange performance problem with SQL Server 2016

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
problemwithsqlstrangeperformanceserver2016

Problem

We have a single instance of SQL Server 2016 SP1 running in a VMware virtual machine. It contains 4 databases, each for a different application. Those applications are all on separate virtual servers. None of them are in production use yet. People testing the applications are reporting performance issues, though.

These are the stats of the server:

  • 128 GB RAM (110GB Max memory for SQL Server)



  • 4 Cores @4.6 GHz



  • 10 GBit network connection



  • All the storage is SSD based



  • Program files, Log files, database files and tempdb are on separate partitions of the server



  • asd



The users are performing single screen access via a C++ based ERP application.

When I stress test the SQL Server with Microsoft's ostress using either many small queries or a big query, I get max performance. Only thing throttling is the client, because he can't answer fast enough.

But when there are barely any users, the SQL Server is barely doing anything. Yet people have to wait forever just to save anything in the application.

According to Paul Randal's "Tell me where it hurts" query, 50% of all wait events are ASYNC_NETWORK_IO.

This could either mean a network issue, or performance issue with the application server or client. Neither of those are even remotely using their resources at maximum capacity. Most of the time CPU is around 26% on all machines (Client, appserver, db server).

Latency of network connection is around 1-3ms. The IO of the db server is at max 20MB/s write speed during normal usage with the application (avg is 7-9MB/s). When I stress test, I get around max 5GB/s.

Buffer cache size is at 60GB for the DB of our ERP system, 20GB for our financing software, 1GB for quality assurance software, 3GB for document archiving system.

I gave the SQL Server account the right to use Instant File Initialization. That didn't increase performance in the slightest.

Page life expectancy is at around 15k+ during normal use. Drops to around .05k during the end

Solution

If your primary wait is ASYNC_NETWORK_IO, then the issue is not with SQL Server. It is almost always due to an application bottleneck. I don't mean a bottleneck on the application server, but rather a bottleneck in the application.

The application bottleneck is usually because of row-by-row processing while SQL Server is sending the data:

  • The application is requesting data from SQL Server



  • SQL Server is sending the data fast



  • The application is telling SQL Server to wait while it processes each row



  • SQL Server records waiting time on ASYNC_NETWORK_IO while the application is telling it to wait



Instead of that, the application needs to consume all of the data from SQL Server and THEN do its row-by-row processing. SQL Server is out of the picture at that point.
sp_BlitzFirst output

The LCK_M_S wait is not high. Only 2 seconds of the 30-second sample are on it, and its average is only 400ms. That is very, very unlikely to be the problem. ASYNC_NETWORK_IO is your top wait in that sample. Still an application issue. If you want help with the LCK stuff, we'd need to see the queries involved.

Even ASYNC_NETWORK_IO isn't that bad in that sample. My eyes get big when the waiting time is equal to or greater than the sample size. That's when I dig in.

Your entire issue is ASYNC_NETWORK_IO. This is not a SQL Server problem. It's a problem with either the application (doing row-by-row processing while SQL Server is sending the data), the application server (you already said it's fine) or the network (you've said that the network is fine). So the issue is with the application. The C++ app needs to be fixed.

Context

StackExchange Database Administrators Q#186597, answer score: 18

Revisions (0)

No revisions yet.