HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Reoccurring performance issues at the server level

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thereoccurringissueslevelperformanceserver

Problem

I am facing odd performance issues at one of my servers that is using SQL Server 2019 Standard Edition.

Users started complaining that the application is very slow/unresponsive. To check what is going on I logged in to SQL (without any issues) and I run sp_whoisactive without any parameters. It showed that my session is the only active one. So, I run sp_blitzfirst, also without any parameters and.... it took 90s it to be completed. The results were not super interesting. I have 0 warnings with priority between 1 and 199. The top wait pointed out as that script was PAGEIOLATEH_SH which was showing 84s of wait time (which btw is strange, because I have 8 cores, that script should compare snapshots took 5s apart so maximal possible wait time should be 40s, isn't it?).

When sp_blitzfirst was running I tried to check what is blocking that with sp_whoisactive and it was also showing that my sessions are the only one and it was showing in wait_info column
(9ms)PAGEIOLATCH_SH:MMLIVE:1(*).

I needed to do some googling/reading to check what that wait type means and when I went back to the server after ~30 minutes the issue was gone (for example sp_blitzfirst was returning results within 6s).

It seems that it was not one-time incident as I was advised that it happened at least twice within last two weeks when I was not around.

So, my understanding is that performance issues are caused by slow SQL server (because at the time when they were reported even diagnostic queries that I run directly on the server were extremely slow). But at the same time, it cannot be caused by any process that is running directly on the database (because I would see that process with sp_whoisactive).... so, it has to be VM or hardware level. My understanding of the wait type that was top 1 during that time is that SQL struggling to pull data from disk to memory. So, with our sysadmin we checked VM statistics but there was absolutely nothing unusual. We have checked IO statist

Solution

PAGEIOLATCH_SH is a wait type related to your disk, specifically waiting on a data page to be loaded off your disk. Based on the symptoms you've described, with nothing apparently running in the SQL Server instance, I would agree it's likely either a hardware issue or something else that was running on your server (outside of the SQL Server instance).

I had a bad disk issue occur in a VM (bug from VMWare) not too long ago that was hard to detect by looking at any I/O stats within SQL Server. But what ultimately helped me determine the issue was to use CrystalDiskMark to benchmark the disk itself directly on the server, to take my SQL Server instance out of the equation. So this is my first go-to whenever I suspect something wonky happening with my Disk outside of my SQL Server instance.

Outside of that, I would check if there's any disk-heavy operations that can be happening directly on the server too, e.g. anti-virus, server level backups / snapshots, Windows Task Scheduler jobs, etc.

The next time this happens, you can also open the Windows Resource Monitor which will show you what's consuming the most Reads and Writes on your Disk, in realtime.

For reference, this was my adventure where CrystalDiskMark helped me solve my problem. Further information, suggestions, and resolution in the chat.

Context

StackExchange Database Administrators Q#318225, answer score: 5

Revisions (0)

No revisions yet.