HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

IO_STALL question and understanding

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
andunderstandingquestionio_stall

Problem

I am collecting IO_STALLS from sys.dm_io_virtual_file_stats every 5minutes and then doing a delta to see which files are being affected most by IO.

In one 5min period I get a delta of 5826331 ms which is 97minutes.

I am a little confused by this, is this saying that an operation started 97mins ago only just finished at that point and hence recorded that wait time?

Thanks

Added code as requested:

```
/*

USE [SysDBA]
GO
*/
/ Object: Table [dbo].[DISKIOPS] Script Date: 04/07/2013 11:40:15 /
/*
DROP TABLE [dbo].[DISKIOPS]
GO
*/
--Create the table
/ Object: Table [dbo].[DISKIOPS] Script Date: 04/07/2013 11:40:15 /
/*
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].DISKIOPS NULL,
[NAME] varchar NULL,
[FILE_ID] [int] NULL,
[DB_FILE_TYPE] varchar NULL,
[DISK] varchar NULL,
[FILE_LOCATION] varchar NULL,
[TIMESTAMP] [datetime] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

*/

--Capture IO information from DMV and query to find deltas over time.
/*
USE [SysDBA]
GO

INSERT INTO [dbo].[DISKIOPS]
([IO_STALL]
,[IO_STALL_READ_MS]
,[IO_STALL_WRITE_MS]
,[NUM_OF_READS]
,[NUM_OF_WRITES]
,[SIZE_ON_DISK_MB]
,[DBNAME]
,[NAME]
,[FILE_ID]
,[DB_FILE_TYPE]
,[DISK]
,[FILE_LOCATION]
,[TIMESTAMP])
SELECT a.io_stall, a.io_stall_read_ms, a.io_stall_write_ms, a.num_of_reads,
a.num_of_writes,
--a.sample_ms, a.num_of_bytes_read, a.num_of_bytes_written,
( ( a.size_on_disk_bytes / 1024 ) / 1024.0 ) AS size_on_disk_mb,
db_name(a.database_id) AS dbname,
b.nam

Solution

Question comment pasted below:


io_stall by itself doesn't mean too much. If in 10 seconds you had 1000 operations stalling for 1 second each, you'll have 1000 seconds of stalls. That would be 16+ minutes of stalls in 10 seconds. You need to correlate this with IO operations...

The above is a pretty good example of how you can see monumental and seemingly exaggerated numbers. By itself, io_stall doesn't really mean anything. You need to know the scale of I/O operations for that cumulative stall.

Instead of having this:

SELECT MAX([IO2].[IO_STALL] - [IO1].[IO_STALL])
FROM IOPS IO1 JOIN IOPS IO2 ON IO1.ROW = (IO2.ROW+1)
...


You need to divide the stall by I/O operations to get the average stall per I/O (or per read, or write, or whatever granularity you're looking for). In other words, my recommendation would be to modify your query to look something like this:

SELECT
    MAX(([IO2].[IO_STALL] - [IO1].[IO_STALL]) / (IO2.NUM_OF_READS + IO2.NUM_OF_WRITES - IO1.NUM_OF_READS - IO1.NUM_OF_WRITES))
FROM IOPS IO1 JOIN IOPS IO2 ON IO1.ROW = (IO2.ROW+1)


And then you need to have an extra predicate clause to ensure you're not dividing by zero:

...
WHERE IO1.NAME = IO2.NAME
and (IO2.NUM_OF_READS + IO2.NUM_OF_WRITES - IO1.NUM_OF_READS - IO1.NUM_OF_WRITES) > 0
AND IO1.Disk = @Disk


What this basically does is calculate the average io_stall per I/O operation. By itself, a high io_stall could simply mean a higher workload and not necessarily the sign of a problem.

Code Snippets

SELECT MAX([IO2].[IO_STALL] - [IO1].[IO_STALL])
FROM IOPS IO1 JOIN IOPS IO2 ON IO1.ROW = (IO2.ROW+1)
...
SELECT
    MAX(([IO2].[IO_STALL] - [IO1].[IO_STALL]) / (IO2.NUM_OF_READS + IO2.NUM_OF_WRITES - IO1.NUM_OF_READS - IO1.NUM_OF_WRITES))
FROM IOPS IO1 JOIN IOPS IO2 ON IO1.ROW = (IO2.ROW+1)
...
WHERE IO1.NAME = IO2.NAME
and (IO2.NUM_OF_READS + IO2.NUM_OF_WRITES - IO1.NUM_OF_READS - IO1.NUM_OF_WRITES) > 0
AND IO1.Disk = @Disk

Context

StackExchange Database Administrators Q#45999, answer score: 13

Revisions (0)

No revisions yet.