patternsqlModerate
Page Life Expectancy (PLE), where to start?
Viewed 0 times
plewherepagestartlifeexpectancy
Problem
I have inherited a SQL server {2012 (SP3), but this question is intended to be generic} we are using SCOM to monitor it. Previously I was getting an alert once or twice a month for PLE
...
Additionally if you are looking at Lazy Writes on an inherited server you should check Recovery Interval
If this first query does not return values:
I have a pretty good idea what all these values represent, and how they work together. I have included comments and sources in my code above.
- 'Free list stalls/sec'
- 'Lazy writes/sec'
- 'Buffer cache hit ratio'
...
SELECT [object_name],
[counter_name],
[cntr_value] FROM sys.dm_os_performance_counters -- https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-performance-counters-transact-sql
WHERE [counter_name] = 'Page life expectancy' --if multiple NUMA on a server should return multiple Nodes,
OR [counter_name] = 'Free list stalls/sec' -- Number of requests per second that had to wait for a free page https://learn.microsoft.com/en-us/sql/relational-databases/performance-monitor/sql-server-buffer-manager-object
OR [counter_name] = 'Lazy writes/sec' --Flushes of dirty pages before a checkpoint runs.
OR [counter_name] = 'Buffer cache hit ratio' --percentage of pages found in the buffer cache without having to read from disk you want this ratio to be high
Order by [counter_name] DESC, [object_name];Additionally if you are looking at Lazy Writes on an inherited server you should check Recovery Interval
EXEC sp_configure @configname='recovery interval (min)'; --The 'config_value' default 0 indicates SQL is applying Checkpoints completely automatically https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-recovery-interval-server-configuration-optionIf this first query does not return values:
SELECT COUNT(*) FROM sys.dm_os_performance_counters; --If no values from the firs query, an value of 0 here indicates a seperate issue https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-performance-counters-transact-sqlI have a pretty good idea what all these values represent, and how they work together. I have included comments and sources in my code above.
Solution
You basically asked, "What should I do when Page Life Expectancy changes?"
My answer: nothing. I don't start by looking at Page Life Expectancy. That metric made sense in the SQL Server 7/2000 days when it was all that we had, but today, in 2018, we can do better.
Start by looking at wait stats - that tells you what SQL Server is waiting on.
I don't care whether PLE is 300 or 3,000 - tell me what you're WAITING on, SQL Server, and then I'll go troubleshoot that metric.
My personal favorite way to check waits is to use the open source sp_BlitzFirst (disclaimer: I wrote it.) By default, it takes a 5-second sample of your server's metrics and gives you a few guesses as to why it's slow right now.
Because you like writing long questions, you'll probably also like these:
The first result set gives you your waits since startup, and:
Takes a longer sample, and tells your waits over that time range.
Wait stats can be kind of cryptic, so next to every wait type, I link to the SQLskills wait stats repository for that wait type. You can just copy/paste out the name of your top wait type, go to their site, and learn more about what causes that wait and how to fix it.
If PLE is dropping due to queries reading a lot of data pages from disk, for example, you might see PAGEIOLATCH% wait types. If it's dropping due to queries getting huge memory grants, you might see RESOURCE_SEMAPHORE. If PLE isn't the problem, then you'll see different wait types altogether.
My answer: nothing. I don't start by looking at Page Life Expectancy. That metric made sense in the SQL Server 7/2000 days when it was all that we had, but today, in 2018, we can do better.
Start by looking at wait stats - that tells you what SQL Server is waiting on.
I don't care whether PLE is 300 or 3,000 - tell me what you're WAITING on, SQL Server, and then I'll go troubleshoot that metric.
My personal favorite way to check waits is to use the open source sp_BlitzFirst (disclaimer: I wrote it.) By default, it takes a 5-second sample of your server's metrics and gives you a few guesses as to why it's slow right now.
Because you like writing long questions, you'll probably also like these:
sp_BlitzFirst @SinceStartup = 1;The first result set gives you your waits since startup, and:
sp_Blitz @ExpertMode = 1, @Seconds = 60;Takes a longer sample, and tells your waits over that time range.
Wait stats can be kind of cryptic, so next to every wait type, I link to the SQLskills wait stats repository for that wait type. You can just copy/paste out the name of your top wait type, go to their site, and learn more about what causes that wait and how to fix it.
If PLE is dropping due to queries reading a lot of data pages from disk, for example, you might see PAGEIOLATCH% wait types. If it's dropping due to queries getting huge memory grants, you might see RESOURCE_SEMAPHORE. If PLE isn't the problem, then you'll see different wait types altogether.
Code Snippets
sp_BlitzFirst @SinceStartup = 1;sp_Blitz @ExpertMode = 1, @Seconds = 60;Context
StackExchange Database Administrators Q#203784, answer score: 15
Revisions (0)
No revisions yet.