patternsqlMinor
Buffer Size Changing?
Viewed 0 times
sizebufferchanging
Problem
I have a production database that is experiencing wildly fluctuating Page Life Expectancy (PLE) issues. (It crashes to zero at random times.)
I have been researching the PLE issue and have found something that seems to point to a VMWare issue, but I am not sure I am using the data right. It seems like I am losing buffer/cache pages.
I am using this query:
(Found here)
I am totaling the results (the count) before and after my PLE crashes. An example is 1,097,820 before and 131,394 after. So I seem to "lose" 966,426 pages.
My guess is that the hardware for all the virtual machines is under stress, so it will randomly swap out some memory from the server for a while. (This is just a guess.) When that happens all the pages are lost, so the PLE plummets.
So, am I using the
(I would love a way to confirm this conclusion.)
Or is there another explanation as to why the count drops so much?
Information below the line was added from the OP's comments
Our System Admins manage the VMs. I am hoping to understand my query before I go to them with this data. The timing of the PLE crashes seems random from the database point of view. (No re-indexing or other high performance stuff happening during the PLE Crashes)
I have done a ton of work to see if it was work load related. And while there is one poorly performing query, it is not enough to use up all the cache. [There is] no rebuilding or other non-routine user activity on the server when the buffer counts go down. A
I have been researching the PLE issue and have found something that seems to point to a VMWare issue, but I am not sure I am using the data right. It seems like I am losing buffer/cache pages.
I am using this query:
SELECT COUNT(*) AS cached_pages_count,
CASE database_id
WHEN 32767 THEN 'ResourceDb'
ELSE DB_NAME(database_id)
END AS database_name
FROM sys.dm_os_buffer_descriptors
GROUP BY DB_NAME(database_id), database_id
ORDER BY cached_pages_count DESC;(Found here)
I am totaling the results (the count) before and after my PLE crashes. An example is 1,097,820 before and 131,394 after. So I seem to "lose" 966,426 pages.
My guess is that the hardware for all the virtual machines is under stress, so it will randomly swap out some memory from the server for a while. (This is just a guess.) When that happens all the pages are lost, so the PLE plummets.
So, am I using the
sys.dm_os_buffer_descriptors view correctly? From what I read it always shows used buffer/cached pages. So if it is empty (or significantly reduced), I either don't have the memory anymore, or it is empty.(I would love a way to confirm this conclusion.)
Or is there another explanation as to why the count drops so much?
Information below the line was added from the OP's comments
Our System Admins manage the VMs. I am hoping to understand my query before I go to them with this data. The timing of the PLE crashes seems random from the database point of view. (No re-indexing or other high performance stuff happening during the PLE Crashes)
I have done a ton of work to see if it was work load related. And while there is one poorly performing query, it is not enough to use up all the cache. [There is] no rebuilding or other non-routine user activity on the server when the buffer counts go down. A
Solution
Q: I have a production data base that is experiencing wildly fluctuating Page Life Expectancy (PLE) issues. (It crashes to zero at random times.)
Let me ask you what is output of
Its sometime normal for PLE to fluctuate on system having high activity. Actually this is by very virtue how PLE code works in SQL Server. But the fact that its plummetting to zero quite frequently make me believe you might be hitting the bug I have mentioned above.
As per Microsoft Bug fix detail
You may experience slow performance in SQL Server 2012. When you check SQL Server Performance Monitor tools, you see the following:
•A rapid decline in the SQLServer:Buffer Manager\Page life expectancy performance counter values. When this issue occurs, the counter is near 0.
PLE on system is measure of how volatile your buffer pool is, its also measure of amount of I/O activity going in your SQL Server. MSDN says that
Page life expectancy - Indicates the number of seconds a page will stay in the buffer pool without references
Believe me this definition is incomplete. It describes it in form of time which is not a complete definition. I have always noticed that it is measure of I/O activity on server. The greater the I/O activity the more volatile would be BPool, thus fluctuating PLE.
Q: My guess is that the hardware for all the virtual machines is under stress, so it will randomly swap out some memory from the server for a while.
If you believe this is the case and you want SQL Server not be be victim of such issues you must make sure SQl Server service account has Locked Pages in Memory Privielge (LPIM). This will not let OS to force SQL Server page out its memory. If account running SQL Service is local system by default SQL Server will have this privilege in SQL Server 2012.
Note:
-
This is a workaround. The solution here would be to find out what is causing stress to VM machine. You should fix that. If you feel Wmware Balooning is the issue. You can use RAMMAP tool to track memory which is consumed by
-
Befor giving LPIM you must make sure you have set optimum value for max server memory and have left ENOUGH memory for OS to perform efficiently.
-
If you do not follow above two points and if OS comes under severe memory pressure due to LPIM OS processes would be paged out because it cannot force SQL Server to release memory(its locked/non pageable due to LPIM) and thus leading to tremendous slowness of OS processes.
Q: So, am I using the sys.dm_os_buffer_descriptors view correctly? From what I read it always shows used buffer/cached pages. So if it is empty (or significantly reduced), I either don't have the memory anymore, or it is empty. (I would love a way to confirm this conclusion.)
Buffer descriptors as already mentioned returns information about all the data pages that are currently in the SQL Server buffer pool. IMHO buffer pages
So what you are seeing via sys.dm_os_buffer_descriptors is not incorrect but I would
Let me ask you what is output of
Select @@Version. What is SP and CU level to which your SQL Server is patched. The reason I am asking this is because there was bug in SQl Server 2012 which forced PLE to plummet like what you are observing. Ths bug was fixed in SQL Server 2012 SP1 CU4. Or to be on safer said I would recommend you apply SQL Server 2012 SP2 instead of going for CU4Its sometime normal for PLE to fluctuate on system having high activity. Actually this is by very virtue how PLE code works in SQL Server. But the fact that its plummetting to zero quite frequently make me believe you might be hitting the bug I have mentioned above.
As per Microsoft Bug fix detail
You may experience slow performance in SQL Server 2012. When you check SQL Server Performance Monitor tools, you see the following:
•A rapid decline in the SQLServer:Buffer Manager\Page life expectancy performance counter values. When this issue occurs, the counter is near 0.
PLE on system is measure of how volatile your buffer pool is, its also measure of amount of I/O activity going in your SQL Server. MSDN says that
Page life expectancy - Indicates the number of seconds a page will stay in the buffer pool without references
Believe me this definition is incomplete. It describes it in form of time which is not a complete definition. I have always noticed that it is measure of I/O activity on server. The greater the I/O activity the more volatile would be BPool, thus fluctuating PLE.
Q: My guess is that the hardware for all the virtual machines is under stress, so it will randomly swap out some memory from the server for a while.
If you believe this is the case and you want SQL Server not be be victim of such issues you must make sure SQl Server service account has Locked Pages in Memory Privielge (LPIM). This will not let OS to force SQL Server page out its memory. If account running SQL Service is local system by default SQL Server will have this privilege in SQL Server 2012.
Note:
-
This is a workaround. The solution here would be to find out what is causing stress to VM machine. You should fix that. If you feel Wmware Balooning is the issue. You can use RAMMAP tool to track memory which is consumed by
Locked Driver. In RAMMAP tool if you see Locked driver taking huge memory its sign of VMware balooning. Take help from the team to configured/disable ballooning for the virtual machine on which SQL Server is running-
Befor giving LPIM you must make sure you have set optimum value for max server memory and have left ENOUGH memory for OS to perform efficiently.
-
If you do not follow above two points and if OS comes under severe memory pressure due to LPIM OS processes would be paged out because it cannot force SQL Server to release memory(its locked/non pageable due to LPIM) and thus leading to tremendous slowness of OS processes.
Q: So, am I using the sys.dm_os_buffer_descriptors view correctly? From what I read it always shows used buffer/cached pages. So if it is empty (or significantly reduced), I either don't have the memory anymore, or it is empty. (I would love a way to confirm this conclusion.)
Buffer descriptors as already mentioned returns information about all the data pages that are currently in the SQL Server buffer pool. IMHO buffer pages
are affected by I/O activity on server and thus indirectly related to PLE. If there is request to fetch large amount of pages from disk to memory its quite possible that SQL Server will flush datapages to disk if it finds it needs to create space in buffer pool to bring in the new pages in memory and thus decreasing the amount of data page present in memory for particular database.So what you are seeing via sys.dm_os_buffer_descriptors is not incorrect but I would
not suggest you to use Buffer descriptor DMV to gauge PLE on server. This would not be a correct approach.Context
StackExchange Database Administrators Q#115136, answer score: 5
Revisions (0)
No revisions yet.