Why is my SQL Server giving out of memory errors when there should be plenty?
Problem
I have a production system running SQL Server 2019 Standard Edition. It recently became unresponsive three times in one day, each time requiring a reboot. The errors point to memory limitations. The machine has 32 GB installed and is dedicated to MSSQL, with Max Server Memory set to 26 GB.
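For reference, a cap like the one described is normally applied with sp_configure; this is a sketch of the likely configuration (26 GB = 26624 MB), not the poster's actual commands:

```sql
-- Illustrative only: set Max Server Memory to 26 GB (26624 MB),
-- leaving roughly 6 GB for the OS and non-buffer-pool allocations
-- on a 32 GB box.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 26624;
RECONFIGURE;
```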
(edited 2020-11-23 to remove some erroneous info re: MEMORYCLERK_SQLGENERAL.)
The most damning info from the logs is a series of FAIL_PAGE_ALLOCATION errors.
Note: Since having the problem I've bumped up to a larger instance size (64GB)
Update 2020-11-22: The problem has reoccurred on the new larger instance. The memory errors began a few minutes after a scheduled backup of several databases. That may or may not be coincidence. It became nonresponsive and I forcibly restarted it.
The non-SQL load on this server is near-zero. The backup agent is just Ola Hallengren's TSQL scripts. The default SQL services are running but not in use. As mentioned, I've stopped SSAS. We occasionally use SSIS but not since the instance upgrade.
- I will start the MSFT support process as suggested by Josh and sqL_handLe.
- I've made an Agent job (sproc here) to collect the results from dm_os_memory_objects every 5 minutes, based on sqL_handLe's query.
- I've shared recent logs here. (I replaced the nl tags with actual newlines to make it easier to read.)
- Disabled SSAS as suggested by Jana Sattainathan. I did confirm it was running and had 13GB "Min Working Set" as shown in the linked article, even though its actual working set was small. I think it's possible this subtle issue is the culprit, but if it's as dangerous as it seems, I'm very surprised I haven't run into it before. Jana, consider converting your comment to an answer?
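A collection job like the one in the second bullet could look roughly like this; the table and procedure names are hypothetical (the poster's actual sproc is only linked, not shown), and the snapshot is aggregated by object type to keep the logging table small:

```sql
-- Hypothetical sketch of a 5-minute Agent job step that snapshots
-- sys.dm_os_memory_objects into a logging table for later analysis.
-- Object names are illustrative, not the poster's actual code.
CREATE TABLE dbo.MemoryObjectSnapshot (
    capture_time    datetime2    NOT NULL DEFAULT SYSDATETIME(),
    [type]          varchar(60)  NOT NULL,
    pages_in_bytes  bigint       NOT NULL,
    memory_node_id  smallint     NOT NULL
);
GO
CREATE PROCEDURE dbo.CaptureMemoryObjects
AS
BEGIN
    -- Aggregate per object type and NUMA node; individual memory
    -- objects can number in the thousands.
    INSERT INTO dbo.MemoryObjectSnapshot ([type], pages_in_bytes, memory_node_id)
    SELECT [type], SUM(pages_in_bytes), memory_node_id
    FROM sys.dm_os_memory_objects
    GROUP BY [type], memory_node_id;
END;
```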
Solution
Identifying large memory objects associated with MEMORYCLERK_SQLGENERAL may make the November 18 situation clearer.
-- memory objects in MEMORYCLERK_SQLGENERAL
SELECT omo.type, omo.pages_in_bytes, omo.partition_type_desc, omo.memory_node_id
FROM sys.dm_os_memory_objects omo
JOIN sys.dm_os_memory_clerks omc
ON omo.page_allocator_address = omc.page_allocator_address
WHERE omc.type = 'MEMORYCLERK_SQLGENERAL'
ORDER BY pages_in_bytes DESC;

Recently I saw a case where automatic seeding of an availability group replica was generating errors on the remote side due to insufficient disk space. In that case, memory object MEMOBJ_DBCOPY grew to be quite large over time (MEMOBJ_DBCOPY is in MEMORYCLERK_SQLGENERAL).
Editing this to add ~~
Obviously, if MEMORYCLERK_SQLGENERAL just keeps growing, it will consume memory that could otherwise be used for other activities, and performance will degrade over time. At some point, OOMs will occur; for example, there may not be enough memory in a pool, and SQL Server will report that with an error.
And there's one more threshold/behavior to be aware of. When SQL Server "total server memory" is no longer in its initial growth after startup, and database cache declines to 2% of "target server memory" on any one SQLOS memory node (in this case looks like there is only one memory node), most if not all of the user worker threads will likely try to correct the situation and a large amount of CPU will be expended performing "free list stalls".
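The free list stalls described above surface in the Buffer Manager performance counters. One quick way to check; note the counter is cumulative since instance startup, so sample it at intervals and compare the deltas:

```sql
-- 'Free list stalls/sec' is cumulative since startup: a steadily
-- climbing value between samples indicates threads waiting on free
-- buffer pages. Names in this DMV are padded, hence RTRIM/LIKE.
SELECT RTRIM(object_name) AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Free list stalls%';
```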
Microsoft mentions this briefly in the KB article below (although it doesn't shed any light on what might be causing MEMORYCLERK_SQLGENERAL to grow so large in this case).
KB4536005 - Improvement: Fix incorrect memory page accounting that causes out-of-memory errors in SQL Server
https://support.microsoft.com/en-us/help/4536005/kb4536005-improvement-fix-incorrect-memory-page-accounting-that-causes
In some cases where SQL Server memory gets into internal trouble, running DBCC DROPCLEANBUFFERS, DBCC FREEPROCCACHE, or DBCC FREESYSTEMCACHE ('ALL') can rescue the instance for the time being. Unfortunately, that's not the case here: the memory objects involved have to be identified in order to bring MEMORYCLERK_SQLGENERAL back under control.
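For completeness, these are the three emergency commands referred to above. Each clears a different cache (clean buffer pool pages, the plan cache, and all other system caches respectively), and all of them cause a temporary performance dip while the caches repopulate, so they are stopgaps at best:

```sql
-- Stopgap cache-clearing commands; expect a performance dip afterwards
-- while the buffer pool and plan cache warm back up.
DBCC DROPCLEANBUFFERS;        -- evict clean pages from the buffer pool
DBCC FREEPROCCACHE;           -- flush the plan cache
DBCC FREESYSTEMCACHE ('ALL'); -- flush all other system caches
```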
Edited to discuss November 22 SQL Server log...
~~
In the November 22 memorystatus output, MEMORYCLERK_SQLGENERAL doesn't seem to be of immediate concern: immediately after each of the many errors it is at roughly the same size, 22040 KB. Other extremely interesting details: [system physical memory high] is 1 (hinting SQL Server memory could grow) and [current committed] is less than [target committed] (also hinting SQL Server memory should be able to grow).
The full SQL Server log shows the increase of RAM in the instance.
11/18/2020 10:15:11,Server,Unknown,Detected 32409 MB of RAM. This is an informational message; no user action is required.
11/18/2020 15:26:01,Server,Unknown,Detected 64723 MB of RAM. This is an informational message; no user action is required.

We also see the modifications to [Max Server Memory].
11/18/2020 10:33:03,spid96,Unknown,Configuration option 'max server memory (MB)' changed from 300000 to 280000. Run the RECONFIGURE statement to install.
11/19/2020 11:34:10,spid94,Unknown,Configuration option 'max server memory (MB)' changed from 280000 to 60000. Run the RECONFIGURE statement to install.

Both 300000 MB and 280000 MB were too large for SQL Server to attain on a VM with 32409 MB of memory. So SQL Server chose lower values of [target server memory] that it could attain, and managed memory based on those values.
What is likely less intuitive: on the new VM, for reasons currently unknown, SQL Server also chose a lower target than the new [Max Server Memory] value of 60000 MB. That can be seen in memorystatus sections like the following.
11/22/2020 09:06:59,spid69,Unknown,Memory Manager KB
---------------------------------------- ----------
VM Reserved 101157820
VM Committed 636280
Locked Pages Allocated 51513180
Large Pages Allocated 0
Emergency Memory 1024
Emergency Memory In Use 16
Target Committed 58468304
Current Committed 52149464
Pages Allocated 50371824
Pages Reserved 273864
Pages Free 35352
Pages In Use 7855040
Page Alloc Potential 50512960
NUMA Growth Phase 0
Last OOM Factor 3
Last OS Error 0

The [Target Committed] value here of 58468304 KB corresponds to what appears to be the target SQL Server actually chose: about 57098 MB, below the configured [Max Server Memory] of 60000 MB.
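The [Target Committed] and [Current Committed] figures from memorystatus can also be read from DMVs, along with per-clerk sizes, which avoids waiting for the next error dump (column names as of SQL Server 2012 and later):

```sql
-- committed_target_kb ~ [Target Committed], committed_kb ~ [Current Committed]
SELECT committed_kb, committed_target_kb, visible_target_kb
FROM sys.dm_os_sys_info;

-- Largest memory clerks by allocated pages, summed across NUMA nodes
SELECT TOP (10) [type], SUM(pages_kb) AS pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY [type]
ORDER BY SUM(pages_kb) DESC;
```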
11/22/2020 09:06:59,spid69,Unknown,MEMORYCLERK_SQLBUFFERPOOL (node 0) KB
---------------------------------------- ----------
VM Reserved 25686200
VM Committed 141312
Locked Pages Allocated 1072140
SM Reserved 0
SM Committed 0
Pages Allocated 44537424
11/22/2020 09:06:59,spid69,Unknown,MEMORYCLERK_SQLGENERAL (node 0) KB
---------------------------------------- ----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
Pages Allocated 22040

Context
StackExchange Database Administrators Q#279941, answer score: 2