HiveBrain v1.2.0

Why is my SQL Server giving out of memory errors when there should be plenty?

Submitted by: @import:stackexchange-dba

Problem

I have a production system running SQL Server 2019 Standard edition. It recently had a problem 3 times in 1 day where it became unresponsive until a reboot. Errors seem to point to memory limitations. 32GB of RAM is installed, the machine is dedicated to SQL Server, and Max Server Memory is set to 26GB.
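For reference, a setting like the one described is applied with sp_configure; the 26624 MB value below simply mirrors the 26GB figure above and is not a recommendation:

```sql
-- Inspect and set Max Server Memory (26 GB = 26624 MB).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 26624;
RECONFIGURE;
```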

(edited 2020-11-23 to remove some erroneous info re: MEMORYCLERK_SQLGENERAL.)

The most damning info from the logs is a series of FAIL_PAGE_ALLOCATION errors.

Note: Since having the problem I've bumped up to a larger instance size (64GB).

Update 2020-11-22: The problem has reoccurred on the new larger instance. The memory errors began a few minutes after a scheduled backup of several databases. That may or may not be coincidence. It became nonresponsive and I forcibly restarted it.

  • I will start the MSFT support process as suggested by Josh and sqL_handLe.

  • I've made an Agent job (sproc here) to collect the results from dm_os_memory_objects every 5 minutes, based on sqL_handLe's query.

  • I've shared recent logs here. (I replaced the nl tags with actual newlines to make it easier to read.)

  • Disabled SSAS as suggested by Jana Sattainathan. I did confirm it was running and had 13GB "Min Working Set" as shown in the linked article, even though its actual working set was small. I think it's possible this subtle issue is the culprit, but if it's as dangerous as it seems, I'm very surprised I haven't run into it before. Jana, consider converting your comment to an answer?
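A 5-minute collection job like the one mentioned above could be sketched as follows. The table and procedure names are illustrative, not from the original post; the SELECT mirrors the clerk-to-object join used in the answer below.

```sql
-- Sketch of a snapshot table plus a procedure an Agent job could call
-- every 5 minutes. Names (dbo.MemoryObjectSnapshots, usp_CollectMemoryObjects)
-- are hypothetical.
CREATE TABLE dbo.MemoryObjectSnapshots
(
    collected_at   datetime2    NOT NULL DEFAULT SYSDATETIME(),
    clerk_type     nvarchar(60) NOT NULL,
    object_type    nvarchar(60) NOT NULL,
    pages_in_bytes bigint       NOT NULL,
    memory_node_id smallint     NOT NULL
);
GO
CREATE PROCEDURE dbo.usp_CollectMemoryObjects
AS
BEGIN
    SET NOCOUNT ON;
    INSERT dbo.MemoryObjectSnapshots (clerk_type, object_type, pages_in_bytes, memory_node_id)
    SELECT omc.type, omo.type, omo.pages_in_bytes, omo.memory_node_id
    FROM sys.dm_os_memory_objects AS omo
    JOIN sys.dm_os_memory_clerks  AS omc
        ON omo.page_allocator_address = omc.page_allocator_address;
END;
```

Trending pages_in_bytes per object type across snapshots is what lets a slowly growing clerk stand out.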



The non-SQL load on this server is near-zero. The backup agent is just Ola Hallengren's TSQL scripts. The default SQL services are running but not in use. As mentioned, I've stopped SSAS. We occasionally use SSIS but not since the instance upgrade.

Solution

Identifying large memory objects associated with MEMORYCLERK_SQLGENERAL may make this situation on November 18 clearer.

-- memory objects in MEMORYCLERK_SQLGENERAL
SELECT omo.type, omo.pages_in_bytes, omo.partition_type_desc, omo.memory_node_id
FROM sys.dm_os_memory_objects omo
JOIN sys.dm_os_memory_clerks omc 
ON omo.page_allocator_address = omc.page_allocator_address
WHERE omc.type = 'MEMORYCLERK_SQLGENERAL'
ORDER BY pages_in_bytes DESC;


Recently I saw a case where automatic seeding of an availability group replica was generating errors on the remote side due to insufficient disk space. In that case, memory object MEMOBJ_DBCOPY grew to be quite large over time (MEMOBJ_DBCOPY is in MEMORYCLERK_SQLGENERAL).

Editing this to add ~~
Obviously, if MEMORYCLERK_SQLGENERAL just keeps growing, it'll consume memory that could otherwise be used for other activities, and performance will degrade over time. At some point OOM errors will occur; for example, a pool may not have enough memory available and SQL Server will report that with an error.

And there's one more threshold/behavior to be aware of. Once SQL Server "total server memory" is past its initial growth after startup, if database cache declines to 2% of "target server memory" on any one SQLOS memory node (in this case it looks like there is only one memory node), most if not all of the user worker threads will likely try to correct the situation, and a large amount of CPU will be expended performing "free list stalls".
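The symptoms described above can be watched via the performance-counter DMV; this is a generic monitoring query, not from the original answer. Note that Free list stalls/sec is a cumulative counter, so sample it twice and diff to get a rate.

```sql
-- Free list stalls plus database cache size relative to target server memory.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE (object_name LIKE '%Buffer Manager%' AND counter_name = 'Free list stalls/sec')
   OR (object_name LIKE '%Memory Manager%' AND counter_name IN
        ('Database Cache Memory (KB)',
         'Target Server Memory (KB)',
         'Total Server Memory (KB)'));
```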

Microsoft mentions this briefly in the KB article below (although it doesn't shed any light on what might be causing MEMORYCLERK_SQLGENERAL to grow so large in this case).

KB4536005 - Improvement: Fix incorrect memory page accounting that causes out-of-memory errors in SQL Server
https://support.microsoft.com/en-us/help/4536005/kb4536005-improvement-fix-incorrect-memory-page-accounting-that-causes

In some cases where SQL Server memory gets into internal trouble, DBCC DROPCLEANBUFFERS, DBCC FREEPROCCACHE, or DBCC FREESYSTEMCACHE ('ALL') can rescue the instance for the time being. Unfortunately, that's not the case here: the memory objects involved have to be identified in order to bring MEMORYCLERK_SQLGENERAL back under control.
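For completeness, the emergency-valve sequence just mentioned looks like this. All three flush caches instance-wide, so expect a temporary performance hit; use only to rescue a struggling instance, not as routine maintenance.

```sql
DBCC FREEPROCCACHE;            -- drop cached execution plans
DBCC FREESYSTEMCACHE ('ALL');  -- release unused entries from all caches
DBCC DROPCLEANBUFFERS;         -- empty clean pages from the buffer pool
```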

Edited to discuss November 22 SQL Server log...
~~

In the November 22 memorystatus output, memoryclerk_sqlgeneral doesn’t seem to be of immediate concern: immediately after each of the many errors, memoryclerk_sqlgeneral is at about the same size, 22040 KB. Other extremely interesting details: [system physical memory high] is 1 (hinting SQL Server memory could grow) and [current committed] is less than [target committed] (also hinting SQL Server memory should be able to grow).
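The same signals read off the memorystatus output above can be queried live from the DMVs; this is a generic equivalent, not part of the original answer.

```sql
-- System-level memory signal, i.e. [system physical memory high] / low.
SELECT system_high_memory_signal_state,   -- 1 corresponds to "memory high"
       system_low_memory_signal_state
FROM sys.dm_os_sys_memory;

-- Instance-level commit, i.e. [current committed] vs [target committed].
SELECT committed_kb,            -- current committed
       committed_target_kb     -- target committed
FROM sys.dm_os_sys_info;
```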

The full SQL Server log shows the increase of RAM in the instance.

11/18/2020 10:15:11,Server,Unknown,Detected 32409 MB of RAM. This is an informational message; no user action is required.

11/18/2020 15:26:01,Server,Unknown,Detected 64723 MB of RAM. This is an informational message; no user action is required.


We also see the modifications to [Max Server Memory].

11/18/2020 10:33:03,spid96,Unknown,Configuration option 'max server memory (MB)' changed from 300000 to 280000. Run the RECONFIGURE statement to install.

11/19/2020 11:34:10,spid94,Unknown,Configuration option 'max server memory (MB)' changed from 280000 to 60000. Run the RECONFIGURE statement to install.


Both 300000 mb and 280000 mb were too large for SQL Server to attain on a VM with 32409 mb of memory. So SQL Server chose lower values of [target server memory] on the VM that it could attain, and managed memory based on those values.

What is likely not as intuitive is on the new VM, for reasons currently unknown, SQL Server also chose a lower target than the new [Max Server Memory] value of 60000 mb. That can be seen in memorystatus sections like the following.

11/22/2020 09:06:59,spid69,Unknown,Memory Manager               KB                             
---------------------------------------- ----------
VM Reserved                               101157820
VM Committed                                 636280
Locked Pages Allocated                     51513180
Large Pages Allocated                             0
Emergency Memory                               1024
Emergency Memory In Use                          16
Target Committed                           58468304
Current Committed                          52149464
Pages Allocated                            50371824
Pages Reserved                               273864
Pages Free                                    35352
Pages In Use                                7855040
Page Alloc Potential                       50512960
NUMA Growth Phase                                 0
Last OOM Factor                                   3
Last OS Error                                     0


The [Target Committed] value here of 58468304 KB is below the configured [Max Server Memory] of 60000 MB (61440000 KB), consistent with SQL Server choosing a lower target than the configured value.
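To see which clerks are actually holding the committed memory at a moment like this (as in the per-clerk memorystatus sections below), a generic top-consumers query can be used; it is not from the original answer.

```sql
-- Top memory consumers by clerk, largest first.
SELECT TOP (10) type, memory_node_id, pages_kb
FROM sys.dm_os_memory_clerks
ORDER BY pages_kb DESC;
```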

Code Snippets

11/22/2020 09:06:59,spid69,Unknown,MEMORYCLERK_SQLBUFFERPOOL (node 0)               KB
---------------------------------------- ----------
VM Reserved                                25686200
VM Committed                                 141312
Locked Pages Allocated                      1072140
SM Reserved                                       0
SM Committed                                      0
Pages Allocated                            44537424
11/22/2020 09:06:59,spid69,Unknown,MEMORYCLERK_SQLGENERAL (node 0)                  KB
---------------------------------------- ----------
VM Reserved                                       0
VM Committed                                      0
Locked Pages Allocated                            0
SM Reserved                                       0
SM Committed                                      0
Pages Allocated                               22040

Context

StackExchange Database Administrators Q#279941, answer score: 2
