
Failed to allocate BUFs during DBCC CHECKDB

Submitted by: @import:stackexchange-dba

Problem

I'm hoping someone can help me with a problem that I just can't seem to figure out. This one is a doozy. I'll list as much of the important information as I can, but if I miss anything, please let me know and I'll be happy to provide whatever you need.

The symptom is that DBCC CHECKDB locks up when run against a VLDB (approx. 1 TB) as one of the tasks in a Maintenance Plan. The error log reports the error: Failed to allocate BUFs: FAIL_BUFFER_ALLOCATION 7 (sometimes 8), and then starts filling up with memory charts (the best way I can describe them, see attached screenshot) about physical and virtual memory.

Here is the scenario. We are beginning to test our NEWSERVER before we migrate off our OLDSERVER. Everything is working as expected on OLDSERVER. The problem occurs on NEWSERVER, in the PROD instance, during our nightly Maintenance Plan routine. There are multiple databases in the instance, but the one we are concerned with is DB1. DB1 is made up of 2 data files and 1 log file. On OLDSERVER the .mdf (519 GB) is located on H:, the .ndf (200 GB) on E:, and the .ldf (313 GB) on D:. On NEWSERVER both data files are on E: and the log file is on D:. Note: I was not involved in configuring the database with 2 data files, in choosing their locations, or in the setup/configuration of either server.

On OLDSERVER the maintenance plan (a Check Database Integrity task, a Full Database Backup, and a Maintenance Cleanup Task, configured to run against DB1 only) completes nightly with no issues. On NEWSERVER the maintenance plan (set up exactly the same way) will sometimes complete, but mostly it slows to a snail's crawl (or find something even slower than a snail) and eventually fails during the Check Database Integrity task.

I can run th

Solution

I can't tell from the OLDSERVER data whether it has a single NUMA node or several, but NEWSERVER looks like it has at least 2 NUMA nodes.

Assuming, for the moment, that OLDSERVER has only a single NUMA node, it does seem you're hitting a known issue that was fixed in SQL Server 2016 SP2 CU5, whereas you're currently on SQL Server 2016 SP2 CU3.

An "Out of Memory" error can occur when a Database Node Memory (KB) becomes less than 2 percent of the target size, and it cannot discard database pages on the node anymore to get free pages.

If you look at the MEMORYSTATUS output, it does fall into this case, with about 1.59% of pages available, at least on NUMA node 0.
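
To make the arithmetic concrete, here is a minimal sketch of that 2 percent check for a single node. It assumes "target size" means the node's target memory in KB. The function name and the KB figures are hypothetical, picked only so the ratio lands at the 1.59% mentioned above; they are not taken from the actual MEMORYSTATUS output.

```python
THRESHOLD_PCT = 2.0  # threshold quoted in the fix description


def node_memory_pct(database_node_kb: int, target_node_kb: int) -> float:
    """Return database node memory as a percentage of the node's target memory."""
    if target_node_kb <= 0:
        raise ValueError("target_node_kb must be positive")
    return 100.0 * database_node_kb / target_node_kb


# Hypothetical figures for NUMA node 0 (illustrative only, not real output).
target_kb = 100_000_000
database_kb = 1_590_000

pct = node_memory_pct(database_kb, target_kb)
below = pct < THRESHOLD_PCT
print(f"{pct:.2f}% of target; below the 2% threshold: {below}")
```

If the same calculation on your real per-node numbers comes out under 2%, that node matches the condition described in the fix.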

I'd apply, at a minimum, SP2 CU5 and run the load again.

Context

StackExchange Database Administrators Q#241402, answer score: 2
