patternsqlModerate
Anyone use SUMA, trace flag 8048, or trace flag 8015?
Viewed 0 times
anyone8015flag8048usetracesuma
Problem
Recently included SQL Server startup Trace Flag 8048 to resolve a serious spinlock contention issue in a SQL Server 2008 R2 system.
Interested to hear from others who have found usage cases where performance value was delivered by trace flag 8048 (promote query memory grant strategy from per-NUMA node to per-core), trace flag 8015 (SQL Server ignores physical NUMA), or SUMA (interleaved sufficiently uniform memory access, a BIOS option on some NUMA machines).
Trace flag 8048
Trace flag 8015
Gory details of system workload, gathered metrics from troubled system, and gathered metrics from the system after intervention follow.
Trace flag 8048 was a 'fix', but was it the best fix? Would SQL Server ignoring physical NUMA due to trace flag 8015 have accomplished the same thing? What about setting the BIOS to interleave memory, leaving the server with SMP-imitating SUMA behavior instead of NUMA behavior?
About the system
About the workload
Key findings on the troubled system
-
From Perfmon, 15 second intervals
-
-
-
From wait and spinlock DMVs, 5 minute intervals
-
-
Bob Dorr's CSS Engineer Blog post on trace flag 8048 indicates that systems with more than 8 cores per NUMA node can run into similar symptoms due to bottleneck in query memory grant. Trace flag 8048 will change the strategy to per-core instead of per-NUMA node.
The Intervention
MSSQL was restarted with -T8048 in place. The difference was immediately evident: buffer
Interested to hear from others who have found usage cases where performance value was delivered by trace flag 8048 (promote query memory grant strategy from per-NUMA node to per-core), trace flag 8015 (SQL Server ignores physical NUMA), or SUMA (interleaved sufficiently uniform memory access, a BIOS option on some NUMA machines).
Trace flag 8048
Trace flag 8015
Gory details of system workload, gathered metrics from troubled system, and gathered metrics from the system after intervention follow.
Trace flag 8048 was a 'fix', but was it the best fix? Would SQL Server ignoring physical NUMA due to trace flag 8015 have accomplished the same thing? What about setting the BIOS to interleave memory, leaving the server with SMP-imitating SUMA behavior instead of NUMA behavior?
About the system
- 4 hex core Xeon E7540 @ 2.00GHz, hyperthreaded
- 128 GB RAM
- WS2008R2
- MSSQL 2008 R2 SP2
- maxdop 6
About the workload
- 1000s of Batch scheduled/queued reports driven from 2 report application servers.
- 3 flavors of batches: daily, weekly, monthly
- All report application servers connections to SQL Server are made as a single service account
- Maximum report concurrency = 90
Key findings on the troubled system
-
From Perfmon, 15 second intervals
-
- System remains at 95%-100% CPU busy
-
- SQL Server buffer page lookups
-
From wait and spinlock DMVs, 5 minute intervals
-
- High CMEMTHREAD waiters and wait time
-
- High SOS_SUSPEND_QUEUE spins and backoffs
Bob Dorr's CSS Engineer Blog post on trace flag 8048 indicates that systems with more than 8 cores per NUMA node can run into similar symptoms due to bottleneck in query memory grant. Trace flag 8048 will change the strategy to per-core instead of per-NUMA node.
The Intervention
MSSQL was restarted with -T8048 in place. The difference was immediately evident: buffer
Solution
This is an awesome post.
To answer your final question, I'd speculate that your answer is "yes".
That said, I probably would have pursued soft numa before resorting to the trace flags. I think you are right about the numa node allocation and that's could be at the root of your problem. Via soft numa, you could scale out the requests, depending on your count of numa nodes (4?) - to 4, if that's the correct number, and then assign, via ip address, each host to a specific numa node, in addition to that, I'd disable hyper threading. Combined, the issue would likely decrease, however, it would do so at the cost of fewer schedulers.
On a seperate thought, I'd look at forced parameterization - the fact that your load is driving your CPU so high is very interesting and it may be worth looking into that.
Lastly, on multi-numa node systems, I typically have the output of the following queries dumping to a table every N seconds. Makes for some interesting analysis when workload changes or trace flags are implemented:
and
To answer your final question, I'd speculate that your answer is "yes".
That said, I probably would have pursued soft numa before resorting to the trace flags. I think you are right about the numa node allocation and that's could be at the root of your problem. Via soft numa, you could scale out the requests, depending on your count of numa nodes (4?) - to 4, if that's the correct number, and then assign, via ip address, each host to a specific numa node, in addition to that, I'd disable hyper threading. Combined, the issue would likely decrease, however, it would do so at the cost of fewer schedulers.
On a seperate thought, I'd look at forced parameterization - the fact that your load is driving your CPU so high is very interesting and it may be worth looking into that.
Lastly, on multi-numa node systems, I typically have the output of the following queries dumping to a table every N seconds. Makes for some interesting analysis when workload changes or trace flags are implemented:
SELECT getdate() as poll_time, node_id, node_state_desc, memory_node_id, online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC'and
SELECT top 10 getdate() as sample_poll, wait_type, count (*)
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] NOT IN
('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' )
GROUP BY wait_type
ORDER BY COUNT (*) DESCCode Snippets
SELECT getdate() as poll_time, node_id, node_state_desc, memory_node_id, online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC'SELECT top 10 getdate() as sample_poll, wait_type, count (*)
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] NOT IN
('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' )
GROUP BY wait_type
ORDER BY COUNT (*) DESCContext
StackExchange Database Administrators Q#28756, answer score: 13
Revisions (0)
No revisions yet.