Data warehouse server. How do you calculate RAM/CPU specifications?
Problem
I am trying to write a spec for a data warehouse server for our planned data warehouse upgrade.
As we run virtual servers on VMware hosts, we have the ability to add or remove resources as necessary. In the past we've incrementally added RAM and CPU as required, and as our demands have increased we've lobbied for more resources (primarily disk and RAM).
We ask for more. They give us as little as possible.
Recently, however, whenever we talk about resources we are criticized for not spec'ing the machine correctly in the first place, and I am being told the dev hosts are maxed out: there is no more RAM available.
We're a small Local Government organisation with ~50 regular users of the DW. In normal daily use it runs fine: we get good MDX query performance, and our reports and dashboards are fast. Users are happy.
However our ETL processes run throughout the night, and we're starting to see evidence of memory pressure when processing datamarts simultaneously. Last night SSIS failed with warnings about an "out of memory error".
Our existing DW server is Windows 2008 R2 with 4 CPUs and 16 GB of RAM, running SQL Server 2012 Standard. I have max server memory set to 12 GB, leaving 4 GB for the OS and services. Our existing DW has 3 datamarts/OLAP cubes, and we are developing 2 more.
+-----------+-------+-------------+-----------+---------------+
| Datamart/ | Files | Fact (Rows) | Fact (MB) | ETL & Process |
| OLAP cube | (GB)  |             |           | Time (hours)  |
+-----------+-------+-------------+-----------+---------------+
| PBI       |     3 |     190,000 |       180 |           0.2 |
| FBI       |    30 |  26,100,000 |    10,000 |           1.5 |
| RBI       |   175 |  62,000,000 |    32,000 |           8.3 |
| ABI*      |   100 |  44,050,000 |    21,000 |           4.0 |
| EBI*      |    11 | 100,000,000 |     6,000 |           2.0 |
+-----------+-------+-------------+-----------+---------------+
* Planned/Estimated
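As a rough check, the fact-table sizes in the table can be totalled and compared against the current 12 GB buffer pool (a back-of-envelope sketch; the per-datamart figures are taken straight from the table, and the ABI/EBI rows are the planned estimates):

```python
# Back-of-envelope check: does the fact data fit in the buffer pool?
# Sizes (MB) are taken from the datamart table above.
fact_mb = {
    "PBI": 180,
    "FBI": 10_000,
    "RBI": 32_000,
    "ABI": 21_000,  # planned/estimated
    "EBI": 6_000,   # planned/estimated
}

max_server_memory_mb = 12 * 1024  # current 'max server memory' setting

total_fact_mb = sum(fact_mb.values())
print(f"Total fact data: {total_fact_mb:,} MB ({total_fact_mb / 1024:.1f} GB)")
print(f"Buffer pool:     {max_server_memory_mb:,} MB")
print(f"Shortfall:       {total_fact_mb - max_server_memory_mb:,} MB")
```

Even ignoring dimensions, indexes, and working space, the fact data alone is several times the configured buffer pool, which is consistent with the overnight memory pressure described above.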
Our new
Solution
Great question, and I did a session about this at TechEd a few years ago called Building the Fastest SQL Servers:
https://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI328
In it, I explain that for data warehouses, you need storage that can provide data fast enough for SQL Server to consume it. Microsoft built a great series of white papers called the Fast Track Data Warehouse Reference Architecture that goes into hardware details, but the basic idea is that your storage needs to be able to provide 200-300 MB/sec of sequential read performance, per CPU core, in order to keep the CPUs busy.
The more of your data that you can cache in memory, the slower storage you can get away with. But you've got less memory than required to cache the fact tables that you're dealing with, so storage speed becomes very important.
Here are your next steps:
- Watch that video
- Test your storage with CrystalDiskMark (Here's how)
- With 4 cores, you'll want at least 800 MB/sec of sequential read throughput (4 cores × 200 MB/sec)
- If you don't have that, consider adding memory until the pain goes away (and caching the entire database in RAM isn't unthinkable)
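The per-core target in the steps above can be turned into a quick sizing calculation (a sketch assuming the 200-300 MB/sec-per-core rule of thumb from the Fast Track guidance; the helper function name is mine, not from any library):

```python
# Required sequential read throughput to keep CPU cores fed, using the
# Fast Track rule of thumb of 200-300 MB/sec per core.
def required_throughput_mb_s(cores: int, per_core_mb_s: int = 200) -> int:
    """Minimum sequential read throughput (MB/sec) for `cores` cores."""
    return cores * per_core_mb_s

# The existing DW server has 4 CPUs:
print(required_throughput_mb_s(4))       # minimum target, at 200 MB/sec/core
print(required_throughput_mb_s(4, 300))  # comfortable target, at 300 MB/sec/core
```

Compare the result against what CrystalDiskMark reports for sequential reads; if your storage measures well below it, the cores will sit idle waiting on I/O.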
Say you've got a 200GB database that you're dealing with, and you can't get enough storage throughput to keep your cores busy. It's not unthinkable to need not just 200GB of RAM, but even more - because after all, SSIS and SSAS really want to do their work in memory, so you have to have the engine's data available, plus work space for SSIS and SSAS.
This is also why people try to separate out SSIS and SSAS onto different VMs - they all need memory simultaneously.
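The memory-sizing reasoning above can be sketched as a simple estimate. The working-space fractions for SSIS and SSAS below are illustrative assumptions of my own, not a published formula; the point is only that the total exceeds the raw database size:

```python
# Rough memory-sizing sketch: the engine's data plus working space for
# SSIS and SSAS, which all want to do their work in memory at once.
# The 25% working-space fractions are illustrative assumptions.
def estimate_ram_gb(db_size_gb: float,
                    ssis_fraction: float = 0.25,
                    ssas_fraction: float = 0.25) -> float:
    """RAM (GB) to cache the database plus ETL and cube working space."""
    return db_size_gb * (1 + ssis_fraction + ssas_fraction)

print(estimate_ram_gb(200))  # the 200 GB example from the answer
```

This is also the argument for splitting SSIS and SSAS onto separate VMs: each component's working space then comes out of its own memory budget instead of competing on one host.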
Context
StackExchange Database Administrators Q#128914, answer score: 9