patternsqlMinor
SQL Server Data Warehouse stored procedure location
Viewed 0 times
storedsqlwarehouseprocedureserverdatalocation
Problem
We are building a Kimball data warehouse. We have a Staging Database, and Final Datawarehouse.
We have a stored procedure which creates the table CustomerTransaction.
Where is the best location to place the stored procedure? In the Staging Database or the Datawarehouse? Or what is the positives/negatives of each location? I just started creating data warehouses and trying to learn.
We have a stored procedure which creates the table CustomerTransaction.
Where is the best location to place the stored procedure? In the Staging Database or the Datawarehouse? Or what is the positives/negatives of each location? I just started creating data warehouses and trying to learn.
create procedure dbo.FactCustomerTransactionImport -- should this be in StagingDB or datawarehouse DB?
as
insert into DWFinal.dbo.FactCustomerTransaction
(
CustomerId,
Quantity,
Price,
Amount
)
select
CustomerId,
Quantity,
Price,
Quantity * Price as Amount
from StagingDB.dbo.CustomerTransactionSolution
This is a matter of taste. The main reason I see to break things up at all is for security, and secondarily for backups.
I think of ETL activity, including loading the final star schema tables, as an aspect of the
Here's how I typically break up a data warehouse. These should be distinct physical databases, and may be separate VMs or physical machines if the volume of data requires it.
In practice, you might need to grant some power users access to specific source databases, if there's data they need that doesn't rate being incorporated into the star schema. In such a case, you may also need to grant access to the synthetic key tables in
I think of ETL activity, including loading the final star schema tables, as an aspect of the
Staging layer. That's where all the "heavy lifting" happens in my warehouses.Here's how I typically break up a data warehouse. These should be distinct physical databases, and may be separate VMs or physical machines if the volume of data requires it.
- A separate database for each important data source (CRM, OLTP database, telephony system, etc.). Here, I store a straightforward copy of data from the source system, with a minimum of modification. E.g., I will record
DateCapturedandDateModifiedfor each record, but perform no cleanup of data values or types. The idea is to grab data as quickly as possible, to minimize load on the source system.
- In principle, these databases can be purged at will and repopulated from the source systems with no change to the rest of the warehouse.
- Using separate databases for each source lets users (i.e., service accounts) have limited access. My
Service_SalesforceReaderuser has read and write privs in theSalesforcedatabase, and nowhere else.
- Writing to these databases typically happens via dedicated ETL applications, with a few SQL procedures supporting them.
- A
Stagingdatabase for the intermediate steps; the "T" in "ETL". This includes lookup tables for data cleaning, tables for allocating synthetic keys, and log tables for ETL processes.
- Data here is mostly static which can be recreated from source control (e.g., country codes) or transformations of the source data, which can be recreated from those systems, albeit perhaps with different synthetic keys being created. Only people on the warehouse team should need access here.
- This database has 90% of the SQL procedures. They format and pivot the data into the star schema layout. There's one objective reason to keep the ETL logic in Staging: security. Only warehouse admins need access to those procedures, whereas many people may need access to the Warehouse layer.
- A
Warehousedatabase, with tidy fact tables and dimension tables. This layer may have views to help with specific consumers. E.g., if you have a visualization tool that doesn't play nice with joins, there might be a view (possibly materialized) for each star, with all dimensions joined in.
- The only sprocs I have in this layer, in my current warehouse, are for logging, and I don't believe they're actually used.
In practice, you might need to grant some power users access to specific source databases, if there's data they need that doesn't rate being incorporated into the star schema. In such a case, you may also need to grant access to the synthetic key tables in
Staging that link them together, so they can traverse from (for example) a CustomerSK integer to a Salesforce AccountID string. Doing so makes it difficult for you to change anything without breaking people's processes, so this should be a last resort.Context
StackExchange Database Administrators Q#220766, answer score: 4
Revisions (0)
No revisions yet.