patternMinor
Does surrogate key assignment for a fact table require that the source data has natural keys?
Viewed 0 times
thefactsourcehasassignmentkeysthatforsurrogatedoes
Problem
Let's say I have a simple OLTP database with orders, products, and customers:
And from it, I am building a data mart with an orders fact table, product dimension, customer dimension, and date dimension:
When loading the orders table into fact_orders (let's say I was using an SSIS Lookup Transformation to assign the surrogate keys), does that mean that the source of data for the orders would also need to have the natural "foreign key" values that were associated with the order in the OLTP system?
In other words, would the data that's being loaded come from a query like this?
And from it, I am building a data mart with an orders fact table, product dimension, customer dimension, and date dimension:
When loading the orders table into fact_orders (let's say I was using an SSIS Lookup Transformation to assign the surrogate keys), does that mean that the source of data for the orders would also need to have the natural "foreign key" values that were associated with the order in the OLTP system?
In other words, would the data that's being loaded come from a query like this?
SELECT
order_date, -- needed to get date surrogate key
customer_name, -- needed to get customer surrogate key
product_name, -- needed to get product surrogate key
order_number, -- denegenerate dimension,
qty_ordered AS order_qty, -- measure
total_amount AS order_amount -- measure
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
INNER JOIN products p
ON o.product_id = p.product_idSolution
I'm not sure that you need "natural" keys, but you probably do need to maintain a key mapping of sorts. So you need to understand what relationships map between your source and target systems, identify the keys for those relationships and build your key mappings from there.
I had a previous question on this called "What is the best practice for mapping from natural keys to integer-based keys? (ETL)".
EDIT: So far I am seeing at least three, if not four mappings.
And lastly, I see the order_id as your key to the fact table. So I
would go
In my case I renamed the fields for the mart with dim_field_id because
I didn't want name collision within my tables or confusion as to which
Id they pointed to. Your ETL would have to know that
CustomersToDim_Customers.dim_customer_id really maps to
Dim_Customers.customer_id and that
CustomersToDim_Customers.customer_id really maps to
Customers.customer_id.
I would also be half-inclined to include the order_number in the
OrdersToFactOrders mapping table, but that is because I like to have
tracking data for audit purposes. Makes my life easier. But, based on
what you told me, order_number and order_id are one-to-one so the
inclusion of order_number would be redundant then and only necessary
if you have a perfectionist paranoia to make sure your data is correct
on both sides (I really like to make sure that A on side A and B on
side B are really correct after the ETL is done.).
I had a previous question on this called "What is the best practice for mapping from natural keys to integer-based keys? (ETL)".
EDIT: So far I am seeing at least three, if not four mappings.
CustomersToDim_Customers (customer_id, dim_customer_id)
ProductsToDim_Products (product_id, dim_product_id)
OrderDatesToDim_Date (order_date, date_id) or (map_id,order_date,date_id) if you want to use a key to map.And lastly, I see the order_id as your key to the fact table. So I
would go
OrdersToFactOrders (order_id,dim_date_id,dim_customer_id,dim_product_id)In my case I renamed the fields for the mart with dim_field_id because
I didn't want name collision within my tables or confusion as to which
Id they pointed to. Your ETL would have to know that
CustomersToDim_Customers.dim_customer_id really maps to
Dim_Customers.customer_id and that
CustomersToDim_Customers.customer_id really maps to
Customers.customer_id.
I would also be half-inclined to include the order_number in the
OrdersToFactOrders mapping table, but that is because I like to have
tracking data for audit purposes. Makes my life easier. But, based on
what you told me, order_number and order_id are one-to-one so the
inclusion of order_number would be redundant then and only necessary
if you have a perfectionist paranoia to make sure your data is correct
on both sides (I really like to make sure that A on side A and B on
side B are really correct after the ETL is done.).
Code Snippets
CustomersToDim_Customers (customer_id, dim_customer_id)
ProductsToDim_Products (product_id, dim_product_id)
OrderDatesToDim_Date (order_date, date_id) or (map_id,order_date,date_id) if you want to use a key to map.OrdersToFactOrders (order_id,dim_date_id,dim_customer_id,dim_product_id)Context
StackExchange Database Administrators Q#23365, answer score: 3
Revisions (0)
No revisions yet.