patternMinor
Handling attributes that are time-variant in a Datamart
Viewed 0 times
handlingvariantaretimeattributesthatdatamart
Problem
I have looked through the entire list of sites, and this is I think the best match. This is not really about database administration, more like database design. Please excuse me and point me to the correct site.
I am designing a database for a rudimentary BI system. At this moment I have hit a wall, which is this (explaining using dummy data):
Suppose my fact table contains this information:
The dimensions and their attributes are:
Now, from this I can easily generate a report like this:
Using a query like:
But my problem comes from the fact that the "club" status of a flyer is a moving target. A flyer who is in Gold today could have been in Silver in October, so I am counting him in the incorrect group here. Thus, I imagine I need a separate fact table like this:
"Club" drops out as an attribute of the original flyer dimension. Instead, a new club dimension emerges.
And then to generate the report I need, I join these two fact tables.
Am I on the right track? Or is there an alternative, simpler solution to this? One alternative I could think of is to include the club in the original fact table, handling it during the ETL process. So the fact becomes:
Please let me know which approach is better, or if there is a third one.
I am designing a database for a rudimentary BI system. At this moment I have hit a wall, which is this (explaining using dummy data):
Suppose my fact table contains this information:
John Doe flew from LAX to ATL on 1 Nov in flight AB-123The dimensions and their attributes are:
- Flyer - name, club
- Airport - city, code
- Date - year, month, date
- Flight - code, std, delay, price
Now, from this I can easily generate a report like this:
Airport --> LAX DFW ORD ATL Total
Gold 50 40 10 25 125
Silver 240 300 95 140 775
Bronze 1000 1500 800 1800 5100
Total 1290 1840 905 1965 6000Using a query like:
select fd.club, ad.code, count(f1.id) from flyer fd, airport ad, fact1 f1
where fd.id = f1.fid and ad.id = f1.aid and month(f1.date) = 10
group by f1.club, ad.code;But my problem comes from the fact that the "club" status of a flyer is a moving target. A flyer who is in Gold today could have been in Silver in October, so I am counting him in the incorrect group here. Thus, I imagine I need a separate fact table like this:
John Doe entered Bronze club on 8/15
John Doe entered Silver club on 10/20
..."Club" drops out as an attribute of the original flyer dimension. Instead, a new club dimension emerges.
And then to generate the report I need, I join these two fact tables.
Am I on the right track? Or is there an alternative, simpler solution to this? One alternative I could think of is to include the club in the original fact table, handling it during the ETL process. So the fact becomes:
John Doe of Silver Club flew from LAX to ATL on 1 Nov in flight AB-123Please let me know which approach is better, or if there is a third one.
Solution
The way to do this is what Kimball called a Type-2 or Type-6 slowly changing dimension.. Essentially, a type-2 SCD has a synthetic dimension key, and a unique key consisting of the natural key of the underlying entity (in this case the flyer) and an 'effective from' date. The synthetic key is joined against the fact table, so you can attach it with a simple equi-join (i.e. you don't have to filter by date range in the query).
All the attributes (e.g. club in this case) are attributes of the flyer. If one of these attributes changes, a new row is created on the dimension recording the new state, effective from the date of the change.
The type-6 is like an ordinary type 2, but has a self-join to the current version of the row. Whenever a new row is created for a given natural key all rows for that natural key are updated with the self-join to the current row. You may or may not need this functionality.
You can query an as-at status by joining the fact tables against the row that was recorded on them - i.e. the state that was current. If you have a type-6 the current status can be queried through the self-join, which can also be materialised on the fact table if desired.
This data will also play nicely with ad-hoc reporting tools and cubes, although implementing complex cube hiererchies on a slowly changing dimension is a bit fiddly (you need to keep placeholders for the natural keys of the hierarchy levels and combinations over time).
A good point to start would be a google search on "type 2 slowly changing dimension"
All the attributes (e.g. club in this case) are attributes of the flyer. If one of these attributes changes, a new row is created on the dimension recording the new state, effective from the date of the change.
The type-6 is like an ordinary type 2, but has a self-join to the current version of the row. Whenever a new row is created for a given natural key all rows for that natural key are updated with the self-join to the current row. You may or may not need this functionality.
You can query an as-at status by joining the fact tables against the row that was recorded on them - i.e. the state that was current. If you have a type-6 the current status can be queried through the self-join, which can also be materialised on the fact table if desired.
This data will also play nicely with ad-hoc reporting tools and cubes, although implementing complex cube hiererchies on a slowly changing dimension is a bit fiddly (you need to keep placeholders for the natural keys of the hierarchy levels and combinations over time).
A good point to start would be a google search on "type 2 slowly changing dimension"
Context
StackExchange Database Administrators Q#8606, answer score: 2
Revisions (0)
No revisions yet.