HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Handling attributes that are time-variant in a Datamart

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
handlingvariantaretimeattributesthatdatamart

Problem

I have looked through the entire list of sites, and this is I think the best match. This is not really about database administration, more like database design. Please excuse me and point me to the correct site.

I am designing a database for a rudimentary BI system. At this moment I have hit a wall, which is this (explaining using dummy data):

Suppose my fact table contains this information:

John Doe flew from LAX to ATL on 1 Nov in flight AB-123


The dimensions and their attributes are:

  • Flyer - name, club



  • Airport - city, code



  • Date - year, month, date



  • Flight - code, std, delay, price



Now, from this I can easily generate a report like this:

Airport --> LAX  DFW  ORD  ATL Total
Gold         50   40   10   25   125
Silver      240  300   95  140   775
Bronze     1000 1500  800 1800  5100
Total      1290 1840  905 1965  6000


Using a query like:

select fd.club, ad.code, count(f1.id) from flyer fd, airport ad, fact1 f1
where fd.id = f1.fid and ad.id = f1.aid and month(f1.date) = 10
group by f1.club, ad.code;


But my problem comes from the fact that the "club" status of a flyer is a moving target. A flyer who is in Gold today could have been in Silver in October, so I am counting him in the incorrect group here. Thus, I imagine I need a separate fact table like this:

John Doe entered Bronze club on 8/15
 John Doe entered Silver club on 10/20
 ...


"Club" drops out as an attribute of the original flyer dimension. Instead, a new club dimension emerges.

And then to generate the report I need, I join these two fact tables.

Am I on the right track? Or is there an alternative, simpler solution to this? One alternative I could think of is to include the club in the original fact table, handling it during the ETL process. So the fact becomes:

John Doe of Silver Club flew from LAX to ATL on 1 Nov in flight AB-123


Please let me know which approach is better, or if there is a third one.

Solution

The way to do this is what Kimball called a Type-2 or Type-6 slowly changing dimension.. Essentially, a type-2 SCD has a synthetic dimension key, and a unique key consisting of the natural key of the underlying entity (in this case the flyer) and an 'effective from' date. The synthetic key is joined against the fact table, so you can attach it with a simple equi-join (i.e. you don't have to filter by date range in the query).

All the attributes (e.g. club in this case) are attributes of the flyer. If one of these attributes changes, a new row is created on the dimension recording the new state, effective from the date of the change.

The type-6 is like an ordinary type 2, but has a self-join to the current version of the row. Whenever a new row is created for a given natural key all rows for that natural key are updated with the self-join to the current row. You may or may not need this functionality.

You can query an as-at status by joining the fact tables against the row that was recorded on them - i.e. the state that was current. If you have a type-6 the current status can be queried through the self-join, which can also be materialised on the fact table if desired.

This data will also play nicely with ad-hoc reporting tools and cubes, although implementing complex cube hiererchies on a slowly changing dimension is a bit fiddly (you need to keep placeholders for the natural keys of the hierarchy levels and combinations over time).

A good point to start would be a google search on "type 2 slowly changing dimension"

Context

StackExchange Database Administrators Q#8606, answer score: 2

Revisions (0)

No revisions yet.