patternMinor
"Measure Type Dimensions" in "Accumulating Snapshot" fact table
Viewed 0 times
factaccumulatingtypesnapshotdimensionsmeasuretable
Problem
I have an accumulating snapshot fact table that tracks the entry and exit of containers in a terminal.
The containers can enter and exit in 3 different ways, so I thought to create a specific dimension table that lists these 3 possible ways (train, vessel or truck).
Then I read this article which basically says that this technique is wrong, but I can't understand why.
First article:
Sometimes when a fact table has a long list of facts that is sparsely populated in any individual row, it is tempting to create a measure
type dimension that collapses the fact table row down to a single
generic fact identified by the measure type dimension. We generally do
not recommend this approach. Although it removes all the empty fact
columns, it multiplies the size of the fact table by the average
number of occupied columns in each row, and it makes intra-column
computations much more difficult. This technique is acceptable when
the number of potential facts is extreme (in the hundreds), but less
than a handful would be applicable to any given fact table row.
I understand that if a "Measure Type Dimension" is implemented for a transaction fact table it can create problems like this other article says, but I can't see any downside if used for an accumulating snapshot fact.
Second article: (some downsides of implementing a "Measure Type Dimension")
measures, we can’t add them up.
point of view, this design is a rubbish.
Response to Mark Storey-Smith's answer
Very nice approach, i would have never thought about that.
Another thing: every entry and exit of a vehicle that brings container into the terminal has a u
The containers can enter and exit in 3 different ways, so I thought to create a specific dimension table that lists these 3 possible ways (train, vessel or truck).
Then I read this article which basically says that this technique is wrong, but I can't understand why.
First article:
Sometimes when a fact table has a long list of facts that is sparsely populated in any individual row, it is tempting to create a measure
type dimension that collapses the fact table row down to a single
generic fact identified by the measure type dimension. We generally do
not recommend this approach. Although it removes all the empty fact
columns, it multiplies the size of the fact table by the average
number of occupied columns in each row, and it makes intra-column
computations much more difficult. This technique is acceptable when
the number of potential facts is extreme (in the hundreds), but less
than a handful would be applicable to any given fact table row.
I understand that if a "Measure Type Dimension" is implemented for a transaction fact table it can create problems like this other article says, but I can't see any downside if used for an accumulating snapshot fact.
Second article: (some downsides of implementing a "Measure Type Dimension")
- [...] If we go with a "Measure Type Dimension" we will lose this analytic ability. If one measure is not compatible with the other
measures, we can’t add them up.
- [...] The more number of passes our SQL need to run to produce a report, the slower the report.
- [...] On the BI tool, if you don’t put the measure type filter, you are risking the user getting “rubbish information”. From usability
point of view, this design is a rubbish.
Response to Mark Storey-Smith's answer
Very nice approach, i would have never thought about that.
Another thing: every entry and exit of a vehicle that brings container into the terminal has a u
Solution
I believe the guidance is referring to a wide fact table where the majority of measure values are null:
The suggestion is that some people will see all the nulls and decide to do this instead:
Not good.
In your scenario I think I'd be looking at something like this, which is very different to the scenario described in the articles you referenced.
To the additional questions...
I would add
I would add 3 new dimensions for
CREATE TABLE dbo.SparseFact
(
Dim1Key INT NOT NULL
, Dim2Key INT NOT NULL
, Dim3Key INT NOT NULL
, Dim4Key INT NOT NULL
, Dim5Key INT NOT NULL
, Value1 INT NULL
, Value2 INT NULL
, Value3 INT NULL
, Value4 INT NULL
, Value5 INT NULL
, Value6 INT NULL
, Value7 INT NULL
, Value8 INT NULL
..
, Value101 INT NULL
, Value102 INT NULL
, Value103 INT NULL
);The suggestion is that some people will see all the nulls and decide to do this instead:
CREATE TABLE dbo.DontDoThisFact
(
Dim1Key INT NOT NULL
, Dim2Key INT NOT NULL
, Dim3Key INT NOT NULL
, Dim4Key INT NOT NULL
, Dim5Key INT NOT NULL
, MeasureTypeKey INT NOT NULL
, Value INT NOT NULL
);Not good.
In your scenario I think I'd be looking at something like this, which is very different to the scenario described in the articles you referenced.
CREATE TABLE dbo.InventoryFact
(
ContainerKey INT NOT NULL
, TransportTypeKey TINYINT NOT NULL
, EntryDateTime DATETIME NULL
, ExitDateTime DATETIME NULL
);
CREATE TABLE dbo.TransportType
(
TransportTypeKey TINYINT IDENTITY(1,1) NOT NULL
, EntryTransport CHAR(10) NOT NULL
, ExitTransport CHAR(10) NOT NULL
);
INSERT
dbo.TransportType
SELECT
EntryTransport
, ExitTransport
FROM
(
SELECT EntryTransport = 'Train'
UNION
SELECT EntryTransport = 'Truck'
UNION
SELECT EntryTransport = 'Vessel'
UNION
SELECT EntryTransport = 'N/A'
UNION
SELECT EntryTransport = 'Unknown'
) en
CROSS JOIN
(
SELECT ExitTransport = 'Train'
UNION
SELECT ExitTransport = 'Truck'
UNION
SELECT ExitTransport = 'Vessel'
UNION
SELECT ExitTransport = 'N/A'
UNION
SELECT ExitTransport = 'Unknown'
) ex;To the additional questions...
I would add
ExpectedEntryDate, ExpectedExitDate to the Container/InventoryFact. Less certain, without visibility of all the data elements, I would probably put EntryVoyageId and ExitVoyageId in a separate junk dimension together as one row along with any other degenerate data items (identifiers for the truck, train etc).I would add 3 new dimensions for
VesselVoyage, TruckVoyage and TrainVoyage and 6 Voyage keys (inbound/outbound) to this one fact (it's 6 new keys, not 6 additional rows). You then have the option of placing Dock and Tollbooth in the appropriate Voyage dimension. If you keep the generic data in these dimensions (VesselFlag, TruckCapacity) and the specific in a junk dimension (VesselName, VesselMMSI) they won't explode in size.Code Snippets
CREATE TABLE dbo.SparseFact
(
Dim1Key INT NOT NULL
, Dim2Key INT NOT NULL
, Dim3Key INT NOT NULL
, Dim4Key INT NOT NULL
, Dim5Key INT NOT NULL
, Value1 INT NULL
, Value2 INT NULL
, Value3 INT NULL
, Value4 INT NULL
, Value5 INT NULL
, Value6 INT NULL
, Value7 INT NULL
, Value8 INT NULL
..
, Value101 INT NULL
, Value102 INT NULL
, Value103 INT NULL
);CREATE TABLE dbo.DontDoThisFact
(
Dim1Key INT NOT NULL
, Dim2Key INT NOT NULL
, Dim3Key INT NOT NULL
, Dim4Key INT NOT NULL
, Dim5Key INT NOT NULL
, MeasureTypeKey INT NOT NULL
, Value INT NOT NULL
);CREATE TABLE dbo.InventoryFact
(
ContainerKey INT NOT NULL
, TransportTypeKey TINYINT NOT NULL
, EntryDateTime DATETIME NULL
, ExitDateTime DATETIME NULL
);
CREATE TABLE dbo.TransportType
(
TransportTypeKey TINYINT IDENTITY(1,1) NOT NULL
, EntryTransport CHAR(10) NOT NULL
, ExitTransport CHAR(10) NOT NULL
);
INSERT
dbo.TransportType
SELECT
EntryTransport
, ExitTransport
FROM
(
SELECT EntryTransport = 'Train'
UNION
SELECT EntryTransport = 'Truck'
UNION
SELECT EntryTransport = 'Vessel'
UNION
SELECT EntryTransport = 'N/A'
UNION
SELECT EntryTransport = 'Unknown'
) en
CROSS JOIN
(
SELECT ExitTransport = 'Train'
UNION
SELECT ExitTransport = 'Truck'
UNION
SELECT ExitTransport = 'Vessel'
UNION
SELECT ExitTransport = 'N/A'
UNION
SELECT ExitTransport = 'Unknown'
) ex;Context
StackExchange Database Administrators Q#151547, answer score: 3
Revisions (0)
No revisions yet.