Slow Changing Dimensions: Three is Just Not Enough

nvisia is an award-winning software development partner driving competitive edge for clients.
When I put on my data modeler hat, I'm always thinking of the appropriate dimension to use when building out star schemas. For the most part, you have one of three common choices; Type 1, 2 or 3. I think we are all familiar with the definition of these types but let's just quickly review. 

As you know, Type 1 is a Slowing Changing Dimension that overwrites the old data with new data. A product dimension with changing prices will always have the most current price. To track history, that is where you can utilize the Type 2 method. Historical records can be kept that typically includes effective dates and sometimes a current flag column. The Type 3 SCD has both the current attribute value as well as the last attribute value in a single record. This method limits historical records and preserves history based on columns as opposed to rows. 

There is also a Type 0 method which is debatable as far as a slowly changing dimension. The value is always the original value inserted and changes are never performed. A date dimension is an example of type 0. 

Ralph Kimball defines Type 4 as a slowly changing dimension used when a group of attributes in a dimension rapidly changes and is split off to a mini-dimension. Also called rapidly changing monster dimension. This method is advantageous for high volatile or frequently used attributes in very large size dimension. So if you know you have attributes that fit this category, split it off to its own physical mini-dimension table. The surrogate keys of both tables are capture as foreign keys to the fact table. An example is separating age band or income level attributes from the Customer base dimension and including in the mini-dimension table. 

So these are the conventional types mostly used for dimension modeling. In the world of hybrid cars and hybrid golf clubs, there are also hybrid SCDs based primarily on the aforementioned methods. Each having its advantages and disadvantages. 

Type 5 is a hybrid of 4 and 1 basically adding a Type 1 dimension table on top of the Type 4 design of base and mini dimensions. The Type 1 dim table is joined directly to the base dim without having to join directly to the fact table. This will hold current state attributes such as current age band or current income level for a customer. This constrains the ETL team to update this Type 1 table as opposed to the whole base table on any changes. 

So now let's get really crazy. Are you ready? Let's put a type 3 attribute on top of a type 2 dimension and update it as a type 1. What do you get? You guessed it, Type 6 (3 + 2 + 1). One example that Kimball provided was tracking historical departments and current departments in a Product dimension. The Product dimension in itself is a type 2 dimension and historical data is preserved with historical records. The ETL strategy is standard processing for type 2. The current department field is the type 3 attribute and will only contain the current value or type 1. The ETL strategy is now different for this field. The ETL will need to update all historical rows for that product with the current department. 

Get it? Got it? Good. Stay with me now as we've come to the last SCD type. 

Type 7 is a different flavor of Type 6 providing similar functionality. The difference being to pull out the type 3 attribute (current department) and putting in its own separate dimension. This current dimension table will have a durable supernatural key meaning the key ID will never change for each given product. The ETL strategy would be to update this table for any changes to capture current values. The fact table will contain dual foreign keys; surrogate key linked to the type 2 dimension (Product table) and the durable supernatural key of the current dimension table. 

So in a nutshell, these are the Slowly Changing Dimension types. Simple yet complex but never restricted to only the three. 

Related Articles