There have been entire books and methodologies dedicated to dimensional modeling. This is not intended to expand or endorse any particular methodology, but to give a brief overview of dimensional modeling techniques for those who are not familiar with them. Please refer to Dimensional Modeling Basics Part 1, as examples will be expanded upon in the following sections. This installment will focus briefly on “Levels” in dimensions, but more on practical guidance on when to use these dimensional modeling techniques (as well as when not to use them).
Dimensional levels were briefly introduced in “Part 1” of this post. Levels can be literal, physical restrictions, or simply logical ways of grouping attributes. For example, consider the location dimension from “Part 1”. In this dimension, there are physical levels. “Springfield” can only be uniquely identified by state. “Springfield” Missouri is definitely different from “Springfield” in Massachusetts.
On the other hand, the “Levels” are not necessarily fixed in the example of a Type 2 slowly changing dimension such as product. It has “Type” as a logical level, but this type could change over time. Perhaps “Jetta” is only offered as a “Sedan” in some years. Two doors are removed and it is offered as a “Coupe” and as a “Sedan” in others. This would be reflected accurately in dimensions and facts, but in terms of “levels”, the distinction is mostly logical. The “Jetta” no longer fits into one logical level, as it can be considered a “Coupe” or a “Sedan”, perhaps simultaneously. This does not invalidate a logical level. If a user queries on all “Coupe”, they will get “Jetta” for the correct times when it is provided as a “Coupe”. It will also appear as a “Sedan” during appropriate timeframes. The ETL should determine when it is offered as a “Coupe” as opposed to a “Sedan”.
The previous statement brings up an obvious point. At the transactional level of detail, a VIN would be required to uniquely identify a vehicle. Many details would need to be captured at the operational level, including maintenance records, odometer readings, etc. Furthermore, each location has fixed costs, such as mortgage/lease/rent/utility costs that apply to a location but do not apply directly to the sale of an individual vehicle. Determination of costs/profitability was excluded from the example, but these “Measures” could be contained in the same “Sales Fact” or in different facts relating specifically to costs.
Dimensional modeling works well with aggregates and analyzing data at the aggregate level, as pointed out in “Part 1”. If VIN was added as the lowest level of the product dimension, for example, what would be aggregated? Reports collected at the lowest level of detail (such as a specific car) are operational reports by definition.
Far too often, dimensional modeling techniques are applied to operational data. For example, if a report contains a customer’s name, address or phone number, it is operational by definition. Dimensional modeling works well when facts are filtered based on dimensional attributes.
Would it be meaningful to find out how many people with the last name of “Smith” living on “Main Street” bought a certain model of a car? What would one do with that information? “Goleash” on “Main Street” has bought far fewer cars. So, “Goleash” is a bad customer and “Smith” is a good one? Do we need to market more heavily to “Smith” on “Main Street”?
Clearly, these are silly questions that dimensional modeling was not designed to answer. Operational reports, on the other hand, may need to provide these details for call lists, government reporting requirements, etc. Modeling this detail at the dimensional level makes no sense from a performance perspective. Filtering a fact and joining two dimensions to provide reporting detail negates dimensional modeling performance gains. Filtering dimensions to limit the fact rows returned is the intent. Depending on the physical implementation, this results in star transformation plans, bitmap filtering plans, etc. Joining facts (no matter how filtered) to dimensions normally results in hash or nested loops joins (depending on the vendor and the number of rows in the fact and dimensions).
There are additional benefits of dimensional modeling in a relational database context. Almost all BI tools can recognize a properly designed star schema, so reports are easy to design and will perform better than those generated against an operational model. Also, because a physical dimensional model has a smaller size, it is a better candidate for in-memory consumption, including non-relational models involving external flat-file cubes, in-memory grids, etc.
Performance and security of dimensional modeling can be further enhanced by physical partitioning (usually based on the time dimension) and by logical partitioning. Security and logical partitioning would be based on the location dimension, from “Part 1”, along with a previously undefined “Manager” definition. Using a combination of such dimensions would limit a user’s view to “levels” of dimensions and corresponding facts. A specific location manager could only view data for the location, while a regional manager could only view data for the locations within their region, while the COO could view data for all locations. This is just one example of how dimensional modeling fits well with the appropriate physical and logical partitioning strategies. For information regarding physical and logical partitioning strategies, please see my previous posts (and the references provided in them):
Click Here For Dimensional Modeling Basics (Part 1)
For Additional Blog Post please check out NVISIA's Enterprise Data Management Page