Summary: in this article, we will examine the dimension table concept, surrogate keys in dimension tables, and a brief overview of slowly changing dimensions.
In data warehousing, a dimension table is one of the companion tables to a fact table in the star schema. Different from a fact table that contains measures or business facts, a dimension table contains the textual descriptor of the business. The fields of the dimension table are designed to satisfy these two important requirements:
- Query constraining / grouping / filtering.
- Report labeling
Dimension table example
In the schema below we have 3 dimension tables Dim_Date
, Dim_Store
and Dim_Product
surrounding the fact table Fact_Sales
.
Surrogate keys in dimension tables
It is critical that the primary key’s value of a dimension table remains unchanged. And it is highly recommended that all dimension tables use surrogate keys as primary keys.
Surrogate keys are key generated and managed inside the data warehouse rather than keys extracted from data source systems.
There are several advantages of using surrogate keys in dimension tables:
- Performance – join processing between dimension tables and fact table is much more efficient by using a single field surrogate key.
- Integration – in terms of data acquisition, the surrogate key allows integrating data from multiple data sources even if they lack consistent source keys.
- Manage version of data – keep track of changes in dimension field values in the dimension table.
It is so important that the dimension tables should be designed in such a way that they can be shared between multiple data marts and cubes within a data warehouse. This ensures that the data warehouse provides consistent information for similar queries. And surrogate key must be used as the primary keys of dimension tables to enable the dimension tables to be shared easier.
Slowly changing dimension
The attributes of a given record in the dimension table could be changed e.g. product description, shipping address. This phenomenon is known as slowly changing dimension and there are corresponding techniques to deal with each type of slowly changing dimension effectively:
- Type 1 is used when the history of the data is not important. When data in the data source changes, the corresponding dimension attribute is overwritten.
- Type 2 is used when the change of data in the data source is important and you want to preserve the historic context of facts corresponding to the changing data. When data in the data source changes, a new row is inserted into the dimension table. The previous row remains unchanged.
- Type 3 – this happens when you want to learn about every fact before and after the attribute changes. To deal with this, you can introduce a new attribute to the existing row and update the value to both fields.
We have examined how the dimension tables fit into star schema and their relationships with the fact table. We also discussed surrogate keys and slowly changing dimension concepts applied to the dimension table.