Thursday, June 2, 2011

Different types of Dimensions

Different types of Dimensions

Date and time DimensionIt is created in the very beginning of data warehouse project and has usually a form of calendar with the granularity of a single day. It contains data types attributes , full data descriptions attributes and some additional records(at least one) for inapplicable , incorrect data or data that hasn’t appeared yet.
Time dimension is a kind of data dimension but limited in time like: month or year dimension table. It is necessary if we have fact tables subordinated to a specified period of time.
Those two types of dimensions should be created separately!

Big DimensionsWhen we talk about big dimensions we mean such as a customer or product dimension. Loading it may be quite terrifying because of a big number of records , a big number of fields in every record and their multiple sources( it is placed in many various places in enterprise system).
In this situation there is a need to perform three steps to create a single dimension table:
deduplication step which is a part of data cleaning-module
conforming step which is a part of data-conforming module
merge step which is a part of delivering-module
Taking under consideration the customer dimension we have to remove all duplications to create a correct total number of customers , align all attributes with the same content to create a single one value and finally place all attributes in one big dimension record.

Small DimensionsHave a form of little tables with only a few records and columns like a little transaction-type dimension with labels of each type or junk dimension with transactional codes. Those dimensions are usually unique for a one fact table and should not be built across the various facts tables.

Related DimensionsThe dimensions in the database should be, through the ideology, independent. However there is usually a relationship between them.
If this relationship occurs between big dimensions like product and store the combining them into a one is not recommended , so we make them separately depicting these relationship in the sale fact table. This step of dimensional modeling is called demoting the correlation between dimensions into a fact table.
However if this relationship concerns overlapping dimensions the combining them into a one single dimension is then very recommended.

Dimensional role playingThis situation occurs when a single dimension , often a time dimension, is attached multiple times to the one fact table. For example in an accumulating transaction fact table there are many foreign keys for time dimensions like: order data , deliver date , payment date.
How to resolve this problem? We need to built a generic single dimension table and implement those three views.

Sub DimensionsSometimes there is a relationship between two dimensions. Then one is a sub-dimension of the another. For example we have a data dimension table and customer dimension table which has a foreign key in first purchase date related to the data dimension.

Empty DimensionsDegenerate or empty dimension is a dimension derived from a fact table and it doesn’t have its own dimension table (It is stored in a fact table).For example if we have a dimension with only one record, it is unnecessary and very inconvenient to create it separately. Then we should locate it in a fact table as a degenerate dimension without the foreign key.

Slowly Changing Dimension (SCD)If the data in the dimension tables is slowly changing during the time we say that it is a slowly changing dimension.
There are three major ways of loading and handling these dimensions:
Type one-overwrites old data with a new data and doesn’t keep the history. It is very often used when we want to correct inappropriate data like wrong address or misspelled name.
Type two-tracks the history and create new records in the dimensional table with separate keys. It is unlimited (creates new records each time a change is made) and the most common.
Type three-also tracks a history but create new columns , not records. It is limited by number of columns we design.
Hybrid type-it is a combination of those three mentioned types. For example some fields in the dimension present the type 1 and then there are overwritten while other may present type 2 or 3.
In slowly changing dimension the relationship between the primary (surrogate)key and natural key is one to many However sometimes those changes may be overlooked and we have to correct bad data. Correcting a type 1 fields is very simple-just overwriting but others may be quite difficult.

Bridge TableThe bridge table is also called a helper table or reference table. It is used when there is a many to many relationship between fact and dimension table and it contains only the keys. It can be explained with a following example.
Lets say there are two tables: an employee table and a department table and we create a bridge table. It will contain information which employee belongs to which department. It makes our database more clearly.
The bridge table is also used for another purpose-to flatten out the hierarchy in the dimension table.

No comments:

Post a Comment