Tuesday, January 25, 2011

DS - Definitions PART-V

What is Normalization, First Normal Form, Second Normal Form , Third Normal Form?

1. Normalization is process for assigning attributes to entities–Reduces data redundancies–Helps eliminate data anomalies–Produces controlled redundancies to link tables

2. Normalization is the analysis of functional dependency between attributes / data items of user views? It reduces a complex user view to a set of small and stable subgroups of fields / relations

1NF: Repeating groups must be eliminated, Dependencies can be identified, All key attributes defined, No repeating groups in table

2NF: The Table is already in1NF, Includes no partial dependencies–No attribute dependent on a portion of primary key, Still possible to exhibit transitive dependency, Attributes may be functionally dependent on non-key attributes

3NF: The Table is already in 2NF, Contains no transitive dependencies

What is Fact table?
Fact Table contains the measurements or metrics or facts of business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the Fact table. Fact table also contains the foreign keys for the dimension tables.

What are conformed dimensions?
Answer1: Conformed dimensions mean the exact same thing with every possible fact table to which they are joined Ex: Date Dimensions is connected all facts like Sales facts, Inventory facts. . etc

Answer2: Conformed dimensions are dimensions which are common to the cubes. (cubes are the schemas contains facts and dimension tables)
Consider Cube-1 contains F1, D1, D2, D3 and Cube-2 contains F2, D1, D2, D4 are the Facts and Dimensions here D1, D2 are the Conformed Dimensions

What are the Different methods of loading Dimension tables?
Conventional Load: Before loading the data, all the Table constraints will be checked against the data.
Direct load:(Faster Loading)
All the Constraints will be disabled. Data will be loaded directly. Later the data will be checked against the table constraints and the bad data won't be indexed.

What is confirmed dimensions?
Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly

What are Data Marts?
Data Marts are designed to help manager make strategic decisions about their business.
Data Marts are subset of the corporate-wide data that is of value to a specific group of users.

There are two types of Data Marts:

1. Independent data marts – sources from data captured form OLTP system, external providers or from data generated locally within a particular department or geographic area.
2. Dependent data mart – sources directly form enterprise data warehouses.

What is a level of Granularity of a fact table?
Level of granularity means level of detail that you put into the fact table in a data warehouse. For example: Based on design you can decide to put the sales data in each transaction. Now, level of granularity would mean what detail are you willing to put for each transactional fact. Product sales with respect to each minute or you want to aggregate it upto minute and put that data.

How are the Dimension tables designed?
Most dimension tables are designed using Normalization principles upto 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next section).

What are non-additive facts?
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.

To my knowledge, SQLServer does not support bitmap indexes. Only Oracle supports bitmaps.

What Snow Flake Schema?
Snowflake Schema, each dimension has a primary dimension table, to which one or more additional dimensions can join. The primary dimension table is the only table that can join to the fact table.

What is real time data-warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing.
Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it.

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.


What are slowly changing dimensions?
SCD stands for Slowly changing dimensions. Slowly changing dimensions are of three types

SCD1: only maintained updated values.
Ex: a customer address modified we update existing record with new address.

SCD2: maintaining historical information and current information by using
A) Effective Date
B) Versions
C) Flags or combination of these

scd3: by adding new columns to target table we maintain historical information and current information

What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?

Snapshot facts are semi-additive, while we maintain aggregated facts we go for semi-additive.
EX: Average daily balance

A fact table without numeric fact columns is called factless fact table.
Ex: Promotion Facts

While maintain the promotion values of the transaction (ex: product samples) because this table doesn’t contain any measures.

Differences between star and snowflake schemas?
Star schema - all dimensions will be linked directly with a fat table.
Snow schema - dimensions maybe interlinked or may have one-to-many relationship with other tables.

No comments:

Post a Comment