Tuesday, January 11, 2011

DS - Definitions PART-I

What are Aggregate tables?
Aggregate table contains the summary of existing warehouse data which is grouped to certain levels of dimensions.Retrieving the required data from the actual table, which have millions of records will take more time and also affects the server performance.To avoid this we can aggregate the table to certain required level and can use it.This tables reduces the load in the database server and increases the performance of the query and can retrieve the result very fastly.

What is Dimensional Modelling? Why is it important ?
Dimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.

Why is Data Modeling Important?
Data modeling is probably the most labor intensive and time consuming part of the development process. Why bother especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints.
The goal of the data model is to make sure that the all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language , it can be reviewed and verified as correct by the end-users.
The data model is also detailed enough to be used by the database developers to use as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long-term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.

What is data mining?
Data mining is a process of extracting hidden trends within a datawarehouse. For example an insurance dataware house can be used to mine data for the most high risk people to insure in a certain geographial area.

What is ODS?
1. ODS means Operational Data Store.
2. A collection of operation or bases data that is extracted from operation databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as there operation databases.

What is a dimension table?
A dimensional table is a collection of hierarchies and categories along which the user can drill down and drill up. it contains only the textual attributes.

What is a lookup table?
A lookUp table is the one which is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based upon the primary key of the target, it just updates the table by allowing only new records or updated records based on the lookup condition.

Why should you put your data warehouse on a different system than your OLTP system?
Answer1: A OLTP system is basically " data oriented " (ER model) and not " Subject oriented "(Dimensional Model) .That is why we design a separate system that will have a subject oriented OLAP system...
Moreover if a complex querry is fired on a OLTP system will cause a heavy overhead on the OLTP server that will affect the daytoday business directly.
Answer2: The loading of a warehouse will likely consume a lot of machine resources. Additionally, users may create querries or reports that are very resource intensive because of the potentially large amount of data available. Such loads and resource needs will conflict with the needs of the OLTP systems for resources and will negatively impact those production systems.

What is ETL?
ETL stands for extraction, transformation and loading.
ETL provide developers with an interface for designing source-to-target mappings, ransformation and job control parameter.
· Extraction
Take data from an external source and move it to the warehouse pre-processor database.
· Transformation
Transform data task allows point-to-point generating, modifying and transforming data.
· Loading
Load data task adds records to a database table in a warehouse.
What does level of Granularity of a fact table signify?
Granularity
The first step in designing a fact table is to determine the granularity of the fact table. By granularity, we mean the lowest level of information that will be stored in the fact table. This constitutes two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension the information will be kept.
The determining factors usually goes back to the requirements
What is Normalization, First Normal Form, Second Normal Form , Third Normal Form?
1.Normalization is process for assigning attributes to entities–Reducesdata redundancies–Helps eliminate data anomalies–Produces controlledredundancies to link tables
2.Normalization is the analysis offunctional dependency between attributes / data items of userviews?It reduces a complex user view to a set of small andstable subgroups of fields / relations
1NF:Repeating groups must beeliminated, Dependencies can be identified, All key attributesdefined,No repeating groups in table
2NF: The Table is already in1NF,Includes no partial dependencies–No attribute dependent on a portionof primary key, Still possible to exhibit transitivedependency,Attributes may be functionally dependent on non-keyattributes
3NF: The Table is already in 2NF, Contains no transitivedependencies

What is Fact table?
Fact Table contains the measurements or metrics or facts of business process. If your business process is "Sales" , then a measurement of this business process such as "monthly sales number" is captured in the Fact table. Fact table also contains the foriegn keys for the dimension tables.

What are conformed dimensions?
Answer1: Conformed dimensions mean the exact same thing with every possible fact table to which they are joined Ex:Date Dimensions is connected all facts like Sales facts,Inventory facts..etc

Answer2: Conformed dimentions are dimensions which are common to the cubes.(cubes are the schemas contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are the Facts and Dimensions here D1,D2 are the Conformed Dimensions

What are the Different methods of loading Dimension tables?
Conventional Load: Before loading the data, all the Table constraints will be checked against the data.
Direct load:(Faster Loading)
All the Constraints will be disabled. Data will be loaded directly.Later the data will be checked against the table constraints and the bad data won't be indexed.

What is confirmed fact?
Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly

What are Data Marts?
Data Marts are designed to help manager make strategic decisions about their business.
Data Marts are subset of the corporate-wide data that is of value to a specific group of users.
There are two types of Data Marts:
1.Independent data marts – sources from data captured form OLTP system, external providers or from data generated locally within a particular department or geographic area.
2.Dependent data mart – sources directly form enterprise data warehouses.

What is a level of Granularity of a fact table?
Level of granularity means level of detail that you put into the fact table in a data warehouse. For example: Based on design you can decide to put the sales data in each transaction. Now, level of granularity would mean what detail are you willing to put for each transactional fact. Product sales with respect to each minute or you want to aggregate it upto minute and put that data.

How are the Dimension tables designed?
Most dimension tables are designed using Normalization principles upto 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next section).

No comments:

Post a Comment