Tuesday, January 25, 2011

DS - Definitions PART-VII

What is a Star Schema?
Star schema is a type of organising the tables such that we can retrieve the result from the database easily and fastly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table which looks like a star, so that it got its name.

What is a general purpose scheduling tool?
The basic purpose of the scheduling tool in a DW Application is to stream line the flow of data from Source To Target at specific time or based on some condition.

How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.

What are modeling tools available in the Market?
These tools are used for Data/dimension modeling

Tool Name                  Company Name
Erwin                           Computer Associates
Embarcadero                Embarcadero Technologies
Rational Rose               IBM Corporation
Power Designer            Sybase Corporation
Oracle Designer            Oracle Corporation

Name some of modeling tools available in the Market?
These tools are used for Data/dimension modeling
1. Oracle Designer
2. ERWin (Entity Relationship for windows)
3. Informatica (Cubes/Dimensions)
4. Embarcadero
5. Power Designer Sybase

What is VLDB?
Answer 1: VLDB stands for Very Large Database.
It is an environment or storage space managed by a relational database management system (RDBMS) consisting of vast quantities of information.

Answer 2: VLDB doesn’t refer to size of database or vast amount of information stored. It refers to the window of opportunity to take back up the database.

Window of opportunity refers to the time of interval and if the DBA was unable to take back up in the specified time then the database was considered as VLDB.

What are Data Marts?
A data mart is a focused subset of a data warehouse that deals with a single area(like different department) of data and is organized for quick analysis

What are the steps to build the data warehouse?
Gathering business requirements
Identifying Sources
Identifying Facts
Defining Dimensions
Define Attributes
Redefine Dimensions & Attributes
Organize Attribute Hierarchy & Define Relationship
Assign Unique Identifiers
Additional conventions: Cardinality/Adding ratios

What is Difference between E-R Modeling and Dimensional Modeling?

Basic diff is E-R modeling will have logical and physical model. Dimensional model will have only physical model.

E-R modeling is used for normalizing the OLTP database design.

Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.


Why fact table is in normal form?
Basically the fact table consists of the Index keys of the dimension / look up tables and the measures. So when ever we have the keys in a table, that itself implies that the table is in the normal form.

What are the advantages data mining over traditional approaches?
Data Mining is used for the estimation of future. For example, if we take a company/business organization, by using the concept of Data Mining, we can predict the future of business interms of Revenue (or) Employees (or) Cutomers (or) Orders etc.

Traditional approches use simple algorithms for estimating the future. But, it does not give accurate results when compared to Data Mining.

What are the vaious ETL tools in the Market?
Various ETL tools used in market are:
Informatica
Data Stage
Oracle Warehouse Bulider
Ab Initio
Data Junction

What is a CUBE in datawarehousing concept?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

What is data validation strategies for data mart validation after loading process?
Data validation is to make sure that the loaded data is accurate and meets the business requriments.
Strategies are different methods followed to meet the validation requriments

What is the datatype of the surrgate key?
Datatype of the surrgate key is either inteeger or numaric or number


What is degenerate dimension table?
Degenerate Dimensions : If a table contains the values, which r neither dimesion nor measures is called degenerate dimensions. Ex : invoice id, empno

What is Dimensional Modelling?
Dimensional Modelling is a design concept used by many data warehouse desginers to build thier datawarehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measuremnets ie, the dimensions on which the facts are calculated.

What are the methodologies of Data Warehousing.?
Every company has methodology of their own. But to name a few SDLC Methodology, AIM methodology are stardadly used. Other methodologies are AMM, World class methodology and many more.

What is a linked cube?
Linked cube in which a sub-set of the data can be analysed into great detail. The linking ensures that the data in the cubes remain consistent.

No comments:

Post a Comment