Tuesday, January 11, 2011

DS - Definitions PART-III

What is the main differnce between schema in RDBMS and schemas in DataWarehouse....?

RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot solve extract and complex problems
* Poorly modelled

DWH Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model

What is hybrid slowly changing dimension?
Hybrid SCDs are combination of both SCD 1 and SCD 2.
It may happen that in a table, some columns are important and we need to track changes for them i.e capture the historical data for them whereas in some columns even if the data changes, we don't care.
For such tables we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.

What are the different architecture of datawarehouse?
There are two main things
1. Top down - (Bill Inmon)
2.Bottom up - (Ralph kimbol)

1.what is incremintal loading?
2.what is batch processing?
3.what is crass reference table?
4.what is aggregate fact table?

Incremental loading means loading the ongoing changes in the OLTP.
Aggregate table contains the [measure] values, aggregated /grouped/summed up to some level of hirarchy.

what is junk dimension? what is the difference between junk dimension and degenerated dimension?
Junk dimension: Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension.
Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table inorder eliminate unneccessary joins while retrieving order information..

What are the possible data marts in Retail sales.?
Product information, sales information

What is the definition of normalized and denormalized view and what are the differences between them?
Normalization is the process of removing redundancies.
Denormalization is the process of allowing redundancies.

What is meant by metadata in context of a Datawarehouse and how it is important?
Meta data is the data about data; Business Analyst or data modeler usually capture information about data - the source (where and how the data is originated), nature of data (char, varchar, nullable, existance, valid values etc) and behavior of data (how it is modified / derived and the life cycle ) in data dictionary a.k.a metadata. Metadata is also presented at the Datamart level, subsets, fact and dimensions, ODS etc. For a DW user, metadata provides vital information for analysis / DSS.

Differences between star and snowflake schemas?
Star schema A single fact table with N number of Dimension
Snowflake schema Any dimensions with extended dimensions are know as snowflake schema

Difference between Snow flake and Star Schema. What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?
Star schema contains the dimesion tables mapped around one or more fact tables.
It is a denormalised model.
No need to use complicated joins.
Queries results fastly.
Snowflake schema
It is the normalised form of Star schema.
contains indepth joins ,bcas the tbales r splitted in to many pieces.We can easily do modification directly in the tables.
We hav to use comlicated joins ,since we hav more tables .
There will be some delay in processing the Query .

What is VLDB?
The perception of what constitutes a VLDB continues to grow. A one terabyte database would normally be considered to be a VLDB.

What's the data types present in BO?n what happens if we implement view in the designer n report
Three different data types: Dimensions,Measure and Detail.
View is nothing but an alias and it can be used to resolve the loops in the universe.

Can a dimension table contains numeric values?
Yes.But those datatype will be char (only the values can numeric/char)

What is the difference between view and materialized view?
View - store the SQL statement in the database and let you use it as a table. Everytime you access the view, the SQL statement executes.
Materialized view - stores the results of the SQL in table form in the database. SQL statement only executes once and after that everytime you run the query, the stored result set is used. Pros include quick query results.

What is surrogate key ? where we use it expalin with examples
Surrogate Key is a substitution for the natural primary key.
It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use Infa sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users) but ,not only can these change, indexing on a numerical value is probably better and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and as far as the client is concerned you may display only the AIRPORT_NAME.

What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views.

Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represents data objects.

Since Chen wrote his paper the model has been extended and today it is commonly used for database design For the database designer, the utility of the ER model is:  it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables. it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.

In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.

No comments:

Post a Comment