Monday, January 31, 2011

ETL - Informatica Part III

What are the various methods of getting incremental records or delta records from the source systems?
One foolproof method is to maintain a field called ‘Last Extraction Date’ and then impose a condition in the code saying ‘current_extraction_date > last_extraction_date’.
First Method: If there is a column in the source which identifies the record inserted date. Then it will be easy to put a filter condition in the source qualifier.

Second Method: If there is no record in the source to identify the record inserted date. Then we need to do a target lookup based on the primary key and determine the new record and then insert.

Where do we use connected and un connected lookups
If return port only one then we can go for unconnected. More than one return port is not possible with Unconnected. If more than one return port then go for Connected.

What are the various test procedures used to check whether the data is loaded in the backend, performance of the mapping, and quality of the data loaded in INFORMATICA.
The best procedure to take a help of debugger where we monitor each and every process of mappings and how data is loading based on conditions breaks.

Lets suppose we have some 10,000 odd records in source system and when load them into target.How do we ensure that all 10,000 records that are loaded to target doesn’t contain any garbage values?
Write a sql Statement with the transformation(ETL) logic for all columns.

Create a temporary table with this sql statemnt.
Use this table and do minus with the target table to ensure the data is loaded properly.

If loaded as expected then minus should not return any rows.

What are parameter files ? Where do we use them?
Parameter file defines the value for parameter and variable used in a workflow, worklet or session.
the parameter file defines the consist value of the parameter.parameter files used in mapping ,session,workflow and worklet.

How to determine what records to extract?
When addressing a table some dimension key must reflect the need for a record to get extracted. Mostly it will be from time dimension (e.g. date >= 1st of current mth) or a transaction flag (e.g. Order Invoiced Stat). Foolproof would be adding an archive flag to record which gets reset when record changes.

What is a mapping, session, worklet, workflow, mapplet?
A mapping represents dataflow from sources to targets.
A mapplet creates or configures a set of transformations.

A workflow is a set of instruction sthat tell the Informatica server how to execute the tasks.

A worklet is an object that represents a set of tasks.

A session is a set of instructions that describe how and when to move data from sources to targets.

What is the differecnce between joiner and lookup

Joiner is used to join two or more tables to retrive data from tables(just like joins in sql).

Look up is used to check and compare source table and target table (just like correlated sub-query in sql).

What is fact less fact table? where you have used it in your project?
Factless table means only the key available in the Fact there is no measures available.

factless fact table means that contains only foreign keys with out any measures example: attendance report of employees in a particular company contains no measures only

What are the various test procedures used to check whether the data is loaded in the backend, performance of the mapping, and quality of the data loaded in INFORMATICA.
The best procedure to take a help of debugger where we monitor each and every process of mappings and how data is loading based on conditions breaks.

Go into workflow monitor after showing the status succeed click right button go into the property and you can see there no of source row and success target rows and rejected rows

No comments:

Post a Comment