Friday, January 7, 2011

IMP FAQs

RE: How to handle date conversions in DataStage? Convert a mm/dd/yyyy format to yyyy-dd-mm?
To convert mm/dd/yyyy format to yyyy-dd-mm, the expression is:
Oconv(Iconv(FieldName, "D/MDY[2,2,4]"), "D-YDM[4,2,2]")

Here, Iconv(FieldName, "D/MDY[2,2,4]") first converts the given date into DataStage's internal format;
then Oconv(internal_date, "D-YDM[4,2,2]") converts that internal value into the required yyyy-dd-mm format.

We can achieve this conversion using the Transformer stage conversion functions in PX.
a) "Iconv" function - Internal Conversion (external value to internal format).
b) "Oconv" function - External Conversion (internal format back to an external value).
**************************************************************

Types of Parallel Processing?
Hardware-wise, there are 3 types of parallel processing systems available:
1. SMP (Symmetric Multiprocessing): multiple CPUs, shared memory, single OS.
2. MPP (Massively Parallel Processing): multiple CPUs, each having its own set of resources (memory, OS, etc.) but physically housed in the same machine.
3. Clusters: same as MPP, but physically dispersed (not in the same box, connected via high-speed networks).
DS offers 2 types of parallelism to take advantage of the above hardware:
1. Pipeline Parallelism
2. Partition Parallelism

*****************************************************************

Functionality of Link Partitioner and Link Collector?

Link Partitioner: It splits data into multiple partitions or data flows using various partitioning methods.
OR
It receives data on a single input link and distributes it across a maximum of 64 output links, all of which carry the same metadata.

Link Collector: It collects the data coming from the partitions, merges it into a single data flow
and loads it to the target.
OR
It collects the data from up to 64 input links, merges it into a single data flow and loads it to the target. Both are active stages, and the design and mode of execution of server jobs has to be decided by the designer.

Server jobs mainly execute in a sequential fashion; the IPC stage, along with the Link Partitioner and Link Collector, can simulate a parallel mode of execution for server jobs even on a single-CPU machine.

How do you execute a DataStage job from the command line prompt?
Using the "dsjob" command as follows:
dsjob -run -jobstatus projectname jobname
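For example (the project and job names below are just placeholders), from the DataStage server's engine bin directory:

dsjob -run -jobstatus MyProject MyJob
echo $?

With -jobstatus, dsjob waits for the job to finish and its exit code reflects the job's finishing status (typically 1 = finished OK, 2 = finished with warnings), so the return code can be checked in a shell script. Job parameters can also be supplied with -param name=value.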

Importance of Surrogate Key in Data warehousing?
1. A Surrogate Key is the primary key for a dimension table. Its main importance is that it is independent of the underlying database, i.e. the surrogate key is not affected by changes happening in the source database.
If any updates are applied to the source database, they can be reflected directly on the data warehouse side (through the surrogate key).
2. A Surrogate Key should be a system-generated number, and it should be a small integer. For each dimension table, depending on the SCD type and the total number of records expected over, say, 4 years, you can limit the maximum number.
This improves indexing performance and query processing.
3. A Primary Key does not allow duplicates of the actual source data, so we cannot maintain the history of each record using the PK alone.
The surrogate key (SID) acts as the primary key in the target warehouse; it allows the natural key to repeat and thus maintains complete historical data along with the current data.
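As a sketch, one common way to generate such a system key in a parallel Transformer stage (MaxKeyValue below is a hypothetical job parameter holding the current maximum key of the dimension) is a derivation using DataStage system variables:

MaxKeyValue + (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1

This yields numbers that are unique across all partitions of the run; alternatively, the built-in Surrogate Key Generator stage can maintain the key state for you.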

***********************************************************
