Shyam S. Reddy: DWH & DS Part II

2. Datastage development and job design

*** 1. Error in Link collector - Stage does not support in-process active-to-active inputs or outputs

To get rid of the error just go to the Job Properties -> Performance and select Enable row buffer.
Then select Inter process which will let the link collector run correctly.
Buffer size set to 128Kb should be fine, however it’s a good idea to increase the timeout.

*** 2. What is the DataStage equivalent to like option in ORACLE

The following statement in Oracle:
select * from ARTICLES where article_name like ‘%WHT080%’;
Can be written in DataStage (for example as the constraint expression):
incol.empname matches ‘...WHT080...’
*** 3. what is the difference between logging text and final text message in terminator stage

Every stage has a ‘Logging Text’ area on their General tab which logs an informational message when the stage is triggered or started.
* Informational - is a green line, DSLogInfo() type message.
* The Final Warning Text - the red fatal, the message which is included in the sequence abort message

*** 4. Error in STPstage - SOURCE Procedures must have an output link

The error appears in Stored Procedure (STP) stage when there are no stages going out of that stage.To get rid of it go to ‘stage properties’ -> ‘Procedure type’ and select Transform

*** 5. How to invoke an Oracle PLSQL stored procedure from a server job

To run a pl/sql procedure from Datastage a Stored Procedure (STP) stage can be used.
However it needs a flow of at least one record to run.
It can be designed in the following way:
* source odbc stage which fetches one record from the database and maps it to one column - for example: select sysdate from dual
* A transformer which passes that record through. If required, add pl/sql procedure parameters as columns on the right-hand side of tranformer’s mapping
* Put Stored Procedure (STP) stage as a destination. Fill in connection parameters, type in the procedure name and select Transform as procedure type. In the input tab select ‘execute procedure for each row’ (it will be run once).

*** 6. Is it possible to run a server job in parallel?

Yes, even server jobs can be run in parallel.
To do that go to ‘Job properties’ -> General and check the Allow Multiple Instance button.
The job can now be run simultaneously from one or many sequence jobs. When it happens datastage will create new entries in Director and new job will be named with automatically generated suffix (for example second instance of a job named JOB_0100 will be named JOB_0100.JOB_0100_2). It can be deleted at any time and will be automatically recreated by datastage on the next run.

*** 7. Error in STPstage - STDPROC property required for stage xxx

The error appears in Stored Procedure (STP) stage when the ‘Procedure name’ field is empty. It occurs even if the Procedure call syntax is filled in correctly.
To get rid of error fill in the ‘Procedure name’ field.

*** 8. Datastage routine to open a text file with error catching

Note! work dir and file1 are parameters passed to the routine.
* open file1
OPENSEQ work_dir : ‘\’ : file1 TO H.FILE1 THEN
CALL DSLogInfo(“******************** File “ : file1 : “ opened successfully”, “JobControl”)
END ELSE
CALL DSLogInfo(“Unable to open file”, “JobControl”)
ABORT

*** 9. Datastage routine which reads the first line from a text file

Note! work dir and file1 are parameters passed to the routine.

* open file1
OPENSEQ work_dir : ‘\’ : file1 TO H.FILE1 THEN
CALL DSLogInfo(“******************** File “ : file1 : “ opened successfully”, “JobControl”)
END ELSE
CALL DSLogInfo(“Unable to open file”, “JobControl”)
ABORT
END

READSEQ FILE1.RECORD FROM H.FILE1 ELSE
Call DSLogWarn(“******************** File is empty”, “JobControl”)
END

firstline = Trim(FILE1.RECORD[1,32],” “,”A”) ******* will read the first 32 chars
Call DSLogInfo(“******************** Record read: “ : firstline, “JobControl”)
CLOSESEQ H.FILE1

END

*** 10. How to test a datastage routine or transform?

To test a datastage routine or transform go to the Datastage Manager.
Navigate to Routines, select a routine you want to test and open it. First compile it and then click ‘Test...’ which will open a new window. Enter test parameters in the left-hand side column and click run all to see the results.
Datastage will remember all the test arguments during future tests.

*** 11. When hashed files should be used? What are the benefits or using them?

Hashed files are the best way to store data for lookups. They’re very fast when looking up the key-value pairs.
Hashed files are especially useful if they store information with data dictionaries (customer details, countries, exchange rates). Stored this way it can be spread across the project and accessed from different jobs.
*** 12. How to construct a container and deconstruct it or switch between local and shared?

To construct a container go to Datastage designer, select the stages that would be included in the container and from the main menu select Edit -> Construct Container and choose between local and shared.
Local will be only visible in the current job, and share can be re-used. Shared containers can be viewed and edited in Datastage Manager under ‘Routines’ menu.
Local Datastage containers can be converted at any time to shared containers in datastage designer by right clicking on the container and selecting ‘Convert to Shared’. In the same way it can be converted back to local.

*** 13. Corresponding datastage data types to ORACLE types?

Most of the datastage variable types map very well to oracle types. The biggest problem is to map correctly oracle NUMBER(x,y) format.
The best way to do that in Datastage is to convert oracle NUMBER format to Datastage Decimal type and to fill in Length and Scale column accordingly.
There are no problems with string mappings: oracle Varchar2 maps to datastage Varchar, and oracle char to datastage char.

*** 14. How to adjust commit interval when loading data to the database?

In earlier versions of datastage the commit interval could be set up in:
General -> Transaction size (in version 7.x it’s obsolete)
Starting from Datastage 7.x it can be set up in properties of ODBC or ORACLE stage in Transaction handling -> Rows per transaction.

*** 15. What is the use of INROWNUM and OUTROWNUM datastage variables?

@INROWNUM and @OUTROWNUM are internal datastage variables which do the following:
* @INROWNUM counts incoming rows to a transformer in a datastage job
* @OUTROWNUM counts oucoming rows from a transformer in a datastage job

These variables can be used to generate sequences, primary keys, id’s, numbering rows and also for debugging and error tracing.
They play similiar role as sequences in Oracle.
If set to 0 the commit will be issued at the end of a successfull transaction.

*** 16. Datastage trim function cuts out more characters than expected

By deafult datastage trim function will work this way:
Trim(“a b c d”) will return “a b c d” while in many other programming/scripting languages “a b c d” result would be expected.
That is beacuse by default an R parameter is assumed which is R - Removes leading and trailing occurrences of character, and reduces multiple occurrences to a single occurrence.
To get the “a b c d” as a result use the trim function in the following way: Trim(“ a b c d “,” “,”B”)

Shyam S. Reddy

Tuesday, August 23, 2011

DWH & DS Part II

No comments:

Post a Comment

Total Pageviews

Popular Posts