Wednesday, August 24, 2011

DS - Useful QA's - Part II

10. Subject: SIGSEGV error
Ans1: Whenever you have upgraded from v7 to v7.5 and a job contains a Transformer, you need to force compile the job (in Designer, go to the File menu and select Force Compile), then try to run the job. It should be OK. If you still have a problem, set this environment variable to true ($APT_DISABLE_COMBINATION=TRUE).

Ans2: From my findings, your target table should be empty (not a single record in it); then you will not get this error.

*************************************

11. Convert a Julian date format to normal date format.
Example: My input is 2010078 now I want to convert this into 03/19/2010 format.

In DataStage there is a standard function for this, DateFromJulianDay, but somehow I am not able to use it. I need this for server jobs.

Ans > Use the Iconv(coln, "D2/") function in the derivation and you will get the result.

Ans>>> You can do this conversion in a Transformer stage. In the derivation of the date field concerned, use the Iconv/Oconv functions, which might help you here. I have experience only in PX/EE, so I am not sure how you achieve this in a server job, but you can definitely give it a try. I think it should work.
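For a YYYYDDD (year plus day-of-year) input such as 2010078, here is a hedged sketch of a server Transformer derivation, assuming the UniVerse "DYJ" conversion code and a hypothetical input column In.JULDATE:

Oconv(Iconv(In.JULDATE, "DYJ"), "D4/")

Iconv turns the Julian string into an internal day number and Oconv formats it back as MM/DD/YYYY (03/19/2010 for the example above). Verify the exact conversion codes against your release and locale before relying on them.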

*****************************

11. Subject: main_program: ORCHESTRATE step execution terminating due to SIGINT

I am getting the following error message when I run a parallel job: "main_program: ORCHESTRATE step execution terminating due to SIGINT". The job loads data from a dataset into a SQL Server table.

Ans: Is your job generating more than 50 warnings? A run aborts once it reaches the warning limit set in the run options.

If you want to know the exact reason, set the environment variable below:

1) Set the environment variable APT_DISABLE_COMBINATION to "true", and

2) Re-run to locate which stage is causing your abort.

******************************

12. How can we use a user-defined SQL query file with the Dynamic RDBMS stage in DataStage 8.1, or can we use any other stage?

Ans: You can write your SQL query in the SQL tab:
Output --> General tab --> select User-defined query --> SQL tab -- here you can write your query. Make sure to add the table definition (Columns tab --> Load) for your query once you finish writing it.


2Ans: Enter the pathname of the file instead of the text of the query. With this choice, you can still edit the SQL statements.

The idea is to call the SQL script, including its path name, under Output tab --> SQL --> User-defined.

Under Output tab --> General, provide the table name and make sure that you have selected User-defined SQL query file.

3Ans: Follow the navigation below:

RDBMS stage --> Output tab --> General tab --> click on the QUERY TYPE drop-down list, where you will find the following options:

1. Generated SQL query
2. User-defined SQL query
3. User-defined SQL query file

******************************

13. I have created sequences that run jobs and want to create an exception handler. Is there any general framework for exception handling in DataStage? Are there any generic step-by-step approaches?

Ans: You can always include the Exception Handler activity followed by a Notification activity or a Command activity to do the job.
Each job can be accompanied by a Terminator for the case of failure, so that control reaches the Exception Handler at once.

2Ans: There is an Exception Handler stage available in the palette when you create a sequence job. Its function is to handle any exceptions that are not explicitly handled in your job design.

******************************

14. When running a job, I get the following errors:

PS_JOB,0: [IIS-CONN-DB2-000004] DB2 function SQLFetch failed: SQLSTATE = 21000: Native Error Code = -811: Msg = [IBM][CLI Driver][DB2/6000] SQL0811N The result of a scalar fullselect, SELECT INTO statement, or VALUES INTO statement is more than one row. SQLSTATE=21000 (CC_DB2DBStatement::dbsFetch, file CC_DB2DBStatement.cpp, line 1448)

But when I view data in the DB2 Connector stage it is OK, and when I copy that SQL and run it in DB2's command line it is OK too.

Ans: According to the error message, your query returns more than one row.
Of course, this cannot be handled safely. If you don’t care about which
row to use, then you could apply the FETCH FIRST 1 ROW ONLY clause. If the
row matters, find the correct one.
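As a hedged sketch (the table and column names here are hypothetical), the clause is simply appended to the SELECT that feeds the lookup:

SELECT col1, col2
FROM myschema.mytable
WHERE key_col = ?
FETCH FIRST 1 ROW ONLY

The standalone query succeeds because it is allowed to return several rows; the failure appears only where the statement is required to return exactly one.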

********************************

15. In my DataStage parallel job I want to get the number of rows in a dataset. I do not care about the columns present in the dataset (say I do not know them), and I do not want to pass the records through any stages and get the count from a link. Given just the dataset, I have to get the count of the records present in it. Is that possible? If so, please help me find out how.

Ans: Use the linkinfo option to get the count:

dsjob -linkinfo <project> <job> <stage> <link>

Job - the job in which the dataset is loaded.
Stage - the Data Set stage name.
Link - the link name.

2Ans: You can pass the rows to an Aggregator stage and use one unique field as a key; this will give you the row count.

3Ans: From the Designer, use the Data Set Management tool; from the command line, use orchadmin (you could also call this from a command stage or sequential file stage).

4Ans: dsrecords is a command-line utility which lists the number of records in a dataset:

1. Create a sample job without any stages and add a job parameter, which should be your dataset name.

2. Write a script which runs the dsrecords command-line utility, and call this script with the ExecSH command in a before-job subroutine.
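A minimal sketch of such a script, assuming the parallel engine's bin directory (often $APT_ORCHHOME/bin) is on the PATH and that the dataset path is passed as the first argument:

#!/bin/sh
# print the record count of the given dataset descriptor
dsrecords "$1"
# typical output: 12345 records

Invoked as, for example: count_ds.sh /data/work/mydataset.ds (a hypothetical path).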

*******************************


Source: http://it.toolbox.com


Shyam S Reddy

DS - Useful QA's - Part I

1. There are 10 years of records and I want only last month's records; what should I do?

Ans1: Take SYSDATE, truncate it to the month, and go back one month; then filter for records on or after that date and before the current month. This assumes the data sits in a database table that has a date field in it.

Ans2: Using date functions with IF, THEN, ELSE in a Transformer after the Sequential File stage
(Or)
Using an External Filter stage with a grep command
(Or)
Using the filter command in the Sequential File stage
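A hedged SQL sketch of Ans1, assuming an Oracle table my_table with a hypothetical date column txn_date:

SELECT *
FROM my_table
WHERE txn_date >= TRUNC(ADD_MONTHS(SYSDATE, -1), 'MM')
AND txn_date < TRUNC(SYSDATE, 'MM');

TRUNC(date, 'MM') cuts a date back to the first day of its month, so the two bounds select exactly the previous calendar month.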
****************************

2. We are getting this on insert into a table: "sh: dbx: not found. Operator terminated abnormally: received signal SIGSEGV"

Ans: SIGSEGV is the UNIX signal meaning "Segmentation Violation": an attempt to access a memory address not owned by the process, for example after running out of memory.

Let's try the solutions below; they may help:

1) Change the job to use a Join stage or Merge stage instead of a Lookup, to handle the file sizes.

2) Sometimes the job runs successfully and sometimes not. In this case look at handling NULLs or data type conversions.
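A minimal sketch of the NULL handling mentioned in 2), as a Transformer derivation over a hypothetical nullable input column lnk_in.AMOUNT:

If IsNull(lnk_in.AMOUNT) Then 0 Else lnk_in.AMOUNT

Substituting a default before the value is used downstream avoids the null-related failures this answer alludes to.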

****************************

3. When running a DataStage job in UNIX I get the error below, even though I source dsenv before running the job.

$dsjob -run PROJECT_NAME DSJOJ_NAME
Error running job  Status code = -99 DSJE_REPERROR ??

1Ans: It's a repository error.
Check the project and job names for any typos.
$DSHOME/bin/dsjob -run -jobstatus PROJECT_NAME DSJOJ_NAME
Pass parameters if any are mandatory.
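For example, a hedged sketch with a hypothetical parameter (the -param flag is standard dsjob syntax):

$DSHOME/bin/dsjob -run -jobstatus -param LOAD_DATE=2011-08-24 PROJECT_NAME JOB_NAME

-jobstatus makes dsjob wait for the run and return the job's finishing status rather than just firing it off.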

**********************************************
4. I have a requirement to fetch "n" random records from a text file, where that "n" is held in another table. For example, if my original input file is input.txt, then the number of random records to be fetched would be present in number.txt.

1Ans: You could count your input file, divide the count by "n" to give you x, then take every x-th record by input row number in a Transformer stage (see the sketch below).
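A minimal sketch of that Transformer constraint, assuming a hypothetical job parameter X_INTERVAL holding the computed x:

Mod(@INROWNUM, X_INTERVAL) = 1

@INROWNUM is the built-in input row counter, so the constraint lets through every x-th row.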

2Ans: Or assign Random partitioning and choose only one partition until you reach 'n' records.

****************************************************

5: How to lock/unlock jobs as Admin?

1Ans: Open DataStage Director,
click on Job,
click on Cleanup Resources,
and you will see a window.
Select "Show all".

You can see your user ID and can log the session out from there.

2Ans: If the Cleanup Resources option is disabled, you should ask your administrator to enable job administration in the Director to get that done.
The other option is to clear the locks from the "Command" option in the Administrator client.

************************************

6: How can we use a shell script in the middle of a job? I can get the record count from a DataStage job; how can I then use a parameter to write the records to separate files?
To make my question clearer: if I have 3 records, then count=3, so the job should create 3 separate text files, each containing only one record.

1Ans: First you would have calculated the total number of records. Then create a record holding some or all of your key column values, comma-separated, in a file:

you get: key1,key2,key3

Write it to a file and read it in a Start Loop activity, or pass it as a parameter to the Start Loop activity.
Call the actual job, and pass these keys as a parameter.
In the Transformer, compare the key against the job parameter (for example key1) and pass only that particular record to the output.

Pass the parameter into the filename, like File#parameter#.txt.
This way you will get each record in its own file, based on the keys.
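Outside DataStage, the same one-record-per-file split can also be done with the standard Unix split utility (a hedged alternative, not what the answer above describes):

split -l 1 input.txt rec_

This writes rec_aa, rec_ab, ... with one line each; rename them afterwards if numbered names are needed.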

Source: Deepak – ITToolbox.com

*******************************************************

7: DataStage never shows the log until the script invoked by it has finished. Sometimes the script hangs and seems to be doing nothing, so I'd like to know whether there is anywhere DataStage puts the execution detail (progress or echo) while it runs.

I found some files in dstmp that might hold this, but I doubt it.

1Ans: Use the Execute Command stage.
Yes, echo output does get printed in the DataStage log, but I wanted to see this log before the Execute Command stage finishes, because sometimes the job hangs without the Execute Command stage completing, and I'd like to find out what the problem is.

******************************************************

8: I have a table with 1 column and N rows; now I need to collect all the rows and insert them together into another table, separated by commas.

1Ans: Concatenate with the previous row in a Transformer and pass only the last row for that set.

2Ans: Use a stage variable, or write a SQL query (using an analytic function, if you are using Oracle as the database) in the database stage that you are using.
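A hedged sketch of the SQL route, assuming Oracle 11g or later and a hypothetical single-column table t(col1):

SELECT LISTAGG(col1, ',') WITHIN GROUP (ORDER BY col1) AS all_values
FROM t;

LISTAGG folds the N rows into one comma-separated string, which can then be inserted into the target table.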


*********************************

Subject: Unable to execute dsjob command
Using a DataStage 7.5 server job, I executed a job from the command line; the steps follow:

Step1 cd /Ascential/Ascential/DataStage/DSEngine
Step 2 . ./dsenv
Step 3 cd bin
Step 4 dsjob -run taxdev Enp_Load
taxdev is the project name
Enp_Load is the job
I am getting an error ---- ksh: dsjob: cannot execute

What is the issue?

Ans: First check whether you have the dsjob executable in that given path.
Also check that you are in the correct DS home; you can verify it with cat /.dshome

I executed cat /.dshome and I got /Ascential/Ascential/DataStage/DSEngine.
dsjob is present in /Ascential/Ascential/DataStage/DSEngine/bin.
dsjob shows zero bytes. What should it consist of? Is it a command or an executable file? Could you please help me understand this.

Yes, it is an executable; you would just find binary information in the file.
It shouldn't be zero bytes.
See how it got corrupted, and try to restore it.
Also check whether that's the only file that got corrupted.

Source : ITToolbox.com

Tuesday, August 23, 2011

Health Benefits of Drinking Water

“I'm dying of thirst!”

Well, you just might. It sounds so simple. H2O: two parts hydrogen and one part oxygen. This substance, also known as water, is the most essential element, next to air, for our survival. Water might be everywhere, but one must never take it for granted.
Water makes up more than two thirds of human body weight, and without water, we would die in a few days. The human brain is made up of 95% water, blood is 82% and the lungs 90%. A mere 2% drop in our body's water supply can trigger signs of dehydration: fuzzy short-term memory, trouble with basic math, and difficulty focusing on small print, such as a computer screen. (Are you having trouble reading this? Drink up!) Mild dehydration is also one of the most common causes of daytime fatigue. An estimated seventy-five percent of Americans have mild, chronic dehydration. That is a pretty scary statistic for a developed country where water is readily available through the tap or bottled water.


Water is important to the mechanics of the human body. The body cannot work without it, just as a car cannot run without gas and oil. In fact, all the cell and organ functions that make up our entire anatomy and physiology depend on water for their functioning.
  • Water serves as a lubricant
  • Water forms the base for saliva
  • Water forms the fluids that surround the joints.
  • Water regulates the body temperature, as the cooling and heating is distributed through perspiration.
  • Water helps to alleviate constipation by moving food through the intestinal tract and thereby eliminating waste - the best detox agent.
  • Water helps to regulate metabolism

In addition to the daily maintenance of our bodies, water also plays a key role in the prevention of disease. Drinking eight glasses of water daily can decrease the risk of colon cancer by 45%, bladder cancer by 50%, and it can potentially even reduce the risk of breast cancer. And those are just a few examples! As you follow other links on our website, you can read more in depth about how water can aid in the prevention and cure of many types of diseases, ailments and disorders that affect the many systems of our bodies.
Since water is such an important component to our physiology, it would make sense that the quality of the water should be just as important as the quantity. Drinking water should always be clean and free of contaminants to ensure proper health and wellness.



When you were a kid in school, you learned that each molecule of water is made up of two hydrogen atoms and one oxygen atom. You may also have learned that it was great fun to fill up your squirt guns with water, at least until the principal caught you. What you may not have learned, however, was that you need water to be a healthy human being.
Why You Need to Drink Water
Your body is estimated to be about 60 to 70 percent water. Blood is mostly water, and your muscles, lungs, and brain all contain a lot of water. Your body needs water to regulate body temperature and to provide the means for nutrients to travel to all your organs. Water also transports oxygen to your cells, removes waste, and protects your joints and organs.
Signs of Dehydration
You lose water through urination, respiration, and by sweating, and if you are very active, you lose more water than if you are sedentary. Diuretics, such as caffeine pills and alcohol, result in the need to drink more water because they induce fluid loss.
Symptoms of mild dehydration include chronic pains in joints and muscles, lower back pain, headaches and constipation. A strong odor to your urine, along with a yellow or amber color, indicates that you may not be getting enough water. Note that riboflavin, a B vitamin, will make your urine bright yellow. Certain medications can change the color of urine as well. Thirst is an obvious sign of dehydration, and in fact, you need water before you feel thirsty.

25 REASONS TO GET RUNNING:

Running is one of the best forms of exercise, not only for getting fit but for weight control too. A 30-minute run, three to four times a week, trims your waistline, strengthens the cardiovascular system, improves muscle tone and definition and helps foster a positive mental attitude. Running can also be a great social activity. Here are the top 25 reasons why you should start running today.
  1. Running is the quickest way to cardiovascular fitness. You get fitter faster by running regularly.
  2. Running is one of the cheapest forms of exercising. You only need a pair of running shoes and shorts and T-shirt to get started.
  3. Running is easy to learn. Most people have run at some point in their life, either competitively or at school/college, so there is no steep learning curve involved.
  4. Running makes you feel good, not only physically, but mentally as well.
  5. Running is excellent for reducing stress levels. There is nothing better than a run to clear your head and concentrate the mind. Many great ideas have been formulated on training runs.
  6. Running is one of the most flexible methods of training around. Wherever you are in the world, it is nearly always possible to get out for a run. You can jog, steady run or sprint depending on how you feel that particular day.
  7. Running is good for your heart. It not only strengthens the heart but also reduces the actual resting heart rate.
  8. Running is accessible to all. As long as you are physically able, you can go for a run virtually anytime and anywhere. There are no race barriers to running.
  9. Running helps you to lose weight and tone muscles. The more you train and run, the more weight you will lose. Also, by toning up, you are getting rid of the excess fat and converting it to muscle.
  10. Running is for people of all ages. Old or young, everyone can benefit from running.
  11. It is good for your sex life! By being more physically active you increase your sex drive.
  12. When run training, you can get away with eating chocolate and legitimately say it is part of your training program.
  13. Running is a great way to recover from a hangover. A run will help clear your head faster than sitting around feeling terrible.
  14. Running is a great safety aid. Being able to run away from people puts you in a much better position if you are chased.
  15. You can use running to change other people's lives. Run for a charity and raise money to help the needy or less able.
  16. Running with friends gives you time to catch up on gossip and get fit at the same time.
  17. Running is a time-efficient way to get fit and lose weight so you have more free time after exercising to enjoy yourself.
  18. Running is a challenge that can be motivational and bring great personal satisfaction.
  19. Running is a great way to see a new place if you have limited time available for sightseeing. Just think how much more of a place you can see when running around it compared to walking.
  20. Running improves your concentration as you are forced to concentrate more when running. This in turn improves your concentration when you have to focus in other areas of life.
  21. You nearly always work better after a run because you are still on a runner's high and feel ready for anything.
  22. Running is a great way to meet people, especially if you join a club or enter a race.
  23. There are no monthly subscriptions with running. You just put your running shoes on and go.
  24. Endurance built-up through running can be transferred to other sports.
  25. If you drink alcohol you can just run a little bit further the next day to work off the extra calories.


Shyam S Reddy

DWH - DS OSH Script

Datastage OSH Script

The IBM InfoSphere DataStage and QualityStage Designer client creates IBM
InfoSphere DataStage jobs that are compiled into parallel job flows, and reusable
components that execute on the parallel Information Server engine. It allows you
to use familiar graphical point-and-click techniques to develop job flows for
extracting, cleansing, transforming, integrating, and loading data into target files,
target systems, or packaged applications.

The Designer generates all the code. It generates the OSH (Orchestrate SHell
Script) and C++ code for any Transformer stages used.
Briefly, the Designer performs the following tasks:

_ Validates link requirements, mandatory stage options, transformer logic, etc.
_ Generates OSH representation of data flows and stages (representations of
framework “operators”).
_ Generates transform code for each Transformer stage which is then compiled
into C++ and then to corresponding native operators.
_ Reusable BuildOp stages can be compiled using the Designer GUI or from
the command line.
Here is a brief primer on the OSH:
--> Comment blocks introduce each operator, the order of which is determined by the order the stages were added to the canvas.
--> OSH uses the familiar syntax of the UNIX shell, such as the operator name, schema, operator options ("-name value" format), input (indicated by n< where n is the input #), and output (indicated by n> where n is the output #).
--> For every operator, input and/or output data sets are numbered sequentially starting from zero.
--> Virtual data sets (in-memory native representations of data links) are generated to connect operators.
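As a rough, hedged sketch (the stage, file and link names here are hypothetical), the generated OSH for a single Sequential File stage can look like this:

#### STAGE: Sequential_File_0
## Operator
import
## Operator options
-file '/data/in/customers.txt'
-schema record ( id:int32; name:string[max=30]; )
## Outputs
0> [] 'Sequential_File_0:lnk_out.v'
;

The comment block names the stage, the options follow the -name value format, and 0> sends output data set number zero to the virtual data set named after the link.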


Framework (Information Server Engine) terms and DataStage terms have
equivalency. The GUI frequently uses terms from both paradigms. Runtime
messages use framework terminology because the framework engine is where
execution occurs. The following list shows the equivalency between framework
and DataStage terms:
--> Schema corresponds to table definition
--> Property corresponds to format
--> Type corresponds to SQL type and length
--> Virtual data set corresponds to link
--> Record/field corresponds to row/column
--> Operator corresponds to stage

Note: The actual execution order of operators is dictated by input/output designators, not by their placement on the diagram. The data sets connect the OSH operators. These are "virtual data sets", that is, in-memory data flows. Link names are used in data set names; it is therefore good practice to give the links meaningful names.


Source: http://datastage-tutorials.blogspot.com/2009/09/1.html

Shyam S Reddy

DWH & DS Part III (Contd)

*** 17. Database update actions in ORACLE stage

The destination table can be updated using various update actions in the Oracle stage. Be aware of the fact that it's crucial to select the key columns properly, as this determines which columns will appear in the WHERE part of the SQL statement. Available actions:
* Clear the table then insert rows - deletes the contents of the table (DELETE statement) and adds new rows (INSERT).
* Truncate the table then insert rows - deletes the contents of the table (TRUNCATE statement) and adds new rows (INSERT).
* Insert rows without clearing - only adds new rows (INSERT statement).
* Delete existing rows only - deletes matched rows (issues only the DELETE statement).
* Replace existing rows completely - deletes the existing rows (DELETE statement), then adds new rows (INSERT).
* Update existing rows only - updates existing rows (UPDATE statement).
* Update existing rows or insert new rows - updates existing data rows (UPDATE) or adds new rows (INSERT). An UPDATE is issued first, and if it succeeds the INSERT is omitted.
* Insert new rows or update existing rows - adds new rows (INSERT) or updates existing rows (UPDATE). An INSERT is issued first, and if it succeeds the UPDATE is omitted.
* User-defined SQL - the data is written using a user-defined SQL statement.
* User-defined SQL file - the data is written using a user-defined SQL statement from a file.

*** 18. Use and examples of ICONV and OCONV functions?

ICONV and OCONV functions are quite often used to handle data in Datastage.
ICONV converts a string to an internal storage format and OCONV converts an expression to an output format.
Syntax:
Iconv(string, conversion code)
Oconv(expression, conversion)

Some useful iconv and oconv examples:
Iconv("10/14/06", "D2/") = 14167
Oconv(14167, "D-E") = "14-10-2006"

Oconv(14167, "D DMY[,A,]") = "14 OCTOBER 2006"
Oconv(12003005, "MD2$,") = "$120,030.05"

This expression formats a number and rounds it to 2 decimal places:
Oconv(L01.TURNOVER_VALUE*100, "MD2")

Iconv and Oconv can be combined in one expression to reformat a date easily:
Oconv(Iconv("10/14/06", "D2/"), "D-E") = "14-10-2006"

*** 19. ERROR 81021 Calling subroutine DSR_RECORD ACTION=2

Error message:

DataStage Repository Interface:
Error calling subroutine: DSR_RECORD (Action=2);
check DataStage is set up correctly in project
Development (Internal Error (81021))

The DataStage system help gives the following error description:
SYS.HELP. 081021
MESSAGE.. dsrpc: Error writing to Pipe.

The problem appears when a job sequence is used and it contains many stages (usually more than 10), and very often when a network connection is slow.

Basically the cause of the problem is a communication failure between the DataStage client and the server.

The solution to the issue is:
# Do not log in to DataStage Designer using the 'Omit' option on the login screen. Type in the username and password explicitly and the job should compile successfully.
# If the above does not help, execute the DS.REINDEX ALL command from the DataStage shell.

*** 20. How to check Datastage internal error descriptions


# To check the description of an error number, go to the DataStage shell (from the Administrator client or by telnet to the server machine) and invoke the following command: SELECT * FROM SYS.MESSAGE WHERE @ID='081021'; - where in this case 081021 is the error number.

The command will produce a brief error description which probably will not be helpful in resolving the issue itself, but can be a good starting point for further analysis.


*** 21. Error timeout waiting for mutex


The error message usually looks like follows:
... ds_ipcgetnext() - timeout waiting for mutex

There may be several reasons for the error and thus solutions to get rid of it.
The error usually appears when using Link Collector, Link Partitioner and Interprocess (IPC) stages. It may also appear when doing a lookup with the use of a hash file or if a job is very complex, with the use of many transformers.

There are a few things to consider to work around the problem:
- increase the buffer size (up to 1024K) and the Timeout value in the Job properties (on the Performance tab).
- ensure that the key columns in active stages or hashed files are composed of allowed characters: get rid of nulls and try to avoid language-specific characters which may cause the problem.
- try to simplify the job as much as possible (especially if it’s very complex). Consider splitting it into two or three smaller jobs, review fetches and lookups and try to optimize them (especially have a look at the SQL statements).

*** 22. ERROR 30107 Subroutine failed to complete successfully

Error message:
Error calling subroutine:
DSR_RECORD (Action=2); or *DataStage*DSR_SELECT (Action=7);
check DataStage is set up correctly in project Development
(Subroutine failed to complete successfully(30107))

The DataStage system help gives the following error description:
SYS.HELP. 930107
MESSAGE.. DataStage/SQL: Illegal placement of parameter markers

The problem appears when a project is moved from one environment to another (for example when deploying a project from a development environment to production).
The solution to the issue is:
# Rebuild the repository index by executing the DS.REINDEX ALL command from the Datastage shell

*** 23. Datastage Designer hangs when editing job activity properties

The problem appears when running DataStage Designer under Windows XP after installing patches or Service Pack 2 for Windows.
After opening a job sequence and navigating to the job activity properties window, the application freezes and the only way to close it is from the Windows Task Manager.

The solution to the problem is very simple: just download and install the "XP SP2 patch" for the DataStage client.
It can be found on the IBM client support site (need to log in):
https://www.ascential.com/eservice/public/welcome.do

Go to the software updates section and select an appropriate patch from the Recommended DataStage patches section.
Sometimes users face problems when trying to log in (for example when the license doesn't cover IBM Active Support); then it may be necessary to contact IBM support, which can be reached at WDISupport@us.ibm.com

*** 24. Can Datastage use Excel files as a data input?

# Microsoft Excel spreadsheets can be used as a data input in DataStage. Basically there are two possible approaches available. The first is to access the Excel file via ODBC: this requires creating an ODBC connection to the Excel file on the DataStage server machine and using an ODBC stage in DataStage. The main disadvantage is that it is impossible to do this on a Unix machine. On DataStage servers operating in Windows it can be set up here:
Control Panel --> Administrative Tools --> Data Sources (ODBC) --> User DSN --> Add --> Microsoft Excel Driver (*.xls) --> Provide a data source name --> Select the workbook --> OK

# Save Excel file as CSV - save data from an excel spreadsheet to a CSV text file and use a sequential stage in Datastage to read the data.



Source: http://datastage-tutorials.blogspot.com/2009/09/1.html

Shyam S Reddy
 

DWH & DS Part II

2. Datastage development and job design

*** 1. Error in Link collector - Stage does not support in-process active-to-active inputs or outputs

To get rid of the error just go to the Job Properties -> Performance and select Enable row buffer.
Then select Inter process which will let the link collector run correctly.
Buffer size set to 128Kb should be fine, however it’s a good idea to increase the timeout.

*** 2. What is the DataStage equivalent to like option in ORACLE

The following statement in Oracle:
select * from ARTICLES where article_name like '%WHT080%';
Can be written in DataStage (for example as a constraint expression):
incol.empname matches '...WHT080...'

*** 3. What is the difference between the logging text and the final text message in the Terminator stage?

Every stage has a 'Logging Text' area on its General tab which logs an informational message when the stage is triggered or started.
* Informational - a green line, a DSLogInfo() type message.
* The Final Warning Text - the red fatal message, which is included in the sequence abort message.

*** 4. Error in STPstage - SOURCE Procedures must have an output link

The error appears in the Stored Procedure (STP) stage when there are no stages going out of that stage. To get rid of it go to 'Stage properties' -> 'Procedure type' and select Transform.

*** 5. How to invoke an Oracle PLSQL stored procedure from a server job

To run a pl/sql procedure from Datastage a Stored Procedure (STP) stage can be used.
However it needs a flow of at least one record to run.
It can be designed in the following way:
* A source ODBC stage which fetches one record from the database and maps it to one column - for example: select sysdate from dual
* A Transformer which passes that record through. If required, add the pl/sql procedure parameters as columns on the right-hand side of the transformer's mapping
* A Stored Procedure (STP) stage as the destination. Fill in the connection parameters, type in the procedure name and select Transform as the procedure type. In the input tab select 'execute procedure for each row' (it will be run once).

*** 6. Is it possible to run a server job in parallel?

Yes, even server jobs can be run in parallel.
To do that go to ‘Job properties’ -> General and check the Allow Multiple Instance button.
The job can now be run simultaneously from one or many sequence jobs. When that happens DataStage will create new entries in Director, and the new jobs will be named with an automatically generated suffix (for example, the second instance of a job named JOB_0100 will be named JOB_0100.JOB_0100_2). Such an entry can be deleted at any time and will be automatically recreated by DataStage on the next run.


*** 7. Error in STPstage - STDPROC property required for stage xxx

The error appears in Stored Procedure (STP) stage when the ‘Procedure name’ field is empty. It occurs even if the Procedure call syntax is filled in correctly.
To get rid of error fill in the ‘Procedure name’ field.

*** 8. Datastage routine to open a text file with error catching

Note! work_dir and file1 are parameters passed to the routine.

* open file1
OPENSEQ work_dir : '\' : file1 TO H.FILE1 THEN
CALL DSLogInfo("******************** File " : file1 : " opened successfully", "JobControl")
END ELSE
CALL DSLogInfo("Unable to open file", "JobControl")
ABORT
END

*** 9. Datastage routine which reads the first line from a text file

Note! work_dir and file1 are parameters passed to the routine.

* open file1
OPENSEQ work_dir : '\' : file1 TO H.FILE1 THEN
CALL DSLogInfo("******************** File " : file1 : " opened successfully", "JobControl")
END ELSE
CALL DSLogInfo("Unable to open file", "JobControl")
ABORT
END

READSEQ FILE1.RECORD FROM H.FILE1 ELSE
Call DSLogWarn("******************** File is empty", "JobControl")
END

firstline = Trim(FILE1.RECORD[1,32], " ", "A") ;* will read the first 32 chars
Call DSLogInfo("******************** Record read: " : firstline, "JobControl")
CLOSESEQ H.FILE1

END

*** 10. How to test a datastage routine or transform?

To test a datastage routine or transform go to the Datastage Manager.
Navigate to Routines, select a routine you want to test and open it. First compile it and then click ‘Test...’ which will open a new window. Enter test parameters in the left-hand side column and click run all to see the results.
Datastage will remember all the test arguments during future tests.

*** 11. When hashed files should be used? What are the benefits or using them?

Hashed files are the best way to store data for lookups. They’re very fast when looking up the key-value pairs.
Hashed files are especially useful if they store information with data dictionaries (customer details, countries, exchange rates). Stored this way it can be spread across the project and accessed from different jobs.

*** 12. How to construct a container and deconstruct it or switch between local and shared?

To construct a container go to Datastage designer, select the stages that would be included in the container and from the main menu select Edit -> Construct Container and choose between local and shared.
Local containers will only be visible in the current job, while shared ones can be re-used. Shared containers can be viewed and edited in DataStage Manager under the 'Routines' menu.
Local Datastage containers can be converted at any time to shared containers in datastage designer by right clicking on the container and selecting ‘Convert to Shared’. In the same way it can be converted back to local.

*** 13. Corresponding datastage data types to ORACLE types?

Most of the DataStage variable types map very well to Oracle types. The biggest problem is mapping the Oracle NUMBER(x,y) format correctly.
The best way to do that in Datastage is to convert oracle NUMBER format to Datastage Decimal type and to fill in Length and Scale column accordingly.
There are no problems with string mappings: oracle Varchar2 maps to datastage Varchar, and oracle char to datastage char.

*** 14. How to adjust commit interval when loading data to the database?

In earlier versions of datastage the commit interval could be set up in:
General -> Transaction size (in version 7.x it’s obsolete)
Starting from DataStage 7.x it can be set up in the properties of the ODBC or ORACLE stage, in Transaction handling -> Rows per transaction. If set to 0, the commit will be issued at the end of a successful transaction.

*** 15. What is the use of INROWNUM and OUTROWNUM datastage variables?

@INROWNUM and @OUTROWNUM are internal DataStage variables which do the following:
* @INROWNUM counts incoming rows to a transformer in a DataStage job
* @OUTROWNUM counts outgoing rows from a transformer in a DataStage job
These variables can be used to generate sequences, primary keys, IDs, to number rows, and also for debugging and error tracing.
They play a similar role to sequences in Oracle.
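For example, a minimal sketch of a surrogate-key derivation, writing to a hypothetical output column OUT.REC_ID in a transformer:

OUT.REC_ID derivation: @OUTROWNUM

Each output row then carries a consecutive number starting from 1 (per partition, if run in parallel).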

*** 16. Datastage trim function cuts out more characters than expected

By default the DataStage Trim function works this way:
Trim("  a  b  c  d ") will return "a b c d", while in many other programming/scripting languages "a  b  c  d" (with the internal spacing kept) would be the expected result.
That is because by default the R parameter is assumed, which removes leading and trailing occurrences of the character and reduces multiple internal occurrences to a single occurrence.
To remove only the leading and trailing blanks, use the Trim function in the following way: Trim("  a  b  c  d ", " ", "B")

DWH & DS Part I

*** 1. What are the ways to execute datastage jobs?

A job can be run using a few different methods:
* from Datastage Director (menu Job -> Run now...)
* from command line using a dsjob command
* a Datastage routine can run a job (DSRunJob function)
* by a job sequencer

*** 2. How to invoke a Datastage shell command?

Datastage shell commands can be invoked from :
* Datastage administrator (projects tab -> Command)
* Telnet client connected to the datastage server

*** 3. How to stop a job when its status is running?

To stop a running job, go to DataStage Director and click the stop button (or Job -> Stop from the menu). If that doesn't help, go to Job -> Cleanup Resources, select the process which holds a lock and click Logout.
If it still doesn't help, go to the DataStage shell and invoke the following command: ds.tools. It will open an administration panel. Go to 4. Administer processes/locks, then try invoking one of the clear locks commands (options 7-10).

*** 4. How to run and schedule a job from command line?

To run a job from command line use a dsjob command
Command syntax: dsjob [-file <file> | [-server <server>] [-user <user>] [-password <password>]] <primary command> [arguments]
The command can be placed in a batch file and run in a system scheduler.
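For example, a hedged cron entry (the paths and names are hypothetical) that runs a job nightly at 2am through a small wrapper script:

0 2 * * * /home/dsadm/run_myjob.sh >> /tmp/myjob.log 2>&1

where run_myjob.sh sources dsenv and then calls: $DSHOME/bin/dsjob -run -jobstatus myproject myjob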

*** 5. How to release a lock held by jobs?

Go to the DataStage shell and invoke the following command: ds.tools
It will open an administration panel. Go to 4. Administer processes/locks, then try invoking one of the clear locks commands (options 7-10).

*** 6. User privileges for the default DataStage roles?

The role privileges are:
* DataStage Developer - user with full access to all areas of a DataStage project
* DataStage Operator - has privileges to run and manage deployed DataStage jobs
* -none- - no permission to log on to DataStage

*** 7. What is a command to analyze hashed file?

There are two ways to analyze a hashed file. Both should be invoked from the datastage command shell. These are:
* FILE.STAT command
* ANALYZE.FILE command

*** 8. Is it possible to run two versions of datastage on the same pc?

Yes, even though different versions of Datastage use different system dll libraries.
To dynamically switch between Datastage versions install and run DataStage Multi-Client Manager. That application can unregister and register system libraries used by Datastage.


*** 9. How to send notifications from Datastage as a text message (sms) to a cell phone

There are a few possible methods of sending SMS messages from DataStage. However, there is no easy way to do this directly from DataStage, and all the methods described below will require some effort.

The easiest way of doing that from the Datastage standpoint is to configure an SMTP (email) server as a mobile phone gateway. In that case, a Notification Activity can be used to send message with a job log and any desired details. DSSendMail Before-job or After-job subroutine can be also used to send sms messages.

If configured properly, the recipient's email address will have the following format: 600123456@oursmsgateway.com

If there is no possibility of configuring a mail server to send text messages, you can work around it by using an external application run directly from the operating system. There is a whole bunch of Unix scripts and applications for sending SMS messages.

In that solution, you will need to create a batch script which will take care of sending messages and invoke it from Datastage using ExecDOS or ExecSh subroutines passing the required parameters (like phone number and message body).

Please keep in mind that all these solutions may require contacting the local cellphone provider first and, depending on the country, it may not be free of charge; in some cases the provider may not support the capability at all.

Friday, August 19, 2011

Teradata - II

Teradata Storage Architecture

The Parsing Engine interprets the SQL command and converts the data record from the host into an AMP message.

The Message Passing Layer distributes the row to the appropriate Access Module Processor (AMP).

The AMP formats the row and writes it to its associated disks.

The disk holds the row for subsequent access.

The Host or Client system supplies the records. These records are the raw data from which the database will be constructed.

The Parsing Engine is a component that interprets SQL requests, receives input records and passes data. To do that it sends the messages through the Message Passing Layer to the AMPs.

The Message Passing Layer is implemented variously as hardware or software, depending on the platform used. It determines which unit or units should receive a message.

Think of the AMP (Access Module Processor) as a computer designed for and dedicated to managing a portion of the entire database. It performs all the database management functions – such as sorting, aggregating, and formatting the data. It receives data from the PE, formats the rows, and distributes the rows to the disk storage units it controls. It also retrieves the rows requested by the parsing engine.

Disks are simply disk drives associated with an AMP. They store the data rows. On current systems, they are usually implemented using a disk array (to be discussed later).

Teradata Storage Architecture

Parsing Engine dispatches request to insert a row.
Message Passing Layer insures that a row gets to the appropriate AMP (Access Module Processor).
AMP stores row on its associated disk.
Each AMP can have multiple physical disks associated with it.

Teradata Retrieval Architecture

Retrieving data from the Teradata RDBMS simply reverses the process of the storage model. A request is made for data and is passed on to a Parsing Engine (PE). The PE optimizes the request for efficient processing and creates tasks for the AMPs to perform, which will result in the request being satisfied. These tasks are then dispatched to the AMPs via the Message Passing Layer. Often times all AMPs must participate in creating the answer set, such as in returning all rows of a table. Other times, only one or a few AMPs need participate, depending on the nature of the request. The PE will insure that only the AMPs that are needed will be assigned tasks on behalf of this request.

Once the AMPs have been given their assignments, they will retrieve the desired rows from their respective disks. If sorting, aggregating or formatting of any kind is needed, the AMPs will also take care of that. The rows are then returned to the requesting PE via the Message Passing Layer. The PE takes the returned answer set and returns it to the requesting client application.

Teradata Basics - I

** Source: NCR TSG ( Teradata Solutions Group ) BOOK

Teradata was the first commercial database system to support a trillion bytes of data. It is hard to imagine the size of a trillion. Most people are comfortable with Kilobytes, Megabytes and even Gigabytes, but Terabytes are another order of magnitude. To put it in perspective, the lifespan of the average person is 2.5 Giga seconds (or said differently 2,500,000,000 seconds). A tera-second is 31,688 years!

How Large is a Trillion

1 Kilobyte = 10^3 = 1,000 bytes
1 Megabyte = 10^6 = 1,000,000 bytes
1 Gigabyte = 10^9 = 1,000,000,000 bytes
1 Terabyte = 10^12 = 1,000,000,000,000 bytes
1 Petabyte = 10^15 = 1,000,000,000,000,000 bytes
1 Exabyte = 10^18 = 1,000,000,000,000,000,000 bytes
1 Zettabyte = 10^21 = 1,000,000,000,000,000,000,000 bytes
1 Yottabyte = 10^24 = 1,000,000,000,000,000,000,000,000 bytes

1 million seconds = 11.57 days
1 billion seconds = 31.6 years
1 trillion seconds = 31,688 years
1 million inches = 15.7 miles
1 trillion inches = 15,700,000 miles (30 roundtrips to the moon)
1 million square inches = .16 acres = .0002 sq. miles
1 trillion square inches = 249 square miles (larger than Singapore)
$1 million = less than $.01 for every person in the U.S.
$1 billion = $3.64 for every person in the U.S.
$1 trillion = $3,636 for every person in the U.S.

The Teradata Charter

Relational database
Enormous capacity
- Billions of rows
- Terabytes of data
High performance parallel processing
Single database server for multiple clients
Network and mainframe connectivity
Industry standard access language (SQL)
Manageable growth via modularity
Fault tolerance at all levels of hardware and software
Data integrity and reliability

DWH Material - III

OLAP

On-line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated enterprise data supporting end user analytical and navigational activities including:

- calculations and modeling applied across dimensions, through hierarchies and/or across members
- trend analysis over sequential time periods
- slicing subsets for on-screen viewing
- drill-down to deeper levels of consolidation
- reach-through to underlying detail data
- rotation to new dimensional comparisons in the viewing area

OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP Server.

OLAP Server
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multi-dimensional data structures. A multi-dimensional structure is arranged so that every data item is located and accessed based on the intersection of the dimension members which define that item. The design of the server and the structure of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. The OLAP Server may either physically stage the processed multi-dimensional information to deliver consistent and rapid response times to end users, or it may populate its data structures in real-time from relational or other databases, or offer a choice of both. Given the current state of technology and the end user requirement for consistent and rapid response times, staging the multi-dimensional data in the OLAP Server is often the preferred method.


OLAP Client
End user applications that can request slices from OLAP servers and provide two-dimensional or multi-dimensional displays, user modifications, selections, ranking, calculations, etc., for visualization and navigation purposes. OLAP clients may be as simple as a spreadsheet program retrieving a slice for further work by a spreadsheet-literate user, or as highly functional as a financial modeling or sales analysis application.

ROLAP - Relational OLAP (ROLAP) application provides the dimensional interface to a relational database.

MOLAP - Multidimensional OLAP (MOLAP) engines provide highly specialized support for analysis. Because facts are pre-stored at all valid combinations of the dimensions, the query performance is very high, whereas the amount of data to be stored is voluminous.


Drill Down/Up
Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down). The drilling paths may be defined by the hierarchies within dimensions or other relationships that may be dynamic within or between dimensions. For example, when viewing sales data for North America, a drill-down operation in the Region dimension would then display Canada, the eastern United States and the western United States. A further drill-down on Canada might display Toronto, Vancouver, Montreal, etc.

DWH Material - II

ODS

The Operational Data Store (ODS) is a subject-oriented, integrated, volatile and current data store containing detailed corporate data. An ODS stores only specific data (for example, credit data) from different systems. Because it is always current, it requires very frequent data loads from the source systems.

Hierarchical Relationships
Any dimension's members may be organized based on parent-child relationships, typically where a parent member represents the consolidation of the members which are its children. The result is a hierarchy, and the parent/child relationships are hierarchical relationships. For example, the Time dimension could contain the following hierarchy:
Day – Week – Month – Quarter – Year.
Based on the business queries, such hierarchies would be formed in all dimensions.


Extraction

Refers to the process of extracting data from OLTP systems by means of scripts/tools, for use in the data warehouse.

Transformation

Refers to a set of procedures that are used to ensure the uniformity of data in the warehouse. Multiple OLTP systems may store the same data in different formats. For example, gender may be stored as Male/Female, 1/0, or M/F. The process of standardizing this storage and applying the standard to all extracted data is called transformation; a sketch follows below.
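A hedged SQL sketch of such a standardization rule, assuming a hypothetical staging table stg_customer with a raw gender_src column:

SELECT CASE
         WHEN gender_src IN ('Male', '1', 'M') THEN 'M'
         WHEN gender_src IN ('Female', '0', 'F') THEN 'F'
         ELSE 'U'
       END AS gender_std
FROM stg_customer;

Every source variant is folded into one standard code before the row is loaded into the warehouse.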
Cleaning

Refers to the process of validating the extracted data. The business queries specify some business rules that the extracted data has to satisfy. The process of applying these rules is called cleansing.

ETL

The Extraction, Transformation and Loading process loads the data from heterogeneous source systems into the data warehouse. The data is also cleansed in the staging area by eliminating duplicate rows and standardizing acronyms (for example co., comp., company, etc.). The ETL process is triggered at a frequency based on the business needs.

Multi-dimensional/HyperCube/Cube

A group of data cells arranged by the dimensions of the data. For example, a spreadsheet exemplifies a two-dimensional array with the data cells arranged in rows and columns, each being a dimension. A three-dimensional array can be visualized as a cube with each dimension forming a side of the cube, including any slice parallel with that side. Higher dimensional arrays have no physical metaphor, but they organize the data in the way users think of their enterprise. Typical enterprise dimensions are time, measures, products, geographical regions, sales channels, etc.

Member/Attribute

A dimension member is a discrete name or identifier used to identify a data item's position and description within a dimension. For example, January 1989 or 1Qtr93 are typical examples of members of a Time dimension. Wholesale, Retail, etc., are typical examples of members of a Distribution Channel dimension.


Consolidation/Aggregation

Multi-dimensional databases generally have hierarchies or formula-based relationships of data within each dimension. Consolidation involves computing all of these data relationships for one or more dimensions, for example, adding up all Departments to get Total Division data. While such relationships are normally summations, any type of computational relationship or formula might be defined.