
DataStage Interview Questions and Answers V1.4



This post is very useful for DataStage beginners.


You will find answers to the following questions in this post:

What are the main features of DataStage?
What are Stage Variables, Derivations and Constraints?
What are the types of views in DataStage Director?
How do you execute a DataStage job from the command line?
What is the functionality of the Link Partitioner and Link Collector?
What are the types of jobs available in DataStage?
What is the difference between Server jobs and Parallel jobs?
What is a project?
What is a sequencer?
What is a container?
What are the different types of containers?
What are mainframe jobs?
What are parallel jobs?
How do you use a stored procedure in a DataStage job?
What is the ODBC stage?
What is a hash file? What are its types?
What type of hash file is generally used in DataStage jobs?
What is a stage variable?
What are constraints and derivations?
How do you reject records in a Transformer?
Why do you need stage variables?
What is the precedence of stage variables, derivations and constraints?
What are data elements?
What are routines?
What are transforms, and what is the difference between routines and transforms?
What is a DataStage macro?
What is job control?
How many types of stages are there?
Define the difference between active and passive stages.
Which plug-in stages have you used in your projects?
What is a plug-in stage? How do you install plug-in stages?
What is the difference between ORAOCI8 and Orabulk?
What is the Sort plug-in?
What is the Aggregator stage?
What are the Hashed File stage and Sequential File stage?
What types of flat files have you used? Have you used tab-delimited files?
What is job control code?


-------------------------------------------------------------------------


1. What are the main features of DataStage?
DataStage has the following features to aid the design and processing required to build a data warehouse:
  1. Uses graphical design tools. With simple point-and-click techniques you can draw a scheme to represent your processing requirements.
  2. Extracts data from any number or type of database.
  3. Handles all the metadata definitions required to define your data warehouse.
  4. You can view and modify the table definitions at any point during the design of your application.
  5. Aggregates data.
  6. You can modify the SQL SELECT statements used to extract data.
  7. Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data. You can easily extend the functionality by defining your own transforms.
  8. Loads the data warehouse.

2. What are Stage Variables, Derivations and Constraints?
Stage Variable - an intermediate processing variable that retains its value during the read and does not pass the value on to a target column.
Derivation - an expression that specifies the value to be passed on to the target column.
Constraint - a condition that evaluates to true or false and controls the flow of data down a link.
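
As a hedged illustration (the link, column and variable names below are hypothetical), a Transformer might use expressions such as:

    Stage variable:  svFullName = Trim(InLink.FIRST_NAME) : " " : Trim(InLink.LAST_NAME)
    Derivation (for target column CUST_NAME):  svFullName
    Constraint (on the output link):  InLink.COUNTRY = "US"

Here ":" is the DataStage BASIC concatenation operator and Trim() removes surrounding spaces.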

3. What are the types of views in DataStage Director?
There are three types of views in DataStage Director:
a) Job view - dates on which jobs were compiled.
b) Status view - the status of the last job run.
c) Log view - warning messages, event messages and program-generated messages.

4. How do you execute a DataStage job from the command line?
Use the "dsjob" command, for example:
dsjob -run -jobstatus projectname jobname
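
A slightly fuller sketch (project, job and parameter names are placeholders, and the available options can vary by DataStage version): -jobstatus makes dsjob wait for the job to finish and return its status, while -param passes a job parameter.

    dsjob -run -mode NORMAL -param SRC_DIR=/data/in -jobstatus MyProject MyJob
    dsjob -logsum MyProject MyJob

The second command lists a summary of the job's log entries for review after the run.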

5. What is the functionality of the Link Partitioner and Link Collector?
Link Partitioner: splits the data into a number of partitions (data flows) using one of several partitioning methods.
Link Collector: collects the data coming from the partitions, merges it into a single data flow and loads it to the target.



6. What are the types of jobs available in DataStage?
  1. Server jobs
  2. Parallel jobs
  3. Sequencer jobs
  4. Container jobs

The main differences between Server jobs and Parallel jobs:
A. Server jobs do not support partitioning techniques; parallel jobs do.
B. Server jobs do not support SMP/MPP; parallel jobs support SMP and MPP.
C. Server jobs run on a single node; parallel jobs run on multiple nodes.
D. Server jobs are preferred when the volume of source data is small; when the data volume is huge, prefer parallel jobs.


7. What is a project?
A complete project may contain several jobs and user-defined components. A project contains:
DataStage jobs.
Built-in components - predefined components used in jobs.
User-defined components - customized components created using the DataStage Manager; each user-defined component performs a specific task in a job.

All work is done within a project. Projects are created during and after the installation process; you can add projects on the Projects tab of the Administrator. A project is associated with a directory; the project directory is used by DataStage to store jobs and other DataStage objects and metadata.

8. What is a sequencer?
A sequencer lets you graphically create a controlling job, without using the job control function.

9. What is a container?
A container is a group of stages and links in a job design.
There are two kinds of containers: local and shared.
Local containers exist only within the single job in which they are used. Shared containers exist outside of any specific job; they are listed under the Shared Containers branch in the Manager and can be added to any job. Shared containers are frequently used to share a commonly used set of job components and to simplify complex job designs.
A container contains two unique stages: the Container Input stage, which passes data into the container, and the Container Output stage, which passes data out of the container.

10. What are the different types of containers?
Local containers - created within a job and accessible only by that job. A local container is edited in a tabbed page of the job's Diagram window.
Shared containers - created separately and stored in the Repository in the same way that jobs are. Instances of a shared container can be inserted into any server job in the project. Shared containers are edited in their own Diagram window.

11. What are mainframe jobs?
A mainframe job is compiled and run on the mainframe; the data extracted by such jobs is then loaded into the data warehouse.

12. What are parallel jobs?
Parallel jobs are compiled and run on the DataStage server in a similar way to server jobs, but they support parallel processing on SMP, MPP and cluster systems.

13. How do you use a stored procedure in a DataStage job?
Use an ODBC plug-in stage, pass one dummy column, and give the procedure name in the SQL tab.
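
A hedged sketch of what the user-defined SQL might look like (the procedure name is hypothetical, and the exact property names differ between DataStage versions); the ODBC escape syntax below calls a procedure with one parameter bound to the dummy column passed down the link:

    {CALL LOAD_CUSTOMER_DIM(?)}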



14. What is the ODBC stage?
A stage that extracts data from or loads data into a database that implements the industry-standard Open Database Connectivity (ODBC) API. It is used to represent a data source, an aggregation step, or a target data table (server jobs only).

15. What is a hash file? What are its types?
A hash file is like an indexed sequential file; it is internally indexed on a particular key value. There are two types of hash files: static hash files and dynamic hash files.

16. What type of hash file is generally used in DataStage jobs?
Static hash files.

17. What is a stage variable?
In a DataStage Transformer we can define stage variables and assign them values derived from the source data.

18. What are constraints and derivations?
We can build constraints and derivations using stage variables: a constraint controls which rows flow down an output link, and a derivation supplies the value written to a target column.

19. How do you reject records in a Transformer?
Records can be rejected through constraints: rows that fail the constraints on the other output links can be captured on a link designated as the reject link.

20. Why do you need stage variables?
It depends on the job requirements; stage variables let you hold intermediate values and filter data within the Transformer.

21. What is the precedence of stage variables, derivations and constraints?
They are evaluated in this order: stage variables => constraints => derivations.

22. What are data elements?
A data element is a specification that describes the type of data in a column and how the data is converted.

23. What are routines?
In DataStage a routine is like a function that you call within a DataStage job. There are built-in routines, and you can also create your own.
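
A minimal sketch of a transform-function routine body in DataStage BASIC (the routine name and argument are hypothetical; the routine would be created in the Manager/Designer with one argument, and its result is returned in Ans):

    * TrimAndUpper(Arg1): return the argument trimmed and converted to upper case
    Ans = UpCase(Trim(Arg1))

It could then be called from a Transformer derivation, for example TrimAndUpper(InLink.CUST_NAME).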

24. What are transforms, and what is the difference between routines and transforms?
Transforms are used to manipulate data within a DataStage job. The difference is that a transform is defined simply as an expression (which can itself use functions and routines), whereas a routine is written in DataStage BASIC code; built-in and user-defined versions of both exist.

25. What is a DataStage macro?
In DataStage, macros can be used in expressions, job control routines and before/after subroutines. The available macros are concerned with ascertaining job status.
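
For example (a hedged illustration; the output link and column names are hypothetical), a derivation could stamp each row with the running job's name and start date using built-in macros:

    OutLink.LOAD_JOB  = DSJobName
    OutLink.LOAD_DATE = DSJobStartDate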

26. What is job control?
A job control routine provides the means of controlling other jobs from the current job. A set of one or more jobs can be validated, run, reset, stopped and scheduled in much the same way as the current job can be.
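
A minimal job control sketch in DataStage BASIC (the controlled job name and parameter are placeholders, and error handling is reduced to the essentials):

    * Attach a job, set a parameter, run it and wait for completion
    hJob = DSAttachJob("LoadCustomers", DSJ.ERRFATAL)
    ErrCode = DSSetParam(hJob, "SRC_DIR", "/data/in")
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
    If Status = DSJS.RUNFAILED Then
       Call DSLogFatal("LoadCustomers failed", "JobControl")
    End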

27. How many types of stages are there?
There are three basic types of stages:
Built-in stages - supplied with DataStage and used for extracting, aggregating, transforming or writing data. All types of jobs have these stages.
Plug-in stages - additional stages that can be installed in DataStage to perform specialized tasks that the built-in stages do not support. Server jobs and parallel jobs can make use of these.
Job Sequence stages - special built-in stages which allow you to define sequences of activities to run. Only job sequences have these.

28. Define the difference between active and passive stages.
There are two kinds of stages:
Passive stages define read and write access to data sources and repositories, for example:
• Sequential File
• ODBC
• Hashed File
Active stages define how data is filtered and transformed, for example:
• Transformer
• Aggregator
• Sort plug-in

29. What is a plug-in stage? How do you install plug-in stages? Which plug-ins have you used?
Plug-in stage - a stage that performs specific processing that is not supported by the standard server job stages.
Plug-ins used: ORAOCI8, Orabulk.

Installation and use: plug-in stages are added to the Diagram window in the same way as built-in stages. To add a plug-in stage:
A. Do one of the following:
· Choose the stage type from the Insert menu.
· Click the appropriate stage button on the tool palette.
B. Click in the Diagram window where you want to position the stage. The stage appears as a square in the Diagram window.

Some information about plug-in stages:
You may find that the built-in stage types do not meet all your requirements for data extraction or transformation. In this case, you need to use a plug-in stage. The function and properties of a plug-in stage are determined by the plug-in used when the stage is inserted. Plug-ins are written to perform specific tasks, for example, to bulk load data into a data mart.
Two plug-ins are always installed with DataStage: BCPLoad and Orabulk. You can also choose to install a number of other plug-ins when you install DataStage.
You can write your own plug-ins. However, contact your local Ascential office before creating a plug-in, as there is a range of plug-ins already available that may meet your needs.

30. What is the difference between ORAOCI8 and Orabulk?
ORAOCI8 - this stage allows you to connect to an Oracle database.
Orabulk - the Orabulk plug-in generates control and data files for bulk loading into a single table on an Oracle target database. The files are suitable for loading into the target database using the Oracle sqlldr command.
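
A hedged example of loading the files Orabulk generates with sqlldr (the user, password and file names are placeholders):

    sqlldr userid=scott/tiger control=customers.ctl data=customers.dat log=customers.log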

31. What is the Sort plug-in?
A stage that sorts the input data on specified key columns.

32. What is the Aggregator stage?
A stage type that computes totals or other aggregate functions over sets of data.

33. What are the Hashed File stage and Sequential File stage?
Hashed File stage - a stage that extracts data from or loads data into a database that contains hashed files; it is commonly used as a lookup in server jobs.
Sequential File stage - a stage that reads data from or writes data to a sequential (flat) file on the server.

34. What types of flat files have you used? Have you used tab-delimited files?
Mainly comma-separated sequential flat files; tab-delimited files are handled the same way by changing the delimiter in the Sequential File stage's format settings.

35. What is job control code?
Job control code is used in a job control routine to create a controlling job, which invokes and runs other jobs.


