DataStage Tutorials Overview
Welcome to DataStage Tutorials. The objective of these tutorials is to help you gain an understanding of the IBM DataStage tool. In these tutorials, we will cover topics such as DataStage Architecture, Job Sequencing in DataStage, and Containers & Joins in DataStage.
In addition, we will cover common DataStage interview questions and issues.
DataStage Overview
DataStage is a GUI-based ETL tool used to build data warehouse and data mart applications.
In DataStage there are three types of jobs: server jobs, parallel jobs, and mainframe jobs.
DataStage has continued to enhance its capabilities for data quality and data integration. DataStage 8.0 introduced many new features that make developing and maintaining projects more comfortable, including data quality management, new connectivity methods, and an implementation of slowly changing dimensions.
What is IBM Information Server?
IBM Information Server consists of the following components: WebSphere DataStage and QualityStage, WebSphere Information Analyzer, Federation Server, and Business Glossary, plus common administration, logging, and reporting. These components are designed to provide much more efficient ways to manage metadata and develop ETL solutions, and they can be deployed based on client need.
Top ten features
- The Metadata Server
With the Hawk release, DataStage introduced common administration, logging, and reporting, which improves the metadata reporting available compared to prior releases.
- Quality Stage
Data quality is highly critical for data integration projects. With earlier releases such as MetaStage, quality stages used to add a lot of overhead in installation, training, and implementation. With the new release of QualityStage, integration projects that use standardization, matching, and survivorship to improve quality become more accessible and easier to build. Developers can also design jobs with data transformation stages and data quality stages in the same session. In the current release the Designer is called the DataStage and QualityStage Designer, reflecting this dual usage.
- Frictionless Connectivity and Connection Objects
Managing connectivity information, and propagating it between different environments, has traditionally added development and maintenance overhead. In earlier releases, development teams could spend considerable time resolving database connectivity issues. The new connection objects in DataStage 8 make connecting to remote databases easier, ensure reusability, and reduce the risk of data issues caused by wrong connectivity information.
- Parallel job range lookup
It's always useful to have different options for accessing lookup data, and a range lookup is the better option when a data range is available, since it improves performance. Range lookup has been merged into the existing lookup form and is easy to use.
- SCD
Data warehouse developers have had to build complex jobs to implement slowly changing dimensions. With the stage introduced in DataStage 8, the following can be done easily: surrogate key generation, slowly changing dimension processing, and updates passed to in-memory lookups. That's it for me with DBMS-generated keys; I'm only generating keys in the ETL job from now on! DataStage server jobs have the hash file lookup, where you can read and write to it at the same time; parallel jobs now get the updatable lookup.
- Collaboration
This new feature allows a developer to open a job that is already opened by another developer; the second copy opens READ ONLY. This helps reduce wait time when a job is LOCKED by another user. New enhancements also allow you to unlock a job associated with a disconnected session from the web console more easily than in prior releases.
- Session Disconnection
With this feature an administrator can disconnect sessions and unlock jobs.
- Improved SQL Builder
This feature reduces the effort spent synchronizing the SQL select list with the DataStage column list and helps prevent column mismatches. In addition, in the ODBC Connector you can build complex queries with the GUI, including adding columns and a WHERE clause to the statement.
- Improved job startup times
With this new enhancement, invoking many small parallel jobs has less impact on long-running DataStage jobs. Connectivity and resource allocation for parallel jobs have improved, and load is balanced based on job requirements.
- Common logging
With this new feature, DataStage has introduced common logging of DataStage job logs, which makes searching the logs easier. DataStage has also introduced time-based and record-based job monitoring.
Change Data Capture
These are add-on products (at an additional fee) that attach themselves to source databases and perform change data capture. Most source system database owners I've come across don't like you playing with their production transactional database and will not let you near it with a ten-foot pole, but I guess there are exceptions:
-Oracle
-Microsoft SQL Server
-DB2 for z/OS
-IMS
There are three ways to get incremental feeds on the Information Server: the CDC products for DataStage, the Replication Server (renamed Information Integrator: Replication Edition, does DB2 replication very well) and the change data capture functions within DataStage jobs such as the parallel CDC stage.
Removed Functions
These are the functions that are no longer present in DataStage 8:
-dssearch command line function
-dsjob '-import'
-Version Control tool
-Released jobs
-Oracle 8i native database stages
-ClickPack
The loss of the Version Control tool is not a big deal as the import/export functions have been improved. Building a release file as an export in version 8 is easier than building it in the Version Control tool in version 7.
Database Connectivity
The common connection objects functionality means the very wide range of DataStage database connections are now available across Information Server products.
Latest supported databases for version 8:
-DB2 8.1, 8.2 and 9.1
-Oracle 9i, 10g and 10gR2, but not Oracle 8
-SQL Server 2005 plus stored procedures.
-Teradata v2r5.1, v2r6.0, v2r6.1 (DB server) / 8.1 (TTU) plus Teradata Parallel Transport (TPT) and stored procedures and macro support, reject links for bulk loads, restart capability for parallel bulk loads.
-Sybase ASE 15, Sybase IQ 11.5, 12.5, 12.7
-Informix 10 (IDS)
-SAS 612, 8.1, 9.1 and 9.1.3
-IBM WS MQ 6.1, WS MB 5.1
-Netezza v3.1
-ODBC 3.5 standard and level 3 compliant
-UniData 6 and UniVerse ?
-Red Brick ?
New Stages
A new stage from the IBM software family, new stages from new partners, and the convergence of QualityStage functions into DataStage. Apart from the SCD stage, these all come at an additional cost.
-WebSphere Federation and Classic Federation
-Netezza Enterprise Stage
-SFTP Enterprise Stage
-iWay Enterprise Stage
-Slowly Changing Dimension: for type 1 and type 2 SCDs.
- Six QualityStage stages
New Functions Existing Stages
-Complex Flat File Stage: Multi Format File (MFF) support in addition to the existing COBOL file support.
-Surrogate Key Generator: the key source is a new feature of this stage; it is maintained via an integrated state file or a DBMS sequence.
-Lookup Stage: Range Look-up is a new function which is equivalent to the operator between. Lookup against a range of values was difficult to implement in previous DataStage versions. By having this functionality in the lookup stage, comparing a source column to a range of two lookup columns or a lookup column to a range of two source columns can be easily implemented.
-Transformer Stage: new surrogate key functions Initialize() and GetNextKey().
-Enterprise FTP Stage: now choose between ftp and sftp transfer.
-Secure FTP (SFTP): select this option if you want to transfer files between computers over a secured channel. Secure FTP (SFTP) uses the SSH (Secure Shell) protected channel for data transfer between computers over a nonsecure network such as a TCP/IP network. Before you can use SFTP to transfer files, you should configure the SSH connection for RSA authentication without any pass phrase.
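As a minimal sketch of that setup (the user name, host, and key path are placeholder assumptions, not values from this document), the passphrase-free RSA configuration might look like this on the DataStage engine host:

    # Generate an RSA key pair with an empty passphrase (-N "") for the DataStage user
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

    # Install the public key on the remote SFTP server (user/host are placeholders)
    ssh-copy-id etluser@remote.example.com

    # Verify that sftp now connects without prompting for a passphrase
    sftp etluser@remote.example.com

Once the key exchange works silently, the SFTP option in the stage can authenticate without operator intervention.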
New Database Connector Functions
This is a big area of improvement.
LOB/BLOB/CLOB data: pictures, documents, and other large objects of any size can now be moved between databases. A connector can transfer large objects (LOBs) using inline or reference methods. However, a connector is the only stage type that understands the reference method, so another connector is needed later in the job to transfer the LOB inline.
Reject links: the connector has its own reject-handling function, which eliminates the need to add a Modify or Transformer stage for capturing SQL errors or aborting jobs. Either a number of rows or a percentage of rows rejected can be specified as the threshold for terminating the job run.
Schema Reconciliation: Connector has a schema reconciliation function that automatically compares DataStage schemas to external-resource schemas such as a database. Schemas include data types, attributes and field lengths. Based on the reconciliation rules that you specify, runtime errors or extra transformation on mismatched schemas can be avoided.
Improved SQL Builder that supports more database types.
The connector is the best stage type to use for your database because it gives the maximum parallel performance and offers more features compared to the older database stage types.
Test button: the Test button on connectors allows developers to test database connections without having to view the data or run the job.
Connectors are for accessing external data sources and can be used to read, write, look up and filter data or simply to test the database connectivity during job design.
Drag and drop your configured database connections onto jobs.
Before and after SQL defined per job or per node with a failure handling option. Neater than previous versions.
DataStage 8 gives you access to the latest versions of databases that DataStage 7 may never get. Extra functions on all connectors includes improved reject handling, LOB support and easier stage configuration.
Database Repository
Note the database compatibility for the Metadata Server repository is the latest versions of the three DBMS engines. DB2 is an optional extra in the bundle if you don't want to use an existing database.
-IBM Information Server does not support the Database Partitioning Feature (DPF) for use in the repository layer.
-DB2 Restricted Enterprise Edition 9 is included with IBM Information Server and is an optional part of the installation; however, its use is restricted to hosting the IBM Information Server repository layer, and it cannot be used for other applications.
-Oracle 10g
-SQL Server 2005
Enterprise Packs
Different enterprise packs are available in version 8. These packs are:
-SAP BW Pack
-BAPI: (Staging Business API) loads from any source to BW.
-OpenHub: extract data from BW.
-SAP R/3 Pack
-ABAP: (Advanced Business Application Programming) auto-generate ABAP, Extraction Object Builder, SQL Builder, load and execute ABAP from DataStage, CPI-C data transfer, FTP data transfer, ABAP syntax check, background execution of ABAP.
-Siebel Pack
-Business Component: access business views via the Siebel Java Data Bean
-Direct Access: use a metadata browser to select data to extract
-Hierarchy: for extracts from Siebel to SAP BW.
-Oracle Applications Pack
-Oracle flex fields: extract using enhanced processing techniques.
-Oracle reference data structures: simplified access using the Hierarchy Access component.
-Metadata browser and importer
-DataStage Pack for PeopleSoft Enterprise
-Import business metadata via a metadata browser.
-Extract data from PeopleSoft tables and trees.
-JD Edwards Pack
-Standard ODBC calls
-Pre-joined database tables via business views
Code Packs
These packs can be used by server and/or parallel jobs to interact with other coding languages. This lets you access programming modules or functions within a job:
-Java Pack: produce or consume rows for DataStage parallel or server jobs using a Java transformer.
-Web Service Pack: Access web services operations in a Server job transformer or Server routine.
-XML Pack: Read, write or transform XML files in parallel or server jobs.
The DataStage stages, custom stages, transformer functions, and routines will usually be faster at transforming data than these packs; however, the packs are useful for reusing existing code.
Database OPEN and CLOSE Commands
The native parallel database stages provide options for specifying OPEN and CLOSE commands. These options allow commands (including SQL) to be sent to the database before (OPEN) or after (CLOSE) all rows are read/written/loaded to the database. OPEN and CLOSE are not offered by plug-in database stages.
For example, the OPEN command could be used to create a temporary table, and the CLOSE command could be used to select all rows from the temporary table and insert into a final target table.
As another example, the OPEN command can be used to create a target table, including database-specific options (tablespace, logging, constraints, etc) not possible with the “Create” option. In general, don’t let EE generate target tables unless they are used for temporary storage: there are few options for specifying create-table options, and doing so may violate data-management (DBA) policies.
It is important to understand the implications of specifying a user-defined OPEN and CLOSE command. For example, when reading from DB2, a default OPEN statement places a shared lock on the source. When specifying a user-defined OPEN command, this lock is not sent – and should be specified explicitly if appropriate.
Further details are outlined in the respective database sections of the Orchestrate Operators Reference which is part of the Orchestrate OEM documentation.
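As a hedged sketch of the temporary-table pattern described above (the database alias, credentials, and table names are invented for illustration), the candidate OPEN and CLOSE statements can be tested from a ksh session against DB2 before being pasted into the stage properties:

    #!/bin/ksh
    # Connect to the target database (alias, user, and password are placeholders)
    db2 connect to DWDB user dsadm using secret

    # Candidate OPEN command: create the work table before any rows are written
    db2 "CREATE TABLE stage_tmp LIKE target_final"

    # ... the parallel job would write its rows to stage_tmp at this point ...

    # Candidate CLOSE command: move the staged rows into the final target table
    db2 "INSERT INTO target_final SELECT * FROM stage_tmp"
    db2 "DROP TABLE stage_tmp"

    db2 terminate

Remember the point above about locks: once you supply your own OPEN command, any shared lock the default OPEN would have taken must be requested explicitly if it is still needed.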
Data Stage Designer
DataStage Designer is used to design ETL jobs. Some of the functionalities it provides are detailed below:
-Create DS jobs
-Create and use parameters within jobs
-Insert and link stages
-Configure stage and job properties
-Load and save table definitions
-Save and compile DS jobs
-Run jobs
Logging-In to DS Designer
The ‘Attach to Project’ window is used to log in to DS.
DS Log On Window
Note: Do not use the ‘Omit’ option while working in the UNIX environment. This option is used for Windows authentication and should not be used when DS is run on UNIX.
The Data Stage Job
Starting Data Stage
The screen below displays when the user successfully logs-in.
DS Job Selection
Select a ‘New Parallel Job’ from the new job window.
Note: Options to choose from ‘Existing’ jobs or from ‘Recent’ jobs are available from the tabs of the same name.
DataStage EE Canvas
A typical DS Enterprise Edition canvas looks like the example below.
DS Canvas--Typical Data Stage Parallel Job
DS Stages and Usage
The DataStage stages are divided into two categories:
1. Active Stages
2. Passive Stages
Active Stages: Active stages model the flow of data and provide mechanisms for combining data streams, aggregating data, and converting data from one data type to another.
Passive Stages: Passive stages read data from, and write data to, data sources and targets; file and database stages are typical examples.
The look and feel of the DataStage and QualityStage canvas remains the same, but the new functionalities are major enhancements over the previous version. Data Connection objects, Parameter Sets, Range Lookup, and the Slowly Changing Dimension stage are all designed to simplify design, cut implementation effort, and reduce cost. Advanced Find provides a good way to do impact analysis, an important step in project management. Resource Estimation is equally important for project planning. Meanwhile, the Performance Analysis tool is another useful feature that can be used throughout the lifecycle of a job: by knowing what causes a performance bottleneck, production support groups can better cope with ever-shrinking batch windows.
While Advanced Find will not perform a Replace function and SQL Builder will not let us build complex SQL, the changes in version 8 have a positive impact on job development, production support, and project management. Combined with the features offered in Information Server, existing customers looking to upgrade and new DataStage clients alike will benefit from the new enhancements.
PROFESSIONAL SUMMARY:
Over 6 years of dynamic career reflecting pioneering experience and high performance in system analysis, design, development, and implementation of relational database and data warehousing systems using IBM DataStage 8.0.1/7.x/6.x/5.x (InfoSphere Information Server, WebSphere, Ascential DataStage).
Excellent Experience in Designing, Developing, Documenting, Testing of ETL jobs and mappings in Server and Parallel jobs using Data Stage to populate tables in Data Warehouse and Data marts.
Proficient in developing strategies for Extraction, Transformation and Loading (ETL) mechanism.
Expert in designing Parallel jobs using various stages like Join, Merge, Lookup, Remove duplicates, Filter, Dataset, Lookup file set, Complex flat file, Modify, Aggregator, XML.
Expert in designing Server jobs using various types of stages like Sequential file, ODBC, Hashed file, Aggregator, Transformer, Sort, Link Partitioner and Link Collector.
Experienced in integration of various data sources (DB2-UDB, SQL Server, PL/SQL, Oracle, Teradata, XML and MS-Access) into data staging area.
Expert in working with Data Stage Manager, Designer, Administrator, and Director.
Experience in analyzing the data generated by the business process, defining the granularity, source to target mapping of the data elements, creating Indexes and Aggregate tables for the data warehouse design and development.
Excellent knowledge of studying the data dependencies using metadata stored in the repository and prepared batches for the existing sessions to facilitate scheduling of multiple sessions.
Proven track record in troubleshooting of Data Stage jobs and addressing production issues like performance tuning and enhancement.
Expert in working on various operating systems like UNIX AIX 5.2/5.1, Sun Solaris V8.0 and Windows 2000/NT.
Proficient in writing, implementation and testing of triggers, procedures and functions in PL/SQL and Oracle.
Experienced in Database programming for Data Warehouses (Schemas), proficient in dimensional modeling (Star Schema modeling, and Snowflake modeling).
Expertise in UNIX shell scripts using K-shell for the automation of processes and scheduling the Data Stage jobs using wrappers.
Experience in using software configuration management tools like Rational Clear case/Clear quest for version control.
Experienced in data modeling as well as reverse engineering using tools such as Erwin, Oracle Designer, MS Visio, and SQL Server Management Studio, along with SSIS, SSRS, and stored procedures.
Expert in unit testing, system integration testing, implementation and maintenance of databases jobs.
Effective in cross-functional and global environments to manage multiple tasks and assignments concurrently with effective communication skills.
EDUCATIONAL QUALIFICATION: Bachelors in Electronics and Communication
TECHNICAL SKILLS:
ETL Tools
DATA STAGE - IBM WebSphere DataStage and QualityStage 8.0, Ascential DataStage 7.5.2/6.0/5.1, ProfileStage 7.0, SSIS (SQL Server 2005), Data Integrator.
Business Intelligence tools
Business Objects, Brio, SSRS (SQL Server 2005), IBM Cognos 8 BI
Testing Tools
Auto Tester, Test Director, Lotus Notes
Data Modeling Tools
Erwin 4.0, Sybase PowerDesigner, SSIS, SSRS
Operating Systems
HP-UX, IBM AIX 5.3, Windows 95/98/2000/NT, Sun Solaris, Red Hat Linux, MS SQL Server 2000/2005/2008, MS Access
WORK EXPERIENCE:
Confidential, CA (Nov 2010 – Present), ETL Developer
NetApp Inc is a leading network appliance manufacturer and data storage company, providing network appliances such as hard disks and shelves for small and large business owners, as well as efficient data storage facilities. The main aim is to provide a variety of services such as data storage, data analysis, data warehousing, and data marts, adopting consistent, tailored processes in order to strive for and fulfill the promise of commitment and reliability to customers.
Involved as primary on-site ETL Developer during the analysis, planning, design, development, and implementation stages of projects using IBM Web Sphere software (Quality Stage v8.1, Web Service, Information Analyzer, Profile Stage, WISD of IIS 8.0.1).
Prepared data mapping documents and designed the ETL jobs based on the DMD with the required tables in the Dev environment.
Active participation in decision making and QA meetings; regularly interacted with the business analysts and development team to gain a better understanding of the business process, requirements, and design.
Used DataStage as an ETL tool to extract data from source systems and load the data into the Oracle database.
Designed and developed DataStage jobs to extract data from heterogeneous sources, applied transformation logic to the extracted data, and loaded it into data warehouse databases.
Created DataStage jobs using different stages like Transformer, Aggregator, Sort, Join, Merge, Lookup, Data Set, Funnel, Remove Duplicates, Copy, Modify, Filter, Change Data Capture, Change Apply, Sample, Surrogate Key, Column Generator, Row Generator, etc.
Extensively worked with Join, Look up (Normal and Sparse) and Merge stages.
Extensively worked with sequential file, dataset, file set and look up file set stages.
Extensively used Parallel Stages like Row Generator, Column Generator, Head, and Peek for development and de-bugging purposes.
Used the DataStage Director and its run-time engine to schedule running the solution, testing and debugging its components, and monitoring the resulting executable versions on an ad hoc or scheduled basis.
Developed complex stored procedures using input/output parameters, cursors, views, triggers, and complex queries using temp tables and joins.
Converted complex job designs to different job segments and executed through job sequencer for better performance and easy maintenance.
Created job sequences.
Maintained Data Warehouse by loading dimensions and facts as part of project. Also worked for different enhancements in FACT tables.
Created shell scripts to run DataStage jobs from UNIX and then scheduled these scripts through a scheduling tool (a minimal sketch of such a wrapper appears at the end of this section).
Coordinate with team members and administer all onsite and offshore work packages.
Analyze performance and monitor work with capacity planning.
Performed performance tuning of the jobs by interpreting performance statistics of the jobs developed.
Documented ETL test plans, test cases, test scripts, and validations based on design specifications for unit testing, system testing, functional testing, prepared test data for testing, error handling and analysis.
Participated in weekly status meetings.
Developed Test Plan that included the scope of the release, entrance and exit criteria and overall test strategy. Created detailed Test Cases and Test sets and executed them manually.
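A minimal sketch of the kind of wrapper script mentioned above (the dsenv path, project, job, and parameter names are placeholder assumptions to adapt to the installation):

    #!/bin/ksh
    # Source the DataStage environment so dsjob is available
    . /opt/IBM/InformationServer/Server/DSEngine/dsenv

    PROJECT=MyProject
    JOB=LoadCustomerDim

    # Run the job with a run-date parameter; -jobstatus waits for completion
    # and returns an exit code derived from the finishing job status
    dsjob -run -param RunDate=`date +%Y%m%d` -jobstatus $PROJECT $JOB
    STATUS=$?

    # By convention 1 = finished OK and 2 = finished with warnings;
    # anything else is treated as a failure for the scheduler
    if [ $STATUS -ne 1 ] && [ $STATUS -ne 2 ]; then
        echo "$JOB finished with status $STATUS" >&2
        exit 1
    fi
    exit 0

The scheduling tool then only needs to check the script's exit code to decide whether downstream jobs may run.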
Environment: IBM WebSphere DataStage 8.1 Parallel Extender (Designer, Director, Manager), Web Services, QualityStage 8.1, Microsoft Visio, IBM AIX 4.2/4.1, IBM DB2, SQL Server, Teradata, Oracle 11g, Queryman, UNIX, Windows.
Confidential, NJ (Jan 2010 – Oct 2010), Lead Sr. DataStage Developer
Project was to design and develop an enterprise data warehouse: extract data from heterogeneous source systems, transform the data using business logic, and load it into the data warehouse.
Used the DataStage Designer to develop processes for extracting, cleansing, transforming, integrating and loading data into staging tables.
Extensively used ETL to load data from IBM DB2 database, XML & Flat files Source to Informix Database Server.
Involved in the analysis, planning, design, development, and implementation phases of projects using IBM WebSphere software (QualityStage v8.0.1, Web Service, Information Analyzer, ProfileStage, WISD of IIS 8.0.1).
Developed complex jobs using various stages like Lookup, Join, Transformer, Dataset, Row Generator, Column Generator, Sequential File, Aggregator, and Modify stages.
Created queries using join and case statement to validate data in different databases.
Created queries to compare data between two databases to make sure data is matched.
Used the DataStage Director and its run-time engine to schedule running the solution, testing and debugging its components, and monitoring the resulting executable versions on an ad hoc or scheduled basis.
Created shared containers to incorporate complex business logic in jobs.
Monitored DataStage jobs on a daily basis by running a UNIX shell script, and forced a restart whenever a job failed (a sketch of such a monitor appears at the end of this section).
Created and modified batch scripts to FTP files from different servers to the DataStage server.
Extensively used the slowly changing dimension Type 2 approach to maintain history in the database.
Created Job Sequencers to automate the job.
Modified UNIX shell script to run Job sequencer from the mainframe job.
Created parameter sets to assign values to jobs at run time.
Standardized the Nomenclature used to define the same data by users from different business units.
Created multiple layer report providing a comprehensive and detail report with Drill through facility.
Used Parallel Extender for Parallel Processing for improving performance when extracting the data from the sources.
Worked with Metadata Definitions, Import and Export of Datastage jobs using Data stage Manager.
Providing the logical data model design, generating database, resolving technical issues, and loading data into multiple instances.
Implemented PL/SQL scripts in accordance with the necessary Business rules and procedures.
Developed PL/SQL procedures & functions to support the reports by retrieving the data from the data warehousing application.
Used PL/SQL programming to develop Stored Procedures/Functions and Database triggers.
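A hedged sketch of the monitoring script described above (the project and job names, the dsenv path, and the exact wording of the dsjob status line are assumptions to verify against your installation):

    #!/bin/ksh
    # Source the DataStage environment so dsjob is available
    . /opt/IBM/InformationServer/Server/DSEngine/dsenv

    PROJECT=MyProject
    JOB=DailyLoad

    # dsjob -jobinfo reports a "Job Status" line, e.g. "RUN FAILED (3)" after an abort
    LASTSTATUS=`dsjob -jobinfo $PROJECT $JOB | grep "Job Status"`
    echo "$JOB last status: $LASTSTATUS"

    case "$LASTSTATUS" in
    *FAILED*|*ABORTED*|*STOPPED*)
        # An aborted job must be reset before it can be run again,
        # then the normal run is forced
        dsjob -run -mode RESET -wait $PROJECT $JOB
        dsjob -run -jobstatus $PROJECT $JOB
        ;;
    esac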
Environment: IBM WebSphere DataStage 8.0.1 Parallel Extender (Designer, Director, Manager), Web Services, QualityStage 8.0, Microsoft Visio, IBM AIX 4.2/4.1, IBM DB2, SQL Server 2000, Teradata, Oracle 11g, Queryman, BMQ, UNIX, Windows.
Confidential, VA (Oct 2008 – Dec 2009), Lead DataStage ETL Developer
Project involved the design and development of a group insurance system, which processes claims for group insurance. It covers benefits with subsystems covering Term Life Insurance, Medical Indemnity, and Managed Health Care.
Data Modeling:
Gathered and analyzed the requirements of the in-house business users for the data warehousing from JAD sessions.
Collected the information about different Entities and attributes by studying the existing ODS and reverse engineering into Erwin.
Defined the Primary keys and foreign keys for the Entities.
Defined the query view, index options and relationships.
Created logical schema using ERWIN 4.0 and also created the Dimension Modeling for building the Cubes.
Designed staging and Error handling tables keeping in view the overall ETL strategy.
Assisted in creating the physical database by forward engineering.
ETL Process:
Extracted data from source systems, transformed it, and loaded it into the Oracle database according to the required provision.
Primary on-site technical lead during the analysis, planning, design, development, and implementation stages of data quality projects using Integrity (now known as Quality Stage).
Involved in system analysis, design, development, support and documentation.
Created objects like tables, views, materialized views, procedures, and packages using Oracle tools like PL/SQL, SQL*Plus, and SQL*Loader, and handled exceptions.
Involved in database development by creating Oracle PL/SQL Functions, Procedures, Triggers, Packages, Records and Collections.
Created views for hiding actual tables and to eliminate the complexity of the large queries.
Created various indexes on tables to improve the performance by eliminating the full table scans.
Used the DataStage Designer to develop processes for extracting, cleansing, transforming, integrating and loading data into Data Marts.
Created source table definitions in the DataStage Repository.
Identified source systems, their connectivity, related tables and fields and ensure data suitability for mapping.
Generated surrogate IDs for the dimensions in the fact table for indexed and faster access of data.
Created hash tables with referential integrity for faster table look-up and for transforming the data representing valid information.
Used built-in as well as complex transformations.
Used Data Stage Manager to manage the Metadata repository and for import/export of jobs.
Implemented parallel extender jobs for better performance using stages like Join, Merge, Sort, Lookup, and Transformer with different source files, including complex flat files and XML files.
Optimized job performance by carrying out Performance Tuning.
Created stored procedures to conform to the business rules.
Used Aggregator stages to sum the key performance indicators in decision support systems and for granularity required in DW.
Tuned DataStage transformations and jobs to enhance their performance.
Used the DataStage Director and its run-time engine to schedule running the solution, testing and debugging its components, and monitoring the resulting executable versions on an ad hoc or scheduled basis.
Scheduled Datastage job using Autosys scheduling tool.
Prepared the documentation of Data Acquisition and Interface System Design.
Assigned the tasks and provided technical support to the development team.
Monitored the development activities of the team and updated to the Management.
Created complicated reports using reporting tool Cognos.
Environment: IBM/Ascential DataStage EE/7.5 (Manager, Designer, Director, Parallel Extender), QualityStage 7.5, DataStage BASIC language expressions, Autosys, Erwin 4.0, Windows NT, UNIX, Oracle 9i, SQL Server, Cognos, sequential files, .csv files.
Confidential, PA (Jan 2007 – Sep 2008), Sr. DataStage Developer
As a DW developer, designed, developed, and deployed DataStage jobs and associated functionality. The warehouse employed highly complex data transformations, including slowly changing dimensions and a series of stored procedures, which made performance tuning and efficient mapping highly critical. Along with designing jobs from scratch, re-wrote existing code to enhance performance and troubleshoot errors in both DataStage and Oracle 10g.
Responsibilities:
Used IBM Datastage Designer to develop jobs for extracting, cleaning, transforming and loading data into data marts/data warehouse.
Developed several jobs to improve performance by reducing runtime using different partitioning techniques.
Used different stages of Datastage Designer like Lookup, Join, Merge, Funnel, Filter, Copy, Aggregator, and Sort etc.
Read complex flat files from the mainframe machine by using the Complex Flat File stage.
Sequential File, Aggregator, ODBC, Transformer, Hashed-File, Oracle OCI, XML, Folder, FTP Plug-in Stages were extensively used to develop the server jobs.
Used the EXPLAIN PLAN statement to determine the execution plans chosen by the Oracle database (a sketch appears at the end of this section).
Worked on complex data coming from mainframes (EBCDIC files); knowledge of Job Control Language (JCL).
Used COBOL copybooks to import the metadata information from mainframes.
Designed DataStage jobs using QualityStage stages in 7.5 for the data cleansing and data standardization process. Implemented the Survive and Match stages for data patterns and data definitions.
Staged the data coming from various environments in a staging area before loading into data marts.
Involved in writing Test Plans, Test Scenarios, Test Cases and Test Scripts and performed the Unit, Integration, system testing and User Acceptance Testing.
Used stage variables for source validations, to capture rejects and used Job Parameters for Automation of jobs.
Strong knowledge in creating procedures, functions, sequences, triggers.
Expertise in PL/SQL and SQL.
Performed debugging and unit testing and System Integrated testing of the jobs.
Wrote UNIX shell script according to the business requirements.
Wrote customized server/parallel routines according to complexity of the business requirements.
Designed strategies for archiving of legacy data.
Created shell scripts to perform validations and run jobs on different instances (DEV, TEST and PROD).
Created and deployed SSIS (SQL Server Integration Services) projects and schemas, and configured Report Server to generate reports through SSRS in SQL Server 2005.
Created ad hoc reports with MS SQL Server Reporting Services for the business users.
Used SQL Profiler to monitor server performance, debug T-SQL, and identify slow-running queries.
Expertise in developing and debugging indexes, stored procedures, functions, triggers, cursors using T-SQL.
Wrote mapping documents for all the ETL Jobs (interfaces, Data Warehouse and Data Conversion activities).
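As a sketch of the EXPLAIN PLAN check mentioned above (the credentials, connect string, and the query itself are placeholders, not taken from this project), a plan can be captured and displayed from the shell via SQL*Plus:

    #!/bin/ksh
    # Generate and display the execution plan for a candidate extract query
    sqlplus -s dsadm/secret@ORCL <<'EOF'
    EXPLAIN PLAN FOR
      SELECT c.cust_id, SUM(o.amount)
      FROM   customers c JOIN orders o ON o.cust_id = c.cust_id
      GROUP  BY c.cust_id;

    -- DBMS_XPLAN.DISPLAY formats the plan just written to the PLAN_TABLE
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
    EXIT;
    EOF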
Environment: IBM WebSphere DataStage and QualityStage 7.5, Ascential DataStage 7.5/EE (Parallel Extender), SQL Server 2005/2008, Linux, Teradata 12, Oracle 10g, Sybase, PL/SQL, TOAD, UNIX (HP-UX), Cognos 8 BI.
Confidential, NJ (Jan 2006 – Dec 2006), Jr. DataStage Developer
Merrill Lynch was a global financial services firm providing capital markets services, investment banking and advisory services, wealth management, asset management, insurance, banking, and related financial services worldwide.
Responsibilities:
Worked on the logical and physical design of the data warehouse. Identified sources/targets and analyzed source data for dimensional modeling.
Good knowledge of voluntary insurance plans offered to employers as part of total insurance packages.
Worked on the design of the Voluntary Disability, Voluntary Dental, and Voluntary Life data marts.
Good knowledge on policy and claims processing
Worked on integration of Health Claims ODS from legacy systems.
Designed and developed jobs for extracting, transforming, integrating, and loading data into the data mart using DataStage Designer; used DataStage Manager for importing metadata from the repository, creating new job categories, and creating new data elements.
Worked with EBCDIC files to extract data in the required format.
DataStage jobs were scheduled, monitored, performance of individual stages was analyzed and multiple instances of a job were run using DataStage Director.
Used Parallel Extender for splitting the data into subsets and utilized Lookup, Sort, Merge, and other stages to achieve job performance.
Used DS Erwin MetaBroker to import Erwin 4.x Metadata into DataStage Repository.
Developed user defined Routines and Transformations for implementing Complex business logic.
Extensively used Shared Containers and Job Sequencer to make complex jobs simple and to run the jobs in sequence
Involved in the preparation of ETL documentation by following the business rule, procedures and naming conventions.
Created reports for various Portfolios using the Universes as the main Data Providers.
Created the reports using Business Objects functionalities like Queries, Slice and Dice, Drill Down, Cross Tab, Master Detail, etc.
As a part of report development, created the reports using universes as the main data provider and using the powerful Business Objects functionalities and formulae. Involved in troubleshooting various reporting errors.
Created Business Objects reports, Queries with constant interaction with the end users. Trained end users in understanding the reports. Functionalities such as Slice and Dice, Drill mode and Ranking were used for Multidimensional Formatting.
Web Intelligence was used to generate reports on the internet/intranet.
Exporting the Reports to the Broadcast Agent and Used the Broadcast Agent to Schedule, Monitor and Refresh the Reports.
Developed Test plans, Test Scenarios and Test cases for Code testing.
Trained team members
Provided 24/7 production support
Environment: IBM Web Sphere DataStage 7.5, Metastage 7.0, Business Objects 6.5, Oracle 9i, PL/SQL, SQL * Plus, UNIX Shell Scripts, Windows 2000/NT 4.0, ERWIN 4.1.
Confidential (June 2004 – Dec 2005), Jr. DataStage Developer
Description: ICICI Prudential Insurance provides a wide range of insurance policies such as life insurance, health insurance, motor vehicle insurance, and general insurance. This project was developed to automate insurance policy management using a centralized data warehouse and data mart. The application provides for capturing various related information by region and generates premiums and the desired data in the form of reports.
Responsibilities:
Designed and developed mappings between sources and operational staging targets, using Star and Snow Flake Schemas.
Provided data models and data maps (extract, transform and load analysis) of the data marts for systems in the aggregation effort.
Involved in Extracting, cleansing, transforming, integrating and loading data into data warehouse using Datastage Designer.
Developed various transformations based on customer last name, zip code for internal business analytical purposes, loaded warehouse based on customer credit card number with dynamic data re-partitioning.
Developed user defined Routines and Transformations by using Universe Basic.
Used Datastage Manager for importing metadata from repository, new job categories and creating new data elements.
Used the DataStage Director and the runtime engine to schedule running the solution, testing and debugging its components, and monitoring the resulting executable versions (on an ad hoc or scheduled basis).
Developed, maintained programs for scheduling data loading and transformations using Datastage and Oracle 8i.
Developed Shell scripts to automate file manipulation and data loading procedures.
Environment: Datastage 5.2/6.0, Oracle 8i, SQL, TOAD, UNIX, Windows NT 4.0.