This 4-day IBM InfoSphere QualityStage Essentials training course (Code: KM213G) teaches you how to build QualityStage parallel jobs that investigate, standardize, match, and consolidate data records. You will gain experience by building an application that combines customer data from three source systems into a single master customer record.
Course Duration
- Four days of instructor-led training
- 60% lecture, 40% hands-on
Target Audience
- Data Analysts responsible for data quality using QualityStage
- Data Quality Architects
- Data Cleansing Developers
Prerequisites
- Familiarity with the Windows operating system
- Familiarity with a text editor
- Helpful, but not required: some understanding of elementary statistics principles such as weighted averages and probability
Agenda
Module 1: Data Quality Issues
- Listing the common data quality contaminants
- Describing data quality processes
Module 2: QualityStage Overview
- Describing QualityStage architecture
- Describing QualityStage clients and their functions
Module 3: Developing with QualityStage
- Importing metadata
- Building DataStage/QualityStage jobs
- Running jobs
- Reviewing results
Module 4: Investigate
- Building Investigate jobs
- Using Character Discrete, Character Concatenate, and Word investigations to analyze data fields
- Reviewing results
Module 5: Standardize
- Describing the Standardize stage
- Identifying Rule Sets
- Building jobs using the Standardize stage
- Interpreting Standardize results
- Investigating unhandled data and patterns
Module 6: Match
- Building a QualityStage job to identify matching records
- Applying multiple Match passes to increase efficiency
- Interpreting and improving Match results
Module 7: Survive
- Building a QualityStage Survive job that consolidates matched records into a single master record
Module 8: Two-Source Match
- Building a QualityStage job to match data using a reference match