ExistBI provides this 3-day Data Engineering Integration (DEI) Administration training class to help students set up a live DEI environment by performing administrative tasks such as Hadoop integration, Databricks integration, security setup, Data Engineering recovery, monitoring, and performance tuning. Learn to integrate the Informatica domain with the Hadoop and Databricks ecosystems, using Hadoop’s distributed processing and Databricks’ cloud analytics platform to process very large data sets. Applicable to users of software version 10.5.

Objectives

After successfully completing this Data Engineering Integration course, you will be able to:

  • Describe DEI Architecture
  • List DEI Components
  • List the steps to enable SAML on the Domain
  • Create Cluster Configuration Object for Hadoop integration
  • Set up Informatica security, including authentication and authorization mechanisms
  • Tune the performance of the system
  • Monitor, view, and troubleshoot DEI logs

Course Duration

  • 3 days
  • 60% lecture, 40% hands-on

Prerequisites

  • None

Audience

  • Administrators

Official Agenda

Module 1: Data Engineering Integration (DEI) Overview

  • Data Engineering and the role of DEI in the Big Data ecosystem
  • DEI Components
  • DEI architecture
  • Roles and responsibilities of Informatica DEI Administrator
  • DEI engines: Blaze, Spark, and Databricks
  • DEI 10.5 features

Module 2: SAML Authentication

  • SAML overview
  • SAML authentication in a domain
  • Steps to enable SAML on an existing Informatica domain

Module 3: Hadoop Integration

  • Cluster Integration overview
  • Data Engineering Integration Component Architecture
  • Prerequisites for Hadoop integration
  • HDP integration tasks
  • Create a Cluster Configuration
  • Integration with Hadoop
  • Lab: Create Cluster Configuration Object
  • Lab: Explore Cluster Configuration Views
  • Lab: Cluster Configuration Privileges and Permissions

Module 4: Security Overview

  • DEI security
  • Security aspects
  • Authentication overview
  • Authorization overview

Module 5: Kerberos Authentication and Ranger Authorization

  • Kerberos Authentication
  • Ranger Authorization
  • Pre-steps to run mappings in a Kerberos-Enabled Hadoop Environment
  • Run mappings on a cluster with Kerberos authentication and Ranger authorization
  • Lab: Execute Pre-steps for Running Mappings in a Kerberos-Enabled Hadoop Environment
  • Lab: Run Mappings in a Kerberos-Enabled Hadoop Environment

Module 6: Operating System Profiles

  • Operating System profiles for Data Integration Service
  • Operating System profile components
  • Configure system permissions for the Operating System profile users
  • Enable the Data Integration Service to use Operating System profiles
  • Execute a mapping using OS profiles
  • Lab: Execute a mapping using OS profiles

Module 7: HDFS and Fine-Grained Authorization

  • Authorization
  • HDFS permissions
  • Fine-grained authorization
  • Lab: Access Directories with HDFS Permissions
  • Lab: Run a Mapping with HDFS Permissions
  • Lab: Restrict Ranger Permissions for Hive Tables and Columns
  • Lab: Run a Mapping with Fine-Grained Authorization

Module 8: Data Engineering Recovery

  • DIS processing overview
  • DIS Queuing
  • Execution Pools
  • Data Engineering recovery
  • Monitor recovered jobs
  • Lab: Recover DIS and execute a Mapping using Data Engineering Recovery

Module 9: DEI Performance Tuning

  • DEI Deployment types
  • Sizing recommendations
  • Hadoop cluster hardware tuning
  • Tune Spark performance
  • infacmd autotune command
  • Lab: Tune DIS and MRS using the infacmd autotune command

Module 10: Monitoring and Troubleshooting

  • Hadoop Environment Logs
  • Spark Engine Monitoring
  • Blaze Engine Monitoring
  • Cloud File Management Utility
  • Log Aggregation
  • Log Packer
  • File Watcher
  • Customer pain points and solutions

Module 11: Databricks Overview

  • Databricks overview
  • Steps to configure Databricks
  • Databricks clusters
  • Notebooks, Jobs, and Data
  • Delta Lake
  • Sequence generator for Databricks
  • Databricks warm pool

Module 12: Databricks Integration

  • Databricks Integration
  • Components of the Informatica and Databricks environments
  • Run-time process on the Databricks Spark Engine
  • Databricks Integration Task Flow
  • Prerequisites for Databricks integration