Banking & Finance Industry Customer
The bank wanted to develop a real-time trigger system to flag suspicious wire transactions based on customer and account amounts, payment velocity, preferred channel, payment designation, transaction type, and other attributes. Unusual transactions were quickly identified and blocked until reviewed by fraud investigators.
The Informatica BDM platform was used to build pipelines for both batch and real-time data sets, which were fed into the fraud detection data model. The batch data, consisting of customer and product information, was profiled, cleansed, and standardized using Informatica Data Quality. The real-time data was streamed using IIS and joined with the batch data sets.
This gave the fraud detection algorithms a clean, consistent view of the data and reduced false positives, resulting in a better customer experience.
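A trigger system of the kind described above can be sketched as a set of simple rules over an account's transaction history. The thresholds, field names, and rules below are hypothetical illustrations, not the bank's actual fraud model:

```python
from datetime import timedelta

# Hypothetical thresholds -- illustrative only, not the bank's actual rules.
MAX_AMOUNT = 10_000            # flag single wires above this amount
MAX_TXNS_PER_HOUR = 5          # flag bursts of transfers (payment velocity)

def flag_transaction(txn, history):
    """Return a list of reasons this wire looks suspicious; empty if clean.

    txn     -- dict with 'amount' (number) and 'timestamp' (datetime)
    history -- list of prior txn dicts for the same account
    """
    reasons = []
    if txn["amount"] > MAX_AMOUNT:
        reasons.append("amount exceeds threshold")
    # Payment velocity: count transfers in the trailing one-hour window.
    window_start = txn["timestamp"] - timedelta(hours=1)
    recent = [t for t in history if t["timestamp"] >= window_start]
    if len(recent) + 1 > MAX_TXNS_PER_HOUR:
        reasons.append("payment velocity exceeds threshold")
    return reasons
```

In production these rules would run against the streamed IIS data joined with the cleansed batch profile, so a flagged wire can be held for a fraud investigator rather than auto-declined.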
Technical Stack: Informatica BDM/DQ, Informatica Intelligent Streaming (IIS), Cloudera Hadoop Cluster
Actions:
- Gather requirements, understand the technical landscape and make appropriate recommendations
- Design and implement a data ingestion and processing architecture for batch and real time data
- Implement data quality rules for data standardization
- Help in fine tuning of the target data model for machine learning and Big Data Analytics
- Tableau for Dashboard Development and Visualizations
Oil & Gas Industry Customer
This global energy company leveraged Informatica Big Data Management and Intelligent Streaming to monitor real-time data coming in from IoT oil-well sensors.
The goal was to monitor parameters such as flow rate, temperature, and pressure, and to take corrective action when they deviated from pre-determined thresholds.
Informatica BDM helped them build a fault-tolerant framework, enabling them to take action to mitigate risks and save money as operating conditions change.
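The threshold-monitoring logic described above can be sketched as follows. The parameter names, units, and limits are illustrative assumptions, not the operator's real limits:

```python
# Hypothetical per-parameter (min, max) thresholds -- illustrative only.
THRESHOLDS = {
    "flow_rate":   (50.0, 200.0),   # barrels/hour
    "temperature": (10.0, 95.0),    # degrees C
    "pressure":    (100.0, 500.0),  # psi
}

def check_reading(reading):
    """Return the parameters in a sensor reading that breach their thresholds.

    reading -- dict mapping parameter name to measured value; missing
    parameters are skipped rather than treated as breaches.
    """
    alerts = {}
    for param, (low, high) in THRESHOLDS.items():
        value = reading.get(param)
        if value is not None and not (low <= value <= high):
            alerts[param] = value
    return alerts
```

In the streaming pipeline, each incoming sensor event would pass through a check like this, with any non-empty alert set routed to the corrective-action workflow.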
Actions:
- Gather requirements, understand the technical landscape and make appropriate recommendations
- Design and implement a data ingestion and processing architecture for batch and real time data
- Help in fine tuning of the target data model for machine learning and Big Data Analytics
- Tableau for Dashboard Development and Visualizations
Insurance Industry Customer
Based on our understanding of Montana State Fund's needs and our prior experience implementing Big Data solutions for insurance customers, ExistBI is recommending an approach designed for early delivery of key functionality and rapid time-to-value. The initial phase of the project will deliver the core components of the Big Data Pilot project. We will accomplish this by bringing in the technical leaders from our ExistBI team to enable the core capabilities of the solution. We will collaborate early with the business stakeholders to broaden their perspective on what is possible, which will increase the impact and effectiveness of the new Big Data platform for visual data discovery.
The ExistBI Solution
Discovery and Design: The initial phase is Discovery. During this time, the team will review the business requirements and gather details specific to the build of the data ingestion, profiling, and transformation strategy.
Key activities include:
- Detailing requirements within the first week and authoring all design and technical specifications of the solution
- Evaluating the integration and data cleansing of the in-scope existing data sources into the new data lake
- Creating a project plan to deliver the pilot and defining success criteria
Build: This phase will include the build of the data lake solution defined within the technical design documents that meet the business requirements. Key activities include:
- Building out the Informatica BDM jobs that will ingest the data sources and load them into the data lake
- Data Profiling and creation of standardization rules
- ETL Unit Testing
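The profiling and standardization rules mentioned above would be built in Informatica DQ; as a minimal sketch of what such rules do, here is a hypothetical Python equivalent (the field names and rules are illustrative assumptions):

```python
import re

def standardize_record(rec):
    """Apply simple standardization rules to a customer record.

    Rules shown (all hypothetical): collapse whitespace and title-case names,
    uppercase state codes, and strip non-digits from phone numbers.
    """
    out = dict(rec)
    out["name"] = " ".join(rec.get("name", "").split()).title()
    out["state"] = rec.get("state", "").strip().upper()
    out["phone"] = re.sub(r"\D", "", rec.get("phone", ""))
    return out
```

Profiling the raw data first (value distributions, null counts, pattern frequencies) is what tells you which rules like these are actually needed for each source.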
UAT: UAT is an important part of the solution to validate that the build adheres to the business requirements. Here we flush out any bugs and issues with the development and receive sign-off. Testing scripts and results are provided to the Montana State Fund team for verification of both throughput and accuracy. During this phase, sign-off is requested for all BDM loads. Key activities include:
- ExistBI to support UAT with business users for sign off on the ExistBI solution
- Support to fix any defects arising in UAT
- Business User Sign Off
ExistBI will bring our experienced team of subject matter experts to work with Montana State Fund to verify the functional and non-functional requirements and overall fit. The key deliverables from this work stream are:
– Obtain and Confirm business user requirements from Montana State Fund
– Define the data ingestion and transformation process
– Help with the creation of Source to Target Mappings (STTM) and integrate the data sources into the data lake
– Create Informatica BDM mappings and DQ routines to load and cleanse source data into the data lake
– Facilitate UAT with business users
At the conclusion of this pilot, Montana State Fund will have a functional Informatica BDM platform for data ingestion, cleansing, and orchestration in place, which will serve the data needs across the entire spectrum of subject areas defined by Montana State Fund business stakeholders.
For data ingestion, transformations, and integration, ExistBI will utilize the Informatica BDM platform.
Technology Components
The proposed solution will use the technology components identified during initial requirement gathering.
Technology | Usage
Cloudera Cluster | Big Data Lake – Storage and Transformation
Informatica Big Data Management | Data Integration – ETL Development; Data Quality – Profiling and Standardization
Solution Architecture
- Raw Dataset
The source data is ingested as-is into the data lake in Raw dataset zone using BDM mappings.
- Transformed Dataset
The Raw Dataset is cleansed, standardized, transformed, and loaded into the Transformed Dataset zone.
- Curated Dataset
The transformed dataset is joined with other data sources using appropriate keys and lookup tables. This data is now curated and ready for consumption by visualization tools.
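The three-zone flow above can be sketched end to end. The zone names follow the architecture described, but the transformations, field names, and join key below are illustrative assumptions rather than the actual BDM mappings:

```python
def ingest_raw(source_rows):
    """Raw zone: land source data as-is, with no changes."""
    return list(source_rows)

def transform(raw_rows):
    """Transformed zone: cleanse and standardize.

    As a stand-in for the real rules, this just trims and uppercases a
    hypothetical customer key.
    """
    return [{**r, "customer_id": r["customer_id"].strip().upper()}
            for r in raw_rows]

def curate(transformed_rows, lookup):
    """Curated zone: enrich by joining to a lookup table on the customer key."""
    return [{**r, **lookup.get(r["customer_id"], {})}
            for r in transformed_rows]
```

Keeping the raw zone untouched is the design choice that matters here: cleansing and join logic can be revised and re-run over the original landed data without re-extracting from the sources.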
This architectural solution will bring Montana State Fund the following benefits:
- Bring disparate data sources together
- More meaningful cleansed data
- Preserve the business subject areas
- Standardized attributes
- Ability to slice and dice, drill-down and drill-up
- Easy visualization and usage
Follow-Up Project
Montana State Fund has installed Informatica's Enterprise Data Catalog (EDC), and the next step is to implement it. As part of the implementation, the following activities were in scope:
A. Review and discuss essential use cases and best practices for EDC including:
1. Onboarding of assets – data source scanning (the number of data sources to be scanned will not exceed 8)
2. EDC system attribute curation
3. Domain discovery
4. Domain discovery curation
5. Discuss and demonstrate concept of resource
6. Discuss and Demonstrate concept of lineage
7. Metadata enrichment
8. Discuss EDC metadata validation process
9. Demonstrate Tableau-EDC & EDC-Tableau integration
10. Discuss & explore notification options and how to leverage them
11. Demonstrate EDC data provisioning feature
B. Conduct enablement sessions for EDC Administrator, EDC technical resources
C. Conduct enablement sessions for business end-users such as data stewards, analysts and consumers.
D. Discuss ongoing rollout plan and process beyond the initial pilot.
E. Discuss operational recommendations to schedule resources and to maintain the catalog on an ongoing basis.
Documentation to be provided to MSF:
A. EDC Best Practices
B. EDC Metadata Resources onboarding document
C. EDC rollout plan
D. EDC role-based security model
E. EDC enablement document
Duration of Implementation:
All work was performed remotely during US business hours (not at MSF's Helena, Montana office, as the office building was closed), since interfacing with and demonstrating to MSF business users and technical staff was a significant aspect of the engagement.