InfoSphere BigMatch v11.4 for Apache Hadoop (ZZ850G)

Overview

Duration: 2 days

The IBM InfoSphere Big Match on Hadoop course introduces students to the Probabilistic Matching Engine (PME) and shows how it can be used to resolve and discover entities across multiple data sets in Hadoop.
Students learn the basics of a PME algorithm, including data model configuration, standardization, comparison and bucketing functions, weight generation, and thresholds.
During the exercises, students work through a large use case in which they apply their knowledge of Big Match to discover relationships between two data sets, which can then be used to build a full view of the member data.
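
As a rough illustration of the steps named above (standardization, bucketing, comparison, and a threshold), the following minimal Python sketch links records from a small sample. It is not the PME implementation or the Big Match API: the function names, the bucketing rules, the weights, and the threshold are all hypothetical and chosen only to show how the steps fit together.

```python
import re
from collections import defaultdict
from itertools import combinations

def standardize(record):
    """Normalize raw values: lower-case the name, drop punctuation, keep phone digits only."""
    name = re.sub(r"[^a-z ]", "", record["name"].lower()).strip()
    phone = re.sub(r"\D", "", record["phone"])
    return {"id": record["id"], "name": name, "phone": phone}

def bucket_keys(rec):
    """Hypothetical bucketing rules: first 3 letters of the last name token, last 4 phone digits."""
    keys = set()
    tokens = rec["name"].split()
    if tokens:
        keys.add("N:" + tokens[-1][:3])
    if len(rec["phone"]) >= 4:
        keys.add("P:" + rec["phone"][-4:])
    return keys

def compare(a, b, weights):
    """Sum an agreement or disagreement weight per attribute (weights are assumed, not generated)."""
    score = weights["name_agree"] if a["name"] == b["name"] else weights["name_disagree"]
    score += weights["phone_agree"] if a["phone"] == b["phone"] else weights["phone_disagree"]
    return score

def match(records, weights, threshold):
    """Compare only records that share a bucket; link pairs scoring at or above the threshold."""
    buckets = defaultdict(list)
    for rec in (standardize(r) for r in records):
        for key in bucket_keys(rec):
            buckets[key].append(rec)
    linked = set()
    for members in buckets.values():
        for a, b in combinations(members, 2):
            if compare(a, b, weights) >= threshold:
                linked.add(tuple(sorted((a["id"], b["id"]))))
    return linked

# Sample data and weights are invented for illustration.
records = [
    {"id": 1, "name": "John Smith", "phone": "555-123-4567"},
    {"id": 2, "name": "john smith.", "phone": "(555) 123 4567"},
    {"id": 3, "name": "Jane Smith", "phone": "555-999-0000"},
]
weights = {"name_agree": 4.0, "name_disagree": -2.0,
           "phone_agree": 5.0, "phone_disagree": -3.0}
links = match(records, weights, threshold=6.0)
```

With this sample, all three records share a name bucket, but only members 1 and 2 agree on both attributes after standardization, so they form the single linked pair.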

Objectives

  • Understand the capabilities of the Probabilistic Matching Engine
  • Understand how the Probabilistic Matching Engine is used with BigInsights to solve certain use cases
  • Understand the technical framework of the Big Match solution and how member data is derived, bucketed and compared to produce a complete entity from multiple data sets.  
  • Create a project and data model using the Big Match Console  
  • Configure the HBase tables that will be used in a Big Match solution  
  • Configure an algorithm using the Big Match Console that includes standardization, comparison, and bucketing functions
  • Set up strings for anonymous values, equivalency values, frequency values, and character maps using the Big Match Console
  • Set up and run the Weight Generation process   
  • Evaluate and set thresholds for the algorithm   
  • Deploy a new algorithm to Big Match   
  • Evaluate entity results and reconfigure the algorithm based on the evaluation (e.g. large buckets, large entities, members not belonging to any bucket)
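
The kind of evaluation mentioned in the last objective can be pictured with a small Python sketch that flags oversized buckets and members that landed in no bucket. This is an illustration only, not a Big Match tool: the `bucket_report` function, the bucket key format, and the size limit are hypothetical.

```python
from collections import Counter

def bucket_report(member_buckets, max_bucket_size):
    """Return (oversized buckets with their sizes, sorted ids of members in no bucket).

    member_buckets maps a member id to the set of bucket keys it was assigned.
    """
    sizes = Counter()
    for keys in member_buckets.values():
        sizes.update(keys)  # each key counts once per member
    large = {key: n for key, n in sizes.items() if n > max_bucket_size}
    unbucketed = sorted(m for m, keys in member_buckets.items() if not keys)
    return large, unbucketed

# Invented sample assignment: three members share one name bucket, one has none.
member_buckets = {
    "m1": {"N:smi"},
    "m2": {"N:smi"},
    "m3": {"N:smi", "P:0000"},
    "m4": set(),
}
large, unbucketed = bucket_report(member_buckets, max_bucket_size=2)
```

Here `large` reports the shared name bucket with three members and `unbucketed` lists `m4`, the two conditions that would typically prompt a change to the bucketing configuration.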

Course Outline

Unit : 1 Introduction to Big Match for Apache Hadoop

  • What is Big Match
  • How Big Match Works
  • Big Match Components
  • Big Match Architecture

 

Unit : 2 Big Match Data Model Definition

  • Members
  • Attribute Types
  • Member Attributes
  • Sources
  • Information Sources

 

Unit : 3 PME Algorithm

  • Standardization
  • Bucketing
  • Comparison Functions

 

Unit : 4 Bucket Analysis

  • Bucket Optimization
  • Bucket Concerns

 

Unit : 5 Weights

  • String Weights
  • Numeric Weights
  • Multi-dimensional Weights
  • Troubleshooting Weights

 

Unit : 6 HBase Tables

  • HBase concepts
  • Big Match commands
  • Big Match Tables (.pmebktidx, .pmemdmidx, .pmeentidx)
  • Best Practices

 

Unit : 7 Big Match Applications

  • PME Derive
  • PME Compare
  • PME Link
  • PME Analysis

This course has no prerequisites.

Course ID: ZZ850G

