BIG-DATA-HADOOP - Big Data Hadoop

Overview

Duration: 9 days
The Big Data Hadoop training is designed to give you in-depth knowledge of the Big Data framework using Hadoop and Spark. In this hands-on course, you will work through real-life, industry-based projects in an integrated lab environment.

Objectives

The Big Data Hadoop training is designed by industry experts to make you a Big Data practitioner. The course offers:
  • In-depth knowledge of Big Data and Hadoop, including HDFS (Hadoop Distributed File System), YARN (Yet Another Resource Negotiator), MapReduce, and Spark
  • Comprehensive knowledge of tools in the Hadoop ecosystem such as Kafka, NiFi, Hive, Sqoop, and HBase
  • The ability to ingest data into HDFS using Sqoop and Flume, and to analyze the large datasets stored in HDFS
  • Exposure to real-world, industry-based projects executed in the lab
  • Projects covering diverse datasets from multiple domains such as banking, telecommunications, social media, insurance, and e-commerce
  • Close involvement of a Hadoop expert throughout the training to help you learn industry standards and best practices

Content

1 SCALA (Object Oriented and Functional Programming)

1.1 Getting started with Scala.
1.2 Scala background, Scala vs. Java, and basics.
1.3 Interactive Scala – REPL, data types, variables, expressions, simple functions.
1.4 Running the program with Scala Compiler.
1.5 Explore the type lattice and use type inference
1.6 Define Methods and Pattern Matching.

2 Scala Environment Setup.
2.1 Scala setup on Windows.
2.2 Scala setup on UNIX.

3 Functional Programming.
3.1 What is Functional Programming?
3.2 Differences between OOP and FP.

4 Collections (Very Important for Spark)
4.1 Iterating, mapping, filtering and counting
4.2 Regular expressions and matching with them.
4.3 Maps, Sets, groupBy, Options, flatten, flatMap
4.4 Word count, I/O operations, file access, flatMap

5 Object Oriented Programming.
5.1 Classes and Properties.
5.2 Objects, Packaging and Imports.
5.3 Traits.
5.4 Objects, classes, inheritance, Lists with multiple related types, apply

6 Integrations
6.1 What is SBT?
6.2 Integration of Scala in Eclipse IDE.
6.3 Integration of SBT with Eclipse.

7 SPARK CORE
7.1 Batch versus real-time data processing
7.2 Introduction to Spark, Spark versus Hadoop
7.3 Architecture of Spark.
7.4 Coding Spark jobs in Scala
7.5 Exploring the Spark shell -> Creating Spark Context.
7.6 RDD Programming
7.7 Operations on RDD.
7.8 Transformations
7.9 Actions
7.10 Loading Data and Saving Data.
7.11 Key Value Pair RDD.
7.12 Broadcast variables.

8 Persistence.
8.1 Configuring and running the Spark cluster.
8.2 Exploring a multi-node Spark cluster.
8.3 Cluster management
8.4 Submitting Spark jobs and running in the cluster mode.
8.5 Developing Spark applications in Eclipse
8.6 Tuning and Debugging Spark.

9 HBase
9.1 What is HBase?
9.2 HBase architecture
9.3 Installing HBase.
9.4 HBase storage mechanism.
9.5 Creating a database.
9.6 Create a table
9.7 Inserting Data
9.8 Modeling Data.
9.9 Performing basic DML operations.
9.10 Different types of access patterns

10 SPARK INTEGRATION WITH NOSQL HIVE and HBASE
10.1 Introduction to the Spark Hive and Spark HBase connectors.
10.2 Spark with Hive and HBase -> Setup.
10.3 Creating a Spark Context to connect to Hive and HBase.
10.4 Creating Spark RDDs on Hive and HBase.
10.5 Performing Transformations and Actions on the Hive and HBase RDDs.
10.6 Running a Spark application in Eclipse to access the data in Hive and HBase.

11 SPARK STREAMING
11.1 Introduction to Spark Streaming.
11.2 Architecture of Spark Streaming
11.3 Processing Distributed Log Files in Real Time
11.4 Discretized streams (DStreams) and RDDs.
11.5 Applying Transformations and Actions on Streaming Data
11.6 Integration with Flume and Kafka.
11.7 Integration with Hive and HBase
11.9 Monitoring streaming jobs.

12 SPARK SQL

12.1 Introduction to Apache Spark SQL
12.2 The SQL context
12.3 Importing and saving data
12.4 Processing the Text files, JSON and Parquet Files
12.5 Data Frames
12.6 User-defined functions
12.7 Using Hive
12.8 Local Hive Metastore server

NIFI
  • Introduction
  • Introduction to Cloudera Flow Management
  • Processors
  • Connections
  • Dataflows
  • Process Groups
  • FlowFile Provenance
  • Dataflow Templates
  • Apache NiFi Registry
  • FlowFile Attributes
  • NiFi Expression Language
  • DataFlow Optimization
  • NiFi Architecture
  • Site-to-Site Dataflows
  • Cloudera Edge Management and MiNiFi
  • Monitoring and Reporting
  • Controller Service
  • Integrating NiFi with the Cloudera Ecosystem
  • NiFi Security
  • Conclusion

Kafka
  • Features
  • Terminology
  • Pros and Cons
  • Applications
  • Architecture
  • Workflow
  • Cluster
  • Producer
  • Consumer
  • Offset Management
  • Broker
  • Queuing
  • Client
  • Connect
  • Topic
  • Monitoring Tools
  • Operations
  • Role of Zookeeper
  • Kafka Streams
  • Kafka Spark Streaming
  • Kafka Performance Tuning
  • Kafka Load Testing
  • Serialization and Deserialization
  • Kafka Schema Registry
  • Security
  • Kafka Vs Other Messaging Queues

Audience

Anyone with some development experience in Java or another object-oriented language and a fair understanding of the Linux operating system.

Prerequisites

-

Certification

Trainocate Certificate of Attendance
