Spark and Hive Developer Training Course

Overview

Course Duration: 5 Days

Spark and Hive Developer Training is a one-stop course that introduces you to the domain of Spark development and gives you the technical know-how to work in it. By the end of this course you will be able to earn a Spark professional credential and to process and analyze terabyte-scale data using Spark and its ecosystem.

Course Outline

Big Data and Hadoop
  • What is Big Data 
  • What is Hadoop 
  • How Hadoop Works 
  • How Hadoop and Spark are related 
  • Hadoop Ecosystem
Just Enough Series
  • Just Enough Scala 
  • Introduction to Functional Programming 
  • Introduction to Scala 
  • Scala Syntax
        o Primitive and simple types
        o Control structures
        o Access Modifiers
        o Lazy Values 
        o Currying
  • Objects and Classes
        o Classes and Objects
        o Nulls, Nothing, and Units
        o Case Classes
        o Abstract Classes and Basic Traits
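
A minimal Scala sketch of the syntax and class topics above; all names (Employee, taxTable, applyTax) are illustrative.

    // Minimal sketch of the Scala features covered in this module.
    object JustEnoughScala {

      // Case class: an immutable data carrier with built-in pattern-matching support
      case class Employee(name: String, salary: Double)

      // Lazy value: initialised only on first access
      lazy val taxTable: Map[String, Double] = Map("default" -> 0.2)

      // Curried function: parameters supplied in separate argument lists
      def applyTax(rate: Double)(salary: Double): Double = salary * (1 - rate)

      def main(args: Array[String]): Unit = {
        val emp = Employee("Asha", 50000.0)

        // Partially apply the curried function
        val defaultTax = applyTax(taxTable("default")) _
        println(s"${emp.name} takes home ${defaultTax(emp.salary)}")

        // Pattern matching on a case class
        emp match {
          case Employee(n, s) if s > 40000 => println(s"$n is in the upper band")
          case Employee(n, _)              => println(s"$n is in the standard band")
        }
      }
    }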

RDD in Depth
  • RDDs 
  • Creating RDDs from files 
  • Creating RDDs from other RDDs 
  • RDD operations 
  • Actions 
  • Transformations 
  • Pair RDDs 
  • Joins using RDD
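
A hedged Scala sketch of the RDD workflow above; the file paths and field positions are assumptions for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddBasics {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-basics").setMaster("local[*]"))

        // Creating an RDD from a file
        val orders = sc.textFile("/data/orders.csv")

        // Transformations build new RDDs lazily from existing ones
        val orderPairs = orders
          .map(_.split(","))
          .map(fields => (fields(0), fields(2).toDouble))   // pair RDD: (customerId, amount)

        val customers = sc.textFile("/data/customers.csv")
          .map(_.split(","))
          .map(fields => (fields(0), fields(1)))             // pair RDD: (customerId, name)

        // Join two pair RDDs, then aggregate
        val revenueByName = orderPairs.join(customers)
          .map { case (_, (amount, name)) => (name, amount) }
          .reduceByKey(_ + _)

        // Actions trigger execution
        revenueByName.take(10).foreach(println)

        sc.stop()
      }
    }
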
Spark Platforms
  • Spark deployment modes: local, standalone, YARN, and Mesos
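
The same application can target any of these platforms by changing the master URL. A short sketch, with host names and ports as placeholders:

    import org.apache.spark.sql.SparkSession

    // Local mode: all executors run as threads in one JVM
    val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()

    // Alternatives (point them at your own cluster):
    // Standalone:  .master("spark://master-host:7077")
    // Mesos:       .master("mesos://mesos-host:5050")
    // YARN:        .master("yarn")   // cluster details come from the Hadoop configuration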

Spark Hands On
  • Scala Spark Shell 
  • Basic operations on RDDs 
  • Pair RDD Hands On 
  • Building Spark Applications 
  • Submitting applications to a single-node cluster 
  • RDD partitions 
  • Spark terminology: narrow and wide operations, shuffles, the DAG, stages, and tasks 
  • Job metrics 
  • Fault Tolerance 
  • Configuring memory and CPU for Spark drivers and executors in standalone and YARN mode
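
A sketch of submitting a packaged application with explicit driver and executor resources; the class name, jar path, and resource sizes are illustrative and would be tuned per cluster (replace "yarn" with a spark:// URL for standalone mode):

    # spark-submit with resource settings for the driver and executors
    spark-submit \
      --class com.example.RddBasics \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 2g \
      --executor-memory 4g \
      --executor-cores 2 \
      --num-executors 4 \
      target/scala-2.12/rdd-basics_2.12-1.0.jar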

Spark SQL & Dataframes
  • Spark SQL and the SQL Context 
  • Creating Dataframes 
  • Dataframe Queries and Transformations 
  • Saving Dataframes 
  • Dataframes and RDDs
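
A hedged sketch of these Dataframe operations, assuming the spark-shell (where spark is the predefined SparkSession) and illustrative column names and paths:

    import spark.implicits._

    // Creating a Dataframe from in-memory data
    val salesDF = Seq(
      ("north", 1200.0),
      ("south",  900.0),
      ("north",  300.0)
    ).toDF("region", "amount")

    // Dataframe query / transformation
    val byRegion = salesDF.groupBy("region").sum("amount")
    byRegion.show()

    // Saving a Dataframe
    byRegion.write.mode("overwrite").parquet("/tmp/sales_by_region")

    // Dataframes and RDDs: drop down to the underlying RDD[Row]
    val asRdd = salesDF.rdd
    println(asRdd.count())
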
Spark Dataframes Hands On 
  • Dataframes on a JSON file
  • Dataframes on hive tables 
  • Dataframes on JSON – querying operations 
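
A sketch of the hands-on exercises above; the JSON path, Hive table name, and column names are assumptions:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("dataframes-hands-on")
      .enableHiveSupport()            // required for Hive-table access
      .getOrCreate()

    // Dataframe on a JSON file (one JSON object per line)
    val people = spark.read.json("/data/people.json")
    people.printSchema()
    people.filter(people("age") > 30).select("name", "age").show()

    // Dataframe on a Hive table
    val orders = spark.table("retail.orders")
    orders.limit(5).show()

    // Querying operations via a temporary view
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name, COUNT(*) AS cnt FROM people GROUP BY name").show()
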
Spark Streaming
  • What is Spark Streaming 
  • How it works 
  • DStreams 
  • Developing Spark Streaming Applications
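
A minimal DStream sketch: word counts over a socket source in 5-second batches. The host and port are placeholders (for example, fed by nc -lk 9999 during class).

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingWordCount {
      def main(args: Array[String]): Unit = {
        // local[2]: one thread for the receiver, at least one for processing
        val conf = new SparkConf().setAppName("streaming-word-count").setMaster("local[2]")
        val ssc  = new StreamingContext(conf, Seconds(5))

        val lines  = ssc.socketTextStream("localhost", 9999)
        val counts = lines.flatMap(_.split("\\s+"))
                          .map(word => (word, 1))
                          .reduceByKey(_ + _)

        counts.print()        // output operation: triggers a job each batch

        ssc.start()
        ssc.awaitTermination()
      }
    }
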
Spark Streaming Hands On 
  • Running a Spark Streaming Application 
  • Kafka Integration for Real-Time streaming
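
A sketch of the Kafka direct stream using the spark-streaming-kafka-0-10 integration; the broker address, group id, and topic name are placeholders, and ssc is the StreamingContext from the previous sketch.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-course",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("events"), kafkaParams)
    )

    // Treat the record values as an ordinary DStream
    stream.map(record => record.value).print()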

Hive Fundamentals
  • Introduction to Hive 
  • Hadoop and its ecosystem components
  • Hive as a Data Warehouse 
  • Creating Tables for Analysis of data
  • Techniques of Loading Data into Tables. 
  • Difference between Internal and External Tables 
  • Understanding Hive Data Types 
  • Joining and union of datasets 
  • Join Optimizations 
  • Partitions and Bucketing 
  • Data Aggregation & Sampling 
  • GroupBy, Rollup, Cube, Having 
  • Performance Considerations – Explain, Analyze 
  • Data File Considerations – Avro, ORC, RCFile, Parquet formats 
  • Hive Troubleshooting 
  • Hive Best Practices
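
A hedged sketch of the table-design topics above. The HiveQL is issued through spark.sql so it can be pasted into a spark-shell started with Hive support; the same statements run unchanged in Beeline or the Hive CLI. Database, table, and path names are illustrative.

    spark.sql("CREATE DATABASE IF NOT EXISTS retail")

    // Internal (managed) table, partitioned and stored as ORC
    spark.sql("""
      CREATE TABLE IF NOT EXISTS retail.orders_internal (
        order_id INT, customer_id INT, amount DOUBLE
      )
      PARTITIONED BY (order_date STRING)
      STORED AS ORC
    """)

    // External table over raw delimited files; dropping it leaves the data in place
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS retail.orders_raw (
        order_id INT, customer_id INT, amount DOUBLE, order_date STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/data/raw/orders'
    """)

    // Load one partition from the raw table into the ORC table
    // (dynamic partitioning also works with nonstrict partition mode enabled)
    spark.sql("""
      INSERT OVERWRITE TABLE retail.orders_internal PARTITION (order_date = '2024-01-01')
      SELECT order_id, customer_id, amount
      FROM retail.orders_raw
      WHERE order_date = '2024-01-01'
    """)
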
Hive Hands on
  • Writing HiveQL queries for data retrieval 
  • Creating Tables for Analysis of data 
  • Techniques of Loading Data into Tables 
  • Difference between Internal and External Tables 
  • Understanding Hive Data Types 
  • Joining datasets – Inner Join, Outer Join, Cross Join 
  • Writing Union Queries 
  • Join Optimizations 
  • Creating partitions and querying data
  • Creating Buckets
  • Data Aggregation & Sampling – GroupBy, Rollup, Cube, Having
  • Performance Considerations – Explain, Analyze 
  • Data File Considerations – Avro, ORC, RC, Parquet format
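
A sketch of the query exercises above, again issued through spark.sql (the same HiveQL works in Beeline). It reuses the illustrative retail tables from the previous sketch and assumes a retail.customers table with customer_id and name columns.

    // Join
    spark.sql("""
      SELECT c.name, o.amount
      FROM retail.orders_internal o
      JOIN retail.customers c ON o.customer_id = c.customer_id
    """).show()

    // Union
    spark.sql("""
      SELECT order_id FROM retail.orders_internal WHERE order_date = '2024-01-01'
      UNION ALL
      SELECT order_id FROM retail.orders_raw WHERE order_date = '2024-01-02'
    """).show()

    // Aggregation with rollup and a HAVING filter
    spark.sql("""
      SELECT customer_id, order_date, SUM(amount) AS total
      FROM retail.orders_internal
      GROUP BY customer_id, order_date WITH ROLLUP
      HAVING SUM(amount) > 0
    """).show()

    // Performance considerations: inspect the plan and gather table statistics
    spark.sql("EXPLAIN SELECT customer_id, SUM(amount) FROM retail.orders_internal GROUP BY customer_id").show(truncate = false)
    spark.sql("ANALYZE TABLE retail.orders_internal COMPUTE STATISTICS")
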
Course ID: SPARK-HIVE-DEV

