SPARK-HIVE-DEV - Spark and Hive Developer Training Course

Overview

Duration: 5 days

This one-stop course introduces you to Spark development and gives you the technical know-how to work with it. By the end of the course you will be able to earn a Spark professional credential and will be capable of processing and analyzing terabyte-scale data using Spark and its ecosystem.


Objectives

Please refer to course overview

Content

Big Data and Hadoop 

  • What is Big Data 
  • What is Hadoop 
  • How Hadoop Works 
  • How Hadoop and Spark are related 
  • Hadoop Ecosystem

Just Enough Series 
1. Just Enough Scala 

  • Introduction to Functional Programming 
  • Introduction to Scala 
  • Scala Syntax
  • Primitive and simple types
  • Control structures
  • Better Access Modifiers
  • Lazy Values 
  • Currying

2. Objects and Classes

  • Classes and Objects
  • Nulls, Nothing, and Units
  • Case Classes
  • Abstract Classes and Basic Traits

RDD in Depth 

  • RDDs 
  • Creating RDDs from files 
  • Creating RDDs from other RDDs 
  • RDD operations 
  • Actions 
  • Transformations 
  • Pair RDDs 
  • Joins using RDD

Spark platforms 

  • Spark local mode, YARN, Mesos and Standalone

Spark Hands On 

  • Scala Spark Shell 
  • Basic operations on RDDs 
  • Pair RDD Hands On 
  • Building Spark Applications 
  • Submitting the Application over single node cluster 
  • RDD partitions 
  • Spark terminology: narrow and wide operations, shuffle, DAG, stages, and tasks 
  • Job metrics 
  • Fault Tolerance 
  • Configuring memory and CPU for Spark drivers and executors in standalone and YARN mode

Spark SQL & Dataframes 

  • Spark SQL and the SQL Context 
  • Creating Dataframes 
  • Dataframe Queries and Transformations 
  • Saving Dataframes 
  • Dataframes and RDDs

Spark Dataframes Hands On 

  • Dataframes on a JSON file
  • Dataframes on hive tables 
  • Dataframes on JSON – Querying operations 

Spark Streaming 

  • What is Spark Streaming 
  • How it works 
  • DStreams 
  • Developing Spark Streaming Applications

Spark Streaming Hands On 

  • Running a Spark Streaming Application 
  • Kafka Integration for Real-Time streaming

Hive Fundamentals 

  • Understanding of Hive 
  • Basic Intro
  • What is Hadoop and its Eco-System Components
  • Hive as a Data Warehouse 
  • Creating Tables for Analysis of data
  • Techniques of Loading Data into Tables. 
  • Difference between Internal and External Tables 
  • Understanding Hive Data Types 
  • Joining and Union of datasets 
  • Join Optimizations 
  • Partitions and Bucketing 
  • Data Aggregation & Sampling 
  • GroupBy, Rollup, Cube, Having 
  • Performance Considerations – Explain, Analyze 
  • Data File Considerations – Avro, ORC, RC, Parquet format 
  • Hive Troubleshooting 
  • Hive Best Practices

Hive Hands on 

  • Writing HiveQL queries for data retrieval 
  • Creating Tables for Analysis of data. 
  • Techniques of Loading Data into Tables. 
  • Difference between Internal and External Tables 
  • Understanding Hive Data Types 
  • Joining datasets - Inner Join, Outer Join, Cross Join 
  • Writing Union Queries 
  • Join Optimizations 
  • Creating partitions and querying data
  • Creating Buckets
  • Data Aggregation & Sampling – GroupBy, Rollup, Cube, Having
  • Performance Considerations – Explain, Analyze 
  • Data File Considerations – Avro, ORC, RC, Parquet format

Audience

For: Professionals with basic knowledge of software development, programming languages, and databases will find this course most helpful. That basic knowledge should be enough to succeed in this course.

Not For: Absolute beginners in software development will find it difficult to follow the course.

Prerequisites

N/A

Certification

Trainocate Certificate of Attendance
