Spark and Hadoop Training Course

Overview

Course Duration: 2 Days

Spark Fundamentals introduces you to Spark development and gives you the technical know-how to work with it. By the end of this course you will be able to earn a Spark professional credential, and you will be capable of handling terabyte-scale data and analyzing it with Spark and ecosystem components such as Spark SQL.

Objectives

Big Data and Hadoop

  • What is Big Data
  • What is Hadoop 
  • How Hadoop Works 
  • How Hadoop and Spark are related 
  • Hadoop Ecosystem

Spark Architecture and Components

  • Spark Architecture
  • Spark Components

RDD in Depth

  • RDDs 
  • Creating RDDs from files 
  • Creating RDDs from other RDDs
  • RDD operations 
  • Actions 
  • Transformations 
  • Pair RDDs 
  • Joins using RDD 
  • Map and Filter Transformations
  • FlatMap Transformation
  • Caching and Persistence

Spark Platforms

  • Spark local mode, YARN, Mesos and Standalone

Spark Hands On

  • Just Enough Python for Spark 
  • Basic operations on RDDs 
  • Pair RDD Hands On 
  • Building Spark Applications 
  • Submitting the Application to a single-node cluster
  • Monitoring Spark Applications

Spark SQL & Dataframes

  • Spark SQL and the SQL Context 
  • Creating Dataframes 
  • Dataframe Queries and Transformations
  • Temp Tables/Views 
  • Easy Querying 
  • Saving Dataframes
  • Dataframes and RDDs
  • Dataframe internals that make it fast
  • Catalyst Optimizer and Tungsten
  • Loading data into Spark from external data sources such as databases
  • Saving Dataframes to external sources such as HDFS and RDBMS
  • SQL features of Dataframes
  • Accessing Hive tables from Spark
  • Data formats: text formats such as CSV, JSON and XML; binary formats such as Parquet and ORC
  • UDFs in Spark Dataframes
  • Exposing Spark SQL as a JDBC service: benefits and limitations
  • Hive Context vs Spark SQL Context

Spark Dataframes Hands On

  • Dataframes on a JSON file 
  • Dataframes on Hive tables
  • Querying operations on Dataframes
Course ID:
SPARK-HADOOP

