Building Data Lakes on AWS
Module 1: Introduction to data lakes
- Describe the value of data lakes
- Compare data lakes and data warehouses
- Describe the components of a data lake
- Recognize common architectures built on data lakes
Module 2: Data ingestion, cataloging, and preparation
- Describe the relationship between data lake storage and data ingestion
- Describe AWS Glue crawlers and how they are used to create a data catalog
- Identify data formatting, partitioning, and compression for efficient storage and query
- Lab 1: Set up a simple data lake
Module 3: Data processing and analytics
- Recognize how data processing applies to a data lake
- Use AWS Glue to process data within a data lake
- Describe how to use Amazon Athena to analyze data in a data lake
Module 4: Building a data lake with AWS Lake Formation
- Describe the features and benefits of AWS Lake Formation
- Use AWS Lake Formation to create a data lake
- Understand the AWS Lake Formation security model
- Lab 2: Build a data lake using AWS Lake Formation
Module 5: Additional Lake Formation configurations
- Automate AWS Lake Formation using blueprints and workflows
- Apply security and access controls to AWS Lake Formation
- Match records with AWS Lake Formation FindMatches
- Visualize data with Amazon QuickSight
- Lab 3: Automate data lake creation using AWS Lake Formation blueprints
- Lab 4: Data visualization using Amazon QuickSight
Module 6: Architecture and course review
- Post-course knowledge check
- Architecture review
- Course review
Building Batch Data Analytics Solutions on AWS
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Introduction to Amazon EMR
- Using Amazon EMR in analytics solutions
- Amazon EMR cluster architecture
- Interactive Demo 1: Launching an Amazon EMR cluster
- Cost management strategies
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
- Storage optimization with Amazon EMR
- Data ingestion techniques
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
- Apache Spark on Amazon EMR use cases
- Why Apache Spark on Amazon EMR
- Spark concepts
- Interactive Demo 2: Connect to an EMR cluster and run Scala commands using the Spark shell
- Transformation, processing, and analytics
- Using notebooks with Amazon EMR
- Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
- Using Amazon EMR with Hive to process batch data
- Transformation, processing, and analytics
- Practice Lab 2: Batch data processing using Amazon EMR with Hive
- Introduction to Apache HBase on Amazon EMR
Module 5: Serverless Data Processing
- Serverless data processing, transformation, and analytics
- Using AWS Glue with Amazon EMR workloads
- Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions
Module 6: Security and Monitoring of Amazon EMR Clusters
- Securing EMR clusters
- Interactive Demo 3: Client-side encryption with EMRFS
- Monitoring and troubleshooting Amazon EMR clusters
- Demo: Reviewing Apache Spark cluster history
Module 7: Designing Batch Data Analytics Solutions
- Batch data analytics use cases
Building Streaming Data Analytics Solutions on AWS
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
Building Data Analytics Solutions Using Amazon Redshift
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Using Amazon Redshift in the Data Analytics Pipeline
- Why Amazon Redshift for data warehousing?
- Overview of Amazon Redshift
Module 2: Introduction to Amazon Redshift
- Amazon Redshift architecture
- Interactive Demo 1: Touring the Amazon Redshift console
- Amazon Redshift features
- Practice Lab 1: Load and query data in an Amazon Redshift cluster
Module 3: Ingestion and Storage
- Ingestion
- Interactive Demo 2: Connecting to your Amazon Redshift cluster from a Jupyter notebook using the Data API
- Data distribution and storage
- Interactive Demo 3: Analyzing semi-structured data using the SUPER data type
- Querying data in Amazon Redshift
- Practice Lab 2: Data analytics using Amazon Redshift Spectrum
Module 4: Processing and Optimizing Data
- Data transformation
- Advanced querying
- Practice Lab 3: Data transformation and querying in Amazon Redshift
- Resource management
- Interactive Demo 4: Applying mixed workload management on Amazon Redshift
- Automation and optimization
- Interactive Demo 5: Resizing an Amazon Redshift cluster from dc2.large to ra3.xlplus
Module 5: Security and Monitoring of Amazon Redshift Clusters
- Securing the Amazon Redshift cluster
- Monitoring and troubleshooting Amazon Redshift clusters
Module 6: Designing Data Warehouse Analytics Solutions
- Data warehouse use case review
- Activity: Designing a data warehouse analytics workflow
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
Exam Readiness Workshop: AWS Certified Data Analytics Specialty
- Testing center information and expectations
- Exam overview and structure
- Content domains and question breakdown
- Topics and concepts within content domains
- Question structure and interpretation techniques