Module 1: Set up and configure an Azure Databricks environment
Build a solid foundation in Azure Databricks by understanding its architecture, integrations, compute options, and data organization capabilities. Learn how Azure Databricks provides a unified platform for data engineering, analytics, and AI workloads in the cloud.
Module 2: Explore Azure Databricks
Azure Databricks is a cloud service that provides a scalable platform for data analytics using Apache Spark.
Module 3: Understand Azure Databricks architecture
Azure Databricks architecture separates control and compute planes while organizing resources through a hierarchical structure. This module explores how the account hierarchy works, the differences between serverless and classic compute planes, and the various storage options available including default storage, external storage, and Unity Catalog managed storage for organizing and governing your data.
Module 4: Understand Azure Databricks integrations
Azure Databricks integrates with multiple Microsoft services to provide end-to-end data engineering, analytics, and AI capabilities. This module explores how Azure Databricks works with Microsoft Fabric, Power BI, Visual Studio Code, Power Platform, Copilot Studio, Microsoft Purview, and Microsoft Foundry to enable comprehensive solutions that combine data lakehouse capabilities with business intelligence, application development, and conversational AI.
Module 5: Select and configure compute in Azure Databricks
Azure Databricks provides multiple compute options optimized for different workloads. This module explores how to choose the right compute type, configure performance settings, manage access permissions, and install libraries. You'll learn when to use serverless versus classic compute, how to optimize clusters for cost and performance, and best practices for securing compute resources.
Module 6: Create and organize objects in Unity Catalog
Unity Catalog's three-level namespace (catalogs, schemas, and objects) provides a flexible foundation for organizing data assets while maintaining centralized governance. This module explores how to create catalogs for environment isolation, organize schemas within those catalogs, and create tables, views, and volumes for structured and unstructured data. You'll learn to implement foreign catalogs for external database access, apply effective naming conventions, and configure AI/BI Genie instructions to enhance data discoverability.
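As a rough illustration of the catalog, schema, and object levels, the plain-Python sketch below splits a fully qualified name into its three parts and swaps the catalog to point at another environment. The names `dev`, `prod`, `sales`, and `orders` and the helper `parse_object_name` are hypothetical, and the sketch assumes simple identifiers with no backtick-quoted dots.

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    """A fully qualified name of the form catalog.schema.object."""
    catalog: str
    schema: str
    obj: str

    def qualified(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.obj}"


def parse_object_name(name: str) -> ObjectName:
    """Split 'catalog.schema.object' into its three levels."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.object, got {name!r}")
    return ObjectName(*parts)


# Hypothetical layout: one catalog per environment, same schema and table.
dev_orders = parse_object_name("dev.sales.orders")
prod_orders = dev_orders._replace(catalog="prod")
```

Keeping the schema and object names identical across environment catalogs means that only the first level changes when code moves from development to production.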
Module 7: Secure Unity Catalog objects
Unity Catalog provides centralized governance and security for data assets in Azure Databricks. This module explores how to secure Unity Catalog objects through access control strategies, fine-grained permissions, credential management, and authentication mechanisms. You'll learn how to implement table and schema-level security, enforce row and column filtering, securely access secrets from Azure Key Vault, and authenticate data access using service principals and managed identities.
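The idea behind row and column filtering can be sketched in plain Python. This is only a conceptual model, not Unity Catalog syntax: the group names, the `region` and `email` columns, and the function `secure_view` are all assumptions made for illustration.

```python
def secure_view(rows, user_groups):
    """Return the rows a caller may see, masking email for non-admins."""
    is_admin = "admins" in user_groups
    # Row filter: non-admins see only rows for regions they belong to.
    visible = [r for r in rows if is_admin or r["region"] in user_groups]
    if is_admin:
        return [dict(r) for r in visible]
    # Column mask: hide the sensitive value but keep the column.
    return [{**r, "email": "***"} for r in visible]


rows = [
    {"region": "emea", "email": "a@example.com"},
    {"region": "amer", "email": "b@example.com"},
]
emea_result = secure_view(rows, {"emea"})
admin_result = secure_view(rows, {"admins"})
```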
Module 8: Govern Unity Catalog objects
This module covers essential governance practices in Unity Catalog, enabling you to secure, monitor, and manage your data estate effectively. You'll learn how to implement fine-grained access control, track data lineage, configure audit logs, and share data securely.
Module 9: Design and implement data modeling with Azure Databricks
Effective data modeling forms the foundation of a performant and maintainable data platform. This module explores how to design ingestion logic, select appropriate tools and table formats, implement partitioning schemes, manage slowly changing dimensions, choose appropriate data granularity, and optimize table performance through clustering strategies in Azure Databricks with Unity Catalog.
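One common way to manage a slowly changing dimension is the Type 2 pattern, which closes the current row and appends a new version instead of overwriting. The sketch below shows that pattern in plain Python under stated assumptions: the `CustomerRow` shape, the `valid_from` and `valid_to` columns, and the function `apply_scd2` are hypothetical, and a real implementation would run against tables rather than lists.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history, customer_id, new_city, as_of):
    """Return a new history list with a Type 2 change applied."""
    result = []
    changed = True
    for row in history:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.city == new_city:
            changed = False  # value is unchanged, keep the current row
            result.append(row)
        elif is_current:
            result.append(replace(row, valid_to=as_of))  # close old version
        else:
            result.append(row)
    if changed:
        result.append(CustomerRow(customer_id, new_city, as_of))
    return result


history = [CustomerRow(1, "Oslo", date(2023, 1, 1))]
history = apply_scd2(history, 1, "Bergen", date(2024, 6, 1))
```

After the call, the history holds the closed Oslo row and an open Bergen row, so queries can reconstruct the value at any point in time.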
Module 10: Ingest data into Unity Catalog
Data ingestion is a fundamental capability for any data platform. This module explores the comprehensive set of techniques available in Azure Databricks for loading data into Unity Catalog tables. You'll learn how to use managed connectors with Lakeflow Connect, write custom ingestion code in notebooks, apply SQL commands for batch file loading, process change data capture feeds, configure streaming ingestion from message buses, set up Auto Loader for automatic file detection, and orchestrate ingestion workflows with Lakeflow Spark Declarative Pipelines.
Module 11: Cleanse, transform, and load data into Unity Catalog
Data engineering requires transforming raw data into clean, well-structured formats ready for analysis. This module explores techniques for profiling data quality, selecting appropriate column types, resolving duplicates and null values, applying filtering and aggregation transformations, combining datasets with joins and set operators, reshaping data through pivoting and denormalization, and loading transformed data using append, overwrite, and merge strategies.
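The cleanse-then-merge flow described above can be sketched with plain Python dictionaries. The sample rows, the column names, and the helpers `deduplicate` and `merge_rows` are hypothetical; the sketch assumes that the last row seen for a key wins and that a merge updates matching keys and inserts the rest.

```python
def deduplicate(rows, key):
    """Keep the last row seen for each key value."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())


def merge_rows(target, source, key):
    """Upsert: update rows whose key matches, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in source:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


target = [{"id": 1, "name": "Ada", "city": "London"}]
incoming = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "name": "Grace", "city": None},
    {"id": 2, "name": "Grace", "city": "New York"},
]
# Drop rows with a null city, then deduplicate and merge into the target.
clean = deduplicate([r for r in incoming if r["city"] is not None], key="id")
result = merge_rows(target, clean, key="id")
```

An append strategy would simply concatenate `clean` onto `target`, and an overwrite would replace `target` with `clean`; merge is the only one of the three that needs a key.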
Module 12: Implement and manage data quality constraints with Azure Databricks
This module explores strategies for maintaining high data quality in Azure Databricks. You'll learn how to implement validation checks, enforce schemas, manage schema drift, and use pipeline expectations to ensure data integrity throughout your data pipelines.
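A validation check with warn, drop, and fail outcomes can be modeled in a few lines of plain Python. This is a conceptual sketch only and does not use the pipeline expectations API; the rule names, the row shape, and the function `apply_expectations` are assumptions made for illustration.

```python
def apply_expectations(rows, expectations):
    """Apply (name, predicate, action) rules; action is 'warn', 'drop', or 'fail'."""
    kept = []
    violations = {name: 0 for name, _, _ in expectations}
    for row in rows:
        keep = True
        for name, predicate, action in expectations:
            if predicate(row):
                continue
            violations[name] += 1
            if action == "fail":
                raise ValueError(f"expectation {name!r} failed for {row!r}")
            if action == "drop":
                keep = False
            # 'warn' only records the violation and keeps the row
        if keep:
            kept.append(row)
    return kept, violations


rules = [
    ("valid_id", lambda r: r.get("id") is not None, "drop"),
    ("positive_amount", lambda r: r.get("amount", 0) > 0, "warn"),
]
rows = [{"id": 1, "amount": 10}, {"id": None, "amount": 5}, {"id": 2, "amount": -3}]
kept, violations = apply_expectations(rows, rules)
```

Counting violations per rule, even for rows that are kept, is what makes it possible to track data quality over time rather than only reject bad input.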
Module 13: Design and implement data pipelines with Azure Databricks
Learn to design and implement robust data pipelines in Azure Databricks using notebooks and Lakeflow Spark Declarative Pipelines, covering orchestration, error handling, and task logic.
Module 14: Implement Lakeflow Jobs with Azure Databricks
This module guides you through the process of implementing Lakeflow Jobs in Azure Databricks. You'll learn how to create jobs, configure triggers and schedules, set up alerts, and manage automatic restarts to ensure reliable data pipeline execution.
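Automatic restarts are usually a form of retry policy. The plain-Python sketch below illustrates one such policy, retrying with exponential backoff; it is not the Lakeflow Jobs configuration itself, and `run_with_retries`, `flaky_task`, and the default values are hypothetical.

```python
import time


def run_with_retries(task, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Run task(); on failure wait and retry, doubling the delay each time."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1


calls = []
delays = []


def flaky_task():
    """Hypothetical task that fails twice before succeeding."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "succeeded"


# Record the delays instead of actually sleeping.
outcome = run_with_retries(flaky_task, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the example fast to run and makes the backoff schedule easy to inspect.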
Module 15: Implement development lifecycle processes in Azure Databricks
Azure Databricks integrates with established development practices through Git folders for version control and Databricks Asset Bundles for infrastructure-as-code deployments. This module explores Git version control best practices, branching and pull request workflows, comprehensive testing strategies, and CLI-based bundle deployment across environments.
Module 16: Monitor, troubleshoot, and optimize workloads in Azure Databricks
Monitoring and optimization are essential for running reliable, cost-effective data workloads in Azure Databricks. This module explores cluster consumption metrics, Lakeflow Jobs troubleshooting, Spark job diagnostics, performance optimization for caching, skew, spill, and shuffle issues, and log streaming to Azure Log Analytics.
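Data skew means that a few key values account for most of the rows, so the tasks handling those keys run far longer than the rest. A rough way to quantify it, sketched in plain Python with a hypothetical `skew_ratio` helper and made-up keys, is to compare the largest key group to the median group size.

```python
from collections import Counter
from statistics import median


def skew_ratio(keys):
    """Ratio of the largest key group to the median group size."""
    counts = Counter(keys).values()
    return max(counts) / median(counts)


# Hypothetical join keys: one hot key and two small ones.
keys = ["a"] * 8 + ["b", "c"]
ratio = skew_ratio(keys)
```

A ratio near 1 suggests evenly distributed keys, while a large ratio points to a hot key worth investigating.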