
This course explores software engineering best practices and DevOps principles, specifically designed for data engineers working with Databricks. Participants will build a strong foundation in key topics such as code quality, version control, documentation, and testing. The course emphasizes DevOps, covering core components, benefits, and the role of continuous integration and continuous delivery (CI/CD) in optimizing data engineering workflows.

You will learn how to apply modularity principles in PySpark to create reusable components and structure code efficiently. Hands-on experience includes designing and implementing unit tests for PySpark functions using the pytest framework, followed by integration testing for Databricks data pipelines with DLT and Workflows to ensure reliability.

The course also covers essential Git operations within Databricks, including using Databricks Git Folders to integrate continuous integration practices. Finally, you will take a high-level look at various deployment methods for Databricks assets, such as the REST API, CLI, SDK, and Databricks Asset Bundles (DABs), equipping you with techniques to deploy and manage your pipelines.
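As a taste of the DABs approach mentioned above, a bundle is defined declaratively in a `databricks.yml` file. The sketch below is a minimal, illustrative configuration (bundle, job, and path names are made up); it is deployed with `databricks bundle validate` and `databricks bundle deploy -t dev` from the Databricks CLI.

```yaml
# databricks.yml — minimal bundle definition (all names are illustrative)
bundle:
  name: my_pipeline_bundle

targets:
  dev:
    mode: development
    workspace:
      host: https://<your-workspace>.cloud.databricks.com

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: run_pipeline
          notebook_task:
            notebook_path: ./src/pipeline_notebook
```

Because the bundle is plain configuration under version control, the same definition can be promoted across targets (dev, staging, prod) from a CI/CD pipeline.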

By the end of the course, you will be proficient in software engineering and DevOps best practices, enabling you to build scalable, maintainable, and efficient data engineering solutions.


What You'll Learn

  • Software Engineering Best Practices, DevOps, and CI/CD Fundamentals
  • Continuous Integration (CI)
  • Introduction to Continuous Deployment (CD)

Who Should Attend

This course is designed for professionals who:

  • Are data engineers or pipeline developers tasked with building, testing and deploying data solutions using the Databricks Lakehouse Platform.
  • Want to adopt software-engineering best practices (modular code, unit & integration testing) and DevOps workflows (CI/CD, version control, deployment automation) in their data-engineering lifecycle.
  • Are familiar with the basics of PySpark, Git, notebooks and the Databricks workspace and now wish to deepen their capabilities around quality, reuse and release of data-engineering assets.
  • Are responsible for creating and maintaining production data pipelines (batch or streaming) and need to integrate testing, automation and deployment into their workflow to increase reliability and scale.
  • Are part of teams moving from ad hoc notebooks and manual pipelines toward governed, repeatable, automated data-engineering practices within Databricks.

Prerequisites

  • Proficient knowledge of the Databricks platform, including experience with Databricks Workspaces, Apache Spark, Delta Lake and the Medallion Architecture, Unity Catalog, Delta Live Tables, and Workflows. A basic understanding of Git version control is also required.
  • Experience ingesting and transforming data, with proficiency in PySpark for data processing and DataFrame manipulations. Additionally, candidates should have experience writing intermediate-level SQL queries for data analysis and transformation.
  • Knowledge of Python programming, with proficiency in writing intermediate-level Python code, including the ability to design and implement functions and classes. Candidates should also be skilled in creating, importing, and effectively utilizing Python packages.

Learning Journey

Coming Soon...

Module 1. Software Engineering Best Practices, DevOps, and CI/CD Fundamentals

  • Introduction to Software Engineering (SWE) Best Practices
  • Introduction to Modularizing PySpark Code
  • Modularizing PySpark Code
  • DevOps Fundamentals
  • The Role of CI/CD in DevOps
  • Knowledge Check/Discussion

Module 2. Continuous Integration (CI)

  • Planning the Project
  • Project Setup Exploration
  • Introduction to Unit Tests for PySpark
  • Creating and Executing Unit Tests
  • Executing Integration Tests with DLT and Workflows
  • Performing Integration Tests with DLT and Workflows
  • Version Control with Git Overview

Module 3. Introduction to Continuous Deployment (CD)

  • Deploying Databricks Assets Overview (slides)
  • Deploying the Databricks Project



