DGSE - Developing a Google SRE Culture
Many IT organizations experience a disconnect between developers, who focus on agility, and operators, who focus on stability. Site Reliability Engineering (SRE) is how Google bridges the gap between development and operations, while also providing mission-critical production support. In this course, you'll learn the fundamentals and best practices of SRE, the importance of adopting an SRE culture, and how SRE can improve collaboration between IT and business leaders—and help the entire organization succeed.
- Explain why SRE is important to the success of an organization’s IT transformation project
- Distinguish between DevOps and SRE
- Articulate the pillars of DevOps
- Explain how SRE practices align to DevOps pillars
- Understand the value SRE can provide to an organization
- Describe the technical and cultural fundamentals of SRE
- Assess organizational SRE maturity level
- Identify where SRE can be applied within the business
- Recognize the skills an SRE needs
- Articulate the different types of SRE team implementations
- Advocate for SRE culture adoption across the organization
- Primary audience: IT leaders and business leaders who are interested in embracing SRE philosophy. Roles include, but are not limited to: CTO, IT director/manager, engineering VP/director/manager.
- Secondary audience:
- Other product and IT roles such as operations managers or engineers, software engineers, service managers, or product managers may also find this content useful as an introduction to SRE.
The course includes presentations, demonstrations, and hands-on labs.
Module 1: Welcome to Developing a Google SRE Culture
Module 2: DevOps, SRE, and Why They Exist
- Explain why SRE is important to the success of an organization’s IT transformation project.
Module 3: SLOs with Consequences
- Distinguish between DevOps and SRE.
- Articulate the pillars of DevOps.
- Explain how SRE practices align to DevOps pillars.
Module 4: Make Tomorrow Better than Today
- Understand the value SRE can provide to an organization.
- Describe the technical fundamentals of SRE (SLOs, error budgets, and blameless postmortems).
- Describe the cultural fundamentals of SRE (psychological safety, blamelessness, unified vision, collaboration, and knowledge sharing).
Module 5: Regulate Workload
- Describe the technical fundamentals of SRE (continuous integration/continuous delivery, canarying, and toil automation).
- Describe the cultural fundamentals of SRE (design thinking, prototyping, psychology of change, and resistance to change).
Module 6: Apply SRE in Your Organization
- Describe the technical fundamentals of SRE (measuring toil and reliability, and monitoring).
- Describe the cultural fundamentals of SRE (goal-setting, transparency, and data-driven decision making).
Module 7: Final Assessment
- Assess their organization’s SRE maturity level.
- Identify where SRE can be applied within their business.
- Recognize the skills an SRE needs.
- Articulate the different types of SRE team implementations.
- Advocate for SRE culture adoption across their organization.
- Assess knowledge of SRE technical and cultural fundamentals.
No formal prerequisites required.
Recommended pre-reading: Site Reliability Engineering: How Google Runs Production Systems, Chapter 1: Introduction
This course is not associated with any certification.