C110044G - Designing for Failure and Recovering from Failure
The Designing for Failure and Recovering from Failure module covers practices for designing systems that tolerate failure. It takes a closer look at three specific practices and then explores how to recover from failure.
In this module, you will:
- Design and implement a circuit-breaker pattern for failure management
- Limit the effects of outages by using limited blast radius practices
- Improve application resiliency with chaos testing
- Learn software testing resiliency practices
- Learn when to use Availability Zones versus Multi-zone Region failure domains on IBM Cloud
This course is intended for learners who are pursuing professional-level site reliability engineer certification on IBM Cloud.
Module Introduction
- Topic 1: Designing for Failure
- Topic 2: Designing and Implementing Circuit-breaker Pattern for Failure
- Topic 3: Limiting Effects of an Outage Using Limited Blast Radius Practices
- Topic 4: Improving Application Resiliency with Chaos Testing
- Topic 5: Software Testing Resiliency Practices
- Topic 6: When to Use Availability Zones Versus Multi-zone Regions Failure Domains on IBM Cloud
Module Summary
Before starting this curriculum, the target audience should understand:
- Systems thinking
- DevOps practices
- Cloud architecture
- Software engineering principles
- System administration
- Networking and the OSI model
- Networking and security practices for IBM Cloud
- Incident management
- Root cause analysis
The target audience should also be able to:
- Write code proficiently
- Create runbooks as a reference
- Make system components serviceable
- Interpret data and statistics to determine actions
- Use LogDNA, Sysdig, Grafana, Prometheus, and Kibana
- Interpret schematics
- Drive incidents to resolution
- Remediate underlying sources of unreliability
- Create and configure VMs
- Create and configure containers on IBM Kubernetes Service (IKS)/Red Hat OpenShift Kubernetes Service (ROKS)
- Create and configure containers using OpenShift
- Create and configure serverless applications
- Configure for high availability and scalability