C110018G - Managing Incidents on IBM Cloud
The Managing Incidents on IBM Cloud module covers incident characteristics, implementing alerts for incident thresholds, creating runbooks for troubleshooting and mitigating incidents, and problem-solving techniques.
- Learn about incident characteristics
- Understand the incident management process
- Learn how to implement alerts for incident thresholds
- Gain an appreciation for the impacts of incidents on upstream and downstream processes
- Learn how to create runbooks to troubleshoot and mitigate the most common incidents
- Learn about automation services available on IBM Cloud
- Understand essential problem-solving techniques
This course is intended for learners who are pursuing professional-level site reliability engineer certification on IBM Cloud.
Module Introduction
- Topic 1: Incident Characteristics
- Topic 2: The Incident Management Process
- Topic 3: Implementing Alerts for Incident Thresholds
- Topic 4: Impacts of Upstream and Downstream Dependencies
- Topic 5: Creating Runbooks to Troubleshoot and Mitigate Common Incidents
- Topic 6: Types of Automation Services Available on IBM Cloud
- Topic 7: Problem-Solving Techniques
Module Summary
Before starting this curriculum, the target audience should understand:
- Systems thinking
- DevOps practices
- Cloud architecture
- Software engineering principles
- System administration
- Networking and the OSI model
- Networking and security practices for IBM Cloud
- Incident management
- Root cause analysis
The target audience should also be able to:
- Write code proficiently
- Create runbooks as a reference
- Make system components serviceable
- Interpret data and statistics to determine actions
- Use LogDNA, Sysdig, Grafana, Prometheus, and Kibana
- Interpret schematics
- Drive incidents to resolution
- Remediate underlying sources of unreliability
- Create and configure VMs
- Create and configure containers on IBM Kubernetes Service (IKS)/Red Hat OpenShift Kubernetes Service (ROKS)
- Create and configure containers using OpenShift
- Create and configure serverless applications
- Configure for high availability and scalability