C110020G - Using Performance and Availability Metrics to Measure the Health of Services
This module explains how to use performance and availability metrics to measure the health of services on IBM Cloud.
- Understand which performance and availability metrics to use to achieve a desired outcome.
- Learn to select tools based on the metrics needed.
- Learn to use metrics tools for incident management.
This course is intended for learners who are pursuing professional-level site reliability engineer certification on IBM Cloud.
- Topic 1: Applying the Correct Metric for the Desired Outcome
- Topic 2: Selecting the Correct Tool for the Desired Metric
- Topic 3: Using Metrics Tools for Incident Management
Before starting this course, the target audience should understand:
- Systems thinking
- DevOps practices
- Cloud architecture
- Software engineering principles
- System administration
- Networking and the OSI model
- Networking and security practices for IBM Cloud
- Incident management
- Root cause analysis
The target audience should also be able to:
- Write code proficiently
- Create runbooks as a reference
- Make system components serviceable
- Interpret data and statistics to determine actions
- Use LogDNA, Sysdig, Grafana, Prometheus, and Kibana
- Interpret schematics
- Drive incidents to resolution
- Remediate underlying sources of unreliability
- Create and configure VMs
- Create and configure containers on IBM Kubernetes Service (IKS) or Red Hat OpenShift Kubernetes Service (ROKS)
- Create and configure containers using OpenShift
- Create and configure serverless applications
- Configure for high availability and scalability