telemetryflow/telemetryflow-academy


TelemetryFlow Academy — Mastering Observability 101


Mastering Observability 101 is the flagship open course from TelemetryFlow Observability (TFO) — a structured, 6-week, 12-session journey that takes learners from the fundamentals of system monitoring all the way to modern, vendor-neutral, control-plane–driven observability with OpenTelemetry and TelemetryFlow.

This repository contains the complete curriculum: the master COURSE.md syllabus, the downloadable slide decks in pdf/, and the shared brand/license assets in assets/. The material is designed to be delivered live (instructor-led) or self-paced.

Author: Dwi Fahni Denni · Platform: TelemetryFlow Observability (TFO) · License: CC BY-NC-SA 4.0 · Version: 1.0.0 · Last Update: 2026-06-24




Overview

Observability is no longer optional. Modern distributed systems — microservices, Kubernetes, event-driven architectures — generate a flood of telemetry data, yet debugging has become harder, not easier. The "Dashboard of Doom" (100 green dashboards while users see errors) is a symptom of the gap between monitoring (asking "is it broken?") and observability (asking "why is it broken?").

TelemetryFlow Academy bridges that gap. The course is built around three progressive arcs:

  1. Foundations (Weeks 1–2): from monitoring to observability, the MELT pillars (Metrics, Events, Logs, Traces), reliability engineering with SLIs/SLOs, and distributed tracing with structured logging.
  2. Deep Dive (Weeks 3–5): OpenTelemetry (API, SDK, Collector), end-to-end correlation, the pain of scale (cost, noise, rigidity), profiling, frontend observability, and control-plane concepts.
  3. Scale & Platform (Week 6): TelemetryFlow Observability as a control plane for telemetry at scale, from architecture to a hands-on lab.

Each session is delivered as a self-contained slide deck (~12–15 slides), paired with discussion prompts, hands-on exercises, and an assignment.


Learning Outcomes

By the end of this course, learners will be able to:

  • Explain the difference between monitoring and observability, and articulate why observability is a system property, not a tool.
  • Identify and categorize the four pillars (MELT) of telemetry data and choose the right pillar for each scenario.
  • Apply the RED, USE, and Google Golden Signals methods, and define meaningful SLIs, SLOs, and error budgets.
  • Read and author distributed traces and structured logs, and correlate them via trace_id / span_id.
  • Describe the OpenTelemetry architecture (API, SDK, Collector, semantic conventions) and instrument a sample application.
  • Configure an OTel Collector pipeline (receivers, processors, exporters, connectors) and reason about deployment topologies.
  • Perform end-to-end correlation across metrics, logs, traces, and exemplars, and design dashboards/alerts that avoid alert fatigue.
  • Recognize the pain points of telemetry at scale (cost, noise, rigidity, vendor lock-in) and evaluate how a control plane / smart gateway (TelemetryFlow) addresses them.

Who Is This Course For?

| Audience | What You'll Get Out of It |
|---|---|
| SRE / Platform Engineers | A structured vocabulary for observability plus hands-on OpenTelemetry & TFO patterns. |
| Backend / Microservices Developers | Learn to instrument code the right way and debug distributed systems fast. |
| DevOps Engineers | Operate the OTel Collector at scale and design resilient telemetry pipelines. |
| Engineering Managers / Architects | Understand the cost/rigor trade-offs and what a control plane unlocks. |
| Students / Career Switchers | A vendor-neutral, modern foundation in observability, with no proprietary lock-in. |

Prerequisites

  • Basic familiarity with Linux command line and HTTP/REST.
  • Comfort reading YAML / JSON configuration.
  • Exposure to at least one programming language (Go, Python, Java, JS/TS, or similar).
  • Conceptual familiarity with containers (Docker) and ideally Kubernetes (helpful but not required for Weeks 1–4).
  • No prior observability tooling experience required — we start from first principles.

Course Architecture

graph LR
    subgraph W1["Week 1 — Foundations"]
        S1["S1<br/>From Monitoring<br/>to Observability"]
        S2["S2<br/>The Three Pillars<br/>(MELT)"]
    end

    subgraph W2["Week 2 — Metrics & Reliability"]
        S3["S3<br/>Metrics & Reliability<br/>SLIs / SLOs"]
        S4["S4<br/>Tracing &<br/>Structured Logs"]
    end

    subgraph W3["Week 3 — OpenTelemetry I"]
        S5["S5<br/>OpenTelemetry<br/>Deep Dive Pt. 1"]
        S6["S6<br/>The OTel<br/>Collector"]
    end

    subgraph W4["Week 4 — Correlation & Scale"]
        S7["S7<br/>E2E Correlation<br/>& Dashboarding"]
        S8["S8<br/>The Pain<br/>of Scale"]
    end

    subgraph W5["Week 5 — Advanced Observability"]
        S9["S9<br/>Profiling<br/>& RUM"]
        S10["S10<br/>Pipeline &<br/>Control Plane"]
    end

    subgraph W6["Week 6 — TelemetryFlow (TFO)"]
        S11["S11<br/>TFO Architecture<br/>& Core Concepts"]
        S12["S12<br/>TFO Hands-on<br/>& Prod Readiness"]
    end

    S1 --> S2 --> S3 --> S4 --> S5 --> S6
    S6 --> S7 --> S8 --> S9 --> S10 --> S11 --> S12

    style W1 fill:#d1fae5
    style W2 fill:#bfdbfe
    style W3 fill:#fde68a
    style W4 fill:#fecaca
    style W5 fill:#ddd6fe
    style W6 fill:#fef3c7

The journey is intentionally cumulative: every session builds on the previous one, ending in a hands-on lab on TelemetryFlow Observability where learners apply dynamic routing, tail-based sampling, and cost-control policies.


Curriculum (12 Sessions)

The full authoritative outline lives in COURSE.md. Below is a quick-reference summary.

Week 1 — Foundations of Observability

Session 1 — From Monitoring to Observability ~12 slides

The evolution from monoliths → SOA → microservices, why monitoring (known-knowns) breaks down, the Rumsfeld Matrix, and the mindset shift: monitoring asks "is it broken?", observability asks "why is it broken?"

Session 2 — The Three Pillars of Observability (MELT) ~14 slides

Metrics, Events, Logs, Traces, plus Profiles (often called the "fourth pillar" alongside metrics, logs, and traces). Cardinality, the danger of siloed pillars, and a decision matrix for choosing the right pillar per scenario.
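Cardinality is easiest to see with arithmetic: every unique combination of label values becomes its own time series, so the worst-case series count is the product of the number of values per label. A minimal Python sketch, assuming a hypothetical HTTP request counter with made-up label values (not taken from the course decks):

```python
from math import prod

# Hypothetical label values for an HTTP request counter.
labels = {
    "service": ["checkout", "cart", "search"],
    "method": ["GET", "POST"],
    "status": ["200", "404", "500"],
}


def series_count(label_values: dict[str, list[str]]) -> int:
    """Worst-case number of time series: one per combination of label values."""
    return prod(len(values) for values in label_values.values())


print(series_count(labels))  # 3 * 2 * 3 = 18

# Adding a high-cardinality label such as a user ID multiplies the total.
labels_with_user = {**labels, "user_id": [f"u{i}" for i in range(10_000)]}
print(series_count(labels_with_user))  # 18 * 10,000 = 180,000
```

Real systems usually produce fewer series than this worst case, since not every combination occurs, but the multiplicative growth is a common reason to keep per-user or per-request fields in traces and logs rather than in metric labels.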

Week 2 — Metrics, Reliability, Traces & Logs

Session 3 — Metrics & Reliability (SLIs/SLOs) ~14 slides

Metric types (Counter, Gauge, Histogram, Summary), the RED and USE methods, Google's Golden Signals, SLIs/SLOs/SLAs, error budgets, and an intro to PromQL.
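An error budget is the complement of the SLO target: a 99.9% availability SLO tolerates 0.1% bad events within its window. The sketch below assumes a request-based availability SLI and hypothetical monthly counts; it is one simple way to do the arithmetic, not the course's prescribed method.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of good requests
    budget_events: float     # bad requests the SLO allows in the window
    budget_remaining: float  # fraction of the error budget still unspent


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Compare a request-based availability SLI with an SLO target such as 0.999."""
    if total <= 0:
        raise ValueError("total must be positive")
    sli = good / total
    allowed_bad = (1 - target) * total  # the error budget, counted in requests
    actual_bad = total - good
    # A 100% target leaves no budget at all, so report nothing remaining.
    remaining = 1 - actual_bad / allowed_bad if allowed_bad > 0 else 0.0
    return SloReport(sli=sli, budget_events=allowed_bad, budget_remaining=remaining)


# Hypothetical 30-day window: 1,000,000 requests, 400 failed, SLO target 99.9%.
report = evaluate_slo(good=999_600, total=1_000_000, target=0.999)
print(f"SLI={report.sli:.4%}, budget={report.budget_events:.0f} errors, "
      f"remaining={report.budget_remaining:.0%}")
```

Here 400 of the roughly 1,000 allowed failures are spent, so about 60% of the budget remains; a negative value means the SLO has already been breached for the window.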

Session 4 — Distributed Tracing & Structured Logging ~15 slides

The hop-by-hop debugging nightmare, trace anatomy (Trace/Span/Parent IDs), W3C Trace Context (traceparent/tracestate), structured (JSON) logging, and the magic link: injecting trace_id into logs.
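The traceparent header and the trace_id-in-logs link fit in a few lines. This is a minimal Python sketch using only the standard library, assuming the version-00 traceparent layout from the W3C Trace Context specification (version-trace_id-parent_id-trace_flags); the sample header is the specification's example value, and the `checkout` logger name is hypothetical. In real services an OpenTelemetry SDK propagator does this parsing for you.

```python
import json
import logging
import re
from typing import Optional

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[dict]:
    """Extract trace context from a traceparent header; None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,  # span ID of the caller's span
        "sampled": bool(int(flags, 16) & 0x01),
    }


class JsonFormatter(logging.Formatter):
    """Emit each log record as a single JSON object carrying the trace_id."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
if ctx is not None:
    logger.error("payment failed", extra={"trace_id": ctx["trace_id"]})
```

Any log backend that indexes the trace_id field can then jump from this log line to the matching trace.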

Week 3 — OpenTelemetry Deep Dive

Session 5 — OpenTelemetry Deep Dive — Part 1 ~14 slides

The fragmented past (vendor lock-in), OpenTelemetry as the CNCF standard (merging OpenTracing + OpenCensus), API vs SDK, semantic conventions, resources, and auto- vs manual-instrumentation.

Session 6 — OpenTelemetry Deep Dive — Part 2 (The Collector) ~15 slides

Why a Collector exists, the Receivers → Processors → Exporters → Connectors pipeline, agent vs gateway deployment modes, YAML config authoring, and the "YAML hell" problem at scale.
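The Receivers → Processors → Exporters flow can be modeled in a few lines to show what a batch processor does between the other two stages. This is a conceptual Python sketch, not Collector code: the real Collector is written in Go and configured in YAML, and the `send_batch_size` name below only mirrors the batch processor option of that name (the real processor also flushes on a `timeout`).

```python
from dataclasses import dataclass, field
from typing import Callable

Batch = list[dict]


@dataclass
class BatchProcessor:
    """Buffers items from a receiver and hands them to an exporter in batches."""
    export: Callable[[Batch], None]
    send_batch_size: int = 3
    _buffer: Batch = field(default_factory=list, init=False, repr=False)

    def consume(self, item: dict) -> None:
        self._buffer.append(item)
        if len(self._buffer) >= self.send_batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; a real pipeline also does this on a timer and at shutdown."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.export(batch)


exported: list[Batch] = []  # stands in for an OTLP exporter's outbound requests
processor = BatchProcessor(export=exported.append, send_batch_size=3)

# Stand-in for a receiver: seven spans arrive one by one.
for i in range(7):
    processor.consume({"span": f"span-{i}"})
processor.flush()

print([len(batch) for batch in exported])  # [3, 3, 1]
```

Batching trades a little latency for far fewer export calls, which is why a batch step commonly sits right before the exporter.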

Week 4 — Correlation & the Pain of Scale

Session 7 — End-to-End Correlation & Dashboarding ~14 slides

The correlation matrix, exemplars, trace-to-log and log-to-metric linking, dashboard anti-patterns (spaghetti, wall of green), alerting on symptoms not causes, and curing alert fatigue.

Session 8 — The Pain of Scale ~14 slides

The dark side of telemetry at scale: cost explosion (ingestion pricing), the "observability tax", noise fatigue, pipeline rigidity (restart-induced data drops), vendor lock-in, and the case for a control plane.

Week 5 — Advanced Observability

Session 9 — Advanced Observability (Profiling & RUM) ~14 slides

Continuous profiling (eBPF, SIGPROF), the holy grail of profiling + traces, frontend observability, Real User Monitoring (RUM), Core Web Vitals, session replay, and synthetics.

Session 10 — Telemetry Pipeline & Control Plane Concepts ~13 slides

The data-plane limit, the control-plane concept, policy-based routing, head- vs tail-based sampling, GitOps for telemetry, and the blueprint: App → Agent → Control Plane/Smart Gateway → Backend.
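The difference between head- and tail-based sampling is when the keep/drop decision is made. A minimal Python sketch under simplified assumptions: head sampling derives the decision from the trace ID alone (here its last 16 hex digits), so every service reaches the same answer without coordination; tail sampling waits until all spans of a trace are buffered and keeps errors and slow traces. The 500 ms threshold and the span data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str       # 32 lowercase hex characters
    name: str
    duration_ms: float
    error: bool = False


def head_sample(trace_id: str, ratio: float) -> bool:
    """Decide at the start of a trace, before its outcome is known.

    The decision depends only on the trace ID, so every service that sees
    the trace makes the same choice.
    """
    return int(trace_id[-16:], 16) < ratio * 2**64


def tail_sample(spans: list[Span], latency_threshold_ms: float = 500.0) -> bool:
    """Decide after the whole trace is buffered: keep errors and slow traces."""
    if not spans:
        return False
    has_error = any(span.error for span in spans)
    # The longest span (usually the root) approximates end-to-end latency.
    is_slow = max(span.duration_ms for span in spans) > latency_threshold_ms
    return has_error or is_slow


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
fast_ok = [Span(trace_id, "GET /cart", 120.0), Span(trace_id, "db.query", 40.0)]
slow = [Span(trace_id, "GET /cart", 900.0)]
failed = [Span(trace_id, "POST /pay", 80.0, error=True)]

print(head_sample(trace_id, ratio=0.10))  # False: this ID falls outside the 10% bucket
print([tail_sample(t) for t in (fast_ok, slow, failed)])  # [False, True, True]
```

Head sampling is cheap but blind to outcomes; tail sampling keeps the interesting traces at the cost of buffering every span until the trace completes, which is one reason it is typically run in a gateway rather than in each agent.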

Week 6 — TelemetryFlow Observability (TFO)

Session 11 — TFO Part 1: Architecture & Core Concepts ~15 slides

Where TFO sits (between OTel Agent and backends), its four core features — dynamic pipeline management, smart routing, centralized configuration (GitOps), and cost control — and how each maps to a Session 8 pain point.

Session 12 — TFO Part 2: Hands-on & Production Readiness ~14 slides

Real-world use cases (cost optimization, multi-IO, tail-based sampling at the gateway), a full hands-on lab (deploy app → OTel Agent → TFO → backends), dynamic routing without agent restarts, and migration best practices.


Slide Decks (PDF)

Each session ships as a polished PDF slide deck in pdf/. Decks are published as sessions are finalized.

| Session | Title | File | Status |
|---|---|---|---|
| 1 | From Monitoring to Observability | Mastering Observability - Session 01.pdf | ✅ Available |
| 2 | The Three Pillars of Observability (MELT) | Mastering Observability - Session 02.pdf | ✅ Available |
| 3 | Metrics & Reliability (SLIs/SLOs) | Mastering Observability - Session 03.pdf | ✅ Available |
| 4 | Distributed Tracing & Structured Logging | Mastering Observability - Session 04.pdf | ✅ Available |
| 5–12 | OpenTelemetry, Scale, TFO | Coming soon | 🚧 In progress |

The slide count per session (~12–15) is documented in COURSE.md next to each session header.


How to Use This Repository

For Self-Paced Learners

  1. Read the COURSE.md syllabus top-to-bottom to understand the arc.
  2. Open the pdf/ deck for each session in order; take notes.
  3. Complete each session's assignment before moving on.
  4. Use the discussion prompts to reflect on real incidents in your own systems.

For Instructors / Community Teachers

  1. Schedule two sessions per week (≈ 90 minutes each) over 6 weeks.
  2. Present the PDF deck; pause for the discussion slides and hands-on exercises.
  3. Assign the per-session assignment as homework.
  4. Encourage learners to share "war stories" — Sessions 1 and 8 are explicitly designed for this.
  5. Adapt the TFO hands-on lab (Session 12) to your local environment.

For Community Contributors

See CONTRIBUTING.md for how to propose fixes, translations, additional examples, or new sessions.


Learning Pathway

graph TB
    Start([Start: Beginner]) --> Track1{Background?}

    Track1 -->|Backend Dev| Core[Weeks 1-4:<br/>Foundations + Tracing]
    Track1 -->|DevOps / SRE| Full[Full Course:<br/>All 12 Sessions]
    Track1 -->|Manager / Architect| Strategic[Sessions 1, 2, 7, 8,<br/>10, 11 — Strategy Track]

    Core --> OTel[Week 3:<br/>OpenTelemetry]
    OTel --> Scale[Weeks 4-5:<br/>Correlation & Scale]
    Scale --> TFO[Week 6:<br/>TelemetryFlow Hands-on]

    Full --> TFO
    Strategic --> TFO

    TFO --> Graduate([End: Observability<br/>Practitioner])

    style Start fill:#d1fae5
    style Graduate fill:#fef3c7
    style TFO fill:#bfdbfe

Key Concepts Covered

Mindset & Frameworks

  • Monitoring vs. Observability · Rumsfeld Matrix (known-knowns → unknown-unknowns)
  • The Four Pillars (MELT) · Cardinality trade-offs · High- vs low-cardinality data

Reliability Engineering

  • RED method (Rate, Errors, Duration) · USE method (Utilization, Saturation, Errors)
  • Google's Golden Signals · SLIs / SLOs / SLAs · Error budgets
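The RED method reduces to three numbers per service per time window. A minimal sketch, assuming HTTP 5xx responses count as errors and using the nearest-rank 99th percentile for Duration; both choices, and the sample data, are illustrative rather than the course's definitions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile (p in 1..100) of an ascending, non-empty list."""
    rank = -(-p * len(sorted_values) // 100)  # ceil(p * n / 100) in integer math
    return sorted_values[rank - 1]


def red_metrics(requests: list[Request], window_s: float) -> dict[str, float]:
    """Rate (req/s), Errors (share of 5xx), Duration (p99 ms) for one window."""
    if not requests:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p99_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    return {
        "rate_rps": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p99_ms": percentile(durations, 99),
    }


# Hypothetical 10-second window: 100 requests taking 1..100 ms, two of them 503s.
window = [Request(float(ms), 503 if ms in (40, 77) else 200) for ms in range(1, 101)]
print(red_metrics(window, window_s=10.0))
# {'rate_rps': 10.0, 'error_ratio': 0.02, 'p99_ms': 99.0}
```

A percentile is used for Duration instead of an average because averages hide the slow tail that users actually notice.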

Distributed Systems Debugging

  • W3C Trace Context (traceparent, tracestate) · Context propagation across HTTP/gRPC
  • Structured (JSON) logging · Trace-to-log correlation via trace_id
  • Exemplars (linking metrics → traces)

OpenTelemetry (CNCF Standard)

  • API vs SDK distinction · No-op defaults · Semantic conventions
  • Auto-instrumentation vs manual spans · Resource & service attributes
  • Collector pipeline: Receivers · Processors · Exporters · Connectors
  • Agent (sidecar/DaemonSet) vs Gateway deployment modes

Scale & Strategy

  • Ingestion-based pricing cost models · The "observability tax"
  • Head-based vs tail-based sampling · Policy-based routing
  • Data plane vs control plane · GitOps for telemetry

TelemetryFlow Observability (TFO)

  • Dynamic pipeline management (no-restart reconfiguration)
  • Smart routing (by content / metadata / policy)
  • Centralized, GitOps-compatible configuration
  • Cost control via noise filtering before expensive backends

Hands-On & Assignments

Every session ends with either an assignment or an interactive discussion. Highlights:

| Session | Type | Activity |
|---|---|---|
| 1 | Assignment | Identify 3 "unknown-unknowns" in your current project. |
| 2 | Hands-on + Assignment | Categorize scenarios to pillars; map your stack to MELT. |
| 3 | Hands-on | Write a PromQL query to measure an SLI. |
| 4 | Hands-on | Read a waterfall chart; correlate a slow span to an error log. |
| 5 | Hands-on | Auto-instrument a sample app with the OTel SDK. |
| 6 | Hands-on | Author a Collector config (Receiver → Batch → OTLP Exporter). |
| 7 | Hands-on | Navigate an E2E incident: Alert → Metric → Trace → Log. |
| 9 | Hands-on | Analyze a CPU hotspot correlated with a slow span. |
| 11 | Assignment | Design a data-reduction policy for your org. |
| 12 | Hands-on Lab | Deploy app → OTel Agent → TFO → backends; route status=500 to a premium backend without restarting agents. |
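The Session 12 lab goal (send status=500 traffic to a premium backend without restarting agents) comes down to evaluating routing rules against each record and swapping those rules at runtime. The Python sketch below is hypothetical and is not TFO's API: the backend names are placeholders, and the attribute key `http.response.status_code` follows the OpenTelemetry HTTP semantic conventions.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Record], bool]
    backend: str


@dataclass
class Router:
    default_backend: str
    rules: tuple[Rule, ...] = field(default_factory=tuple)

    def route(self, record: Record) -> str:
        """First matching rule wins; otherwise use the default backend."""
        for rule in self.rules:
            if rule.matches(record):
                return rule.backend
        return self.default_backend

    def apply_policy(self, rules: list[Rule]) -> None:
        """Replace the rule set in one assignment, as a control-plane push would."""
        self.rules = tuple(rules)


router = Router(default_backend="low-cost-storage")
error_span = {"service.name": "checkout", "http.response.status_code": 500}
print(router.route(error_span))  # low-cost-storage

# A new policy arrives at runtime; the same Router keeps serving traffic.
router.apply_policy([
    Rule("errors-to-premium",
         lambda r: r.get("http.response.status_code") == 500,
         "premium-backend"),
])
print(router.route(error_span))  # premium-backend
print(router.route({"http.response.status_code": 200}))  # low-cost-storage
```

Keeping the rules in an immutable tuple and replacing it in a single assignment means a lookup already in progress sees either the old policy or the new one, never a half-updated list.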

Repository Structure

telemetryflow-academy/
├── README.md              # This file — course overview & quickstart
├── COURSE.md              # Master syllabus (authoritative, 12 sessions)
├── CHANGELOG.md           # Version history for course materials
├── CONTRIBUTING.md        # How to contribute / report issues / suggest sessions
├── CODE_OF_CONDUCT.md     # Community standards
├── LICENSE                # CC BY-NC-SA 4.0 legal code
├── assets/                # Shared brand & license assets
│   ├── by-nc-sa.png       # CC BY-NC-SA 4.0 badge (PNG)
│   └── by-nc-sa.svg       # CC BY-NC-SA 4.0 badge (SVG)
├── docs/                  # Companion documentation (platform & training context)
│   ├── telemetryflow-platform.md   # TelemetryFlow Observability platform overview
│   └── enterprise-training.md      # Commercial enterprise observability training
└── pdf/                   # Published slide decks (one per session)
    ├── Mastering Observability - Session 01.pdf
    ├── Mastering Observability - Session 02.pdf
    ├── Mastering Observability - Session 03.pdf
    ├── Mastering Observability - Session 04.pdf
    ├── ...
    └── Mastering Observability - Session 12.pdf

Documentation

The docs/ folder contains companion documentation that gives learners deeper context around the platform that caps the course (Sessions 11 & 12).

Platform Context

📄 docs/telemetryflow-platform.md · TelemetryFlow Observability (TFO): Platform Context

A learner-friendly adaptation of the canonical TelemetryFlow ecosystem overview. Read this before Sessions 11 & 12 to understand what TFO is, how it is architected, and how every concept from Sessions 1–10 maps to a concrete TFO feature. Includes the full Course → TFO Concept Map table.

Enterprise Training

🎓 docs/enterprise-training.md · Enterprise Observability Training

Commercial, instructor-led training that builds on this open course. Covers role-based tracks (SRE, DevOps, Leadership), delivery formats (on-site / virtual / hybrid), customization, TFO Practitioner and TFO Architect certification, and the engagement model. Vendor-neutral by default.


Enterprise Training

For teams that need to move from concept to production rollout, TelemetryFlow Academy offers commercial, instructor-led enterprise training that builds on this open course (CC BY-NC-SA 4.0).

| | Open Course | Enterprise Training |
|---|---|---|
| Delivery | Self-paced | Instructor-led, cohort-based |
| Customization | Fixed 12-session curriculum | Your stack, your services, your incidents |
| Certification | | TFO Practitioner / TFO Architect |
| Support | Community (GitHub Issues) | Private channel + office hours |
| License / terms | CC BY-NC-SA 4.0 (free) | Commercial services agreement |

Popular tracks include Observability Jumpstart (2d), OpenTelemetry Immersion (3d), SLO Engineering (2d), and the flagship TFO Production Rollout (5d). Vendor-neutral by default — TFO-specific modules are optional.

👉 See docs/enterprise-training.md for the full track catalog, delivery formats, certification pathways, and how to request a quote.


Installation / Local Use

This is a documentation/course repository — there is nothing to compile or run. To use the materials locally:

# Clone the repository
git clone https://github.com/telemetryflow/telemetryflow-academy.git
cd telemetryflow-academy

# Open the syllabus
open COURSE.md            # macOS; on Linux: xdg-open COURSE.md

# Open a slide deck (macOS; on Linux use xdg-open instead of open)
open "pdf/Mastering Observability - Session 01.pdf"

💡 The PDF decks are the primary learning artifact. They are self-contained and can be read without any additional tooling beyond a PDF viewer.


Contributing

We welcome contributions that improve clarity, fix errata, add translations, or propose additional sessions and exercises. Please read CONTRIBUTING.md first.

Some contribution ideas:

  • 🌍 Translations of slide decks or the syllabus
  • 🧪 Hands-on lab templates (Docker Compose, sample apps) to pair with a session
  • 📝 Errata & clarifications via issues or PRs
  • 💡 New session proposals for a follow-up course (e.g. Observability 201)

Code of Conduct

Participation in TelemetryFlow Academy — in issues, PRs, discussions, or any live session — is governed by our Code of Conduct. Please be excellent to each other.


License

TelemetryFlow Academy — Mastering Observability 101
© 2026 Telemetri Data Indonesia

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license — CC BY-NC-SA 4.0.


You are free to:

  • Share — copy and redistribute the material in any medium or format
  • Adapt — remix, transform, and build upon the material

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
  • NonCommercial — You may not use the material for commercial purposes without explicit permission.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

See the full legal code in LICENSE.


Author

Dwi Fahni Denni — Author & Instructor, Mastering Observability 101

TelemetryFlow Observability (TFO) — https://telemetryflow.id


Acknowledgments

  • The OpenTelemetry project and the broader CNCF observability community for establishing the vendor-neutral standard this course teaches.
  • The SRE and observability pioneers whose frameworks shape this curriculum — Google SRE (SLIs/SLOs/error budgets), the RED and USE methods (Tom Wilkie and Brendan Gregg), and the W3C Trace Context working group.
  • The Telemetri Data Indonesia community for reviewing and supporting the course.
  • Every learner who shares a "war story" — those stories make the material better for the next cohort.

Built with ❤️ by Telemetri Data Indonesia

Observability is a property of a system, not a tool. Go build observable systems.

About

TelemetryFlow Academy - Enterprise Training Observability (Mastering Observability 101)
