Skip to content

supreetha9/operations-control-tower

Repository files navigation

Operations Control Tower

Customer Support Analytics with Airflow + dbt + DuckDB + Custom Process Mining + Streamlit

A production-style operations analytics platform that simulates a year of Zendesk-style ticket lifecycle (50K tickets, 250K events), orchestrates the whole pipeline with Apache Airflow, builds analytical marts with dbt, runs custom pandas-based process mining (transition matrix, variant analysis, bottleneck scoring), and surfaces SLA risk and bottleneck recommendations through a 6-page Streamlit dashboard.


Business Problem

A growing B2B SaaS support team handles ~50K tickets/year through three groups (Tier 1 Support → Tier 2 Specialists → Engineering) with strict SLAs by ticket priority. Leadership needs answers to five questions every Monday morning:

  1. Are we hitting SLA? Per group, per priority, per ticket type.
  2. Where do tickets pile up? Which group is the real bottleneck — and is it cycle time, volume, or capacity?
  3. Will we have enough people next month? Volume forecast vs current FTE.
  4. What's hurting CSAT? Reopens? Escalations? Long cycles? Specific groups?
  5. What should I do about it? Concrete, prioritized actions — not just dashboards.

Key Findings (after make pipeline)

Finding Detail
Annual ticket volume ~50,000 tickets / ~250,000 events
SLA attainment ~85-90% overall; lowest on urgent-priority incidents
Top bottleneck group Tier 2 Specialists (highest cycle × volume × breach product)
Capacity gap Tier 2 understaffed ~10-15 days/month at current FTE
Reopen / escalation impact Each escalation adds ~3-5h cycle time on average
Process conformance ~60-70% of tickets follow the documented happy path

Tech Stack

Layer Tool Why
Source schema Zendesk-style (tickets, ticket_events, groups, sla_policies, satisfaction_ratings) Instant recognition for support-ops hiring managers
Warehouse DuckDB File-based, zero infra, dbt-supported
Transformation dbt-core + dbt-duckdb Industry-standard transformation layer
Orchestration Apache Airflow (Docker Compose) Industry standard for ops analytics scheduling
Process mining Custom pandas (no pm4py) Lightweight, readable, no heavy dep
Forecasting statsmodels (Holt-Winters) Weekly seasonality on daily ticket volume
Anomaly detection Rolling z-score Standard for ops monitoring
Dashboard Streamlit + Plotly 6 pages incl. Sankey + transition heatmap + at-risk alerts
Documentation Sphinx + Furo Production-quality docs
Environment pyenv + hatchling Reproducible builds

Architecture

                       ┌─────────────────────────────┐
                       │  Apache Airflow (Docker)    │
                       │  ops_daily_pipeline @daily  │
                       └──────────────┬──────────────┘
                                      │
                                      ▼
Synthetic Zendesk-style Generator (Python)
    ↓  6 Parquet files in data/raw/
       agents · sla_policies · tickets · ticket_events · ticket_costs · satisfaction_ratings
       │
dbt Project
    ↓  6 staging views (with schema tests)
    ↓  2 intermediate models (int_ticket_lifetime, int_event_transitions)
    ↓  5 mart tables (sla, group_performance, agent_productivity, ticket_outcomes, daily_ops)
    ↓  + 2 seeds (sla_policy_targets, group_capacity) + 2 macros + 80+ data tests
DuckDB Warehouse (data/ops.duckdb)
    ↓
Python Analysis
    ├─ Process mining: transition matrix, variants, bottleneck score, conformance
    ├─ Holt-Winters volume forecast
    ├─ Rolling z-score anomaly detection
    └─ At-risk SLA alert writer (data/alerts.log)
    ↓
Streamlit Dashboard (6 pages, Sankey, heatmap, recommendations)

Quick Start

Prerequisites: pyenv with pyenv-virtualenv. Docker Desktop only needed if you want to run Airflow locally.

# 1. Create the virtual environment
pyenv install -s 3.13.3
pyenv virtualenv 3.13.3 ops_env
pyenv local ops_env

# 2. Install all dependencies (Python + dbt-utils)
make all-env

# 3. Run the full pipeline (generate data → dbt build → analysis → alerts)
make pipeline

# 4. Launch the dashboard
make app

The dashboard opens at http://localhost:8501.

To bring up Airflow locally and trigger the same pipeline on a schedule:

cp .env.example .env
make airflow-up
make airflow-trigger    # or wait for the @daily schedule

Airflow UI: http://localhost:8080 (default admin / admin).

Run make help to see all available targets.

Dashboard Pages

Page What it shows
Executive Overview 6 top-line KPIs (volume, SLA, cycle time, backlog, cost, CSAT) + 30-day trend lines
Queue Performance Per-group cycle/breach table sorted by bottleneck score + live at-risk SLA alerts
Bottleneck Analysis Plotly Sankey of process flow + transition-matrix heatmap + top-10 variants + bottleneck bars
Capacity Planning 60d actuals + 30d Holt-Winters forecast + interactive per-group capacity-gap simulator
Quality Outcomes CSAT histogram, reopen/escalation rates by group, churn-after-ticket by tier
Recommendations Auto-generated prioritized actions from bottleneck + capacity + outcome analysis

dbt Project Structure

dbt_project/
├── dbt_project.yml          # project config
├── profiles.yml             # DuckDB adapter, target = ../data/ops.duckdb
├── packages.yml             # dbt-utils dependency
│
├── models/
│   ├── _sources.yml         # 6 raw Parquet sources (Zendesk schema)
│   ├── staging/             # 6 stg_* views (clean + cast + sla_status macro)
│   │   └── _staging.yml     # 30+ schema tests
│   ├── intermediate/        # ephemeral CTEs
│   │   ├── int_ticket_lifetime.sql
│   │   └── int_event_transitions.sql
│   └── marts/               # 5 analytical mart tables
│       └── _marts.yml       # mart-level contracts + tests
│
├── seeds/
│   ├── sla_policy_targets.csv     # ops attainment targets (95% urgent, 85% low, ...)
│   └── group_capacity.csv         # FTE capacity per group
│
├── macros/
│   ├── sla_status.sql       # within_sla / breached classifier
│   └── cycle_hours.sql      # standardized fractional-hour DATE_DIFF
│
└── tests/
    ├── assert_no_negative_cycle_time.sql
    └── assert_closed_tickets_have_close_ts.sql

A hiring manager opening this project sees a complete dbt project: sources, seeds, staging, intermediate, marts, schema tests, singular tests, macros — and the orchestration layer (Airflow) that operations-analytics roles screen for at companies like Zendesk, Atlassian, Intercom, GitLab, and Stripe.

Project Structure

operations-control-tower/
├── Makefile                    # pyenv-aware build targets (incl. dbt + Airflow)
├── pyproject.toml              # deps: dev / dbt / streamlit / docs
├── .python-version             # locks pyenv virtualenv (ops_env)
├── docker-compose.yml          # Airflow 2.10 + Postgres on LocalExecutor
├── .env.example                # AIRFLOW_UID / USERNAME / PASSWORD
│
├── data/
│   ├── raw/                    # 6 generated Parquet files (gitignored)
│   ├── sample/                 # small CSVs for inspection
│   ├── ops.duckdb              # dbt build target (gitignored)
│   └── alerts.log              # at-risk SLA alerts (gitignored)
│
├── airflow/
│   ├── dags/ops_pipeline.py    # daily DAG: generate → dbt build → analysis → alerts
│   ├── plugins/.gitkeep
│   └── logs/                   # gitignored
│
├── dbt_project/                # see dbt structure above
│
├── python/src/
│   ├── generate_data.py        # synthetic Zendesk-style event-log simulator
│   ├── process_mining.py       # transition matrix, variants, bottleneck, conformance
│   ├── analysis.py             # forecast, anomaly, capacity gap, CLI orchestration
│   ├── alerts.py               # at-risk SLA scanner + JSON alert log writer
│   └── metrics.py              # KPI helpers (safe_rate, format_pct, sla_bucket, ...)
│
├── streamlit_app/
│   ├── app.py                  # entry point
│   ├── pages/                  # 6 dashboard pages (incl. Sankey, heatmap)
│   └── utils/data_loader.py    # cached DuckDB loaders
│
├── tests/                      # 95+ pytest tests across 5 files
│
└── docs/                       # Sphinx documentation (incl. process_mining.rst, orchestration.rst)

Tests & Quality

Two layers of testing:

Layer Count Run via
dbt data tests 80+ (schema + singular + relationships) make dbt-build
Python pytest 95+ (unit + process-mining + integration) make test

make lint runs ruff over all Python code.

Documentation

Full project documentation built with Sphinx:

make docs        # live-reload server on port 8000
make docs-build  # one-shot HTML build to docs/_build/html

dbt's own lineage docs:

make dbt-docs    # serves at port 8081

License

MIT

About

Customer-support operations control tower: process mining, SLA bottleneck detection, and capacity forecasting on 250K+ workflow events across 50K cases. Airflow + dbt + DuckDB + Pandas + 6-page Streamlit dashboard.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors