Skip to content

correlator-io/correlator-airflow

Repository files navigation

🔗 airflow-correlator

Accelerate Airflow incident resolution with automated correlation

PyPI version codecov Python Version License


What It Does

Automatically connects Airflow task executions to incident correlation:

  • Emits OpenLineage events for task lifecycle (START/COMPLETE/FAIL)
  • Links task failures to upstream data quality issues
  • Provides direct navigation from incident to root cause
  • Reuses all 50+ built-in OpenLineage extractors

Why It Matters

The Problem: When data pipelines fail, teams spend significant time manually hunting through Airflow logs, lineage graphs, and job histories to find the root cause.

What You Get: Automated correlation between Airflow task executions and data quality test results, putting you in control of incidents instead of reacting to them.

Key Benefits:

  • Faster incident resolution: Automated correlation reduces investigation time
  • Eliminate tool switching: One correlation view instead of navigating multiple dashboards
  • Instant root cause: Direct path from task failure to problematic upstream job
  • Zero-friction setup: Simple configuration, no code changes required

Built on Standards: Uses OpenLineage, the industry standard for data lineage. No vendor lock-in, no proprietary formats.


Quick Start

IMPORTANT: Requires Airflow 2.11.0+ and apache-airflow-providers-openlineage>=2.4.0

1. Install

pip install correlator-airflow apache-airflow-providers-openlineage

2. Configure OpenLineage Transport

Option A: openlineage.yml (Recommended)

Create openlineage.yml in your Airflow home directory:

transport:
  type: correlator
  url: http://localhost:8080
  api_key: ${CORRELATOR_API_KEY}

Option B: Environment Variable

export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "correlator", "url": "http://localhost:8080"}'

3. Run Your DAGs

That's it. Your Airflow task executions are now being correlated.


How It Works

airflow-correlator provides a custom OpenLineage transport that integrates with Airflow's built-in OpenLineage provider:

Airflow Task → [OL Provider Listener] → [OL Extractors] → [CorrelatorTransport] → Correlator
              └────── Built into Airflow ──────┘         └─── This plugin ───┘

See Architecture for technical details.


Versioning

This package follows Semantic Versioning with the following guidelines:

  • 0.x.y versions (e.g., 0.1.0, 0.2.0) indicate initial development phase:

    • The API is not yet stable and may change between minor versions
    • Features may be added, modified, or removed without major version changes
    • For production-critical systems, please pin a version that works in your environment
  • 1.0.0 and above will indicate a stable API with semantic versioning guarantees:

    • MAJOR version for incompatible API changes
    • MINOR version for backwards-compatible functionality additions
    • PATCH version for backwards-compatible bug fixes

The current version is in early development stage, so expect possible API changes until the 1.0.0 release.


Documentation

For detailed usage, configuration, and development:


Requirements

  • Python 3.9+
  • Airflow 2.11.0+ ONLY (older versions NOT supported)
  • apache-airflow-providers-openlineage>=2.4.0
  • Correlator

Links


License

Apache 2.0 - See LICENSE for details.