Description
Overview
Optimize the digital-land-python Polars pipeline to exceed the current 2.7× performance baseline.
Create a single Airflow DAG that runs the transform pipeline for any dataset using a dataset name parameter (agreed with Owen on 16‑Feb‑2026).
Current State
The pipeline runs with the legacy phase structure and materializes data between phases.
Current speed: 2.7× faster than the old approach.
Desired State
The Polars pipeline is fully lazy, with no intermediate collects, and parallelized where possible.
Performance target: >3.2× improvement.
This report was generated locally.
Acceptance Criteria
Pipeline
- Bottlenecks identified and optimized.
- Apply Polars lazy optimisations across all refactored phases.
- All tests pass with performance gain.
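One way to make "performance gain" checkable is a small timing helper that compares the legacy and new runs against the 3.2× target. This is only a sketch. The helper names are hypothetical, and it assumes both figures are speedups relative to the old approach, measured on the same input and machine.

```python
import time
from typing import Callable

BASELINE_SPEEDUP = 2.7  # current speedup quoted in this issue
TARGET_SPEEDUP = 3.2    # target speedup quoted in this issue


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    # Return the fastest of several wall-clock runs to reduce noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(legacy_seconds: float, new_seconds: float) -> float:
    # Speedup is legacy runtime divided by new runtime.
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return legacy_seconds / new_seconds


def meets_target(legacy_seconds: float, new_seconds: float,
                 target: float = TARGET_SPEEDUP) -> bool:
    # The target is strict: the speedup must exceed it, not just equal it.
    return speedup(legacy_seconds, new_seconds) > target
```

For example, `meets_target(best_time(run_legacy), best_time(run_polars))` could back a performance test, where `run_legacy` and `run_polars` are placeholders for the two pipeline entry points.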
Technical Considerations
Maximize Polars lazy mode, avoid premature materialization.
Explore parallel execution where safe.
Maintain backward compatibility.
Reuse existing ECS/Airflow patterns and document supported dataset names.