-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Overview
Implement an automated benchmarking workflow using Airflow on AWS ECS to generate, store, and track performance reports for the digital-land-python phase pipeline. Reports will be centrally stored in AWS for monitoring and historical analysis.
Current State
Benchmarking is run manually or locally.
No centralized storage or historical record of performance.
No automation or scheduled execution.
Current performance baseline: 2.7× improvement post‑Polars refactor.
Desired State
Automated daily/scheduled benchmarking via Airflow DAG.
ECS‑based container execution for scalability and isolation.
Reports stored in S3 with versioning + lifecycle policies.
Metrics and reports accessible via dashboards/APIs.
Ability to track historical performance trends.
Acceptance Criteria
Airflow DAG created for end‑to‑end benchmarking orchestration.
Benchmarking script containerized (Dockerfile for ECS).
IAM roles/policies configured for ECS tasks + S3 access.
S3 bucket configured with:
Versioning
Lifecycle policies
Structured folder layout: benchmarks/{date}/{report_name}
Collect performance metrics including:
Phase execution times
Memory usage
Data throughput (rows/sec)
Total pipeline runtime
Generate reports in JSON, CSV, and HTML formats.
Airflow task dependencies, retries, and error handling in place.
CloudWatch monitoring enabled for ECS tasks.
Deployment steps fully documented.
Alerts set up for performance degradation (>10% drop).
All integration tests pass in the containerized environment.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status