Create Performance Report for Legacy vs Polars Pipelines (Phases 2–9) #502

@mattsan-dev

Description

Overview
Several pipeline phases between Normalise (Phase 2) and Harmonise (Phase 9) have been migrated to Polars, but we do not yet have clear measurements showing the performance impact. A performance report is required to compare these phases against their legacy equivalents. This will help confirm improvements and highlight any areas that still require optimisation.

Tech Approach

  • Benchmark only Phases 2 to 9: NormalisePhase, ParsePhase, ConcatFieldPhase, and all subsequent phases up to HarmonisePhase.
  • Use representative datasets to test both legacy and Polars implementations under the same conditions.
  • Measure runtime for each phase.
  • Ensure tests are performed on consistent hardware to avoid environmental differences.
  • Produce a concise report with tables or charts summarising the results and any observed regressions or anomalies.
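The approach above could be sketched as a small timing harness. This is only an illustration, not the project's benchmarking code: `time_phase`, `compare_phases`, and `format_report` are hypothetical names, and each phase is assumed to be wrapped in a zero-argument callable that runs it on a fixed dataset.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical type: phase name -> (legacy callable, Polars callable).
PhasePair = Tuple[Callable[[], object], Callable[[], object]]


def time_phase(run: Callable[[], object], repeats: int = 5, warmup: int = 1) -> List[float]:
    """Return wall-clock durations (seconds) for `repeats` timed calls of `run`."""
    for _ in range(warmup):
        run()  # untimed warm-up call (caches, lazy imports)
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return durations


def compare_phases(phases: Dict[str, PhasePair], repeats: int = 5) -> List[dict]:
    """Time both implementations of each phase and report median runtimes."""
    rows = []
    for name, (legacy_run, polars_run) in phases.items():
        legacy_s = statistics.median(time_phase(legacy_run, repeats))
        polars_s = statistics.median(time_phase(polars_run, repeats))
        speedup = legacy_s / polars_s if polars_s > 0 else float("inf")
        rows.append(
            {"phase": name, "legacy_s": legacy_s, "polars_s": polars_s, "speedup": speedup}
        )
    return rows


def format_report(rows: List[dict]) -> str:
    """Render the results as a Markdown table for the report."""
    lines = ["| Phase | Legacy (s) | Polars (s) | Speedup |", "|---|---|---|---|"]
    for r in rows:
        lines.append(
            f"| {r['phase']} | {r['legacy_s']:.4f} | {r['polars_s']:.4f} | {r['speedup']:.2f}x |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in workloads; replace with callables that run the real phases.
    demo = {"NormalisePhase": (lambda: sorted(range(50_000)), lambda: sum(range(50_000)))}
    print(format_report(compare_phases(demo, repeats=3)))
```

Reporting the median of several repeats, rather than a single run, makes the figures less sensitive to one-off noise on the benchmark machine.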

Acceptance Criteria / Tests

  • A completed performance report comparing legacy and Polars performance for Phases 2 to 9 only.
  • Metrics include runtime and high-level system behaviour (for example, peak memory use) for each relevant phase.
  • Benchmarks are repeatable and test datasets are documented.
  • Any regressions or unexpected results are clearly identified.
  • Unit tests or performance checks are added or updated as needed so that future runs can be compared against a recorded baseline.
  • All benchmarking code complies with project formatting and linting standards.
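One way the regression and baseline-monitoring criteria could be checked is sketched below. It assumes a baseline stored as a JSON object mapping phase name to runtime in seconds; the file layout, the 10% tolerance, and the function names `load_baseline` and `find_regressions` are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def load_baseline(path: str) -> Dict[str, float]:
    """Read a baseline file shaped like {"ParsePhase": 1.23, ...} (assumed layout)."""
    return json.loads(Path(path).read_text())


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.10,  # assumed threshold: flag anything more than 10% slower
) -> Dict[str, Tuple[float, float]]:
    """Return {phase: (baseline_s, current_s)} for phases slower than the tolerance allows."""
    regressions = {}
    for phase, base_s in baseline.items():
        cur_s = current.get(phase)
        if cur_s is None:
            continue  # phase was not measured in this run
        if cur_s > base_s * (1 + tolerance):
            regressions[phase] = (base_s, cur_s)
    return regressions


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    baseline = {"ParsePhase": 1.0, "HarmonisePhase": 2.0}
    current = {"ParsePhase": 1.05, "HarmonisePhase": 2.5}
    for phase, (base_s, cur_s) in find_regressions(baseline, current).items():
        print(f"{phase}: {base_s:.2f}s -> {cur_s:.2f}s")
```

A check like this only stays meaningful if the baseline and the current run come from the same hardware and datasets, as the Tech Approach requires.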

Resourcing & Dependencies

  • Depends on Polars versions of Phases 2 to 9 being stable.
  • Can be completed by any developer able to run and compare both pipelines.
  • No external team involvement expected.


Status: Testing 🧪
