Rapid local performance test environment supporting the Polars‑based transformation rewrite in digital-land-python #475

@mattsan-dev

Overview
A streamlined local test environment is needed to speed up the transformation logic rewrite being delivered as part of the Polars prototype initiative. The current Docker setup is of limited use: it is minimally documented, and the project has moved away from the all‑in‑one run.sh workflow. The docker‑compose configuration was written to support that older workflow and is no longer actively used. A simpler alternative based on a Python virtual environment will therefore be adopted.

This new venv‑based environment will provide a rapid and lightweight platform for developing and optimising the Polars‑driven transformation flow, which is intended to replace the current row‑by‑row processing model. Buckinghamshire Council will continue to serve as the Local Authority test dataset.
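To make the contrast between the two processing models concrete, here is a minimal sketch. The `reference` column, the upper-casing rule and both function names are hypothetical and chosen only for illustration; they are not taken from the digital-land-python codebase. The Polars import sits inside the function so the module still loads when the optional dependency is absent.

```python
from typing import Dict, List

Row = Dict[str, str]


def normalise_rows(rows: List[Row]) -> List[Row]:
    """Row-by-row style: visit each record and rewrite one field at a time."""
    out = []
    for row in rows:
        updated = dict(row)  # copy so the input rows are left untouched
        updated["reference"] = row["reference"].upper()
        out.append(updated)
    return out


def normalise_polars(rows: List[Row]) -> List[Row]:
    """Columnar style: express the same rule once for the whole column."""
    import polars as pl  # lazy import: Polars is assumed to be optional

    df = pl.DataFrame(rows)
    # The expression keeps the name "reference", so it replaces that column.
    df = df.with_columns(pl.col("reference").str.to_uppercase())
    return df.to_dicts()
```

Both functions should return the same records for the same input, which is what allows the two paths to be timed against each other.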

The purpose of this environment is strictly performance optimisation. It does not replace or replicate the established correctness testing mechanism. All functional correctness will continue to be validated through the existing unit, integration and acceptance test suites.

Tech Approach

  • Create a new directory local_testing/ inside the digital-land-python repository with an appropriate structure.
  • Add Polars as an optional dependency group, used only within the rapid test environment.
  • Load Buckinghamshire Council data into data/raw/ or provide an automated download mechanism.
  • Implement the new transformation logic (prototype rewrite) inside polars_phases.py.
  • Provide a fast performance runner in main.py that:
    • loads the Buckinghamshire sample data
    • runs current row‑by‑row logic
    • runs the Polars‑based rewritten logic
    • outputs execution time and memory comparison
  • Add Makefile targets for setup, running tests, and performance benchmarking.
  • Document the purpose, scope and usage of the environment in both READMEs.
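The performance runner described above could be sketched roughly as follows, using only the standard library. This is an illustration under stated assumptions, not the planned main.py: `legacy_transform` and `rewritten_transform` are hypothetical stand-ins for the existing row‑by‑row logic and the Polars rewrite, and the whitespace-stripping rule is invented for the example.

```python
import time
import tracemalloc
from typing import Callable, Dict, List

Row = Dict[str, str]
Transform = Callable[[List[Row]], List[Row]]


def legacy_transform(rows: List[Row]) -> List[Row]:
    """Hypothetical stand-in for the current row-by-row logic."""
    out = []
    for row in rows:
        out.append({key: value.strip() for key, value in row.items()})
    return out


def rewritten_transform(rows: List[Row]) -> List[Row]:
    """Hypothetical stand-in for the rewrite (would call polars_phases.py)."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def measure(label: str, func: Transform, rows: List[Row]) -> dict:
    """Run func once and record wall-clock time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(rows)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "label": label,
        "seconds": elapsed,
        "peak_bytes": peak,
        "rows_out": len(result),
    }


def compare(rows: List[Row]) -> List[dict]:
    """Measure both paths against the same input rows."""
    return [
        measure("legacy", legacy_transform, rows),
        measure("rewrite", rewritten_transform, rows),
    ]


def format_report(results: List[dict]) -> str:
    """Render one line per path for printing to the terminal."""
    return "\n".join(
        f"{r['label']:<8} {r['seconds']:.4f}s  "
        f"peak {r['peak_bytes'] / 1024:.1f} KiB  rows {r['rows_out']}"
        for r in results
    )
```

Two caveats apply to this sketch. `tracemalloc` only sees allocations made through Python's allocator, so memory used inside Polars' native engine would not appear in `peak_bytes`; a process-level measure would be needed for a fair memory comparison. Tracing also slows execution, so timing and memory are better collected in separate runs when precise numbers matter.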

Acceptance Criteria / Tests

  • A developer can set up and run the environment in under ten minutes on an Apple Silicon Mac.
  • Polars installs cleanly using the setup script or Makefile.
  • Buckinghamshire Council sample data loads correctly into both legacy and Polars‑based transformation paths.
  • The performance runner outputs clear comparisons between the existing row‑by‑row logic and the prototype rewrite.
  • The environment supports rapid iteration of the transformation rewrite without running the full Digital Land pipeline.
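Some of these criteria can be checked by a small script run after setup. The sketch below is one possible shape for such a check; the function name and the minimum Python version are assumptions, since the issue does not state a required version.

```python
import importlib.util
import platform
import sys
from typing import Tuple


def check_environment(min_python: Tuple[int, int] = (3, 8)) -> dict:
    """Report basic facts about the interpreter the check runs under.

    min_python is an assumed default, not a documented project requirement.
    """
    return {
        # True when the interpreter meets the assumed minimum version.
        "python_ok": sys.version_info[:2] >= min_python,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # find_spec locates the package without importing it.
        "polars_installed": importlib.util.find_spec("polars") is not None,
        # Reports the CPU architecture, e.g. "arm64" on Apple Silicon macOS.
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A Makefile target could call this script after installing dependencies, failing the setup step when `polars_installed` is false.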

Resourcing & Dependencies

  • Requires confirmation of which Buckinghamshire datasets should be included under data/raw/.
  • Suitable for any Python‑capable engineer.
  • No external dependencies unless dataset extraction or anonymisation triggers governance review.

Metadata

Labels

enhancement (New feature or request)

Projects

Status

Done - This Period
