Description
Overview
The data pipeline is being incrementally refactored to use Polars for better performance. Parts of the system still expect stream objects, which creates friction during the migration. Two conversion utilities, one for each direction, will let both formats coexist and keep the transition controlled and low-risk.
Assumptions
Data validation is handled in earlier pipeline phases, so these utilities only need lightweight structural checks.
Tech Approach
- Create a utility class that takes a Python stream object and returns a Polars DataFrame using standard Polars constructors.
- Create a second utility class that converts a Polars DataFrame back to a Python stream, preserving types and nested structures where possible.
- Ensure both utilities include simple validation and logging so that unexpected field structures can be identified early.
- Provide internal documentation explaining the expected input and output shapes for each utility.
- Relevant links for guidance:
  - Polars DataFrame documentation: https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/index.html
  - Polars conversion functions overview: https://pola-rs.github.io/polars/py-polars/html/reference/api/index.html
Acceptance Criteria / Tests
- A stream object can be successfully converted into a Polars DataFrame.
- The resulting DataFrame is validated against the expected columns and row count.
- A Polars DataFrame can be converted back into a stream that matches the original structure where feasible.
- Unit tests are created for both new utility classes.
Resourcing and Dependencies
- No prerequisite tickets are required, although parallel work on pipeline refactoring may influence timelines.
- Any engineer familiar with the data pipeline and Polars can complete this ticket.
- No dependencies on external teams, although the Data Engineering team should be informed once the utilities are ready for adoption in the migration work.
Metadata
Labels
No labels
Type
Projects
Status
In Review / QA 🔎