Inference and monitoring by Sebastijan-Dominis · Pull Request #2 · Sebastijan-Dominis/ml-workflow-engine

Sebastijan-Dominis · 2026-03-28T13:55:56Z

Description

The main contributions are two new pipelines, infer.py and monitor.py.
They enable model inference and monitoring, which
completes the model lifecycle.

While writing these, several bugs were discovered, some related and some not.
This branch includes many bug fixes, updated documentation, added tests,
and some data and artifacts, so that the user can view the expected outputs of
the relevant pipelines immediately, without running anything. The main README.md
has been updated as well and includes some new GIFs.

Type of change

feature
fix
docs

Checklist

Tests added or updated
Tests passing
Documentation updated if needed
CI checks pass

Improved the data preprocessing orchestration script to always skip raw snapshot registration if metadata already exists, regardless of the `--skip-if-existing` flag. This ensures that existing metadata is never overwritten, which could lead to inconsistencies. Updated the fake data generator to include PII columns, so that downstream pipelines run without manual changes. Updated the defaults in the same script to train a new model and to generate 5000 rows of data. This ensures seamless downstream execution, since feature sets require at least 5000 rows. Changed the minimum row requirement in the interim hotel bookings config to 5000 as well; creating a new version of the interim configs for this purpose alone would be overkill at this point. Generated some fake data and snapshot bindings. Updated a test that was broken by the changes in the data preprocessing orchestrator. Prepared some code that will later be adapted when creating infer.py. Data, feature sets and experiments are all still held locally for now, but will be added to the repo eventually. Now is not the time to do so, as the code is still evolving rapidly.
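The registration rule described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this branch: the helper name `should_register_snapshot` and the `metadata.json` file name are hypothetical.

```python
from pathlib import Path


def should_register_snapshot(snapshot_dir: Path, skip_if_existing: bool) -> bool:
    """Decide whether a raw snapshot should be registered.

    Hypothetical helper; the metadata file name is an assumption.
    """
    metadata_path = snapshot_dir / "metadata.json"
    # Existing metadata is never overwritten, so `skip_if_existing`
    # is accepted for interface compatibility but does not change
    # the outcome.
    if metadata_path.exists():
        return False
    return True
```

Keeping the flag in the signature while ignoring it for this decision means existing callers would not need to change, which is one plausible reason to structure it this way.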

Wrote an initial draft of infer.py, which will be the main entry point for inference runs. It is not modularized yet, but it is a starting point for the overall structure of inference runs. This required some upstream changes as well. Tests have been updated too. Marked infer.py as in progress.

Improved infer.py to include per-class probability columns, and to output some useful metadata.
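As an illustration of what per-class probability columns might look like, here is a minimal sketch. The function name, the `prediction` column, and the `proba_<label>` naming scheme are assumptions made for this example and are not taken from infer.py.

```python
from typing import Sequence


def add_probability_columns(
    rows: Sequence[dict],
    probabilities: Sequence[Sequence[float]],
    class_labels: Sequence[str],
) -> list[dict]:
    """Return copies of `rows` with a predicted label and one
    probability column per class (hypothetical column names)."""
    if len(rows) != len(probabilities):
        raise ValueError("rows and probabilities must have the same length")
    output = []
    for row, probs in zip(rows, probabilities):
        if len(probs) != len(class_labels):
            raise ValueError("each probability vector needs one entry per class")
        enriched = dict(row)  # copy, so the input rows stay untouched
        # The predicted class is the one with the highest probability.
        best = max(range(len(probs)), key=probs.__getitem__)
        enriched["prediction"] = class_labels[best]
        for label, p in zip(class_labels, probs):
            enriched[f"proba_{label}"] = p
        output.append(enriched)
    return output
```

Emitting every class probability alongside the predicted label lets downstream consumers apply their own thresholds instead of relying only on the top class.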

Wrote the monitor.py pipeline, which also required a few changes in the upstream code.

Modularized both inference and monitoring logic; added some docstrings; removed unused imports.

Added some print statements so the frontend can show the user where the artifacts are being saved. Updated requirements.txt, since Docker needed it. Updated docker-compose.yml to mount the directories in use as volumes; some of them were missing earlier.

Improved UI by adding background colors to the result text areas for pipelines and scripts, indicating success (green) or error (red) based on the presence of an "error" key in the result. Removed some parts of the comments from docker-related files.
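The color rule above amounts to a key-presence check. A minimal sketch, assuming the result is a plain dictionary; the function name `result_background` is hypothetical and the actual frontend code may differ:

```python
def result_background(result: dict) -> str:
    """Pick a background color for a result text area."""
    # Only the presence of the "error" key matters, not its value.
    return "red" if "error" in result else "green"
```

For example, `result_background({"status": "ok"})` gives `"green"`, while `result_background({"error": "missing config"})` gives `"red"`.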

Updated all of the relevant documentation, as well as the main README.md. Fixed a bug in .gitignore where env configs were ignored, while some of the log files were not. Fixed a bug in the fake data generator that caused failures due to poor datetime handling. This also required a change in ml/components/feature_engineering/arrival_date.py, where the feature was not handled correctly and assumed ideal scenarios. Added some tests for the snapshot binding generator. Updated pyproject.toml to ignore coverage for the fake data generator, since it requires very specific packages and is not a core part of the repo anyway. Added some new GIFs and UML diagrams and updated the old ones as part of the documentation update. Included some artifacts and logs in the commit, so that the user can immediately see the expected structure and content of the outputs of some of the relevant pipelines and scripts, without having to run them first. Only a handful were included, to avoid cluttering the repo with too many files.

Sebastijan-Dominis added 10 commits March 25, 2026 06:58

Improved infer.py; [skip ci]

5d46398

Improved infer.py to include per-class probability columns, and to output some useful metadata.

First idea for monitor.py; [skip ci]

cfb0ef7

Wrote the monitor.py pipeline, which also required a few changes in the upstream code.

Modularized the post-promotion code; [skip ci]

5a11f6d

Modularized both inference and monitoring logic; added some docstrings; removed unused imports.

Added post-prom. pipelines to ml_service; [skip ci]

d161ce3

Added tests.

ef78dcf

Sebastijan-Dominis self-assigned this Mar 28, 2026

Sebastijan-Dominis merged commit f32680e into main Mar 28, 2026
2 checks passed

Sebastijan-Dominis deleted the inference-and-monitoring branch March 28, 2026 14:03

Sebastijan-Dominis merged 10 commits into main from inference-and-monitoring