Inference and monitoring #2

Merged
Sebastijan-Dominis merged 10 commits into main from inference-and-monitoring
Mar 28, 2026

@Sebastijan-Dominis
Owner

Description

The main contribution is two new pipelines, infer.py and monitor.py.
They enable model inference and monitoring, which completes the model
lifecycle.

While writing these, several bugs were discovered, some related to the new
pipelines and some not. This branch includes many bug fixes, updated
documentation, added tests, and some data and artifacts, so that the user
can view the expected outputs of the relevant pipelines immediately,
without running anything. The main README.md has been updated as well and
includes some new gifs.

Type of change

  • feature
  • fix
  • docs

Checklist

  • Tests added or updated
  • Tests passing
  • Documentation updated if needed
  • CI checks pass

Improved the data preprocessing orchestration script to always skip
raw snapshot registration if metadata already exists, regardless of the
`--skip-if-existing` flag. This ensures that we don't attempt to
overwrite existing metadata, which could lead to inconsistencies.
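
The registration rule described above can be sketched as follows. This is a minimal illustration, not the orchestration script's actual code: the function name `should_register_snapshot` and the idea of a single metadata file path are assumptions made for the example.

```python
from pathlib import Path


def should_register_snapshot(metadata_path: Path, skip_if_existing: bool = False) -> bool:
    """Return True only when no metadata exists yet for the raw snapshot.

    Hypothetical helper: `skip_if_existing` is kept in the signature to mirror
    the CLI flag, but it no longer changes the outcome, because existing
    metadata is never overwritten.
    """
    # Assumption: metadata lives in one file whose presence marks a registered snapshot.
    return not metadata_path.exists()
```

With this shape, calling the helper with `skip_if_existing=True` or `False` gives the same answer for a given path, which is the behavior the change aims for.
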

Updated the fake data generator to include PII columns, so that downstream
pipelines run seamlessly. Updated the defaults in the same script to train
a new model and to generate 5000 rows of data. This ensures seamless
downstream execution, since feature sets require at least 5000 rows.
Changed the minimum row requirement in the interim hotel bookings config
to 5000 as well. Creating a new version of interim configs for this
purpose alone would be overkill at this point. Generated some fake data
and snapshot bindings. Updated a test that was broken by the changes in
the data preprocessing orchestrator. Prepared some code to adapt later
when creating infer.py. Data, feature sets and experiments are still all
held locally for now, but will be added to the repo eventually. Now is not
the time to do so, as the code is still evolving rapidly.
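
The 5000-row minimum mentioned above amounts to a simple guard before feature sets are built. The sketch below only illustrates the idea; the names `MIN_ROWS` and `check_min_rows` are hypothetical, and in the repo the threshold comes from a config file rather than a constant.

```python
MIN_ROWS = 5000  # threshold quoted in the description; in the repo it lives in a config


def check_min_rows(n_rows: int, min_rows: int = MIN_ROWS) -> None:
    """Raise ValueError when a dataset has fewer rows than the required minimum."""
    if n_rows < min_rows:
        raise ValueError(f"Expected at least {min_rows} rows, got {n_rows}.")
```

A generator default of 5000 rows passes this check exactly, which is why the two values were aligned.
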

Wrote an initial version of infer.py, which will be the main entry point
for inference runs. It is not modularized yet, but it establishes the
overall structure of inference runs. This required some upstream changes
as well. Tests have been updated too. Marked infer.py as in progress.

Improved infer.py to include per-class probability columns, and to output
some useful metadata.
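
To make "per-class probability columns" concrete, here is a small standard-library sketch. It is not taken from infer.py: the function name, the `proba_` prefix and the `prediction` column are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Sequence


def add_probability_columns(
    rows: Sequence[Dict[str, Any]],
    classes: Sequence[Any],
    probabilities: Sequence[Sequence[float]],
    prefix: str = "proba_",  # hypothetical column prefix
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with one probability column per class."""
    if len(rows) != len(probabilities):
        raise ValueError("rows and probabilities must have the same length")
    enriched_rows = []
    for row, probs in zip(rows, probabilities):
        if len(probs) != len(classes):
            raise ValueError("each probability row needs one value per class")
        enriched = dict(row)  # copy so the input rows are left untouched
        for label, p in zip(classes, probs):
            enriched[f"{prefix}{label}"] = float(p)
        # The predicted class is the one with the highest probability.
        best = max(range(len(classes)), key=lambda i: probs[i])
        enriched["prediction"] = classes[best]
        enriched_rows.append(enriched)
    return enriched_rows
```

For a binary model with classes `[0, 1]`, each output row gains `proba_0`, `proba_1` and `prediction`.
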

Wrote the monitor.py pipeline, which also required a few changes in the
upstream code.

Modularized both inference and monitoring logic; added some docstrings;
removed unused imports.

Added some prints for the frontend to show the user where the artifacts
are being saved. Updated requirements.txt, since Docker needed it.

Updated docker-compose.yml to mount the directories in use as volumes.
Some of them were missing earlier.

Improved the UI by adding background colors to the result text areas for
pipelines and scripts, indicating success (green) or error (red) based
on the presence of an "error" key in the result. Removed some parts of
the comments from Docker-related files.
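
The color rule above reduces to a single membership check. The sketch below expresses it in Python for illustration only; the function name `result_background` is hypothetical, and the actual frontend may implement the same rule in a different language or framework.

```python
from typing import Any, Mapping


def result_background(result: Mapping[str, Any]) -> str:
    """Pick a background color for a result text area.

    Red when the result carries an "error" key, green otherwise.
    """
    return "red" if "error" in result else "green"
```

Note that the rule depends on the key being present, not on its value, so a result such as `{"error": None}` would still be shown in red under this reading.
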

Fully updated all of the relevant documentation, as well as the main
README.md.

Fixed a bug in .gitignore where env configs were ignored, while some of
the log files were not.

Fixed a bug in the fake data generator that caused failures due to
poor datetime handling. This also required a change in
ml/components/feature_engineering/arrival_date.py, where the feature
was handled incorrectly because the code assumed ideal inputs.
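
The description does not say how the datetime handling was fixed, so the following is only a sketch of what "not assuming ideal scenarios" could look like. It assumes, hypothetically, that an arrival date is assembled from separate year, month and day values, with the month given either as a number or as an English month name; the function name `safe_arrival_date` is invented for the example.

```python
from datetime import date
from typing import Any, Optional

# English month names mapped to month numbers (assumed input format).
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}


def safe_arrival_date(year: Any, month: Any, day: Any) -> Optional[date]:
    """Build a date from loosely typed parts, returning None for bad input."""
    try:
        if isinstance(month, str) and not month.strip().isdigit():
            month_number = _MONTHS[month.strip().lower()]
        else:
            month_number = int(month)
        return date(int(year), month_number, int(day))
    except (KeyError, TypeError, ValueError):
        # Unknown month name, missing value, or impossible date such as Feb 30.
        return None
```

Returning `None` instead of raising lets the caller decide whether to drop, impute or flag the row.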

Added some tests for the snapshot binding generator.

Updated pyproject.toml to ignore coverage for the fake data generator,
since it requires very specific packages, and is not a core part of the
repo anyway.

Added some new gifs and UML diagrams and updated the old ones, as part
of the documentation update.

Committed some artifacts and logs, so that the user can immediately see
the expected structure and content of the outputs of some of the relevant
pipelines and scripts, without having to run them first. Only a handful
are included, to avoid cluttering the repo with too many files.
@Sebastijan-Dominis Sebastijan-Dominis self-assigned this Mar 28, 2026
@Sebastijan-Dominis Sebastijan-Dominis merged commit f32680e into main Mar 28, 2026
2 checks passed
@Sebastijan-Dominis Sebastijan-Dominis deleted the inference-and-monitoring branch March 28, 2026 14:03