This project is intended as a portfolio-grade ML batch system.
The goal is to mirror real-world ML workflows (data pipelines, reproducible training, model promotion, and result reporting) rather than notebook-only experimentation or ad-hoc tuning.
ETL
- Ingest churn history.
- Apply bronze -> silver -> gold transformations.
- Produce train-ready data tables.
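The layered transformations above can be sketched in plain Python. This is a minimal illustration only: the column names (`customer_id`, `tenure_months`, `monthly_charges`, `churned`) and the cleaning rules are hypothetical and are not taken from the project's actual schema.

```python
# Hypothetical sketch of bronze -> silver -> gold; all column names are assumptions.

def to_silver(bronze_rows):
    """Clean raw (bronze) records: drop rows without an id, cast types,
    and keep only the latest record per customer."""
    cleaned = {}
    for row in bronze_rows:
        customer_id = (row.get("customer_id") or "").strip()
        if not customer_id:
            continue  # unusable without an identifier
        # Later rows overwrite earlier ones, so the last record per customer wins.
        cleaned[customer_id] = {
            "customer_id": customer_id,
            "tenure_months": int(row["tenure_months"]),
            "monthly_charges": float(row["monthly_charges"]),
            "churned": row["churned"].strip().lower() == "yes",
        }
    return list(cleaned.values())


def to_gold(silver_rows):
    """Shape cleaned (silver) records into a train-ready table with a 0/1 label."""
    return [
        {
            "customer_id": r["customer_id"],
            "tenure_months": r["tenure_months"],
            "monthly_charges": r["monthly_charges"],
            "label": int(r["churned"]),
        }
        for r in silver_rows
    ]
```

Keeping each layer a pure function of the previous one makes reruns reproducible: the same bronze input always yields the same gold table.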
Train
- Config-driven modeling produces trained artifacts and metadata.
- Supported models: Logistic Regression, XGBoost, LightGBM.
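One way config-driven model selection could look is a small registry that maps a config key to an estimator class. The config keys (`model_type`, `params`) and the registry names are hypothetical; the class paths refer to the public scikit-learn, XGBoost, and LightGBM classifiers, which are imported lazily so only the selected library needs to be installed.

```python
import importlib

# Hypothetical registry: config name -> dotted path of the estimator class.
MODEL_REGISTRY = {
    "logistic_regression": "sklearn.linear_model.LogisticRegression",
    "xgboost": "xgboost.XGBClassifier",
    "lightgbm": "lightgbm.LGBMClassifier",
}


def resolve_model_path(config):
    """Validate the configured model type and return its dotted class path."""
    model_type = config["model_type"]
    if model_type not in MODEL_REGISTRY:
        supported = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model_type {model_type!r}; supported: {supported}")
    return MODEL_REGISTRY[model_type]


def build_model(config):
    """Import the selected estimator class and instantiate it with config params."""
    module_name, class_name = resolve_model_path(config).rsplit(".", 1)
    model_cls = getattr(importlib.import_module(module_name), class_name)
    return model_cls(**config.get("params", {}))
```

For example, `build_model({"model_type": "logistic_regression", "params": {"C": 1.0}})` would return an unfitted scikit-learn estimator, assuming scikit-learn is installed.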
Promotion
- Deterministically select the best contender.
- Evaluate the best contender against the current champion.
- Promote only if performance improves by an epsilon threshold.
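The promotion rule above can be sketched as follows. This is an illustration under stated assumptions: the metric is treated as higher-is-better, ties between contenders are broken by name, the comparison is inclusive (`>=`), and the default `epsilon` of 0.005 is an arbitrary placeholder. `Candidate`, `select_best_contender`, and `should_promote` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    metric: float  # assumed higher-is-better, e.g. a validation score


def select_best_contender(contenders: Sequence[Candidate]) -> Candidate:
    """Pick the top contender; ties on the metric are broken by name so the
    result does not depend on input order."""
    return sorted(contenders, key=lambda c: (-c.metric, c.name))[0]


def should_promote(
    contender: Candidate,
    champion: Optional[Candidate],
    epsilon: float = 0.005,  # placeholder threshold, not the project's value
) -> bool:
    """Promote only when the contender beats the champion by at least epsilon."""
    if champion is None:
        return True  # nothing to defend on the first run
    return contender.metric - champion.metric >= epsilon
```

Requiring a margin rather than any improvement keeps metric noise from replacing the champion on every run.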
Batch Report
- Score a new batch of customer data.
- Emit a structured batch report including:
  - Churn risk buckets and customer priority ranks.
  - Decision codes and suggested actions for predicted churners.
  - Aggregate batch-level summaries.
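A rough sketch of how such a report might be assembled from churn probabilities is shown below. The bucket cut-offs, the 0.5 decision threshold, the decision codes, and the suggested actions are all hypothetical placeholders, not the project's actual values.

```python
from collections import Counter

# Hypothetical decision codes and actions, keyed by risk bucket.
ACTIONS = {
    "high": ("RETAIN_URGENT", "Schedule a retention call"),
    "medium": ("RETAIN_OFFER", "Send a targeted offer"),
    "low": ("MONITOR", "Monitor next batch"),
}


def risk_bucket(probability, high=0.7, medium=0.4):
    """Map a churn probability to a coarse bucket (cut-offs are assumptions)."""
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


def build_batch_report(scores, threshold=0.5):
    """Build per-customer rows plus a batch summary from {customer_id: probability}."""
    # Rank by descending probability; ties are broken by customer id.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    for rank, (customer_id, probability) in enumerate(ranked, start=1):
        bucket = risk_bucket(probability)
        predicted_churn = probability >= threshold
        # Codes and actions are attached only to predicted churners.
        code, action = ACTIONS[bucket] if predicted_churn else (None, None)
        rows.append({
            "customer_id": customer_id,
            "churn_probability": probability,
            "risk_bucket": bucket,
            "priority_rank": rank,
            "predicted_churn": predicted_churn,
            "decision_code": code,
            "suggested_action": action,
        })
    summary = {
        "n_customers": len(rows),
        "n_predicted_churners": sum(r["predicted_churn"] for r in rows),
        "bucket_counts": dict(Counter(r["risk_bucket"] for r in rows)),
    }
    return {"rows": rows, "summary": summary}
```

Calling `build_batch_report({"c1": 0.9, "c2": 0.45, "c3": 0.1})` yields three ranked rows and a summary with one predicted churner under these placeholder thresholds.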
This system uses Hugging Face Hub to store immutable artifact history in two repositories:
- Dataset repository
- Model repository
Safe by Default
All artifact upload flags are disabled by default. This allows local dry runs and experimentation without modifying any remote dataset or model repositories.
Uploads must be explicitly enabled via configuration and require authentication.
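The opt-in behavior described above might be modeled as follows. The flag names (`upload_datasets`, `upload_models`) and the `resolve_upload` helper are hypothetical; reading the token from an `HF_TOKEN` environment variable is also an assumption about how authentication is supplied.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadConfig:
    # Hypothetical flag names; both default to False so dry runs never touch remotes.
    upload_datasets: bool = False
    upload_models: bool = False


def resolve_upload(enabled, env=None):
    """Return "skip" when uploads are disabled, "upload" when they are enabled
    and a token is present, and fail loudly when enabled without credentials."""
    env = os.environ if env is None else env
    if not enabled:
        return "skip"
    if not env.get("HF_TOKEN"):
        raise RuntimeError("Uploads are enabled but no HF_TOKEN is set.")
    return "upload"
```

Failing loudly when a flag is enabled without credentials avoids a run that silently skips an upload the user explicitly asked for.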
make install
make dagster
- Open the URL printed in your terminal (usually http://127.0.0.1:3000)
- Navigate to Jobs
- Select a job
- Materialize the job
- Inspect asset-level metadata and artifacts in the Dagster UI
- Partitioned batch data ingestion
- Scheduled Dagster jobs
- Dagster sensors for automated batch triggering
- Evaluate alternative storage backends beyond Hugging Face Hub