This project demonstrates a multi-stage ML system designed to detect anomalous GitHub Pull Requests. It incorporates modern cloud-native technologies, tested on DigitalOcean:
- Vault for secure secret management.
- GitHub Actions for CI/CD (building, ephemeral testing).
- Argo CD (GitOps) for continuous deployment to DigitalOcean Kubernetes clusters.
- Argo Rollouts for blue/green deployment on the model-serving stage.
- Spark for distributed preprocessing.
- MLflow for experiment tracking and model versioning.
- Locust for performance testing.
- MongoDB and DigitalOcean Spaces for data storage.
Check out this blog post outlining the motivation for this repository.
You will need existing Kubernetes clusters (preferably one per environment) to execute the machine learning pipelines in this repo; workloads are deployed to them via Argo CD.
- Data-Fetch (Job)
  - Runs fetch_github_data.py.
  - Pulls secrets from Vault (GitHub token, Mongo URI).
  - Upserts PR data into MongoDB (see the upsert sketch below).
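A minimal sketch of that upsert pattern using pymongo; the database name, collection name, and key fields are illustrative assumptions, not necessarily what fetch_github_data.py uses:

```python
from pymongo import MongoClient, UpdateOne

def upsert_prs(mongo_uri: str, prs: list) -> None:
    """Upsert a batch of PR documents keyed by repo + PR number."""
    client = MongoClient(mongo_uri)
    collection = client["github"]["pull_requests"]  # assumed names
    ops = [
        UpdateOne(
            {"repo": pr["repo"], "number": pr["number"]},  # match key
            {"$set": pr},   # overwrite fields with the latest snapshot
            upsert=True,    # insert if the PR has not been seen before
        )
        for pr in prs
    ]
    if ops:
        collection.bulk_write(ops)
```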
- Spark Preprocess (Job)
  - Runs spark_preprocess.py.
  - Uses Spark to read from MongoDB, create features, and store them as Parquet files in DigitalOcean Spaces (sketched below).
  - Secrets for DO Spaces, etc., come from Vault.
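A rough sketch of that read-transform-write flow, assuming the MongoDB Spark connector (10.x option names) and the S3A filesystem pointed at Spaces; the database, collection, bucket names, and feature logic are placeholders:

```python
from pyspark.sql import SparkSession, functions as F

def preprocess(mongo_uri: str, spaces_key: str, spaces_secret: str) -> None:
    spark = (
        SparkSession.builder.appName("pr-preprocess")
        # MongoDB Spark connector 10.x style; older connector versions
        # use spark.mongodb.input.uri instead.
        .config("spark.mongodb.read.connection.uri", mongo_uri)
        .getOrCreate()
    )
    # DigitalOcean Spaces is S3-compatible: point S3A at a Spaces endpoint.
    hconf = spark.sparkContext._jsc.hadoopConfiguration()
    hconf.set("fs.s3a.endpoint", "https://nyc3.digitaloceanspaces.com")
    hconf.set("fs.s3a.access.key", spaces_key)
    hconf.set("fs.s3a.secret.key", spaces_secret)

    df = (
        spark.read.format("mongodb")
        .option("database", "github")           # placeholder names
        .option("collection", "pull_requests")
        .load()
    )
    # Toy feature: total lines changed; the real feature set differs.
    features = df.withColumn("churn", F.col("additions") + F.col("deletions"))
    features.write.mode("overwrite").parquet("s3a://pr-features-bucket/latest/")
```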
- ML Training (Job)
  - Runs train_autoencoder.py, a simple feedforward autoencoder with:
    - Encoder: compresses input data into a lower-dimensional representation.
    - Decoder: attempts to reconstruct the input from the compressed representation.
  - A threshold-based anomaly detection method is used: anomalies are detected when reconstruction errors exceed mean + 2 * std_dev (see the sketch below).
  - Uses MLflow for experiment tracking; references preprocessed data from DO Spaces.
  - Also references secrets from Vault (like mlflow_uri).
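The thresholding rule is simple enough to show directly. This NumPy/MLflow sketch is framework-agnostic and illustrative; the actual training logic lives in train_autoencoder.py:

```python
import mlflow
import numpy as np

def detect_anomalies(errors: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Flag samples whose reconstruction error exceeds mean + k * std_dev."""
    threshold = errors.mean() + k * errors.std()
    return errors > threshold

# Usage inside a training run; the tracking URI (mlflow_uri) comes from Vault.
with mlflow.start_run():
    errors = np.abs(np.random.randn(1000))  # stand-in for real reconstruction errors
    mask = detect_anomalies(errors)
    mlflow.log_metric("anomaly_rate", float(mask.mean()))
```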
- Model-Serving (Blue/Green with Argo Rollouts)
  - A Flask-based ML model serving API designed to detect anomalies in pull requests; it is a long-running service.
  - We define a Rollout with a blueGreen strategy in model-serving-rollout.yaml.
  - We can spin up a “preview” environment, then “promote” it to the active service for minimal downtime.
  - We run Locust performance tests in ephemeral containers to confirm throughput, latency, etc. (see the Locust sketch below).
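A minimal Locust file along those lines; the /predict route and payload shape are assumptions about the serving API:

```python
from locust import HttpUser, task, between

class ModelServingUser(HttpUser):
    # Simulated users pause 0.5-2s between requests.
    wait_time = between(0.5, 2)

    @task
    def predict(self):
        # Hypothetical endpoint and payload; adjust to the real API.
        self.client.post("/predict", json={"features": [0.1, 0.4, 0.9]})
```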
We use two main branches:
- staging
- main (production)
Each environment references the same argo-apps/base folder but different branches. Thus:
- The staging cluster’s Argo CD points to the staging branch.
- The production cluster’s Argo CD points to the main branch.
A typical workflow:
- Developer merges changes into staging → triggers ephemeral container tests + staging cluster update.
- Once validated in staging, we merge staging → main → triggers ephemeral tests + production cluster update.
No separate overlays/staging or overlays/production directories within a single branch are needed; instead, each environment is captured by its dedicated branch.
We define four primary workflow files (one per pipeline stage):
- data-fetch.yml
- spark-preprocess.yml
- ml-training.yml
- model-serving.yml
Each workflow:
- Runs lint (flake8) and unit tests (pytest).
- Builds the Docker image if that stage’s code changed.
- Spins up an ephemeral container for integration tests (e.g., checking logs or hitting an endpoint).
- If tests pass, pushes the Docker image to DigitalOcean Container Registry.
- Updates the environment (staging or main) references in argo-apps/base/*.yaml so Argo CD sees it.
We build once, run ephemeral tests on that image in the pipeline, then reuse the identical image for both staging and production. That ensures environment parity: production runs the same artifact tested in staging, with no separate rebuild.
Each Python service references environment variables like VAULT_ADDR, VAULT_PATH_AUTH, VAULT_ROLE, and VAULT_SECRET_PATH, then uses hvac to authenticate with the Kubernetes auth method (a sketch follows). Actual secrets (e.g., mongo_uri, github_token, mlflow_uri) reside in Vault under those paths, ensuring we never store the secrets themselves in environment variables or Git.
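A sketch of that lookup, assuming hvac’s Kubernetes auth method and a KV v2 engine at the default “secret” mount; only the env var names above are taken from this repo:

```python
import os
import hvac

def load_secrets() -> dict:
    client = hvac.Client(url=os.environ["VAULT_ADDR"])
    # Authenticate using the pod's service-account JWT.
    with open("/var/run/secrets/kubernetes.io/serviceaccount/token") as f:
        jwt = f.read()
    client.auth.kubernetes.login(
        role=os.environ["VAULT_ROLE"],
        jwt=jwt,
        mount_point=os.environ["VAULT_PATH_AUTH"],
    )
    # Read e.g. mongo_uri / github_token / mlflow_uri from one KV v2 path.
    secret = client.secrets.kv.v2.read_secret_version(
        path=os.environ["VAULT_SECRET_PATH"],
    )
    return secret["data"]["data"]
```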
Instead of a standard Deployment, we use:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: model-serving
  labels:
    app: model-serving
spec:
  strategy:
    blueGreen:
      activeService: model-serving-active
      previewService: model-serving-preview
      autoPromotionEnabled: false
      # autoPromotionSeconds: 30  # if you want auto promote
```
This approach ensures no downtime when updating the serving container. We keep the old version active while spinning up the new version in “preview” mode. After validation, we promote the new version.
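For context, a minimal sketch of what the Flask serving endpoint behind those services might look like; the route, payload, and scoring logic here are illustrative only:

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    features = np.array(request.json["features"], dtype=float)
    # Placeholder scoring: a real implementation would run the autoencoder
    # and compare reconstruction error against the stored threshold.
    error = float(np.abs(features - features.mean()).mean())
    return jsonify({"anomaly": error > 0.5})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Promotion itself can be triggered from the Argo Rollouts dashboard or with the kubectl plugin (e.g., kubectl argo rollouts promote model-serving).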