54 commits
9d411e4
fix(observability): use hardcoded GenAI attribute keys instead of dep…
smasurekar Feb 23, 2026
467f86a
Fixing nemoguardrails config (#382)
niyatisingal Feb 26, 2026
5e1b9ed
Added MIG Slice Support for rtx6000pro (#379)
kumar-punit Feb 26, 2026
685c1d5
Port release-v2.4.0 to release-v2.5.0 and update container versions (…
shubhadeepd Feb 26, 2026
1944a7e
feat: Event-Driven Document/Video Ingestion Pipeline (#351)
minhngu-glitch Feb 27, 2026
befced7
Fix query decomp doc and prompt (#316)
nv-pranjald Feb 27, 2026
7d0796d
Upgrade to Nemotron RC NIMs and NV-Ingest 26.1.2 (#371)
smasurekar Feb 27, 2026
a3c0438
Preserve filename case in filter; fix syntax error examples to use do…
kumar-punit Feb 27, 2026
85d41b6
conf.py fix (#391)
kheiss-uwzoo Feb 27, 2026
8c4e6d4
Added vdb serialization if parallel ingestion which helps in high con…
kumar-punit Mar 2, 2026
c91de07
Update prompt and unify reasoning budged and enable thinking (#386)
nv-pranjald Mar 2, 2026
1f72a57
Concatenate multimodal content for VLM Embed (#362) (#392)
smasurekar Mar 2, 2026
a1d1144
security: Fix frontend CVEs (#396)
shubhadeepd Mar 2, 2026
de589b4
Add config to enable nemotron parse only extraction in nv-ingest (#395)
nv-nikkulkarni Mar 2, 2026
5cee48f
Added patch command in rtx6000pro mig block also in documentation (#398)
kumar-punit Mar 3, 2026
60d3ca6
Upgraded to GA nemotron-ranking-ms and nemotron-embedding-ms containe…
smasurekar Mar 4, 2026
720aed3
Update packages to resolve source code CVEs (#400)
shubhadeepd Mar 4, 2026
a85a69e
Update NIM wait times and patch VSS embed/rerank models (#397)
minhngu-glitch Mar 4, 2026
8fb149b
docs: Add release note for v2.5.0 release (#401)
shubhadeepd Mar 4, 2026
e7ad2fd
Upgraded to GA containers for page-elements, graphic-elements, table-…
smasurekar Mar 5, 2026
16f5ad1
Add SHM size 16gb to reranking and VLM NIM in compose (#405)
nv-nikkulkarni Mar 6, 2026
c347f4f
Fixed the query to be derived from messages if query is not explicitl…
kumar-punit Mar 6, 2026
1a11733
Added url for nvidia/llama-nemotron-rerank-1b-v2 model for cloud endp…
smasurekar Mar 9, 2026
65e965b
fix start proble failureThreshold to 750 (#411)
nv-pranjald Mar 9, 2026
48671f7
Update: Remove Vss on AIDP notebook (#409)
anngu-2xx3 Mar 10, 2026
e629fd1
Kheiss/chunking topic (#417)
kheiss-uwzoo Mar 10, 2026
2060496
CI - Updated Nemotron endpoints for embedding, page-elements, graphic…
smasurekar Mar 10, 2026
51368c3
Add notebook showcasing langchain connector for Nvidia RAG Retrieval …
shubhadeepd Mar 10, 2026
da47ad0
Revert "Added url for nvidia/llama-nemotron-rerank-1b-v2 model for cl…
smasurekar Mar 11, 2026
cfb1dd9
Nemoretriever OCR version 1.2.0 -> 1.2.1 in Helm (#422)
smasurekar Mar 11, 2026
7a0c866
Fix name for rag langchain connector (#423)
shubhadeepd Mar 11, 2026
4e2ff47
Print reasoning tokens if DEBUG logging is enabled (#424)
niyatisingal Mar 11, 2026
aa96586
CI: Update helm packaging and selective publish support (#425)
shubhadeepd Mar 11, 2026
b68c69f
Add rag-blueprint agent skill with CLAUDE.md and project config (#407)
kumar-punit Mar 11, 2026
0d40efa
doc: Fix broken links in notebooks and docs (#426)
shubhadeepd Mar 11, 2026
6d96e2e
Kheiss/prd additions (#427)
kheiss-uwzoo Mar 12, 2026
59a43ca
Launchable Updates for release 2.5 (#428) (#429)
shubhadeepd Mar 12, 2026
1c544b2
Kheiss/cont ingest (#431)
kheiss-uwzoo Mar 13, 2026
0f6c990
Nemotron 3 super deployment guide and migration guide (#430)
nv-pranjald Mar 13, 2026
1f40729
docs: move NIM_MAX_MODEL_LEN and LLM_MAX_TOKENS to general self-hoste…
nv-pranjald Mar 13, 2026
c1ad532
Update: Remove vss and update Minio access console (#433)
anngu-2xx3 Mar 13, 2026
cdc9217
VLM embed doc fix (#435)
smasurekar Mar 17, 2026
a67a48c
docs: add RAG accuracy benchmarks documentation (#434)
nv-pranjald Mar 17, 2026
aff8c66
Update to GA artifact path (#437)
shubhadeepd Mar 17, 2026
feecad5
adding missing accuracy benchmark documentation (#438)
kheiss-uwzoo Mar 18, 2026
7e5ef78
fix(unit): avoid real Milvus in delete_documents tests for CI (#440)
smasurekar Mar 18, 2026
e114a68
Update release date for v2.5.0
shubhadeepd Mar 17, 2026
2f26a00
fix: validate file paths in MCP upload/update tools to prevent path t…
sebastiondev Apr 7, 2026
2eb8f59
Kheiss/versioning modifiers (#462)
kheiss-uwzoo Apr 7, 2026
67bb386
Merge branch 'main' into fix/cwe22-mcp-server-file-acbb
niyatisingal Apr 9, 2026
7153f5a
Merge pull request #465 from sebastiondev/fix/cwe22-mcp-server-file-acbb
niyatisingal Apr 9, 2026
1c03212
Add GCNV data ingestor Helm chart example (#475)
sahoor-netapp Apr 18, 2026
56d3c61
docs(perf): add RAG performance measurement methodology (#490)
TruongNguyenG May 5, 2026
41ea4d6
fix(milvus): escape document source values in delete filter (CWE-89)
sebastiondev May 8, 2026
2 changes: 2 additions & 0 deletions .gitattributes
@@ -1,2 +1,4 @@
data/dataset.zip filter=lfs diff=lfs merge=lfs -text
data/ filter=lfs diff=lfs merge=lfs -text
examples/rag_event_ingest/data/**/*.mp4 filter=lfs diff=lfs merge=lfs -text
examples/rag_event_ingest/data/**/*.pdf filter=lfs diff=lfs merge=lfs -text
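
The two added patterns route `.mp4` and `.pdf` files under `examples/rag_event_ingest/data/` through Git LFS at any directory depth. As a rough illustration of what they match, the sketch below translates only the two glob features used here (`/**/` for zero or more directories, `*` for a run of non-slash characters) into regular expressions. The helper names `lfs_glob_to_regex` and `tracked_by_lfs` are hypothetical, and this is a small subset of Git's real pattern rules, not a replacement for `git check-attr`.

```python
import re

# The two patterns added to .gitattributes in this change.
PATTERNS = [
    "examples/rag_event_ingest/data/**/*.mp4",
    "examples/rag_event_ingest/data/**/*.pdf",
]


def lfs_glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a simplified gitattributes glob into a regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/**/", i):
            # "/**/" matches a slash followed by zero or more directories.
            parts.append("/(?:.*/)?")
            i += 4
        elif pattern[i] == "*":
            # A single "*" never crosses a directory separator.
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def tracked_by_lfs(path: str) -> bool:
    """Return True if the path matches one of the patterns above."""
    return any(lfs_glob_to_regex(p).fullmatch(path) for p in PATTERNS)
```

For example, `tracked_by_lfs("examples/rag_event_ingest/data/videos/demo.mp4")` is true under these assumptions, while a Markdown file in the same directory is not matched.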
120 changes: 117 additions & 3 deletions .github/workflows/publish-artifacts.yml
@@ -7,6 +7,16 @@ on:
- cron: '30 18 * * *'
workflow_dispatch:
inputs:
JOBS_TO_RUN:
description: 'Jobs to run (manual trigger only)'
required: true
default: 'all'
type: choice
options:
- all
- wheel-only
- containers-only
- helm-chart-only
CONTAINER_TAG:
description: 'Custom tag for containers (optional)'
required: false
@@ -15,6 +25,26 @@ on:
description: 'Artifactory version (optional, defaults to auto-generated from get_version.sh)'
required: false
default: ''
HELM_CHART_VERSION:
description: 'Helm chart version for NGC (optional, defaults to auto-generated from get_version.sh)'
required: false
default: ''
# Container-level selection (applies when JOBS_TO_RUN is 'all' or 'containers-only')
PUBLISH_RAG_SERVER:
description: 'Publish rag-server container'
required: false
default: true
type: boolean
PUBLISH_INGESTOR_SERVER:
description: 'Publish ingestor-server container'
required: false
default: true
type: boolean
PUBLISH_RAG_FRONTEND:
description: 'Publish rag-frontend container'
required: false
default: true
type: boolean

env:
RELEASE_TYPE: dev
@@ -26,6 +56,7 @@ jobs:
publish-wheel:
name: Build and Publish Python Wheel
runs-on: ubuntu-latest
if: github.event_name != 'workflow_dispatch' || github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'wheel-only'
container:
image: python:3.10
steps:
@@ -106,6 +137,7 @@ jobs:
publish-rag-server:
name: Build and Publish RAG Server Container
runs-on: ubuntu-latest
if: github.event_name != 'workflow_dispatch' || ((github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'containers-only') && github.event.inputs.PUBLISH_RAG_SERVER != 'false')
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -147,7 +179,7 @@ jobs:
# Tag and push to NGC Container Registry
echo "Pushing rag-server to NGC Container Registry..."
docker push nvcr.io/nvstaging/blueprint/rag-server:$TAG
docker tag nvcr.io/nvstaging/blueprint/rag-server:$TAG nvcr.io/nvstaging/blueprint/rag-server:latest
docker tag nvcr.io/nvidia/blueprint/rag-server:$TAG nvcr.io/nvstaging/blueprint/rag-server:latest
docker push nvcr.io/nvstaging/blueprint/rag-server:latest
echo "RAG server container publishing completed successfully"

@@ -164,6 +196,7 @@ jobs:
publish-ingestor-server:
name: Build and Publish Ingestor Server Container
runs-on: ubuntu-latest
if: github.event_name != 'workflow_dispatch' || ((github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'containers-only') && github.event.inputs.PUBLISH_INGESTOR_SERVER != 'false')
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -205,7 +238,7 @@ jobs:
# Tag and push to NGC Container Registry
echo "Pushing ingestor-server to NGC Container Registry..."
docker push nvcr.io/nvstaging/blueprint/ingestor-server:$TAG
docker tag nvcr.io/nvstaging/blueprint/ingestor-server:$TAG nvcr.io/nvstaging/blueprint/ingestor-server:latest
docker tag nvcr.io/nvidia/blueprint/ingestor-server:$TAG nvcr.io/nvstaging/blueprint/ingestor-server:latest
docker push nvcr.io/nvstaging/blueprint/ingestor-server:latest
echo "Ingestor server container publishing completed successfully"

@@ -222,6 +255,7 @@ jobs:
publish-rag-frontend:
name: Build and Publish RAG Frontend Container
runs-on: ubuntu-latest
if: github.event_name != 'workflow_dispatch' || ((github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'containers-only') && github.event.inputs.PUBLISH_RAG_FRONTEND != 'false')
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -263,7 +297,7 @@ jobs:
# Tag and push to NGC Container Registry
echo "Pushing rag-frontend to NGC Container Registry..."
docker push nvcr.io/nvstaging/blueprint/rag-frontend:$TAG
docker tag nvcr.io/nvstaging/blueprint/rag-frontend:$TAG nvcr.io/nvstaging/blueprint/rag-frontend:latest
docker tag nvcr.io/nvidia/blueprint/rag-frontend:$TAG nvcr.io/nvstaging/blueprint/rag-frontend:latest
docker push nvcr.io/nvstaging/blueprint/rag-frontend:latest
echo "RAG frontend container publishing completed successfully"

@@ -274,3 +308,83 @@ jobs:
docker images | grep "rag-frontend" | awk '{print $3}' | xargs -r docker rmi -f || echo "No rag-frontend images to delete"
docker system prune -f || true
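
Review note: the `if:` conditions added to the wheel, container, and Helm chart jobs all follow one rule. Scheduled runs execute every job. Manual runs honor `JOBS_TO_RUN`, and container jobs additionally honor their `PUBLISH_*` toggle. The sketch below restates that rule in Python so the combinations are easy to check. The function `should_run` and the `job_kind` labels are hypothetical names for illustration. The string comparison against `"false"` mirrors the workflow's own `!= 'false'` check on `github.event.inputs` values.

```python
# Maps a hypothetical job kind to the JOBS_TO_RUN option that selects it alone.
SELECTOR_FOR_KIND = {
    "wheel": "wheel-only",
    "container": "containers-only",
    "helm": "helm-chart-only",
}


def should_run(
    event_name: str,
    job_kind: str,
    jobs_to_run: str = "all",
    publish_flag: str = "true",
) -> bool:
    """Mirror the job-level `if:` expressions in publish-artifacts.yml."""
    # Any trigger other than a manual dispatch (e.g. the cron schedule) runs everything.
    if event_name != "workflow_dispatch":
        return True
    selected = jobs_to_run in ("all", SELECTOR_FOR_KIND[job_kind])
    if job_kind == "container":
        # Container jobs are also gated by their per-container boolean input.
        return selected and publish_flag != "false"
    return selected
```

One consequence worth noting: the per-container toggles have no effect on scheduled runs, because the first clause short-circuits before they are read.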

# ============================================================================
# PUBLISH HELM CHART TO NGC
# ============================================================================
publish-helm-chart:
name: Build and Publish Helm Chart to NGC
runs-on: ubuntu-latest
if: github.event_name != 'workflow_dispatch' || github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'helm-chart-only'
steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Install Helm
uses: azure/setup-helm@v4
with:
version: 'v3.17.0'

- name: Install NGC CLI
env:
NGC_API_KEY: ${{ secrets.CI_NVSTAGING_BLUEPRINT_KEY }}
run: |
echo "Installing NGC CLI..."
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.9.10/files/ngccli_linux.zip -O ngccli_linux.zip
unzip -o ngccli_linux.zip
chmod u+x ngc-cli/ngc
echo "$(pwd)/ngc-cli" >> $GITHUB_PATH
echo "NGC CLI installed successfully"

- name: Determine Helm chart version
id: helm_version
run: |
if [ -n "${{ github.event.inputs.HELM_CHART_VERSION }}" ]; then
echo "Using custom Helm chart version: ${{ github.event.inputs.HELM_CHART_VERSION }}"
VERSION="${{ github.event.inputs.HELM_CHART_VERSION }}"
else
echo "Using auto-generated version from get_version.sh"
chmod +x ./ci/get_version.sh
VERSION=$(./ci/get_version.sh)
echo "Generated version: $VERSION"
fi
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "HELM_CHART_VERSION=$VERSION" >> $GITHUB_ENV

- name: Add Helm repositories
env:
NGC_API_KEY: ${{ secrets.CI_NVSTAGING_BLUEPRINT_KEY }}
run: |
cd deploy/helm
helm repo add nvidia-nim https://helm.ngc.nvidia.com/nim/nvidia/ --username='$oauthtoken' --password="$NGC_API_KEY"
helm repo add nim https://helm.ngc.nvidia.com/nim/ --username='$oauthtoken' --password="$NGC_API_KEY"
helm repo add nemo-microservices https://helm.ngc.nvidia.com/nvidia/nemo-microservices --username='$oauthtoken' --password="$NGC_API_KEY"
helm repo add baidu-nim https://helm.ngc.nvidia.com/nim/baidu --username='$oauthtoken' --password="$NGC_API_KEY"
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add elastic https://helm.elastic.co
helm repo add otel https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo add zipkin https://zipkin.io/zipkin-helm
helm repo add prometheus https://prometheus-community.github.io/helm-charts
helm repo update

- name: Package Helm chart
env:
NGC_API_KEY: ${{ secrets.CI_NVSTAGING_BLUEPRINT_KEY }}
run: |
cd deploy/helm
helm dependency update nvidia-blueprint-rag
helm package nvidia-blueprint-rag/ --version "${{ env.HELM_CHART_VERSION }}"
CHART_TGZ=$(ls nvidia-blueprint-rag-*.tgz)
echo "Created: $CHART_TGZ"

- name: Push Helm chart to NGC
env:
NGC_API_KEY: ${{ secrets.CI_NVSTAGING_BLUEPRINT_KEY }}
run: |
cd deploy/helm
CHART_TGZ="nvidia-blueprint-rag-${{ env.HELM_CHART_VERSION }}.tgz"
TARGET="nvstaging/blueprint/nvidia-blueprint-rag:${{ env.HELM_CHART_VERSION }}"
# Remove existing version to overwrite (ignore error if version does not exist)
ngc registry chart remove "$TARGET" --org nvstaging -y 2>/dev/null || true
ngc registry chart push "$TARGET" --source "$CHART_TGZ" --org nvstaging
echo "Helm chart published to NGC: $TARGET"
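
Review note: the Helm job resolves its version in one step and then derives both the tarball name and the NGC target from it. A minimal Python sketch of that flow follows. The function names are hypothetical, and `generate` stands in for `./ci/get_version.sh`, whose output format is not shown in this diff.

```python
from typing import Callable

CHART_NAME = "nvidia-blueprint-rag"
REGISTRY_PREFIX = "nvstaging/blueprint"


def resolve_chart_version(custom_version: str, generate: Callable[[], str]) -> str:
    """Prefer the HELM_CHART_VERSION input; otherwise fall back to the generator.

    Like the shell test `[ -n "$VALUE" ]`, any non-empty string counts as set.
    """
    return custom_version if custom_version else generate()


def chart_artifacts(version: str) -> tuple[str, str]:
    """Return the (tarball name, NGC target) pair the push step expects."""
    tarball = f"{CHART_NAME}-{version}.tgz"
    target = f"{REGISTRY_PREFIX}/{CHART_NAME}:{version}"
    return tarball, target
```

The push step can rebuild the tarball name from the version alone because `helm package --version` writes `<chart-name>-<version>.tgz`, so the packaging and push steps only need to agree on `HELM_CHART_VERSION`.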

5 changes: 5 additions & 0 deletions .gitignore
@@ -80,4 +80,9 @@ coverage/
cover/
*.log
tests/data/
# Agent skills (installed via npx skills add)
/.agents/
/.claude/
skills-lock.json

# Workbench Project Layout
86 changes: 86 additions & 0 deletions AGENTS.md
@@ -0,0 +1,86 @@
# NVIDIA RAG Blueprint

Reference implementation for a Retrieval Augmented Generation pipeline. Python 3.11+ backend (FastAPI + LangChain), React/TypeScript frontend, deployable via Docker Compose or Helm.

## Project structure

```
src/nvidia_rag/
├── rag_server/ # RAG query/response server (FastAPI)
├── ingestor_server/ # Document ingestion server (FastAPI)
└── utils/ # Shared utilities
frontend/ # React + TypeScript UI (pnpm)
deploy/
├── compose/ # Docker Compose files and env configs
└── helm/ # Helm charts (standard + MIG-slicing)
docs/ # User-facing documentation (Sphinx, RST/MD)
tests/
├── unit/ # No network calls allowed
└── integration/ # Network calls permitted
notebooks/ # Jupyter notebooks for evaluation and examples
```

## Development commands

### Backend (Python)

```bash
uv sync # Install all deps
uv run pytest tests/unit/ # Unit tests
uv run pytest tests/integration/ # Integration tests
ruff check --fix src/ # Lint + autofix
ruff format src/ # Format
pre-commit run --all-files # Run all pre-commit hooks
```

### Frontend (TypeScript)

```bash
cd frontend
pnpm install
pnpm run dev # Dev server
pnpm run lint # ESLint
pnpm exec tsc --noEmit # Type check
pnpm run test:run # Tests
```

## Code conventions

- **Python**: Ruff for linting and formatting (line-length 88, double quotes, space indent). Config in `pyproject.toml`.
- **Type hints**: Required on all function signatures.
- **Imports**: Sorted by isort via Ruff. No in-function imports.
- **Tests**: Mirror source tree (`src/nvidia_rag/rag_server/server.py` → `tests/unit/rag_server/test_server.py`).
- **Frontend**: ESLint + TypeScript strict mode. Function components with hooks.
- **Env files**: `deploy/compose/nvdev.env` (NVIDIA-hosted NIMs) and `deploy/compose/.env` (self-hosted). These are the source of truth for Docker deployments — shell-only exports are lost on restart.

## Deployment modes

1. **Docker Compose** — `deploy/compose/` with env-file configs. Multiple profiles: standard, retrieval-only, NVIDIA-hosted.
2. **Helm** — `deploy/helm/nvidia-blueprint-rag/` chart with `values.yaml`. Supports MIG GPU slicing via `deploy/helm/mig-slicing/`.
3. **Library** — Import `nvidia_rag` as a Python package for custom pipelines.

## Key files

- `pyproject.toml` — All Python deps, ruff config, project metadata
- `deploy/compose/nvdev.env` — Default env file for NVIDIA API Catalog deployments
- `src/nvidia_rag/rag_server/prompt.yaml` — System prompt templates
- `docs/support-matrix.md` — GPU requirements per deployment mode
- `docs/service-port-gpu-reference.md` — Port mappings and GPU assignments

## PR and commit guidelines

- Target the `develop` branch, never `main`.
- All commits must be signed off (DCO).
- Run `pre-commit run --all-files` before submitting.
- See `CONTRIBUTING.md` for full workflow.

## Operations — `rag-blueprint` skill

For any operational task — deploying, configuring, troubleshooting, or shutting down the RAG Blueprint — read and follow the skill at `.agents/skills/rag-blueprint/SKILL.md`.

The skill handles:

- **Deploy** — Docker Compose (standard, retrieval-only, NVIDIA-hosted), Helm, MIG-slicing, library mode
- **Configure** — VLM, guardrails, query rewriting, ingestion, search & retrieval, models, observability, summarization, multimodal, MCP, evaluation, notebooks, UI, and more
- **Troubleshoot** — Debug unhealthy services, container errors, GPU issues, connectivity failures
- **Shutdown** — Stop, tear down, and clean up services
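
Review note: the "Tests" convention in this file (unit tests mirror the source tree) is mechanical enough to express as a path mapping. The sketch below is one possible reading of it, assuming every source file lives under `src/nvidia_rag/` and every unit test under `tests/unit/`. The helper name `unit_test_path` is hypothetical and not part of the repository.

```python
from pathlib import PurePosixPath

SOURCE_ROOT = PurePosixPath("src/nvidia_rag")
UNIT_TEST_ROOT = PurePosixPath("tests/unit")


def unit_test_path(source_path: str) -> str:
    """Map a source file to the unit-test file that mirrors it.

    Raises ValueError if the path is not under SOURCE_ROOT.
    """
    relative = PurePosixPath(source_path).relative_to(SOURCE_ROOT)
    # Keep the directory layout and prefix the file name with "test_".
    return str(UNIT_TEST_ROOT / relative.parent / f"test_{relative.name}")
```

With the example from the convention itself, `unit_test_path("src/nvidia_rag/rag_server/server.py")` yields `tests/unit/rag_server/test_server.py`.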
84 changes: 84 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,84 @@
# NVIDIA RAG Blueprint

Reference implementation for a Retrieval Augmented Generation pipeline. Python 3.11+ backend (FastAPI + LangChain), React/TypeScript frontend, deployable via Docker Compose or Helm.

## Project structure

```
src/nvidia_rag/
├── rag_server/ # RAG query/response server (FastAPI)
├── ingestor_server/ # Document ingestion server (FastAPI)
└── utils/ # Shared utilities
frontend/ # React + TypeScript UI (pnpm)
deploy/
├── compose/ # Docker Compose files and env configs
└── helm/ # Helm charts (standard + MIG-slicing)
docs/ # User-facing documentation (Sphinx, RST/MD)
tests/
├── unit/ # No network calls allowed
└── integration/ # Network calls permitted
notebooks/ # Jupyter notebooks for evaluation and examples
```

## Development commands

### Backend (Python)

```bash
uv sync # Install all deps
uv run pytest tests/unit/ # Unit tests
uv run pytest tests/integration/ # Integration tests
ruff check --fix src/ # Lint + autofix
ruff format src/ # Format
pre-commit run --all-files # Run all pre-commit hooks
```

### Frontend (TypeScript)

```bash
cd frontend
pnpm install
pnpm run dev # Dev server
pnpm run lint # ESLint
pnpm exec tsc --noEmit # Type check
pnpm run test:run # Tests
```

## Code conventions

- **Python**: Ruff for linting and formatting (line-length 88, double quotes, space indent). Config in `pyproject.toml`.
- **Type hints**: Required on all function signatures.
- **Imports**: Sorted by isort via Ruff. No in-function imports.
- **Tests**: Mirror source tree (`src/nvidia_rag/rag_server/server.py` → `tests/unit/rag_server/test_server.py`).
- **Frontend**: ESLint + TypeScript strict mode. Function components with hooks.
- **Env files**: `deploy/compose/nvdev.env` (NVIDIA-hosted NIMs) and `deploy/compose/.env` (self-hosted). These are the source of truth for Docker deployments — shell-only exports are lost on restart.

## Deployment modes

1. **Docker Compose** — `deploy/compose/` with env-file configs. Multiple profiles: standard, retrieval-only, NVIDIA-hosted.
2. **Helm** — `deploy/helm/nvidia-blueprint-rag/` chart with `values.yaml`. Supports MIG GPU slicing via `deploy/helm/mig-slicing/`.
3. **Library** — Import `nvidia_rag` as a Python package for custom pipelines.

## Key files

- `pyproject.toml` — All Python deps, ruff config, project metadata
- `deploy/compose/nvdev.env` — Default env file for NVIDIA API Catalog deployments
- `src/nvidia_rag/rag_server/prompt.yaml` — System prompt templates
- `docs/support-matrix.md` — GPU requirements per deployment mode
- `docs/service-port-gpu-reference.md` — Port mappings and GPU assignments

## PR and commit guidelines

- Target the `develop` branch, never `main`.
- All commits must be signed off (DCO).
- Run `pre-commit run --all-files` before submitting.
- See `CONTRIBUTING.md` for full workflow.

## Operations — `/rag-blueprint` skill

For any operational task, use the `rag-blueprint` skill (`.agents/skills/rag-blueprint/`).

- **Deploy** — Docker Compose (standard, retrieval-only, NVIDIA-hosted), Helm, MIG-slicing, library mode
- **Configure** — VLM, guardrails, query rewriting, ingestion, search & retrieval, models, observability, summarization, multimodal, MCP, evaluation, notebooks, UI, and more
- **Troubleshoot** — Debug unhealthy services, container errors, GPU issues, connectivity failures
- **Shutdown** — Stop, tear down, and clean up services