This Compose stack runs from the GitHub repo linked here and starts the following services in Docker or Singularity mode:
- vLLM model server (OpenAI-compatible)
- RAG retrieval API (Chroma)
- Indexer (filesystem → Chroma, auto-updates)
- Enhanced Proxy exposing /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models
- Open WebUI (optional) pointing to the Proxy
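For context on how the optional UI is wired, Open WebUI is simply an OpenAI-compatible client pointed at the Proxy. The Compose file already handles this; the standalone sketch below is illustrative only, and the image tag plus the OPENAI_API_BASE_URL / OPENAI_API_KEY variables follow upstream Open WebUI defaults rather than anything specific to this stack:
# run Open WebUI against the Proxy on the host network (the UI listens on its default port 8080)
docker run -d --name open-webui --network host \
  -e OPENAI_API_BASE_URL="http://localhost:${PROXY_PORT}/v1" \
  -e OPENAI_API_KEY="unused" \
  ghcr.io/open-webui/open-webui:main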
See a turnkey demonstration of the workflow running on ACTIVATE at the link below:
Pull the model weights of your choice into a known directory. We recommend using git lfs to fetch the weights, since HTTPS clones pass through most firewalls and downloads are relatively fast:
cd /mymodeldir/
git lfs install
git clone https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
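Before moving on, it is worth confirming that the LFS objects actually downloaded, since a partial clone leaves pointer files behind. A quick check, assuming the clone above:
cd /mymodeldir/Llama-3_3-Nemotron-Super-49B-v1_5
git lfs ls-files    # '*' next to a file means its content is present; '-' means only a pointer
du -sh .            # a 49B-parameter model in BF16 is roughly 100 GB on disk
git lfs pull        # re-fetch any objects that are still pointers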
When running in Singularity mode, the workflow provides a field to pull a prebuilt vLLM Singularity container for you, but you can also pull it manually, for example with the authenticated pw CLI:
cd ~/pw/activate-rag-vllm
pw buckets cp pw://mshaxted/codeassist/vllm.sif ./
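Optionally, sanity-check the image before launching the stack. The inspect and import test below are a sketch; the last line assumes python3 and vllm are available inside the container:
ls -lh vllm.sif                # confirm the image transferred completely
singularity inspect vllm.sif   # print the labels/metadata baked into the image
singularity exec vllm.sif python3 -c "import vllm; print(vllm.__version__)"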
export HF_TOKEN=hf_xyz
export RUNMODE=docker # or singularity
export BUILD=true
export RUNTYPE=all # or vllm only
# run the service
./run.sh
Repository layout:
- docker-compose.yml - stack definition
- Dockerfile.rag - builds the RAG + Indexer + Proxy image
- rag_proxy.py - enhanced OpenAI-compatible proxy with streaming + extra endpoints
- rag_server.py - RAG search API
- indexer.py, indexer_config.yaml - auto indexer for filesystem changes
- docs/ - mount point for your documents
- cache/ - workload-specific data storage
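Model load can take several minutes, so it helps to wait for the Proxy to report healthy before sending requests. A minimal polling sketch using the same PROXY_PORT as in the examples below:
# poll the Proxy's /health endpoint until it responds
until curl -fsS "http://localhost:${PROXY_PORT}/health" > /dev/null 2>&1; do
  echo "waiting for proxy on port ${PROXY_PORT}..."
  sleep 5
done
echo "proxy is up"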
# Health
curl http://localhost:${PROXY_PORT}/health | jq
# Chat (non-stream)
curl -sS http://localhost:${PROXY_PORT}/v1/chat/completions -H 'content-type: application/json' -d '{"model":"'"${MODEL_NAME}"'","messages":[{"role":"user","content":"Summarize the docs."}], "max_tokens":200}' | jq
# Chat (stream)
curl -N http://localhost:${PROXY_PORT}/v1/chat/completions -H 'content-type: application/json' -d '{"model":"'"${MODEL_NAME}"'","messages":[{"role":"user","content":"Hello"}], "stream": true}'
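The remaining Proxy endpoints can be exercised the same way. The sketch below assumes the embeddings route accepts the standard OpenAI payload and that MODEL_NAME is also valid for embeddings; the Proxy may instead route embeddings to a dedicated model:
# List models
curl -sS http://localhost:${PROXY_PORT}/v1/models | jq
# Embeddings
curl -sS http://localhost:${PROXY_PORT}/v1/embeddings -H 'content-type: application/json' -d '{"model":"'"${MODEL_NAME}"'","input":"What does the indexer watch?"}' | jq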