@@ -1,4 +1,4 @@
name: CI
name: lint-typecheck-test

on:
push:
@@ -7,7 +7,7 @@ on:
pull_request:

jobs:
check-and-test:
lint-typecheck-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
49 changes: 49 additions & 0 deletions .github/workflows/package-release.yml
@@ -0,0 +1,49 @@
name: package-release

on:
  push:
    tags:
      - "v*"

permissions:
  contents: write
  id-token: write

jobs:
  package-release:
    runs-on: ubuntu-latest
    environment: pypi
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install build
        run: pip install build

      - name: Build sdist and wheel
        run: python -m build

      - name: Verify tag matches package version
        run: |
          TAG_VERSION="${GITHUB_REF_NAME#v}"
          PYPROJECT_VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml','rb'))['project']['version'])")
          if [ "$TAG_VERSION" != "$PYPROJECT_VERSION" ]; then
            echo "Tag version ($TAG_VERSION) does not match pyproject.toml ($PYPROJECT_VERSION)"
            exit 1
          fi

      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          generate_release_notes: true
          files: dist/*
          prerelease: ${{ contains(github.ref_name, '-') }}

      - name: Publish to PyPI
        if: ${{ !contains(github.ref_name, '-') }}
        uses: pypa/gh-action-pypi-publish@release/v1
91 changes: 24 additions & 67 deletions CONTRIBUTING.md
@@ -1,96 +1,53 @@
# Contributing

## Development Setup
## Development setup

From the repository root:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pre-commit install
```

See the [Development setup](README.md#development-setup) section in the root README for a quick overview.

## Tests

Layout mirrors the `vexrag/` package under `tests/` (e.g. `tests/core/`, `tests/cli/`, `tests/e2e/`). Shared fixtures live in `tests/conftest.py`; CLI/scan stubs in `tests/mocks.py`.

```bash
poe test # fast unit tests (excludes integration and e2e)
poe test-integration # wiring tests (mocked HTTP, etc.)
poe test-e2e # full vx scan against rag_examples (see below)
poe cov # unit tests with coverage (excludes e2e)
```

### E2E smoke scans (`poe test-e2e`)

Prerequisites:
| Command | Scope |
| --- | --- |
| `poe test` | Unit tests (default for PRs) |
| `poe test-integration` | Wiring tests with mocked HTTP |
| `poe test-e2e` | Full `vx scan` smoke tests |
| `poe cov` | Unit tests with coverage |

- Ollama running at `http://localhost:11434` with `llama3:8b`:
E2E tests require a running Ollama instance with `llama3:8b` pulled, a free port `8080`, and the dependencies from `rag_examples/` (see [README](README.md)). They skip cleanly when any prerequisite is missing.
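
As a rough illustration of what "skip cleanly" can mean, the sketch below collects human-readable reasons why an e2e run could not start. It is not the project's actual fixture code: `port_is_free` and `e2e_skip_reasons` are hypothetical helper names, the check only looks for an `ollama` executable on `PATH` and for a bindable port, and it does not verify that `llama3:8b` has been pulled.

```python
import shutil
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is bound to host:port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process owns the port.
            return False
    return True


def e2e_skip_reasons(port: int = 8080) -> list[str]:
    """Collect reasons to skip e2e tests; an empty list means 'go ahead'."""
    reasons = []
    if shutil.which("ollama") is None:
        reasons.append("ollama executable not found on PATH")
    if not port_is_free(port):
        reasons.append(f"port {port} is already in use")
    return reasons
```

In a pytest suite, a non-empty result would typically be joined into a message and passed to `pytest.skip(...)` from a fixture, so a missing prerequisite shows up as a skip rather than a failure.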

```bash
ollama pull llama3:8b
```
## Code quality

- Repo dev install: `pip install -e ".[dev]"` and `.venv/bin/vx` available.
- Port `8080` free (all e2e cases share one RAG target).
- RAG example dependencies installed in the example directory or current Python env (see each `rag_examples/*/requirements.txt`).
- For **native** vector-DB poisoner cases (`ollama-smoke-native-poisoner.yaml`), install optional extras as needed:

```bash
pip install -e ".[dev,sentence-transformers,faiss,chroma,qdrant]"
```

Expect ~7 sequential smoke scans (several minutes each with LLM calls). Tests skip cleanly when Ollama, models, deps, or port `8080` are unavailable — normal in CI without GPU/Ollama.

The **`medium_qdrant:native`** case uses a Qdrant **server** (`qdrant/qdrant` via Docker) so the RAG target and `vx scan` do not contend for an embedded `qdrant_data/` lock. Prerequisites:

- Docker installed and running; image `qdrant/qdrant` (pulled on first run).
- Same Ollama, `vx`, port `8080`, and native extras as other native cases.
CI runs the same checks you can run locally before opening a PR:

```bash
poe test-e2e -k "medium_qdrant:native"
poe check # ruff + mypy
poe test
```

Without Docker, that single test is **skipped**; the other six e2e cases are unchanged.

## Code Quality

Run checks before committing:
Auto-fix formatting and lint issues:

```bash
ruff check .
ruff format --check .
poe fix
```

Auto-fix issues:
Or run hooks manually: `pre-commit run --all-files`.

```bash
ruff check --fix .
ruff format .
```

## Pre-commit Hooks

Install hooks once per clone:

```bash
pre-commit install
```
## Pull requests

Run hooks manually on all files:

```bash
pre-commit run --all-files
```

## Git Commits
- Keep changes minimal and focused on one concern.
- Avoid unrelated refactors in the same PR.

- **Commits are human-only.** The Cursor agent must not run `git commit` or `git push` unless you explicitly ask it to.
- When changes are ready, review the diff and commit locally yourself.
## Releasing (maintainers)

## Commit Scope
1. Bump `version` in `pyproject.toml`.
2. Merge to `master` and wait for CI to pass.
3. `git tag vX.Y.Z && git push origin vX.Y.Z`.

- Keep changes minimal and focused on one concern.
- Avoid unrelated refactors in the same commit.
The [package-release workflow](.github/workflows/package-release.yml) builds `dist/*`, creates a GitHub Release, and publishes stable tags to PyPI. Prerelease tags (e.g. `v1.0.0-rc1`) skip PyPI.
122 changes: 82 additions & 40 deletions README.md
@@ -4,99 +4,136 @@

<p align="center">
<a href="https://github.com/Shepard2154/VexRAG"><img src="https://img.shields.io/badge/project-in%20development-F59E0B?style=flat-square" alt="Project: in development" height="28"></a>
<a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.11-blue?style=flat-square" alt="Python 3.11" height="28"></a>
<a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.11+-blue?style=flat-square" alt="Python 3.11+" height="28"></a>
<a href="https://pepy.tech/projects/vexrag"><img src="https://static.pepy.tech/personalized-badge/vexrag?period=total&units=ABBREVIATION&left_color=BLACK&right_color=GREEN&left_text=downloads" alt="PyPI Downloads" height="28"></a>
<a href="https://t.me/vexrag"><img src="https://img.shields.io/badge/chat-join-blue?style=flat-square&logo=telegram" alt="Telegram chat" height="28"></a>
</p>

Most RAG security tools focus on jailbreaking or prompt injection. VexRAG is different: it injects poisoned passages directly into the retrieval index and measures whether the system’s answers remain factually correct. It is not about safety refusals — it’s about functional correctness under adversarial data manipulation.
Most RAG security tools focus on jailbreaking or prompt injection. VexRAG is different: it injects poisoned passages directly into the retrieval index and measures whether the system’s answers remain functionally correct under adversarial data manipulation.

> **Stability notice (pre-0.2.0):** VexRAG is currently test-stage software and is **not production-ready**.
> Until version `0.2.0`, backward compatibility is **not guaranteed** and updates may include **breaking changes**.
## Threat model — when to use VexRAG

**Sample RAG stacks** for getting started: [rag_examples](rag_examples/README.md).
VexRAG is for **security testing your own RAG** in a **controlled, isolated environment** (not production).

## Quickstart
**Use VexRAG if:**

### Prerequisites
- Retrieval data may be **untrusted or partially adversarial** (uploads, crawls, third-party corpora).
- You need to measure behavior when an attacker can **poison or skew the index**.
- You are building **trust-aware RAG** and want evidence of resilience, not only prompt-level guards.

**You probably do not need VexRAG if:**

- Every indexed document is fully trusted, ingestion is strictly controlled, and you accept that risk without red-team validation.
- You only care about **query-time** prompt injection or jailbreaks (VexRAG targets **retrieval and corpus** attacks).

## Science-first approach

VexRAG implements **paper-backed** attacks, not ad-hoc heuristics:

<table border="1" cellpadding="8" cellspacing="0">
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Paper</th>
<th align="left">Summary</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>PoisonedRAG</strong></td>
<td><a href="https://arxiv.org/abs/2402.07867">arXiv:2402.07867</a></td>
<td>Poisoning the retrieval corpus</td>
</tr>
<tr>
<td><strong>HijackRAG</strong></td>
<td><a href="https://arxiv.org/abs/2410.22832">arXiv:2410.22832</a></td>
<td>Hijacking retrieved contexts</td>
</tr>
</tbody>
</table>

See [`vexrag/attack_algorithms/`](vexrag/attack_algorithms/) for implementation details and fidelity notes.

> **Warning — real corpus mutation**
> Scans write poisoned passages into the target retrieval index. Configs default to `cleanup: true`, but interrupted runs may still leave poison behind.
> Never target production or shared indexes. Back up the retrieval database before testing on your own data.

## Versioning policy

VexRAG is an early-stage library. Until `1.0.0`, treat **any release as potentially breaking** — configs, CLI flags, and APIs may change without a major version bump.

When we deprecate public functionality, it stays available for **two minor releases** before removal (e.g. deprecated in `0.3.0`, removed in `0.5.0`).
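
The two-minor-release window is simple arithmetic on the minor component. A minimal sketch, assuming plain `X.Y.Z` version strings and no major bump inside the window (`removal_version` is a hypothetical helper, not a VexRAG API):

```python
def removal_version(deprecated_in: str, grace_minor_releases: int = 2) -> str:
    """Earliest release that may drop a feature deprecated in `deprecated_in`."""
    major, minor, _patch = (int(part) for part in deprecated_in.split("."))
    # Removal lands on the first patch of the later minor release.
    return f"{major}.{minor + grace_minor_releases}.0"


if __name__ == "__main__":
    print(removal_version("0.3.0"))  # 0.5.0
```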

From `1.0.0` onward we plan to follow [SemVer](https://semver.org/) (breaking changes in major releases only).

## Prerequisites

Use a sample RAG target from [rag_examples](rag_examples/README.md): start the example app (each folder’s README has the command), then scan it with `vx`.

For the default Ollama-based configs you need **Python 3.11+** and a running Ollama daemon. CI and releases are tested on **3.11** only; newer versions may work but are not officially supported yet.

```bash
python --version # requires 3.11
python --version # 3.11+ required; 3.11 tested in CI
ollama list
```

Install/pull required Ollama models:
Install/pull Ollama models for scan configs:

```bash
ollama pull llama3:8b
ollama pull nomic-embed-text:latest
```

You also need a running target API endpoint (for the small example: `http://localhost:8080`).
For full benchmarks (`ollama-default.yaml` and some advanced configs), also pull `nomic-embed-text:latest`.

### 1) Install VexRAG
All [rag_examples](rag_examples/README.md) targets default to `http://localhost:8080`; run one example at a time.

## Installation

```bash
pip install vexrag
```

For vector DB-specific extras:
Optional extras (install what your stack needs):

```bash
pip install "vexrag[qdrant]"
pip install "vexrag[chroma]"
pip install "vexrag[faiss]"
pip install "vexrag[sentence-transformers]"
```

### 2) Verify installation
The `sentence-transformers` extra enables the in-process embedding provider in scan configs; model weights download from Hugging Face on first use.

Verify the CLI:

```bash
vx --help
```

### 3) Run a scan from config
## Run a scan

```bash
vx scan --config path/to/scan.yaml
```

Use sample configs from `rag_examples/` as a starting point.

### 4) First successful scan (small local example)
## First successful scan

From `rag_examples/small/rag_01_in_memory_en`:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install vexrag
pip install -r requirements.txt
python small_rag.py
python3 small_rag.py
vx scan --config scan_configs_examples/ollama-smoke.yaml
```

Expected outcome:
- `small_rag.py` serves the target API on `http://localhost:8080`.
- `vx scan` completes and prints a scan report with attack/evaluation results (no connection/preflight errors).

## Development setup

From the repository root:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pre-commit install # optional
```

Run quality checks:

```bash
ruff check .
ruff format --check .
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for commit and workflow notes.
- `python3 small_rag.py` serves the target API on `http://localhost:8080` (embeddings via Hugging Face; first run may download model weights).
- `vx scan` finishes with a scan report. The smoke config needs only the `llama3:8b` model (`ollama pull llama3:8b`).
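
If `vx scan` fails with a connection error, it can help to confirm that the example target is actually listening before looking elsewhere. The sketch below is a generic TCP probe written for illustration; `is_listening` is a hypothetical helper and not part of the `vx` CLI, and it only checks that the port accepts connections, not that the API behind it is healthy.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8080, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `is_listening()` with the defaults probes `localhost:8080`, the address that the `rag_examples` targets use by default.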

## Project roadmap

@@ -107,11 +144,16 @@ See [CONTRIBUTING.md](CONTRIBUTING.md) for commit and workflow notes.
- [x] Support for vLLM and Ollama
- [x] Simple RAG examples for quick onboarding to VexRAG
- [x] Support for Qdrant, FAISS, Chroma, and file-based retrieval backends
- [x] Codebase hardening: refactors, typing, tooling, *removing AI slop*

### In Progress
- [ ] Codebase hardening: refactors, typing, tooling, *removing AI slop*
- [ ] Stable Python API to run scans and generate cases from code, not only via `vx`

### Ideas / Backlog
- [ ] Expand red-team methods in VexRAG
- [ ] Expand supported retrieval backends
- [ ] Implement a web version of VexRAG

## Feedback

Feel free to [open a GitHub issue](https://github.com/Shepard2154/VexRAG/issues) for bugs, questions, or attack methods you would like to see in VexRAG. Pull requests and local development notes are in [CONTRIBUTING.md](CONTRIBUTING.md).
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "vexrag"
version = "0.1.3"
version = "0.2.0"
description = "A Red Team framework that evaluates RAG functional correctness when the retrieval backend contains poisoned passages."
readme = "README.md"
requires-python = ">=3.11"