diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
new file mode 100644
index 0000000..0cae487
--- /dev/null
+++ b/.github/workflows/docs.yml
@@ -0,0 +1,32 @@
+name: Documentation
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  workflow_dispatch:
+
+jobs:
+  docs:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install docs dependencies
+        run: pip install "mkdocs-material>=9.5"
+
+      - name: Build docs
+        run: mkdocs build --strict
+
+      - name: Deploy to GitHub Pages
+        if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
+        run: mkdocs gh-deploy --force
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..5724f82
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,43 @@
+# Architecture
+
+![DSAgt architecture](assets/architecture.png)
+
+DSAgt wraps an unmodified agent CLI with four independently operable layers. The agent discovers and invokes their capabilities through two MCP servers (`dsagt-registry-server` and `dsagt-knowledge-server`) using the standard MCP tool protocol.
+
+## Layers
+
+**Tool Registry** (`dsagt-registry-server`)
+The agent registers CLI tools as markdown files with YAML frontmatter under `<project>/tools/`. The registry server handles dependency installation via `uv run --with` and wraps every execution with `dsagt-run` for provenance capture. The agent discovers tools via `search_registry`.
+
+**Knowledge Base** (`dsagt-knowledge-server`)
+Semantic search over six independently partitioned ChromaDB collections. Three are global (populated by `dsagt setup-kb`); three are per-project (populated during use and by `dsagt memory`). Background jobs handle long ingest operations. The agent searches via `kb_search`, ingests via `kb_ingest`, and saves user-confirmed facts via `kb_remember`.
+
+**Provenance** (`dsagt-run`)
+A thin wrapper invoked by the registry server around every tool execution. Records the command, arguments, exit code, duration, file counts, and truncated stderr to `<project>/trace_archive/<record_id>.json` and emits an OTLP span to MLflow. The agent calls `reconstruct_pipeline` to render the trace archive as a reproducible bash script or Snakemake workflow.
+
+**Observability** (MLflow + OTLP)
+MLflow runs locally on a port pinned at `dsagt init` time. The other three layers emit OTLP HTTP spans to MLflow's `/v1/traces` endpoint. The agent's own LLM-call traces land in the same store when you export the `OTEL_EXPORTER_OTLP_ENDPOINT` printed by `dsagt mlflow`.
+
+## Project Layout
+
+```
+~/dsagt-projects/<name>/
+  dsagt_config.yaml             # project configuration
+  tools/                        # registered CLI tool specs (markdown + YAML frontmatter)
+  tools/code/                   # agent-written tool scripts
+  skills/                       # agent skills (SKILL.md + reference docs)
+  trace_archive/                # tool execution records (JSON, from dsagt-run)
+  mlflow/                       # MLflow traces, metrics, artifacts
+  kb_index/                     # knowledge base vector collections
+  explicit_memories.yaml        # user-confirmed facts
+
+  # Per-agent runtime config (one set, generated by dsagt init):
+  #   claude:   CLAUDE.md, .mcp.json
+  #   goose:    goose.yaml, .goosehints
+  #   codex:    AGENTS.md, .codex-data/config.toml
+  #   opencode: AGENTS.md, opencode.json
+  #   roo:      .roomodes, .roo/mcp.json
+  #   cline:    .clinerules/, cline_mcp_settings.json
+```
+
+Projects are registered in `~/.dsagt/projects.yaml` so `dsagt mlflow <name>` and `dsagt info <name>` work from any directory. The data layer is agent-agnostic — re-running `dsagt init <same-name> --agent <other>` switches agent platforms while preserving all accumulated knowledge and traces.
diff --git a/docs/assets/architecture.png b/docs/assets/architecture.png
new file mode 100644
index 0000000..7dbdf46
Binary files /dev/null and b/docs/assets/architecture.png differ
diff --git a/docs/cli.md b/docs/cli.md
new file mode 100644
index 0000000..d4b24e4
--- /dev/null
+++ b/docs/cli.md
@@ -0,0 +1,49 @@
+# CLI Reference
+
+All commands are available after running `uv sync` and activating the virtual environment (`source .venv/bin/activate`).
+
+## Project Management
+
+| Command | Description |
+|---------|-------------|
+| `dsagt init <name> --agent <platform> [--location <path>] [--mlflow-port N]` | Create a project; write per-agent MCP config; print the launch one-liner |
+| `dsagt list` | List all projects with agent, status, and path |
+| `dsagt info <name> [--json]` | Show the resolved config (with the source of each value) and a session/error summary |
+| `dsagt mv <name> <new-location>` | Move a project to a new location |
+| `dsagt rm <name> [-y] [--keep-files]` | Unregister a project and delete its directory unless `--keep-files` is given |
+
+## Session Lifecycle
+
+| Command | Description |
+|---------|-------------|
+| `dsagt mlflow <name>` | Start MLflow for a project and print OTel routing exports |
+| `dsagt stop <name>` | Stop the MLflow daemon |
+| `dsagt memory --project <name>` | Distill new traces from MLflow into episodic memory |
+
+## Setup
+
+| Command | Description |
+|---------|-------------|
+| `dsagt setup-kb [--collection <name>]` | Build the shared core knowledge base collections |
+| `dsagt smoke-test [--agent claude\|goose\|codex\|opencode]` | End-to-end install verification |
+
+## Project Location
+
+The default project location is `~/dsagt-projects/<name>/`. Override with `--location`:
+
+```bash
+dsagt init my-project --agent claude --location /data/runs   # /data/runs/my-project/
+dsagt init my-project --agent claude --location .            # ./my-project/
+```
+
+## Server Commands
+
+These entry points are not typically run directly. The agent launches the MCP servers through the per-agent MCP config that `dsagt init` writes; the others are invoked by DSAgt itself.
+
+| Command | Description |
+|---------|-------------|
+| `dsagt-registry-server` | Tool registry MCP server |
+| `dsagt-knowledge-server` | Knowledge base MCP server |
+| `dsagt-run` | Provenance-capturing tool execution wrapper |
+| `dsagt-proxy` | LiteLLM proxy server (proxy mode only) |
+| `dsagt-setup-kb` | Core knowledge base setup (called by `dsagt setup-kb`) |
diff --git a/docs/developer.md b/docs/developer.md
new file mode 100644
index 0000000..38cc778
--- /dev/null
+++ b/docs/developer.md
@@ -0,0 +1,47 @@
+# Developer Guide
+
+Material for contributors and users who are working beyond the default `dsagt init` → `dsagt mlflow` → agent flow.
+
+## Tests
+
+```bash
+uv run python -m pytest -m "not integration"     # unit tests, no creds required
+uv run python -m pytest -m integration -v        # integration tests (require .env)
+```
+
+Integration tests read endpoint and key values from `.env` at the repo root. Copy `.env.example` to `.env` and fill in your values.
+
+For per-flow hand-tests (CLI, proxy mode, VS Code extensions), see the scripts under [`tests/smoke_test/manual_runs/`](https://github.com/AI-ModCon/dsagt/tree/main/tests/smoke_test/manual_runs/).
+
+## Proxy Mode
+
+`dsagt init` followed by `dsagt start <project> --enable-proxy` spawns a LiteLLM proxy in front of your agent's LLM calls. This adds:
+
+- Full LLM-call traces (request bodies, tool-use blocks, response payloads) in MLflow for agents whose native OTel does not emit those payloads (codex, opencode).
+- Cache-breakpoint injection on outgoing requests (Anthropic prompt caching).
+- Sidechannel detection for agent-internal title-generator / session-namer calls.
+- Model-name aliasing — useful when an agent CLI hardcodes a model whitelist incompatible with your gateway's served names (cline, roo).
+
+Proxy mode reads upstream LLM credentials from `.env` or the shell. See [`tests/smoke_test/manual_runs/proxy_walkthrough.md`](https://github.com/AI-ModCon/dsagt/blob/main/tests/smoke_test/manual_runs/proxy_walkthrough.md) for the full setup walkthrough.
+
+## Troubleshooting
+
+**Agent command not found.** The agent CLI is not installed or is not on PATH. See the [supported agents table](index.md#supported-agents).
+
+**MCP servers not connecting.** Verify uv resolves the server commands:
+
+```bash
+uv run which dsagt-registry-server
+uv run which dsagt-knowledge-server
+```
+
+If missing, reinstall: `uv sync --reinstall`.
+
+**MLflow UI empty.** Confirm MLflow is running for the right project:
+
+```bash
+dsagt info <name>           # shows the pinned port
+curl http://localhost:<mlflow_port>
+```
+
+**Claude keychain conflict.** If `claude` will not authenticate against a non-default gateway, run `claude /logout` to clear the macOS Keychain OAuth token, then re-export `ANTHROPIC_BASE_URL` / `ANTHROPIC_API_KEY` and re-launch.
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..ff5eaa6
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,44 @@
+# DSAgt
+
+**D**ata**S**mith **Ag**en**t** — AI-assisted data pipeline builder.
+
+DSAgt connects an MCP-compatible AI coding agent to tool registration, a semantic knowledge base, execution provenance, and observability infrastructure. It provides data-pipeline scaffolding around your existing agent CLI or VS Code extension (Claude Code, Goose, Codex, and others).
+
+## Supported Agents
+
+| Agent | Install | Verify |
+|-------|---------|--------|
+| [Claude Code](https://github.com/anthropics/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude --version` |
+| [Goose](https://github.com/block/goose) | See [Goose docs](https://github.com/block/goose#installation) | `goose --version` |
+| [Codex](https://github.com/openai/codex) | `npm i -g @openai/codex` | `codex --version` |
+| [opencode](https://github.com/sst/opencode) | See [opencode docs](https://opencode.ai/docs/) | `opencode --version` |
+| [Roo Code](https://github.com/RooCodeInc/Roo-Code) | `npm i -g @roo-code/cli` | `roo --version` |
+| [Cline](https://github.com/cline/cline) | `npm i -g cline` | `cline --version` |
+
+## Prerequisites
+
+- Python 3.12–3.13
+- [uv](https://github.com/astral-sh/uv)
+- One of the supported agent platforms above, installed and authenticated against your LLM provider
+
+## Installation
+
+```bash
+git clone https://github.com/AI-ModCon/dsagt.git
+cd dsagt
+uv sync
+source .venv/bin/activate
+```
+
+## Key Capabilities
+
+| Layer | What it does |
+|-------|-------------|
+| **Tool Registry** | Register CLI tools as markdown specs; the agent discovers and runs them via `search_registry` |
+| **Knowledge Base** | Semantic search over indexed document collections (ChromaDB + FAISS) |
+| **Provenance** | `dsagt-run` wrapper records every tool execution to `trace_archive/` and MLflow |
+| **Explicit Memory** | User-confirmed facts persisted to YAML and the knowledge base |
+| **Episodic Memory** | Session distillation via outlier detection over MLflow traces |
+| **Observability** | Full OTLP tracing to a local MLflow instance |
+
+See the [Quick Start](quickstart.md) to try all of these in a single session.
diff --git a/docs/knowledge-base.md b/docs/knowledge-base.md
new file mode 100644
index 0000000..3370677
--- /dev/null
+++ b/docs/knowledge-base.md
@@ -0,0 +1,40 @@
+# Knowledge Base
+
+DSAgt maintains six independently partitioned ChromaDB collections. The first three are global (under `~/.dsagt/kb_index/`, populated by `dsagt setup-kb`); the last three are per-project (under `<project>/kb_index/`, populated during use by the agent, `dsagt-run`, and `dsagt memory`).
+
+## Collections
+
+| Collection | Source | Populated by |
+|---|---|---|
+| **Tool Specs** | Bundled CLI tool specs in `src/dsagt/tools/` | `dsagt setup-kb` |
+| **Skills** | Bundled skill workflows in `src/dsagt/skills/` | `dsagt setup-kb` |
+| **Domain Knowledge** | NeMo Curator + AIDRIN reference corpora; user-ingested docs | `dsagt setup-kb` + agent's `kb_ingest` |
+| **Explicit Memory** | User-confirmed facts | Agent's `kb_remember` (also written to `<project>/explicit_memories.yaml`) |
+| **Episodic Memory** | Distilled facts from MLflow traces | `dsagt memory --project <name>` |
+| **Tool Use Records** | `dsagt-run` execution traces | `dsagt-run` wrapper writes JSON to `<project>/trace_archive/`; indexed by `dsagt memory` |
+
+## Explicit Memory
+
+Explicit memories are facts the user confirms during a session. The agent saves them via `kb_remember`, which writes to both the ChromaDB collection and `<project>/explicit_memories.yaml`. The agent fetches them via `kb_get_memories` on demand (typically when you ask it to recall something) — they are not auto-loaded at session start.
+
+## Episodic Memory
+
+`dsagt memory --project <name>` distills new traces from the project's MLflow store into episodic memory using per-category outlier detection over embedding centroids. Run this after each session to accumulate cross-session memory.
+
+## Search
+
+The agent searches all collections via `kb_search` (knowledge MCP server) and writes via `kb_ingest` / `kb_remember`. Tool Specs and Skills are queried through specialized routes (`search_registry`, `search_skills`) over the same backend.
+
+Hybrid search is on by default for each collection route: dense-embedding and sparse BM25 rankings are fused with Reciprocal Rank Fusion. Cross-encoder reranking is optional.
+
+## Setup
+
+```bash
+dsagt setup-kb                       # all global collections (local embedder)
+dsagt setup-kb --collection nemo_curator
+dsagt setup-kb --embedding-backend api \
+    --embedding-base-url <url> \
+    --embedding-api-key <key>
+```
+
+The Tool Specs and Skills collections are wiped and rebuilt on every `setup-kb` run — re-run after upgrading DSAgt to pick up new bundled assets.
diff --git a/docs/mcp-servers.md b/docs/mcp-servers.md
new file mode 100644
index 0000000..ee999cc
--- /dev/null
+++ b/docs/mcp-servers.md
@@ -0,0 +1,38 @@
+# MCP Servers
+
+DSAgt exposes its capabilities through two MCP servers. `dsagt init` registers both in the per-agent runtime file (`.mcp.json` for Claude Code, `goose.yaml` for Goose, etc.), and the agent launches them automatically.
+
+## Registry Server
+
+**Command:** `dsagt-registry-server`
+
+Handles tool registration, dependency installation, and tool discovery.
+
+| Tool | Description |
+|------|-------------|
+| `search_registry` | Semantic search over registered tool specs |
+| `save_tool_spec` | Register a new CLI tool as a markdown file with YAML frontmatter |
+| `install_dependencies` | Install tool dependencies via `uv run --with` |
+| `reconstruct_pipeline` | Render the trace archive as a bash script or Snakemake workflow |
+
+Tools are markdown files with YAML frontmatter under `<project>/tools/`. Executables are wrapped with `dsagt-run` for provenance and `uv run --with` for Python dependencies.
+
+## Knowledge Server
+
+**Command:** `dsagt-knowledge-server`
+
+Semantic search and ingestion over indexed document collections.
+
+| Tool | Description |
+|------|-------------|
+| `kb_search` | Search across one or more knowledge collections |
+| `kb_ingest` | Index a file or directory into a named collection (runs in background for large corpora) |
+| `kb_remember` | Save a user-confirmed fact to explicit memory |
+| `kb_get_memories` | Retrieve explicit memories for the current project |
+| `search_skills` | Discover agent skill workflows |
+
+### Backend
+
+The default embedding backend is local (`sentence-transformers`, CPU-only, no API key needed). Switch to `embedding.backend: api` in `dsagt_config.yaml` to route through a hosted embedder via LiteLLM. Cross-encoder reranking is available via `knowledge.rerank: true`.
+
+Hybrid search (dense + sparse BM25) is on by default and is controlled per route via the `hybrid` flag.
diff --git a/docs/observability.md b/docs/observability.md
new file mode 100644
index 0000000..3344a9a
--- /dev/null
+++ b/docs/observability.md
@@ -0,0 +1,45 @@
+# Observability
+
+DSAgt provides end-to-end trace visibility through a local MLflow instance. All internal layers emit OTLP HTTP spans to MLflow's `/v1/traces` endpoint.
+
+## Starting MLflow
+
+```bash
+dsagt mlflow <project-name>
+```
+
+Prints the MLflow UI URL and the `export` block for routing agent OTel output. The port is pinned at `dsagt init` time and listed by `dsagt info <name>`.
+
+## Trace Coverage
+
+| Source | Span type | Contents |
+|--------|-----------|----------|
+| Knowledge base | `kb.search`, `kb.embed`, `kb.index_search`, `kb.rerank` | Per-phase timing trees |
+| Tool executions | `tool.execute` | Exit code, duration, file counts, truncated stderr. Full payload in `trace_archive/<record_id>.json` |
+| Registry events | `save_tool_spec`, `install_dependencies`, `reconstruct_pipeline` | Span metadata |
+| Native agent OTel | LLM call spans | Coverage varies by agent (see below) |
+
+### Agent OTel Coverage
+
+Export the variables printed by `dsagt mlflow` before launching your agent:
+
+| Agent | Coverage |
+|-------|----------|
+| claude | Full request/response payloads |
+| goose | Full request/response payloads |
+| codex | Token counts and tool names |
+| opencode | None natively |
+
+Every span carries the project's `session.id` for filtering in the MLflow trace view.
+
+## Provenance and Reconstruction
+
+Tool execution records on disk (`trace_archive/<record_id>.json`) provide the canonical provenance chain. The agent calls `reconstruct_pipeline` to render the archive as a reproducible bash script or Snakemake workflow.
+
+## Stopping MLflow
+
+```bash
+dsagt stop <project-name>
+```
+
+Releases the port and stops the gunicorn workers. The PID is stored in `<project>/.runtime`.
diff --git a/docs/quickstart.md b/docs/quickstart.md
new file mode 100644
index 0000000..e1cec67
--- /dev/null
+++ b/docs/quickstart.md
@@ -0,0 +1,96 @@
+# Quick Start
+
+This guide walks through knowledge ingest, tool registration, provenance, and explicit memory using the mock project in [`tests/smoke_test/`](https://github.com/AI-ModCon/dsagt/tree/main/tests/smoke_test/). The examples use `claude`; substitute another agent (`goose`, `codex`, `opencode`) if you prefer — the prompts are agent-agnostic.
+
+## Setup
+
+```bash
+# Install
+git clone https://github.com/AI-ModCon/dsagt.git
+cd dsagt
+uv sync
+source .venv/bin/activate
+
+# Set a convenience variable for the smoke test directory (not a normal dsagt step)
+export SMOKE_DIR="$(pwd)/tests/smoke_test"
+
+# 1. Create a new project called quickstart
+dsagt init quickstart --agent claude
+
+# 2. Start MLflow in the background and print the OTel routing exports
+dsagt mlflow quickstart
+
+# 3. Paste the export block from step 2 into this shell, then launch the agent
+cd ~/dsagt-projects/quickstart && claude
+```
+
+## Agent Prompts
+
+Inside the agent, paste these prompts one at a time. Replace `$SMOKE_DIR` with the absolute path you exported — the chat does not expand shell variables.
+
+1. > Ingest the docs in `$SMOKE_DIR/knowledge/` into a collection named `knowledge`.
+2. > Register the csvkit CLI tools `csvcut`, `csvgrep`, `csvstat`, and `csvlook`.
+3. > Use the `scan_directory` tool from the registry to scan `$SMOKE_DIR/data/`.
+4. > Summarize `samples.csv` — columns, row count, quality issues using csvkit tools from the registry.
+5. > Put this in explicit memory: samples.csv has null values in the status and timestamp columns.
+6. > Tell me what you remember about the samples dataset.
+
+## Teardown
+
+After exiting the agent, distill the session into episodic memory and stop the MLflow daemon:
+
+```bash
+# Distill traces into episodic memory
+dsagt memory --project quickstart
+
+# Stop the MLflow daemon
+dsagt stop quickstart
+```
+
+## What Was Exercised
+
+| Prompt | DSAgt layer |
+|--------|-------------|
+| 1 | Knowledge MCP server (`kb_ingest`): chunks and indexes docs into ChromaDB |
+| 2 | Registry MCP server (`save_tool_spec`): writes `tools/csvcut.md`, etc. |
+| 3 | `dsagt-run` provenance wrapper: records the tool execution in `trace_archive/` |
+| 4 | KB recall via `kb_search` and registered tool execution |
+| 5–6 | Explicit memory (`kb_remember` → `explicit_memories.yaml`) + `kb_get_memories` |
+
+## Verify the Artifacts
+
+```bash
+dsagt info quickstart
+ls ~/dsagt-projects/quickstart/{tools,trace_archive}
+cat ~/dsagt-projects/quickstart/explicit_memories.yaml
+```
+
+The MLflow UI URL is printed by `dsagt mlflow quickstart`.
+
+## Non-Interactive Smoke Test
+
+The same flow runs non-interactively and asserts each artifact is present:
+
+```bash
+dsagt smoke-test --agent claude
+```
+
+## First-Time Knowledge Base Setup
+
+`dsagt setup-kb` builds shared ChromaDB collections under `~/.dsagt/kb_index/` that every project on this machine reuses. Run this once after installation.
+
+```bash
+dsagt setup-kb                       # all collections (local embedder, no creds)
+dsagt setup-kb --collection nemo_curator
+dsagt setup-kb --embedding-backend api --embedding-base-url ... --embedding-api-key ...
+```
+
+Three collections are populated:
+
+- **Tool Specs** — DSAgt's bundled tool specs from `src/dsagt/tools/`, tagged `source: bundled`.
+- **Skills** — DSAgt's bundled skill workflows from `src/dsagt/skills/`.
+- **Domain Knowledge** — NeMo Curator and AI Data Readiness Inspector reference corpora.
+
+The Tool Specs and Skills collections are wiped and rebuilt on every run, so re-run `setup-kb` after upgrading DSAgt.
+
+The default embedder is a local sentence-transformers model (~130 MB, CPU-only, no API key). Pass `--embedding-backend api` to route through a hosted embedder via LiteLLM.
diff --git a/docs/tools-skills.md b/docs/tools-skills.md
new file mode 100644
index 0000000..6c057bd
--- /dev/null
+++ b/docs/tools-skills.md
@@ -0,0 +1,43 @@
+# Tools and Skills
+
+## Tools
+
+Tools are CLI executables defined as markdown files with YAML frontmatter under `<project>/tools/`. The agent registers new tools via the registry MCP server's `save_tool_spec` tool.
+
+A tool spec includes:
+
+- A YAML frontmatter block describing the command, arguments, dependencies, and tags.
+- A markdown body with usage examples and notes for the agent.
+
+Example tool spec structure:
+
+```markdown
+---
+name: csvstat
+command: csvstat
+dependencies: []
+tags: [csv, statistics]
+---
+
+Prints descriptive statistics for all columns in a CSV file.
+
+Usage: csvstat [options] [FILE]
+```
+
+The registry server wraps every registered tool with `dsagt-run` for provenance capture and `uv run --with` for Python dependencies, so the agent can call any tool without managing environments manually.
+
+### Bundled Tools
+
+DSAgt ships a `scan_directory` tool in `src/dsagt/tools/` that is indexed into the global Tool Specs collection by `dsagt setup-kb`.
+
+## Skills
+
+Skills are instruction-based agent workflows in `<project>/skills/`. Each skill is a directory containing a `SKILL.md` file and optional reference documents. The agent discovers skills via `search_skills`.
+
+### Bundled Skills
+
+DSAgt ships a `datacard-generator` skill in `src/dsagt/skills/` with reference templates for generating dataset documentation. It is indexed into the global Skills collection by `dsagt setup-kb`.
+
+### Adding Skills
+
+Place a new directory under `<project>/skills/` with a `SKILL.md` describing the workflow. The knowledge server indexes it automatically on its next startup; to index it sooner, trigger a re-index via `kb_ingest`.
diff --git a/docs/use-cases/cryoem.md b/docs/use-cases/cryoem.md
new file mode 100644
index 0000000..f07226e
--- /dev/null
+++ b/docs/use-cases/cryoem.md
@@ -0,0 +1,15 @@
+# Cryo-EM
+
+**Domain:** Structural biology — cryo-electron microscopy
+
+**Dataset:** EMPIAR-10017 β-galactosidase micrographs via CryoPPP
+
+**Source:** [`use_cases/cryoem/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/cryoem/)
+
+## Overview
+
+This use case demonstrates DSAgt-assisted curation of cryo-EM data from the EMPIAR public archive. The agent registers curation tools, ingests domain knowledge about cryo-EM data quality, and builds a pipeline for micrograph preprocessing.
+
+## Guides
+
+- [Cryo-EM Demo](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/cryoem/cryoem_demo.md) — full walkthrough
diff --git a/docs/use-cases/index.md b/docs/use-cases/index.md
new file mode 100644
index 0000000..76876f1
--- /dev/null
+++ b/docs/use-cases/index.md
@@ -0,0 +1,9 @@
+# Use Cases
+
+End-to-end walkthroughs for representative scientific domains live in [`use_cases/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/). Each covers data acquisition, tool registration, pipeline construction, and agent-driven execution against a real dataset.
+
+| Use case | Domain | Guide |
+|----------|--------|-------|
+| [Microbial Isolates](microbial-isolates.md) | Genomics — short-read QC and assembly with `fastp` and `megahit` | `use_cases/microbial_isolates/` |
+| [Cryo-EM](cryoem.md) | Structural biology — EMPIAR-10017 β-galactosidase micrographs via CryoPPP | `use_cases/cryoem/` |
+| [VASP / ISAAC](vasp.md) | Materials science — DFT input/output handling with VASP | `use_cases/isaac_vasp/` |
diff --git a/docs/use-cases/microbial-isolates.md b/docs/use-cases/microbial-isolates.md
new file mode 100644
index 0000000..b6757dc
--- /dev/null
+++ b/docs/use-cases/microbial-isolates.md
@@ -0,0 +1,17 @@
+# Microbial Isolates
+
+**Domain:** Genomics — short-read QC and assembly
+
+**Tools:** `fastp`, `megahit`
+
+**Source:** [`use_cases/microbial_isolates/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/microbial_isolates/)
+
+## Overview
+
+This use case walks through processing microbial isolate short-read sequencing data using DSAgt. The agent registers `fastp` for quality control and `megahit` for assembly, then builds and executes a pipeline against a real isolate dataset.
+
+## Guides
+
+- [Isolate Demo](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/microbial_isolates/isolate_demo.md) — step-by-step walkthrough
+- [Genomics Background](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/microbial_isolates/genomics.md) — domain context
+- [fastp + megahit Best Practices](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/microbial_isolates/fastp_megahit_best_practices.md) — parameter guidance
diff --git a/docs/use-cases/vasp.md b/docs/use-cases/vasp.md
new file mode 100644
index 0000000..9f4ce28
--- /dev/null
+++ b/docs/use-cases/vasp.md
@@ -0,0 +1,15 @@
+# VASP / ISAAC
+
+**Domain:** Materials science — density functional theory
+
+**Tools:** VASP, ISAAC
+
+**Source:** [`use_cases/isaac_vasp/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/isaac_vasp/)
+
+## Overview
+
+This use case covers DFT input/output handling with VASP using DSAgt and the ISAAC workflow system. The agent registers VASP pre/post-processing tools and transfers NEB calculation results into the ISAAC database.
+
+## Guides
+
+- [VASP / ISAAC README](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/isaac_vasp/README.md) — setup and walkthrough
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..5fd83a0
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,61 @@
+site_name: DSAgt
+site_description: DataSmith Agent — AI-assisted data pipeline builder
+site_url: https://ai-modcon.github.io/dsagt
+repo_url: https://github.com/AI-ModCon/dsagt
+repo_name: AI-ModCon/dsagt
+edit_uri: edit/main/docs/
+
+theme:
+  name: material
+  palette:
+    - media: "(prefers-color-scheme: light)"
+      scheme: default
+      primary: indigo
+      accent: indigo
+      toggle:
+        icon: material/brightness-7
+        name: Switch to dark mode
+    - media: "(prefers-color-scheme: dark)"
+      scheme: slate
+      primary: indigo
+      accent: indigo
+      toggle:
+        icon: material/brightness-4
+        name: Switch to light mode
+  features:
+    - navigation.tabs
+    - navigation.sections
+    - navigation.top
+    - search.highlight
+    - search.suggest
+    - content.code.copy
+    - content.code.annotate
+
+markdown_extensions:
+  - admonition
+  - pymdownx.details
+  - pymdownx.superfences
+  - pymdownx.highlight:
+      anchor_linenums: true
+  - pymdownx.inlinehilite
+  - pymdownx.tabbed:
+      alternate_style: true
+  - tables
+  - toc:
+      permalink: true
+
+nav:
+  - Home: index.md
+  - Quick Start: quickstart.md
+  - Architecture: architecture.md
+  - MCP Servers: mcp-servers.md
+  - Knowledge Base: knowledge-base.md
+  - Tools & Skills: tools-skills.md
+  - Observability: observability.md
+  - CLI Reference: cli.md
+  - Use Cases:
+    - Overview: use-cases/index.md
+    - Microbial Isolates: use-cases/microbial-isolates.md
+    - Cryo-EM: use-cases/cryoem.md
+    - VASP / ISAAC: use-cases/vasp.md
+  - Developer Guide: developer.md
diff --git a/pyproject.toml b/pyproject.toml
index dfd4336..bcd0c77 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -70,6 +70,9 @@ dev = [
     "black>=24.0",
     "ruff>=0.5.0",
 ]
+docs = [
+    "mkdocs-material>=9.5",
+]
 
 [build-system]
 requires = ["setuptools>=61.0", "wheel"]