AI-ModCon · ajtritt · Jun 22, 2026 · Jun 16, 2026 · Jun 16, 2026
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,32 @@
+name: Documentation
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  workflow_dispatch:
+
+jobs:
+  docs:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install docs dependencies
+        run: pip install mkdocs-material
+
+      - name: Build docs
+        run: mkdocs build --strict
+
+      - name: Deploy to GitHub Pages
+        if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
+        run: mkdocs gh-deploy --force
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -0,0 +1,43 @@
+# Architecture
+
+![DSAgt architecture](assets/architecture.png)
+
+DSAgt wraps an unmodified agent CLI with four independently operable layers. The Tool Registry and Knowledge Base layers each expose an MCP server, so the agent discovers and invokes their capabilities through the standard MCP tool protocol.
+
+## Layers
+
+**Tool Registry** (`dsagt-registry-server`)
+The agent registers CLI tools as markdown files with YAML frontmatter under `<project>/tools/`. The registry server handles dependency installation via `uv run --with` and wraps every execution with `dsagt-run` for provenance capture. The agent discovers tools via `search_registry`.
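+
+As a minimal sketch of the spec format described above, the snippet below separates the YAML frontmatter of a tool spec from its markdown body using only the standard library. The `split_frontmatter` helper and the `name` / `command` fields are hypothetical illustrations, not the registry server's actual parser or schema.
+
+```python
+from typing import Tuple
+
+
+def split_frontmatter(text: str) -> Tuple[str, str]:
+    """Split a markdown document into (frontmatter, body).
+
+    Returns an empty frontmatter string when the document does not
+    start with a '---' delimiter line or the delimiter is never closed.
+    """
+    lines = text.splitlines()
+    if not lines or lines[0].strip() != "---":
+        return "", text
+    for i in range(1, len(lines)):
+        if lines[i].strip() == "---":
+            frontmatter = "\n".join(lines[1:i])
+            body = "\n".join(lines[i + 1:])
+            return frontmatter, body
+    # No closing delimiter: treat the whole file as body.
+    return "", text
+
+
+# Hypothetical spec; the field names are illustrative assumptions.
+SPEC = """---
+name: csv-dedupe
+command: python tools/code/csv_dedupe.py
+---
+# csv-dedupe
+
+Removes duplicate rows from a CSV file.
+"""
+
+frontmatter, body = split_frontmatter(SPEC)
+print(frontmatter)
+```
+
+The frontmatter string would then be handed to a YAML parser; that step is left out here to keep the sketch dependency-free.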
+
+**Knowledge Base** (`dsagt-knowledge-server`)
+Semantic search over six independently partitioned ChromaDB collections. Three are global (populated by `dsagt setup-kb`); three are per-project (filled automatically during use). Background jobs handle long ingest operations. The agent searches via `kb_search`, ingests via `kb_ingest`, and saves user-confirmed facts via `kb_remember`.
+
+**Provenance** (`dsagt-run`)
+A thin wrapper invoked by the registry server around every tool execution. Records the command, arguments, exit code, duration, file counts, and truncated stderr to `<project>/trace_archive/<record_id>.json` and emits an OTLP span to MLflow. The agent calls `reconstruct_pipeline` to render the trace archive as a reproducible bash script or Snakemake workflow.
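+
+The recording step can be sketched roughly as follows. This is an illustration under assumptions, not the `dsagt-run` implementation: the function name, the JSON field names, and the stderr limit are hypothetical, and file counting and the OTLP span are omitted.
+
+```python
+import json
+import subprocess
+import sys
+import tempfile
+import time
+import uuid
+from pathlib import Path
+
+
+def run_with_provenance(argv, archive_dir, max_stderr=2000):
+    """Run argv, write a JSON record to archive_dir, and return the record."""
+    start = time.monotonic()
+    proc = subprocess.run(argv, capture_output=True, text=True)
+    record = {
+        "record_id": uuid.uuid4().hex,           # hypothetical ID scheme
+        "command": argv[0],
+        "arguments": list(argv[1:]),
+        "exit_code": proc.returncode,
+        "duration_s": round(time.monotonic() - start, 3),
+        "stderr": proc.stderr[:max_stderr],      # truncated, as in the text
+    }
+    archive = Path(archive_dir)
+    archive.mkdir(parents=True, exist_ok=True)
+    path = archive / f"{record['record_id']}.json"
+    path.write_text(json.dumps(record, indent=2))
+    return record
+
+
+# Demo in a temporary directory so nothing is left behind.
+with tempfile.TemporaryDirectory() as tmp:
+    demo = run_with_provenance([sys.executable, "-c", "print('ok')"], tmp)
+    print(demo["exit_code"], sorted(p.name for p in Path(tmp).iterdir()))
+```
+
+Writing one file per record keeps the archive append-only, which is what makes a later pipeline reconstruction from the records straightforward.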
+
+**Observability** (MLflow + OTLP)
+MLflow runs locally at a port pinned at `dsagt init` time. All four layers emit OTLP HTTP spans to MLflow's `/v1/traces` endpoint. The agent's own LLM-call traces land in the same store when you export the `OTEL_EXPORTER_OTLP_ENDPOINT` printed by `dsagt init`.
+
+## Project Layout
+
+```
+~/dsagt-projects/<name>/
+  dsagt_config.yaml             # project configuration
+  tools/                        # registered CLI tool specs (markdown + YAML frontmatter)
+  tools/code/                   # agent-written tool scripts
+  skills/                       # agent skills (SKILL.md + reference docs)
+  trace_archive/                # tool execution records (JSON, from dsagt-run)
+  mlflow/                       # MLflow traces, metrics, artifacts
+  kb_index/                     # knowledge base vector collections
+  explicit_memories.yaml        # user-confirmed facts
+
+  # Per-agent runtime config (one of, generated by dsagt init):
+  #   claude:   CLAUDE.md, .mcp.json
+  #   goose:    goose.yaml, .goosehints
+  #   codex:    AGENTS.md, .codex-data/config.toml
+  #   opencode: AGENTS.md, opencode.json
+  #   roo:      .roomodes, .roo/mcp.json
+  #   cline:    .clinerules/, cline_mcp_settings.json
+```
+
+Projects are registered in `~/.dsagt/projects.yaml` so `dsagt mlflow <name>` and `dsagt info <name>` work from any directory. The data layer is agent-agnostic — re-running `dsagt init <same-name> --agent <other>` switches agent platforms while preserving all accumulated knowledge and traces.
diff --git a/docs/assets/architecture.png b/docs/assets/architecture.png
diff --git a/docs/cli.md b/docs/cli.md
@@ -0,0 +1,49 @@
+# CLI Reference
+
+All commands are available after running `uv sync` and activating the virtual environment (`source .venv/bin/activate`).
+
+## Project Management
+
+| Command | Description |
+|---------|-------------|
+| `dsagt init <name> --agent <platform> [--location <path>] [--mlflow-port N]` | Create a project; write per-agent MCP config; print the launch one-liner |
+| `dsagt list` | List all projects with agent, status, and path |
+| `dsagt info <name> [--json]` | Resolved config (with source per value) and a session/error summary |
+| `dsagt mv <name> <new-location>` | Move a project to a new location |
+| `dsagt rm <name> [-y] [--keep-files]` | Unregister a project and optionally delete its directory |
+
+## Session Lifecycle
+
+| Command | Description |
+|---------|-------------|
+| `dsagt mlflow <name>` | Start MLflow for a project and print OTel routing exports |
+| `dsagt stop <name>` | Stop the MLflow daemon |
+| `dsagt memory --project <name>` | Distill new traces from MLflow into episodic memory |
+
+## Setup
+
+| Command | Description |
+|---------|-------------|
+| `dsagt setup-kb [--collection <name>]` | Build the shared core knowledge base collections |
+| `dsagt smoke-test [--agent claude\|goose\|codex\|opencode]` | End-to-end install verification |
+
+## Project Location
+
+The default project location is `~/dsagt-projects/<name>/`. Override with `--location`:
+
+```bash
+dsagt init my-project --agent claude --location /data/runs   # /data/runs/my-project/
+dsagt init my-project --agent claude --location .            # ./my-project/
+```
+
+## Server Commands
+
+The MCP servers are started through the per-agent MCP config that `dsagt init` writes; the remaining commands are invoked by other DSAgt components. None of them is typically run directly.
+
+| Command | Description |
+|---------|-------------|
+| `dsagt-registry-server` | Tool registry MCP server |
+| `dsagt-knowledge-server` | Knowledge base MCP server |
+| `dsagt-run` | Provenance-capturing tool execution wrapper |
+| `dsagt-proxy` | LiteLLM proxy server (proxy mode only) |
+| `dsagt-setup-kb` | Core knowledge base setup (called by `dsagt setup-kb`) |
diff --git a/docs/developer.md b/docs/developer.md
@@ -0,0 +1,47 @@
+# Developer Guide
+
+Material for contributors and users who are working beyond the default `dsagt init` → `dsagt mlflow` → agent flow.
+
+## Tests
+
+```bash
+uv run python -m pytest -m "not integration"     # unit tests, no creds required
+uv run python -m pytest -m integration -v        # integration tests (require .env)
+```
+
+Integration tests read endpoint and key values from `.env` at the repo root. Copy `.env.example` to `.env` and fill in your values.
+
+For per-flow hand-tests (CLI, proxy mode, VS Code extensions), see the scripts under [`tests/smoke_test/manual_runs/`](https://github.com/AI-ModCon/dsagt/tree/main/tests/smoke_test/manual_runs/).
+
+## Proxy Mode
+
+`dsagt init` followed by `dsagt start <project> --enable-proxy` spawns a LiteLLM proxy in front of your agent's LLM calls. This adds:
+
+- Full LLM-call traces (request bodies, tool-use blocks, response payloads) in MLflow for agents whose native OTel does not emit those payloads (codex, opencode).
+- Cache-breakpoint injection on outgoing requests (Anthropic prompt caching).
+- Sidechannel detection for agent-internal title-generator / session-namer calls.
+- Model-name aliasing — useful when an agent CLI hardcodes a model whitelist incompatible with your gateway's served names (cline, roo).
+
+Proxy mode reads upstream LLM credentials from `.env` or the shell. See [`tests/smoke_test/manual_runs/proxy_walkthrough.md`](https://github.com/AI-ModCon/dsagt/blob/main/tests/smoke_test/manual_runs/proxy_walkthrough.md) for the full setup walkthrough.
+
+## Troubleshooting
+
+**Agent command not found.** The agent CLI is not installed or is not on PATH. See the [supported agents table](index.md#supported-agents).
+
+**MCP servers not connecting.** Verify uv resolves the server commands:
+
+```bash
+uv run which dsagt-registry-server
+uv run which dsagt-knowledge-server
+```
+
+If missing, reinstall: `uv sync --reinstall`.
+
+**MLflow UI empty.** Confirm MLflow is running for the right project:
+
+```bash
+dsagt info <name>           # shows the pinned port
+curl http://localhost:<mlflow_port>
+```
+
+**Claude keychain conflict.** If `claude` will not authenticate against a non-default gateway, run `claude /logout` to clear the macOS Keychain OAuth token, then re-export `ANTHROPIC_BASE_URL` / `ANTHROPIC_API_KEY` and re-launch.
diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,44 @@
+# DSAgt
+
+**D**ata**S**mith **Ag**en**t** — AI-assisted data pipeline builder.
+
+DSAgt connects an MCP-compatible AI coding agent to tool registration, a semantic knowledge base, execution provenance, and observability infrastructure. It provides data-pipeline scaffolding around your existing agent CLI or VS Code extension (Claude Code, Goose, Codex, and others).
+
+## Supported Agents
+
+| Agent | Install | Verify |
+|-------|---------|--------|
+| [Claude Code](https://github.com/anthropics/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude --version` |
+| [Goose](https://github.com/block/goose) | See [Goose docs](https://github.com/block/goose#installation) | `goose --version` |
+| [Codex](https://github.com/openai/codex) | `npm i -g @openai/codex` | `codex --version` |
+| [opencode](https://github.com/sst/opencode) | See [opencode docs](https://opencode.ai/docs/) | `opencode --version` |
+| [Roo Code](https://github.com/RooCodeInc/Roo-Code) | `npm i -g @roo-code/cli` | `roo --version` |
+| [Cline](https://github.com/cline/cline) | `npm i -g cline` | `cline --version` |
+
+## Prerequisites
+
+- Python 3.12–3.13
+- [uv](https://github.com/astral-sh/uv)
+- One of the supported agent platforms above, installed and authenticated against your LLM provider
+
+## Installation
+
+```bash
+git clone https://github.com/AI-ModCon/dsagt.git
+cd dsagt
+uv sync
+source .venv/bin/activate
+```
+
+## Key Capabilities
+
+| Layer | What it does |
+|-------|-------------|
+| **Tool Registry** | Register CLI tools as markdown specs; the agent discovers and runs them via `search_registry` |
+| **Knowledge Base** | Semantic search over indexed document collections (ChromaDB + FAISS) |
+| **Provenance** | `dsagt-run` wrapper records every tool execution to `trace_archive/` and MLflow |
+| **Explicit Memory** | User-confirmed facts persisted to YAML and the knowledge base |
+| **Episodic Memory** | Session distillation via outlier detection over MLflow traces |
+| **Observability** | Full OTLP tracing to a local MLflow instance |
+
+See the [Quick Start](quickstart.md) to try all of these in a single session.
diff --git a/docs/knowledge-base.md b/docs/knowledge-base.md
@@ -0,0 +1,40 @@
+# Knowledge Base
+
+DSAgt maintains six independently partitioned ChromaDB collections. The first three are global (under `~/.dsagt/kb_index/`, populated by `dsagt setup-kb`); the last three are per-project (under `<project>/kb_index/`, populated automatically during use).
+
+## Collections
+
+| Collection | Source | Populated by |
+|---|---|---|
+| **Tool Specs** | Bundled CLI tool specs in `src/dsagt/tools/` | `dsagt setup-kb` |
+| **Skills** | Bundled skill workflows in `src/dsagt/skills/` | `dsagt setup-kb` |
+| **Domain Knowledge** | NeMo Curator + AIDRIN reference corpora; user-ingested docs | `dsagt setup-kb` + agent's `kb_ingest` |
+| **Explicit Memory** | User-confirmed facts | Agent's `kb_remember` (also written to `<project>/explicit_memories.yaml`) |
+| **Episodic Memory** | Distilled facts from MLflow traces | `dsagt memory --project <name>` |
+| **Tool Use Records** | `dsagt-run` execution traces | `dsagt-run` wrapper writes JSON to `<project>/trace_archive/`; indexed by `dsagt memory` |
+
+## Explicit Memory
+
+Explicit memories are facts the user confirms during a session. The agent saves them via `kb_remember`, which writes to both the ChromaDB collection and `<project>/explicit_memories.yaml`. The agent fetches them via `kb_get_memories` on demand (typically when you ask it to recall something) — they are not auto-loaded at session start.
+
+## Episodic Memory
+
+`dsagt memory --project <name>` distills new traces from the project's MLflow store into episodic memory using per-category outlier detection over embedding centroids. Run this after each session to accumulate cross-session memory.
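+
+To make "per-category outlier detection over embedding centroids" concrete, here is one possible reading, sketched with the standard library only. The z-score rule, the threshold of 2.0, the category names, and the two-dimensional vectors are all assumptions for illustration; the actual distillation logic may differ.
+
+```python
+import math
+from collections import defaultdict
+from statistics import mean, pstdev
+
+
+def centroid(vectors):
+    """Component-wise mean of a list of equal-length vectors."""
+    n = len(vectors)
+    return [sum(col) / n for col in zip(*vectors)]
+
+
+def find_outliers(items, z_threshold=2.0):
+    """items is a list of (category, embedding); return outlier indices.
+
+    An item is an outlier when its distance to its category centroid is
+    more than z_threshold standard deviations above the category mean.
+    """
+    by_category = defaultdict(list)
+    for idx, (category, _vec) in enumerate(items):
+        by_category[category].append(idx)
+
+    outliers = []
+    for indices in by_category.values():
+        center = centroid([items[i][1] for i in indices])
+        dists = [math.dist(items[i][1], center) for i in indices]
+        mu, sigma = mean(dists), pstdev(dists)
+        if sigma == 0:
+            continue  # all members equally far from the centroid
+        outliers.extend(
+            i for i, d in zip(indices, dists) if (d - mu) / sigma > z_threshold
+        )
+    return sorted(outliers)
+
+
+# Toy data: eight similar traces and one unusual trace in one category,
+# plus a second category that is too uniform to yield outliers.
+items = [("tool_error", [x, y]) for x, y in
+         [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)] * 2]
+items.append(("tool_error", [10.0, 10.0]))
+items += [("kb_search", [1.0, 0.0]), ("kb_search", [0.0, 1.0])]
+print(find_outliers(items))
+```
+
+Under this reading, the flagged traces are the ones worth distilling, because they differ most from what the category usually looks like.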
+
+## Search
+
+The agent searches all collections via `kb_search` (knowledge MCP server) and writes via `kb_ingest` / `kb_remember`. Tool Specs and Skills are queried through specialized routes (`search_registry`, `search_skills`) over the same backend.
+
+Hybrid search (dense embeddings + sparse BM25 via Reciprocal Rank Fusion) is on by default per collection route. Cross-encoder reranking is optional.
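+
+Reciprocal Rank Fusion itself is simple enough to sketch in a few lines. The function below is a generic illustration rather than the knowledge server's code: the document IDs are made up, and `k = 60` is the constant conventionally used for RRF, assumed here.
+
+```python
+from collections import defaultdict
+from typing import Dict, List, Sequence
+
+
+def reciprocal_rank_fusion(
+    rankings: Sequence[Sequence[str]], k: int = 60
+) -> List[str]:
+    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
+    scores: Dict[str, float] = defaultdict(float)
+    for ranking in rankings:
+        for rank, doc_id in enumerate(ranking, start=1):
+            scores[doc_id] += 1.0 / (k + rank)
+    # Highest fused score first; ties broken by ID for determinism.
+    return sorted(scores, key=lambda d: (-scores[d], d))
+
+
+dense = ["doc_a", "doc_b", "doc_c"]    # hypothetical dense-embedding ranking
+sparse = ["doc_b", "doc_d", "doc_a"]   # hypothetical BM25 ranking
+fused = reciprocal_rank_fusion([dense, sparse])
+print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
+```
+
+Because RRF uses only ranks, the dense and sparse scores never need to be normalized against each other.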
+
+## Setup
+
+```bash
+dsagt setup-kb                       # all global collections (local embedder)
+dsagt setup-kb --collection nemo_curator
+dsagt setup-kb --embedding-backend api \
+    --embedding-base-url <url> \
+    --embedding-api-key <key>
+```
+
+The Tool Specs and Skills collections are wiped and rebuilt on every `setup-kb` run — re-run after upgrading DSAgt to pick up new bundled assets.
diff --git a/docs/mcp-servers.md b/docs/mcp-servers.md
@@ -0,0 +1,38 @@
+# MCP Servers
+
+DSAgt exposes its capabilities through two MCP servers. `dsagt init` configures both in the per-agent runtime file (`.mcp.json` for Claude Code, `goose.yaml` for Goose, etc.), and they start automatically when the agent launches.
+
+## Registry Server
+
+**Command:** `dsagt-registry-server`
+
+Handles tool registration, dependency installation, and tool discovery.
+
+| Tool | Description |
+|------|-------------|
+| `search_registry` | Semantic search over registered tool specs |
+| `save_tool_spec` | Register a new CLI tool as a markdown file with YAML frontmatter |
+| `install_dependencies` | Install tool dependencies via `uv run --with` |
+| `reconstruct_pipeline` | Render the trace archive as a bash script or Snakemake workflow |
+
+Tools are markdown files with YAML frontmatter under `<project>/tools/`. Executables are wrapped with `dsagt-run` for provenance and `uv run --with` for Python dependencies.
+
+## Knowledge Server
+
+**Command:** `dsagt-knowledge-server`
+
+Semantic search and ingestion over indexed document collections.
+
+| Tool | Description |
+|------|-------------|
+| `kb_search` | Search across one or more knowledge collections |
+| `kb_ingest` | Index a file or directory into a named collection (runs in background for large corpora) |
+| `kb_remember` | Save a user-confirmed fact to explicit memory |
+| `kb_get_memories` | Retrieve explicit memories for the current project |
+| `search_skills` | Discover agent skill workflows |
+
+### Backend
+
+The default embedding backend is local (`sentence-transformers`, CPU-only, no API key needed). Set `embedding.backend: api` in `dsagt_config.yaml` to route through a hosted embedder via LiteLLM. Cross-encoder reranking is available via `knowledge.rerank: true`.
+
+Hybrid search (dense + sparse BM25) is on by default and controlled per-route via the `hybrid` flag.
diff --git a/docs/observability.md b/docs/observability.md
@@ -0,0 +1,45 @@
+# Observability
+
+DSAgt provides end-to-end trace visibility through a local MLflow instance. All internal layers emit OTLP HTTP spans to MLflow's `/v1/traces` endpoint.
+
+## Starting MLflow
+
+```bash
+dsagt mlflow <project-name>
+```
+
+Prints the MLflow UI URL and the `export` block for routing agent OTel output. The port is pinned at `dsagt init` time and listed by `dsagt info <name>`.
+
+## Trace Coverage
+
+| Source | Span type | Contents |
+|--------|-----------|----------|
+| Knowledge base | `kb.search`, `kb.embed`, `kb.index_search`, `kb.rerank` | Per-phase timing trees |
+| Tool executions | `tool.execute` | Exit code, duration, file counts, truncated stderr. Full payload in `trace_archive/<record_id>.json` |
+| Registry events | `save_tool_spec`, `install_dependencies`, `reconstruct_pipeline` | Span metadata |
+| Native agent OTel | LLM call spans | Coverage varies by agent (see below) |
+
+### Agent OTel Coverage
+
+Export the variables printed by `dsagt mlflow` before launching your agent:
+
+| Agent | Coverage |
+|-------|----------|
+| claude | Full request/response payloads |
+| goose | Full request/response payloads |
+| codex | Token counts and tool names |
+| opencode | None natively |
+
+Every span carries the project's `session.id` for filtering in the MLflow trace view.
+
+## Provenance and Reconstruction
+
+Tool execution records on disk (`trace_archive/<record_id>.json`) provide the canonical provenance chain. The agent calls `reconstruct_pipeline` to render the archive as a reproducible bash script or Snakemake workflow.
+
+## Stopping MLflow
+
+```bash
+dsagt stop <project-name>
+```
+
+Stops the gunicorn workers and releases the port. The PID is stored in `<project>/.runtime`.