AI-ModCon · aarontuor · Jun 12, 2026 · Jun 12, 2026 · Jun 12, 2026 · Jun 23, 2026
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -21,12 +21,15 @@ jobs:
         with:
           python-version: "3.12"
 
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
       - name: Install docs dependencies
-        run: pip install mkdocs-material
+        run: uv sync --group docs
 
       - name: Build docs
-        run: mkdocs build --strict
+        run: uv run mkdocs build --strict
 
       - name: Deploy to GitHub Pages
         if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
-        run: mkdocs gh-deploy --force
+        run: uv run mkdocs gh-deploy --force
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,120 @@
+# Changelog
+
+All notable changes to DSAgt are documented here. The format is based on
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
+adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+## [0.2.0] - 2026-06-24
+
+This release adds an **external skill-catalog system** and consolidates the
+agent-facing surface into a **single MCP server**.
+
+Agents can now discover and install skills from federated GitHub/GitLab catalogs
+(Genesis, Anthropic, K-Dense, and more) and author their own — while installed
+skills are picked up through each agent's *native* `SKILL.md` auto-discovery, so
+`search_skills` is reserved for the one job native discovery can't do: browsing
+the catalog of skills you haven't installed yet.
+
+In parallel, the registry and knowledge MCP servers — two processes that each
+loaded their own embedder and opened their own ChromaDB (pure duplication, plus
+a write-here/read-there hazard on the shared skill-catalog collections) —
+collapse into one `dsagt-server`: one embedder, one Chroma owner, one connection
+per agent, so startup is faster with fewer moving parts per project.
+
+**Upgrading from 0.1.0 (forwards compatibility).** There is no automatic
+migration — adopting 0.2.0 is rebuild-not-migrate, and no project data changes:
+- Re-run `dsagt start <project>` for each existing project; it regenerates the
+  per-agent MCP config to point at the single `dsagt-server`.
+- For **cline** only, delete `<project>/.cline-data` first — `cline mcp add`
+  has no remove, so the stale `dsagt-registry`/`dsagt-knowledge` entries would
+  otherwise linger next to the new one.
+- Tools, skills, the KB index, traces, and memory all carry over untouched.
+
+### Added
+- **External skill catalogs**: discover and install agent skills from GitHub /
+  GitLab sources via `add_skill_source`, `search_skills`, and `install_skill`
+  (plus the `dsagt skills sync/add/list/search` CLI), backed by per-source
+  ChromaDB collections. Curated sources ship out of the box (`scientific`,
+  `anthropic`, `antigravity`, `composio`, `genesis`); any git URL / `owner/repo`
+  also works.
+- **Genesis catalog integration**: the curated `genesis` source (OSTI GitLab,
+  `gitlab.osti.gov/genesis/genesis-skills`) makes the BASE-Data / ModCon skills
+  — `datacard-generator` (frontmatter name `generating-datacards`),
+  `croissant-validator`, `hdmf-schema-builder` — pullable on demand
+  (`dsagt skills add <project> genesis`, then `install_skill`) rather than
+  bundled in the package, alongside the rest of the Genesis catalog (HPC/Slurm,
+  HuggingFace, LangChain, and more).
+- **Native skill discovery**: installed and bundled skills are mirrored into
+  each agent's native skill directory (`.claude/skills/`, `.agents/skills/`, …)
+  at init/start, so every supported agent auto-discovers them.
+- **`skill-creator`** bundled skill for authoring new skills from the Anthropic
+  template.
+- **Source-qualified catalog install**: when the same skill name exists in more
+  than one synced source, install a specific one with a `<source-slug>/<skill>`
+  name (via `install_skill` or `dsagt skills add <project> <slug>/<skill>`)
+  instead of dead-ending on the ambiguity guard.
+- **Keyword fallback** for `search_skills`: a zero-dependency token-overlap
+  scorer so catalog search works even when no embedding model is configured.
+- **License / attribution provenance on install**: installing a catalog skill
+  preserves upstream `LICENSE` / `NOTICE` files and stamps a `PROVENANCE.txt`
+  recording the source repo and path into the installed skill directory.
+- **`isaac_skills_demo` use case**: an end-to-end, skill-oriented walkthrough
+  (`use_cases/isaac_skills_demo/`) that drives a real agent through syncing a
+  catalog, installing a skill, and converting mock VASP output into an Isaac
+  record — with prompts and mock data included.
+- **Install-from-GitHub instructions** for non-developers (`pip install
+  git+https://github.com/AI-ModCon/dsagt.git` into any Python 3.12/3.13
+  environment) in the README and docs.
+
+### Changed
+- **The two MCP servers are now one `dsagt-server`** — one shared
+  `KnowledgeBase`/embedder, one MCP entry per agent, one trace `service.name`.
+  The tool surface is organized by concern (registry / knowledge / memory /
+  skill) behind the single server.
+- Skill discovery is now **catalog-only**: installed and bundled skills are
+  discovered natively by every supported agent, so `search_skills` covers only
+  the not-yet-installed external catalog. Catalogs are indexed on frontmatter
+  (name + description + tags) rather than the full SKILL.md body.
+- `search_skills` now reports when no external catalog is synced instead of a
+  bare "no match", and `list_skill_sources` flags each known source as
+  `synced`/available with its indexed count.
+- `install_skill` clarifies that an installed skill is usable in the current
+  session immediately — a restart is only needed for hands-free native
+  auto-invocation.
+- The package version is single-sourced from `dsagt.__version__` (pyproject
+  reads it via setuptools dynamic metadata).
+- Documentation home page (`docs/index.md`) pulls the supported-agents table
+  and install instructions directly from the README via the
+  `mkdocs-include-markdown` plugin, so the two no longer drift.
+
+### Removed
+- **BREAKING:** the `dsagt-registry-server` and `dsagt-knowledge-server` console
+  scripts, replaced by `dsagt-server` (see **Upgrading** above).
+- The bundled `datacard-generator` skill — it lives in the Genesis catalog and
+  is now installed on demand via `dsagt skills add <project> genesis`.
+- Dead indexing of installed/bundled skills into the `skills` ChromaDB
+  collection (nothing read it after the catalog-only search change).
+
+### Fixed
+- CLI-added skill sources are now persisted to the project config.
+- `dsagt --version` now works (it was documented but unimplemented, so argparse
+  errored) and reports the version from `dsagt.__version__`.
+- Catalog skills with technically-invalid YAML frontmatter (e.g. an unquoted
+  `description` containing a colon, like `…readiness levels: Level 1…`) are no
+  longer silently dropped from discovery. `_parse_frontmatter` falls back to a
+  lenient flat parse that recovers `name`/`description`/`tags`, so such skills —
+  including Genesis's `generating-datacards` (`datacard-generator`) — are
+  searchable and installable instead of skipped.
+
+## [0.1.0] - 2026-01-11
+
+### Added
+- Initial release: registry and knowledge MCP servers, BYOA per-agent config
+  generation, MLflow/OTel observability, the tool/skill registry, execution
+  provenance, and explicit + episodic memory.
+
+[Unreleased]: https://github.com/AI-ModCon/dsagt/compare/v0.2.0...HEAD
+[0.2.0]: https://github.com/AI-ModCon/dsagt/compare/v0.1.0...v0.2.0
+[0.1.0]: https://github.com/AI-ModCon/dsagt/releases/tag/v0.1.0
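
Review note on the last **Fixed** entry in the changelog above (lenient frontmatter parsing): the sketch below shows one way a flat fallback parse could recover `name`/`description`/`tags` when an unquoted value contains a colon. It is only an illustration under stated assumptions. `parse_frontmatter_lenient` is a hypothetical name, the sample frontmatter is made up, and nothing here is taken from the project's actual `_parse_frontmatter`.

```python
import re

# Hypothetical helper for illustration; not the project's _parse_frontmatter.
_KEYS = ("name", "description", "tags")


def parse_frontmatter_lenient(text: str) -> dict:
    """Recover name/description/tags from a frontmatter block without a YAML parser.

    Each `key: value` line is split on the FIRST colon only, so an unquoted
    value that itself contains a colon is kept whole.
    """
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", text, re.DOTALL)
    if not match:
        return {}
    result: dict = {}
    for line in match.group(1).splitlines():
        # Skip blanks, comments, and indented (nested or continuation) lines.
        if not line.strip() or line.startswith((" ", "\t", "#")):
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in _KEYS:
            continue
        value = value.strip().strip("\"'")
        if key == "tags":
            # Accept the inline-list form `[a, b]` or a bare comma-separated string.
            value = [
                tag.strip().strip("\"'")
                for tag in value.strip("[]").split(",")
                if tag.strip()
            ]
        result[key] = value
    return result


# Made-up sample: the description is unquoted and contains a colon.
SAMPLE = (
    "---\n"
    "name: example-skill\n"
    "description: Assess readiness levels: Level 1 to Level 3\n"
    "tags: [data, cards]\n"
    "---\n"
    "# Body\n"
)
parsed = parse_frontmatter_lenient(SAMPLE)
```

Splitting on the first colon only is the design choice that matters here: a strict YAML parser rejects the unquoted description, whereas the flat parse treats everything after the first colon as the value.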
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,116 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+DSAGT (DataSmith Agent) is an AI-assisted data pipeline builder that exposes a single MCP (Model Context Protocol) server, `dsagt-server`, to agent platforms (Claude Code, Goose, Roo, Cline, Codex). It helps domain scientists create reproducible, auditable data curation pipelines through iterative, knowledge-driven tool generation.
+
+## Two run modes
+
+1. **BYOA (Bring Your Own Agent)** — default for everyday use. `dsagt init --agent <name>` writes per-agent MCP config artifacts; `dsagt mlflow <project>` backgrounds MLflow and prints the OTel routing exports the user pastes into the shell that runs `claude` / `goose` / etc. Project / agent / session_id are read from `<project>/dsagt_config.yaml` + `.runtime` (single source of truth, no env-var duplication). `dsagt memory --project X` extracts episodic memory from accumulated traces — but only from proxy-shape traces (see #2).
+2. **Proxy mode** — `dsagt start --enable-proxy <project>` interposes a LiteLLM proxy between the agent and its LLM provider. The proxy autologs every LLM call into MLflow with `mlflow.spanInputs` / `mlflow.spanOutputs` populated, which is the only trace shape `dsagt memory` knows how to extract from. Use this when you want both (a) request/response columns populated in the MLflow UI and (b) episodic memory extraction. Native agent OTel emission (Claude Code, Goose) is visible in the UI but uses a different shape (`api_response_body` log events), so memory extraction skips those traces.
+
+## Commands
+
+```bash
+uv sync --all-groups                                                  # install
+uv run --no-sync python -m pytest tests/test_<file>.py -q             # targeted tests
+uv run black .                                                         # format
+uv run ruff check .                                                    # lint
+```
+
+**`python -m pytest` not bare `pytest`.** The bare-binary form picks up the wrong pytest on this machine and crashes with `ModuleNotFoundError: dsagt`.
+
+**Don't run the full suite by default.** ~50s for 547 tests is too slow for an iteration loop. Run only the test file relevant to the change. `tests/test_config.py` covers session, init, agents, BYOA hints, launch shim, and memory state. Skip `test_integration.py`, `test_*_integration.py`, `test_server_startup.py`, `test_dependency_integration.py` unless explicitly relevant — they hit network or spawn subprocesses.
+
+## Code Organization
+
+The codebase separates **commands** (entry points with argparse, launched as CLI tools or subprocesses) from **modules** (importable logic). Commands live in `src/dsagt/commands/`, modules live in `src/dsagt/`.
+
+**Commands** (`src/dsagt/commands/`):
+- `cli.py` — `dsagt init / mlflow / memory / info / list / mv / rm / setup-kb / smoke-test / stop / start` (user-facing CLI). `dsagt start --enable-proxy` activates proxy mode; without `--enable-proxy` it's the supervised BYOA equivalent (start MLflow + agent under one process tree).
+- `proxy_server.py` — `dsagt-proxy` (LiteLLM proxy with OTel autolog). Spawned by `dsagt start --enable-proxy`.
+- `run_tool.py` — `dsagt-run` (tool execution wrapper).
+- `setup_core_kb.py` — core KB setup (called via `dsagt setup-kb`).
+- `info.py` — `dsagt info` (project / config introspection).
+
+(The MCP server — `dsagt-server` — lives in the `src/dsagt/mcp/` package, see below.)
+
+**Modules** (`src/dsagt/`):
+- `session.py` — Project init, agent config generation, env-var resolution, config load/validate, service start/stop, end-of-session memory extraction orchestration.
+- `agents/` — Per-agent-platform setup (`base.py` ABC + `claude.py` / `goose.py` / `cline.py` / `roo.py` / `codex.py`). Each subclass owns its `write_static`, `write_dynamic`, `env_overrides`, `byoa_env_hints`, `launch_oneliner`. Shared helpers (`_mcp_env_block`, `_render_launch_shim`, `_build_mcp_servers_dict`) in `base.py`.
+- `knowledge.py` — FAISS/ChromaDB document retrieval, embedding backends, per-collection routing.
+- `registry.py` — `ToolRegistry` (CLI tools) + `SkillRegistry` (agent instruction skills), KB indexing.
+- `provenance.py` — Tool execution records (`run_and_record`, `ToolRecordStore`), execution-record indexing into ChromaDB, pipeline reconstruction (`reconstruct_pipeline`, dependency graph).
+- `observability.py` — MLflow / OTel tracing setup, `init_tracing`, span helpers.
+- `memory.py` — Explicit memory (YAML), episodic-memory extraction prompt + LLM call, outlier detection, `extract_session`.
+- `skills.py` — External skill catalog data plane (`SkillsCatalog`: clone/sync/index/install), the `SkillRouter` render facade, and the Genesis-derived keyword scorer (`rank_skills`).
+
+**MCP server** (`src/dsagt/mcp/`) — the single merged `dsagt-server`. `server.py` owns `main()`, the shared-KB startup (`_build_kb_from_config`), and the dispatch shell (`build_dispatch_server`); the tool surface is split by concern across `registry_tools.py` (tool registry + execution + provenance, 8 tools), `knowledge_tools.py` (KB retrieval, 6), `memory_tools.py` (explicit memory + suggestions, 4), and `skill_tools.py` (skill search/install/sources, 5). Each `*_tools.py` exposes a `_*_tools_and_handlers()` factory (composed by `create_dsagt_server`) plus a `create_*_server` test wrapper.
+
+Entry points are defined in `pyproject.toml` `[project.scripts]`: the CLI/proxy/run/setup-kb tools point to `dsagt.commands.*:main`, and `dsagt-server` points to `dsagt.mcp.server:main`.
+
+**Bundled assets** (shipped as `package-data` in `pyproject.toml`):
+- `src/dsagt/tools/` — built-in tool specs (markdown + YAML frontmatter) copied into new projects.
+- `src/dsagt/skills/` — built-in skills (e.g., `skill-creator`) mirrored into each agent's native skill directory at init/start, where the agent auto-discovers them.
+- `src/dsagt/dsagt_instructions.md` — agent-platform-agnostic system instructions injected into per-agent files at init.
+
+**`use_cases/`** holds end-to-end domain walkthroughs (`microbial_isolates/`, `cryoem/`, `isaac_vasp/`). They are reference material for users, not part of the test suite. `isaac_vasp/` is currently in active development on this branch.
+
+## BYOA artifacts
+
+`dsagt init --agent X --location <path>` writes (in the project dir):
+- `dsagt_config.yaml` — internal config (project name, agent, mlflow port pinned at init, embedding/knowledge/extraction settings). No user-facing fields, no credentials.
+- Per-agent instructions file (e.g., `CLAUDE.md`, `.goosehints`, `AGENTS.md`).
+- Per-agent MCP config artifact (`.mcp.json` for claude, `goose.yaml` for goose, `cline_mcp_settings.json` via `cline mcp add`, `.roo/mcp.json`, `.codex-data/config.toml`). All include the env block (DSAGT_PROJECT, DSAGT_PROJECT_DIR, MLFLOW_TRACKING_URI, EMBEDDING_*) so MCP-server children that don't inherit shell env still log to the right MLflow.
+- `dsagt-launch.sh` — bash shim that exports all dsagt-internal env (DSAGT_*, MLFLOW_*, OTEL_*, agent-specific telemetry verbosity flags), resolves the OTel experiment-id header at run time via curl, then execs the agent. The user runs this directly to launch.
+
+`dsagt memory --project X` tracks a high-water-mark in `<pdir>/.dsagt/extracted_at.json` so re-runs only process new traces.
+
+## Architecture
+
+### MCP Server
+
+A single merged `dsagt-server` (`src/dsagt/mcp/`) exposes 23 tools across four concern modules under one `Server` + one shared `KnowledgeBase`:
+
+1. **Registry tools** (`mcp/registry_tools.py` + `registry.py` / `provenance.py`) — tool analysis, registration, dependency installation, command/file/http execution, and pipeline reconstruction. Tools are saved as markdown specs with YAML frontmatter.
+2. **Knowledge tools** (`mcp/knowledge_tools.py` + `knowledge.py`) — semantic search over document collections (FAISS + ChromaDB, optional cross-encoder reranking); long ops run as background jobs.
+3. **Memory tools** (`mcp/memory_tools.py` + `memory.py`) — explicit memory + outlier suggestions (`kb_remember` / `kb_get_memories` / …).
+4. **Skill tools** (`mcp/skill_tools.py` + `skills.py`) — skill search / install + external catalog sources.
+
+### Observability
+
+- **MLflow** — Token usage, cost, latency, full LLM-call traces via OTel. Started by `dsagt mlflow <project>` (foreground, in its own terminal). Port is pinned at init time and lives in `dsagt_config.yaml`.
+- **dsagt-run** (`commands/run_tool.py` + `provenance.py`) — Wraps tool commands; captures execution layer (command, stdout/stderr, timing, file lists) into `trace_archive/`.
+- **MCP-server OTel** — `dsagt-server` calls `init_tracing()` at startup; its tool spans (kb.*, registry.*) flow to MLflow alongside the agent's LLM-call spans.
+
+### Memory System
+
+- **Episodic memory** (`memory.py:extract_session`) — End-of-session LLM extraction of facts from MLflow traces into ChromaDB, with per-category outlier detection via embedding centroids. Triggered by `dsagt memory --project X`.
+- **Explicit memory** (`memory.py:ExplicitMemory`) — User-confirmed facts in YAML, loaded into agent context at session start.
+
+### Key Design Patterns
+
+- **Agent-agnostic**: DSAGT is infrastructure, not an agent. Capabilities are MCP services.
+- **Session isolation**: Each project gets its own directory with config, tools, skills, kb_index, trace_archive, mlflow data.
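+- **Tools vs Skills**: Tools are CLI executables in `<project>/tools/` (specs with parameters, wrapped by dsagt-run). Skills are agent instruction workflows in `<project>/skills/` (SKILL.md + reference docs). Tools are discoverable via ChromaDB-backed semantic search; installed and bundled skills are discovered natively by each agent, and `search_skills` covers only the not-yet-installed external catalog.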
+
+## DSAGT Pipeline Builder Workflow
+
+When acting as a pipeline builder (using the MCP server), follow these constraints:
+
+1. **Never directly access data** — all data operations go through registered tools.
+2. **Tool preference hierarchy**: Registered tool → KB package tool → Custom implementation.
+3. **Generate paired tools** — every data operation gets a check tool (pre/post audit) and an operation tool.
+4. **Audit everything** — before/after JSON reports saved to `audit/`.
+5. **One step at a time** — iterate with the user, confirming approach before execution.
+
+## Testing Patterns
+
+- Tests use pytest with `subprocess.run` mocking for command execution.
+- MCP server tests invoke handlers directly (no stdio transport).
+- Async tests for server handlers.
+- Temp directories for isolation; the `_use_tmp_registry` fixture in `tests/test_config.py` patches `DEFAULT_PROJECTS_BASE` and the project registry to `tmp_path`.
+- Integration tests in `test_*_integration.py` require real `EMBEDDING_*` / `LLM_*` credentials.
+- A handful of tests are class-skipped under `TestProviderEnvInjection` with a long reason about "old-code-shape env_overrides" — those describe the pre-Phase-1 design where `env_overrides` did broad provider-credential translation, which is now narrower (model env-var pinning only; provider creds + base URLs come via `proxy_env_overrides` or per-agent config files). Kept around for reference; safe to delete once Phase 2 is stable on real workloads.
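
Review note on the keyword scorer that CLAUDE.md lists under `skills.py` (`rank_skills`) and that the changelog describes as a zero-dependency token-overlap fallback for `search_skills`: a minimal sketch of that technique, assuming skills are plain dicts with `name`, `description`, and `tags`. `rank_by_token_overlap`, the dict shape, and the sample skills are hypothetical and are not the project's `rank_skills` implementation.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_by_token_overlap(
    query: str, skills: list[dict], limit: int = 5
) -> list[tuple[float, str]]:
    """Score each skill by the fraction of query tokens found in its frontmatter text."""
    query_tokens = _tokens(query)
    if not query_tokens:
        return []
    scored = []
    for skill in skills:
        # Index only name + description + tags, not the full skill body.
        text = " ".join(
            [
                skill.get("name", ""),
                skill.get("description", ""),
                " ".join(skill.get("tags", [])),
            ]
        )
        overlap = len(query_tokens & _tokens(text)) / len(query_tokens)
        if overlap > 0:
            scored.append((overlap, skill["name"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]


# Made-up catalog entries for illustration only.
CATALOG = [
    {"name": "csv-cleaner", "description": "Clean and normalize CSV tables", "tags": ["tabular"]},
    {"name": "plot-helper", "description": "Make quick plots", "tags": ["viz"]},
]
exact = rank_by_token_overlap("clean csv tables", CATALOG)
mixed = rank_by_token_overlap("quick csv plots", CATALOG)
```

Because the score is plain set intersection over standard-library regex tokens, a search of this kind needs no embedding model, at the cost of missing synonyms and inflected forms that an embedding search would catch.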