16 commits
365f962
feat(skills): external skill catalogs, native discovery, skill-creator
aarontuor Jun 12, 2026
b6e5598
fix(skills): persist CLI-added sources to config; demo prompt script
aarontuor Jun 12, 2026
a049992
docs(skills): demo rebuild block must sync the catalog
aarontuor Jun 12, 2026
752b959
feat(skills): clarify catalog discovery and install signals
aarontuor Jun 23, 2026
6af5d0b
chore(release): 0.2.0 with single-sourced version
aarontuor Jun 23, 2026
f7fc77c
docs: add pip-from-github install, sync README/docs, build on uv
aarontuor Jun 23, 2026
4b81e43
refactor(skills,mcp): merge MCP servers into one dsagt-server; Skills…
aarontuor Jun 24, 2026
95a49fa
chore(release): 0.3.0 — single dsagt-server + catalog-only skill disc…
aarontuor Jun 24, 2026
8030e7a
docs: single dsagt-server sweep + skill-routing figure in Tools & Skills
aarontuor Jun 24, 2026
433bab6
fix(cli): implement `dsagt --version` (single-sourced from __version__)
aarontuor Jun 24, 2026
7fccb46
feat(skills): source-qualified catalog install to disambiguate cross-…
aarontuor Jun 24, 2026
d7fda87
docs(changelog): note source-qualified install + dsagt --version unde…
aarontuor Jun 24, 2026
c5c1bd2
docs: mention skills discovery/creation in the README tagline
aarontuor Jun 24, 2026
acb4dde
refactor(mcp,skills): split the MCP server into a concern-based mcp/ …
aarontuor Jun 24, 2026
c30ba80
fix(skills): recover catalog skills with technically-invalid YAML fro…
aarontuor Jun 24, 2026
1bbc7ff
chore(release): 0.2.0 — external skill catalogs + single dsagt-server
aarontuor Jun 24, 2026
9 changes: 6 additions & 3 deletions .github/workflows/docs.yml
@@ -21,12 +21,15 @@ jobs:
  with:
  python-version: "3.12"

+ - name: Install uv
+ uses: astral-sh/setup-uv@v5
+
  - name: Install docs dependencies
- run: pip install mkdocs-material
+ run: uv sync --group docs

  - name: Build docs
- run: mkdocs build --strict
+ run: uv run mkdocs build --strict

  - name: Deploy to GitHub Pages
  if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
- run: mkdocs gh-deploy --force
+ run: uv run mkdocs gh-deploy --force
120 changes: 120 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,120 @@
# Changelog

All notable changes to DSAgt are documented here. The format is based on
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.2.0] - 2026-06-24

This release adds an **external skill-catalog system** and consolidates the
agent-facing surface into a **single MCP server**.

Agents can now discover and install skills from federated GitHub/GitLab catalogs
(Genesis, Anthropic, K-Dense, and more) and author their own — while installed
skills are picked up through each agent's *native* `SKILL.md` auto-discovery, so
`search_skills` is reserved for the one job native discovery can't do: browsing
the catalog of skills you haven't installed yet.

In parallel, the registry and knowledge MCP servers — two processes that each
loaded their own embedder and opened their own ChromaDB (pure duplication, plus
a write-here/read-there hazard on the shared skill-catalog collections) —
collapse into one `dsagt-server`: one embedder, one Chroma owner, one connection
per agent, so startup is faster with fewer moving parts per project.

**Upgrading from 0.1.0 (forwards compatibility).** There is no automatic
migration — adopting 0.2.0 is rebuild-not-migrate, and no project data changes:
- Re-run `dsagt start <project>` for each existing project; it regenerates the
per-agent MCP config to point at the single `dsagt-server`.
- For **cline** only, delete `<project>/.cline-data` first — `cline mcp add`
has no remove, so the stale `dsagt-registry`/`dsagt-knowledge` entries would
otherwise linger next to the new one.
- Tools, skills, the KB index, traces, and memory all carry over untouched.

### Added
- **External skill catalogs**: discover and install agent skills from GitHub /
GitLab sources via `add_skill_source`, `search_skills`, and `install_skill`
(plus the `dsagt skills sync/add/list/search` CLI), backed by per-source
ChromaDB collections. Curated sources ship out of the box (`scientific`,
`anthropic`, `antigravity`, `composio`, `genesis`); any git URL / `owner/repo`
also works.
- **Genesis catalog integration**: the curated `genesis` source (OSTI GitLab,
`gitlab.osti.gov/genesis/genesis-skills`) makes the BASE-Data / ModCon skills
— `datacard-generator` (frontmatter name `generating-datacards`),
`croissant-validator`, `hdmf-schema-builder` — pullable on demand
(`dsagt skills add <project> genesis`, then `install_skill`) rather than
bundled in the package, alongside the rest of the Genesis catalog (HPC/Slurm,
HuggingFace, LangChain, and more).
- **Native skill discovery**: installed and bundled skills are mirrored into
each agent's native skill directory (`.claude/skills/`, `.agents/skills/`, …)
at init/start, so every supported agent auto-discovers them.
- **`skill-creator`** bundled skill for authoring new skills from the Anthropic
template.
- **Source-qualified catalog install**: when the same skill name exists in more
than one synced source, install a specific one with a `<source-slug>/<skill>`
name (via `install_skill` or `dsagt skills add <project> <slug>/<skill>`)
instead of dead-ending on the ambiguity guard.
- **Keyword fallback** for `search_skills`: a zero-dependency token-overlap
scorer so catalog search works even when no embedding model is configured.
- **License / attribution provenance on install**: installing a catalog skill
preserves upstream `LICENSE` / `NOTICE` files and stamps a `PROVENANCE.txt`
recording the source repo and path into the installed skill directory.
- **`isaac_skills_demo` use case**: an end-to-end, skill-oriented walkthrough
(`use_cases/isaac_skills_demo/`) that drives a real agent through syncing a
catalog, installing a skill, and converting mock VASP output into an Isaac
record — with prompts and mock data included.
- **Install-from-GitHub instructions** for non-developers (`pip install
git+https://github.com/AI-ModCon/dsagt.git` into any Python 3.12/3.13
environment) in the README and docs.
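
The keyword fallback listed above can be pictured with a minimal token-overlap sketch. This is an illustration under assumptions, not the project's `rank_skills` implementation: the function names, the scoring formula (fraction of query tokens found in the frontmatter text), and the sample skill descriptions are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap_score(query: str, skill: dict) -> float:
    """Fraction of query tokens that appear in the skill's frontmatter text."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    # Assumption: only name, description, and tags are scored.
    haystack = " ".join(
        [skill.get("name", ""), skill.get("description", ""), " ".join(skill.get("tags", []))]
    )
    return len(query_tokens & tokenize(haystack)) / len(query_tokens)


def rank_by_overlap(query: str, skills: list[dict], top_k: int = 5) -> list[dict]:
    """Return up to top_k skills with a non-zero score, best match first."""
    scored = [(token_overlap_score(query, s), s) for s in skills]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


# Illustrative catalog entries; the descriptions are made up for this sketch.
catalog = [
    {"name": "croissant-validator", "description": "Validate Croissant metadata files", "tags": ["metadata"]},
    {"name": "hdmf-schema-builder", "description": "Build HDMF schemas", "tags": ["schema"]},
]
matches = rank_by_overlap("validate croissant metadata", catalog)
```

Because the scorer needs only the standard library, a search of this kind keeps working when no embedding model is configured, at the cost of missing synonyms that an embedding search would catch.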

### Changed
- **The two MCP servers are now one `dsagt-server`** — one shared
`KnowledgeBase`/embedder, one MCP entry per agent, one trace `service.name`.
The tool surface is organized by concern (registry / knowledge / memory /
skill) behind the single server.
- Skill discovery is now **catalog-only**: installed and bundled skills are
discovered natively by every supported agent, so `search_skills` covers only
the not-yet-installed external catalog. Catalogs are indexed on frontmatter
(name + description + tags) rather than the full SKILL.md body.
- `search_skills` now reports when no external catalog is synced instead of a
bare "no match", and `list_skill_sources` flags each known source as
`synced`/available with its indexed count.
- `install_skill` clarifies that an installed skill is usable in the current
session immediately — a restart is only needed for hands-free native
auto-invocation.
- The package version is single-sourced from `dsagt.__version__` (pyproject
reads it via setuptools dynamic metadata).
- Documentation home page (`docs/index.md`) pulls the supported-agents table
and install instructions directly from the README via the
`mkdocs-include-markdown` plugin, so the two no longer drift.
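
One way a single-sourced version can be surfaced on the command line is argparse's built-in `version` action. The sketch below is an assumption about the general shape, not the project's CLI code: `__version__` is a local stand-in for `dsagt.__version__`, and the printed format is illustrative.

```python
import argparse

# Stand-in for `dsagt.__version__`; in the package the value is defined once
# and read by both the CLI and the build metadata.
__version__ = "0.2.0"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --version flag prints the single-sourced version."""
    parser = argparse.ArgumentParser(prog="dsagt")
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {__version__}",
    )
    return parser
```

With this pattern the flag exits immediately after printing, so no subcommand is required when only the version is requested.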

### Removed
- **BREAKING:** the `dsagt-registry-server` and `dsagt-knowledge-server` console
scripts, replaced by `dsagt-server` (see **Upgrading** above).
- The bundled `datacard-generator` skill — it lives in the Genesis catalog and
is now installed on demand via `dsagt skills add <project> genesis`.
- Dead indexing of installed/bundled skills into the `skills` ChromaDB
collection (nothing read it after the catalog-only search change).

### Fixed
- CLI-added skill sources are now persisted to the project config.
- `dsagt --version` now works (it was documented but unimplemented — argparse
errored). Reports the version from `dsagt.__version__`.
- Catalog skills with technically-invalid YAML frontmatter (e.g. an unquoted
`description` containing a colon, like `…readiness levels: Level 1…`) are no
longer silently dropped from discovery. `_parse_frontmatter` falls back to a
lenient flat parse that recovers `name`/`description`/`tags`, so such skills —
including Genesis's `generating-datacards` (`datacard-generator`) — are
searchable and installable instead of skipped.
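
The lenient fallback described in the last fix can be sketched as a flat, line-by-line parse. This is a hedged illustration of the idea only: `parse_frontmatter_lenient` is a hypothetical name, the strict YAML attempt that would run first is omitted, and the sample description is invented to mimic the unquoted-colon case.

```python
def parse_frontmatter_lenient(text: str) -> dict:
    """Recover name/description/tags from a frontmatter block line by line."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Only top-level "key: value" lines are considered. Splitting on the
        # FIRST colon keeps any later colons inside the value verbatim.
        if line[:1].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ("name", "description"):
            fields[key] = value.strip("\"'")
        elif key == "tags":
            # Assumption: tags use the inline form, e.g. [a, b].
            inner = value.strip("[]")
            fields[key] = [t.strip().strip("\"'") for t in inner.split(",") if t.strip()]
    return fields


sample = """---
name: generating-datacards
description: Scores readiness levels: Level 1 to Level 5
tags: [datacard, metadata]
---
# Body text
"""
recovered = parse_frontmatter_lenient(sample)
```

The sketch ignores nested keys and block-style tag lists, which is acceptable for a fallback whose only job is to keep a skill visible in search.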

## [0.1.0] - 2026-01-11

### Added
- Initial release: registry and knowledge MCP servers, BYOA per-agent config
generation, MLflow/OTel observability, the tool/skill registry, execution
provenance, and explicit + episodic memory.

[Unreleased]: https://github.com/AI-ModCon/dsagt/compare/v0.2.0...HEAD
[0.2.0]: https://github.com/AI-ModCon/dsagt/compare/v0.1.0...v0.2.0
[0.1.0]: https://github.com/AI-ModCon/dsagt/releases/tag/v0.1.0
116 changes: 116 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,116 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

DSAGT (DataSmith Agent) is an AI-assisted data pipeline builder that exposes a single MCP (Model Context Protocol) server, `dsagt-server`, to agent platforms (Claude Code, Goose, Roo, Cline, Codex). It helps domain scientists create reproducible, auditable data curation pipelines through iterative, knowledge-driven tool generation.

## Two run modes

1. **BYOA (Bring Your Own Agent)** — default for everyday use. `dsagt init --agent <name>` writes per-agent MCP config artifacts; `dsagt mlflow <project>` backgrounds MLflow and prints the OTel routing exports the user pastes into the shell that runs `claude` / `goose` / etc. Project / agent / session_id are read from `<project>/dsagt_config.yaml` + `.runtime` (single source of truth, no env-var duplication). `dsagt memory --project X` extracts episodic memory from accumulated traces — but only from proxy-shape traces (see #2).
2. **Proxy mode** — `dsagt start --enable-proxy <project>` interposes a LiteLLM proxy between the agent and its LLM provider. The proxy autologs every LLM call into MLflow with `mlflow.spanInputs` / `mlflow.spanOutputs` populated, which is the only trace shape `dsagt memory` knows how to extract from. Use this when you want both (a) request/response columns populated in the MLflow UI and (b) episodic memory extraction. Native agent OTel emission (Claude Code, Goose) is visible in the UI but uses a different shape (`api_response_body` log events), so memory extraction skips those traces.

## Commands

```bash
uv sync --all-groups # install
uv run --no-sync python -m pytest tests/test_<file>.py -q # targeted tests
uv run black . # format
uv run ruff check . # lint
```

**`python -m pytest` not bare `pytest`.** The bare-binary form picks up the wrong pytest on this machine and crashes with `ModuleNotFoundError: dsagt`.

**Don't run the full suite by default.** ~50s for 547 tests is too slow for an iteration loop. Run only the test file relevant to the change. `tests/test_config.py` covers session, init, agents, BYOA hints, launch shim, and memory state. Skip `test_integration.py`, `test_*_integration.py`, `test_server_startup.py`, `test_dependency_integration.py` unless explicitly relevant — they hit network or spawn subprocesses.

## Code Organization

The codebase separates **commands** (entry points with argparse, launched as CLI tools or subprocesses) from **modules** (importable logic). Commands live in `src/dsagt/commands/`, modules live in `src/dsagt/`.

**Commands** (`src/dsagt/commands/`):
- `cli.py` — `dsagt init / mlflow / memory / info / list / mv / rm / setup-kb / smoke-test / stop / start` (user-facing CLI). `dsagt start --enable-proxy` activates proxy mode; without `--enable-proxy` it's the supervised BYOA equivalent (start MLflow + agent under one process tree).
- `proxy_server.py` — `dsagt-proxy` (LiteLLM proxy with OTel autolog). Spawned by `dsagt start --enable-proxy`.
- `run_tool.py` — `dsagt-run` (tool execution wrapper).
- `setup_core_kb.py` — core KB setup (called via `dsagt setup-kb`).
- `info.py` — `dsagt info` (project / config introspection).

(The MCP server — `dsagt-server` — lives in the `src/dsagt/mcp/` package, see below.)

**Modules** (`src/dsagt/`):
- `session.py` — Project init, agent config generation, env-var resolution, config load/validate, service start/stop, end-of-session memory extraction orchestration.
- `agents/` — Per-agent-platform setup (`base.py` ABC + `claude.py` / `goose.py` / `cline.py` / `roo.py` / `codex.py`). Each subclass owns its `write_static`, `write_dynamic`, `env_overrides`, `byoa_env_hints`, `launch_oneliner`. Shared helpers (`_mcp_env_block`, `_render_launch_shim`, `_build_mcp_servers_dict`) in `base.py`.
- `knowledge.py` — FAISS/ChromaDB document retrieval, embedding backends, per-collection routing.
- `registry.py` — `ToolRegistry` (CLI tools) + `SkillRegistry` (agent instruction skills), KB indexing.
- `provenance.py` — Tool execution records (`run_and_record`, `ToolRecordStore`), execution-record indexing into ChromaDB, pipeline reconstruction (`reconstruct_pipeline`, dependency graph).
- `observability.py` — MLflow / OTel tracing setup, `init_tracing`, span helpers.
- `memory.py` — Explicit memory (YAML), episodic-memory extraction prompt + LLM call, outlier detection, `extract_session`.
- `skills.py` — External skill catalog data plane (`SkillsCatalog`: clone/sync/index/install), the `SkillRouter` render facade, and the Genesis-derived keyword scorer (`rank_skills`).

**MCP server** (`src/dsagt/mcp/`) — the single merged `dsagt-server`. `server.py` owns `main()`, the shared-KB startup (`_build_kb_from_config`), and the dispatch shell (`build_dispatch_server`); the tool surface is split by concern across `registry_tools.py` (tool registry + execution + provenance, 8 tools), `knowledge_tools.py` (KB retrieval, 6), `memory_tools.py` (explicit memory + suggestions, 4), and `skill_tools.py` (skill search/install/sources, 5). Each `*_tools.py` exposes a `_*_tools_and_handlers()` factory (composed by `create_dsagt_server`) plus a `create_*_server` test wrapper.

Entry points are defined in `pyproject.toml` `[project.scripts]`: the CLI/proxy/run/setup-kb tools point to `dsagt.commands.*:main`, and `dsagt-server` points to `dsagt.mcp.server:main`.

**Bundled assets** (shipped as `package-data` in `pyproject.toml`):
- `src/dsagt/tools/` — built-in tool specs (markdown + YAML frontmatter) copied into new projects.
- `src/dsagt/skills/` — built-in skills (e.g., `skill-creator`), mirrored into each agent's native skill directory at init/start so the agent auto-discovers them.
- `src/dsagt/dsagt_instructions.md` — agent-platform-agnostic system instructions injected into per-agent files at init.

**`use_cases/`** holds end-to-end domain walkthroughs (`microbial_isolates/`, `cryoem/`, `isaac_vasp/`). They are reference material for users, not part of the test suite. `isaac_vasp/` is currently in active development on this branch.

## BYOA artifacts

`dsagt init --agent X --location <path>` writes (in the project dir):
- `dsagt_config.yaml` — internal config (project name, agent, mlflow port pinned at init, embedding/knowledge/extraction settings). No user-facing fields, no credentials.
- Per-agent instructions file (e.g., `CLAUDE.md`, `.goosehints`, `AGENTS.md`).
- Per-agent MCP config artifact (`.mcp.json` for claude, `goose.yaml` for goose, `cline_mcp_settings.json` via `cline mcp add`, `.roo/mcp.json`, `.codex-data/config.toml`). All include the env block (DSAGT_PROJECT, DSAGT_PROJECT_DIR, MLFLOW_TRACKING_URI, EMBEDDING_*) so MCP-server children that don't inherit shell env still log to the right MLflow.
- `dsagt-launch.sh` — bash shim that exports all dsagt-internal env (DSAGT_*, MLFLOW_*, OTEL_*, agent-specific telemetry verbosity flags), resolves the OTel experiment-id header at run time via curl, then execs the agent. The user runs this directly to launch.

`dsagt memory --project X` tracks a high-water-mark in `<pdir>/.dsagt/extracted_at.json` so re-runs only process new traces.
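
The high-water-mark idea can be sketched as follows. This is a minimal illustration under assumptions: the JSON key `extracted_at`, the `timestamp` field on each trace, and both function names are hypothetical, and the real file layout may differ.

```python
import json
from pathlib import Path


def load_high_water_mark(state_file: Path) -> float:
    """Return the last processed trace timestamp, or 0.0 on the first run."""
    if not state_file.exists():
        return 0.0
    return float(json.loads(state_file.read_text()).get("extracted_at", 0.0))


def select_new_traces(traces: list[dict], state_file: Path) -> list[dict]:
    """Return traces newer than the mark and advance the mark past them."""
    mark = load_high_water_mark(state_file)
    fresh = [t for t in traces if t["timestamp"] > mark]
    if fresh:
        state_file.parent.mkdir(parents=True, exist_ok=True)
        new_mark = max(t["timestamp"] for t in fresh)
        # Assumption: a single numeric key is enough to record progress.
        state_file.write_text(json.dumps({"extracted_at": new_mark}))
    return fresh
```

Calling `select_new_traces` twice with the same traces returns an empty list the second time, which is the property that makes re-runs cheap.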

## Architecture

### MCP Server

A single merged `dsagt-server` (`src/dsagt/mcp/`) exposes 23 tools across four concern modules under one `Server` + one shared `KnowledgeBase`:

1. **Registry tools** (`mcp/registry_tools.py` + `registry.py` / `provenance.py`) — tool analysis, registration, dependency installation, command/file/http execution, and pipeline reconstruction. Tools are saved as markdown specs with YAML frontmatter.
2. **Knowledge tools** (`mcp/knowledge_tools.py` + `knowledge.py`) — semantic search over document collections (FAISS + ChromaDB, optional cross-encoder reranking); long ops run as background jobs.
3. **Memory tools** (`mcp/memory_tools.py` + `memory.py`) — explicit memory + outlier suggestions (`kb_remember` / `kb_get_memories` / …).
4. **Skill tools** (`mcp/skill_tools.py` + `skills.py`) — skill search / install + external catalog sources.
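
The "one server, several concern modules" layout can be sketched without the MCP library as a merge of per-concern handler tables. Everything here is a hypothetical stand-in: the factory names, the `list_tools` tool, and the return strings are illustrative, and the real factories also return tool schemas.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[str]]


def registry_tools() -> dict[str, Handler]:
    """Hypothetical factory for the registry concern."""
    async def list_tools(args: dict) -> str:
        return "registry: listed tools"
    return {"list_tools": list_tools}


def skill_tools() -> dict[str, Handler]:
    """Hypothetical factory for the skill concern."""
    async def search_skills(args: dict) -> str:
        return f"skills: searched for {args.get('query', '')!r}"
    return {"search_skills": search_skills}


def compose_handlers(*factories: Callable[[], dict[str, Handler]]) -> dict[str, Handler]:
    """Merge per-concern handler tables, rejecting duplicate tool names."""
    handlers: dict[str, Handler] = {}
    for factory in factories:
        for name, handler in factory().items():
            if name in handlers:
                raise ValueError(f"duplicate tool name: {name}")
            handlers[name] = handler
    return handlers


async def dispatch(handlers: dict[str, Handler], name: str, args: dict) -> str:
    """Route one tool call to the handler registered under its name."""
    if name not in handlers:
        raise KeyError(f"unknown tool: {name}")
    return await handlers[name](args)


handlers = compose_handlers(registry_tools, skill_tools)
reply = asyncio.run(dispatch(handlers, "search_skills", {"query": "vasp"}))
```

Keeping each concern behind its own factory lets tests call a single module's handlers directly while the server composes all of them into one table.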

### Observability

- **MLflow** — Token usage, cost, latency, full LLM-call traces via OTel. Started by `dsagt mlflow <project>` (foreground, in its own terminal). Port is pinned at init time and lives in `dsagt_config.yaml`.
- **dsagt-run** (`commands/run_tool.py` + `provenance.py`) — Wraps tool commands; captures execution layer (command, stdout/stderr, timing, file lists) into `trace_archive/`.
- **MCP-server OTel** — `dsagt-server` calls `init_tracing()` at startup; its tool spans (kb.*, registry.*) flow to MLflow alongside the agent's LLM-call spans.

### Memory System

- **Episodic memory** (`memory.py:extract_session`) — End-of-session LLM extraction of facts from MLflow traces into ChromaDB, with per-category outlier detection via embedding centroids. Triggered by `dsagt memory --project X`.
- **Explicit memory** (`memory.py:ExplicitMemory`) — User-confirmed facts in YAML, loaded into agent context at session start.
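
Per-category outlier detection via embedding centroids can be sketched in plain Python. This is a simplified illustration, not the code in `memory.py`: the cosine-distance metric, the `0.5` threshold, and the function names are assumptions.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity; treated as 1.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def find_outliers(
    embeddings_by_category: dict[str, list[tuple[str, list[float]]]],
    threshold: float = 0.5,
) -> list[tuple[str, str]]:
    """Flag facts whose embedding lies far from their category centroid."""
    outliers = []
    for category, items in embeddings_by_category.items():
        center = centroid([vec for _, vec in items])
        for fact_id, vec in items:
            if cosine_distance(vec, center) > threshold:
                outliers.append((category, fact_id))
    return outliers


facts = {
    "units": [
        ("f1", [1.0, 0.0]),
        ("f2", [0.9, 0.1]),
        ("f3", [1.0, 0.05]),
        ("f4", [-1.0, 0.2]),  # points away from the rest of the category
    ]
}
flagged = find_outliers(facts)
```

A flagged fact is only a candidate for review; the threshold would need tuning against real embeddings before it could drive any automatic decision.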

### Key Design Patterns

- **Agent-agnostic**: DSAGT is infrastructure, not an agent. Capabilities are MCP services.
- **Session isolation**: Each project gets its own directory with config, tools, skills, kb_index, trace_archive, mlflow data.
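- **Tools vs Skills**: Tools are CLI executables in `<project>/tools/` (specs with parameters, wrapped by dsagt-run). Skills are agent instruction workflows in `<project>/skills/` (SKILL.md + reference docs). Tools are discoverable via ChromaDB-backed semantic search; installed and bundled skills are discovered natively by each agent, and `search_skills` covers only the not-yet-installed external catalog.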

## DSAGT Pipeline Builder Workflow

When acting as a pipeline builder (using the MCP server), follow these constraints:

1. **Never directly access data** — all data operations go through registered tools.
2. **Tool preference hierarchy**: Registered tool → KB package tool → Custom implementation.
3. **Generate paired tools** — every data operation gets a check tool (pre/post audit) and an operation tool.
4. **Audit everything** — before/after JSON reports saved to `audit/`.
5. **One step at a time** — iterate with the user, confirming approach before execution.
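
The "audit everything" constraint above can be pictured as a small helper that a check tool calls before and after an operation. This is a hypothetical sketch: the function name, the file naming scheme, and the report fields are assumptions, not a format the project defines.

```python
import json
from pathlib import Path


def write_audit_report(audit_dir: Path, step: str, phase: str, stats: dict) -> Path:
    """Write one before/after JSON report for a pipeline step and return its path."""
    if phase not in ("before", "after"):
        raise ValueError("phase must be 'before' or 'after'")
    audit_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: one file per step and phase, e.g. audit/dedupe_before.json.
    path = audit_dir / f"{step}_{phase}.json"
    path.write_text(json.dumps({"step": step, "phase": phase, "stats": stats}, indent=2))
    return path
```

Pairing the two reports for a step makes the effect of the operation tool visible as a diff between the `before` and `after` statistics.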

## Testing Patterns

- Tests use pytest with `subprocess.run` mocking for command execution.
- MCP server tests invoke handlers directly (no stdio transport).
- Async tests for server handlers.
- Temp directories for isolation; the `_use_tmp_registry` fixture in `tests/test_config.py` patches `DEFAULT_PROJECTS_BASE` and the project registry to `tmp_path`.
- Integration tests in `test_*_integration.py` require real `EMBEDDING_*` / `LLM_*` credentials.
- A handful of tests are class-skipped under `TestProviderEnvInjection` with a long reason about "old-code-shape env_overrides" — those describe the pre-Phase-1 design where `env_overrides` did broad provider-credential translation, which is now narrower (model env-var pinning only; provider creds + base URLs come via `proxy_env_overrides` or per-agent config files). Kept around for reference; safe to delete once Phase 2 is stable on real workloads.
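
The `subprocess.run` mocking pattern mentioned above can look roughly like this. It is a generic sketch rather than a test from the repository: `run_command` is a hypothetical helper written only for the example.

```python
import subprocess
from unittest import mock


def run_command(cmd: list[str]) -> str:
    """Hypothetical helper: run a command and return its stripped stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def test_run_command_uses_subprocess():
    # Replace subprocess.run so no real process is spawned during the test.
    fake = subprocess.CompletedProcess(
        args=["echo", "hi"], returncode=0, stdout="hi\n", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as mocked:
        assert run_command(["echo", "hi"]) == "hi"
    mocked.assert_called_once_with(
        ["echo", "hi"], capture_output=True, text=True, check=True
    )
```

Patching at the `subprocess.run` attribute works here because the helper looks the function up on the module at call time; code that imports `run` directly would need the patch applied at its own import site instead.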