diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index 0cae487..9ed5cb7 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -21,12 +21,15 @@ jobs: with: python-version: "3.12" + - name: Install uv + uses: astral-sh/setup-uv@v5 + - name: Install docs dependencies - run: pip install mkdocs-material + run: uv sync --group docs - name: Build docs - run: mkdocs build --strict + run: uv run mkdocs build --strict - name: Deploy to GitHub Pages if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request' - run: mkdocs gh-deploy --force + run: uv run mkdocs gh-deploy --force diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..89cef74 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,120 @@ +# Changelog + +All notable changes to DSAgt are documented here. The format is based on +[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project +adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). + +## [Unreleased] + +## [0.2.0] - 2026-06-24 + +This release adds an **external skill-catalog system** and consolidates the +agent-facing surface into a **single MCP server**. + +Agents can now discover and install skills from federated GitHub/GitLab catalogs +(Genesis, Anthropic, K-Dense, and more) and author their own — while installed +skills are picked up through each agent's *native* `SKILL.md` auto-discovery, so +`search_skills` is reserved for the one job native discovery can't do: browsing +the catalog of skills you haven't installed yet. + +In parallel, the registry and knowledge MCP servers — two processes that each +loaded their own embedder and opened their own ChromaDB (pure duplication, plus +a write-here/read-there hazard on the shared skill-catalog collections) — +collapse into one `dsagt-server`: one embedder, one Chroma owner, one connection +per agent, so startup is faster with fewer moving parts per project. 
+ +**Upgrading from 0.1.0 (forwards compatibility).** There is no automatic +migration — adopting 0.2.0 is rebuild-not-migrate, and no project data changes: +- Re-run `dsagt start ` for each existing project; it regenerates the + per-agent MCP config to point at the single `dsagt-server`. +- For **cline** only, delete `/.cline-data` first — `cline mcp add` + has no remove, so the stale `dsagt-registry`/`dsagt-knowledge` entries would + otherwise linger next to the new one. +- Tools, skills, the KB index, traces, and memory all carry over untouched. + +### Added +- **External skill catalogs**: discover and install agent skills from GitHub / + GitLab sources via `add_skill_source`, `search_skills`, and `install_skill` + (plus the `dsagt skills sync/add/list/search` CLI), backed by per-source + ChromaDB collections. Curated sources ship out of the box (`scientific`, + `anthropic`, `antigravity`, `composio`, `genesis`); any git URL / `owner/repo` + also works. +- **Genesis catalog integration**: the curated `genesis` source (OSTI GitLab, + `gitlab.osti.gov/genesis/genesis-skills`) makes the BASE-Data / ModCon skills + — `datacard-generator` (frontmatter name `generating-datacards`), + `croissant-validator`, `hdmf-schema-builder` — pullable on demand + (`dsagt skills add genesis`, then `install_skill`) rather than + bundled in the package, alongside the rest of the Genesis catalog (HPC/Slurm, + HuggingFace, LangChain, and more). +- **Native skill discovery**: installed and bundled skills are mirrored into + each agent's native skill directory (`.claude/skills/`, `.agents/skills/`, …) + at init/start, so every supported agent auto-discovers them. +- **`skill-creator`** bundled skill for authoring new skills from the Anthropic + template. 
+- **Source-qualified catalog install**: when the same skill name exists in more + than one synced source, install a specific one with a `/` + name (via `install_skill` or `dsagt skills add /`) + instead of dead-ending on the ambiguity guard. +- **Keyword fallback** for `search_skills`: a zero-dependency token-overlap + scorer so catalog search works even when no embedding model is configured. +- **License / attribution provenance on install**: installing a catalog skill + preserves upstream `LICENSE` / `NOTICE` files and stamps a `PROVENANCE.txt` + recording the source repo and path into the installed skill directory. +- **`isaac_skills_demo` use case**: an end-to-end, skill-oriented walkthrough + (`use_cases/isaac_skills_demo/`) that drives a real agent through syncing a + catalog, installing a skill, and converting mock VASP output into an Isaac + record — with prompts and mock data included. +- **Install-from-GitHub instructions** for non-developers (`pip install + git+https://github.com/AI-ModCon/dsagt.git` into any Python 3.12/3.13 + environment) in the README and docs. + +### Changed +- **The two MCP servers are now one `dsagt-server`** — one shared + `KnowledgeBase`/embedder, one MCP entry per agent, one trace `service.name`. + The tool surface is organized by concern (registry / knowledge / memory / + skill) behind the single server. +- Skill discovery is now **catalog-only**: installed and bundled skills are + discovered natively by every supported agent, so `search_skills` covers only + the not-yet-installed external catalog. Catalogs are indexed on frontmatter + (name + description + tags) rather than the full SKILL.md body. +- `search_skills` now reports when no external catalog is synced instead of a + bare "no match", and `list_skill_sources` flags each known source as + `synced`/available with its indexed count. 
+- `install_skill` clarifies that an installed skill is usable in the current + session immediately — a restart is only needed for hands-free native + auto-invocation. +- The package version is single-sourced from `dsagt.__version__` (pyproject + reads it via setuptools dynamic metadata). +- Documentation home page (`docs/index.md`) pulls the supported-agents table + and install instructions directly from the README via the + `mkdocs-include-markdown` plugin, so the two no longer drift. + +### Removed +- **BREAKING:** the `dsagt-registry-server` and `dsagt-knowledge-server` console + scripts, replaced by `dsagt-server` (see **Upgrading** above). +- The bundled `datacard-generator` skill — it lives in the Genesis catalog and + is now installed on demand via `dsagt skills add genesis`. +- Dead indexing of installed/bundled skills into the `skills` ChromaDB + collection (nothing read it after the catalog-only search change). + +### Fixed +- CLI-added skill sources are now persisted to the project config. +- `dsagt --version` now works (it was documented but unimplemented — argparse + errored). Reports the version from `dsagt.__version__`. +- Catalog skills with technically-invalid YAML frontmatter (e.g. an unquoted + `description` containing a colon, like `…readiness levels: Level 1…`) are no + longer silently dropped from discovery. `_parse_frontmatter` falls back to a + lenient flat parse that recovers `name`/`description`/`tags`, so such skills — + including Genesis's `generating-datacards` (`datacard-generator`) — are + searchable and installable instead of skipped. + +## [0.1.0] - 2026-01-11 + +### Added +- Initial release: registry and knowledge MCP servers, BYOA per-agent config + generation, MLflow/OTel observability, the tool/skill registry, execution + provenance, and explicit + episodic memory. 
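Reviewer sketch (not part of the patch): the lenient fallback described in the Fixed entry for `_parse_frontmatter` could look roughly like the code below. The function name `parse_frontmatter_lenient`, the exact set of recovered keys, and the handling of `tags` are illustrative assumptions; the real DSAgt implementation is not shown in this diff. The sketch assumes a strict YAML parse has already been attempted and has failed.

```python
def parse_frontmatter_lenient(text):
    """Hypothetical lenient parse: recover flat `key: value` pairs from frontmatter.

    Only `name`, `description`, and `tags` are recovered (an assumption based on
    the changelog entry). Splitting on the FIRST colon keeps later colons inside
    the value, which is what an unquoted description needs.
    """
    lines = text.splitlines()
    # Frontmatter must open with a `---` fence on the first line.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence: stop before the body
            break
        # Skip indented continuation lines, comments, and lines with no colon.
        if line[:1] in (" ", "\t", "#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key not in ("name", "description", "tags"):
            continue
        value = value.strip()
        if key == "tags":
            # Accept a flow-style list such as `[a, b]` or a bare `a, b`.
            items = value.strip("[]").split(",")
            meta[key] = [t.strip().strip("'\"") for t in items if t.strip()]
        else:
            meta[key] = value.strip("'\"")
    return meta
```

Given frontmatter whose unquoted `description` contains a colon, this returns the full description string instead of raising, so the skill stays indexable. Nested or multi-line values are deliberately ignored here; a real fallback may handle more.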
+ +[Unreleased]: https://github.com/AI-ModCon/dsagt/compare/v0.2.0...HEAD +[0.2.0]: https://github.com/AI-ModCon/dsagt/compare/v0.1.0...v0.2.0 +[0.1.0]: https://github.com/AI-ModCon/dsagt/releases/tag/v0.1.0 diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..8fbfcb9 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,116 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +DSAGT (DataSmith Agent) is an AI-assisted data pipeline builder that exposes MCP (Model Context Protocol) servers to agent platforms (Claude Code, Goose, Roo, Cline, Codex). It helps domain scientists create reproducible, auditable data curation pipelines through iterative, knowledge-driven tool generation. + +## Two run modes + +1. **BYOA (Bring Your Own Agent)** — default for everyday use. `dsagt init --agent ` writes per-agent MCP config artifacts; `dsagt mlflow ` backgrounds MLflow and prints the OTel routing exports the user pastes into the shell that runs `claude` / `goose` / etc. Project / agent / session_id are read from `/dsagt_config.yaml` + `.runtime` (single source of truth, no env-var duplication). `dsagt memory --project X` extracts episodic memory from accumulated traces — but only from proxy-shape traces (see #2). +2. **Proxy mode** — `dsagt start --enable-proxy ` interposes a LiteLLM proxy between the agent and its LLM provider. The proxy autologs every LLM call into MLflow with `mlflow.spanInputs` / `mlflow.spanOutputs` populated, which is the only trace shape `dsagt memory` knows how to extract from. Use this when you want both (a) request/response columns populated in the MLflow UI and (b) episodic memory extraction. Native agent OTel emission (Claude Code, Goose) is visible in the UI but uses a different shape (`api_response_body` log events), so memory extraction skips those traces. 
+ +## Commands + +```bash +uv sync --all-groups # install +uv run --no-sync python -m pytest tests/test_.py -q # targeted tests +uv run black . # format +uv run ruff check . # lint +``` + +**`python -m pytest` not bare `pytest`.** The bare-binary form picks up the wrong pytest on this machine and crashes with `ModuleNotFoundError: dsagt`. + +**Don't run the full suite by default.** ~50s for 547 tests is too slow for an iteration loop. Run only the test file relevant to the change. `tests/test_config.py` covers session, init, agents, BYOA hints, launch shim, and memory state. Skip `test_integration.py`, `test_*_integration.py`, `test_server_startup.py`, `test_dependency_integration.py` unless explicitly relevant — they hit network or spawn subprocesses. + +## Code Organization + +The codebase separates **commands** (entry points with argparse, launched as CLI tools or subprocesses) from **modules** (importable logic). Commands live in `src/dsagt/commands/`, modules live in `src/dsagt/`. + +**Commands** (`src/dsagt/commands/`): +- `cli.py` — `dsagt init / mlflow / memory / info / list / mv / rm / setup-kb / smoke-test / stop / start` (user-facing CLI). `dsagt start --enable-proxy` activates proxy mode; without `--enable-proxy` it's the supervised BYOA equivalent (start MLflow + agent under one process tree). +- `proxy_server.py` — `dsagt-proxy` (LiteLLM proxy with OTel autolog). Spawned by `dsagt start --enable-proxy`. +- `run_tool.py` — `dsagt-run` (tool execution wrapper). +- `setup_core_kb.py` — core KB setup (called via `dsagt setup-kb`). +- `info.py` — `dsagt info` (project / config introspection). + +(The MCP server — `dsagt-server` — lives in the `src/dsagt/mcp/` package, see below.) + +**Modules** (`src/dsagt/`): +- `session.py` — Project init, agent config generation, env-var resolution, config load/validate, service start/stop, end-of-session memory extraction orchestration. 
+- `agents/` — Per-agent-platform setup (`base.py` ABC + `claude.py` / `goose.py` / `cline.py` / `roo.py` / `codex.py`). Each subclass owns its `write_static`, `write_dynamic`, `env_overrides`, `byoa_env_hints`, `launch_oneliner`. Shared helpers (`_mcp_env_block`, `_render_launch_shim`, `_build_mcp_servers_dict`) in `base.py`. +- `knowledge.py` — FAISS/ChromaDB document retrieval, embedding backends, per-collection routing. +- `registry.py` — `ToolRegistry` (CLI tools) + `SkillRegistry` (agent instruction skills), KB indexing. +- `provenance.py` — Tool execution records (`run_and_record`, `ToolRecordStore`), execution-record indexing into ChromaDB, pipeline reconstruction (`reconstruct_pipeline`, dependency graph). +- `observability.py` — MLflow / OTel tracing setup, `init_tracing`, span helpers. +- `memory.py` — Explicit memory (YAML), episodic-memory extraction prompt + LLM call, outlier detection, `extract_session`. +- `skills.py` — External skill catalog data plane (`SkillsCatalog`: clone/sync/index/install), the `SkillRouter` render facade, and the Genesis-derived keyword scorer (`rank_skills`). + +**MCP server** (`src/dsagt/mcp/`) — the single merged `dsagt-server`. `server.py` owns `main()`, the shared-KB startup (`_build_kb_from_config`), and the dispatch shell (`build_dispatch_server`); the tool surface is split by concern across `registry_tools.py` (tool registry + execution + provenance, 8 tools), `knowledge_tools.py` (KB retrieval, 6), `memory_tools.py` (explicit memory + suggestions, 4), and `skill_tools.py` (skill search/install/sources, 5). Each `*_tools.py` exposes a `_*_tools_and_handlers()` factory (composed by `create_dsagt_server`) plus a `create_*_server` test wrapper. + +Entry points are defined in `pyproject.toml` `[project.scripts]`: the CLI/proxy/run/setup-kb tools point to `dsagt.commands.*:main`, and `dsagt-server` points to `dsagt.mcp.server:main`. 
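Reviewer sketch (not part of the patch): the keyword scorer mentioned under `skills.py` is described elsewhere in this diff as a token-overlap scheme (name-token matches ×2, description-token matches ×1, +6 exact name, +4 substring name, +2 substring description, stopword filtering, ties broken by name). A minimal version under those weights might look like the following. The names `rank_skills_sketch`, `score_skill`, and `tokenize`, the stopword list, and the choice to treat the exact-name and substring-name bonuses as mutually exclusive are all assumptions; the actual `rank_skills` signature is not shown here.

```python
import re

# Assumed stopword list; the real one is not shown in this diff.
STOPWORDS = frozenset({"a", "an", "and", "the", "for", "of", "to", "in", "with", "on"})


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def score_skill(query, name, description):
    """Token-overlap score with the weights quoted in the design note."""
    q = query.strip().lower()
    q_tokens = set(tokenize(query))
    score = 2 * len(q_tokens & set(tokenize(name)))      # name tokens count double
    score += len(q_tokens & set(tokenize(description)))  # description tokens count once
    if q:
        if q == name.lower():
            score += 6   # exact name match
        elif q in name.lower():
            score += 4   # query is a substring of the name
        if q in description.lower():
            score += 2   # query is a substring of the description
    return score


def rank_skills_sketch(query, skills, top_k=5):
    """Return up to top_k (score, name) pairs, best first, ties broken by name."""
    scored = [(score_skill(query, s["name"], s["description"]), s["name"]) for s in skills]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return hits[:top_k]
```

Each skill is passed as a dict with `name` and `description` keys. Because the sort key is `(-score, name)`, two skills with equal scores always come back in the same alphabetical order, which matches the deterministic tie-break the design note describes.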
+ +**Bundled assets** (shipped as `package-data` in `pyproject.toml`): +- `src/dsagt/tools/` — built-in tool specs (markdown + YAML frontmatter) copied into new projects. +- `src/dsagt/skills/` — built-in skills (e.g., `skill-creator`), mirrored into each agent's native skill directory at init/start and auto-discovered there; `search_skills` covers only the external catalog. +- `src/dsagt/dsagt_instructions.md` — agent-platform-agnostic system instructions injected into per-agent files at init. + +**`use_cases/`** holds end-to-end domain walkthroughs (`microbial_isolates/`, `cryoem/`, `isaac_vasp/`). They are reference material for users, not part of the test suite. `isaac_vasp/` is currently in active development on this branch. + +## BYOA artifacts + +`dsagt init --agent X --location ` writes (in the project dir): +- `dsagt_config.yaml` — internal config (project name, agent, mlflow port pinned at init, embedding/knowledge/extraction settings). No user-facing fields, no credentials. +- Per-agent instructions file (e.g., `CLAUDE.md`, `.goosehints`, `AGENTS.md`). +- Per-agent MCP config artifact (`.mcp.json` for claude, `goose.yaml` for goose, `cline_mcp_settings.json` via `cline mcp add`, `.roo/mcp.json`, `.codex-data/config.toml`). All include the env block (DSAGT_PROJECT, DSAGT_PROJECT_DIR, MLFLOW_TRACKING_URI, EMBEDDING_*) so MCP-server children that don't inherit shell env still log to the right MLflow. +- `dsagt-launch.sh` — bash shim that exports all dsagt-internal env (DSAGT_*, MLFLOW_*, OTEL_*, agent-specific telemetry verbosity flags), resolves the OTel experiment-id header at run time via curl, then execs the agent. The user runs this directly to launch. + +`dsagt memory --project X` tracks a high-water-mark in `/.dsagt/extracted_at.json` so re-runs only process new traces. + +## Architecture + +### MCP Server + +A single merged `dsagt-server` (`src/dsagt/mcp/`) exposes 23 tools across four concern modules under one `Server` + one shared `KnowledgeBase`: + +1.
**Registry tools** (`mcp/registry_tools.py` + `registry.py` / `provenance.py`) — tool analysis, registration, dependency installation, command/file/http execution, and pipeline reconstruction. Tools are saved as markdown specs with YAML frontmatter. +2. **Knowledge tools** (`mcp/knowledge_tools.py` + `knowledge.py`) — semantic search over document collections (FAISS + ChromaDB, optional cross-encoder reranking); long ops run as background jobs. +3. **Memory tools** (`mcp/memory_tools.py` + `memory.py`) — explicit memory + outlier suggestions (`kb_remember` / `kb_get_memories` / …). +4. **Skill tools** (`mcp/skill_tools.py` + `skills.py`) — skill search / install + external catalog sources. + +### Observability + +- **MLflow** — Token usage, cost, latency, full LLM-call traces via OTel. Started by `dsagt mlflow ` (foreground, in its own terminal). Port is pinned at init time and lives in `dsagt_config.yaml`. +- **dsagt-run** (`commands/run_tool.py` + `provenance.py`) — Wraps tool commands; captures execution layer (command, stdout/stderr, timing, file lists) into `trace_archive/`. +- **MCP-server OTel** — `dsagt-server` calls `init_tracing()` at startup; its tool spans (kb.*, registry.*) flow to MLflow alongside the agent's LLM-call spans. + +### Memory System + +- **Episodic memory** (`memory.py:extract_session`) — End-of-session LLM extraction of facts from MLflow traces into ChromaDB, with per-category outlier detection via embedding centroids. Triggered by `dsagt memory --project X`. +- **Explicit memory** (`memory.py:ExplicitMemory`) — User-confirmed facts in YAML, loaded into agent context at session start. + +### Key Design Patterns + +- **Agent-agnostic**: DSAGT is infrastructure, not an agent. Capabilities are MCP services. +- **Session isolation**: Each project gets its own directory with config, tools, skills, kb_index, trace_archive, mlflow data. +- **Tools vs Skills**: Tools are CLI executables in `/tools/` (specs with parameters, wrapped by dsagt-run). 
Skills are agent instruction workflows in `/skills/` (SKILL.md + reference docs). Tools are discoverable via ChromaDB-backed semantic search (`search_registry`); installed skills are auto-discovered natively by each agent, and `search_skills` covers only the external catalog. + +## DSAGT Pipeline Builder Workflow + +When acting as a pipeline builder (using the MCP server), follow these constraints: + +1. **Never directly access data** — all data operations go through registered tools. +2. **Tool preference hierarchy**: Registered tool → KB package tool → Custom implementation. +3. **Generate paired tools** — every data operation gets a check tool (pre/post audit) and an operation tool. +4. **Audit everything** — before/after JSON reports saved to `audit/`. +5. **One step at a time** — iterate with the user, confirming approach before execution. + +## Testing Patterns + +- Tests use pytest with `subprocess.run` mocking for command execution. +- MCP server tests invoke handlers directly (no stdio transport). +- Async tests for server handlers. +- Temp directories for isolation; the `_use_tmp_registry` fixture in `tests/test_config.py` patches `DEFAULT_PROJECTS_BASE` and the project registry to `tmp_path`. +- Integration tests in `test_*_integration.py` require real `EMBEDDING_*` / `LLM_*` credentials. +- A handful of tests are class-skipped under `TestProviderEnvInjection` with a long reason about "old-code-shape env_overrides" — those describe the pre-Phase-1 design where `env_overrides` did broad provider-credential translation, which is now narrower (model env-var pinning only; provider creds + base URLs come via `proxy_env_overrides` or per-agent config files). Kept around for reference; safe to delete once Phase 2 is stable on real workloads. diff --git a/README.md b/README.md index de1bc88..c73984b 100644 --- a/README.md +++ b/README.md @@ -4,10 +4,11 @@ ![DSAgt architecture](latex/architecture.png) -DSAgt connects an MCP-compatible AI coding agent to tool registration, a semantic knowledge base, execution provenance, and observability infrastructure. 
DSAgt provides data-pipeline scaffolding around a user's existing agent CLI or VS Code extension (Claude Code, Goose, Codex, …); +DSAgt connects an MCP-compatible AI coding agent to tool registration, a semantic knowledge base, skills discovery and creation, execution provenance, and observability infrastructure. DSAgt provides data-pipeline scaffolding around a user's existing agent CLI or VS Code extension (Claude Code, Goose, Codex, …); -**Prerequisites:** Python 3.10–3.13, [uv](https://github.com/astral-sh/uv), and one of the supported agent platforms below — already installed and authenticated against whatever LLM provider you intend to use. +**Prerequisites:** Python 3.12 or 3.13, and one of the supported agent platforms below — already installed and authenticated against whatever LLM provider you intend to use. ([uv](https://github.com/astral-sh/uv) is only needed for the development install.) + | Agent | Install | Verify | |-------|---------|--------| | [Claude Code](https://github.com/anthropics/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude --version` | @@ -16,6 +17,44 @@ DSAgt connects an MCP-compatible AI coding agent to tool registration, a semanti | [opencode](https://github.com/sst/opencode) | See [opencode docs](https://opencode.ai/docs/) | `opencode --version` | | [Roo Code](https://github.com/RooCodeInc/Roo-Code) | `npm i -g @roo-code/cli` | `roo --version` | | [Cline](https://github.com/cline/cline) | `npm i -g cline` | `cline --version` | + + +## Installation + +### For use (no development) + + +If you just want to *run* DSAgt against your own data and agent — no repo checkout, no `uv` — install it straight from GitHub into a virtual environment. Any Python 3.12/3.13 environment works (`venv`, conda, etc.); only the `pip install git+…` step is officially supported. 
+ +```bash +python3.12 -m venv ~/.venvs/dsagt # or: conda create -n dsagt python=3.12 && conda activate dsagt +source ~/.venvs/dsagt/bin/activate # (Windows venv: ~\.venvs\dsagt\Scripts\activate) +pip install "git+https://github.com/AI-ModCon/dsagt.git" +dsagt --version # 0.2.0 +``` + +This puts the `dsagt` CLI (and the `dsagt-run` / `dsagt-server` helpers) on your PATH. Then build the shared knowledge base once and create your first project: + +```bash +dsagt setup-kb # bundled tools + skills + reference corpora + # (downloads a ~130 MB local embedder on first run) +dsagt init my-project --agent claude # or: goose / codex / opencode / roo / cline +dsagt start my-project +``` + +To upgrade later, reinstall and re-run `setup-kb` to pick up new bundled tools/skills: + +```bash +pip install --upgrade "git+https://github.com/AI-ModCon/dsagt.git" +dsagt setup-kb +``` + +> Pin to a specific release once tags are published, e.g. `pip install "git+https://github.com/AI-ModCon/dsagt.git@v0.2.0"`. + + +### For development + +Clone the repo and use `uv` (editable install with the full test suite) — see [Quick Start](#quick-start) below. ## Quick Start @@ -68,8 +107,8 @@ What this exercised: | Prompt | Layer | |---|---| -| 1 | Knowledge MCP server (`kb_ingest`) — chunks and indexes docs into ChromaDB | -| 2 | Registry MCP server (`save_tool_spec`) — writes `tools/csvcut.md`, `tools/csvgrep.md`, etc. (one per registered tool) | +| 1 | `dsagt-server` (`kb_ingest`) — chunks and indexes docs into ChromaDB | +| 2 | `dsagt-server` (`save_tool_spec`) — writes `tools/csvcut.md`, `tools/csvgrep.md`, etc. 
(one per registered tool) | | 3 | `dsagt-run` provenance wrapper — records exec layer to `trace_archive/` | | 4 | Explicit memory (`kb_remember` → `explicit_memories.yaml`) + KB recall | @@ -88,10 +127,10 @@ The same flow runs non-interactively via `dsagt smoke-test --agent claude` (or ` `dsagt setup-kb` builds the shared ChromaDB collections under `~/.dsagt/kb_index/` that every project on this machine reuses. Three of the six collections shown in the [architecture diagram](#architecture) are populated here — the other three are per-project and fill in automatically during use (see [Knowledge Base](#knowledge-base) below): - **Tool Specs** — DSAgt's bundled tool specs from `src/dsagt/tools/`, tagged with `source: bundled` so the agent finds them via `search_registry` from the very first session. -- **Skills** — DSAgt's bundled skill workflows from `src/dsagt/skills/` (e.g. `datacard-generator`), discovered via `search_skills`. +- **Skills Catalog** — the default external skill source (`scientific`) cloned and frontmatter-indexed so `search_skills` returns installable skills from the first session. Add more (`dsagt skills add genesis`, etc.). The bundled `skill-creator` is auto-discovered natively by the agent, not indexed. - **Domain Knowledge** — Reference corpora (NVIDIA NeMo Curator, AI Data Readiness Inspector) downloaded and embedded so the agent has data-curation domain knowledge out of the box. -The Tool Specs and Skills collections are wipe-and-rebuild on every run, so re-run `setup-kb` after upgrading DSAgt to pick up new bundled assets. +The Tool Specs collection is wipe-and-rebuild on every run, so re-run `setup-kb` after upgrading DSAgt to pick up new bundled tools. (Bundled skills are not indexed — agents auto-discover them natively.) 
```bash dsagt setup-kb # all collections (local embedder, no creds) @@ -142,16 +181,29 @@ Projects are registered in `~/.dsagt/projects.yaml` so `dsagt mlflow ` and # cline: .clinerules/, cline_mcp_settings.json (managed via cline mcp add) ``` -### MCP Servers +### MCP Server + +DSAGT exposes a single MCP server, **`dsagt-server`**, that an agent connects to once. It bundles two concern areas: -- **Registry** (`dsagt-registry-server`) — Tool registration and dependency installation. Tools are markdown files with YAML frontmatter under `/tools/`. Executables are wrapped with `dsagt-run` for provenance and `uv run --with` for Python dependencies. The agent discovers tools via `search_registry`. -- **Knowledge** (`dsagt-knowledge-server`) — Semantic search over indexed document collections (FAISS / ChromaDB). Background jobs handle long ingest operations. The agent searches via `kb_search`, ingests via `kb_ingest`, and saves user-confirmed facts via `kb_remember`. +- **Registry** — Tool registration and dependency installation. Tools are markdown files with YAML frontmatter under `/tools/`. Executables are wrapped with `dsagt-run` for provenance and `uv run --with` for Python dependencies. The agent discovers tools via `search_registry`. +- **Knowledge** — Semantic search over indexed document collections (FAISS / ChromaDB). Background jobs handle long ingest operations. The agent searches via `kb_search`, ingests via `kb_ingest`, and saves user-confirmed facts via `kb_remember`. + +> **Upgrading from the two-server layout?** Earlier versions ran separate `dsagt-registry-server` and `dsagt-knowledge-server` processes. There is no automatic migration: re-run `dsagt start ` and the per-agent MCP config is regenerated to point at the single `dsagt-server`. 
For **cline** specifically, `cline mcp add` has no remove, so an upgraded project keeps stale `dsagt-registry`/`dsagt-knowledge` entries alongside the new `dsagt` one — delete `/.cline-data` before `dsagt start` to get a clean config. ### Tools and Skills -**Tools** are CLI executables defined as markdown files with YAML frontmatter in `/tools/`. The agent registers new tools via the registry MCP server's `save_tool_spec`. +**Tools** are CLI executables defined as markdown files with YAML frontmatter in `/tools/`. The agent registers new tools via the MCP server's `save_tool_spec`. + +**Skills** are instruction-based agent workflows — a directory with a `SKILL.md` and optional reference docs. They come in two tiers: + +- **Installed** skills live in `/skills/` (DSAgt ships a bundled `skill-creator`; domain skills like the MODCON datacard generator are installed from the `genesis` catalog). These are mirrored into the agent's native skill directory (e.g. `.claude/skills/`, `.agents/skills/`) at `dsagt init`/`start`, where the agent auto-discovers and auto-invokes them — no `search_skills` needed (that covers only the catalog tier below). +- **Catalog** skills come from external Git repositories — GitHub *or* GitLab — indexed into a searchable catalog the agent browses with `search_skills` but that is **not** loaded into its context (so a catalog can hold thousands of skills). The agent enables a source with `add_skill_source(...)`, finds skills with `search_skills(...)`, then copies one into the project with `install_skill(...)`. + +The catalog is **opt-in**: a source must be synced before its skills are searchable. Curated named sources ship out of the box — `scientific`, `anthropic`, `antigravity`, `composio`, and `genesis` (the OSTI GENESIS catalog: HPC, HuggingFace, LangChain, OpenAI, plasma-sim, and more) — and any Git URL or `owner/repo` works too. 
Manage catalogs from the CLI with `dsagt skills list/search/add/sync `, or from the agent with `list_skill_sources` / `add_skill_source` / `search_skills` / `install_skill`. + +![DSAgt skills routing](latex/skills-routing.png) -**Skills** are instruction-based agent workflows in `/skills/`. Each skill is a directory containing a `SKILL.md` and optional reference docs. DSAgt ships with a bundled `datacard-generator` skill. The agent discovers skills via `search_skills`. +Skill handling runs through one service over two stores. **`SkillRouter`** is the single skill-MCP entry point — every skill tool routes through it: `add_skill_source` / `list_skill_sources` manage repos, `search_skills` queries the catalog, `install_skill` adopts a catalog skill into the project. **Registration** pulls skills from External Skills Repos (the curated `scientific` / `anthropic` / `antigravity` / `composio` / `genesis` sources, *or any git URL*) into the **Skills Catalog** — a federated, searchable store of *not-yet-installed* skills (semantic search, with a zero-dependency keyword fallback when no embedder is configured). **Discovery** is the catalog's irreplaceable job: surfacing skills the agent doesn't yet have, which native discovery can't see. **Progressive exposure** is native: the **Skill Directory** holds the project's installed + created skills in each agent's own skill dir (`.claude/skills`, `.agents/skills`, `.cline/skills`, `.roo/skills`), where the agent auto-discovers and model-invokes them by relevance — and authors new ones via the bundled **`skill-creator`** skill. The diagram source is [`latex/skills-routing.tex`](latex/skills-routing.tex). 
### Knowledge Base @@ -160,7 +212,7 @@ Six independently-partitioned ChromaDB collections hold everything the agent sea | Collection | Source | Populated by | |---|---|---| | **Tool Specs** | Bundled CLI tool specs in `src/dsagt/tools/` | `dsagt setup-kb` | -| **Skills** | Bundled skill workflows in `src/dsagt/skills/` | `dsagt setup-kb` | +| **Skills Catalog** | Installable skills from external repos (one `skills_catalog__` per source), frontmatter-indexed | `dsagt setup-kb` (default source) + `add_skill_source` | | **Domain Knowledge** | NeMo Curator + AIDRIN reference corpora; user-ingested docs | `dsagt setup-kb` + agent's `kb_ingest` | | **Explicit Memory** | User-confirmed facts | Agent's `kb_remember` (also written to `/explicit_memories.yaml`); the agent fetches via `kb_get_memories` on demand — typically when you ask it to recall — not auto-loaded at session start | | **Episodic Memory** | Distilled facts from MLflow traces | `dsagt memory --project ` (per-category outlier detection via embedding centroids) | @@ -168,7 +220,7 @@ Six independently-partitioned ChromaDB collections hold everything the agent sea The default embedding backend is local (sentence-transformers, CPU-side, no API needed). Switch to `embedding.backend: api` in `dsagt_config.yaml` to route through a hosted embedder via LiteLLM. Cross-encoder reranking is optional (`knowledge.rerank: true`). -The agent searches via `kb_search` (knowledge MCP server) and writes via `kb_ingest` / `kb_remember`. Tool Specs and Skills are queried through specialized routes (`search_registry`, `search_skills`) over the same backend. +The agent searches via `kb_search` and writes via `kb_ingest` / `kb_remember`. Registered tools have their own `search_registry` route over the same backend. Installed skills are auto-discovered natively by the agent (not indexed); enabling external skill catalogs adds one `skills_catalog__` collection per source, which `search_skills` browses for installable skills. 
### Observability @@ -190,6 +242,7 @@ Every span carries the project's `session.id` for filtering. Tool execution reco | `dsagt memory --project ` | Distill new traces from this project's MLflow into episodic memory | | `dsagt info [--json]` | Resolved config (with source per value) and a session/error summary | | `dsagt setup-kb [--collection ]` | Build the shared core knowledge base collections | +| `dsagt skills [source]` | Manage external skill catalogs and project skill installs | | `dsagt list` | List all projects with agent, status, and path | | `dsagt mv ` | Move a project to a new location | | `dsagt rm [-y] [--keep-files]` | Unregister a project (and optionally delete its directory) | diff --git a/design-notes/genesis-skills-comparison.md b/design-notes/genesis-skills-comparison.md new file mode 100644 index 0000000..aed6644 --- /dev/null +++ b/design-notes/genesis-skills-comparison.md @@ -0,0 +1,516 @@ +# Design Note — Genesis Skills vs DSAGT skill discovery + +**Status:** implemented (2026-06-23). Sections 0–6 are the original close read of +the Genesis Skills discovery engine vs DSAGT's. Sections 7–9 record the agreed +design as it stood mid-plan. **§10 records what actually shipped and supersedes +the tier-3 / progressive-disclosure / recency-queue parts of §7.4–7.6 and §9** — +a research pivot (every supported agent natively discovers SKILL.md) collapsed +those. Read §10 for the final architecture; §7–9 are kept for design history. 
+ +**Date:** 2026-06-23 +**Author:** comparison drafted with Claude Code + +- Genesis Skills: + (the `skill-search/` engine + a curated 74-skill catalog under `skills/`) +- DSAGT skill discovery: [skills_catalog.py](../src/dsagt/commands/skills_catalog.py), + [registry.py](../src/dsagt/registry.py) (`SkillRegistry`), + [registry_server.py](../src/dsagt/commands/registry_server.py) (`search_skills`/`install_skill`), + [knowledge_server.py](../src/dsagt/commands/knowledge_server.py) (`add_skill_source`/`list_skill_sources`) + +--- + +## 0. Relationship (important framing) + +This is **not a competitor** — it's a sibling DOE effort. Genesis Skills is part +of the "Genesis Mission" (`genesis@osti.gov`, gitlab.osti.gov/genesis), and its +catalog includes a **`modcon-skills/`** category (`datacard-generator`, +`croissant-validator`, `hdmf-schema-builder`) — i.e. our own BASE-Data/ModCon +skills. DSAGT (`AI-ModCon`, BASE-Data team) and Genesis are in the same orbit. + +DSAGT already **consumes** Genesis as a catalog source (the `genesis` entry in +`KNOWN_SOURCES`, subdir `skills`, ~72 of 74 skills index cleanly; 2 are skipped +for malformed upstream YAML). So the natural relationship is **complementary**: +DSAGT = semantic, multi-source, platform-integrated layer that *indexes* Genesis +(and others); Genesis = portable, standard-conformant content + lightweight +discovery. + +--- + +## 1. What Genesis built + +A focused, polished, single-purpose **skill-discovery engine** (`skill-search/`) +plus a curated catalog (74 skills across 10+ domains). + +- **Search method:** pure-Python **keyword token-overlap** scoring + (`skill-search/skill_search/catalog.py`). Name-token matches ×2, + description-token matches ×1, plus exact/substring bonuses (+6 exact name, + +4 substring name, +2 substring description), stopword filtering, deterministic + tie-break by name. **Zero dependencies, no embeddings, no DB, no model + download.** +- **Two discovery strategies:** + 1. 
`--query` → top-k keyword matches (keeps full catalog out of context). + 2. `--load-all` → progressive disclosure: compact index of *all* skills + (name/description/path) upfront, load `SKILL.md` body on demand. + 3. `--include-prompt` → emits an `` XML block for direct + system-prompt injection. +- **Discovery:** filesystem recursion over nested `skills///`, + merged by name with **override semantics** (central catalog + user roots; later + roots win). No persistent index — re-scanned each call. +- **The engine is itself a Skill** (`SKILL.md`, `allowed-tools: Bash`): the agent + runs it via Bash and gets JSON back. **No server.** +- **Distribution:** `unpack.sh` flattens/symlinks skills into an agent's native + dir across **multiple harnesses** (Claude Code, Gemini, Codex `.agents/skills/`); + LangChain/LangGraph import the `skill_search` package directly. +- **Standard conformance:** explicitly follows the **agentskills.io open + standard** (`skill_spec.md` mirrors the spec), points at `skills-ref validate`, + and has real license hygiene (Apache-2.0, `NOTICE` inventory, per-subtree + licenses for vendored third-party skills). +- **Engineering polish:** standalone package (`catalog`/`frontmatter`/`models`/ + `prompt`/`tooling`/`errors`), ~500 lines of unit tests, layered deployment + resolution (`--central-root` flag → `SKILL_SEARCH_CENTRAL_ROOT` env → sibling + `../skills`). + +--- + +## 2. What DSAGT has + +Skill discovery is **one feature inside a platform** (KB, tool registry, +provenance, observability, memory), exposed over MCP. + +- **Search method:** **semantic embeddings + BM25 hybrid** over ChromaDB + (local `bge-small-en-v1.5` or hosted via LiteLLM; `route.json` `hybrid: true`, + `bm25.pkl` present). Better recall on natural-language queries. +- **Index:** persistent **per-source ChromaDB collections** + (`skills_catalog__`), built by `sync_source` (clone → index). Sparse + clone via `subdir`. Machine-global cache at `~/.dsagt/.skill_sources/`. 
+- **Sources:** many **remote** sources (GitHub *and* GitLab), curated + `KNOWN_SOURCES` (`scientific`, `anthropic`, `antigravity`, `composio`, + `genesis`) **plus any git URL / `owner/repo`**. +- **Runtime:** MCP server tools the agent calls directly — `search_skills`, + `add_skill_source`, `list_skill_sources`, `install_skill` — plus the + `dsagt skills sync/add/list/search` CLI. +- **Two tiers:** *installed* skills (mirrored into native `.claude/skills/` at + init/start, auto-invocable) vs *catalog* skills (indexed, not in context). +- **Dependencies:** requires an embedder (~130 MB local model or an API key). + +--- + +## 3. Side-by-side + +| Dimension | Genesis `skill-search` | DSAGT | +|---|---|---| +| Search | Keyword token-overlap, deterministic | Semantic embeddings + BM25 hybrid | +| Index | None; re-scans filesystem each call | Persistent per-source ChromaDB | +| Sources | One filesystem tree (its own repo) | Many remote sources (GitHub + GitLab), curated + arbitrary URL | +| Runtime | A Bash-invoked Skill; no server | MCP server tools + CLI | +| Dependencies | Zero (stdlib only) | Embedder (~130 MB local or API) | +| Scope | Just skill discovery/distribution | One feature in a platform (KB/registry/provenance/memory) | +| Standard | agentskills.io conformant + validation | Own SKILL.md convention (close, not formalized) | +| Multi-harness install | `unpack.sh` (Claude/Gemini/Codex) | Mirrors to `.claude/skills/` (per-agent) | +| Context modes | Search top-k, full index, prompt-XML | Search returns summaries; catalog kept out of context | +| Tests/packaging | Dedicated package + ~500 LOC unit tests | Solid, embedded in the larger codebase | +| License hygiene | NOTICE + per-subtree licenses | Not tracked on `install_skill` copy | + +--- + +## 4. Honest assessment + +**Where Genesis is better developed (the narrow slice):** +- Open-standard conformance + validation tooling. +- Portability — no server, works across harnesses, the engine ships *as a skill*. 
+- Zero-dependency, deterministic, instant — no embedder download, no API, reproducible. +- Cleaner standalone package with tests; proper license/attribution for vendored skills. +- A large curated catalog ready to go. + +**Where DSAGT is stronger (what matters for the platform):** +- Semantic + hybrid search beats keyword overlap on fuzzy/natural queries. + (Keyword overlap misses synonymy: "submit a batch job" won't score `slurm` + unless tokens literally overlap.) +- Multi-source, persistently-indexed catalog federating many remote hosts — + Genesis searches one tree; DSAGT federates Genesis *plus* K-Dense, Anthropic, … +- Integration: skills live alongside the KB, tool registry, provenance, and + memory in a reproducible pipeline workflow. + +--- + +## 5. Borrowables (candidate work items, prioritized) + +1. **Keyword fallback for `search_skills` when no embedder is configured.** + Today `search_skills` dead-ends with "requires a configured knowledge base" + when `kb is None`. Genesis's token-overlap scorer + (`catalog.py:_score_skill` / `rank_skills`) is exactly the zero-dependency + fallback that would make search work without the 130 MB model. **Highest + value / lowest cost**; directly fixes the no-embedder gap. +2. **Adopt the agentskills.io standard explicitly** for DSAGT's SKILL.md + (already ~compatible) and add a validation step (`skills-ref validate` or + equivalent). Improves interop — and we now feed from a standard-conformant + source. +3. **Progressive-disclosure prompt block** (`` XML) — a cheap + mode for small catalogs that skips the index entirely. +4. **License / NOTICE hygiene on `install_skill`.** When a (possibly + third-party) catalog skill is copied into a project, preserve its `LICENSE`. + Genesis tracks this per-subtree; DSAGT currently doesn't. +5. **Lean into complementarity.** Keep DSAGT as the semantic, multi-source, + platform-integrated layer that indexes Genesis (and others). 
Consider + coordinating with the Genesis team — modcon-skills already overlaps, so + there's a shared-content story. + +--- + +## 6. Open questions for the merge decision (pick up later) + +- Do we want DSAGT to *re-export* an agentskills.io-conformant catalog (so other + tools, incl. Genesis `skill-search`, can consume DSAGT's skills)? +- Should the keyword fallback (#1) be a permanent low-tier search mode (fast, + cheap, deterministic) selectable even when an embedder *is* available? +- Is there appetite to upstream DSAGT/ModCon skills into Genesis rather than + maintain parallel copies (the `modcon-skills/` overlap)? +- Packaging: is the Genesis `skill_search` Python package worth depending on + directly for the fallback, or do we reimplement the (small) scorer to avoid a + dependency on an external repo? + +--- + +## 7. Decided plan (2026-06-23) + +The architecture below is **not a rewrite** — DSAGT's existing chain already *is* +the curation + per-project-install model we want. The plan is a set of insertions +on top of it. + +### 7.1 Earmarked borrowables (committed) + +1. **Keyword fallback for `search_skills` when `kb is None`.** Reimplement (do + **not** depend on) Genesis's `_score_skill` / `rank_skills` token-overlap + scorer — ~50 LOC, vendored with attribution. Insert it *before* the + `kb is None` dead-end at `registry_server.py:324`. It reads frontmatter off + the on-disk clone cache (`~/.dsagt/.skill_sources/`) + bundled skills, which + already exists because `sync_source` clones even when `kb is None` + (`skills_catalog.py:218`). No embedder, no separate index. + *Status (built):* `src/dsagt/skill_keyword.py` mirrors Genesis exactly — + weights (name ×2 / desc ×1), **mutually-exclusive** substring bonus tiers + (+6 exact / +4 name / +2 desc, via `elif`), the same stopword set, and the + casefold-`\w+`-hyphen-split tokenizer that drops single-char tokens. + Verified against the upstream source, not just the prose spec. 
It is strictly + the *no-KB* path; when a KB exists the router uses ChromaDB (whose hybrid mode + already includes BM25), so the scorer and BM25 are mutually exclusive and + never double-rank. +2. **License / NOTICE hygiene on `install_skill`.** `install_into_project`'s + `copytree` (`skills_catalog.py:313`) already carries a *skill-dir-local* + LICENSE. The gap is the source repo's **root NOTICE / per-subtree license + provenance** — capture it at sync time (Stage A) and stamp it at install. + +### 7.2 Shipped-skill cut + +DSAGT ships only two skills today — `datacard-generator` (frontmatter name +`generating-datacards`) and `skill-creator` — plus one tool (`scan_directory`). + +- **Strike `datacard-generator`** from the repo; it lives in Genesis + `modcon-skills/`. Users get it via `add_skill_source genesis` (catalog tier). + **Verify the Genesis copy exists and matches before deleting**, then leave a + pointer. This makes Genesis the canonical home (open question #3 becomes a real + coordination dependency). +- **Keep `skill-creator`** as the single minimal shipped skill: it's + *infrastructure*, not domain content, so it doesn't belong in Genesis's curated + domain catalog; it's self-referential (the harness can scaffold test skills with + it); and it's stable. Shipped skill + `scan_directory` tool exist primarily as + **test-harness fixtures**, since no shipped tool does more than wrap standard CLI. 
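As a rough illustration of the token-overlap scorer described in §7.1 item 1, the sketch below applies the weights and bonus tiers listed there: name tokens ×2, description tokens ×1, mutually exclusive +6 / +4 / +2 bonuses via `elif`, and a tie-break by name. It is not the contents of `skill_keyword.py`. The stopword set, the set-based match counting, and the names `tokenize`, `score_skill`, and `rank` are assumptions made for the example.

```python
import re

# Assumption: an illustrative stopword set; the real list is not given in this note.
STOPWORDS = frozenset({"an", "and", "the", "for", "of", "to", "in", "with"})


def tokenize(text: str) -> list[str]:
    """Casefold, keep \\w+ runs (so hyphens split), drop 1-char tokens and stopwords."""
    return [
        tok
        for tok in re.findall(r"\w+", text.casefold())
        if len(tok) > 1 and tok not in STOPWORDS
    ]


def score_skill(query: str, name: str, description: str) -> int:
    """Token overlap (name x2, description x1) plus one substring bonus tier."""
    q_tokens = set(tokenize(query))
    if not q_tokens:
        return 0
    # Assumption: each distinct query token counts once per field.
    score = 2 * len(q_tokens & set(tokenize(name)))
    score += len(q_tokens & set(tokenize(description)))

    q = query.casefold().strip()
    # Bonus tiers are mutually exclusive: only the first matching tier applies.
    if q == name.casefold():
        score += 6
    elif q in name.casefold():
        score += 4
    elif q in description.casefold():
        score += 2
    return score


def rank(query: str, skills: list[dict], top_k: int = 8) -> list[tuple[int, str]]:
    """Return (score, name) pairs, best first; equal scores are ordered by name."""
    scored = [(score_skill(query, s["name"], s["description"]), s["name"]) for s in skills]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return hits[:top_k]
```

Under these assumptions, the query `slurm` against a skill named `slurm-submit` with the description "Submit a batch job to a Slurm cluster" scores 7: 2 for the name token, 1 for the description token, and 4 for the name-substring bonus. Because the input is only frontmatter text, the same function can run over the clone cache with no embedder present.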
+ +### 7.3 Current MCP call chain (the anchor) + +``` +add_skill_source / list_skill_sources → search_skills → install_skill → dsagt start + (expose catalogs) (select a subset) (draw into project) (activate natively) + knowledge_server registry_server registry_server → agents/claude.py + skills_catalog _mirror_skills_to +``` + +- **Stage A — Expose.** `add_skill_source` → `resolve_source` → `sync_source`: + sparse-clone by `subdir` into `~/.dsagt/.skill_sources//`, then + `index_catalog` **wipes + rebuilds** that one source's + `skills_catalog__` ChromaDB collection; persists the source to + `dsagt_config.yaml`. `list_skill_sources` reports `KNOWN_SOURCES` + synced + state. (`skills_catalog.py:185`, `knowledge_server.py:609`) +- **Stage B — Select.** `search_skills` searches `SKILLS_COLLECTION` (installed) + **plus every** `skills_catalog__` collection, merges by score, returns + top-k tagged `[installed]` / `[catalog · install_skill to add]`. Dead-ends when + `kb is None` (except exact `skill_name`). (`registry_server.py:305`) +- **Stage C — Draw.** `install_skill` → `find_catalog_skill` (cross-source + ambiguity guard) → `install_into_project` copies the skill dir into + `/skills//`, re-indexes it as `registered`. + (`registry_server.py:390`, `skills_catalog.py:296`) +- **Stage D — Activate.** Next `dsagt start` mirrors `/skills/` → + `.claude/skills/` for native auto-invocation — **only `claude.py` does this**; + goose/cline/roo/codex have no native mirror. (`agents/claude.py:182`, + `agents/base.py:251`) + +Installed skills promoted via `install_skill` become part of the **core installed +set** for future sessions in that project. + +### 7.4 Tiering & backend selection (the core design) + +> ⚠️ **Superseded by §10.** Tiers 2/3 and the budget threshold below assumed some +> agents lack native skill discovery. Research proved otherwise — *all* supported +> agents are native, so tier 3 (and the disclosure block + recency queue it +> motivated) was never built. 
The *catalog vs installed* split and the +> *ChromaDB-or-keyword* backend selection survive; the rest is history. + + +Progressive disclosure and ChromaDB are **not competitors** — they sit on +different axes: + +- **What is disclosed:** installed (core, project) vs. catalog (federated, remote). +- **How the index is produced:** *full dump* (deterministic, no query) vs. + *query-driven selection* (needs a query, returns top-k). + +Both produce the **same `` block** (the agentskills.io-style +output contract). Backend is chosen by **context budget**, not by tier: + +> Full-dump while the disclosed set fits the budget; switch to query-driven +> *selection* when it doesn't. "Selection" = **ChromaDB if an embedder exists, +> Genesis keyword scorer if not.** ("Fall back on ChromaDB" is shorthand — the +> real fallback is to *selection*, whose backend is itself tiered.) + +Three operating tiers: + +1. **Catalog (any harness)** — *always* query-driven. Unbounded (Genesis + + Anthropic + …), so never full-dumped. ChromaDB top-k, or keyword scorer when + no embedder. +2. **Installed, native harness (Claude)** — dsagt **defers** to the harness. + `_mirror_skills_to` populates `.claude/skills/`; Claude's own runtime injects + names/descriptions and lazy-loads bodies (note the `_NATIVE_DESCRIPTION_CAP` + truncation at `base.py:231` — *Claude's* limit, not dsagt's). dsagt emits no + block here, so its budget threshold never fires. +3. **Installed, non-native harness (goose/cline/roo/codex)** — dsagt **is** the + producer of the block, because the harness has no native discovery. Full-dump + the installed set into the agent's instructions file until the context budget + is hit; past that, drop to a short pointer + query-driven selection. See 7.5. + +### 7.5 The non-native third tier, by example + +On goose/cline/roo/codex an installed skill lands in `/skills//` +but **nothing auto-injects it** — today the agent only finds it by calling +`search_skills`. 
The third tier closes that gap: dsagt emits the +progressive-disclosure block the harness won't. + +*Goose, 5 installed skills* (`skill-creator`, `generating-datacards`, +`slurm-submit`, `croissant-validator`, `fastq-qc`): at `dsagt start`, dsagt writes +an `` block (name + description + path per skill) into +`.goosehints`. ~5 × 40 ≈ **200 tokens** → full-dump. Goose passively knows all +five and reads a SKILL.md body on demand. No embedder, no query. + +*Same project, 100 installed skills*: the block is now ~4,000 tokens **every +session**. Past the budget, dsagt stops full-dumping, leaves a short pointer +("call `search_skills`…"), and lets selection carry it (ChromaDB top-k, or keyword +scorer). The agent pulls ~3 relevant skills per task instead of carrying all 100. + +This threshold fires **only** in tier 3 — the one case where dsagt both produces +the block and pays its context cost. Codex graduates tier 3 → tier 2 in this pass +by mirroring into its natively-discovered, **project-local** `.agents/skills/` +(§9.7), so it stops going through dsagt's threshold; goose/cline/roo remain +tier 3. + +### 7.6 The discovery router (the consolidating abstraction) + +All of the policy in 7.4–7.5 — *which backend, which tier, full-dump vs select, +installed vs catalog, defer-to-harness vs emit* — lives in **one** place rather +than smeared across the MCP handler, `base.py`, and each agent file. Otherwise the +decision tree gets re-implemented and drifts at every call site. + +**Home:** new module `src/dsagt/skill_discovery.py` (a *module*, not a command), +class `SkillRouter`. It **owns policy and delegates execution.** + +```python +class SkillRouter: + def __init__(self, *, kb, skill_registry, project_dir, agent, config): ... + + def search(self, query=None, *, top_k=8, tag=None, skill_name=None) -> str: + # Stage B. Owns backend choice (ChromaDB vs keyword vs full-dump), + # installed+catalog merge, no-embedder fallback, rendering. 
+ + def disclosure_block(self) -> str | None: + # Stage D. Owns tier resolution + budget threshold. Returns None when + # the harness owns the tier (native). + + def list_sources(self) -> list[dict]: + # Stage A view. KNOWN_SOURCES + per-collection indexed count + synced? + # one consistent view for both the MCP handler and the CLI. +``` + +- **Owns:** backend selection (`_select`: kb→ChromaDB, else keyword scorer), + tier resolution, budget threshold (`_mode_for_installed`), and the single + `` renderer (`_render`, used by *both* `search` and + `disclosure_block`). +- **Delegates (unchanged):** `kb.search`, `sync_source` / `install_into_project` + (`skills_catalog`), `_discover_skill_dirs` / `_parse_frontmatter`, + `_mirror_skills_to`. It is a coordinator, not a reimplementation. +- **Three thin call sites:** `registry_server._handle_search_skills` → + `router.search`; each agent's `write_dynamic` → `router.disclosure_block`; and + the **CLI** `dsagt skills search/list` (`cli.py:_cmd_skills`) → `router.search` + / `router.list_sources`. The CLI is the decisive case: `cli.py:485-509` is + *already* a drifted copy of the `_handle_search_skills` merge logic (no `tag`, + no exact-`skill_name`, no `kb is None` path, different render). The router + collapses both copies into one. +- **`list_skill_sources` exposure status** (`knowledge_server.py:609`) also folds + into `router.list_sources()` — today the MCP handler and CLI `skills list + --catalog` compute "what's synced" differently. +- **Nativeness becomes explicit:** each agent declares `native_skills: bool` + (claude=True, codex=True; goose/cline/roo=False) instead of it being implicit in + "only claude calls `_mirror_skills_to`." The router reads the flag for tier 2 vs + 3. Codex is tier 2 via a **project-local** `.agents/skills/` mirror only (see §9.7). + +**Materialize vs. 
disclose (don't conflate).** Two separate jobs, both keyed off +`native_skills`: + +- *Materialize* = put skill files where the agent expects them. **dsagt does this + for every tier.** Native (claude): `_mirror_skills_to` → `.claude/skills/`. + Non-native: skills just live in `/skills/` from install (no native dir). +- *Disclose* = make the agent aware of them in context. Native: the harness does + it (router returns `None`). Non-native: `router.disclosure_block()` emits the + block. + +So the router defers tier-2 **disclosure** to Claude — **not** materialization. +dsagt still mirrors into `.claude/skills/`. The agent's `write_dynamic` is +`if native_skills: _mirror_skills_to(...) else: write(router.disclosure_block())`. +(`_mirror_skills_to` stays outside the router as materialization *execution*, not +because dsagt is hands-off for native harnesses.) + +**Recency queue (dedup + un-bury).** SkillRouter owns a length-bounded, +session-scoped FIFO of recently-exposed skill names. Skills emitted in the +disclosure block or returned by `search` are pushed on; while a skill is *fresh* +(in the queue) `search` suppresses re-surfacing it — it's already in context. As +new exposures push it off the tail it ages out and becomes eligible again (by +then likely buried in the transcript). This unifies the old double-listing and +re-emit threads: `install_skill` marks the new skill fresh, so it isn't +redundantly re-surfaced and no disclosure re-emit is needed. The queue **must be +disk-backed** (`/.dsagt/exposed_skills.json`) because the disclosure +block (agent-setup process) and `search` (MCP server process) are separate +processes that share it. *Refinement:* an explicit query that hits a fresh skill +returns a terse "already available" pointer rather than an empty result. + +--- + +## 8. Implementation surface (router-centric) + +Most changes land *inside* `SkillRouter`; call sites stay thin. 
+ +| Change | Where | Kind | +|---|---|---| +| `SkillRouter` (backend select, tier, budget, render) | new `src/dsagt/skill_discovery.py` | New module | +| Vendored keyword scorer (genesis-derived, attributed) | new `src/dsagt/skill_keyword.py`, called by `router._select` | New code | +| `search_skills` → router | `registry_server._handle_search_skills` (replace body with `router.search`) | Thin call site | +| `disclosure_block` → router | `agents/{goose,cline,roo,codex}.py` `write_dynamic` (one-line call) | Thin call site | +| **CLI** `skills search/list` → router | `cli.py:_cmd_skills` (`cli.py:485-509` — kill the drifted dup) → `router.search` / `router.list_sources` | Thin call site | +| `list_skill_sources` → router | `knowledge_server.py:609` → `router.list_sources` | Thin call site | +| `native_skills` flag | each `agents/*.py` (declare True/False) | Declaration | +| License/NOTICE capture | `sync_source` + `install_into_project` (`skills_catalog.py:313`) — data-plane, outside router | Augment | +| Strike `datacard-generator` | `src/dsagt/skills/` + `pyproject.toml` package-data | Deletion | + +**Stays outside the router (delegated data / execution):** `SkillRegistry` +(installed-skills data + indexing — router reads it); `sync_source` / +`index_catalog` / `find_catalog_skill` / `install_into_project` (catalog +execution); `_mirror_skills_to` / `_truncate_native_description` (tier-2 native +mirror); `resolve_source` / `KNOWN_SOURCES` / `persist_source_to_config` (source +config). `_parse_frontmatter` and `_discover_skill_dirs` are already +single-sourced shared primitives — the router imports them, no move needed. + +--- + +## 9. Resolved decisions (2026-06-23) + +1. **Budget threshold** — a **token count** (estimated, e.g. `chars/4` to avoid a + tokenizer dependency), with a sensible default, as a property of `SkillRouter` + (per-agent overridable). The router measures it. +2. **Double-exposure** — solved by the recency queue (§7.6), not a static rule. 
+ While a skill is *fresh* (in the session queue) `search` won't re-surface it. +3. **Mid-session freshness** — **no re-emit.** The agent retains newly installed + skills in context (`install_skill` returns the SKILL.md), and the queue marks + them fresh so `search` won't redundantly re-surface them. +4. **agentskills.io conformance** — adopt only the `` **output** + shape. **Not** full input conformance: strict validation would force + rewrite-or-exclude on third-party repos and add validation/normalization code — + *more* complex, not simpler, so it fails the "only if it simplifies" test. Keep + lenient parsing (parse what we can, skip malformed — as today). +5. **Keyword-scorer latency over large caches** — deferred; revisit only if + non-KB users hit usability issues. +6. **`datacard-generator`** — strike dsagt's copy; it's stale, the Genesis copy is + more current. *Pre-deletion check:* confirm the Genesis copy indexes cleanly + (isn't one of the 2 malformed-YAML skips) before removing ours. +7. **Codex → tier 2 now, project-local only.** Reuse the manifest-tracked + `_mirror_skills_to` pointed at `/.agents/skills/`. **Never** write + global `~/.agents/skills/` or `~/.codex` config, and (via `.dsagt-managed.json`) + never touch user-authored skills — footprint stays confined to the project dir + and transparent. *Verify:* Codex auto-discovers a project-local `.agents/skills/`. + +--- + +## 10. Implementation outcome (shipped 2026-06-23) + +Supersedes the tier-3 / progressive-disclosure / recency-queue parts of §7.4–7.6 +and §9. + +### 10.1 The research pivot + +Before building the agent piece, we verified (primary-source web research, four +independent passes + adversarial check) whether goose/cline/roo actually lack +native skill discovery. 
**They don't — every supported agent natively +auto-discovers SKILL.md skills:** + +| Agent | Native dir (project) | Notes | +|---|---|---| +| claude | `.claude/skills` | on by default | +| codex | `.agents/skills` | project-local (repo-root), on by default | +| goose | `.agents/skills` | built-in extension, on by default (also reads `.goose/`, `.claude/`) | +| cline | `.cline/skills` | **opt-in** — Settings → Features → Enable Skills (v3.48, Jan 2026) | +| roo | `.roo/skills` | v3.38 (May 2026); the "Roo shut down" rumor was false | + +**Consequence: tier 3 does not exist.** With no non-native harness there is +nothing to emit a disclosure block *for*. So we did **not** build: the +`` block, the sidecar, the `native_skills` boolean, the budget +threshold, or `disclosure_block()`. And the **recency queue was dropped** — its +sole rationale was the disclosure↔search double-exposure problem, which evaporates +without disclosure. `search` is now stateless. + +### 10.2 Final architecture (two concerns, cleanly split) + +**Materialization** (agent layer) — `AgentSetup.setup_skills(working_dir, config)` +mirrors installed (bundled + project) skills into each agent's `native_skills_dir` +via the manifest-tracked `_mirror_skills_to`. Called **once centrally** in +`agents/__init__.py:dynamic_agent_record` (covers all agents, BYOA + proxy). Each +agent declares `native_skills_dir` (class attr); gated by `skills.populate_native` +(default true). Codex/goose use the cross-agent `.agents/skills` standard. + +**Discovery** (`src/dsagt/skill_discovery.py:SkillRouter`) — stateless: +- `search(query, top_k, tag, skill_name)` — catalog tier + no-embedder keyword + fallback (`skill_keyword.py`, a faithful Genesis port). Installed skills are + natively advertised by every agent, so the catalog (not-yet-installed skills) + is the router's irreplaceable job; the keyword path also covers the + Cline-skills-disabled case. +- `list_sources()` — consolidated synced/indexed view. 
+- Three thin call sites: MCP `search_skills`, CLI `skills search/list`, + `knowledge_server.list_skill_sources`. + +### 10.3 What shipped + +- `src/dsagt/skill_keyword.py` — Genesis-faithful token-overlap scorer (verified + against upstream source: weights, `elif` bonus tiers, stopwords, tokenizer). +- `src/dsagt/skill_discovery.py` — stateless `SkillRouter`. +- `agents/base.py` — `native_skills_dir` ClassVar + `setup_skills`; called in + `dynamic_agent_record`. `native_skills_dir` set on all five agents; claude's + inline mirror removed (now central). +- Call sites rewired: `registry_server` (search), `cli.py` (search/list), + `knowledge_server` (list_sources). +- **License/NOTICE capture on install (done).** `install_into_project` → + `_capture_attribution`: copytree carries skill-local files; ancestor dirs up to + the cache repo root are walked for `LICENSE` / `NOTICE` / `COPYING` / + `ATTRIBUTION` (clone_github mirrors repo-root files into the cache even for + sparse `subdir` clones), nearest wins, and a `PROVENANCE.txt` is stamped. + `install_skill` surfaces what was preserved. +- **Struck the stale `datacard-generator` shipped skill (done).** Verified first + via the GitLab API + raw SKILL.md that Genesis's copy + (`skills/modcon-skills/datacard-generator`, name `generating-datacards`) has + well-formed frontmatter (not a malformed skip). Removed the dir; `skill-creator` + is now the only bundled skill. README/docs point users to + `dsagt skills add genesis` for it. +- Tests: `test_skill_discovery.py` (15), plus `setup_skills` and + attribution-capture coverage in `test_skills_catalog.py`. **228 passed / + 13 skipped** across all affected suites; ruff + black clean. 
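The ancestor walk described in the license/NOTICE bullet above can be sketched as follows. This is a minimal illustration, not the body of `_capture_attribution`. The function name `nearest_attribution`, exact-filename matching, and starting the walk at the skill directory itself are assumptions, and the `PROVENANCE.txt` stamping step is left out.

```python
from pathlib import Path

# File names taken from the note; matching them exactly (no extensions) is an assumption.
ATTRIBUTION_NAMES = ("LICENSE", "NOTICE", "COPYING", "ATTRIBUTION")


def nearest_attribution(skill_dir: Path, repo_root: Path) -> dict[str, Path]:
    """Walk from skill_dir up to repo_root; for each file name, the nearest copy wins."""
    skill_dir = skill_dir.resolve()
    repo_root = repo_root.resolve()
    if skill_dir != repo_root and repo_root not in skill_dir.parents:
        raise ValueError("skill_dir must live inside repo_root")

    found: dict[str, Path] = {}
    current = skill_dir
    while True:
        for name in ATTRIBUTION_NAMES:
            candidate = current / name
            # Only record a name the first time it is seen, i.e. the closest directory.
            if name not in found and candidate.is_file():
                found[name] = candidate
        if current == repo_root:
            break
        current = current.parent
    return found
```

Stopping at the cache repo root keeps the walk from picking up unrelated license files that happen to sit higher in the filesystem.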
diff --git a/design-notes/skills-catalog-server-merge.md b/design-notes/skills-catalog-server-merge.md new file mode 100644 index 0000000..a058d06 --- /dev/null +++ b/design-notes/skills-catalog-server-merge.md @@ -0,0 +1,177 @@ +# Design Note — SkillsCatalog encapsulation + MCP server merge + +**Status:** shipped — checklist items 1–5 done; only the deferred trace-driven +tool audit (item 6) remains. The implementation/refactor plan that follows from +`genesis-skills-comparison.md` §10 (the locked skills-routing model) and the +`latex/skills-routing.png` diagram. See §5 for what's left. + +**Date:** 2026-06-23 + +--- + +## 0. Where this comes from + +`genesis-skills-comparison.md` settled the *conceptual* model (catalog tier vs +installed tier; `SkillRouter` as the single skill-MCP hub; Skills Catalog +abstracting chroma+cache; Skill Directory for installed+created; `skill-creator` +for authoring). This note is the *engineering* plan to make the code match, in +three parts done in sequence. + +Already shipped (Steps A + B, tested): +- **Frontmatter-only catalog indexing** — `index_catalog` embeds `name + + description + tags`, not the SKILL.md body (progressive-disclosure level 1; + avoids embedder truncation + signal dilution; consistent with the keyword + scorer). Description also stored in metadata for clean search summaries. +- **Catalog-only retrieval** — `SkillRouter` searches only `skills_catalog__*` + collections; the keyword fallback scans only the clone cache. Installed skills + are no longer search candidates (they're natively discovered). + +--- + +## 1. Part 1 — `SkillsCatalog` module + B2 cleanup + +**Goal:** encapsulate all catalog logic behind one module; stop the dead +installed-skill indexing. + +- **`SkillsCatalog`** (new module) — *composition over `KnowledgeBase`*, NOT a + new KB and NOT a copy of the vector primitives. 
It holds a `KnowledgeBase` + handle (the host server's existing instance → shared embedder, no second model + load) + the clone-cache dir, and exposes the skill-domain API: + - `sync(source, *, force)` — clone + frontmatter-index into `skills_catalog__` + - `search(query, *, top_k, tag)` — ChromaDB over catalog collections, keyword + fallback over the cache when no embedder + - `install(name, project_dir)` — copy a catalog skill into the project (+ the + license/PROVENANCE capture already implemented) + - `list_sources()` — KNOWN_SOURCES + synced/indexed view + - The skill-specific behavior (frontmatter indexing, keyword fallback, a skill + `CollectionRoute` preset) lives here; the vector store + embedder are the + shared KB. ``SkillRouter`` becomes a thin render/MCP facade over it. +- **B2 cleanup** — drop the now-dead indexing into the `skills` collection + (`SkillRegistry._index_skill` / `save_skill` / `reindex_all`, and the + `install_skill` re-index). The router no longer searches `skills`, so this is + wasted embed work. (Touches a few `TestSaveSkill` / `test_reindex_all` + assertions.) + +Server-agnostic: `SkillsCatalog` takes whatever KB it's handed, so it drops +straight into the merged server in Part 2 with no rework. + +--- + +## 2. Part 2 — merge the two MCP servers + +**Why:** both `registry_server` and `knowledge_server` already construct their own +`KnowledgeBase` (`registry_server.py:881` + knowledge's `main()`), so the split +is pure duplication — two embedders, two Chroma accesses, and a write-here/ +read-there hazard on `skills_catalog__*` (sync in knowledge, search in registry). +The two-process split buys little isolation: the heavy/risky work is already +offloaded (`run_command` → `dsagt-run` subprocess; `kb_ingest` → job thread). 
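The duplication described above goes away once a single process builds one knowledge-base object and hands it to each concern. The sketch below is a toy model of that shape, not DSAGT code: `SharedKB`, `CatalogTools`, `KnowledgeTools`, and `build_dispatch` are hypothetical names, the tool signatures are simplified, substring matching stands in for real vector search, and the per-source collection suffix is an assumed naming scheme.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedKB:
    """Hypothetical stand-in for KnowledgeBase: built once per process."""

    collections: dict[str, list[str]] = field(default_factory=dict)

    def add(self, collection: str, text: str) -> None:
        self.collections.setdefault(collection, []).append(text)

    def search(self, collection: str, query: str) -> list[str]:
        # Substring match stands in for embedding search in this toy model.
        needle = query.casefold()
        return [t for t in self.collections.get(collection, []) if needle in t.casefold()]


class CatalogTools:
    """Skill-catalog concern; writes and reads through the same KB handle."""

    def __init__(self, kb: SharedKB) -> None:
        self.kb = kb

    def add_skill_source(self, source: str, summaries: list[str]) -> int:
        for summary in summaries:
            # Assumption: one collection per source, named by suffixing the source.
            self.kb.add(f"skills_catalog__{source}", summary)
        return len(summaries)

    def search_skills(self, source: str, query: str) -> list[str]:
        return self.kb.search(f"skills_catalog__{source}", query)


class KnowledgeTools:
    """General knowledge concern; shares the KB instead of opening its own."""

    def __init__(self, kb: SharedKB) -> None:
        self.kb = kb

    def kb_ingest(self, text: str) -> None:
        self.kb.add("domain_knowledge", text)

    def kb_search(self, query: str) -> list[str]:
        return self.kb.search("domain_knowledge", query)


def build_dispatch(kb: SharedKB) -> dict[str, Callable[..., object]]:
    """One dispatch table for the process; modules remain separate classes."""
    catalog = CatalogTools(kb)
    knowledge = KnowledgeTools(kb)
    return {
        "add_skill_source": catalog.add_skill_source,
        "search_skills": catalog.search_skills,
        "kb_ingest": knowledge.kb_ingest,
        "kb_search": knowledge.kb_search,
    }
```

Because both tool groups hold the same `SharedKB` instance, a source synced through `add_skill_source` is immediately visible to `search_skills`, which is the write-here/read-there hazard the two-process layout could not rule out.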
+ +Principle: **process = deployment unit; module = concern boundary.** Merge into +one server; keep `KnowledgeBase` / `ToolRegistry` / `SkillRegistry` / +`SkillsCatalog` / provenance as distinct modules behind one tool-dispatch shell. + +**Migration surface (the real cost):** +- Entry points: `dsagt-registry-server` + `dsagt-knowledge-server` → one + `dsagt-server` (`pyproject [project.scripts]`). +- **Per-agent MCP config generation** — every agent writes *two* server entries + (`dsagt-registry`, `dsagt-knowledge`) via `_build_mcp_servers_dict` / + `_mcp_server_args`; collapse to one across all five agents. +- Backward compat: existing projects' configs reference two servers; `dsagt + start` regenerates to one (write_dynamic overwrite handles it — verify). +- Tests: `test_registry_server`, `test_knowledge_server`, `test_config` (agent + config gen), `test_server_startup`. + +**Ups:** one embedder / one Chroma owner / no cross-process collection hazard; +one MCP server per agent (simpler config, faster startup, one `init_tracing`). +**Downs:** all tools share one process (isolation loss — bounded by the +offloading above); the migration churn above. + +--- + +## 3. Part 3 — tool-surface audit (SEPARATE, evidence-based, later) + +Today: **23 MCP tools** (11 registry + 12 knowledge). The agent already sees all +23 (connects to both servers), so the merge is *neutral* on count — proliferation +is pre-existing. + +- **Do NOT collapse into `mode=`/`action=` mega-tools.** A union-schema tool with + a mode enum is harder for a model than distinct, well-named tools (it must pick + tool *and* mode). Clear names are the discovery signal. +- **Do a trace-driven prune.** Every tool call is in MLflow — audit which tools + actually get used across real sessions, then remove/defer dead weight (likely + suspects: `kb_add_vector_db`, `kb_get_suggestions`/`kb_dismiss_suggestion`, + `kb_job_status`, `reconstruct_pipeline`, maybe `kb_append`). 
+- Treat as a deliberate pass *after* the merge — not guessed now. + +--- + +## 4. Sequence / checklist + +1. [x] `SkillRouter` owns the catalog (compose KB + cache). + - [x] catalog-only search + cache-only keyword fallback (Step B). + - [x] frontmatter-only catalog indexing (Step A). + - [x] `search()` no longer requires a skill registry (catalog needs only KB). + - [x] add `SkillRouter.sync(source)` (→ `sync_source`) + `SkillRouter.install(name, dir)` + (→ `install_into_project`) so the router owns all four ops. + - [x] wire `install_skill` (registry_server) → `router.install`; + `add_skill_source` (knowledge_server) → `router.sync`. +2. [x] B2: drop dead `skills`-collection indexing — removed + `SkillRegistry._index_skill` / `reindex_all` + the `save_skill` index call + + the `install_skill` re-index block. **Also** dropped the bundled-skill + indexing in `setup_core_kb` (same dead `skills` collection — full cut, not + in the original enumerated list but the same waste). `SKILLS_COLLECTION` + kept as a back-compat name only (no reader, no writer). `TestSaveSkill` + assertions were already file-based (only docstrings mentioned indexing); + `test_reindex_all` is `ToolRegistry`, untouched. All four skill suites + + `test_setup_core_kb` green; ruff + black clean. +2b. [x] **`SkillsCatalog` extraction** (folded into this pass). New + `SkillsCatalog(kb, cache_dir)` class in `commands/skills_catalog.py` owns + the catalog data plane — `sync` / `install` / `search(→list[dict])` / + `list_sources` + backend selection (ChromaDB vs keyword fallback). + `SkillRouter` is now a thin render/MCP facade holding a `SkillsCatalog` + (built from `kb`/`cache_dir`, or injected via `catalog=` so the server + shares one). All existing `SkillRouter(kb=…, skill_registry=…)` call sites + + tests unchanged. +3. [x] Merge servers → one `create_dsagt_server` + `dsagt-server` entry point. 
+ Extracted `_registry_tools_and_handlers` / `_knowledge_tools_and_handlers`; + `create_*_server` kept as thin test-facing wrappers; `dsagt_server.py` + composes both `(tools, handlers)` under one `Server("dsagt")` with a + type-dispatched `call_tool` (registry str passthrough + knowledge + dict→json + error wrap) and one shared-KB `main()` (the cross-backend + guard now lives once, in `_build_kb_from_config`). Old per-server `main()`s + deleted; their KB-None registry path is gone (merged server requires a KB, + fails fast on misconfigured `api`). +4. [x] Collapse per-agent MCP config to one `dsagt` entry — `base.py` + (`_mcp_server_args()` no-arg, flat `_DSAGT_MCP_ALWAYS_ALLOW`, + `_build_mcp_servers_dict`), `claude` / `goose` (incl. `--with-extension` + in `interactive_command` + `run_script`) / `codex` / `cline` / + `opencode` (BYOA + proxy). `roo` rides `_build_mcp_servers_dict`. `info.py` + span-source buckets + module docstrings updated. +5. [x] Tests + backward-compat. 10 config-shape tests + `test_info` + smoke-test + comments updated to the single-server shape; new `test_dsagt_server.py` + (23-tool composition + both return-type contracts). Compat is + **rebuild-not-migrate** (README upgrade note + cline `.cline-data` caveat); + no migration code. 338 passed / 13 skipped across affected suites; entry + point re-registered via `uv sync`. +6. [ ] (later) trace-driven tool-surface audit. + +## 5. Current state + +**Checklist items 1–5 complete** (this work spanned two sessions). The skills +refactor + server merge has shipped: one `dsagt-server` process, one shared +`KnowledgeBase`, one MCP entry per agent, `SkillsCatalog` owning the catalog data +plane behind a thin `SkillRouter` facade, and the dead `skills`-collection +indexing fully removed. Backward compat is rebuild-not-migrate (README §"MCP +Server" upgrade note). 338 passed / 13 skipped across affected unit suites; +ruff + black clean; `dsagt-server` re-registered. 
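
Reviewer note (illustrative, not part of the patch): the single-process composition summarized above, where two `(tools, handlers)` groups sit behind one type-dispatched `call_tool`, might be sketched as follows. This is a minimal sketch in plain Python that does not use the MCP SDK. The names `compose_handlers` and `call_tool`, and the stub handlers in the usage note, are assumptions for illustration, and the error wrap is applied uniformly here for brevity, whereas the note describes it for the knowledge path.

```python
import json
from typing import Any, Callable

# A handler takes the tool arguments and returns either a str (registry
# style) or a JSON-serializable dict (knowledge style).
Handler = Callable[[dict], Any]


def compose_handlers(*groups: dict[str, Handler]) -> dict[str, Handler]:
    """Merge several handler tables into one, rejecting name collisions."""
    merged: dict[str, Handler] = {}
    for group in groups:
        for name, handler in group.items():
            if name in merged:
                raise ValueError(f"duplicate tool name: {name}")
            merged[name] = handler
    return merged


def call_tool(handlers: dict[str, Handler], name: str, arguments: dict) -> str:
    """Dispatch by tool name, then normalize the result by its type."""
    handler = handlers.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = handler(arguments)
    except Exception as exc:  # wrap failures instead of crashing the server
        return json.dumps({"error": str(exc)})
    if isinstance(result, str):
        return result  # str passthrough
    return json.dumps(result)  # dict -> JSON text
```

With hypothetical stubs such as `{"search_registry": lambda a: "found: " + a["query"]}` and `{"kb_search": lambda a: {"hits": [a["query"]]}}`, both return-type contracts can be exercised through the same entry point, which is roughly what a composition test would check.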
+ +**Only item 6 remains — the trace-driven tool-surface audit (§3), deliberately +deferred.** It needs real MLflow traces across sessions to decide which of the 23 +tools are dead weight; it is *not* a guess-now task. Pick it up when there's +trace data to mine. Likely first suspects (from §3): `kb_add_vector_db`, +`kb_get_suggestions`/`kb_dismiss_suggestion`, `kb_job_status`, +`reconstruct_pipeline`, maybe `kb_append`. + +**Key context to re-load in a fresh session:** this note + `genesis-skills-comparison.md` +§7–§10 (the locked model) + the diagram. The long diagram-iteration history is not +needed. diff --git a/docs/architecture.md b/docs/architecture.md index 5724f82..e3d00ba 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -2,18 +2,18 @@ ![DSAgt architecture](assets/architecture.png) -DSAgt wraps an unmodified agent CLI with four independently-operable layers. Each layer exposes its own MCP server so the agent discovers and invokes capabilities through the standard MCP tool protocol. +DSAgt wraps an unmodified agent CLI with four independently-operable layers. The tool-registry and knowledge-base layers are exposed through one MCP server (`dsagt-server`); the agent discovers and invokes their capabilities through the standard MCP tool protocol. ## Layers -**Tool Registry** (`dsagt-registry-server`) -The agent registers CLI tools as markdown files with YAML frontmatter under `/tools/`. The registry server handles dependency installation via `uv run --with` and wraps every execution with `dsagt-run` for provenance capture. The agent discovers tools via `search_registry`. +**Tool Registry** (`dsagt-server`) +The agent registers CLI tools as markdown files with YAML frontmatter under `/tools/`. DSAgt handles dependency installation via `uv run --with` and wraps every execution with `dsagt-run` for provenance capture. The agent discovers tools via `search_registry`. 
-**Knowledge Base** (`dsagt-knowledge-server`) -Semantic search over six independently-partitioned ChromaDB collections. Three are global (populated by `dsagt setup-kb`); three are per-project (filled automatically during use). Background jobs handle long ingest operations. The agent searches via `kb_search`, ingests via `kb_ingest`, and saves user-confirmed facts via `kb_remember`. +**Knowledge Base** (`dsagt-server`) +Semantic search over six independently-partitioned ChromaDB collections, served by the same process as the tool registry (one shared embedder, one ChromaDB owner). Three collections are global (populated by `dsagt setup-kb`); three are per-project (filled automatically during use). Background jobs handle long ingest operations. The agent searches via `kb_search`, ingests via `kb_ingest`, and saves user-confirmed facts via `kb_remember`. **Provenance** (`dsagt-run`) -A thin wrapper invoked by the registry server around every tool execution. Records the command, arguments, exit code, duration, file counts, and truncated stderr to `/trace_archive/.json` and emits an OTLP span to MLflow. The agent calls `reconstruct_pipeline` to render the trace archive as a reproducible bash script or Snakemake workflow. +A thin wrapper around every registered-tool execution. Records the command, arguments, exit code, duration, file counts, and truncated stderr to `/trace_archive/.json` and emits an OTLP span to MLflow. The agent calls `reconstruct_pipeline` to render the trace archive as a reproducible bash script or Snakemake workflow. **Observability** (MLflow + OTLP) MLflow runs locally at a port pinned at `dsagt init` time. All four layers emit OTLP HTTP spans to MLflow's `/v1/traces` endpoint. The agent's own LLM-call traces land in the same store when you export the `OTEL_EXPORTER_OTLP_ENDPOINT` printed by `dsagt init`. 
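
Reviewer note (illustrative, not part of the patch): the provenance layer described in this file records the command, arguments, exit code, duration, and truncated stderr for each tool execution. A minimal sketch of that idea, assuming a hypothetical helper `run_with_provenance` and hypothetical field names, could look like the block below. It omits the file counts and the OTLP span that `dsagt-run` also emits, and it is not the actual `dsagt-run` implementation.

```python
import json
import subprocess
import time
from pathlib import Path


def run_with_provenance(argv, archive_dir, record_name, stderr_limit=2000):
    """Run a command and write a small JSON provenance record next to it.

    The field names and the stderr_limit default are assumptions made
    for this sketch only.
    """
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True)
    record = {
        "command": argv[0],
        "arguments": list(argv[1:]),
        "exit_code": proc.returncode,
        "duration_s": round(time.monotonic() - start, 3),
        "stderr": proc.stderr[:stderr_limit],  # truncated, as in the docs
    }
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    (archive / f"{record_name}.json").write_text(json.dumps(record, indent=2))
    return record
```

Keeping the record as one plain JSON file per execution is what would make a later replay step (such as rendering the archive as a script) straightforward: each record already holds the command and its arguments in order.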
diff --git a/docs/assets/architecture.png b/docs/assets/architecture.png index 7dbdf46..013ca20 100644 Binary files a/docs/assets/architecture.png and b/docs/assets/architecture.png differ diff --git a/docs/assets/skills-routing.png b/docs/assets/skills-routing.png new file mode 100644 index 0000000..15c486d Binary files /dev/null and b/docs/assets/skills-routing.png differ diff --git a/docs/cli.md b/docs/cli.md index e4eb4d6..d2f8223 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -1,6 +1,6 @@ # CLI Reference -All commands are available after installing DSAgt. +All commands are available after [installation](index.md#installation) and activating your virtual environment. ## Project Management @@ -42,8 +42,7 @@ These are launched automatically by `dsagt init` via the per-agent MCP config an | Command | Description | |---------|-------------| -| `dsagt-registry-server` | Tool registry MCP server | -| `dsagt-knowledge-server` | Knowledge base MCP server | +| `dsagt-server` | MCP server — tool registry + knowledge base | | `dsagt-run` | Provenance-capturing tool execution wrapper | | `dsagt-proxy` | LiteLLM proxy server (proxy mode only) | | `dsagt-setup-kb` | Core knowledge base setup (called by `dsagt setup-kb`) | diff --git a/docs/developer.md b/docs/developer.md index 9ffbb2e..09191ae 100644 --- a/docs/developer.md +++ b/docs/developer.md @@ -28,11 +28,10 @@ Proxy mode reads upstream LLM credentials from `.env` or the shell. See [`tests/ **Agent command not found.** The agent CLI is not installed or is not on PATH. See the [supported agents table](index.md#supported-agents). -**MCP servers not connecting.** Verify the server commands are on your PATH: +**MCP server not connecting.** Verify uv resolves the server command: ```bash -which dsagt-registry-server -which dsagt-knowledge-server +uv run which dsagt-server ``` If missing, reinstall: `pip install --force-reinstall https://github.com/AI-ModCon/dsagt/archive/refs/tags/0.1.0.zip`. 
diff --git a/docs/index.md b/docs/index.md index 1b7fec3..5f963fc 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,22 +6,34 @@ DSAgt connects an MCP-compatible AI coding agent to tool registration, a semanti ## Supported Agents -| Agent | Install | Verify | -|-------|---------|--------| -| [Claude Code](https://github.com/anthropics/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude --version` | -| [Goose](https://github.com/block/goose) | See [Goose docs](https://github.com/block/goose#installation) | `goose --version` | -| [Codex](https://github.com/openai/codex) | `npm i -g @openai/codex` | `codex --version` | -| [opencode](https://github.com/sst/opencode) | See [opencode docs](https://opencode.ai/docs/) | `opencode --version` | -| [Roo Code](https://github.com/RooCodeInc/Roo-Code) | `npm i -g @roo-code/cli` | `roo --version` | -| [Cline](https://github.com/cline/cline) | `npm i -g cline` | `cline --version` | + +{% + include-markdown "../README.md" + start="" + end="" +%} ## Prerequisites -- Python 3.12–3.13 +- Python 3.12 or 3.13 - One of the supported agent platforms above, installed and authenticated against your LLM provider +- [uv](https://github.com/astral-sh/uv) — only for the development install ## Installation +### For use (no development) + + +{% + include-markdown "../README.md" + start="" + end="" +%} + +### For development + +Clone the repo and use `uv` (editable install; add `--all-groups` for the test suite): + ```bash pip install https://github.com/AI-ModCon/dsagt/archive/refs/tags/0.1.0.zip ``` diff --git a/docs/knowledge-base.md b/docs/knowledge-base.md index 3370677..1d47afc 100644 --- a/docs/knowledge-base.md +++ b/docs/knowledge-base.md @@ -7,7 +7,7 @@ DSAgt maintains six independently-partitioned ChromaDB collections. 
The first th | Collection | Source | Populated by | |---|---|---| | **Tool Specs** | Bundled CLI tool specs in `src/dsagt/tools/` | `dsagt setup-kb` | -| **Skills** | Bundled skill workflows in `src/dsagt/skills/` | `dsagt setup-kb` | +| **Skills Catalog** | Installable skills from external repos (one `skills_catalog__` collection per source), frontmatter-indexed | `dsagt setup-kb` (default source) + `add_skill_source` | | **Domain Knowledge** | NeMo Curator + AIDRIN reference corpora; user-ingested docs | `dsagt setup-kb` + agent's `kb_ingest` | | **Explicit Memory** | User-confirmed facts | Agent's `kb_remember` (also written to `/explicit_memories.yaml`) | | **Episodic Memory** | Distilled facts from MLflow traces | `dsagt memory --project ` | @@ -23,7 +23,7 @@ Explicit memories are facts the user confirms during a session. The agent saves ## Search -The agent searches all collections via `kb_search` (knowledge MCP server) and writes via `kb_ingest` / `kb_remember`. Tool Specs and Skills are queried through specialized routes (`search_registry`, `search_skills`) over the same backend. +The agent searches all collections via `kb_search` and writes via `kb_ingest` / `kb_remember`. Registered tools have their own `search_registry` route over the same backend. Skills are discovered separately — installed ones natively by the agent, installable ones via `search_skills` over the external catalog (see [Tools & Skills](tools-skills.md)). Hybrid search (dense embeddings + sparse BM25 via Reciprocal Rank Fusion) is on by default per collection route. Cross-encoder reranking is optional. @@ -37,4 +37,4 @@ dsagt setup-kb --embedding-backend api \ --embedding-api-key ``` -The Tool Specs and Skills collections are wiped and rebuilt on every `setup-kb` run — re-run after upgrading DSAgt to pick up new bundled assets. +The Tool Specs collection is wiped and rebuilt on every `setup-kb` run — re-run after upgrading DSAgt to pick up new bundled tools. 
(Bundled skills are not indexed — agents auto-discover them natively.) diff --git a/docs/mcp-servers.md b/docs/mcp-servers.md index ee999cc..4dc922f 100644 --- a/docs/mcp-servers.md +++ b/docs/mcp-servers.md @@ -1,12 +1,12 @@ -# MCP Servers +# MCP Server -DSAgt exposes its capabilities through two MCP servers. Both are launched automatically by `dsagt init` and configured in the per-agent runtime file (`.mcp.json` for Claude Code, `goose.yaml` for Goose, etc.). +DSAgt exposes its capabilities through a single MCP server, **`dsagt-server`**, configured in the per-agent runtime file (`.mcp.json` for Claude Code, `goose.yaml` for Goose, etc.) and launched automatically when the agent starts. It bundles four concern areas — a tool registry, a knowledge base, explicit memory, and skill discovery — behind one process with one shared embedder and one ChromaDB owner. -## Registry Server +> Earlier versions ran two separate servers (`dsagt-registry-server` + `dsagt-knowledge-server`), merged in 0.2.0. Re-run `dsagt start ` on an existing project to regenerate its config against the single server (for cline, delete `/.cline-data` first). -**Command:** `dsagt-registry-server` +## Registry tools -Handles tool registration, dependency installation, and tool discovery. +Tool registration, dependency installation, and tool discovery. | Tool | Description | |------|-------------| @@ -17,9 +17,7 @@ Handles tool registration, dependency installation, and tool discovery. Tools are markdown files with YAML frontmatter under `/tools/`. Executables are wrapped with `dsagt-run` for provenance and `uv run --with` for Python dependencies. -## Knowledge Server - -**Command:** `dsagt-knowledge-server` +## Knowledge tools Semantic search and ingestion over indexed document collections. @@ -29,7 +27,7 @@ Semantic search and ingestion over indexed document collections. 
| `kb_ingest` | Index a file or directory into a named collection (runs in background for large corpora) | | `kb_remember` | Save a user-confirmed fact to explicit memory | | `kb_get_memories` | Retrieve explicit memories for the current project | -| `search_skills` | Discover agent skill workflows | +| `search_skills` | Discover installable skills in the external catalog (installed skills are auto-discovered natively) | ### Backend diff --git a/docs/quickstart.md b/docs/quickstart.md index 32d6702..07e09cb 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -48,8 +48,8 @@ dsagt stop quickstart | Prompt | DSAgt layer | |--------|-------------| -| 1 | Knowledge MCP server (`kb_ingest`) — chunks and indexes docs into ChromaDB | -| 2 | Registry MCP server (`save_tool_spec`) — writes `tools/csvcut.md`, etc. | +| 1 | `dsagt-server` (`kb_ingest`) — chunks and indexes docs into ChromaDB | +| 2 | `dsagt-server` (`save_tool_spec`) — writes `tools/csvcut.md`, etc. | | 3 | `dsagt-run` provenance wrapper — records exec layer to `trace_archive/` | | 4 | KB recall via `kb_search` and registered tool execution | | 5–6 | Explicit memory (`kb_remember` → `explicit_memories.yaml`) + `kb_get_memories` | @@ -85,9 +85,9 @@ dsagt setup-kb --embedding-backend api --embedding-base-url ... --embedding-api- Three collections are populated: - **Tool Specs** — DSAgt's bundled tool specs from `src/dsagt/tools/`, tagged `source: bundled`. -- **Skills** — DSAgt's bundled skill workflows from `src/dsagt/skills/`. +- **Skills Catalog** — the default external skill source (`scientific`), cloned and frontmatter-indexed so `search_skills` has installable skills out of the box. - **Domain Knowledge** — NeMo Curator and AI Data Readiness Inspector reference corpora. -The Tool Specs and Skills collections are wiped and rebuilt on every run, so re-run `setup-kb` after upgrading DSAgt. +The Tool Specs collection is wiped and rebuilt on every run, so re-run `setup-kb` after upgrading DSAgt. 
(Bundled skills are not indexed — agents auto-discover them natively.) The default embedder is a local sentence-transformers model (~130 MB, CPU-only, no API key). Pass `--embedding-backend api` to route through a hosted embedder via LiteLLM. diff --git a/docs/tools-skills.md b/docs/tools-skills.md index 6c057bd..8e218a0 100644 --- a/docs/tools-skills.md +++ b/docs/tools-skills.md @@ -2,7 +2,7 @@ ## Tools -Tools are CLI executables defined as markdown files with YAML frontmatter under `/tools/`. The agent registers new tools via the registry MCP server's `save_tool_spec` tool. +Tools are CLI executables defined as markdown files with YAML frontmatter under `/tools/`. The agent registers new tools via the MCP server's `save_tool_spec` tool. A tool spec includes: @@ -24,7 +24,7 @@ Prints descriptive statistics for all columns in a CSV file. Usage: csvstat [options] [FILE] ``` -The registry server wraps every registered tool with `dsagt-run` for provenance capture and `uv run --with` for Python dependencies, so the agent can call any tool without managing environments manually. +DSAgt wraps every registered tool with `dsagt-run` for provenance capture and `uv run --with` for Python dependencies, so the agent can call any tool without managing environments manually. ### Bundled Tools @@ -32,12 +32,30 @@ DSAgt ships a `scan_directory` tool in `src/dsagt/tools/` that is indexed into t ## Skills -Skills are instruction-based agent workflows in `/skills/`. Each skill is a directory containing a `SKILL.md` file and optional reference documents. The agent discovers skills via `search_skills`. +Skills are instruction-based agent workflows in `/skills/`. Each skill is a directory containing a `SKILL.md` file and optional reference documents. 
+ +### Skill discovery architecture + +![DSAgt skill routing](assets/skills-routing.png) + +Skills live in **two tiers**, and a single MCP service — the **SkillRouter** — is the one entry point that routes every skill operation between them: + +- **Catalog tier** — skills that exist in external repositories but are *not yet installed*. DSAgt federates many sources (`scientific`, `anthropic`, `antigravity`, `composio`, `genesis`, or any git URL); each is sparse-cloned and indexed into its own per-source ChromaDB collection. The agent browses this tier with `search_skills` and manages sources with `add_skill_source` / `list_skill_sources`. +- **Installed + created tier** — skills drawn into the project's Skill Directory (`/skills/`), either installed from the catalog (`install_skill`) or authored in place (e.g. with the bundled `skill-creator`). At `dsagt start` these are mirrored into each agent's *native* skill directory (`.claude/`, `.agents/`, `.cline/`, `.roo/`), where the agent auto-discovers and auto-invokes them. + +The diagram's three bands trace a skill's lifecycle: **Discovery** (the router) → **Registration** (the searchable catalog) → **Progressive Exposure** (the native Skill Directory the agent loads on its own). + +#### Why this shape + +- **Search is catalog-only.** Every supported agent (Claude, Codex, Goose, Cline, Roo) natively auto-discovers `SKILL.md` folders, so installed skills never need to be indexed or returned by a tool — the harness already loads them. That frees `search_skills` to do the one thing native discovery cannot: browse a catalog of potentially thousands of *un*installed skills without holding them all in context. Catalogs are indexed on frontmatter (name + description + tags), which keeps those summaries compact and avoids diluting the embedding with full SKILL.md bodies. 
+- **Keyword fallback, no embedder required.** When no embedding model is configured, the router falls back to a zero-dependency token-overlap scorer over the local clone cache, so `search_skills` still works (just less fuzzy) — no model download, no API key. +- **One router, not scattered policy.** Backend selection (semantic vs. keyword), the catalog/installed split, and source bookkeeping all live in the SkillRouter rather than being re-implemented at each MCP and CLI call site, so the behavior can't drift between them. +- **Federated and provenance-preserving.** Each source is an independent per-source collection, so re-syncing one never disturbs another; installing a catalog skill preserves its upstream `LICENSE`/`NOTICE` and stamps a `PROVENANCE.txt` into the installed directory. ### Bundled Skills -DSAgt ships a `datacard-generator` skill in `src/dsagt/skills/` with reference templates for generating dataset documentation. It is indexed into the global Skills collection by `dsagt setup-kb`. +DSAgt ships a `skill-creator` skill in `src/dsagt/skills/` (for scaffolding new SKILL.md skills). Bundled and installed skills are **not** indexed for search — every supported agent natively auto-discovers `SKILL.md` folders, so `search_skills` is reserved for the *catalog* tier (skills you can install but haven't yet). Domain skills — including the MODCON `datacard-generator` — are sourced from external catalogs (`dsagt skills add genesis`) rather than bundled, so they stay current upstream. ### Adding Skills -Place a new directory under `/skills/` with a `SKILL.md` describing the workflow. The knowledge server indexes it automatically on next startup, or trigger a re-index via `kb_ingest`. +Place a new directory under `/skills/` with a `SKILL.md` describing the workflow. The next `dsagt start` mirrors it into the agent's native skill directory (e.g. `.claude/skills/`), after which the agent auto-discovers and invokes it — no indexing step. 
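
Reviewer note (illustrative, not part of the patch): the keyword fallback described in this file, a zero-dependency token-overlap scorer over frontmatter fields, could be sketched roughly as below. The function names (`tokenize`, `overlap_score`, `keyword_search`), the dict shape of a skill, and the scoring formula (fraction of query tokens matched) are all assumptions for illustration; the scorer shipped in DSAgt may differ.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it on runs of non-alphanumerics."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def overlap_score(query: str, skill: dict) -> float:
    """Fraction of query tokens present in name + description + tags."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    haystack = " ".join(
        [
            skill.get("name", ""),
            skill.get("description", ""),
            " ".join(skill.get("tags", [])),
        ]
    )
    return len(query_tokens & tokenize(haystack)) / len(query_tokens)


def keyword_search(query: str, skills: list[dict], top_k: int = 5):
    """Return up to top_k (score, name) pairs, best first, zero scores dropped."""
    scored = [(overlap_score(query, s), s["name"]) for s in skills]
    hits = [(score, name) for score, name in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return hits[:top_k]
```

Scoring only the frontmatter fields mirrors the frontmatter-only indexing used for the semantic path, so both backends would rank against the same compact summary of each skill.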
diff --git a/latex/architecture.png b/latex/architecture.png index 7dbdf46..013ca20 100644 Binary files a/latex/architecture.png and b/latex/architecture.png differ diff --git a/latex/dsagt.tex b/latex/dsagt.tex index c4c7a05..68cc740 100644 --- a/latex/dsagt.tex +++ b/latex/dsagt.tex @@ -343,19 +343,29 @@ \section{Architecture} % Knowledge centered above (Explicit, Domain); Registry above (Skills, % Tool Specs); MLflow directly above Episodic; dsagt-run directly above % Tool Records. All x-coordinates fall out of the row-3 grid below. -\node[svc] (knowledge) at (-0.9,-2) {Knowledge\\Server}; -\node[svc] (registry) at (3.5,-2) {Registry\\Server}; +\node[svc] (knowledge) at (-0.9,-2) {Knowledge}; +\node[svc] (registry) at (3.5,-2) {Registry}; \node[art] (mlflow) at (6.8,-2) {MLflow\\Traces}; % dsagt-run width matches the artifact boxes (18mm) so it visually % pairs with MLflow Traces and Tool Records on either side. \node[svc, minimum width=18mm] (dsagtrun) at (9.0,-2) {\texttt{dsagt-run}}; +% The Knowledge and Registry tool surfaces are now exposed by ONE merged +% MCP server (dsagt-server). A single container box spans + sits behind +% both (background layer, lighter fill) so the two inner boxes still read +% as distinct concern modules while the bridge conveys "one server". +\begin{scope}[on background layer] + \node[draw, rounded corners=3pt, thick, fill=green!6, + fit=(knowledge)(registry), inner sep=1.2mm] (mcpbox) {}; +\end{scope} + % Agent <-> MCP servers (symmetric forks; bidirectional because MCP is % request/response — the server replies travel back to the agent). \draw[bilink] (agent.south) -- ++(0,-0.3) -| (knowledge.north); \draw[bilink] (agent.south) -- ++(0,-0.3) -| (registry.north); -% MCP label centered below the agent, between the two MCP-server forks. -\node[lbl] at (1.3,-1.2) {MCP}; +% MCP label in the center of the merged-server box, in the gap between the +% Knowledge and Registry modules — the bridge that joins them into one server. 
+\node[lbl] at (1.3,-2) {MCP}; % Agent -> MLflow + dsagt-run (fork off the same trunk as the MCP servers, % drop vertically directly above each box). @@ -383,7 +393,7 @@ \section{Architecture} % on every registered-tool invocation. \node[art] (explicit) at (-2.0,-4) {Explicit\\Memory}; \node[art] (domain) at (0.2,-4) {Domain\\Knowledge}; -\node[art] (skills) at (2.4,-4) {Skills}; +\node[art] (skills) at (2.4,-4) {Skills\\Catalog}; \node[art] (tools) at (4.6,-4) {Tool\\Specs}; \node[art] (episodic) at (6.8,-4) {Episodic\\Memory}; \node[art] (history) at (9.0,-4) {Tool\\Records}; diff --git a/latex/skills-routing.png b/latex/skills-routing.png new file mode 100644 index 0000000..15c486d Binary files /dev/null and b/latex/skills-routing.png differ diff --git a/latex/skills-routing.tex b/latex/skills-routing.tex new file mode 100644 index 0000000..f4a9d89 --- /dev/null +++ b/latex/skills-routing.tex @@ -0,0 +1,89 @@ +% Skills routing diagram — companion to Figure 1 (architecture) in dsagt.tex. +% Standalone so it exports straight to latex/skills-routing.png: +% pdflatex skills-routing.tex +% pdftoppm -png -r 175 -singlefile skills-routing.pdf skills-routing +% +% Colour semantics (see legend): blue = external (skill repos, agent), +% green = DSAgt service (SkillRouter), orange = DSAgt store / skill (Skills +% Catalog, Skill Directory, the bundled skill-creator). Concern bands are +% uniform faint outlines. SkillRouter is the single skill-MCP entry point: +% every skill tool is labelled on the router operation it drives. 
+\documentclass[border=2mm]{standalone} +\usepackage{tikz} +\usepackage{lmodern} +\usetikzlibrary{positioning, arrows.meta, fit, backgrounds, calc} + +\begin{document} +\begin{tikzpicture}[ + >={Stealth[length=2mm]}, + box/.style={ + draw, rounded corners=2pt, thick, + minimum height=12mm, align=center, + font=\footnotesize, fill=white}, + ext/.style={box, fill=blue!12}, % external (repos, agent) + svc/.style={box, fill=green!12}, % DSAgt service + store/.style={box, fill=orange!14}, % DSAgt store / skill + skill/.style={draw, rounded corners=2pt, thick, fill=white, + font=\scriptsize, minimum height=5mm, align=center}, + link/.style={->, thick, blue!60!black}, + bilink/.style={<->, thick, blue!60!black}, + lbl/.style={font=\scriptsize\itshape, text=gray!55!black, align=center}, + concern/.style={font=\footnotesize\bfseries, text=gray!62!black}, + band/.style={draw=gray!55, dashed, rounded corners, fill=black!3}, +] + +% ============================ legend ==================================== +\node[ext, minimum width=5mm, minimum height=4mm] (lk1) at (-1.6,7.4) {}; +\node[right=1mm of lk1, font=\scriptsize, anchor=west] (lt1) {External}; +\node[svc, minimum width=5mm, minimum height=4mm, right=14mm of lk1] (lk2) {}; +\node[right=1mm of lk2, font=\scriptsize, anchor=west] {DSAgt service}; +\node[store, minimum width=5mm, minimum height=4mm, right=24mm of lk2] (lk3) {}; +\node[right=1mm of lk3, font=\scriptsize, anchor=west] {DSAgt store / skill}; + +% ============================ external (outside bands) ================== +\node[ext, minimum width=32mm] (extrepos) at (-0.2,0.6) + {External Skills Repos\\{\scriptsize scientific $\cdot$ anthropic $\cdot$ antigravity}\\{\scriptsize composio $\cdot$ genesis $\cdot$ any git URL}}; + +\node[ext, minimum width=22mm] (agent) at (16.8,3.8) {Agent}; + +% ============================ DSAgt stores ============================== +\node[store, minimum width=34mm] (catalog) at (5.6,0.6) + {\textbf{Skills Catalog}\\{\scriptsize 
federated across sources}\\{\scriptsize searchable $\cdot$ not yet installed}}; + +\node[store, minimum width=46mm, minimum height=26mm] (native) at (11.6,0.9) {}; +\node[anchor=north, align=center, font=\footnotesize] + at ([yshift=-1.6mm]native.north) + {Skill Directory\\{\scriptsize installed + created}\\{\scriptsize .claude $\cdot$ .agents $\cdot$ .cline $\cdot$ .roo /skills}}; +\node[skill, minimum width=26mm, anchor=south] at ([yshift=2mm]native.south) + {\texttt{skill-creator}}; + +% ============================ DSAgt service ============================= +\node[svc, minimum width=42mm] (router) at (5.6,5.8) + {\texttt{SkillRouter}\\{\scriptsize search $\cdot$ keyword fallback $\cdot$ list\_sources}}; + +% ============================ router operations (skill MCP tools) ======= +\draw[link] (router.west) -| (extrepos.north) + node[lbl, pos=0.25, above] {add\_skill\_source\\list\_skill\_sources}; +\draw[bilink] (router.south) -- (catalog.north) + node[lbl, pos=0.5, right] {search\_skills}; +\draw[link] (router.east) -| (native.north) + node[lbl, pos=0.25, above] {install\_skill}; +\draw[bilink] (router.north) -- ++(0,0.5) -| (agent.north) + node[lbl, pos=0.25, above] {MCP}; + +% ============================ agent <-> skill directory (two-way) ======= +\draw[bilink] (native.east) -| (agent.south) + node[lbl, pos=0.22, below, align=center] {auto-invoke\\author}; + +% ============================ concern bands ============================= +\begin{scope}[on background layer] + \node[band, fit=(catalog), inner sep=3mm] (regband) {}; + \node[band, fit=(native), inner sep=3mm] (expband) {}; + \node[band, fit=(router), inner sep=3mm] (disband) {}; +\end{scope} +\node[concern] at (disband.north) [above=0.6mm] {DISCOVERY}; +\node[concern] at (regband.south) [below=0.6mm] {REGISTRATION}; +\node[concern] at (expband.south) [below=0.6mm] {PROGRESSIVE EXPOSURE}; + +\end{tikzpicture} +\end{document} diff --git a/mkdocs.yml b/mkdocs.yml index 5fd83a0..dda4d9d 100644 --- 
a/mkdocs.yml +++ b/mkdocs.yml @@ -31,6 +31,10 @@ theme: - content.code.copy - content.code.annotate +plugins: + - search + - include-markdown + markdown_extensions: - admonition - pymdownx.details diff --git a/pyproject.toml b/pyproject.toml index bcd0c77..bde3b3c 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "dsagt" -version = "0.1.0" +dynamic = ["version"] description = "DataSmith Agent - AI-assisted data pipeline builder" readme = "README.md" requires-python = ">=3.12,<3.14" @@ -60,8 +60,7 @@ dependencies = [ dsagt = "dsagt.commands.cli:main" dsagt-run = "dsagt.commands.run_tool:main" dsagt-proxy = "dsagt.commands.proxy_server:main" -dsagt-registry-server = "dsagt.commands.registry_server:main" -dsagt-knowledge-server = "dsagt.commands.knowledge_server:main" +dsagt-server = "dsagt.mcp.server:main" dsagt-setup-kb = "dsagt.commands.setup_core_kb:main" [dependency-groups] @@ -72,6 +71,7 @@ dev = [ ] docs = [ "mkdocs-material>=9.5", + "mkdocs-include-markdown-plugin>=6.0", ] [build-system] @@ -81,6 +81,9 @@ build-backend = "setuptools.build_meta" [tool.setuptools] package-dir = {"" = "src"} +[tool.setuptools.dynamic] +version = {attr = "dsagt.__version__"} + [tool.setuptools.packages.find] where = ["src"] diff --git a/src/dsagt/__init__.py b/src/dsagt/__init__.py index 5ad0529..8337be1 100644 --- a/src/dsagt/__init__.py +++ b/src/dsagt/__init__.py @@ -4,7 +4,9 @@ AI-assisted data pipeline builder for MCP-compatible agents. """ -__version__ = "0.1.0" +# Single source of truth for the package version: pyproject.toml reads this +# via `[tool.setuptools.dynamic] version = {attr = "dsagt.__version__"}`. +__version__ = "0.2.0" # Cap CPU thread count for embedding / tokenization libraries before any # heavy imports happen. 
Without this, PyTorch / sentence-transformers / diff --git a/src/dsagt/agents/__init__.py b/src/dsagt/agents/__init__.py index 2db06de..89fa517 100644 --- a/src/dsagt/agents/__init__.py +++ b/src/dsagt/agents/__init__.py @@ -362,6 +362,9 @@ def dynamic_agent_record( Path(working_dir), Path(config["project_dir"]), ) + # Mirror installed skills into the agent's native skills dir (all agents, + # both modes). Central here so each agent only declares native_skills_dir. + actions += setup.setup_skills(Path(working_dir), config) # Launch shim is BYOA-only. Skip when proxy mode is active — # ``dsagt start --enable-proxy`` is the only sensible entry point # in that mode (proxy URL must be plumbed through agent env). diff --git a/src/dsagt/agents/base.py b/src/dsagt/agents/base.py index 2835d80..928e114 100644 --- a/src/dsagt/agents/base.py +++ b/src/dsagt/agents/base.py @@ -13,6 +13,7 @@ import json import logging import shlex +import shutil import subprocess from abc import ABC, abstractmethod from pathlib import Path @@ -31,37 +32,37 @@ # files between init and start without losing edits on the next start. _DSAGT_MARKER = "DSAgt Pipeline Builder" -# Tools each dsagt MCP server exposes — listed in ``alwaysAllow`` so roo +# Tools the dsagt MCP server exposes — listed in ``alwaysAllow`` so roo # and cline auto-approve them without a human-in-the-loop prompt. Keep in -# sync with ``commands/registry_server.py`` and -# ``commands/knowledge_server.py`` tool registrations; a tool added there -# but not here means roo/cline will hang on its first call. 
-_DSAGT_MCP_ALWAYS_ALLOW = { - "registry": [ - "get_registry", - "http_request", - "install_dependencies", - "read_file", - "reconstruct_pipeline", - "run_command", - "save_skill", - "save_tool_spec", - "search_registry", - "search_skills", - ], - "knowledge": [ - "kb_add_vector_db", - "kb_append", - "kb_dismiss_suggestion", - "kb_get_memories", - "kb_get_suggestions", - "kb_ingest", - "kb_job_status", - "kb_list_collections", - "kb_remember", - "kb_search", - ], -} +# sync with the ``mcp/*_tools.py`` tool registrations (registry / knowledge / +# memory / skill); a tool added there but not here means roo/cline will hang on +# its first call. All dsagt MCP tools live behind the single ``dsagt-server``, +# so the always-allow list is one flat union. +_DSAGT_MCP_ALWAYS_ALLOW = [ + "add_skill_source", + "get_registry", + "http_request", + "install_dependencies", + "install_skill", + "kb_add_vector_db", + "kb_append", + "kb_dismiss_suggestion", + "kb_get_memories", + "kb_get_suggestions", + "kb_ingest", + "kb_job_status", + "kb_list_collections", + "kb_remember", + "kb_search", + "list_skill_sources", + "read_file", + "reconstruct_pipeline", + "run_command", + "save_skill", + "save_tool_spec", + "search_registry", + "search_skills", +] # Sentinel API key planted in agent / MCP-child env when the optional @@ -131,13 +132,13 @@ def _openai_env(llm: dict) -> dict[str, str]: # --------------------------------------------------------------------------- -def _mcp_server_args(server: str) -> list[str]: - """Build the argv tail for ``uv run dsagt--server``. +def _mcp_server_args() -> list[str]: + """Build the argv tail for ``uv run dsagt-server``. - Both servers read all configuration from ``DSAGT_PROJECT_DIR`` (env) - and dsagt_config.yaml — no CLI args needed. + The single merged server reads all configuration from the project's + ``dsagt_config.yaml`` (located via cwd-walk) — no CLI args needed. 
""" - return ["run", f"dsagt-{server}-server"] + return ["run", "dsagt-server"] def _mcp_env_block(config: dict) -> dict[str, str]: @@ -212,26 +213,100 @@ def _append_or_write(path: Path, content: str, marker: str) -> str | None: return f"Wrote {path}" +#: Claude Code caps a skill's frontmatter description (combined with +#: when_to_use) at this many characters; longer ones are rejected. We +#: truncate the *mirrored* copy only, never the project source. +_NATIVE_DESCRIPTION_CAP = 1536 + +#: Manifest filename inside a native skills dir listing the skill names +#: dsagt placed there, so the mirror can reap its own stale entries on +#: re-run without ever touching user-authored skills. +_SKILL_MANIFEST = ".dsagt-managed.json" + + +def _truncate_native_description(skill_md: Path) -> None: + """If the mirrored SKILL.md's description exceeds the native cap, trim it.""" + import yaml + + text = skill_md.read_text() + if not text.startswith("---"): + return + parts = text.split("---", 2) + if len(parts) < 3: + return + try: + front = yaml.safe_load(parts[1]) or {} + except yaml.YAMLError: + return + desc = front.get("description") + if isinstance(desc, str) and len(desc) > _NATIVE_DESCRIPTION_CAP: + front["description"] = desc[: _NATIVE_DESCRIPTION_CAP - 1].rstrip() + "…" + new_front = yaml.dump(front, default_flow_style=False, sort_keys=False) + skill_md.write_text(f"---\n{new_front}---{parts[2]}") + + +def _mirror_skills_to(target_dir: Path, skill_dirs: list[Path]) -> list[str]: + """Idempotently mirror *skill_dirs* into *target_dir* (e.g. .claude/skills). + + Copies each skill directory (SKILL.md + scripts/ + references/) under + ``target_dir//``. A manifest tracks the names dsagt owns so a + later run reaps skills that were removed upstream **without ever + touching user-authored skills** that dsagt didn't place. ``skill_dirs`` + should list bundled dirs before project dirs so a project skill wins a + name collision (copied last). 
+ """ + actions: list[str] = [] + manifest_path = target_dir / _SKILL_MANIFEST + previously: list[str] = [] + if manifest_path.exists(): + try: + previously = json.loads(manifest_path.read_text()) + except (json.JSONDecodeError, OSError): + previously = [] + + target_dir.mkdir(parents=True, exist_ok=True) + managed: list[str] = [] + for src in skill_dirs: + if not (src / "SKILL.md").exists(): + continue + name = src.name + dest = target_dir / name + if dest.exists(): + shutil.rmtree(dest) + shutil.copytree(src, dest) + _truncate_native_description(dest / "SKILL.md") + if name not in managed: + managed.append(name) + + # Reap skills dsagt placed previously that are gone from the source set. + for stale in set(previously) - set(managed): + stale_dir = target_dir / stale + if stale_dir.is_dir(): + shutil.rmtree(stale_dir, ignore_errors=True) + + manifest_path.write_text(json.dumps(sorted(managed), indent=2) + "\n") + if managed: + actions.append(f"Mirrored {len(managed)} skill(s) into {target_dir}") + return actions + + def _build_mcp_servers_dict(env_block: dict | None) -> dict: - """Build the standard ``{"mcpServers": {...}}`` dict for dsagt servers. + """Build the standard ``{"mcpServers": {...}}`` dict for the dsagt server. Used by agents that load MCP config from a JSON file (roo via ``.roo/mcp.json``). Claude Code uses the same shape via ``.mcp.json`` but builds it inline in :class:`ClaudeSetup.write_dynamic`. Cline - doesn't use this — it requires ``cline mcp add`` to register servers. + doesn't use this — it requires ``cline mcp add`` to register the server. 
""" - mcp_config: dict = {"mcpServers": {}} - for server in ("registry", "knowledge"): - entry: dict = { - "command": "uv", - "args": _mcp_server_args(server), - "disabled": False, - "alwaysAllow": _DSAGT_MCP_ALWAYS_ALLOW[server], - } - if env_block: - entry["env"] = env_block - mcp_config["mcpServers"][f"dsagt-{server}"] = entry - return mcp_config + entry: dict = { + "command": "uv", + "args": _mcp_server_args(), + "disabled": False, + "alwaysAllow": _DSAGT_MCP_ALWAYS_ALLOW, + } + if env_block: + entry["env"] = env_block + return {"mcpServers": {"dsagt": entry}} def _toml_quote(value: str) -> str: @@ -472,6 +547,15 @@ def __init__(self, proxy_port: int | None = None): #: See agents/.py docstrings for the per-agent investigation. otel_payload_support: ClassVar[str] = "full" + #: Directory (relative to the working dir) the agent natively auto-discovers + #: ``SKILL.md`` skill folders from. ``setup_skills`` mirrors installed + #: (bundled + project) skills here so the agent discovers/auto-invokes them + #: without an MCP round-trip. Every supported agent has one — claude + #: ``.claude/skills``, codex/goose ``.agents/skills`` (the cross-agent + #: standard), cline ``.cline/skills``, roo ``.roo/skills``. ``None`` would + #: mean the agent has no native skill discovery (none currently). + native_skills_dir: ClassVar[str | None] = None + @abstractmethod def write_static(self, working_dir: Path) -> list[str]: """Write the agent's instructions file + any state directories. @@ -496,6 +580,29 @@ def write_dynamic( Returns a list of one-line action descriptions. """ + def setup_skills(self, working_dir: Path, config: dict) -> list[str]: + """Mirror installed (bundled + project) skills into the agent's native + skills dir so it auto-discovers/auto-invokes them. + + Mode-independent (runs for BYOA and proxy alike) and idempotent — the + manifest-tracked :func:`_mirror_skills_to` only reaps skills dsagt + placed, never user-authored ones. 
No-op when the agent declares no + ``native_skills_dir`` or ``skills.populate_native`` is disabled. + """ + if not self.native_skills_dir: + return [] + if not (config.get("skills") or {}).get("populate_native", True): + return [] + from dsagt.registry import SkillRegistry + + reg = SkillRegistry(runtime_dir=working_dir, kb=None) + # Bundled first, project last → project wins name collisions. + src_dirs = reg._bundled_skill_dirs() + reg._project_skill_dirs() + target = working_dir + for part in self.native_skills_dir.split("/"): + target = target / part + return _mirror_skills_to(target, src_dirs) + def runtime_env(self, config: dict) -> dict[str, str]: """Dsagt-owned env vars the agent process needs at runtime (BYOA). diff --git a/src/dsagt/agents/claude.py b/src/dsagt/agents/claude.py index f2a4094..1be23ad 100644 --- a/src/dsagt/agents/claude.py +++ b/src/dsagt/agents/claude.py @@ -82,6 +82,7 @@ class ClaudeSetup(AgentSetup): name = "claude" base_command = ["claude"] static_marker = "CLAUDE.md" + native_skills_dir = ".claude/skills" install_hint = "Install with `npm i -g @anthropic-ai/claude-code`." # Anthropic-protocol native; cross-protocol routing requires the proxy. 
credential_env_vars = ( @@ -157,17 +158,20 @@ def write_dynamic( actions: list[str] = [] env_block = _mcp_env_block(config) - mcp_config: dict = {"mcpServers": {}} - for server in ("registry", "knowledge"): - entry: dict = {"command": "uv", "args": _mcp_server_args(server)} - if env_block: - entry["env"] = env_block - mcp_config["mcpServers"][f"dsagt-{server}"] = entry + entry: dict = {"command": "uv", "args": _mcp_server_args()} + if env_block: + entry["env"] = env_block + mcp_config: dict = {"mcpServers": {"dsagt": entry}} mcp_path = working_dir / ".mcp.json" mcp_path.write_text(json.dumps(mcp_config, indent=2) + "\n") actions.append(f"Wrote {mcp_path}") + # Skills are mirrored into .claude/skills/ centrally via + # AgentSetup.setup_skills (driven by native_skills_dir) in + # dynamic_agent_record — see base.py. Picked up on the next Claude + # start, which is fine: this runs at init/start, before launch. + # Configure mlflow autolog claude — writes .claude/settings.json # with the MLflow Stop hook + tracking env vars. Idempotent and # preserves any existing keys in settings.json (mlflow's setup diff --git a/src/dsagt/agents/cline.py b/src/dsagt/agents/cline.py index e6fbac5..dd024d4 100644 --- a/src/dsagt/agents/cline.py +++ b/src/dsagt/agents/cline.py @@ -81,6 +81,9 @@ class ClineSetup(AgentSetup): name = "cline" base_command = ["cline"] static_marker = ".clinerules/dsagt_instructions.md" + # Cline skills are opt-in (Settings → Features → Enable Skills); mirroring + # is harmless if unused, and search_skills covers the disabled case. + native_skills_dir = ".cline/skills" install_hint = "Install with `npm i -g cline`." otel_payload_support = "none" # Cline's CLI nominally supports openai-native + anthropic, but cline @@ -90,19 +93,30 @@ class ClineSetup(AgentSetup): # by cline's anthropic SDK at runtime, covering api.anthropic.com and # gateway endpoints alike. Match roo's policy. 
credential_env_vars = ( - "ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL", "ANTHROPIC_MODEL", + "ANTHROPIC_API_KEY", + "ANTHROPIC_BASE_URL", + "ANTHROPIC_MODEL", ) # Cline emits no OTel — agent-side telemetry only via --proxy_traces. telemetry_env = {} credential_hints = ( - ("ANTHROPIC_API_KEY", "your provider API key (works for openai-shape " - "gateways too — cline's anthropic SDK reaches them via " - "ANTHROPIC_BASE_URL)"), - ("ANTHROPIC_BASE_URL", "gateway / proxy URL " - "(cline auth's -b flag is openai-only; the anthropic SDK reads " - "this env var at runtime)"), - ("ANTHROPIC_MODEL", "model name your gateway serves " - "(e.g. claude-haiku-4-5-20251001-v1-project)"), + ( + "ANTHROPIC_API_KEY", + "your provider API key (works for openai-shape " + "gateways too — cline's anthropic SDK reaches them via " + "ANTHROPIC_BASE_URL)", + ), + ( + "ANTHROPIC_BASE_URL", + "gateway / proxy URL " + "(cline auth's -b flag is openai-only; the anthropic SDK reads " + "this env var at runtime)", + ), + ( + "ANTHROPIC_MODEL", + "model name your gateway serves " + "(e.g. claude-haiku-4-5-20251001-v1-project)", + ), ) def write_static(self, working_dir: Path) -> list[str]: @@ -163,15 +177,22 @@ def write_dynamic( "serve both wire protocols on the same key." 
) auth_cmd = [ - "cline", "auth", - "--config", cline_dir, - "-p", "anthropic", - "-k", api_key, - "-m", model, + "cline", + "auth", + "--config", + cline_dir, + "-p", + "anthropic", + "-k", + api_key, + "-m", + model, ] result = subprocess.run( - auth_cmd, cwd=str(working_dir), - capture_output=True, text=True, + auth_cmd, + cwd=str(working_dir), + capture_output=True, + text=True, ) if result.returncode != 0: detail = (result.stderr or result.stdout).strip() @@ -190,30 +211,35 @@ def write_dynamic( existing: set[str] = set() if mcp_path.exists(): try: - existing = set(json.loads(mcp_path.read_text()) - .get("mcpServers", {}).keys()) + existing = set( + json.loads(mcp_path.read_text()).get("mcpServers", {}).keys() + ) except (json.JSONDecodeError, OSError): existing = set() - for server in ("registry", "knowledge"): - name = f"dsagt-{server}" - if name in existing: - continue + if "dsagt" not in existing: add_cmd = [ - "cline", "mcp", "add", - "--config", cline_dir, - name, + "cline", + "mcp", + "add", + "--config", + cline_dir, + "dsagt", "--", - "uv", "run", f"dsagt-{server}-server", + "uv", + "run", + "dsagt-server", ] result = subprocess.run( - add_cmd, cwd=str(working_dir), - capture_output=True, text=True, + add_cmd, + cwd=str(working_dir), + capture_output=True, + text=True, ) if result.returncode != 0: detail = (result.stderr or result.stdout).strip() raise RuntimeError( - f"cline mcp add {name} failed " + f"cline mcp add dsagt failed " f"(exit {result.returncode}): {detail}" ) @@ -244,7 +270,10 @@ def write_dynamic( # we extract them into a helper. def _patch_mcp_servers( - self, config: dict, working_dir: Path, cline_dir: str, + self, + config: dict, + working_dir: Path, + cline_dir: str, ) -> str: """Idempotent ``cline mcp add`` + JSON env-block patch. Returns the action string for the printout. 
@@ -253,27 +282,35 @@ def _patch_mcp_servers( existing: set[str] = set() if mcp_path.exists(): try: - existing = set(json.loads(mcp_path.read_text()) - .get("mcpServers", {}).keys()) + existing = set( + json.loads(mcp_path.read_text()).get("mcpServers", {}).keys() + ) except (json.JSONDecodeError, OSError): existing = set() - for server in ("registry", "knowledge"): - name = f"dsagt-{server}" - if name in existing: - continue + if "dsagt" not in existing: add_cmd = [ - "cline", "mcp", "add", "--config", cline_dir, name, - "--", "uv", "run", f"dsagt-{server}-server", + "cline", + "mcp", + "add", + "--config", + cline_dir, + "dsagt", + "--", + "uv", + "run", + "dsagt-server", ] result = subprocess.run( - add_cmd, cwd=str(working_dir), - capture_output=True, text=True, + add_cmd, + cwd=str(working_dir), + capture_output=True, + text=True, ) if result.returncode != 0: detail = (result.stderr or result.stdout).strip() raise RuntimeError( - f"cline mcp add {name} failed " + f"cline mcp add dsagt failed " f"(exit {result.returncode}): {detail}" ) @@ -308,6 +345,7 @@ def proxy_write_dynamic( """ del pdir from .base import _PROXY_FORWARDED_SENTINEL + actions: list[str] = [] cline_dir = str(working_dir / ".cline-data") Path(cline_dir).mkdir(parents=True, exist_ok=True) @@ -320,16 +358,25 @@ def proxy_write_dynamic( "config['proxy']['port'] and config['llm']['model']." 
) auth_cmd = [ - "cline", "auth", - "--config", cline_dir, - "-p", "openai", - "-k", _PROXY_FORWARDED_SENTINEL, - "-m", model, - "-b", f"http://localhost:{proxy_port}", + "cline", + "auth", + "--config", + cline_dir, + "-p", + "openai", + "-k", + _PROXY_FORWARDED_SENTINEL, + "-m", + model, + "-b", + f"http://localhost:{proxy_port}", ] result = subprocess.run( - auth_cmd, env=env, cwd=str(working_dir), - capture_output=True, text=True, + auth_cmd, + env=env, + cwd=str(working_dir), + capture_output=True, + text=True, ) if result.returncode != 0: detail = (result.stderr or result.stdout).strip() @@ -356,6 +403,7 @@ def launch_oneliner(self, project: str, project_dir: Path) -> str: """ del project import shlex + pdir = shlex.quote(str(project_dir)) cline_dir = shlex.quote(str(project_dir / ".cline-data")) return f"cd {pdir} && cline -v -y -a --config {cline_dir}" diff --git a/src/dsagt/agents/codex.py b/src/dsagt/agents/codex.py index 52ef5b3..506bc66 100644 --- a/src/dsagt/agents/codex.py +++ b/src/dsagt/agents/codex.py @@ -90,23 +90,24 @@ def _render_codex_config(mcp_env: dict) -> str: do land in ``codex.tool_result`` log events with full args + output. 
""" lines: list[str] = [] - for server in ("registry", "knowledge"): - lines.append(f"[mcp_servers.dsagt-{server}]") - lines.append('command = "uv"') - args = _mcp_server_args(server) - args_toml = ", ".join(_toml_quote(a) for a in args) - lines.append(f"args = [{args_toml}]") - if mcp_env: - lines.append(f"[mcp_servers.dsagt-{server}.env]") - for k, v in mcp_env.items(): - lines.append(f"{k} = {_toml_quote(v)}") - lines.append("") - lines.extend([ - "[otel]", - 'trace_exporter = "otlp-http"', - "log_user_prompt = true", - "", - ]) + lines.append("[mcp_servers.dsagt]") + lines.append('command = "uv"') + args = _mcp_server_args() + args_toml = ", ".join(_toml_quote(a) for a in args) + lines.append(f"args = [{args_toml}]") + if mcp_env: + lines.append("[mcp_servers.dsagt.env]") + for k, v in mcp_env.items(): + lines.append(f"{k} = {_toml_quote(v)}") + lines.append("") + lines.extend( + [ + "[otel]", + 'trace_exporter = "otlp-http"', + "log_user_prompt = true", + "", + ] + ) return "\n".join(lines) @@ -126,16 +127,18 @@ def _render_codex_config_proxy(mcp_env: dict, proxy_port: int, model: str) -> st any table headers in TOML. """ base = _render_codex_config(mcp_env) - header = "\n".join([ - f'model = "{model}"', - 'model_provider = "dsagt-proxy"', - '', - '[model_providers.dsagt-proxy]', - 'name = "DSAGT Proxy"', - f'base_url = "http://localhost:{proxy_port}/v1"', - 'wire_api = "chat"', - '', - ]) + header = "\n".join( + [ + f'model = "{model}"', + 'model_provider = "dsagt-proxy"', + "", + "[model_providers.dsagt-proxy]", + 'name = "DSAGT Proxy"', + f'base_url = "http://localhost:{proxy_port}/v1"', + 'wire_api = "chat"', + "", + ] + ) return header + base @@ -143,9 +146,11 @@ class CodexSetup(AgentSetup): name = "codex" base_command = ["codex"] static_marker = "AGENTS.md" + # Project-local .agents/skills (repo-root, codex-discovered) — never the + # global ~/.agents/skills or ~/.codex; manifest-tracked, user skills safe. 
+ native_skills_dir = ".agents/skills" install_hint = ( - "Install with `npm i -g @openai/codex` or " - "`brew install --cask codex`." + "Install with `npm i -g @openai/codex` or " "`brew install --cask codex`." ) otel_payload_support = "partial" # Codex is openai-protocol native. @@ -164,7 +169,9 @@ def write_static(self, working_dir: Path) -> list[str]: instructions = _load_master_instructions() if instructions: action = _append_or_write( - working_dir / "AGENTS.md", instructions, _DSAGT_MARKER, + working_dir / "AGENTS.md", + instructions, + _DSAGT_MARKER, ) if action: actions.append(action) @@ -198,6 +205,7 @@ def write_dynamic( """ del env, pdir import shutil + actions: list[str] = [] codex_home = Path(working_dir) / ".codex-data" codex_home.mkdir(parents=True, exist_ok=True) @@ -220,7 +228,7 @@ def write_dynamic( base_toml = user_config.read_text() if user_config.exists() else "" mcp_env = _mcp_env_block(config) body = _render_codex_config(mcp_env) - merged = (base_toml.rstrip() + "\n\n" + body if base_toml else body) + merged = base_toml.rstrip() + "\n\n" + body if base_toml else body config_path.write_text(merged + "\n") actions.append(f"Wrote {config_path} ({len(mcp_env)} MCP env vars)") return actions @@ -233,6 +241,7 @@ def launch_oneliner(self, project: str, project_dir: Path) -> str: """ del project import shlex + pdir = shlex.quote(str(project_dir)) codex_home = shlex.quote(str(project_dir / ".codex-data")) return f"cd {pdir} && CODEX_HOME={codex_home} codex" @@ -254,6 +263,7 @@ def proxy_write_dynamic( """ del env, pdir import shutil + actions: list[str] = [] codex_home = Path(working_dir) / ".codex-data" codex_home.mkdir(parents=True, exist_ok=True) @@ -282,11 +292,9 @@ def proxy_write_dynamic( base_toml = user_config.read_text() if user_config.exists() else "" mcp_env = _mcp_env_block(config) body = _render_codex_config_proxy(mcp_env, proxy_port, model) - merged = (base_toml.rstrip() + "\n\n" + body if base_toml else body) + merged = 
base_toml.rstrip() + "\n\n" + body if base_toml else body config_path.write_text(merged + "\n") - actions.append( - f"Wrote {config_path} ({len(mcp_env)} MCP env vars, proxy mode)" - ) + actions.append(f"Wrote {config_path} ({len(mcp_env)} MCP env vars, proxy mode)") return actions def runtime_env(self, config: dict) -> dict[str, str]: @@ -312,10 +320,12 @@ def run_script( if not text: return 1 cmd = [ - "codex", "exec", + "codex", + "exec", "--dangerously-bypass-approvals-and-sandbox", "--skip-git-repo-check", - "-C", str(working_dir), + "-C", + str(working_dir), text, ] return _run_simple_script(cmd, env, working_dir, self.install_hint) diff --git a/src/dsagt/agents/goose.py b/src/dsagt/agents/goose.py index 71f5422..bff8a4b 100644 --- a/src/dsagt/agents/goose.py +++ b/src/dsagt/agents/goose.py @@ -51,6 +51,7 @@ class GooseSetup(AgentSetup): name = "goose" base_command = ["goose", "session"] static_marker = ".goosehints" + native_skills_dir = ".agents/skills" # cross-agent standard goose discovers install_hint = "See https://github.com/block/goose for installation." otel_payload_support = "full" # Multi-protocol; goose's runtime reads provider-specific creds plus @@ -62,21 +63,32 @@ class GooseSetup(AgentSetup): # Without HOST set, goose ignores BASE_URL and hits the provider's # default endpoint — silently for users with a lab gateway. credential_env_vars = ( - "GOOSE_PROVIDER", "GOOSE_MODEL", - "ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL", "ANTHROPIC_HOST", + "GOOSE_PROVIDER", + "GOOSE_MODEL", + "ANTHROPIC_API_KEY", + "ANTHROPIC_BASE_URL", + "ANTHROPIC_HOST", "ANTHROPIC_MODEL", - "OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_HOST", + "OPENAI_API_KEY", + "OPENAI_BASE_URL", + "OPENAI_HOST", ) # Goose's Rust client emits OTel automatically when # OTEL_EXPORTER_OTLP_ENDPOINT is set — no per-platform flags needed. telemetry_env = {} credential_hints = ( - ("GOOSE_PROVIDER", "anthropic, openai, etc. 
(skip if global ~/.config/goose configured)"), + ( + "GOOSE_PROVIDER", + "anthropic, openai, etc. (skip if global ~/.config/goose configured)", + ), ("GOOSE_MODEL", "the model name your provider serves"), ("ANTHROPIC_API_KEY", "if GOOSE_PROVIDER=anthropic"), ("ANTHROPIC_HOST", "if GOOSE_PROVIDER=anthropic and on a gateway / proxy"), ("OPENAI_API_KEY", "if GOOSE_PROVIDER=openai"), - ("OPENAI_HOST", "if GOOSE_PROVIDER=openai and on a gateway / proxy (NOT OPENAI_BASE_URL — goose ignores that)"), + ( + "OPENAI_HOST", + "if GOOSE_PROVIDER=openai and on a gateway / proxy (NOT OPENAI_BASE_URL — goose ignores that)", + ), ) def write_static(self, working_dir: Path) -> list[str]: @@ -84,7 +96,9 @@ def write_static(self, working_dir: Path) -> list[str]: instructions = _load_master_instructions() if instructions: action = _append_or_write( - working_dir / ".goosehints", instructions, _DSAGT_MARKER, + working_dir / ".goosehints", + instructions, + _DSAGT_MARKER, ) if action: actions.append(action) @@ -103,16 +117,18 @@ def write_dynamic( del config, env, pdir actions: list[str] = [] - goose_config: dict = {"extensions": {}} - for server in ("registry", "knowledge"): - args = _mcp_server_args(server) - goose_config["extensions"][server] = { - "enabled": True, - "name": server, - "type": "stdio", - "cmd": "uv " + " ".join(args), - "timeout": 300, + args = _mcp_server_args() + goose_config: dict = { + "extensions": { + "dsagt": { + "enabled": True, + "name": "dsagt", + "type": "stdio", + "cmd": "uv " + " ".join(args), + "timeout": 300, + } } + } goose_path = working_dir / "goose.yaml" goose_path.write_text( @@ -160,8 +176,7 @@ def interactive_command(self, config: dict) -> list[str]: """ del config cmd = list(self.base_command) - for server in ("registry", "knowledge"): - cmd.extend(["--with-extension", f"uv run dsagt-{server}-server"]) + cmd.extend(["--with-extension", "uv run dsagt-server"]) return cmd def run_script( @@ -175,8 +190,13 @@ def run_script( """Single ``goose run`` 
call — goose's instructions file IS multi-turn.""" del config env["GOOSE_MODE"] = "auto" - cmd = ["goose", "run", "--instructions", str(script_path), - "--max-turns", str(max_turns)] - for server in ("registry", "knowledge"): - cmd.extend(["--with-extension", f"uv run dsagt-{server}-server"]) + cmd = [ + "goose", + "run", + "--instructions", + str(script_path), + "--max-turns", + str(max_turns), + ] + cmd.extend(["--with-extension", "uv run dsagt-server"]) return _run_simple_script(cmd, env, working_dir, self.install_hint) diff --git a/src/dsagt/agents/opencode.py b/src/dsagt/agents/opencode.py index 34b6bf2..de764e0 100644 --- a/src/dsagt/agents/opencode.py +++ b/src/dsagt/agents/opencode.py @@ -72,15 +72,14 @@ def _render_opencode_config( "$schema": "https://opencode.ai/config.json", "mcp": {}, } - for server in ("registry", "knowledge"): - entry: dict = { - "type": "local", - "command": ["uv"] + _mcp_server_args(server), - "enabled": True, - } - if mcp_env: - entry["environment"] = dict(mcp_env) - config["mcp"][f"dsagt-{server}"] = entry + entry: dict = { + "type": "local", + "command": ["uv"] + _mcp_server_args(), + "enabled": True, + } + if mcp_env: + entry["environment"] = dict(mcp_env) + config["mcp"]["dsagt"] = entry providers: dict = {} if present_creds.get("OPENAI_API_KEY"): @@ -113,7 +112,9 @@ def _render_opencode_config( def _render_opencode_config_proxy( - mcp_env: dict, proxy_port: int, opencode_model: str | None, + mcp_env: dict, + proxy_port: int, + opencode_model: str | None, ) -> str: """Phase-2 proxy-mode opencode.json body. @@ -128,31 +129,35 @@ def _render_opencode_config_proxy( don't trip ProviderModelNotFoundError. 
""" from .base import _PROXY_FORWARDED_SENTINEL + proxy_url = f"http://localhost:{proxy_port}" config: dict = { "$schema": "https://opencode.ai/config.json", "mcp": {}, } - for server in ("registry", "knowledge"): - entry: dict = { - "type": "local", - "command": ["uv"] + _mcp_server_args(server), - "enabled": True, - } - if mcp_env: - entry["environment"] = dict(mcp_env) - config["mcp"][f"dsagt-{server}"] = entry + entry: dict = { + "type": "local", + "command": ["uv"] + _mcp_server_args(), + "enabled": True, + } + if mcp_env: + entry["environment"] = dict(mcp_env) + config["mcp"]["dsagt"] = entry # Both providers point at the proxy — opencode picks via model prefix. providers: dict = { - "openai": {"options": { - "apiKey": _PROXY_FORWARDED_SENTINEL, - "baseURL": proxy_url, - }}, - "anthropic": {"options": { - "apiKey": _PROXY_FORWARDED_SENTINEL, - "baseURL": proxy_url, - }}, + "openai": { + "options": { + "apiKey": _PROXY_FORWARDED_SENTINEL, + "baseURL": proxy_url, + } + }, + "anthropic": { + "options": { + "apiKey": _PROXY_FORWARDED_SENTINEL, + "baseURL": proxy_url, + } + }, } if opencode_model and "/" in opencode_model: @@ -179,17 +184,25 @@ class OpenCodeSetup(AgentSetup): # multi-protocol story, no on-disk credential leakage. credential_env_vars = ( "OPENCODE_MODEL", - "OPENAI_API_KEY", "OPENAI_BASE_URL", - "ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL", + "OPENAI_API_KEY", + "OPENAI_BASE_URL", + "ANTHROPIC_API_KEY", + "ANTHROPIC_BASE_URL", ) telemetry_env = {} credential_hints = ( - ("OPENCODE_MODEL", "model spec '/' " - "(e.g. 'openai/claude-haiku-4-5-20251001-v1-project' for a PNNL-shape " - "openai gateway, or 'anthropic/claude-sonnet-4-5')"), + ( + "OPENCODE_MODEL", + "model spec '/' " + "(e.g. 
'openai/claude-haiku-4-5-20251001-v1-project' for a PNNL-shape " + "openai gateway, or 'anthropic/claude-sonnet-4-5')", + ), ("OPENAI_API_KEY", "if your gateway speaks openai wire protocol"), - ("OPENAI_BASE_URL", "openai gateway URL " - "(referenced by opencode.json's provider.openai.options.baseURL)"), + ( + "OPENAI_BASE_URL", + "openai gateway URL " + "(referenced by opencode.json's provider.openai.options.baseURL)", + ), ("ANTHROPIC_API_KEY", "if your gateway speaks anthropic wire protocol"), ("ANTHROPIC_BASE_URL", "anthropic gateway URL"), ) @@ -199,7 +212,9 @@ def write_static(self, working_dir: Path) -> list[str]: instructions = _load_master_instructions() if instructions: action = _append_or_write( - working_dir / "AGENTS.md", instructions, _DSAGT_MARKER, + working_dir / "AGENTS.md", + instructions, + _DSAGT_MARKER, ) if action: actions.append(action) @@ -228,18 +243,21 @@ def write_dynamic( present = { name: bool(env.get(name)) for name in ( - "OPENAI_API_KEY", "OPENAI_BASE_URL", - "ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL", + "OPENAI_API_KEY", + "OPENAI_BASE_URL", + "ANTHROPIC_API_KEY", + "ANTHROPIC_BASE_URL", ) } body = _render_opencode_config( - mcp_env, present, opencode_model=env.get("OPENCODE_MODEL"), + mcp_env, + present, + opencode_model=env.get("OPENCODE_MODEL"), ) config_path = working_dir / "opencode.json" config_path.write_text(body + "\n") n_providers = sum( - 1 for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY") - if present.get(k) + 1 for k in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY") if present.get(k) ) actions.append( f"Wrote {config_path} ({len(mcp_env)} MCP env vars, " @@ -279,9 +297,7 @@ def proxy_write_dynamic( body = _render_opencode_config_proxy(mcp_env, proxy_port, opencode_model) config_path = working_dir / "opencode.json" config_path.write_text(body + "\n") - actions.append( - f"Wrote {config_path} ({len(mcp_env)} MCP env vars, proxy mode)" - ) + actions.append(f"Wrote {config_path} ({len(mcp_env)} MCP env vars, proxy mode)") return 
actions def run_script( @@ -314,10 +330,13 @@ def run_script( "agents/opencode.py credential_hints." ) cmd = [ - "opencode", "run", - "--dir", str(working_dir), + "opencode", + "run", + "--dir", + str(working_dir), "--dangerously-skip-permissions", - "-m", model, + "-m", + model, text, ] return _run_simple_script(cmd, env, working_dir, self.install_hint) diff --git a/src/dsagt/agents/roo.py b/src/dsagt/agents/roo.py index 6cbc403..bef9816 100644 --- a/src/dsagt/agents/roo.py +++ b/src/dsagt/agents/roo.py @@ -83,6 +83,7 @@ class RooSetup(AgentSetup): name = "roo" base_command = ["roo"] static_marker = ".roomodes" + native_skills_dir = ".roo/skills" install_hint = ( "Install via " "https://github.com/RooCodeInc/Roo-Code/blob/main/apps/cli/install.sh" @@ -94,17 +95,28 @@ class RooSetup(AgentSetup): # impossible with that path. Only the anthropic provider works for # gateways — the Anthropic SDK natively reads ``ANTHROPIC_BASE_URL``. credential_env_vars = ( - "ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL", "ANTHROPIC_MODEL", + "ANTHROPIC_API_KEY", + "ANTHROPIC_BASE_URL", + "ANTHROPIC_MODEL", ) # Roo emits no OTel — agent-side telemetry only via --proxy_traces. telemetry_env = {} credential_hints = ( - ("ANTHROPIC_API_KEY", "your provider API key (works for openai-shape " - "gateways too — the Anthropic SDK reaches them via ANTHROPIC_BASE_URL)"), - ("ANTHROPIC_BASE_URL", "gateway / proxy URL " - "(roo CLI has no --base-url flag; this env var is the only way)"), - ("ANTHROPIC_MODEL", "model name your gateway serves " - "(e.g. claude-haiku-4-5-20251001-v1-project)"), + ( + "ANTHROPIC_API_KEY", + "your provider API key (works for openai-shape " + "gateways too — the Anthropic SDK reaches them via ANTHROPIC_BASE_URL)", + ), + ( + "ANTHROPIC_BASE_URL", + "gateway / proxy URL " + "(roo CLI has no --base-url flag; this env var is the only way)", + ), + ( + "ANTHROPIC_MODEL", + "model name your gateway serves " + "(e.g. 
claude-haiku-4-5-20251001-v1-project)", + ), ) def vscode_hint(self, project_dir: Path) -> list[str]: @@ -176,6 +188,7 @@ def launch_oneliner(self, project: str, project_dir: Path) -> str: """ del project import shlex + pdir = shlex.quote(str(project_dir)) return f"cd {pdir} && roo --mode dsagt" @@ -225,20 +238,26 @@ def proxy_run_script( """ del max_turns from .base import _PROXY_FORWARDED_SENTINEL + model = (config.get("llm") or {}).get("model") if not model: - raise RuntimeError( - "roo proxy_run_script requires config['llm']['model']." - ) + raise RuntimeError("roo proxy_run_script requires config['llm']['model'].") cmd = [ "roo", - "--print", "--oneshot", - "--mode", "dsagt", - "--prompt-file", str(script_path), - "--workspace", str(working_dir), + "--print", + "--oneshot", + "--mode", + "dsagt", + "--prompt-file", + str(script_path), + "--workspace", + str(working_dir), "--debug", - "--provider", "anthropic", - "--api-key", _PROXY_FORWARDED_SENTINEL, - "--model", model, + "--provider", + "anthropic", + "--api-key", + _PROXY_FORWARDED_SENTINEL, + "--model", + model, ] return _run_simple_script(cmd, env, working_dir, self.install_hint) diff --git a/src/dsagt/commands/cli.py b/src/dsagt/commands/cli.py index 40d30d4..adbe375 100644 --- a/src/dsagt/commands/cli.py +++ b/src/dsagt/commands/cli.py @@ -382,6 +382,142 @@ def _cmd_setup_kb(args): run_setup_kb(args) +def _cmd_skills(args): + """Manage external skill catalogs and project skill installs.""" + from dsagt.registry import SkillRegistry + from dsagt.session import kb_from_config, load_config + from dsagt.skills import ( + KNOWN_SOURCES, + SkillRouter, + install_into_project, + persist_source_to_config, + resolve_source, + sync_source, + ) + + def _open_kb(): + """Best-effort KB; None when no embedder is configured (keyword fallback).""" + try: + return kb_from_config(config) + except Exception: + return None + + action = getattr(args, "skills_action", None) + if not action: + print( + "Usage: dsagt skills 
...", file=sys.stderr + ) + return 1 + + config = load_config(args.project) + pdir = Path(config["project_dir"]) + + if action == "sync": + kb = kb_from_config(config) + try: + sources = ( + [args.source] + if args.source + else config.get("skills", {}).get("sources", []) + ) + if not sources: + print("No skill sources configured.") + return 0 + for src in sources: + stats = sync_source(src, kb=kb, force=args.force) + print( + f" {stats['url']}: {stats['indexed']} skill(s) indexed (slug {stats['slug']})" + ) + finally: + kb.close() + return 0 + + if action == "add": + from dsagt.skills import SKILL_SOURCES_DIR + + target = args.target + # A "/" target is a source-qualified *install*, not + # a new "owner/repo" source to clone — both have one '/', so distinguish + # by whether the prefix is an already-synced source's cache dir. + qualified_install = ( + target.count("/") == 1 + and (SKILL_SOURCES_DIR / target.split("/", 1)[0]).is_dir() + ) + is_source = not qualified_install and ( + target in KNOWN_SOURCES + or target.startswith(("http://", "https://", "git@")) + or target.count("/") == 1 + ) + if is_source: + spec = resolve_source(target) + if target in KNOWN_SOURCES: + spec.setdefault("name", target) + kb = kb_from_config(config) + try: + stats = sync_source(target, kb=kb) + finally: + kb.close() + persist_source_to_config( + pdir, {"name": spec.get("name", stats["slug"]), **spec} + ) + print(f"Added source {stats['url']}: {stats['indexed']} skill(s) indexed.") + print( + "Run 'dsagt start' to mirror an installed skill natively, or " + f"'dsagt skills add {args.project} ' to install one." + ) + else: + try: + info = install_into_project(target, pdir) + except LookupError as e: + print(f"Error: {e}", file=sys.stderr) + return 1 + print( + f"{info['action'].capitalize()} skill '{info['name']}' at {info['dest_dir']}." 
+ ) + print("It becomes natively discoverable on the next 'dsagt start'.") + return 0 + + if action == "list": + if args.catalog: + kb = _open_kb() + try: + reg = SkillRegistry(runtime_dir=pdir, kb=kb) + sources = SkillRouter(skill_registry=reg, kb=kb).list_sources() + finally: + if kb is not None: + kb.close() + synced = [s for s in sources if s["synced"]] + if synced: + print("Synced catalog sources:") + for s in synced: + print( + f" {s['name']}: {s['indexed']} skill(s) indexed ({s['url']})" + ) + else: + print("No catalog synced. Run 'dsagt skills sync'.") + else: + reg = SkillRegistry(runtime_dir=pdir, kb=None) + skills = reg.list_skills() + print(f"Installed/bundled skills ({len(skills)}):") + for s in skills: + print(f" {s.get('name')} — {(s.get('description') or '')[:80]}") + return 0 + + if action == "search": + kb = _open_kb() + try: + reg = SkillRegistry(runtime_dir=pdir, kb=kb) + router = SkillRouter(skill_registry=reg, kb=kb) + print(router.search(args.query)) + finally: + if kb is not None: + kb.close() + return 0 + + print(f"Unknown skills action: {action}", file=sys.stderr) + return 1 + + def _cmd_mlflow(args): """Run MLflow in the foreground. @@ -890,10 +1026,13 @@ def _run_one(agent: str) -> tuple[str, int, float]: def main(argv=None): + from dsagt import __version__ + parser = argparse.ArgumentParser( prog="dsagt", description="DSAgt project and session management." 
) parser.add_argument("--verbose", action="store_true") + parser.add_argument("--version", action="version", version=f"dsagt {__version__}") sub = parser.add_subparsers(dest="command") @@ -1032,6 +1171,40 @@ def main(argv=None): add_setup_kb_args(p_setup_kb) + p_skills = sub.add_parser( + "skills", help="Manage external skill catalogs and project installs" + ) + skills_sub = p_skills.add_subparsers(dest="skills_action") + sp_sync = skills_sub.add_parser( + "sync", help="Clone + index skill source(s) into the catalog" + ) + sp_sync.add_argument("project", help="Project name") + sp_sync.add_argument( + "--source", help="Known source name or GitHub URL (default: all configured)" + ) + sp_sync.add_argument( + "--force", action="store_true", help="Re-clone sources from scratch" + ) + sp_add = skills_sub.add_parser( + "add", help="Install a catalog skill, or add+sync a new source" + ) + sp_add.add_argument("project", help="Project name") + sp_add.add_argument( + "target", help="Skill name to install, or source name/URL to add" + ) + sp_list = skills_sub.add_parser( + "list", help="List installed skills (or --catalog collections)" + ) + sp_list.add_argument("project", help="Project name") + sp_list.add_argument( + "--catalog", action="store_true", help="List synced catalog collections" + ) + sp_search = skills_sub.add_parser( + "search", help="Search installed + catalog skills" + ) + sp_search.add_argument("project", help="Project name") + sp_search.add_argument("query", help="Search query") + sub.add_parser("list", help="List all registered projects and their status") p_mv = sub.add_parser("mv", help="Move a project to a new location") @@ -1077,6 +1250,7 @@ def main(argv=None): "stop": _cmd_stop, "smoke-test": _cmd_smoke_test, "setup-kb": _cmd_setup_kb, + "skills": _cmd_skills, "list": _cmd_list, "mv": _cmd_mv, "rm": _cmd_rm, diff --git a/src/dsagt/commands/info.py b/src/dsagt/commands/info.py index 63adf46..0669d0c 100644 --- a/src/dsagt/commands/info.py +++ 
b/src/dsagt/commands/info.py @@ -18,8 +18,7 @@ and the agent's own OTel SDK). Possible values: - ``claude-code`` / ``goose`` / ``cline`` / ``roo`` / ``codex`` — agent-emitted LLM-call traces (the bulk of traffic) - - ``dsagt-knowledge-server`` / ``dsagt-registry-server`` — MCP server - spans (``kb.*``, ``registry.*``) + - ``dsagt-server`` — merged MCP server spans (``kb.*``, ``registry.*``) - ``dsagt-run`` — tool-execute spans """ @@ -257,8 +256,8 @@ def _fmt_count(n: int) -> str: # trace, so the no-root-span fallback in _source_from_spans needs # this name registered too. ("proxy_pre_call", "dsagt-proxy"), - ("kb.", "dsagt-knowledge-server"), - ("registry.", "dsagt-registry-server"), + ("kb.", "dsagt-server"), + ("registry.", "dsagt-server"), ("tool.execute", "dsagt-run"), ) diff --git a/src/dsagt/commands/knowledge_server.py b/src/dsagt/commands/knowledge_server.py deleted file mode 100644 index dd970cd..0000000 --- a/src/dsagt/commands/knowledge_server.py +++ /dev/null @@ -1,925 +0,0 @@ -""" -DSAgt Knowledge Base MCP Server. - -Provides semantic search over document collections for MCP-compatible agents. - -At startup, symlinks base indexes into a session-specific runtime directory. -All modifications (ingestion, append) happen in the runtime copy. - -Long-running operations (ingest, append) run in the background and return -immediately with a job_id. Use kb_job_status to poll for completion. - -Server configuration (chunk_size, vector_db, rerank) is read from the -project's dsagt_config.yaml. Embedding credentials flow through env vars -(LLM_API_KEY, OPENAI_BASE_URL, EMBEDDING_MODEL) set by dsagt start. - -Usage: - dsagt-knowledge-server --base-index-dir ./kb_index --runtime-dir ./runtime -""" - -import asyncio -import json -import logging -import os -import time -from dataclasses import dataclass, field -from functools import partial - -# Prevent fatal OpenMP crash when multiple libraries (FAISS, PyTorch/ -# sentence-transformers) each bundle their own libomp. 
-os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE") - -import uuid -from pathlib import Path - -import mcp.server.stdio -import mcp.types as types -import yaml -from mcp.server.lowlevel import Server, NotificationOptions -from mcp.server.models import InitializationOptions - -from dsagt.knowledge import EMBEDDER_REGISTRY, VECTORINDEX_REGISTRY, CollectionRoute, KnowledgeBase -from dsagt.memory import SuggestionQueue -from dsagt.memory import ExplicitMemory -from dsagt.session import REGISTRY_DIR, _collection_exists, setup_runtime_kb # noqa: F401 - -logger = logging.getLogger(__name__) - - -# --------------------------------------------------------------------------- -# MCP server helpers -# --------------------------------------------------------------------------- - -async def _run_stdio(server: Server, name: str) -> None: - async with mcp.server.stdio.stdio_server() as (read_stream, write_stream): - await server.run( - read_stream, - write_stream, - InitializationOptions( - server_name=name, - server_version="0.1.0", - capabilities=server.get_capabilities( - notification_options=NotificationOptions(), - experimental_capabilities={}, - ), - ), - ) - - - - -# _collection_exists and setup_runtime_kb live in dsagt.session (imported above). 
- - -def _register_external_collection( - kb: KnowledgeBase, - collection_name: str, - vector_db: str, - connection_params: dict, - embedding_model: str, - description: str, -) -> None: - """Wire an already-built external vector store into the routing registry.""" - coll_dir = kb.index_dir / collection_name - coll_dir.mkdir(exist_ok=True) - - if description: - (coll_dir / "DESCRIPTION.md").write_text(description) - - if vector_db == "chroma": - index_kwargs = { - "collection_name": connection_params.get("collection", collection_name), - "persist_dir": None, - "host": connection_params.get("host", "localhost"), - "port": connection_params.get("port", 8000), - } - elif vector_db == "lancedb": - index_kwargs = { - "uri": connection_params["uri"], - "table": connection_params.get("table", collection_name), - } - elif vector_db == "qdrant": - index_kwargs = { - "url": connection_params["url"], - "collection": connection_params.get("collection", collection_name), - "api_key": connection_params.get("api_key"), - } - else: - raise ValueError( - f"Unsupported vector DB '{vector_db}'. 
" - f"Choose from: chroma, lancedb, qdrant" - ) - - route = CollectionRoute( - embedding_backend="api", - vector_db=vector_db, - embedder_kwargs={"model": embedding_model}, - index_kwargs=index_kwargs, - description=description, - ) - kb.register_route(collection_name, route) - - - - -# --------------------------------------------------------------------------- -# Background job tracker -# --------------------------------------------------------------------------- - -@dataclass -class _JobTracker: - """Tracks background ingest/append jobs and their completion state.""" - - jobs: dict[str, dict] = field(default_factory=dict) - active_collections: set[str] = field(default_factory=set) - - def start(self, coro, collection: str | None = None) -> str: - job_id = uuid.uuid4().hex[:8] - self.jobs[job_id] = { - "status": "running", - "result": None, - "error": None, - "collection": collection, - "started_at": time.monotonic(), - "message": "Starting -- embedding documents via API...", - } - if collection: - self.active_collections.add(collection) - - tracker = self # capture for the closure - - async def _run(): - try: - tracker.jobs[job_id]["message"] = "Embedding and indexing documents..." - result = await coro - tracker.jobs[job_id]["status"] = "complete" - tracker.jobs[job_id]["result"] = result - tracker.jobs[job_id]["message"] = "Done." 
- except Exception as e: - import traceback - tb = traceback.format_exc() - tracker.jobs[job_id]["status"] = "error" - tracker.jobs[job_id]["error"] = f"{type(e).__name__}: {e}" - tracker.jobs[job_id]["message"] = f"Failed: {type(e).__name__}: {e}" - tracker.jobs[job_id]["traceback"] = tb - logger.error("Job %s failed: %s\n%s", job_id, e, tb) - finally: - if collection: - tracker.active_collections.discard(collection) - - asyncio.get_event_loop().create_task(_run()) - return job_id - - -# --------------------------------------------------------------------------- -# Per-tool handlers (module-level, explicit dependencies) -# -# Each handler takes ``arguments: dict`` plus its dependencies as keyword -# args bound via functools.partial in create_knowledge_server(). Handlers -# return a result dict; the outer call_tool wrapper JSON-serializes it. -# --------------------------------------------------------------------------- - -async def _handle_kb_list_collections(arguments: dict, *, kb: KnowledgeBase) -> dict: - collections = await asyncio.to_thread(kb.list_collections) - return {"status": "ok", "collections": collections, "count": len(collections)} - - -async def _handle_kb_search( - arguments: dict, *, kb: KnowledgeBase, -) -> dict: - query = arguments["query"] - top_k = arguments.get("top_k", 5) - rerank = arguments.get("rerank") # None → kb.default_rerank - - collection_arg = arguments.get("collection") - collections_arg = arguments.get("collections") - - if not collection_arg and not collections_arg: - return {"status": "error", "error": "Provide 'collection' or 'collections'"} - - # Build ChromaDB where clause from the filter arguments. ChromaDB - # requires single-filter dicts or $and-wrapped lists; an empty dict - # would be invalid, so we only pass where when there are real filters. 
- where = { - key: arguments[key] - for key in ("category", "session_id", "source_type", "tool_name") - if arguments.get(key) is not None - } - return_code = arguments.get("return_code") - if return_code is not None: - where["return_code"] = int(return_code) - if len(where) > 1: - where = {"$and": [{k: v} for k, v in where.items()]} - - target_collections = collections_arg or [collection_arg] - all_results = [] - search_errors = [] - - for coll_name in target_collections: - try: - search_kwargs = dict(query=query, collection=coll_name, top_k=top_k, rerank=rerank) - if where: - search_kwargs["where"] = where - coll_results = await asyncio.to_thread(kb.search, **search_kwargs) - all_results.extend(coll_results) - except ValueError as e: - logger.warning("Search failed for '%s': %s", coll_name, e) - search_errors.append(str(e)) - - if search_errors and not all_results: - if len(target_collections) == 1: - return {"status": "error", "error": search_errors[0]} - return {"status": "error", "error": f"All collections failed: {'; '.join(search_errors)}"} - - score_key = "rerank_score" if rerank else "score" - all_results.sort(key=lambda r: r.get(score_key, r["score"]), reverse=True) - all_results = all_results[:top_k] - - result = { - "status": "ok", - "query": query, - "collection": collection_arg or ",".join(collections_arg), - "result_count": len(all_results), - "results": [ - { - "text": r["chunk"]["text"], - "score": r["score"], - "rerank_score": r.get("rerank_score"), - "source_file": r["chunk"]["metadata"].get("source_file", ""), - "chunk_index": r["chunk"]["metadata"].get("chunk_index", 0), - "metadata": { - k: v for k, v in r["chunk"]["metadata"].items() - if k not in ("source_file", "chunk_index", "collection", "file_type") - }, - } - for r in all_results - ], - } - if search_errors: - result["warnings"] = search_errors - return result - - -async def _handle_kb_ingest( - arguments: dict, *, kb: KnowledgeBase, job_tracker: _JobTracker, -) -> dict: - folder_path = 
Path(arguments["folder_path"]) - collection_name = arguments.get("collection_name") - file_types = arguments.get("file_types") - embedding_backend = arguments.get("embedding_backend") - embedding_model = arguments.get("embedding_model") - vector_db = arguments.get("vector_db") - - if not folder_path.exists(): - return {"status": "error", "error": f"Folder not found: {folder_path}"} - if not folder_path.is_dir(): - return {"status": "error", "error": f"Not a directory: {folder_path}"} - - target_name = collection_name or folder_path.name - warning = None - - if target_name in job_tracker.active_collections: - return { - "status": "error", - "error": ( - f"Collection '{target_name}' is already being ingested. " - f"Poll kb_job_status for progress." - ), - } - - if _collection_exists(kb.index_dir / target_name): - source_path = kb.index_dir / target_name / "source.txt" - existing_source = source_path.read_text().strip() if source_path.exists() else None - same_source = ( - existing_source is None - or Path(existing_source).resolve() == folder_path.resolve() - ) - if not same_source: - original_name = target_name - n = 1 - while ( - _collection_exists(kb.index_dir / target_name) - or target_name in job_tracker.active_collections - ): - target_name = f"{original_name}{n}" - n += 1 - warning = ( - f"Collection '{original_name}' already exists from a " - f"different folder; using '{target_name}'." 
- ) - - route = None - if embedding_backend or embedding_model or vector_db: - default = kb._default_route - inherited_model = embedding_model or default.embedder_kwargs.get("model") - route = CollectionRoute( - embedding_backend=embedding_backend or default.embedding_backend, - vector_db=vector_db or default.vector_db, - embedder_kwargs={"model": inherited_model} if inherited_model else {}, - ) - - ingest_kwargs: dict = {"collection_name": target_name} - if file_types: - ingest_kwargs["file_types"] = file_types - if route is not None: - ingest_kwargs["route"] = route - - async def _ingest_with_logging(): - import traceback as _tb - logger.info("Ingest starting: collection=%s folder=%s kwargs=%s", - target_name, folder_path, ingest_kwargs) - try: - result = await asyncio.to_thread(kb.ingest, folder_path, **ingest_kwargs) - logger.info("Ingest complete: %s", result) - return result - except Exception as _e: - logger.error("Ingest FAILED: %s\n%s", _e, _tb.format_exc()) - raise - - job_id = job_tracker.start(_ingest_with_logging(), collection=target_name) - result = { - "status": "started", - "job_id": job_id, - "collection": target_name, - "message": ( - f"Ingestion started. " - f"Poll kb_job_status(job_id='{job_id}') every 10 seconds. " - f"DO NOT call ingest again -- the job is running in the " - f"background. Large folders may take several minutes." 
- ), - } - if warning: - result["warning"] = warning - return result - - -async def _handle_kb_append( - arguments: dict, *, kb: KnowledgeBase, job_tracker: _JobTracker, -) -> dict: - collection = arguments["collection"] - paths = arguments["paths"] - if isinstance(paths, str): - paths = [paths] - file_types = arguments.get("file_types") - - if not _collection_exists(kb.index_dir / collection): - return {"status": "error", "error": f"Collection '{collection}' not found"} - - append_kwargs: dict = {} - if file_types: - append_kwargs["file_types"] = file_types - - job_id = job_tracker.start( - asyncio.to_thread(kb.append, collection, paths, **append_kwargs), - collection=collection, - ) - return { - "status": "started", - "job_id": job_id, - "collection": collection, - "message": f"Append started. Poll kb_job_status(job_id='{job_id}') for progress.", - } - - -async def _handle_kb_add_vector_db(arguments: dict, *, kb: KnowledgeBase) -> dict: - collection_name = arguments["collection_name"] - vector_db = arguments["vector_db"] - connection_params = arguments["connection_params"] - embedding_model = arguments["embedding_model"] - description = arguments.get("description", "") - - if (kb.index_dir / collection_name).exists(): - return { - "status": "error", - "error": ( - f"Collection '{collection_name}' already exists. " - "Choose a different name or delete the existing collection." - ), - } - - await asyncio.to_thread( - _register_external_collection, - kb, collection_name, vector_db, - connection_params, embedding_model, description, - ) - return { - "status": "ok", - "collection": collection_name, - "vector_db": vector_db, - "embedding_model": embedding_model, - "message": ( - f"External collection '{collection_name}' registered. " - "Use search to query it." 
- ), - } - - -async def _handle_kb_job_status(arguments: dict, *, job_tracker: _JobTracker) -> dict: - job_id = arguments["job_id"] - if job_id not in job_tracker.jobs: - return {"status": "error", "error": f"Unknown job: {job_id}"} - - job = job_tracker.jobs[job_id] - elapsed = int(time.monotonic() - job["started_at"]) - result = { - "status": job["status"], - "elapsed_seconds": elapsed, - "message": job.get("message", ""), - } - if job["status"] == "running": - result["instruction"] = ( - "Job is still running. DO NOT call ingest again. " - "Keep polling job_status every 10 seconds until " - "status is 'complete' or 'error'." - ) - if job["result"] is not None: - result["result"] = job["result"] - if job["error"] is not None: - result["error"] = job["error"] - if job.get("traceback") and job["status"] == "error": - result["traceback"] = job["traceback"] - return result - - -async def _handle_kb_remember( - arguments: dict, - *, - kb: KnowledgeBase, - memory: ExplicitMemory, - suggestions: SuggestionQueue, -) -> dict: - text = arguments["text"] - category = arguments.get("category", "") - session_id = arguments.get("session_id", "") - supersedes = arguments.get("supersedes") - promoted_from = arguments.get("promoted_from") - - store_result = await asyncio.to_thread( - memory.remember, - text=text, - category=category, - session_id=session_id, - supersedes=supersedes, - ) - - if not store_result.get("stored"): - return { - "status": "error", - "error": store_result.get("error", "Failed to store memory"), - } - - await asyncio.to_thread( - kb.add_entries, - texts=[text], - collection="session_memory", - metadatas=[{ - "source_type": "explicit_memory", - "category": category, - "session_id": session_id, - }], - ) - - if promoted_from: - suggestions.dismiss(promoted_from) - - return { - "status": "ok", - "entry_id": store_result["entry_id"], - "superseded_id": store_result.get("superseded_id"), - "promoted_from": promoted_from, - "total_memories": await 
asyncio.to_thread(memory.count), - } - - -async def _handle_kb_get_memories( - arguments: dict, *, memory: ExplicitMemory, suggestions: SuggestionQueue, -) -> dict: - entries = await asyncio.to_thread(memory.get_all) - pending = suggestions.get_all() - result = {"status": "ok", "count": len(entries), "memories": entries} - if pending: - result["suggestions"] = pending - result["suggestion_count"] = len(pending) - return result - - -async def _handle_kb_get_suggestions( - arguments: dict, *, suggestions: SuggestionQueue, -) -> dict: - pending = suggestions.get_all() - return {"status": "ok", "count": len(pending), "suggestions": pending} - - -async def _handle_kb_dismiss_suggestion( - arguments: dict, *, suggestions: SuggestionQueue, -) -> dict: - suggestion_id = arguments["suggestion_id"] - dismissed = suggestions.dismiss(suggestion_id) - if not dismissed: - return {"status": "error", "error": f"Suggestion not found: {suggestion_id}"} - return {"status": "ok", "dismissed": suggestion_id, "remaining": suggestions.count} - - -# --------------------------------------------------------------------------- -# Server factory (thin wiring — used by main() and tests) -# --------------------------------------------------------------------------- - -def create_knowledge_server( - kb: KnowledgeBase, - runtime_dir: str | Path | None = None, -): - """Create and configure the MCP knowledge server. - - This is the test-facing API: tests call it with a mock KB and get back - a server they can drive via call_tool_sync(). main() reads the project - config and constructs KB before calling this. - - The rerank default is on ``kb.default_rerank`` (set from - ``knowledge.rerank`` in dsagt_config.yaml). 
- """ - server = Server("knowledge") - - mem_dir = Path(runtime_dir) if runtime_dir else kb.index_dir.parent - memory = ExplicitMemory(runtime_dir=mem_dir) - suggestions = SuggestionQueue(mem_dir / "suggestions.json") - job_tracker = _JobTracker() - - handlers = { - "kb_list_collections": partial(_handle_kb_list_collections, kb=kb), - "kb_search": partial(_handle_kb_search, kb=kb), - "kb_ingest": partial(_handle_kb_ingest, kb=kb, job_tracker=job_tracker), - "kb_append": partial(_handle_kb_append, kb=kb, job_tracker=job_tracker), - "kb_add_vector_db": partial(_handle_kb_add_vector_db, kb=kb), - "kb_job_status": partial(_handle_kb_job_status, job_tracker=job_tracker), - "kb_remember": partial(_handle_kb_remember, kb=kb, memory=memory, suggestions=suggestions), - "kb_get_memories": partial(_handle_kb_get_memories, memory=memory, suggestions=suggestions), - "kb_get_suggestions": partial(_handle_kb_get_suggestions, suggestions=suggestions), - "kb_dismiss_suggestion": partial(_handle_kb_dismiss_suggestion, suggestions=suggestions), - } - - @server.list_tools() - async def list_tools() -> list[types.Tool]: - return [ - types.Tool( - name="kb_list_collections", - description=( - "List all available knowledge base collections with their " - "embedding model and vector DB. Use this to discover what " - "documentation is already indexed." - ), - inputSchema={"type": "object", "properties": {}}, - ), - types.Tool( - name="kb_search", - description=( - "Search knowledge base collections using semantic similarity. " - "Returns relevant chunks with source metadata. " - "Supports multi-collection search." 
- ), - inputSchema={ - "type": "object", - "properties": { - "query": { - "type": "string", - "description": "Natural language search query", - }, - "collection": { - "type": "string", - "description": "Name of a single collection to search", - }, - "collections": { - "type": "array", - "items": {"type": "string"}, - "description": "Search multiple collections and merge results (overrides 'collection')", - }, - "top_k": { - "type": "integer", - "description": "Number of results to return (default: 5)", - "default": 5, - }, - "rerank": { - "type": "boolean", - "description": "Use cross-encoder reranking (slower but more accurate). Default from config.", - "default": kb.default_rerank, - }, - "category": { - "type": "string", - "description": "Filter by category tag (ChromaDB collections only)", - }, - "session_id": { - "type": "string", - "description": "Filter by session ID (ChromaDB collections only)", - }, - "tool_name": { - "type": "string", - "description": "Filter by tool name (ChromaDB collections only)", - }, - "source_type": { - "type": "string", - "description": "Filter by source type (ChromaDB collections only)", - }, - "return_code": { - "type": "integer", - "description": "Filter by tool exit code (ChromaDB collections only)", - }, - }, - "required": ["query"], - }, - ), - types.Tool( - name="kb_ingest", - description=( - "Index a folder as a new knowledge base collection. " - "Returns immediately with a job_id. " - "IMPORTANT: poll kb_job_status every 10 seconds and wait for " - "status='complete'. DO NOT call ingest again for the same " - "folder while a job is running." 
- ), - inputSchema={ - "type": "object", - "properties": { - "folder_path": { - "type": "string", - "description": "Path to folder containing documents to index", - }, - "collection_name": { - "type": "string", - "description": "Name for the collection (default: folder name)", - }, - "file_types": { - "type": "array", - "items": {"type": "string"}, - "description": "File extensions to include, e.g. ['pdf', 'md', 'py']. Defaults to common types.", - }, - "embedding_backend": { - "type": "string", - "enum": list(EMBEDDER_REGISTRY.keys()), - "description": "Embedding backend override for this collection.", - }, - "embedding_model": { - "type": "string", - "description": "Embedding model override for this collection.", - }, - "vector_db": { - "type": "string", - "enum": list(VECTORINDEX_REGISTRY.keys()), - "description": "Vector database override for this collection.", - }, - }, - "required": ["folder_path"], - }, - ), - types.Tool( - name="kb_append", - description=( - "Add documents to an existing collection. Uses the same embedding " - "model and vector DB the collection was created with. " - "Returns immediately with a job_id -- poll kb_job_status for progress." - ), - inputSchema={ - "type": "object", - "properties": { - "collection": { - "type": "string", - "description": "Name of the existing collection to append to", - }, - "paths": { - "type": "array", - "items": {"type": "string"}, - "description": "List of file or folder paths to add", - }, - "file_types": { - "type": "array", - "items": {"type": "string"}, - "description": "File extensions to include when expanding folders.", - }, - }, - "required": ["collection", "paths"], - }, - ), - types.Tool( - name="kb_add_vector_db", - description=( - "Register an already-built external vector store as a collection. " - "Queries will be embedded via the API using the specified model." 
- ), - inputSchema={ - "type": "object", - "properties": { - "collection_name": {"type": "string", "description": "Unique name for this collection"}, - "vector_db": {"type": "string", "enum": ["chroma", "lancedb", "qdrant"], "description": "Vector store backend type"}, - "connection_params": {"type": "object", "description": "Backend-specific connection parameters."}, - "embedding_model": {"type": "string", "description": "The API model used to build this index"}, - "description": {"type": "string", "description": "Human-readable description for agent discovery"}, - }, - "required": ["collection_name", "vector_db", "connection_params", "embedding_model"], - }, - ), - types.Tool( - name="kb_job_status", - description="Check the status of a background ingest or append job.", - inputSchema={ - "type": "object", - "properties": { - "job_id": {"type": "string", "description": "Job ID returned by kb_ingest or kb_append"}, - }, - "required": ["job_id"], - }, - ), - types.Tool( - name="kb_remember", - description=( - "Store a user-confirmed fact as an explicit memory. " - "These persist across sessions. Use 'supersedes' to replace an outdated memory." - ), - inputSchema={ - "type": "object", - "properties": { - "text": {"type": "string", "description": "The fact to remember"}, - "category": {"type": "string", "description": "Classification tag"}, - "session_id": {"type": "string", "description": "Current session identifier"}, - "supersedes": {"type": "string", "description": "entry_id of an existing memory this replaces"}, - "promoted_from": {"type": "string", "description": "suggestion_id if promoted from outlier suggestion"}, - }, - "required": ["text"], - }, - ), - types.Tool( - name="kb_get_memories", - description=( - "Get all active explicit memories for this project. " - "Call at session start to load project context." 
- ), - inputSchema={"type": "object", "properties": {}}, - ), - types.Tool( - name="kb_get_suggestions", - description=( - "Get pending memory suggestions flagged by outlier detection. " - "Present to user for confirmation or dismissal." - ), - inputSchema={"type": "object", "properties": {}}, - ), - types.Tool( - name="kb_dismiss_suggestion", - description="Dismiss a pending memory suggestion.", - inputSchema={ - "type": "object", - "properties": { - "suggestion_id": {"type": "string", "description": "ID of the suggestion to dismiss"}, - }, - "required": ["suggestion_id"], - }, - ), - ] - - @server.call_tool() - async def call_tool(name: str, arguments: dict) -> list[types.TextContent]: - handler = handlers[name] - try: - result = await handler(arguments) - except ValueError as e: - result = {"status": "error", "error": str(e)} - except Exception as e: - logger.exception("Unexpected error in tool '%s'", name) - result = {"status": "error", "error": f"Unexpected error: {e}"} - return [types.TextContent(type="text", text=json.dumps(result, ensure_ascii=False))] - - return server - - -# --------------------------------------------------------------------------- -# Entry point -# --------------------------------------------------------------------------- - -def main(): - """Entry point for dsagt-knowledge-server. - - All configuration comes from the project directory: - - ``./dsagt_config.yaml`` → project path + non-secret settings - (chunk_size, vector_db, rerank) - - ``LLM_API_KEY``, ``OPENAI_BASE_URL`` env vars → embedding credentials - - No CLI arguments — the server derives everything from the YAML. By - contract the agent's launch one-liner is ``cd && ``, - so cwd is project_dir for the MCP children it spawns. - """ - from dsagt.observability import find_project_config - project_dir, _ = find_project_config() - if project_dir is None: - raise RuntimeError( - "dsagt-knowledge-server: no dsagt_config.yaml in cwd " - f"({Path.cwd()}). 
Launch the agent from the project " - "directory (`cd && `)." - ) - - log_file = project_dir / "dsagt_knowledge_server.log" - # Default INFO; users opt into DEBUG via DSAGT_LOG_LEVEL=DEBUG. See - # registry_server.py main() for rationale (httpcore/urllib3/llama_index - # at DEBUG floods agent debug output). - _level_name = os.environ.get("DSAGT_LOG_LEVEL", "INFO").upper() - _level = getattr(logging, _level_name, logging.INFO) - logging.basicConfig( - level=_level, - format="%(asctime)s %(levelname)s %(name)s: %(message)s", - handlers=[ - logging.FileHandler(log_file, mode="a"), - logging.StreamHandler(), - ], - ) - logger.info("Server starting — project_dir: %s, log: %s", project_dir, log_file) - - # Read project config. Required — this server runs inside a project - # created by dsagt init. Every section must be present with all fields - # filled. dsagt init generates complete defaults; if anything is missing, - # the config is broken and the server fails fast. - config_path = project_dir / "dsagt_config.yaml" - from dsagt.session import resolve_env_vars - config = resolve_env_vars(yaml.safe_load(config_path.read_text())) - - kb_config = config["knowledge"] - emb_config = config["embedding"] - - # Embedding backend selection. Default is "local" (sentence-transformers, - # CPU, no creds) so a fresh ``dsagt init`` works zero-config. Switching - # to "api" requires base_url + api_key — validate eagerly so a misconfig - # surfaces at MCP-server startup rather than at the first kb_search. - backend = (emb_config.get("backend") or "local").lower() - if backend not in ("local", "api"): - raise ValueError( - f"embedding.backend must be 'local' or 'api' (got {backend!r})" - ) - - # Only pass an explicit ``model`` when the user filled one in. 
Empty / - # ``${EMBEDDING_MODEL}`` placeholders mean "use the backend's default" - # — LocalEmbeddingClient has its own default; APIEmbeddingClient has - # no default and will raise downstream, which is what we want for a - # misconfigured api setup. - # - # Cross-backend leakage guard: HuggingFace identifiers ("org/repo") - # and OpenAI-style aliases ("text-embedding-3-small") share the same - # ``EMBEDDING_MODEL`` env var in most setups. When a user switches - # ``embedding.backend`` from api → local without also retargeting - # the env var, the api alias flows into LocalEmbeddingClient and - # produces a confusing 404 from HuggingFace at first embed. Drop - # the override when it's clearly mis-shaped for the active backend. - raw_model = (emb_config.get("model") or "").strip() - embedder_kwargs: dict = {} - if raw_model and not raw_model.startswith("${"): - looks_hf = "/" in raw_model - if backend == "local" and not looks_hf: - logger.warning( - "Ignoring embedding.model=%r for backend=local (does not " - "look like a HuggingFace identifier). Falling back to the " - "LocalEmbeddingClient default.", - raw_model, - ) - else: - embedder_kwargs["model"] = raw_model - if backend == "api": - base_url = emb_config.get("base_url") or "" - api_key = emb_config.get("api_key") or "" - if not base_url: - raise ValueError( - "embedding.backend='api' requires embedding.base_url in " - "dsagt_config.yaml. Either set it to your OpenAI-compatible " - "endpoint, or change backend to 'local'." - ) - if not api_key or api_key.startswith("${"): - raise ValueError( - "embedding.backend='api' requires embedding.api_key in " - "dsagt_config.yaml. Either fill it in (or export the " - "${EMBEDDING_API_KEY} env var), or change backend to 'local'." 
- ) - embedder_kwargs.update({"base_url": base_url, "api_key": api_key}) - - from dsagt.observability import init_tracing, configure_litellm_retries - init_tracing("dsagt-knowledge-server") # session_id picked up from DSAGT_SESSION_ID env - configure_litellm_retries() - - runtime_kb_dir = setup_runtime_kb(REGISTRY_DIR / "kb_index", project_dir) - - logger.info("Knowledge backend: %s", backend) - kb = KnowledgeBase( - index_dir=runtime_kb_dir, - chunk_size=kb_config["chunk_size"], - default_rerank=kb_config["rerank"], - default_embedder=backend, - default_index=kb_config["vector_db"], - embedder_kwargs=embedder_kwargs, - ) - # Background-load the embedder so the model is ready when the - # agent's first kb call lands. Without this, the first call pays - # the ~5–10s sentence-transformers import + SentenceTransformer - # construction cost, which looks like a hang to the operator. - kb.preload_default_embedder() - - server = create_knowledge_server(kb, runtime_dir=str(project_dir)) - try: - asyncio.run(_run_stdio(server, "knowledge")) - finally: - kb.close() - - -if __name__ == "__main__": - main() diff --git a/src/dsagt/commands/registry_server.py b/src/dsagt/commands/registry_server.py deleted file mode 100644 index 4202811..0000000 --- a/src/dsagt/commands/registry_server.py +++ /dev/null @@ -1,777 +0,0 @@ -""" -DSAgt Registry MCP Server. - -Provides tools for building a tool registry by reading documentation, -fetching web resources, and running commands to extract tool specifications. - -Tool specs are saved as skill markdown files in the runtime skills directory -and indexed into a ChromaDB collection for semantic search. - -Server configuration (embedding credentials) flows through env vars -(LLM_API_KEY, OPENAI_BASE_URL, EMBEDDING_MODEL) set by dsagt start. 
- -Usage: - dsagt-registry-server --runtime-dir ./my_session -""" - -import asyncio -import json -import logging -import os -import subprocess -import sys -from functools import partial -from pathlib import Path - -import httpx -import yaml - -import mcp.server.stdio -import mcp.types as types -from mcp.server.lowlevel import Server, NotificationOptions -from mcp.server.models import InitializationOptions - -from dsagt.knowledge import KnowledgeBase -from dsagt.observability import ( - obs, - registry_install_deps_span, - registry_reconstruct_pipeline_span, - registry_save_tool_span, -) -from dsagt.provenance import reconstruct_pipeline -from dsagt.registry import ( - SKILLS_COLLECTION, - TOOLS_COLLECTION, - SkillRegistry, - ToolRegistry, -) - -os.environ["PYTHONUNBUFFERED"] = "1" - -logger = logging.getLogger(__name__) - - -# --------------------------------------------------------------------------- -# MCP server helpers -# --------------------------------------------------------------------------- - -async def _run_stdio(server: Server, name: str): - async with mcp.server.stdio.stdio_server() as (read_stream, write_stream): - await server.run( - read_stream, write_stream, - InitializationOptions( - server_name=name, server_version="0.1.0", - capabilities=server.get_capabilities( - notification_options=NotificationOptions(), - experimental_capabilities={}, - ), - ), - ) - - -def _install_dependencies(packages: list[str], timeout: int = 120) -> str: - """Install packages using uv pip install. 
Returns a status string.""" - cmd = ["uv", "pip", "install", "--python", sys.executable] + packages - try: - result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout) - if result.returncode == 0: - output = result.stdout.strip() - return f"Successfully installed: {', '.join(packages)}\n{output}" - else: - return ( - f"Installation failed (exit code {result.returncode}):\n" - f"{result.stderr.strip()}" - ) - except subprocess.TimeoutExpired: - return f"Installation timed out after {timeout}s for: {', '.join(packages)}" - except FileNotFoundError: - return "Error: 'uv' command not found. Install uv: https://github.com/astral-sh/uv" - - -# --------------------------------------------------------------------------- -# Per-tool handlers (module-level, explicit dependencies) -# --------------------------------------------------------------------------- - -async def _handle_read_file(arguments: dict) -> str: - path = Path(arguments["path"]) - try: - return path.read_text() - except (FileNotFoundError, PermissionError, IsADirectoryError, - OSError, UnicodeDecodeError) as e: - return f"Error reading file: {e}" - - -async def _handle_http_request(arguments: dict) -> str: - url = arguments["url"] - method = arguments.get("method", "GET") - headers = arguments.get("headers", {}) - try: - async with httpx.AsyncClient(follow_redirects=True) as client: - response = await client.request( - method=method, url=url, headers=headers, timeout=30.0, - ) - return f"Status: {response.status_code}\n\n{response.text}" - except (httpx.HTTPError, httpx.InvalidURL) as e: - return f"Error making request: {e}" - - -async def _handle_run_command(arguments: dict) -> str: - command = arguments["command"] - args = arguments.get("args", []) - timeout = arguments.get("timeout", 10) - try: - result = subprocess.run( - [command] + args, - capture_output=True, text=True, timeout=timeout, - ) - except subprocess.TimeoutExpired: - return f"Command timed out after {timeout} seconds" - 
except FileNotFoundError: - return f"Command '{command}' not found" - - output = "" - if result.stdout: - output += f"STDOUT:\n{result.stdout}\n" - if result.stderr: - output += f"STDERR:\n{result.stderr}\n" - output += f"\nReturn code: {result.returncode}" - return output - - -async def _handle_save_tool_spec( - arguments: dict, *, registry: ToolRegistry, -) -> str: - spec = arguments["spec"] - # Some MCP clients (notably Claude Sonnet/Haiku 4.x) serialize nested - # object args as JSON strings instead of objects. Accept both shapes. - if isinstance(spec, str): - try: - spec = json.loads(spec) - except json.JSONDecodeError as e: - return f"Error: spec must be a JSON object (or string-encoded JSON object): {e}" - with registry_save_tool_span(spec.get("name")): - obs.set("language", spec.get("language")) - obs.set("n_dependencies", len(spec.get("dependencies") or [])) - obs.set("n_tags", len(spec.get("tags") or [])) - try: - action = registry.save_tool(spec) - except (KeyError, ValueError, OSError) as e: - obs.event("save_tool_failed", error=str(e)[:256]) - return f"Error saving tool spec: {e}" - - tool_count = len(registry.list_tools_raw()) - obs.set("action", action) - obs.set("registry_size", tool_count) - message = ( - f"Tool '{spec['name']}' {action} successfully. " - f"Registry now contains {tool_count} tools." - ) - deps = spec.get("dependencies", []) - if deps: - with registry_install_deps_span(deps): - dep_result = _install_dependencies(deps) - if dep_result.startswith("Successfully installed:"): - obs.set("status", "ok") - else: - obs.set("status", "failed") - obs.event("install_failed", message=dep_result[:256]) - message += f"\n\nDependency installation:\n{dep_result}" - return message - - -async def _handle_save_skill( - arguments: dict, *, skill_registry: SkillRegistry, -) -> str: - """Register a skill (workflow / agent instructions) for later reuse. 
- - Symmetric with save_tool_spec — writes SKILL.md to - ``/skills//`` and indexes it into - ``registered_skills`` so future ``search_skills`` calls find it. - """ - spec = arguments["spec"] - if isinstance(spec, str): - try: - spec = json.loads(spec) - except json.JSONDecodeError as e: - return f"Error: spec must be a JSON object (or string-encoded JSON object): {e}" - body = arguments.get("body") - reference_files = arguments.get("reference_files") - if isinstance(reference_files, str): - try: - reference_files = json.loads(reference_files) - except json.JSONDecodeError as e: - return f"Error: reference_files must be a JSON object: {e}" - try: - action = skill_registry.save_skill(spec, body=body, reference_files=reference_files) - except (KeyError, ValueError, OSError) as e: - return f"Error saving skill: {e}" - skill_count = len(skill_registry.list_skills()) - return ( - f"Skill '{spec['name']}' {action} successfully. " - f"Registry now contains {skill_count} skills." - ) - - -async def _handle_get_registry( - arguments: dict, *, registry: ToolRegistry, -) -> str: - tools = registry.list_tools_raw() - if not tools: - return "Registry is empty. No tools registered yet." - return yaml.dump({"tools": tools}, default_flow_style=False, sort_keys=False) - - -async def _handle_search_registry( - arguments: dict, *, registry: ToolRegistry, kb: KnowledgeBase | None, -) -> str: - tool_name = arguments.get("tool_name") - query = arguments.get("query", "") - tag = arguments.get("tag") - top_k = arguments.get("top_k", 10) - - if tool_name: - tool = registry.get_tool(tool_name) - if tool: - return ( - f"Found tool '{tool_name}':\n\n" - + yaml.dump(tool, default_flow_style=False, sort_keys=False) - ) - return f"No tool named '{tool_name}'." - - if kb is None: - return ( - "search_registry requires a configured knowledge base " - "(set embedding.api_key + embedding.base_url + embedding.model " - "in dsagt_config.yaml). 
Use search_registry with an exact " - "tool_name for KB-free lookups." - ) - - # Single ``tools`` collection — bundled and registered entries - # coexist, distinguished by ``metadata.source`` if needed. - results = kb.search( - query=query or "tool", - collection=TOOLS_COLLECTION, - top_k=top_k * 3 if tag else top_k, - ) - if tag and results: - results = [ - r for r in results - if tag in r.get("chunk", {}).get("metadata", {}).get("tags", "") - ][:top_k] - if not results: - return "No tools found matching the query." - - summaries = [] - for r in results: - chunk = r.get("chunk", {}) - meta = chunk.get("metadata", {}) - summaries.append( - f"- **{meta.get('tool_name', 'unknown')}** " - f"(score: {r.get('score', 0):.2f})\n" - f" {chunk.get('text', '')[:200]}" - ) - return f"Found {len(results)} tool(s):\n\n" + "\n\n".join(summaries) - - -async def _handle_search_skills( - arguments: dict, - *, - kb: KnowledgeBase | None, - skill_registry: SkillRegistry | None, -) -> str: - skill_name = arguments.get("skill_name") - query = arguments.get("query", "") - tag = arguments.get("tag") - top_k = arguments.get("top_k", 10) - - if skill_name and skill_registry: - skill = skill_registry.get_skill(skill_name) - if skill: - return ( - f"Found skill '{skill_name}':\n\n" - + yaml.dump(skill, default_flow_style=False, sort_keys=False) - ) - return f"No skill named '{skill_name}'." - - if kb is None: - return ( - "search_skills requires a configured knowledge base " - "(set embedding.api_key + embedding.base_url + embedding.model " - "in dsagt_config.yaml). Use search_skills with an exact " - "skill_name for KB-free lookups." - ) - - # Single ``skills`` collection — bundled and registered entries. 
- results = kb.search( - query=query or "skill", - collection=SKILLS_COLLECTION, - top_k=top_k * 3 if tag else top_k, - ) - if tag and results: - results = [ - r for r in results - if tag in r.get("chunk", {}).get("metadata", {}).get("tags", "") - ][:top_k] - if not results: - return "No skills found matching the query." - - summaries = [] - for r in results: - chunk = r.get("chunk", {}) - meta = chunk.get("metadata", {}) - summaries.append( - f"- **{meta.get('skill_name', 'unknown')}** " - f"(score: {r.get('score', 0):.2f})\n" - f" {chunk.get('text', '')[:200]}" - ) - return f"Found {len(results)} skill(s):\n\n" + "\n\n".join(summaries) - - -async def _handle_reconstruct_pipeline( - arguments: dict, *, runtime_dir: Path, -) -> str: - fmt = arguments.get("format", "bash") - trace_dir = runtime_dir / "trace_archive" - with registry_reconstruct_pipeline_span(fmt): - try: - script = reconstruct_pipeline(trace_dir, fmt=fmt) - except (FileNotFoundError, ValueError, OSError) as e: - obs.event("reconstruct_failed", error=str(e)[:256]) - return f"Error reconstructing pipeline: {e}" - obs.set("output_chars", len(script)) - return script - - -async def _handle_install_dependencies( - arguments: dict, *, registry: ToolRegistry, -) -> str: - tool_name = arguments.get("tool_name") - tools = registry.list_tools_raw() - if not tools: - return "Registry is empty. No tools registered yet." - - all_deps = [] - tools_with_deps = [] - for tool in tools: - if tool_name and tool.get("name") != tool_name: - continue - tool_deps = tool.get("dependencies", []) - if tool_deps: - all_deps.extend(tool_deps) - tools_with_deps.append(tool["name"]) - - if not all_deps: - scope = f"tool '{tool_name}'" if tool_name else "registry" - return f"No dependencies declared in {scope}." 
- - seen = set() - unique_deps = [d for d in all_deps if not (d in seen or seen.add(d))] - - with registry_install_deps_span(unique_deps): - obs.set("scope_tool", tool_name) - obs.set("n_tools_with_deps", len(tools_with_deps)) - result = _install_dependencies(unique_deps) - if result.startswith("Successfully installed:"): - obs.set("status", "ok") - else: - obs.set("status", "failed") - obs.event("install_failed", message=result[:256]) - return f"Installing dependencies for: {', '.join(tools_with_deps)}\n\n{result}" - - -# --------------------------------------------------------------------------- -# Server factory (thin wiring — used by main() and tests) -# --------------------------------------------------------------------------- - -def create_registry_server( - registry: ToolRegistry, - kb: KnowledgeBase | None = None, - skill_registry: SkillRegistry | None = None, -): - """Create and configure the MCP registry server. - - Test-facing API: tests call with a mock registry and get back a server - they can drive via call_tool_sync(). main() constructs the registry - and KB from config before calling this. - """ - server = Server("registry") - runtime_dir = Path(registry.runtime_dir) - - # Dispatch table — maps MCP tool names to handler functions with - # dependencies bound via functools.partial. 
- handlers = { - "read_file": _handle_read_file, - "http_request": _handle_http_request, - "run_command": _handle_run_command, - "save_tool_spec": partial(_handle_save_tool_spec, registry=registry), - "save_skill": partial(_handle_save_skill, skill_registry=skill_registry), - "get_registry": partial(_handle_get_registry, registry=registry), - "search_registry": partial(_handle_search_registry, registry=registry, kb=kb), - "search_skills": partial(_handle_search_skills, kb=kb, skill_registry=skill_registry), - "reconstruct_pipeline": partial(_handle_reconstruct_pipeline, runtime_dir=runtime_dir), - "install_dependencies": partial(_handle_install_dependencies, registry=registry), - } - - @server.list_tools() - async def list_tools() -> list[types.Tool]: - return [ - types.Tool( - name="read_file", - description="Read contents of a text file", - inputSchema={ - "type": "object", - "properties": { - "path": {"type": "string", "description": "Path to the file to read"}, - }, - "required": ["path"], - }, - ), - types.Tool( - name="http_request", - description="Make an HTTP request to fetch documentation or API specs", - inputSchema={ - "type": "object", - "properties": { - "url": {"type": "string", "description": "URL to request"}, - "method": {"type": "string", "description": "HTTP method", "default": "GET"}, - "headers": {"type": "object", "description": "Optional headers"}, - }, - "required": ["url"], - }, - ), - types.Tool( - name="run_command", - description="Execute a command to get help/usage information", - inputSchema={ - "type": "object", - "properties": { - "command": {"type": "string", "description": "Command to execute"}, - "args": { - "type": "array", - "items": {"type": "string"}, - "description": "Arguments (e.g., ['--help'])", - "default": [], - }, - "timeout": {"type": "number", "default": 10}, - }, - "required": ["command"], - }, - ), - types.Tool( - name="save_tool_spec", - description="Save a tool specification to the registry as a skill file", - 
inputSchema={ - "type": "object", - "properties": { - # ``anyOf`` accepts both a structured object and a - # JSON-encoded string. Some MCP clients (notably - # Claude Sonnet/Haiku 4.x) serialize nested object - # arguments as JSON strings instead of objects; the - # handler unwraps either shape. - "spec": { - "description": "Tool specification (object or JSON-encoded string)", - "anyOf": [ - { - "type": "object", - "properties": { - "name": {"type": "string", "description": "Unique tool identifier"}, - "description": {"type": "string", "description": "What the tool does"}, - "executable": {"type": "string", "description": "Command to execute"}, - "parameters": { - "type": "object", - "description": "Parameter definitions", - "additionalProperties": { - "type": "object", - "properties": { - "type": {"type": "string", "description": "Parameter type"}, - "required": {"type": "boolean"}, - "description": {"type": "string"}, - "default": {"description": "Default value"}, - "cli": { - "type": "string", - "description": ( - "How to render this parameter on the command line: " - "'positional[:N]' for positional args, '--name' or '-n' " - "for spaced flags, '--name=' or '-n=' for glued flags, " - "'key=' for dd-style key=value. Defaults to '--' " - "if omitted." - ), - }, - }, - "required": ["type", "description"], - }, - }, - "dependencies": { - "type": "array", - "items": {"type": "string"}, - "description": "Python packages to install", - }, - "tags": { - "type": "array", - "items": {"type": "string"}, - "description": "Tags for categorizing the tool", - }, - }, - "required": ["name", "description", "executable", "parameters"], - }, - {"type": "string"}, - ], - }, - }, - "required": ["spec"], - }, - ), - types.Tool( - name="save_skill", - description=( - "Register a skill (agent workflow / instructions) into " - "/skills//SKILL.md and index it into the " - "registered_skills KB collection. 
Symmetric with " - "save_tool_spec — use this when you've designed a " - "reusable instruction set you want future sessions to " - "discover via search_skills." - ), - inputSchema={ - "type": "object", - "properties": { - # ``anyOf`` for spec mirrors save_tool_spec — accept - # both structured object and JSON-encoded string for - # MCP clients that serialize nested args. - "spec": { - "description": "Skill spec (object or JSON-encoded string)", - "anyOf": [ - { - "type": "object", - "properties": { - "name": {"type": "string", "description": "Unique skill identifier (becomes the directory name)"}, - "description": {"type": "string", "description": "What the skill does / when to use it"}, - "tags": { - "type": "array", - "items": {"type": "string"}, - "description": "Tags for categorizing the skill", - }, - }, - "required": ["name", "description"], - }, - {"type": "string"}, - ], - }, - "body": { - "type": "string", - "description": ( - "Markdown body of the SKILL.md (workflow / " - "instructions the agent will follow). When " - "updating an existing skill, omit to preserve " - "the existing body." - ), - }, - "reference_files": { - "description": ( - "Optional additional files to write into the " - "skill directory. Object mapping relative " - "path -> file contents, or JSON-encoded string." 
- ), - "anyOf": [ - {"type": "object", "additionalProperties": {"type": "string"}}, - {"type": "string"}, - ], - }, - }, - "required": ["spec"], - }, - ), - types.Tool( - name="get_registry", - description="Get all tools from the registry", - inputSchema={"type": "object", "properties": {}}, - ), - types.Tool( - name="search_registry", - description="Search for tools by name, tag, or description via semantic search.", - inputSchema={ - "type": "object", - "properties": { - "query": {"type": "string", "description": "Search query"}, - "tag": {"type": "string", "description": "Filter by tag"}, - "tool_name": {"type": "string", "description": "Exact tool name lookup"}, - "top_k": {"type": "integer", "default": 10}, - }, - }, - ), - types.Tool( - name="search_skills", - description="Search for agent skills (workflows, templates) by name, tag, or description.", - inputSchema={ - "type": "object", - "properties": { - "query": {"type": "string", "description": "Search query"}, - "tag": {"type": "string", "description": "Filter by tag"}, - "skill_name": {"type": "string", "description": "Exact skill name lookup"}, - "top_k": {"type": "integer", "default": 10}, - }, - }, - ), - types.Tool( - name="reconstruct_pipeline", - description="Reconstruct a reproducible pipeline script from tool execution records.", - inputSchema={ - "type": "object", - "properties": { - "format": { - "type": "string", - "enum": ["bash", "snakemake"], - "default": "bash", - }, - }, - }, - ), - types.Tool( - name="install_dependencies", - description="Install Python dependencies for one or all tools in the registry.", - inputSchema={ - "type": "object", - "properties": { - "tool_name": {"type": "string", "description": "Install deps for a specific tool (omit for all)"}, - }, - }, - ), - ] - - @server.call_tool() - async def call_tool(name: str, arguments: dict) -> list[types.TextContent]: - handler = handlers[name] # KeyError = bug in list_tools schema - text = await handler(arguments) - return 
[types.TextContent(type="text", text=text)] - - return server - - -# --------------------------------------------------------------------------- -# Entry point -# --------------------------------------------------------------------------- - -def main(): - """Entry point for dsagt-registry-server. - - All configuration comes from the project directory: - - ``./dsagt_config.yaml`` → project path + name - - ``LLM_API_KEY``, ``OPENAI_BASE_URL`` env vars → embedding credentials - - No CLI arguments. By contract the agent's launch one-liner is - ``cd && ``, so cwd is project_dir for the MCP - children it spawns. - """ - import logging as _logging - from dsagt.observability import find_project_config - - project_dir, _cfg = find_project_config() - if project_dir is None: - raise RuntimeError( - "dsagt-registry-server: no dsagt_config.yaml in cwd " - f"({Path.cwd()}). Launch the agent from the project " - "directory (`cd && `)." - ) - - log_file = project_dir / "dsagt_registry_server.log" - # Default INFO; users opt into DEBUG via DSAGT_LOG_LEVEL=DEBUG. At DEBUG, - # transitive libraries (httpcore, urllib3, llama_index, chromadb) flood - # stderr with one line per network operation — when an agent like roo - # pipes the MCP server's stderr into its own debug stream, the human - # output gets buried under thousands of low-value lines. 
- _level_name = os.environ.get("DSAGT_LOG_LEVEL", "INFO").upper() - _level = getattr(_logging, _level_name, _logging.INFO) - _logging.basicConfig( - level=_level, - format="%(asctime)s %(levelname)s %(name)s: %(message)s", - handlers=[ - _logging.FileHandler(log_file, mode="a"), - _logging.StreamHandler(), - ], - ) - log = _logging.getLogger(__name__) - log.info("Server starting — log file: %s", log_file) - - config_path = project_dir / "dsagt_config.yaml" - from dsagt.session import resolve_env_vars - config = resolve_env_vars(yaml.safe_load(config_path.read_text())) - - emb_config = config["embedding"] - - from dsagt.observability import init_tracing, configure_litellm_retries - init_tracing("dsagt-registry-server") # session_id picked up from DSAGT_SESSION_ID env - configure_litellm_retries() - - # The KB is optional for the registry server — most tools (save_tool_spec, - # get_registry, read_file, run_command, etc.) work without it. Only - # search_registry and search_skills need the KB for semantic search; - # they return a clear error if the KB is None. - backend = (emb_config.get("backend") or "local").lower() - # Cross-backend leakage guard: see the matching block in - # knowledge_server.py for the rationale. In short: when - # ``embedding.backend`` is ``local`` but the resolved model name is - # an OpenAI-style alias (no slash), drop the override so we fall - # back to the LocalEmbeddingClient default rather than 404 from HF. - raw_model = (emb_config.get("model") or "").strip() - embedder_kwargs: dict = {} - if raw_model and not raw_model.startswith("${"): - looks_hf = "/" in raw_model - if backend == "local" and not looks_hf: - log.warning( - "Ignoring embedding.model=%r for backend=local (does not " - "look like a HuggingFace identifier). 
Falling back to the " - "LocalEmbeddingClient default.", - raw_model, - ) - else: - embedder_kwargs["model"] = raw_model - if backend == "local": - kb_available = True - else: # backend == "api" - api_key = emb_config.get("api_key") or "" - kb_available = ( - api_key and not api_key.startswith("${") - and emb_config.get("base_url") - ) - embedder_kwargs.update({ - "base_url": emb_config.get("base_url") or "", - "api_key": api_key, - }) - - kb = None - if kb_available: - kb = KnowledgeBase( - index_dir=project_dir / "kb_index", - default_embedder=backend, - default_index=config["knowledge"]["vector_db"], - embedder_kwargs=embedder_kwargs, - ) - # Background-load the embedder so the model is ready when the - # agent's first search_registry / save_tool_spec call lands. - # See knowledge_server for the same rationale. - kb.preload_default_embedder() - - registry = ToolRegistry( - source_tools_dir=None, - runtime_dir=str(project_dir), - kb=kb, - ) - - skill_reg = SkillRegistry( - source_skills_dir=None, - runtime_dir=str(project_dir), - kb=kb, - ) - - # Bundled tools/skills are pre-embedded in the shared - # ~/dsagt-projects/kb_index/ by ``dsagt setup-kb`` (or by the auto-bootstrap - # in ``dsagt start``) and COPIED into the project's kb_index by - # ``setup_runtime_kb`` before either MCP server spawns. This server - # does no embedding work for bundled content at startup; agent's - # save_tool_spec / save_skill incur a single embed at save time. 
- - server = create_registry_server(registry, kb, skill_reg) - asyncio.run(_run_stdio(server, "registry")) - - -if __name__ == "__main__": - main() diff --git a/src/dsagt/commands/setup_core_kb.py b/src/dsagt/commands/setup_core_kb.py index 9cf977c..f1f2bb4 100644 --- a/src/dsagt/commands/setup_core_kb.py +++ b/src/dsagt/commands/setup_core_kb.py @@ -22,7 +22,6 @@ import os import shutil import subprocess -import sys import tarfile import tempfile from pathlib import Path @@ -135,7 +134,9 @@ } -def clone_github(url: str, dest: Path, branch: str = "main", include: list[str] | None = None): +def clone_github( + url: str, dest: Path, branch: str = "main", include: list[str] | None = None +): """Clone a GitHub repo, optionally keeping only specific directories. When *include* is set, the named subdirectories are copied AND any @@ -175,7 +176,7 @@ def clone_github(url: str, dest: Path, branch: str = "main", include: list[str] def download_arxiv(paper_id: str, dest: Path): """Download arXiv paper (source if available, else PDF).""" client = httpx.Client(timeout=60.0, follow_redirects=True) - + try: # Try source tarball first response = client.get(f"https://arxiv.org/e-print/{paper_id}") @@ -190,7 +191,7 @@ def download_arxiv(paper_id: str, dest: Path): return except tarfile.ReadError: tar_path.unlink() - + # Fall back to PDF response = client.get(f"https://arxiv.org/pdf/{paper_id}.pdf") response.raise_for_status() @@ -248,6 +249,7 @@ def setup_collection( owned_kb = kb is None if owned_kb: from dsagt.knowledge import KnowledgeBase + kb = KnowledgeBase( index_dir=index_dir, default_embedder=embedding_backend, @@ -257,7 +259,8 @@ def setup_collection( try: result = kb.ingest( download_dir, - exclude_patterns=config.get("exclude_patterns") or DEFAULT_EXCLUDE_PATTERNS, + exclude_patterns=config.get("exclude_patterns") + or DEFAULT_EXCLUDE_PATTERNS, ) skipped = result.get("skipped_files", 0) miss_msg = f", {skipped} file misses" if skipped else "" @@ -275,6 +278,7 @@ def 
_current_dsagt_version() -> str: """Return the installed dsagt package version, or ``"unknown"`` if absent.""" try: from importlib.metadata import version + return version("dsagt") except Exception: return "unknown" @@ -310,15 +314,18 @@ def add_setup_kb_args(parser): ), ) parser.add_argument( - "--embedding-model", default=None, + "--embedding-model", + default=None, help="Embedding model name (falls back to EMBEDDING_MODEL env var)", ) parser.add_argument( - "--embedding-base-url", default=None, + "--embedding-base-url", + default=None, help="Embedding API base URL (falls back to OPENAI_BASE_URL env var)", ) parser.add_argument( - "--embedding-api-key", default=None, + "--embedding-api-key", + default=None, help="Embedding API key (falls back to LLM_API_KEY / OPENAI_API_KEY env var)", ) parser.add_argument( @@ -333,6 +340,12 @@ def add_setup_kb_args(parser): help="Re-ingest collections that already exist in the index directory " "(default: skip existing).", ) + parser.add_argument( + "--no-skill-catalog", + action="store_true", + help="Skip cloning + indexing the default external skill catalog " + "(the K-Dense scientific skills repo).", + ) def run_setup_kb(args): @@ -352,6 +365,7 @@ def run_setup_kb(args): # ``force``, the second basicConfig is a no-op because the root # logger already has handlers, and the INFO-level chatter survives. import logging as _logging + _logging.basicConfig( level=_logging.WARNING, format="%(levelname)s: %(message)s", @@ -368,10 +382,18 @@ def run_setup_kb(args): # clear error up front rather than 5 minutes into the first ingest. 
embedder_kwargs: dict = {} if args.embedding_backend == "api": - api_key = args.embedding_api_key or os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY") + api_key = ( + args.embedding_api_key + or os.getenv("LLM_API_KEY") + or os.getenv("OPENAI_API_KEY") + ) base_url = args.embedding_base_url or os.getenv("OPENAI_BASE_URL") model = args.embedding_model or os.getenv("EMBEDDING_MODEL") - missing = [n for n, v in [("api key", api_key), ("base URL", base_url), ("model", model)] if not v] + missing = [ + n + for n, v in [("api key", api_key), ("base URL", base_url), ("model", model)] + if not v + ] if missing: raise ValueError( "API embedding backend requires " @@ -388,6 +410,7 @@ def run_setup_kb(args): # to, so we skip init_tracing entirely. @traced decorators inside # KnowledgeBase see no backend and short-circuit cleanly. from dsagt.observability import configure_litellm_retries + configure_litellm_retries() # One KnowledgeBase per setup-kb invocation. The embedder cache @@ -399,9 +422,11 @@ def run_setup_kb(args): args.index_dir.mkdir(parents=True, exist_ok=True) from dsagt.knowledge import KnowledgeBase from dsagt.registry import ( - SKILLS_COLLECTION, TOOLS_COLLECTION, - ToolRegistry, SkillRegistry, _parse_frontmatter, + TOOLS_COLLECTION, + ToolRegistry, + _parse_frontmatter, ) + shared_kb = KnowledgeBase( index_dir=args.index_dir, default_embedder=args.embedding_backend, @@ -409,69 +434,85 @@ def run_setup_kb(args): embedder_kwargs=embedder_kwargs or {}, ) try: - # Bundled tools + skills: each spec file is a single chunk with - # rich metadata. Wipe-and-rebuild every run — there's no - # version sentinel, so the user controls when this happens. + # Bundled tools: each spec file is a single chunk with rich + # metadata. Wipe-and-rebuild every run — there's no version + # sentinel, so the user controls when this happens. 
Bundled + # *skills* are no longer indexed: every supported agent natively + # auto-discovers installed/bundled SKILL.md folders, so skill + # search covers only the external *catalog* tier below. current_version = _current_dsagt_version() tool_paths = [ - p for p in sorted(ToolRegistry._PACKAGE_TOOLS_DIR.glob("*.md")) + p + for p in sorted(ToolRegistry._PACKAGE_TOOLS_DIR.glob("*.md")) if _parse_frontmatter(p).get("name") ] - skill_dirs = [ - d for d in sorted(SkillRegistry._PACKAGE_SKILLS_DIR.iterdir()) - if d.is_dir() - and (d / "SKILL.md").exists() - and _parse_frontmatter(d / "SKILL.md").get("name") - ] - for name in (TOOLS_COLLECTION, SKILLS_COLLECTION): - coll_dir = args.index_dir / name - if coll_dir.exists(): - shutil.rmtree(coll_dir) + coll_dir = args.index_dir / TOOLS_COLLECTION + if coll_dir.exists(): + shutil.rmtree(coll_dir) if tool_paths: tool_specs = [_parse_frontmatter(p) for p in tool_paths] shared_kb.add_entries( texts=[p.read_text() for p in tool_paths], collection=TOOLS_COLLECTION, - metadatas=[{ - "tool_name": s["name"], - "tags": ",".join(s.get("tags", [])), - "executable": s.get("executable", ""), - "has_dependencies": str(bool(s.get("dependencies"))), - "source": "bundled", - "dsagt_version": current_version, - } for s in tool_specs], + metadatas=[ + { + "tool_name": s["name"], + "tags": ",".join(s.get("tags", [])), + "executable": s.get("executable", ""), + "has_dependencies": str(bool(s.get("dependencies"))), + "source": "bundled", + "dsagt_version": current_version, + } + for s in tool_specs + ], ) - if skill_dirs: - skill_specs = [_parse_frontmatter(d / "SKILL.md") for d in skill_dirs] - shared_kb.add_entries( - texts=[(d / "SKILL.md").read_text() for d in skill_dirs], - collection=SKILLS_COLLECTION, - metadatas=[{ - "skill_name": s["name"], - "tags": ",".join(s.get("tags", [])), - "source": "bundled", - "dsagt_version": current_version, - } for s in skill_specs], - ) - - print(" bundled tools + skills: indexed", flush=True) - - 
collections = {args.collection: COLLECTIONS[args.collection]} if args.collection else COLLECTIONS + print(" bundled tools: indexed", flush=True) + + # External skill catalog: clone + index the default source(s) so + # ``search_skills`` can browse installable skills out of the box. + # Best-effort — a clone failure (offline, repo moved) warns and + # continues rather than aborting the whole KB build. + if not getattr(args, "no_skill_catalog", False): + from dsagt.skills import sync_source + from dsagt.session import DEFAULTS + + for src in DEFAULTS["skills"]["sources"]: + try: + stats = sync_source(src, kb=shared_kb, force=args.rebuild) + print( + f" skill catalog {stats['slug']}: {stats['indexed']} indexed", + flush=True, + ) + except Exception as e: # noqa: BLE001 — best-effort, keep going + print( + f" skill catalog {src.get('url', src)}: skipped ({e})", + flush=True, + ) + + collections = ( + {args.collection: COLLECTIONS[args.collection]} + if args.collection + else COLLECTIONS + ) for name, config in collections.items(): target_dir = args.index_dir / name if _collection_exists(target_dir): if not args.rebuild: - print(f" {name}: already indexed (use --rebuild to force)", - flush=True) + print( + f" {name}: already indexed (use --rebuild to force)", + flush=True, + ) continue shutil.rmtree(target_dir) setup_collection( - name, config, args.index_dir, + name, + config, + args.index_dir, embedder_kwargs=embedder_kwargs, embedding_backend=args.embedding_backend, vector_db=args.vector_db, diff --git a/src/dsagt/dsagt_instructions.md b/src/dsagt/dsagt_instructions.md index d2bcfd6..cebfcdd 100644 --- a/src/dsagt/dsagt_instructions.md +++ b/src/dsagt/dsagt_instructions.md @@ -25,6 +25,17 @@ Before implementing anything, search for existing capabilities: - `search_skills(query)` — find agent skills (workflows, templates, procedures) - `get_registry()` — list all registered tools +**Skills come in two tiers.** *Installed* skills (in this project) are discovered 
**natively** by your platform — their names/descriptions are already in your context and you auto-invoke them; you do NOT need `search_skills` to find those. Separately there is a much larger *external catalog* of installable skills (entries marked `[catalog]`), NOT loaded into context. + +**The external catalog is opt-in and starts empty — sources must be synced before `search_skills` can see them.** A blank/weak `search_skills` result usually means the relevant source isn't synced yet, NOT that no such skill exists. So before concluding the catalog has nothing, call `list_skill_sources()` — it reports each known source with its `synced` flag and `indexed` count. The flow: + +1. `list_skill_sources()` — see which sources are already synced vs only `available` (known name + URL, not yet indexed). For materials/chem/bio/DFT skills, the `scientific` source (K-Dense) is the one to enable. +2. `add_skill_source(source=...)` — sync a source (a known name like `scientific`/`anthropic`, or a GitHub URL). Read-only indexing step; nothing is installed into the project. Only needed for sources whose `synced` is false. +3. `search_skills(query)` — now browse the synced catalog. Entries marked `[catalog]` are installable. +4. `install_skill(skill_name=...)` — copy a catalog skill into the project. Its SKILL.md + scripts land on disk immediately, so you can **use it this session** by reading `skills//SKILL.md` and following it. A restart (next `dsagt start`) is only needed for hands-free *native* auto-invocation, not for use. + +To author a brand-new skill instead of installing one, use the bundled `skill-creator` skill. + **When the user indicates they want a specific tool used** — phrasings like "use tool `foo`", "use `foo` from the registry", "run `foo`", or similar — look it up first (`search_registry(tool_name=...)` for exact match, `get_registry()` to browse). Read the returned spec's `executable` field and each parameter's `cli` field, then invoke via your shell. 
Do not substitute your own file/shell tools for a task a registered tool can do. (See section 1b for the verbatim-`executable` rule.) **Rendering parameters**: each parameter's `cli` field pins exactly how its value goes on the command line. Emit positional args first (in position order), then named args. Skip optional parameters whose value is absent; use the `default` when present. diff --git a/src/dsagt/mcp/__init__.py b/src/dsagt/mcp/__init__.py new file mode 100644 index 0000000..8fcb345 --- /dev/null +++ b/src/dsagt/mcp/__init__.py @@ -0,0 +1,12 @@ +"""DSAGT MCP server package — the single merged ``dsagt-server``. + +The MCP tool surface is split by concern across sibling modules: + +* :mod:`dsagt.mcp.registry_tools` — tool registry + execution + provenance +* :mod:`dsagt.mcp.knowledge_tools` — knowledge-base retrieval +* :mod:`dsagt.mcp.memory_tools` — explicit memory + suggestions +* :mod:`dsagt.mcp.skill_tools` — skill search / install / sources + +:mod:`dsagt.mcp.server` composes all four under one ``Server("dsagt")`` and +owns the ``dsagt-server`` entry point + shared-KB startup. +""" diff --git a/src/dsagt/mcp/knowledge_tools.py b/src/dsagt/mcp/knowledge_tools.py new file mode 100644 index 0000000..2ab2adc --- /dev/null +++ b/src/dsagt/mcp/knowledge_tools.py @@ -0,0 +1,672 @@ +"""MCP tools for knowledge-base retrieval. + +Semantic search over document collections, background ingest/append jobs, and +registration of external vector stores. Long-running operations (ingest, +append) run in the background and return immediately with a ``job_id``; poll +``kb_job_status`` for completion. + +Server configuration (chunk_size, vector_db, rerank) is read from the project's +dsagt_config.yaml. Embedding credentials flow through env vars (LLM_API_KEY, +OPENAI_BASE_URL, EMBEDDING_MODEL) set by ``dsagt start``. 
+ +These definitions + handlers run inside the merged ``dsagt-server`` (see +:mod:`dsagt.mcp.server`); ``create_knowledge_server`` is retained only as a +test-facing constructor. Explicit-memory tools (``kb_remember`` / etc.) live in +:mod:`dsagt.mcp.memory_tools`; skill-source tools in +:mod:`dsagt.mcp.skill_tools`. +""" + +import os + +# Prevent fatal OpenMP crash when multiple libraries (FAISS, PyTorch/ +# sentence-transformers) each bundle their own libomp. Must precede the +# ``dsagt.knowledge`` import below. +os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE") + +import asyncio # noqa: E402 +import logging # noqa: E402 +import time # noqa: E402 +import uuid # noqa: E402 +from dataclasses import dataclass, field # noqa: E402 +from functools import partial # noqa: E402 +from pathlib import Path # noqa: E402 + +import mcp.types as types # noqa: E402 + +from dsagt.knowledge import ( # noqa: E402 + EMBEDDER_REGISTRY, + VECTORINDEX_REGISTRY, + CollectionRoute, + KnowledgeBase, +) +from dsagt.mcp.server import build_dispatch_server # noqa: E402 +from dsagt.session import _collection_exists # noqa: E402 +from dsagt.session import setup_runtime_kb # noqa: E402, F401 (re-exported for tests) + +logger = logging.getLogger(__name__) + + +def _register_external_collection( + kb: KnowledgeBase, + collection_name: str, + vector_db: str, + connection_params: dict, + embedding_model: str, + description: str, +) -> None: + """Wire an already-built external vector store into the routing registry.""" + coll_dir = kb.index_dir / collection_name + coll_dir.mkdir(exist_ok=True) + + if description: + (coll_dir / "DESCRIPTION.md").write_text(description) + + if vector_db == "chroma": + index_kwargs = { + "collection_name": connection_params.get("collection", collection_name), + "persist_dir": None, + "host": connection_params.get("host", "localhost"), + "port": connection_params.get("port", 8000), + } + elif vector_db == "lancedb": + index_kwargs = { + "uri": 
connection_params["uri"], + "table": connection_params.get("table", collection_name), + } + elif vector_db == "qdrant": + index_kwargs = { + "url": connection_params["url"], + "collection": connection_params.get("collection", collection_name), + "api_key": connection_params.get("api_key"), + } + else: + raise ValueError( + f"Unsupported vector DB '{vector_db}'. " + f"Choose from: chroma, lancedb, qdrant" + ) + + route = CollectionRoute( + embedding_backend="api", + vector_db=vector_db, + embedder_kwargs={"model": embedding_model}, + index_kwargs=index_kwargs, + description=description, + ) + kb.register_route(collection_name, route) + + +# --------------------------------------------------------------------------- +# Background job tracker +# --------------------------------------------------------------------------- + + +@dataclass +class _JobTracker: + """Tracks background ingest/append jobs and their completion state.""" + + jobs: dict[str, dict] = field(default_factory=dict) + active_collections: set[str] = field(default_factory=set) + + def start(self, coro, collection: str | None = None) -> str: + job_id = uuid.uuid4().hex[:8] + self.jobs[job_id] = { + "status": "running", + "result": None, + "error": None, + "collection": collection, + "started_at": time.monotonic(), + "message": "Starting -- embedding documents via API...", + } + if collection: + self.active_collections.add(collection) + + tracker = self # capture for the closure + + async def _run(): + try: + tracker.jobs[job_id]["message"] = "Embedding and indexing documents..." + result = await coro + tracker.jobs[job_id]["status"] = "complete" + tracker.jobs[job_id]["result"] = result + tracker.jobs[job_id]["message"] = "Done." 
+ except Exception as e: + import traceback + + tb = traceback.format_exc() + tracker.jobs[job_id]["status"] = "error" + tracker.jobs[job_id]["error"] = f"{type(e).__name__}: {e}" + tracker.jobs[job_id]["message"] = f"Failed: {type(e).__name__}: {e}" + tracker.jobs[job_id]["traceback"] = tb + logger.error("Job %s failed: %s\n%s", job_id, e, tb) + finally: + if collection: + tracker.active_collections.discard(collection) + + asyncio.get_event_loop().create_task(_run()) + return job_id + + +# --------------------------------------------------------------------------- +# Per-tool handlers (module-level, explicit dependencies) +# +# Each handler takes ``arguments: dict`` plus its dependencies as keyword +# args bound via functools.partial. Handlers return a result dict; the outer +# dispatch wrapper JSON-serializes it. +# --------------------------------------------------------------------------- + + +async def _handle_kb_list_collections(arguments: dict, *, kb: KnowledgeBase) -> dict: + collections = await asyncio.to_thread(kb.list_collections) + return {"status": "ok", "collections": collections, "count": len(collections)} + + +async def _handle_kb_search( + arguments: dict, + *, + kb: KnowledgeBase, +) -> dict: + query = arguments["query"] + top_k = arguments.get("top_k", 5) + rerank = arguments.get("rerank") # None → kb.default_rerank + + collection_arg = arguments.get("collection") + collections_arg = arguments.get("collections") + + if not collection_arg and not collections_arg: + return {"status": "error", "error": "Provide 'collection' or 'collections'"} + + # Build ChromaDB where clause from the filter arguments. ChromaDB + # requires single-filter dicts or $and-wrapped lists; an empty dict + # would be invalid, so we only pass where when there are real filters. 
+ where = { + key: arguments[key] + for key in ("category", "session_id", "source_type", "tool_name") + if arguments.get(key) is not None + } + return_code = arguments.get("return_code") + if return_code is not None: + where["return_code"] = int(return_code) + if len(where) > 1: + where = {"$and": [{k: v} for k, v in where.items()]} + + target_collections = collections_arg or [collection_arg] + all_results = [] + search_errors = [] + + for coll_name in target_collections: + try: + search_kwargs = dict( + query=query, collection=coll_name, top_k=top_k, rerank=rerank + ) + if where: + search_kwargs["where"] = where + coll_results = await asyncio.to_thread(kb.search, **search_kwargs) + all_results.extend(coll_results) + except ValueError as e: + logger.warning("Search failed for '%s': %s", coll_name, e) + search_errors.append(str(e)) + + if search_errors and not all_results: + if len(target_collections) == 1: + return {"status": "error", "error": search_errors[0]} + return { + "status": "error", + "error": f"All collections failed: {'; '.join(search_errors)}", + } + + score_key = "rerank_score" if (kb.default_rerank if rerank is None else rerank) else "score" # honor the config default when rerank is omitted + all_results.sort(key=lambda r: r.get(score_key, r["score"]), reverse=True) + all_results = all_results[:top_k] + + result = { + "status": "ok", + "query": query, + "collection": collection_arg or ",".join(collections_arg), + "result_count": len(all_results), + "results": [ + { + "text": r["chunk"]["text"], + "score": r["score"], + "rerank_score": r.get("rerank_score"), + "source_file": r["chunk"]["metadata"].get("source_file", ""), + "chunk_index": r["chunk"]["metadata"].get("chunk_index", 0), + "metadata": { + k: v + for k, v in r["chunk"]["metadata"].items() + if k + not in ("source_file", "chunk_index", "collection", "file_type") + }, + } + for r in all_results + ], + } + if search_errors: + result["warnings"] = search_errors + return result + + +async def _handle_kb_ingest( + arguments: dict, + *, + kb: KnowledgeBase, + job_tracker: _JobTracker, +)
-> dict: + folder_path = Path(arguments["folder_path"]) + collection_name = arguments.get("collection_name") + file_types = arguments.get("file_types") + embedding_backend = arguments.get("embedding_backend") + embedding_model = arguments.get("embedding_model") + vector_db = arguments.get("vector_db") + + if not folder_path.exists(): + return {"status": "error", "error": f"Folder not found: {folder_path}"} + if not folder_path.is_dir(): + return {"status": "error", "error": f"Not a directory: {folder_path}"} + + target_name = collection_name or folder_path.name + warning = None + + if target_name in job_tracker.active_collections: + return { + "status": "error", + "error": ( + f"Collection '{target_name}' is already being ingested. " + f"Poll kb_job_status for progress." + ), + } + + if _collection_exists(kb.index_dir / target_name): + source_path = kb.index_dir / target_name / "source.txt" + existing_source = ( + source_path.read_text().strip() if source_path.exists() else None + ) + same_source = ( + existing_source is None + or Path(existing_source).resolve() == folder_path.resolve() + ) + if not same_source: + original_name = target_name + n = 1 + while ( + _collection_exists(kb.index_dir / target_name) + or target_name in job_tracker.active_collections + ): + target_name = f"{original_name}{n}" + n += 1 + warning = ( + f"Collection '{original_name}' already exists from a " + f"different folder; using '{target_name}'." 
+ ) + + route = None + if embedding_backend or embedding_model or vector_db: + default = kb._default_route + inherited_model = embedding_model or default.embedder_kwargs.get("model") + route = CollectionRoute( + embedding_backend=embedding_backend or default.embedding_backend, + vector_db=vector_db or default.vector_db, + embedder_kwargs={"model": inherited_model} if inherited_model else {}, + ) + + ingest_kwargs: dict = {"collection_name": target_name} + if file_types: + ingest_kwargs["file_types"] = file_types + if route is not None: + ingest_kwargs["route"] = route + + async def _ingest_with_logging(): + import traceback as _tb + + logger.info( + "Ingest starting: collection=%s folder=%s kwargs=%s", + target_name, + folder_path, + ingest_kwargs, + ) + try: + result = await asyncio.to_thread(kb.ingest, folder_path, **ingest_kwargs) + logger.info("Ingest complete: %s", result) + return result + except Exception as _e: + logger.error("Ingest FAILED: %s\n%s", _e, _tb.format_exc()) + raise + + job_id = job_tracker.start(_ingest_with_logging(), collection=target_name) + result = { + "status": "started", + "job_id": job_id, + "collection": target_name, + "message": ( + f"Ingestion started. " + f"Poll kb_job_status(job_id='{job_id}') every 10 seconds. " + f"DO NOT call ingest again -- the job is running in the " + f"background. Large folders may take several minutes." 
+ ), + } + if warning: + result["warning"] = warning + return result + + +async def _handle_kb_append( + arguments: dict, + *, + kb: KnowledgeBase, + job_tracker: _JobTracker, +) -> dict: + collection = arguments["collection"] + paths = arguments["paths"] + if isinstance(paths, str): + paths = [paths] + file_types = arguments.get("file_types") + + if not _collection_exists(kb.index_dir / collection): + return {"status": "error", "error": f"Collection '{collection}' not found"} + + append_kwargs: dict = {} + if file_types: + append_kwargs["file_types"] = file_types + + job_id = job_tracker.start( + asyncio.to_thread(kb.append, collection, paths, **append_kwargs), + collection=collection, + ) + return { + "status": "started", + "job_id": job_id, + "collection": collection, + "message": f"Append started. Poll kb_job_status(job_id='{job_id}') for progress.", + } + + +async def _handle_kb_add_vector_db(arguments: dict, *, kb: KnowledgeBase) -> dict: + collection_name = arguments["collection_name"] + vector_db = arguments["vector_db"] + connection_params = arguments["connection_params"] + embedding_model = arguments["embedding_model"] + description = arguments.get("description", "") + + if (kb.index_dir / collection_name).exists(): + return { + "status": "error", + "error": ( + f"Collection '{collection_name}' already exists. " + "Choose a different name or delete the existing collection." + ), + } + + await asyncio.to_thread( + _register_external_collection, + kb, + collection_name, + vector_db, + connection_params, + embedding_model, + description, + ) + return { + "status": "ok", + "collection": collection_name, + "vector_db": vector_db, + "embedding_model": embedding_model, + "message": ( + f"External collection '{collection_name}' registered. " + "Use search to query it." 
+ ), + } + + +async def _handle_kb_job_status(arguments: dict, *, job_tracker: _JobTracker) -> dict: + job_id = arguments["job_id"] + if job_id not in job_tracker.jobs: + return {"status": "error", "error": f"Unknown job: {job_id}"} + + job = job_tracker.jobs[job_id] + elapsed = int(time.monotonic() - job["started_at"]) + result = { + "status": job["status"], + "elapsed_seconds": elapsed, + "message": job.get("message", ""), + } + if job["status"] == "running": + result["instruction"] = ( + "Job is still running. DO NOT call ingest again. " + "Keep polling job_status every 10 seconds until " + "status is 'complete' or 'error'." + ) + if job["result"] is not None: + result["result"] = job["result"] + if job["error"] is not None: + result["error"] = job["error"] + if job.get("traceback") and job["status"] == "error": + result["traceback"] = job["traceback"] + return result + + +# --------------------------------------------------------------------------- +# Tool defs + handler map (used by the merged server and the test wrapper) +# --------------------------------------------------------------------------- + + +def _knowledge_tools_and_handlers(kb: KnowledgeBase): + """Build the knowledge-base ``(tool defs, handler map)``. + + Combined with the other concern modules' tools under one MCP ``Server`` by + :func:`dsagt.mcp.server.create_dsagt_server`. The rerank default is on + ``kb.default_rerank`` (set from ``knowledge.rerank`` in dsagt_config.yaml). 
+ """ + job_tracker = _JobTracker() + + handlers = { + "kb_list_collections": partial(_handle_kb_list_collections, kb=kb), + "kb_search": partial(_handle_kb_search, kb=kb), + "kb_ingest": partial(_handle_kb_ingest, kb=kb, job_tracker=job_tracker), + "kb_append": partial(_handle_kb_append, kb=kb, job_tracker=job_tracker), + "kb_add_vector_db": partial(_handle_kb_add_vector_db, kb=kb), + "kb_job_status": partial(_handle_kb_job_status, job_tracker=job_tracker), + } + + tools = [ + types.Tool( + name="kb_list_collections", + description=( + "List all available knowledge base collections with their " + "embedding model and vector DB. Use this to discover what " + "documentation is already indexed." + ), + inputSchema={"type": "object", "properties": {}}, + ), + types.Tool( + name="kb_search", + description=( + "Search knowledge base collections using semantic similarity. " + "Returns relevant chunks with source metadata. " + "Supports multi-collection search." + ), + inputSchema={ + "type": "object", + "properties": { + "query": { + "type": "string", + "description": "Natural language search query", + }, + "collection": { + "type": "string", + "description": "Name of a single collection to search", + }, + "collections": { + "type": "array", + "items": {"type": "string"}, + "description": "Search multiple collections and merge results (overrides 'collection')", + }, + "top_k": { + "type": "integer", + "description": "Number of results to return (default: 5)", + "default": 5, + }, + "rerank": { + "type": "boolean", + "description": "Use cross-encoder reranking (slower but more accurate). 
Default from config.", + "default": kb.default_rerank, + }, + "category": { + "type": "string", + "description": "Filter by category tag (ChromaDB collections only)", + }, + "session_id": { + "type": "string", + "description": "Filter by session ID (ChromaDB collections only)", + }, + "tool_name": { + "type": "string", + "description": "Filter by tool name (ChromaDB collections only)", + }, + "source_type": { + "type": "string", + "description": "Filter by source type (ChromaDB collections only)", + }, + "return_code": { + "type": "integer", + "description": "Filter by tool exit code (ChromaDB collections only)", + }, + }, + "required": ["query"], + }, + ), + types.Tool( + name="kb_ingest", + description=( + "Index a folder as a new knowledge base collection. " + "Returns immediately with a job_id. " + "IMPORTANT: poll kb_job_status every 10 seconds and wait for " + "status='complete'. DO NOT call ingest again for the same " + "folder while a job is running." + ), + inputSchema={ + "type": "object", + "properties": { + "folder_path": { + "type": "string", + "description": "Path to folder containing documents to index", + }, + "collection_name": { + "type": "string", + "description": "Name for the collection (default: folder name)", + }, + "file_types": { + "type": "array", + "items": {"type": "string"}, + "description": "File extensions to include, e.g. ['pdf', 'md', 'py']. Defaults to common types.", + }, + "embedding_backend": { + "type": "string", + "enum": list(EMBEDDER_REGISTRY.keys()), + "description": "Embedding backend override for this collection.", + }, + "embedding_model": { + "type": "string", + "description": "Embedding model override for this collection.", + }, + "vector_db": { + "type": "string", + "enum": list(VECTORINDEX_REGISTRY.keys()), + "description": "Vector database override for this collection.", + }, + }, + "required": ["folder_path"], + }, + ), + types.Tool( + name="kb_append", + description=( + "Add documents to an existing collection. 
Uses the same embedding " + "model and vector DB the collection was created with. " + "Returns immediately with a job_id -- poll kb_job_status for progress." + ), + inputSchema={ + "type": "object", + "properties": { + "collection": { + "type": "string", + "description": "Name of the existing collection to append to", + }, + "paths": { + "type": "array", + "items": {"type": "string"}, + "description": "List of file or folder paths to add", + }, + "file_types": { + "type": "array", + "items": {"type": "string"}, + "description": "File extensions to include when expanding folders.", + }, + }, + "required": ["collection", "paths"], + }, + ), + types.Tool( + name="kb_add_vector_db", + description=( + "Register an already-built external vector store as a collection. " + "Queries will be embedded via the API using the specified model." + ), + inputSchema={ + "type": "object", + "properties": { + "collection_name": { + "type": "string", + "description": "Unique name for this collection", + }, + "vector_db": { + "type": "string", + "enum": ["chroma", "lancedb", "qdrant"], + "description": "Vector store backend type", + }, + "connection_params": { + "type": "object", + "description": "Backend-specific connection parameters.", + }, + "embedding_model": { + "type": "string", + "description": "The API model used to build this index", + }, + "description": { + "type": "string", + "description": "Human-readable description for agent discovery", + }, + }, + "required": [ + "collection_name", + "vector_db", + "connection_params", + "embedding_model", + ], + }, + ), + types.Tool( + name="kb_job_status", + description="Check the status of a background ingest or append job.", + inputSchema={ + "type": "object", + "properties": { + "job_id": { + "type": "string", + "description": "Job ID returned by kb_ingest or kb_append", + }, + }, + "required": ["job_id"], + }, + ), + ] + return tools, handlers + + +def create_knowledge_server(kb: KnowledgeBase): + """Create a standalone MCP server 
exposing only the knowledge-base tools. + + Test-facing API: tests call it with a mock KB and drive the server via + ``call_tool_sync()``. The merged ``dsagt-server`` uses + :func:`_knowledge_tools_and_handlers` directly instead of this wrapper. + """ + tools, handlers = _knowledge_tools_and_handlers(kb) + return build_dispatch_server("knowledge", tools, handlers) diff --git a/src/dsagt/mcp/memory_tools.py b/src/dsagt/mcp/memory_tools.py new file mode 100644 index 0000000..e0a2c86 --- /dev/null +++ b/src/dsagt/mcp/memory_tools.py @@ -0,0 +1,231 @@ +"""MCP tools for explicit memory + outlier suggestions. + +User-confirmed facts that persist across sessions (``kb_remember`` / +``kb_get_memories``) and the outlier-detection suggestion queue +(``kb_get_suggestions`` / ``kb_dismiss_suggestion``). These front +:mod:`dsagt.memory` (``ExplicitMemory`` + ``SuggestionQueue``); the ``kb_`` +tool-name prefix is historical (the tools were born in the knowledge server) +and is kept for agent-facing backward compatibility. + +These definitions + handlers run inside the merged ``dsagt-server`` (see +:mod:`dsagt.mcp.server`); ``create_memory_server`` is retained only as a +test-facing constructor. 
+""" + +import asyncio +import logging +from functools import partial +from pathlib import Path + +import mcp.types as types + +from dsagt.knowledge import KnowledgeBase +from dsagt.mcp.server import build_dispatch_server +from dsagt.memory import ExplicitMemory, SuggestionQueue + +logger = logging.getLogger(__name__) + + +async def _handle_kb_remember( + arguments: dict, + *, + kb: KnowledgeBase, + memory: ExplicitMemory, + suggestions: SuggestionQueue, +) -> dict: + text = arguments["text"] + category = arguments.get("category", "") + session_id = arguments.get("session_id", "") + supersedes = arguments.get("supersedes") + promoted_from = arguments.get("promoted_from") + + store_result = await asyncio.to_thread( + memory.remember, + text=text, + category=category, + session_id=session_id, + supersedes=supersedes, + ) + + if not store_result.get("stored"): + return { + "status": "error", + "error": store_result.get("error", "Failed to store memory"), + } + + await asyncio.to_thread( + kb.add_entries, + texts=[text], + collection="session_memory", + metadatas=[ + { + "source_type": "explicit_memory", + "category": category, + "session_id": session_id, + } + ], + ) + + if promoted_from: + suggestions.dismiss(promoted_from) + + return { + "status": "ok", + "entry_id": store_result["entry_id"], + "superseded_id": store_result.get("superseded_id"), + "promoted_from": promoted_from, + "total_memories": await asyncio.to_thread(memory.count), + } + + +async def _handle_kb_get_memories( + arguments: dict, + *, + memory: ExplicitMemory, + suggestions: SuggestionQueue, +) -> dict: + entries = await asyncio.to_thread(memory.get_all) + pending = suggestions.get_all() + result = {"status": "ok", "count": len(entries), "memories": entries} + if pending: + result["suggestions"] = pending + result["suggestion_count"] = len(pending) + return result + + +async def _handle_kb_get_suggestions( + arguments: dict, + *, + suggestions: SuggestionQueue, +) -> dict: + pending = 
suggestions.get_all() + return {"status": "ok", "count": len(pending), "suggestions": pending} + + +async def _handle_kb_dismiss_suggestion( + arguments: dict, + *, + suggestions: SuggestionQueue, +) -> dict: + suggestion_id = arguments["suggestion_id"] + dismissed = suggestions.dismiss(suggestion_id) + if not dismissed: + return {"status": "error", "error": f"Suggestion not found: {suggestion_id}"} + return {"status": "ok", "dismissed": suggestion_id, "remaining": suggestions.count} + + +# --------------------------------------------------------------------------- +# Tool defs + handler map (used by the merged server and the test wrapper) +# --------------------------------------------------------------------------- + + +def _memory_tools_and_handlers( + kb: KnowledgeBase, + runtime_dir: str | Path | None = None, +): + """Build the explicit-memory ``(tool defs, handler map)``. + + Combined with the other concern modules' tools under one MCP ``Server`` by + :func:`dsagt.mcp.server.create_dsagt_server`. ``ExplicitMemory`` + + ``SuggestionQueue`` are rooted at ``runtime_dir`` (falling back to the KB + index's parent), matching the project's ``.dsagt`` memory location. + """ + mem_dir = Path(runtime_dir) if runtime_dir else kb.index_dir.parent + memory = ExplicitMemory(runtime_dir=mem_dir) + suggestions = SuggestionQueue(mem_dir / "suggestions.json") + + handlers = { + "kb_remember": partial( + _handle_kb_remember, kb=kb, memory=memory, suggestions=suggestions + ), + "kb_get_memories": partial( + _handle_kb_get_memories, memory=memory, suggestions=suggestions + ), + "kb_get_suggestions": partial( + _handle_kb_get_suggestions, suggestions=suggestions + ), + "kb_dismiss_suggestion": partial( + _handle_kb_dismiss_suggestion, suggestions=suggestions + ), + } + + tools = [ + types.Tool( + name="kb_remember", + description=( + "Store a user-confirmed fact as an explicit memory. " + "These persist across sessions. Use 'supersedes' to replace an outdated memory." 
+ ), + inputSchema={ + "type": "object", + "properties": { + "text": { + "type": "string", + "description": "The fact to remember", + }, + "category": { + "type": "string", + "description": "Classification tag", + }, + "session_id": { + "type": "string", + "description": "Current session identifier", + }, + "supersedes": { + "type": "string", + "description": "entry_id of an existing memory this replaces", + }, + "promoted_from": { + "type": "string", + "description": "suggestion_id if promoted from outlier suggestion", + }, + }, + "required": ["text"], + }, + ), + types.Tool( + name="kb_get_memories", + description=( + "Get all active explicit memories for this project. " + "Call at session start to load project context." + ), + inputSchema={"type": "object", "properties": {}}, + ), + types.Tool( + name="kb_get_suggestions", + description=( + "Get pending memory suggestions flagged by outlier detection. " + "Present to user for confirmation or dismissal." + ), + inputSchema={"type": "object", "properties": {}}, + ), + types.Tool( + name="kb_dismiss_suggestion", + description="Dismiss a pending memory suggestion.", + inputSchema={ + "type": "object", + "properties": { + "suggestion_id": { + "type": "string", + "description": "ID of the suggestion to dismiss", + }, + }, + "required": ["suggestion_id"], + }, + ), + ] + return tools, handlers + + +def create_memory_server( + kb: KnowledgeBase, + runtime_dir: str | Path | None = None, +): + """Create a standalone MCP server exposing only the explicit-memory tools. + + Test-facing API: tests call it with a mock KB and drive the server via + ``call_tool_sync()``. The merged ``dsagt-server`` uses + :func:`_memory_tools_and_handlers` directly instead of this wrapper. 
+ """ + tools, handlers = _memory_tools_and_handlers(kb, runtime_dir) + return build_dispatch_server("memory", tools, handlers) diff --git a/src/dsagt/mcp/registry_tools.py b/src/dsagt/mcp/registry_tools.py new file mode 100644 index 0000000..d77fc7e --- /dev/null +++ b/src/dsagt/mcp/registry_tools.py @@ -0,0 +1,526 @@ +"""MCP tools for the tool registry, execution, and provenance. + +The "tool lifecycle" surface of ``dsagt-server``: define a tool spec +(``save_tool_spec``), discover tools (``get_registry`` / ``search_registry``), +execute / gather (``read_file`` / ``http_request`` / ``run_command`` / +``install_dependencies``), and reconstruct a reproducible pipeline from the +recorded executions (``reconstruct_pipeline``). + +Tool specs are saved as markdown files in the runtime tools directory and +indexed into a ChromaDB collection for semantic search. Server configuration +(embedding credentials) flows through env vars (LLM_API_KEY, OPENAI_BASE_URL, +EMBEDDING_MODEL) set by ``dsagt start``. + +These definitions + handlers run inside the merged ``dsagt-server`` (see +:mod:`dsagt.mcp.server`); ``create_registry_server`` is retained only as a +test-facing constructor. Skill tools (``save_skill`` / ``search_skills`` / +``install_skill``) live in :mod:`dsagt.mcp.skill_tools`. +""" + +import json +import logging +import subprocess +import sys +from functools import partial +from pathlib import Path + +import httpx +import yaml + +import mcp.types as types + +from dsagt.knowledge import KnowledgeBase +from dsagt.mcp.server import build_dispatch_server +from dsagt.observability import ( + obs, + registry_install_deps_span, + registry_reconstruct_pipeline_span, + registry_save_tool_span, +) +from dsagt.provenance import reconstruct_pipeline +from dsagt.registry import TOOLS_COLLECTION, ToolRegistry + +logger = logging.getLogger(__name__) + + +def _install_dependencies(packages: list[str], timeout: int = 120) -> str: + """Install packages using uv pip install. 
Returns a status string.""" + cmd = ["uv", "pip", "install", "--python", sys.executable] + packages + try: + result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout) + if result.returncode == 0: + output = result.stdout.strip() + return f"Successfully installed: {', '.join(packages)}\n{output}" + else: + return ( + f"Installation failed (exit code {result.returncode}):\n" + f"{result.stderr.strip()}" + ) + except subprocess.TimeoutExpired: + return f"Installation timed out after {timeout}s for: {', '.join(packages)}" + except FileNotFoundError: + return ( + "Error: 'uv' command not found. Install uv: https://github.com/astral-sh/uv" + ) + + +# --------------------------------------------------------------------------- +# Per-tool handlers (module-level, explicit dependencies) +# --------------------------------------------------------------------------- + + +async def _handle_read_file(arguments: dict) -> str: + path = Path(arguments["path"]) + try: + return path.read_text() + except ( + FileNotFoundError, + PermissionError, + IsADirectoryError, + OSError, + UnicodeDecodeError, + ) as e: + return f"Error reading file: {e}" + + +async def _handle_http_request(arguments: dict) -> str: + url = arguments["url"] + method = arguments.get("method", "GET") + headers = arguments.get("headers", {}) + try: + async with httpx.AsyncClient(follow_redirects=True) as client: + response = await client.request( + method=method, + url=url, + headers=headers, + timeout=30.0, + ) + return f"Status: {response.status_code}\n\n{response.text}" + except (httpx.HTTPError, httpx.InvalidURL) as e: + return f"Error making request: {e}" + + +async def _handle_run_command(arguments: dict) -> str: + command = arguments["command"] + args = arguments.get("args", []) + timeout = arguments.get("timeout", 10) + try: + result = subprocess.run( + [command] + args, + capture_output=True, + text=True, + timeout=timeout, + ) + except subprocess.TimeoutExpired: + return f"Command timed 
out after {timeout} seconds" + except FileNotFoundError: + return f"Command '{command}' not found" + + output = "" + if result.stdout: + output += f"STDOUT:\n{result.stdout}\n" + if result.stderr: + output += f"STDERR:\n{result.stderr}\n" + output += f"\nReturn code: {result.returncode}" + return output + + +async def _handle_save_tool_spec( + arguments: dict, + *, + registry: ToolRegistry, +) -> str: + spec = arguments["spec"] + # Some MCP clients (notably Claude Sonnet/Haiku 4.x) serialize nested + # object args as JSON strings instead of objects. Accept both shapes. + if isinstance(spec, str): + try: + spec = json.loads(spec) + except json.JSONDecodeError as e: + return f"Error: spec must be a JSON object (or string-encoded JSON object): {e}" + with registry_save_tool_span(spec.get("name")): + obs.set("language", spec.get("language")) + obs.set("n_dependencies", len(spec.get("dependencies") or [])) + obs.set("n_tags", len(spec.get("tags") or [])) + try: + action = registry.save_tool(spec) + except (KeyError, ValueError, OSError) as e: + obs.event("save_tool_failed", error=str(e)[:256]) + return f"Error saving tool spec: {e}" + + tool_count = len(registry.list_tools_raw()) + obs.set("action", action) + obs.set("registry_size", tool_count) + message = ( + f"Tool '{spec['name']}' {action} successfully. " + f"Registry now contains {tool_count} tools." + ) + deps = spec.get("dependencies", []) + if deps: + with registry_install_deps_span(deps): + dep_result = _install_dependencies(deps) + if dep_result.startswith("Successfully installed:"): + obs.set("status", "ok") + else: + obs.set("status", "failed") + obs.event("install_failed", message=dep_result[:256]) + message += f"\n\nDependency installation:\n{dep_result}" + return message + + +async def _handle_get_registry( + arguments: dict, + *, + registry: ToolRegistry, +) -> str: + tools = registry.list_tools_raw() + if not tools: + return "Registry is empty. No tools registered yet." 
+ return yaml.dump({"tools": tools}, default_flow_style=False, sort_keys=False) + + +async def _handle_search_registry( + arguments: dict, + *, + registry: ToolRegistry, + kb: KnowledgeBase | None, +) -> str: + tool_name = arguments.get("tool_name") + query = arguments.get("query", "") + tag = arguments.get("tag") + top_k = arguments.get("top_k", 10) + + if tool_name: + tool = registry.get_tool(tool_name) + if tool: + return f"Found tool '{tool_name}':\n\n" + yaml.dump( + tool, default_flow_style=False, sort_keys=False + ) + return f"No tool named '{tool_name}'." + + if kb is None: + return ( + "search_registry requires a configured knowledge base " + "(set embedding.api_key + embedding.base_url + embedding.model " + "in dsagt_config.yaml). Use search_registry with an exact " + "tool_name for KB-free lookups." + ) + + # Single ``tools`` collection — bundled and registered entries + # coexist, distinguished by ``metadata.source`` if needed. + results = kb.search( + query=query or "tool", + collection=TOOLS_COLLECTION, + top_k=top_k * 3 if tag else top_k, + ) + if tag and results: + results = [ + r + for r in results + if tag in r.get("chunk", {}).get("metadata", {}).get("tags", "") + ][:top_k] + if not results: + return "No tools found matching the query." 
+ + summaries = [] + for r in results: + chunk = r.get("chunk", {}) + meta = chunk.get("metadata", {}) + summaries.append( + f"- **{meta.get('tool_name', 'unknown')}** " + f"(score: {r.get('score', 0):.2f})\n" + f" {chunk.get('text', '')[:200]}" + ) + return f"Found {len(results)} tool(s):\n\n" + "\n\n".join(summaries) + + +async def _handle_reconstruct_pipeline( + arguments: dict, + *, + runtime_dir: Path, +) -> str: + fmt = arguments.get("format", "bash") + trace_dir = runtime_dir / "trace_archive" + with registry_reconstruct_pipeline_span(fmt): + try: + script = reconstruct_pipeline(trace_dir, fmt=fmt) + except (FileNotFoundError, ValueError, OSError) as e: + obs.event("reconstruct_failed", error=str(e)[:256]) + return f"Error reconstructing pipeline: {e}" + obs.set("output_chars", len(script)) + return script + + +async def _handle_install_dependencies( + arguments: dict, + *, + registry: ToolRegistry, +) -> str: + tool_name = arguments.get("tool_name") + tools = registry.list_tools_raw() + if not tools: + return "Registry is empty. No tools registered yet." + + all_deps = [] + tools_with_deps = [] + for tool in tools: + if tool_name and tool.get("name") != tool_name: + continue + tool_deps = tool.get("dependencies", []) + if tool_deps: + all_deps.extend(tool_deps) + tools_with_deps.append(tool["name"]) + + if not all_deps: + scope = f"tool '{tool_name}'" if tool_name else "registry" + return f"No dependencies declared in {scope}." 
+ + seen = set() + unique_deps = [d for d in all_deps if not (d in seen or seen.add(d))] + + with registry_install_deps_span(unique_deps): + obs.set("scope_tool", tool_name) + obs.set("n_tools_with_deps", len(tools_with_deps)) + result = _install_dependencies(unique_deps) + if result.startswith("Successfully installed:"): + obs.set("status", "ok") + else: + obs.set("status", "failed") + obs.event("install_failed", message=result[:256]) + return f"Installing dependencies for: {', '.join(tools_with_deps)}\n\n{result}" + + +# --------------------------------------------------------------------------- +# Tool defs + handler map (used by the merged server and the test wrapper) +# --------------------------------------------------------------------------- + + +def _registry_tools_and_handlers( + registry: ToolRegistry, + kb: KnowledgeBase | None = None, +): + """Build the registry/execution/provenance ``(tool defs, handler map)``. + + Combined with the other concern modules' tools under one MCP ``Server`` by + :func:`dsagt.mcp.server.create_dsagt_server`. 
+ """ + runtime_dir = Path(registry.runtime_dir) + + handlers = { + "read_file": _handle_read_file, + "http_request": _handle_http_request, + "run_command": _handle_run_command, + "save_tool_spec": partial(_handle_save_tool_spec, registry=registry), + "get_registry": partial(_handle_get_registry, registry=registry), + "search_registry": partial(_handle_search_registry, registry=registry, kb=kb), + "reconstruct_pipeline": partial( + _handle_reconstruct_pipeline, runtime_dir=runtime_dir + ), + "install_dependencies": partial( + _handle_install_dependencies, registry=registry + ), + } + + tools = [ + types.Tool( + name="read_file", + description="Read contents of a text file", + inputSchema={ + "type": "object", + "properties": { + "path": { + "type": "string", + "description": "Path to the file to read", + }, + }, + "required": ["path"], + }, + ), + types.Tool( + name="http_request", + description="Make an HTTP request to fetch documentation or API specs", + inputSchema={ + "type": "object", + "properties": { + "url": {"type": "string", "description": "URL to request"}, + "method": { + "type": "string", + "description": "HTTP method", + "default": "GET", + }, + "headers": { + "type": "object", + "description": "Optional headers", + }, + }, + "required": ["url"], + }, + ), + types.Tool( + name="run_command", + description="Execute a command to get help/usage information", + inputSchema={ + "type": "object", + "properties": { + "command": { + "type": "string", + "description": "Command to execute", + }, + "args": { + "type": "array", + "items": {"type": "string"}, + "description": "Arguments (e.g., ['--help'])", + "default": [], + }, + "timeout": {"type": "number", "default": 10}, + }, + "required": ["command"], + }, + ), + types.Tool( + name="save_tool_spec", + description="Save a tool specification to the registry as a skill file", + inputSchema={ + "type": "object", + "properties": { + # ``anyOf`` accepts both a structured object and a + # JSON-encoded string. 
Some MCP clients (notably + # Claude Sonnet/Haiku 4.x) serialize nested object + # arguments as JSON strings instead of objects; the + # handler unwraps either shape. + "spec": { + "description": "Tool specification (object or JSON-encoded string)", + "anyOf": [ + { + "type": "object", + "properties": { + "name": { + "type": "string", + "description": "Unique tool identifier", + }, + "description": { + "type": "string", + "description": "What the tool does", + }, + "executable": { + "type": "string", + "description": "Command to execute", + }, + "parameters": { + "type": "object", + "description": "Parameter definitions", + "additionalProperties": { + "type": "object", + "properties": { + "type": { + "type": "string", + "description": "Parameter type", + }, + "required": {"type": "boolean"}, + "description": {"type": "string"}, + "default": { + "description": "Default value" + }, + "cli": { + "type": "string", + "description": ( + "How to render this parameter on the command line: " + "'positional[:N]' for positional args, '--name' or '-n' " + "for spaced flags, '--name=' or '-n=' for glued flags, " + "'key=' for dd-style key=value. Defaults to '--' " + "if omitted." 
+ ), + }, + }, + "required": ["type", "description"], + }, + }, + "dependencies": { + "type": "array", + "items": {"type": "string"}, + "description": "Python packages to install", + }, + "tags": { + "type": "array", + "items": {"type": "string"}, + "description": "Tags for categorizing the tool", + }, + }, + "required": [ + "name", + "description", + "executable", + "parameters", + ], + }, + {"type": "string"}, + ], + }, + }, + "required": ["spec"], + }, + ), + types.Tool( + name="get_registry", + description="Get all tools from the registry", + inputSchema={"type": "object", "properties": {}}, + ), + types.Tool( + name="search_registry", + description="Search for tools by name, tag, or description via semantic search.", + inputSchema={ + "type": "object", + "properties": { + "query": {"type": "string", "description": "Search query"}, + "tag": {"type": "string", "description": "Filter by tag"}, + "tool_name": { + "type": "string", + "description": "Exact tool name lookup", + }, + "top_k": {"type": "integer", "default": 10}, + }, + }, + ), + types.Tool( + name="reconstruct_pipeline", + description="Reconstruct a reproducible pipeline script from tool execution records.", + inputSchema={ + "type": "object", + "properties": { + "format": { + "type": "string", + "enum": ["bash", "snakemake"], + "default": "bash", + }, + }, + }, + ), + types.Tool( + name="install_dependencies", + description="Install Python dependencies for one or all tools in the registry.", + inputSchema={ + "type": "object", + "properties": { + "tool_name": { + "type": "string", + "description": "Install deps for a specific tool (omit for all)", + }, + }, + }, + ), + ] + return tools, handlers + + +def create_registry_server( + registry: ToolRegistry, + kb: KnowledgeBase | None = None, +): + """Create a standalone MCP server exposing only the registry/exec/provenance tools. + + Test-facing API: tests call it with a mock registry and drive the server via + ``call_tool_sync()``. 
The merged ``dsagt-server`` uses + :func:`_registry_tools_and_handlers` directly instead of this wrapper. + """ + tools, handlers = _registry_tools_and_handlers(registry, kb) + return build_dispatch_server("registry", tools, handlers) diff --git a/src/dsagt/mcp/server.py b/src/dsagt/mcp/server.py new file mode 100644 index 0000000..df46cee --- /dev/null +++ b/src/dsagt/mcp/server.py @@ -0,0 +1,304 @@ +"""DSAGT MCP Server — the single merged registry + knowledge server. + +Supersedes the two former servers (``dsagt-registry-server`` + +``dsagt-knowledge-server``). Both previously constructed their own +:class:`~dsagt.knowledge.KnowledgeBase` — two embedders, two Chroma accesses, +and a write-here/read-there hazard on the ``skills_catalog__*`` collections +(synced by knowledge, searched by registry). Merging into one process gives one +embedder, one Chroma owner, one ``init_tracing``, and one MCP server per agent. + +The heavy/risky work is already offloaded out of the event loop (``run_command`` +→ ``dsagt-run`` subprocess; ``kb_ingest`` → background job thread), so collapsing +the two processes costs little isolation. + +Tool *definitions* and *handlers* live in their concern modules +(:mod:`~dsagt.mcp.registry_tools` / :mod:`~dsagt.mcp.knowledge_tools` / +:mod:`~dsagt.mcp.memory_tools` / :mod:`~dsagt.mcp.skill_tools`); this module only +composes their ``(tools, handlers)`` under one dispatch shell +(:func:`build_dispatch_server`) and owns the shared-KB startup. The factory +imports are *lazy* (inside :func:`create_dsagt_server` / :func:`main`) so the +concern modules can import :func:`build_dispatch_server` from here without a +cycle. + +See ``design-notes/skills-catalog-server-merge.md`` §2. + +Backward compatibility is **rebuild-not-migrate**: a project created against the +old two-server layout adopts this by re-running ``dsagt start`` (which +regenerates the per-agent MCP config to a single ``dsagt`` server). See the +upgrade note in the README. 
+""" + +import os + +# Set before any import that may pull in FAISS / PyTorch / sentence-transformers +# (e.g. ``dsagt.knowledge`` below): KMP_DUPLICATE_LIB_OK prevents a fatal OpenMP +# crash when multiple libraries each bundle their own libomp. +os.environ["PYTHONUNBUFFERED"] = "1" +os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE") + +import asyncio # noqa: E402 +import json # noqa: E402 +import logging # noqa: E402 +from pathlib import Path # noqa: E402 + +import yaml # noqa: E402 + +import mcp.server.stdio # noqa: E402 +import mcp.types as types # noqa: E402 +from mcp.server.lowlevel import Server, NotificationOptions # noqa: E402 +from mcp.server.models import InitializationOptions # noqa: E402 + +from dsagt.knowledge import KnowledgeBase # noqa: E402 +from dsagt.registry import SkillRegistry, ToolRegistry # noqa: E402 + +logger = logging.getLogger(__name__) + + +# --------------------------------------------------------------------------- +# Shared dispatch shell (used by the merged server *and* the per-concern +# test-facing ``create_*_server`` wrappers) +# --------------------------------------------------------------------------- + + +def build_dispatch_server(name: str, tools, handlers) -> Server: + """Wrap a ``(tools, handlers)`` pair in a configured MCP ``Server``. + + One dispatch contract for every concern module: catch + wrap errors, then + format by return type — a handler that returns ``str`` passes through, one + that returns ``dict`` is JSON-encoded. This is a superset of the old + per-server behavior (registry handlers returned ``str`` and never raised; + knowledge handlers returned ``dict`` and raised ``ValueError`` on bad + input), so it is behavior-preserving for both. 
+ """ + server = Server(name) + + @server.list_tools() + async def list_tools() -> list[types.Tool]: + return tools + + @server.call_tool() + async def call_tool(tool_name: str, arguments: dict) -> list[types.TextContent]: + handler = handlers[tool_name] # KeyError = bug in list_tools schema + try: + result = await handler(arguments) + except ValueError as e: + result = {"status": "error", "error": str(e)} + except Exception as e: + logger.exception("Unexpected error in tool '%s'", tool_name) + result = {"status": "error", "error": f"Unexpected error: {e}"} + text = ( + result + if isinstance(result, str) + else json.dumps(result, ensure_ascii=False) + ) + return [types.TextContent(type="text", text=text)] + + return server + + +async def _run_stdio(server: Server, name: str) -> None: + async with mcp.server.stdio.stdio_server() as (read_stream, write_stream): + await server.run( + read_stream, + write_stream, + InitializationOptions( + server_name=name, + server_version="0.1.0", + capabilities=server.get_capabilities( + notification_options=NotificationOptions(), + experimental_capabilities={}, + ), + ), + ) + + +# --------------------------------------------------------------------------- +# Composition — merge the four concern modules' tools under one Server +# --------------------------------------------------------------------------- + + +def create_dsagt_server( + registry: ToolRegistry, + kb: KnowledgeBase | None, + skill_registry: SkillRegistry | None, + runtime_dir: str | Path | None = None, +): + """Compose the registry + knowledge + memory + skill tools under one ``Server``. + + Test-facing API: build the registries + a (mock) KB, then drive the + returned server via ``call_tool_sync()``. ``main()`` constructs the real + deps from project config before calling this. Factory imports are lazy to + keep the concern modules' top-level import of :func:`build_dispatch_server` + cycle-free. 
+ """ + from dsagt.mcp.knowledge_tools import _knowledge_tools_and_handlers + from dsagt.mcp.memory_tools import _memory_tools_and_handlers + from dsagt.mcp.registry_tools import _registry_tools_and_handlers + from dsagt.mcp.skill_tools import _skill_tools_and_handlers + + groups = [ + _registry_tools_and_handlers(registry, kb), + _knowledge_tools_and_handlers(kb), + _memory_tools_and_handlers(kb, runtime_dir), + _skill_tools_and_handlers(skill_registry, kb, runtime_dir), + ] + + tools: list[types.Tool] = [] + handlers: dict = {} + for g_tools, g_handlers in groups: + overlap = set(handlers) & set(g_handlers) + if overlap: + raise RuntimeError( + f"dsagt-server tool-name collision across modules: {overlap}" + ) + tools += g_tools + handlers.update(g_handlers) + + return build_dispatch_server("dsagt", tools, handlers) + + +def _build_kb_from_config(config: dict, project_dir: Path) -> KnowledgeBase: + """Construct the one shared KnowledgeBase from project config. + + The single home for embedding-backend selection + the cross-backend + leakage guard that the two former server mains duplicated near-verbatim. + """ + from dsagt.session import REGISTRY_DIR, setup_runtime_kb + + kb_config = config["knowledge"] + emb_config = config["embedding"] + + backend = (emb_config.get("backend") or "local").lower() + if backend not in ("local", "api"): + raise ValueError( + f"embedding.backend must be 'local' or 'api' (got {backend!r})" + ) + + # Cross-backend leakage guard: HuggingFace identifiers ("org/repo") and + # OpenAI-style aliases ("text-embedding-3-small") share the same + # EMBEDDING_MODEL env var in most setups. When backend=local but the + # resolved model is an OpenAI-style alias (no slash), drop the override so + # we fall back to the LocalEmbeddingClient default rather than 404 from HF. 
+ raw_model = (emb_config.get("model") or "").strip() + embedder_kwargs: dict = {} + if raw_model and not raw_model.startswith("${"): + looks_hf = "/" in raw_model + if backend == "local" and not looks_hf: + logger.warning( + "Ignoring embedding.model=%r for backend=local (does not look " + "like a HuggingFace identifier). Falling back to the " + "LocalEmbeddingClient default.", + raw_model, + ) + else: + embedder_kwargs["model"] = raw_model + if backend == "api": + base_url = emb_config.get("base_url") or "" + api_key = emb_config.get("api_key") or "" + if not base_url: + raise ValueError( + "embedding.backend='api' requires embedding.base_url in " + "dsagt_config.yaml. Either set it to your OpenAI-compatible " + "endpoint, or change backend to 'local'." + ) + if not api_key or api_key.startswith("${"): + raise ValueError( + "embedding.backend='api' requires embedding.api_key in " + "dsagt_config.yaml. Either fill it in (or export the " + "${EMBEDDING_API_KEY} env var), or change backend to 'local'." + ) + embedder_kwargs.update({"base_url": base_url, "api_key": api_key}) + + runtime_kb_dir = setup_runtime_kb(REGISTRY_DIR / "kb_index", project_dir) + logger.info("Knowledge backend: %s", backend) + kb = KnowledgeBase( + index_dir=runtime_kb_dir, + chunk_size=kb_config["chunk_size"], + default_rerank=kb_config["rerank"], + default_embedder=backend, + default_index=kb_config["vector_db"], + embedder_kwargs=embedder_kwargs, + ) + # Background-load the embedder so the model is ready when the agent's first + # search / kb call lands (otherwise the first call pays the ~5-10s + # sentence-transformers import + construction, which looks like a hang). + kb.preload_default_embedder() + return kb + + +def main(): + """Entry point for ``dsagt-server``. + + All configuration comes from the project directory: + - ``./dsagt_config.yaml`` → project path + non-secret settings + - ``EMBEDDING_*`` env vars → embedding credentials + + No CLI arguments. 
By contract the agent's launch one-liner is + ``cd && ``, so cwd is project_dir for the MCP children it + spawns. + """ + from dsagt.observability import ( + configure_litellm_retries, + find_project_config, + init_tracing, + ) + from dsagt.session import resolve_env_vars + + project_dir, _cfg = find_project_config() + if project_dir is None: + raise RuntimeError( + "dsagt-server: no dsagt_config.yaml in cwd " + f"({Path.cwd()}). Launch the agent from the project " + "directory (`cd && `)." + ) + + log_file = project_dir / "dsagt_server.log" + # Default INFO; users opt into DEBUG via DSAGT_LOG_LEVEL=DEBUG. At DEBUG, + # transitive libraries (httpcore, urllib3, llama_index, chromadb) flood + # stderr with one line per network op — when an agent pipes the MCP + # server's stderr into its own debug stream, human output gets buried. + _level_name = os.environ.get("DSAGT_LOG_LEVEL", "INFO").upper() + _level = getattr(logging, _level_name, logging.INFO) + logging.basicConfig( + level=_level, + format="%(asctime)s %(levelname)s %(name)s: %(message)s", + handlers=[ + logging.FileHandler(log_file, mode="a"), + logging.StreamHandler(), + ], + ) + logger.info("Server starting — project_dir: %s, log: %s", project_dir, log_file) + + config_path = project_dir / "dsagt_config.yaml" + config = resolve_env_vars(yaml.safe_load(config_path.read_text())) + + init_tracing("dsagt-server") # session_id picked up from DSAGT_SESSION_ID env + configure_litellm_retries() + + kb = _build_kb_from_config(config, project_dir) + + # Bundled tools are pre-embedded in the shared ~/dsagt-projects/kb_index/ + # by ``dsagt setup-kb`` (or the auto-bootstrap in ``dsagt start``) and + # copied into the project's kb_index by ``setup_runtime_kb`` above. No + # bundled embedding work happens here; save_tool_spec incurs a single + # embed at save time. 
+ registry = ToolRegistry( + source_tools_dir=None, + runtime_dir=str(project_dir), + kb=kb, + ) + skill_reg = SkillRegistry( + source_skills_dir=None, + runtime_dir=str(project_dir), + kb=kb, + ) + + server = create_dsagt_server(registry, kb, skill_reg, runtime_dir=str(project_dir)) + try: + asyncio.run(_run_stdio(server, "dsagt")) + finally: + kb.close() + + +if __name__ == "__main__": + main() diff --git a/src/dsagt/mcp/skill_tools.py b/src/dsagt/mcp/skill_tools.py new file mode 100644 index 0000000..91bc140 --- /dev/null +++ b/src/dsagt/mcp/skill_tools.py @@ -0,0 +1,386 @@ +"""MCP tools for skill discovery, install, and catalog sources. + +The full skill surface of ``dsagt-server``, consolidated from the two former +servers: register a project skill (``save_skill``), enable + list external +catalog sources (``add_skill_source`` / ``list_skill_sources``), search the +catalog (``search_skills``), and install a catalog skill into the project +(``install_skill``). The catalog data plane + router live in +:mod:`dsagt.skills`; these handlers are the thin MCP wiring over it. + +Installed/created skills are natively auto-discovered by every supported agent, +so ``search_skills`` covers only the not-yet-installed *catalog* tier (plus the +no-embedder keyword fallback). + +These definitions + handlers run inside the merged ``dsagt-server`` (see +:mod:`dsagt.mcp.server`); ``create_skill_server`` is retained only as a +test-facing constructor. +""" + +import asyncio +import json +import logging +from functools import partial +from pathlib import Path + +import mcp.types as types + +from dsagt.knowledge import KnowledgeBase +from dsagt.mcp.server import build_dispatch_server +from dsagt.registry import SkillRegistry + +logger = logging.getLogger(__name__) + + +async def _handle_save_skill( + arguments: dict, + *, + skill_registry: SkillRegistry, +) -> str: + """Register a skill (workflow / agent instructions) for later reuse. 
+ + Writes SKILL.md to ``/skills//``, where every supported + agent natively auto-discovers it (after the next ``dsagt start`` mirror). + No KB indexing — ``search_skills`` covers only the not-yet-installed + *catalog* tier, since installed skills are already natively discoverable. + """ + spec = arguments["spec"] + if isinstance(spec, str): + try: + spec = json.loads(spec) + except json.JSONDecodeError as e: + return f"Error: spec must be a JSON object (or string-encoded JSON object): {e}" + body = arguments.get("body") + reference_files = arguments.get("reference_files") + if isinstance(reference_files, str): + try: + reference_files = json.loads(reference_files) + except json.JSONDecodeError as e: + return f"Error: reference_files must be a JSON object: {e}" + try: + action = skill_registry.save_skill( + spec, body=body, reference_files=reference_files + ) + except (KeyError, ValueError, OSError) as e: + return f"Error saving skill: {e}" + skill_count = len(skill_registry.list_skills()) + return ( + f"Skill '{spec['name']}' {action} successfully. " + f"Registry now contains {skill_count} skills." + ) + + +async def _handle_search_skills( + arguments: dict, + *, + kb: KnowledgeBase | None, + skill_registry: SkillRegistry | None, +) -> str: + if skill_registry is None: + return "search_skills is unavailable (no skill registry configured)." + + from dsagt.skills import SkillRouter + + router = SkillRouter(skill_registry=skill_registry, kb=kb) + return router.search( + arguments.get("query", ""), + top_k=arguments.get("top_k", 10), + tag=arguments.get("tag"), + skill_name=arguments.get("skill_name"), + ) + + +async def _handle_install_skill( + arguments: dict, + *, + runtime_dir: Path, +) -> str: + """Install a catalog skill into ``/skills//``. + + The skill's files land on disk immediately, so the agent can use it in the + current session by reading its SKILL.md. 
*Native* auto-invocation requires + the next ``dsagt start`` (which mirrors installed skills into + ``.claude/skills/`` before launch) plus an agent restart. + """ + from dsagt.skills import SkillRouter + + name = arguments.get("skill_name") + if not name: + return "install_skill requires 'skill_name'." + try: + info = SkillRouter().install(name, runtime_dir) + except LookupError as e: + return f"Error: {e}" + + # Bare confirmation by design: the install→use→restart model and the + # license/PROVENANCE capture are already in the agent's instructions and on + # disk (PROVENANCE.txt), so repeating them on every install is just noise. + verb = "Updated" if info["action"] == "updated" else "Installed" + return f"{verb} '{info['name']}' → {info['dest_dir']}/" + + +async def _handle_add_skill_source( + arguments: dict, + *, + kb: KnowledgeBase, + runtime_dir: Path, +) -> dict: + """Enable a skill source (known name or GitHub URL): clone + index the catalog.""" + from dsagt.skills import ( + KNOWN_SOURCES, + SkillRouter, + persist_source_to_config, + resolve_source, + ) + + source = arguments.get("source") + if not source: + return { + "error": "add_skill_source requires 'source' (known name or GitHub URL)." + } + try: + spec = resolve_source(source) + if isinstance(source, str) and source in KNOWN_SOURCES: + spec.setdefault("name", source) + router = SkillRouter(kb=kb) + stats = await asyncio.to_thread(router.sync, source) + except (ValueError, RuntimeError) as e: + return {"error": str(e)} + persist_source_to_config( + runtime_dir, {"name": spec.get("name", stats["slug"]), **spec} + ) + return { + "source": spec["url"], + "slug": stats["slug"], + "skills_indexed": stats["indexed"], + "note": "Searchable via search_skills; install one with install_skill.", + } + + +async def _handle_list_skill_sources(arguments: dict, *, kb: KnowledgeBase) -> dict: + """List known skill sources, each flagged synced/available with its count. 
+ + A source is ``synced`` (searchable via ``search_skills``) only after an + ``add_skill_source`` call has cloned + indexed it; otherwise it is + ``available`` (known name + URL, nothing indexed yet). Reporting the + flag + ``indexed`` count inline means the agent doesn't have to cross- + reference a separate ``synced_collections`` list to tell the difference. + """ + from dsagt.registry import CATALOG_COLLECTION_PREFIX, catalog_collection + from dsagt.skills import KNOWN_SOURCES, SkillRouter, _repo_slug + + synced = {c for c in kb.collections if c.startswith(CATALOG_COLLECTION_PREFIX)} + + # Single source of truth for the per-source synced/indexed view (shared + # with the CLI `skills list --catalog`). + sources = { + s["name"]: { + "url": s["url"], + "description": s["description"], + "synced": s["synced"], + "indexed": s["indexed"], + } + for s in SkillRouter(kb=kb).list_sources() + } + + # Surface any synced catalog whose source isn't in KNOWN_SOURCES (added + # by raw GitHub URL) so the count is never silently dropped. + known_colls = { + catalog_collection(_repo_slug(s["url"])) for s in KNOWN_SOURCES.values() + } + extra = sorted(synced - known_colls) + + any_synced = any(v["synced"] for v in sources.values()) or bool(extra) + return { + "sources": sources, + "other_synced_collections": extra, + "note": ( + "add_skill_source to sync a source whose synced=false; " + "then search_skills to browse. search_skills only sees synced sources." + if any_synced + else "No catalog synced yet — add_skill_source " + "(e.g. 'scientific') to enable one, then search_skills to browse." 
+ ), + } + + +# --------------------------------------------------------------------------- +# Tool defs + handler map (used by the merged server and the test wrapper) +# --------------------------------------------------------------------------- + + +def _skill_tools_and_handlers( + skill_registry: SkillRegistry | None, + kb: KnowledgeBase | None = None, + runtime_dir: str | Path | None = None, +): + """Build the skill ``(tool defs, handler map)``. + + Combined with the other concern modules' tools under one MCP ``Server`` by + :func:`dsagt.mcp.server.create_dsagt_server`. ``runtime_dir`` (the project + dir, where skills install + sources persist) falls back to the skill + registry's ``runtime_dir`` then the KB index's parent. + """ + rt: Path | None = Path(runtime_dir) if runtime_dir else None + if rt is None and skill_registry is not None: + rt = Path(skill_registry.runtime_dir) + if rt is None and kb is not None: + rt = kb.index_dir.parent + + handlers = { + "save_skill": partial(_handle_save_skill, skill_registry=skill_registry), + "search_skills": partial( + _handle_search_skills, kb=kb, skill_registry=skill_registry + ), + "install_skill": partial(_handle_install_skill, runtime_dir=rt), + "add_skill_source": partial(_handle_add_skill_source, kb=kb, runtime_dir=rt), + "list_skill_sources": partial(_handle_list_skill_sources, kb=kb), + } + + tools = [ + types.Tool( + name="save_skill", + description=( + "Register a skill (agent workflow / instructions) into " + "/skills//SKILL.md, where the agent natively " + "auto-discovers it after the next `dsagt start`. Symmetric " + "with save_tool_spec — use this when you've designed a " + "reusable instruction set you want future sessions to load " + "automatically." + ), + inputSchema={ + "type": "object", + "properties": { + # ``anyOf`` for spec mirrors save_tool_spec — accept + # both structured object and JSON-encoded string for + # MCP clients that serialize nested args. 
+ "spec": { + "description": "Skill spec (object or JSON-encoded string)", + "anyOf": [ + { + "type": "object", + "properties": { + "name": { + "type": "string", + "description": "Unique skill identifier (becomes the directory name)", + }, + "description": { + "type": "string", + "description": "What the skill does / when to use it", + }, + "tags": { + "type": "array", + "items": {"type": "string"}, + "description": "Tags for categorizing the skill", + }, + }, + "required": ["name", "description"], + }, + {"type": "string"}, + ], + }, + "body": { + "type": "string", + "description": ( + "Markdown body of the SKILL.md (workflow / " + "instructions the agent will follow). When " + "updating an existing skill, omit to preserve " + "the existing body." + ), + }, + "reference_files": { + "description": ( + "Optional additional files to write into the " + "skill directory. Object mapping relative " + "path -> file contents, or JSON-encoded string." + ), + "anyOf": [ + { + "type": "object", + "additionalProperties": {"type": "string"}, + }, + {"type": "string"}, + ], + }, + }, + "required": ["spec"], + }, + ), + types.Tool( + name="add_skill_source", + description=( + "Enable an external agent-skill source (a known name like " + "'scientific'/'anthropic'/'antigravity'/'composio', or a GitHub URL). " + "Clones it and indexes its skills into the searchable catalog " + "(search_skills). Does NOT load them into context." + ), + inputSchema={ + "type": "object", + "properties": { + "source": { + "type": "string", + "description": "Known source name or GitHub repo URL / owner/repo", + }, + }, + "required": ["source"], + }, + ), + types.Tool( + name="list_skill_sources", + description="List known + synced external skill sources and their indexed catalogs.", + inputSchema={"type": "object", "properties": {}}, + ), + types.Tool( + name="search_skills", + description=( + "Search agent skills by name, tag, or description. 
Spans installed " + "skills and the external installable catalog. Catalog hits are marked " + "'[catalog]' — use install_skill to add one to this project." + ), + inputSchema={ + "type": "object", + "properties": { + "query": {"type": "string", "description": "Search query"}, + "tag": {"type": "string", "description": "Filter by tag"}, + "skill_name": { + "type": "string", + "description": "Exact skill name lookup", + }, + "top_k": {"type": "integer", "default": 10}, + }, + }, + ), + types.Tool( + name="install_skill", + description=( + "Install a skill from the external catalog (found via search_skills) " + "into this project so the agent can use it natively. Copies SKILL.md " + "+ scripts/references; available natively after the next restart." + ), + inputSchema={ + "type": "object", + "properties": { + "skill_name": { + "type": "string", + "description": "Catalog skill name to install", + }, + }, + "required": ["skill_name"], + }, + ), + ] + return tools, handlers + + +def create_skill_server( + skill_registry: SkillRegistry | None = None, + kb: KnowledgeBase | None = None, + runtime_dir: str | Path | None = None, +): + """Create a standalone MCP server exposing only the skill tools. + + Test-facing API: tests call it with mock deps and drive the server via + ``call_tool_sync()``. The merged ``dsagt-server`` uses + :func:`_skill_tools_and_handlers` directly instead of this wrapper. + """ + tools, handlers = _skill_tools_and_handlers(skill_registry, kb, runtime_dir) + return build_dispatch_server("skills", tools, handlers) diff --git a/src/dsagt/observability.py b/src/dsagt/observability.py index 62405e3..55eb0f2 100644 --- a/src/dsagt/observability.py +++ b/src/dsagt/observability.py @@ -1,8 +1,8 @@ """ DSAgt observability — span emission via MLflow's native tracer provider. -Business modules (knowledge.py, provenance.py, registry_server.py, run_tool.py) -import the small public surface defined here. 
``init_tracing`` installs +Business modules (knowledge.py, provenance.py, mcp/registry_tools.py, +run_tool.py) import the small public surface defined here. ``init_tracing`` installs MLflow's own ``TracerProvider`` as the OTel global, so every ``trace.get_tracer(...)`` call routes spans into MLflow's native trace store with full ``mlflow.spanInputs`` / ``mlflow.spanOutputs`` integration and @@ -60,7 +60,7 @@ # every span helper. DSAgt runs one process per project, so the session # id is a process-wide constant set at startup by init_tracing(), and # propagating it implicitly via this module-level value lets business -# code in knowledge.py / provenance.py / registry_server.py emit spans +# code in knowledge.py / provenance.py / mcp/registry_tools.py emit spans # without ever knowing about session ids. The cost is that tests have # to monkeypatch the global to isolate (see _reset_tracing fixture in # test_observability.py). diff --git a/src/dsagt/registry.py b/src/dsagt/registry.py index 7e91aed..967a7c9 100644 --- a/src/dsagt/registry.py +++ b/src/dsagt/registry.py @@ -32,8 +32,25 @@ #: they can be evicted and refreshed on dsagt upgrade without touching #: agent-registered entries. TOOLS_COLLECTION = "tools" +#: Legacy installed-skills collection. No longer written or read: installed +#: skills are natively auto-discovered by every supported agent, so skill +#: search covers only the *catalog* tier below. Kept as a name for back-compat +#: and ``dsagt info`` display of any pre-existing index. SKILLS_COLLECTION = "skills" +#: External skill catalogs (fetched from GitHub repos) live in their own +#: per-source collections named ``skills_catalog__``. Keeping each +#: source in its own collection lets a re-sync drop+rebuild one source's +#: directory without disturbing other catalogs — no delete-by-metadata +#: primitive needed. 
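A minimal sketch of the per-source layout this comment describes (illustrative only, not part of the change). It assumes each catalog collection is backed by its own directory under the index dir; `catalog_dir`, `resync`, the `.txt` entry files, and the `example-skill` name are hypothetical stand-ins, not DSAgt APIs. The point is only that a re-sync can delete and rebuild one source's directory while the others stay untouched.

```python
import shutil
import tempfile
from pathlib import Path

CATALOG_COLLECTION_PREFIX = "skills_catalog__"


def catalog_dir(index_dir: Path, slug: str) -> Path:
    """Hypothetical helper: directory backing one source's catalog collection."""
    return index_dir / f"{CATALOG_COLLECTION_PREFIX}{slug}"


def resync(index_dir: Path, slug: str, skills: list[str]) -> Path:
    """Drop and rebuild a single source's directory; other sources are not touched."""
    target = catalog_dir(index_dir, slug)
    if target.exists():
        shutil.rmtree(target)  # no delete-by-metadata needed: remove the whole dir
    target.mkdir(parents=True)
    for name in skills:
        # Stand-in for indexing one skill; a real index would store embeddings.
        (target / f"{name}.txt").write_text(name)
    return target


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp)
    resync(index_dir, "genesis", ["datacard-generator", "croissant-validator"])
    resync(index_dir, "scientific", ["example-skill"])
    # Re-sync genesis with a different skill set: stale entries disappear.
    resync(index_dir, "genesis", ["hdmf-schema-builder"])
    genesis_entries = sorted(p.stem for p in catalog_dir(index_dir, "genesis").iterdir())
    scientific_entries = sorted(
        p.stem for p in catalog_dir(index_dir, "scientific").iterdir()
    )
```

Because each source owns one directory, the second `resync` of `genesis` cannot leave orphaned entries behind and never reads or writes the `scientific` directory.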
+CATALOG_COLLECTION_PREFIX = "skills_catalog__" + + +def catalog_collection(slug: str) -> str: + """KB collection name holding the indexed catalog for source *slug*.""" + return f"{CATALOG_COLLECTION_PREFIX}{slug}" + + #: Backwards-compat aliases — kept so external code that imported the #: previous names still resolves. New code should use the names above. TOOL_REGISTRY_COLLECTION = TOOLS_COLLECTION @@ -44,6 +61,7 @@ # Helpers (tools only) # --------------------------------------------------------------------------- + def _uv_run_prefix(deps: list[str]) -> str: """Build a 'uv run --with dep1,dep2 --' prefix for Python dependencies.""" if not deps: @@ -77,12 +95,24 @@ def _generate_tool_body(spec: dict) -> str: for name, p in params.items(): req = "yes" if p.get("required") else "no" default = p.get("default", "—") - lines.append(f"| `{name}` | {req} | {default} | {p.get('description', '')} |\n") + lines.append( + f"| `{name}` | {req} | {default} | {p.get('description', '')} |\n" + ) return "".join(lines) def _parse_frontmatter(path: Path) -> dict: - """Parse YAML frontmatter from a markdown file.""" + """Parse YAML frontmatter from a markdown file. + + Third-party skill catalogs (e.g. Genesis) ship SKILL.md files whose + frontmatter is *intended* as flat ``key: value`` but isn't strict YAML — + most commonly an unquoted ``description`` value that contains a colon + (``...readiness levels: Level 1...``), which PyYAML rejects as a nested + mapping. Rather than silently dropping such skills from discovery, fall back + to a best-effort flat parse (:func:`_lenient_frontmatter`) on YAML error so + ``name`` / ``description`` / ``tags`` are still recovered. dsagt-authored + tool/skill specs are valid YAML, so the fallback never fires for them. 
+ """ text = path.read_text() if not text.startswith("---"): return {} @@ -92,7 +122,53 @@ def _parse_frontmatter(path: Path) -> dict: try: return yaml.safe_load(parts[1]) or {} except yaml.YAMLError as e: - raise ValueError(f"Invalid YAML frontmatter in {path}: {e}") from e + logger.warning( + "Frontmatter in %s isn't strict YAML (%s); recovering flat fields.", + path, + str(e).splitlines()[0], + ) + return _lenient_frontmatter(parts[1]) + + +def _lenient_frontmatter(block: str) -> dict: + """Best-effort flat ``key: value`` parse for frontmatter that isn't strict YAML. + + Splits each top-level line on its **first** colon (so a value may itself + contain colons); indented ``- item`` lines extend the previous key into a + list, other indented lines continue the previous string value. Inline + ``[...]`` / ``{...}`` values are parsed as YAML when they can be. Lines + without a colon, and comments, are ignored. This recovers the discovery + fields (name/description/tags) from technically-invalid-but-obvious + frontmatter instead of dropping the skill. + """ + out: dict = {} + key: str | None = None + for raw in block.splitlines(): + stripped = raw.strip() + if not stripped or stripped.startswith("#"): + continue + if raw[:1].isspace() and key is not None: + # Continuation of the previous key. 
+ if stripped.startswith("- "): + if not isinstance(out.get(key), list): + out[key] = [] + out[key].append(stripped[2:].strip()) + elif isinstance(out.get(key), str): + out[key] = (out[key] + " " + stripped).strip() + continue + if ":" not in stripped: + continue + k, _, v = stripped.partition(":") + key = k.strip() + v = v.strip() + if v.startswith(("[", "{")): + try: + out[key] = yaml.safe_load(v) + except yaml.YAMLError: + out[key] = v + else: + out[key] = v + return out # --------------------------------------------------------------------------- @@ -115,6 +191,7 @@ def _parse_frontmatter(path: Path) -> dict: # Parameters with `type: boolean` render as a bare flag when truthy and emit # nothing when falsy; positional booleans are not supported. + def _parse_cli(cli: str, param_name: str) -> dict: """Classify a cli string into a rendering descriptor. Fails fast on invalid input.""" if cli == "positional": @@ -187,6 +264,7 @@ def render_arguments(parameters: dict, values: dict) -> list[str]: # Tool Registry # --------------------------------------------------------------------------- + class ToolRegistry: """ Manages CLI tool spec files and optional KB indexing. 
@@ -283,15 +361,17 @@ def list_tools(self) -> list[dict]: properties[param_name]["default"] = param_def["default"] if param_def.get("required", False): required.append(param_name) - tools.append({ - "name": tool["name"], - "description": tool["description"], - "inputSchema": { - "type": "object", - "properties": properties, - "required": required, - }, - }) + tools.append( + { + "name": tool["name"], + "description": tool["description"], + "inputSchema": { + "type": "object", + "properties": properties, + "required": required, + }, + } + ) return tools def get_tool(self, name: str) -> dict | None: @@ -322,7 +402,9 @@ def save_tool(self, spec: dict) -> str: spec = dict(spec) spec["executable"] = _wrap_executable( - spec["name"], spec["executable"], spec.get("dependencies"), + spec["name"], + spec["executable"], + spec.get("dependencies"), ) # Preserve existing body when updating so hand-edited docs survive @@ -389,6 +471,7 @@ def reindex_all(self) -> int: # Skill Registry # --------------------------------------------------------------------------- + class SkillRegistry: """ Manages instruction-based agent skills and optional KB indexing. @@ -435,14 +518,16 @@ def _bundled_skill_dirs(self) -> list[Path]: if not self._bundled_dir.exists(): return [] return [ - d for d in sorted(self._bundled_dir.iterdir()) + d + for d in sorted(self._bundled_dir.iterdir()) if d.is_dir() and (d / "SKILL.md").exists() ] def _project_skill_dirs(self) -> list[Path]: """Return skill directories the agent has saved into this project.""" return [ - d for d in sorted(self.skills_dir.iterdir()) + d + for d in sorted(self.skills_dir.iterdir()) if d.is_dir() and (d / "SKILL.md").exists() ] @@ -474,9 +559,11 @@ def save_skill( ``{relative_path: contents}`` for additional files the skill wants in its directory (templates, schemas, etc.). - Returns "added" or "updated". 
Indexes the resulting SKILL.md - into ``registered_skills`` via ``_index_skill`` if a KB is - configured — symmetric with ``ToolRegistry.save_tool``. + Returns "added" or "updated". Does **not** index into a KB: + saved skills land in ``/skills/`` where every supported + agent natively auto-discovers them, so search only covers the + not-yet-installed *catalog* tier (see ``SkillRouter``). The old + ``skills`` collection is no longer read by anything. """ name = spec.get("name") if not name: @@ -505,9 +592,6 @@ def save_skill( target.parent.mkdir(parents=True, exist_ok=True) target.write_text(contents) - if self._kb: - self._index_skill(spec, skill_md) - return action def _skill_md_path(self, name: str) -> Path | None: @@ -532,38 +616,3 @@ def get_skill_content(self, name: str) -> str | None: """Get the full SKILL.md content for a skill.""" path = self._skill_md_path(name) return path.read_text() if path is not None else None - - def _index_skill(self, spec: dict, skill_md: Path) -> None: - """Index a skill into the ``skills`` KB collection. - - Errors propagate to the caller — see _index_tool for the rationale. - """ - text = skill_md.read_text() - metadata = { - "skill_name": spec["name"], - "tags": ",".join(spec.get("tags", [])), - "source": "registered", # vs "bundled" - } - self._kb.add_entries( - texts=[text], - collection=SKILLS_COLLECTION, - metadatas=[metadata], - ) - - def reindex_all(self) -> int: - """Reindex project-local skills into ``registered_skills``. - - Bundled skills are NOT indexed here — they live in the shared - ``bundled_skills`` collection (see ToolRegistry.reindex_all - docstring for the same architecture). 
- """ - if not self._kb: - return 0 - count = 0 - for skill_dir in self._project_skill_dirs(): - skill_md = skill_dir / "SKILL.md" - spec = _parse_frontmatter(skill_md) - if spec.get("name"): - self._index_skill(spec, skill_md) - count += 1 - return count diff --git a/src/dsagt/session.py b/src/dsagt/session.py index 6ae19b3..0679c2f 100644 --- a/src/dsagt/session.py +++ b/src/dsagt/session.py @@ -53,7 +53,7 @@ # ``dsagt setup-kb``). Migrated from ``~/.dsagt/`` on 2026-05-07. REGISTRY_DIR = DEFAULT_PROJECTS_BASE REGISTRY_FILE = REGISTRY_DIR / "projects.yaml" -RESERVED_PROJECT_NAMES = ("projects.yaml", "kb_index") +RESERVED_PROJECT_NAMES = ("projects.yaml", "kb_index", ".skill_sources") DEFAULTS = { # ``llm`` block uses ${VAR} placeholders so per-project config @@ -103,6 +103,24 @@ "vector_db": "chroma", "rerank": False, }, + # External agent-skill catalogs. ``sources`` are GitHub repos whose + # SKILL.md skills get indexed into per-source catalog collections for + # ``search_skills`` (searchable but NOT loaded into agent context). + # The agent installs a chosen one via the ``install_skill`` MCP tool; + # the agent setup then mirrors installed + bundled skills into the + # platform's native skill dir (e.g. ``.claude/skills/``). 
+ "skills": { + "sources": [ + { + "name": "scientific", + "url": "https://github.com/K-Dense-AI/scientific-agent-skills", + "branch": "main", + "subdir": "skills", + }, + ], + "populate_catalog": True, # index sources into the catalog at setup-kb + "populate_native": True, # mirror installed+bundled into .claude/skills + }, } _ENV_VAR_RE = re.compile(r"\$\{(\w+)\}") @@ -112,6 +130,7 @@ # Config helpers # --------------------------------------------------------------------------- + def resolve_env_vars(value): """Replace ${VAR_NAME} references with environment variable values.""" if isinstance(value, str): @@ -157,6 +176,7 @@ def default_config_content( "knowledge": DEFAULTS["knowledge"], "categories": DEFAULTS["categories"], "extraction": DEFAULTS["extraction"], + "skills": DEFAULTS["skills"], } return yaml.dump(body, default_flow_style=False, sort_keys=False) @@ -165,6 +185,7 @@ def default_config_content( # Project registry # --------------------------------------------------------------------------- + def _load_registry() -> dict[str, str]: """Load the project registry. Returns empty dict if no registry exists.""" if not REGISTRY_FILE.exists(): @@ -190,6 +211,34 @@ def list_projects() -> dict[str, str]: return _load_registry() +def kb_from_config(config: dict, index_dir: Path | None = None) -> "KnowledgeBase": + """Build a KnowledgeBase from a resolved project config. + + Mirrors the embedding-backend resolution used by ``extract_session`` so + callers (CLI ``skills`` group, catalog sync) get a KB wired to the same + embedder the project uses. Defaults to ``/kb_index``. 
+ """ + pdir = Path(config["project_dir"]) + emb = config.get("embedding", {}) + backend = emb.get("backend", "local") + if backend == "local": + model = emb.get("model") + if model and "/" not in str(model): + model = None + embedder_kwargs = {"model": model} + else: + embedder_kwargs = { + "model": emb.get("model"), + "base_url": emb.get("base_url"), + "api_key": os.environ.get("EMBEDDING_API_KEY", ""), + } + return KnowledgeBase( + index_dir=index_dir or (pdir / "kb_index"), + default_embedder=backend, + embedder_kwargs=embedder_kwargs, + ) + + def project_dir(name: str) -> Path: """Resolve a project name to its directory via the registry.""" registry = _load_registry() @@ -207,6 +256,7 @@ def project_dir(name: str) -> Path: # Config loading # --------------------------------------------------------------------------- + def load_config(project_name: str) -> dict: """Load and validate a project config by name. @@ -242,13 +292,16 @@ def _validate(config: dict) -> None: backend = config.get("mlflow", {}).get("backend") if backend and backend not in VALID_MLFLOW_BACKENDS: - raise ValueError(f"'mlflow.backend' must be one of {VALID_MLFLOW_BACKENDS}, got '{backend}'") + raise ValueError( + f"'mlflow.backend' must be one of {VALID_MLFLOW_BACKENDS}, got '{backend}'" + ) # --------------------------------------------------------------------------- # Project initialization # --------------------------------------------------------------------------- + def _collection_exists(path: Path) -> bool: """Return True if *path* looks like a persisted KB collection directory. @@ -256,14 +309,11 @@ def _collection_exists(path: Path) -> bool: collections, and bare ChromaDB sqlite files (as produced by ``dsagt setup-kb`` for description-only collections). 
""" - return ( - path.is_dir() - and ( - (path / "index.faiss").exists() - or (path / "chroma_ids.json").exists() - or (path / "route.json").exists() - or (path / "chroma.sqlite3").exists() - ) + return path.is_dir() and ( + (path / "index.faiss").exists() + or (path / "chroma_ids.json").exists() + or (path / "route.json").exists() + or (path / "chroma.sqlite3").exists() ) @@ -390,7 +440,9 @@ def persist_agent_choice(project_name: str, agent: str) -> None: "# ollama, mistral, groq, deepseek.\n" "# Full list: https://docs.litellm.ai/docs/providers\n" ) - yaml_path.write_text(header + yaml.dump(raw, default_flow_style=False, sort_keys=False)) + yaml_path.write_text( + header + yaml.dump(raw, default_flow_style=False, sort_keys=False) + ) def move_project(project_name: str, new_location: Path) -> Path: @@ -434,6 +486,7 @@ def remove_project(project_name: str, keep_files: bool = False) -> Path: # Service start / stop # --------------------------------------------------------------------------- + def _embedding_provider(config: dict) -> str: """Resolve embedding provider with a fallback for two cases: @@ -481,12 +534,20 @@ def mlflow_command(pdir: Path, mlflow_config: dict, port: int) -> list[str]: else str(mlflow_dir) ) return [ - sys.executable, "-m", "mlflow", "server", - "--backend-store-uri", backend_uri, - "--default-artifact-root", str(mlflow_dir / "artifacts"), - "--host", "0.0.0.0", - "--port", str(port), - "--workers", "1", + sys.executable, + "-m", + "mlflow", + "server", + "--backend-store-uri", + backend_uri, + "--default-artifact-root", + str(mlflow_dir / "artifacts"), + "--host", + "0.0.0.0", + "--port", + str(port), + "--workers", + "1", ] @@ -508,7 +569,10 @@ def _process_command(pid: int) -> str: try: result = subprocess.run( ["ps", "-p", str(pid), "-o", "command="], - capture_output=True, text=True, check=False, timeout=2.0, + capture_output=True, + text=True, + check=False, + timeout=2.0, ) except (FileNotFoundError, subprocess.TimeoutExpired): 
return "" @@ -555,7 +619,9 @@ def reap_runtime(runtime_file: Path) -> list[str]: for name, (pid, pgid) in pending.items(): try: os.killpg(pgid, signal.SIGKILL) - stopped.append(f"Stopped {name} (pid {pid}, SIGKILL after {_STOP_GRACE_SECONDS}s)") + stopped.append( + f"Stopped {name} (pid {pid}, SIGKILL after {_STOP_GRACE_SECONDS}s)" + ) except ProcessLookupError: stopped.append(f"Stopped {name} (pid {pid})") @@ -619,7 +685,8 @@ def start_services(config: dict) -> dict[str, int]: ) logger.info( "MLflow started (pid %d) → http://localhost:%d", - mlflow_proc.pid, mlflow_port, + mlflow_proc.pid, + mlflow_port, ) pids = {"mlflow": mlflow_proc.pid} @@ -635,11 +702,17 @@ def start_services(config: dict) -> dict[str, int]: pids["proxy"] = proxy_proc.pid ports["proxy"] = proxy_port - runtime_file.write_text(json.dumps({ - "pids": pids, - "ports": ports, - "started_at": datetime.now(timezone.utc).isoformat(), - }, indent=2) + "\n") + runtime_file.write_text( + json.dumps( + { + "pids": pids, + "ports": ports, + "started_at": datetime.now(timezone.utc).isoformat(), + }, + indent=2, + ) + + "\n" + ) if not proxy_requested: _wait_for_mlflow(mlflow_port, mlflow_proc, mlflow_log, timeout=30.0) @@ -650,7 +723,11 @@ def start_services(config: dict) -> dict[str, int]: def _start_proxy( - config: dict, pdir: Path, mlflow_port: int, proxy_port: int, session_id: str, + config: dict, + pdir: Path, + mlflow_port: int, + proxy_port: int, + session_id: str, ) -> subprocess.Popen: """Spawn the dsagt-proxy subprocess. 
@@ -671,15 +748,25 @@ def _start_proxy( ) cmd = [ - sys.executable, "-m", "dsagt.commands.proxy_server", - "--port", str(proxy_port), - "--mlflow-url", f"http://localhost:{mlflow_port}", - "--project", config["project"], - "--session", session_id, - "--records-dir", str(pdir / "trace_archive"), - "--model", llm["model"], - "--base-url", llm["base_url"], - "--provider", llm["provider"], + sys.executable, + "-m", + "dsagt.commands.proxy_server", + "--port", + str(proxy_port), + "--mlflow-url", + f"http://localhost:{mlflow_port}", + "--project", + config["project"], + "--session", + session_id, + "--records-dir", + str(pdir / "trace_archive"), + "--model", + llm["model"], + "--base-url", + llm["base_url"], + "--provider", + llm["provider"], ] # Embedding routing through the proxy is only relevant when the # project's embedding backend is ``api`` — in ``local`` mode the @@ -693,11 +780,16 @@ def _start_proxy( f"--enable-proxy with embedding.backend=api needs " f"config.embedding.{required} (got {emb.get(required)!r})" ) - cmd.extend([ - "--embedding-model", emb["model"], - "--embedding-base-url", emb["base_url"], - "--embedding-provider", emb["provider"], - ]) + cmd.extend( + [ + "--embedding-model", + emb["model"], + "--embedding-base-url", + emb["base_url"], + "--embedding-provider", + emb["provider"], + ] + ) proxy_log = pdir / "proxy.log" # The proxy needs the *real* upstream credentials in env (not the # sentinel agents see). os.environ already has them from the user's @@ -719,13 +811,17 @@ def _start_proxy( ) logger.info( "Proxy started (pid %d) → http://localhost:%d", - proxy_proc.pid, proxy_port, + proxy_proc.pid, + proxy_port, ) return proxy_proc def _wait_for_proxy( - port: int, proc: subprocess.Popen, log_path: Path, timeout: float = 45.0, + port: int, + proc: subprocess.Popen, + log_path: Path, + timeout: float = 45.0, ) -> None: """Poll *port* until the proxy answers, the subprocess dies, or we time out. 
@@ -755,7 +851,10 @@ def _wait_for_proxy( def _wait_for_mlflow( - port: int, proc: subprocess.Popen, log_path: Path, timeout: float = 30.0, + port: int, + proc: subprocess.Popen, + log_path: Path, + timeout: float = 30.0, ) -> None: """Poll *port* until MLflow answers, the subprocess dies, or we time out. @@ -787,11 +886,11 @@ def stop_services(project_name: str) -> list[str]: return reap_runtime(project_dir(project_name) / ".runtime") - # --------------------------------------------------------------------------- # Memory extraction orchestration # --------------------------------------------------------------------------- + def run_extraction(project_name: str) -> dict: """Two-phase post-session work, both best-effort. @@ -857,7 +956,8 @@ def run_extraction(project_name: str) -> dict: mlflow_port = config.get("mlflow", {}).get("port") mlflow_uri = ( - f"http://localhost:{mlflow_port}" if mlflow_port + f"http://localhost:{mlflow_port}" + if mlflow_port else os.environ.get("MLFLOW_TRACKING_URI") ) try: diff --git a/src/dsagt/skills.py b/src/dsagt/skills.py new file mode 100644 index 0000000..27627d5 --- /dev/null +++ b/src/dsagt/skills.py @@ -0,0 +1,843 @@ +"""Skill discovery — catalog data plane, keyword scorer, and the router facade. + +One module for all the importable skill-discovery logic that is *not* an entry +point (entry points — the MCP tool handlers — stay in +``commands/registry_server.py`` and ``commands/knowledge_server.py``). Three +cohesive concerns, in dependency order: + +1. **Keyword scorer** (:func:`score_skill` / :func:`rank_skills`) — a faithful + reimplementation of the Genesis Skills ``skill-search`` engine, the + zero-dependency fallback ranker used when no embedder / KB is configured. +2. **Catalog data plane** (:class:`SkillsCatalog` + its module functions) — + fetch Agent-Skills repos, index per-source into ``skills_catalog__`` + collections, search, and install into a project. +3. 
**Router facade** (:class:`SkillRouter`) — the thin render layer the MCP + ``search_skills`` tool and the ``dsagt skills`` CLI share. + +Two tiers (see the skill-management plan): + +* **Catalog** — every skill in a configured source repo, indexed into a + per-source ``skills_catalog__`` KB collection. Searchable via + ``search_skills``, but NOT copied locally and NOT loaded into the agent's + context. This is the one job native skill discovery can't do (you can't hold + thousands of skill descriptions in context). +* **Installed** — a chosen skill copied into ``/skills//``. The + agent setup mirrors it into the agent's native skills dir for native + discovery (see ``agents.base.setup_skills``). + +Re-sync is idempotent by dropping the per-source collection directory and +rebuilding it. ``clone_github`` is imported lazily inside :func:`sync_source` +to avoid an import cycle with ``setup_core_kb`` (which calls back into +:func:`sync_source`). + +Genesis Skills: Apache-2.0, gitlab.osti.gov/genesis/genesis-skills +(``skill_search/catalog.py``). See ``design-notes/genesis-skills-comparison.md`` +and ``design-notes/skills-catalog-server-merge.md``. +""" + +from __future__ import annotations + +import json +import logging +import re +import shutil +from pathlib import Path + +from dsagt.registry import ( + CATALOG_COLLECTION_PREFIX, + _parse_frontmatter, + catalog_collection, +) +from dsagt.session import REGISTRY_DIR + +logger = logging.getLogger(__name__) + + +# =========================================================================== +# Keyword scorer — Genesis-derived token-overlap fallback (stdlib only) +# =========================================================================== +# +# A faithful reimplementation (not an import) of the Genesis Skills +# ``skill-search`` engine (``skill_search/catalog.py``: ``_score_skill`` / +# ``rank_skills``). Used by :class:`SkillsCatalog` when no embedder / KB is +# configured: keyword overlap only, deterministic. 
+# +# Scoring (per skill, against a query) — matching Genesis exactly: +# +# * +2 for each query token that also appears in the skill **name** +# * +1 for each query token that also appears in the **description** +# * then **at most one** substring bonus (mutually exclusive, in priority +# order): +6 if the query equals the name, else +4 if it is a substring of +# the name, else +2 if it is a substring of the description +# +# Tokens are casefolded ``\w+`` runs with hyphens split, single-character +# tokens and stopwords dropped. Ties break by name (ascending); below +# ``min_score`` are dropped. + +#: Stopword set — kept identical to Genesis so ranking parity holds. +_STOPWORDS = frozenset( + { + "a", + "an", + "and", + "as", + "at", + "be", + "for", + "from", + "if", + "in", + "into", + "is", + "it", + "of", + "on", + "or", + "please", + "the", + "this", + "to", + "use", + "using", + "with", + } +) + +_TOKEN_RE = re.compile(r"\w+", flags=re.UNICODE) + + +def _tokens(text: str) -> set[str]: + """Casefolded word tokens (hyphens split), single-char + stopwords removed.""" + normalized = (text or "").casefold().replace("-", " ") + return { + t for t in _TOKEN_RE.findall(normalized) if len(t) > 1 and t not in _STOPWORDS + } + + +def score_skill(query: str, name: str, description: str) -> float: + """Token-overlap score of one skill against *query* (0.0 = no match).""" + qtokens = _tokens(query) + normalized_query = (query or "").casefold().strip() + if not qtokens and not normalized_query: + return 0.0 + + score = 2 * len(qtokens & _tokens(name)) + len(qtokens & _tokens(description)) + + if normalized_query: + name_l = (name or "").casefold() + if normalized_query == name_l: + score += 6 + elif normalized_query in name_l: + score += 4 + elif normalized_query in (description or "").casefold(): + score += 2 + return float(score) + + +def rank_skills( + query: str, skills, top_k: int | None = 8, min_score: int = 1 +) -> list[tuple[dict, float]]: + """Rank *skills* (dicts 
with ``name`` + ``description``) against *query*. + + Returns ``[(skill, score), ...]`` for skills scoring at least *min_score*, + sorted by score descending then name ascending, truncated to *top_k* (all + when *top_k* is ``None``). + """ + scored: list[tuple[dict, float]] = [] + for s in skills: + sc = score_skill(query, s.get("name", ""), s.get("description", "")) + if sc >= min_score: + scored.append((s, sc)) + scored.sort(key=lambda kv: (-kv[1], (kv[0].get("name") or ""))) + return scored[:top_k] if top_k is not None else scored + + +# =========================================================================== +# Catalog data plane — sources, sync/index, lookup/install, SkillsCatalog +# =========================================================================== + +#: Default source enabled out of the box (matches dsagt_config.yaml default). +DEFAULT_SOURCE = "scientific" + +#: Curated, named skill sources. ``subdir`` scopes the recursive SKILL.md +#: walk when set (cheaper clone); when omitted the whole repo is cloned and +#: walked, which is robust to category-nested layouts. 
+KNOWN_SOURCES: dict[str, dict] = { + "scientific": { + "url": "https://github.com/K-Dense-AI/scientific-agent-skills", + "branch": "main", + "subdir": "skills", + "description": "K-Dense scientific agent skills — chem/bio/medicine/materials (140+).", + }, + "anthropic": { + "url": "https://github.com/anthropics/skills", + "branch": "main", + "subdir": "skills", + "description": "Official Anthropic skills + document-editing examples.", + }, + "antigravity": { + "url": "https://github.com/sickn33/antigravity-awesome-skills", + "branch": "main", + "subdir": None, + "description": "Antigravity Awesome Skills — 1,500+ cross-platform agentic skills.", + }, + "composio": { + "url": "https://github.com/ComposioHQ/awesome-claude-skills", + "branch": "master", + "subdir": None, + "description": "Composio awesome-claude-skills — workflow skills for many SaaS apps.", + }, + "genesis": { + "url": "https://gitlab.osti.gov/genesis/genesis-skills", + "branch": "main", + "subdir": "skills", + "description": "GENESIS skills (OSTI GitLab) — aggregated agent-skill " + "catalog: HPC (Slurm/PBS, Perlmutter/Aurora/Frontier), HuggingFace, " + "LangChain, OpenAI, Anthropic, plasma-sim, ModCon, and more (70+).", + }, +} + +#: Shared, machine-global cache of cloned source repos (sibling of kb_index/). +SKILL_SOURCES_DIR = REGISTRY_DIR / ".skill_sources" + + +# --------------------------------------------------------------------------- +# Source resolution + slugging +# --------------------------------------------------------------------------- + + +def resolve_source(source: str | dict) -> dict: + """Resolve a known-source name, a git URL (any host), or a full spec dict. + + A full ``http(s)://`` / ``git@`` URL works for any host (GitHub, GitLab, + …); the bare ``owner/repo`` shorthand assumes GitHub. Returns a dict with + at least ``url``; optional ``branch`` / ``subdir``. 
+ """ + if isinstance(source, dict): + if not source.get("url"): + raise ValueError("source dict must include a 'url'") + return source + if source in KNOWN_SOURCES: + return dict(KNOWN_SOURCES[source]) + if source.startswith(("http://", "https://", "git@")) or source.count("/") == 1: + # Full URL or ``owner/repo`` shorthand. + url = ( + source + if "://" in source or source.startswith("git@") + else f"https://github.com/{source}" + ) + return {"url": url, "branch": "main", "subdir": None} + raise ValueError( + f"Unknown skill source '{source}'. Use a known name " + f"({', '.join(sorted(KNOWN_SOURCES))}), a git URL (any host), " + f"or owner/repo (GitHub)." + ) + + +def persist_source_to_config(project_dir: str | Path, spec: dict) -> bool: + """Append a resolved source to ``skills.sources`` in the project config. + + Dedupes by URL. Returns True if the config was updated. No-op (returns + False) if the config file is missing — the catalog is still indexed + either way. Used by both the ``add_skill_source`` MCP tool and the + ``dsagt skills add`` CLI so a CLI-added source is re-synced by a later + config-driven ``dsagt skills sync``. + """ + import yaml + + cfg_path = Path(project_dir) / "dsagt_config.yaml" + if not cfg_path.exists(): + return False + cfg = yaml.safe_load(cfg_path.read_text()) or {} + sources = cfg.setdefault("skills", {}).setdefault("sources", []) + if any(s.get("url") == spec.get("url") for s in sources): + return False + sources.append( + {k: spec[k] for k in ("name", "url", "branch", "subdir") if k in spec} + ) + cfg_path.write_text(yaml.dump(cfg, default_flow_style=False, sort_keys=False)) + return True + + +def _repo_slug(url: str) -> str: + """Stable, collection-name-safe slug from a repo URL (``owner-repo``). + + Host-agnostic: the scheme and host are stripped so github.com, gitlab.*, + etc. all reduce to the ``owner/repo`` path. 
GitHub URLs keep the slug + they had before this generalization, so existing catalog collections do + not need rebuilding. + """ + s = url.rstrip("/") + s = re.sub(r"^https?://", "", s) # drop scheme + s = re.sub(r"^git@", "", s) # ssh form: git@host:owner/repo + s = re.sub(r"\.git$", "", s).lower() + s = re.sub(r"^[^/:]+[/:]", "", s) # drop the host segment + s = re.sub(r"[^a-z0-9]+", "-", s).strip("-") + return s[:40] + + +# --------------------------------------------------------------------------- +# Discovery +# --------------------------------------------------------------------------- + + +def _discover_skill_dirs(root: Path) -> list[Path]: + """Recursively find skill directories (any dir holding a parseable SKILL.md). + + Recursive so both flat (``skills//SKILL.md``) and category-nested + (``skills///SKILL.md``) repo layouts work. A directory + qualifies only if its SKILL.md has YAML frontmatter with a ``name``. + """ + out: list[Path] = [] + if not root.exists(): + return out + for skill_md in sorted(root.rglob("SKILL.md")): + try: + spec = _parse_frontmatter(skill_md) + except ValueError as e: # malformed frontmatter — skip, don't abort + logger.warning("skipping %s: %s", skill_md, e) + continue + if spec.get("name"): + out.append(skill_md.parent) + return out + + +# --------------------------------------------------------------------------- +# Sync (clone + index) +# --------------------------------------------------------------------------- + + +def sync_source( + source: str | dict, + *, + kb=None, + cache_dir: Path = SKILL_SOURCES_DIR, + force: bool = False, +) -> dict: + """Clone *source* into the cache and (re)index its skills into the catalog. + + ``force`` re-clones from scratch. Indexing wipes and rebuilds only this + source's ``skills_catalog__`` collection, so other catalogs and the + installed/bundled ``skills`` collection are untouched. When *kb* is None + the clone still happens (so ``install`` works offline-of-KB) but nothing + is indexed. 
+ """ + spec = resolve_source(source) + slug = _repo_slug(spec["url"]) + dest = cache_dir / slug + + if force and dest.exists(): + shutil.rmtree(dest) + if not dest.exists(): + from dsagt.commands.setup_core_kb import clone_github # lazy: break cycle + + dest.mkdir(parents=True, exist_ok=True) + subdir = spec.get("subdir") + include = [subdir] if subdir else None + clone_github( + spec["url"], dest, branch=spec.get("branch", "main"), include=include + ) + + walk_root = dest / spec["subdir"] if spec.get("subdir") else dest + skill_dirs = _discover_skill_dirs(walk_root) + indexed = index_catalog(skill_dirs, slug, spec["url"], kb) if kb is not None else 0 + if kb is not None and not skill_dirs: + logger.warning( + "source %s yielded no SKILL.md skills under %s", spec["url"], walk_root + ) + + return { + "slug": slug, + "url": spec["url"], + "discovered": len(skill_dirs), + "indexed": indexed, + "cache_dir": str(dest), + } + + +def _catalog_embed_text(spec: dict, fallback_name: str) -> str: + """Text embedded for catalog search: the frontmatter ``name`` + ``description`` + (+ ``tags``) only — *not* the SKILL.md body. + + Discovery is progressive-disclosure level 1: the description is authored to + say *what the skill does and when to use it*, which is exactly the match + target. Embedding the body would dilute that signal, and the embedder + truncates long input anyway (so a full SKILL.md is both incomplete and + misallocated). This also keeps the semantic backend ranking on the same + fields as the keyword fallback. 
+ """ + name = spec.get("name") or fallback_name + desc = spec.get("description") or "" + tags = " ".join(spec.get("tags") or []) + return f"{name}: {desc} {tags}".strip() + + +def index_catalog(skill_dirs: list[Path], slug: str, url: str, kb) -> int: + """Wipe + rebuild source *slug*'s catalog collection from *skill_dirs*.""" + collection = catalog_collection(slug) + coll_dir = Path(kb.index_dir) / collection + if coll_dir.exists(): + shutil.rmtree(coll_dir) + + texts: list[str] = [] + metas: list[dict] = [] + for d in skill_dirs: + skill_md = d / "SKILL.md" + spec = _parse_frontmatter(skill_md) + name = spec.get("name") or d.name + texts.append(_catalog_embed_text(spec, d.name)) + metas.append( + { + "skill_name": name, + "description": (spec.get("description") or "")[:300], + "tags": ",".join(spec.get("tags") or []), + "source": f"catalog:{slug}", + "source_url": url, + "cache_path": str(d), + } + ) + if texts: + kb.add_entries(texts=texts, collection=collection, metadatas=metas) + return len(texts) + + +# --------------------------------------------------------------------------- +# Lookup + install +# --------------------------------------------------------------------------- + + +def find_catalog_skill(name: str, *, cache_dir: Path = SKILL_SOURCES_DIR) -> Path: + """Locate a cached catalog skill dir by name across all synced sources. + + Matches on frontmatter ``name`` first, then directory name. A bare name + must be unique across the machine-global clone cache; when the same name + exists in more than one synced source, pass a **source-qualified** + ``/`` (the slug is the per-source cache dir / catalog-collection + suffix, as shown by ``list_skill_sources`` / ``dsagt skills list + --catalog``) to pick one. Raises on no match or on a still-ambiguous bare + name. + """ + source_filter: str | None = None + skill = name + if "/" in name: + # Skill names never contain '/', so a slash means "/". 
+ source_filter, skill = name.split("/", 1) + + matches: list[Path] = [] + if cache_dir.exists(): + for slug_dir in sorted(p for p in cache_dir.iterdir() if p.is_dir()): + if source_filter is not None and slug_dir.name != source_filter: + continue + for d in _discover_skill_dirs(slug_dir): + spec = _parse_frontmatter(d / "SKILL.md") + if spec.get("name") == skill or d.name == skill: + matches.append(d) + if not matches: + where = f" in source '{source_filter}'" if source_filter else "" + raise LookupError( + f"No catalog skill named '{skill}'{where}. Run 'dsagt skills sync' " + f"or add_skill_source first, then search_skills to find one." + ) + # Collapse matches that point at the same source repo (slug = first path + # part under cache_dir); ambiguity only matters across different sources. + by_source = {p.relative_to(cache_dir).parts[0]: p for p in matches} + if len(by_source) > 1: + sources = sorted(by_source) + raise LookupError( + f"Skill '{skill}' exists in multiple sources ({', '.join(sources)}); " + f"reinstall with a source-qualified name, e.g. '{sources[0]}/{skill}'." + ) + return next(iter(by_source.values())) + + +#: License / attribution files to preserve when installing a catalog skill. +_ATTRIBUTION_GLOBS = ( + "LICENSE*", + "NOTICE*", + "COPYING*", + "COPYRIGHT*", + "ATTRIBUTION*", +) + + +def _capture_attribution(src: Path, dest: Path, cache_dir: Path) -> list[str]: + """Preserve license/attribution when installing a (third-party) catalog skill. + + ``copytree`` already carries files *inside* the skill dir. A skill is often + governed by a per-subtree or repo-root ``LICENSE`` / ``NOTICE`` / + ``ATTRIBUTION`` that lives *outside* its own folder, so this also pulls those + from ancestor dirs up to the source repo root (which ``clone_github`` mirrors + into the cache root even for sparse ``subdir`` clones). Nearest ancestor + wins a filename collision; skill-local files (already in ``dest``) are never + overwritten. 
Always stamps a ``PROVENANCE.txt`` recording the source. + Returns the names of files captured from ancestors. + """ + src, dest, cache_dir = Path(src), Path(dest), Path(cache_dir) + try: + slug = src.relative_to(cache_dir).parts[0] + repo_root = cache_dir / slug + rel = src.relative_to(repo_root) + except ValueError: # src outside the cache (shouldn't happen) — degrade. + slug, repo_root, rel = src.parent.name, src.parent, Path(src.name) + + captured: list[str] = [] + node = src.parent + while True: + for pat in _ATTRIBUTION_GLOBS: + for f in sorted(node.glob(pat)): + if f.is_file() and not (dest / f.name).exists(): + shutil.copy2(f, dest / f.name) + captured.append(f.name) + if node == repo_root or node == node.parent: + break + node = node.parent + + (dest / "PROVENANCE.txt").write_text( + f"Installed by dsagt from catalog source: {slug}\n" + f"Source path in repo: {rel}\n" + ) + return captured + + +def install_into_project( + name: str, project_dir: str | Path, *, cache_dir: Path = SKILL_SOURCES_DIR +) -> dict: + """Copy a catalog skill into ``/skills//`` (with scripts/refs). + + The destination directory is named after the skill's frontmatter ``name`` + (falling back to its source dir name) so it matches the invocable name in + native discovery. Preserves upstream license/attribution (see + :func:`_capture_attribution`). Returns + ``{name, source_dir, dest_dir, action, attribution}``. 
+ """ + src = find_catalog_skill(name, cache_dir=cache_dir) + spec = _parse_frontmatter(src / "SKILL.md") + skill_name = spec.get("name") or src.name + + dest = Path(project_dir) / "skills" / skill_name + action = "updated" if dest.exists() else "added" + if dest.exists(): + shutil.rmtree(dest) + shutil.copytree(src, dest) + attribution = _capture_attribution(src, dest, cache_dir) + + return { + "name": skill_name, + "source_dir": str(src), + "dest_dir": str(dest), + "action": action, + "attribution": attribution, + } + + +# --------------------------------------------------------------------------- +# SkillsCatalog — the catalog data plane (composition over KnowledgeBase) +# --------------------------------------------------------------------------- + + +class SkillsCatalog: + """The external-skill *catalog* behind one object. + + Composition over :class:`~dsagt.knowledge.KnowledgeBase`: it holds a KB + handle (the host server's existing instance → shared embedder, no second + model load) plus the clone-cache dir, and exposes the skill-domain ops — + ``sync`` / ``search`` / ``install`` / ``list_sources``. The skill-specific + behavior (frontmatter-indexed catalog collections, the no-embedder keyword + fallback over the clone cache) lives here; the vector store + embedder are + the shared KB. :class:`SkillRouter` is a thin render/MCP facade over this. + + Catalog tier only: installed/created skills are natively auto-discovered by + every supported agent, so they are never search candidates. ``search`` + covers the not-yet-installed catalog (which native discovery can't see) + plus the Cline-skills-disabled / no-embedder case via the keyword scorer. + """ + + def __init__(self, *, kb=None, cache_dir: Path | None = None): + """Compose a catalog over an existing KB + a clone-cache directory. 
+ + ``kb`` is the host server's :class:`~dsagt.knowledge.KnowledgeBase` (so + the embedder/Chroma are shared, never a second model load); pass ``None`` + for the no-embedder keyword path. ``cache_dir`` overrides the default + machine-global clone cache (:data:`SKILL_SOURCES_DIR`) — handy for tests. + """ + self._kb = kb + self._cache_dir = cache_dir # default resolved lazily + + @property + def has_kb(self) -> bool: + """True when an embedder-backed KB is available (vs the keyword path).""" + return self._kb is not None + + def _resolved_cache_dir(self) -> Path: + """The clone-cache dir — the ``cache_dir`` override or the global default.""" + return Path(self._cache_dir or SKILL_SOURCES_DIR) + + # -- write ops (delegate to the module functions) ------------------------ + + def sync(self, source, *, force: bool = False) -> dict: + """Clone + frontmatter-index a source into its catalog collection.""" + return sync_source( + source, kb=self._kb, cache_dir=self._resolved_cache_dir(), force=force + ) + + def install(self, name: str, project_dir) -> dict: + """Copy a catalog skill into ``/skills//`` (+ attribution).""" + return install_into_project( + name, project_dir, cache_dir=self._resolved_cache_dir() + ) + + # -- backend selection --------------------------------------------------- + + def synced_collections(self) -> list[str]: + """The ``skills_catalog__*`` collections currently indexed in the KB.""" + if self._kb is None: + return [] + return [ + c for c in self._kb.collections if c.startswith(CATALOG_COLLECTION_PREFIX) + ] + + def search(self, query=None, *, top_k: int = 8, tag=None) -> list[dict]: + """Rank catalog candidates → normalized hit dicts (data, not rendered). + + ChromaDB semantic search when a KB exists, else the Genesis-derived + keyword scorer over the clone cache. 
+ """ + if self._kb is not None: + return self._select_kb(query, top_k, tag) + return self._select_keyword(query, top_k, tag) + + def _select_kb(self, query, top_k: int, tag) -> list[dict]: + """Semantic backend: ChromaDB search across every synced catalog + collection, merged + sorted by score into normalized hit dicts. + + Each ``skills_catalog__*`` collection is queried independently (a + missing/corrupt one is skipped, not fatal); when a ``tag`` filter is + set we over-fetch (``top_k * 3``) then post-filter so the tag doesn't + starve the result set. + """ + collections = self.synced_collections() + fetch_k = top_k * 3 if tag else top_k + hits: list[dict] = [] + for coll in collections: + try: + hits.extend( + self._kb.search( + query=query or "skill", collection=coll, top_k=fetch_k + ) + ) + except (FileNotFoundError, KeyError, ValueError): + continue + out = [] + for r in hits: + chunk = r.get("chunk", {}) + meta = chunk.get("metadata", {}) + out.append( + { + "name": meta.get("skill_name", "unknown"), + "summary": (meta.get("description") or chunk.get("text", "") or "")[ + :200 + ], + "source": meta.get("source", ""), + "tags": meta.get("tags", ""), + "score": r.get("score", 0.0), + } + ) + if tag: + out = [h for h in out if tag in (h["tags"] or "")] + out.sort(key=lambda h: h["score"], reverse=True) + return out[:top_k] + + def _candidate_skills(self) -> list[dict]: + """Cached-catalog skills as scorer-ready dicts (no KB needed).""" + cands: list[dict] = [] + cache = self._resolved_cache_dir() + if cache.exists(): + for slug_dir in sorted(p for p in cache.iterdir() if p.is_dir()): + for d in _discover_skill_dirs(slug_dir): + spec = _parse_frontmatter(d / "SKILL.md") + if spec.get("name"): + cands.append( + { + "name": spec["name"], + "description": spec.get("description", ""), + "tags": ",".join(spec.get("tags", []) or []), + "source": f"catalog:{slug_dir.name}", + } + ) + return cands + + def _select_keyword(self, query, top_k: int, tag) -> list[dict]: + 
"""No-embedder backend: rank cached-catalog skills with the Genesis + token-overlap scorer (:func:`rank_skills`) into normalized hit dicts. + + With no ``query`` there's nothing to score, so it returns the first + ``top_k`` candidates (tag-filtered) at score 0.0 — a browse mode rather + than a search. + """ + cands = self._candidate_skills() + if tag: + cands = [c for c in cands if tag in (c["tags"] or "")] + if not query: + picks = cands[:top_k] + return [ + {**c, "summary": (c["description"] or "")[:200], "score": 0.0} + for c in picks + ] + ranked = rank_skills(query, cands, top_k=top_k) + return [ + {**c, "summary": (c["description"] or "")[:200], "score": sc} + for c, sc in ranked + ] + + # -- source view --------------------------------------------------------- + + def list_sources(self) -> list[dict]: + """Known sources + synced flag + indexed count (one source of truth).""" + synced = set(self.synced_collections()) + out = [] + for name, spec in KNOWN_SOURCES.items(): + coll = catalog_collection(_repo_slug(spec["url"])) + is_synced = coll in synced + out.append( + { + "name": name, + "url": spec["url"], + "description": spec.get("description", ""), + "synced": is_synced, + "indexed": self._indexed_count(coll) if is_synced else 0, + } + ) + return out + + def _indexed_count(self, collection: str) -> int: + """Number of skills indexed in a catalog *collection* (0 if absent/no KB). + + Reads the collection's persisted ``chroma_ids.json`` directly rather than + querying the store — cheap, and works without loading the embedder. 
+ """ + if self._kb is None: + return 0 + ids = Path(self._kb.index_dir) / collection / "chroma_ids.json" + try: + return len(json.loads(ids.read_text())) + except (FileNotFoundError, ValueError, OSError): + return 0 + + +# =========================================================================== +# SkillRouter — the thin render/MCP facade over the catalog +# =========================================================================== +# +# ``SkillRouter`` adds only the presentation concerns that the MCP handlers and +# the CLI share: rendering a ranked hit list into the ``search_skills`` string, +# the empty-result message, and the exact-``skill_name`` lookup (which needs the +# installed-skill registry, not the catalog). +# +# Construct it from the same inputs at every call site (MCP ``search_skills``, +# CLI ``skills search/list``) so policy can't diverge between them — or hand it +# a prebuilt :class:`SkillsCatalog` via ``catalog=`` so a server that already +# owns a shared KB reuses one catalog instance. +# +# Skill *materialization* (mirroring installed skills into each agent's native +# skills directory) lives in the agent layer (``AgentSetup.setup_skills``), not +# here: every supported agent (claude/codex/goose/cline/roo) natively +# auto-discovers ``SKILL.md`` folders, so there is no agent-facing disclosure +# tier for the router to own. ``search_skills`` exists for the *catalog* tier +# (skills not yet installed, which native discovery can't see) plus the +# no-embedder keyword fallback. 
+ + +def _where_label(source: str) -> str: + """Human tag for a hit's origin, matching the legacy search output.""" + if source in ("bundled", "registered", "installed"): + return " [installed]" + if source.startswith("catalog:"): + return " [catalog · install_skill to add]" + return "" + + +class SkillRouter: + """Renders catalog discovery for the MCP ``search_skills`` tool + the CLI.""" + + def __init__(self, *, kb=None, skill_registry=None, cache_dir=None, catalog=None): + """Wire the router to a catalog + (optional) installed-skill registry. + + Pass a prebuilt ``catalog`` (a :class:`SkillsCatalog`) to share one + instance with the server; otherwise one is constructed from ``kb`` / + ``cache_dir``. ``skill_registry`` is only consulted for the exact + ``skill_name`` lookup in :meth:`search` (installed skills live there, + not in the catalog), so it may be ``None`` for catalog-only callers. + """ + self._catalog = ( + catalog + if catalog is not None + else SkillsCatalog(kb=kb, cache_dir=cache_dir) + ) + self._reg = skill_registry + + # -- rendering ----------------------------------------------------------- + + def _render_search(self, hits: list[dict]) -> str: + """Format ranked catalog hits into the ``search_skills`` markdown list. + + Each line carries the skill name, an origin tag (:func:`_where_label`), + the score, and the summary — the human-facing string the MCP tool and + CLI both return. + """ + lines = [] + for h in hits: + lines.append( + f"- **{h['name']}**{_where_label(h['source'])} " + f"(score: {h['score']:.2f})\n {h['summary']}" + ) + return f"Found {len(hits)} skill(s):\n\n" + "\n\n".join(lines) + + def _empty_message(self) -> str: + """The no-results string, tailored to *why* nothing matched. + + When a KB exists but no catalog source is synced yet, point the agent + at ``list_skill_sources`` / ``add_skill_source`` (the likely cause); + otherwise it's a genuine no-match for the query. 
+ """ + if not self._catalog.synced_collections() and self._catalog.has_kb: + return ( + "No catalog skills found. No external skill catalog is synced " + "yet — search covers the catalog (skills you can install), since " + "installed skills are already natively discoverable. Call " + "list_skill_sources() to see available sources, then " + "add_skill_source(source=...) to sync one before searching again." + ) + return "No catalog skills found matching the query." + + # -- public API ---------------------------------------------------------- + + def search(self, query=None, *, top_k: int = 8, tag=None, skill_name=None) -> str: + """Stage B. Select + render. Stateless — no session/exposure tracking.""" + if skill_name: + import yaml + + if self._reg is None: + return f"No skill named '{skill_name}'." + spec = self._reg.get_skill(skill_name) + if spec: + return f"Found skill '{skill_name}':\n\n" + yaml.dump( + spec, default_flow_style=False, sort_keys=False + ) + return f"No skill named '{skill_name}'." 
+
+        hits = self._catalog.search(query, top_k=top_k, tag=tag)
+        if not hits:
+            return self._empty_message()
+        return self._render_search(hits)
+
+    def sync(self, source, *, force: bool = False) -> dict:
+        """Stage A passthrough — see :meth:`SkillsCatalog.sync`."""
+        return self._catalog.sync(source, force=force)
+
+    def install(self, name: str, project_dir) -> dict:
+        """Stage C passthrough — see :meth:`SkillsCatalog.install`."""
+        return self._catalog.install(name, project_dir)
+
+    def list_sources(self) -> list[dict]:
+        """Stage A view passthrough — see :meth:`SkillsCatalog.list_sources`."""
+        return self._catalog.list_sources()
diff --git a/src/dsagt/skills/datacard-generator/SKILL.md b/src/dsagt/skills/datacard-generator/SKILL.md
deleted file mode 100644
index 3a2c06a..0000000
--- a/src/dsagt/skills/datacard-generator/SKILL.md
+++ /dev/null
@@ -1,88 +0,0 @@
----
-name: generating-datacards
-description: "Generates MODCON Datacard v1 documentation for scientific datasets by introspecting a directory and filling a structured template. Use when the user asks to create a datacard, dataset card, dataset documentation, dataset metadata, document a dataset, or prepare a dataset for sharing. Supports three readiness levels: Level 1 (Discoverable), Level 2 (Interoperable & Reusable), Level 3 (Understandable & Trustworthy)."
----
-
-# Generating Datacards
-
-Introspect a dataset directory to auto-populate a MODCON Datacard v1, then prompt the user for anything that couldn't be inferred.
-
-## Workflow
-
-Copy this checklist and check off steps as you go:
-
-```
-Progress:
-- [ ] 1. Gather context (path + level)
-- [ ] 2. Introspect dataset directory
-- [ ] 3. Auto-fill the data card
-- [ ] 4. Prompt for missing fields
-- [ ] 5. Generate output file
-- [ ] 6. Review summary
-```
-
-### 1. Gather Context
-
-Ask the user:
-- **Dataset path** — directory to document
-- **Readiness level** — choose one:
-  - **Level 1 — Discoverable**: identification, description, file structure, keywords
-  - **Level 2 — Interoperable & Reusable**: Level 1 + contacts, access, license, provenance, authors, related resources
-  - **Level 3 — Understandable & Trustworthy**: Level 2 + schema, quality, integrity, AI/ML, variable docs
-
-### 2. Introspect the Dataset Directory
-
-See [reference/introspection-commands.md](reference/introspection-commands.md) for exact commands to run for file structure, schema extraction, and existing metadata files.
-
-### 3. Auto-Fill the Data Card
-
-Read `references/datacard_template_v1.md`. Populate fields using this decision table:
-
-| Field | Auto-fill if… | Otherwise |
-|-------|--------------|-----------|
-| `datacard_creation.created_date` | Always | — |
-| `datacard_creation.creation_method` | Always → `"hybrid"` | — |
-| `dataset_info.data_formats` | File extensions found | Prompt |
-| `dataset_info.modalities` | File types recognized | Prompt |
-| `dataset_info.features` | Column headers extracted | Prompt at Level 2+ |
-| `dataset_info.splits` | train/test/val dirs found | Leave empty |
-| `dataset_counts` | File/record counts available | Prompt |
-| `dataset_storage` | `du` output available | Prompt |
-| `title` / `name` | README or metadata file found | Prompt |
-| Description | README found | Prompt |
-| Keywords | Metadata files found | Prompt at Level 1+ |
-| License | LICENSE file found | Prompt at Level 2+ |
-| Authors / contributors | CITATION.cff found | Prompt at Level 2+ |
-| Citation | .bib or CITATION.cff found | Prompt at Level 2+ |
-
-Cross-check these fields for consistency between YAML frontmatter and markdown body: `title`/`name`, `authors`/`contributors`, `license`, `dataset_info.data_formats`, key dates.
-
-### 4. Prompt for Missing Fields
-
-Present auto-discovered values for confirmation, then ask for unfilled fields. Ask **3–5 at a time**. See [reference/field-prompts.md](reference/field-prompts.md) for the full per-level field list.
-
-### 5. Generate the Data Card
-
-- **Filename**: `modcon_datacard_.md`
-- **Location**: save inside `/` by default; ask if the user prefers elsewhere
-- Replace all `[!TODO]`, ``, ``, `` — none should appear in output
-- Set `dataset_readiness.level` to chosen level
-- Unprovided fields: `"N/A"` in YAML, brief "Not provided." in markdown body
-
-### 6. Review Summary
-
-Present:
-- ✅ Auto-populated fields
-- ✏️ User-provided fields
-- ⬜ Empty/N/A fields (with reason)
-- 💡 Suggestions for improvement
-
-Ask if the user wants to revise any sections.
-
-## References
-
-- **Template**: `references/datacard_template_v1.md`
-- **Datasheet companion**: `references/datasheet_template.md` — consult if user mentions "datasheet" questions
-- **Introspection commands**: [reference/introspection-commands.md](reference/introspection-commands.md)
-- **Per-level field prompts**: [reference/field-prompts.md](reference/field-prompts.md)
-- **Lookup tables** (OSTI codes, sensitivity tiers): [reference/lookup-tables.md](reference/lookup-tables.md)
\ No newline at end of file
diff --git a/src/dsagt/skills/datacard-generator/reference/datacard_template_v1.md b/src/dsagt/skills/datacard-generator/reference/datacard_template_v1.md
deleted file mode 100644
index 3982f47..0000000
--- a/src/dsagt/skills/datacard-generator/reference/datacard_template_v1.md
+++ /dev/null
@@ -1,498 +0,0 @@
----
-#YAML Metadata to support datacard metadata: (Instructions follow)
-#----------------Datacard Information-- This section provides information about this datacard and its contents
-datacard_info:
-  filename: modcon_datacard_${SNAKE_CASE_DATASET_NAME}.md # Follow this filename convention and enter the filename here
-  id: # PID assigned to this datacard if used.
- type: #DOI | HANDLE | URL | LOCAL | OTHER - value: - datacard_access: #describe any access restrictions for this datacard, or "None - publicly accessible" - datacard_creation: - created_date: "${CREATED_DATE_YYYY_MM_DD}" # required - update_date: "${UPDATE_DATE_YYYY_MM_DD}" # add updates as needed - creation_method: #manual | automated | hybrid - created_by: [] # Contacts for this datacard. Include at least one (datacard_contact_person) or organizational contact (datacard_contact_organization) with email. - - datacard_contact_person: - name: - orcid: - email: - affiliation: - - datacard_contact_organization: - organization_name: - ror_id: - email: - datacard_template_version: 1.0 - language: en - -#-------------Data Identification-- Required. This section contains the information needed to link this datacard to the datasets it describes: -data_identifiers: - name: "${DATASET_NAME}" #provide a single, human readable name even if this datacard describes a collection of data. Ensure alignment with the datacard filename. - dataset_id: # Identify the dataset(s) described by this datacard. At least 1 identifier is required. Persistent, unique identifiers (PIDs) are preferred. - type: # DOI | HANDLE | URL | LOCAL | OTHER - value: - -#----- Level 1: Discoverable----- -#[Required Level1]: -# Identification -title: "${DATASET_NAME}" -id: # ID of the datacard if known. Otherwise this will be provided by the system. -project: # required - project name or identifier that the dataset is associated with - -#--- Data files/objects in dataset - -# for published datasets, include one or more dois. For unpublished datasets, provide a landing page url if available, or other identifier such as a local identifier or handle. -identifiers: [] # [required level1]. At least 1 identifier is required. 
- - type: # DOI | HANDLE | URL | LOCAL | OTHER - value: - - -#[Optional Level1]: -#--- Dataset Structure / Characteristics--- -dataset_info: - modalities: [] # e.g., ["tabular","image","time-series"] - data_formats: [] - features: [] # list of variables or features in the dataset. - splits: [] # e.g., ["train", "test", "validation"] - dataobject_type: dataset #Type of digital object (dataset, aiagent, eval, framework, model, etc.) - dataset_type: #See OSTI DOE Data Explorer Types - -#**OSTI DOE Data Explorer Dataset Types**: -#| Code | Type | Description | -#|------|------|-------------| -#| `GD` | Genome/Genetic Data | DNA/RNA sequences, genetic markers, genomic annotations | -#| `IM` | Image | Photographs, scans, visualizations, microscopy | -#| `ND` | Numeric Data | Measurements, time series, tabular data, sensor readings | -#| `SM` | Specialized Mix | Multiple data types combined | -#| `FP` | Figure/Plot | Charts, graphs, plots as primary deliverable | -#| `I` | Interactive Resource | Web applications, interactive visualizations, dashboards | -#| `MM` | Multimedia | Audio, video, combined media | -#| `MD` | Model | Computational models, simulations, trained ML models | -#| `AS` | Automated Software | Scripts, analysis pipelines, workflows | -#| `IP` | Instrumentation and Protocols | Experimental protocols, instrument specifications | -#| `IG` | Integrated Genomic Resources | Combined genomic databases and tools | - -dataset_readiness: - level: # 1 | 2 | 3 - evaluated_against: "Genesis Dataset Readiness Model v1.0" - evaluated_at: - evaluated_by: - confidence: # high | medium | low - - -#--- Lifecycle / Governance--- -release_status: # draft | approved | published | deprecated - -review_process: - review_purpose: # a short description of why is the data is being reviewed (e.g., "release", "partner sharing", "IRB",... 
) - review_status: # "submitted" | "pending" | "approved" | "declined" | - review_institution: - name: # name of institution conducting the review - ror_id: # ROR ID of institution conducting the review - review_comments: - - -#------ Level 2: Interoperable and Reusable------- - -#[Required Level2]: -#--- Contacts -contact_point: - entity: - type: person # person - person: - given_name: - family_name: - orcid: - email: - affiliation: - entity: - type: organization # organization - organization: - name: - ror_id: - -additional_contacts: [] - -#--- Access, Permissions and Policy -access_policy: - # sensitivity_tier classification: - # tier0_openScience - # tier1_controlledResearch - # tier2_proprietary - # tier3_sensitive_exportControlled - # tier4_regulated_personal - # tier5_classified - sensitivity_tier: - access_level: # NOTE: currently only accepting "open", open | restricted | controlled - required NOTE: currently only accepting "open" - authorization: # none | accountRequired | userAgreement | dataUseAgreement | sponsorApproval | exportControlReview | irbApproval | other - policy_url: - policy_text: - -#[IF Applicable]: -cui_markings: # see CUI documentation -distribution_statement: #examples are: "Distribution A - Approved for public release"; "Distribution D - DoD only"; "Internal DOE use only" -handling_instructions: #examples are "No foreign dissemination" or "Export-controlled handling required" - -security_marking: - classification: # unclassified | cui | classified | confidential | Secret | TS | other - cui_marking: - distribution_statement: - handling_instructions: - declassification: - review_date: - authority: - -#--- Rights & License--- -license: - spdx_id: # SPDX ID is a standardized license identifier (e.g., CC-BY-4.0, MIT, Apache-2.0) - name: - link: - -additional_licenses: [] - -#[Optional Level2]: -# Domain and Purpose -categorization: - science_domain: "${SCIENCE_DOMAIN}" # required - high level scientific domain or discipline that the dataset 
is associated with, e.g., physics, chemistry, materials science, etc. - modalities: [] # e.g., ["tabular","image","time-series"] - task_category: [] # ML classification - task_subcategory: [] # ML subcategory classification - -# Resources -originating_research_organization: - entity: - type: organization # organization - organization: - name: - ror_id: - -facilities: [] -# list of facilities (as entities, type: organization) where the dataset was collected, processed, stored, or accessed. Examples include research labs, data centers, cloud platforms, supercomputing facilities, etc. - -fundings: -# list of funding sources that supported the creation, maintenance, or sharing of the dataset. Examples include grants, contracts, in-kind support, etc. - - funding: - award_number: - program: - funder: - entity: - type: organization # organization - organization: - name: - ror_id: - -#--- Dataset Responsibility & Credit -dataset_authors: - - entity: # at least 1 is required - type: person # person - person: - given_name: - family_name: - orcid: - email: - affiliation: - entity: - type: organization # organization - organization: - name: - ror_id: - role: creator - -dataset_contributors: [] # Acknowlege other contributors beyond dataset authors and include them as entities of type person or organization. 
- -#--- Stewardship & Maintenance--- -stewardship: - level: # projectManaged | repositoryManaged - -maintenance: - update_frequency: # none | ad_hoc | monthly | quarterly | annually | other - -# Provenance -#--- Dataset Provenance--- -dataset_provenance: - was_generated_by: - source_data: - processing_steps: - instrumentation: - simulation_details: - -#--- Related Resources (Optional but recommended) -related_resources: - related_datasets: [] - # - dataset_name: - # identifiers: - # - type: DOI | URL | LOCAL | OTHER - # value: - # relationship: isDerivedFrom | isBasedOn | isPartOf | hasPart | references | other - publications: - dois: [] - arxiv: [] - urls: [] - software: - # - name: - # version: - # identifiers: - # - type: DOI | URL | LOCAL | OTHER - # value: - # relationship: isDerivedFrom | isBasedOn | isPartOf | hasPart | references | other - aimodels: [] - # - name: #if available, provide the name of the AI model - # version: - # date accessed: - # identifiers: - # - type: DOI | URL | LOCAL | OTHER - # value: - # relationship: isDerivedFrom | isBasedOn | isPartOf | hasPart | references | other -#-- Methods -#--- Dates -dates: - data_collection_start: - data_collection_end: - issued: - modified: - - -#------ Level 3: Understandable & Trustworthy------- - -#[If Applicable Level3]: -#--- Semantic / Schema -semantic_layer: - # Describe formal schema / ontology alignment if available - # Leave blank if none exists (acceptable at Level 1 or Level 2) - schema_url: - ontology_alignment: [] - semantic_context: [] - controlled_vocabularies: [] - -#--- Data Quality -data_quality: - completeness: - known_issues: - validation_methods: - noise_characteristics: - uncertainty_notes: - -#--- Integrity / Fixity -integrity: - checksum_available: # true | false - checksum_type: - fixity_policy: - versioning_strategy: - -#--- AI / ML Usage -ai_usage: - ai_ready: # true | false | conditional - training_use_allowed: # true | false | conditional - inference_use_allowed: # true | 
false | conditional - restrictions: - bias_risks: - safety_considerations: - human_review_required: # true | false - -#---- System Level Info:---- - -#--- Dataset Scale -dataset_counts: - value: - category: # samples | files | records | timesteps | other - unit: - -dataset_storage: - compressed_bytes: - unpacked_bytes: - -#--- Repository-managed access endpoints--- -repository_access: - populated_by_repository: true - distributions: [] - # Example: - # - distribution: - # access_url: {} - # download_url: - # format: - # byte_size: - # checksum: - # type: [] - # items: [] - data_services: [] - # Example: - # - data_service: - # type: api | s3 | other - # additional_properties: false - # properties: - # endpoint: {} - # service_type: {} - # auth_hint: {} ---- - -### Instructions - - - - - - - - - - - -# Datacard for ${DATASET_NAME} -**Last Updated**: [!TODO] - -**Dataset Readiness Level:** - -### Machine Usability Snapshot -| Aspect | Status | -|--------|--------| -| AI Ready | Yes/No/Conditional| -| License Clarity | Yes/No| -| Machine Access | Yes/No| -| Checksum / Fixity | Yes/No| -| Semantic Context | Yes/No| - - -# ---- Level 1: Discoverable ---- ---- -## Identification - - -### Files & Structure -[!TODO] [required level1] - ---- -## Description - -### Dataset Description [required] -[!TODO] - -### Keywords [strongly recommended] -[!TODO] - -### Citation -[!TODO] - ---- - -# ---- Level 2: Interoperable and Reusable ---- - -### Sharing & Access -[!TODO] - -### Security / Marking Considerations -[!TODO] - ---- -### Access and Permissions -[!TODO] - -### Access conditions -[!TODO] - -### Release review process -[!TODO] - ---- -## Context - -### Domain and Purpose -[!TODO] - -### Resources used, including funding and facilities, to create the dataset -[!TODO] - ---- -## Provenance - -### Developed by -[!TODO] - -### Contributed by -[!TODO] - ---- -## Related Resources - -### Related datasets, standards, metadata, and ontologies -[!TODO] - -### Related publications 
-[!TODO] - -### Related software -[!TODO] - -### Related ai model -[!TODO] - ---- -## Methods - -### Dataset generation, collection, and procedures -[!TODO] - -### Maintenance & Updates -[!TODO] - - -# ---- Level 3: Understandable & Trustworthy ---- - -### Data Characteristics -[!TODO] - -### Data Quality & Limitations -[!TODO] - -### Related Schemas or Ontologies -[!TODO] - -### List of variable name(s), description(s), unit(s), and value labels for each variable in the dataset/file. -[!TODO] - -For example: -| Variable Name | Description | Unit | Value Labels | -|---------------|---------------------------|-----------|-----------------------------| -| temp | Temperature measurement | Celsius | N/A | -| status | Operational status | N/A | 0 = Off, 1 = On | - -### Codes used for missing data -[!TODO] - -For example: -| Code | Description | -|------|---------------------------| -| -999 | Data not collected | -| -888 | Measurement error | - -### Specialized formats or other abbreviations used -[!TODO] - -### Example of the contents -[!TODO] - -### Data Processing -[!TODO] - -### Software used to preprocess/ clean/ label the data -[!TODO] - -## Integrity & Versioning -[!TODO] - -## Semantic / Schema Information -[!TODO] - -## AI / Machine Learning Considerations -[!TODO] - ---- - -## Additional Information -[!TODO] - ---- \ No newline at end of file diff --git a/src/dsagt/skills/datacard-generator/reference/datasheet_template.md b/src/dsagt/skills/datacard-generator/reference/datasheet_template.md deleted file mode 100644 index c69afe9..0000000 --- a/src/dsagt/skills/datacard-generator/reference/datasheet_template.md +++ /dev/null @@ -1,320 +0,0 @@ ---- -language: -- en # ISO language tag -datasheet_version: 0.1.0 # version of the GENESIS datasheet template -name: Datasheet for {DATASET_NAME} # name of the datasheet -tags: -- project:genesis # include on all GENESIS project datasets -- science:lightsource # what kind of science is this for (e.g., materials, biology, 
lightsource, fusion, climate, etc.) -- keywords: [] # keywords associated with the dataset -- risk:general # indicates level of risk review {general, reviewed, restricted} ---- -# Datasheets for Datasets: Contextualizing Scientific Data - -Documenting a scientific dataset in a datasheet requires careful consideration of scope and resolution. -Scientific data often exists as part of a larger ecosystem of the scientific record: a single experiment may produce multiple datasets, or a dataset may be split into subsets for different analyses. -Deciding at what level to create a datasheet is not always straightforward—too granular, and you risk an unmanageable number of datasheets; too broad, and important details may be lost. -Consider how the dataset will be used and cited, who the intended consumers are (e.g., researchers, search engines, AI systems), and what level of documentation will provide meaningful transparency without unnecessary duplication. -Keep in mind that datasheets are not necessarily one-to-one with dataset DOIs: a datasheet may describe a single dataset, a collection of related datasets, or a versioned release. -Before you begin, reflect on these relationships and choose a resolution that balances clarity, usability, and sustainability. -This is not a prescriptive process; it deserves deliberate consideration. - -## Before you begin - -## Datasheet Structure - The datasheet is organized into the following sections: - - Motivation - - Composition - - Collection - - Preprocessing/Cleaning/Labeling - - Uses - - Distribution - - Maintenance - - Human Subject Research (if applicable) - -## Think about your dataset's modalities -Before filling out the datasheet, consider the different data modalities present in your dataset (e.g., sensor time series, microscopy images, simulation results, logbook text). -Each modality may have unique characteristics, collection methods, and intended uses that should be documented separately. 
-When addressing questions in the datasheet, specify which modality you are referring to, especially if the dataset contains multiple modalities. -This will help ensure clarity and provide a comprehensive understanding of the dataset's structure and content. - -## Think about your dataset's intended uses -Before filling out the datasheet, reflect on the intended uses of your dataset. -Consider the scientific tasks it is designed to support, the research gaps it aims to address, and the theories it seeks to test. -Understanding the intended applications will help you provide more accurate and relevant information in the datasheet, particularly in sections related to motivation, uses, and limitations. -This reflection will also guide you in identifying any potential risks or ethical considerations associated with the dataset's use. - -## Consider the scope and resolution of your datasheet -Before filling out the datasheet, carefully consider the scope and resolution at which you will document your dataset. -Decide whether the datasheet will cover a single dataset, a collection of related datasets, or a versioned release. -Think about the level of detail that will be most useful for your intended audience, balancing clarity and usability with the need to avoid unnecessary duplication. -This consideration will help ensure that the datasheet effectively communicates the essential information about your dataset while remaining manageable and relevant. - -## Consider the those filling out the datasheet -Before filling out the datasheet, think about who will be completing it. -Ensure that the individuals responsible for authoring the datasheet have a comprehensive understanding of the dataset, including its creation, composition, and intended uses. -Encourage collaboration among team members who contributed to the dataset to provide accurate and detailed responses. 
-This collaborative approach will help ensure that the datasheet reflects a well-rounded perspective and captures all relevant information about the dataset. - -### Guidelines -1. Recommend that there be a citable ``scientific record" that contains the Datasheet and Dataset and other elements of the scientific record, including facilities, authors, publications, software, datasets, schemas, and datasheets. -2. Recommend assigning a separate DOI to the Datasheet for independent versioning and authorship tracking, apart from related manuscripts and datasets. -3. Recommend providing a version of this datasheet with information that is suitable for the widest possible distribution and citation allowed under applicable laws. -If applicable and to the extent permissible in a document for such wide distribution, the datasheet should cite one or more controlled (at various classification levels, proprietary, or export controlled) versions of itself, and individual responses should indicate that the controlled version has different information. - Such different ``versions" of the document should have their own unique persistent identifiers. -4. Recommend a broad reading of the text of the questions, and provide partial answers if some parts cannot be provided. - For example, if a grant from a private foundation lacks an associated grant identifier, the entity's name and a description of the grant should be provided. - If providing the name of the grantor or the purpose of creation is not permissible at the wide distribution level, that portion can be left unanswered with an explicit marking of ``not available." -5. item If a combination of responses is not permitted (e.g., indicating the title and the motivation, or the motivation and the funding agency simultaneously may not be permitted) at the wide distribution level, then the topics describing the dataset and its allowed uses are to be given preference over the provenance. -6. 
In situations where there might be some uncertainty, make sure the datasheet is reviewed by a derivative classifier. - -**[[REMOVE "Datasheets for Datasets: Contextualizing Scientific Data" section prior to sharing the Datasheet]]** - -# Datasheet for ${DATASET_NAME} - -## Dataset Metadata - -### Dataset citation: - -Include DOI or URL. Use information that matches a data catalog entry if applicable. Citations in [bibtex format](https://www.bibtex.com/g/bibtex-format/). Please include either a `doi` or `url` field in the citation. - -### Name and contact information for the datasheet author(s): - -A person or group that was primarily responsible for authoring the datasheet. Provide the Name, [ORCID](https://orcid.org/), affiliation (ROR ID) and email address of the person or group responsible for the datasheet. - -## Motivation - -### What was the primary purpose for creating this dataset (e.g., to support a specific scientific task, address a research gap, or test a theory)? -Describe the intended use and, if applicable, any assumptions made about the underlying system (e.g., proxy variables, stationarity, symmetry, theoretical models). - -### Was the dataset created for or in the context of an AI application? -Please provide a description of the AI application, including the type of model(s) that will be trained or evaluated using this dataset. - -### Who created the dataset (e.g., which team, research group), what are their institutional affiliations, and on behalf of which research entity (e.g., company, institution, organization) was the dataset created? - -Provide the [ROR ID](https://ror.org/) for research entities if available. - -### What resources were used? For example, what funding, facility time, computing resources, and datasets were used to create the dataset? Provide answers in the sections below. - -#### Facilities: - -list the facilities. Provide the facility name and [ROR ID](https://ror.org/) if available. 
- -#### Funding: - -If there are associated grants, please provide the name of the grantor(s) (e.g., federal agency and program office name) and [ROR ID](https://ror.org/) , and the grant name(s) and number(s) (e.g., PAMS Award number and/or lab contract number). List the solicitation numbers and links, or citations. Citations in [bibtex format](https://www.bibtex.com/g/bibtex-format/). Please include either a `doi` or `url` field in the citation. - -#### Other Supporting Entities: - -If available, provide a [ROR ID](https://ror.org/), link, or citation for other supporting entities. Citations in [bibtex format](https://www.bibtex.com/g/bibtex-format/). Please include either a `doi` or `url` field in the citation. - -### Was the dataset created as part of a larger initiative? -If so, provide the name for the initiative, e.g., SciDAC, ModelTeam, Genesis. - -### Any other comments regarding the motivation for creating the dataset? (optional) -Provide any additional information about the motivation for creating the dataset. - -## Composition - -### Identify the data modalities contained in the dataset (e.g., sensor time series, microscopy images, simulation results, logbook text). -If multiple modalities exist, list each one. - -### What do the instances within each modality represent (e.g., detector hits, environmental monitoring readings, simulation runs, documents)? Are there multiple types of instances (e.g., beam pulses and cavity field maps; users and their proposals)? -Provide a description for each modality. - -### How many instances are there in total (of each type, if appropriate)? -Provide the number of instances for each modality. - -### Does the dataset include all possible instances for each modality or a subset (not necessarily random)? -If a subset, what is the larger collection (e.g., all experiments at a user facility during a given period)? Is the subset intended to be representative (e.g., energy ranges, geographic sites)? 
Describe the criteria and validation methods used to verify representativeness. -If it is not representative, please describe why not (e.g., to capture diversity in experimental conditions; due to data access restrictions; because data was intentionally excluded, and why). - -### What data does each instance within the dataset consist of (for each modality, if appropriate)? "Raw" data (e.g., unprocessed text or images) or features? -In either case, please provide a description. - -### Is any information missing from individual instances? -If so, please provide a description explaining why this information is missing (e.g., because it was unavailable). This does not include intentionally removed information but may include, e.g., redacted text or missing sensor data. - -### Are relationships between individual instances made explicit (e.g., parent-child relationships in a tree structure, protein–protein interaction networks)? -If so, please describe how these relationships are made explicit. - -### Have the data been split, or are there recommended data splits (e.g., training, development/validation, testing)? -If so, please provide a description of these splits, explaining the rationale behind them. - -### Are there any errors, sources of noise, or redundancies in the dataset? -If so, please provide a description. - -### Are external resources (e.g., file stores, calibration information, software packages, code, websites, other datasets) needed to interpret or reuse these data? -If not, respond "No." -Otherwise, for each external resource, provide a description, and if applicable, its access point (e.g., DOIs, link, [bibtex format](https://www.bibtex.com/g/bibtex-format/) citation). Please include either a `doi` or `url` field in the citation. - -#### If there are external resources, for each external resource, are there any restrictions (e.g., licenses, fees)? 
- -### Does the dataset contain data that might be considered Controlled Unclassified Information (CUI) or confidential (e.g., Personally Identifiable Information (PII), Confidential Business Information (CBI), pre-publication data, data protected by legal privilege, or data that includes the content of individuals' non-public communications)? -Refer to [Open Digital Rights Language (ODRL) Vocabulary & Expression](https://www.w3.org/TR/odrl-vocab/) and the [Information Model](https://www.w3.org/TR/odrl-model/) for a standard vocabulary to represent statements about the usage of data. - -### Does the dataset contain classified or export-controlled information? -If so, please provide an explanation. - -### Any other comments regarding the composition of the dataset? (optional) -Provide any additional information about the composition of the dataset. - -## Collection - -### How was the data for each instance obtained or generated (e.g., raw experimental measurements from user facilities, processed/physics-ready experimental data, outputs from computational simulations, or data derived from prior datasets)? -If the data were simulated or derived, provide details on how accuracy and validity were assessed. - -### Did the dataset incorporate data obtained through unconventional means (e.g., online crowdsourcing, volunteer-based citizen science)? -If yes, explain how these contributions were managed, including documentation, acknowledgment, and quality assurance measures. - -### For each instrument or facility used to generate and collect the data, what mechanisms or procedures were used to collect the data (e.g., hardware apparatuses or sensors, manual human curation, software programs, software APIs)? -Provide a description for each instrument or facility used, mechanisms and procedures used to collect the data. Provide relevant details on how these mechanisms or procedures were validated. 
- -### If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)? -Provide a description of the sampling strategy and any methods used to assess sample representativeness. - -### Over what timeframe was the data collected or generated? Does this timeframe align with when the underlying phenomena or events occurred (e.g., recent simulation of historical accelerator configurations)? If not, describe the timeframe of the original events or conditions represented by the data. -Provide single date, range, or approximate date; suggested format YYYY-MM-DD, and, if appropriate, describe alignment with underlying phenomena or events - -### Any other comments regarding the dataset collection process? (optional) -Provide any additional information about the collection process for the dataset. - -## Preprocessing/cleaning/labeling - -### To create the final dataset, was any preprocessing/ cleaning/ labeling of raw data done? - -Preprocessing, cleaning, and labeling can include simple pipelines (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT feature extraction, removal of instances, processing of missing values), or more complex workflows including noise reduction and filtering, calibration, reconstruction and event building, simulation post-processing, meta-data enrichment (e.g., adding uncertainty estimates). -If so, please provide a description of the workflow. - -### Was the "raw" data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)? -If so, please provide a [bibtex format](https://www.bibtex.com/g/bibtex-format/) citation, link, or other access point to the "raw" data. Please include either a `doi` or `url` field in the citation. - -### Is the software that was used to preprocess/ clean/ label the data available? 
-If so, please provide a [bibtex format](https://www.bibtex.com/g/bibtex-format/) citation, PID, link, or other access point, along with descriptions of any required packages or libraries to run the scripts. Please include either a `doi` or `url` field in the citation. - -### Any other comments regarding the preprocessing, cleaning, and labeling workflow for the dataset? (optional) -Provide any additional information about the preprocessing, cleaning, and labeling workflow for the dataset. - -## Uses - -### Have AI-Readiness assessment tools been used? -If so, name the tool and describe the results. - -### To what extent are the data prepared for AI? -Describe any steps taken to make the data suitable for AI applications, such as formatting, normalization, or feature extraction. - -### If the data has been used for AI, provide a description. -Highlight how the data was used for AI, including links to code or DOIs illustrating its use. - -### What (other) tasks could the dataset be used for? -Provide examples of tasks that the dataset could be used for, beyond its original purpose. - -### Are there limitations or characteristics, arising from the collection, preprocessing, or labeling, that may impact future uses of the dataset (e.g., incomplete coverage of parameter space, data quality variations, restrictions due to CUI or export control)? -If so, please provide a description. Describe mitigation strategies for the risks. - -### Are there tasks for which the dataset should not be used? -Describe tasks not recommended for this dataset's use. - -### Were any reviews for safety, cybersecurity, export control, or other considerations conducted? -If so, please provide a description of these review processes, including the outcomes. - -### Are there tutorials or other supporting information to assist a user of these data? -If so, please provide a description and [bibtex format](https://www.bibtex.com/g/bibtex-format/) citation, PID, link, or other access point.
Please include either a `doi` or `url` field in the citation. - -### Any other comments on the use of the dataset? (optional) -Provide any additional information about the use of the dataset. - -## Distribution - -### Will all or parts of the dataset be shared outside the originating laboratory, facility, or site (e.g., with other DOE sites, collaborating institutions, or public repositories)? -If so, please provide a description of how the dataset will be shared, including any access mechanisms (e.g., data repositories, data catalogs, or other data sharing platforms). Provide relevant links or citations in [bibtex format](https://www.bibtex.com/g/bibtex-format/). Please include either a `doi` or `url` field in the citation. - -### Indicate the date or timeframe when the dataset was (or will be) made available. -Provide a single date or range; suggested format YYYY-MM-DD. - -### Access and reuse restrictions placed on the data: -Will the dataset be shared under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)? -If your dataset is split into multiple parts (e.g., training and test sets), you may need to answer separately for each part. -If so, please describe this license and/or ToU, and provide a link or other access point to, or otherwise reproduce, any relevant licensing terms or ToU, as well as any fees associated with these restrictions. - -### Have any third parties imposed IP-based or other restrictions on the data associated with the instances? -If so, please describe these restrictions and provide a link or other access point to, or otherwise reproduce, any relevant licensing terms, as well as any fees associated with these restrictions. - -### Export control restrictions: -Do any export controls or other regulatory restrictions apply to the dataset or to individual instances? 
-If so, please describe these restrictions and provide a link or other access point to, or otherwise reproduce, any supporting documentation. - -### If the dataset or any individual instances have a classification or restriction, what is the specific level of classification or restriction (e.g., Classified, Unclassified, CUI)? -If there is no classification or restriction, respond "N/A." - -### Any other comments regarding the distribution of the dataset? (optional) -Provide any additional information about the distribution of the dataset. - -## Maintenance - -### Who will maintain and provide access to the dataset after its creation? -For example, DOE-supported repository, facility staff, or facility data steward. -Provide responsible parties’ names, roles, contact information, and persistent identifiers (e.g., [ORCID](https://orcid.org/)). -Indicate any DOE policies or agreements governing maintenance and retention. - -### Is there an erratum? -If so, please provide a link or other access point. - -### Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)? -If so, please describe how often, by whom, and how updates will be communicated to dataset consumers (e.g., mailing list, GitHub)? - -### Will older versions of the dataset continue to be supported/hosted/maintained? -If so, please describe how. If not, please describe how its obsolescence will be communicated to dataset consumers. - -### What is the expected lifespan of the dataset? -Provide a description of the expected lifespan, including any plans for archiving or decommissioning, including stakeholders, communities, and collaborators who will be consulted. If a retention policy applies, please provide a link or other access point to, or otherwise reproduce, the relevant documentation. - -### Will the dataset support controlled extensions or updates from collaborators? 
-If yes, describe the mechanism for submission, validation, and versioning, and how these changes will be communicated to users. Otherwise, explain why contributions are restricted. - -### Any other comments regarding the maintenance of the dataset? (optional) -Provide any additional information about the maintenance of the dataset. - -## Human Subject Research - -### Does the dataset include personal information (e.g., identifiable data, survey responses)? -If not, respond "No" and ignore the remainder of the Datasheet. -Otherwise, respond "Yes" and continue. - -### Human Subject Research: Dataset Composition - -#### Does the dataset identify any subpopulations (e.g., by age, gender, career stage)? -If not, answer "No." Otherwise, please describe how these subpopulations are identified and provide a description of their respective distributions within the dataset. - -#### Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other data) from the dataset? -If not, answer "No." Otherwise, please describe how. - -#### Does the dataset contain data that might be considered sensitive in any way (e.g., data that reveals race or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history)? -If not, answer "No." Otherwise, please provide a description. - -### Human Subject Research: Collection - -#### Were any ethical review processes conducted (e.g., by an institutional review board)? -If so, please provide a description of these review processes, including the outcomes, and a link or other access point to any supporting documentation. - -#### Were the individuals in question notified about the data collection?
-If so, please describe (or show with screenshots or other information) how notice was provided, and provide a link or other access point to, or otherwise reproduce, the exact language of the notification itself. - -#### Did the individuals in question consent to the collection and use of their data? -If so, please describe (or show with screenshots or other information) how consent was requested and provided, and provide a link or other access point to, or otherwise reproduce, the exact language to which the individuals consented. - -#### If consent was obtained, were the consenting individuals provided with a mechanism to revoke their consent in the future or for certain uses? -If so, please provide a description and a link or other access point to the mechanism (if applicable). - -#### Has an analysis of the potential impact of the dataset and its use on data subjects (e.g., a data protection impact analysis) been conducted? -If so, please provide a description of this analysis, including the outcomes, as well as a link or other access point to any supporting documentation. - -### Human Subject Research: Maintenance - -#### Are there applicable limits on the retention of the data associated with the instances (e.g., were the individuals in question told that their data would be retained for a fixed period of time and then deleted)? -If so, please describe these limits and explain how they will be enforced. - -### Human Subject Research: Other - -#### Any other comments regarding the inclusion of human subject research in the dataset? (optional) -Provide any additional information about the inclusion of human subject research in the dataset. 
diff --git a/src/dsagt/skills/datacard-generator/reference/field-prompts.md b/src/dsagt/skills/datacard-generator/reference/field-prompts.md deleted file mode 100644 index 5be7460..0000000 --- a/src/dsagt/skills/datacard-generator/reference/field-prompts.md +++ /dev/null @@ -1,38 +0,0 @@ -# Per-Level Field Prompts - -Ask 3–5 fields at a time. Only prompt for fields not already auto-filled. - -## Level 1 — Discoverable - -- Dataset name and title -- Project name -- Dataset identifiers (DOI, URL) -- Dataset description -- Keywords -- Release status - -## Level 2 — Interoperable & Reusable (adds to Level 1) - -- Contact point (name, email, affiliation, ORCID) -- Access policy (sensitivity tier, access level, authorization) — see lookup-tables.md for sensitivity tiers -- License (SPDX ID, name, URL) -- Science domain -- Originating research organization -- Funding sources -- Dataset authors and contributors -- Provenance — how was the data generated or collected? -- Related resources (datasets, publications, software) -- Maintenance and stewardship plans -- Key dates (collection start/end, issued, modified) - -## Level 3 — Understandable & Trustworthy (adds to Level 2) - -- Semantic/schema info — ontologies, controlled vocabularies (must be user-provided) -- Data quality — completeness, known issues, validation methods, noise, uncertainty -- Integrity — checksums, fixity policy, versioning strategy -- AI/ML usage — AI-ready status, training/inference permissions, bias risks, safety notes -- Variable-level documentation — name, description, unit, value labels (one row per variable) -- Missing data codes -- Data processing steps - -**If the user selects Level 3 but provides minimal information:** generate the card with N/A for unprovided fields, list them in the review summary, and note which ones most improve trustworthiness. 
\ No newline at end of file diff --git a/src/dsagt/skills/datacard-generator/reference/introspection-commands.md b/src/dsagt/skills/datacard-generator/reference/introspection-commands.md deleted file mode 100644 index 0fd9517..0000000 --- a/src/dsagt/skills/datacard-generator/reference/introspection-commands.md +++ /dev/null @@ -1,80 +0,0 @@ -# Introspection Commands - -## Contents -- File structure and size -- Schema and data sampling (CSV, Parquet, JSON, HDF5/NetCDF, images) -- Existing metadata files -- Directory structure and splits -- Edge case handling - -## File Structure and Size - -```bash -# Full recursive file listing -find -type f | sort - -# Count files by extension -find -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn - -# Total size (human-readable) -du -sh - -# Size of top-level items -du -sh /* -``` - -## Schema and Data Sampling - -```bash -# CSV/TSV — headers, row count, dtypes -head -n 3 -wc -l -python3 -c "import pandas as pd; df = pd.read_csv('', nrows=5); print(df.dtypes); print(df.shape)" - -# Parquet -python3 -c "import pandas as pd; df = pd.read_parquet(''); print(df.dtypes); print(df.shape)" - -# JSON — top-level structure -python3 -c "import json; d=json.load(open('')); print(type(d), list(d.keys()) if isinstance(d,dict) else f'{len(d)} items')" - -# HDF5 -python3 -c "import h5py; f=h5py.File('','r'); print(list(f.keys())); f.visititems(lambda n,o: print(n, type(o).__name__))" - -# NetCDF -python3 -c "import netCDF4 as nc; d=nc.Dataset(''); print(d.variables.keys()); print(d.dimensions)" - -# Images — count by format -find -type f \( -iname "*.jpg" -o -iname "*.png" -o -iname "*.tif" \) | wc -l -``` - -For files larger than 1GB, sample only (e.g., `nrows=1000`) and note this in the review summary. 
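The large-file rule above could be sketched in Python roughly as follows. This is an illustration only, not part of the skill: `sample_csv`, `SIZE_LIMIT_BYTES`, and the keys of the returned dict are hypothetical names, and the sketch uses the standard-library `csv` module rather than the `pandas` one-liners shown above.

```python
import csv
import os
from itertools import islice

# Hypothetical threshold mirroring the "larger than 1GB" rule above.
SIZE_LIMIT_BYTES = 1_000_000_000


def sample_csv(path, max_rows=1000, size_limit=SIZE_LIMIT_BYTES):
    """Read the header and rows of a CSV, capping the rows for large files.

    Returns a dict with the header, the rows that were read, and a
    `sampled` flag that can be carried into the review summary.
    """
    is_large = os.path.getsize(path) > size_limit
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        # Cap the row count only when the file exceeds the size limit.
        rows = list(islice(reader, max_rows)) if is_large else list(reader)
    return {"header": header, "rows": rows, "sampled": is_large}
```

Returning the `sampled` flag alongside the data keeps the "note this in the review summary" step explicit instead of relying on the caller to re-check the file size.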
- -## Existing Metadata Files - -```bash -# Check for common metadata files -for f in README.md README.txt LICENSE CITATION.cff CITATION.bib .zenodo.json datacite.json croissant.json metadata.json; do - [ -f "/$f" ] && echo "FOUND: $f" -done - -# Print contents of found files -cat /README.md 2>/dev/null -cat /CITATION.cff 2>/dev/null -cat /LICENSE 2>/dev/null -``` - -## Directory Structure and Splits - -```bash -# Top-level listing and directory tree (2 levels) -ls / -find -maxdepth 2 -type d | sort -``` - -Look for `train/`, `test/`, `validation/`, or file naming patterns like `_train.csv`. - -## Edge Case Handling - -- **Empty directory**: Inform the user; ask whether to proceed with a skeleton card -- **Binary-only files**: Note under "Files & Structure" and mark schema fields N/A -- **Large files (>1GB)**: Sample only; note in the review summary \ No newline at end of file diff --git a/src/dsagt/skills/datacard-generator/reference/lookup-tables.md b/src/dsagt/skills/datacard-generator/reference/lookup-tables.md deleted file mode 100644 index e549645..0000000 --- a/src/dsagt/skills/datacard-generator/reference/lookup-tables.md +++ /dev/null @@ -1,32 +0,0 @@ -# Lookup Tables - -## OSTI Dataset Type Codes - -Use when setting `dataset_info.dataset_type`: - -| Code | Type | -|------|------| -| `GD` | Genome/Genetic Data | -| `IM` | Image | -| `ND` | Numeric Data | -| `SM` | Specialized Mix | -| `FP` | Figure/Plot | -| `I` | Interactive Resource | -| `MM` | Multimedia | -| `MD` | Model | -| `AS` | Automated Software | -| `IP` | Instrumentation and Protocols | -| `IG` | Integrated Genomic Resources | - -## Sensitivity Tiers - -Use when setting `access_policy.sensitivity_tier`: - -| Tier | Label | -|------|-------| -| `tier0` | Open Science | -| `tier1` | Controlled Research | -| `tier2` | Proprietary | -| `tier3` | Sensitive / Export Controlled | -| `tier4` | Regulated / Personal | -| `tier5` | Classified | \ No newline at end of file diff --git 
a/src/dsagt/skills/skill-creator/SKILL.md b/src/dsagt/skills/skill-creator/SKILL.md new file mode 100644 index 0000000..c4978b9 --- /dev/null +++ b/src/dsagt/skills/skill-creator/SKILL.md @@ -0,0 +1,65 @@ +--- +name: skill-creator +description: "Author a new agent Skill (a SKILL.md directory in the open Agent Skills format) from the Anthropic template. Use when the user wants to create a skill, scaffold a SKILL.md, package a repeatable workflow as a reusable skill, turn instructions into a skill, or capture a procedure so the agent can auto-invoke it later. Produces a valid SKILL.md (name + description frontmatter, optional scripts/ and references/) and saves it into the project's skills directory." +metadata: + version: "1.0" +--- + +# Skill Creator + +Scaffold a new, spec-valid agent Skill from the Anthropic template and save it so the agent can discover and auto-invoke it. + +A *skill* is a directory `/SKILL.md`: YAML frontmatter (`name`, `description`) plus markdown instructions the agent follows when the description matches the task. It can bundle `scripts/` and `references/`. See [references/agent_skills_spec.md](references/agent_skills_spec.md) for the full contract. + +## Workflow + +Copy this checklist and check off steps as you go: + +``` +Progress: +- [ ] 1. Gather skill intent (name, purpose, triggers) +- [ ] 2. Draft from the template +- [ ] 3. Write the body (instructions/workflow) +- [ ] 4. Add scripts/ and references/ if needed +- [ ] 5. Validate the frontmatter +- [ ] 6. Save into the project (save_skill) +- [ ] 7. Confirm + note how it activates +``` + +### 1. Gather Intent + +Ask the user (or infer from context): +- **name** — short, lowercase, hyphenated (e.g. `convert-vasp-outputs`). This becomes the directory name and the invocable name. +- **purpose** — one sentence on what the skill does. +- **triggers** — the user requests / phrasing that should make the agent reach for this skill. These become keywords in the `description`. + +### 2. 
Draft From the Template + +Start from [references/SKILL_template.md](references/SKILL_template.md). Fill the frontmatter: +- `name`: must equal the directory name. +- `description`: pack it with *what it does AND when to use it* (trigger phrases) — this is the only thing the agent sees when deciding to invoke. Keep it ≤ 1536 characters. + +### 3. Write the Body + +After the frontmatter, write the instructions the agent will follow. Prefer a copyable checklist (like this one) for multi-step workflows. Reference bundled files by relative path, e.g. `[reference](references/notes.md)`, or run a bundled script with `${CLAUDE_SKILL_DIR}/scripts/foo.py` so paths resolve regardless of cwd. + +### 4. Add Supporting Files (optional) + +- `scripts/` — runnable helpers the body invokes. +- `references/` — long docs/templates loaded on demand (keep them OUT of SKILL.md so they cost no tokens until used). + +### 5. Validate + +Confirm before saving: +- Frontmatter is valid YAML between `---` fences. +- `name` is present, lowercase-hyphenated, and equals the intended directory name. +- `description` is present and ≤ 1536 characters. +- Any `[link](references/...)` and `${CLAUDE_SKILL_DIR}/scripts/...` paths exist. + +### 6. Save + +Save via the **`save_skill`** MCP tool with the `spec` (frontmatter dict: `name`, `description`, optional `tags`), the `body` markdown, and any `reference_files` (a `{relative_path: contents}` map). This writes `/skills//`, which the agent natively auto-discovers after the next `dsagt start` (no KB indexing — `search_skills` is for the not-yet-installed catalog). + +### 7. Confirm + +Tell the user the skill was saved and how it activates: project skills are mirrored into the platform's native skill directory (e.g. `.claude/skills/`) at the next `dsagt start`, after which the agent auto-discovers it. To use it in the current session, restart the agent. 
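The frontmatter checks from step 5 could be scripted along these lines. This is a minimal sketch under stated assumptions: `check_skill` and `read_frontmatter` are hypothetical helpers, not dsagt APIs, and the parser only reads flat `key: value` lines instead of full YAML, so it does not prove the frontmatter is valid YAML. The link and script path check from step 5 is left out.

```python
import re
from pathlib import Path

MAX_DESCRIPTION_CHARS = 1536  # limit stated in the skill text above
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def read_frontmatter(text):
    """Return flat `key: value` pairs found between the leading `---` fences.

    Simplification: nested mappings and multi-line values are skipped; a
    real check would hand the block to a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' fence")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("frontmatter has no closing '---' fence") from None
    fields = {}
    for line in lines[1:end]:
        if line[:1] in (" ", "\t", "#") or ":" not in line:
            continue  # skip nested, commented, or continuation lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip("\"'")
    return fields


def check_skill(skill_dir):
    """Return a list of problems found in <skill_dir>/SKILL.md (empty if none)."""
    skill_dir = Path(skill_dir)
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    fields = read_frontmatter(text)
    problems = []
    name = fields.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be lowercase and hyphenated")
    if name != skill_dir.name:
        problems.append("name must equal the directory name")
    description = fields.get("description", "")
    if not description:
        problems.append("description is missing")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        problems.append("description exceeds 1536 characters")
    return problems
```

An empty list means the name and description checks passed; anything else can be shown to the user before saving.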
diff --git a/src/dsagt/skills/skill-creator/references/SKILL_template.md b/src/dsagt/skills/skill-creator/references/SKILL_template.md new file mode 100644 index 0000000..99db025 --- /dev/null +++ b/src/dsagt/skills/skill-creator/references/SKILL_template.md @@ -0,0 +1,53 @@ +# SKILL.md template + +Copy the block below into `/SKILL.md` and fill it in. Only the +frontmatter `name` + `description` are required; everything else is +optional. (Based on the Anthropic skill template / open Agent Skills +standard — https://github.com/anthropics/skills/tree/main/template.) + +```markdown +--- +name: my-skill-name +description: A clear description of WHAT this skill does and WHEN to use it — include the user phrasings/triggers that should invoke it. (≤ 1536 chars; this is the only text the agent sees when deciding to invoke.) +# optional: +# tags: [domain, keyword] +# metadata: +# version: "1.0" +--- + +# My Skill Name + +One or two sentences framing the task this skill handles. + +## Workflow + +Copy this checklist and check off steps as you go: + +``` +Progress: +- [ ] 1. ... +- [ ] 2. ... +``` + +### 1. ... + +Step-by-step instructions. Reference bundled docs by relative path: +[details](references/details.md). Run bundled scripts with an absolute +skill-dir path so cwd doesn't matter: + + python ${CLAUDE_SKILL_DIR}/scripts/helper.py + +## Notes / Guidelines +- ... 
+``` + +## Optional bundled files + +``` +my-skill-name/ +├── SKILL.md (required) +├── references/ (long docs/templates, loaded on demand) +│ └── details.md +└── scripts/ (runnable helpers the body invokes) + └── helper.py +``` diff --git a/src/dsagt/skills/skill-creator/references/agent_skills_spec.md b/src/dsagt/skills/skill-creator/references/agent_skills_spec.md new file mode 100644 index 0000000..bd3f79a --- /dev/null +++ b/src/dsagt/skills/skill-creator/references/agent_skills_spec.md @@ -0,0 +1,48 @@ +# Agent Skills — condensed contract + +A *skill* packages instructions (and optionally code/docs) so an agent can +discover and follow a repeatable workflow. dsagt skills follow the open +Agent Skills standard, which is what Claude Code, Cursor, Codex, and +Antigravity all read — so one SKILL.md works across platforms. + +## Directory layout + +``` +/ +├── SKILL.md # required — frontmatter + instructions +├── references/ # optional — docs/templates loaded on demand +└── scripts/ # optional — runnable helpers +``` + +- The **directory name is the invocable name** (e.g. `.claude/skills/deploy/` → `/deploy`). Keep it lowercase, hyphenated. + +## Frontmatter + +YAML between `---` fences. Common fields: + +| Field | Required | Notes | +|-------|----------|-------| +| `name` | recommended | Should equal the directory name. Lowercase-hyphenated. | +| `description` | **yes (in practice)** | What it does AND when to use it (trigger phrases). The agent sees only this when deciding to invoke. **≤ 1536 characters.** | +| `tags` | no | List of keywords; dsagt uses these for `search_skills` tag filters. | +| `metadata` | no | Free-form (e.g. `version`). Ignored by the platform. | +| `license` | no | Free-form. Ignored by the platform. | + +Unknown/extra frontmatter fields are **silently ignored** by Claude Code, so dsagt-specific fields are safe to include. 
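To make the field table concrete, here is a hedged sketch of assembling a SKILL.md from those fields. `render_skill_md` is a hypothetical helper, not something dsagt provides; it assumes `name` is already lowercase-hyphenated and relies on JSON strings and lists also being valid YAML flow scalars and sequences, so `json.dumps` is used for quoting.

```python
import json

MAX_DESCRIPTION_CHARS = 1536  # limit from the table above


def render_skill_md(name, description, body, tags=None):
    """Build SKILL.md text: YAML frontmatter between `---` fences, then the body."""
    if len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError("description exceeds 1536 characters")
    lines = [
        "---",
        f"name: {name}",
        # JSON quoting keeps colons and quotes in the description YAML-safe.
        f"description: {json.dumps(description, ensure_ascii=False)}",
    ]
    if tags:
        lines.append(f"tags: {json.dumps(list(tags), ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.rstrip("\n") + "\n"
```

Quoting the description matters because trigger phrases often contain colons, which would otherwise start a nested mapping in YAML.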
+ +## How discovery works + +- At session start, each installed skill's `name` + `description` are loaded into the agent's context. The full SKILL.md body loads only when the skill is invoked (lazy — zero cost until used). +- The agent auto-invokes a skill when the `description` matches the task; the user can also invoke it directly (`/skill-name`). +- A **newly created** top-level skills directory is only picked up after the agent restarts. + +## Body conventions + +- Lead with a copyable progress checklist for multi-step workflows. +- Keep long material in `references/` (loaded on demand) rather than inline, to save context tokens. +- Reference bundled files by relative path, or run scripts via `${CLAUDE_SKILL_DIR}/scripts/...` so paths resolve regardless of working directory. + +## Two tiers in dsagt + +- **Catalog** — skills indexed from external GitHub source repos, searchable via `search_skills` but not installed. Not in context. +- **Installed** — skills in `/skills/` (saved via `save_skill` or installed via `install_skill`). Mirrored into the platform's native skill dir at `dsagt start`, then natively discovered. diff --git a/src/session.py b/src/session.py deleted file mode 100755 index bfb9d64..0000000 --- a/src/session.py +++ /dev/null @@ -1,458 +0,0 @@ -""" -DSAGT session management. - -Handles project initialization, agent-specific config generation, -and service lifecycle (proxy + MLflow). - -Each agent platform gets its configs generated from the single -dsagt_config.yaml — MCP server entries, agent instructions, env vars, -and proxy routing are all derived mechanically. 
-""" - -import json -import logging -import os -import signal -import subprocess -import sys -from pathlib import Path - -import yaml - -from dsagt.config import ( - VALID_AGENTS, - default_config_content, - load_config, - project_dir_for, -) - -logger = logging.getLogger(__name__) - -# Where bundled agent instruction templates live (relative to dsagt package) -_AGENTS_DIR = Path(__file__).parent.parent.parent / "agents" - - -# --------------------------------------------------------------------------- -# Project initialization -# --------------------------------------------------------------------------- - -def init_project(project_name: str, agent: str, runtime_base: str | Path = "runtime") -> Path: - """Create a new project directory with default config and subdirectories. - - Returns the project directory path. - """ - if agent not in VALID_AGENTS: - raise ValueError(f"agent must be one of {VALID_AGENTS}, got '{agent}'") - - project_dir = project_dir_for(project_name, runtime_base) - - if (project_dir / "dsagt_config.yaml").exists(): - raise FileExistsError(f"Project already exists: {project_dir}") - - project_dir.mkdir(parents=True, exist_ok=True) - for subdir in ("trace_archive", "mlflow", "skills", "kb_index"): - (project_dir / subdir).mkdir(exist_ok=True) - - config_content = default_config_content(project_name, agent) - (project_dir / "dsagt_config.yaml").write_text(config_content) - - return project_dir - - -# --------------------------------------------------------------------------- -# Agent config generation -# --------------------------------------------------------------------------- - -def generate_agent_configs(config: dict, working_dir: str | Path) -> list[str]: - """Generate agent-platform-specific config files in the working directory. - - Reads the dsagt_config.yaml values and writes the files each agent - platform expects: MCP server configs, agent instructions, env setup. - - Returns a list of descriptions of what was written. 
- """ - agent = config["agent"] - working_dir = Path(working_dir) - project_dir = Path(config["project_dir"]) - proxy_port = config["proxy"]["port"] - - generators = { - "claude-code": _generate_claude_code, - "goose": _generate_goose, - "roo": _generate_roo, - "cline": _generate_cline, - } - - return generators[agent](config, working_dir, project_dir, proxy_port) - - -def _mcp_server_args(server: str, project_dir: Path) -> list[str]: - """Build the args list for an MCP server entry.""" - base = ["run", f"dsagt-{server}-server"] - if server == "registry": - base += ["--runtime-dir", str(project_dir)] - elif server == "knowledge": - base += [ - "--base-index-dir", str(project_dir / "kb_index"), - "--runtime-dir", str(project_dir), - ] - return base - - -def _mcp_env_block(config: dict) -> dict: - """Build the env block for MCP server entries.""" - env = {} - embedding_key = config.get("embedding", {}).get("api_key", "") - if embedding_key and not embedding_key.startswith("${"): - env["LLM_API_KEY"] = embedding_key - return env - - -def _generate_claude_code(config, working_dir, project_dir, proxy_port) -> list[str]: - actions = [] - env_block = _mcp_env_block(config) - - mcp_config = {"mcpServers": {}} - for server in ("registry", "knowledge"): - entry = { - "command": "uv", - "args": _mcp_server_args(server, project_dir), - } - if env_block: - entry["env"] = env_block - mcp_config["mcpServers"][f"dsagt-{server}"] = entry - - mcp_path = working_dir / ".mcp.json" - mcp_path.write_text(json.dumps(mcp_config, indent=2) + "\n") - actions.append(f"Wrote {mcp_path}") - - # Copy agent instructions - instructions = _load_instructions("claude-code", "dsagt_instructions.md") - if instructions: - claude_md = working_dir / "CLAUDE.md" - if claude_md.exists(): - existing = claude_md.read_text() - if "DSAGT Pipeline Builder" not in existing: - claude_md.write_text(existing + "\n\n" + instructions) - actions.append(f"Appended DSAGT instructions to {claude_md}") - else: - 
claude_md.write_text(instructions) - actions.append(f"Wrote {claude_md}") - - # Write env helper - env_path = working_dir / ".dsagt_env" - _write_env_file(env_path, { - "ANTHROPIC_BASE_URL": f"http://localhost:{proxy_port}", - "DSAGT_PROJECT": config["project"], - "DSAGT_PROJECT_DIR": str(project_dir), - }) - actions.append(f"Wrote {env_path} (source this or export the variables)") - - return actions - - -def _generate_goose(config, working_dir, project_dir, proxy_port) -> list[str]: - actions = [] - model = config["llm"]["model"] - - goose_config = { - "GOOSE_PROVIDER": "openai", - "GOOSE_MODEL": model, - "extensions": {}, - } - - for server in ("registry", "knowledge"): - args = _mcp_server_args(server, project_dir) - goose_config["extensions"][server] = { - "enabled": True, - "name": server, - "type": "stdio", - "cmd": "uv " + " ".join(args), - "timeout": 300, - } - - goose_path = working_dir / "goose.yaml" - goose_path.write_text(yaml.dump(goose_config, default_flow_style=False, sort_keys=False)) - actions.append(f"Wrote {goose_path}") - - # Copy agent instructions - instructions = _load_instructions("goose", ".goosehints") - if instructions: - hints_path = working_dir / ".goosehints" - hints_path.write_text(instructions) - actions.append(f"Wrote {hints_path}") - - env_path = working_dir / ".dsagt_env" - _write_env_file(env_path, { - "OPENAI_HOST": f"http://localhost:{proxy_port}", - "DSAGT_PROJECT": config["project"], - "DSAGT_PROJECT_DIR": str(project_dir), - }) - actions.append(f"Wrote {env_path}") - - return actions - - -def _generate_roo(config, working_dir, project_dir, proxy_port) -> list[str]: - actions = [] - env_block = _mcp_env_block(config) - - mcp_config = {"mcpServers": {}} - for server in ("registry", "knowledge"): - entry = { - "command": "uv", - "args": _mcp_server_args(server, project_dir), - "disabled": False, - } - if env_block: - entry["env"] = env_block - mcp_config["mcpServers"][f"dsagt-{server}"] = entry - - roo_dir = working_dir / 
".roo" - roo_dir.mkdir(exist_ok=True) - mcp_path = roo_dir / "mcp.json" - mcp_path.write_text(json.dumps(mcp_config, indent=2) + "\n") - actions.append(f"Wrote {mcp_path}") - - # Copy roomodes - roomodes = _load_instructions("roo", "roomodes") - if roomodes: - roomodes_path = working_dir / ".roomodes" - roomodes_path.write_text(roomodes) - actions.append(f"Wrote {roomodes_path}") - - env_path = working_dir / ".dsagt_env" - _write_env_file(env_path, { - "DSAGT_PROJECT": config["project"], - "DSAGT_PROJECT_DIR": str(project_dir), - }) - actions.append(f"Wrote {env_path}") - - return actions - - -def _generate_cline(config, working_dir, project_dir, proxy_port) -> list[str]: - actions = [] - env_block = _mcp_env_block(config) - - mcp_config = {"mcpServers": {}} - for server in ("registry", "knowledge"): - entry = { - "command": "uv", - "args": _mcp_server_args(server, project_dir), - "disabled": False, - "alwaysAllow": [], - } - if env_block: - entry["env"] = env_block - mcp_config["mcpServers"][f"dsagt-{server}"] = entry - - # Write to project dir (user merges into global cline settings) - mcp_path = working_dir / "cline_mcp.json" - mcp_path.write_text(json.dumps(mcp_config, indent=2) + "\n") - actions.append(f"Wrote {mcp_path} (merge into Cline MCP settings)") - - # Copy agent instructions - instructions = _load_instructions("cline", "dsagt_instructions.md") - if instructions: - rules_dir = working_dir / ".clinerules" - rules_dir.mkdir(exist_ok=True) - instr_path = rules_dir / "dsagt_instructions.md" - instr_path.write_text(instructions) - actions.append(f"Wrote {instr_path}") - - env_path = working_dir / ".dsagt_env" - _write_env_file(env_path, { - "DSAGT_PROJECT": config["project"], - "DSAGT_PROJECT_DIR": str(project_dir), - }) - actions.append(f"Wrote {env_path}") - - return actions - - -def _load_instructions(agent_dir: str, filename: str) -> str | None: - """Load an instruction template from the agents directory.""" - path = _AGENTS_DIR / agent_dir / filename - 
if path.exists(): - return path.read_text() - logger.warning("Instruction template not found: %s", path) - return None - - -def _write_env_file(path: Path, env_vars: dict) -> None: - """Write a sourceable env file.""" - lines = [f'export {k}="{v}"' for k, v in env_vars.items()] - path.write_text("\n".join(lines) + "\n") - - -# --------------------------------------------------------------------------- -# Service start / stop -# --------------------------------------------------------------------------- - -def _pid_file(project_dir: Path) -> Path: - return project_dir / ".pids" - - -def start_services(config: dict) -> dict[str, int]: - """Start the proxy and MLflow for a project. Returns {name: pid}.""" - project_dir = Path(config["project_dir"]) - pids = {} - - # Start MLflow - mlflow_port = config["mlflow"]["port"] - mlflow_backend = config["mlflow"]["backend"] - mlflow_dir = project_dir / "mlflow" - mlflow_dir.mkdir(exist_ok=True) - - if mlflow_backend == "sqlite": - backend_uri = f"sqlite:///{mlflow_dir}/mlflow.db" - else: - backend_uri = str(mlflow_dir) - - mlflow_cmd = [ - sys.executable, "-m", "mlflow", "server", - "--backend-store-uri", backend_uri, - "--default-artifact-root", str(mlflow_dir / "artifacts"), - "--host", "0.0.0.0", - "--port", str(mlflow_port), - ] - - mlflow_log = project_dir / "mlflow.log" - mlflow_proc = subprocess.Popen( - mlflow_cmd, - stdout=open(mlflow_log, "w"), - stderr=subprocess.STDOUT, - start_new_session=True, - ) - pids["mlflow"] = mlflow_proc.pid - logger.info("MLflow started (pid %d) → http://localhost:%d", mlflow_proc.pid, mlflow_port) - - # Start proxy - proxy_port = config["proxy"]["port"] - otel_endpoint = f"http://localhost:{mlflow_port}" - trace_dir = str(project_dir / "trace_archive") - - proxy_cmd = [ - sys.executable, "-m", "dsagt.proxy", - "--port", str(proxy_port), - "--records-dir", trace_dir, - "--session", config["project"], - "--otel-endpoint", otel_endpoint, - "--model", config["llm"]["model"], - ] - - 
proxy_log = project_dir / "proxy.log" - proxy_proc = subprocess.Popen( - proxy_cmd, - stdout=open(proxy_log, "w"), - stderr=subprocess.STDOUT, - env={**os.environ, "DSAGT_PROJECT": config["project"]}, - start_new_session=True, - ) - pids["proxy"] = proxy_proc.pid - logger.info("Proxy started (pid %d) → http://localhost:%d", proxy_proc.pid, proxy_port) - - # Save PIDs - pid_path = _pid_file(project_dir) - pid_path.write_text(json.dumps(pids, indent=2) + "\n") - - return pids - - -def stop_services(project_dir: str | Path) -> list[str]: - """Stop running services for a project. Returns descriptions of what was stopped.""" - project_dir = Path(project_dir) - pid_path = _pid_file(project_dir) - stopped = [] - - if not pid_path.exists(): - return ["No running services found."] - - pids = json.loads(pid_path.read_text()) - - for name, pid in pids.items(): - try: - os.killpg(os.getpgid(pid), signal.SIGTERM) - stopped.append(f"Stopped {name} (pid {pid})") - except (ProcessLookupError, PermissionError): - stopped.append(f"{name} (pid {pid}) was not running") - - pid_path.unlink(missing_ok=True) - return stopped - - -# --------------------------------------------------------------------------- -# Agent launch -# --------------------------------------------------------------------------- - -def agent_env(config: dict) -> dict: - """Build the environment dict an agent process needs to inherit. - - Merges the current environment with DSAGT-specific variables so the - proxy intercept, project identity, and embedding key are all set. 
- """ - project_dir = config["project_dir"] - proxy_port = config["proxy"]["port"] - agent = config["agent"] - - env = dict(os.environ) - env["DSAGT_PROJECT"] = config["project"] - env["DSAGT_PROJECT_DIR"] = project_dir - - # Proxy routing — agent-specific env var - if agent in ("claude-code", "roo", "cline"): - env["ANTHROPIC_BASE_URL"] = f"http://localhost:{proxy_port}" - if agent == "goose": - env["OPENAI_HOST"] = f"http://localhost:{proxy_port}" - - # Embedding API key for MCP servers - embedding_key = config.get("embedding", {}).get("api_key", "") - if embedding_key and not embedding_key.startswith("${"): - env["LLM_API_KEY"] = embedding_key - - return env - - -def agent_command(config: dict) -> list[str] | None: - """Return the shell command to launch the agent, or None for VS Code agents.""" - commands = { - "claude-code": ["claude"], - "goose": ["goose", "session"], - } - return commands.get(config["agent"]) - - -def launch_agent(config: dict, working_dir: str | Path) -> int: - """Launch the agent process in the foreground with the correct environment. - - For CLI agents (Claude Code, Goose): runs the agent interactively and - returns its exit code. - - For VS Code agents (Roo, Cline): prints instructions and returns 0. 
- """ - working_dir = Path(working_dir) - env = agent_env(config) - cmd = agent_command(config) - - if cmd is None: - # VS Code agents can't be launched from the CLI - agent = config["agent"] - print(f"\n Open VS Code in: {working_dir}") - if agent == "roo": - print(" Switch to the DSAGT Pipeline Builder mode (Cmd+.)") - elif agent == "cline": - print(" Open the Cline panel and verify MCP servers are connected") - print() - return 0 - - logger.info("Launching: %s", " ".join(cmd)) - try: - result = subprocess.run(cmd, env=env, cwd=str(working_dir)) - return result.returncode - except FileNotFoundError: - logger.error("Command not found: %s", cmd[0]) - logger.error("Is %s installed?", config["agent"]) - return 1 - except KeyboardInterrupt: - return 0 diff --git a/tests/mcp_helpers.py b/tests/mcp_helpers.py index 62c6a3a..583e414 100644 --- a/tests/mcp_helpers.py +++ b/tests/mcp_helpers.py @@ -151,8 +151,13 @@ def mcp_call_tool(proc, tool_name: str, arguments: dict, return read_mcp_message(proc, timeout=timeout, expect_id=msg_id) -def start_server(cmd: list[str], env: dict = None) -> subprocess.Popen: - """Start a server subprocess with stdio pipes.""" +def start_server(cmd: list[str], env: dict = None, cwd: str = None) -> subprocess.Popen: + """Start a server subprocess with stdio pipes. + + ``cwd`` sets the working directory — ``dsagt-server`` discovers its project + config from cwd (see ``observability.find_project_config``), so startup tests + must run it from the project dir. 
+ """ proc_env = os.environ.copy() if env: proc_env.update(env) @@ -164,4 +169,5 @@ def start_server(cmd: list[str], env: dict = None) -> subprocess.Popen: stderr=subprocess.PIPE, text=True, env=proc_env, + cwd=cwd, ) diff --git a/tests/smoke_test/run.sh b/tests/smoke_test/run.sh index 2973567..8914eca 100755 --- a/tests/smoke_test/run.sh +++ b/tests/smoke_test/run.sh @@ -173,12 +173,12 @@ check() { check "csvtool_filter spec written" "test -f '${PDIR}/tools/csvtool_filter.md'" check "trace_archive has records" "ls '${PDIR}/trace_archive/'*.json | grep -q ." check "scan_directory record" "ls '${PDIR}/trace_archive/'*scan_directory*.json | grep -q ." -# Both files are written by dsagt-knowledge-server's kb_ingest_directory MCP -# tool — chroma.sqlite3 is the actual vector DB, route.json is the collection -# manifest. Checking only `test -d kb_index/knowledge` is too weak: an agent -# can satisfy it by hand-crafting an empty directory tree, masking a broken -# MCP wiring (which is exactly what we hit when cline's dsagt-knowledge -# server crashed silently and the LLM compensated by mkdir-ing the path). +# Both files are written by dsagt-server's kb_ingest MCP tool — chroma.sqlite3 +# is the actual vector DB, route.json is the collection manifest. Checking +# only `test -d kb_index/knowledge` is too weak: an agent can satisfy it by +# hand-crafting an empty directory tree, masking a broken MCP wiring (which is +# exactly what we hit when cline's dsagt server crashed silently and the LLM +# compensated by mkdir-ing the path). check "knowledge ingested (route)" "test -f '${PDIR}/kb_index/knowledge/route.json'" check "knowledge ingested (vectors)" "test -f '${PDIR}/kb_index/knowledge/chroma.sqlite3'" # Explicit memory writes to /explicit_memories.yaml (YAML at the @@ -218,8 +218,8 @@ for _, row in df.iterrows(): spans = row.get("spans") or [] # Match by service.name on root span — agent-emitted traces only. 
# MCP-server traces (kb.*, registry.*, tool.execute) carry - # service.name = "dsagt-knowledge-server" / "dsagt-registry-server" / - # "dsagt-run" and shouldn't count toward agent turn parity. + # service.name = "dsagt-server" / "dsagt-run" and shouldn't count + # toward agent turn parity. for s in spans: attrs = getattr(s, "attributes", None) or ( s.get("attributes") if isinstance(s, dict) else None diff --git a/tests/test_config.py b/tests/test_config.py index b616597..6670b70 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -164,6 +164,20 @@ def test_missing_project_raises(self, tmp_path): with pytest.raises(ValueError, match="project"): load_config(name) + def test_skills_block_backfilled_for_old_config(self, tmp_path): + """A config written before the skills block still gets the default.""" + name = self._write_config( + tmp_path, + "myproject", + {"project": "myproject", "agent": "claude", "llm": {"provider": "openai"}}, + ) + config = load_config(name) + sources = config["skills"]["sources"] + assert sources[0]["name"] == "scientific" + assert "K-Dense-AI" in sources[0]["url"] + assert config["skills"]["populate_native"] is True + assert config["skills"]["populate_catalog"] is True + def test_missing_agent_raises(self, tmp_path): name = self._write_config(tmp_path, "myproject", {"project": "myproject"}) with pytest.raises(ValueError, match="agent"): @@ -194,6 +208,18 @@ def test_project_dir_injected(self, tmp_path): assert config["project_dir"] == str(tmp_path / "myproject") +class TestSkillsDefaults: + + def test_defaults_has_skills(self): + from dsagt.session import DEFAULTS + + assert DEFAULTS["skills"]["sources"][0]["name"] == "scientific" + + def test_default_config_content_includes_skills(self): + body = yaml.safe_load(default_config_content("p", "claude", 5001)) + assert body["skills"]["sources"][0]["name"] == "scientific" + + # --------------------------------------------------------------------------- # Config: helpers # 
--------------------------------------------------------------------------- @@ -460,8 +486,8 @@ def test_claude_writes_mcp_json(self, tmp_path): mcp_path = working_dir / ".mcp.json" assert mcp_path.exists() mcp = json.loads(mcp_path.read_text()) - assert "dsagt-registry" in mcp["mcpServers"] - assert "dsagt-knowledge" in mcp["mcpServers"] + assert set(mcp["mcpServers"]) == {"dsagt"} + assert mcp["mcpServers"]["dsagt"]["args"] == ["run", "dsagt-server"] assert (working_dir / "CLAUDE.md").exists() # BYOA: .dsagt_env is no longer written; user manages shell env. assert not (working_dir / ".dsagt_env").exists() @@ -476,8 +502,8 @@ def test_goose_writes_goose_yaml(self, tmp_path): goose_path = working_dir / "goose.yaml" assert goose_path.exists() goose = yaml.safe_load(goose_path.read_text()) - assert "registry" in goose["extensions"] - assert "knowledge" in goose["extensions"] + assert set(goose["extensions"]) == {"dsagt"} + assert goose["extensions"]["dsagt"]["cmd"] == "uv run dsagt-server" assert (working_dir / ".goosehints").exists() def test_roo_writes_static_and_dynamic(self, tmp_path): @@ -494,7 +520,8 @@ def test_roo_writes_static_and_dynamic(self, tmp_path): # comes from dsagt_config.yaml via cwd-walk; the MCP env block # only carries EMBEDDING_* settings. 
mcp = json.loads((working_dir / ".roo" / "mcp.json").read_text()) - assert "EMBEDDING_BACKEND" in mcp["mcpServers"]["dsagt-registry"]["env"] + assert set(mcp["mcpServers"]) == {"dsagt"} + assert "EMBEDDING_BACKEND" in mcp["mcpServers"]["dsagt"]["env"] def test_cline_writes_static_only_in_split_test(self, tmp_path): # Cline's dynamic writer shells out to `cline auth` and `cline mcp @@ -521,7 +548,7 @@ def test_codex_writes_static_and_dynamic(self, tmp_path): assert (working_dir / "AGENTS.md").exists() assert (working_dir / ".codex-data").is_dir() toml = (working_dir / ".codex-data" / "config.toml").read_text() - assert "[mcp_servers.dsagt-registry.env]" in toml + assert "[mcp_servers.dsagt.env]" in toml # Project routing comes from dsagt_config.yaml via cwd-walk; the # MCP env block only carries EMBEDDING_* settings. assert "EMBEDDING_BACKEND" in toml @@ -569,9 +596,8 @@ def test_codex_config_toml_shape(self, tmp_path): } toml = _render_codex_config(mcp_env) - assert "[mcp_servers.dsagt-registry]" in toml - assert "[mcp_servers.dsagt-knowledge]" in toml - assert "[mcp_servers.dsagt-registry.env]" in toml + assert "[mcp_servers.dsagt]" in toml + assert "[mcp_servers.dsagt.env]" in toml assert 'MLFLOW_TRACKING_URI = "http://localhost:5001"' in toml # OTel opt-in: enables OTLP-HTTP export + un-redacts user prompt. assert "[otel]" in toml @@ -606,10 +632,10 @@ def test_opencode_config_json_shape(self): parsed = json.loads(body) assert parsed["$schema"] == "https://opencode.ai/config.json" - assert set(parsed["mcp"]) == {"dsagt-registry", "dsagt-knowledge"} - reg = parsed["mcp"]["dsagt-registry"] + assert set(parsed["mcp"]) == {"dsagt"} + reg = parsed["mcp"]["dsagt"] assert reg["type"] == "local" - assert reg["command"] == ["uv", "run", "dsagt-registry-server"] + assert reg["command"] == ["uv", "run", "dsagt-server"] assert reg["environment"]["DSAGT_PROJECT_DIR"] == "/proj" # Provider block uses {env:VAR} reference, never the resolved value. 
assert ( @@ -681,11 +707,11 @@ def test_mcp_servers_dict_shape(self): } mcp = _build_mcp_servers_dict(env_block) - assert set(mcp["mcpServers"]) == {"dsagt-registry", "dsagt-knowledge"} - assert mcp["mcpServers"]["dsagt-knowledge"]["disabled"] is False + assert set(mcp["mcpServers"]) == {"dsagt"} + assert mcp["mcpServers"]["dsagt"]["disabled"] is False # Env block plumbs through so the MCP server children have what they need. assert ( - mcp["mcpServers"]["dsagt-registry"]["env"]["MLFLOW_TRACKING_URI"] + mcp["mcpServers"]["dsagt"]["env"]["MLFLOW_TRACKING_URI"] == "http://localhost:5001" ) @@ -701,11 +727,10 @@ def test_mcp_config_omits_redundant_dsagt_env(self, tmp_path): self._write_both(config, working_dir) mcp = json.loads((working_dir / ".mcp.json").read_text()) - for server in ("dsagt-registry", "dsagt-knowledge"): - env = mcp["mcpServers"][server].get("env", {}) - assert "DSAGT_PROJECT" not in env - assert "DSAGT_PROJECT_DIR" not in env - assert "MLFLOW_TRACKING_URI" not in env + env = mcp["mcpServers"]["dsagt"].get("env", {}) + assert "DSAGT_PROJECT" not in env + assert "DSAGT_PROJECT_DIR" not in env + assert "MLFLOW_TRACKING_URI" not in env # --------------------------------------------------------------------------- @@ -1265,9 +1290,7 @@ def test_goose(self): "goose", "session", "--with-extension", - "uv run dsagt-registry-server", - "--with-extension", - "uv run dsagt-knowledge-server", + "uv run dsagt-server", ] def test_roo(self): @@ -1360,13 +1383,12 @@ def test_mcp_env_block_omits_empty_embedding_keys(self): assert "EMBEDDING_MODEL" not in env def test_mcp_server_args_are_just_command(self): - """MCP server args are just ["run", "dsagt--server"]. - All configuration flows through env vars and dsagt_config.yaml. + """MCP server args are just ["run", "dsagt-server"] — one merged + server. All configuration flows through dsagt_config.yaml (cwd-walk). 
""" from dsagt.agents import _mcp_server_args - assert _mcp_server_args("knowledge") == ["run", "dsagt-knowledge-server"] - assert _mcp_server_args("registry") == ["run", "dsagt-registry-server"] + assert _mcp_server_args() == ["run", "dsagt-server"] def test_mcp_env_block_carries_only_embedding_settings(self): from dsagt.agents import _mcp_env_block diff --git a/tests/test_dependency_integration.py b/tests/test_dependency_integration.py index c0ad35b..9db0064 100644 --- a/tests/test_dependency_integration.py +++ b/tests/test_dependency_integration.py @@ -31,7 +31,7 @@ import yaml -from dsagt.commands.registry_server import create_registry_server +from dsagt.mcp.registry_tools import create_registry_server from dsagt.registry import ToolRegistry diff --git a/tests/test_dsagt_server.py b/tests/test_dsagt_server.py new file mode 100644 index 0000000..3aa3486 --- /dev/null +++ b/tests/test_dsagt_server.py @@ -0,0 +1,116 @@ +"""Tests for the merged ``dsagt-server`` (all four concern modules under one Server). + +These verify the *composition* contract: every tool from the registry / knowledge +/ memory / skill modules is exposed under one MCP ``Server``, and the single +``call_tool`` wrapper preserves both return-type contracts (registry + skill +handlers may return a plain string; knowledge / memory handlers return a dict +that gets JSON-encoded). Also covers ``_build_kb_from_config`` credential +validation in-process (the full subprocess boot needs a live MLflow — see +``test_server_startup.py``). 
+""" + +import asyncio +import json +from pathlib import Path +from unittest.mock import MagicMock + +import mcp.types as types +import pytest + +from dsagt.mcp.server import _build_kb_from_config, create_dsagt_server +from dsagt.registry import SkillRegistry, ToolRegistry + + +def _make_merged_server(tmp_path: Path): + kb = MagicMock() + kb.index_dir = tmp_path / "kb_index" + kb.index_dir.mkdir() + kb.default_rerank = True + kb.collections = [] + runtime = str(tmp_path / "runtime") + reg = ToolRegistry(source_tools_dir=None, runtime_dir=runtime, kb=None) + sreg = SkillRegistry(source_skills_dir=None, runtime_dir=runtime, kb=None) + return create_dsagt_server(reg, kb, sreg, runtime_dir=runtime) + + +def _list_tools(server) -> list[str]: + handler = server.request_handlers[types.ListToolsRequest] + res = asyncio.run(handler(types.ListToolsRequest(method="tools/list"))) + return sorted(t.name for t in res.root.tools) + + +def _call(server, name: str, arguments: dict) -> str: + handler = server.request_handlers[types.CallToolRequest] + req = types.CallToolRequest( + method="tools/call", + params=types.CallToolRequestParams(name=name, arguments=arguments), + ) + res = asyncio.run(handler(req)) + return res.root.content[0].text + + +def test_merged_server_exposes_all_tools(tmp_path): + """All four concern modules' tools land under one server with no collision.""" + server = _make_merged_server(tmp_path) + names = _list_tools(server) + # 8 registry + 6 knowledge + 4 memory + 5 skill = 23 distinct tools. + assert len(names) == 23 + assert len(set(names)) == len(names) # no name collision + # Representative tools from each side. 
+ for expected in ( + "get_registry", + "search_skills", + "kb_search", + "list_skill_sources", + ): + assert expected in names + + +def test_registry_tool_returns_plain_string(tmp_path): + """Registry handlers return a bare string — passed through unchanged.""" + server = _make_merged_server(tmp_path) + out = _call(server, "get_registry", {}) + # Not JSON — the registry contract is a human-readable string. + with pytest.raises(json.JSONDecodeError): + json.loads(out) + assert "tools:" in out + + +def test_knowledge_tool_returns_json(tmp_path): + """Knowledge handlers return a dict — JSON-encoded by the wrapper.""" + server = _make_merged_server(tmp_path) + out = _call(server, "list_skill_sources", {}) + parsed = json.loads(out) + assert "sources" in parsed + + +class TestBuildKbFromConfig: + """``_build_kb_from_config`` validates embedding config before building a KB. + + These raise paths fire before any embedder / ChromaDB construction, so they + need no real backend. + """ + + def _cfg(self, **embedding): + return { + "embedding": embedding, + "knowledge": {"chunk_size": 1024, "vector_db": "chroma", "rerank": False}, + } + + def test_invalid_backend_raises(self, tmp_path): + cfg = self._cfg(backend="not-a-backend") + with pytest.raises(ValueError, match="backend must be"): + _build_kb_from_config(cfg, tmp_path) + + def test_api_backend_without_base_url_raises(self, tmp_path): + cfg = self._cfg(backend="api", model="m", base_url="", api_key="k") + with pytest.raises(ValueError, match="requires embedding.base_url"): + _build_kb_from_config(cfg, tmp_path) + + def test_api_backend_without_api_key_raises(self, tmp_path): + # Unresolved ``${...}`` placeholder counts as missing. 
+ cfg = self._cfg( + backend="api", model="m", base_url="http://x", api_key="${LLM_API_KEY}" + ) + with pytest.raises(ValueError, match="requires embedding.api_key"): + _build_kb_from_config(cfg, tmp_path) diff --git a/tests/test_info.py b/tests/test_info.py index 6b1fd22..afb3f9f 100644 --- a/tests/test_info.py +++ b/tests/test_info.py @@ -150,7 +150,7 @@ def test_report_aggregates_by_source_and_session(config): "trace_metadata": _metadata( session="sess-A", agent="claude", in_t=200, out_t=0, ), - "spans": _spans_for("dsagt-knowledge-server"), + "spans": _spans_for("dsagt-server"), }, { "trace_id": "t4", @@ -175,8 +175,8 @@ def test_report_aggregates_by_source_and_session(config): assert sources["claude-code"]["input_tokens"] == 2300 assert sources["claude-code"]["output_tokens"] == 170 assert sources["claude-code"]["errors"] == 1 - assert sources["dsagt-knowledge-server"]["traces"] == 1 - assert sources["dsagt-knowledge-server"]["errors"] == 0 + assert sources["dsagt-server"]["traces"] == 1 + assert sources["dsagt-server"]["errors"] == 0 assert [s["session"] for s in r["by_session"]] == ["sess-B", "sess-A"] sess_a = next(s for s in r["by_session"] if s["session"] == "sess-A") diff --git a/tests/test_kb_search_filters.py b/tests/test_kb_search_filters.py index 73919ca..23aedb7 100644 --- a/tests/test_kb_search_filters.py +++ b/tests/test_kb_search_filters.py @@ -11,7 +11,7 @@ import pytest import mcp.types as types -from dsagt.commands.knowledge_server import create_knowledge_server +from dsagt.mcp.knowledge_tools import create_knowledge_server from mcp_helpers import call_tool_json as call_tool diff --git a/tests/test_knowledge_integration.py b/tests/test_knowledge_integration.py index 5ff6fe5..ca94a1e 100755 --- a/tests/test_knowledge_integration.py +++ b/tests/test_knowledge_integration.py @@ -23,7 +23,7 @@ pytestmark = pytest.mark.integration from dsagt.knowledge import KnowledgeBase -from dsagt.commands.knowledge_server import create_knowledge_server, 
setup_runtime_kb +from dsagt.mcp.knowledge_tools import create_knowledge_server, setup_runtime_kb from mcp_helpers import call_tool_json as call_tool, call_tool_async as _call_tool_async_raw diff --git a/tests/test_knowledge_server.py b/tests/test_knowledge_server.py index 1504490..3275fbc 100644 --- a/tests/test_knowledge_server.py +++ b/tests/test_knowledge_server.py @@ -13,13 +13,12 @@ import asyncio import json import time -from pathlib import Path -from unittest.mock import MagicMock, patch +from unittest.mock import MagicMock import pytest import mcp.types as types -from dsagt.commands.knowledge_server import create_knowledge_server, setup_runtime_kb +from dsagt.mcp.knowledge_tools import create_knowledge_server, setup_runtime_kb from mcp_helpers import call_tool_json as call_tool @@ -34,7 +33,9 @@ async def _call_tool_async(server, name: str, arguments: dict) -> dict: return json.loads(result.root.content[0].text) -async def call_tool_and_await_job(server, name: str, arguments: dict) -> tuple[dict, dict]: +async def call_tool_and_await_job( + server, name: str, arguments: dict +) -> tuple[dict, dict]: """Call a tool that starts a background job, wait for it, return (initial, final).""" initial = await _call_tool_async(server, name, arguments) assert initial["status"] == "started" @@ -50,7 +51,9 @@ async def call_tool_and_await_job(server, name: str, arguments: dict) -> tuple[d raise TimeoutError(f"Job {job_id} did not complete") -def make_search_result(text: str, source_file: str, chunk_index: int = 0, score: float = 0.9): +def make_search_result( + text: str, source_file: str, chunk_index: int = 0, score: float = 0.9 +): """Create a search result in the format KnowledgeBase.search returns.""" return { "chunk": { @@ -70,6 +73,7 @@ def make_search_result(text: str, source_file: str, chunk_index: int = 0, score: # Fixtures # --------------------------------------------------------------------------- + @pytest.fixture def mock_kb(tmp_path): """A mocked 
KnowledgeBase with default behaviors.""" @@ -86,7 +90,12 @@ def mock_kb(tmp_path): make_search_result("Second result text", "/path/to/file2.md", 1, 0.80), ] kb.ingest.return_value = {"collection": "new_docs", "files": 5, "chunks": 42} - kb.append.return_value = {"collection": "docs", "files": 2, "chunks_added": 10, "total_chunks": 50} + kb.append.return_value = { + "collection": "docs", + "files": 2, + "chunks_added": 10, + "total_chunks": 50, + } return kb @@ -100,6 +109,7 @@ def server(mock_kb): # kb_list_collections # --------------------------------------------------------------------------- + class TestListCollections: def test_returns_collections(self, server, mock_kb): @@ -128,14 +138,19 @@ def test_empty_collections(self, mock_kb): # kb_search # --------------------------------------------------------------------------- + class TestSearch: def test_search_success(self, server, mock_kb): """Successful search returns formatted results.""" - result = call_tool(server, "kb_search", { - "query": "how to install", - "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": "how to install", + "collection": "docs", + }, + ) assert result["status"] == "ok" assert result["query"] == "how to install" @@ -150,12 +165,16 @@ def test_search_success(self, server, mock_kb): def test_search_passes_parameters(self, server, mock_kb): """Search forwards top_k and rerank to the knowledge base.""" - call_tool(server, "kb_search", { - "query": "test", - "collection": "docs", - "top_k": 10, - "rerank": False, - }) + call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + "top_k": 10, + "rerank": False, + }, + ) mock_kb.search.assert_called_once_with( query="test", @@ -166,10 +185,14 @@ def test_search_passes_parameters(self, server, mock_kb): def test_search_defaults(self, server, mock_kb): """Search uses default top_k=5 and server's use_rerank setting.""" - call_tool(server, "kb_search", { - "query": "test", - 
"collection": "docs", - }) + call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) mock_kb.search.assert_called_once_with( query="test", @@ -182,10 +205,14 @@ def test_search_nonexistent_collection(self, server, mock_kb): """Searching a missing collection returns an error.""" mock_kb.search.side_effect = ValueError("Collection 'missing' not found") - result = call_tool(server, "kb_search", { - "query": "test", - "collection": "missing", - }) + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "missing", + }, + ) assert result["status"] == "error" assert "not found" in result["error"] @@ -196,10 +223,14 @@ def test_search_rerank_score_forwarded(self, server, mock_kb): {**make_search_result("text", "file.md"), "rerank_score": 0.99}, ] - result = call_tool(server, "kb_search", { - "query": "test", - "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) assert result["results"][0]["rerank_score"] == 0.99 @@ -208,6 +239,7 @@ def test_search_rerank_score_forwarded(self, server, mock_kb): # kb_ingest (background job pattern) # --------------------------------------------------------------------------- + class TestIngest: def test_ingest_returns_started(self, server, mock_kb, tmp_path): @@ -215,9 +247,13 @@ def test_ingest_returns_started(self, server, mock_kb, tmp_path): folder = tmp_path / "new_docs" folder.mkdir() - result = call_tool(server, "kb_ingest", { - "folder_path": str(folder), - }) + result = call_tool( + server, + "kb_ingest", + { + "folder_path": str(folder), + }, + ) assert result["status"] == "started" assert "job_id" in result @@ -245,23 +281,31 @@ def test_ingest_with_file_types(self, server, mock_kb, tmp_path): async def run(): await call_tool_and_await_job( - server, "kb_ingest", { + server, + "kb_ingest", + { "folder_path": str(folder), "file_types": ["md", "txt"], - } + }, ) # New server always passes 
collection_name to kb.ingest mock_kb.ingest.assert_called_once_with( - folder, collection_name="docs2", file_types=["md", "txt"], + folder, + collection_name="docs2", + file_types=["md", "txt"], ) asyncio.run(run()) def test_ingest_folder_not_found(self, server): """Ingesting a nonexistent folder returns an error immediately.""" - result = call_tool(server, "kb_ingest", { - "folder_path": "/nonexistent/folder", - }) + result = call_tool( + server, + "kb_ingest", + { + "folder_path": "/nonexistent/folder", + }, + ) assert result["status"] == "error" assert "not found" in result["error"].lower() @@ -271,9 +315,13 @@ def test_ingest_not_a_directory(self, server, tmp_path): file_path = tmp_path / "not_a_dir.txt" file_path.write_text("I'm a file") - result = call_tool(server, "kb_ingest", { - "folder_path": str(file_path), - }) + result = call_tool( + server, + "kb_ingest", + { + "folder_path": str(file_path), + }, + ) assert result["status"] == "error" assert "Not a directory" in result["error"] @@ -308,9 +356,13 @@ def test_ingest_deconflicts_existing_collection(self, server, mock_kb, tmp_path) mock_kb.ingest.return_value = {"collection": "docs1", "files": 3, "chunks": 10} - result = call_tool(server, "kb_ingest", { - "folder_path": str(folder), - }) + result = call_tool( + server, + "kb_ingest", + { + "folder_path": str(folder), + }, + ) assert result["status"] == "started" assert result["collection"] == "docs1" @@ -331,9 +383,13 @@ def test_ingest_deconflicts_symlinked_collection(self, server, mock_kb, tmp_path (mock_kb.index_dir / "docs").symlink_to(base_dir) mock_kb.ingest.return_value = {"collection": "docs1", "files": 3, "chunks": 10} - result = call_tool(server, "kb_ingest", { - "folder_path": str(folder), - }) + result = call_tool( + server, + "kb_ingest", + { + "folder_path": str(folder), + }, + ) assert result["status"] == "started" assert result["collection"] == "docs1" @@ -347,6 +403,7 @@ def test_ingest_deconflicts_symlinked_collection(self, server, 
mock_kb, tmp_path # kb_job_status # --------------------------------------------------------------------------- + class TestJobStatus: def test_unknown_job(self, server): @@ -365,12 +422,17 @@ def test_running_job(self, server, mock_kb, tmp_path): def blocking_ingest(*args, **kwargs): time.sleep(10) return {"collection": "slow_docs", "files": 1, "chunks": 5} + mock_kb.ingest.side_effect = blocking_ingest async def run(): - initial = await _call_tool_async(server, "kb_ingest", { - "folder_path": str(folder), - }) + initial = await _call_tool_async( + server, + "kb_ingest", + { + "folder_path": str(folder), + }, + ) assert initial["status"] == "started" job_id = initial["job_id"] @@ -385,6 +447,7 @@ async def run(): # kb_append (background job pattern) # --------------------------------------------------------------------------- + class TestAppend: def test_append_returns_started(self, server, mock_kb, tmp_path): @@ -394,10 +457,14 @@ def test_append_returns_started(self, server, mock_kb, tmp_path): coll_dir.mkdir(exist_ok=True) (coll_dir / "index.faiss").write_text("fake") - result = call_tool(server, "kb_append", { - "collection": "docs", - "paths": [str(tmp_path)], - }) + result = call_tool( + server, + "kb_append", + { + "collection": "docs", + "paths": [str(tmp_path)], + }, + ) assert result["status"] == "started" assert "job_id" in result @@ -411,10 +478,12 @@ def test_append_job_completes(self, server, mock_kb, tmp_path): async def run(): initial, final = await call_tool_and_await_job( - server, "kb_append", { + server, + "kb_append", + { "collection": "docs", "paths": [str(tmp_path)], - } + }, ) assert final["status"] == "complete" assert final["result"]["chunks_added"] == 10 @@ -423,10 +492,14 @@ async def run(): def test_append_collection_not_found(self, server, mock_kb): """Appending to a nonexistent collection returns an error immediately.""" - result = call_tool(server, "kb_append", { - "collection": "nonexistent", - "paths": ["/some/path"], - }) + 
result = call_tool( + server, + "kb_append", + { + "collection": "nonexistent", + "paths": ["/some/path"], + }, + ) assert result["status"] == "error" assert "not found" in result["error"].lower() @@ -436,6 +509,7 @@ def test_append_collection_not_found(self, server, mock_kb): # kb_search — error handling (transport-closed diagnostics) # --------------------------------------------------------------------------- + class TestSearchErrorHandling: """Verify the server returns error responses (not crashes) for common failure modes that would otherwise cause 'transport closed'.""" @@ -443,12 +517,18 @@ class TestSearchErrorHandling: def test_search_httpx_connect_error(self, mock_kb): """Network unreachable during search returns error, not crash.""" import httpx + mock_kb.search.side_effect = httpx.ConnectError("Connection refused") server = create_knowledge_server(mock_kb) - result = call_tool(server, "kb_search", { - "query": "test", "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) assert result["status"] == "error" assert "Connection refused" in result["error"] @@ -456,12 +536,18 @@ def test_search_httpx_connect_error(self, mock_kb): def test_search_httpx_timeout(self, mock_kb): """Embedding API timeout during search returns error, not crash.""" import httpx + mock_kb.search.side_effect = httpx.ReadTimeout("Read timed out") server = create_knowledge_server(mock_kb) - result = call_tool(server, "kb_search", { - "query": "test", "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) assert result["status"] == "error" assert "timed out" in result["error"].lower() @@ -469,16 +555,24 @@ def test_search_httpx_timeout(self, mock_kb): def test_search_httpx_401(self, mock_kb): """Expired/invalid API key during search returns error, not crash.""" import httpx + mock_resp = MagicMock() mock_resp.status_code = 401 
mock_kb.search.side_effect = httpx.HTTPStatusError( - "401 Unauthorized", request=MagicMock(), response=mock_resp, + "401 Unauthorized", + request=MagicMock(), + response=mock_resp, ) server = create_knowledge_server(mock_kb) - result = call_tool(server, "kb_search", { - "query": "test", "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) assert result["status"] == "error" assert "401" in result["error"] @@ -486,16 +580,24 @@ def test_search_httpx_401(self, mock_kb): def test_search_httpx_500(self, mock_kb): """Embedding API server error returns error, not crash.""" import httpx + mock_resp = MagicMock() mock_resp.status_code = 500 mock_kb.search.side_effect = httpx.HTTPStatusError( - "500 Internal Server Error", request=MagicMock(), response=mock_resp, + "500 Internal Server Error", + request=MagicMock(), + response=mock_resp, ) server = create_knowledge_server(mock_kb) - result = call_tool(server, "kb_search", { - "query": "test", "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) assert result["status"] == "error" assert "500" in result["error"] @@ -505,9 +607,14 @@ def test_search_runtime_error(self, mock_kb): mock_kb.search.side_effect = RuntimeError("FAISS segfault simulation") server = create_knowledge_server(mock_kb) - result = call_tool(server, "kb_search", { - "query": "test", "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) assert result["status"] == "error" assert "FAISS segfault" in result["error"] @@ -517,9 +624,14 @@ def test_search_os_error(self, mock_kb): mock_kb.search.side_effect = OSError("Permission denied: index.faiss") server = create_knowledge_server(mock_kb) - result = call_tool(server, "kb_search", { - "query": "test", "collection": "docs", - }) + result = call_tool( + server, + "kb_search", + { + "query": 
"test", + "collection": "docs", + }, + ) assert result["status"] == "error" assert "Permission denied" in result["error"] @@ -530,6 +642,7 @@ def test_search_os_error(self, mock_kb): # setup_runtime_kb # --------------------------------------------------------------------------- + class TestSetupRuntimeKb: def test_copies_collections(self, tmp_path): @@ -616,18 +729,19 @@ def test_does_not_overwrite_existing(self, tmp_path): # Regression: OpenMP duplicate library crash (transport closed) # --------------------------------------------------------------------------- + class TestOpenMPWorkaround: - """Importing knowledge_server must set KMP_DUPLICATE_LIB_OK to prevent - a fatal OpenMP crash when FAISS and sentence-transformers (PyTorch) + """Importing the knowledge tools module must set KMP_DUPLICATE_LIB_OK to + prevent a fatal OpenMP crash when FAISS and sentence-transformers (PyTorch) both bundle libomp. Without this, kb_search with rerank=true kills the server process, producing 'transport closed' in MCP clients.""" def test_kmp_duplicate_lib_ok_is_set(self): - """KMP_DUPLICATE_LIB_OK is set after importing the knowledge server.""" + """KMP_DUPLICATE_LIB_OK is set after importing dsagt.mcp.knowledge_tools.""" import os - import dsagt.commands.knowledge_server # noqa: F401 + import dsagt.mcp.knowledge_tools # noqa: F401 assert os.environ.get("KMP_DUPLICATE_LIB_OK") == "TRUE" @@ -636,6 +750,7 @@ def test_kmp_duplicate_lib_ok_is_set(self): # Regression: rerank schema default must match server config # --------------------------------------------------------------------------- + class TestRerankSchemaDefault: """The kb_search schema previously hardcoded 'default': True for the rerank parameter, causing agents to request reranking even when the @@ -665,13 +780,141 @@ def test_search_omitted_rerank_passes_none(self, mock_kb): """Omitting rerank passes None to kb.search, which resolves to kb.default_rerank internally.""" server = create_knowledge_server(mock_kb) - 
call_tool(server, "kb_search", { - "query": "test", - "collection": "docs", - }) + call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) mock_kb.search.assert_called_once_with( query="test", collection="docs", top_k=5, rerank=None, ) + + +# --------------------------------------------------------------------------- +# kb_search — multi-collection fan-out (moved from the former memory test file) +# --------------------------------------------------------------------------- + + +class TestKbSearchMultiCollection: + + def test_single_collection_backward_compat(self, server, mock_kb): + """Plain search with collection still works.""" + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collection": "docs", + }, + ) + + assert result["status"] == "ok" + mock_kb.search.assert_called_once() + + def test_multi_collection_fanout(self, server, mock_kb): + """Searching multiple collections calls search for each.""" + mock_kb.search.return_value = [ + make_search_result("result", "/file.md", 0, 0.9), + ] + + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collections": ["docs", "papers"], + }, + ) + + assert result["status"] == "ok" + assert mock_kb.search.call_count == 2 + + def test_no_collection_returns_error(self, server): + result = call_tool( + server, + "kb_search", + { + "query": "test", + }, + ) + + assert result["status"] == "error" + + def test_multi_collection_merges_results(self, server, mock_kb): + """Results from multiple collections are merged and sorted.""" + call_count = [0] + + def varying_results(**kwargs): + call_count[0] += 1 + score = 0.9 if call_count[0] == 1 else 0.7 + return [ + make_search_result( + f"result_{call_count[0]}", + f"/file_{call_count[0]}.md", + score=score, + ) + ] + + mock_kb.search.side_effect = varying_results + + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collections": ["docs", "papers"], + "top_k": 5, + }, + ) + + assert 
result["result_count"] == 2 + scores = [r["score"] for r in result["results"]] + assert scores == sorted(scores, reverse=True) + + def test_missing_collection_skipped(self, server, mock_kb): + """A missing collection logs a warning but doesn't fail the search.""" + + def search_with_error(**kwargs): + if kwargs["collection"] == "missing": + raise ValueError("Collection 'missing' not found") + return [make_search_result("result", "/file.md")] + + mock_kb.search.side_effect = search_with_error + + result = call_tool( + server, + "kb_search", + { + "query": "test", + "collections": ["docs", "missing"], + }, + ) + + assert result["status"] == "ok" + assert result["result_count"] == 1 + + +class TestKbSearchSchema: + + def _get_tool(self, server, name): + req = types.ListToolsRequest(method="tools/list") + handler = server.request_handlers[types.ListToolsRequest] + result = asyncio.run(handler(req)) + for tool in result.root.tools: + if tool.name == name: + return tool + return None + + def test_kb_search_has_collections_param(self, server): + tool = self._get_tool(server, "kb_search") + assert "collections" in tool.inputSchema["properties"] + + def test_kb_search_query_is_only_required(self, server): + tool = self._get_tool(server, "kb_search") + assert tool.inputSchema["required"] == ["query"] diff --git a/tests/test_knowledge_server_memory.py b/tests/test_knowledge_server_memory.py deleted file mode 100644 index a4d1e9f..0000000 --- a/tests/test_knowledge_server_memory.py +++ /dev/null @@ -1,278 +0,0 @@ -""" -Tests for kb_remember, kb_get_memories, and extended kb_search handlers. - -Drop this file into tests/test_knowledge_server_memory.py - -These tests use a real ExplicitMemory (file-backed, no mocking needed) -and a mocked KnowledgeBase (same pattern as existing server tests). 
-""" - -import asyncio -from pathlib import Path -from unittest.mock import MagicMock - -import pytest -import mcp.types as types - -from dsagt.commands.knowledge_server import create_knowledge_server -from dsagt.memory import ExplicitMemory -from mcp_helpers import call_tool_json as call_tool - - -def make_search_result(text, source_file, chunk_index=0, score=0.9): - return { - "chunk": { - "text": text, - "metadata": { - "source_file": source_file, - "collection": "test_collection", - "chunk_index": chunk_index, - "file_type": ".md", - }, - }, - "score": score, - } - - -# --------------------------------------------------------------------------- -# Fixtures -# --------------------------------------------------------------------------- - - -@pytest.fixture -def mock_kb(tmp_path): - kb = MagicMock() - kb.index_dir = tmp_path / "kb_index" - kb.index_dir.mkdir() - kb.list_collections.return_value = [ - {"name": "docs", "description": "Documentation"}, - ] - kb.search.return_value = [ - make_search_result("Result one", "/path/to/file.md", 0, 0.95), - ] - kb.ingest.return_value = {"collection": "docs", "files": 1, "chunks": 10} - kb.append.return_value = {"collection": "docs", "files": 1, "chunks_added": 5, "total_chunks": 15} - return kb - - -@pytest.fixture -def server(mock_kb, tmp_path): - return create_knowledge_server(mock_kb, runtime_dir=tmp_path) - - -@pytest.fixture -def memory(tmp_path): - return ExplicitMemory(runtime_dir=tmp_path) - - -# --------------------------------------------------------------------------- -# kb_remember -# --------------------------------------------------------------------------- - - -class TestKbRemember: - - def test_stores_a_fact(self, server): - result = call_tool(server, "kb_remember", { - "text": "fastp quality threshold is Q20", - }) - - assert result["status"] == "ok" - assert result["entry_id"] - assert result["total_memories"] == 1 - - def test_stores_with_metadata(self, server): - result = call_tool(server, "kb_remember", 
{ - "text": "some fact", - "category": "quality_control", - "session_id": "sess_01", - }) - - assert result["status"] == "ok" - - def test_supersede_existing(self, server): - r1 = call_tool(server, "kb_remember", { - "text": "old threshold Q20", - }) - r2 = call_tool(server, "kb_remember", { - "text": "new threshold Q30", - "supersedes": r1["entry_id"], - }) - - assert r2["status"] == "ok" - assert r2["superseded_id"] == r1["entry_id"] - assert r2["total_memories"] == 1 - - def test_supersede_nonexistent_returns_error(self, server): - result = call_tool(server, "kb_remember", { - "text": "new fact", - "supersedes": "bad_id", - }) - - assert result["status"] == "error" - assert "not found" in result["error"] - - def test_multiple_facts(self, server): - call_tool(server, "kb_remember", {"text": "fact one"}) - call_tool(server, "kb_remember", {"text": "fact two"}) - result = call_tool(server, "kb_remember", {"text": "fact three"}) - - assert result["total_memories"] == 3 - - -# --------------------------------------------------------------------------- -# kb_get_memories -# --------------------------------------------------------------------------- - - -class TestKbGetMemories: - - def test_empty_returns_zero(self, server): - result = call_tool(server, "kb_get_memories", {}) - - assert result["status"] == "ok" - assert result["count"] == 0 - assert result["memories"] == [] - - def test_returns_stored_memories(self, server): - call_tool(server, "kb_remember", {"text": "fact one"}) - call_tool(server, "kb_remember", {"text": "fact two"}) - - result = call_tool(server, "kb_get_memories", {}) - - assert result["count"] == 2 - texts = {m["text"] for m in result["memories"]} - assert texts == {"fact one", "fact two"} - - def test_excludes_superseded(self, server): - r1 = call_tool(server, "kb_remember", {"text": "old fact"}) - call_tool(server, "kb_remember", { - "text": "new fact", - "supersedes": r1["entry_id"], - }) - - result = call_tool(server, "kb_get_memories", {}) - 
- assert result["count"] == 1 - assert result["memories"][0]["text"] == "new fact" - - -# --------------------------------------------------------------------------- -# kb_search — multi-collection fan-out -# --------------------------------------------------------------------------- - - -class TestKbSearchMultiCollection: - - def test_single_collection_backward_compat(self, server, mock_kb): - """Plain search with collection still works.""" - result = call_tool(server, "kb_search", { - "query": "test", - "collection": "docs", - }) - - assert result["status"] == "ok" - mock_kb.search.assert_called_once() - - def test_multi_collection_fanout(self, server, mock_kb): - """Searching multiple collections calls search for each.""" - mock_kb.search.return_value = [ - make_search_result("result", "/file.md", 0, 0.9), - ] - - result = call_tool(server, "kb_search", { - "query": "test", - "collections": ["docs", "papers"], - }) - - assert result["status"] == "ok" - assert mock_kb.search.call_count == 2 - - def test_no_collection_returns_error(self, server): - result = call_tool(server, "kb_search", { - "query": "test", - }) - - assert result["status"] == "error" - - def test_multi_collection_merges_results(self, server, mock_kb): - """Results from multiple collections are merged and sorted.""" - call_count = [0] - - def varying_results(**kwargs): - call_count[0] += 1 - score = 0.9 if call_count[0] == 1 else 0.7 - return [make_search_result( - f"result_{call_count[0]}", - f"/file_{call_count[0]}.md", - score=score, - )] - - mock_kb.search.side_effect = varying_results - - result = call_tool(server, "kb_search", { - "query": "test", - "collections": ["docs", "papers"], - "top_k": 5, - }) - - assert result["result_count"] == 2 - scores = [r["score"] for r in result["results"]] - assert scores == sorted(scores, reverse=True) - - def test_missing_collection_skipped(self, server, mock_kb): - """A missing collection logs a warning but doesn't fail the search.""" - def 
search_with_error(**kwargs): - if kwargs["collection"] == "missing": - raise ValueError("Collection 'missing' not found") - return [make_search_result("result", "/file.md")] - - mock_kb.search.side_effect = search_with_error - - result = call_tool(server, "kb_search", { - "query": "test", - "collections": ["docs", "missing"], - }) - - assert result["status"] == "ok" - assert result["result_count"] == 1 - - -# --------------------------------------------------------------------------- -# Tool schemas -# --------------------------------------------------------------------------- - - -class TestToolSchemas: - - def _get_tool(self, server, name): - req = types.ListToolsRequest(method="tools/list") - handler = server.request_handlers[types.ListToolsRequest] - result = asyncio.run(handler(req)) - for tool in result.root.tools: - if tool.name == name: - return tool - return None - - def test_kb_remember_exists(self, server): - tool = self._get_tool(server, "kb_remember") - assert tool is not None - assert "text" in tool.inputSchema["properties"] - assert tool.inputSchema["required"] == ["text"] - - def test_kb_remember_has_optional_params(self, server): - tool = self._get_tool(server, "kb_remember") - for param in ("category", "session_id", "supersedes"): - assert param in tool.inputSchema["properties"] - - def test_kb_get_memories_exists(self, server): - tool = self._get_tool(server, "kb_get_memories") - assert tool is not None - - def test_kb_search_has_collections_param(self, server): - tool = self._get_tool(server, "kb_search") - assert "collections" in tool.inputSchema["properties"] - - def test_kb_search_query_is_only_required(self, server): - tool = self._get_tool(server, "kb_search") - assert tool.inputSchema["required"] == ["query"] diff --git a/tests/test_memory_tools.py b/tests/test_memory_tools.py new file mode 100644 index 0000000..4b05111 --- /dev/null +++ b/tests/test_memory_tools.py @@ -0,0 +1,188 @@ +""" +Tests for the explicit-memory MCP tools 
(kb_remember, kb_get_memories). + +These tests use a real ExplicitMemory (file-backed, no mocking needed) and a +mocked KnowledgeBase (same pattern as the other server tests). The tools live +in :mod:`dsagt.mcp.memory_tools`; ``create_memory_server`` exposes just that +concern for driving via ``call_tool_sync``. +""" + +import asyncio +from unittest.mock import MagicMock + +import pytest +import mcp.types as types + +from dsagt.mcp.memory_tools import create_memory_server +from dsagt.memory import ExplicitMemory +from mcp_helpers import call_tool_json as call_tool + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + + +@pytest.fixture +def mock_kb(tmp_path): + kb = MagicMock() + kb.index_dir = tmp_path / "kb_index" + kb.index_dir.mkdir() + return kb + + +@pytest.fixture +def server(mock_kb, tmp_path): + return create_memory_server(mock_kb, runtime_dir=tmp_path) + + +@pytest.fixture +def memory(tmp_path): + return ExplicitMemory(runtime_dir=tmp_path) + + +# --------------------------------------------------------------------------- +# kb_remember +# --------------------------------------------------------------------------- + + +class TestKbRemember: + + def test_stores_a_fact(self, server): + result = call_tool( + server, + "kb_remember", + { + "text": "fastp quality threshold is Q20", + }, + ) + + assert result["status"] == "ok" + assert result["entry_id"] + assert result["total_memories"] == 1 + + def test_stores_with_metadata(self, server): + result = call_tool( + server, + "kb_remember", + { + "text": "some fact", + "category": "quality_control", + "session_id": "sess_01", + }, + ) + + assert result["status"] == "ok" + + def test_supersede_existing(self, server): + r1 = call_tool( + server, + "kb_remember", + { + "text": "old threshold Q20", + }, + ) + r2 = call_tool( + server, + "kb_remember", + { + "text": "new threshold Q30", + 
"supersedes": r1["entry_id"], + }, + ) + + assert r2["status"] == "ok" + assert r2["superseded_id"] == r1["entry_id"] + assert r2["total_memories"] == 1 + + def test_supersede_nonexistent_returns_error(self, server): + result = call_tool( + server, + "kb_remember", + { + "text": "new fact", + "supersedes": "bad_id", + }, + ) + + assert result["status"] == "error" + assert "not found" in result["error"] + + def test_multiple_facts(self, server): + call_tool(server, "kb_remember", {"text": "fact one"}) + call_tool(server, "kb_remember", {"text": "fact two"}) + result = call_tool(server, "kb_remember", {"text": "fact three"}) + + assert result["total_memories"] == 3 + + +# --------------------------------------------------------------------------- +# kb_get_memories +# --------------------------------------------------------------------------- + + +class TestKbGetMemories: + + def test_empty_returns_zero(self, server): + result = call_tool(server, "kb_get_memories", {}) + + assert result["status"] == "ok" + assert result["count"] == 0 + assert result["memories"] == [] + + def test_returns_stored_memories(self, server): + call_tool(server, "kb_remember", {"text": "fact one"}) + call_tool(server, "kb_remember", {"text": "fact two"}) + + result = call_tool(server, "kb_get_memories", {}) + + assert result["count"] == 2 + texts = {m["text"] for m in result["memories"]} + assert texts == {"fact one", "fact two"} + + def test_excludes_superseded(self, server): + r1 = call_tool(server, "kb_remember", {"text": "old fact"}) + call_tool( + server, + "kb_remember", + { + "text": "new fact", + "supersedes": r1["entry_id"], + }, + ) + + result = call_tool(server, "kb_get_memories", {}) + + assert result["count"] == 1 + assert result["memories"][0]["text"] == "new fact" + + +# --------------------------------------------------------------------------- +# Tool schemas +# --------------------------------------------------------------------------- + + +class TestToolSchemas: + + def 
_get_tool(self, server, name): + req = types.ListToolsRequest(method="tools/list") + handler = server.request_handlers[types.ListToolsRequest] + result = asyncio.run(handler(req)) + for tool in result.root.tools: + if tool.name == name: + return tool + return None + + def test_kb_remember_exists(self, server): + tool = self._get_tool(server, "kb_remember") + assert tool is not None + assert "text" in tool.inputSchema["properties"] + assert tool.inputSchema["required"] == ["text"] + + def test_kb_remember_has_optional_params(self, server): + tool = self._get_tool(server, "kb_remember") + for param in ("category", "session_id", "supersedes"): + assert param in tool.inputSchema["properties"] + + def test_kb_get_memories_exists(self, server): + tool = self._get_tool(server, "kb_get_memories") + assert tool is not None diff --git a/tests/test_observability.py b/tests/test_observability.py index dd1ff46..f29d11e 100644 --- a/tests/test_observability.py +++ b/tests/test_observability.py @@ -735,7 +735,7 @@ def _make_registry_server(tmp_path): Mirrors the pattern used by tests/test_registry_server.py so we exercise the real call_tool dispatcher rather than reaching into private state. """ - from dsagt.commands.registry_server import create_registry_server + from dsagt.mcp.registry_tools import create_registry_server from dsagt.registry import ToolRegistry source_dir = tmp_path / "source_skills" @@ -788,7 +788,7 @@ def test_save_tool_spec_with_deps_nests_install_span( from mcp_helpers import call_tool_sync as call_tool # Stub the actual uv install so the test doesn't hit the network. 
- import dsagt.commands.registry_server as rs_mod + import dsagt.mcp.registry_tools as rs_mod monkeypatch.setattr( rs_mod, @@ -819,7 +819,7 @@ def test_install_dependencies_failed_records_event( an install_failed event with the error message truncated.""" from mcp_helpers import call_tool_sync as call_tool - import dsagt.commands.registry_server as rs_mod + import dsagt.mcp.registry_tools as rs_mod monkeypatch.setattr( rs_mod, diff --git a/tests/test_registry.py b/tests/test_registry.py index 2b1c74f..5d5f155 100644 --- a/tests/test_registry.py +++ b/tests/test_registry.py @@ -9,11 +9,53 @@ import yaml from dsagt.registry import ( - ToolRegistry, SkillRegistry, _wrap_executable, _uv_run_prefix, _parse_frontmatter, + ToolRegistry, + SkillRegistry, + _wrap_executable, + _uv_run_prefix, + _parse_frontmatter, + _lenient_frontmatter, render_arguments, ) +class TestLenientFrontmatter: + """Frontmatter that isn't strict YAML must still yield discovery fields. + + Real third-party skill catalogs (e.g. Genesis) ship SKILL.md files whose + unquoted ``description`` contains a colon (``...readiness levels: Level + 1...``) — invalid YAML. These must be recovered, not dropped from discovery. + """ + + def test_unquoted_colon_in_description_is_recovered(self, tmp_path): + path = tmp_path / "SKILL.md" + path.write_text( + "---\n" + "name: generating-datacards\n" + "description: Generates a datacard. 
Supports levels: Level 1, Level 2.\n" + "---\n\n# Body\n" + ) + spec = _parse_frontmatter(path) # must NOT raise + assert spec["name"] == "generating-datacards" + assert spec["description"].startswith("Generates a datacard") + assert "Level 1" in spec["description"] # colon-bearing tail preserved + + def test_lenient_parses_inline_list_and_continuation(self): + spec = _lenient_frontmatter( + "\nname: x\ndescription: a: b: c\ntags: [one, two]\n" + ) + assert spec["name"] == "x" + assert spec["description"] == "a: b: c" # split on first colon only + assert spec["tags"] == ["one", "two"] + + def test_valid_yaml_still_uses_strict_path(self, tmp_path): + # Sanity: well-formed frontmatter is unchanged by the fallback. + path = tmp_path / "SKILL.md" + path.write_text("---\nname: ok\ndescription: clean\ntags:\n - a\n---\n") + spec = _parse_frontmatter(path) + assert spec == {"name": "ok", "description": "clean", "tags": ["a"]} + + # --------------------------------------------------------------------------- # Fixtures # --------------------------------------------------------------------------- @@ -87,6 +129,7 @@ def empty_registry(tmp_path): # list_tools # --------------------------------------------------------------------------- + class TestListTools: def test_schema_structure(self, registry): @@ -146,6 +189,7 @@ def test_no_params_tool(self, tmp_path): # get_tool # --------------------------------------------------------------------------- + class TestGetTool: def test_found(self, registry): @@ -164,6 +208,7 @@ def test_not_found(self, registry): # save_tool # --------------------------------------------------------------------------- + class TestSaveTool: def test_add_new_tool(self, empty_registry): @@ -177,27 +222,41 @@ def test_add_new_tool(self, empty_registry): def test_wraps_executable_with_dsagt_run(self, empty_registry): """save_tool automatically wraps the executable with dsagt-run.""" - empty_registry.save_tool({"name": "mytool", "description": "test", - 
"executable": "python mytool.py", "parameters": {}}) + empty_registry.save_tool( + { + "name": "mytool", + "description": "test", + "executable": "python mytool.py", + "parameters": {}, + } + ) tool = empty_registry.get_tool("mytool") assert tool["executable"] == "dsagt-run --tool mytool -- python mytool.py" def test_does_not_double_wrap(self, empty_registry): """If executable already has dsagt-run, don't wrap again.""" - empty_registry.save_tool({"name": "mytool", "description": "test", - "executable": "dsagt-run --tool mytool -- python mytool.py", - "parameters": {}}) + empty_registry.save_tool( + { + "name": "mytool", + "description": "test", + "executable": "dsagt-run --tool mytool -- python mytool.py", + "parameters": {}, + } + ) tool = empty_registry.get_tool("mytool") assert tool["executable"].count("dsagt-run") == 1 def test_python_deps_use_uv_run(self, empty_registry): """Python dependencies are wrapped with uv run --with.""" - empty_registry.save_tool({ - "name": "analyzer", "description": "test", - "executable": "python analyzer.py", - "parameters": {}, - "dependencies": ["pandas>=2.0", "numpy"], - }) + empty_registry.save_tool( + { + "name": "analyzer", + "description": "test", + "executable": "python analyzer.py", + "parameters": {}, + "dependencies": ["pandas>=2.0", "numpy"], + } + ) tool = empty_registry.get_tool("analyzer") assert tool["executable"] == ( "dsagt-run --tool analyzer -- uv run --with pandas>=2.0,numpy -- python analyzer.py" @@ -205,8 +264,14 @@ def test_python_deps_use_uv_run(self, empty_registry): def test_no_deps_no_uv_run(self, empty_registry): """Tools without dependencies don't get uv run prefix.""" - empty_registry.save_tool({"name": "simple", "description": "test", - "executable": "echo hi", "parameters": {}}) + empty_registry.save_tool( + { + "name": "simple", + "description": "test", + "executable": "echo hi", + "parameters": {}, + } + ) tool = empty_registry.get_tool("simple") assert "uv run" not in tool["executable"] assert 
tool["executable"] == "dsagt-run --tool simple -- echo hi" @@ -247,6 +312,7 @@ def test_update_overwrites_frontmatter(self, empty_registry): # Runtime isolation # --------------------------------------------------------------------------- + class TestRuntimeIsolation: def test_source_unchanged_after_init(self, tmp_path): @@ -277,6 +343,7 @@ def test_source_unchanged_after_init(self, tmp_path): # Default skills # --------------------------------------------------------------------------- + class TestDefaultTools: """Validate the tool files that ship with the package.""" @@ -310,6 +377,7 @@ def test_default_init_fallback(self, tmp_path): # render_arguments # --------------------------------------------------------------------------- + class TestRenderArguments: def test_default_cli_is_double_dash_name(self): @@ -353,7 +421,8 @@ def test_positionals_before_named(self): "path": {"type": "string", "cli": "positional:0"}, } assert render_arguments(params, {"path": "/x", "verbose": True}) == [ - "/x", "--verbose", + "/x", + "--verbose", ] def test_boolean_true_emits_flag(self): @@ -378,7 +447,9 @@ def test_optional_missing_skipped(self): assert render_arguments(params, {}) == [] def test_required_missing_raises(self): - params = {"directory": {"type": "string", "cli": "positional", "required": True}} + params = { + "directory": {"type": "string", "cli": "positional", "required": True} + } with pytest.raises(ValueError, match="directory"): render_arguments(params, {}) diff --git a/tests/test_registry_server.py b/tests/test_registry_server.py index a7e8cb2..dde1ba9 100644 --- a/tests/test_registry_server.py +++ b/tests/test_registry_server.py @@ -5,7 +5,6 @@ read_file, run_command, install_dependencies. 
""" -import json import subprocess import sys from pathlib import Path @@ -14,13 +13,17 @@ import pytest import yaml -from dsagt.registry import SkillRegistry, ToolRegistry -from dsagt.commands.registry_server import create_registry_server +from dsagt.registry import ToolRegistry +from dsagt.mcp.registry_tools import create_registry_server from mcp_helpers import call_tool_sync as call_tool -def make_spec(name="test_tool", description="A test tool", executable="echo hello", - dependencies=None): +def make_spec( + name="test_tool", + description="A test tool", + executable="echo hello", + dependencies=None, +): """Create a minimal valid tool spec.""" spec = { "name": name, @@ -59,7 +62,7 @@ def _make_server(tmp_path, tools=None): runtime_dir = tmp_path / "runtime" project_tools_dir = runtime_dir / "tools" project_tools_dir.mkdir(parents=True, exist_ok=True) - for spec in (tools or []): + for spec in tools or []: _write_tool(project_tools_dir, spec) reg = ToolRegistry( source_tools_dir=str(source_dir), @@ -72,6 +75,7 @@ def _make_server(tmp_path, tools=None): # Fixtures # --------------------------------------------------------------------------- + @pytest.fixture def server_and_registry(tmp_path): return _make_server(tmp_path) @@ -89,10 +93,13 @@ def registry(server_and_registry): @pytest.fixture def populated(tmp_path): - server, reg = _make_server(tmp_path, tools=[ - make_spec("tool_alpha", "Alpha tool", "python alpha.py"), - make_spec("tool_beta", "Beta data processor", "python beta.py"), - ]) + server, reg = _make_server( + tmp_path, + tools=[ + make_spec("tool_alpha", "Alpha tool", "python alpha.py"), + make_spec("tool_beta", "Beta data processor", "python beta.py"), + ], + ) return server, reg @@ -105,6 +112,7 @@ def populated_server(populated): # save_tool_spec # --------------------------------------------------------------------------- + class TestSaveToolSpec: def test_add_new_tool(self, server, registry): @@ -118,8 +126,16 @@ def test_add_new_tool(self, 
server, registry): def test_update_existing_tool(self, server, registry): """Saving a spec with the same name updates rather than duplicates.""" - call_tool(server, "save_tool_spec", {"spec": make_spec("my_tool", description="Version 1")}) - text = call_tool(server, "save_tool_spec", {"spec": make_spec("my_tool", description="Version 2")}) + call_tool( + server, + "save_tool_spec", + {"spec": make_spec("my_tool", description="Version 1")}, + ) + text = call_tool( + server, + "save_tool_spec", + {"spec": make_spec("my_tool", description="Version 2")}, + ) assert "updated" in text assert "1 tools" in text @@ -138,6 +154,7 @@ def test_accepts_stringified_spec(self, server, registry): """Some MCP clients (Claude Sonnet/Haiku 4.x) send nested-object args as JSON strings. The handler must accept both shapes.""" import json + spec = make_spec("stringy_tool") text = call_tool(server, "save_tool_spec", {"spec": json.dumps(spec)}) @@ -153,83 +170,9 @@ def test_rejects_invalid_stringified_spec(self, server, registry): # --------------------------------------------------------------------------- -# save_skill +# install_skill # --------------------------------------------------------------------------- -class TestSaveSkill: - - def test_add_new_skill_creates_files_and_indexes(self, tmp_path): - """save_skill writes SKILL.md and indexes when KB is configured. - - The skill count includes any bundled skills that ship in the - package (see SkillRegistry.list_skills which merges bundled + - project layers), so we assert the file was created and the - count went up by one rather than equality on a specific number. 
- """ - server, reg, kb = _make_server_with_kb(tmp_path) - from dsagt.registry import SkillRegistry as _SR - skill_reg = _SR(runtime_dir=str(tmp_path / "runtime"), kb=kb) - before = len(skill_reg.list_skills()) - - spec = { - "name": "csv_inspector", - "description": "Workflow for inspecting CSV columns and quality", - "tags": ["data_management", "quality_control"], - } - body = "# csv_inspector\n\nFirst, run head on the file. Then check nulls.\n" - text = call_tool(server, "save_skill", {"spec": spec, "body": body}) - - assert "added" in text - skill_md = tmp_path / "runtime" / "skills" / "csv_inspector" / "SKILL.md" - assert skill_md.exists() - content = skill_md.read_text() - assert "csv_inspector" in content - assert "First, run head" in content - after = len(skill_reg.list_skills()) - assert after == before + 1 - - def test_update_existing_skill_preserves_body_when_omitted(self, tmp_path): - """Saving a spec for an existing skill without body keeps the body.""" - server, reg, kb = _make_server_with_kb(tmp_path) - first_body = "# orig\n\nOriginal workflow body.\n" - call_tool(server, "save_skill", { - "spec": {"name": "wf", "description": "v1"}, - "body": first_body, - }) - # Update the description only — body should be preserved. 
- text = call_tool(server, "save_skill", { - "spec": {"name": "wf", "description": "v2 description"}, - }) - assert "updated" in text - skill_md = tmp_path / "runtime" / "skills" / "wf" / "SKILL.md" - content = skill_md.read_text() - assert "v2 description" in content - assert "Original workflow body" in content - - def test_save_skill_writes_reference_files(self, tmp_path): - """reference_files dict lands as additional files in the skill dir.""" - server, reg, kb = _make_server_with_kb(tmp_path) - text = call_tool(server, "save_skill", { - "spec": {"name": "with_template", "description": "Has a template"}, - "body": "# with_template\n\nUses template.json.\n", - "reference_files": {"template.json": '{"foo": "bar"}\n'}, - }) - assert "added" in text - skill_dir = tmp_path / "runtime" / "skills" / "with_template" - assert (skill_dir / "SKILL.md").exists() - assert (skill_dir / "template.json").read_text() == '{"foo": "bar"}\n' - - def test_save_skill_string_encoded_spec(self, tmp_path): - """MCP clients that JSON-encode nested object args still work.""" - server, reg, kb = _make_server_with_kb(tmp_path) - spec_json = json.dumps({"name": "s1", "description": "d"}) - text = call_tool(server, "save_skill", {"spec": spec_json, "body": "x"}) - assert "added" in text - - -# --------------------------------------------------------------------------- -# get_registry -# --------------------------------------------------------------------------- class TestGetRegistry: @@ -253,6 +196,7 @@ def test_populated_registry(self, populated_server, populated): # search_registry # --------------------------------------------------------------------------- + class TestSearchRegistryNoKB: """search_registry with no KB configured. 
@@ -266,12 +210,16 @@ class TestSearchRegistryNoKB: def test_exact_name_lookup_works_without_kb(self, populated_server): """tool_name lookup is KB-free and must keep working.""" - text = call_tool(populated_server, "search_registry", {"tool_name": "tool_alpha"}) + text = call_tool( + populated_server, "search_registry", {"tool_name": "tool_alpha"} + ) assert "tool_alpha" in text def test_exact_name_miss_without_kb(self, populated_server): """tool_name with a non-existent name returns a clean 'no tool' message.""" - text = call_tool(populated_server, "search_registry", {"tool_name": "nonexistent"}) + text = call_tool( + populated_server, "search_registry", {"tool_name": "nonexistent"} + ) assert "No tool named 'nonexistent'" in text def test_query_search_without_kb_returns_helpful_error(self, populated_server): @@ -291,6 +239,7 @@ def test_empty_query_without_kb_returns_helpful_error(self, populated_server): # read_file # --------------------------------------------------------------------------- + class TestReadFile: def test_read_success(self, server, tmp_path): @@ -311,31 +260,44 @@ def test_read_missing_file(self, server): # run_command # --------------------------------------------------------------------------- + class TestRunCommand: def test_success(self, server): """Running a valid command returns its output.""" - text = call_tool(server, "run_command", { - "command": "echo", - "args": ["hello"], - }) + text = call_tool( + server, + "run_command", + { + "command": "echo", + "args": ["hello"], + }, + ) assert "hello" in text assert "Return code: 0" in text def test_command_not_found(self, server): """Running a nonexistent command returns not found error.""" - text = call_tool(server, "run_command", { - "command": "nonexistent_command_xyz", - }) + text = call_tool( + server, + "run_command", + { + "command": "nonexistent_command_xyz", + }, + ) assert "not found" in text def test_timeout(self, server): """A command that exceeds the timeout reports timeout.""" 
- text = call_tool(server, "run_command", { - "command": "sleep", - "args": ["30"], - "timeout": 0.1, - }) + text = call_tool( + server, + "run_command", + { + "command": "sleep", + "args": ["30"], + "timeout": 0.1, + }, + ) assert "timed out" in text @@ -343,9 +305,10 @@ def test_timeout(self, server): # save_tool_spec — dependency installation # --------------------------------------------------------------------------- + class TestSaveToolSpecDependencies: - @patch("dsagt.commands.registry_server.subprocess.run") + @patch("dsagt.mcp.registry_tools.subprocess.run") def test_deps_installed_on_save(self, mock_run, server, registry): """When dependencies are provided, uv pip install is called.""" mock_run.return_value = MagicMock( @@ -358,10 +321,17 @@ def test_deps_installed_on_save(self, mock_run, server, registry): assert "Successfully installed" in text mock_run.assert_called_once() cmd = mock_run.call_args[0][0] - assert cmd == ["uv", "pip", "install", "--python", sys.executable, - "pandas>=2.0", "numpy"] - - @patch("dsagt.commands.registry_server.subprocess.run") + assert cmd == [ + "uv", + "pip", + "install", + "--python", + sys.executable, + "pandas>=2.0", + "numpy", + ] + + @patch("dsagt.mcp.registry_tools.subprocess.run") def test_deps_failure_still_saves_spec(self, mock_run, server, registry): """Even if uv pip install fails, the spec is saved as a skill file.""" mock_run.return_value = MagicMock( @@ -376,7 +346,7 @@ def test_deps_failure_still_saves_spec(self, mock_run, server, registry): assert tool is not None assert tool["dependencies"] == ["bogus-pkg"] - @patch("dsagt.commands.registry_server.subprocess.run") + @patch("dsagt.mcp.registry_tools.subprocess.run") def test_deps_timeout(self, mock_run, server): """Timeout during install is reported, spec is still saved.""" mock_run.side_effect = subprocess.TimeoutExpired("uv", 120) @@ -394,7 +364,7 @@ def test_no_deps_no_install_message(self, server, registry): assert "added" in text assert "Dependency" 
not in text - @patch("dsagt.commands.registry_server.subprocess.run") + @patch("dsagt.mcp.registry_tools.subprocess.run") def test_deps_persisted_in_skill_file(self, mock_run, server, registry): """Dependencies are stored in the skill file frontmatter.""" mock_run.return_value = MagicMock(returncode=0, stdout="ok", stderr="") @@ -404,7 +374,7 @@ def test_deps_persisted_in_skill_file(self, mock_run, server, registry): tool = registry.get_tool("dep_tool") assert tool["dependencies"] == ["requests>=2.28"] - @patch("dsagt.commands.registry_server.subprocess.run") + @patch("dsagt.mcp.registry_tools.subprocess.run") def test_uv_not_found(self, mock_run, server): """FileNotFoundError from missing uv is reported gracefully.""" mock_run.side_effect = FileNotFoundError("uv") @@ -419,15 +389,19 @@ def test_uv_not_found(self, mock_run, server): # install_dependencies # --------------------------------------------------------------------------- + class TestInstallDependencies: - @patch("dsagt.commands.registry_server.subprocess.run") + @patch("dsagt.mcp.registry_tools.subprocess.run") def test_install_all(self, mock_run, tmp_path): """install_dependencies with no tool_name installs all unique deps.""" - server, reg = _make_server(tmp_path, tools=[ - make_spec("tool_a", dependencies=["pandas", "numpy"]), - make_spec("tool_b", dependencies=["numpy", "scipy"]), - ]) + server, reg = _make_server( + tmp_path, + tools=[ + make_spec("tool_a", dependencies=["pandas", "numpy"]), + make_spec("tool_b", dependencies=["numpy", "scipy"]), + ], + ) mock_run.return_value = MagicMock(returncode=0, stdout="ok", stderr="") text = call_tool(server, "install_dependencies", {}) @@ -435,16 +409,27 @@ def test_install_all(self, mock_run, tmp_path): assert "tool_a" in text assert "tool_b" in text cmd = mock_run.call_args[0][0] - assert cmd == ["uv", "pip", "install", "--python", sys.executable, - "pandas", "numpy", "scipy"] - - @patch("dsagt.commands.registry_server.subprocess.run") + assert cmd == [ + 
"uv", + "pip", + "install", + "--python", + sys.executable, + "pandas", + "numpy", + "scipy", + ] + + @patch("dsagt.mcp.registry_tools.subprocess.run") def test_install_single_tool(self, mock_run, tmp_path): """install_dependencies with tool_name targets only that tool.""" - server, reg = _make_server(tmp_path, tools=[ - make_spec("tool_a", dependencies=["pandas"]), - make_spec("tool_b", dependencies=["scipy"]), - ]) + server, reg = _make_server( + tmp_path, + tools=[ + make_spec("tool_a", dependencies=["pandas"]), + make_spec("tool_b", dependencies=["scipy"]), + ], + ) mock_run.return_value = MagicMock(returncode=0, stdout="ok", stderr="") text = call_tool(server, "install_dependencies", {"tool_name": "tool_b"}) @@ -471,6 +456,7 @@ def test_tools_without_deps(self, tmp_path): # KB-backed tool indexing and search # --------------------------------------------------------------------------- + def _make_server_with_kb(tmp_path, tools=None): """Create (server, registry, kb) with a real local-embedding KnowledgeBase. 
@@ -484,7 +470,7 @@ def _make_server_with_kb(tmp_path, tools=None): runtime_dir = tmp_path / "runtime" project_tools_dir = runtime_dir / "tools" project_tools_dir.mkdir(parents=True, exist_ok=True) - for spec in (tools or []): + for spec in tools or []: _write_tool(project_tools_dir, spec) kb = KnowledgeBase( @@ -497,12 +483,7 @@ def _make_server_with_kb(tmp_path, tools=None): runtime_dir=str(runtime_dir), kb=kb, ) - skill_reg = SkillRegistry( - source_skills_dir=None, # use package default (empty bundled is fine) - runtime_dir=str(runtime_dir), - kb=kb, - ) - server = create_registry_server(reg, kb, skill_reg) + server = create_registry_server(reg, kb) return server, reg, kb @@ -515,10 +496,16 @@ def test_save_tool_indexes_into_kb(self, tmp_path): server, reg, kb = _make_server_with_kb(tmp_path) - call_tool(server, "save_tool_spec", {"spec": make_spec( - name="csv_filter", - description="Filter CSV rows by column value", - )}) + call_tool( + server, + "save_tool_spec", + { + "spec": make_spec( + name="csv_filter", + description="Filter CSV rows by column value", + ) + }, + ) results = kb.search("filter", collection=TOOL_REGISTRY_COLLECTION) assert len(results) > 0 @@ -542,12 +529,20 @@ def test_search_registry_by_name_not_found(self, tmp_path): def test_search_registry_semantic(self, tmp_path): """Semantic search finds tools by description similarity.""" server, reg, kb = _make_server_with_kb(tmp_path) - call_tool(server, "save_tool_spec", {"spec": make_spec( - name="csv_filter", - description="Filter and remove rows from a CSV spreadsheet based on column values", - )}) + call_tool( + server, + "save_tool_spec", + { + "spec": make_spec( + name="csv_filter", + description="Filter and remove rows from a CSV spreadsheet based on column values", + ) + }, + ) - text = call_tool(server, "search_registry", {"query": "delete rows from tabular data"}) + text = call_tool( + server, "search_registry", {"query": "delete rows from tabular data"} + ) assert "csv_filter" in text 
def test_search_registry_by_tag(self, tmp_path): @@ -562,7 +557,9 @@ def test_search_registry_by_tag(self, tmp_path): spec_other["tags"] = ["data_processing"] call_tool(server, "save_tool_spec", {"spec": spec_other}) - text = call_tool(server, "search_registry", {"query": "tool", "tag": "genomics"}) + text = call_tool( + server, "search_registry", {"query": "tool", "tag": "genomics"} + ) assert "fastp" in text def test_reindex_all(self, tmp_path): @@ -571,7 +568,9 @@ def test_reindex_all(self, tmp_path): server, reg, kb = _make_server_with_kb( tmp_path, - tools=[make_spec(name="preexisting", description="Already registered tool")], + tools=[ + make_spec(name="preexisting", description="Already registered tool") + ], ) # Skills were copied to runtime on init but not indexed (KB was empty) @@ -592,9 +591,12 @@ def test_no_kb_query_search_returns_explicit_error(self, tmp_path): and produced dramatically worse search results without telling anyone. """ - server, reg = _make_server(tmp_path, tools=[ - make_spec(name="csv_filter", description="Filter CSV rows"), - ]) + server, reg = _make_server( + tmp_path, + tools=[ + make_spec(name="csv_filter", description="Filter CSV rows"), + ], + ) text = call_tool(server, "search_registry", {"query": "csv"}) assert "csv_filter" not in text # the substring match must NOT happen diff --git a/tests/test_server_startup.py b/tests/test_server_startup.py index fdf5fd4..6d3960e 100644 --- a/tests/test_server_startup.py +++ b/tests/test_server_startup.py @@ -1,160 +1,54 @@ """ -Process-level MCP server startup tests. +Process-level ``dsagt-server`` entry-point tests. -Each server is spawned as a subprocess and must complete the MCP handshake -without an API key or network access. End-to-end ingest/search flows -belong in smoke_test (which drives them through the real agent), not here. 
+The single merged server (``dsagt.mcp.server:main``) is spawned as a subprocess +to verify the entry point is wired and fails fast + clearly on a misconfigured +project — without needing a live MLflow backend or network access. + +The full boot (init_tracing → shared KB → 23-tool MCP handshake) requires a +running MLflow server, so it is exercised by ``dsagt smoke-test`` (real agent), +not here. The 23-tool composition + dispatch contract is unit-tested in-process +by ``test_dsagt_server.py``; ``_build_kb_from_config``'s credential validation by +``test_dsagt_server.py::TestBuildKbFromConfig``. """ -import os import shutil import sys -from pathlib import Path import pytest -from mcp_helpers import mcp_initialize, mcp_list_tools, start_server - - -# --------------------------------------------------------------------------- -# Skip conditions -# --------------------------------------------------------------------------- +from mcp_helpers import start_server _uv_available = shutil.which("uv") is not None pytestmark = pytest.mark.skipif(not _uv_available, reason="uv not available") - -def _write_minimal_config( - project_dir: Path, - project_name: str = "test", - backend: str = "api", -) -> None: - """Write the minimum dsagt_config.yaml the servers need to start. - - Default backend is ``"api"`` here (with fake credentials) so startup - doesn't trigger a sentence-transformers model load — keeps these - tests fast. Tests that need to exercise the local backend pass - ``backend="local"`` explicitly. 
- """ - import yaml - config = { - "project": project_name, - "agent": "claude", - "embedding": { - "backend": backend, - "model": "test-model", - "base_url": "http://localhost:9999", - "api_key": "test-fake-key", - }, - "knowledge": { - "chunk_size": 1024, - "vector_db": "chroma", - "rerank": False, - }, - } - (project_dir / "dsagt_config.yaml").write_text( - yaml.dump(config, default_flow_style=False) - ) +_SERVER_CMD = [sys.executable, "-m", "dsagt.mcp.server"] -# --------------------------------------------------------------------------- -# Startup tests (no API key needed) -# --------------------------------------------------------------------------- - -class TestRegistryServerStartup: - - def test_starts_and_lists_tools(self, tmp_path): - """Registry server starts without embedding credentials (kb=None) - and completes the MCP handshake.""" - import yaml as _yaml - project = tmp_path / "runtime" - project.mkdir() - # backend="api" with empty credentials → kb_available=False → - # registry runs with kb=None. This is the fast path the test - # cares about (no ChromaDB / sentence-transformers init). With - # backend="local" (the default for new projects) registry would - # eagerly load the local embedder, which is the slow path - # exercised by other tests. - (project / "dsagt_config.yaml").write_text(_yaml.dump({ - "project": "test", - "agent": "claude", - "embedding": { - "backend": "api", - "model": "", "base_url": "", "api_key": "", - }, - "knowledge": {"chunk_size": 1024, "vector_db": "chroma", "rerank": False}, - })) - proc = start_server( - [sys.executable, "-m", "dsagt.commands.registry_server"], - env={ - "DSAGT_PROJECT_DIR": str(project), - # init_tracing raises without a backend; give it a local - # file store so the handshake proceeds without needing an - # MLflow server. 
- "MLFLOW_TRACKING_URI": f"file://{tmp_path}/mlruns", - }, - ) - try: - resp = mcp_initialize(proc) - assert "result" in resp, f"Init failed: {resp}" +class TestServerEntryPoint: - tools_resp = mcp_list_tools(proc) - assert "result" in tools_resp, f"list_tools failed: {tools_resp}" + def test_fails_fast_without_project_config(self, tmp_path): + """Run from a dir with no dsagt_config.yaml → clean fail-fast. - tool_names = [t["name"] for t in tools_resp["result"]["tools"]] - assert "save_tool_spec" in tool_names - assert "install_dependencies" in tool_names - finally: - proc.terminate() - proc.wait(timeout=5) - - -class TestKnowledgeServerStartup: - - def test_starts_and_lists_tools(self, tmp_path): - """Knowledge server starts, completes MCP handshake, and lists tools. - - Uses a fake API key — server accepts it at init time since it only - validates the key is non-empty. Actual API calls would fail, but we - don't make any here. + ``dsagt-server`` discovers its project from cwd; launched anywhere else + it must say so rather than boot a half-configured server. 
""" - project = tmp_path / "runtime" - project.mkdir() - _write_minimal_config(project) - proc = start_server( - [sys.executable, "-m", "dsagt.commands.knowledge_server"], - env={ - "DSAGT_PROJECT_DIR": str(project), - "LLM_API_KEY": "test-fake-key-for-startup", - "MLFLOW_TRACKING_URI": f"file://{tmp_path}/mlruns", - }, - ) - try: - resp = mcp_initialize(proc) - assert "result" in resp, f"Init failed: {resp}" - - tools_resp = mcp_list_tools(proc) - assert "result" in tools_resp, f"list_tools failed: {tools_resp}" - - tool_names = [t["name"] for t in tools_resp["result"]["tools"]] - assert "kb_search" in tool_names - assert "kb_list_collections" in tool_names - assert "kb_ingest" in tool_names - finally: - proc.terminate() - proc.wait(timeout=5) - - def test_api_backend_fails_fast_without_api_key(self, tmp_path): - """When the user explicitly opts into backend='api', the knowledge - server must fail at startup if the api key is missing — better than - booting with a broken embedder that 401s on the first kb_search. - - With the post-refactor default of backend='local' (no creds - needed), this hard-fail only fires for users who have explicitly - said they want the api backend. + empty = tmp_path / "empty" + empty.mkdir() + proc = start_server(_SERVER_CMD, cwd=str(empty)) + rc = proc.wait(timeout=15) + assert rc != 0 + assert "no dsagt_config.yaml in cwd" in proc.stderr.read() + + def test_fails_fast_without_observability_backend(self, tmp_path): + """A project config lacking ``mlflow.port`` → init_tracing fails fast. + + The merged server requires an observability backend (it autologs every + LLM call into MLflow); booting without one is a misconfiguration. 
""" import yaml + project = tmp_path / "runtime" project.mkdir() config = { @@ -164,22 +58,15 @@ def test_api_backend_fails_fast_without_api_key(self, tmp_path): "backend": "api", "model": "test-model", "base_url": "http://localhost:9999", - "api_key": "${LLM_API_KEY}", # unresolved placeholder + "api_key": "test-fake-key", }, "knowledge": {"chunk_size": 1024, "vector_db": "chroma", "rerank": False}, + # no mlflow.port } (project / "dsagt_config.yaml").write_text( yaml.dump(config, default_flow_style=False) ) - - stripped = {"LLM_API_KEY", "OPENAI_API_KEY"} - clean_env = {k: v for k, v in os.environ.items() if k not in stripped} - clean_env["DSAGT_PROJECT_DIR"] = str(project) - clean_env["MLFLOW_TRACKING_URI"] = f"file://{tmp_path}/mlruns" - - proc = start_server( - [sys.executable, "-m", "dsagt.commands.knowledge_server"], - env=clean_env, - ) - rc = proc.wait(timeout=10) - assert rc != 0, "backend='api' without api_key must fail at startup" + proc = start_server(_SERVER_CMD, cwd=str(project)) + rc = proc.wait(timeout=15) + assert rc != 0 + assert "no observability backend configured" in proc.stderr.read() diff --git a/tests/test_skill_discovery.py b/tests/test_skill_discovery.py new file mode 100644 index 0000000..489bcf5 --- /dev/null +++ b/tests/test_skill_discovery.py @@ -0,0 +1,228 @@ +"""Unit tests for the keyword scorer and SkillRouter (no network, no embedder). + +The router is exercised in its KB-free keyword mode against a real +``SkillRegistry`` (bundled skills suppressed via an empty source dir) plus a +fake catalog cache, and in KB mode against a small fake KnowledgeBase. 
+""" + +import json + +from dsagt.registry import SkillRegistry +from dsagt.skills import SkillRouter, rank_skills, score_skill + + +def _mkskill(d, name, desc): + d.mkdir(parents=True, exist_ok=True) + (d / "SKILL.md").write_text( + f"---\nname: {name}\ndescription: {desc}\n---\n# {name}\nbody\n" + ) + return d + + +def _registry(tmp_path, skills=None): + """SkillRegistry with bundled skills suppressed and optional project skills.""" + empty_bundled = tmp_path / "no_bundled" + empty_bundled.mkdir() + reg = SkillRegistry( + runtime_dir=tmp_path / "proj", + source_skills_dir=str(empty_bundled), + kb=None, + ) + for name, desc in (skills or {}).items(): + _mkskill(reg.skills_dir / name, name, desc) + return reg + + +class FakeKB: + """Minimal duck-typed KnowledgeBase: collections + search + index_dir.""" + + def __init__(self, collections, hits_by_collection=None, index_dir="/tmp/none"): + self.collections = list(collections) + self._hits = hits_by_collection or {} + self.index_dir = index_dir + + def search(self, query, collection, top_k=5): + return self._hits.get(collection, [])[:top_k] + + +def _hit(name, text, source, score, tags=""): + return { + "chunk": { + "metadata": {"skill_name": name, "source": source, "tags": tags}, + "text": text, + }, + "score": score, + } + + +# --------------------------------------------------------------------------- +# keyword scorer +# --------------------------------------------------------------------------- + + +def test_score_name_token_weighs_more_than_description(): + name_hit = score_skill("slurm", "slurm-submit", "unrelated text") + desc_hit = score_skill("slurm", "other", "submit a slurm job") + assert name_hit > desc_hit + + +def test_score_exact_name_beats_substring(): + exact = score_skill("datacard", "datacard", "x") + substr = score_skill("datacard", "datacard-generator", "x") + assert exact > substr > 0 + + +def test_score_substring_description_bonus(): + assert score_skill("batch job", "x", "submit a batch job") 
> 0 + + +def test_stopwords_do_not_score(): + # only stopwords overlap → no score + assert score_skill("the and of", "the skill", "and of the") == 0.0 + + +def test_empty_query_scores_zero(): + assert score_skill("", "anything", "anything") == 0.0 + + +def test_substring_bonuses_are_mutually_exclusive(): + # Genesis parity: at most ONE of the +6/+4/+2 substring bonuses fires. + # name-token 2 + desc-token 1 + exact-name 6 = 9; the desc-substring +2 is + # NOT also added (a stacking bug would give 11). + assert score_skill("alpha", "alpha", "alpha tool") == 9.0 + + +def test_single_char_tokens_dropped(): + # "x" is a single char → not a token, so no name-token overlap. + assert score_skill("x", "x", "y") == 6.0 # only the exact-name bonus + + +def test_rank_orders_and_breaks_ties_by_name(): + skills = [ + {"name": "zeta", "description": "submit jobs"}, + {"name": "alpha", "description": "submit jobs"}, + {"name": "unrelated", "description": "nothing here"}, + ] + ranked = rank_skills("submit", skills, top_k=5) + names = [s["name"] for s, _ in ranked] + assert names == ["alpha", "zeta"] # equal score → name asc; unrelated dropped + + +# --------------------------------------------------------------------------- +# keyword-mode search (kb is None) +# --------------------------------------------------------------------------- + + +def test_search_keyword_is_catalog_only(tmp_path): + # Installed skills are NOT search candidates (they're natively discovered); + # only the cached catalog is keyword-scored. 
+ reg = _registry(tmp_path, {"slurm-submit": "submit a batch job to slurm"}) + cache = tmp_path / "cache" + _mkskill( + cache / "genesis" / "slurm-catalog", + "slurm-catalog", + "submit a batch job to slurm", + ) + r = SkillRouter(skill_registry=reg, cache_dir=cache) + out = r.search("slurm batch") + assert "slurm-catalog" in out # catalog skill found + assert "slurm-submit" not in out # installed skill NOT surfaced by search + assert "[catalog · install_skill to add]" in out + + +def test_search_keyword_includes_catalog_cache(tmp_path): + reg = _registry(tmp_path) + cache = tmp_path / "cache" + _mkskill( + cache / "genesis-skills" / "croissant", + "croissant-validator", + "validate a croissant metadata file", + ) + r = SkillRouter(skill_registry=reg, cache_dir=cache) + out = r.search("croissant") + assert "croissant-validator" in out + assert "[catalog · install_skill to add]" in out + + +def test_search_is_stateless(tmp_path): + # No recency queue: repeating a query yields the same result, no suppression. + reg = _registry(tmp_path) + cache = tmp_path / "cache" + _mkskill(cache / "src" / "slurm-x", "slurm-x", "submit a batch job to slurm") + r = SkillRouter(skill_registry=reg, cache_dir=cache) + first = r.search("slurm") + second = r.search("slurm") + assert first == second + assert "Found 1 skill" in second + + +def test_search_exact_name_is_kb_free(tmp_path): + reg = _registry(tmp_path, {"datacard-gen": "make a dataset card"}) + r = SkillRouter(skill_registry=reg) + out = r.search(skill_name="datacard-gen") + assert "datacard-gen" in out + assert r.search(skill_name="nope").startswith("No skill named") + + +# --------------------------------------------------------------------------- +# KB-mode search +# --------------------------------------------------------------------------- + + +def test_search_kb_merges_catalog_collections(tmp_path): + # Only skills_catalog__* collections are searched; the installed 'skills' + # collection is ignored even if present. 
+ kb = FakeKB( + collections=["skills", "skills_catalog__a", "skills_catalog__b"], + hits_by_collection={ + "skills": [_hit("installed-one", "installed", "registered", 0.99)], + "skills_catalog__a": [_hit("cat-a", "catalog a", "catalog:a", 0.9)], + "skills_catalog__b": [_hit("cat-b", "catalog b", "catalog:b", 0.5)], + }, + ) + r = SkillRouter(skill_registry=_registry(tmp_path), kb=kb) + out = r.search("anything") + assert "installed-one" not in out # installed collection not searched + assert "cat-a" in out and "cat-b" in out + assert out.index("cat-a") < out.index("cat-b") # higher score first + + +def test_search_kb_tag_filter(tmp_path): + kb = FakeKB( + collections=["skills_catalog__x"], + hits_by_collection={ + "skills_catalog__x": [ + _hit("tagged", "x", "catalog:x", 0.9, tags="hpc,slurm"), + _hit("untagged", "y", "catalog:x", 0.8, tags=""), + ] + }, + ) + r = SkillRouter(skill_registry=_registry(tmp_path), kb=kb) + out = r.search("x", tag="slurm") + assert "tagged" in out and "untagged" not in out + + +# --------------------------------------------------------------------------- +# list_sources +# --------------------------------------------------------------------------- + + +def test_list_sources_flags_synced(tmp_path): + from dsagt.skills import KNOWN_SOURCES, _repo_slug + from dsagt.registry import catalog_collection + + genesis_coll = catalog_collection(_repo_slug(KNOWN_SOURCES["genesis"]["url"])) + index_dir = tmp_path / "idx" + (index_dir / genesis_coll).mkdir(parents=True) + (index_dir / genesis_coll / "chroma_ids.json").write_text( + json.dumps(["1", "2", "3"]) + ) + + kb = FakeKB(collections=[genesis_coll], index_dir=str(index_dir)) + # list_sources needs only a KB — no skill_registry required. 
+ r = SkillRouter(kb=kb) + sources = {s["name"]: s for s in r.list_sources()} + assert sources["genesis"]["synced"] is True + assert sources["genesis"]["indexed"] == 3 + assert sources["anthropic"]["synced"] is False + assert sources["anthropic"]["indexed"] == 0 diff --git a/tests/test_skill_tools.py b/tests/test_skill_tools.py new file mode 100644 index 0000000..76485a3 --- /dev/null +++ b/tests/test_skill_tools.py @@ -0,0 +1,200 @@ +""" +Tests for the skill MCP tools (save_skill, search_skills, install_skill, +add_skill_source, list_skill_sources). + +The skill surface lives in :mod:`dsagt.mcp.skill_tools`; ``create_skill_server`` +exposes just that concern for driving via the MCP helpers. Handlers return a +mix of ``str`` (save/search/install) and ``dict`` (add/list sources), so the two +``call_tool`` helpers are both used. +""" + +import json +from unittest.mock import MagicMock + +import pytest + +from dsagt.mcp.skill_tools import create_skill_server +from dsagt.registry import SkillRegistry +from mcp_helpers import call_tool_json, call_tool_sync + + +def _make_skill_server(tmp_path): + """Create (server, skill_registry, kb) with a real local-embedding KB. + + The skill registry is rooted at ``/runtime`` so save_skill writes to + ``/runtime/skills//`` — the project layer the agent natively + discovers. 
+ """ + from dsagt.knowledge import KnowledgeBase + + runtime_dir = tmp_path / "runtime" + runtime_dir.mkdir(parents=True, exist_ok=True) + kb = KnowledgeBase( + index_dir=tmp_path / "kb_index", + default_embedder="local", + default_index="chroma", + ) + skill_reg = SkillRegistry( + source_skills_dir=None, # package default (empty bundled is fine) + runtime_dir=str(runtime_dir), + kb=kb, + ) + server = create_skill_server(skill_reg, kb, runtime_dir=str(runtime_dir)) + return server, skill_reg, kb + + +# --------------------------------------------------------------------------- +# save_skill +# --------------------------------------------------------------------------- + + +class TestSaveSkill: + + def test_add_new_skill_creates_files_and_indexes(self, tmp_path): + """save_skill writes SKILL.md and the skill count goes up by one. + + The count includes any bundled skills that ship in the package (see + SkillRegistry.list_skills which merges bundled + project layers), so we + assert the file was created and the count incremented rather than + equality on a specific number. + """ + server, skill_reg, kb = _make_skill_server(tmp_path) + before = len(skill_reg.list_skills()) + + spec = { + "name": "csv_inspector", + "description": "Workflow for inspecting CSV columns and quality", + "tags": ["data_management", "quality_control"], + } + body = "# csv_inspector\n\nFirst, run head on the file. 
Then check nulls.\n" + text = call_tool_sync(server, "save_skill", {"spec": spec, "body": body}) + + assert "added" in text + skill_md = tmp_path / "runtime" / "skills" / "csv_inspector" / "SKILL.md" + assert skill_md.exists() + content = skill_md.read_text() + assert "csv_inspector" in content + assert "First, run head" in content + after = len(skill_reg.list_skills()) + assert after == before + 1 + + def test_update_existing_skill_preserves_body_when_omitted(self, tmp_path): + """Saving a spec for an existing skill without body keeps the body.""" + server, skill_reg, kb = _make_skill_server(tmp_path) + first_body = "# orig\n\nOriginal workflow body.\n" + call_tool_sync( + server, + "save_skill", + { + "spec": {"name": "wf", "description": "v1"}, + "body": first_body, + }, + ) + # Update the description only — body should be preserved. + text = call_tool_sync( + server, + "save_skill", + { + "spec": {"name": "wf", "description": "v2 description"}, + }, + ) + assert "updated" in text + skill_md = tmp_path / "runtime" / "skills" / "wf" / "SKILL.md" + content = skill_md.read_text() + assert "v2 description" in content + assert "Original workflow body" in content + + def test_save_skill_writes_reference_files(self, tmp_path): + """reference_files dict lands as additional files in the skill dir.""" + server, skill_reg, kb = _make_skill_server(tmp_path) + text = call_tool_sync( + server, + "save_skill", + { + "spec": {"name": "with_template", "description": "Has a template"}, + "body": "# with_template\n\nUses template.json.\n", + "reference_files": {"template.json": '{"foo": "bar"}\n'}, + }, + ) + assert "added" in text + skill_dir = tmp_path / "runtime" / "skills" / "with_template" + assert (skill_dir / "SKILL.md").exists() + assert (skill_dir / "template.json").read_text() == '{"foo": "bar"}\n' + + def test_save_skill_string_encoded_spec(self, tmp_path): + """MCP clients that JSON-encode nested object args still work.""" + server, skill_reg, kb = 
_make_skill_server(tmp_path) + spec_json = json.dumps({"name": "s1", "description": "d"}) + text = call_tool_sync(server, "save_skill", {"spec": spec_json, "body": "x"}) + assert "added" in text + + +# --------------------------------------------------------------------------- +# search_skills +# --------------------------------------------------------------------------- + + +class TestSearchSkills: + + def test_search_skills_empty_catalog_hints_to_sync(self, tmp_path): + """With no catalog synced, search_skills explains how to enable one + instead of returning a bare 'no match' the agent reads as exhausted.""" + server, skill_reg, kb = _make_skill_server(tmp_path) + + text = call_tool_sync(server, "search_skills", {"query": "vasp pymatgen dft"}) + assert "No catalog skills found" in text + assert "no external skill catalog is synced" in text.lower() + assert "add_skill_source" in text + + +# --------------------------------------------------------------------------- +# install_skill +# --------------------------------------------------------------------------- + + +class TestInstallSkill: + + def test_install_skill_routes_and_reports_missing(self, tmp_path): + """install_skill is registered and reports a clean error when the + named skill isn't in any synced catalog.""" + server, skill_reg, kb = _make_skill_server(tmp_path) + text = call_tool_sync( + server, + "install_skill", + {"skill_name": "zzz-definitely-not-a-real-skill-xyz"}, + ) + assert "No catalog skill" in text + + +# --------------------------------------------------------------------------- +# skill sources (add_skill_source / list_skill_sources) +# --------------------------------------------------------------------------- + + +@pytest.fixture +def mock_kb(tmp_path): + kb = MagicMock() + kb.index_dir = tmp_path / "kb_index" + kb.index_dir.mkdir() + kb.collections = [] + return kb + + +class TestSkillSources: + + def test_list_skill_sources_returns_known(self, mock_kb): + server = 
create_skill_server(kb=mock_kb) + result = call_tool_json(server, "list_skill_sources", {}) + assert "scientific" in result["sources"] + # Nothing synced → every known source flagged available, not synced. + assert result["sources"]["scientific"]["synced"] is False + assert result["sources"]["scientific"]["indexed"] == 0 + assert result["other_synced_collections"] == [] + assert "scientific" in result["note"] + + def test_add_skill_source_bad_source_errors(self, mock_kb): + server = create_skill_server(kb=mock_kb) + result = call_tool_json( + server, "add_skill_source", {"source": "not-a-real-known-name"} + ) + assert "error" in result diff --git a/tests/test_skills_catalog.py b/tests/test_skills_catalog.py new file mode 100644 index 0000000..7586e59 --- /dev/null +++ b/tests/test_skills_catalog.py @@ -0,0 +1,383 @@ +"""Unit tests for the external skill catalog (fetch / index / install) and +the native-skill mirror. No network: ``clone_github`` is monkeypatched and +the KB is a lightweight fake that records ``add_entries`` calls.""" + +import json + +import pytest + +from dsagt.agents.base import ( + _NATIVE_DESCRIPTION_CAP, + _SKILL_MANIFEST, + _mirror_skills_to, +) +from dsagt import skills as sc +from dsagt.registry import CATALOG_COLLECTION_PREFIX, catalog_collection + + +def _mkskill(d, name, desc="a short description"): + d.mkdir(parents=True, exist_ok=True) + (d / "SKILL.md").write_text( + f"---\nname: {name}\ndescription: {desc}\n---\n# {name}\nbody\n" + ) + return d + + +# --------------------------------------------------------------------------- +# slug + source resolution +# --------------------------------------------------------------------------- + + +def test_repo_slug_is_collection_safe(): + slug = sc._repo_slug("https://github.com/K-Dense-AI/scientific-agent-skills") + assert slug == "k-dense-ai-scientific-agent-skills" + assert sc._repo_slug("git@github.com:Foo/Bar.git") == "foo-bar" + + +def test_repo_slug_is_host_agnostic(): + # Non-GitHub 
hosts (GitLab, etc.) reduce to owner-repo, scheme/host dropped. + assert sc._repo_slug("https://gitlab.osti.gov/genesis/genesis-skills") == ( + "genesis-genesis-skills" + ) + assert sc._repo_slug("git@gitlab.osti.gov:genesis/genesis-skills.git") == ( + "genesis-genesis-skills" + ) + + +def test_known_source_genesis_covers_whole_skills_tree(): + spec = sc.resolve_source("genesis") + assert spec["url"] == "https://gitlab.osti.gov/genesis/genesis-skills" + # subdir scopes the recursive SKILL.md walk to the whole skills/ tree so + # every category (hpc, huggingface, langchain, …) is discoverable. + assert spec["subdir"] == "skills" + assert spec["branch"] == "main" + + +def test_persist_source_to_config_appends_and_dedupes(tmp_path): + import yaml + + cfg = tmp_path / "dsagt_config.yaml" + cfg.write_text(yaml.dump({"project": "p", "skills": {"sources": []}})) + spec = { + "name": "anthropic", + "url": "https://github.com/anthropics/skills", + "branch": "main", + } + assert sc.persist_source_to_config(tmp_path, spec) is True + sources = yaml.safe_load(cfg.read_text())["skills"]["sources"] + assert sources[-1]["name"] == "anthropic" + # Idempotent: same URL is not appended twice. + assert sc.persist_source_to_config(tmp_path, spec) is False + assert len(yaml.safe_load(cfg.read_text())["skills"]["sources"]) == 1 + # No config file → no-op, no crash. 
+ assert sc.persist_source_to_config(tmp_path / "nope", spec) is False + + +def test_resolve_source_known_url_and_shorthand(): + assert ( + sc.resolve_source("scientific")["url"] == sc.KNOWN_SOURCES["scientific"]["url"] + ) + assert ( + sc.resolve_source("https://github.com/a/b")["url"] == "https://github.com/a/b" + ) + assert sc.resolve_source("a/b")["url"] == "https://github.com/a/b" + with pytest.raises(ValueError): + sc.resolve_source("not-a-known-name") + + +# --------------------------------------------------------------------------- +# discovery +# --------------------------------------------------------------------------- + + +def test_discover_skill_dirs_flat_and_nested(tmp_path): + root = tmp_path / "skills" + _mkskill(root / "flat", "flat") + _mkskill(root / "domain" / "nested", "nested") + # A dir whose SKILL.md has no name is ignored. + bad = root / "noname" + bad.mkdir(parents=True) + (bad / "SKILL.md").write_text("---\ndescription: x\n---\nbody") + names = sorted(p.name for p in sc._discover_skill_dirs(tmp_path)) + assert names == ["flat", "nested"] + + +# --------------------------------------------------------------------------- +# find + install +# --------------------------------------------------------------------------- + + +def test_find_catalog_skill_and_ambiguity(tmp_path): + cache = tmp_path / "cache" + _mkskill(cache / "srcA" / "skills" / "alpha", "alpha") + found = sc.find_catalog_skill("alpha", cache_dir=cache) + assert found.name == "alpha" + with pytest.raises(LookupError): + sc.find_catalog_skill("missing", cache_dir=cache) + # Same skill name in a second source → ambiguous. 
+ _mkskill(cache / "srcB" / "skills" / "alpha", "alpha") + with pytest.raises(LookupError, match="multiple sources"): + sc.find_catalog_skill("alpha", cache_dir=cache) + + +def test_find_catalog_skill_source_qualified(tmp_path): + cache = tmp_path / "cache" + _mkskill(cache / "srcA" / "skills" / "alpha", "alpha") + _mkskill(cache / "srcB" / "skills" / "alpha", "alpha") + + # A "/" qualifier disambiguates which source to install from. + a = sc.find_catalog_skill("srcA/alpha", cache_dir=cache) + b = sc.find_catalog_skill("srcB/alpha", cache_dir=cache) + assert a.relative_to(cache).parts[0] == "srcA" + assert b.relative_to(cache).parts[0] == "srcB" + assert a.name == b.name == "alpha" + + # Qualifying with a source that lacks the skill is a clear, source-scoped miss. + with pytest.raises(LookupError, match="in source 'srcA'"): + sc.find_catalog_skill("srcA/missing", cache_dir=cache) + + +def test_install_into_project_source_qualified(tmp_path): + cache = tmp_path / "cache" + _mkskill(cache / "srcA" / "skills" / "dup", "dup", desc="from A") + _mkskill(cache / "srcB" / "skills" / "dup", "dup", desc="from B") + proj = tmp_path / "proj" + proj.mkdir() + + # Bare ambiguous name refuses; the source-qualified form installs srcB's copy. 
+ with pytest.raises(LookupError, match="multiple sources"): + sc.install_into_project("dup", proj, cache_dir=cache) + info = sc.install_into_project("srcB/dup", proj, cache_dir=cache) + assert info["name"] == "dup" + assert (proj / "skills" / "dup" / "SKILL.md").read_text().count("from B") == 1 + + +def test_install_into_project_copies_subdirs(tmp_path): + cache = tmp_path / "cache" + skill = _mkskill(cache / "src" / "vasp-to-isaac", "vasp-to-isaac") + (skill / "scripts").mkdir() + (skill / "scripts" / "run.py").write_text("print(1)") + (skill / "references").mkdir() + (skill / "references" / "spec.md").write_text("# spec") + + proj = tmp_path / "proj" + proj.mkdir() + info = sc.install_into_project("vasp-to-isaac", proj, cache_dir=cache) + dest = proj / "skills" / "vasp-to-isaac" + assert info["action"] == "added" + assert (dest / "SKILL.md").exists() + assert (dest / "scripts" / "run.py").exists() + assert (dest / "references" / "spec.md").exists() + # Re-install reports "updated". + assert ( + sc.install_into_project("vasp-to-isaac", proj, cache_dir=cache)["action"] + == "updated" + ) + + +# --------------------------------------------------------------------------- +# sync_source (mocked clone + fake KB) +# --------------------------------------------------------------------------- + + +class _FakeKB: + def __init__(self, index_dir): + self.index_dir = index_dir + self.collections = [] + self.adds = [] # (collection, metadatas) + + def add_entries(self, texts, collection, metadatas=None): + self.adds.append((collection, metadatas)) + if collection not in self.collections: + self.collections.append(collection) + return {"collection": collection, "entries_added": len(texts)} + + +def test_sync_source_indexes_per_source_collection(tmp_path, monkeypatch): + # Fake clone: populate dest/ with two skills. 
+ def fake_clone(url, dest, branch="main", include=None): + sub = include[0] if include else "" + base = dest / sub if sub else dest + _mkskill(base / "s1", "s1") + _mkskill(base / "s2", "s2") + + monkeypatch.setattr("dsagt.commands.setup_core_kb.clone_github", fake_clone) + + kb = _FakeKB(tmp_path / "kb_index") + cache = tmp_path / "cache" + stats = sc.sync_source( + {"url": "https://github.com/x/y", "branch": "main", "subdir": "skills"}, + kb=kb, + cache_dir=cache, + ) + slug = sc._repo_slug("https://github.com/x/y") + coll = catalog_collection(slug) + assert stats["discovered"] == 2 and stats["indexed"] == 2 + assert coll.startswith(CATALOG_COLLECTION_PREFIX) + added_coll, metas = kb.adds[-1] + assert added_coll == coll + assert all(m["source"] == f"catalog:{slug}" for m in metas) + assert {m["skill_name"] for m in metas} == {"s1", "s2"} + + +# --------------------------------------------------------------------------- +# native mirror +# --------------------------------------------------------------------------- + + +def test_mirror_manifest_preserves_user_skills_and_reaps(tmp_path): + target = tmp_path / ".claude" / "skills" + target.mkdir(parents=True) + # A user-authored skill dsagt must never touch. + _mkskill(target / "user-skill", "user-skill") + + bundled = _mkskill(tmp_path / "bundled" / "skill-creator", "skill-creator") + proj = _mkskill(tmp_path / "proj" / "alpha", "alpha") + + _mirror_skills_to(target, [bundled, proj]) + assert sorted(p.name for p in target.iterdir() if p.is_dir()) == [ + "alpha", + "skill-creator", + "user-skill", + ] + manifest = json.loads((target / _SKILL_MANIFEST).read_text()) + assert manifest == ["alpha", "skill-creator"] + assert "user-skill" not in manifest + + # Re-run with skill-creator gone → reaped; user-skill preserved. 
+ _mirror_skills_to(target, [proj]) + assert sorted(p.name for p in target.iterdir() if p.is_dir()) == [ + "alpha", + "user-skill", + ] + + +def test_mirror_truncates_long_description(tmp_path): + long_desc = "x" * (_NATIVE_DESCRIPTION_CAP + 500) + src = _mkskill(tmp_path / "src" / "big", "big", desc=long_desc) + target = tmp_path / ".claude" / "skills" + _mirror_skills_to(target, [src]) + + import yaml + + mirrored = (target / "big" / "SKILL.md").read_text() + front = yaml.safe_load(mirrored.split("---", 2)[1]) + assert len(front["description"]) <= _NATIVE_DESCRIPTION_CAP + # Source untouched. + assert len((src / "SKILL.md").read_text()) > _NATIVE_DESCRIPTION_CAP + + +# --------------------------------------------------------------------------- +# AgentSetup.setup_skills — per-agent native-dir mirror +# --------------------------------------------------------------------------- + + +@pytest.mark.parametrize( + "agent,subdir", + [ + ("claude", ".claude/skills"), + ("goose", ".agents/skills"), + ("cline", ".cline/skills"), + ("roo", ".roo/skills"), + ("codex", ".agents/skills"), + ], +) +def test_setup_skills_mirrors_into_native_dir(tmp_path, agent, subdir): + from dsagt.agents import AGENTS + + _mkskill(tmp_path / "skills" / "myskill", "myskill") # a project skill + actions = AGENTS[agent]().setup_skills(tmp_path, {}) + target = tmp_path + for part in subdir.split("/"): + target = target / part + assert (target / "myskill" / "SKILL.md").exists() + assert any("kill" in a for a in actions) # reported a mirror action + + +def test_setup_skills_respects_populate_native_false(tmp_path): + from dsagt.agents import AGENTS + + _mkskill(tmp_path / "skills" / "myskill", "myskill") + actions = AGENTS["claude"]().setup_skills( + tmp_path, {"skills": {"populate_native": False}} + ) + assert actions == [] + assert not (tmp_path / ".claude" / "skills").exists() + + +# --------------------------------------------------------------------------- +# install_into_project — license / 
attribution capture +# --------------------------------------------------------------------------- + + +def test_install_captures_ancestor_attribution(tmp_path): + cache = tmp_path / "cache" + repo = cache / "srcrepo" + repo.mkdir(parents=True) + (repo / "LICENSE").write_text("Apache-2.0") # repo-root license + cat = repo / "skills" / "modcon" + cat.mkdir(parents=True) + (cat / "ATTRIBUTION.md").write_text("upstream credits") # per-subtree + _mkskill(cat / "myskill", "myskill") + + proj = tmp_path / "proj" + proj.mkdir() + info = sc.install_into_project("myskill", proj, cache_dir=cache) + dest = proj / "skills" / "myskill" + assert (dest / "SKILL.md").exists() + assert (dest / "ATTRIBUTION.md").read_text() == "upstream credits" + assert (dest / "LICENSE").read_text() == "Apache-2.0" + prov = (dest / "PROVENANCE.txt").read_text() + assert "srcrepo" in prov and "skills/modcon/myskill" in prov + assert set(info["attribution"]) == {"ATTRIBUTION.md", "LICENSE"} + + +def test_install_skill_local_license_wins(tmp_path): + cache = tmp_path / "cache" + repo = cache / "srcrepo" + repo.mkdir(parents=True) + (repo / "LICENSE").write_text("ROOT") # repo-root license + skill = _mkskill(repo / "myskill", "myskill") + (skill / "LICENSE").write_text("SKILL-LOCAL") # skill bundles its own + + proj = tmp_path / "proj" + proj.mkdir() + info = sc.install_into_project("myskill", proj, cache_dir=cache) + dest = proj / "skills" / "myskill" + # The skill's own LICENSE (copied by copytree) must not be overwritten. 
+ assert (dest / "LICENSE").read_text() == "SKILL-LOCAL" + assert "LICENSE" not in info["attribution"] + + +# --------------------------------------------------------------------------- +# index_catalog — frontmatter-only embedding (progressive disclosure) +# --------------------------------------------------------------------------- + + +def test_index_catalog_embeds_frontmatter_not_body(tmp_path): + captured = {} + + class _KB: + index_dir = tmp_path / "idx" + collections: list = [] + + def add_entries(self, texts, collection, metadatas=None): + captured["texts"] = texts + captured["metas"] = metadatas + return {} + + skill = tmp_path / "myskill" + skill.mkdir() + (skill / "SKILL.md").write_text( + "---\nname: myskill\ndescription: does a thing\ntags: [hpc, slurm]\n---\n" + "# Body\nSECRET_BODY_MARKER should not be embedded.\n" + ) + dirs = sc._discover_skill_dirs(tmp_path) + sc.index_catalog(dirs, "slug", "http://x", _KB()) + + joined = " ".join(captured["texts"]) + assert "myskill" in joined and "does a thing" in joined # frontmatter embedded + assert "hpc" in joined and "slurm" in joined # tags embedded + assert "SECRET_BODY_MARKER" not in joined # body NOT embedded + # description is also carried in metadata for the search summary. + assert captured["metas"][0]["description"] == "does a thing" diff --git a/use_cases/genesis_skills/README.md b/use_cases/genesis_skills/README.md new file mode 100644 index 0000000..8c22c50 --- /dev/null +++ b/use_cases/genesis_skills/README.md @@ -0,0 +1,175 @@ +# DSAgt Demo: Genesis Skills for a Data-Curation Pipeline + +An end-to-end **data-preparation** walkthrough that flexes the skill catalog +against the **Genesis** source (OSTI GitLab). The agent pulls in the +BASE-Data/ModCon curation skills, grounds itself in domain context loaded into +the **knowledge base**, then prepares and **datacards a finished dataset**. 
+ +The "finished product" is a small curated dataset — a CO2-methanation **catalyst +screen** (`mock_data/dataset/catalyst_screening.csv`, 8 rows) — plus the domain +docs that describe how it was produced. Everything is tiny, so the whole thing +runs in seconds with no real instruments or HPC. + +## What this demonstrates + +1. **Genesis skills, agent-facing.** `add_skill_source genesis` syncs the OSTI + GitLab catalog; `search_skills` finds the ModCon curation skills + (`generating-datacards`, `croissant-validator`); `install_skill` draws them + into the project where the agent **natively** auto-discovers them. +2. **Domain → knowledge base.** `kb_ingest` indexes the data dictionary + + measurement protocol into a project collection, so the agent describes + methods and provenance accurately (via `kb_search`) instead of guessing. +3. **Datacard for a finished product.** The installed `generating-datacards` + skill runs over the dataset — pulling field definitions, methodology, and + license from the KB-ingested domain docs — and emits a datacard, then + `croissant-validator` checks the Croissant metadata. + +The three tiers in one sentence: **catalog = searchable but not in context; +installed = native and auto-invoked; KB = retrievable domain grounding.** + +## Setup + +Assumes DSAgt (`uv sync --all-groups`) and Claude Code +(`npm i -g @anthropic-ai/claude-code`) are installed, plus git **with network +access to `gitlab.osti.gov`** (the Genesis catalog clones from OSTI GitLab, not +GitHub). Embedding credentials are optional — `search_skills` / `kb_search` use +semantic search when `EMBEDDING_*` is set and fall back to a keyword scorer +otherwise (configure it for sharper relevance over the domain docs). 
+ +```bash +dsagt setup-kb # bundled tools/skills + core KB +dsagt init genesis-skills --agent claude +cp -r use_cases/genesis_skills/mock_data ~/dsagt-projects/genesis-skills/mock_data +dsagt start genesis-skills # mirrors skill-creator into .claude/skills/ before launch +``` + +The Genesis catalog is **project-scoped** and not synced by `init`, so the agent +enables it in step 1 below (or run `dsagt skills add genesis-skills genesis` from +a shell first). + +## Walkthrough + +Paste each prompt into Claude Code (running inside the project), one at a time, +and check the expected behavior. + +### 1 — Enable the Genesis source +> Enable the "genesis" skill source so we have the GENESIS / ModCon data-curation skills available. Then tell me how many skills it indexed. + +*Expect:* `add_skill_source(source="genesis")` → a shallow clone of OSTI GitLab, +**74** skills indexed into `skills_catalog__genesis-genesis-skills`, source +written to `dsagt_config.yaml`. (Two upstream skills — including +`datacard-generator` — have technically-invalid YAML frontmatter; dsagt recovers +their name/description with a lenient fallback rather than dropping them, so they +*are* searchable.) Confirm from a shell: + +```bash +dsagt skills list genesis-skills --catalog # expect the skills_catalog__genesis-genesis-skills collection +``` + +### 2 — Find and install the datacard skill (catalog, NOT in context) +Search for the two deliverables **separately** — a single "datacard *and* Croissant" +query lets the validator outrank the generator. First the datacard generator: + +> Search the catalog for a skill that creates a datacard / dataset documentation for a dataset, then install the best match into this project. 
+ +*Expect:* `search_skills` (catalog hits tagged `[catalog · install_skill to add]`, +**`generating-datacards`** ranked top) → `install_skill(skill_name="generating-datacards")`; +the reply notes it'll be native after the next start and that a `PROVENANCE.txt` +(Genesis source) was written. + +### 3 — Find and install the Croissant validator +> Now search the catalog for a skill that validates Croissant / JSON-LD dataset metadata, and install the best match. + +*Expect:* `search_skills` (**`croissant-validator`** ranked top) → +`install_skill(skill_name="croissant-validator")`. **Verify** both installs (each +lands with any `scripts/`/`references/`): + +```bash +ls ~/dsagt-projects/genesis-skills/skills/ +cat ~/dsagt-projects/genesis-skills/skills/generating-datacards/PROVENANCE.txt +``` + +### 4 — Ingest the domain docs into the KB +> Ingest the domain docs under mock_data/domain/ into a new knowledge-base collection called "methanation_domain". Poll until it finishes, then tell me what's in it. + +*Expect:* `kb_ingest(folder_path="mock_data/domain", collection_name="methanation_domain")` +returns a `job_id`; the agent polls `kb_job_status` to completion, then +`kb_list_collections` shows `methanation_domain` (2 docs). **Verify:** + +```bash +dsagt info genesis-skills # the new collection appears in the KB summary +``` + +### 5 — Retrieve domain grounding +> Using the knowledge base, what reactor conditions were used for the CO2 conversion measurement, and what license applies to this dataset? + +*Expect:* `kb_search` over `methanation_domain` → **250 °C, 1 atm, H2:CO2 = 4:1, +GHSV 12,000**; license **CC-BY-4.0** (pulled from the protocol doc, not guessed). + +### 6 — Generate the datacard for the finished dataset +> Use the generating-datacards skill to write a datacard for mock_data/dataset/catalyst_screening.csv. 
Pull the field definitions, measurement methodology, provenance, and license from the methanation_domain knowledge-base collection — don't invent them. Save it to audit/catalyst_screening_datacard.md. Then compare your sections against mock_data/expected_datacard.md and report anything missing. + +*Expect:* the agent reads the installed skill's `SKILL.md`, queries the KB, +computes basic stats from the 8-row CSV, and writes +`audit/catalyst_screening_datacard.md` covering summary / provenance / schema / +methodology / stats / limitations / license. + +### 7 — Validate the metadata +> Use the croissant-validator skill to check the Croissant/JSON-LD metadata for this dataset (generate it from the datacard if needed), and report any schema errors. + +*Expect:* the validator skill runs and reports a clean pass or names specific +schema issues. + +### 8 — Inspect the tiers (run in a shell) +```bash +dsagt skills list genesis-skills # installed: skill-creator + generating-datacards + croissant-validator +dsagt skills list genesis-skills --catalog # catalog: skills_catalog__genesis-genesis-skills +dsagt info genesis-skills # KB shows methanation_domain +ls ~/dsagt-projects/genesis-skills/.claude/skills/ +cat ~/dsagt-projects/genesis-skills/.claude/skills/.dsagt-managed.json +ls ~/dsagt-projects/genesis-skills/audit/ # catalyst_screening_datacard.md +``` + +Restart Claude (`dsagt start genesis-skills` again) to pick up the installed +Genesis skills as native auto-invoked skills. + +## Post-Conditions + +1. The KB holds a `skills_catalog__genesis-genesis-skills` collection + (searchable via `search_skills`, absent from Claude's context) **and** a + `methanation_domain` document collection (retrievable via `kb_search`). +2. `generating-datacards` and `croissant-validator` are installed into + `/skills/`, mirrored into `.claude/skills/`, each with a + `PROVENANCE.txt` crediting the Genesis source. +3. 
`audit/catalyst_screening_datacard.md` was produced for the finished dataset, + grounded in the KB-ingested domain docs, covering the sections in + `mock_data/expected_datacard.md`. +4. The Croissant metadata validates (or its errors are reported). +5. `.claude/skills/.dsagt-managed.json` tracks exactly the dsagt-placed skills. + +## Cleanup + +```bash +dsagt stop genesis-skills +dsagt rm genesis-skills # add -y to skip the prompt +``` + +The shared catalog cache lives at `~/dsagt-projects/.skill_sources/` and is +reused across projects; delete it to force a fresh clone. + +## Notes + +- The Genesis catalog is hosted on **OSTI GitLab** (`gitlab.osti.gov`), not + GitHub — reached the same way as any other source (`add_skill_source` / + `dsagt skills add … genesis`); only the host differs. +- `generating-datacards` is the frontmatter *name* of the skill whose directory + is `datacard-generator` in the Genesis repo — `install_skill` accepts either. + It and `croissant-validator` live under Genesis's `modcon-skills/` category + (the BASE-Data team's own skills), so this demo is DSAgt consuming its sibling + project's curated skills. +- `mock_data/` is intentionally tiny and illustrative. `expected_datacard.md` is + a *shape* to check coverage against, not a byte-for-byte answer — the installed + skill owns the authoritative template. +- Sister demo: [`isaac_skills_demo`](../isaac_skills_demo/) flexes the same + catalog → install → native-mirror loop plus authoring a new skill with + `skill-creator`, against the K-Dense `scientific` (GitHub) source. 
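The coverage check that step 6 and the notes above describe (comparing the generated datacard's sections against `mock_data/expected_datacard.md`) could be sketched roughly as follows. This is a minimal illustration, not part of DSAgt: the helper names `section_titles` and `missing_sections` are hypothetical, and exact heading-title matching is an assumption that is stricter than the "shape, not byte-for-byte" comparison the README intends.

```python
import re


def section_titles(markdown: str) -> set[str]:
    """Collect lower-cased heading titles (any ATX level) from a markdown string."""
    titles = set()
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            titles.add(match.group(1).lower())
    return titles


def missing_sections(expected_md: str, generated_md: str) -> list[str]:
    """Return expected headings with no same-named heading in the generated card.

    Heading level is ignored; only the title text is compared (an assumption).
    """
    return sorted(section_titles(expected_md) - section_titles(generated_md))


# Hypothetical inputs standing in for the two datacard files.
expected = "## Dataset\n### Provenance\n### Schema\n### Limitations / caveats\n"
generated = "# Dataset\n## Provenance\n## Schema\n"
print(missing_sections(expected, generated))
```

In practice the two strings would be read from `mock_data/expected_datacard.md` and `audit/catalyst_screening_datacard.md`; a looser comparison (for example keyword overlap) may suit the README's intent better than exact titles.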
diff --git a/use_cases/genesis_skills/mock_data/dataset/catalyst_screening.csv b/use_cases/genesis_skills/mock_data/dataset/catalyst_screening.csv
new file mode 100644
index 0000000..b18c4b9
--- /dev/null
+++ b/use_cases/genesis_skills/mock_data/dataset/catalyst_screening.csv
@@ -0,0 +1,9 @@
+sample_id,composition,calcination_temp_C,bet_surface_area_m2_g,co2_conversion_pct,ch4_selectivity_pct,test_date,operator
+CAT-001,Ni/Al2O3,500,142.3,38.5,91.2,2026-03-02,jlee
+CAT-002,Ni-Ce/Al2O3,500,138.7,46.1,93.8,2026-03-02,jlee
+CAT-003,Ni-Ce/Al2O3,650,121.4,52.7,95.1,2026-03-03,jlee
+CAT-004,Ru/TiO2,400,88.6,29.3,88.4,2026-03-03,akumar
+CAT-005,Ru-K/TiO2,400,85.1,34.8,90.7,2026-03-04,akumar
+CAT-006,Co/SiO2,550,210.9,41.2,84.6,2026-03-04,akumar
+CAT-007,Co-Mn/SiO2,550,198.3,49.5,87.9,2026-03-05,jlee
+CAT-008,Fe/ZrO2,600,76.2,33.1,79.5,2026-03-05,akumar
diff --git a/use_cases/genesis_skills/mock_data/domain/data_dictionary.md b/use_cases/genesis_skills/mock_data/domain/data_dictionary.md
new file mode 100644
index 0000000..db37622
--- /dev/null
+++ b/use_cases/genesis_skills/mock_data/domain/data_dictionary.md
@@ -0,0 +1,25 @@
+# Data Dictionary — CO2 Methanation Catalyst Screening
+
+One row per catalyst sample tested in the fixed-bed reactor screen.
+
+| Column | Type | Units | Definition |
+|---|---|---|---|
+| `sample_id` | string | — | Unique catalyst identifier (`CAT-NNN`). Primary key. |
+| `composition` | string | — | Active metal / promoter / support, e.g. `Ni-Ce/Al2O3` = Ni + Ce promoter on alumina. |
+| `calcination_temp_C` | integer | °C | Calcination temperature during catalyst preparation. |
+| `bet_surface_area_m2_g` | float | m²/g | BET specific surface area (N2 physisorption, see protocol). |
+| `co2_conversion_pct` | float | % | CO2 converted at steady state (250 °C, 1 atm, H2:CO2 = 4:1). |
+| `ch4_selectivity_pct` | float | % | Carbon selectivity to CH4 (balance: CO + higher hydrocarbons). |
+| `test_date` | date | ISO 8601 | Date the reactor run was performed. |
+| `operator` | string | — | Lab notebook operator initials. |
+
+## Controlled vocabularies
+
+- **Supports:** `Al2O3`, `TiO2`, `SiO2`, `ZrO2`.
+- **Promoters:** `Ce`, `K`, `Mn` (optional; absent for unpromoted catalysts).
+
+## Quality rules
+
+- `co2_conversion_pct` and `ch4_selectivity_pct` are in `[0, 100]`.
+- `bet_surface_area_m2_g > 0`.
+- Every `sample_id` is unique and matches `^CAT-\d{3}$`.
diff --git a/use_cases/genesis_skills/mock_data/domain/measurement_protocol.md b/use_cases/genesis_skills/mock_data/domain/measurement_protocol.md
new file mode 100644
index 0000000..7ce2251
--- /dev/null
+++ b/use_cases/genesis_skills/mock_data/domain/measurement_protocol.md
@@ -0,0 +1,40 @@
+# Measurement Protocol — Methanation Screen v2
+
+This protocol describes how the values in `catalyst_screening.csv` were
+produced. It is domain context for the curation agent: ingest it into the
+knowledge base so the agent can describe methods + provenance accurately in the
+datacard without re-deriving them.
+
+## Catalyst preparation
+
+Catalysts were prepared by incipient-wetness impregnation of the support with
+aqueous metal-nitrate precursors, dried at 110 °C overnight, and calcined in
+static air for 4 h at the temperature recorded in `calcination_temp_C`.
+Promoted samples (Ce, K, Mn) were co-impregnated.
+
+## BET surface area
+
+Specific surface area (`bet_surface_area_m2_g`) was measured by N2 physisorption
+at 77 K on a Micromeritics ASAP 2020, multipoint BET in the relative-pressure
+range 0.05–0.30, after degassing at 200 °C for 6 h.
+
+## Reactor screen
+
+Activity was measured in a fixed-bed quartz reactor, 50 mg catalyst diluted with
+SiC, at **250 °C, 1 atm, H2:CO2 = 4:1, GHSV = 12,000 mL·g⁻¹·h⁻¹**. Steady state
+was reached after 60 min on stream. Effluent was analyzed by online GC-TCD/FID.
+
+- `co2_conversion_pct` = (CO2_in − CO2_out) / CO2_in × 100
+- `ch4_selectivity_pct` = CH4 carbon / (total converted carbon) × 100
+
+## Known limitations
+
+- Single-run measurements (no replicate error bars in this screen).
+- Selectivity excludes trace C2+ (< 0.5 %), lumped into the balance.
+- Intended for **relative** ranking of formulations, not absolute kinetics.
+
+## Provenance
+
+- Instrument campaign: `methanation-screen-2026Q1`
+- Raw GC traces + reactor logs archived in the lab LIMS under the same campaign id.
+- License for the curated dataset: **CC-BY-4.0**.
diff --git a/use_cases/genesis_skills/mock_data/expected_datacard.md b/use_cases/genesis_skills/mock_data/expected_datacard.md
new file mode 100644
index 0000000..4c57c96
--- /dev/null
+++ b/use_cases/genesis_skills/mock_data/expected_datacard.md
@@ -0,0 +1,43 @@
+# Expected Datacard Shape (reference target)
+
+This is the *shape* the generated datacard should cover — not a byte-for-byte
+answer. The installed `generating-datacards` skill owns the authoritative
+template; this file just lists the sections the agent should populate from the
+dataset + the KB-ingested domain docs, so you can spot anything missing.
+
+---
+
+## Dataset: CO2 Methanation Catalyst Screen (2026 Q1)
+
+**Summary.** 8 catalyst formulations screened for CO2 methanation activity and
+CH4 selectivity in a fixed-bed reactor. One row per sample.
+
+### Provenance
+- Campaign: `methanation-screen-2026Q1`
+- Curated from `catalyst_screening.csv`; methods per the lab protocol.
+- License: **CC-BY-4.0**
+
+### Schema
+A row per `sample_id` with: `composition`, `calcination_temp_C` (°C),
+`bet_surface_area_m2_g` (m²/g), `co2_conversion_pct` (%), `ch4_selectivity_pct`
+(%), `test_date`, `operator`. (Field units + definitions pulled from the data
+dictionary.)
+
+### Collection methodology
+- Catalysts: incipient-wetness impregnation, calcined 4 h in static air.
+- BET: N2 physisorption, 77 K, multipoint (P/P0 0.05–0.30).
+- Activity: 250 °C, 1 atm, H2:CO2 = 4:1, GHSV 12,000 mL·g⁻¹·h⁻¹, steady state at 60 min. + +### Descriptive statistics (computed from the data) +- Rows: 8; samples unique; no missing values. +- `co2_conversion_pct`: range ~29–53 %. +- `ch4_selectivity_pct`: range ~80–95 %. + +### Limitations / caveats +- Single-run (no replicate error bars). +- Relative ranking, not absolute kinetics. +- Selectivity excludes trace C2+ (< 0.5 %). + +### Quality checks +- `sample_id` matches `^CAT-\d{3}$`, unique. +- Percentages in `[0, 100]`; `bet_surface_area_m2_g > 0`. diff --git a/use_cases/isaac_skills_demo/README.md b/use_cases/isaac_skills_demo/README.md new file mode 100644 index 0000000..16502d6 --- /dev/null +++ b/use_cases/isaac_skills_demo/README.md @@ -0,0 +1,190 @@ +# DSAgt Demo: Skill-Driven VASP → ISAAC Conversion + +A lightweight mock of the [`isaac_vasp`](../isaac_vasp/) workflow, built to **vet +the skill-management feature**. It follows the same arc as `isaac_vasp` — install +the **pymatgen** skill, author a `vasp-to-isaac` converter that parses VASP output +**with pymatgen**, and emit an ISAAC record — but the agent **discovers, syncs, +installs, and authors** those skills itself, surfaced through Claude Code's +*native* skill discovery. It uses tiny **mock VASP outputs** (`mock_data/`, a few +KB), so the whole thing runs in seconds with no DFT, no NERSC, and no 32 MB +OUTCAR files — yet the parsing is the real `pymatgen.io.vasp`, not a hand-rolled +stand-in. + +## What this demonstrates + +- **`list_skill_sources`** — the agent discovers what external sources it can + pull from (curated names + arbitrary git URLs) and which are synced. +- **`add_skill_source`** — the agent syncs a source (default: K-Dense + `scientific-agent-skills`, 140+ skills) into a searchable catalog that is + **not** loaded into context; with the single `dsagt-server` it's searchable + immediately, no restart. 
+- **`search_skills`** + **`install_skill`** — find a catalog skill (hits marked + `[catalog]`) and draw it into the project + Claude's native `.claude/skills/`. +- **`skill-creator`** — the bundled meta-skill scaffolds a new `vasp-to-isaac` + skill (from the Anthropic template) whose converter *uses the installed + pymatgen skill* to parse the VASP output — install-then-build, not install-and-ignore. +- **Native mirror** — installed + bundled skills appear under + `.claude/skills//` (tracked by `.dsagt-managed.json`), so Claude + auto-invokes them with no MCP round-trip. + +The two tiers in one sentence: **catalog = searchable but not in context; +installed = native and auto-invoked.** + +## Setup + +Assumes DSAgt (`uv sync --all-groups`) and Claude Code +(`npm i -g @anthropic-ai/claude-code`) are installed, plus git. Embedding +credentials are optional — `search_skills` uses semantic search when +`EMBEDDING_*` is set and falls back to a keyword scorer otherwise (configure it +for sharper relevance). + +```bash +dsagt setup-kb --no-skill-catalog # core KB only — the agent syncs the catalog in-session (step 3) +dsagt init isaac-skills-demo --agent claude +cp -r use_cases/isaac_skills_demo/mock_data ~/dsagt-projects/isaac-skills-demo/mock_data +dsagt start isaac-skills-demo # mirrors the bundled skill-creator into .claude/skills/ before launch +``` + +The project starts with **no external catalog synced** — that's deliberate: the +walkthrough has the agent *discover, sync, and search* it from inside the +session. Now that one `dsagt-server` owns the KB, a source the agent syncs +mid-session is **immediately searchable, no restart** — which the old two-server +split (sync in one process, search in another) couldn't do, so it had to be a +pre-`start` CLI step. + +> Plain `dsagt setup-kb` (without `--no-skill-catalog`) instead pre-syncs the +> default `scientific` source into the shared KB, which `init` then copies in. 
If +> you do that, step 3 below becomes an idempotent refresh — or just sync a +> different source there (e.g. `anthropic`). The `dsagt skills sync ` CLI +> still exists for scripted/headless setups. + +## Walkthrough + +Paste each prompt into Claude Code (running inside the project), one at a time. +The arc: **see what you have → find more → sync a source → install the relevant +skill → author a new one → run it.** + +### 1 — What do we have? (native discovery) +> Do you have a skill available for scaffolding new skills? Name it and give me a one-line summary of what it does. + +*Expect:* Claude names **`skill-creator`** and summarizes it — discovered +**natively, with no MCP call** and no file digging. That's the mirror working: +`dsagt start` copied the bundled `skill-creator` into `.claude/skills/`, so Claude +sees its name + description like any native skill (and loads the full `SKILL.md` +only when the skill is invoked — progressive disclosure). `search_skills` is for +the not-yet-installed *catalog* only, so it should not fire here. You can confirm +*which* skills dsagt placed from a shell in step 7 (`cat .dsagt-managed.json`) — +that manifest is dsagt's internal mirror bookkeeping, not something the agent +reads. + +### 2 — Where can we find more skills? +> Where can I get more skills from? List the skill sources you can pull from and which are already synced. + +*Expect:* `list_skill_sources` → the known sources (`scientific`, `anthropic`, +`antigravity`, `composio`, `genesis`) with URLs, each flagged **available, not +synced** (nothing is synced yet on a `--no-skill-catalog` setup). + +### 3 — Sync skills from an external repo +> Sync the "scientific" source so we can search its catalog. + +*Expect:* `add_skill_source(source="scientific")` → a shallow clone of K-Dense +`scientific-agent-skills`, ~140 skills indexed into +`skills_catalog__k-dense-ai-scientific-agent-skills`, source persisted to +`dsagt_config.yaml`.
Because it's one `dsagt-server`, the catalog is searchable +**immediately** — the next prompt can hit it with no restart. + +### 4 — Add the relevant skill +> Search the catalog for a skill that helps parse VASP output with pymatgen, then install the most relevant one into this project. + +*Expect:* `search_skills` (catalog hits tagged `[catalog · install_skill to add]`, +`pymatgen` at/near the top) → `install_skill(skill_name="pymatgen")`; the reply +notes it'll be native after the next start. The installed `pymatgen` skill carries +the reference docs (`pymatgen.io.vasp.Incar` / `Poscar` / `Outcar`) the converter +uses next. **Verify** the skill dir (with any `scripts/`/`references/`) landed: + +```bash +ls ~/dsagt-projects/isaac-skills-demo/skills/ +``` + +### 5 — Create the converter skill with skill-creator +This mirrors `isaac_vasp`'s `vasp-to-isaac` skill — the lightweight version reads +the small mock directory, but does the parsing with **real pymatgen**, the same +way the full workflow does (just without the heavy `vasprun.xml`). + +> Use the skill-creator skill to author a new project skill named "vasp-to-isaac". Following the pymatgen skill you just installed, its converter should use `pymatgen.io.vasp` — `Incar.from_file` (ENCUT, NSW, ISPIN, LDAUU), `Poscar.from_file` (formula, atom counts), and `Outcar` (final energy, energy(sigma->0), total magnetization, max force) — to read a VASP slab calc directory and emit an ISAAC-style JSON record. The mock has no vasprun.xml, so take energy/forces from the OUTCAR. Target the shape in mock_data/expected_isaac_record.json. Save it with save_skill. + +*Expect:* the agent reads `skill-creator`'s template + the `pymatgen` skill's IO +docs, then `save_skill` writes `/skills/vasp-to-isaac/` whose script +imports `pymatgen.io.vasp` (not a hand-rolled regex parser). + +### 6 — Run the converter on the mock data +> Invoke the vasp-to-isaac skill on mock_data/mock_slab/ and write the result to audit/mock_slab_isaac.json. 
Then diff its structure and values against mock_data/expected_isaac_record.json and report any differences. + +*Expect:* pymatgen parses the mock dir and the agent writes +`audit/mock_slab_isaac.json` with the key fields **pymatgen extracted** — final +energy ≈ -132.8421 eV (`Outcar.final_energy`), 12 atoms (`Poscar`), ENCUT 520 / +NSW 50 (`Incar`), total mag ≈ 8.0123 (`Outcar.total_mag`) — matching the +reference. (`pymatgen` must be importable in the project env — see Notes.) + +### 7 — Inspect the tiers (run in a shell) +```bash +dsagt skills list isaac-skills-demo # installed: skill-creator + pymatgen + vasp-to-isaac +dsagt skills list isaac-skills-demo --catalog # catalog: skills_catalog__k-dense-ai-scientific-agent-skills +ls ~/dsagt-projects/isaac-skills-demo/.claude/skills/ +cat ~/dsagt-projects/isaac-skills-demo/.claude/skills/.dsagt-managed.json +``` + +The manifest lists only the skills **dsagt** placed; any skill you hand-create +under `.claude/skills/` is never touched. Restart Claude +(`dsagt start isaac-skills-demo` again) to pick up newly-mirrored skills as +native auto-invoked skills. + +## Post-Conditions + +1. The KB holds the `skills_catalog__k-dense-ai-scientific-agent-skills` + collection, **synced in-session by the agent** (step 3), searchable via + `search_skills` but absent from Claude's context. +2. The `pymatgen` catalog skill was installed into `/skills/` and + mirrored into `.claude/skills/`. +3. A new `vasp-to-isaac` skill, authored via `skill-creator` and parsing with + **pymatgen** (`pymatgen.io.vasp`), exists and is native-discoverable. +4. `audit/mock_slab_isaac.json` was produced from the mock VASP directory by + pymatgen and matches the ISAAC shape + values. +5. `.claude/skills/.dsagt-managed.json` tracks exactly the dsagt-placed skills. 
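
Post-condition 4 (the generated record "matches the ISAAC shape + values") can also be checked outside the agent session. The sketch below is one possible way to do that, not part of DSAgt: `diff_records` and its `tol` parameter are hypothetical names introduced here, and the 1e-4 tolerance is an assumption chosen to absorb rounding in the parsed energies.

```python
import math
from typing import Any


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it from numeric comparison.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_records(actual: Any, expected: Any, path: str = "", tol: float = 1e-4) -> list[str]:
    """Return human-readable differences between two JSON-like values.

    Hypothetical helper (an assumption of this sketch, not a DSAgt API).
    Numbers are compared with an absolute tolerance; everything else exactly.
    """
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs: list[str] = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}" if path else key
            if key not in actual:
                diffs.append(f"{child}: missing from actual")
            elif key not in expected:
                diffs.append(f"{child}: unexpected key")
            else:
                diffs.extend(diff_records(actual[key], expected[key], child, tol))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(actual) != len(expected):
            return [f"{path}: list length {len(actual)} != {len(expected)}"]
        diffs = []
        for index, (a_item, e_item) in enumerate(zip(actual, expected)):
            diffs.extend(diff_records(a_item, e_item, f"{path}[{index}]", tol))
        return diffs
    if _is_number(actual) and _is_number(expected):
        if math.isclose(actual, expected, abs_tol=tol):
            return []
        return [f"{path}: {actual} != {expected}"]
    if actual != expected:
        return [f"{path}: {actual!r} != {expected!r}"]
    return []
```

To use it, load `audit/mock_slab_isaac.json` and `mock_data/expected_isaac_record.json` with `json.load` and pass them as `actual` and `expected`; an empty list means the two records agree within the tolerance. Fields that legitimately differ between runs (for example a creation timestamp) would need to be dropped before comparing.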
+ +## Cleanup + +```bash +dsagt stop isaac-skills-demo +dsagt rm isaac-skills-demo # add -y to skip the prompt +``` + +The shared catalog cache lives at `~/dsagt-projects/.skill_sources/` and is +reused across projects; delete it to force a fresh clone. + +## Notes + +- **pymatgen must be importable** in the project env to run step 6 (the converter + uses `pymatgen.io.vasp`, exactly as `isaac_vasp` does). Install it once — + `pip install pymatgen` (or `uv pip install pymatgen`) into the same environment + `dsagt` runs in; the installed `pymatgen` skill's `references/` document this. + This is the demo's one real dependency — "lightweight" is about the data, not + avoiding pymatgen. +- `mock_data/` is intentionally tiny and **not** real DFT output, but it is *valid + VASP format*: the INCAR/POSCAR parse cleanly, and the OUTCAR stub keeps exactly + the lines pymatgen's `Outcar` reads (TOTEN, `energy(sigma->0)`, magnetization, + the force block) while omitting the ~250k-line SCF/eigenvalue blocks. No + `vasprun.xml` (it'd be large), so the converter takes energy/forces from OUTCAR. +- With the default **local** embedder (`bge-small`), absolute search scores are + low (~0.03) because short queries under-score long SKILL.md text — *ranking* is + still correct (pymatgen #1). Switch `embedding.backend` to `api` for sharper + relevance. With no embedder at all, `search_skills` falls back to keyword + scoring; `install_skill` and the native mirror are pure filesystem ops. +- Add **more** sources the same way, in-session or from a shell — e.g. ask the + agent to "enable the anthropic source", or run + `dsagt skills add isaac-skills-demo antigravity` (or `composio`, `genesis`, or + any `https://github.com/owner/repo`). Each lands in its own + `skills_catalog__*` collection. +- Sister demo: [`genesis_skills`](../genesis_skills/) flexes the same catalog → + install → native loop plus KB domain ingest and datacard generation, against + the Genesis (OSTI GitLab) source. 
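
The Notes above say the OUTCAR stub keeps only the TOTEN, `energy(sigma->0)`, magnetization, and force-block lines. A quick way to confirm a stub still carries them, without importing pymatgen, is sketched below. This is a fixture check only and an assumption of this note: `summarize_outcar_stub` is a hypothetical helper, its regexes are written against the line shapes in the mock file, and it is not a substitute for the `pymatgen.io.vasp` parsing the converter is meant to use.

```python
import math
import re

# Decimal number as it appears in the mock OUTCAR lines.
FLOAT = r"[-+]?\d+\.\d+"


def summarize_outcar_stub(text: str) -> dict:
    """Extract headline values from OUTCAR-like text (hypothetical helper)."""
    totens = re.findall(rf"free\s+energy\s+TOTEN\s*=\s*({FLOAT})", text)
    sigma0 = re.findall(rf"energy\(sigma->0\)\s*=\s*({FLOAT})", text)
    mags = re.findall(
        rf"number of electron\s+{FLOAT}\s+magnetization\s+({FLOAT})", text
    )

    force_norms: list[float] = []
    in_block = False
    dashed_lines = 0
    for line in text.splitlines():
        if "TOTAL-FORCE" in line:
            # Start a new block; only the last block in the file is kept.
            in_block, dashed_lines, force_norms = True, 0, []
            continue
        if not in_block:
            continue
        if line.strip() and set(line.strip()) == {"-"}:
            dashed_lines += 1
            if dashed_lines == 2:  # closing rule of the force table
                in_block = False
            continue
        columns = line.split()
        if len(columns) == 6:
            try:
                fx, fy, fz = (float(value) for value in columns[3:])
            except ValueError:
                continue  # not a numeric force row
            force_norms.append(math.hypot(fx, fy, fz))

    return {
        "n_toten_lines": len(totens),
        "final_toten_eV": float(totens[-1]) if totens else None,
        "final_sigma0_eV": float(sigma0[-1]) if sigma0 else None,
        "total_magnetization": float(mags[-1]) if mags else None,
        "max_force_eV_per_A": max(force_norms) if force_norms else None,
    }
```

Calling it on the text of `mock_data/mock_slab/OUTCAR` and comparing the result with the `results` block of `expected_isaac_record.json` shows whether the stub and the reference record have drifted apart; any `None` in the summary means one of the lines the Notes describe is missing from the stub.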
diff --git a/use_cases/isaac_skills_demo/mock_data/expected_isaac_record.json b/use_cases/isaac_skills_demo/mock_data/expected_isaac_record.json new file mode 100644 index 0000000..5d03192 --- /dev/null +++ b/use_cases/isaac_skills_demo/mock_data/expected_isaac_record.json @@ -0,0 +1,63 @@ +{ + "isaac_record_version": "1.05", + "record_id": "mock-iro2-110-slab-0001", + "record_type": "dft_calculation", + "record_domain": "materials", + "source_type": "MOCK", + "_note": "Reference SHAPE/values for the isaac_skills_demo. Derived from the tiny mock_slab/ stub, NOT a real DFT run.", + "timestamps": { + "created_utc": "2026-06-01T10:00:00Z" + }, + "sample": { + "material": { + "name": "IrO2 (110) slab", + "formula": "IrO2", + "provenance": "mock" + }, + "sample_form": "surface_slab" + }, + "system": { + "domain": "materials", + "technique": "DFT", + "instrument": { + "instrument_type": "compute", + "instrument_name": "VASP", + "vendor_or_project": "VASP" + }, + "configuration": { + "code_version": "6.3.2", + "compute_architecture": "cpu", + "cores": null + } + }, + "computation": { + "method": { + "family": "DFT", + "functional_class": "GGA", + "functional_name": "PBE", + "pseudopotential": "PAW", + "cutoff_eV": 520.0, + "spin_treatment": "collinear", + "hubbard_u": {"Ir": 4.0} + }, + "relaxation": { + "is_relaxation": true, + "nsw": 50, + "converged": true, + "ionic_steps": 50 + } + }, + "results": { + "total_energy_eV": -132.84210000, + "energy_sigma0_eV": -132.84210000, + "n_atoms": 12, + "n_species": {"Ir": 4, "O": 8}, + "total_magnetization_muB": 8.0123, + "max_residual_force_eV_per_A": 0.011 + }, + "assets": [ + {"name": "POSCAR", "role": "structure_input"}, + {"name": "INCAR", "role": "calc_parameters"}, + {"name": "OUTCAR", "role": "calc_output"} + ] +} diff --git a/use_cases/isaac_skills_demo/mock_data/mock_slab/INCAR b/use_cases/isaac_skills_demo/mock_data/mock_slab/INCAR new file mode 100644 index 0000000..4c80632 --- /dev/null +++ 
b/use_cases/isaac_skills_demo/mock_data/mock_slab/INCAR @@ -0,0 +1,18 @@ +# MOCK INCAR — slab ionic relaxation (NSW > 0 marks this as a slab calc) +SYSTEM = IrO2(110) mock slab +ISTART = 0 +ICHARG = 2 +ENCUT = 520 +ISMEAR = 0 +SIGMA = 0.05 +IBRION = 2 +NSW = 50 +ISIF = 2 +EDIFF = 1E-5 +EDIFFG = -0.02 +ISPIN = 2 +LDAU = .TRUE. +LDAUTYPE = 2 +LDAUL = 2 -1 +LDAUU = 4.0 0.0 +GGA = PE diff --git a/use_cases/isaac_skills_demo/mock_data/mock_slab/OUTCAR b/use_cases/isaac_skills_demo/mock_data/mock_slab/OUTCAR new file mode 100644 index 0000000..4640af5 --- /dev/null +++ b/use_cases/isaac_skills_demo/mock_data/mock_slab/OUTCAR @@ -0,0 +1,42 @@ + MOCK OUTCAR — heavily truncated stub for the skills demo. NOT real VASP output. + Only the few lines a converter typically greps for are kept; the SCF/eigenvalue + blocks that make a real OUTCAR ~250k lines are omitted on purpose. + + vasp.6.3.2 mock build + executed on LinuxIFC date 2026.06.01 10:00:00 + + INCAR: + ENCUT = 520.0 + ISPIN = 2 + NSW = 50 + LDAUU = 4.000 0.000 + + energy without entropy= -123.45678901 energy(sigma->0) = -123.40000000 + ... 
+ FREE ENERGIE OF THE ION-ELECTRON SYSTEM (eV) + --------------------------------------------------- + free energy TOTEN = -132.10000000 eV (ionic step 1) + + energy without entropy= -131.98000000 energy(sigma->0) = -131.99000000 + + FREE ENERGIE OF THE ION-ELECTRON SYSTEM (eV) + --------------------------------------------------- + free energy TOTEN = -132.84210000 eV (ionic step 50, converged) + + energy without entropy= -132.80000000 energy(sigma->0) = -132.84210000 + + POSITION TOTAL-FORCE (eV/Angst) + ----------------------------------------------------------------------------------- + 0.00000 0.00000 7.04000 0.000000 0.000000 -0.004000 + 3.19250 3.24900 7.04000 0.000000 0.000000 0.003000 + 0.00000 0.00000 9.90000 0.000000 0.000000 0.011000 + ----------------------------------------------------------------------------------- + + magnetization (x) + number of electron 192.0000000 magnetization 8.0123000 + + General timing and accounting informations for this job: + ======================================================== + Total CPU time used (sec): 4210.123 + Elapsed time (sec): 1130.456 + reached required accuracy - stopping structural energy minimisation diff --git a/use_cases/isaac_skills_demo/mock_data/mock_slab/POSCAR b/use_cases/isaac_skills_demo/mock_data/mock_slab/POSCAR new file mode 100644 index 0000000..5101962 --- /dev/null +++ b/use_cases/isaac_skills_demo/mock_data/mock_slab/POSCAR @@ -0,0 +1,21 @@ +IrO2 (110) mock slab [MOCK — not real DFT input] +1.0 + 6.3850000000 0.0000000000 0.0000000000 + 0.0000000000 6.4980000000 0.0000000000 + 0.0000000000 0.0000000000 22.0000000000 + Ir O + 4 8 +Selective dynamics +Direct + 0.0000 0.0000 0.3200 F F F + 0.5000 0.5000 0.3200 F F F + 0.0000 0.0000 0.4500 T T T + 0.5000 0.5000 0.4500 T T T + 0.2500 0.2500 0.3850 T T T + 0.7500 0.7500 0.3850 T T T + 0.2500 0.7500 0.3850 T T T + 0.7500 0.2500 0.3850 T T T + 0.2500 0.2500 0.5100 T T T + 0.7500 0.7500 0.5100 T T T + 0.2500 0.7500 0.5100 T T T + 0.7500 0.2500 
0.5100 T T T