diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml new file mode 100644 index 0000000..0cae487 --- /dev/null +++ b/.github/workflows/docs.yml @@ -0,0 +1,32 @@ +name: Documentation + +on: + push: + branches: [main] + pull_request: + branches: [main] + workflow_dispatch: + +jobs: + docs: + runs-on: ubuntu-latest + permissions: + contents: write + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - uses: actions/setup-python@v5 + with: + python-version: "3.12" + + - name: Install docs dependencies + run: pip install mkdocs-material + + - name: Build docs + run: mkdocs build --strict + + - name: Deploy to GitHub Pages + if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request' + run: mkdocs gh-deploy --force diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..5724f82 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,43 @@ +# Architecture + +![DSAgt architecture](assets/architecture.png) + +DSAgt wraps an unmodified agent CLI with four independently-operable layers. Each layer exposes its own MCP server so the agent discovers and invokes capabilities through the standard MCP tool protocol. + +## Layers + +**Tool Registry** (`dsagt-registry-server`) +The agent registers CLI tools as markdown files with YAML frontmatter under `/tools/`. The registry server handles dependency installation via `uv run --with` and wraps every execution with `dsagt-run` for provenance capture. The agent discovers tools via `search_registry`. + +**Knowledge Base** (`dsagt-knowledge-server`) +Semantic search over six independently-partitioned ChromaDB collections. Three are global (populated by `dsagt setup-kb`); three are per-project (filled automatically during use). Background jobs handle long ingest operations. The agent searches via `kb_search`, ingests via `kb_ingest`, and saves user-confirmed facts via `kb_remember`. 
+ +**Provenance** (`dsagt-run`) +A thin wrapper invoked by the registry server around every tool execution. Records the command, arguments, exit code, duration, file counts, and truncated stderr to `/trace_archive/.json` and emits an OTLP span to MLflow. The agent calls `reconstruct_pipeline` to render the trace archive as a reproducible bash script or Snakemake workflow. + +**Observability** (MLflow + OTLP) +MLflow runs locally at a port pinned at `dsagt init` time. All four layers emit OTLP HTTP spans to MLflow's `/v1/traces` endpoint. The agent's own LLM-call traces land in the same store when you export the `OTEL_EXPORTER_OTLP_ENDPOINT` printed by `dsagt init`. + +## Project Layout + +``` +~/dsagt-projects// + dsagt_config.yaml # project configuration + tools/ # registered CLI tool specs (markdown + YAML frontmatter) + tools/code/ # agent-written tool scripts + skills/ # agent skills (SKILL.md + reference docs) + trace_archive/ # tool execution records (JSON, from dsagt-run) + mlflow/ # MLflow traces, metrics, artifacts + kb_index/ # knowledge base vector collections + explicit_memories.yaml # user-confirmed facts + + # Per-agent runtime config (one of, generated by dsagt init): + # claude: CLAUDE.md, .mcp.json + # goose: goose.yaml, .goosehints + # codex: AGENTS.md, .codex-data/config.toml + # opencode: AGENTS.md, opencode.json + # roo: .roomodes, .roo/mcp.json + # cline: .clinerules/, cline_mcp_settings.json +``` + +Projects are registered in `~/.dsagt/projects.yaml` so `dsagt mlflow ` and `dsagt info ` work from any directory. The data layer is agent-agnostic — re-running `dsagt init --agent ` switches agent platforms while preserving all accumulated knowledge and traces. 
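The Provenance layer in `docs/architecture.md` above is described as a thin wrapper that records the command, arguments, exit code, duration, and truncated stderr for each tool execution. The following is a minimal sketch of that idea, not the actual `dsagt-run` implementation: the function name `run_with_provenance`, the record field names, the `uuid`-based file name, and the 2000-character stderr limit are all assumptions made for illustration. File counts and the OTLP span are left out.

```python
import json
import subprocess
import sys
import time
import uuid
from pathlib import Path


def run_with_provenance(argv, archive_dir, stderr_limit=2000):
    """Run argv and write one JSON record describing the execution.

    Hypothetical helper: field names and file naming are illustrative only.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    started = time.time()
    proc = subprocess.run(argv, capture_output=True, text=True)
    duration = time.time() - started

    record = {
        "id": uuid.uuid4().hex,  # assumed identifier, used as the file name
        "command": argv[0],
        "arguments": list(argv[1:]),
        "exit_code": proc.returncode,
        "duration_seconds": round(duration, 3),
        "stderr": proc.stderr[-stderr_limit:],  # keep only the tail of stderr
    }
    out_path = archive / f"{record['id']}.json"
    out_path.write_text(json.dumps(record, indent=2))
    return record, out_path


if __name__ == "__main__":
    rec, path = run_with_provenance(
        [sys.executable, "-c", "print('hello')"], "trace_archive"
    )
    print(path, rec["exit_code"])
```

Writing one small JSON file per execution keeps each record independent, which is presumably what lets a later step replay the archive in order as a script.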
diff --git a/docs/assets/architecture.png b/docs/assets/architecture.png new file mode 100644 index 0000000..7dbdf46 Binary files /dev/null and b/docs/assets/architecture.png differ diff --git a/docs/cli.md b/docs/cli.md new file mode 100644 index 0000000..d4b24e4 --- /dev/null +++ b/docs/cli.md @@ -0,0 +1,49 @@ +# CLI Reference + +All commands are available after running `uv sync` and activating the virtual environment (`source .venv/bin/activate`). + +## Project Management + +| Command | Description | +|---------|-------------| +| `dsagt init --agent [--location ] [--mlflow-port N]` | Create a project; write per-agent MCP config; print the launch one-liner | +| `dsagt list` | List all projects with agent, status, and path | +| `dsagt info [--json]` | Resolved config (with source per value) and a session/error summary | +| `dsagt mv ` | Move a project to a new location | +| `dsagt rm [-y] [--keep-files]` | Unregister a project and optionally delete its directory | + +## Session Lifecycle + +| Command | Description | +|---------|-------------| +| `dsagt mlflow ` | Start MLflow for a project and print OTel routing exports | +| `dsagt stop ` | Stop the MLflow daemon | +| `dsagt memory --project ` | Distill new traces from MLflow into episodic memory | + +## Setup + +| Command | Description | +|---------|-------------| +| `dsagt setup-kb [--collection ]` | Build the shared core knowledge base collections | +| `dsagt smoke-test [--agent claude\|goose\|codex\|opencode]` | End-to-end install verification | + +## Project Location + +The default project location is `~/dsagt-projects//`. Override with `--location`: + +```bash +dsagt init my-project --agent claude --location /data/runs # /data/runs/my-project/ +dsagt init my-project --agent claude --location . # ./my-project/ +``` + +## Server Commands + +These are launched automatically by `dsagt init` via the per-agent MCP config and are not typically run directly. 
+ +| Command | Description | +|---------|-------------| +| `dsagt-registry-server` | Tool registry MCP server | +| `dsagt-knowledge-server` | Knowledge base MCP server | +| `dsagt-run` | Provenance-capturing tool execution wrapper | +| `dsagt-proxy` | LiteLLM proxy server (proxy mode only) | +| `dsagt-setup-kb` | Core knowledge base setup (called by `dsagt setup-kb`) | diff --git a/docs/developer.md b/docs/developer.md new file mode 100644 index 0000000..38cc778 --- /dev/null +++ b/docs/developer.md @@ -0,0 +1,47 @@ +# Developer Guide + +Material for contributors and users who are working beyond the default `dsagt init` → `dsagt mlflow` → agent flow. + +## Tests + +```bash +uv run python -m pytest -m "not integration" # unit tests, no creds required +uv run python -m pytest -m integration -v # integration tests (require .env) +``` + +Integration tests read endpoint and key values from `.env` at the repo root. Copy `.env.example` to `.env` and fill in your values. + +For per-flow hand-tests (CLI, proxy mode, VS Code extensions), see the scripts under [`tests/smoke_test/manual_runs/`](https://github.com/AI-ModCon/dsagt/tree/main/tests/smoke_test/manual_runs/). + +## Proxy Mode + +`dsagt init` followed by `dsagt start --enable-proxy` spawns a LiteLLM proxy in front of your agent's LLM calls. This adds: + +- Full LLM-call traces (request bodies, tool-use blocks, response payloads) in MLflow for agents whose native OTel does not emit those payloads (codex, opencode). +- Cache-breakpoint injection on outgoing requests (Anthropic prompt caching). +- Sidechannel detection for agent-internal title-generator / session-namer calls. +- Model-name aliasing — useful when an agent CLI hardcodes a model whitelist incompatible with your gateway's served names (cline, roo). + +Proxy mode reads upstream LLM credentials from `.env` or the shell. 
See [`tests/smoke_test/manual_runs/proxy_walkthrough.md`](https://github.com/AI-ModCon/dsagt/blob/main/tests/smoke_test/manual_runs/proxy_walkthrough.md) for the full setup walkthrough. + +## Troubleshooting + +**Agent command not found.** The agent CLI is not installed or is not on PATH. See the [supported agents table](index.md#supported-agents). + +**MCP servers not connecting.** Verify uv resolves the server commands: + +```bash +uv run which dsagt-registry-server +uv run which dsagt-knowledge-server +``` + +If missing, reinstall: `uv sync --reinstall`. + +**MLflow UI empty.** Confirm MLflow is running for the right project: + +```bash +dsagt info # shows the pinned port +curl http://localhost: +``` + +**Claude keychain conflict.** If `claude` will not authenticate against a non-default gateway, run `claude /logout` to clear the macOS Keychain OAuth token, then re-export `ANTHROPIC_BASE_URL` / `ANTHROPIC_API_KEY` and re-launch. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..ff5eaa6 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,44 @@ +# DSAgt + +**D**ata**S**mith **Ag**en**t** — AI-assisted data pipeline builder. + +DSAgt connects an MCP-compatible AI coding agent to tool registration, a semantic knowledge base, execution provenance, and observability infrastructure. It provides data-pipeline scaffolding around your existing agent CLI or VS Code extension (Claude Code, Goose, Codex, and others). 
+ +## Supported Agents + +| Agent | Install | Verify | +|-------|---------|--------| +| [Claude Code](https://github.com/anthropics/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude --version` | +| [Goose](https://github.com/block/goose) | See [Goose docs](https://github.com/block/goose#installation) | `goose --version` | +| [Codex](https://github.com/openai/codex) | `npm i -g @openai/codex` | `codex --version` | +| [opencode](https://github.com/sst/opencode) | See [opencode docs](https://opencode.ai/docs/) | `opencode --version` | +| [Roo Code](https://github.com/RooCodeInc/Roo-Code) | `npm i -g @roo-code/cli` | `roo --version` | +| [Cline](https://github.com/cline/cline) | `npm i -g cline` | `cline --version` | + +## Prerequisites + +- Python 3.12–3.13 +- [uv](https://github.com/astral-sh/uv) +- One of the supported agent platforms above, installed and authenticated against your LLM provider + +## Installation + +```bash +git clone https://github.com/AI-ModCon/dsagt.git +cd dsagt +uv sync +source .venv/bin/activate +``` + +## Key Capabilities + +| Layer | What it does | +|-------|-------------| +| **Tool Registry** | Register CLI tools as markdown specs; the agent discovers and runs them via `search_registry` | +| **Knowledge Base** | Semantic search over indexed document collections (ChromaDB + FAISS) | +| **Provenance** | `dsagt-run` wrapper records every tool execution to `trace_archive/` and MLflow | +| **Explicit Memory** | User-confirmed facts persisted to YAML and the knowledge base | +| **Episodic Memory** | Session distillation via outlier detection over MLflow traces | +| **Observability** | Full OTLP tracing to a local MLflow instance | + +See the [Quick Start](quickstart.md) to try all of these in a single session. 
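The prerequisites listed in `docs/index.md` above (Python 3.12 to 3.13, `uv`, and at least one supported agent CLI) can be checked before running `uv sync`. The sketch below is a hypothetical helper, not part of DSAgt: the function names are invented here, and the executable names are taken from the "Verify" column of the supported-agents table. It only checks that the commands are on PATH, not that an agent is authenticated.

```python
import shutil
import sys

# Executable names from the "Verify" column of the supported-agents table.
AGENT_COMMANDS = ["claude", "goose", "codex", "opencode", "roo", "cline"]


def python_version_ok(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.12 or 3.13."""
    return tuple(version_info[:2]) in {(3, 12), (3, 13)}


def find_on_path(commands, which=shutil.which):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in commands}


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not python_version_ok():
        problems.append("Python 3.12 or 3.13 is required")
    if shutil.which("uv") is None:
        problems.append("uv is not on PATH")
    if not any(find_on_path(AGENT_COMMANDS).values()):
        problems.append("no supported agent CLI found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```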
diff --git a/docs/knowledge-base.md b/docs/knowledge-base.md new file mode 100644 index 0000000..3370677 --- /dev/null +++ b/docs/knowledge-base.md @@ -0,0 +1,40 @@ +# Knowledge Base + +DSAgt maintains six independently-partitioned ChromaDB collections. The first three are global (under `~/.dsagt/kb_index/`, populated by `dsagt setup-kb`); the last three are per-project (under `/kb_index/`, populated automatically during use). + +## Collections + +| Collection | Source | Populated by | +|---|---|---| +| **Tool Specs** | Bundled CLI tool specs in `src/dsagt/tools/` | `dsagt setup-kb` | +| **Skills** | Bundled skill workflows in `src/dsagt/skills/` | `dsagt setup-kb` | +| **Domain Knowledge** | NeMo Curator + AIDRIN reference corpora; user-ingested docs | `dsagt setup-kb` + agent's `kb_ingest` | +| **Explicit Memory** | User-confirmed facts | Agent's `kb_remember` (also written to `/explicit_memories.yaml`) | +| **Episodic Memory** | Distilled facts from MLflow traces | `dsagt memory --project ` | +| **Tool Use Records** | `dsagt-run` execution traces | `dsagt-run` wrapper writes JSON to `/trace_archive/`; indexed by `dsagt memory` | + +## Explicit Memory + +Explicit memories are facts the user confirms during a session. The agent saves them via `kb_remember`, which writes to both the ChromaDB collection and `/explicit_memories.yaml`. The agent fetches them via `kb_get_memories` on demand (typically when you ask it to recall something) — they are not auto-loaded at session start. + +## Episodic Memory + +`dsagt memory --project ` distills new traces from the project's MLflow store into episodic memory using per-category outlier detection over embedding centroids. Run this after each session to accumulate cross-session memory. + +## Search + +The agent searches all collections via `kb_search` (knowledge MCP server) and writes via `kb_ingest` / `kb_remember`. 
Tool Specs and Skills are queried through specialized routes (`search_registry`, `search_skills`) over the same backend. + +Hybrid search (dense embeddings + sparse BM25 via Reciprocal Rank Fusion) is on by default per collection route. Cross-encoder reranking is optional. + +## Setup + +```bash +dsagt setup-kb # all global collections (local embedder) +dsagt setup-kb --collection nemo_curator +dsagt setup-kb --embedding-backend api \ + --embedding-base-url \ + --embedding-api-key +``` + +The Tool Specs and Skills collections are wiped and rebuilt on every `setup-kb` run — re-run after upgrading DSAgt to pick up new bundled assets. diff --git a/docs/mcp-servers.md b/docs/mcp-servers.md new file mode 100644 index 0000000..ee999cc --- /dev/null +++ b/docs/mcp-servers.md @@ -0,0 +1,38 @@ +# MCP Servers + +DSAgt exposes its capabilities through two MCP servers. Both are launched automatically by `dsagt init` and configured in the per-agent runtime file (`.mcp.json` for Claude Code, `goose.yaml` for Goose, etc.). + +## Registry Server + +**Command:** `dsagt-registry-server` + +Handles tool registration, dependency installation, and tool discovery. + +| Tool | Description | +|------|-------------| +| `search_registry` | Semantic search over registered tool specs | +| `save_tool_spec` | Register a new CLI tool as a markdown file with YAML frontmatter | +| `install_dependencies` | Install tool dependencies via `uv run --with` | +| `reconstruct_pipeline` | Render the trace archive as a bash script or Snakemake workflow | + +Tools are markdown files with YAML frontmatter under `/tools/`. Executables are wrapped with `dsagt-run` for provenance and `uv run --with` for Python dependencies. + +## Knowledge Server + +**Command:** `dsagt-knowledge-server` + +Semantic search and ingestion over indexed document collections. 
+ +| Tool | Description | +|------|-------------| +| `kb_search` | Search across one or more knowledge collections | +| `kb_ingest` | Index a file or directory into a named collection (runs in background for large corpora) | +| `kb_remember` | Save a user-confirmed fact to explicit memory | +| `kb_get_memories` | Retrieve explicit memories for the current project | +| `search_skills` | Discover agent skill workflows | + +### Backend + +The default embedding backend is local (`sentence-transformers`, CPU-only, no API key needed). Switch to `embedding.backend: api` in `dsagt_config.yaml` to route through a hosted embedder via LiteLLM. Cross-encoder reranking is available via `knowledge.rerank: true`. + +Hybrid search (dense + sparse BM25) is on by default and controlled per-route via the `hybrid` flag. diff --git a/docs/observability.md b/docs/observability.md new file mode 100644 index 0000000..3344a9a --- /dev/null +++ b/docs/observability.md @@ -0,0 +1,45 @@ +# Observability + +DSAgt provides end-to-end trace visibility through a local MLflow instance. All internal layers emit OTLP HTTP spans to MLflow's `/v1/traces` endpoint. + +## Starting MLflow + +```bash +dsagt mlflow +``` + +Prints the MLflow UI URL and the `export` block for routing agent OTel output. The port is pinned at `dsagt init` time and listed by `dsagt info `. + +## Trace Coverage + +| Source | Span type | Contents | +|--------|-----------|----------| +| Knowledge base | `kb.search`, `kb.embed`, `kb.index_search`, `kb.rerank` | Per-phase timing trees | +| Tool executions | `tool.execute` | Exit code, duration, file counts, truncated stderr. 
Full payload in `trace_archive/.json` | +| Registry events | `save_tool_spec`, `install_dependencies`, `reconstruct_pipeline` | Span metadata | +| Native agent OTel | LLM call spans | Coverage varies by agent (see below) | + +### Agent OTel Coverage + +Export the variables printed by `dsagt mlflow` before launching your agent: + +| Agent | Coverage | +|-------|----------| +| claude | Full request/response payloads | +| goose | Full request/response payloads | +| codex | Token counts and tool names | +| opencode | None natively | + +Every span carries the project's `session.id` for filtering in the MLflow trace view. + +## Provenance and Reconstruction + +Tool execution records on disk (`trace_archive/.json`) provide the canonical provenance chain. The agent calls `reconstruct_pipeline` to render the archive as a reproducible bash script or Snakemake workflow. + +## Stopping MLflow + +```bash +dsagt stop +``` + +Releases the port and stops the gunicorn workers. The PID is stored in `/.runtime`. diff --git a/docs/quickstart.md b/docs/quickstart.md new file mode 100644 index 0000000..e1cec67 --- /dev/null +++ b/docs/quickstart.md @@ -0,0 +1,96 @@ +# Quick Start + +This guide walks through knowledge ingest, tool registration, provenance, and explicit memory using the mock project in [`tests/smoke_test/`](https://github.com/AI-ModCon/dsagt/tree/main/tests/smoke_test/). The examples use `claude`; substitute another agent (`goose`, `codex`, `opencode`) if you prefer — the prompts are agent-agnostic. + +## Setup + +```bash +# Install +git clone https://github.com/AI-ModCon/dsagt.git +cd dsagt +uv sync +source .venv/bin/activate + +# Set a convenience variable for the smoke test directory (not a normal dsagt step) +export SMOKE_DIR="$(pwd)/tests/smoke_test" + +# 1. Create a new project called quickstart +dsagt init quickstart --agent claude + +# 2. Start MLflow in the background and print the OTel routing exports +dsagt mlflow quickstart + +# 3. 
Paste the export block from step 2 into this shell, then launch the agent +cd ~/dsagt-projects/quickstart && claude +``` + +## Agent Prompts + +Inside the agent, paste these prompts one at a time. Replace `$SMOKE_DIR` with the absolute path you exported, because the chat does not expand shell variables. + +1. > Ingest the docs in `$SMOKE_DIR/knowledge/` into a collection named `knowledge`. +2. > Register the csvkit CLI tools `csvcut`, `csvgrep`, `csvstat`, and `csvlook`. +3. > Use the `scan_directory` tool from the registry to scan `$SMOKE_DIR/data/`. +4. > Summarize `samples.csv`: columns, row count, and quality issues, using csvkit tools from the registry. +5. > Put this in explicit memory: samples.csv has null values in the status and timestamp columns. +6. > Tell me what you remember about the samples dataset. + +## Teardown + +After exiting the agent, distill the session into episodic memory and stop the MLflow daemon: + +```bash +# Distill traces into episodic memory +dsagt memory --project quickstart + +# Stop the MLflow daemon +dsagt stop quickstart +``` + +## What Was Exercised + +| Prompt | DSAgt layer | +|--------|-------------| +| 1 | Knowledge MCP server (`kb_ingest`): chunks and indexes docs into ChromaDB | +| 2 | Registry MCP server (`save_tool_spec`): writes `tools/csvcut.md`, etc. | +| 3 | `dsagt-run` provenance wrapper: records the tool execution in `trace_archive/` | +| 4 | KB recall via `kb_search` and registered tool execution | +| 5–6 | Explicit memory (`kb_remember` → `explicit_memories.yaml`) + `kb_get_memories` | + +## Verify the Artifacts + +```bash +dsagt info quickstart +ls ~/dsagt-projects/quickstart/{tools,trace_archive} +cat ~/dsagt-projects/quickstart/explicit_memories.yaml +``` + +The MLflow UI URL is printed by `dsagt mlflow quickstart`. 
+ +## Non-Interactive Smoke Test + +The same flow runs non-interactively and asserts each artifact is present: + +```bash +dsagt smoke-test --agent claude +``` + +## First-Time Knowledge Base Setup + +`dsagt setup-kb` builds shared ChromaDB collections under `~/.dsagt/kb_index/` that every project on this machine reuses. Run this once after installation. + +```bash +dsagt setup-kb # all collections (local embedder, no creds) +dsagt setup-kb --collection nemo_curator +dsagt setup-kb --embedding-backend api --embedding-base-url ... --embedding-api-key ... +``` + +Three collections are populated: + +- **Tool Specs** — DSAgt's bundled tool specs from `src/dsagt/tools/`, tagged `source: bundled`. +- **Skills** — DSAgt's bundled skill workflows from `src/dsagt/skills/`. +- **Domain Knowledge** — NeMo Curator and AI Data Readiness Inspector reference corpora. + +The Tool Specs and Skills collections are wiped and rebuilt on every run, so re-run `setup-kb` after upgrading DSAgt. + +The default embedder is a local sentence-transformers model (~130 MB, CPU-only, no API key). Pass `--embedding-backend api` to route through a hosted embedder via LiteLLM. diff --git a/docs/tools-skills.md b/docs/tools-skills.md new file mode 100644 index 0000000..6c057bd --- /dev/null +++ b/docs/tools-skills.md @@ -0,0 +1,43 @@ +# Tools and Skills + +## Tools + +Tools are CLI executables defined as markdown files with YAML frontmatter under `/tools/`. The agent registers new tools via the registry MCP server's `save_tool_spec` tool. + +A tool spec includes: + +- A YAML frontmatter block describing the command, arguments, dependencies, and tags. +- A markdown body with usage examples and notes for the agent. + +Example tool spec structure: + +```markdown +--- +name: csvstat +command: csvstat +dependencies: [] +tags: [csv, statistics] +--- + +Prints descriptive statistics for all columns in a CSV file. 
+ +Usage: csvstat [options] [FILE] +``` + +The registry server wraps every registered tool with `dsagt-run` for provenance capture and `uv run --with` for Python dependencies, so the agent can call any tool without managing environments manually. + +### Bundled Tools + +DSAgt ships a `scan_directory` tool in `src/dsagt/tools/` that is indexed into the global Tool Specs collection by `dsagt setup-kb`. + +## Skills + +Skills are instruction-based agent workflows in `/skills/`. Each skill is a directory containing a `SKILL.md` file and optional reference documents. The agent discovers skills via `search_skills`. + +### Bundled Skills + +DSAgt ships a `datacard-generator` skill in `src/dsagt/skills/` with reference templates for generating dataset documentation. It is indexed into the global Skills collection by `dsagt setup-kb`. + +### Adding Skills + +Place a new directory under `/skills/` with a `SKILL.md` describing the workflow. The knowledge server indexes it automatically on next startup, or trigger a re-index via `kb_ingest`. diff --git a/docs/use-cases/cryoem.md b/docs/use-cases/cryoem.md new file mode 100644 index 0000000..f07226e --- /dev/null +++ b/docs/use-cases/cryoem.md @@ -0,0 +1,15 @@ +# Cryo-EM + +**Domain:** Structural biology — cryo-electron microscopy + +**Dataset:** EMPIAR-10017 β-galactosidase micrographs via CryoPPP + +**Source:** [`use_cases/cryoem/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/cryoem/) + +## Overview + +This use case demonstrates DSAgt-assisted curation of cryo-EM data from the EMPIAR public archive. The agent registers curation tools, ingests domain knowledge about cryo-EM data quality, and builds a pipeline for micrograph preprocessing. 
+ +## Guides + +- [Cryo-EM Demo](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/cryoem/cryoem_demo.md) — full walkthrough diff --git a/docs/use-cases/index.md b/docs/use-cases/index.md new file mode 100644 index 0000000..76876f1 --- /dev/null +++ b/docs/use-cases/index.md @@ -0,0 +1,9 @@ +# Use Cases + +End-to-end walkthroughs for representative scientific domains live in [`use_cases/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/). Each covers data acquisition, tool registration, pipeline construction, and agent-driven execution against a real dataset. + +| Use case | Domain | Guide | +|----------|--------|-------| +| [Microbial Isolates](microbial-isolates.md) | Genomics — short-read QC and assembly with `fastp` and `megahit` | `use_cases/microbial_isolates/` | +| [Cryo-EM](cryoem.md) | Structural biology — EMPIAR-10017 β-galactosidase micrographs via CryoPPP | `use_cases/cryoem/` | +| [VASP / ISAAC](vasp.md) | Materials science — DFT input/output handling with VASP | `use_cases/isaac_vasp/` | diff --git a/docs/use-cases/microbial-isolates.md b/docs/use-cases/microbial-isolates.md new file mode 100644 index 0000000..b6757dc --- /dev/null +++ b/docs/use-cases/microbial-isolates.md @@ -0,0 +1,17 @@ +# Microbial Isolates + +**Domain:** Genomics — short-read QC and assembly + +**Tools:** `fastp`, `megahit` + +**Source:** [`use_cases/microbial_isolates/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/microbial_isolates/) + +## Overview + +This use case walks through processing microbial isolate short-read sequencing data using DSAgt. The agent registers `fastp` for quality control and `megahit` for assembly, then builds and executes a pipeline against a real isolate dataset. 
+ +## Guides + +- [Isolate Demo](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/microbial_isolates/isolate_demo.md) — step-by-step walkthrough +- [Genomics Background](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/microbial_isolates/genomics.md) — domain context +- [fastp + megahit Best Practices](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/microbial_isolates/fastp_megahit_best_practices.md) — parameter guidance diff --git a/docs/use-cases/vasp.md b/docs/use-cases/vasp.md new file mode 100644 index 0000000..9f4ce28 --- /dev/null +++ b/docs/use-cases/vasp.md @@ -0,0 +1,15 @@ +# VASP / ISAAC + +**Domain:** Materials science — density functional theory + +**Tools:** VASP, ISAAC + +**Source:** [`use_cases/isaac_vasp/`](https://github.com/AI-ModCon/dsagt/tree/main/use_cases/isaac_vasp/) + +## Overview + +This use case covers DFT input/output handling with VASP using DSAgt and the ISAAC workflow system. The agent registers VASP pre/post-processing tools and transfers NEB calculation results into the ISAAC database. 
+ +## Guides + +- [VASP / ISAAC README](https://github.com/AI-ModCon/dsagt/blob/main/use_cases/isaac_vasp/README.md) — setup and walkthrough diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..5fd83a0 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,61 @@ +site_name: DSAgt +site_description: DataSmith Agent — AI-assisted data pipeline builder +site_url: https://ai-modcon.github.io/dsagt +repo_url: https://github.com/AI-ModCon/dsagt +repo_name: AI-ModCon/dsagt +edit_uri: edit/main/docs/ + +theme: + name: material + palette: + - media: "(prefers-color-scheme: light)" + scheme: default + primary: indigo + accent: indigo + toggle: + icon: material/brightness-7 + name: Switch to dark mode + - media: "(prefers-color-scheme: dark)" + scheme: slate + primary: indigo + accent: indigo + toggle: + icon: material/brightness-4 + name: Switch to light mode + features: + - navigation.tabs + - navigation.sections + - navigation.top + - search.highlight + - search.suggest + - content.code.copy + - content.code.annotate + +markdown_extensions: + - admonition + - pymdownx.details + - pymdownx.superfences + - pymdownx.highlight: + anchor_linenums: true + - pymdownx.inlinehilite + - pymdownx.tabbed: + alternate_style: true + - tables + - toc: + permalink: true + +nav: + - Home: index.md + - Quick Start: quickstart.md + - Architecture: architecture.md + - MCP Servers: mcp-servers.md + - Knowledge Base: knowledge-base.md + - Tools & Skills: tools-skills.md + - Observability: observability.md + - CLI Reference: cli.md + - Use Cases: + - Overview: use-cases/index.md + - Microbial Isolates: use-cases/microbial-isolates.md + - Cryo-EM: use-cases/cryoem.md + - VASP / ISAAC: use-cases/vasp.md + - Developer Guide: developer.md diff --git a/pyproject.toml b/pyproject.toml index dfd4336..bcd0c77 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -70,6 +70,9 @@ dev = [ "black>=24.0", "ruff>=0.5.0", ] +docs = [ + "mkdocs-material>=9.5", +] [build-system] requires = 
["setuptools>=61.0", "wheel"]
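The `docs/knowledge-base.md` file added in this diff states that hybrid search combines dense embeddings and sparse BM25 results via Reciprocal Rank Fusion. As a rough illustration of that fusion step only, here is a minimal sketch. It assumes each retriever returns an ordered list of document ids, and it uses the conventional constant `k = 60`; whether DSAgt uses that value, or this exact scoring, is not stated in the diff. The function name is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # e.g. embedding-similarity order
    sparse = ["b", "d", "a"]  # e.g. BM25 order
    print(reciprocal_rank_fusion([dense, sparse]))
```

Because the fusion works on ranks rather than raw scores, the dense and sparse retrievers do not need comparable score scales, which is the usual reason for choosing this method.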