Closed
9 changes: 6 additions & 3 deletions .github/workflows/docs.yml
@@ -21,12 +21,15 @@ jobs:
         with:
           python-version: "3.12"

+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
       - name: Install docs dependencies
-        run: pip install mkdocs-material
+        run: uv sync --group docs

       - name: Build docs
-        run: mkdocs build --strict
+        run: uv run mkdocs build --strict

       - name: Deploy to GitHub Pages
         if: github.ref == 'refs/heads/main' && github.event_name != 'pull_request'
-        run: mkdocs gh-deploy --force
+        run: uv run mkdocs gh-deploy --force
49 changes: 49 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,49 @@
# Changelog

All notable changes to DSAgt are documented here. The format is based on
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.2.0] - 2026-06-23

### Added
- External skill catalogs: discover and install agent skills from GitHub
sources via `add_skill_source`, `search_skills`, and `install_skill` (plus
the `dsagt skills sync/add/list/search` CLI), backed by per-source ChromaDB
collections.
- Native skill discovery: installed and bundled skills are mirrored into the
agent's native skill directory (e.g. `.claude/skills/`) at init/start.
- `skill-creator` bundled skill for authoring new skills from the Anthropic
template.
- Install-from-GitHub instructions for non-developers (`pip install
git+https://github.com/AI-ModCon/dsagt.git` into any Python 3.12/3.13
environment) in the README and docs.

### Changed
- `search_skills` now reports when no external catalog is synced instead of a
bare "no match", and `list_skill_sources` flags each known source as
`synced`/available with its indexed count.
- `install_skill` clarifies that an installed skill is usable in the current
session immediately — a restart is only needed for hands-free native
auto-invocation.
- The package version is single-sourced from `dsagt.__version__` (pyproject
reads it via setuptools dynamic metadata).
- Documentation home page (`docs/index.md`) pulls the supported-agents table
and install instructions directly from the README via the
`mkdocs-include-markdown` plugin, so the two no longer drift.

### Fixed
- CLI-added skill sources are now persisted to the project config.

## [0.1.0] - 2026-01-11

### Added
- Initial release: registry and knowledge MCP servers, BYOA per-agent config
generation, MLflow/OTel observability, the tool/skill registry, execution
provenance, and explicit + episodic memory.

[Unreleased]: https://github.com/AI-ModCon/dsagt/compare/v0.2.0...HEAD
[0.2.0]: https://github.com/AI-ModCon/dsagt/releases/tag/v0.2.0
[0.1.0]: https://github.com/AI-ModCon/dsagt/releases/tag/v0.1.0
41 changes: 40 additions & 1 deletion README.md
@@ -6,8 +6,9 @@

DSAgt connects an MCP-compatible AI coding agent to tool registration, a semantic knowledge base, execution provenance, and observability infrastructure. DSAgt provides data-pipeline scaffolding around a user's existing agent CLI or VS Code extension (Claude Code, Goose, Codex, …);

**Prerequisites:** Python 3.10–3.13, [uv](https://github.com/astral-sh/uv), and one of the supported agent platforms below — already installed and authenticated against whatever LLM provider you intend to use.
**Prerequisites:** Python 3.12 or 3.13, and one of the supported agent platforms below — already installed and authenticated against whatever LLM provider you intend to use. ([uv](https://github.com/astral-sh/uv) is only needed for the development install.)

<!-- md-shared:agents:start -->
| Agent | Install | Verify |
|-------|---------|--------|
| [Claude Code](https://github.com/anthropics/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude --version` |
@@ -16,6 +17,44 @@ DSAgt connects an MCP-compatible AI coding agent to tool registration, a semanti
| [opencode](https://github.com/sst/opencode) | See [opencode docs](https://opencode.ai/docs/) | `opencode --version` |
| [Roo Code](https://github.com/RooCodeInc/Roo-Code) | `npm i -g @roo-code/cli` | `roo --version` |
| [Cline](https://github.com/cline/cline) | `npm i -g cline` | `cline --version` |
<!-- md-shared:agents:end -->

## Installation

### For use (no development)

<!-- md-shared:install:start -->
If you just want to *run* DSAgt against your own data and agent — no repo checkout, no `uv` — install it straight from GitHub into a virtual environment. Any Python 3.12/3.13 environment works (`venv`, conda, etc.); only the `pip install git+…` step is officially supported.

```bash
python3.12 -m venv ~/.venvs/dsagt # or: conda create -n dsagt python=3.12 && conda activate dsagt
source ~/.venvs/dsagt/bin/activate # (Windows venv: ~\.venvs\dsagt\Scripts\activate)
pip install "git+https://github.com/AI-ModCon/dsagt.git"
dsagt --version # 0.2.0
```

This puts the `dsagt` CLI (and the `dsagt-run` / `dsagt-*-server` helpers) on your PATH. Then build the shared knowledge base once and create your first project:

```bash
dsagt setup-kb # bundled tools + skills + reference corpora
# (downloads a ~130 MB local embedder on first run)
dsagt init my-project --agent claude # or: goose / codex / opencode / roo / cline
dsagt start my-project
```

To upgrade later, reinstall and re-run `setup-kb` to pick up new bundled tools/skills:

```bash
pip install --upgrade "git+https://github.com/AI-ModCon/dsagt.git"
dsagt setup-kb
```

> Pin to a specific release once tags are published, e.g. `pip install "git+https://github.com/AI-ModCon/dsagt.git@v0.2.0"`.
<!-- md-shared:install:end -->

### For development

Clone the repo and use `uv` (editable install with the full test suite) — see [Quick Start](#quick-start) below.

## Quick Start

2 changes: 1 addition & 1 deletion docs/cli.md
@@ -1,6 +1,6 @@
# CLI Reference

All commands are available after running `uv sync` and activating the virtual environment (`source .venv/bin/activate`).
All commands are available after [installation](index.md#installation) and activating your virtual environment.

## Project Management

31 changes: 21 additions & 10 deletions docs/index.md
@@ -6,23 +6,34 @@ DSAgt connects an MCP-compatible AI coding agent to tool registration, a semanti

## Supported Agents

| Agent | Install | Verify |
|-------|---------|--------|
| [Claude Code](https://github.com/anthropics/claude-code) | `npm i -g @anthropic-ai/claude-code` | `claude --version` |
| [Goose](https://github.com/block/goose) | See [Goose docs](https://github.com/block/goose#installation) | `goose --version` |
| [Codex](https://github.com/openai/codex) | `npm i -g @openai/codex` | `codex --version` |
| [opencode](https://github.com/sst/opencode) | See [opencode docs](https://opencode.ai/docs/) | `opencode --version` |
| [Roo Code](https://github.com/RooCodeInc/Roo-Code) | `npm i -g @roo-code/cli` | `roo --version` |
| [Cline](https://github.com/cline/cline) | `npm i -g cline` | `cline --version` |
<!-- Shared with README.md — edit there, not here. -->
{%
include-markdown "../README.md"
start="<!-- md-shared:agents:start -->"
end="<!-- md-shared:agents:end -->"
%}

## Prerequisites

- Python 3.12–3.13
- [uv](https://github.com/astral-sh/uv)
- Python 3.12 or 3.13
- One of the supported agent platforms above, installed and authenticated against your LLM provider
- [uv](https://github.com/astral-sh/uv) — only for the development install

## Installation

### For use (no development)

<!-- Shared with README.md — edit there, not here. -->
{%
include-markdown "../README.md"
start="<!-- md-shared:install:start -->"
end="<!-- md-shared:install:end -->"
%}
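
As a rough mental model of what the two `include-markdown` blocks do with their `start` and `end` arguments, the sketch below pulls the text between a pair of marker comments out of a string. It is an illustration only: `extract_between` is a hypothetical helper, the sample README text is made up, and nothing here is taken from the plugin's actual implementation.

```python
def extract_between(text: str, start: str, end: str) -> str:
    """Return the text strictly between the first `start` and the next `end`."""
    begin = text.index(start) + len(start)  # raises ValueError if the marker is missing
    finish = text.index(end, begin)
    return text[begin:finish]


# Made-up stand-in for README.md with one shared region.
readme = (
    "intro\n"
    "<!-- md-shared:agents:start -->\n"
    "| Agent | Install | Verify |\n"
    "<!-- md-shared:agents:end -->\n"
    "outro\n"
)

table = extract_between(
    readme,
    "<!-- md-shared:agents:start -->",
    "<!-- md-shared:agents:end -->",
)
print(table.strip())
```

Because the markers are plain HTML comments, they stay invisible when the README is rendered on its own, which is presumably why this convention was chosen.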

### For development

Clone the repo and use `uv` (editable install; add `--all-groups` for the test suite):

```bash
git clone https://github.com/AI-ModCon/dsagt.git
cd dsagt
4 changes: 4 additions & 0 deletions mkdocs.yml
@@ -31,6 +31,10 @@ theme:
- content.code.copy
- content.code.annotate

plugins:
- search
- include-markdown

markdown_extensions:
- admonition
- pymdownx.details
6 changes: 5 additions & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dsagt"
version = "0.1.0"
dynamic = ["version"]
description = "DataSmith Agent - AI-assisted data pipeline builder"
readme = "README.md"
requires-python = ">=3.12,<3.14"
@@ -72,6 +72,7 @@ dev = [
]
docs = [
"mkdocs-material>=9.5",
"mkdocs-include-markdown-plugin>=6.0",
]

[build-system]
@@ -81,6 +82,9 @@ build-backend = "setuptools.build_meta"
[tool.setuptools]
package-dir = {"" = "src"}

[tool.setuptools.dynamic]
version = {attr = "dsagt.__version__"}

[tool.setuptools.packages.find]
where = ["src"]
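
With `version = {attr = "dsagt.__version__"}` in `[tool.setuptools.dynamic]`, the version string is defined only in `src/dsagt/__init__.py`. The sketch below shows one way such an attribute can be read from source text without importing the package. `read_version` is a hypothetical helper written for illustration, and the sample source string is made up; this is not a description of how setuptools resolves `attr` internally.

```python
import ast


def read_version(source: str, name: str = "__version__") -> str:
    """Find a top-level `name = <literal>` assignment and return the literal."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    return ast.literal_eval(node.value)
    raise LookupError(f"{name} not found")


# Made-up stand-in for src/dsagt/__init__.py.
init_py = '"""DSAgt package."""\n\n__version__ = "0.2.0"\n'
print(read_version(init_py))
```

A static read like this only works while `__version__` is assigned a plain literal, so computing it at import time would defeat the approach.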

4 changes: 3 additions & 1 deletion src/dsagt/__init__.py
@@ -4,7 +4,9 @@
AI-assisted data pipeline builder for MCP-compatible agents.
"""

__version__ = "0.1.0"
# Single source of truth for the package version: pyproject.toml reads this
# via `[tool.setuptools.dynamic] version = {attr = "dsagt.__version__"}`.
__version__ = "0.2.0"

# Cap CPU thread count for embedding / tokenization libraries before any
# heavy imports happen. Without this, PyTorch / sentence-transformers /
81 changes: 81 additions & 0 deletions src/dsagt/agents/base.py
@@ -13,6 +13,7 @@
import json
import logging
import shlex
import shutil
import subprocess
from abc import ABC, abstractmethod
from pathlib import Path
@@ -41,6 +42,7 @@
"get_registry",
"http_request",
"install_dependencies",
"install_skill",
"read_file",
"reconstruct_pipeline",
"run_command",
@@ -50,6 +52,7 @@
"search_skills",
],
"knowledge": [
"add_skill_source",
"kb_add_vector_db",
"kb_append",
"kb_dismiss_suggestion",
@@ -60,6 +63,7 @@
"kb_list_collections",
"kb_remember",
"kb_search",
"list_skill_sources",
],
}

@@ -212,6 +216,83 @@ def _append_or_write(path: Path, content: str, marker: str) -> str | None:
return f"Wrote {path}"


#: Claude Code caps a skill's frontmatter description (combined with
#: when_to_use) at this many characters; longer ones are rejected. We
#: truncate the *mirrored* copy only, never the project source.
_NATIVE_DESCRIPTION_CAP = 1536

#: Manifest filename inside a native skills dir listing the skill names
#: dsagt placed there, so the mirror can reap its own stale entries on
#: re-run without ever touching user-authored skills.
_SKILL_MANIFEST = ".dsagt-managed.json"


def _truncate_native_description(skill_md: Path) -> None:
    """If the mirrored SKILL.md's description exceeds the native cap, trim it."""
    import yaml

    text = skill_md.read_text()
    if not text.startswith("---"):
        return
    parts = text.split("---", 2)
    if len(parts) < 3:
        return
    try:
        front = yaml.safe_load(parts[1]) or {}
    except yaml.YAMLError:
        return
    desc = front.get("description")
    if isinstance(desc, str) and len(desc) > _NATIVE_DESCRIPTION_CAP:
        front["description"] = desc[: _NATIVE_DESCRIPTION_CAP - 1].rstrip() + "…"
        new_front = yaml.dump(front, default_flow_style=False, sort_keys=False)
        skill_md.write_text(f"---\n{new_front}---{parts[2]}")
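
The frontmatter handling in `_truncate_native_description` rests on `str.split("---", 2)`. The stdlib-only sketch below shows what that split produces; the sample documents are made up for illustration.

```python
# A made-up SKILL.md whose body also happens to contain "---".
doc = "---\nname: demo\ndescription: short\n---\n# Body\n\ntext --- with dashes\n"

parts = doc.split("---", 2)
# parts[0] is the empty string before the opening delimiter,
# parts[1] is the raw YAML frontmatter, parts[2] is everything after it.
front, body = parts[1], parts[2]

# maxsplit=2 leaves later "---" runs in the body untouched, so the
# document can be reassembled exactly.
rebuilt = f"---{front}---{body}"

# Edge case: a "---" inside a frontmatter value ends the frontmatter early.
tricky = '---\ndescription: "a --- b"\n---\nbody\n'
tricky_front = tricky.split("---", 2)[1]
print(repr(tricky_front))
```

In the edge case the extracted frontmatter is cut off mid-value. In the function above that would most likely surface as a `yaml.YAMLError`, in which case the mirrored file is left untrimmed rather than corrupted.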


def _mirror_skills_to(target_dir: Path, skill_dirs: list[Path]) -> list[str]:
    """Idempotently mirror *skill_dirs* into *target_dir* (e.g. .claude/skills).

    Copies each skill directory (SKILL.md + scripts/ + references/) under
    ``target_dir/<dir-name>/``. A manifest tracks the names dsagt owns so a
    later run reaps skills that were removed upstream **without ever
    touching user-authored skills** that dsagt didn't place. ``skill_dirs``
    should list bundled dirs before project dirs so a project skill wins a
    name collision (copied last).
    """
    actions: list[str] = []
    manifest_path = target_dir / _SKILL_MANIFEST
    previously: list[str] = []
    if manifest_path.exists():
        try:
            previously = json.loads(manifest_path.read_text())
        except (json.JSONDecodeError, OSError):
            previously = []

    target_dir.mkdir(parents=True, exist_ok=True)
    managed: list[str] = []
    for src in skill_dirs:
        if not (src / "SKILL.md").exists():
            continue
        name = src.name
        dest = target_dir / name
        if dest.exists():
            shutil.rmtree(dest)
        shutil.copytree(src, dest)
        _truncate_native_description(dest / "SKILL.md")
        if name not in managed:
            managed.append(name)

    # Reap skills dsagt placed previously that are gone from the source set.
    for stale in set(previously) - set(managed):
        stale_dir = target_dir / stale
        if stale_dir.is_dir():
            shutil.rmtree(stale_dir, ignore_errors=True)

    manifest_path.write_text(json.dumps(sorted(managed), indent=2) + "\n")
    if managed:
        actions.append(f"Mirrored {len(managed)} skill(s) into {target_dir}")
    return actions
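
The manifest-based reaping in `_mirror_skills_to` can be exercised in isolation. The sketch below is a simplified, stdlib-only re-creation of the idea, not the function itself: `mirror`, `make_skill`, and the `.managed.json` filename are hypothetical names, and the description-truncation step is left out.

```python
import json
import shutil
import tempfile
from pathlib import Path

MANIFEST = ".managed.json"  # hypothetical manifest name for this sketch


def mirror(target: Path, sources: list[Path]) -> list[str]:
    """Copy each source dir into target; reap only names listed in the old manifest."""
    manifest = target / MANIFEST
    previous = json.loads(manifest.read_text()) if manifest.exists() else []
    target.mkdir(parents=True, exist_ok=True)
    managed: list[str] = []
    for src in sources:
        dest = target / src.name
        if dest.exists():
            shutil.rmtree(dest)
        shutil.copytree(src, dest)
        if src.name not in managed:
            managed.append(src.name)
    # Only names we placed on an earlier run are candidates for removal.
    for stale in set(previous) - set(managed):
        shutil.rmtree(target / stale, ignore_errors=True)
    manifest.write_text(json.dumps(sorted(managed)))
    return sorted(managed)


def make_skill(root: Path, name: str) -> Path:
    """Create a minimal skill directory containing only SKILL.md."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text(f"# {name}\n")
    return skill_dir


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    alpha = make_skill(tmp_path / "src", "alpha")
    beta = make_skill(tmp_path / "src", "beta")
    native = tmp_path / "native"
    make_skill(native, "user-skill")  # user-authored, never in the manifest

    mirror(native, [alpha, beta])  # first run places alpha and beta
    mirror(native, [alpha])        # beta has been removed upstream
    remaining = sorted(p.name for p in native.iterdir() if p.is_dir())
    print(remaining)
```

One thing the sketch shares with the code under review: an existing destination directory is replaced whether or not the manifest lists it. A user-authored skill whose directory name matches a mirrored one would therefore be overwritten, so the "never touching user-authored skills" guarantee appears to hold only for names that do not collide.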


def _build_mcp_servers_dict(env_block: dict | None) -> dict:
"""Build the standard ``{"mcpServers": {...}}`` dict for dsagt servers.

13 changes: 13 additions & 0 deletions src/dsagt/agents/claude.py
@@ -52,6 +52,7 @@
_load_master_instructions,
_mcp_env_block,
_mcp_server_args,
_mirror_skills_to,
_run_simple_script,
)

@@ -168,6 +169,18 @@ def write_dynamic(
mcp_path.write_text(json.dumps(mcp_config, indent=2) + "\n")
actions.append(f"Wrote {mcp_path}")

# Mirror installed (project) + bundled skills into Claude Code's
# native skill dir so it discovers/auto-invokes them without an MCP
# round-trip. Bundled first, project last → project wins collisions.
# A newly-created .claude/skills/ is only picked up on Claude restart,
# which is fine: this runs at init/start, before the agent launches.
if (config.get("skills") or {}).get("populate_native", True):
from dsagt.registry import SkillRegistry

reg = SkillRegistry(runtime_dir=working_dir, kb=None)
src_dirs = reg._bundled_skill_dirs() + reg._project_skill_dirs()
actions += _mirror_skills_to(working_dir / ".claude" / "skills", src_dirs)
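
The `(config.get("skills") or {}).get("populate_native", True)` lookup makes mirroring opt-out rather than opt-in. The sketch below spells out how that expression behaves for a few config shapes. `populate_native_enabled` is a hypothetical wrapper added for illustration, and the `{"skills": None}` case assumes the config comes from YAML, where an empty `skills:` key loads as `None`.

```python
def populate_native_enabled(config: dict) -> bool:
    """Same lookup as in write_dynamic: True unless explicitly disabled."""
    return (config.get("skills") or {}).get("populate_native", True)


cases = [
    {},                                      # no "skills" section at all
    {"skills": None},                        # empty "skills:" key (assumed YAML load)
    {"skills": {}},                          # section present, flag absent
    {"skills": {"populate_native": False}},  # explicit opt-out
]
results = [populate_native_enabled(cfg) for cfg in cases]
print(results)
```

The `or {}` guard is what keeps the `None` case from raising `AttributeError`; a plain `config.get("skills", {})` would return `None` there.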

# Configure mlflow autolog claude — writes .claude/settings.json
# with the MLflow Stop hook + tracking env vars. Idempotent and
# preserves any existing keys in settings.json (mlflow's setup