From 18ca03d859540568dfea1b4de81478ebd5b64475 Mon Sep 17 00:00:00 2001 From: "Eric W. Tramel" Date: Wed, 6 May 2026 09:50:31 -0400 Subject: [PATCH 1/2] Update plugin creation agent guidance --- .claude/commands/create-plugin.md | 385 +++++++++--------- .../data-designer-plugin-authoring/SKILL.md | 160 ++++++++ 2 files changed, 347 insertions(+), 198 deletions(-) create mode 100644 .codex/skills/data-designer-plugin-authoring/SKILL.md diff --git a/.claude/commands/create-plugin.md b/.claude/commands/create-plugin.md index 195acef..b34eb05 100644 --- a/.claude/commands/create-plugin.md +++ b/.claude/commands/create-plugin.md @@ -1,322 +1,311 @@ --- -description: Create a new DataDesigner plugin with correct structure, conventions, and passing CI -argument-hint: Plugin name and description (e.g., "word-count - counts words in text columns") +description: Create a new Data Designer plugin with correct structure, docs, metadata, and local validation +argument-hint: Plugin slug and description (e.g., "word-count - counts words in text columns") --- -# DataDesigner Plugin Authoring +# Data Designer Plugin Authoring -You are creating a new DataDesigner plugin in the `data-designer-plugins` monorepo. Follow this guide precisely. It encodes lessons learned from prior plugin authoring attempts and addresses common pitfalls. +You are creating a Data Designer plugin in the `DataDesignerPlugins` monorepo. Use the repository tooling as the source of truth, keep the plugin self-contained, and prepare the per-plugin documentation so the Zensical site generation stays clean. **Plugin request:** $ARGUMENTS --- -## Phase 1: Understand the Codebase (DO NOT SKIP) +## Phase 1: Read Current Repo Context -Before writing any code, you must build context. Use dedicated tools (Read, Glob) rather than Bash for file exploration. +Before editing, read the current repository guidance and reference implementation. Use Claude Read/Glob tools for file exploration when possible. 
-**Required reads** (in parallel): +Required reads: -1. `CLAUDE.md` -- repo conventions (already in system context, but re-read for specifics) -2. `plugins/data-designer-template/src/data_designer_template/config.py` -- reference config -3. `plugins/data-designer-template/src/data_designer_template/impl.py` -- reference implementation -4. `plugins/data-designer-template/src/data_designer_template/plugin.py` -- reference wiring -5. `plugins/data-designer-template/tests/test_plugin.py` -- reference tests -6. `plugins/data-designer-template/pyproject.toml` -- reference packaging -7. `docs/adding-a-plugin.md` -- full authoring guide (agents often skip this -- don't) +1. `AGENTS.md` - repo conventions, workflow, PR expectations, and release guardrails. +2. `README.md` - current quick start, Makefile targets, and `ddp` CLI overview. +3. `docs/authoring.md` - plugin authoring guide. +4. `docs/workflow.md` - local checks, generated docs, and CI expectations. +5. `Makefile` - canonical target names. +6. `zensical.toml` - site configuration and generated plugin docs navigation block. +7. `devtools/ddp/src/ddp/scaffold.py` - current scaffold output. +8. `devtools/ddp/src/ddp/plugin_docs.py` - how per-plugin docs become Zensical pages. +9. `plugins/data-designer-template/` - reference package, especially `config.py`, `impl.py`, `plugin.py`, `tests/test_plugin.py`, `pyproject.toml`, and `docs/`. -**Required introspection** (after `make sync`): +After `make sync`, inspect the Data Designer interfaces you plan to implement instead of guessing signatures: ```bash uv run python -c "import inspect; from data_designer.config.base import SingleColumnConfig; print(inspect.getsource(SingleColumnConfig))" uv run python -c "import inspect; from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn; print(inspect.getsource(ColumnGeneratorFullColumn))" ``` -This tells you the exact interface you must implement. Do not guess at method signatures. 
-
---

## Phase 2: Scaffold

-Always use the canonical scaffold tool. Never hand-create the plugin directory structure.
+Always use the canonical scaffold. Do not hand-create a plugin package layout.

```bash
make sync
uv run ddp new <slug>
```

-After scaffolding, read all generated files to see what the scaffold provides and what you need to modify.
+Use the kebab-case slug without the `data-designer-` prefix. The scaffold creates:
+
+```text
+plugins/data-designer-<slug>/
+|-- pyproject.toml
+|-- README.md
+|-- CODEOWNERS
+|-- docs/
+|   `-- index.md
+|-- tests/
+|   `-- test_plugin.py
+`-- src/
+    `-- data_designer_<slug>/
+        |-- __init__.py
+        |-- config.py
+        |-- impl.py
+        `-- plugin.py
+```
+
+Read all generated files after scaffolding. If the slug contains words such as `column` and the scaffold generates stuttering class names, rename the classes and update `plugin.py`.

---

## Phase 3: Implement

-### 3a. Config (`config.py`)
+### Config

-Subclass `SingleColumnConfig`. Required elements:
+Subclass `SingleColumnConfig`. Use Python 3.10+ annotations and a literal column type default:

```python
from typing import Literal
+
from data_designer.config.base import SingleColumnConfig

+
class MyPluginColumnConfig(SingleColumnConfig):
-    column_type: Literal["my-plugin"] = "my-plugin"  # Must be Literal with default
+    """Configuration for the my-plugin column generator."""

-    # Your config fields here (use modern 3.10+ annotations: list[str], X | None)
+    column_type: Literal["my-plugin"] = "my-plugin"

     @staticmethod
     def get_column_emoji() -> str:
-        return "..."  # Single emoji
+        return "..."

     @property
     def required_columns(self) -> list[str]:
-        return [...]  # Columns that must exist before this one runs
+        return ["source_column"]

     @property
     def side_effect_columns(self) -> list[str]:
-        return []  # Additional columns this generator creates (usually empty)
+        return []
```

-**Common mistake -- class naming**: If your plugin slug already contains "column" (e.g., "hash-column"), the scaffold generates `HashColumnColumnConfig` with a stutter. Rename to `HashColumnConfig` / `HashColumnGenerator` and update `plugin.py` qualified names accordingly.
+Add Pydantic validators for structural constraints such as non-empty lists, parsable patterns, allowed modes, paths, or field combinations. Catch invalid config at construction time, not inside `generate()`.

-**Validate early with `field_validator`**: If your config has fields with structural constraints (e.g., a regex that must contain capture groups, a list that must be non-empty, a string that must parse as a certain format), add a Pydantic `@field_validator` so errors are caught at config construction time, not at `generate()` time. Deferring validation to `generate()` means users only discover bad config after they've wired up an entire pipeline.
+### Implementation

-```python
-from pydantic import field_validator
+Use the Data Designer generator base that matches the behavior, usually `ColumnGeneratorFullColumn[YourConfig]` for whole-column transformations.

-class MyPluginColumnConfig(SingleColumnConfig):
-    pattern: str
-
-    @field_validator("pattern")
-    @classmethod
-    def pattern_must_be_valid(cls, value: str) -> str:
-        """Validate the pattern at config construction time."""
-        compiled = re.compile(value)
-        if compiled.groups < 1:
-            raise ValueError(f"Pattern must contain at least one capture group, got: {value!r}")
-        return value
-```

-Do not duplicate this validation logic in `impl.py`. If the config validates on construction, `generate()` can trust the field is valid.
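
The construction-time validation rule can be sketched as follows. This is a standalone Pydantic v2 example with hypothetical field names, not the repo's actual config class; a real plugin would subclass `SingleColumnConfig` instead of `BaseModel`:

```python
# Minimal construction-time validation sketch (assumes Pydantic v2).
# Class and field names are illustrative only.
from pydantic import BaseModel, field_validator


class WordCountConfig(BaseModel):
    source_columns: list[str]

    @field_validator("source_columns")
    @classmethod
    def _require_non_empty(cls, value: list[str]) -> list[str]:
        """Reject an empty column list when the config is constructed."""
        if not value:
            raise ValueError("source_columns must not be empty")
        return value
```

With this in place, `WordCountConfig(source_columns=[])` fails immediately at construction, rather than surfacing later inside `generate()` after a whole pipeline has been wired up.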
+- Keep plugin logic in top-level composable functions or small modules; keep `impl.py` mostly orchestration. +- Do not use relative imports. Import from the package name, for example `from data_designer_my_plugin.config import MyPluginColumnConfig`. +- Do not define private helper closures or functions inside functions. +- Prefer vectorized pandas operations, named helpers, `functools.partial`, or module-level dispatch tables over lambda-heavy `apply()` code. +- Use `from __future__ import annotations` and guard pandas imports with `TYPE_CHECKING` when pandas is only needed for type hints. +- Add Google-style docstrings to public classes, functions, and methods. +- Keep dependencies in the plugin's own `pyproject.toml`. Do not depend on another local plugin package. -### 3b. Implementation (`impl.py`) +If you add a new import package, add it to the root Ruff isort list: -Subclass `ColumnGeneratorFullColumn[YourConfig]` (batch) or `ColumnGeneratorCellByCell[YourConfig]` (row-by-row). +```toml +[tool.ruff.lint.isort] +known-first-party = ["ddp", "data_designer_template", "data_designer_my_plugin"] +``` -**Critical rules:** +### Plugin Wiring -1. **NO lambda closures.** CLAUDE.md bans closures and function-in-function definitions. This is the most common violation. +The scaffold usually gets `plugin.py` right. 
Only update it when class or module names change: - BAD (every prior agent did this): - ```python - data[self.config.name] = data[col].apply(lambda x: my_func(x, param)) - ``` +```python +from data_designer.plugins.plugin import Plugin, PluginType - GOOD -- use `functools.partial`: - ```python - from functools import partial - data[self.config.name] = data[col].apply(partial(my_func, param=param)) - ``` +plugin = Plugin( + config_qualified_name="data_designer_my_plugin.config.MyPluginColumnConfig", + impl_qualified_name="data_designer_my_plugin.impl.MyPluginColumnGenerator", + plugin_type=PluginType.COLUMN_GENERATOR, +) +``` - GOOD -- use vectorized pandas operations when possible: - ```python - data[self.config.name] = data[col].str.upper() - ``` +### CODEOWNERS - GOOD -- use a module-level dispatch dict: - ```python - _MODE_FUNCTIONS: dict[str, Callable[[str], int]] = { - "words": count_words, - "characters": count_characters, - } - # In generate(): - data[self.config.name] = data[col].apply(_MODE_FUNCTIONS[self.config.mode]) - ``` +The scaffold discovers an owner from git config. Check the per-plugin `CODEOWNERS` and prefer the repo convention from the template: -2. **Extract logic into top-level composable functions**, not methods on the generator class. This follows the CLAUDE.md rule: "Favor reusable, composable functions that can be combined in higher-level functions." +```text +* @NVIDIA-NeMo/data_designer_reviewers +``` - **But avoid leaky abstractions.** If a helper function accepts a compiled object (e.g., `re.Pattern`) but then extracts the raw string from it to pass to another API (e.g., `series.str.extract(pattern.pattern)`), the abstraction is misleading. Either accept the raw form the downstream API needs, or use the compiled object directly. +Run `make codeowners` after ownership changes so `.github/CODEOWNERS` is regenerated. -3. 
**Use `TYPE_CHECKING` guard for pandas**: - ```python - from __future__ import annotations - from typing import TYPE_CHECKING - if TYPE_CHECKING: - import pandas as pd - ``` +--- -4. **Full Google-style docstrings** on all public functions, methods, and classes. +## Phase 4: Test Public Behavior -5. **No relative imports.** Use `from data_designer_my_plugin.config import MyPluginColumnConfig`. +Write tests around public interfaces and expected Data Designer behavior: -### 3c. Plugin wiring (`plugin.py`) +```python +from data_designer.engine.testing.utils import assert_valid_plugin -The scaffold generates this correctly. Only update if you renamed classes: +from data_designer_my_plugin.plugin import plugin -```python -plugin = Plugin( - config_qualified_name="data_designer_my_plugin.config.MyPluginColumnConfig", - impl_qualified_name="data_designer_my_plugin.impl.MyPluginColumnGenerator", - plugin_type=PluginType.COLUMN_GENERATOR, -) + +def test_valid_plugin() -> None: + assert_valid_plugin(plugin) ``` -### 3d. Extra modules +Cover the relevant tiers: -If your plugin has substantial pure logic (scoring, parsing, transformation), extract it into a separate module (e.g., `scoring.py`). Keep `impl.py` thin -- it should wire config to logic, not contain the logic itself. +- Config properties, defaults, and validation errors. +- Pure helper functions or parsing/scoring modules. +- Generator behavior against representative DataFrames. +- Data Designer preview integration when the plugin changes user-visible pipeline behavior. +- Edge cases with `None`, `NaN`, empty strings, numeric values in text columns, or malformed config. -### 3e. Root `pyproject.toml` +If using pytest's `tmp_path`, annotate it as `pathlib.Path`, not `pd.DataFrame`. 
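
The pure-helper tier pairs naturally with the dispatch-table implementation style: module-level helpers plus the table that maps config modes to them, each testable without constructing a generator. A sketch with hypothetical helper and mode names:

```python
# Module-level helpers and the dispatch table a generator would consult.
# All names here are illustrative, not from the template plugin.
from collections.abc import Callable


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_characters(text: str) -> int:
    """Count characters after trimming surrounding whitespace."""
    return len(text.strip())


MODE_FUNCTIONS: dict[str, Callable[[str], int]] = {
    "words": count_words,
    "characters": count_characters,
}


def test_count_words_handles_repeated_spaces() -> None:
    assert count_words("a b  c") == 3


def test_dispatch_table_modes() -> None:
    assert MODE_FUNCTIONS["characters"](" abc ") == 3
```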
-Add your module to the isort known-first-party list:
+Run the isolated plugin test loop while developing:

-```toml
-[tool.ruff.lint.isort]
-known-first-party = [..., "data_designer_my_plugin"]
+```bash
+make test-plugin PLUGIN=data-designer-my-plugin
```

-### 3f. CODEOWNERS
-
-The scaffold generates this from `git config user.email`. Check that it uses `@username` or `@org/team` format (e.g., `* @NVIDIA-NeMo/data_designer_reviewers`), not email format. If it used email, fix it to match the convention in the template's CODEOWNERS.
+The target uses `uv venv --clear`, so stale `.venv-data-designer-my-plugin` directories should not need manual cleanup.

---

-## Phase 4: Test
+## Phase 5: Prepare Per-Plugin Zensical Docs

-### Test structure
+Each plugin owns its source docs under `plugins/data-designer-<slug>/docs/`. The top-level Zensical site is generated from those files and package metadata.

-Write four tiers of tests, matching the template's patterns:
+Required source docs:

-```python
-# Tier 1: Plugin contract validation
-def test_valid_plugin() -> None:
-    assert_valid_plugin(plugin)
+```text
+plugins/data-designer-<slug>/docs/
+`-- index.md
+```
+
+Recommended docs for user-facing plugins:

-# Tier 2: Config unit tests
-class TestMyPluginColumnConfig:
-    def test_required_columns(self) -> None: ...
-    def test_side_effect_columns(self) -> None: ...
-    def test_column_emoji(self) -> None: ...
-    def test_defaults(self) -> None: ...
-
-# Tier 3: Generator unit tests (using __new__ bypass pattern)
-def _make_generator(config: MyPluginColumnConfig) -> MyPluginColumnGenerator:
-    generator = MyPluginColumnGenerator.__new__(MyPluginColumnGenerator)
-    generator._config = config
-    return generator
-
-class TestMyPluginColumnGenerator:
-    @pytest.fixture()
-    def source_df(self) -> pd.DataFrame:
-        return pd.DataFrame({...})
-
-    def test_basic_generation(self, source_df: pd.DataFrame) -> None:
-        generator = _make_generator(MyPluginColumnConfig(name="out", ...))
-        result = generator.generate(source_df)
-        assert "out" in result.columns
-        ...
-
-# Tier 4: Integration tests using DataDesigner.preview()
-class TestMyPluginPreviewIntegration:
-    def test_preview_basic(self, tmp_path: Path) -> None:
-        seed_df = pd.DataFrame({...})
-        builder = DataDesignerConfigBuilder()
-        builder.with_seed_dataset(DataFrameSeedSource(df=seed_df))
-        builder.add_column(name="out", column_type="my-plugin", ...)
-        result = DataDesigner(artifact_path=tmp_path / "artifacts").preview(builder, num_records=3)
-        assert result.dataset is not None
-        assert "out" in result.dataset.columns
```
+```text
+plugins/data-designer-<slug>/docs/
+|-- index.md
+`-- usage.md
+```

-### Test pitfalls to avoid
+Write `docs/index.md` as the plugin overview:

-1. **`tmp_path` type annotation**: The pytest `tmp_path` fixture is `pathlib.Path`, NOT `pd.DataFrame`. The template has this bug -- do not copy it.
-   ```python
-   from pathlib import Path
-   def test_preview(self, tmp_path: Path) -> None:  # CORRECT
-   # NOT: def test_preview(self, tmp_path: pd.DataFrame) -> None:  # WRONG
-   ```
+- H1: `# data-designer-<slug>`.
+- One short paragraph explaining what the plugin adds.
+- Installation command using `uv add data-designer data-designer-<slug>`.
+- Column type section naming the discovered entry point, for example `` `<slug>` ``.
+- Configuration table with `Field`, `Required`, and `Description` columns.
+- A realistic Python or Data Designer config example.
+- Important behavior notes, limitations, or output columns only when useful.

-2. **Pandas dtype coercion**: When creating a `pd.Series` with mixed int/float, pandas upcasts ints to floats. `pd.Series({"a": 42, "b": 3.14})` gives `a=42.0`, not `a=42`. Write test expectations accordingly, or use uniform types.
+Write `docs/usage.md` when the plugin needs a fuller example:

-3. **Test composable functions independently**: If you extracted functions to module-level, write dedicated test classes for them (e.g., `TestComputeHash`, `TestTokenize`). This goes beyond the template but produces better coverage.
+- H1: `# Usage` or another concise title; non-index H1 text becomes the Zensical nav label.
+- A runnable or realistic example using `DataDesignerConfigBuilder`, a YAML-style config, or both.
+- Expected output shape or before/after behavior.
+- Error cases and config validation notes that users should know before running a job.

-4. **Test config validation edge cases**: If you added `@field_validator` on config fields, write tests that verify invalid inputs are rejected at construction time:
-   ```python
-   def test_rejects_invalid_pattern(self) -> None:
-       with pytest.raises(ValueError, match="at least one capture group"):
-           MyPluginColumnConfig(name="out", source_column="src", pattern=r"\d+")
+Zensical formatting rules for this repo:

-   def test_rejects_malformed_input(self) -> None:
-       with pytest.raises(Exception):  # re.error or ValidationError
-           MyPluginColumnConfig(name="out", source_column="src", pattern=r"(unclosed")
-   ```

+- Keep links and assets relative to the plugin's own `docs/` directory; generated pages are copied to `docs/plugins/data-designer-<slug>/`.
+- Store plugin doc assets under the plugin docs tree, for example `plugins/data-designer-<slug>/docs/assets/example.png`.
+- Use fenced code blocks with language tags such as `python`, `yaml`, `toml`, or `bash`.
+- Use Markdown tables for config references.
+- Keep headings hierarchical and avoid skipping from H1 to H3. +- Do not edit `docs/plugins/` directly. It is generated. +- Do not edit the generated plugin nav block in `zensical.toml` directly. +- Remember that package metadata feeds the generated plugin index card: keep `pyproject.toml` `description` concise and user-facing, and verify the `data_designer.plugins` entry point key is the column type users configure. -5. **Test edge cases with None and non-string source values**: DataFrames in the wild often have `None`, `NaN`, or numeric values in text columns. Write at least one test that exercises your generator on a DataFrame with `None` values in the source column to verify graceful handling. +Regenerate and validate site inputs after plugin docs or metadata change: -6. **Stale venv on test re-run**: `make test-plugin` creates `.venv-{plugin-name}` and fails if it already exists from a prior failed run. If tests fail and you need to re-run: - ```bash - rm -rf .venv-data-designer-my-plugin && make test-plugin PLUGIN=data-designer-my-plugin - ``` +```bash +make plugin-docs +make docs +``` --- -## Phase 5: Format and Lint First +## Phase 6: Regenerate Derived Files + +Use current target names. There is no `make catalog` target in this repo. -Import sort order (isort) is the most common lint failure. 
**Always run `make format` before `make lint`** to avoid wasting a cycle: +When plugin docs or package metadata change: ```bash -make sync -make format # Fix import order and formatting FIRST -make lint # Should pass after format +make plugin-docs ``` ---- +When plugin ownership changes: + +```bash +make codeowners +``` -## Phase 6: Test in Isolation +When Python files are added or changed: ```bash -make test-plugin PLUGIN=data-designer-my-plugin # Isolated venv test +make update-license-headers ``` -If tests fail and you need to re-run, **delete the stale venv first** (the Makefile does not auto-clean on failure): +`make check` verifies generated plugin docs, generated CODEOWNERS, and SPDX headers: ```bash -rm -rf .venv-data-designer-my-plugin && make test-plugin PLUGIN=data-designer-my-plugin +make check ``` --- -## Phase 7: Validate and Check +## Phase 7: Local Validation + +Prefer the repo's Makefile targets over ad hoc substitutes. + +Fast loop: + +```bash +make format +make lint +make test-plugin PLUGIN=data-designer-my-plugin +make validate +make check +make docs +``` + +Full local CI: ```bash -make validate # Entry point + assert_valid_plugin -make catalog && make codeowners && make update-license-headers # Regenerate derived files -make check # Verify derived files match -make lint # Final lint confirmation +make all ``` --- ## Anti-Pattern Checklist -Before declaring done, verify you have NOT done any of these: - -- [ ] Lambda closures in `generate()` or anywhere else (use `functools.partial` or dispatch dicts) -- [ ] Relative imports (`from .config import ...`) -- [ ] `tmp_path: pd.DataFrame` in test signatures (should be `from pathlib import Path` then `tmp_path: Path`) -- [ ] Missing SPDX headers on any `.py` file -- [ ] Email format in CODEOWNERS instead of `@username` (read template's CODEOWNERS to match) -- [ ] Missing docstrings on public functions/classes -- [ ] Private helper closures or nested function definitions -- [ ] `typing.List[str]` 
instead of `list[str]` (3.10+ style required)
- [ ] Missing `from __future__ import annotations` when using `TYPE_CHECKING`
- [ ] Skipped reading `docs/adding-a-plugin.md`
- [ ] Used `find` or `ls` via Bash instead of Glob/Read tools
- [ ] Forgot to add module to `known-first-party` in root `pyproject.toml`
- [ ] Forgot to run `make catalog && make codeowners && make update-license-headers`
- [ ] Forgot to run `make format` BEFORE `make lint` (isort failures are the #1 lint issue)
- [ ] Forgot to delete stale `.venv-*` before re-running `make test-plugin` after a failure
- [ ] Config fields with structural constraints lack `@field_validator` (validate at construction, not at `generate()` time)
- [ ] Helper function accepts a compiled object but then extracts the raw form to pass to a downstream API (leaky abstraction)
- [ ] No tests for `None`/`NaN` values in the source column
- [ ] No tests verifying that invalid config field values are rejected at construction time
+Before opening the PR, verify you have not done any of these:
+
+- Skipped `uv run ddp new <slug>` and hand-created the plugin.
+- Used `docs/adding-a-plugin.md`; the current guide is `docs/authoring.md`.
+- Used `make catalog`; the current generated docs target is `make plugin-docs`.
+- Edited generated files under `docs/plugins/` manually.
+- Edited the generated plugin nav block in `zensical.toml` manually.
+- Forgot to run `make plugin-docs` after plugin docs or package metadata changes.
+- Forgot to run `make codeowners` after per-plugin ownership changes.
+- Left `pyproject.toml` with a generic scaffold description.
+- Left `docs/index.md` as generic scaffold text for a user-facing plugin.
+- Used relative imports.
+- Added local plugin-to-plugin dependencies.
+- Used `typing.List`, `typing.Optional`, or `typing.Union` instead of Python 3.10+ annotations.
+- Added nested helper functions or private helper closures.
+- Deferred structural config validation to `generate()`.
+- Missed tests for invalid config or null-like source values.

diff --git a/.codex/skills/data-designer-plugin-authoring/SKILL.md b/.codex/skills/data-designer-plugin-authoring/SKILL.md
new file mode 100644
index 0000000..0e97c57
--- /dev/null
+++ b/.codex/skills/data-designer-plugin-authoring/SKILL.md
@@ -0,0 +1,160 @@
+---
+name: data-designer-plugin-authoring
+description: Use when creating, updating, documenting, or preparing a pull request for a Data Designer plugin in the NVIDIA-NeMo/DataDesignerPlugins repository, including ddp scaffolding, plugin implementation, validation, and per-plugin Zensical docs.
+metadata:
+  short-description: Create Data Designer plugins
+---
+
+# Data Designer Plugin Authoring
+
+Use this skill for plugin work in the `DataDesignerPlugins` repo. The repo is a `uv` workspace with shared tooling in `devtools/` and one independent package per plugin under `plugins/*`. The Python baseline is 3.10+.
+
+## Context To Load
+
+Before making plugin changes, read the local files that define the current contract:
+
+- `AGENTS.md`
+- `README.md`
+- `docs/authoring.md`
+- `docs/workflow.md`
+- `Makefile`
+- `zensical.toml`
+- `devtools/ddp/src/ddp/scaffold.py`
+- `devtools/ddp/src/ddp/plugin_docs.py`
+- The reference plugin under `plugins/data-designer-template/`
+
+After `make sync`, inspect Data Designer interfaces directly when signatures matter:
+
+```bash
+uv run python -c "import inspect; from data_designer.config.base import SingleColumnConfig; print(inspect.getsource(SingleColumnConfig))"
+uv run python -c "import inspect; from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn; print(inspect.getsource(ColumnGeneratorFullColumn))"
+```
+
+## Scaffold First
+
+Use the repo CLI instead of creating package files by hand:
+
+```bash
+make sync
+uv run ddp new <slug>
+```
+
+Use a kebab-case slug without the `data-designer-` prefix. The package will be created at `plugins/data-designer-<slug>/` with `pyproject.toml`, `README.md`, `CODEOWNERS`, `docs/index.md`, tests, and `src/data_designer_<slug>/`.
+
+Read the generated files before editing. If the generated class names stutter because the slug contains words such as `column`, rename the classes and update `plugin.py`.
+
+## Implementation Rules
+
+- Keep plugins self-contained. Do not add local dependencies on another plugin package.
+- Use absolute imports such as `from data_designer_my_plugin.config import MyPluginColumnConfig`.
+- Use Python 3.10+ annotations: `list[str]`, `A | B`, and `X | None`.
+- Subclass `SingleColumnConfig` and define `column_type` as a `Literal["slug"]` with the same default string.
+- Add Pydantic validators for structural constraints so bad config fails during config construction.
+- Use the appropriate generator base, usually `ColumnGeneratorFullColumn[YourConfig]`.
+- Keep logic in top-level reusable functions or modules. Do not use nested helper functions or private closures.
+- Prefer vectorized pandas operations, named helpers, `functools.partial`, or dispatch tables over lambda-heavy `apply()` code.
+- Use `from __future__ import annotations` and `TYPE_CHECKING` for pandas type-only imports.
+- Add Google-style docstrings to public classes, functions, and methods.
+- Add new import packages to the root Ruff isort `known-first-party` list.
+
+The scaffold normally creates correct plugin wiring:
+
+```python
+from data_designer.plugins.plugin import Plugin, PluginType
+
+plugin = Plugin(
+    config_qualified_name="data_designer_my_plugin.config.MyPluginColumnConfig",
+    impl_qualified_name="data_designer_my_plugin.impl.MyPluginColumnGenerator",
+    plugin_type=PluginType.COLUMN_GENERATOR,
+)
+```
+
+## Tests
+
+Write tests around public behavior:
+
+- `assert_valid_plugin(plugin)` contract validation.
+- Config defaults, dependency properties, and validation errors.
+- Pure helper function behavior.
+- Generator behavior on representative DataFrames.
+- Data Designer preview integration when useful.
+- Edge cases with `None`, `NaN`, empty strings, numeric values in text columns, and malformed config.
+
+Use `pathlib.Path` for pytest `tmp_path` annotations.
+
+Run plugin tests in isolation:
+
+```bash
+make test-plugin PLUGIN=data-designer-my-plugin
+```
+
+## Per-Plugin Zensical Docs
+
+Each plugin owns source docs under `plugins/data-designer-<slug>/docs/`. `make plugin-docs` copies those files into the generated `docs/plugins/data-designer-<slug>/` tree and updates the generated nav block in `zensical.toml`.
+
+Do not edit generated plugin docs or the generated nav block directly.
+
+Recommended source docs:
+
+```text
+plugins/data-designer-<slug>/docs/
+|-- index.md
+`-- usage.md
+```
+
+Prepare `docs/index.md` as the overview:
+
+- H1: `# data-designer-<slug>`.
+- Short description of what the plugin adds.
+- Installation command: `uv add data-designer data-designer-<slug>`.
+- Column type section naming the entry point users configure.
+- Configuration table with `Field`, `Required`, and `Description`.
+- Realistic Python or YAML usage example.
+- Important behavior notes, output columns, or limitations only when useful.
+
+Prepare `docs/usage.md` when the plugin needs a fuller example. Its H1 becomes the Zensical nav label, so keep it concise. Include expected output shape, before/after behavior, and validation or error cases users need to understand.
+
+Formatting rules:
+
+- Keep links and assets relative to the plugin docs directory.
+- Put doc assets under the plugin docs tree, such as `docs/assets/example.png`.
+- Use fenced code blocks with language tags.
+- Use Markdown tables for config references.
+- Keep heading levels hierarchical.
+- Make the package `pyproject.toml` description concise and user-facing because it feeds generated plugin cards.
+- Verify the `[project.entry-points."data_designer.plugins"]` key is the column type users configure.
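
To make that last check concrete, the registration in the plugin's `pyproject.toml` might look like the fragment below. The names are illustrative; confirm the exact module path and attribute against the template plugin before relying on them.

```toml
# Hypothetical entry-point registration: the key "my-plugin" is the
# column_type users configure; the value points at the plugin object
# defined in plugin.py.
[project.entry-points."data_designer.plugins"]
my-plugin = "data_designer_my_plugin.plugin:plugin"
```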
+Regenerate and validate docs after plugin docs or metadata changes:
+
+```bash
+make plugin-docs
+make docs
+```
+
+## Generated Files
+
+Use current target names:
+
+```bash
+make plugin-docs
+make codeowners
+make update-license-headers
+make check
+```
+
+There is no `make catalog` target in this repo.
+
+## Validation
+
+Prefer repo targets:
+
+```bash
+make format
+make lint
+make test-plugin PLUGIN=data-designer-my-plugin
+make validate
+make check
+make docs
+```
+
+Run `make all` before the PR when feasible. If a full target cannot be run, report exactly what was skipped and why.

From fde00e16d8d2a9cecb10e4acfa906c1ccaf0bc9b Mon Sep 17 00:00:00 2001
From: "Eric W. Tramel"
Date: Wed, 6 May 2026 10:03:03 -0400
Subject: [PATCH 2/2] Clarify deterministic plugin scaffolding

---
 .claude/commands/create-plugin.md             | 28 +++++--------------
 .../data-designer-plugin-authoring/SKILL.md   | 10 ++++---
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/.claude/commands/create-plugin.md b/.claude/commands/create-plugin.md
index b34eb05..d889844 100644
--- a/.claude/commands/create-plugin.md
+++ b/.claude/commands/create-plugin.md
@@ -36,35 +36,20 @@

 ---

-## Phase 2: Scaffold
+## Phase 2: Scaffold With `ddp`

-Always use the canonical scaffold. Do not hand-create a plugin package layout.
+The initial plugin structure is owned by the repo's `ddp` CLI. Always invoke the scaffold command; do not create the package directory, `pyproject.toml`, source package, tests, docs, or ownership files by hand.

 ```bash
 make sync
 uv run ddp new <slug>
 ```

-Use the kebab-case slug without the `data-designer-` prefix. The scaffold creates:
+Use the kebab-case slug without the `data-designer-` prefix. If you need to understand exactly what the command creates, read `devtools/ddp/src/ddp/scaffold.py` or inspect the generated files after running the command. The skill should not duplicate the scaffold algorithm; the software encodes that process deterministically.
+
+If the command fails because the scaffold is wrong or incomplete, fix the `ddp` tooling or report the blocker. Do not bypass it by hand-assembling the initial plugin skeleton.
+
+After scaffolding, read the generated files before editing them. If the slug contains words such as `column` and the generated class names stutter, rename the classes and update `plugin.py`.

 ---

@@ -294,7 +279,8 @@ make all

 Before opening the PR, verify you have not done any of these:

--- Skipped `uv run ddp new <slug>` and hand-created the plugin.
+- Skipped `uv run ddp new <slug>` and hand-created the plugin structure.
+- Treated this command as a copy of the scaffold algorithm instead of delegating the initial structure to `ddp`.
 - Used `docs/adding-a-plugin.md`; the current guide is `docs/authoring.md`.
 - Used `make catalog`; the current generated docs target is `make plugin-docs`.
 - Edited generated files under `docs/plugins/` manually.
diff --git a/.codex/skills/data-designer-plugin-authoring/SKILL.md b/.codex/skills/data-designer-plugin-authoring/SKILL.md
index 0e97c57..24d2e70 100644
--- a/.codex/skills/data-designer-plugin-authoring/SKILL.md
+++ b/.codex/skills/data-designer-plugin-authoring/SKILL.md
@@ -30,18 +30,20 @@ uv run python -c "import inspect; from data_designer.config.base import SingleCo
 uv run python -c "import inspect; from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn; print(inspect.getsource(ColumnGeneratorFullColumn))"
 ```

-## Scaffold First
+## Scaffold With `ddp`

-Use the repo CLI instead of creating package files by hand:
+The initial plugin structure is owned by the repo's `ddp` CLI. Always invoke the scaffold command; do not create the package directory, `pyproject.toml`, source package, tests, docs, or ownership files by hand.

 ```bash
 make sync
 uv run ddp new <slug>
 ```

-Use a kebab-case slug without the `data-designer-` prefix. The package will be created at `plugins/data-designer-<slug>/` with `pyproject.toml`, `README.md`, `CODEOWNERS`, `docs/index.md`, tests, and `src/data_designer_<slug>/`.
+Use a kebab-case slug without the `data-designer-` prefix. If you need the exact scaffold behavior, read `devtools/ddp/src/ddp/scaffold.py` or inspect the generated files after running the command. Do not reproduce the scaffold algorithm in this skill; the software encodes that process deterministically.
+
+If the command fails because the scaffold is wrong or incomplete, fix the `ddp` tooling or report the blocker. Do not bypass it by hand-assembling the initial plugin skeleton.

-Read the generated files before editing. If the generated class names stutter because the slug contains words such as `column`, rename the classes and update `plugin.py`.
+Read the generated files before editing them. If the generated class names stutter because the slug contains words such as `column`, rename the classes and update `plugin.py`.

 ## Implementation Rules