From 18ca03d859540568dfea1b4de81478ebd5b64475 Mon Sep 17 00:00:00 2001 From: "Eric W. Tramel" Date: Wed, 6 May 2026 09:50:31 -0400 Subject: [PATCH 1/2] Update plugin creation agent guidance --- .claude/commands/create-plugin.md | 385 +++++++++--------- .../data-designer-plugin-authoring/SKILL.md | 160 ++++++++ 2 files changed, 347 insertions(+), 198 deletions(-) create mode 100644 .codex/skills/data-designer-plugin-authoring/SKILL.md diff --git a/.claude/commands/create-plugin.md b/.claude/commands/create-plugin.md index 195acef..b34eb05 100644 --- a/.claude/commands/create-plugin.md +++ b/.claude/commands/create-plugin.md @@ -1,322 +1,311 @@ --- -description: Create a new DataDesigner plugin with correct structure, conventions, and passing CI -argument-hint: Plugin name and description (e.g., "word-count - counts words in text columns") +description: Create a new Data Designer plugin with correct structure, docs, metadata, and local validation +argument-hint: Plugin slug and description (e.g., "word-count - counts words in text columns") --- -# DataDesigner Plugin Authoring +# Data Designer Plugin Authoring -You are creating a new DataDesigner plugin in the `data-designer-plugins` monorepo. Follow this guide precisely. It encodes lessons learned from prior plugin authoring attempts and addresses common pitfalls. +You are creating a Data Designer plugin in the `DataDesignerPlugins` monorepo. Use the repository tooling as the source of truth, keep the plugin self-contained, and prepare the per-plugin documentation so the Zensical site generation stays clean. **Plugin request:** $ARGUMENTS --- -## Phase 1: Understand the Codebase (DO NOT SKIP) +## Phase 1: Read Current Repo Context -Before writing any code, you must build context. Use dedicated tools (Read, Glob) rather than Bash for file exploration. +Before editing, read the current repository guidance and reference implementation. Use Claude Read/Glob tools for file exploration when possible. 
-**Required reads** (in parallel): +Required reads: -1. `CLAUDE.md` -- repo conventions (already in system context, but re-read for specifics) -2. `plugins/data-designer-template/src/data_designer_template/config.py` -- reference config -3. `plugins/data-designer-template/src/data_designer_template/impl.py` -- reference implementation -4. `plugins/data-designer-template/src/data_designer_template/plugin.py` -- reference wiring -5. `plugins/data-designer-template/tests/test_plugin.py` -- reference tests -6. `plugins/data-designer-template/pyproject.toml` -- reference packaging -7. `docs/adding-a-plugin.md` -- full authoring guide (agents often skip this -- don't) +1. `AGENTS.md` - repo conventions, workflow, PR expectations, and release guardrails. +2. `README.md` - current quick start, Makefile targets, and `ddp` CLI overview. +3. `docs/authoring.md` - plugin authoring guide. +4. `docs/workflow.md` - local checks, generated docs, and CI expectations. +5. `Makefile` - canonical target names. +6. `zensical.toml` - site configuration and generated plugin docs navigation block. +7. `devtools/ddp/src/ddp/scaffold.py` - current scaffold output. +8. `devtools/ddp/src/ddp/plugin_docs.py` - how per-plugin docs become Zensical pages. +9. `plugins/data-designer-template/` - reference package, especially `config.py`, `impl.py`, `plugin.py`, `tests/test_plugin.py`, `pyproject.toml`, and `docs/`. -**Required introspection** (after `make sync`): +After `make sync`, inspect the Data Designer interfaces you plan to implement instead of guessing signatures: ```bash uv run python -c "import inspect; from data_designer.config.base import SingleColumnConfig; print(inspect.getsource(SingleColumnConfig))" uv run python -c "import inspect; from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn; print(inspect.getsource(ColumnGeneratorFullColumn))" ``` -This tells you the exact interface you must implement. Do not guess at method signatures. 
-
---

## Phase 2: Scaffold

-Always use the canonical scaffold tool. Never hand-create the plugin directory structure.
+Always use the canonical scaffold. Do not hand-create a plugin package layout.

```bash
make sync
uv run ddp new <slug>
```

-After scaffolding, read all generated files to see what the scaffold provides and what you need to modify.
+Use the kebab-case slug without the `data-designer-` prefix. The scaffold creates:
+
+```text
+plugins/data-designer-<slug>/
+|-- pyproject.toml
+|-- README.md
+|-- CODEOWNERS
+|-- docs/
+|   `-- index.md
+|-- tests/
+|   `-- test_plugin.py
+`-- src/
+    `-- data_designer_<slug>/
+        |-- __init__.py
+        |-- config.py
+        |-- impl.py
+        `-- plugin.py
+```
+
+Read all generated files after scaffolding. If the slug contains words such as `column` and the scaffold generates stuttering class names, rename the classes and update `plugin.py`.

---

## Phase 3: Implement

-### 3a. Config (`config.py`)
+### Config

-Subclass `SingleColumnConfig`. Required elements:
+Subclass `SingleColumnConfig`. Use Python 3.10+ annotations and a literal column type default:

```python
from typing import Literal
+
from data_designer.config.base import SingleColumnConfig

+
class MyPluginColumnConfig(SingleColumnConfig):
-    column_type: Literal["my-plugin"] = "my-plugin"  # Must be Literal with default
+    """Configuration for the my-plugin column generator."""

-    # Your config fields here (use modern 3.10+ annotations: list[str], X | None)
+    column_type: Literal["my-plugin"] = "my-plugin"

     @staticmethod
     def get_column_emoji() -> str:
-        return "..."  # Single emoji
+        return "..."

     @property
     def required_columns(self) -> list[str]:
-        return [...]  # Columns that must exist before this one runs
+        return ["source_column"]

     @property
     def side_effect_columns(self) -> list[str]:
-        return []  # Additional columns this generator creates (usually empty)
+        return []
```

-**Common mistake -- class naming**: If your plugin slug already contains "column" (e.g., "hash-column"), the scaffold generates `HashColumnColumnConfig` with a stutter. Rename to `HashColumnConfig` / `HashColumnGenerator` and update `plugin.py` qualified names accordingly.
+Add Pydantic validators for structural constraints such as non-empty lists, parsable patterns, allowed modes, paths, or field combinations. Catch invalid config at construction time, not inside `generate()`.

-**Validate early with `field_validator`**: If your config has fields with structural constraints (e.g., a regex that must contain capture groups, a list that must be non-empty, a string that must parse as a certain format), add a Pydantic `@field_validator` so errors are caught at config construction time, not at `generate()` time. Deferring validation to `generate()` means users only discover bad config after they've wired up an entire pipeline.
+### Implementation

-```python
-from pydantic import field_validator
+Use the Data Designer generator base that matches the behavior, usually `ColumnGeneratorFullColumn[YourConfig]` for whole-column transformations.

-class MyPluginColumnConfig(SingleColumnConfig):
-    pattern: str
-
-    @field_validator("pattern")
-    @classmethod
-    def pattern_must_be_valid(cls, value: str) -> str:
-        """Validate the pattern at config construction time."""
-        compiled = re.compile(value)
-        if compiled.groups < 1:
-            raise ValueError(f"Pattern must contain at least one capture group, got: {value!r}")
-        return value
-```

-Do not duplicate this validation logic in `impl.py`. If the config validates on construction, `generate()` can trust the field is valid.
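
The construction-time validation rule can be sketched as follows. This is a standalone Pydantic v2 example with hypothetical field names, not the repo's actual config class; a real plugin would subclass `SingleColumnConfig` instead of `BaseModel`:

```python
# Minimal construction-time validation sketch (assumes Pydantic v2).
# Class and field names are illustrative only.
from pydantic import BaseModel, field_validator


class WordCountConfig(BaseModel):
    source_columns: list[str]

    @field_validator("source_columns")
    @classmethod
    def _require_non_empty(cls, value: list[str]) -> list[str]:
        """Reject an empty column list when the config is constructed."""
        if not value:
            raise ValueError("source_columns must not be empty")
        return value
```

With this in place, `WordCountConfig(source_columns=[])` fails immediately at construction, rather than surfacing later inside `generate()` after a whole pipeline has been wired up.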
+- Keep plugin logic in top-level composable functions or small modules; keep `impl.py` mostly orchestration. +- Do not use relative imports. Import from the package name, for example `from data_designer_my_plugin.config import MyPluginColumnConfig`. +- Do not define private helper closures or functions inside functions. +- Prefer vectorized pandas operations, named helpers, `functools.partial`, or module-level dispatch tables over lambda-heavy `apply()` code. +- Use `from __future__ import annotations` and guard pandas imports with `TYPE_CHECKING` when pandas is only needed for type hints. +- Add Google-style docstrings to public classes, functions, and methods. +- Keep dependencies in the plugin's own `pyproject.toml`. Do not depend on another local plugin package. -### 3b. Implementation (`impl.py`) +If you add a new import package, add it to the root Ruff isort list: -Subclass `ColumnGeneratorFullColumn[YourConfig]` (batch) or `ColumnGeneratorCellByCell[YourConfig]` (row-by-row). +```toml +[tool.ruff.lint.isort] +known-first-party = ["ddp", "data_designer_template", "data_designer_my_plugin"] +``` -**Critical rules:** +### Plugin Wiring -1. **NO lambda closures.** CLAUDE.md bans closures and function-in-function definitions. This is the most common violation. +The scaffold usually gets `plugin.py` right. 
Only update it when class or module names change: - BAD (every prior agent did this): - ```python - data[self.config.name] = data[col].apply(lambda x: my_func(x, param)) - ``` +```python +from data_designer.plugins.plugin import Plugin, PluginType - GOOD -- use `functools.partial`: - ```python - from functools import partial - data[self.config.name] = data[col].apply(partial(my_func, param=param)) - ``` +plugin = Plugin( + config_qualified_name="data_designer_my_plugin.config.MyPluginColumnConfig", + impl_qualified_name="data_designer_my_plugin.impl.MyPluginColumnGenerator", + plugin_type=PluginType.COLUMN_GENERATOR, +) +``` - GOOD -- use vectorized pandas operations when possible: - ```python - data[self.config.name] = data[col].str.upper() - ``` +### CODEOWNERS - GOOD -- use a module-level dispatch dict: - ```python - _MODE_FUNCTIONS: dict[str, Callable[[str], int]] = { - "words": count_words, - "characters": count_characters, - } - # In generate(): - data[self.config.name] = data[col].apply(_MODE_FUNCTIONS[self.config.mode]) - ``` +The scaffold discovers an owner from git config. Check the per-plugin `CODEOWNERS` and prefer the repo convention from the template: -2. **Extract logic into top-level composable functions**, not methods on the generator class. This follows the CLAUDE.md rule: "Favor reusable, composable functions that can be combined in higher-level functions." +```text +* @NVIDIA-NeMo/data_designer_reviewers +``` - **But avoid leaky abstractions.** If a helper function accepts a compiled object (e.g., `re.Pattern`) but then extracts the raw string from it to pass to another API (e.g., `series.str.extract(pattern.pattern)`), the abstraction is misleading. Either accept the raw form the downstream API needs, or use the compiled object directly. +Run `make codeowners` after ownership changes so `.github/CODEOWNERS` is regenerated. -3. 
**Use `TYPE_CHECKING` guard for pandas**: - ```python - from __future__ import annotations - from typing import TYPE_CHECKING - if TYPE_CHECKING: - import pandas as pd - ``` +--- -4. **Full Google-style docstrings** on all public functions, methods, and classes. +## Phase 4: Test Public Behavior -5. **No relative imports.** Use `from data_designer_my_plugin.config import MyPluginColumnConfig`. +Write tests around public interfaces and expected Data Designer behavior: -### 3c. Plugin wiring (`plugin.py`) +```python +from data_designer.engine.testing.utils import assert_valid_plugin -The scaffold generates this correctly. Only update if you renamed classes: +from data_designer_my_plugin.plugin import plugin -```python -plugin = Plugin( - config_qualified_name="data_designer_my_plugin.config.MyPluginColumnConfig", - impl_qualified_name="data_designer_my_plugin.impl.MyPluginColumnGenerator", - plugin_type=PluginType.COLUMN_GENERATOR, -) + +def test_valid_plugin() -> None: + assert_valid_plugin(plugin) ``` -### 3d. Extra modules +Cover the relevant tiers: -If your plugin has substantial pure logic (scoring, parsing, transformation), extract it into a separate module (e.g., `scoring.py`). Keep `impl.py` thin -- it should wire config to logic, not contain the logic itself. +- Config properties, defaults, and validation errors. +- Pure helper functions or parsing/scoring modules. +- Generator behavior against representative DataFrames. +- Data Designer preview integration when the plugin changes user-visible pipeline behavior. +- Edge cases with `None`, `NaN`, empty strings, numeric values in text columns, or malformed config. -### 3e. Root `pyproject.toml` +If using pytest's `tmp_path`, annotate it as `pathlib.Path`, not `pd.DataFrame`. 
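
The pure-helper tier pairs naturally with the dispatch-table implementation style: module-level helpers plus the table that maps config modes to them, each testable without constructing a generator. A sketch with hypothetical helper and mode names:

```python
# Module-level helpers and the dispatch table a generator would consult.
# All names here are illustrative, not from the template plugin.
from collections.abc import Callable


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_characters(text: str) -> int:
    """Count characters after trimming surrounding whitespace."""
    return len(text.strip())


MODE_FUNCTIONS: dict[str, Callable[[str], int]] = {
    "words": count_words,
    "characters": count_characters,
}


def test_count_words_handles_repeated_spaces() -> None:
    assert count_words("a b  c") == 3


def test_dispatch_table_modes() -> None:
    assert MODE_FUNCTIONS["characters"](" abc ") == 3
```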
-Add your module to the isort known-first-party list:
+Run the isolated plugin test loop while developing:

-```toml
-[tool.ruff.lint.isort]
-known-first-party = [..., "data_designer_my_plugin"]
+```bash
+make test-plugin PLUGIN=data-designer-my-plugin
```

-### 3f. CODEOWNERS
-
-The scaffold generates this from `git config user.email`. Check that it uses `@username` or `@org/team` format (e.g., `* @NVIDIA-NeMo/data_designer_reviewers`), not email format. If it used email, fix it to match the convention in the template's CODEOWNERS.
+The target uses `uv venv --clear`, so stale `.venv-data-designer-my-plugin` directories should not need manual cleanup.

---

-## Phase 4: Test
+## Phase 5: Prepare Per-Plugin Zensical Docs

-### Test structure
+Each plugin owns its source docs under `plugins/data-designer-<slug>/docs/`. The top-level Zensical site is generated from those files and package metadata.

-Write four tiers of tests, matching the template's patterns:
+Required source docs:

-```python
-# Tier 1: Plugin contract validation
-def test_valid_plugin() -> None:
-    assert_valid_plugin(plugin)
+```text
+plugins/data-designer-<slug>/docs/
+`-- index.md
+```
+
+Recommended docs for user-facing plugins:

-# Tier 2: Config unit tests
-class TestMyPluginColumnConfig:
-    def test_required_columns(self) -> None: ...
-    def test_side_effect_columns(self) -> None: ...
-    def test_column_emoji(self) -> None: ...
-    def test_defaults(self) -> None: ...
-
-# Tier 3: Generator unit tests (using __new__ bypass pattern)
-def _make_generator(config: MyPluginColumnConfig) -> MyPluginColumnGenerator:
-    generator = MyPluginColumnGenerator.__new__(MyPluginColumnGenerator)
-    generator._config = config
-    return generator
-
-class TestMyPluginColumnGenerator:
-    @pytest.fixture()
-    def source_df(self) -> pd.DataFrame:
-        return pd.DataFrame({...})
-
-    def test_basic_generation(self, source_df: pd.DataFrame) -> None:
-        generator = _make_generator(MyPluginColumnConfig(name="out", ...))
-        result = generator.generate(source_df)
-        assert "out" in result.columns
-        ...
-
-# Tier 4: Integration tests using DataDesigner.preview()
-class TestMyPluginPreviewIntegration:
-    def test_preview_basic(self, tmp_path: Path) -> None:
-        seed_df = pd.DataFrame({...})
-        builder = DataDesignerConfigBuilder()
-        builder.with_seed_dataset(DataFrameSeedSource(df=seed_df))
-        builder.add_column(name="out", column_type="my-plugin", ...)
-        result = DataDesigner(artifact_path=tmp_path / "artifacts").preview(builder, num_records=3)
-        assert result.dataset is not None
-        assert "out" in result.dataset.columns
```
+```text
+plugins/data-designer-<slug>/docs/
+|-- index.md
+`-- usage.md
+```

-### Test pitfalls to avoid
+Write `docs/index.md` as the plugin overview:

-1. **`tmp_path` type annotation**: The pytest `tmp_path` fixture is `pathlib.Path`, NOT `pd.DataFrame`. The template has this bug -- do not copy it.
-   ```python
-   from pathlib import Path
-   def test_preview(self, tmp_path: Path) -> None:  # CORRECT
-   # NOT: def test_preview(self, tmp_path: pd.DataFrame) -> None:  # WRONG
-   ```
+- H1: `# data-designer-<slug>`.
+- One short paragraph explaining what the plugin adds.
+- Installation command using `uv add data-designer data-designer-<slug>`.
+- Column type section naming the discovered entry point, for example `` `<slug>` ``.
+- Configuration table with `Field`, `Required`, and `Description` columns.
+- A realistic Python or Data Designer config example.
+- Important behavior notes, limitations, or output columns only when useful.

-2. **Pandas dtype coercion**: When creating a `pd.Series` with mixed int/float, pandas upcasts ints to floats. `pd.Series({"a": 42, "b": 3.14})` gives `a=42.0`, not `a=42`. Write test expectations accordingly, or use uniform types.
+Write `docs/usage.md` when the plugin needs a fuller example:

-3. **Test composable functions independently**: If you extracted functions to module-level, write dedicated test classes for them (e.g., `TestComputeHash`, `TestTokenize`). This goes beyond the template but produces better coverage.
+- H1: `# Usage` or another concise title; non-index H1 text becomes the Zensical nav label.
+- A runnable or realistic example using `DataDesignerConfigBuilder`, a YAML-style config, or both.
+- Expected output shape or before/after behavior.
+- Error cases and config validation notes that users should know before running a job.

-4. **Test config validation edge cases**: If you added `@field_validator` on config fields, write tests that verify invalid inputs are rejected at construction time:
-   ```python
-   def test_rejects_invalid_pattern(self) -> None:
-       with pytest.raises(ValueError, match="at least one capture group"):
-           MyPluginColumnConfig(name="out", source_column="src", pattern=r"\d+")
+Zensical formatting rules for this repo:

-   def test_rejects_malformed_input(self) -> None:
-       with pytest.raises(Exception):  # re.error or ValidationError
-           MyPluginColumnConfig(name="out", source_column="src", pattern=r"(unclosed")
-   ```

+- Keep links and assets relative to the plugin's own `docs/` directory; generated pages are copied to `docs/plugins/data-designer-<slug>/`.
+- Store plugin doc assets under the plugin docs tree, for example `plugins/data-designer-<slug>/docs/assets/example.png`.
+- Use fenced code blocks with language tags such as `python`, `yaml`, `toml`, or `bash`.
+- Use Markdown tables for config references.
+- Keep headings hierarchical and avoid skipping from H1 to H3. +- Do not edit `docs/plugins/` directly. It is generated. +- Do not edit the generated plugin nav block in `zensical.toml` directly. +- Remember that package metadata feeds the generated plugin index card: keep `pyproject.toml` `description` concise and user-facing, and verify the `data_designer.plugins` entry point key is the column type users configure. -5. **Test edge cases with None and non-string source values**: DataFrames in the wild often have `None`, `NaN`, or numeric values in text columns. Write at least one test that exercises your generator on a DataFrame with `None` values in the source column to verify graceful handling. +Regenerate and validate site inputs after plugin docs or metadata change: -6. **Stale venv on test re-run**: `make test-plugin` creates `.venv-{plugin-name}` and fails if it already exists from a prior failed run. If tests fail and you need to re-run: - ```bash - rm -rf .venv-data-designer-my-plugin && make test-plugin PLUGIN=data-designer-my-plugin - ``` +```bash +make plugin-docs +make docs +``` --- -## Phase 5: Format and Lint First +## Phase 6: Regenerate Derived Files + +Use current target names. There is no `make catalog` target in this repo. -Import sort order (isort) is the most common lint failure. 
**Always run `make format` before `make lint`** to avoid wasting a cycle: +When plugin docs or package metadata change: ```bash -make sync -make format # Fix import order and formatting FIRST -make lint # Should pass after format +make plugin-docs ``` ---- +When plugin ownership changes: + +```bash +make codeowners +``` -## Phase 6: Test in Isolation +When Python files are added or changed: ```bash -make test-plugin PLUGIN=data-designer-my-plugin # Isolated venv test +make update-license-headers ``` -If tests fail and you need to re-run, **delete the stale venv first** (the Makefile does not auto-clean on failure): +`make check` verifies generated plugin docs, generated CODEOWNERS, and SPDX headers: ```bash -rm -rf .venv-data-designer-my-plugin && make test-plugin PLUGIN=data-designer-my-plugin +make check ``` --- -## Phase 7: Validate and Check +## Phase 7: Local Validation + +Prefer the repo's Makefile targets over ad hoc substitutes. + +Fast loop: + +```bash +make format +make lint +make test-plugin PLUGIN=data-designer-my-plugin +make validate +make check +make docs +``` + +Full local CI: ```bash -make validate # Entry point + assert_valid_plugin -make catalog && make codeowners && make update-license-headers # Regenerate derived files -make check # Verify derived files match -make lint # Final lint confirmation +make all ``` --- ## Anti-Pattern Checklist -Before declaring done, verify you have NOT done any of these: - -- [ ] Lambda closures in `generate()` or anywhere else (use `functools.partial` or dispatch dicts) -- [ ] Relative imports (`from .config import ...`) -- [ ] `tmp_path: pd.DataFrame` in test signatures (should be `from pathlib import Path` then `tmp_path: Path`) -- [ ] Missing SPDX headers on any `.py` file -- [ ] Email format in CODEOWNERS instead of `@username` (read template's CODEOWNERS to match) -- [ ] Missing docstrings on public functions/classes -- [ ] Private helper closures or nested function definitions -- [ ] `typing.List[str]` 
instead of `list[str]` (3.10+ style required)
- [ ] Missing `from __future__ import annotations` when using `TYPE_CHECKING`
- [ ] Skipped reading `docs/adding-a-plugin.md`
- [ ] Used `find` or `ls` via Bash instead of Glob/Read tools
- [ ] Forgot to add module to `known-first-party` in root `pyproject.toml`
- [ ] Forgot to run `make catalog && make codeowners && make update-license-headers`
- [ ] Forgot to run `make format` BEFORE `make lint` (isort failures are the #1 lint issue)
- [ ] Forgot to delete stale `.venv-*` before re-running `make test-plugin` after a failure
- [ ] Config fields with structural constraints lack `@field_validator` (validate at construction, not at `generate()` time)
- [ ] Helper function accepts a compiled object but then extracts the raw form to pass to a downstream API (leaky abstraction)
- [ ] No tests for `None`/`NaN` values in the source column
- [ ] No tests verifying that invalid config field values are rejected at construction time
+Before opening the PR, verify you have not done any of these:
+
+- Skipped `uv run ddp new <slug>` and hand-created the plugin.
+- Used `docs/adding-a-plugin.md`; the current guide is `docs/authoring.md`.
+- Used `make catalog`; the current generated docs target is `make plugin-docs`.
+- Edited generated files under `docs/plugins/` manually.
+- Edited the generated plugin nav block in `zensical.toml` manually.
+- Forgot to run `make plugin-docs` after plugin docs or package metadata changes.
+- Forgot to run `make codeowners` after per-plugin ownership changes.
+- Left `pyproject.toml` with a generic scaffold description.
+- Left `docs/index.md` as generic scaffold text for a user-facing plugin.
+- Used relative imports.
+- Added local plugin-to-plugin dependencies.
+- Used `typing.List`, `typing.Optional`, or `typing.Union` instead of Python 3.10+ annotations.
+- Added nested helper functions or private helper closures.
+- Deferred structural config validation to `generate()`.
+- Missed tests for invalid config or null-like source values.

diff --git a/.codex/skills/data-designer-plugin-authoring/SKILL.md b/.codex/skills/data-designer-plugin-authoring/SKILL.md
new file mode 100644
index 0000000..0e97c57
--- /dev/null
+++ b/.codex/skills/data-designer-plugin-authoring/SKILL.md
@@ -0,0 +1,160 @@
+---
+name: data-designer-plugin-authoring
+description: Use when creating, updating, documenting, or preparing a pull request for a Data Designer plugin in the NVIDIA-NeMo/DataDesignerPlugins repository, including ddp scaffolding, plugin implementation, validation, and per-plugin Zensical docs.
+metadata:
+  short-description: Create Data Designer plugins
+---
+
+# Data Designer Plugin Authoring
+
+Use this skill for plugin work in the `DataDesignerPlugins` repo. The repo is a `uv` workspace with shared tooling in `devtools/` and one independent package per plugin under `plugins/*`. The Python baseline is 3.10+.
+
+## Context To Load
+
+Before making plugin changes, read the local files that define the current contract:
+
+- `AGENTS.md`
+- `README.md`
+- `docs/authoring.md`
+- `docs/workflow.md`
+- `Makefile`
+- `zensical.toml`
+- `devtools/ddp/src/ddp/scaffold.py`
+- `devtools/ddp/src/ddp/plugin_docs.py`
+- The reference plugin under `plugins/data-designer-template/`
+
+After `make sync`, inspect Data Designer interfaces directly when signatures matter:
+
+```bash
+uv run python -c "import inspect; from data_designer.config.base import SingleColumnConfig; print(inspect.getsource(SingleColumnConfig))"
+uv run python -c "import inspect; from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn; print(inspect.getsource(ColumnGeneratorFullColumn))"
+```
+
+## Scaffold First
+
+Use the repo CLI instead of creating package files by hand:
+
+```bash
+make sync
+uv run ddp new <slug>
+```
+
+Use a kebab-case slug without the `data-designer-` prefix. The package will be created at `plugins/data-designer-<slug>/` with `pyproject.toml`, `README.md`, `CODEOWNERS`, `docs/index.md`, tests, and `src/data_designer_<slug>/`.
+
+Read the generated files before editing. If the generated class names stutter because the slug contains words such as `column`, rename the classes and update `plugin.py`.
+
+## Implementation Rules
+
+- Keep plugins self-contained. Do not add local dependencies on another plugin package.
+- Use absolute imports such as `from data_designer_my_plugin.config import MyPluginColumnConfig`.
+- Use Python 3.10+ annotations: `list[str]`, `A | B`, and `X | None`.
+- Subclass `SingleColumnConfig` and define `column_type` as a `Literal["slug"]` with the same default string.
+- Add Pydantic validators for structural constraints so bad config fails during config construction.
+- Use the appropriate generator base, usually `ColumnGeneratorFullColumn[YourConfig]`.
+- Keep logic in top-level reusable functions or modules. Do not use nested helper functions or private closures.
+- Prefer vectorized pandas operations, named helpers, `functools.partial`, or dispatch tables over lambda-heavy `apply()` code.
+- Use `from __future__ import annotations` and `TYPE_CHECKING` for pandas type-only imports.
+- Add Google-style docstrings to public classes, functions, and methods.
+- Add new import packages to the root Ruff isort `known-first-party` list.
+
+The scaffold normally creates correct plugin wiring:
+
+```python
+from data_designer.plugins.plugin import Plugin, PluginType
+
+plugin = Plugin(
+    config_qualified_name="data_designer_my_plugin.config.MyPluginColumnConfig",
+    impl_qualified_name="data_designer_my_plugin.impl.MyPluginColumnGenerator",
+    plugin_type=PluginType.COLUMN_GENERATOR,
+)
+```
+
+## Tests
+
+Write tests around public behavior:
+
+- `assert_valid_plugin(plugin)` contract validation.
+- Config defaults, dependency properties, and validation errors.
+- Pure helper function behavior.
+- Generator behavior on representative DataFrames.
+- Data Designer preview integration when useful.
+- Edge cases with `None`, `NaN`, empty strings, numeric values in text columns, and malformed config.
+
+Use `pathlib.Path` for pytest `tmp_path` annotations.
+
+Run plugin tests in isolation:
+
+```bash
+make test-plugin PLUGIN=data-designer-my-plugin
+```
+
+## Per-Plugin Zensical Docs
+
+Each plugin owns source docs under `plugins/data-designer-<slug>/docs/`. `make plugin-docs` copies those files into the generated `docs/plugins/data-designer-<slug>/` tree and updates the generated nav block in `zensical.toml`.
+
+Do not edit generated plugin docs or the generated nav block directly.
+
+Recommended source docs:
+
+```text
+plugins/data-designer-<slug>/docs/
+|-- index.md
+`-- usage.md
+```
+
+Prepare `docs/index.md` as the overview:
+
+- H1: `# data-designer-<slug>`.
+- Short description of what the plugin adds.
+- Installation command: `uv add data-designer data-designer-<slug>`.
+- Column type section naming the entry point users configure.
+- Configuration table with `Field`, `Required`, and `Description`.
+- Realistic Python or YAML usage example.
+- Important behavior notes, output columns, or limitations only when useful.
+
+Prepare `docs/usage.md` when the plugin needs a fuller example. Its H1 becomes the Zensical nav label, so keep it concise. Include expected output shape, before/after behavior, and validation or error cases users need to understand.
+
+Formatting rules:
+
+- Keep links and assets relative to the plugin docs directory.
+- Put doc assets under the plugin docs tree, such as `docs/assets/example.png`.
+- Use fenced code blocks with language tags.
+- Use Markdown tables for config references.
+- Keep heading levels hierarchical.
+- Make the package `pyproject.toml` description concise and user-facing because it feeds generated plugin cards.
+- Verify the `[project.entry-points."data_designer.plugins"]` key is the column type users configure.
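
To make that last check concrete, the registration in the plugin's `pyproject.toml` might look like the fragment below. The names are illustrative; confirm the exact module path and attribute against the template plugin before relying on them.

```toml
# Hypothetical entry-point registration: the key "my-plugin" is the
# column_type users configure; the value points at the plugin object
# defined in plugin.py.
[project.entry-points."data_designer.plugins"]
my-plugin = "data_designer_my_plugin.plugin:plugin"
```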
+Regenerate and validate docs after plugin docs or metadata changes:
+
+```bash
+make plugin-docs
+make docs
+```
+
+## Generated Files
+
+Use current target names:
+
+```bash
+make plugin-docs
+make codeowners
+make update-license-headers
+make check
+```
+
+There is no `make catalog` target in this repo.
+
+## Validation
+
+Prefer repo targets:
+
+```bash
+make format
+make lint
+make test-plugin PLUGIN=data-designer-my-plugin
+make validate
+make check
+make docs
+```
+
+Run `make all` before the PR when feasible. If a full target cannot be run, report exactly what was skipped and why.

From fde00e16d8d2a9cecb10e4acfa906c1ccaf0bc9b Mon Sep 17 00:00:00 2001
From: "Eric W. Tramel"
Date: Wed, 6 May 2026 10:03:03 -0400
Subject: [PATCH 2/2] Clarify deterministic plugin scaffolding

---
 .claude/commands/create-plugin.md             | 28 +++++--------------
 .../data-designer-plugin-authoring/SKILL.md   | 10 ++++---
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/.claude/commands/create-plugin.md b/.claude/commands/create-plugin.md
index b34eb05..d889844 100644
--- a/.claude/commands/create-plugin.md
+++ b/.claude/commands/create-plugin.md
@@ -36,35 +36,20 @@

 ---

-## Phase 2: Scaffold
+## Phase 2: Scaffold With `ddp`

-Always use the canonical scaffold. Do not hand-create a plugin package layout.
+The initial plugin structure is owned by the repo's `ddp` CLI. Always invoke the scaffold command; do not create the package directory, `pyproject.toml`, source package, tests, docs, or ownership files by hand.

 ```bash
 make sync
 uv run ddp new <slug>
 ```

-Use the kebab-case slug without the `data-designer-` prefix. The scaffold creates:
+Use the kebab-case slug without the `data-designer-` prefix. If you need to understand exactly what the command creates, read `devtools/ddp/src/ddp/scaffold.py` or inspect the generated files after running the command. The skill should not duplicate the scaffold algorithm; the software encodes that process deterministically.
+
+If the command fails because the scaffold is wrong or incomplete, fix the `ddp` tooling or report the blocker. Do not bypass it by hand-assembling the initial plugin skeleton.
+
+After scaffolding, read the generated files before editing them. If the slug contains words such as `column` and the generated class names stutter, rename the classes and update `plugin.py`.

 ---

@@ -294,7 +279,8 @@ make all

 Before opening the PR, verify you have not done any of these:

--- Skipped `uv run ddp new <slug>` and hand-created the plugin.
+- Skipped `uv run ddp new <slug>` and hand-created the plugin structure.
+- Treated this command as a copy of the scaffold algorithm instead of delegating the initial structure to `ddp`.
 - Used `docs/adding-a-plugin.md`; the current guide is `docs/authoring.md`.
 - Used `make catalog`; the current generated docs target is `make plugin-docs`.
 - Edited generated files under `docs/plugins/` manually.
diff --git a/.codex/skills/data-designer-plugin-authoring/SKILL.md b/.codex/skills/data-designer-plugin-authoring/SKILL.md
index 0e97c57..24d2e70 100644
--- a/.codex/skills/data-designer-plugin-authoring/SKILL.md
+++ b/.codex/skills/data-designer-plugin-authoring/SKILL.md
@@ -30,18 +30,20 @@ uv run python -c "import inspect; from data_designer.config.base import SingleCo
 uv run python -c "import inspect; from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn; print(inspect.getsource(ColumnGeneratorFullColumn))"
 ```

-## Scaffold First
+## Scaffold With `ddp`

-Use the repo CLI instead of creating package files by hand:
+The initial plugin structure is owned by the repo's `ddp` CLI. Always invoke the scaffold command; do not create the package directory, `pyproject.toml`, source package, tests, docs, or ownership files by hand.

 ```bash
 make sync
 uv run ddp new <slug>
 ```

-Use a kebab-case slug without the `data-designer-` prefix. The package will be created at `plugins/data-designer-<slug>/` with `pyproject.toml`, `README.md`, `CODEOWNERS`, `docs/index.md`, tests, and `src/data_designer_<slug>/`.
+Use a kebab-case slug without the `data-designer-` prefix. If you need the exact scaffold behavior, read `devtools/ddp/src/ddp/scaffold.py` or inspect the generated files after running the command. Do not reproduce the scaffold algorithm in this skill; the software encodes that process deterministically.
+
+If the command fails because the scaffold is wrong or incomplete, fix the `ddp` tooling or report the blocker. Do not bypass it by hand-assembling the initial plugin skeleton.

-Read the generated files before editing. If the generated class names stutter because the slug contains words such as `column`, rename the classes and update `plugin.py`.
+Read the generated files before editing them. If the generated class names stutter because the slug contains words such as `column`, rename the classes and update `plugin.py`.

 ## Implementation Rules