34 commits
fce1016
feat(cli): add plugin catalog services
johnnygreco May 7, 2026
bb9428e
feat(cli): add plugins command group
johnnygreco May 7, 2026
1239643
test(cli): cover plugin catalog workflows
johnnygreco May 7, 2026
bf9f522
fix(cli): align plugin taps with schema v2
johnnygreco May 7, 2026
47bed27
fix(cli): address plugin review feedback
johnnygreco May 7, 2026
6ffad17
fix(cli): gate incompatible plugin installs
johnnygreco May 7, 2026
4bfb03e
style(cli): format plugin catalog files
johnnygreco May 7, 2026
1d20aae
fix(cli): reject duplicate plugin entry names
johnnygreco May 7, 2026
f8f866b
fix(cli): preserve GitHub tree tap paths
johnnygreco May 8, 2026
4717aab
fix(cli): verify plugin entry point names
johnnygreco May 8, 2026
7799e41
align plugin CLI with catalog schema
johnnygreco May 8, 2026
d3c7320
tidy plugin catalog workflow docs
johnnygreco May 8, 2026
429d549
align plugin catalog CLI with package contract
johnnygreco May 8, 2026
6adf2cf
add plugin package uninstall workflow
johnnygreco May 8, 2026
208948b
test plugin package command targets
johnnygreco May 8, 2026
e2735ab
document plugin package aliases
johnnygreco May 8, 2026
b86a7a8
address plugin catalog review feedback
johnnygreco May 8, 2026
a59d6aa
prefer runtime plugin lookup matches
johnnygreco May 8, 2026
4c5da44
rename plugins command to plugin
johnnygreco May 8, 2026
b9e74b3
show plugin package descriptions
johnnygreco May 8, 2026
86a0776
rename plugin catalogs command
johnnygreco May 8, 2026
47f8436
add protected plugin package installs
johnnygreco May 8, 2026
ef8166c
document plugin package install modes
johnnygreco May 8, 2026
6cfa3ef
avoid building project during plugin installs
johnnygreco May 9, 2026
05bfaf0
harden plugin package installs
johnnygreco May 9, 2026
2e64d14
tighten plugin catalog contracts
johnnygreco May 9, 2026
d8e1d6a
fix no-args help exit code
johnnygreco May 9, 2026
13a1c9b
make plugin docs links robust
johnnygreco May 9, 2026
3d0e526
document plugin CLI catalog workflows
johnnygreco May 9, 2026
1a9f468
clarify plugin entry point verification
johnnygreco May 9, 2026
2b334cc
simplify plugin CLI docs
johnnygreco May 9, 2026
66c74a5
narrow plugin search fields
johnnygreco May 9, 2026
0574e7b
hide plugin catalog cache ttl
johnnygreco May 9, 2026
2a91d21
remove plugin catalog trust flag
johnnygreco May 9, 2026
58 changes: 51 additions & 7 deletions architecture/cli.md
@@ -1,12 +1,12 @@
# CLI

The CLI (`data-designer`) provides an interactive command-line interface for configuring models, providers, tools, and personas, as well as running dataset generation. It uses a layered architecture for config management and delegates generation to the public `DataDesigner` API.
The CLI (`data-designer`) provides an interactive command-line interface for configuring models, providers, MCP providers, and tools, downloading managed persona datasets, discovering, installing, and uninstalling plugin packages from catalogs, and running dataset generation. It uses a layered architecture for setup workflows and delegates generation to the public `DataDesigner` API.

Source: `packages/data-designer/src/data_designer/cli/`

## Overview

The CLI is built on Typer with lazy command loading to keep startup fast. Config management commands follow a **command → controller → service → repository** layering pattern. Generation commands bypass this stack and use the public `DataDesigner` class directly.
The CLI is built on Typer with lazy command loading to keep startup fast. Config management and plugin catalog commands follow a **command → controller → service → repository** layering pattern. Generation commands bypass this stack and use the public `DataDesigner` class directly.

## Key Components

@@ -20,9 +20,9 @@ The CLI is built on Typer with lazy command loading to keep startup fast. Config

`create_lazy_typer_group` and `_LazyCommand` stubs defer importing command modules until a command is actually invoked. This keeps `data-designer --help` fast: only the command names and descriptions are loaded eagerly; the full module (and its dependencies) loads on first use.
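
The deferred-import idea can be illustrated with a minimal sketch. It is not the project's `_LazyCommand` implementation and does not use Typer; the `LazyCommand` class, its fields, and the `"module:attribute"` path format are assumptions made for illustration only.

```python
import importlib
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class LazyCommand:
    """Hypothetical stand-in for a lazily loaded CLI command."""

    name: str
    help: str  # cheap metadata that is available without importing anything
    import_path: str  # "package.module:attribute" (assumed format)
    _target: Optional[Callable[..., Any]] = field(default=None, repr=False)

    def invoke(self, *args: Any, **kwargs: Any) -> Any:
        # Import the module only on first invocation, then cache the callable.
        if self._target is None:
            module_name, _, attr = self.import_path.partition(":")
            module = importlib.import_module(module_name)
            self._target = getattr(module, attr)
        return self._target(*args, **kwargs)


# Building the registry touches only names and help strings.
commands = {"sqrt": LazyCommand("sqrt", "Square root", "math:sqrt")}
print(commands["sqrt"].invoke(16))
```

Listing `commands` for a help screen never triggers an import; only `invoke` does, which is the property the lazy group relies on.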

### Layering Pattern (Config Management)
### Layering Pattern (Setup Workflows)

Config management commands (models, providers, tools, personas) follow a consistent four-layer pattern:
Config management commands (models, providers, MCP providers, tools) follow a consistent four-layer pattern:

| Layer | Role | Example |
|-------|------|---------|
@@ -31,10 +31,22 @@ Config management commands (models, providers, tools, personas) follow a consist
| **Service** | Domain rules: uniqueness, merge, delete-all | `ModelService.add/update/delete` over `ModelRepository` |
| **Repository** | File I/O for typed config registries | `ModelRepository` extends `ConfigRepository[ModelConfigRegistry]` |

Repositories: `ModelRepository`, `ProviderRepository`, `ToolRepository`, `MCPProviderRepository`, `PersonaRepository`.
Repositories: `ModelRepository`, `ProviderRepository`, `MCPProviderRepository`, and `ToolRepository`.
`PersonaRepository` provides read-only locale metadata for managed persona dataset downloads.

Services mirror the repository domains with business logic (validation, conflict resolution).

Plugin catalog commands use the same layering shape:

| Layer | Role | Example |
|-------|------|---------|
| **Command** | Thin Typer entry, wires `DATA_DESIGNER_HOME` and command options | `plugin` subcommands (`list`, `search`, `info`, `install`, `uninstall`, `installed`, `catalog`) β†’ `PluginCatalogController(DATA_DESIGNER_HOME)` |
| **Controller** | UX flow: catalog tables, package metadata, compatibility display, install/uninstall confirmations | `PluginCatalogController` composes catalog + install services |
| **Service** | Domain rules: package listing, compatibility checks, uv/pip install and uninstall commands, plugin discovery verification | `PluginCatalogService`, `PluginInstallService` |
| **Repository** | File/cache I/O for catalog aliases and catalog documents | `PluginCatalogRepository` |

The built-in `nvidia` catalog points at `https://nvidia-nemo.github.io/DataDesignerPlugins/catalog/plugins.json`. `NVIDIA-NeMo/DataDesignerPlugins` defines the catalog format. Each catalog entry is an installable package with docs, install metadata, compatibility constraints, and one or more runtime plugins. Users install and uninstall packages, not individual runtime plugins. Commands that take a package name also accept the package alias from the `data-designer-{alias}` package-name pattern; for example, `data-designer-calculator` can be addressed as `calculator`.
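
The alias rule can be sketched as follows. The helper names `package_alias` and `resolve_package` are hypothetical, and trying exact package names before aliases is an assumption; the real controller may resolve names differently.

```python
from typing import List, Optional

PACKAGE_PREFIX = "data-designer-"


def package_alias(package_name: str) -> Optional[str]:
    """Return the alias for names that follow `data-designer-{alias}`."""
    if package_name.startswith(PACKAGE_PREFIX) and len(package_name) > len(PACKAGE_PREFIX):
        return package_name[len(PACKAGE_PREFIX):]
    return None


def resolve_package(query: str, catalog_names: List[str]) -> Optional[str]:
    """Map a user-supplied name or alias to a catalog package name."""
    # Assumption: an exact package-name match wins over an alias match.
    for name in catalog_names:
        if name == query:
            return name
    for name in catalog_names:
        if package_alias(name) == query:
            return name
    return None


print(resolve_package("calculator", ["data-designer-calculator"]))
```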

### Generation Commands

`preview`, `create`, and `validate` commands use `GenerationController`, which:
@@ -62,6 +74,37 @@ User invokes command (e.g., `data-designer config models`)
  → Repository reads/writes config files
```

### Plugin Catalog Discovery
```
User invokes command (e.g., `data-designer plugin list`)
  → Command function wires DATA_DESIGNER_HOME and catalog options
  → PluginCatalogController resolves the catalog alias
  → PluginCatalogService loads packages and filters out incompatible packages by default
  → PluginCatalogRepository reads local config and cached/remote catalog JSON
```
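
The default filtering step in the flow above can be sketched like this. The `CatalogPackage` shape, its `min_python` field, and the `include_incompatible` switch are hypothetical; the actual catalog schema is defined by `NVIDIA-NeMo/DataDesignerPlugins` and also covers Data Designer version constraints, which this sketch leaves out.

```python
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class CatalogPackage:
    name: str
    min_python: Tuple[int, int]  # hypothetical field, e.g. (3, 10)


def compatible_packages(
    packages: Iterable[CatalogPackage],
    python_version: Optional[Tuple[int, int]] = None,
    include_incompatible: bool = False,
) -> List[CatalogPackage]:
    """Drop packages whose minimum Python is newer than the running one."""
    current = python_version or sys.version_info[:2]
    if include_incompatible:
        return list(packages)
    return [pkg for pkg in packages if current >= pkg.min_python]


catalog = [
    CatalogPackage("data-designer-calculator", (3, 9)),
    CatalogPackage("data-designer-github", (3, 12)),
]
print([pkg.name for pkg in compatible_packages(catalog, python_version=(3, 10))])
```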

### Plugin Install/Uninstall
```
User invokes command (e.g., `data-designer plugin install calculator`)
  → PluginCatalogController resolves the plugin package name or package alias
  → PluginCatalogService evaluates Python and Data Designer compatibility
  → PluginInstallService chooses uv or pip and builds the command.
      In active uv projects it uses `uv add` so the package is recorded in
      `pyproject.toml`; otherwise it installs into the current Python environment.
      Data Designer itself is already installed, so its packages are not reinstalled
      or replaced while installing plugin dependencies.
  → PluginInstallService verifies Data Designer can discover the package's runtime plugins
```

```
User invokes command (e.g., `data-designer plugin uninstall calculator`)
  → PluginCatalogController resolves the plugin package name or package alias
  → PluginInstallService chooses uv or pip and builds the uninstall command.
      Active uv projects remove the dependency from project metadata and uninstall
      the package from the current environment.
  → PluginInstallService verifies Data Designer no longer discovers the package's runtime plugins
```

### Generation
```
User invokes command (e.g., `data-designer create config.yaml`)
@@ -73,8 +116,9 @@ User invokes command (e.g., `data-designer create config.yaml`)
## Design Decisions

- **Lazy command loading** keeps `data-designer --help` responsive: command modules (and their heavy dependencies, such as the engine and model stacks) load only when a command is invoked, not at process startup.
- **Controller/service/repo for config, direct API for generation**: config management benefits from the layered pattern (testable services, swappable repositories). Generation doesn't need this indirection; it delegates to the same `DataDesigner` class that Python users call directly.
- **`DATA_DESIGNER_HOME`** centralizes all CLI-managed state (model configs, provider configs, tool configs, personas) in a single directory, defaulting to `~/.data_designer/`.
- **Controller/service/repo for setup workflows, direct API for generation**: config and plugin catalog workflows benefit from the layered pattern (testable services, swappable repositories). Generation doesn't need this indirection; it delegates to the same `DataDesigner` class that Python users call directly.
- **`DATA_DESIGNER_HOME`** centralizes CLI-managed state (model configs, provider configs, MCP provider configs, tool configs, managed assets, plugin catalog aliases, and catalog caches) in a single directory, defaulting to `~/.data-designer/`.
- **Package-first plugin catalogs** match how users install plugins: one package can provide one or more runtime plugins, but install and uninstall commands always target the package.
- **Rich-based UI** provides formatted tables, progress bars, and interactive prompts without requiring a web interface.

## Cross-References
138 changes: 131 additions & 7 deletions packages/data-designer/src/data_designer/cli/README.md
@@ -1,14 +1,19 @@
# 🎨 NeMo Data Designer CLI

This directory contains the Command-Line Interface (CLI) for configuring model providers and model configurations used in Data Designer.
This directory contains the Command-Line Interface (CLI) for configuring model providers, model configurations, MCP providers, tool configs, managed assets, and plugin catalogs used in Data Designer.

## Overview

The CLI provides an interactive interface for managing:
- **Model Providers**: LLM API endpoints (NVIDIA, OpenAI, Anthropic, custom providers)
- **Model Configs**: Specific model configurations with inference parameters
- **MCP Providers**: MCP server configurations for tool integration
- **Tool Configs**: Tool definitions used by configured models and workflows
- **Managed Assets**: Persona dataset downloads under the Data Designer home directory
- **Plugin Catalogs**: Catalog aliases for finding Data Designer plugin packages
- **Plugin Packages**: Install and uninstall packages from catalogs, check version compatibility first, and verify Data Designer can discover the plugins they provide

Configuration files are stored in `~/.data-designer/` by default and can be referenced by Data Designer workflows.
Configuration files and CLI-managed state are stored in `~/.data-designer/` by default.

## Architecture

@@ -17,7 +22,7 @@ The CLI follows a **layered architecture** pattern, separating concerns into dis
```
┌─────────────────────────────────────────────────────────────┐
│                          Commands                           │
│  Entry points for CLI commands (list, providers, models)    │
│  Entry points for CLI commands (config, download, plugin)   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
@@ -50,9 +55,13 @@ The CLI follows a **layered architecture** pattern, separating concerns into dis
- Handle top-level error reporting
- **Files**:
- `list.py`: List current configurations
- `mcp.py`: Configure MCP providers
- `models.py`: Configure models
- `providers.py`: Configure providers
- `download.py`: Download managed assets
- `plugin.py`: Discover, install, and uninstall plugin packages from catalogs
- `reset.py`: Reset/delete configurations
- `tools.py`: Configure tool configs

#### 2. **Controllers** (`controllers/`)
- **Purpose**: Orchestrate user workflows and coordinate between services, forms, and UI
@@ -62,8 +71,12 @@ The CLI follows a **layered architecture** pattern, separating concerns into dis
- Handle user navigation and session state
- Manage associated resource deletion (e.g., deleting models when provider is deleted)
- **Files**:
- `download_controller.py`: Orchestrates managed asset download workflows
- `mcp_provider_controller.py`: Orchestrates MCP provider configuration workflows
- `model_controller.py`: Orchestrates model configuration workflows
- `provider_controller.py`: Orchestrates provider configuration workflows
- `plugin_catalog_controller.py`: Orchestrates plugin catalog browsing, alias management, and package workflows
- `tool_controller.py`: Orchestrates tool configuration workflows

**Key Features**:
- **Associated Resource Management**: When deleting a provider, the controller checks for associated models and prompts the user to delete them together
@@ -77,8 +90,12 @@ The CLI follows a **layered architecture** pattern, separating concerns into dis
- Coordinate between multiple repositories when needed
- Handle default management (e.g., default provider selection)
- **Files**:
- `mcp_provider_service.py`: MCP provider configuration business logic
- `model_service.py`: Model configuration business logic
- `provider_service.py`: Provider business logic
- `plugin_catalog_service.py`: Plugin catalog loading, search, compatibility checks, and installed plugin listing
- `plugin_install_service.py`: Chooses and runs uv or pip commands for installing/uninstalling plugin packages, keeps installed Data Designer packages in place, and verifies installed plugins
- `tool_service.py`: Tool configuration business logic

**Key Methods**:
- `list_all()`: Get all configured items
@@ -91,16 +108,20 @@ The CLI follows a **layered architecture** pattern, separating concerns into dis
- `set_default()`, `get_default()`: Manage default provider (providers only)

#### 4. **Repositories** (`repositories/`)
- **Purpose**: Handle data persistence (YAML file I/O)
- **Purpose**: Handle data persistence and read-only reference metadata
- **Responsibilities**:
- Load configuration from YAML files
- Save configuration to YAML files
- Check file existence
- Delete configuration files
- Check file existence and delete configuration files where applicable
- Provide read-only metadata for built-in managed assets
- **Files**:
- `base.py`: Abstract base repository with common operations
- `mcp_provider_repository.py`: MCP provider configuration persistence
- `model_repository.py`: Model configuration persistence
- `persona_repository.py`: Read-only persona locale metadata
- `provider_repository.py`: Provider persistence
- `plugin_catalog_repository.py`: Plugin catalog aliases, catalog fetching, and URL-keyed catalog cache
- `tool_repository.py`: Tool configuration persistence

**Base Repository Pattern**:
```python
@@ -122,8 +143,10 @@ class ConfigRepository(ABC, Generic[T]):
- `builder.py`: Abstract form builder base
- `field.py`: Form field types (TextField, SelectField, NumericField)
- `form.py`: Form container and prompt orchestration
- `mcp_provider_builder.py`: Interactive MCP provider configuration builder
- `model_builder.py`: Interactive model configuration builder
- `provider_builder.py`: Interactive provider configuration builder
- `tool_builder.py`: Interactive tool configuration builder

**Form Features**:
- Field-level validation
@@ -152,7 +175,7 @@ class ConfigRepository(ABC, Generic[T]):

## Configuration Files

The CLI manages two YAML configuration files:
The CLI manages YAML configuration files, managed assets, and plugin catalog caches under `~/.data-designer/`:

### `~/.data-designer/model_providers.yaml`

@@ -206,6 +229,61 @@ model_configs:
max_parallel_requests: 4
```

### `~/.data-designer/mcp_providers.yaml`

Stores MCP provider configurations:

```yaml
providers:
- name: local-tools
provider_type: stdio
command: python
args:
- "-m"
- my_mcp_server
```

### `~/.data-designer/tool_configs.yaml`

Stores tool configurations that reference MCP providers:

```yaml
tool_configs:
- tool_alias: research-tools
providers:
- local-tools
max_tool_call_turns: 5
```

### `~/.data-designer/managed-assets/`

Stores managed assets downloaded by CLI commands such as
`data-designer download personas`. Set `DATA_DESIGNER_MANAGED_ASSETS_PATH` to
store managed assets outside `DATA_DESIGNER_HOME`.

### `~/.data-designer/plugin_catalogs.yaml`

Stores user-added plugin catalog aliases. The built-in NVIDIA catalog points at
`https://nvidia-nemo.github.io/DataDesignerPlugins/catalog/plugins.json`, is
always available, and is not written to this file. Set
`DATA_DESIGNER_DEFAULT_PLUGIN_CATALOG_URL` to repoint the built-in catalog for QA or
staging.

```yaml
catalogs:
- alias: research
url: https://raw.githubusercontent.com/acme/dd-plugins/main/catalog/plugins.json
```

### `~/.data-designer/plugin-catalog-cache/`

Stores fetched plugin catalog payloads as JSON cache files keyed by catalog alias and URL hash. This prevents a re-pointed alias from serving stale catalog data from a previous URL.
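
One way to key a cache file by both alias and URL is sketched below. The file-name layout, the SHA-256 choice, and the 16-character truncation are assumptions for illustration; the repository's actual naming scheme is not shown in this change.

```python
import hashlib
from pathlib import Path


def catalog_cache_path(cache_dir: Path, alias: str, url: str) -> Path:
    """Build a cache file path that changes whenever the alias's URL changes."""
    # Hypothetical scheme: "<alias>-<first 16 hex chars of sha256(url)>.json"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{alias}-{digest}.json"


cache_dir = Path.home() / ".data-designer" / "plugin-catalog-cache"
old = catalog_cache_path(cache_dir, "research", "https://example.com/v1/plugins.json")
new = catalog_cache_path(cache_dir, "research", "https://example.com/v2/plugins.json")
print(old.name != new.name)
```

Because the URL hash is part of the file name, re-pointing `research` at a new URL looks up a different file and cannot read the payload cached for the old URL.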

Plugin package arguments accept either the full package name or the package
alias. For packages named `data-designer-{alias}`, the alias is `{alias}`. For
example, `data-designer-github` can be addressed as `github` in `info`,
`install`, and `uninstall`.

## Usage Examples

### Configure Providers
@@ -248,3 +326,49 @@ data-designer config list
# Delete configuration files (with confirmation)
data-designer config reset
```

### Discover, Install, and Uninstall Plugin Packages

```bash
# List compatible plugin packages from the default NVIDIA catalog
data-designer plugin list

# Search a specific catalog
data-designer plugin --catalog research search transform

# Show package metadata, compatibility, docs, and the install command
data-designer plugin info github

# Install a plugin package from a catalog and verify Data Designer can discover its plugins
data-designer plugin install github --yes

# Preview without changing the current environment
data-designer plugin install github --dry-run

# Uninstall a plugin package and verify Data Designer no longer discovers its plugins
data-designer plugin uninstall github --yes

# Preview without changing the current environment
data-designer plugin uninstall github --dry-run

# Add and manage catalog aliases
data-designer plugin catalog add research https://github.com/acme/dd-plugins
data-designer plugin catalog list
data-designer plugin catalog remove research

# List installed runtime plugin entry points without importing plugin modules
data-designer plugin installed
```

When installing a plugin package, the CLI first checks the package's Python and
Data Designer version requirements. The plugin package and its other
dependencies are installed normally, but the currently installed Data Designer
packages (`data-designer`, `data-designer-config`, and `data-designer-engine`)
are kept in place. This prevents a plugin dependency from upgrading,
downgrading, or reinstalling Data Designer itself.

In an active virtual environment with a user `pyproject.toml`, the CLI runs
`uv add` so the plugin package is recorded in the project. Otherwise the CLI
installs into the current Python environment with `uv pip install` or `pip`.
`uv` plugin installs require `uv >= 0.6.0`; auto mode falls back to `pip` when
`uv` is missing or too old. `pip` remains supported for pip-only environments.
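
The auto-mode selection described above can be sketched as a pure function. The function name, its parameters, and the exact argument lists are assumptions; the real `PluginInstallService` may pass extra flags (for example, to keep the installed Data Designer packages in place) that are not reproduced here.

```python
import sys
from typing import List, Optional, Tuple

MIN_UV_VERSION = (0, 6, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    # Assumes a plain numeric version such as "0.6.0"; pre-release
    # suffixes are not handled in this sketch.
    return tuple(int(part) for part in text.split(".")[:3])


def build_install_command(
    package: str, uv_version: Optional[str], in_uv_project: bool
) -> List[str]:
    """Pick uv or pip for a plugin package install (auto mode)."""
    uv_usable = uv_version is not None and parse_version(uv_version) >= MIN_UV_VERSION
    if not uv_usable:
        # uv is missing or too old: fall back to pip in the current interpreter.
        return [sys.executable, "-m", "pip", "install", package]
    if in_uv_project:
        # Record the dependency in the project's pyproject.toml.
        return ["uv", "add", package]
    return ["uv", "pip", "install", package]


print(build_install_command("data-designer-github", "0.7.2", in_uv_project=True))
```

Keeping the decision separate from the subprocess call makes it straightforward to preview the command, which is what a `--dry-run` style option needs.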