NVIDIA-NeMo · eric-tramel · May 5, 2026
@@ -8,3 +8,4 @@
 
 # Plugins
 /plugins/data-designer-template/ @NVIDIA-NeMo/data_designer_reviewers
+/plugins/data-designer-visual-search/ eric.tramel@gmail.com
@@ -0,0 +1,86 @@
+# Practical Examples
+
+## Branch From an Earlier Crop
+
+The image workspace is tree-shaped. A model can create one crop, inspect it,
+then operate on the original image again:
+
+1. `open_image()` returns `img_0000`.
+2. `crop_image(image_id="img_0000", x=0, y=0, width=50, height=50, unit="percent")`
+   returns `img_0001`.
+3. `edit_color(image_id="img_0001", contrast=1.5)` returns `img_0002`.
+4. `crop_image(image_id="img_0000", x=50, y=50, width=50, height=50, unit="percent")`
+   returns `img_0003`.
+
+The resulting history preserves both branches:
+
+```text
+img_0000 open_image
+|-- img_0001 crop_image
+|   `-- img_0002 edit_color
+`-- img_0003 crop_image
+```
+
+This is useful when the model needs to compare multiple areas or recover from a
+crop that turned out to be unhelpful.
+
+## Read Small Text
+
+```python
+builder.add_column(
+    name="label_text",
+    column_type="visual-search",
+    image_column="product_photo",
+    prompt=(
+        "Find the ingredients label. Crop tightly around it, increase contrast "
+        "if needed, and return the text you can read."
+    ),
+    model_alias="vision",
+    max_tool_call_turns=5,
+)
+```
+
+Expected model behavior:
+
+- Inspect the original image.
+- Crop the label region.
+- Optionally increase contrast or convert to grayscale.
+- Answer using the attached edited crop.
+
+## Compare Two Regions
+
+```python
+builder.add_column(
+    name="comparison",
+    column_type="visual-search",
+    image_column="shelf_image",
+    prompt=(
+        "Compare the price tags on the left and right sides of the shelf. "
+        "Use separate crops and report which price is lower."
+    ),
+    model_alias="vision",
+    max_tool_call_turns=6,
+)
+```
+
+The model can crop the left tag from `img_0000`, crop the right tag from
+`img_0000`, inspect both resulting IDs, and answer from the evidence.
+
+## Data URI Input
+
+The `image_column` can contain base64 data or a full data URI instead of a file
+path:
+
+```python
+builder.add_column(
+    name="base64_answer",
+    column_type="visual-search",
+    image_column="image_data_uri",
+    prompt="Crop the center of the image and describe what is visible.",
+    model_alias="vision",
+)
+```
+
+If values are raw base64 and the format cannot be detected reliably, set
+`image_data_type="base64"` and `image_format="png"` or another supported image
+format.
@@ -0,0 +1,94 @@
+# data-designer-visual-search
+
+`data-designer-visual-search` adds a `visual-search` column type for
+image-grounded visual search workflows. It is intended for cases where a VLM
+needs to inspect an image, crop into regions, transform the view, adjust color,
+and then continue reasoning over the resulting image.
+
+The plugin owns the extra plumbing that ordinary model tool calling does not
+handle: each local image operation returns an `image_id`, the new image is held
+in memory, and the generated image is attached back into the next model turn as
+multimodal context.
+
+## What It Provides
+
+- A `VisualSearchColumnConfig` registered as column type `visual-search`.
+- A row-scoped in-memory image workspace.
+- Local tools for opening images, listing image IDs, inspecting image metadata,
+  cropping, transforming, and editing color.
+- Tree-shaped image history, so the model can branch from any previous
+  `image_id` instead of following a single linear edit chain.
+- A default side-effect column named `{column_name}__image_history` that records
+  image IDs, parent IDs, child IDs, operations, dimensions, and operation
+  metadata.
+- Optional model trace and reasoning-content side-effect columns that match the
+  conventions used by Data Designer LLM columns.
+
+## Column Interface
+
+| Field | Required | Description |
+| --- | --- | --- |
+| `name` | Yes | Output column name. |
+| `column_type` | Yes | Must be `visual-search`. |
+| `image_column` | Yes | Existing column containing a local image path, URL, base64 image, or image data URI. |
+| `prompt` | Yes | Jinja2 prompt template for the visual search task. |
+| `model_alias` | Yes | Alias of a vision-capable chat model in the Data Designer config. |
+| `system_prompt` | No | Optional Jinja2 system prompt appended to the built-in visual search instructions. |
+| `image_data_type` | No | Optional explicit image data type, such as `url` or `base64`. Leave unset for auto-detection. |
+| `image_format` | Conditional | Required when `image_data_type` is explicitly `base64`. |
+| `image_placeholder` | No | Optional text token to include next to every image attachment for endpoints that require one. |
+| `max_tool_call_turns` | No | Maximum tool-calling turns per row. Defaults to `6`. |
+| `allowed_tools` | No | Optional allowlist of built-in visual tools. Defaults to all tools. |
+| `attach_images_after_tool_calls` | No | Whether to attach tool-created images into the next model turn. Defaults to `True`. |
+| `include_image_history` | No | Whether to write `{name}__image_history`. Defaults to `True`. |
+| `with_trace` | No | Optional trace capture mode. Defaults to `none`. |
+| `extract_reasoning_content` | No | Whether to write `{name}__reasoning_content`. Defaults to `False`. |
+| `use_default_system_prompt` | No | Whether to prepend built-in image-tool instructions. Defaults to `True`. |
+
+## Built-In Tools
+
+| Tool | Purpose |
+| --- | --- |
+| `open_image` | Opens the configured row image and returns the root `image_id`. |
+| `get_image_info` | Returns dimensions, parent ID, children IDs, operation name, and metadata for an `image_id`. |
+| `list_images` | Lists every image currently held in the row workspace. |
+| `crop_image` | Crops an existing image by pixel or percent coordinates and returns a new `image_id`. |
+| `transform_image` | Rotates, flips, or resizes an existing image and returns a new `image_id`. |
+| `edit_color` | Adjusts brightness, contrast, saturation, sharpness, grayscale, or inversion and returns a new `image_id`. |
+
+Tool results are ordinary tool messages containing JSON metadata. When a tool
+creates an image, the plugin also attaches that image to the next user turn so
+the model can inspect it visually.
+
+## Image History
+
+Every image node has stable metadata:
+
+```json
+{
+  "image_id": "img_0001",
+  "parent_image_id": "img_0000",
+  "children_image_ids": [],
+  "operation": "crop_image",
+  "width": 512,
+  "height": 384,
+  "metadata": {
+    "box": {"left": 0, "top": 0, "right": 512, "bottom": 384},
+    "unit": "pixels"
+  }
+}
+```
+
+Because the model controls the `image_id` argument, it can crop from the root
+image, transform that crop, rewind to the root, and crop a different region.
+The workspace keeps the whole tree for the duration of that row.
+
+## When To Use It
+
+Use `visual-search` when the model needs iterative visual operations before it
+can answer reliably. Good examples include reading small labels, comparing
+regions, checking color after contrast adjustment, or zooming into a specific
+part of a larger image.
+
+For a single prompt over an image with no iterative image manipulation, a
+standard Data Designer LLM column with multimodal context may be simpler.
@@ -0,0 +1,125 @@
+# Usage
+
+This example starts with a dataframe column containing image paths and adds a
+`visual-search` column. The model can call image tools while answering the
+prompt, and the plugin will pass each resulting crop or edited image back to the
+model automatically.
+
+```python
+import pandas as pd
+
+from data_designer.config.config_builder import DataDesignerConfigBuilder
+from data_designer.config.models import ChatCompletionInferenceParams, ModelConfig, ModelProvider
+from data_designer.config.seed_source_dataframe import DataFrameSeedSource
+from data_designer.interface.data_designer import DataDesigner
+
+seed_df = pd.DataFrame(
+    {
+        "image_path": ["/path/to/store-shelf.png"],
+        "target": ["the nutrition label on the cereal box"],
+    }
+)
+
+provider = ModelProvider(
+    name="nvidia",
+    endpoint="https://integrate.api.nvidia.com/v1",
+    api_key="NVIDIA_API_KEY",
+    provider_type="openai",
+)
+
+vision_model = ModelConfig(
+    alias="vision",
+    model="qwen/qwen3.5-122b-a10b",
+    provider="nvidia",
+    inference_parameters=ChatCompletionInferenceParams(
+        temperature=0,
+        max_tokens=512,
+        timeout=60,
+    ),
+)
+
+builder = DataDesignerConfigBuilder(model_configs=[vision_model])
+builder.with_seed_dataset(DataFrameSeedSource(df=seed_df))
+builder.add_column(
+    name="visual_answer",
+    column_type="visual-search",
+    image_column="image_path",
+    prompt=(
+        "Find {{ target }}. Use crop_image or edit_color if that helps. "
+        "Return the text you can read and explain which image_id you used."
+    ),
+    model_alias="vision",
+    max_tool_call_turns=4,
+)
+
+result = DataDesigner(
+    artifact_path="artifacts",
+    model_providers=[provider],
+).preview(builder, num_records=1)
+```
+
+The generated dataset includes:
+
+- `visual_answer`: the model's final answer.
+- `visual_answer__image_history`: the image operation tree produced while
+  answering the row.
+
+## Restricting Tools
+
+Use `allowed_tools` when you want the model to perform only a narrower set of
+operations:
+
+```python
+builder.add_column(
+    name="crop_only_answer",
+    column_type="visual-search",
+    image_column="image_path",
+    prompt="Crop the upper-right quadrant and describe the dominant color.",
+    model_alias="vision",
+    allowed_tools=["open_image", "get_image_info", "crop_image"],
+    max_tool_call_turns=2,
+)
+```
+
+## Endpoint Image Tokens
+
+Most OpenAI-compatible multimodal endpoints accept image content blocks directly.
+Some model servers also require a model-specific image token in the text for
+each attached image. Set `image_placeholder` for those endpoints:
+
+```python
+builder.add_column(
+    name="answer",
+    column_type="visual-search",
+    image_column="image_path",
+    prompt="Inspect the attached image and answer the question.",
+    model_alias="vision",
+    image_placeholder="<image>",
+)
+```
+
+The plugin prepends the placeholder to the initial image turn and to every later
+turn that attaches a tool-created image.
+
+## Capturing Trace Output
+
+The column supports the same trace side-effect pattern as other LLM-backed Data
+Designer columns:
+
+```python
+from data_designer.config.utils.trace_type import TraceType
+
+builder.add_column(
+    name="answer_with_trace",
+    column_type="visual-search",
+    image_column="image_path",
+    prompt="Zoom into the serial number and read it.",
+    model_alias="vision",
+    with_trace=TraceType.ALL_MESSAGES,
+    extract_reasoning_content=True,
+)
+```
+
+This adds `answer_with_trace__trace` and
+`answer_with_trace__reasoning_content` when the selected model provides
+reasoning content.
@@ -16,4 +16,15 @@ Browse available Data Designer plugins by what they add to your data generation
       <span class="plugin-doc-card__chips"><span class="plugin-doc-chip">text-transform</span></span>
     </span>
   </a>
+  <a class="plugin-doc-card" href="data-designer-visual-search/" aria-label="Open data-designer-visual-search documentation">
+    <span class="plugin-doc-card__header">
+      <span class="plugin-doc-card__title">data-designer-visual-search</span>
+      <span class="plugin-doc-card__version">v0.1.0</span>
+    </span>
+    <span class="plugin-doc-card__description">Visual search column with local image crop, transform, and color-edit tools</span>
+    <span class="plugin-doc-card__section">
+      <span class="plugin-doc-card__label">Column types</span>
+      <span class="plugin-doc-card__chips"><span class="plugin-doc-chip">visual-search</span></span>
+    </span>
+  </a>
 </div>
@@ -0,0 +1,3 @@
+# Owner(s) of this plugin — used to generate the root CODEOWNERS file.
+# GitHub accepts @username, @org/team, or email format.
+* eric.tramel@gmail.com
@@ -0,0 +1,61 @@
+# data-designer-visual-search
+
+Data Designer plugin for VLM-driven visual search over image columns, with
+local image crop, transform, and color-edit tools.
+
+The `visual-search` column runs a vision-capable chat model with built-in
+image-operation tools:
+
+- `open_image`
+- `get_image_info`
+- `list_images`
+- `crop_image`
+- `transform_image`
+- `edit_color`
+
+Each operation returns an `image_id`. The column keeps intermediate images in
+memory and re-attaches tool-produced images to the following model turn, so the
+model can inspect a crop or transformed image before deciding what to do next.
+Because IDs remain addressable, the model can branch from an earlier image
+rather than being forced through a linear edit chain.
+
+## Installation
+
+```bash
+pip install data-designer-visual-search
+```
+
+## Usage
+
+Once installed, the `visual-search` column type is automatically discovered by
+[NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner).
+
+```python
+import pandas as pd
+from data_designer.config.config_builder import DataDesignerConfigBuilder
+from data_designer.config.seed_source_dataframe import DataFrameSeedSource
+from data_designer.interface.data_designer import DataDesigner
+
+seed_df = pd.DataFrame({"image_path": ["/path/to/scene.png"]})
+
+builder = DataDesignerConfigBuilder()
+builder.with_seed_dataset(DataFrameSeedSource(df=seed_df))
+builder.add_column(
+    name="visual_answer",
+    column_type="visual-search",
+    image_column="image_path",
+    prompt="Find the red object. Crop or transform the image if that helps.",
+    model_alias="nvidia-vision",
+    # Optional: set a model-specific image token here if your endpoint requires
+    # one in the text for every attached image.
+    # image_placeholder="<image>",
+)
+
+result = DataDesigner(artifact_path="artifacts").preview(builder, num_records=1)
+```
+
+The main output column contains the model's final answer. By default the plugin
+also writes `{column_name}__image_history`, a compact tree of image IDs, parent
+IDs, operations, and dimensions.
+
+See `docs/` for the full interface reference and practical examples.
Original file line number	Diff line number	Diff line change
Expand Up		@@ -8,3 +8,4 @@

		# Plugins
		/plugins/data-designer-template/ @NVIDIA-NeMo/data_designer_reviewers
		/plugins/data-designer-visual-search/ eric.tramel@gmail.com