feat: Support more Data Designer seed sources #413

Open
mikeknep wants to merge 12 commits into main from remote-seeds/mknepper

mikeknep commented Jun 23, 2026
Contributor

Adds support for DirectorySeedSource and FileContentsSeedSource seed datasets in remote contexts. Depends on NVIDIA-NeMo/DataDesigner#765 getting merged and a new release of the library getting published.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for directory-based and file content-based seed sources from the Files service for remote execution
    • Improved seed validation to return canonical root paths for validated seed sources
  • Tests

    • Added comprehensive test coverage for filesystem-based seed source workflows in remote execution

mikeknep added 12 commits June 23, 2026 11:38
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
…rted

Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
…d sources

Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>

coderabbitai Bot commented Jun 23, 2026


📝 Walkthrough

Adds remote filesystem seed support (DirectorySeedSource, FileContentsSeedSource) to the NeMo Data Designer remote execution path. validate_seed now returns a canonical fileset root string, FilesetFileSystemProvider and _FilesetDirFileSystem are introduced to scope seed readers to validated fileset roots, and RemoteDataDesignerContext is wired to collect and pass those roots. ModelProviderRegistry early-validation for missing providers is removed.
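
The root-scoping idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `ScopedRootProvider`, its constructor arguments, the caller-supplied `exists` check, and the `default/seeds#docs` root format are all hypothetical stand-ins. Only the method name `ensure_root_exists` and the notion of a set of validated roots come from the walkthrough.

```python
from typing import Callable


class ScopedRootProvider:
    """Hypothetical stand-in: only serves roots that were validated up front."""

    def __init__(self, validated_roots: set[str], exists: Callable[[str], bool]) -> None:
        self._validated_roots = set(validated_roots)
        self._exists = exists  # assumed remote existence check supplied by the caller
        self._confirmed: set[str] = set()  # roots already confirmed to exist

    def ensure_root_exists(self, root: str) -> None:
        # Reject anything that did not come out of seed validation.
        if root not in self._validated_roots:
            raise PermissionError(f"root was not validated: {root}")
        if root in self._confirmed:
            return  # cached, so the existence check is not repeated
        if not self._exists(root):
            raise FileNotFoundError(root)
        self._confirmed.add(root)


calls: list[str] = []


def fake_exists(root: str) -> bool:
    """Records each lookup so the caching behaviour is observable."""
    calls.append(root)
    return True


provider = ScopedRootProvider({"default/seeds#docs"}, fake_exists)
provider.ensure_root_exists("default/seeds#docs")
provider.ensure_root_exists("default/seeds#docs")  # second call is served from the cache
```

In this sketch the second call never reaches `fake_exists`, and a root that was not validated is rejected before any lookup happens.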

Changes

Remote Filesystem Seed Support

| Layer / File(s) | Summary |
|---|---|
| validate_seed returns canonical fileset root: packages/data_designer_nemo/src/data_designer_nemo/seed.py, packages/data_designer_nemo/src/data_designer_nemo/unsupported_features.py | validate_seed signature changes from -> None to `-> str |
| FilesetFileSystemProvider and _FilesetDirFileSystem: packages/data_designer_nemo/src/data_designer_nemo/fileset_filesystem_provider.py | New module: _FilesetDirFileSystem overrides _relpath for #-separated fileset paths; FilesetFileSystemProvider exposes create_context and ensure_root_exists with a validated-roots cache to back directory-style seed readers. |
| RemoteDataDesignerContext wiring: packages/data_designer_nemo/src/data_designer_nemo/context.py | Adds _validated_filesystem_roots field, captures canonical root from validate_seed into that set, and passes FilesetFileSystemProvider (initialized with those roots) to DirectorySeedReader and FileContentsSeedReader. |
| ModelProviderRegistry early-validation removal: packages/data_designer_nemo/src/data_designer_nemo/model_provider.py | Removes explicit default=_NO_OP from make_null_registry and removes upfront NDDInvalidConfigError for provider=None configs in make_local_first_model_provider_registry, deferring that error to ModelProviderCollection.add(). |
| Unit tests: FilesetFileSystemProvider and remote seeds: packages/data_designer_nemo/tests/unit/test_fileset_filesystem_provider.py, packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py | New unit tests cover create_context, ensure_root_exists (cached and error paths), validate_seed canonical root return, remote context seed reader types, and seed type validation acceptance for filesystem sources. |
| Integration tests and fixture alignment: plugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.py, plugins/nemo-data-designer/tests/integration/test_preview_remote_sdk.py, plugins/nemo-data-designer/tests/integration/test_remote_validation_errors.py, plugins/nemo-data-designer/tests/integration/test_validate_sdk.py, plugins/nemo-data-designer/tests/unit/test_model_provider.py, plugins/nemo-data-designer/tests/unit/test_preview_function.py, plugins/nemo-data-designer/tests/unit/test_sdk_resources.py, packages/data_designer_nemo/tests/unit/test_model_configs.py | Integration tests add DirectorySeedSource and FileContentsSeedSource preview scenarios; setup_mock_file gains remote_path override; async_to_sync_sdk patched in fixtures. Multiple fixtures updated to include provider="default/nvidia" in ModelConfig; removed-provider-validation test deleted; seed type parameterization narrowed. |
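
To illustrate what a relative-path computation for #-separated fileset paths might involve, here is a small sketch. It is not the `_relpath` override from the PR: the function name `fileset_relpath` and the assumed path shape (`<fileset>#<inner/path>`, with everything before `#` naming the fileset) are hypothetical and chosen only for illustration.

```python
def fileset_relpath(path: str, root: str) -> str:
    """Return `path` relative to `root`; both are assumed to look like 'fileset#inner/path'."""
    fileset, _, inner = path.partition("#")
    root_fileset, _, root_inner = root.partition("#")
    if fileset != root_fileset:
        raise ValueError(f"{path!r} is not in fileset {root_fileset!r}")

    inner = inner.strip("/")
    root_inner = root_inner.strip("/")
    if not root_inner:
        return inner  # root is the fileset itself, so the whole inner path is relative
    if inner == root_inner:
        return ""  # path and root are the same location
    prefix = root_inner + "/"
    if not inner.startswith(prefix):
        raise ValueError(f"{path!r} is not under root {root!r}")
    return inner[len(prefix):]


nested = fileset_relpath("default/seeds#docs/a/b.txt", "default/seeds#docs")
top_level = fileset_relpath("default/seeds#a.txt", "default/seeds")
```

The point of the sketch is that the `#` boundary is split off first, so the fileset name is compared as a whole and only the inner portion is treated as a slash-separated path.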

Possibly related PRs

  • NVIDIA-NeMo/nemo-platform#99: Modifies RemoteDataDesignerContext.validate() and the validate_seed() aggregation flow in context.py, directly overlapping with this PR's changes to the same code path.

Suggested labels

feat

Suggested reviewers

  • matthewgrossman
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title accurately describes the main change: adding support for additional seed source types (DirectorySeedSource and FileContentsSeedSource) in remote contexts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py (1)

4-4: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Remove postponed annotations import in this test module.

Line 4 enables postponed/string-based annotations; keep runtime-resolved concrete hints instead.

Suggested change
-from __future__ import annotations
-
 from typing import Any

As per coding guidelines "Always prefer concrete type hints over string-based ones in Python code; do not import types under TYPE_CHECKING, instead import types as regular imports when possible".
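
For context on what the flagged import changes, the sketch below compares a function compiled with `from __future__ import annotations` against one without it. The function names are hypothetical; the postponed variant is compiled from a string only so that both behaviours can sit in one snippet, since the `__future__` import must otherwise be the first statement in a module.

```python
# Source compiled with postponed annotations enabled.
POSTPONED_SOURCE = '''
from __future__ import annotations

def load_seed(path: str, limit: int = 10) -> list[str]:
    return [path] * limit
'''

namespace: dict[str, object] = {}
exec(POSTPONED_SOURCE, namespace)
postponed = namespace["load_seed"]


def load_seed_concrete(path: str, limit: int = 10) -> list[str]:
    """Same signature, defined without the __future__ import."""
    return [path] * limit


# With the import, annotations are stored as strings such as 'list[str]'.
print(postponed.__annotations__)
# Without it, they are the actual type objects.
print(load_seed_concrete.__annotations__)
```

Anything that inspects `__annotations__` directly at runtime sees strings in the first case and real types in the second, which is the distinction the guideline is concerned with.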

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py` at
line 4, Remove the `from __future__ import annotations` import statement from
the top of the test_remote_filesystem_seeds.py module. This import enables
postponed evaluation of annotations (string-based type hints), but the coding
guidelines require using concrete type hints instead of string-based ones.
Delete this single import line and ensure any type annotations in the module use
direct type references rather than string representations.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py`:
- Line 4: Remove the `from __future__ import annotations` import statement from
the top of the test_remote_filesystem_seeds.py module. This import enables
postponed evaluation of annotations (string-based type hints), but the coding
guidelines require using concrete type hints instead of string-based ones.
Delete this single import line and ensure any type annotations in the module use
direct type references rather than string representations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e07ae8ef-d677-42f9-9166-d1e26b92627d

📥 Commits

Reviewing files that changed from the base of the PR and between 6d742a3 and 89e0539.

📒 Files selected for processing (15)
  • packages/data_designer_nemo/src/data_designer_nemo/context.py
  • packages/data_designer_nemo/src/data_designer_nemo/fileset_filesystem_provider.py
  • packages/data_designer_nemo/src/data_designer_nemo/model_provider.py
  • packages/data_designer_nemo/src/data_designer_nemo/seed.py
  • packages/data_designer_nemo/src/data_designer_nemo/unsupported_features.py
  • packages/data_designer_nemo/tests/unit/test_fileset_filesystem_provider.py
  • packages/data_designer_nemo/tests/unit/test_model_configs.py
  • packages/data_designer_nemo/tests/unit/test_remote_filesystem_seeds.py
  • plugins/nemo-data-designer/src/nemo_data_designer_plugin/testing/utils.py
  • plugins/nemo-data-designer/tests/integration/test_preview_remote_sdk.py
  • plugins/nemo-data-designer/tests/integration/test_remote_validation_errors.py
  • plugins/nemo-data-designer/tests/integration/test_validate_sdk.py
  • plugins/nemo-data-designer/tests/unit/test_model_provider.py
  • plugins/nemo-data-designer/tests/unit/test_preview_function.py
  • plugins/nemo-data-designer/tests/unit/test_sdk_resources.py
💤 Files with no reviewable changes (3)
  • plugins/nemo-data-designer/tests/integration/test_remote_validation_errors.py
  • plugins/nemo-data-designer/tests/unit/test_model_provider.py
  • packages/data_designer_nemo/src/data_designer_nemo/model_provider.py

@github-actions

Contributor
| Suite | Lines Covered | Line Rate | Branch Rate |
|---|---|---|---|
| Unit Tests | 20839/27661 | 75.3% | 60.3% |
| Integration Tests | 11255/26430 | 42.6% | 16.2% |
