
feat(models): deprecate implicit default provider routing#594

Merged
nabinchha merged 6 commits into main from
nmulepati/refactor/589-deprecate-default-provider-routing
May 5, 2026


@nabinchha nabinchha commented Apr 30, 2026

📋 Summary

Deprecates the legacy "implicit default provider" routing before it's removed in a future release. Every entry point that exercises the implicit default — ModelConfig.provider=None, the registry-level ModelProviderRegistry.default, the YAML default: key, the CLI's "Change default provider" workflow, and the allow_resize async-fallback escape hatch — now emits a DeprecationWarning pointing users at the explicit provider= migration. Continues the work started in #591 / tracked under issue #589.

A second concern surfaced during review: DeprecationWarnings emitted from library frames are silenced under Python's default ignore::DeprecationWarning filter and dedupe against pydantic-internal lines. To make the new warnings actually visible to users, every emission site now goes through a small warn_at_caller helper that walks past pydantic and data_designer frames and attributes the warning to the user's call site.

🔗 Related Issue

Refs #589

🔄 Changes

✨ Added

Deprecation warnings (one per entry point):

  • ModelConfig._warn_on_implicit_provider — pydantic post-validator that warns whenever provider is None (packages/data-designer-config/src/data_designer/config/models.py)
  • ModelProviderRegistry._warn_on_explicit_default — fires only when the caller actually passed default= (uses model_fields_set so the field-default None path stays quiet) (packages/data-designer-engine/src/data_designer/engine/model_provider.py)
  • get_default_provider_name() warns when the on-disk providers YAML carries a default: key
  • ProviderRepository.load warns on the same YAML-default condition for the CLI read path
  • ProviderController._handle_change_default warns when the user enters the "Change default provider" interactive workflow
  • DatasetBuilder._resolve_async_compatibility warns when allow_resize=True forces sync fallback (separate deprecation under issue #552, surfaced through the same helper for consistency)

Visibility / attribution helper:

  • New module packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py exporting warn_at_caller. Walks sys._getframe past every frame whose module belongs to pydantic, pydantic_core, or data_designer, then calls warnings.warn_explicit against the first user frame using that frame's own __warningregistry__ so Python's once-per-location dedup keys correctly. Falls back to warnings.warn if no user frame is reachable.
  • Prefix matching is exact-or-dotted (module == prefix or module.startswith(prefix + ".")) so pydantic_helpers is not mistaken for pydantic, and data_designer_other is not mistaken for data_designer (regression case from review).

Tests:

  • 9 new test_warning_helpers.py cases covering the prefix-matching predicate and the frame-walk semantics (direct caller, library skip, fallback)
  • Per-emission-site regression tests asserting the warning is emitted and warning.filename == __file__ (i.e. attributes to the user's frame, not a library frame). A regression to warnings.warn(..., stacklevel=N) would silently silence these warnings under default filters and now fails the assertion instead.
  • Happy-path "stays quiet" pins for the no-deprecation paths

🔧 Changed

  • All DeprecationWarning emission sites now use warn_at_caller instead of warnings.warn(..., stacklevel=N). stacklevel=N is brittle (any added frame breaks it) and lands on a data_designer.* frame for every realistic call path through controllers, services, builders, and the interface layer — silenced under ignore::DeprecationWarning.
  • resolve_model_provider_registry skips passing default= in the single-provider case so the common construction path stays quiet under the new warning. Multi-provider registries still pass default (per check_implicit_default) and warn accordingly.
  • stub_model_configs fixture and existing ModelConfig-constructing tests now pass provider= explicitly so they don't trip the new warning
  • Docstrings on ModelConfig.provider and ModelProviderRegistry.default annotated as deprecated

📚 Docs

  • Deprecation admonitions added to docs/concepts/models/model-providers.md, default-model-settings.md, custom-model-settings.md, and configure-model-settings-with-the-cli.md
  • Code examples in docs/concepts/architecture-and-performance.md, inference-parameters.md, and the data-designer-config README updated to set provider= explicitly

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py — uses sys._getframe and warnings.warn_explicit. The module docstring spells out (1) why stacklevel=N is wrong for warnings emitted from a pydantic validator or library helper, (2) why module_globals is deliberately omitted from the warn_explicit call (the __main__-as-BuiltinImporter linecache failure mode), and (3) the dedup-key reasoning. New canonical pattern — review with that bar in mind.
  • packages/data-designer-engine/src/data_designer/engine/model_provider.py: _warn_on_explicit_default uses model_fields_set to distinguish "caller passed default=" from "field at default None". The single-provider resolve_model_provider_registry tweak relies on this distinction so common construction paths stay quiet. Worth a careful read.
  • packages/data-designer-config/src/data_designer/config/models.py: _warn_on_implicit_provider runs at construction time, so any ModelConfig built without provider= (including legacy serialized configs loaded via model_validate) will now emit a warning. Confirm this is the intended blast radius.

🧪 Testing

  • make test passes (3,125 tests: 548 config + 1,923 engine + 654 interface)
  • Unit tests added/updated — new regression tests for all 6 warning entry points + 9 cases pinning warn_at_caller semantics
  • Attribution regression tests pin warning.filename == __file__ at every emission site so a future regression to warnings.warn(stacklevel=N) fails CI instead of silently silencing the warning
  • E2E tests added/updated — N/A (no behavior change beyond warnings)

✅ Checklist

Emit DeprecationWarning whenever the legacy "implicit default
provider" path is exercised: `ModelConfig.provider=None`, the
registry-level `ModelProviderRegistry.default`, the YAML
`default:` key in `~/.data-designer/model_providers.yaml`, and
the CLI's "Change default provider" workflow.

`resolve_model_provider_registry` skips passing `default=` in the
single-provider case so the common construction path stays quiet.
Multi-provider registries still pass `default` (per
`check_implicit_default`) and warn accordingly.

Update docs, the package README, and test fixtures to specify
`provider=` explicitly on every `ModelConfig`. New tests cover
each warning entry point and pin the post-deprecation happy paths.

Refs #589

Made-with: Cursor
@nabinchha nabinchha requested a review from a team as a code owner April 30, 2026 22:21

github-actions Bot commented Apr 30, 2026

Docs preview: https://a769da61.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

@github-actions
Contributor

Review: PR #594, feat(models): deprecate implicit default provider routing

Summary

Adds DeprecationWarnings to every entry point of the legacy "implicit default
provider" routing, ahead of its removal (issue #589). Four call sites are
instrumented: ModelConfig construction when provider=None, explicit
ModelProviderRegistry(default=...), YAML default: key loading
(get_default_provider_name / ProviderRepository.load), and the CLI
"Change default provider" workflow. resolve_model_provider_registry is
tweaked so the single-provider common-case stays quiet. Docs are annotated
throughout, code examples updated to pass provider= explicitly, and
regression tests added for every warning entry point plus happy-path "stays
quiet" pins. Pure-additive: no behavior change beyond the warnings.

Findings

🟡 Double warning on ProviderRepository.load with YAML default:

packages/data-designer/src/data_designer/cli/repositories/provider_repository.py:37-44

The repository emits its own DeprecationWarning when config_dict["default"]
is non-None, then immediately calls ModelProviderRegistry.model_validate(config_dict).
Because pydantic v2 records "default" in model_fields_set when the key is
present in the validated dict, _warn_on_explicit_default will fire a second
time. Users see the same deprecation twice per load. Either gate the
repository-level warning behind a guard (e.g., simplefilter("ignore") around
model_validate), or drop the repository-level warning and rely on the
validator-level one.

🟡 Double warning on DataDesigner.__init__ startup path

packages/data-designer/src/data_designer/interface/data_designer.py:161-166

When model_providers=None and the user's YAML has default: "foo":

  1. get_default_provider_name() warns (default_model_settings.py:107).
  2. resolve_model_provider_registry(..., default_provider_name="foo") takes
    the multi-provider/explicit-default branch and constructs
    ModelProviderRegistry(providers=..., default="foo") → fires
    _warn_on_explicit_default.

Same UX issue as the repository path: one user action → two warnings. Worth
suppressing at one of the layers.

🟡 Warning storm from legacy on-disk ModelConfig entries

packages/data-designer-config/src/data_designer/config/default_model_settings.py:71

get_default_model_configs() iterates ModelConfig.model_validate(mc) over
every entry in ~/.data-designer/model_configs.yaml. Any entry serialized
before this change lacks provider=, so _warn_on_implicit_provider fires
once per entry at every startup. The PR description calls out this blast
radius ("including legacy serialized configs loaded via model_validate"),
but worth confirming intent — a user with five legacy configs will see five
identical warnings every run. Consider either deduping by alias in the
validator (emit once per (alias,) using functools.lru_cache or a module
set) or having the YAML loader rewrite missing provider= keys with the
resolved registry default before validation.

🟢 _warn_on_explicit_default fires even when default=None is passed explicitly

packages/data-designer-engine/src/data_designer/engine/model_provider.py:57-71

Using "default" in self.model_fields_set means
ModelProviderRegistry(providers=[...], default=None) warns, even though
it's semantically identical to omitting the argument. Probably fine (callers
shouldn't pass default=None in new code anyway), but a belt-and-braces
tightening would be if self.default is not None — which also happens to
eliminate one of the double-warning cases above naturally. Trade-off: loses
the "you passed the deprecated kwarg" signal on the None path. Flag, not
block.

🟢 Validator stacklevel=2 won't surface user call site under pydantic v2

Both _warn_on_implicit_provider and _warn_on_explicit_default use
stacklevel=2. Pydantic v2 wraps @model_validator(mode="after") functions
through several internal frames, so stacklevel=2 lands inside pydantic
internals, not the user's ModelConfig(...) call. Users will still see the
message but the attributed source line will be unhelpful. A higher
stacklevel or explicit warnings.warn_explicit(..., filename=..., lineno=...)
would help, though it's fragile across pydantic versions. Not blocking.
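
The attribution problem can be reproduced without pydantic. In the sketch below, `dispatch` is a hypothetical stand-in for the internal frames that sit between a user's call and the validator; all names are illustrative and none come from the PR.

```python
import warnings


def validator() -> None:
    # stacklevel=2 means "blame whoever called validator()".
    warnings.warn("provider=None is deprecated", DeprecationWarning, stacklevel=2)


def dispatch() -> None:
    # Hypothetical stand-in for library-internal frames.
    validator()


def user_code() -> None:
    dispatch()  # the line a user would want the warning to point at


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

blamed_line = caught[0].lineno
```

Here `blamed_line` is the `validator()` call inside `dispatch`, not the `dispatch()` call inside `user_code`. Raising `stacklevel` fixes this only for one specific call depth, which is why a fixed number is fragile when the number of intermediate frames can change.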

🟢 Tests

Coverage is solid — each warning entry point has both a "warns" case and a
"stays quiet" pin using simplefilter("error", DeprecationWarning). The
"stays quiet" pattern is particularly good at preventing silent regressions.
Existing tests that construct ModelConfig without provider= are updated
thoroughly (5 test files, stub_model_configs fixture, _make_model helper
in fingerprint tests).
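
The "warns" / "stays quiet" pairing can be sketched with the standard library alone. `make_config` is a hypothetical stand-in for `ModelConfig` construction, and the provider name `"nvidia"` is only an example value; the real tests in this PR may be written differently (for instance with `pytest.warns`).

```python
import warnings
from typing import Optional


def make_config(alias: str, provider: Optional[str] = None) -> dict:
    """Hypothetical stand-in for ModelConfig construction."""
    if provider is None:
        warnings.warn("ModelConfig.provider=None is deprecated", DeprecationWarning, stacklevel=2)
    return {"alias": alias, "provider": provider}


def check_warns() -> None:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        make_config("legacy")
    assert [w.category for w in caught] == [DeprecationWarning]


def check_stays_quiet() -> None:
    with warnings.catch_warnings():
        # Any DeprecationWarning now raises, so an unexpected one fails the check.
        warnings.simplefilter("error", DeprecationWarning)
        make_config("explicit", provider="nvidia")


check_warns()
check_stays_quiet()
```

The "stays quiet" half is the stronger guard: escalating the category to an error turns any accidental emission on the happy path into a test failure.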

Two minor gaps:

  • No test pins the post-deprecation behavior of get_default_model_configs()
    when model_configs.yaml contains legacy entries without provider=,
    i.e., the "warning storm" case from the third finding. Worth at least one
    regression showing N-entries → N-warnings (or whatever the intended
    behavior is once decided).
  • test_resolve_model_provider_registry_with_explicit_default
    (packages/data-designer-engine/tests/engine/test_model_provider.py:82)
    now exercises the deprecation path but isn't wrapped in
    pytest.warns/filterwarnings("ignore"). It passes today because pytest
    doesn't escalate DeprecationWarning by default, but if the project ever
    turns on -W error::DeprecationWarning this test will fail. Low priority.

🟢 Docs

Admonitions are consistent in tone and all point to issue #589. Code examples
in the three docs/concepts/ files and the data-designer-config README are
updated to set provider= explicitly. Nothing else in docs/ constructs
ModelConfig without a provider — scanned the remaining model-docs pages and
they either already pass provider= or show YAML not Python.

🟢 Style / conventions

Verdict

Approve with non-blocking comments. The deprecation machinery is correct,
the tests pin both directions, and the docs make the migration obvious. The
main thing I'd ask the author to consider before merging is whether the
double-warning cases (repository + validator, get_default_provider_name +
registry construction) and the N-entries-N-warnings case on legacy
model_configs.yaml loads are intended UX or accidental. If the plan is
"warn loudly and repeatedly until users migrate," this is already correct;
if the plan is "warn once per user action," a small dedup pass would help.
Either stance is defensible — worth making it explicit in a comment on the
PR.


greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR adds deprecation warnings across all six entry points for implicit default-provider routing (ModelConfig.provider=None, registry-level default=, the YAML default: key, the CLI change-default workflow, and the allow_resize async fallback), using a new warn_at_caller helper that walks the CPython frame stack to attribute warnings to user call sites rather than library frames.

  • New warn_at_caller utility in warning_helpers.py uses sys._getframe + warnings.warn_explicit with exact-or-dotted prefix matching to escape pydantic/data_designer frames before attributing; includes a stacklevel=3 fallback for non-CPython environments.
  • provider_repository.py warning emission is moved outside the except Exception block to prevent filterwarnings("error") from silently swallowing it, and migrated to warn_at_caller for consistent attribution.
  • DataDesigner.__init__ uses a scoped warnings.catch_warnings() + filterwarnings("ignore", …) to suppress the duplicate registry-level warning when the YAML path already emitted the same root-cause deprecation.

Confidence Score: 5/5

Safe to merge — the change is purely additive (new warnings and a new helper), with no behavior change beyond emitting DeprecationWarnings at the correct user call site.

All six deprecation entry points are covered with corresponding regression tests that pin both emission and attribution (warning.filename == __file__). The warn_at_caller helper correctly handles prefix collisions, the model_fields_set guard cleanly separates opt-in from field-default, and the duplicate-warning suppression in DataDesigner.__init__ is scoped tightly to the YAML fallback path. No behavioral regressions are introduced.

No files require special attention. The test_data_designer.py mock uses stacklevel=2 rather than warn_at_caller for the get_default_provider_name simulation, which means the test exercises the suppression logic correctly under always filter but does not mirror the production attribution path.

Important Files Changed

Filename: Overview

  • packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py: New canonical helper; exact-or-dotted prefix matching prevents false positives; warn_explicit registry threading is correct; stacklevel=3 fallback is acknowledged best-effort.
  • packages/data-designer-engine/src/data_designer/engine/model_provider.py: model_fields_set guard correctly distinguishes explicit default= from field-default None; single-provider early return keeps common path quiet under deprecation.
  • packages/data-designer/src/data_designer/cli/repositories/provider_repository.py: Warning emission split out of the broad except block to prevent filterwarnings("error") from swallowing it; migrated to warn_at_caller for correct attribution.
  • packages/data-designer/src/data_designer/interface/data_designer.py: Scoped catch_warnings + filterwarnings("ignore") correctly suppresses the duplicate registry-level warning when the YAML path already warned; nested catch_warnings is properly supported by Python.
  • packages/data-designer/tests/interface/test_data_designer.py: test_init_yaml_default_emits_single_deprecation_warning uses stacklevel=2 in the mock rather than warn_at_caller, attributing to the DataDesigner frame; passes under always filter but does not mirror production attribution.

Reviews (5): Last reviewed commit: "Merge branch 'main' into nmulepati/refac..."

@johnnygreco
Contributor

Thanks for putting this together, @nabinchha — the four entry points are mapped cleanly to issue #589 and the regression tests pin each one. I had a few thoughts after a careful read with edge cases in mind.

Summary

This PR lands the deprecation phase for the implicit-default-provider concept tracked in #589: ModelConfig.provider=None, ModelProviderRegistry.default, the YAML default: key, and the CLI's "Change default provider" flow now emit a DeprecationWarning, with docs/tests/fixtures updated to the explicit-provider= pattern. The implementation faithfully covers the five items the issue called out, and stays additive (no behavior change beyond warnings).

Findings

Greptile has already flagged two issues that I won't repeat here:

  • warnings.warn inside provider_repository.py's try/except Exception block (P1 — real correctness bug under filterwarnings("error", DeprecationWarning)).
  • stacklevel=2 inside Pydantic mode="after" validators pointing into pydantic internals (P2).

Both are legitimate; please address them before merge.

Warnings — Worth addressing

packages/data-designer/src/data_designer/cli/repositories/provider_repository.py:40-46 — Warning copy mixes two YAML files

  • What: The message says "Remove it and refer to providers by name in your ModelConfig entries." But this code path is reading model_providers.yaml, where ModelConfig entries don't live (those are in model_configs.yaml). A user following the message literally goes looking in the wrong file.
  • Why: The same default: key is also surfaced by default_model_settings.get_default_provider_name(), and that copy is clearer ("Remove it and specify provider= explicitly on each ModelConfig instead"). Inconsistency between the two messages doesn't help either.
  • Suggestion: Mirror the wording from default_model_settings.py:107-113 so both YAML-default warning sites give the same migration instruction. Could also extract the message into a module-level constant if we expect it to live in two places.

packages/data-designer/src/data_designer/interface/data_designer.py:161-166 — Cascade of two warnings on a single DataDesigner() call

  • What: When model_providers=None and the YAML carries default:, get_default_provider_name() emits warning #1, then resolve_model_provider_registry(..., default_provider_name="...") falls into the multi-provider branch and triggers _warn_on_explicit_default for warning #2. Two distinct deprecations fire for the same underlying cause (the user has a YAML default).
  • Why: They do point at different remediations (edit the YAML vs. specify provider= per ModelConfig), so showing both is defensible — but a user instantiating DataDesigner() with the legacy on-disk config will see two separate DeprecationWarnings with overlapping language and may be confused about whether they're the same problem.
  • Suggestion: At minimum, worth confirming this is intended. If we'd rather emit only the most specific one, resolve_model_provider_registry could skip passing default= when the value came from get_default_provider_name() (since that site already warned), or the resolve-side warn could be conditional on whether the default came from a user-passed argument vs. the YAML fallback. Happy to leave as-is if the team prefers two nudges over one.

Suggestions — Take it or leave it

packages/data-designer-engine/src/data_designer/engine/model_provider.py:57-71: model_fields_set triggers on explicit default=None

  • What: The validator fires on "default" in self.model_fields_set. That set includes the field whenever a caller passes it explicitly — including ModelProviderRegistry(providers=[...], default=None), where the caller is explicitly opting out of having a default. The deprecation message ("ModelProviderRegistry.default is deprecated") would be a bit misleading there.
  • Why: No production caller does this today (I checked — provider_service.py:32 constructs the CLI-local ModelProviderRegistry, not this one), so this is theoretical. But the inline comment claims the warn "fires only when the caller actually passed default=", and a future reader who tightens that contract has a small footgun waiting.
  • Suggestion: Tighten the predicate to "default" in self.model_fields_set and self.default is not None, and update the inline comment to match. Trivial change, removes the only misleading-warning case I could construct.
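
The difference between the two predicates can be shown with a standard-library stand-in. The real class is a pydantic model and reads `model_fields_set`; `RegistrySketch`, its `fields_set` attribute, and the `_UNSET` sentinel below are hypothetical substitutes used only so the sketch runs without pydantic.

```python
import warnings
from typing import Any, Optional

_UNSET: Any = object()  # sentinel: "caller did not pass this argument"


class RegistrySketch:
    """Stdlib stand-in for a pydantic model that inspects model_fields_set."""

    def __init__(self, providers: list, default: Any = _UNSET) -> None:
        self.providers = providers
        # Mimics model_fields_set: the names the caller actually passed.
        self.fields_set = {"providers"}
        if default is not _UNSET:
            self.fields_set.add("default")
        self.default: Optional[str] = None if default is _UNSET else default

        # Tightened predicate: passed explicitly AND carries a real value.
        if "default" in self.fields_set and self.default is not None:
            warnings.warn(
                "ModelProviderRegistry.default is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )


def warning_count(**kwargs: Any) -> int:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        RegistrySketch(providers=["a", "b"], **kwargs)
    return len(caught)
```

With this predicate, `warning_count()` and `warning_count(default=None)` both stay quiet, while `warning_count(default="a")` warns once. Checking only membership in the set would make the explicit `default=None` call warn as well.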

packages/data-designer-config/tests/config/test_models.py — No regression test for the model_validate deserialization path

  • What: The PR description explicitly calls out the intended blast radius: "any ModelConfig built without provider= (including legacy serialized configs loaded via model_validate) will now emit a warning." The construction path is covered by test_model_config_provider_none_emits_deprecation_warning, but there's no analogous test pinning the deserialization path (ModelConfig.model_validate({"alias": ..., "model": ...}) without provider).
  • Why: Both paths funnel through the same validator today, so the test coverage is implicit — but pinning it explicitly protects against a future refactor that, say, only runs the validator on construction and not on revalidation.
  • Suggestion: Add a one-liner asserting ModelConfig.model_validate({"alias": "x", "model": "y"}) emits the warning, alongside the existing construction test.

packages/data-designer-config/src/data_designer/config/models.py:517 and packages/data-designer-engine/src/data_designer/engine/model_provider.py:19 — Pydantic-native deprecation marker

  • What: Pydantic v2 supports Field(..., deprecated=True) which produces an automatic DeprecationWarning plus IDE/JSON-schema metadata for downstream tooling. The PR uses docstring notes plus custom validators.
  • Why: Validator-based warnings only fire at construction time; Field(deprecated=True) also fires on attribute access, and integrates with schema/doc generators. Not a behavioral upgrade for users today, but it's what tooling tends to look for.
  • Suggestion: Optional. If we want the deprecation to be discoverable from generated schemas/IDE tooltips, swap to provider: str | None = Field(default=None, deprecated="...") and default: str | None = Field(default=None, deprecated="..."). The custom validator can stay or be retired depending on whether you want the per-call warning semantics that Field(deprecated=...) doesn't replicate.

What Looks Good

  • The single-provider carve-out in resolve_model_provider_registry is the right call — it keeps DataDesigner(model_providers=[one_provider]) quiet while still nudging users on the multi-provider path. The accompanying test_resolve_single_provider_quiet_under_deprecation test pins this nicely.
  • model_fields_set for _warn_on_explicit_default is a clean way to distinguish caller intent from the field default — that's a thoughtful detail.
  • Coverage is thorough: each of the four entry points has both a positive (warns) and negative (stays quiet under simplefilter("error", DeprecationWarning)) test. The "stays quiet" pins are particularly valuable for catching future regressions.
  • Docs admonitions land in the right places and consistently link back to issue #589, which makes the deprecation easy to follow from any entry point.

Verdict

Needs changes — Greptile's two findings (the swallowed-warning bug and the stacklevel issue) should be addressed before merge; everything else above is non-blocking. Once those are in, this is a tidy deprecation cycle.


This review was generated by an AI assistant.

Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.

Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` use `stacklevel=2`, which lands inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).

johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.

johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.

johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.

johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.

Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
  roundtrip stops exercising the deprecated YAML-default path
  unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
  `test_get_provider`, and
  `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
  in `pytest.warns` so the suite stays green under
  `-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
  to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
  pin the cascade-dedup behavior.

Refs #589

Made-with: Cursor
@nabinchha
Contributor Author

Thanks for the careful reads, @greptile-apps and @johnnygreco. Pushed 17a48acc addressing the feedback. Summary:

Blockers (P1/P2) — fixed

Greptile P1 / johnnygreco — warnings.warn inside try/except Exception in ProviderRepository.load. Moved the warn outside the catch-all. load_config_file exceptions still return None silently, the deprecation warn fires unconditionally if default: is present, and model_validate errors continue to fall back to None. Confirmed under filterwarnings("error", DeprecationWarning) the registry now loads correctly instead of being silently dropped.
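
The failure mode is easy to reproduce in isolation, because `DeprecationWarning` is an `Exception` subclass once a filter escalates it. In this sketch `load_buggy` and `load_fixed` are hypothetical simplifications of the before and after shapes of `ProviderRepository.load`, and `dict(config)` stands in for `model_validate`.

```python
import warnings
from typing import Optional

_MESSAGE = "YAML 'default:' key is deprecated"


def load_buggy(config: dict) -> Optional[dict]:
    try:
        if config.get("default") is not None:
            warnings.warn(_MESSAGE, DeprecationWarning, stacklevel=2)
        return dict(config)  # stand-in for model_validate
    except Exception:
        # Under an "error" filter the warning is raised and swallowed here.
        return None


def load_fixed(config: dict) -> Optional[dict]:
    # Warn outside the catch-all so the warning can never be swallowed.
    if config.get("default") is not None:
        warnings.warn(_MESSAGE, DeprecationWarning, stacklevel=2)
    try:
        return dict(config)
    except Exception:
        return None
```

Under `warnings.simplefilter("error", DeprecationWarning)`, `load_buggy({"default": "x"})` returns `None` with no visible signal, whereas `load_fixed` lets the escalated warning reach the caller instead of disguising it as a failed load.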

Greptile P2 / johnnygreco — pydantic validator stacklevel. Introduced data_designer.config.utils.warning_helpers.warn_at_caller, which walks past the helper, validator, and pydantic-internal frames to find the user's call site and emits via warnings.warn_explicit using the user frame's __warningregistry__. Both attribution and dedup now key on the user's (filename, lineno). Verified with a quick python -c repro: warning is now attributed to <string>:22 (the user's model_validate call) rather than a pydantic-internal frame, and Python's once-per-location dedup behaves correctly per user call site.

Note on module_globals: I deliberately don't pass it to warn_explicit. Including it triggers linecache source lookup, which fails for __main__ scripts run with python -c because the loader is BuiltinImporter (ImportError: '__main__' is not a built-in module). The test_fingerprint_deterministic_across_processes test caught this. Skipping module_globals keeps the warning robust at the cost of an empty source line in formatted output — a fair trade.

Worth addressing — fixed

johnnygreco — provider_repository.py warning copy mixed two YAML files. Mirrored the wording from default_model_settings.py so both YAML-default warning sites give the same migration instruction ("Remove it and specify provider= explicitly on each ModelConfig instead"). The previous copy pointed users at "ModelConfig entries" inside model_providers.yaml, where they don't live.

johnnygreco — cascade of two warnings on a single DataDesigner() call. Suppressed the registry-level duplicate in the YAML-fallback branch via a scoped warnings.catch_warnings() filter on the "ModelProviderRegistry.default is deprecated" message. Users now see exactly one DeprecationWarning (the more specific YAML-default one) instead of two for the same root cause. Added test_init_yaml_default_emits_single_deprecation_warning to pin the behavior.
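
A rough sketch of the scoped suppression, assuming the two warnings are distinguishable by message prefix as described. The functions `warn_yaml_default`, `build_registry`, and `init_from_yaml` are hypothetical stand-ins for the real call chain, and the first message text is invented for illustration.

```python
import warnings


def warn_yaml_default() -> None:
    # Stand-in for the YAML-default warning site.
    warnings.warn("The YAML 'default:' key is deprecated", DeprecationWarning, stacklevel=2)


def build_registry() -> None:
    # Stand-in for the registry validator that would warn a second time.
    warnings.warn("ModelProviderRegistry.default is deprecated", DeprecationWarning, stacklevel=2)


def init_from_yaml() -> None:
    warn_yaml_default()
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the text.
        warnings.filterwarnings(
            "ignore",
            message=r"ModelProviderRegistry\.default is deprecated",
            category=DeprecationWarning,
        )
        build_registry()  # duplicate is dropped; filters are restored on exit


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    init_from_yaml()

messages = [str(w.message) for w in caught]
```

Only the YAML-level message ends up in `messages`. Because the filter lives inside `catch_warnings()`, a registry constructed anywhere outside that block still warns normally.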

Take-it-or-leave-it — taken

johnnygreco — _warn_on_explicit_default triggers on default=None explicitly. Tightened the predicate to "default" in self.model_fields_set and self.default is not None. Updated the inline comment. Added test_explicit_default_none_does_not_emit_deprecation_warning to pin.

johnnygreco — no regression test for the model_validate deserialization path. Added test_model_config_provider_none_via_model_validate_emits_deprecation_warning.

Test hygiene (Greptile notes, addressed)

  • Wrapped test_resolve_model_provider_registry_with_explicit_default, test_get_provider, and test_init_user_supplied_providers_preserve_first_wins_over_yaml_default in pytest.warns so the suite stays green under -W error::DeprecationWarning.
  • Updated test_load_exists and test_save to omit default= from the roundtrip stub, since they weren't testing the default field — keeps these tests clear of the deprecated path while leaving test_load_with_yaml_default_emits_deprecation_warning to pin the warning.

Take-it-or-leave-it — left

johnnygreco — Field(deprecated=True) for tooling integration. Left as-is. Field(deprecated=True) fires on attribute access, which would warn every time internal callers (e.g. get_default_provider_name) read self.default — false positives for code paths that aren't deprecated by themselves, only the user setting them is. The validator-with-model_fields_set distinction is the right shape for this deprecation. Happy to revisit if there's a tooling-discoverability story I'm missing.

Greptile — "warning storm" from get_default_model_configs() over legacy YAML entries. Naturally deduped now: with warn_at_caller attributing to the call site (default_model_settings.py:71), all N legacy entries dedupe to the same (filename, lineno) key. So a user with five provider-less entries sees one warning, not five — without a special-case dedup pass. Confirmed in make test output (single warning per call-site, multiple test cases).
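
A small stdlib-only sketch of why the entries collapse, assuming (as stated above) that every legacy entry's warning is attributed to the same call site. `warn_for_entries`, `count_shown`, the message text, and the module name are placeholders. One detail worth keeping in mind: the stdlib registry key is `(text, category, lineno)` within a per-module registry, so the collapse also relies on the message text being identical across entries.

```python
import warnings


def warn_for_entries(n_entries: int, registry: dict) -> None:
    # Every entry reports the same message at the same attributed call site.
    for _ in range(n_entries):
        warnings.warn_explicit(
            "ModelConfig.provider=None is deprecated; set provider= explicitly",
            DeprecationWarning,
            filename="default_model_settings.py",  # placeholder call site
            lineno=71,
            module="user_script",  # placeholder module name
            registry=registry,  # a shared registry is what enables dedup
        )


def count_shown(n_entries: int, action: str = "default") -> int:
    # Count how many of the emitted warnings actually reach the user.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        warn_for_entries(n_entries, registry={})
    return len(caught)
```

With the "default" action five emissions surface once. With "always" all five are shown, which is why tests that record under "always" cannot observe the dedup.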

johnnygreco — "two distinct nudges" debate on the cascade. I chose dedup (one warning per user action) over two separate nudges. The YAML-default warning already explains the migration; firing the registry-level deprecation immediately afterward for the same root cause was confusing without adding information. Easy to flip if the team prefers the louder option.

Test counts

make test: 3,114 passed (540 config + 1,921 engine + 653 interface). 106 deprecation warnings emitted as expected — all attributed to user call sites under the new helper.

@greptile-apps
Contributor

greptile-apps Bot commented May 1, 2026

The implementation in 17a48acc looks clean across all the changes. Quick rundown:

P1 (provider_repository.py): Correctly split into two separate try/except blocks — load_config_file has its own catch, and the deprecation warn fires unconditionally between it and the model_validate catch. The filterwarnings("error") path now works correctly.

P2 (warn_at_caller): The stack-walk approach is the right tool here. Skipping frames 0+1 then walking past pydantic.* modules to warn_explicit with the user frame's __warningregistry__ gives correct attribution and per-call-site dedup. The rationale for omitting module_globals (avoiding BuiltinImporter failures for python -c scripts) is sound.

Cascade dedup (data_designer.py): The catch_warnings() scoping is tight and conditional — only adds the "ignore" filter when default_provider_name is not None, so the no-YAML-default path remains unaffected. Message-prefix matching on "ModelProviderRegistry.default is deprecated" is stable enough since it's internal.

_warn_on_explicit_default tightening: "default" in self.model_fields_set and self.default is not None is the right predicate. default=None is semantically "opt out of a registry default" and shouldn't warn.

resolve_model_provider_registry single-provider path: Avoids passing default= in the one-provider case, keeping the common construction path silent. Multi-provider still passes default and warns as expected.

One minor observation: warn_at_caller checks module_name.startswith("pydantic") to skip internal frames. This works for standard pydantic dispatch but wouldn't catch frames from annotated_types or other pydantic-adjacent packages if they appear in the chain. In practice this isn't a problem given the current validator structure, but worth keeping in mind if the validator depth changes.

Everything else — dedup via warn_at_caller naturally collapsing N legacy YAML entries to one warning, the default=None regression test, the model_validate deserialization path test, and the pytest.warns wrapping for -W error hygiene — all look correct. The "take-it-or-leave-it" justifications are solid. LGTM.


@johnnygreco
Contributor

No blockers from me — both Greptile points are addressed correctly and the warn_at_caller helper is a clean fix for the pydantic-stacklevel problem. Happy to ship as-is.

A couple of small follow-ups (do them here or in a fast-follow, your call):

  1. Add the missing test for the tightened predicate. The commit message mentions test_explicit_default_none_does_not_emit_deprecation_warning, but it's not in tests/engine/test_model_provider.py — only the field-default case (test_no_default_does_not_emit_deprecation_warning) is pinned. The new self.default is not None clause is currently untested. One-liner:

    def test_explicit_default_none_does_not_emit_deprecation_warning(stub_foo_provider):
        with warnings.catch_warnings():
            warnings.simplefilter("error", DeprecationWarning)
            ModelProviderRegistry(providers=[stub_foo_provider], default=None)
  2. Call out the ProviderRepository.load() behavior change in the PR description. Moving the warn outside the catch-all is the right fix, but it does mean that under filterwarnings("error", DeprecationWarning) load() now raises instead of silently returning None. That's the correct semantics, but worth flagging for anyone reading the changelog.

Smaller nits, only worth chasing if you're already in the file:

  • warn_at_caller matches module_name.startswith("pydantic"), which would also skip a user module named e.g. pydantic_helpers.py. Tightening to == "pydantic" / startswith("pydantic.") (+ the pydantic_core variant) would be safer.
  • The catch_warnings() dedup in DataDesigner.__init__ is a fine band-aid, but the underlying issue is that resolve_model_provider_registry synthesizes default= for the multi-provider case and trips the library's own deprecation. Worth a follow-up issue once the registry no longer requires default for multi-provider — at that point the suppression can come out.
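
To make the prefix nit concrete, here is a minimal comparison of the two predicates. The function names `loose_match` and `tight_match` are illustrative only and are not taken from the codebase.

```python
def loose_match(module_name: str) -> bool:
    # Original check: any module whose name merely begins with "pydantic".
    return module_name.startswith("pydantic")


def tight_match(module_name: str, prefixes: tuple = ("pydantic", "pydantic_core")) -> bool:
    # Exact package name, or a dotted submodule of that package.
    return any(module_name == p or module_name.startswith(p + ".") for p in prefixes)
```

The loose form treats a user module such as `pydantic_helpers` as internal, while the tight form only accepts `pydantic`, `pydantic_core`, and their dotted submodules.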

johnnygreco
johnnygreco previously approved these changes May 1, 2026
"""
default = _get_default_providers_file_content(MODEL_PROVIDERS_FILE_PATH).get("default")
if default is not None:
warnings.warn(
Contributor


follow-up to johnnygreco's warn_at_caller work: this site still uses warnings.warn(stacklevel=2), so on the only real call path (DataDesigner.__init__:162) the warning is attributed to the data_designer library, not user code. python's default filter is default::DeprecationWarning:__main__ + ignore::DeprecationWarning, so library-attributed deprecations get silenced — verified empirically: a normal DataDesigner() call with a YAML default: set shows nothing under default filters. could either fire the warning from the __init__ boundary, or call warn_at_caller here too (with a small skip-list extension for data_designer.). non-blocking but worth doing in the same cycle while the deprecation messaging is fresh.
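
A stdlib-only sketch of the filter behavior described above. It recreates just the two relevant default entries inside a temporary `catch_warnings()` block. The helper name `shown_under_default_filters`, the message text, and the file name are assumptions for illustration.

```python
import warnings


def shown_under_default_filters(module: str) -> bool:
    """Return True if a DeprecationWarning attributed to `module` would be shown."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        # filterwarnings() inserts at the front, so the later call takes precedence.
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        warnings.filterwarnings("default", category=DeprecationWarning, module="__main__")
        warnings.warn_explicit(
            "YAML 'default:' key is deprecated",  # placeholder message
            DeprecationWarning,
            filename="example.py",
            lineno=1,
            module=module,
            registry={},
        )
    return len(caught) == 1
```

A warning attributed to `__main__` gets through, while the same warning attributed to a `data_designer.*` module is dropped, which matches the empirical result reported in the comment.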

    frame = sys._getframe(2) if hasattr(sys, "_getframe") else None
    while frame is not None:
        module_name = frame.f_globals.get("__name__", "")
        if not module_name.startswith("pydantic"):
Contributor


related to johnnygreco's nit about startswith("pydantic") matching pydantic_helpers.py, there's a related issue going the other direction: when a ModelConfig or ModelProviderRegistry is constructed inside a data_designer helper (e.g. config builders, YAML loaders, resolve_model_provider_registry), the first non-pydantic frame is data_designer code, not the user's call site. the warning gets stamped at the library and silenced under default DeprecationWarning filters. confirmed via repro: resolve_model_provider_registry([a, b]) ends up attributed to model_provider.py:108. extending the skip to data_designer. (or accepting caller-supplied prefixes) would close the gap. easy to add a regression test asserting warning.filename lands on the test file rather than a library module.

andreatgretel (PR #594): the YAML-default warning in
`get_default_provider_name` and the registry-default warning emitted
from inside DataDesigner helpers were attributing to data_designer
library frames, not user code. Python's default filter chain includes
`ignore::DeprecationWarning`, so library-attributed entries are
silenced — meaning a normal `DataDesigner()` call with a YAML
`default:` set showed nothing, and `resolve_model_provider_registry`
warnings were similarly invisible. Two related changes:

1. `warn_at_caller`: extend the default skip-list from `("pydantic",)`
   to `("pydantic", "pydantic_core", "data_designer")` so the walk
   escapes both pydantic's validator-dispatch frames and data_designer
   helper frames before attributing. Also tighten the prefix predicate
   to exact-or-dotted-prefix matching (`name == p or
   name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not
   falsely matched as part of `pydantic` (johnnygreco nit). Allow
   callers to pass a custom `skip_prefixes` for flexibility. Drop the
   "skip frame 0+1 unconditionally" guard now that prefix matching
   covers it.

2. `get_default_provider_name`: switch from
   `warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous
   stacklevel pointed into `default_model_settings.py`, which is a
   library file → silenced under default filters. Verified the fix
   empirically with `python -W default`: warning is now attributed to
   the user's call site and rendered.

johnnygreco (PR #594): add the missing
`test_explicit_default_none_does_not_emit_deprecation_warning`
regression for the `self.default is not None` predicate landed in
the prior round.

Tests:
- New `test_warning_helpers.py` pins prefix-matching precision
  (rejects `pydantic_helpers` / `data_designer_other`), default
  skip-list contents, attribution past skip-prefix frames, and
  per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame`
  pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the
  multi-frame case: construction goes through
  `resolve_model_provider_registry`, so the walk has to escape both
  pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning`
  pins johnnygreco's predicate-tightening regression.

3,124 tests pass (548 config + 1,923 engine + 653 interface; +10 net
from this round).

Refs #589

Made-with: Cursor
@nabinchha
Contributor Author

Thanks @andreatgretel and @johnnygreco — pushed 247fa30 addressing both review notes.

Address

andreatgretel — default_model_settings.py:107 invisible under default filters. Confirmed your repro: warnings.warn(stacklevel=2) from get_default_provider_name was attributing to default_model_settings.py (a library file), and Python's default filter chain (ignore::DeprecationWarning + default::DeprecationWarning:__main__) silenced it. Switched to warn_at_caller. Verified empirically with python -W default -c '...' that the warning is now attributed to the user's call site and rendered.

andreatgretel — warn_at_caller skip-list missed data_designer. frames. Same root cause for the registry-level warning when ModelConfig / ModelProviderRegistry is constructed inside a data_designer helper (e.g. resolve_model_provider_registry). Extended DEFAULT_INTERNAL_PREFIXES from ("pydantic",) to ("pydantic", "pydantic_core", "data_designer") so the walk escapes both pydantic's validator-dispatch frames and data_designer helper frames before attributing. Also added a skip_prefixes kwarg so callers can extend the list per-call.

johnnygreco — prefix-collision nit (startswith("pydantic") matches pydantic_helpers). Tightened the predicate to exact-or-dotted-prefix matching (name == prefix or name.startswith(prefix + ".")). Pinned with test_module_in_prefixes_rejects_prefix_collision covering pydantic_helpers, pydanticfoo, data_designer_other.

johnnygreco — missing test_explicit_default_none_does_not_emit_deprecation_warning. Added. Pins the self.default is not None clause from the prior round.

Other notes from the May-1 review

  • ProviderRepository.load() behavior change under filterwarnings("error", DeprecationWarning). Worth flagging in the changelog: load() now raises instead of silently returning None when the YAML default: key triggers a warning under a strict filter. Correct semantics, but a visible behavior change.
  • catch_warnings() dedup band-aid in DataDesigner.__init__. Agreed it's a band-aid. The clean fix is to drop the synthesized default= from resolve_model_provider_registry once the multi-provider path no longer requires it. Tracking as a follow-up to #589 (refactor: deprecate implicit default model provider routing).
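
The load() behavior change in the first bullet can be illustrated with a toy loader. This is a sketch under assumptions: `load_old`, `load_new`, and `run_strict` are hypothetical, and plain dicts stand in for the YAML file and the pydantic validation step.

```python
import warnings


def load_old(raw: dict):
    # Old shape: the warning fires inside the catch-all, so under a strict
    # filter it becomes an exception that is swallowed.
    try:
        if raw.get("default") is not None:
            warnings.warn("YAML 'default:' key is deprecated", DeprecationWarning, stacklevel=2)
        return dict(raw)
    except Exception:
        return None


def load_new(raw: dict):
    try:
        data = dict(raw)  # stands in for reading the config file
    except Exception:
        return None
    if data.get("default") is not None:
        # Outside both try blocks, so a strict filter propagates to the caller.
        warnings.warn("YAML 'default:' key is deprecated", DeprecationWarning, stacklevel=2)
    try:
        # Stands in for model validation.
        return {"providers": list(data.get("providers", [])), "default": data.get("default")}
    except Exception:
        return None


def run_strict(loader, raw: dict):
    # Run a loader with DeprecationWarning promoted to an exception.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return loader(raw)
```

Under the strict filter, `load_old` returns None for a config carrying `default`, whereas `load_new` lets the DeprecationWarning reach the caller.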

Tests

New regressions:

  • test_warning_helpers.py (new): prefix-precision, default skip-list contents, attribution past skip-prefix frames, per-call-site dedup.
  • test_get_default_provider_name_warning_attributes_to_user_frame: pins andreatgretel comment 1.
  • test_explicit_default_warning_attributes_to_user_frame: pins andreatgretel comment 2 (constructs through resolve_model_provider_registry so the walk has to escape both pydantic and data_designer).
  • test_explicit_default_none_does_not_emit_deprecation_warning: pins johnnygreco's missing test.

make test: 3,124 passed (548 config + 1,923 engine + 653 interface; +10 net this round). Lint/format clean.

andreatgretel
andreatgretel previously approved these changes May 5, 2026
Contributor

@andreatgretel andreatgretel left a comment


🚢

greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s
YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`,
which attributes to whichever data_designer frame called `load()` —
controllers, services, list/reset commands, agent introspection. Every
real call path lands on `data_designer.cli.*`, which falls under
Python's default `ignore::DeprecationWarning` filter and is silenced.
Audit found two more sites with the same problem:

- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` /
  issue #552) — was using `stacklevel=4` to walk past
  `_resolve_async_compatibility -> build/build_preview -> interface ->
  user`. Brittle: any added frame (decorator, async wrapping, the
  `try/except DeprecationWarning: raise` boundary) shifts attribution
  silently. The existing test passed only because it used
  `simplefilter("always") + record=True`, which records warnings
  regardless of attribution.
- `ProviderController._handle_change_default` — was using
  `stacklevel=2`, which lands on the menu dispatcher in the same
  controller module. `print_warning` already shows the message
  visually, but programmatic observers (`pytest.warns`,
  `filterwarnings("error", ...)`) saw a library-attributed entry that
  default filters silenced.

All three migrated to `warn_at_caller` (the helper from 247fa30) so
attribution lands on the user's call site regardless of internal
chain shape. `data_designer` is already in
`DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library
in one pass.

Added attribution regression tests at each site asserting
`warning.filename == __file__`. A future regression to
`warnings.warn(stacklevel=N)` now fails CI instead of silently
silencing the user-facing nudge:

- `test_load_with_yaml_default_attributes_warning_to_caller`
  (test_provider_repository.py)
- `test_resolve_async_compatibility` extended with the same assertion
- `test_handle_change_default_emits_deprecation_warning` rewritten
  from `pytest.warns(...)` to a `catch_warnings(record=True)` block
  that filters for the message and asserts `filename == __file__`
  (`pytest.warns` does not check attribution, so the rewrite is
  required to actually catch the regression).
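
The brittleness of a fixed `stacklevel` can be reproduced in a few lines. This is a sketch with hypothetical names (`_resolve`, `build`, `traced`, `attributed_file`) and a placeholder message. It uses `stacklevel=3` on a shorter chain than the `stacklevel=4` case described above, and compiles the library code under its own filename so the attribution is observable.

```python
import warnings

# Hypothetical library code, compiled under its own filename.
LIB_SOURCE = '''
import functools
import warnings


def _resolve():
    # stacklevel=3 assumes the call chain is exactly: _resolve -> build -> user.
    warnings.warn("allow_resize is deprecated", DeprecationWarning, stacklevel=3)


def build():
    _resolve()


def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)  # one extra frame between user and build
    return wrapper


build_wrapped = traced(build)
'''

lib = {"__name__": "fake_lib"}
exec(compile(LIB_SOURCE, "fake_lib.py", "exec"), lib)


def attributed_file(fn) -> str:
    # "always" + record=True captures the warning wherever it is attributed.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
    return caught[0].filename
```

Both calls record a warning, so a count-only check passes either way. Only the filename assertion reveals that the wrapped path is attributed to `fake_lib.py` instead of the caller.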

3,125 tests pass (548 config + 1,923 engine + 654 interface).

Refs #589
@nabinchha nabinchha merged commit f73da19 into main May 5, 2026
84 of 93 checks passed
@nabinchha nabinchha deleted the nmulepati/refactor/589-deprecate-default-provider-routing branch May 5, 2026 19:39
lbliii added a commit that referenced this pull request May 6, 2026
Forward-port the doc changes that landed in main since this branch was
cut, translating MkDocs admonition syntax to Fern components. Three
product changes drove the updates:

PR #594 — deprecate implicit default-provider routing:
- concepts/models/configure-model-settings-with-the-cli.mdx: deprecate
  "Change default provider" workflow + inline mark on `data-designer
  config list` output
- concepts/models/custom-model-settings.mdx: warning that `provider=`
  is now required on every ModelConfig
- concepts/models/default-model-settings.mdx: warning that the
  registry-level default-provider concept is deprecated
- concepts/models/model-providers.mdx: same warning at the top of the
  ModelProvider overview
- concepts/models/inference-parameters.mdx: add explicit `provider=
  "openai"` to the dalle ModelConfig example

PR #592 — async engine becomes the default:
- concepts/architecture-and-performance.mdx: rewrite Execution Model
  intro to mention both engines, qualify "How It Works" as sync-engine
  semantics, update Concurrency Formula and Throttle notes from "Sync
  engine caveat" to "Engine paths", and add a full new "## Async
  Engine" section (per-model timeouts, run outcomes / Early Shutdown,
  opt-out via DATA_DESIGNER_ASYNC_ENGINE=0). Add `provider="nvidia"`
  to the my-model example.
- concepts/custom_columns.mdx: note that sync `cell_by_cell`
  generators dispatch concurrently under the async engine; mock with
  `MagicMock(spec=ModelFacade)` so async methods are auto-detected.
- concepts/processors.mdx: warning that the async engine enforces
  row-count invariance in process_before/after_batch.
- devnotes/posts/async-all-the-way-down.mdx: append an "Update" callout
  noting the engine is now default, with a link to the Architecture
  page anchor.

All `!!! warning|note|tip "Title"` admonitions converted to Fern
<Warning|Note|Tip title="..."> components. Internal links to mkdocs
relative paths (`../../concepts/foo.md#anchor`) rewritten to canonical
Fern URLs (`/concepts/foo#anchor`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
andreatgretel added a commit that referenced this pull request May 7, 2026
* docs: migrate documentation from MkDocs to Fern

Adds a Fern Docs build under fern/ alongside the existing mkdocs site.
Production target docs.nvidia.com/nemo/datadesigner with floating-latest
pointer (latest.yml symlink) at v0.5.8. Migrated all concept, recipe, plugin,
dev-note, and tutorial pages to MDX with NVIDIA theme and custom components
(Authors, MetricsTable, TrajectoryViewer, NotebookViewer, BadgeLinks).
Tutorial notebooks now render via NotebookViewer with captured outputs (text,
DataFrames, inline images) - new make targets generate-fern-notebooks and
generate-fern-notebooks-with-outputs drive the .py -> executed .ipynb -> Fern
JSON+TS pipeline, pinning docs to Python 3.13 to dodge pyarrow wheel issues
on 3.14. Python API reference is configured via Fern libraries: pointing at
data-designer-config; output is gitignored and regenerated locally with
'fern docs md generate'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: add datadesigner-docs agent skill

Captures the patterns established in the Fern migration so agents (and humans)
can maintain fern/ confidently. Modeled after NVIDIA-NeMo/Gym's
nemo-gym-docs SKILL.md, adapted for our floating-latest versioning,
notebook-with-outputs pipeline, dev-notes kit components, and the MDX gotchas
hit during migration (pymdown attr_list, --8<-- snippet syntax, frontmatter
authors-as-JSX-scope-variable, etc.). Routes triggers like "edit docs", "add
doc page", "regenerate notebooks", "update dev note", "add API reference" to
this skill.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: address PR review for Fern migration

- Delete stale fern/versions/_nav_order.yml (references non-existent
  ./versions/latest/pages/ — paths were never updated when latest/ was
  renamed to v0.5.8/, no consumer found in docs.yml or v0.5.8.yml).
- Remove unused custom components: Tag.tsx, CustomCard.tsx, Include.tsx
  (had its own untested markdown parser), ExpandableCode.tsx (broken in
  Fern SSR runtime). Drop expandable-code.css from docs.yml. Authors,
  BadgeLinks, MetricsTable, NotebookViewer, TrajectoryViewer remain
  (each has at least one call site).
- BadgeLinks: remove DEFAULT_BADGES with placeholder URLs; make `badges`
  prop required so we can never accidentally ship 'your-org/your-repo'.
- NotebookViewer: document the XSS trust boundary on output cells of
  format: "html". Outputs flow .py source → jupytext --execute → committed
  *.ts (review boundary). Add an inline comment at the dangerouslySetInnerHTML
  call site pointing back to the trust-model section.
- README: add Windows caveat on the latest.yml symlink — Windows users need
  core.symlinks=true before clone or Fern will reject the version config.
- Makefile: tighten generate-fern-notebooks source probe from `ls .../*.ipynb`
  (which can return success on non-file errors) to `[ -f docs/notebooks/1-the-basics.ipynb ]`,
  matching the reviewer's suggestion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: address @aschilling-nv review on fern/docs.yml

Three suggestions from the Fern review, all matching Curator's docs.yml
conventions:

- instances[0].url: drop the https:// protocol prefix to match Curator's
  shape (e.g. nemo-curator.docs.buildwithfern.com/nemo/curator).
- logo.href: was '/'; now points at /nemo/datadesigner/getting-started/welcome
  (the actual landing page) so clicking the logo lands on real content
  instead of the bare basepath.
- experimental.basepath-aware: true — opts into Fern's basepath-aware
  routing so internal links don't double-prefix the /nemo/datadesigner
  segment.
- redirects: also fix /nemo/datadesigner/index.html → getting-started/welcome
  (was bouncing to /latest, which is just the version slug); add
  /getting-started → /getting-started/welcome to mirror Curator's
  /home → /home/welcome convention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: put dev notes overview timestamps on separate lines

Signed-off-by: Kirit93 <kthadaka@nvidia.com>
Made-with: Cursor

* docs: redesign dev-notes index with BlogCard component

Replaces the generic <CardGroup>/<Card> grid (same green icon × 10, date
glued to bottom of description) with a purpose-built BlogCard for the
dev-notes landing page.

Each card now has:
- Hero image (16:9, lazy-loaded, click-to-zoom via Fern's rmiz wrapper)
- ALL-CAPS date eyebrow as proper subtitle styling
- Title, 3-line clamped description
- Author byline at the bottom: avatar stack (overlapping) + first author
  name + "+N", pulling from the existing devnotes/.authors.yml registry
- Hover: NVIDIA-green border + subtle lift

Posts without a hero image fall back to a deterministic hash-based
gradient placeholder + monogram (DJB2 hash of href → HSL hue, with the
muddy-yellow band 40–90° remapped). Same post always gets the same look.

Notes:
- Image prop is React.ReactNode (not string) — pass <img> JSX from MDX
  so Fern's link rewriter can resolve the src to /_local/... in dev and
  /nemo/datadesigner/assets/... in prod. Raw string props bypass the
  rewriter and 404 in dev.
- Card href runs through a small withBasepath() helper since the <a>
  also bypasses Fern's link rewriter.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: flush blog-card hero images to the top of the card

Fern's prose stylesheet applies a top margin to <img> tags, and the
click-to-zoom wrapper Fern injects around each image (<span data-rmiz>)
inherits that margin too. Result: a ~1rem gap between the card's top
edge and the hero image.

Reset margin/padding on the rmiz wrapper spans + the img itself inside
.blog-card__media so the image renders flush against the top border.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: stop blog-card hero from opening Fern's click-to-zoom modal

When an <img> appears in MDX, Fern auto-wraps it with a click-to-zoom
shell (<span data-rmiz>...). On the dev-notes index that shell intercepts
clicks meant for the card's <a> wrapper, so clicking a hero opens a
lightbox AND tries to navigate.

Set pointer-events: none on the rmiz spans + img inside .blog-card__media
so clicks bubble straight to the parent <a> and the card behaves as a
single, predictable link target. Hover still works because pointer-events
on children doesn't block :hover on the ancestor <a>.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: render notebook markdown at build time with markdown-it-py

Replaces NotebookViewer's hand-rolled JS markdown parser (the one with
the ^@br^@ sentinel the reviewer flagged as fragile) with build-time
rendering in the converter.

ipynb-to-fern-json.py now uses markdown-it-py (CommonMark + tables +
strikethrough + raw HTML) to render each markdown cell's source into
source_html, mirroring how code cells already store Pygments-highlighted
source_html. NotebookViewer's markdown branch becomes a single
dangerouslySetInnerHTML on the pre-rendered HTML, with a plain-escape
fallback for old snapshots.

Removes the dead JS helpers (renderMarkdown, isSafeUrl, UL_CLASS,
OL_CLASS) — ~60 lines of brittle regex-based markdown parsing.

Fixes broken rendering of:
- Blockquotes (showed literal > characters before)
- Nested content inside blockquotes (e.g. blockquote with bullet list)
- Fenced code blocks
- Tables
- Multi-paragraph list items

Includes regenerated fern/components/notebooks/*.{json,ts} for all 6
tutorials.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs: rewrite recipes index + replace octicons download links with Fern Info callouts

The recipes/cards.mdx page was still in MkDocs Material format:
- <div class="grid cards" markdown> wrapper (no-op in MDX)
- :material-snake:, :material-database:, :material-tools:, etc. (rendered
  as literal text — Fern uses Font Awesome, not Material icons)
- !!! tip Prerequisite (mkdocs admonition syntax)
- [:material-book-open-page-variant: View Recipe] / [Download Code
  :octicons-download-24:] links with embedded icon shortcodes

Rewrite using Fern's native components: <CardGroup cols={2}> with <Card
title icon href> grouped by category (Code Generation, QA and Chat,
Trace Ingestion, MCP and Tool Use, Plugin Development). Each card has
one primary action (the recipe page); download lives on the recipe page
itself.

Replace the trailing "Download Code :octicons-download-24:" link on
every recipe page (and 2 dev notes) with a <Info title="Download Recipe">
callout pointing at the GitHub blob URL — matching PR #215's
convention. 12 occurrences across 12 files.

Also fixes 6 recipe pages whose frontmatter title was "Untitled"
(unfilled placeholder from auto-migration): text_to_python, basic_mcp,
pdf_qa, multi_turn_chat, product_info_qa, agent_rollout_distillation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): mirror main's content updates into v0.5.8 MDX pages

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): address @andreatgretel review comments

Four issues from Andre's review pass:

1. /devnotes 404 (index.mdx:23) — section slug is /dev-notes, page slug
   is /dev-notes/overview. Fix the link in the landing page so visitors
   actually reach the dev notes index.

2. TrajectoryViewer.tsx final-answer body shown as literal markdown
   (line 66) — the renderer uses dangerouslySetInnerHTML but
   example-marcia.ts shipped raw markdown (**bold**, \n\n breaks). Visible
   on the deep-research devnote where the trajectory is defaultOpen.
   Pre-render body to HTML in the fixture (matches the original hand-coded
   format pre-migration); document the convention in the ToolCall.body
   doc comment so future fixtures don't regress.

3. Tutorials 5/6 (image generation/editing) ship with 0 captured outputs
   because Flux runs through OpenRouter and OPENROUTER_API_KEY isn't set
   at build time. Cannot regenerate without the key, so add a <Note> at
   the top of each wrapper page pointing readers at the Colab link to
   execute the cells live and see the generated images. Maintainers with
   the key in their environment should re-run
   `make generate-fern-notebooks-with-outputs` before merge to capture
   the snapshots.

4. Legacy nvidia-nemo.github.io/DataDesigner/* URLs in MDX prose (8
   occurrences across 5 files) rewritten to canonical Fern paths so
   visitors don't get sent back to the legacy GitHub Pages site once
   docs.nvidia.com/nemo/datadesigner becomes the production URL:
   - The single deep link in data-designer-got-skills.mdx →
     /concepts/models/default-model-settings
   - All other "documentation home" links (CONTRIBUTING ×2,
     async-all-the-way-down ×2, owning-the-model-stack, design-principles
     ×2) → /getting-started/welcome (the canonical landing slug, matches
     logo.href in docs.yml)

   Notebook .py source URLs are tracked separately as part of the
   notebook-regen work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): regenerate notebook snapshots with Flux outputs captured

Re-ran make generate-fern-notebooks-with-outputs with NVIDIA_API_KEY +
OPENROUTER_API_KEY set, now that we have an NVIDIA key with permission
on nemotron-3-nano-30b-a3b. All 6 tutorials regenerated; the two image
tutorials (5 and 6), which had been shipping with 0 outputs, now have
captured Flux generations:

  1 the-basics:                12/15 outputs
  2 structured-outputs:        13/17 outputs
  3 seeding-with-a-dataset:    10/13 outputs
  4 providing-images:          13/17 outputs (1 image)
  5 generating-images:          8/10 outputs (2 images) ← was 0/12
  6 image-to-image-editing:     9/12 outputs (10 images) ← was 0/14

The two `<Note title="Run in Colab to see ...">` workarounds I added on
the 5/6 wrapper pages are no longer needed — outputs render inline now.
NotebookViewer's own "Run in Google Colab" banner is still rendered
from the wrapper's `colabUrl` prop, so the live-execute path stays one
click away.

Bumps the diff size noticeably (notebook 6 .ts is ~22MB of base64-
encoded PNGs from 10 edited images), but that's intentional — these
images are the proof points for what the Flux/MCP image-context
tutorials actually produce.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): unbreak SSR — shrink notebook image outputs + fix BlogCard React import

Two server-side render bugs surfaced when running `fern docs md generate &&
fern docs dev` (the static-preview path):

1. The 22 MB notebook 6 .ts module (full-resolution Flux PNGs from 10 edited
   images) tripped Fern's SSR module-evaluation step. Once that module
   failed to evaluate, the shared component bundle failed to load on every
   page, replacing each MDX body with `<span data-intent="error">Something
   went wrong!</span>` while the layout chrome continued to render.

   Fix in fern/scripts/ipynb-to-fern-json.py: after extracting an
   image/png output, pass it through Pillow to (a) downscale so the
   longest edge is at most 800 px, (b) re-encode as JPEG q=82 progressive
   (Flux outputs are photographic — JPEG compresses 5–10× better than PNG
   for this content). NotebookViewer's CellOutput interface gains a
   `mime` field so the data URL uses the actual encoded MIME type. Result:

       notebook 6: 22 MB → 4.6 MB
       notebook 5: 3.8 MB → 1.8 MB
       notebook 4: 514 KB → 116 KB
       (notebooks 1–3 unaffected — no image outputs)

2. fern/components/BlogCard.tsx referenced `React.ReactNode` twice without
   importing React. Other components in the kit use `import type
   { ReactNode } from "react"`; BlogCard was the outlier. Aligned the
   import style — even though this didn't end up being the trigger, leaving
   the dangling reference would have eventually caused a strict-mode SSR
   regression.
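
The image shrink described in item 1 can be sketched roughly as follows.
This is a minimal illustration, not the actual code in
fern/scripts/ipynb-to-fern-json.py: the helper names (`fit_within`,
`shrink_png_to_jpeg`, `to_data_url`) are hypothetical, and Pillow is
imported lazily so the pure-Python helpers work without it.

```python
import base64
import io
from typing import Tuple


def fit_within(width: int, height: int, max_edge: int = 800) -> Tuple[int, int]:
    """Scale (width, height) down so the longest edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # never upscale
    return (max(1, width * max_edge // longest),
            max(1, height * max_edge // longest))


def shrink_png_to_jpeg(png_bytes: bytes, max_edge: int = 800, quality: int = 82) -> bytes:
    """Downscale a PNG and re-encode it as a progressive JPEG (needs Pillow)."""
    from PIL import Image  # third-party dependency, imported only when used

    src = Image.open(io.BytesIO(png_bytes))
    img = src.convert("RGB")  # JPEG has no alpha channel
    img = img.resize(fit_within(img.width, img.height, max_edge))
    out = io.BytesIO()
    img.save(out, format="JPEG", quality=quality, progressive=True)
    return out.getvalue()


def to_data_url(mime: str, payload: bytes) -> str:
    """Build a data URL that carries the actual encoded MIME type."""
    return f"data:{mime};base64,{base64.b64encode(payload).decode('ascii')}"


print(fit_within(1600, 1200))  # (800, 600)
```

Carrying the MIME type alongside the payload is what lets the viewer emit
`data:image/jpeg;...` for re-encoded outputs instead of assuming PNG.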

Sweep test against http://localhost:3000/nemo/datadesigner/* — landing,
concepts, tutorials (including 5/6 image notebooks), dev notes, recipes,
and code-reference topic pages all render with their content; no error
spans.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): add MkDocs-shape redirects for legacy URLs

The legacy site at https://nvidia-nemo.github.io/DataDesigner/ used
MkDocs-Material conventions (mkdocstrings + blog plugin + mkdocs-jupyter
+ directory URLs). Several path segments and page slugs differ from
Fern's slugified-title routing — search-engine indexed links and
copy-pasted bookmarks land on 404 without redirects.

Adds 30+ specific redirect rules covering every renamed surface:

- Tutorials: /notebooks/<filename>/ -> /tutorials/<title-slug>
  (page-title slugs differ from .ipynb filenames; one rule per notebook
   plus a README -> overview alias).

- Recipes: /recipes/<snake_subsection>/<snake_page>/ ->
  /recipes/<kebab-subsection>/<kebab-page>. Per-page rules for each of
  the 10 recipes (page titles diverged from .py filenames — e.g.
  basic_mcp -> basic-mcp-tool-use, search_agent -> nemotron-super-search-agent),
  followed by subsection :rest* fallbacks.

- Concepts: /concepts/mcp/* -> /concepts/tool-use-mcp/* (subsection
  rename; the "&" in the title is dropped from the slug rather than
  becoming -and-). Per-page rules for safety-and-limits
  -> safety-limits and configure-mcp-cli -> cli-configuration where
  page titles diverged from filenames.

- Code Reference: /code_reference/<module>/ ->
  /code-reference/topic-overviews/<module>. Per-page rules for the six
  underscored modules (column_configs, config_builder, run_config,
  sampler_params, validator_params, data_designer_config) since Fern's
  page-slug rule converts underscores to hyphens.

- Plugins: filesystem_seed_reader -> file-system-seed-reader-plugins
  (Fern inserts hyphens between CamelCase words). example -> example-plugin,
  available -> available-plugin-list (page-title slugs).

- Dev Notes: blog plugin's /devnotes/posts/<slug>/ -> /dev-notes/<slug>.
  Per-page rules for text-to-sql -> text-to-sql-for-nemotron-super and
  rqa -> rqa-dataset (post titles diverged from filenames).

- /devnotes -> /dev-notes/overview (section landing).

MkDocs's directory-URL trailing-slash convention is handled natively by
Fern's runtime (both /foo and /foo/ return the same page), so no
explicit slash-strip rule is needed.
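
To make the shape of these rules concrete, here is a small sketch that
mirrors a few of the mappings listed above. It is only a model of the
intended behavior for reasoning and testing: the function
`legacy_to_fern` and the override tables are hypothetical, and the real
redirects live in the Fern configuration, not in Python.

```python
import re
from typing import Optional

# Per-page overrides where the Fern slug diverged from the legacy filename.
DEVNOTE_OVERRIDES = {
    "text-to-sql": "text-to-sql-for-nemotron-super",
    "rqa": "rqa-dataset",
}
MCP_OVERRIDES = {
    "safety-and-limits": "safety-limits",
    "configure-mcp-cli": "cli-configuration",
}


def legacy_to_fern(path: str) -> Optional[str]:
    """Map a legacy MkDocs path to its Fern path; None if no rule applies."""
    path = path.rstrip("/") or "/"  # treat /foo and /foo/ alike
    if path == "/devnotes":
        return "/dev-notes/overview"
    m = re.fullmatch(r"/devnotes/posts/([^/]+)", path)
    if m:
        slug = m.group(1)
        return "/dev-notes/" + DEVNOTE_OVERRIDES.get(slug, slug)
    m = re.fullmatch(r"/code_reference/([^/]+)", path)
    if m:
        # Underscored module names become hyphenated page slugs.
        return "/code-reference/topic-overviews/" + m.group(1).replace("_", "-")
    m = re.fullmatch(r"/concepts/mcp/([^/]+)", path)
    if m:
        slug = m.group(1)
        return "/concepts/tool-use-mcp/" + MCP_OVERRIDES.get(slug, slug)
    return None


print(legacy_to_fern("/code_reference/column_configs/"))
```

A table of expected (legacy, new) pairs run through a function like this
could double as a checklist for the smoke test.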

Smoke-tested all 34 legacy URLs against http://localhost:3000 — every
one resolves to a 200 page on the new structure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Kirit93 <kthadaka@nvidia.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Kirit93 <kthadaka@nvidia.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>