Emit DeprecationWarning whenever the legacy "implicit default provider" path is exercised: `ModelConfig.provider=None`, the registry-level `ModelProviderRegistry.default`, the YAML `default:` key in `~/.data-designer/model_providers.yaml`, and the CLI's "Change default provider" workflow.
`resolve_model_provider_registry` skips passing `default=` in the single-provider case so the common construction path stays quiet. Multi-provider registries still pass `default` (per `check_implicit_default`) and warn accordingly.
Update docs, the package README, and test fixtures to specify `provider=` explicitly on every `ModelConfig`. New tests cover each warning entry point and pin the post-deprecation happy paths.
Refs #589
Made-with: Cursor
|
Docs preview: https://a769da61.dd-docs-preview.pages.dev
|
Review: PR #594 —
|
Greptile Summary
This PR adds deprecation warnings across all six entry points for implicit default-provider routing (
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py | New canonical helper; exact-or-dotted prefix matching prevents false positives; warn_explicit registry threading is correct; stacklevel=3 fallback is acknowledged best-effort. |
| packages/data-designer-engine/src/data_designer/engine/model_provider.py | model_fields_set guard correctly distinguishes explicit default= from field-default None; single-provider early return keeps common path quiet under deprecation. |
| packages/data-designer/src/data_designer/cli/repositories/provider_repository.py | Warning emission split out of the broad except block to prevent filterwarnings("error") from swallowing it; migrated to warn_at_caller for correct attribution. |
| packages/data-designer/src/data_designer/interface/data_designer.py | Scoped catch_warnings + filterwarnings("ignore") correctly suppresses the duplicate registry-level warning when the YAML path already warned; nested catch_warnings is properly supported by Python. |
| packages/data-designer/tests/interface/test_data_designer.py | test_init_yaml_default_emits_single_deprecation_warning uses stacklevel=2 in the mock rather than warn_at_caller, attributing to the DataDesigner frame; passes under always filter but does not mirror production attribution. |
Reviews (5): Last reviewed commit: "Merge branch 'main' into nmulepati/refac..."
|
Thanks for putting this together, @nabinchha. The four entry points are mapped cleanly to issue #589 and the regression tests pin each one. I had a few thoughts after a careful read with edge cases in mind.
Summary
This PR lands the deprecation phase for the implicit-default-provider concept tracked in #589:
Findings
Warnings: Worth addressing
Suggestions: Take it or leave it
What Looks Good
Verdict
Needs changes: Greptile's two findings (the swallowed-warning bug and the
This review was generated by an AI assistant.
Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", category=DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.
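A minimal sketch of the failure mode described above, using stand-in functions rather than the real `ProviderRepository.load`. The function names, the message text, and the dict-based config are assumptions made for illustration only:

```python
import warnings

# Hypothetical message text; the real wording lives in provider_repository.py.
MESSAGE = "the YAML `default:` key is deprecated"


def load_buggy(raw):
    """Warn inside the catch-all: under a strict filter the warn raises,
    and the broad except swallows it."""
    try:
        config = dict(raw)
        if config.get("default") is not None:
            warnings.warn(MESSAGE, DeprecationWarning, stacklevel=2)
        return config
    except Exception:
        return None  # the valid config is silently lost


def load_fixed(raw):
    """Parse inside the catch-all, warn outside it."""
    try:
        config = dict(raw)
    except Exception:
        return None
    if config.get("default") is not None:
        warnings.warn(MESSAGE, DeprecationWarning, stacklevel=2)
    return config


def run_strict(loader, raw):
    """Call loader(raw) with DeprecationWarning promoted to an error."""
    with warnings.catch_warnings():
        warnings.filterwarnings("error", category=DeprecationWarning)
        return loader(raw)
```

With the strict filter active, `run_strict(load_buggy, raw)` returns `None` for a config that carries `default`, while `run_strict(load_fixed, raw)` lets the `DeprecationWarning` propagate to the caller instead of dropping the config.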
Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` use `stacklevel=2`, which lands inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).
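The frame walk can be sketched roughly as follows. This is not the code in `warning_helpers.py`; the `_sketch` names, the message text, and the simple `startswith` check are assumptions, and only the overall shape (skip helper and validator frames, skip pydantic frames, emit with `warnings.warn_explicit` and the caller's `__warningregistry__`) comes from the description above:

```python
import sys
import warnings


def warn_at_caller_sketch(message, category=DeprecationWarning, skip_prefixes=("pydantic",)):
    """Attribute a warning to the first frame outside skip_prefixes."""
    # Frame 0 is this helper and frame 1 is the validator that called it,
    # so the walk starts two frames up.
    frame = sys._getframe(2)
    while frame is not None:
        module_name = frame.f_globals.get("__name__", "")
        if not module_name.startswith(skip_prefixes):
            break
        frame = frame.f_back
    if frame is None:
        # Best-effort fallback when the walk runs off the stack.
        warnings.warn(message, category, stacklevel=3)
        return
    warnings.warn_explicit(
        message,
        category,
        filename=frame.f_code.co_filename,
        lineno=frame.f_lineno,
        module=module_name,
        # Reusing the caller's registry keeps once-per-location dedup keyed
        # on the caller's (filename, lineno).
        registry=frame.f_globals.setdefault("__warningregistry__", {}),
    )


def validator_sketch():
    # Stands in for a pydantic validator that calls the helper.
    warn_at_caller_sketch("provider=None is deprecated (hypothetical message)")
```

Calling `validator_sketch()` from user code records the warning against the file and line of that call, not against the helper or the validator.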
johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.
johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.
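The dedup can be illustrated with plain functions. The three `_sketch` functions and their message strings below are hypothetical stand-ins for `get_default_provider_name()`, the registry construction, and `DataDesigner.__init__`; only the scoped `warnings.catch_warnings()` pattern is taken from the description above:

```python
import warnings


def get_default_provider_name_sketch(yaml_default):
    # First warning: the YAML `default:` key itself.
    if yaml_default is not None:
        warnings.warn(
            "YAML `default:` is deprecated; specify provider= explicitly "
            "on each ModelConfig instead",
            DeprecationWarning,
            stacklevel=2,
        )
    return yaml_default


def build_registry_sketch(default):
    # Would be the second warning for the same root cause.
    if default is not None:
        warnings.warn("registry-level default is deprecated", DeprecationWarning, stacklevel=2)
    return {"default": default}


def init_sketch(yaml_default):
    default = get_default_provider_name_sketch(yaml_default)
    # Scope the suppression tightly: only the registry construction runs
    # with DeprecationWarning ignored, and filters are restored on exit.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        return build_registry_sketch(default)
```

A caller of `init_sketch("nvidia")` sees the YAML warning once, and the registry-level duplicate is dropped inside the `with` block.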
johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.
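As a rough illustration of the tightened predicate, the sketch below uses a standard-library dataclass in place of the real pydantic model, so it says nothing about how the validator is actually wired. `RegistrySketch`, `count_warnings`, and the message text are hypothetical:

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegistrySketch:
    providers: List[str]
    default: Optional[str] = None

    def __post_init__(self):
        # Tightened predicate: an explicit default=None is an opt-out and
        # stays quiet; only a real default name triggers the nudge.
        if self.default is not None:
            warnings.warn(
                "registry-level default is deprecated; specify provider= explicitly",
                DeprecationWarning,
                stacklevel=3,  # __post_init__ -> generated __init__ -> caller
            )


def count_warnings(**kwargs):
    """Construct a RegistrySketch and report how many warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        RegistrySketch(providers=["a", "b"], **kwargs)
    return len(caught)
```

Omitting `default` and passing `default=None` both stay quiet, while `default="a"` emits one warning.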
johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.
Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
roundtrip stops exercising the deprecated YAML-default path
unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
`test_get_provider`, and
`test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
in `pytest.warns` so the suite stays green under
`-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
pin the cascade-dedup behavior.
Refs #589
Made-with: Cursor
|
Thanks for the careful reads, @greptile-apps and @johnnygreco. Pushed
Blockers (P1/P2): fixed
Greptile P1 / johnnygreco:
Greptile P2 / johnnygreco: pydantic validator stacklevel. Introduced
Worth addressing: fixed
johnnygreco:
johnnygreco: cascade of two warnings on a single
Take-it-or-leave-it: taken
johnnygreco:
johnnygreco: no regression test for the
Test hygiene (Greptile notes, addressed)
Take-it-or-leave-it: left
johnnygreco:
Greptile: "warning storm" from
johnnygreco: "two distinct nudges" debate on the cascade. I chose dedup (one warning per user action) over two separate nudges. The YAML-default warning already explains the migration; firing the registry-level deprecation immediately afterward for the same root cause was confusing without adding information. Easy to flip if the team prefers the louder option.
Test counts
|
|
The implementation in P1 ( P2 ( Cascade dedup (
One minor observation:
Everything else, dedup via
|
No blockers from me — both Greptile points are addressed correctly and the A couple of small follow-ups (do them here or in a fast-follow, your call):
Smaller nits, only worth chasing if you're already in the file:
|
    """
    default = _get_default_providers_file_content(MODEL_PROVIDERS_FILE_PATH).get("default")
    if default is not None:
        warnings.warn(
follow-up to johnnygreco's warn_at_caller work: this site still uses warnings.warn(stacklevel=2), so on the only real call path (DataDesigner.__init__:162) the warning is attributed to the data_designer library, not user code. python's default filter is default::DeprecationWarning:__main__ + ignore::DeprecationWarning, so library-attributed deprecations get silenced — verified empirically: a normal DataDesigner() call with a YAML default: set shows nothing under default filters. could either fire the warning from the __init__ boundary, or call warn_at_caller here too (with a small skip-list extension for data_designer.). non-blocking but worth doing in the same cycle while the deprecation messaging is fresh.
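The filter behaviour described in this comment can be reproduced in isolation. The sketch below installs two filters that mirror the pair quoted above and emits a warning attributed to a chosen module name. The function name, the message, the filename `example.py`, and the module name `data_designer.some_module` are illustrative assumptions:

```python
import warnings


def shown_under_default_filters(module_name):
    """Return True if a DeprecationWarning attributed to module_name is shown."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        # filterwarnings() inserts at the front of the list, so the
        # __main__ rule added second is consulted first, matching the
        # order default::DeprecationWarning:__main__, ignore::DeprecationWarning.
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        warnings.filterwarnings("default", category=DeprecationWarning, module="__main__")
        warnings.warn_explicit(
            "deprecated path (hypothetical message)",
            DeprecationWarning,
            filename="example.py",
            lineno=1,
            module=module_name,
            registry={},
        )
    return len(caught) == 1
```

Under these filters a warning attributed to `__main__` is shown, while one attributed to `data_designer.some_module` is dropped, which is why attribution decides whether users ever see the nudge.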
    frame = sys._getframe(2) if hasattr(sys, "_getframe") else None
    while frame is not None:
        module_name = frame.f_globals.get("__name__", "")
        if not module_name.startswith("pydantic"):
related to johnnygreco's nit about startswith("pydantic") matching pydantic_helpers.py, there's a related issue going the other direction: when a ModelConfig or ModelProviderRegistry is constructed inside a data_designer helper (e.g. config builders, YAML loaders, resolve_model_provider_registry), the first non-pydantic frame is data_designer code, not the user's call site. the warning gets stamped at the library and silenced under default DeprecationWarning filters. confirmed via repro: resolve_model_provider_registry([a, b]) ends up attributed to model_provider.py:108. extending the skip to data_designer. (or accepting caller-supplied prefixes) would close the gap. easy to add a regression test asserting warning.filename lands on the test file rather than a library module.
andreatgretel (PR #594): the YAML-default warning in `get_default_provider_name` and the registry-default warning emitted from inside DataDesigner helpers were attributing to data_designer library frames, not user code. Python's default filter chain includes `ignore::DeprecationWarning`, so library-attributed entries are silenced. As a result, a normal `DataDesigner()` call with a YAML `default:` set showed nothing, and `resolve_model_provider_registry` warnings were similarly invisible.

Two related changes:

1. `warn_at_caller`: extend the default skip-list from `("pydantic",)` to `("pydantic", "pydantic_core", "data_designer")` so the walk escapes both pydantic's validator-dispatch frames and data_designer helper frames before attributing. Also tighten the prefix predicate to exact-or-dotted-prefix matching (`name == p or name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not falsely matched as part of `pydantic` (johnnygreco nit). Allow callers to pass a custom `skip_prefixes` for flexibility. Drop the "skip frame 0+1 unconditionally" guard now that prefix matching covers it.
2. `get_default_provider_name`: switch from `warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous stacklevel pointed into `default_model_settings.py`, which is a library file → silenced under default filters. Verified the fix empirically with `python -W default`: warning is now attributed to the user's call site and rendered.

johnnygreco (PR #594): add the missing `test_explicit_default_none_does_not_emit_deprecation_warning` regression for the `self.default is not None` predicate landed in the prior round.

Tests:
- New `test_warning_helpers.py` pins prefix-matching precision (rejects `pydantic_helpers` / `data_designer_other`), default skip-list contents, attribution past skip-prefix frames, and per-call-site dedup behavior.
- `test_get_default_provider_name_warning_attributes_to_user_frame` pins andreatgretel's repro for the YAML-default site.
- `test_explicit_default_warning_attributes_to_user_frame` pins the multi-frame case: construction goes through `resolve_model_provider_registry`, so the walk has to escape both pydantic and data_designer before landing on the test file.
- `test_explicit_default_none_does_not_emit_deprecation_warning` pins johnnygreco's predicate-tightening regression.

3,124 tests pass (548 config + 1,923 engine + 653 interface; +10 net from this round).

Refs #589
Made-with: Cursor
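The exact-or-dotted-prefix predicate quoted above is small enough to show on its own. In this sketch the constant and function names are hypothetical; the tuple contents and the `name == p or name.startswith(p + ".")` expression come from the commit message:

```python
# Skip-list contents as quoted in the commit message; the constant name
# here is a stand-in, not necessarily the one used in warning_helpers.py.
SKIP_PREFIXES_SKETCH = ("pydantic", "pydantic_core", "data_designer")


def is_internal(module_name, skip_prefixes=SKIP_PREFIXES_SKETCH):
    """True when module_name is a listed package or one of its submodules."""
    return any(
        module_name == prefix or module_name.startswith(prefix + ".")
        for prefix in skip_prefixes
    )


def is_internal_naive(module_name, skip_prefixes=SKIP_PREFIXES_SKETCH):
    """The looser check: a bare startswith also matches sibling packages."""
    return module_name.startswith(tuple(skip_prefixes))
```

The difference shows up on a name like `pydantic_helpers`: the naive check treats it as part of `pydantic`, while the exact-or-dotted check does not.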
|
Thanks @andreatgretel and @johnnygreco. Pushed 247fa30 addressing both review notes.
Address
andreatgretel:
andreatgretel:
johnnygreco: prefix-collision nit (
johnnygreco: missing
Other notes from the May-1 review
Tests
New regressions:
|
greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`, which attributes to whichever data_designer frame called `load()`: controllers, services, list/reset commands, agent introspection. Every real call path lands on `data_designer.cli.*`, which falls under Python's default `ignore::DeprecationWarning` filter and is silenced.

Audit found two more sites with the same problem:
- `DatasetBuilder._resolve_async_compatibility` (`allow_resize` / issue #552): was using `stacklevel=4` to walk past `_resolve_async_compatibility -> build/build_preview -> interface -> user`. Brittle: any added frame (decorator, async wrapping, the `try/except DeprecationWarning: raise` boundary) shifts attribution silently. The existing test passed only because it used `simplefilter("always") + record=True`, which records warnings regardless of attribution.
- `ProviderController._handle_change_default`: was using `stacklevel=2`, which lands on the menu dispatcher in the same controller module. `print_warning` already shows the message visually, but programmatic observers (`pytest.warns`, `filterwarnings("error", ...)`) saw a library-attributed entry that default filters silenced.

All three migrated to `warn_at_caller` (the helper from 247fa30) so attribution lands on the user's call site regardless of internal chain shape. `data_designer` is already in `DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library in one pass.

Added attribution regression tests at each site asserting `warning.filename == __file__`. A future regression to `warnings.warn(stacklevel=N)` now fails CI instead of silently silencing the user-facing nudge:
- `test_load_with_yaml_default_attributes_warning_to_caller` (test_provider_repository.py)
- `test_resolve_async_compatibility` extended with the same assertion
- `test_handle_change_default_emits_deprecation_warning` rewritten from `pytest.warns(...)` to a `catch_warnings(record=True)` block that filters for the message and asserts `filename == __file__` (`pytest.warns` does not check attribution, so the rewrite is required to actually catch the regression).

3,125 tests pass (548 config + 1,923 engine + 654 interface).

Refs #589
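To show why a presence-only check is not enough, the sketch below records a warning that is deliberately attributed to a library path and then applies both kinds of assertion. All names, the message text, and the file paths are hypothetical, and `warnings.warn_explicit` is used only to force the attribution:

```python
import warnings


def record_always(emit):
    """Run emit() and record every warning, regardless of attribution."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        emit()
    return caught


def emit_library_attributed():
    # Simulates a warning whose stacklevel landed inside the library.
    warnings.warn_explicit(
        "deprecated path (hypothetical message)",
        DeprecationWarning,
        filename="/site-packages/data_designer/cli/example.py",
        lineno=10,
        module="data_designer.cli.example",
        registry={},
    )


def attributed_to(caught, fragment, expected_filename):
    """Filter recorded warnings by message, then check their attribution."""
    return any(
        fragment in str(w.message) and w.filename == expected_filename
        for w in caught
    )
```

A check that only counts recorded warnings passes for `emit_library_attributed`, whereas `attributed_to(caught, "deprecated", "/home/user/script.py")` fails. That filename comparison is the kind of check the rewritten tests perform.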
* docs: migrate documentation from MkDocs to Fern Adds a Fern Docs build under fern/ alongside the existing mkdocs site. Production target docs.nvidia.com/nemo/datadesigner with floating-latest pointer (latest.yml symlink) at v0.5.8. Migrated all concept, recipe, plugin, dev-note, and tutorial pages to MDX with NVIDIA theme and custom components (Authors, MetricsTable, TrajectoryViewer, NotebookViewer, BadgeLinks). Tutorial notebooks now render via NotebookViewer with captured outputs (text, DataFrames, inline images) - new make targets generate-fern-notebooks and generate-fern-notebooks-with-outputs drive the .py -> executed .ipynb -> Fern JSON+TS pipeline, pinning docs to Python 3.13 to dodge pyarrow wheel issues on 3.14. Python API reference is configured via Fern libraries: pointing at data-designer-config; output is gitignored and regenerated locally with 'fern docs md generate'. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: add datadesigner-docs agent skill Captures the patterns established in the Fern migration so agents (and humans) can maintain fern/ confidently. Modeled after NVIDIA-NeMo/Gym's nemo-gym-docs SKILL.md, adapted for our floating-latest versioning, notebook-with-outputs pipeline, dev-notes kit components, and the MDX gotchas hit during migration (pymdown attr_list, --8<-- snippet syntax, frontmatter authors-as-JSX-scope-variable, etc.). Routes triggers like "edit docs", "add doc page", "regenerate notebooks", "update dev note", "add API reference" to this skill. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: address PR review for Fern migration - Delete stale fern/versions/_nav_order.yml (references non-existent ./versions/latest/pages/ — paths were never updated when latest/ was renamed to v0.5.8/, no consumer found in docs.yml or v0.5.8.yml). 
- Remove unused custom components: Tag.tsx, CustomCard.tsx, Include.tsx (had its own untested markdown parser), ExpandableCode.tsx (broken in Fern SSR runtime). Drop expandable-code.css from docs.yml. Authors, BadgeLinks, MetricsTable, NotebookViewer, TrajectoryViewer remain (each has at least one call site). - BadgeLinks: remove DEFAULT_BADGES with placeholder URLs; make `badges` prop required so we can never accidentally ship 'your-org/your-repo'. - NotebookViewer: document the XSS trust boundary on output cells of format: "html". Outputs flow .py source → jupytext --execute → committed *.ts (review boundary). Add an inline comment at the dangerouslySetInnerHTML call site pointing back to the trust-model section. - README: add Windows caveat on the latest.yml symlink — Windows users need core.symlinks=true before clone or Fern will reject the version config. - Makefile: tighten generate-fern-notebooks source probe from `ls .../*.ipynb` (which can return success on non-file errors) to `[ -f docs/notebooks/1-the-basics.ipynb ]`, matching the reviewer's suggestion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: address @aschilling-nv review on fern/docs.yml Three suggestions from the Fern review, all matching Curator's docs.yml conventions: - instances[0].url: drop the https:// protocol prefix to match Curator's shape (e.g. nemo-curator.docs.buildwithfern.com/nemo/curator). - logo.href: was '/'; now points at /nemo/datadesigner/getting-started/welcome (the actual landing page) so clicking the logo lands on real content instead of the bare basepath. - experimental.basepath-aware: true — opts into Fern's basepath-aware routing so internal links don't double-prefix the /nemo/datadesigner segment. 
- redirects: also fix /nemo/datadesigner/index.html → getting-started/welcome (was bouncing to /latest, which is just the version slug); add /getting-started → /getting-started/welcome to mirror Curator's /home → /home/welcome convention. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: put dev notes overview timestamps on separate lines Signed-off-by: Kirit93 <kthadaka@nvidia.com> Made-with: Cursor * docs: redesign dev-notes index with BlogCard component Replaces the generic <CardGroup>/<Card> grid (same green icon × 10, date glued to bottom of description) with a purpose-built BlogCard for the dev-notes landing page. Each card now has: - Hero image (16:9, lazy-loaded, click-to-zoom via Fern's rmiz wrapper) - ALL-CAPS date eyebrow as proper subtitle styling - Title, 3-line clamped description - Author byline at the bottom: avatar stack (overlapping) + first author name + "+N", pulling from the existing devnotes/.authors.yml registry - Hover: NVIDIA-green border + subtle lift Posts without a hero image fall back to a deterministic hash-based gradient placeholder + monogram (DJB2 hash of href → HSL hue, with the muddy-yellow band 40–90° remapped). Same post always gets the same look. Notes: - Image prop is React.ReactNode (not string) — pass <img> JSX from MDX so Fern's link rewriter can resolve the src to /_local/... in dev and /nemo/datadesigner/assets/... in prod. Raw string props bypass the rewriter and 404 in dev. - Card href runs through a small withBasepath() helper since the <a> also bypasses Fern's link rewriter. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: flush blog-card hero images to the top of the card Fern's prose stylesheet applies a top margin to <img> tags, and the click-to-zoom wrapper Fern injects around each image (<span data-rmiz>) inherits that margin too. 
Result: a ~1rem gap between the card's top edge and the hero image. Reset margin/padding on the rmiz wrapper spans + the img itself inside .blog-card__media so the image renders flush against the top border. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: stop blog-card hero from opening Fern's click-to-zoom modal When an <img> appears in MDX, Fern auto-wraps it with a click-to-zoom shell (<span data-rmiz>...). On the dev-notes index that shell intercepts clicks meant for the card's <a> wrapper, so clicking a hero opens a lightbox AND tries to navigate. Set pointer-events: none on the rmiz spans + img inside .blog-card__media so clicks bubble straight to the parent <a> and the card behaves as a single, predictable link target. Hover still works because pointer-events on children doesn't block :hover on the ancestor <a>. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: render notebook markdown at build time with markdown-it-py Replaces NotebookViewer's hand-rolled JS markdown parser (the one with the ^@br^@ sentinel the reviewer flagged as fragile) with build-time rendering in the converter. ipynb-to-fern-json.py now uses markdown-it-py (CommonMark + tables + strikethrough + raw HTML) to render each markdown cell's source into source_html, mirroring how code cells already store Pygments-highlighted source_html. NotebookViewer's markdown branch becomes a single dangerouslySetInnerHTML on the pre-rendered HTML, with a plain-escape fallback for old snapshots. Removes the dead JS helpers (renderMarkdown, isSafeUrl, UL_CLASS, OL_CLASS) — ~60 lines of brittle regex-based markdown parsing. Fixes broken rendering of: - Blockquotes (showed literal > characters before) - Nested content inside blockquotes (e.g. 
blockquote with bullet list) - Fenced code blocks - Tables - Multi-paragraph list items Includes regenerated fern/components/notebooks/*.{json,ts} for all 6 tutorials. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs: rewrite recipes index + replace octicons download links with Fern Info callouts The recipes/cards.mdx page was still in MkDocs Material format: - <div class="grid cards" markdown> wrapper (no-op in MDX) - :material-snake:, :material-database:, :material-tools:, etc. (rendered as literal text — Fern uses Font Awesome, not Material icons) - !!! tip Prerequisite (mkdocs admonition syntax) - [:material-book-open-page-variant: View Recipe] / [Download Code :octicons-download-24:] links with embedded icon shortcodes Rewrite using Fern's native components: <CardGroup cols={2}> with <Card title icon href> grouped by category (Code Generation, QA and Chat, Trace Ingestion, MCP and Tool Use, Plugin Development). Each card has one primary action (the recipe page); download lives on the recipe page itself. Replace the trailing "Download Code :octicons-download-24:" link on every recipe page (and 2 dev notes) with a <Info title="Download Recipe"> callout pointing at the GitHub blob URL — matching PR #215's convention. 12 occurrences across 12 files. Also fixes 6 recipe pages whose frontmatter title was "Untitled" (unfilled placeholder from auto-migration): text_to_python, basic_mcp, pdf_qa, multi_turn_chat, product_info_qa, agent_rollout_distillation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): mirror main's content updates into v0.5.8 MDX pages Forward-port the doc changes that landed in main since this branch was cut, translating MkDocs admonition syntax to Fern components. 
Three product changes drove the updates: PR #594 — deprecate implicit default-provider routing: - concepts/models/configure-model-settings-with-the-cli.mdx: deprecate "Change default provider" workflow + inline mark on `data-designer config list` output - concepts/models/custom-model-settings.mdx: warning that `provider=` is now required on every ModelConfig - concepts/models/default-model-settings.mdx: warning that the registry-level default-provider concept is deprecated - concepts/models/model-providers.mdx: same warning at the top of the ModelProvider overview - concepts/models/inference-parameters.mdx: add explicit `provider= "openai"` to the dalle ModelConfig example PR #592 — async engine becomes the default: - concepts/architecture-and-performance.mdx: rewrite Execution Model intro to mention both engines, qualify "How It Works" as sync-engine semantics, update Concurrency Formula and Throttle notes from "Sync engine caveat" to "Engine paths", and add a full new "## Async Engine" section (per-model timeouts, run outcomes / Early Shutdown, opt-out via DATA_DESIGNER_ASYNC_ENGINE=0). Add `provider="nvidia"` to the my-model example. - concepts/custom_columns.mdx: note that sync `cell_by_cell` generators dispatch concurrently under the async engine; mock with `MagicMock(spec=ModelFacade)` so async methods are auto-detected. - concepts/processors.mdx: warning that the async engine enforces row-count invariance in process_before/after_batch. - devnotes/posts/async-all-the-way-down.mdx: append an "Update" callout noting the engine is now default, with a link to the Architecture page anchor. All `!!! warning|note|tip "Title"` admonitions converted to Fern <Warning|Note|Tip title="..."> components. Internal links to mkdocs relative paths (`../../concepts/foo.md#anchor`) rewritten to canonical Fern URLs (`/concepts/foo#anchor`). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> * docs(fern): address @andreatgretel review comments Four issues from Andre's review pass: 1. /devnotes 404 (index.mdx:23) — section slug is /dev-notes, page slug is /dev-notes/overview. Fix the link in the landing page so visitors actually reach the dev notes index. 2. TrajectoryViewer.tsx final-answer body shown as literal markdown (line 66) — the renderer uses dangerouslySetInnerHTML but example-marcia.ts shipped raw markdown (**bold**, \n\n breaks). Visible on the deep-research devnote where the trajectory is defaultOpen. Pre-render body to HTML in the fixture (matches the original hand-coded format pre-migration); document the convention in the ToolCall.body doc comment so future fixtures don't regress. 3. Tutorials 5/6 (image generation/editing) ship with 0 captured outputs because Flux runs through OpenRouter and OPENROUTER_API_KEY isn't set at build time. Cannot regenerate without the key, so add a <Note> at the top of each wrapper page pointing readers at the Colab link to execute the cells live and see the generated images. Maintainers with the key in their environment should re-run `make generate-fern-notebooks-with-outputs` before merge to capture the snapshots. 4. Legacy nvidia-nemo.github.io/DataDesigner/* URLs in MDX prose (8 occurrences across 5 files) rewritten to canonical Fern paths so visitors don't get sent back to the legacy GitHub Pages site once docs.nvidia.com/nemo/datadesigner becomes the production URL: - The single deep link in data-designer-got-skills.mdx → /concepts/models/default-model-settings - All other "documentation home" links (CONTRIBUTING ×2, async-all-the-way-down ×2, owning-the-model-stack, design-principles ×2) → /getting-started/welcome (the canonical landing slug, matches logo.href in docs.yml) Notebook .py source URLs are tracked separately as part of the notebook-regen work. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): regenerate notebook snapshots with Flux outputs captured

Re-ran make generate-fern-notebooks-with-outputs with NVIDIA_API_KEY + OPENROUTER_API_KEY set, now that we have a NVIDIA key with permission on nemotron-3-nano-30b-a3b. All 6 tutorials regenerated; the two image tutorials (5 and 6) which had been shipping with 0 outputs now have captured Flux generations:

1 the-basics: 12/15 outputs
2 structured-outputs: 13/17 outputs
3 seeding-with-a-dataset: 10/13 outputs
4 providing-images: 13/17 outputs (1 image)
5 generating-images: 8/10 outputs (2 images) ← was 0/12
6 image-to-image-editing: 9/12 outputs (10 images) ← was 0/14

The two `<Note title="Run in Colab to see ...">` workarounds I added on the 5/6 wrapper pages are no longer needed; outputs render inline now. NotebookViewer's own "Run in Google Colab" banner is still rendered from the wrapper's `colabUrl` prop, so the live-execute path stays one click away.

Bumps the diff size noticeably (notebook 6 .ts is ~22MB of base64-encoded PNGs from 10 edited images), but that's intentional: these images are the proof points for what the Flux/MCP image-context tutorials actually produce.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): unbreak SSR - shrink notebook image outputs + fix BlogCard React import

Two server-side render bugs surfaced when running `fern docs md generate && fern docs dev` (the static-preview path):

1. The 22 MB notebook 6 .ts module (full-resolution Flux PNGs from 10 edited images) tripped Fern's SSR module-evaluation step. Once that module failed to evaluate, the shared component bundle failed to load on every page, replacing each MDX body with `<span data-intent="error">Something went wrong!</span>` while the layout chrome continued to render.

   Fix in fern/scripts/ipynb-to-fern-json.py: after extracting an image/png output, pass it through Pillow to (a) downscale so the longest edge is at most 800 px, (b) re-encode as JPEG q=82 progressive (Flux outputs are photographic; JPEG compresses 5–10× better than PNG for this content). NotebookViewer's CellOutput interface gains a `mime` field so the data URL uses the actual encoded MIME type.

   Result:
   notebook 6: 22 MB → 4.6 MB
   notebook 5: 3.8 MB → 1.8 MB
   notebook 4: 514 KB → 116 KB
   (notebooks 1–3 unaffected, no image outputs)

2. fern/components/BlogCard.tsx referenced `React.ReactNode` twice without importing React. Other components in the kit use `import type { ReactNode } from "react"`; BlogCard was the outlier. Aligned the import style; even though this didn't end up being the trigger, leaving the dangling reference would have eventually caused a strict-mode SSR regression.

Sweep test against http://localhost:3000/nemo/datadesigner/*: landing, concepts, tutorials (including 5/6 image notebooks), dev notes, recipes, and code-reference topic pages all render with their content; no error spans.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

* docs(fern): add MkDocs-shape redirects for legacy URLs

The legacy site at https://nvidia-nemo.github.io/DataDesigner/ used MkDocs-Material conventions (mkdocstrings + blog plugin + mkdocs-jupyter + directory URLs). Several path segments and page slugs differ from Fern's slugified-title routing; search-engine indexed links and copy-pasted bookmarks land on 404 without redirects.

Adds 30+ specific redirect rules covering every renamed surface:

- Tutorials: /notebooks/<filename>/ -> /tutorials/<title-slug> (page-title slugs differ from .ipynb filenames; one rule per notebook plus a README -> overview alias).
- Recipes: /recipes/<snake_subsection>/<snake_page>/ -> /recipes/<kebab-subsection>/<kebab-page>. Per-page rules for each of the 10 recipes (page titles diverged from .py filenames, e.g. basic_mcp -> basic-mcp-tool-use, search_agent -> nemotron-super-search-agent), followed by subsection :rest* fallbacks.
- Concepts: /concepts/mcp/* -> /concepts/tool-use-mcp/* (subsection rename, with & dropped, not -and-). Per-page rules for safety-and-limits -> safety-limits and configure-mcp-cli -> cli-configuration where page titles diverged from filenames.
- Code Reference: /code_reference/<module>/ -> /code-reference/topic-overviews/<module>. Per-page rules for the six underscored modules (column_configs, config_builder, run_config, sampler_params, validator_params, data_designer_config) since Fern's page-slug rule kebabs underscores.
- Plugins: filesystem_seed_reader -> file-system-seed-reader-plugins (Fern inserts hyphens between CamelCase words). example -> example-plugin, available -> available-plugin-list (page-title slugs).
- Dev Notes: blog plugin's /devnotes/posts/<slug>/ -> /dev-notes/<slug>. Per-page rules for text-to-sql -> text-to-sql-for-nemotron-super and rqa -> rqa-dataset (post titles diverged from filenames).
- /devnotes -> /dev-notes/overview (section landing).

MkDocs's directory-URL trailing-slash convention is handled natively by Fern's runtime (both /foo and /foo/ return the same page), so no explicit slash-strip rule is needed.

Smoke-tested all 34 legacy URLs against http://localhost:3000; every one resolves to a 200 page on the new structure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Kirit93 <kthadaka@nvidia.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Kirit93 <kthadaka@nvidia.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>
📋 Summary
Deprecates the legacy "implicit default provider" routing before it's removed in a future release. Every entry point that exercises the implicit default (`ModelConfig.provider=None`, the registry-level `ModelProviderRegistry.default`, the YAML `default:` key, the CLI's "Change default provider" workflow, and the `allow_resize` async-fallback escape hatch) now emits a `DeprecationWarning` pointing users at the explicit `provider=` migration. Continues the work started in #591 / tracked under issue #589.

A second concern surfaced during review: `DeprecationWarning`s emitted from library frames are silenced under Python's default `ignore::DeprecationWarning` filter and dedupe against pydantic-internal lines. To make the new warnings actually visible to users, every emission site now goes through a small `warn_at_caller` helper that walks past `pydantic` and `data_designer` frames and attributes the warning to the user's call site.

🔗 Related Issue
Refs #589
🔄 Changes
✨ Added
Deprecation warnings (one per entry point):
- `ModelConfig._warn_on_implicit_provider`: pydantic post-validator that warns whenever `provider` is `None` (`packages/data-designer-config/src/data_designer/config/models.py`)
- `ModelProviderRegistry._warn_on_explicit_default`: fires only when the caller actually passed `default=` (uses `model_fields_set` so the field-default `None` path stays quiet) (`packages/data-designer-engine/src/data_designer/engine/model_provider.py`)
- `get_default_provider_name()` warns when the on-disk providers YAML carries a `default:` key
- `ProviderRepository.load` warns on the same YAML-default condition for the CLI read path
- `ProviderController._handle_change_default` warns when the user enters the "Change default provider" interactive workflow
- `DatasetBuilder._resolve_async_compatibility` warns when `allow_resize=True` forces sync fallback (separate deprecation under issue #552, surfaced through the same helper for consistency)

Visibility / attribution helper:
- `packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py` exporting `warn_at_caller`. Walks `sys._getframe` past every frame whose module belongs to `pydantic`, `pydantic_core`, or `data_designer`, then calls `warnings.warn_explicit` against the first user frame using that frame's own `__warningregistry__` so Python's once-per-location dedup keys correctly. Falls back to `warnings.warn` if no user frame is reachable.
- Prefix matching is exact-or-dotted (`module == prefix or module.startswith(prefix + ".")`) so `pydantic_helpers` is not mistaken for `pydantic`, and `data_designer_other` is not mistaken for `data_designer` (regression case from review).

Tests:
- `test_warning_helpers.py` cases covering the prefix-matching predicate and the frame-walk semantics (direct caller, library skip, fallback)
- Emission-site tests assert `warning.filename == __file__` (i.e. the warning is attributed to the user's frame, not a library frame). A regression to `warnings.warn(..., stacklevel=N)` would silently silence these warnings under default filters and now fails the assertion instead.

🔧 Changed
- `DeprecationWarning` emission sites now use `warn_at_caller` instead of `warnings.warn(..., stacklevel=N)`. `stacklevel=N` is brittle (any added frame breaks it) and lands on a `data_designer.*` frame for every realistic call path through controllers, services, builders, and the interface layer, where it is silenced under `ignore::DeprecationWarning`.
- `resolve_model_provider_registry` skips passing `default=` in the single-provider case so the common construction path stays quiet under the new warning. Multi-provider registries still pass `default` (per `check_implicit_default`) and warn accordingly.
- `stub_model_configs` fixture and existing `ModelConfig`-constructing tests now pass `provider=` explicitly so they don't trip the new warning
- `ModelConfig.provider` and `ModelProviderRegistry.default` annotated as deprecated

📚 Docs
- `docs/concepts/models/model-providers.md`, `default-model-settings.md`, `custom-model-settings.md`, and `configure-model-settings-with-the-cli.md`
- `docs/concepts/architecture-and-performance.md`, `inference-parameters.md`, and the `data-designer-config` README updated to set `provider=` explicitly

🔍 Attention Areas
- `packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py`: uses `sys._getframe` and `warnings.warn_explicit`. The module docstring spells out (1) why `stacklevel=N` is wrong for warnings emitted from a pydantic validator or library helper, (2) why `module_globals` is deliberately omitted from the `warn_explicit` call (the `__main__`-as-`BuiltinImporter` `linecache` failure mode), and (3) the dedup-key reasoning. New canonical pattern, so review with that bar in mind.
- `packages/data-designer-engine/src/data_designer/engine/model_provider.py`: `_warn_on_explicit_default` uses `model_fields_set` to distinguish "caller passed `default=`" from "field at default `None`". The single-provider `resolve_model_provider_registry` tweak relies on this distinction so common construction paths stay quiet. Worth a careful read.
- `packages/data-designer-config/src/data_designer/config/models.py`: `_warn_on_implicit_provider` runs at construction time, so any `ModelConfig` built without `provider=` (including legacy serialized configs loaded via `model_validate`) will now emit a warning. Confirm this is the intended blast radius.

🧪 Testing
- `make test` passes (3,125 tests: 548 config + 1,923 engine + 654 interface)
- `test_warning_helpers.py` covers `warn_at_caller` semantics
- Tests assert `warning.filename == __file__` at every emission site so a future regression to `warnings.warn(stacklevel=N)` fails CI instead of silently silencing the warning

✅ Checklist