Skip to content

Avoids bug in tensordict==0.12.x by upper-bounding tensordict version#1658

Open
peterdsharpe wants to merge 4 commits into
NVIDIA:2.1.0-rcfrom
peterdsharpe:psharpe/tensordict-compile-hotfix
Open

Avoids bug in tensordict==0.12.x by upper-bounding tensordict version#1658
peterdsharpe wants to merge 4 commits into
NVIDIA:2.1.0-rcfrom
peterdsharpe:psharpe/tensordict-compile-hotfix

Conversation

@peterdsharpe
Copy link
Copy Markdown
Collaborator

@peterdsharpe peterdsharpe commented May 20, 2026

PhysicsNeMo Pull Request

Limits tensordict upper version to <0.12 until the following torch.compile regressions are fixed, merged, and released:

Also adds a test that would have caught this regression earlier.

Description

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

… Mesh under torch.compile

- Adjusted the tensordict dependency in pyproject.toml to be upper-bounded due to regressions in version 0.12.x, with a note to drop the upper bound once the related PR is merged.
- Introduced a new test file for regression testing of the Mesh class to ensure compatibility with torch.compile, specifically addressing issues caused by the tensordict 0.12.x changes. The tests validate that cached properties and data fields behave correctly when compiled.
- Added a new entry in CHANGELOG detailing the fix for constructing a Mesh inside a torch.compile-traced function, addressing regressions from tensordict 0.12.0.
- Updated the mlflow and starlette package versions to 3.12.0 and 0.52.1 respectively, along with their corresponding source distribution and wheel URLs.
- Adjusted tensordict dependency constraints to ensure compatibility with the latest changes.
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@peterdsharpe peterdsharpe changed the title Psharpe/tensordict compile hotfix Avoids bug in tensordict==0.12.x by upper-bounding tensordict version May 20, 2026
@peterdsharpe peterdsharpe requested a review from abokov-nv May 20, 2026 21:53
@peterdsharpe
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This hotfix pins tensordict to >=0.11.0,<0.12 to avoid two torch.compile regressions introduced in tensordict 0.12.x (pytorch/tensordict#1708, #1710), and adds a dedicated regression test suite that constructs a Mesh inside a compiled function to prevent silent re-introduction of the bug on a future pin bump.

  • pyproject.toml: tensordict pin tightened to >=0.11.0,<0.12; redundant datapipes-extras tensordict entry removed (core dependency now satisfies the >=0.11.0 requirement already).
  • test/mesh/mesh/test_compile.py: Comprehensive regression tests cover cached-property access, data-field defaults, and the __post_init__ → cache round-trip, all under torch.compile.
  • uv.lock: Expected tensordict downgrade to 0.11.0; incidental side-effects include mlflow 3.11.1 → 3.12.0 and a starlette 1.0.0 → 0.52.1 downgrade (driven by mlflow-skinny 3.12.0 adding starlette as a direct dependency with an implicit upper bound).

Important Files Changed

Filename Overview
pyproject.toml Pins tensordict to >=0.11.0,<0.12 (well-commented) and removes the now-redundant datapipes-extras tensordict entry since core already covers >=0.11.0.
test/mesh/mesh/test_compile.py New regression test file; well-structured, covers cached properties, data-field defaults, and the full __post_init__→cache round-trip under torch.compile.
uv.lock tensordict downgraded to 0.11.0 (expected); incidental side-effects include mlflow 3.11.1→3.12.0 and starlette 1.0.0→0.52.1 downgrade driven by mlflow-skinny 3.12.0's new starlette dependency.
CHANGELOG.md CHANGELOG updated with both a Fixed entry and a Dependencies entry, clearly explaining the tensordict pin and upstream issue references.

Comments Outside Diff (1)

  1. uv.lock, line 7216-7232 (link)

    P2 Starlette version downgrade as lock-regen side-effect

    starlette moved from 1.0.00.52.1 as a side effect of mlflow-skinny 3.12.0 now declaring starlette as an explicit dependency — mlflow-skinny likely constrains it below 1.0.0. Since starlette is not a direct dependency of physicsnemo the practical risk is low, but it is worth confirming no other transitive consumer of starlette in the environment expects the 1.x API (which introduced breaking changes relative to 0.x).

Reviews (1): Last reviewed commit: "format" | Re-trigger Greptile

@ktangsali
Copy link
Copy Markdown
Collaborator

/blossom-ci

@peterdsharpe
Copy link
Copy Markdown
Collaborator Author

FYI, tensordict is planning to cut a patch-version release to fix this; after that lands, we can pin >=0.12.4.

pytorch/tensordict#1709 (comment)

@peterdsharpe
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

1 similar comment
@peterdsharpe
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants