
Generate a manifest file for Pytorch builds #2547

Merged
erman-gurses merged 74 commits into main from users/erman-gurses/add-pytorch-manifest
Feb 13, 2026

Contributor

@erman-gurses erman-gurses commented Dec 15, 2025

Closes #2481
Generate a manifest file for Pytorch builds.

Example: therock-manifest_torch_py3.11_release-2.7.json

{
  "pytorch": {
    "commit": "01078254ea873c32a090304cecc6a70c1c87132c",
    "repo": "https://github.com/ROCm/pytorch.git"
  },
  "pytorch_audio": {
    "commit": "95c61b4168fc5133be8dd8c1337d929d066ae6cf",
    "repo": "https://github.com/pytorch/audio"
  },
  "pytorch_vision": {
    "commit": "59a3e1f9f78cfe44cb989877cc6f4ea77c8a75ca",
    "repo": "https://github.com/pytorch/vision"
  },
  "triton": {
    "commit": "b2eaccd6b2b7909bf02ac7cb2dddfa87275c3f25",
    "repo": "https://github.com/ROCm/triton.git"
  },
  "therock": {
    "commit": "905eb116096064c5058b6c487239411637bab52e",
    "repo": "https://github.com/ROCm/TheRock.git",
    "branch": "users/erman-gurses/add-pytorch-manifest"
  }
}

https://therock-dev-artifacts.s3.us-east-2.amazonaws.com/21179896705-linux/manifests/gfx110X-all/therock-manifest_torch_py3.11_release-2.7.json
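A minimal sketch of how a manifest of this shape could be produced from local checkouts. It is not the merged `generate_pytorch_manifest.py`; the function names (`git_output`, `describe_checkout`, `build_manifest`, `write_manifest`) are hypothetical, and it assumes each source directory is a git checkout with an `origin` remote.

```python
import json
import subprocess
from pathlib import Path


def git_output(repo_dir: Path, *args: str) -> str:
    """Run a git command inside repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def describe_checkout(repo_dir: Path) -> dict:
    """Collect the HEAD commit and the origin URL of one checkout."""
    return {
        "commit": git_output(repo_dir, "rev-parse", "HEAD"),
        "repo": git_output(repo_dir, "remote", "get-url", "origin"),
    }


def build_manifest(checkouts: dict, describe=describe_checkout) -> dict:
    """Map each project name (e.g. "pytorch") to its commit/repo entry.

    `describe` is injectable so the function can be exercised without git.
    """
    return {name: describe(Path(path)) for name, path in checkouts.items()}


def write_manifest(manifest: dict, output_path: Path) -> None:
    """Write the manifest as indented JSON, creating parent directories."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(json.dumps(manifest, indent=2) + "\n")
```

The optional `branch` field seen in the example would need an extra lookup (for instance `git rev-parse --abbrev-ref HEAD`), which is left out here because a detached checkout has no branch name to report.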

Comment thread .github/workflows/build_portable_linux_pytorch_wheels.yml Outdated
Comment thread .github/workflows/build_portable_linux_pytorch_wheels.yml Outdated
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
@erman-gurses erman-gurses marked this pull request as ready for review January 5, 2026 06:08
Contributor Author

erman-gurses commented Jan 5, 2026

@erman-gurses erman-gurses moved this from TODO to In Progress in TheRock Triage Jan 5, 2026
@erman-gurses erman-gurses linked an issue Jan 5, 2026 that may be closed by this pull request
Contributor Author

erman-gurses commented Jan 5, 2026

Member

@marbre marbre left a comment

Some additional feedback but need to take a closer look.

Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
  if: ${{ github.repository_owner == 'ROCm' }}
  run: |
    aws s3 cp ${{ env.PACKAGE_DIST_DIR }}/manifests/ \
      s3://${{ env.S3_BUCKET_ARTIFACTS }}/external-builds/pytorch/${{ github.run_id }}/${{ inputs.amdgpu_family }}/ \
Member

@ScottTodd any feelings about the location?

Member

Let's think carefully about the top level folder structure.

Here's an example Linux release package build: https://github.com/ROCm/TheRock/actions/runs/20737916627#summary-59538918041

Could we match that and write to ${{ github.run_id }}-{{ platform }}/manifests/{{ inputs.amdgpu_family }}/therock_torch_manifest.json?

For example:

20737916627-linux/manifests/gfx94X-dcgpu/therock_rocm_manifest.json
20737916627-linux/manifests/gfx94X-dcgpu/therock_torch_manifest.json
20737916627-linux/manifests/gfx94X-dcgpu/therock_jax_manifest.json

Contributor Author

I think there is an issue with using a single name, therock_torch_manifest.json, per run. We need six different JSON files (one per job) from a particular run, e.g. https://github.com/ROCm/TheRock/actions/runs/20808841882/job/59768375462. Otherwise, the last completed job will overwrite the therock_torch_manifest.json file.

Member

When #1236 is solved, each release build should use matching commits. Until then, we can upload one manifest per job with a descriptive name. We'll at least be able to compare manifests across workflow runs and jobs then to see how often the commits used differ.
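As a rough illustration of "one manifest per job with a descriptive name", the upload key could be assembled as below. The layout and file names mirror the example URLs posted in this PR; the helper names are hypothetical, and the slash-to-dash mapping is inferred from the `release/2.10` branch appearing as `release-2.10` in a file name.

```python
def manifest_filename(python_version: str, pytorch_ref: str) -> str:
    """Build a per-job manifest file name.

    Slashes in the ref are replaced so the name stays a single path segment
    (assumption based on the example file names in this PR).
    """
    safe_ref = pytorch_ref.replace("/", "-")
    return f"therock-manifest_torch_py{python_version}_{safe_ref}.json"


def manifest_key(
    run_id: int,
    platform: str,
    amdgpu_family: str,
    python_version: str,
    pytorch_ref: str,
) -> str:
    """Build the bucket-relative key: <run_id>-<platform>/manifests/<family>/<file>."""
    filename = manifest_filename(python_version, pytorch_ref)
    return f"{run_id}-{platform}/manifests/{amdgpu_family}/{filename}"
```

Because the Python version and the PyTorch ref are both part of the name, jobs from the same run and GPU family write to distinct keys instead of overwriting each other.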

Contributor Author

I see, sounds good to me.

Contributor Author

@erman-gurses erman-gurses Jan 8, 2026

Tracking the follow-up work here: #2827

Member

Actually, I think we should sequence this work differently.

  1. Solve #1236 ("Release workflows do not freeze the commits they select") first. Generate manifest files at the start of the release pipelines for all releases, all repositories (ROCm, PyTorch, JAX, etc.).
  2. Trigger release builds based on those computed manifests.

That should be much cleaner architecturally. The build scripts won't need to know anything about manifests, they'll continue to operate as they currently do, based on whatever code has been checked out ahead of time by the workflows. We can add a mode to the checkout scripts that reads from the manifest as needed.
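A hedged sketch of what a manifest-driven checkout mode might look like. Nothing here exists in the repository as far as this thread shows: `load_manifest` and `checkout_commands` are hypothetical names, and the sketch only computes the git commands rather than running them.

```python
import json
from pathlib import Path


def load_manifest(path: Path) -> dict:
    """Read a manifest JSON file produced earlier in the pipeline."""
    return json.loads(Path(path).read_text())


def checkout_commands(manifest: dict, checkout_dirs: dict) -> list:
    """Return the git invocations that pin each project to its manifest commit.

    `checkout_dirs` maps a manifest key (e.g. "triton") to a target directory.
    """
    commands = []
    for name, directory in checkout_dirs.items():
        entry = manifest[name]  # KeyError if the manifest lacks the project
        commands.append(["git", "clone", entry["repo"], directory])
        commands.append(["git", "-C", directory, "checkout", entry["commit"]])
    return commands
```

Returning the commands instead of executing them keeps the build scripts unaware of manifests, in line with the sequencing proposed above: a workflow step can run them before the build starts.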

Comment thread build_tools/github_actions/generate_pytorch_manifest.py
@marbre marbre self-requested a review January 5, 2026 22:11
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated

Comment thread .github/workflows/build_portable_linux_pytorch_wheels.yml Outdated
Comment thread external-builds/pytorch/build_prod_wheels.py Outdated
Comment thread external-builds/pytorch/build_prod_wheels.py Outdated
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/tests/test_generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/tests/test_generate_pytorch_manifest.py Outdated
Comment thread external-builds/pytorch/tests/test_generate_pytorch_manifest.py Outdated
Comment thread .github/workflows/build_portable_linux_pytorch_wheels.yml Outdated
Comment thread .github/workflows/build_windows_pytorch_wheels.yml Outdated
Contributor Author

erman-gurses commented Feb 12, 2026

Tests: https://github.com/ROCm/TheRock/actions/runs/21955293350

Example from the test:
https://therock-dev-artifacts.s3.us-east-2.amazonaws.com/21955293350-linux/manifests/gfx110X-all/therock-manifest_torch_py3.11_nightly.json

{
  "pytorch": {
    "commit": "a2508a3a3258955ba020584e5bff9cb64a6e4b32",
    "repo": "https://github.com/pytorch/pytorch.git"
  },
  "pytorch_audio": {
    "commit": "5fde87141e1d48a4fe2d0ab6371bb109bcc0ff46",
    "repo": "https://github.com/pytorch/audio.git"
  },
  "pytorch_vision": {
    "commit": "95c251f3f39a90877a41ff60471031cb4db0aa38",
    "repo": "https://github.com/pytorch/vision.git"
  },
  "triton": {
    "commit": "f5955032dea6ca418ed2b1ac28a86dbb8f9e2869",
    "repo": "https://github.com/ROCm/triton.git"
  },
  "therock": {
    "commit": "6c743f7f8c4f3207ea65982e40f5845211b41f46",
    "repo": "https://github.com/ROCm/TheRock.git",
    "branch": "users/erman-gurses/add-pytorch-manifest"
  }
}

Member

@ScottTodd ScottTodd left a comment

Looking pretty good. A few remaining comments about file organization and code style.

Comment on lines +210 to +213
--pytorch-dir "external-builds/pytorch/pytorch" \
--pytorch-audio-dir "external-builds/pytorch/pytorch_audio" \
--pytorch-vision-dir "external-builds/pytorch/pytorch_vision" \
--triton-dir "external-builds/pytorch/triton"
Member

Let's add apex now too. Fine to be a follow-up PR.

See 39d840b
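For illustration, the four flags in the snippet under review could be parsed with `argparse` roughly as follows. This is a sketch, not the merged script: `parse_args` is a hypothetical name, and `--apex-dir` is an assumed flag name for the suggested follow-up, made optional here so existing invocations keep working.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the source directory flags passed by the workflow step."""
    parser = argparse.ArgumentParser(
        description="Generate a manifest for PyTorch builds (sketch)."
    )
    # Flags that appear in the reviewed workflow snippet.
    for flag in (
        "--pytorch-dir",
        "--pytorch-audio-dir",
        "--pytorch-vision-dir",
        "--triton-dir",
    ):
        parser.add_argument(flag, type=Path, required=True)
    # Assumed flag name for the apex follow-up; optional by design.
    parser.add_argument("--apex-dir", type=Path, default=None)
    return parser.parse_args(argv)
```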

Comment thread .github/workflows/unit_tests.yml Outdated
Comment thread build_tools/github_actions/upload_pytorch_manifest.py Outdated
Comment thread build_tools/github_actions/generate_pytorch_manifest_test.py Outdated
Comment thread build_tools/github_actions/generate_pytorch_manifest.py Outdated
Contributor Author

erman-gurses commented Feb 12, 2026

Test: https://github.com/ROCm/TheRock/actions/runs/21970407618
Example:
https://therock-dev-artifacts.s3.us-east-2.amazonaws.com/21970407618-linux/manifests/gfx110X-all/therock-manifest_torch_py3.13_release-2.10.json

{
  "pytorch": {
    "commit": "e77a6493f222212b11f9fc91510e15465d106a8b",
    "repo": "https://github.com/ROCm/pytorch.git",
    "branch": "release/2.10"
  },
  "pytorch_audio": {
    "commit": "5047768f2447d963dffc250b64a5d6c01afc84fa",
    "repo": "https://github.com/pytorch/audio"
  },
  "pytorch_vision": {
    "commit": "82df5f599578b383987510836bb05ea97dcc9669",
    "repo": "https://github.com/pytorch/vision"
  },
  "triton": {
    "commit": "244a4867743cd408b3cbbca704e4c28a80533491",
    "repo": "https://github.com/ROCm/triton.git"
  },
  "therock": {
    "commit": "26d0ca4c2f744b10434c80bb7720f08f07d3223d",
    "repo": "https://github.com/ROCm/TheRock.git",
    "branch": "users/erman-gurses/add-pytorch-manifest"
  }
}

CC: @ScottTodd
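Since each job uploads its own manifest, two manifests can be compared to see which pinned commits differ between jobs or runs. A minimal sketch, with `differing_commits` as a hypothetical helper name:

```python
def differing_commits(left: dict, right: dict) -> dict:
    """Return {project: (left_commit, right_commit)} for commits that differ.

    A project missing from one manifest is reported with None on that side.
    """
    diffs = {}
    for name in sorted(set(left) | set(right)):
        left_commit = left.get(name, {}).get("commit")
        right_commit = right.get(name, {}).get("commit")
        if left_commit != right_commit:
            diffs[name] = (left_commit, right_commit)
    return diffs
```

An empty result means both jobs built from the same commits for every listed project.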

Member

We could also include these changes in https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_portable_linux_pytorch_wheels_ci.yml (new workflow for CI), but the longer term plans will solve for that naturally:

  • In the "setup" job, compute a single manifest for all projects and upload it (or multiple manifests: one for ROCm, one for PyTorch, one for JAX, etc.)
  • In all jobs that follow, use the manifests to decide what to check out and build
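If later jobs consume a manifest computed in the "setup" job, a small validation step could guard the checkout. The sketch below assumes the entry shape shown in the examples in this PR (a `commit` and a `repo` per project, with `branch` optional) and assumes 40-character lowercase hex commit IDs; `validate_manifest` is a hypothetical name.

```python
import re

# Assumption: commits are full SHA-1 hashes, as in the example manifests.
_COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; empty means the manifest looks usable."""
    problems = []
    for name, entry in manifest.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry is not an object")
            continue
        commit = entry.get("commit")
        if not isinstance(commit, str) or not _COMMIT_RE.fullmatch(commit):
            problems.append(f"{name}: missing or malformed commit")
        if not entry.get("repo"):
            problems.append(f"{name}: missing repo")
    return problems
```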

@erman-gurses erman-gurses merged commit 28ae028 into main Feb 13, 2026
92 of 95 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in TheRock Triage Feb 13, 2026
@erman-gurses erman-gurses deleted the users/erman-gurses/add-pytorch-manifest branch February 13, 2026 02:57
erman-gurses added a commit that referenced this pull request Feb 20, 2026
## Motivation
Add Apex to the Pytorch Manifest file
Follow up PR for #2547

See the comment here:
#2547 (comment)

## Test Plan

Test on CI
## Test Result
The test runs: https://github.com/ROCm/TheRock/actions/runs/22070943413

Example Manifest from the test:
```json
{
  "pytorch": {
    "commit": "f466ea7ac710374ce7066d9ead92e57dc42dc76f",
    "repo": "https://github.com/ROCm/pytorch.git",
    "branch": "release/2.7"
  },
  "pytorch_audio": {
    "commit": "95c61b4168fc5133be8dd8c1337d929d066ae6cf",
    "repo": "https://github.com/pytorch/audio"
  },
  "pytorch_vision": {
    "commit": "59a3e1f9f78cfe44cb989877cc6f4ea77c8a75ca",
    "repo": "https://github.com/pytorch/vision"
  },
  "triton": {
    "commit": "95abd4daa02d789ed257178ba35b4ef2d746ca29",
    "repo": "https://github.com/ROCm/triton.git"
  },
  "apex": {
    "commit": "7a57becffa31a4394369152e978fa6f8a0f005c2",
    "repo": "https://github.com/ROCm/apex"
  },
  "therock": {
    "commit": "02024427089134e951fe4063100b93c28e7e01a8",
    "repo": "https://github.com/ROCm/TheRock.git",
    "branch": "users/erman-gurses/add-apex-to-pytorch-manifest"
  }
}
```

Link:
https://therock-dev-artifacts.s3.us-east-2.amazonaws.com/22070943413-linux/manifests/gfx94X-dcgpu/therock-manifest_torch_py3.10_release-2.7.json

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Add Manifest Generation for PyTorch External Builds

3 participants