[CI] Add initial CI test for XCCL backend #104

Closed
chuanqi129 wants to merge 5 commits into meta-pytorch:main from chuanqi129:xpu_ci

Conversation

@chuanqi129
Collaborator

@chuanqi129 chuanqi129 commented Dec 24, 2025

Follows #52 to enable the XCCL backend CI test. Depends on the build fix PR #432 landing first.

@meta-cla meta-cla Bot added the CLA Signed label Dec 24, 2025
Comment thread .github/scripts/xpu_test.sh Outdated
cd torchcomms && pip install . --no-build-isolation && cd ..

python3 -c "import torch; import torchcomms; print(f'Torch version: {torch.__version__}')"

Member

I'm assuming this is still a draft, but we don't actually run any tests here?

Collaborator Author

Yes, I will rebase the PR and modify this part after the XCCL integration PR lands.

Collaborator Author

Hi @d4l3k, the added XCCL workflow passed: https://github.com/meta-pytorch/torchcomms/actions/runs/21890217912/job/63194257787?pr=104#step:10:2581. Please help review this PR again. This PR only covers the build for now; follow-up PRs will enable the tests.
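
Since this PR only builds and imports the package, the one-line `python3 -c` import check in `xpu_test.sh` is the only verification in the workflow. A minimal sketch of how that check could report each failed import and return a non-zero exit code is shown below. The helper names `missing_modules` and `main` are hypothetical and not part of the PR; only the module names `torch` and `torchcomms` come from the script.

```python
import importlib
import sys


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing


def main(names):
    """Print each failed import to stderr; return 1 if any failed, else 0."""
    missing = missing_modules(names)
    for name in missing:
        print(f"import failed: {name}", file=sys.stderr)
    return 1 if missing else 0


# Hypothetical CI usage, with module names taken from xpu_test.sh:
#   sys.exit(main(["torch", "torchcomms"]))
```

Returning an exit code rather than raising keeps the CI log readable: every failed import is listed before the step fails.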

@chuanqi129 chuanqi129 force-pushed the xpu_ci branch 2 times, most recently from 6ece948 to 8ebd6f4 Compare January 26, 2026 10:32
@chuanqi129
Collaborator Author

Status update:

  1. Used the PyTorch CI image from the S3 registry, but hit a credential issue: https://github.com/meta-pytorch/torchcomms/actions/runs/21239236743/job/61311504892#step:10:11
  2. Used the public mirror of the PyTorch CI image from the GitHub registry, but the docker pull took more than 30 minutes: https://github.com/meta-pytorch/torchcomms/actions/runs/21354504206/job/61511937439?pr=104#step:10:189
  3. Try to use a different, lightweight Docker image and install the XPU support package during the test

@chuanqi129 chuanqi129 force-pushed the xpu_ci branch 11 times, most recently from 321e4b9 to 7102621 Compare February 2, 2026 10:36
@chuanqi129 chuanqi129 marked this pull request as ready for review February 2, 2026 16:23
@chuanqi129 chuanqi129 requested a review from d4l3k February 2, 2026 16:24
Member

@d4l3k d4l3k left a comment

LGTM

@meta-codesync
Contributor

meta-codesync Bot commented Feb 11, 2026

@d4l3k has imported this pull request. If you are a Meta employee, you can view this in D93006547.

@meta-codesync
Contributor

meta-codesync Bot commented Feb 11, 2026

@d4l3k merged this pull request in 9e4cc3b.

Chao1Han pushed a commit to Chao1Han/torchcomms that referenced this pull request Feb 27, 2026
Summary:
Follows meta-pytorch#52 to enable the XCCL backend CI test. Depends on the build fix PR meta-pytorch#432 landing first.

Pull Request resolved: meta-pytorch#104

Reviewed By: kapilsh

Differential Revision: D93006547

Pulled By: d4l3k

fbshipit-source-id: 27c2d64a0971a4ef1a45e644f3f457721931f7c9

Labels

CLA Signed, Merged

3 participants