84 commits
0ccb2bb
Fix gradient is ready with z2 (#7829)
sfc-gh-truwase Feb 3, 2026
bea50ef
add autoep
tohtana Feb 7, 2026
a44fb58
Fix AutoTP custom patterns: respect use_default_specs (#7827)
tohtana Feb 7, 2026
2c041db
add checkpointing
tohtana Feb 7, 2026
c2a89bc
fix format
tohtana Feb 7, 2026
fd07c93
add custom patterns
tohtana Feb 8, 2026
cabfebc
fix optimizer resumption
tohtana Feb 9, 2026
72b9465
Support new python 3.14 annotation handling (#7831)
sdvillal Feb 10, 2026
2c36283
fix: replace deprecated fractions.gcd with math.gcd (#7845)
Mr-Neutr0n Feb 11, 2026
1752c2a
Fix bf16 gradient norm divergence with ZeRO stage 0 (#7839)
tohtana Feb 12, 2026
d2ca6e7
Replace torch.jit.script with torch.compile (#7835) (#7840)
tohtana Feb 12, 2026
b74df16
Merge pull request #7850 from deepspeedai/loadams/update-post-0.18.6
loadams Feb 13, 2026
84af822
Z1/2 init: flatten params on device (#7828)
ksugama Feb 13, 2026
a2ab10d
autoep: fix post-dispatch local expert permutation grouping
tohtana Feb 13, 2026
7f49367
Enable shm_comm support for arm (#7800)
phalani-paladugu Feb 15, 2026
8a63a2a
Add news entry for DeepSpeed updates (#7854)
PKUWZP Feb 16, 2026
f3a9819
Add EXAONE 4.0 model support for Inference V2 (#7853)
Bias92 Feb 17, 2026
dbc1b07
Fix ROCm BF16 conversion intrinsics in inference v2 (#7843) (#7846)
tohtana Feb 18, 2026
c9e652c
Fix compilation of Evoformer (#7862)
Flamefire Feb 21, 2026
57b10d5
Throw error when parameter is modified in GatheredParameters (#7832)
tohtana Feb 21, 2026
93524c8
Fix Zero-3 static scale assertion in fp16 test (#7866)
tohtana Feb 22, 2026
0416cf6
Schedule nightly full test (#7870)
tohtana Feb 24, 2026
efc0b49
Fix broken links and add AutoTP Training tutorial to sidebar nav (#7874)
tohtana Feb 25, 2026
ae21699
fix: replace 35 bare except clauses with except Exception (#7873)
haosenwang1018 Feb 26, 2026
a15e557
perf: use deque for FIFO queues in sequence parallel, superoffload, a…
giulio-leone Mar 1, 2026
116dbe2
Fix: only add parameter with grads to parameter group (#7869)
delock Mar 1, 2026
bffaf45
Fix no-grad grad-fn lookup in ZeRO hook counting on PyTorch 2.3 (#783…
tohtana Mar 1, 2026
04d69cc
Fix import deepspeed crash on PyTorch v2.3 + Python 3.12 (#7875)
tohtana Mar 2, 2026
d8e15da
XPU use stock pytorch instead of Intel Extension for PyTorch (#7877)
delock Mar 2, 2026
a41a96b
Remove amp() from abstract accelerator (#7879)
delock Mar 2, 2026
4dba1e2
Add document section explaining autocast nesting (#7883)
tohtana Mar 4, 2026
6c59d54
Fix hook count performance regression from v0.18.5 (#7886)
tohtana Mar 5, 2026
285cae3
Suppress see_memory_usage logs (#7891)
sfc-gh-truwase Mar 8, 2026
d9a4aad
[Bloom] Fix hangs of bloom test (#7890)
k-artem Mar 10, 2026
b6346bf
double reduction user-friendly error (#7895)
stas00 Mar 10, 2026
f88d0f8
Fix async_io ops building error on Huawei Ascend NPU (#7894)
huangyifan0610 Mar 12, 2026
784cc26
Fix Evoformer's multi-arch dispatch root cause (#7881)
tohtana Mar 13, 2026
63eeb11
fix: Validate fp16.loss_scale is finite and non-negative (#7889)
nathon-lee Mar 13, 2026
49a82a0
Add AGENTS.md and CLAUDE.md with project rules for AI coding agents (…
delock Mar 13, 2026
be60451
fix(zero3): use current_stream() instead of default_stream() for grad…
michaelroyzen Mar 13, 2026
5f7b687
Update version (#7903)
loadams Mar 13, 2026
38bd11a
Respect `$TRITON_HOME` (#7907)
Flamefire Mar 23, 2026
f2bb1ec
Add Feature Universal Checkpoint for AutoTP (#7908)
nathon-lee Mar 24, 2026
5ce3abb
fix: remove unnecessary shell=True in ROCm GPU architecture detection…
instantraaamen Mar 24, 2026
26c954f
Don't detect local GPU if `$DS_IGNORE_CUDA_DETECTION` is set (#7896)
Flamefire Mar 24, 2026
a240c4d
Add HuggingFace tp_plan support for AutoTP (#7901)
delock Mar 25, 2026
f887b98
fix: handle non-existent path in is_nfs_path for Triton autotune cach…
Krishnachaitanyakc Mar 25, 2026
138f20d
Fix backward compatibility of torch.amp.custom_fwd for PyTorch < 2.4 …
tohtana Mar 25, 2026
956ec6f
Extending Muon Optimizer Support for ZeRO Stage 3 (#7919)
PKUWZP Mar 26, 2026
62c3e6d
Add news item for ASPLOS 2026 Best Paper Award (#7923)
PKUWZP Mar 26, 2026
729df6c
fix(superoffload) preserve multi-group updates with shared cpu buffer…
xylian86 Mar 28, 2026
abb88ce
AGENTS.md: Add pre-commit command to existing CI requirements line (#…
delock Mar 29, 2026
2bae360
Update README with latest news from DeepSpeed (#7931)
PKUWZP Mar 30, 2026
5efb24a
Merging AutoSP into DeepSpeed (#7860)
neeldani Mar 30, 2026
36f0b0c
Add fallback to full test (#7933)
tohtana Mar 30, 2026
9486ab7
Remove Microsoft Corporation copyright from AGENTS.md and CLAUDE.md (…
PKUWZP Mar 30, 2026
8c93851
Update version.txt for latest incoming release 0.18.9 (#7935)
loadams Mar 30, 2026
607b55f
Update version after latest release (v0.18.9) (#7936)
loadams Mar 30, 2026
89bf0d2
Refactor consolidate transpose (#7934)
nathon-lee Mar 30, 2026
e79c2e8
Merge branch 'master' into tohtana/add_autoep
tohtana Mar 31, 2026
3bdebc0
Fix/fix autotp universal checkpoint ci (#7937)
tohtana Mar 31, 2026
2f0924a
Fix process hang in process-group shutdown (#7941)
Flamefire Mar 31, 2026
5dce124
Zero3 defragment utility (#7940)
nathon-lee Mar 31, 2026
046db04
fix(autoep): restore ep_count helper
tohtana Apr 1, 2026
71a0a36
test(autoep): make checkpoint tests cpu-safe
tohtana Apr 1, 2026
bf0126b
[SP] add SP deny list instead of allow (#7887)
kashif Apr 1, 2026
fae0276
Fix AutoEP ZeRO-2 expert gradient scaling
tohtana Apr 3, 2026
37e232f
fix(zero): detach flat buffer to prevent autograd inplace error on CP…
delock Apr 3, 2026
5f7dc1e
fix(autoep): preserve manual backward parity
tohtana Apr 5, 2026
3a5df51
Fix FPQuantizer build (#7963)
Flamefire Apr 8, 2026
90f86c7
fix(autoep): align combine path with grouped-mm baseline
tohtana Apr 10, 2026
b207c5e
feat(autoep): add selectable combine implementations
tohtana Apr 11, 2026
9d632f1
Fix zero 1 and 2 CPU-offloaded gradient norm (#7967)
alek6kun Apr 11, 2026
dac1525
Fix overlap-comm buffer lifetimes (#7965)
tohtana Apr 11, 2026
ecb26a5
Fix DeepCompile+Z3 on PyTorch v2.9/2.10 (#7951)
tohtana Apr 11, 2026
0e872cc
Merge branch 'master' into tohtana/add_autoep
tohtana Apr 11, 2026
3fd762c
Fix WarmupCosineLR multi-group initialization (#7969)
tohtana Apr 12, 2026
0ba2352
Enable PyTorch version selection for full test (#7968)
tohtana Apr 12, 2026
3ba3dc9
fix(fp_quantizer): fix UB and negative shift warnings in fp_quantize_…
Cursx Apr 15, 2026
893c6d2
fix(op_builder): avoid duplicate/wrong -gencode flags (#7974)
Cursx Apr 15, 2026
dc0fd29
Rename dequantization template parameters (#7976)
Flamefire Apr 15, 2026
6b65486
Merge branch 'master' into tohtana/add_autoep
tohtana Apr 16, 2026
2c873a9
test(autoep): update combine_from_routed calls
tohtana Apr 16, 2026
240495c
fix(autoep): support llama4 fused experts
tohtana Apr 16, 2026
175 changes: 165 additions & 10 deletions .github/workflows/aws-torch-latest-full.yml
@@ -2,25 +2,76 @@
# DeepSpeed CI - AWS L40S GPU Full Tests (PyTorch Latest)
#
# Runs the full DeepSpeed unit test suite on AWS self-hosted runners.
# Uses 4x NVIDIA L40S GPUs on g6e.12xlarge instances.
# Prefers 4x NVIDIA L40S GPUs on g6e.12xlarge instances, with AWS-side
# fallback to 8x A100 nodes when L40S capacity is unavailable.
#
# This workflow runs:
# - Parallel tests with pytest-xdist (-n 8)
# - Sequential tests marked with @pytest.mark.sequential
# - Nightly schedule: skips if no new commits since last successful run
################################################################################

name: aws-torch-latest-full

on:
schedule:
- cron: '0 8 * * *' # Daily at 08:00 UTC (midnight PST)
workflow_dispatch:
inputs:
torch_preset:
description: PyTorch preset to install for manual runs
required: false
default: '2.7.1-cu126'
type: choice
options:
- '2.7.1-cu126'
- '2.8.0-cu126'
- '2.9.1-cu126'
- '2.10.0-cu126'
- '2.11.0-cu126'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
check-changes:
name: Check for new commits
runs-on: ubuntu-latest
if: github.event_name == 'schedule'
outputs:
has_changes: ${{ steps.check.outputs.has_changes }}
steps:
- name: Check for commits since last successful run
id: check
env:
GH_TOKEN: ${{ github.token }}
run: |
default_branch="${{ github.event.repository.default_branch }}"

last_sha=$(gh api \
"repos/${{ github.repository }}/actions/workflows/aws-torch-latest-full.yml/runs?status=success&event=schedule&branch=${default_branch}&per_page=1" \
--jq '.workflow_runs[0].head_sha // empty')

current_sha="${{ github.sha }}"

if [ -z "$last_sha" ]; then
echo "No previous successful run found - running tests"
echo "has_changes=true" >> "$GITHUB_OUTPUT"
elif [ "$last_sha" = "$current_sha" ]; then
echo "No new commits since last successful run ($last_sha) - skipping"
echo "has_changes=false" >> "$GITHUB_OUTPUT"
else
echo "New commits detected: $last_sha -> $current_sha - running tests"
echo "has_changes=true" >> "$GITHUB_OUTPUT"
fi
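The skip decision in the shell script above reduces to a simple SHA comparison. A sketch of the same logic in Python (the helper name is invented for illustration; it is not part of the workflow):

```python
def should_run(last_successful_sha, current_sha):
    """Decide whether the nightly run has new commits to test.

    Mirrors the shell logic above: run when there is no previous
    successful scheduled run, or when HEAD has moved since it.
    """
    if not last_successful_sha:
        return True  # no baseline run found: run the full suite
    return last_successful_sha != current_sha

print(should_run("", "89bf0d2"))         # no previous run -> True
print(should_run("89bf0d2", "89bf0d2"))  # unchanged HEAD -> False
print(should_run("89bf0d2", "240495c"))  # new commits -> True
```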

unit-tests:
name: Unit Tests (Full)
needs: [check-changes]
if: |
always() &&
(github.event_name == 'workflow_dispatch' || needs.check-changes.outputs.has_changes == 'true')
runs-on: [self-hosted, gpu-ci, gpu-l40s, l40s-4gpu, aws]
timeout-minutes: 180

@@ -30,11 +81,10 @@ jobs:
options: --gpus all --shm-size "32G" -v /mnt/aio:/mnt/aio

env:
TORCH_VER: "2.7"
CUDA_VER: "12.6"
DEFAULT_TORCH_PRESET: '2.7.1-cu126'
CUTLASS_PATH: /opt/cutlass
# Disable reuse_dist_env to prevent pool worker cleanup hangs in full test runs
DS_DISABLE_REUSE_DIST_ENV: "1"
DS_DISABLE_REUSE_DIST_ENV: '1'

steps:
- name: Install system dependencies
@@ -48,6 +98,79 @@
with:
lfs: true

- name: Resolve PyTorch preset
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
MANUAL_TORCH_PRESET: ${{ github.event.inputs.torch_preset || '' }}
run: |
if [ "$GITHUB_EVENT_NAME" = 'workflow_dispatch' ] && [ -n "$MANUAL_TORCH_PRESET" ]; then
selected_preset="$MANUAL_TORCH_PRESET"
else
selected_preset="$DEFAULT_TORCH_PRESET"
fi

case "$selected_preset" in
'2.7.1-cu126')
torch_install_version='2.7.1'
torchvision_install_version='0.22.1'
torchaudio_install_version='2.7.1'
torch_test_version='2.7'
cuda_test_version='12.6'
pytorch_index_url='https://download.pytorch.org/whl/cu126'
;;
'2.8.0-cu126')
torch_install_version='2.8.0'
torchvision_install_version='0.23.0'
torchaudio_install_version='2.8.0'
torch_test_version='2.8'
cuda_test_version='12.6'
pytorch_index_url='https://download.pytorch.org/whl/cu126'
;;
'2.9.1-cu126')
torch_install_version='2.9.1'
torchvision_install_version='0.24.1'
torchaudio_install_version='2.9.1'
torch_test_version='2.9'
cuda_test_version='12.6'
pytorch_index_url='https://download.pytorch.org/whl/cu126'
;;
'2.10.0-cu126')
torch_install_version='2.10.0'
torchvision_install_version='0.25.0'
torchaudio_install_version='2.10.0'
torch_test_version='2.10'
cuda_test_version='12.6'
pytorch_index_url='https://download.pytorch.org/whl/cu126'
;;
'2.11.0-cu126')
torch_install_version='2.11.0'
torchvision_install_version='0.26.0'
torchaudio_install_version='2.11.0'
torch_test_version='2.11'
cuda_test_version='12.6'
pytorch_index_url='https://download.pytorch.org/whl/cu126'
;;
*)
echo "Unsupported torch_preset: $selected_preset" >&2
exit 1
;;
esac

{
echo "SELECTED_TORCH_PRESET=$selected_preset"
echo "TORCH_INSTALL_VERSION=$torch_install_version"
echo "TORCHVISION_INSTALL_VERSION=$torchvision_install_version"
echo "TORCHAUDIO_INSTALL_VERSION=$torchaudio_install_version"
echo "TORCH_TEST_VERSION=$torch_test_version"
echo "CUDA_TEST_VERSION=$cuda_test_version"
echo "PYTORCH_INDEX_URL=$pytorch_index_url"
} >> "$GITHUB_ENV"

echo "Selected preset: $selected_preset"
echo "Resolved install tuple: torch==$torch_install_version torchvision==$torchvision_install_version torchaudio==$torchaudio_install_version"
echo "Resolved test expectations: torch=$torch_test_version cuda=$cuda_test_version"
echo "Resolved PyTorch index: $pytorch_index_url"
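The `case` statement above is a straight table lookup; the same mapping could be expressed as a dictionary (version numbers are copied from the workflow; the helper name is invented):

```python
# Preset -> (torch, torchvision, torchaudio, test torch, test cuda),
# matching the case statement in the workflow step above.
TORCH_PRESETS = {
    "2.7.1-cu126": ("2.7.1", "0.22.1", "2.7.1", "2.7", "12.6"),
    "2.8.0-cu126": ("2.8.0", "0.23.0", "2.8.0", "2.8", "12.6"),
    "2.9.1-cu126": ("2.9.1", "0.24.1", "2.9.1", "2.9", "12.6"),
    "2.10.0-cu126": ("2.10.0", "0.25.0", "2.10.0", "2.10", "12.6"),
    "2.11.0-cu126": ("2.11.0", "0.26.0", "2.11.0", "2.11", "12.6"),
}

def resolve_preset(name):
    """Return the install/test version tuple for a preset, or fail loudly."""
    if name not in TORCH_PRESETS:
        raise ValueError(f"Unsupported torch_preset: {name}")
    return TORCH_PRESETS[name]
```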

- name: Install CUTLASS
run: |
git clone --depth 1 --branch v3.5.1 https://github.com/NVIDIA/cutlass.git /opt/cutlass
@@ -56,7 +179,11 @@

- name: Install PyTorch
run: |
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
pip install \
torch=="$TORCH_INSTALL_VERSION" \
torchvision=="$TORCHVISION_INSTALL_VERSION" \
torchaudio=="$TORCHAUDIO_INSTALL_VERSION" \
--index-url "$PYTORCH_INDEX_URL"

- name: Install transformers
run: |
@@ -75,6 +202,12 @@

- name: Check environment
run: |
echo "=== Selected PyTorch Preset ==="
echo "Preset: $SELECTED_TORCH_PRESET"
echo "Install tuple: torch==$TORCH_INSTALL_VERSION torchvision==$TORCHVISION_INSTALL_VERSION torchaudio==$TORCHAUDIO_INSTALL_VERSION"
echo "PyTorch index URL: $PYTORCH_INDEX_URL"
echo "Expected test versions: torch=$TORCH_TEST_VERSION cuda=$CUDA_TEST_VERSION"
echo ""
echo "=== GPU Information ==="
nvidia-smi
echo ""
@@ -90,10 +223,32 @@
echo ""
echo "=== CUTLASS ==="
echo "CUTLASS_PATH: $CUTLASS_PATH"
ls -la $CUTLASS_PATH/include/ | head -5
ls -la "$CUTLASS_PATH"/include/ | head -5

- name: Detect GPU architecture
run: |
python - <<'PY'
import os
import torch

torch.cuda.init()
major, minor = torch.cuda.get_device_capability(0)
arch = f"{major}.{minor}"
gpu_count = torch.cuda.device_count()
gpu_name = torch.cuda.get_device_name(0)

with open(os.environ["GITHUB_ENV"], "a", encoding="utf-8") as env_file:
env_file.write(f"TORCH_CUDA_ARCH_LIST={arch}\n")
env_file.write(f"GPU_COUNT={gpu_count}\n")

print(f"Detected GPU: {gpu_name}")
print(f"Detected compute capability: {arch}")
print(f"Detected GPU count: {gpu_count}")
PY

- name: Install DeepSpeed
run: |
echo "Using TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
# Initialize CUDA before install so setup.py can detect NCCL version
python -c "import torch; torch.cuda.init(); print(f'NCCL version: {torch.cuda.nccl.version()}')"
# Use --no-build-isolation so setup.py can access pre-installed PyTorch
@@ -106,7 +261,7 @@

- name: Unit tests (parallel)
run: |
export TORCH_CUDA_ARCH_LIST="8.9"
echo "Running parallel tests with TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST on $GPU_COUNT GPUs"
cd tests
# Skip tests requiring unavailable hardware or known issues:
# - nvme checkpointing: no nvme device
@@ -120,11 +275,11 @@
--ignore=unit/launcher/test_user_args.py \
--ignore=unit/runtime/zenflow \
--ignore=unit/ops/adam/test_zf_torch_adam.py \
--torch_ver=${{ env.TORCH_VER }} --cuda_ver=${{ env.CUDA_VER }}
--torch_ver="$TORCH_TEST_VERSION" --cuda_ver="$CUDA_TEST_VERSION"

- name: Unit tests (sequential)
run: |
export TORCH_CUDA_ARCH_LIST="8.9"
echo "Running sequential tests with TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST on $GPU_COUNT GPUs"
cd tests
rm -rf /mnt/aio/pytest
pytest --instafail --timeout 600 --forked -m 'sequential' --basetemp=/mnt/aio/pytest unit/ \
@@ -134,4 +289,4 @@
--ignore=unit/runtime/zenflow \
--ignore=unit/ops/adam/test_zf_torch_adam.py \
--ignore=unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py \
--torch_ver=${{ env.TORCH_VER }} --cuda_ver=${{ env.CUDA_VER }}
--torch_ver="$TORCH_TEST_VERSION" --cuda_ver="$CUDA_TEST_VERSION"
15 changes: 12 additions & 3 deletions .github/workflows/nv-pre-compile-ops.yml
@@ -23,11 +23,20 @@ jobs:
unit-tests:
runs-on: ubuntu-24.04
container:
image: deepspeed/gh-builder:ubuntu1804-py38-torch1131-cu116
image: nvidia/cuda:12.6.3-devel-ubuntu22.04

steps:
- name: Install system dependencies
run: |
apt-get update && apt-get install -y git python3 python3-pip libaio-dev ninja-build
ln -sf /usr/bin/python3 /usr/bin/python

- uses: actions/checkout@v4

- name: Install PyTorch
run: |
pip install torch==2.10.0 --index-url https://download.pytorch.org/whl/cu126

- name: environment
run: |
which python
@@ -36,7 +45,7 @@
#python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
- name: Compile DeepSpeed Ops
run: |
DS_ACCELERATOR=cuda DS_ENABLE_NINJA=1 TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0" DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_FP_QUANTIZER=0 DS_BUILD_CUTLASS_OPS=0 DS_BUILD_GDS=0 DS_BUILD_RAGGED_DEVICE_OPS=0 DS_BUILD_EVOFORMER_ATTN=0 DS_BUILD_DEEP_COMPILE=0 pip3 install .
DS_ACCELERATOR=cuda DS_ENABLE_NINJA=1 TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0" DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 DS_BUILD_FP_QUANTIZER=0 DS_BUILD_CUTLASS_OPS=0 DS_BUILD_GDS=0 DS_BUILD_RAGGED_DEVICE_OPS=0 DS_BUILD_EVOFORMER_ATTN=0 DS_BUILD_DEEP_COMPILE=0 pip3 install .
- name: DS Report
run: |
ds_report
DS_ACCELERATOR=cuda ds_report
10 changes: 3 additions & 7 deletions .github/workflows/xpu-compile.yml
@@ -20,7 +20,7 @@
compile-tests:
runs-on: [self-hosted, intel, xpu]
container:
image: intel/oneapi-basekit:2024.2.1-0-devel-ubuntu22.04
image: intel/oneapi-basekit:2025.0.2-0-devel-ubuntu22.04
ports:
- 80
options: --privileged -it --rm --device /dev/dri:/dev/dri -v /dev/dri/by-path:/dev/dri/by-path --ipc=host --cap-add=ALL
@@ -31,11 +31,7 @@
run: |
apt-get update
apt-get install clinfo libaio-dev python3-pip -y
pip install torch==2.3.1 -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/torch/
pip install intel-extension-for-pytorch==2.3.110+xpu -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/intel-extension-for-pytorch/
pip install oneccl_bind_pt==2.3.100+xpu -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/oneccl-bind-pt/
pip install torchvision==0.18.1 -f https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/torchvision/
pip install https://github.com/intel/intel-xpu-backend-for-triton/releases/download/v3.0.0b2/triton_xpu-3.0.0b2-cp310-cp310-linux_x86_64.whl
pip install torch==2.10.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
pip install py-cpuinfo numpy
pip install .[dev,autotuning]

@@ -44,7 +40,7 @@
ldd --version
ds_report
python3 -c "import torch; print('torch:', torch.__version__, torch)"
python3 -c "import torch; import intel_extension_for_pytorch; print('XPU available:', torch.xpu.is_available())"
python3 -c "import torch; print('XPU available:', torch.xpu.is_available())"
python3 -c "from deepspeed.accelerator import get_accelerator; print('accelerator:', get_accelerator()._name)"
pip list

5 changes: 2 additions & 3 deletions .github/workflows/xpu-max1100.yml
@@ -50,8 +50,7 @@
apt-get install -y python3.11 python3.11-dev python3-pip clinfo libaio-dev
pip install --upgrade pip
pip install py-cpuinfo
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/xpu
pip install intel-extension-for-pytorch==2.7.10+xpu oneccl_bind_pt==2.7.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us --trusted-host pytorch-extension.intel.com
pip install torch==2.10.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
pip install .[dev,autotuning]

- name: Check container state
@@ -60,7 +59,7 @@
ldd --version
ds_report
python3 -c "import torch; print('torch:', torch.__version__, torch)"
python3 -c "import torch; import intel_extension_for_pytorch; print('XPU available:', torch.xpu.is_available())"
python3 -c "import torch; print('XPU available:', torch.xpu.is_available())"
python3 -c "from deepspeed.accelerator import get_accelerator; print('accelerator:', get_accelerator()._name)"
pip list

32 changes: 32 additions & 0 deletions AGENTS.md
@@ -0,0 +1,32 @@
<!-- This file is duplicated as CLAUDE.md and AGENTS.md. Keep them in sync. -->
# AGENTS.md — Workspace-level instructions for AI coding agents

## DeepSpeed Project Rules

### Commit & CI requirements

- All commits MUST have a `Signed-off-by` line (use `--signoff`). Get the name and email from `git config user.name` / `git config user.email`.
- Formatting: yapf (column_limit=119, `.style.yapf`) + flake8 (`.flake8`).
- Always verify changed files pass pre-commit checks before committing: `pre-commit run --files <changed_files>`. Only check modified files, not the entire codebase. Config: `.pre-commit-config.yaml`.
- `check-torchdist` hook: NEVER directly import torch's distributed module. Use `import deepspeed.comm as dist` instead.
- New files require license header:
```
# SPDX-License-Identifier: Apache-2.0
# DeepSpeed Team
```
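A quick way to verify that a new file carries the required header (a hypothetical helper for illustration, not a hook that exists in the repo):

```python
REQUIRED_HEADER_LINES = (
    "# SPDX-License-Identifier: Apache-2.0",
    "# DeepSpeed Team",
)

def has_license_header(text):
    """Check that both required header lines appear near the top of a file."""
    head = text.splitlines()[:6]  # allow a shebang/encoding line above the header
    return all(line in head for line in REQUIRED_HEADER_LINES)

good = "# SPDX-License-Identifier: Apache-2.0\n# DeepSpeed Team\n\nimport os\n"
assert has_license_header(good)
assert not has_license_header("import os\n")
```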

### Code change discipline

- NEVER make cosmetic/formatting-only changes to existing code. Only add/modify lines that are functionally necessary. Minimizing diff noise is critical for code review.
- Delete dead code decisively — if code is unused at runtime (only referenced in tests), remove it along with its tests.
- Prefer consolidating tests over proliferating test files.
- Blend in: when modifying code, read the surrounding context and match the style of neighboring code (naming, spacing, patterns, idioms).
- Write beginner-friendly code: avoid deeply nested expressions or chained logic. Break complex expressions into clear, named intermediate steps.
- Comments should explain **why**, not **what**. Describe the purpose and reasoning, not the mechanics that the code already shows.
- New features must include corresponding tests and documentation updates.

## Tool Caveats

### Edit tool auto-formatter

The Edit tool has a hidden auto-formatter that silently changes quotes, whitespace, blank lines, and line wrapping. For format-sensitive modifications (e.g., when exact formatting matters for pre-commit), use `bash` with `sed`, `python`, or `cat` instead.
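For example, an exact, format-preserving replacement can be scripted in Python instead of going through the Edit tool (the file contents below are illustrative):

```python
import os
import tempfile
from pathlib import Path

def replace_exact(path, old, new):
    """Byte-for-byte replacement that preserves quotes, whitespace, and line endings."""
    data = Path(path).read_text()
    if old not in data:
        raise ValueError(f"pattern not found in {path}")
    Path(path).write_text(data.replace(old, new, 1))

# Demonstrate on a throwaway file with format-sensitive content.
fd, tmp = tempfile.mkstemp(suffix=".yml")
os.close(fd)
Path(tmp).write_text('TORCH_VER: "2.7"\n')
replace_exact(tmp, '"2.7"', '"2.10"')
assert Path(tmp).read_text() == 'TORCH_VER: "2.10"\n'
os.unlink(tmp)
```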