Contributor

@VeeraRajasekhar VeeraRajasekhar commented Jan 14, 2026

Description

  • Add a manual AITER prebuilt upload workflow (aiter-prebuilt-upload.yml) that lets you pick a docker image and GPU arch list, builds the aiter libs inside the container, and packages/uploads via the shared shell script.
  • Move build/package/upload logic into ci/aiter_upload.sh (uses GPU_ARCHS, defaults to gfx942;gfx950, supports upload when env vars are set; otherwise packages and prints the artifact path).
  • Strip upload logic from aiter_prebuilt.cmake so normal builds only download/use existing prebuilts.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Add a workflow_dispatch GHA to build/upload AITER prebuilts using a chosen image and GPU arch list, reusing ci/aiter_upload.sh.
  • Update ci/aiter_upload.sh to handle build + package + optional upload (env-driven), default GPU_ARCHS to gfx942;gfx950, and honor GPU_ARCHS input.
  • Remove upload/packaging logic from aiter_prebuilt.cmake; it now only downloads/uses prebuilts in the normal build.
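The environment-driven behavior described above could be sketched roughly as follows. This is a minimal illustration, not the contents of ci/aiter_upload.sh: the function names are hypothetical, and it assumes that upload is gated on NVTE_AITER_PREBUILT_BASE_URL and NVTE_AITER_PREBUILT_UPLOAD_TOKEN both being set.

```shell
# Sketch only: resolve_archs and upload_mode are hypothetical helper names.

# Honor GPU_ARCHS when it is set and non-empty; otherwise use the default list.
resolve_archs() {
  printf '%s\n' "${GPU_ARCHS:-gfx942;gfx950}"
}

# Upload only when both the base URL and the token are provided (assumed gating);
# otherwise the script would just package and print the artifact path.
upload_mode() {
  if [ -n "${NVTE_AITER_PREBUILT_BASE_URL:-}" ] && \
     [ -n "${NVTE_AITER_PREBUILT_UPLOAD_TOKEN:-}" ]; then
    echo "upload"
  else
    echo "package-only"
  fi
}
```

Keeping the decision in small functions like these makes the package-only path easy to exercise locally without credentials.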

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ipanfilo
Collaborator

This approach does not work.

  1. Anyone can create a PR against protected branches, whereas this code is expected to work only for org members.
  2. When aiter is built on a CI machine, it is built for one specific GPU arch, but we need it built for all supported architectures.
  3. It is desirable to build new aiter for more than the single ROCm version used for CI.

So there should be a separate build procedure, used by a limited group of people, and hence it is not practical to fit it into a CI action. If you want to use GHA infrastructure for that, I suggest creating a separate action that can be run manually. But let's first justify why a GHA action is preferable to a dedicated, manually run script.

@VeeraRajasekhar VeeraRajasekhar marked this pull request as draft January 14, 2026 17:03
Contributor Author

VeeraRajasekhar commented Jan 14, 2026

I like your idea of creating a separate GHA workflow for this. I think a workflow that accepts a docker image is the better option. Few people in our group push to the aiter artifactory, so this workflow should help increase the cache hit rate.

As for permissions, we can explore this option:
workflow_dispatch (Manual Trigger) Event: Workflows with the workflow_dispatch trigger can only be triggered by users who are collaborators with at least write access to the repository. External contributors (those without write access) cannot manually trigger these workflows from the UI or API.
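For illustration, a maintainer with write access could trigger such a workflow from the command line with the GitHub CLI. This is a sketch under assumptions: it presumes `gh` is installed and authenticated, the input name `docker_image` and the function name are hypothetical, and `gpu_archs` is taken from the `inputs.gpu_archs` reference in the workflow. With DRY_RUN=1 (the default here) the command is only printed.

```shell
# Sketch only: dispatch_prebuilt_workflow and the docker_image input are hypothetical.
dispatch_prebuilt_workflow() {
  local image="$1" archs="${2:-gfx942;gfx950}"
  # Build the argument list as positional parameters so values containing ';' stay intact.
  set -- gh workflow run aiter-prebuilt-upload.yml \
    -f "docker_image=${image}" \
    -f "gpu_archs=${archs}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

Users without write access would get a permission error from the API when DRY_RUN=0, which is the restriction the comment relies on.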

Add a workflow_dispatch GHA to build/upload aiter prebuilts using a chosen image and GPU arch list, reusing the shell script.

Make ci/aiter_upload.sh handle build + package + upload with optional env-based upload, respecting GPU_ARCHS input and defaulting to gfx942;gfx950.

Strip upload/packaging logic out of the CMake helper so normal builds only download/use prebuilts.
@VeeraRajasekhar VeeraRajasekhar force-pushed the veergopu/aiter_prebuilt_automation branch from 69e8d2e to a631973 Compare January 15, 2026 06:42
@VeeraRajasekhar VeeraRajasekhar marked this pull request as ready for review January 15, 2026 06:51
@VeeraRajasekhar
Contributor Author

@ipanfilo @wangye805 PR is ready for review.

Summary: moved all upload-related functionality out of the CMake file and into a new bash script, which the GitHub Actions workflow calls.

ROCM_VER="$(head -n1 "${ROCM_PATH}/.info/version" | sed -n 's/^\([0-9]\+\.[0-9]\+\).*/\1/p')"

AITER_DIR="${ROOT_DIR}/3rdparty/aiter"
git -C "${AITER_DIR}" config --global --add safe.directory "${AITER_DIR}" >/dev/null
Collaborator

Use code from

COMMAND sh -c "export GIT_CONFIG_GLOBAL=$(mktemp /tmp/gitconfig.XXXXXX);
so that the repository is not modified

Contributor Author

Updated
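A minimal sketch of the temporary-gitconfig approach suggested above, assuming Git 2.32 or newer (where the GIT_CONFIG_GLOBAL environment variable is supported). The function name is hypothetical and this is not the code from the script or the CMake helper.

```shell
# Sketch only: with_temp_gitconfig is a hypothetical helper.
# Usage: with_temp_gitconfig <repo_dir> <git subcommand and args...>
with_temp_gitconfig() {
  local repo_dir="$1" tmp_cfg status
  shift
  tmp_cfg="$(mktemp /tmp/gitconfig.XXXXXX)" || return 1
  # Record safe.directory in the throwaway file instead of the user's ~/.gitconfig.
  GIT_CONFIG_GLOBAL="${tmp_cfg}" git config --global --add safe.directory "${repo_dir}"
  # Run the requested git command with the same throwaway global config.
  GIT_CONFIG_GLOBAL="${tmp_cfg}" git -C "${repo_dir}" "$@"
  status=$?
  rm -f "${tmp_cfg}"
  return "${status}"
}
```

For example, `with_temp_gitconfig "${AITER_DIR}" rev-parse HEAD` would look up the commit without leaving a safe.directory entry behind.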

echo "[AITER-PREBUILT] Building aiter libs for ${ARCHS} ..."
rm -rf "${AITER_DIR}/aiter/jit/build"
AITER_LOG_MORE=1 \
GPU_ARCHS="${ARCHS}" \
Collaborator

CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT is missing

Contributor Author

Updated

-T "${OUTPUT_TGZ}" \
"${REMOTE_URL}" \
-o /dev/null
echo "[AITER-PREBUILT] Uploaded tgz to ${REMOTE_URL}"
Collaborator

What about .sha256?

Contributor Author

JFrog automatically creates a sha256 when a file is uploaded, so I relied on that and didn't upload one. But if we change the storage location in the future we might need it, so I will upload it.

Collaborator

If JFrog creates the sha256, then it may make more sense to download it and compare it with the local sha256 to make sure the package uploaded correctly. And if there is no sha256 on the server and you want to support non-JFrog servers, then upload the local sha256 file.

Contributor Author

Updated: added a check that compares the remote sha256 with the local one.
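The comparison step discussed in this thread might look like the sketch below. It assumes GNU `sha256sum` is available and deliberately leaves out how the remote digest is fetched, since that depends on the server; the function names are hypothetical.

```shell
# Sketch only: local_sha256 and verify_sha256 are hypothetical helpers.

# Print the lowercase hex SHA-256 digest of a file.
local_sha256() {
  sha256sum "$1" | awk '{print $1}'
}

# verify_sha256 <file> <expected_digest>
# Succeeds only when the expected digest is non-empty and matches the file.
verify_sha256() {
  local actual expected
  actual="$(local_sha256 "$1")"
  # Normalize the remote value: keep the first field and lowercase it.
  expected="$(printf '%s' "$2" | awk '{print tolower($1)}')"
  [ -n "${actual}" ] && [ -n "${expected}" ] && [ "${actual}" = "${expected}" ]
}
```

A caller could run `verify_sha256 "${OUTPUT_TGZ}" "${remote_digest}"` after the upload and fall back to uploading a local .sha256 file when the server provides none.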

@@ -0,0 +1,92 @@
name: AITER Prebuilt Upload
Collaborator

Add copyright

Contributor Author

Updated


- name: Host Diagnostics (upload)
run: |
echo "::group::Host Diagnostics"
Collaborator

This host info is not important for the build

Contributor Author

Updated

docker exec \
-e NVTE_AITER_PREBUILT_BASE_URL=${NVTE_AITER_PREBUILT_BASE_URL} \
-e NVTE_AITER_PREBUILT_UPLOAD_TOKEN=${NVTE_AITER_PREBUILT_UPLOAD_TOKEN} \
-e GPU_ARCHS_INPUT="${{ inputs.gpu_archs }}" \
Collaborator

Why is this intermediate variable needed?

Contributor Author

Updated

exit 1
fi
docker exec \
Collaborator

No need to separate run and exec

Contributor Author

Updated

ci/aiter_upload.sh --build
'
- name: Cleanup container
Collaborator

If using run --rm, this step is not needed

Contributor Author

Updated
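Taken together, the review suggestions about the container step (a single `docker run --rm`, no separate exec or cleanup step, GPU_ARCHS passed directly) could be sketched as below. This is illustrative only: the function name, the /workspace mount point, and the script path .github/scripts/aiter_upload.sh are assumptions, and any device or network flags the real workflow needs are omitted.

```shell
# Sketch only: build_in_container is a hypothetical helper.
# Usage: build_in_container <image> [gpu_archs]
build_in_container() {
  local image="$1" archs="${2:-gfx942;gfx950}"
  # --rm removes the container on exit, so no cleanup step is required.
  # "-e NAME" without a value forwards NAME from the calling environment,
  # which keeps the token out of the command line.
  set -- docker run --rm \
    -e "GPU_ARCHS=${archs}" \
    -e NVTE_AITER_PREBUILT_BASE_URL \
    -e NVTE_AITER_PREBUILT_UPLOAD_TOKEN \
    -v "${PWD}:/workspace" -w /workspace \
    "${image}" \
    bash -c ".github/scripts/aiter_upload.sh --build"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # show the command without starting a container
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 build_in_container <image>` prints the composed command, which is a cheap way to review the flags before running a real build.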

@@ -0,0 +1,81 @@
#!/usr/bin/env bash
Collaborator

Add copyright. And maybe move it to .github/scripts/

Contributor Author

Updated, moved to .github/scripts

Move aiter upload helper to .github/scripts, add copyright header, and use a temp gitconfig for safe.directory/commit lookup

set CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT, added functionality to verify remote SHA after upload

Trim workflow diagnostics/cleanup, use --rm container, pass GPU_ARCHS input directly