
Anants/auto dep upgrade trtllm #9274

Draft
nv-anants wants to merge 13 commits into main from anants/auto-dep-upgrade-trtllm


@nv-anants
Member

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Adds a daily cron pipeline that detects new trtllm releases, branches off
main, applies the version bump, validates via post-merge CI on the upgrade
branch, and either opens a draft PR or pings Slack on failure.

- detect_latest_versions.py: queries NVIDIA/TensorRT-LLM GitHub releases,
  reads the current pin from container/context.yaml, emits JSON with an
  upgrade_needed flag. Stdlib only; matches the existing .github/scripts
  house style.
- bump_dependency.py: idempotent in-place edits for the four anchor lines
  (pip_wheel, github_trtllm_commit, pyproject.toml extras, support-matrix
  main ToT row). Historical support-matrix rows stay put because the regex
  anchors on the **main (ToT)** label.
- auto-dep-upgrade-trigger.yml: cron 10:00 UTC daily plus workflow_dispatch.
  Detects, branches off main as deps/upgrade-trtllm-vX, commits the bump
  with --signoff under dynamo-dep-bot, pushes, and dispatches post-merge-ci
  on the new branch with framework=trtllm. Idempotent: skips if the branch
  already exists on origin.
- auto-dep-upgrade-complete.yml: workflow_run listener gated on
  startsWith(head_branch, 'deps/upgrade-'). On success opens a draft PR
  with labels dep-upgrade and backend::trtllm; on failure pings the
  existing post-merge Slack channel via SLACK_NOTIFY_NIGHTLY_WEBHOOK_URL.
  PRs are opened as drafts so pre-merge-ci doesn't fire until a human marks them ready.
- dep-upgrade-pr-body.md: PR body template, expanded via envsubst with
  framework / version / branch / run-url substitutions.
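
The detection step described for detect_latest_versions.py could be sketched roughly as follows. This is a minimal illustration, not the script in this PR: the function names, the JSON field names other than upgrade_needed, and the choice to consider only plain X.Y.Z tags (skipping release candidates) are all assumptions. The release tags and current pin are passed in directly, so no network or file access is shown.

```python
import json
import re

# Assumption: only plain "X.Y.Z" tags (optionally "v"-prefixed) count as upgrade candidates.
_STABLE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_stable(tag):
    """Return (major, minor, patch) for a stable tag, or None for anything else."""
    m = _STABLE.fullmatch(tag)
    return tuple(int(p) for p in m.groups()) if m else None


def detect(current_pin, release_tags):
    """Compare the current pin against release tags; field names are hypothetical."""
    current = parse_stable(current_pin)
    if current is None:
        raise ValueError(f"unparseable current pin: {current_pin!r}")
    candidates = [(parse_stable(t), t) for t in release_tags]
    candidates = [(v, t) for v, t in candidates if v is not None]
    # Fall back to the current pin when no stable release tag is available.
    latest_version, latest_tag = max(candidates, default=(current, current_pin))
    return {
        "current": current_pin,
        "latest": latest_tag.lstrip("v"),
        "upgrade_needed": latest_version > current,
    }


if __name__ == "__main__":
    print(json.dumps(detect("1.0.0", ["v1.1.0", "v1.2.0rc1", "v0.9.0"])))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as "1.10.0" sorting before "1.9.0".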

End-to-end operation depends on post-merge-ci.yml accepting a framework
input on workflow_dispatch (introduced separately on
anants/auto-dep-upgrade-postmerge); that change must merge first.
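
As an illustration of the idempotent, label-anchored edit described for bump_dependency.py, here is a minimal sketch of the support-matrix case. The table layout (version in the column right after the label), the function name, and the return shape are assumptions for illustration only; the real script and table may differ.

```python
import re

# Anchor on the **main (ToT)** label so historical rows are never touched.
# Assumption: the version sits in the cell immediately after the label.
_TOT_ROW = re.compile(
    r"^(\|\s*\*\*main \(ToT\)\*\*\s*\|\s*)(\d+\.\d+\.\d+)",
    re.MULTILINE,
)


def bump_tot_row(text, new_version):
    """Return (new_text, changed); a second call with the same version is a no-op."""

    def repl(match):
        # Keep the label prefix verbatim and swap only the version token.
        return match.group(1) + new_version

    new_text, count = _TOT_ROW.subn(repl, text, count=1)
    if count == 0:
        raise ValueError("main (ToT) row not found")
    return new_text, new_text != text


if __name__ == "__main__":
    table = (
        "| Release | TRT-LLM |\n"
        "|---|---|\n"
        "| **main (ToT)** | 1.0.0 |\n"
        "| **v0.5.0** | 0.9.0 |\n"
    )
    bumped, changed = bump_tot_row(table, "1.1.0")
    print(changed)
    print(bumped)
```

Returning a changed flag lets a caller skip the commit entirely when the file is already at the target version.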

Signed-off-by: Anant Sharma <anants@nvidia.com>
Throwaway diagnostic commit to validate the phase 2 dep-upgrade pipeline
end-to-end before phase 1 lands. NOT for merge — revert before reviewing.

- auto-dep-upgrade-trigger.yml: fire on push to anants/auto-dep-upgrade-trtllm,
  checkout from this branch (so dep branch inherits the modified post-merge-ci),
  drop the gh workflow run dispatch step (rejected today because main lacks
  the workflow_dispatch trigger).
- post-merge-ci.yml: add deps/upgrade-* to push triggers; gate dynamo-pipeline,
  snapshot-agent, and the vllm/sglang/planner/frontend build jobs off when
  ref_name starts with deps/upgrade-. Downstream test/compliance/copy/deploy
  jobs cascade-skip via existing needs chains. notify-slack hard-disabled.
- auto-dep-upgrade-complete.yml: hard-disable slack failure step. Listener
  cannot fire from this branch anyway (workflow_run requires the file on the
  default branch); the if:false is precautionary.

Manual step required after the trigger workflow creates deps/upgrade-trtllm-vX:
  git fetch && git checkout deps/upgrade-trtllm-vX
  git commit --allow-empty -s -m "fire CI" && git push
GITHUB_TOKEN-authored pushes do not fire downstream workflows; the manual
re-push is needed to fire post-merge-ci's push trigger on the dep branch.

Signed-off-by: Anant Sharma <anants@nvidia.com>
…push"

This reverts commit e21d883.

Signed-off-by: Anant Sharma <anants@nvidia.com>
Use a PAT issued for the dynamo-ops service account for both the dep branch
push and the post-merge-ci dispatch. Pushes performed by GITHUB_TOKEN do not
fire downstream workflows; PAT-authored pushes do, so post-merge-ci's push
trigger on main / release branches stays usable for any future automation
that wants to react to the dep branch.

Also switches the committer identity to dynamo-ops to match the auth
identity recorded by GitHub.

Requires repo secret OPS_BOT_PAT (fine-grained PAT scoped to ai-dynamo/dynamo
with Contents:RW + Pull-requests:RW + Actions:RW, or classic repo + workflow
scopes), provisioned on the dynamo-ops account.

Signed-off-by: Anant Sharma <anants@nvidia.com>
Adds push trigger on anants/auto-dep-upgrade-trtllm and switches checkout
ref from main to the phase 2 branch so the dep_upgrade scripts are
available pre-merge. schedule + workflow_dispatch require the workflow
file on the default branch, so neither fires until phase 2 merges.

Revert these two hunks before merging the PR:
  - on.push.branches addition
  - checkout step ref change (back to main)

Signed-off-by: Anant Sharma <anants@nvidia.com>
For newer accounts, GitHub attributes a commit made with a noreply address
to the user only when the email uses the canonical id-prefixed form
'<id>+<username>@users.noreply.github.com'. dynamo-ops is recent enough to
require that form (id 170655669); the previous plain noreply email left
bump commits unattributed in the GitHub UI.
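
A small sketch of the id-prefixed address, using the id and username stated above. The helper names are hypothetical; the environment variable names are the standard ones git reads for author and committer identity.

```python
def noreply_email(user_id, username):
    """Build the id-prefixed GitHub noreply address."""
    return f"{user_id}+{username}@users.noreply.github.com"


def committer_env(user_id, username):
    """Environment variables git reads for author/committer identity."""
    email = noreply_email(user_id, username)
    return {
        "GIT_AUTHOR_NAME": username,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": username,
        "GIT_COMMITTER_EMAIL": email,
    }


if __name__ == "__main__":
    print(noreply_email(170655669, "dynamo-ops"))
```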

Signed-off-by: Anant Sharma <anants@nvidia.com>
Single unified completion path. The previous workflow opened a PR only on
success and sent a Slack message only on failure. Both now happen on
every run, and the Slack message always carries the same payload:

- pass/fail emoji + word
- link to the draft PR (created or reused)
- link to the post-merge CI run
- dep branch name
- a status note (ready for review on success; investigate on failure)
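
The payload fields listed above could be assembled along these lines. This is a sketch under assumptions: the function name, wording, and emoji choices are illustrative, and it targets a Slack incoming webhook that accepts a JSON body with a text field.

```python
def build_completion_message(conclusion, pr_url, run_url, branch):
    """Build one Slack payload used for both passing and failing runs."""
    passed = conclusion == "success"
    emoji, word = (":white_check_mark:", "PASSED") if passed else (":x:", "FAILED")
    note = "Ready for review." if passed else "Investigate the CI failure."
    lines = [
        f"{emoji} Dependency upgrade CI {word}",
        f"PR: <{pr_url}|draft PR>",            # Slack link syntax: <url|label>
        f"CI run: <{run_url}|post-merge CI>",
        f"Branch: `{branch}`",
        note,
    ]
    return {"text": "\n".join(lines)}


if __name__ == "__main__":
    payload = build_completion_message(
        "failure",
        "https://example.invalid/pr",
        "https://example.invalid/run",
        "deps/upgrade-trtllm-vX",
    )
    print(payload["text"])
```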

PR creation is idempotent under reruns: gh pr view checks first, and gh pr
create runs only if no PR exists for the branch. Repeated workflow_run
firings (e.g. a CI rerun) reuse the same PR; only the Slack message
content changes.
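
The check-then-create pattern can be sketched as below. The function name, the base branch default, and the injectable run parameter (used so the logic can be exercised without calling gh) are assumptions; the gh subcommands and flags are standard GitHub CLI options, but this is not the workflow's actual step.

```python
import subprocess


def ensure_draft_pr(branch, title, body, labels, base="main", run=subprocess.run):
    """Return (pr_url, created). Reuses an existing PR for the branch if there is one."""
    view = run(
        ["gh", "pr", "view", branch, "--json", "url", "--jq", ".url"],
        capture_output=True,
        text=True,
    )
    if view.returncode == 0 and view.stdout.strip():
        # A PR already exists for this branch: reuse it.
        return view.stdout.strip(), False

    cmd = [
        "gh", "pr", "create", "--draft",
        "--base", base, "--head", branch,
        "--title", title, "--body", body,
    ]
    for label in labels:
        cmd += ["--label", label]
    # gh pr create prints the new PR's URL on stdout.
    created = run(cmd, capture_output=True, text=True, check=True)
    return created.stdout.strip(), True
```

Passing the runner as a parameter keeps the decision logic testable with a stub in place of the real CLI.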

Inlines the PR body via heredoc; drops the .github/templates/ directory
since it held a single 19-line file used in only one place. Authentication
switched from GITHUB_TOKEN to OPS_BOT_PAT so the PR is authored by
dynamo-ops, matching the trigger workflow's commit identity.

Signed-off-by: Anant Sharma <anants@nvidia.com>
sglang-efa-build was added to post-merge-ci.yml after phase 1's gating logic
landed, so it ran unconditionally on every workflow_dispatch — including
runs scoped to a different framework (e.g. framework=trtllm). Add the same
needs:[config] + run_sglang check pattern used by every other framework
build job.

push behavior is unchanged (the !=workflow_dispatch branch short-circuits).
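
The gating decision can be modeled as a small predicate. This is only a sketch of the logic as described here: the function name is hypothetical, and treating the framework input as an exact-match selector is an assumption about how the config job computes flags such as run_sglang.

```python
def should_build(job_framework, event_name, framework_input=""):
    """Decide whether a framework build job runs for this event."""
    # Non-dispatch events (e.g. push) short-circuit: every build runs as before.
    if event_name != "workflow_dispatch":
        return True
    # Assumption: a dispatch run builds only the framework it was scoped to.
    return framework_input == job_framework


if __name__ == "__main__":
    print(should_build("sglang", "push"))
    print(should_build("sglang", "workflow_dispatch", "trtllm"))
    print(should_build("trtllm", "workflow_dispatch", "trtllm"))
```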

Signed-off-by: Anant Sharma <anants@nvidia.com>
…nches

Adds branches:[deps/upgrade-*] to the workflow_run trigger so GitHub drops
events from main / release-branch post-merge runs before any workflow
allocation happens. The previous job-level if check still produced a
"skipped" entry in the Actions UI for every regular post-merge run, which
was both noisy and unnecessary; the event-level filter eliminates that.

Drops the now-redundant if: startsWith(... 'deps/upgrade-') guard from the
finalize job, leaving a single source of truth for branch scoping.
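
To show which branch names the event-level filter would accept, here is a rough model of the pattern match. It is an approximation for illustration: it handles only a single '*' wildcard (which, in GitHub's filter syntax, does not match '/'), and the function name is hypothetical.

```python
import re


def matches_branch_filter(branch, pattern="deps/upgrade-*"):
    """Approximate a GitHub branch filter; supports only the single '*' wildcard."""
    # '*' matches any run of characters except '/'; everything else is literal.
    regex = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, branch) is not None


if __name__ == "__main__":
    for name in ("deps/upgrade-trtllm-vX", "main", "release/1.0"):
        print(name, matches_branch_filter(name))
```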

Signed-off-by: Anant Sharma <anants@nvidia.com>
This reverts commit b5140e4.

Signed-off-by: Anant Sharma <anants@nvidia.com>