ci: run Docker builds on a self-hosted 1ES runner pool #63

Open
benhillis wants to merge 1 commit into microsoft:main from benhillis:ci/use-self-hosted-runners

benhillis (Member) commented May 15, 2026

Move the build job off ubuntu-latest onto the dedicated 1ES Hosted Pool openvmm-deps-gh-amd-westus2 (image ubuntu2404-amd64). Both x86_64 and aarch64 share the AMD pool; aarch64 still cross-compiles via QEMU.

Fixes the recurring 'No space left on device' flakes from the GitHub-hosted runners (e.g. run 25928571548) and gives us a project-owned scheduling pool.

Also a nice speedup, since D32a has 32 vCPUs vs ubuntu-latest's 4 and build I/O lands on a local 1.2TB SSD:

| arch | old (ubuntu-latest, avg of 3 runs) | new (1ES Standard_D32a) | speedup |
|---|---|---|---|
| x86_64 | ~36 min | 9 min 20 s | ~3.9× |
| aarch64 | ~43 min | 15 min 34 s | ~2.8× |
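
The speedup column follows directly from the wall-clock times above. A quick sketch to check the arithmetic; `speedup` is a hypothetical helper written for this note, not something the workflow contains:

```shell
# speedup OLD_MIN NEW_MIN NEW_SEC
# Prints the old/new wall-clock ratio rounded to one decimal place.
speedup() {
  awk -v old_min="$1" -v new_min="$2" -v new_sec="$3" \
    'BEGIN { printf "%.1f\n", (old_min * 60) / (new_min * 60 + new_sec) }'
}

speedup 36 9 20    # x86_64:  ~36 min vs 9 min 20 s
speedup 43 15 34   # aarch64: ~43 min vs 15 min 34 s
```

The old timings are approximate averages, so the ratios are approximate as well.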

The workflow grew an Install Docker Engine step because the 1ES image doesn't ship Docker. The OS disk is small (~80GB), so Docker's storage paths (/var/lib/{docker,containerd,buildkit}) are pre-symlinked to /mnt (~1.2TB resource SSD) before the install; without that, Docker 29's containerd snapshotter fills the OS disk and the runner agent gets killed mid-build with no log capture.
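
For illustration, the pre-symlinking could look roughly like the sketch below. The directory names come from the commit message; the function name `link_docker_storage`, the optional `SUDO` variable, and the refusal to touch pre-existing directories are assumptions made for this example and may not match what the workflow actually does.

```shell
# Hypothetical helper: point each Docker storage directory under VAR_LIB at a
# same-named directory on the large disk (DATA_ROOT). Intended to run before
# Docker is installed, while the directories do not exist yet.
# Assumed usage on a runner: SUDO=sudo link_docker_storage /mnt /var/lib
link_docker_storage() {
  data_root=$1
  var_lib=$2
  for name in docker containerd buildkit; do
    # Do not silently shadow a real directory that already holds data.
    if [ -e "$var_lib/$name" ] && [ ! -L "$var_lib/$name" ]; then
      echo "refusing to replace existing path: $var_lib/$name" >&2
      return 1
    fi
    ${SUDO:-} mkdir -p "$data_root/$name" || return 1
    ${SUDO:-} ln -sfn "$data_root/$name" "$var_lib/$name" || return 1
  done
}
```

Symlinking the parent directories, rather than only setting dockerd's data-root, matches the commit message's observation that the containerd snapshotter writes to /var/lib/containerd regardless of that setting.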

Future improvement: bake Docker (and the /mnt storage layout) directly into the 1ES image so the install step can be dropped entirely.

cgmanifest and release stay on ubuntu-latest (small jobs, no benefit).

Copilot AI review requested due to automatic review settings May 15, 2026 18:14

Copilot AI left a comment

Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

Moves Docker build jobs to Microsoft 1ES self-hosted runners to increase disk/CPU headroom and reduce flakiness from ubuntu-latest limitations.

Changes:

  • Switch build job from ubuntu-latest to the 1ES hosted pool openvmm-deps-gh-amd-westus2 with ubuntu2404-amd64.
  • Add job-level permissions to enable OIDC (id-token: write) for 1ES.

benhillis force-pushed the ci/use-self-hosted-runners branch from 855eec2 to 8fab03f May 15, 2026 18:30
Copilot AI review requested due to automatic review settings May 15, 2026 18:39

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

.github/workflows/build.yml:51

  • The PR description calls out aarch64 continuing to build under QEMU, but this change removes the conditional docker/setup-qemu-action step and doesn't add an equivalent binfmt/QEMU setup. If aarch64 is still expected to run on an amd64 runner, please add QEMU/binfmt setup back; if the intent is native arm64 runners, consider updating the description (and potentially the --platform value) to reflect the new execution model.
      - name: Install Docker Engine
        uses: docker/setup-docker-action@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
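
As a sketch of what an equivalent binfmt/QEMU setup could look like in a plain shell step: the helper names below are hypothetical, and running the `tonistiigi/binfmt` image by hand (it is the default image used by docker/setup-qemu-action) is an assumption, not something this PR is known to do.

```shell
# Returns 0 when a binfmt_misc handler named "qemu-<arch>" is registered.
# BINFMT_DIR is overridable only so the check can be exercised off-runner.
has_qemu_handler() {
  [ -e "${BINFMT_DIR:-/proc/sys/fs/binfmt_misc}/qemu-$1" ]
}

# Register arm64 emulation if it is missing. Needs a working Docker daemon
# and permission to run privileged containers.
ensure_arm64_emulation() {
  has_qemu_handler aarch64 && return 0
  docker run --privileged --rm tonistiigi/binfmt --install arm64
}
```

A workflow step would call `ensure_arm64_emulation` after Docker is installed and before any `--platform linux/arm64` build.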

Comment thread .github/workflows/build.yml Outdated
Copilot AI review requested due to automatic review settings May 15, 2026 18:52

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread .github/workflows/build.yml Outdated
Copilot AI review requested due to automatic review settings May 15, 2026 19:16

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

Comment thread .github/workflows/build.yml
Comment thread .github/workflows/build.yml
Comment thread .github/workflows/build.yml
Comment thread .github/workflows/build.yml
benhillis changed the title from "ci: run Docker builds on self-hosted 1ES runners" to "ci: run Docker builds on a self-hosted 1ES runner pool" May 15, 2026
Copilot AI review requested due to automatic review settings May 15, 2026 19:42

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

Comment thread .github/workflows/build.yml
Comment thread .github/workflows/build.yml
Comment thread .github/workflows/build.yml Outdated
Comment thread .github/workflows/build.yml
Copilot AI review requested due to automatic review settings May 15, 2026 20:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

.github/workflows/build.yml:62

  • Making the Docker socket world-writable (chmod 666) allows any process/user on the runner to gain root via Docker. Prefer adding the runner user to the docker group (and keeping the socket at 660), or keep using sudo docker ... in the steps that need it and avoid broadening socket permissions.
          # Allow the runner user (and other actions like setup-qemu-action)
          # to talk to dockerd without sudo.
          sudo chmod 666 /var/run/docker.sock
          docker info --format 'Docker Root Dir: {{.DockerRootDir}} / Storage Driver: {{.Driver}}'
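
To make the concern concrete, a small check like the following could flag a world-writable socket. `is_world_writable` is a hypothetical helper, and it relies on GNU `stat` as shipped on Ubuntu; the commented commands show one possible form of the reviewer's suggestion and are not taken from the PR.

```shell
# Returns 0 if the permission bits of the given path allow writes by "other"
# users, 1 if they do not, and 2 if the path cannot be inspected.
is_world_writable() {
  mode=$(stat -c '%a' "$1") || return 2
  case "$mode" in
    *[2367]) return 0 ;;   # last octal digit has the write bit set
    *)       return 1 ;;
  esac
}

# Assumed tighter setup, following the review suggestion:
#   sudo usermod -aG docker "$USER"
#   sudo chmod 660 /var/run/docker.sock
# New group membership only applies to new sessions, so later steps may
# still need `sudo docker ...` or `sg docker -c '...'`.
```

On a runner, `is_world_writable /var/run/docker.sock` would succeed after the `chmod 666` above and fail once the socket is back at 660.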

Comment thread .github/workflows/build.yml
Comment thread .github/workflows/build.yml
Move the build job off ubuntu-latest onto the dedicated 1ES Hosted
Pool openvmm-deps-gh-amd-westus2 (image ubuntu2404-amd64). Both
x86_64 and aarch64 share the AMD pool; aarch64 still cross-compiles
via QEMU.

Fixes the recurring 'No space left on device' flakes on the
GitHub-hosted runners, gives us a project-owned scheduling pool, and
is a sizeable speedup (D32a has 32 vCPUs vs ubuntu-latest's 4, and
build I/O lands on a local 1.2TB SSD): x86_64 ~36m -> 9m20s,
aarch64 ~43m -> 15m34s.

The 1ES ubuntu2404-amd64 image doesn't ship with Docker, so the
workflow installs it via the get.docker.com script and pre-symlinks
/var/lib/{docker,containerd,buildkit} to /mnt before the install.
The OS disk is small (~80GB) and Docker 29's containerd snapshotter
writes to /var/lib/containerd regardless of dockerd's data-root, so
without the symlinks the OS disk fills up mid-build and the runner
agent gets killed with no log capture. The whole install step could
be dropped in the future by baking Docker into a custom 1ES image.

cgmanifest and release jobs stay on ubuntu-latest (small jobs, no
benefit from self-hosted capacity).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
benhillis force-pushed the ci/use-self-hosted-runners branch from c5b43c7 to 6a9f6cb May 15, 2026 21:30