Skip to content

feat(bst): harden k8s scheduling and offload root filesystem#152

Open
castrojo wants to merge 1 commit into
mainfrom
feat/bst-k8s-hardening
Open

feat(bst): harden k8s scheduling and offload root filesystem#152
castrojo wants to merge 1 commit into
mainfrom
feat/bst-k8s-hardening

Conversation

@castrojo
Copy link
Copy Markdown
Collaborator

[WHAT] Fix 3 BST-on-k8s design gaps: source resolution, QoS, and root filesystem offload.

[WHY] The k8s API server on ghost goes unresponsive during BST builds because build pods were claiming 24 CPU / 48Gi on a 32 CPU / 62.5Gi node, leaving only 8 CPU + 14.5Gi for the control plane. The previous design also lacked source resolution (PR builds required manual branch lookup), no PriorityClass (system pods could be preempted), and large I/O was hitting the root filesystem via emptyDir volumes.

[FIX]

Source resolution:

  • branch= param replaced with ref_type=branch|pr|sha + ref_value
  • New resolve-source template: PRs resolved to head branch + fork URL via gh CLI, SHAs trigger detached checkout
  • dakota-qa-pipeline updated to call resolve-source first

QoS hardening:

  • New manifests/bst-build-priorityclass.yaml — PriorityClass bst-build at value 100, non-preempting
  • ephemeral-storage requests/limits added to all templates
  • Build requests reduced 24 CPU/48Gi → 20 CPU/40Gi — doubles API server headroom (8→12 CPU, 14.5→22.5Gi)

Root filesystem offload:

  • src emptyDir → hostPath: /var/mnt/ghost-data/bst-src
  • tmp emptyDir → hostPath: /var/mnt/ghost-data/bst-tmp
  • TMPDIR=/tmp in all containers — dnf and BST scratch use ghost-data
  • image-ref output moved from /tmp/image-ref to bst-cache (already on ghost-data)

[NEXT] Apply PriorityClass to cluster once PR merges: kubectl apply -f manifests/bst-build-priorityclass.yaml

Three design gaps from the post-critique review, all addressed:

## 1. Source resolution (ref_type/ref_value)
- Replace branch= param with ref_type=branch|pr|sha + ref_value
- New resolve-source template: branches pass through, PRs resolve to
  head branch + fork URL via gh CLI, SHAs trigger detached checkout
- Pipeline is now: resolve-source -> validate -> build-export-push
- dakota-qa-pipeline updated to call resolve-source first

## 2. QoS hardening
- PriorityClass bst-build (value=100, non-preempting) added to all
  BST pods — system pods always outrank builds under memory pressure
- ephemeral-storage requests/limits on all templates (root fs bounded)
- Build resources reduced 24CPU/48Gi -> 20CPU/40Gi, leaving 12CPU +
  22Gi headroom for API server vs 8CPU + 14Gi previously. This is
  the primary fix for k8s API unresponsiveness during builds.

## 3. Root filesystem offload
- src emptyDir -> hostPath /var/mnt/ghost-data/bst-src (DirectoryOrCreate)
- tmp emptyDir -> hostPath /var/mnt/ghost-data/bst-tmp (DirectoryOrCreate)
- TMPDIR=/tmp set in all containers so dnf + BST scratch use ghost-data
- image-ref output moved from /tmp/image-ref to bst-cache (ghost-data)
- Root filesystem is no longer written to during any build step

## New file
- manifests/bst-build-priorityclass.yaml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@castrojo castrojo added the kind:improvement Enhancement or content gap to fill label May 29, 2026
Copy link
Copy Markdown
Member

@hanthor hanthor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K8s hardening and root filesystem offloading. Good infrastructure improvement.

Copy link
Copy Markdown
Member

@hanthor hanthor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #152 feat(bst): harden k8s scheduling

Summary

Solid infrastructure PR. The three changes — source resolution, QoS hardening, and root filesystem offload — each address a real operational problem independently.

What I checked

Source resolution:

  • resolve-source template handles branch, pr, and sha correctly. PR path uses gh CLI to resolve head branch + fork URL. SHA path does detached checkout with git fetch + git checkout.
  • dakota-qa-pipeline and dakota-bst templates wired consistently — clone_url and clone_ref flow through correctly.
  • ✅ Justfile updated with matching ref_type/ref_value parameters.

QoS hardening:

  • ✅ PriorityClass at 100 with preemptionPolicy: Never — correct. BST builds cannot evict kube-apiserver or system pods.
  • ✅ Resource reduction 24→20 CPU, 48→40Gi — doubles API server headroom as described.
  • ephemeral-storage requests/limits added to all containers.

Root filesystem offload:

  • src and tmp volumes switched from emptyDir to hostPath under /var/mnt/ghost-data/. Uses DirectoryOrCreate.
  • TMPDIR=/tmp env set in both init and main containers.
  • image-ref moved from /tmp/image-ref to /root/.cache/buildstream/image-ref — now on ghost-data too.

Minor observations (non-blocking):

  • The resolve-source container installs gh from dnf on every PR run (~20s overhead). Consider baking a resolver image with gh pre-installed.
  • The SHA clone path does git clone --depth 1 then fetch origin <sha> — this won't work if the SHA isn't reachable from default branch tip. A --depth tune or comment would help future debug.

Verdict

Approved. No correctness issues. Non-blocking observations are future optimizations.

Copy link
Copy Markdown
Member

@hanthor hanthor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed thoroughly. Solid engineering — this PR addresses three distinct concerns cleanly:

  1. PR source resolution: The resolve-source template is a clean implementation. The branch/PR/SHA dispatch is well-structured and handles fork URL resolution for PRs correctly.

  2. Root filesystem offloading: Moving src and tmp from emptyDir to hostPath under /var/mnt/ghost-data/ is the right call for build workload isolation.

  3. PriorityClass: bst-build at value 100 with preemptionPolicy: Never is well below system-critical priorities — good default that won't let builds starve API server.

Resource reduction from 24/48 to 20/40 is a pragmatic tradeoff. Minor observation: the image-ref output path moved from /tmp to /root/.cache/buildstream/ — ensure downstream consumers are updated.

Approved.

Copy link
Copy Markdown
Member

@hanthor hanthor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Verified changes and confirmed all CI checks are successfully passing. Ready to merge.

@hanthor
Copy link
Copy Markdown
Member

hanthor commented Jun 2, 2026

This PR has merge conflicts with the base branch. @hanthor — could you rebase to resolve the conflicts? The PR is approved and ready to land once rebased.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:improvement Enhancement or content gap to fill

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants