Implementation of flake bump auto approve and merge by pkr4711 · Pull Request #154 · cyberus-technology/cloud-hypervisor

pkr4711 · 2026-05-05T08:53:24Z

for details please look into: ci/README.auto.approve.md

Pipeline tests are here

phip1611

I'd prefer if you elaborate on the git commit message a little. something like:

ci: implementation of flake bump auto approve and merge

Add a GitHub workflow to automatically validate and merge pull requests
that only bump flake.lock. This reduces manual review overhead for routine
dependency updates while keeping the merge path constrained and auditable.

The workflow uses a dedicated gitlint config and custom rule to ensure each
eligible commit changes exactly flake.lock. Documentation is added for the
required GitHub App, secrets, permissions, and branch-ruleset bypass setup.

Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de>
On-behalf-of: SAP paul.kroeher@sap.com

Add a GitHub workflow to automatically validate and merge pull requests that only bump flake.lock. This reduces manual review overhead for routine dependency updates while keeping the merge path constrained and auditable. The workflow uses a dedicated gitlint config and custom rule to ensure each eligible commit changes exactly flake.lock. Documentation is added for the required GitHub App, secrets, permissions, and branch-ruleset bypass setup. Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de> On-behalf-of: SAP paul.kroeher@sap.com

phip1611 · 2026-05-18T15:21:00Z

thanks! I'd like to postpone this until #153 is merged - hopefully by the end of the week

phip1611 · 2026-05-21T05:30:49Z

Because of our release progress, this PR now targets the old branch:

Please adjust the target branch to gardenlinux, rebase the PR and then we can proceed.

phip1611 · 2026-05-21T05:33:15Z

+        OWNER=$(echo ${GITHUB_REPOSITORY} | cut -f 1 -d '/')
+        echo "repo=$REPO" >> "$GITHUB_OUTPUT"
+        echo "owner=$OWNER" >> "$GITHUB_OUTPUT"
+    - name: Generate token


Just a general question. Is all of this needed? In another repository, I can auto-approve all PRs automatically with much less boilerplate:

https://github.com/phip1611/spectrum-analyzer/blob/1e93a225b066b06821d99fe042a87ef304bc7c1d/.github/workflows/dependabot-auto-merge.yml#L21

WDYT?

No need to bikeshed here and perfectionize, just a general question.

Any PR must be reviewed by 2 persons. If we want to auto approve & merge this PRs we need to define a bypass rule. This can be done with a dedicated GithubApp.

Please look into the README.

Perfect, thanks! :)

I'll approve once rebased and when the PR targets the right branch again.

phip1611 · 2026-05-21T05:34:01Z

@@ -0,0 +1,72 @@
+name: Flake bump


If I read this file correctly, this should be flake bump auto-approve as it just approves bump PRs but doesn't create such PRs, right?

auto approval of a PR was the task. The user has to create the PR.

Got it! Sorry, very nit request, but could you please rename to flake-bump-auto-approve or so? I think it is confusing as the name reads like a scheduled job that automatically bumps the flake inputs

yes of course

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit f77c6ef) On-behalf-of: SAP paul.kroeher@sap.com

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 21bd3ae) On-behalf-of: SAP paul.kroeher@sap.com

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit b5053ae) On-behalf-of: SAP paul.kroeher@sap.com

A buggy or malicious guest may write an inappropriate value into virtqueue's next_avail field. This will result in an error when iterating over the queue: https://github.com/rust-vmm/vm-virtio/blob/863837ef863f6880bb8357e60bbac49e72c0844c/virtio-queue/src/queue.rs#L708 but this error is (logged and) ignored if pop_descriptor_chain() is used: https://github.com/rust-vmm/vm-virtio/blob/863837ef863f6880bb8357e60bbac49e72c0844c/virtio-queue/src/queue.rs#L583 A reasonable approach, implemented here, is to mark the device as NEEDS_RESET and ignore further queue events until the guest reinitializes the device. How this patch was tested: Linux kernel was patched to trigger a bad next_avail when the virtqueue queue counter reaches 5000: --------------- START OF LINUX KERNEL PATCH ---------- $ git diff diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index b784aab668670..989f2a0c64a77 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -15,6 +15,9 @@ #include <linux/spinlock.h> #include <xen/xen.h> + +void virtqueue_kick_always(struct virtqueue *vq); + #ifdef DEBUG /* For development, we want to crash whenever the ring is screwed. */ #define BAD_RING(_vq, fmt, args...) \ @@ -677,6 +680,12 @@ static inline int virtqueue_add_split( struct virtqueue *_vq, * new available array entries. */ virtio_wmb(vq->weak_barriers); vq->split.avail_idx_shadow++; + { + if ((vq->split.avail_idx_shadow % 100) == 0) + printk(KERN_ERR "avail idx: %d", + (int)vq->split.avail_idx_shadow); + if (vq->split.avail_idx_shadow == 5000) + vq->split.avail_idx_shadow = 0; + } vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->split.avail_idx_shadow); vq->num_added++; @@ -689,6 +698,11 @@ static inline int virtqueue_add_split( struct virtqueue *_vq, if (unlikely(vq->num_added == (1 << 16) - 1)) virtqueue_kick(_vq); + { + if (unlikely(vq->split.avail_idx_shadow == 0)) + virtqueue_kick_always(_vq); + } + return 0; unmap_release: @@ -2515,6 +2529,11 @@ bool virtqueue_kick(struct virtqueue *vq) } EXPORT_SYMBOL_GPL(virtqueue_kick); +void virtqueue_kick_always(struct virtqueue *vq) +{ + virtqueue_kick_prepare(vq); + virtqueue_notify(vq); +} /** * virtqueue_get_buf_ctx - get the next used buffer * @_vq: the struct virtqueue we're talking about. --------------- END OF LINUX KERNEL PATCH ---------- Then the kernel was booted, and the host pinged until the nic became unresponsive: ping -i 0.002 192.168.4.1 Device status was confirmed using cat /sys/class/net/eth0/device/status (it was 0x4f). Then the device was re-initialized: DEV_NAME=$(basename $(readlink -f /sys/class/net/eth0/device)) echo $DEV_NAME | tee /sys/bus/virtio/drivers/virtio_net/unbind echo $DEV_NAME | tee /sys/bus/virtio/drivers/virtio_net/bind ip link set eth0 up At this point networking became healthly again. Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 563303b) On-behalf-of: SAP paul.kroeher@sap.com

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 8b60b38) On-behalf-of: SAP paul.kroeher@sap.com

A malicious or buggy guest can violate virtio by making the same descriptor head available twice before the first chain has been placed on the used ring. The submit path pushed both chains onto the VecDeque-backed inflight_requests keyed by head_index, and on completion find_inflight_request() returned the first linear match. That Request's complete_async() freed its bounce buffer while the other chain's io_uring op was still targeting it, producing a use-after-free the kernel could then scribble into. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit 544fa4a) On-behalf-of: SAP paul.kroeher@sap.com

For non-batch backends execute_async submits the kernel I/O inline before returning. An early return while processing before inserting in inflight_requests, meant the request went untracked, the local batch list was never appended to inflight_requests, even though the request is pending in the kernel. To track it, insert into self.inflight_requests as soon as execute_async returns Ok. The completion path's find_inflight_request now matches the orphan and the bounce buffer is freed only after the kernel signals it is done. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit fa8acbd) On-behalf-of: SAP paul.kroeher@sap.com

The bounce buffer for an unaligned descriptor was allocated in execute_async and leaked on error paths, even though, for the sync case the kernel already had a pointer to the buffer. Clean this up by moving ownership of the buffer to the AlignedOperation type. To make it actually safe, stop stashing a guest memory pointer for the duration of the op. Instead, save the guest address and pass guest memory back to the complete function. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit 1b8c92dd5e3c0c58316826486ce5ee30eeb71407)) [backport: adapted to stable/v51.x; v51.x has no block/src/request.rs split, so the new aligned_operation module is added next to block/src/lib.rs and the in tree struct, alloc, free path is replaced in place.] Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> On-behalf-of: SAP paul.kroeher@sap.com

submit_batch_requests pushed each BatchRequest into the io_uring SQ in turn and used `?` to bail on the first push failure. Leaving the initial SQEs visible to the kernel — but submitter.submit() was never called, and every other call site in this file gates submit() behind a preceding sq.push() that now also fails on the full ring. This could allow a guest to DoS it's own queue or worse if the buffer is freed early. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit ee315d2) On-behalf-of: SAP paul.kroeher@sap.com

Signed-off-by: Bo Chen <bchen@crusoe.ai> On-behalf-of: SAP paul.kroeher@sap.com

A restore snapshot is only construction input. The restore path should use it while rebuilding the VM from saved state, then discard it before later VM lifecycle operations run. Keeping it around is observable after a restored VM changes its device set. For example, a VM can be restored from a snapshot, live-migrated, and then hot-remove a device on the destination. The restored device tree no longer contains that device, but DeviceManager still carries the original device snapshot. That stale snapshot can affect later hotplug. Every added device eventually reaches a constructor that calls state_from_id(self.snapshot.as_ref(), id) or an equivalent snapshot lookup. If the stale snapshot still contains an entry with the newly added device name, the new device is created with old saved state even though the matching device was already removed from the live device tree. Clear the DeviceManager snapshot after hypervisor-specific initialization has rebuilt the interrupt controller and devices. From that point onward, device operations should behave as normal runtime operations, not as restore operations. Co-developed-by: Thomas Prescher <thomas.prescher@cyberus-technology.de> Co-developed-by: Leander Kohler <leander.kohler@cyberus-technology.de> On-behalf-of: SAP philipp.schuster@sap.com Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>

Add a GitHub workflow to automatically validate and merge pull requests that only bump flake.lock. This reduces manual review overhead for routine dependency updates while keeping the merge path constrained and auditable. The workflow uses a dedicated gitlint config and custom rule to ensure each eligible commit changes exactly flake.lock. Documentation is added for the required GitHub App, secrets, permissions, and branch-ruleset bypass setup. Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de> On-behalf-of: SAP paul.kroeher@sap.com

pkr4711 requested review from amphi and phip1611 May 5, 2026 08:55

pkr4711 force-pushed the auto-flake-bump branch from a653820 to 81dfd61 Compare May 5, 2026 09:05

amphi reviewed May 7, 2026

View reviewed changes

Comment thread .github/workflows/flake-bump.yaml Outdated

Comment thread .github/workflows/flake-bump.yaml Outdated

phip1611 reviewed May 7, 2026

View reviewed changes

Comment thread .github/workflows/flake-bump.yaml Outdated

pkr4711 force-pushed the auto-flake-bump branch from 81dfd61 to 4df33bc Compare May 8, 2026 05:52

pkr4711 force-pushed the auto-flake-bump branch from 4df33bc to 0350f6a Compare May 8, 2026 06:04

amphi approved these changes May 8, 2026

View reviewed changes

pkr4711 self-assigned this May 19, 2026

pkr4711 marked this pull request as draft May 19, 2026 06:07

pkr4711 marked this pull request as ready for review May 21, 2026 04:26

phip1611 reviewed May 21, 2026

View reviewed changes

posk-io and others added 12 commits May 21, 2026 07:48

virtio-devices: introduce ActivationContext for device activation

2a6fe8e

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit f77c6ef) On-behalf-of: SAP paul.kroeher@sap.com

virtio-devices: switch driver_status to Arc<AtomicU8>

779ee4f

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 21bd3ae) On-behalf-of: SAP paul.kroeher@sap.com

virtio-devices: wire driver_status to EpollHandler

c2bbbcb

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit b5053ae) On-behalf-of: SAP paul.kroeher@sap.com

virtio-devices: block: handle corrupted requests with NEEDS_RESET

2a11aba

Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 8b60b38) On-behalf-of: SAP paul.kroeher@sap.com

build: Release v51.2

ff770cc

Signed-off-by: Bo Chen <bchen@crusoe.ai> On-behalf-of: SAP paul.kroeher@sap.com

pkr4711 force-pushed the auto-flake-bump branch from ba9b0a6 to 91666d4 Compare May 21, 2026 05:49

pkr4711 closed this May 21, 2026

pkr4711 mentioned this pull request May 21, 2026

implementation of flake bump auto approve and merge #159

Merged

Conversation

pkr4711 commented May 5, 2026

Uh oh!

Uh oh!

Uh oh!

phip1611 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

phip1611 commented May 18, 2026

Uh oh!

phip1611 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

phip1611 commented May 21, 2026 •

edited

Loading