Skip to content

Implementation of flake bump auto approve and merge#154

Closed
pkr4711 wants to merge 13 commits into
cyberus-technology:gardenlinux-v51from
pkr4711:auto-flake-bump
Closed

Implementation of flake bump auto approve and merge#154
pkr4711 wants to merge 13 commits into
cyberus-technology:gardenlinux-v51from
pkr4711:auto-flake-bump

Conversation

@pkr4711
Copy link
Copy Markdown

@pkr4711 pkr4711 commented May 5, 2026

for details please look into: ci/README.auto.approve.md

Pipeline tests are here

@pkr4711 pkr4711 requested review from amphi and phip1611 May 5, 2026 08:55
@pkr4711 pkr4711 force-pushed the auto-flake-bump branch from a653820 to 81dfd61 Compare May 5, 2026 09:05
Comment thread .github/workflows/flake-bump.yaml Outdated
Comment thread .github/workflows/flake-bump.yaml Outdated
Copy link
Copy Markdown
Member

@phip1611 phip1611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if you elaborate on the git commit message a little. something like:

ci: implementation of flake bump auto approve and merge

Add a GitHub workflow to automatically validate and merge pull requests
that only bump flake.lock. This reduces manual review overhead for routine
dependency updates while keeping the merge path constrained and auditable.

The workflow uses a dedicated gitlint config and custom rule to ensure each
eligible commit changes exactly flake.lock. Documentation is added for the
required GitHub App, secrets, permissions, and branch-ruleset bypass setup.

Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de>
On-behalf-of: SAP paul.kroeher@sap.com

Comment thread .github/workflows/flake-bump.yaml Outdated
@pkr4711 pkr4711 force-pushed the auto-flake-bump branch from 81dfd61 to 4df33bc Compare May 8, 2026 05:52
Add a GitHub workflow to automatically validate and merge
pull requests that only bump flake.lock.
This reduces manual review overhead for routine
dependency updates while keeping the merge path
constrained and auditable.

The workflow uses a dedicated gitlint config and
custom rule to ensure each eligible commit changes
exactly flake.lock. Documentation is added for the
required GitHub App, secrets, permissions, and
branch-ruleset bypass setup.

Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de>

On-behalf-of: SAP paul.kroeher@sap.com
@pkr4711 pkr4711 force-pushed the auto-flake-bump branch from 4df33bc to 0350f6a Compare May 8, 2026 06:04
@phip1611
Copy link
Copy Markdown
Member

thanks! I'd like to postpone this until #153 is merged - hopefully by the end of the week

@pkr4711 pkr4711 self-assigned this May 19, 2026
@pkr4711 pkr4711 marked this pull request as draft May 19, 2026 06:07
@pkr4711 pkr4711 marked this pull request as ready for review May 21, 2026 04:26
@phip1611
Copy link
Copy Markdown
Member

phip1611 commented May 21, 2026

Because of our release progress, this PR now targets the old branch:

image

Please adjust the target branch to gardenlinux, rebase the PR and then we can proceed.

OWNER=$(echo ${GITHUB_REPOSITORY} | cut -f 1 -d '/')
echo "repo=$REPO" >> "$GITHUB_OUTPUT"
echo "owner=$OWNER" >> "$GITHUB_OUTPUT"
- name: Generate token
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general question. Is all of this needed? In another repository, I can auto-approve all PRs automatically with much less boilerplate:

https://github.com/phip1611/spectrum-analyzer/blob/1e93a225b066b06821d99fe042a87ef304bc7c1d/.github/workflows/dependabot-auto-merge.yml#L21

WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to bikeshed here and perfectionize, just a general question.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any PR must be reviewed by 2 persons. If we want to auto approve & merge this PRs we need to define a bypass rule. This can be done with a dedicated GithubApp.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look into the README.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks! :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve once rebased and when the PR targets the right branch again.

Comment thread .github/workflows/flake-bump.yaml Outdated
@@ -0,0 +1,72 @@
name: Flake bump
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this file correctly, this should be flake bump auto-approve as it just approves bump PRs but doesn't create such PRs, right?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto approval of a PR was the task. The user has to create the PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Sorry, very nit request, but could you please rename to flake-bump-auto-approve or so? I think it is confusing as the name reads like a scheduled job that automatically bumps the flake inputs

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes of course

posk-io and others added 12 commits May 21, 2026 07:48
Signed-off-by: Peter Oskolkov <posk@google.com>
(cherry picked from commit f77c6ef)

On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Peter Oskolkov <posk@google.com>
(cherry picked from commit 21bd3ae)

On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Peter Oskolkov <posk@google.com>
(cherry picked from commit b5053ae)

On-behalf-of: SAP paul.kroeher@sap.com
A buggy or malicious guest may write an inappropriate value into
virtqueue's next_avail field. This will result in an error
when iterating over the queue:

https://github.com/rust-vmm/vm-virtio/blob/863837ef863f6880bb8357e60bbac49e72c0844c/virtio-queue/src/queue.rs#L708

but this error is (logged and) ignored if pop_descriptor_chain()
is used:

https://github.com/rust-vmm/vm-virtio/blob/863837ef863f6880bb8357e60bbac49e72c0844c/virtio-queue/src/queue.rs#L583

A reasonable approach, implemented here, is to mark the device as
NEEDS_RESET and ignore further queue events until the guest
reinitializes the device.

How this patch was tested:

Linux kernel was patched to trigger a bad next_avail when the
virtqueue queue counter reaches 5000:

--------------- START OF LINUX KERNEL PATCH ----------
$ git diff
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b784aab668670..989f2a0c64a77 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -15,6 +15,9 @@
 #include <linux/spinlock.h>
 #include <xen/xen.h>

+
+void virtqueue_kick_always(struct virtqueue *vq);
+
 #ifdef DEBUG
 /* For development, we want to crash whenever the ring is screwed. */
 #define BAD_RING(_vq, fmt, args...)                            \
@@ -677,6 +680,12 @@ static inline int virtqueue_add_split(
                   struct virtqueue *_vq,
         * new available array entries. */
        virtio_wmb(vq->weak_barriers);
        vq->split.avail_idx_shadow++;
+       {
+        if ((vq->split.avail_idx_shadow % 100) == 0)
+            printk(KERN_ERR "avail idx: %d",
+                  (int)vq->split.avail_idx_shadow);
+               if (vq->split.avail_idx_shadow == 5000)
+               vq->split.avail_idx_shadow = 0;
+       }
        vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
                                      vq->split.avail_idx_shadow);
        vq->num_added++;
@@ -689,6 +698,11 @@ static inline int virtqueue_add_split(
                  struct virtqueue *_vq,
        if (unlikely(vq->num_added == (1 << 16) - 1))
                virtqueue_kick(_vq);

+       {
+               if (unlikely(vq->split.avail_idx_shadow == 0))
+                       virtqueue_kick_always(_vq);
+       }
+
        return 0;

 unmap_release:
@@ -2515,6 +2529,11 @@ bool virtqueue_kick(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);

+void virtqueue_kick_always(struct virtqueue *vq)
+{
+       virtqueue_kick_prepare(vq);
+       virtqueue_notify(vq);
+}
 /**
  * virtqueue_get_buf_ctx - get the next used buffer
  * @_vq: the struct virtqueue we're talking about.
--------------- END OF LINUX KERNEL PATCH ----------

Then the kernel was booted, and the host pinged until the
nic became unresponsive:

ping -i 0.002 192.168.4.1

Device status was confirmed using

cat /sys/class/net/eth0/device/status

(it was 0x4f).

Then the device was re-initialized:

DEV_NAME=$(basename $(readlink -f /sys/class/net/eth0/device))
echo $DEV_NAME | tee /sys/bus/virtio/drivers/virtio_net/unbind
echo $DEV_NAME | tee /sys/bus/virtio/drivers/virtio_net/bind
ip link set eth0 up

At this point networking became healthly again.

Signed-off-by: Peter Oskolkov <posk@google.com>
(cherry picked from commit 563303b)

On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Peter Oskolkov <posk@google.com>
(cherry picked from commit 8b60b38)

On-behalf-of: SAP paul.kroeher@sap.com
A malicious or buggy guest can violate virtio by making the same
descriptor head available twice before the first chain has been placed
on the used ring. The submit path pushed both chains onto the
VecDeque-backed inflight_requests keyed by head_index, and on completion
find_inflight_request() returned the first linear match. That Request's
complete_async() freed its bounce buffer while the other chain's
io_uring op was still targeting it, producing a use-after-free the
kernel could then scribble into.

Signed-off-by: Dylan Reid <dgreid@fb.com>
(cherry picked from commit 544fa4a)

On-behalf-of: SAP paul.kroeher@sap.com
For non-batch backends execute_async submits the kernel I/O inline
before returning. An early return while processing before inserting in
inflight_requests, meant the request went untracked, the local batch
list was never appended to inflight_requests, even though the request is
pending in the kernel.

To track it, insert into self.inflight_requests as soon as execute_async
returns Ok. The completion path's find_inflight_request now matches the
orphan and the bounce buffer is freed only after the kernel signals it
is done.

Signed-off-by: Dylan Reid <dgreid@fb.com>
(cherry picked from commit fa8acbd)

On-behalf-of: SAP paul.kroeher@sap.com
The bounce buffer for an unaligned descriptor was allocated in
execute_async and leaked on error paths, even though, for the sync
case the kernel already had a pointer to the buffer.

Clean this up by moving ownership of the buffer to the
AlignedOperation type. To make it actually safe, stop stashing a
guest memory pointer for the duration of the op. Instead, save the
guest address and pass guest memory back to the complete function.

Signed-off-by: Dylan Reid <dgreid@fb.com>
(cherry picked from commit 1b8c92dd5e3c0c58316826486ce5ee30eeb71407))
[backport: adapted to stable/v51.x; v51.x has no block/src/request.rs
 split, so the new aligned_operation module is added next to
 block/src/lib.rs and the in tree struct, alloc, free path is
 replaced in place.]
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>

On-behalf-of: SAP paul.kroeher@sap.com
submit_batch_requests pushed each BatchRequest into the io_uring SQ in
turn and used `?` to bail on the first push failure.
Leaving the initial SQEs visible to the kernel — but submitter.submit()
was never called, and every other call site in this file gates submit()
behind a preceding sq.push() that now also fails on the full ring.

This could allow a guest to DoS it's own queue or worse if the buffer is
freed early.

Signed-off-by: Dylan Reid <dgreid@fb.com>
(cherry picked from commit ee315d2)

On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Bo Chen <bchen@crusoe.ai>

On-behalf-of: SAP paul.kroeher@sap.com
A restore snapshot is only construction input. The restore path should
use it while rebuilding the VM from saved state, then discard it before
later VM lifecycle operations run.

Keeping it around is observable after a restored VM changes its device
set. For example, a VM can be restored from a snapshot, live-migrated,
and then hot-remove a device on the destination. The restored device
tree no longer contains that device, but DeviceManager still carries the
original device snapshot.

That stale snapshot can affect later hotplug. Every added device
eventually reaches a constructor that calls
state_from_id(self.snapshot.as_ref(), id) or an equivalent snapshot
lookup. If the stale snapshot still contains an entry with the newly
added device name, the new device is created with old saved state even
though the matching device was already removed from the live device
tree.

Clear the DeviceManager snapshot after hypervisor-specific
initialization has rebuilt the interrupt controller and devices. From
that point onward, device operations should behave as normal runtime
operations, not as restore operations.

Co-developed-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Co-developed-by: Leander Kohler <leander.kohler@cyberus-technology.de>

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Add a GitHub workflow to automatically validate and merge
pull requests that only bump flake.lock.
This reduces manual review overhead for routine
dependency updates while keeping the merge path
constrained and auditable.

The workflow uses a dedicated gitlint config and
custom rule to ensure each eligible commit changes
exactly flake.lock. Documentation is added for the
required GitHub App, secrets, permissions, and
branch-ruleset bypass setup.

Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de>

On-behalf-of: SAP paul.kroeher@sap.com
@pkr4711 pkr4711 force-pushed the auto-flake-bump branch from ba9b0a6 to 91666d4 Compare May 21, 2026 05:49
@pkr4711 pkr4711 closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants