Implementation of flake bump auto approve and merge#154
Conversation
phip1611
left a comment
There was a problem hiding this comment.
I'd prefer if you elaborate on the git commit message a little. something like:
ci: implementation of flake bump auto approve and merge
Add a GitHub workflow to automatically validate and merge pull requests
that only bump flake.lock. This reduces manual review overhead for routine
dependency updates while keeping the merge path constrained and auditable.
The workflow uses a dedicated gitlint config and custom rule to ensure each
eligible commit changes exactly flake.lock. Documentation is added for the
required GitHub App, secrets, permissions, and branch-ruleset bypass setup.
Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de>
On-behalf-of: SAP paul.kroeher@sap.com
Add a GitHub workflow to automatically validate and merge pull requests that only bump flake.lock. This reduces manual review overhead for routine dependency updates while keeping the merge path constrained and auditable. The workflow uses a dedicated gitlint config and custom rule to ensure each eligible commit changes exactly flake.lock. Documentation is added for the required GitHub App, secrets, permissions, and branch-ruleset bypass setup. Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de> On-behalf-of: SAP paul.kroeher@sap.com
|
thanks! I'd like to postpone this until #153 is merged - hopefully by the end of the week |
| OWNER=$(echo ${GITHUB_REPOSITORY} | cut -f 1 -d '/') | ||
| echo "repo=$REPO" >> "$GITHUB_OUTPUT" | ||
| echo "owner=$OWNER" >> "$GITHUB_OUTPUT" | ||
| - name: Generate token |
There was a problem hiding this comment.
Just a general question. Is all of this needed? In another repository, I can auto-approve all PRs automatically with much less boilerplate:
WDYT?
There was a problem hiding this comment.
No need to bikeshed here and perfectionize, just a general question.
There was a problem hiding this comment.
Any PR must be reviewed by 2 persons. If we want to auto approve & merge this PRs we need to define a bypass rule. This can be done with a dedicated GithubApp.
There was a problem hiding this comment.
I'll approve once rebased and when the PR targets the right branch again.
| @@ -0,0 +1,72 @@ | |||
| name: Flake bump | |||
There was a problem hiding this comment.
If I read this file correctly, this should be flake bump auto-approve as it just approves bump PRs but doesn't create such PRs, right?
There was a problem hiding this comment.
auto approval of a PR was the task. The user has to create the PR.
There was a problem hiding this comment.
Got it! Sorry, very nit request, but could you please rename to flake-bump-auto-approve or so? I think it is confusing as the name reads like a scheduled job that automatically bumps the flake inputs
Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit f77c6ef) On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 21bd3ae) On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit b5053ae) On-behalf-of: SAP paul.kroeher@sap.com
A buggy or malicious guest may write an inappropriate value into virtqueue's next_avail field. This will result in an error when iterating over the queue: https://github.com/rust-vmm/vm-virtio/blob/863837ef863f6880bb8357e60bbac49e72c0844c/virtio-queue/src/queue.rs#L708 but this error is (logged and) ignored if pop_descriptor_chain() is used: https://github.com/rust-vmm/vm-virtio/blob/863837ef863f6880bb8357e60bbac49e72c0844c/virtio-queue/src/queue.rs#L583 A reasonable approach, implemented here, is to mark the device as NEEDS_RESET and ignore further queue events until the guest reinitializes the device. How this patch was tested: Linux kernel was patched to trigger a bad next_avail when the virtqueue queue counter reaches 5000: --------------- START OF LINUX KERNEL PATCH ---------- $ git diff diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index b784aab668670..989f2a0c64a77 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -15,6 +15,9 @@ #include <linux/spinlock.h> #include <xen/xen.h> + +void virtqueue_kick_always(struct virtqueue *vq); + #ifdef DEBUG /* For development, we want to crash whenever the ring is screwed. */ #define BAD_RING(_vq, fmt, args...) \ @@ -677,6 +680,12 @@ static inline int virtqueue_add_split( struct virtqueue *_vq, * new available array entries. */ virtio_wmb(vq->weak_barriers); vq->split.avail_idx_shadow++; + { + if ((vq->split.avail_idx_shadow % 100) == 0) + printk(KERN_ERR "avail idx: %d", + (int)vq->split.avail_idx_shadow); + if (vq->split.avail_idx_shadow == 5000) + vq->split.avail_idx_shadow = 0; + } vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->split.avail_idx_shadow); vq->num_added++; @@ -689,6 +698,11 @@ static inline int virtqueue_add_split( struct virtqueue *_vq, if (unlikely(vq->num_added == (1 << 16) - 1)) virtqueue_kick(_vq); + { + if (unlikely(vq->split.avail_idx_shadow == 0)) + virtqueue_kick_always(_vq); + } + return 0; unmap_release: @@ -2515,6 +2529,11 @@ bool virtqueue_kick(struct virtqueue *vq) } EXPORT_SYMBOL_GPL(virtqueue_kick); +void virtqueue_kick_always(struct virtqueue *vq) +{ + virtqueue_kick_prepare(vq); + virtqueue_notify(vq); +} /** * virtqueue_get_buf_ctx - get the next used buffer * @_vq: the struct virtqueue we're talking about. --------------- END OF LINUX KERNEL PATCH ---------- Then the kernel was booted, and the host pinged until the nic became unresponsive: ping -i 0.002 192.168.4.1 Device status was confirmed using cat /sys/class/net/eth0/device/status (it was 0x4f). Then the device was re-initialized: DEV_NAME=$(basename $(readlink -f /sys/class/net/eth0/device)) echo $DEV_NAME | tee /sys/bus/virtio/drivers/virtio_net/unbind echo $DEV_NAME | tee /sys/bus/virtio/drivers/virtio_net/bind ip link set eth0 up At this point networking became healthly again. Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 563303b) On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Peter Oskolkov <posk@google.com> (cherry picked from commit 8b60b38) On-behalf-of: SAP paul.kroeher@sap.com
A malicious or buggy guest can violate virtio by making the same descriptor head available twice before the first chain has been placed on the used ring. The submit path pushed both chains onto the VecDeque-backed inflight_requests keyed by head_index, and on completion find_inflight_request() returned the first linear match. That Request's complete_async() freed its bounce buffer while the other chain's io_uring op was still targeting it, producing a use-after-free the kernel could then scribble into. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit 544fa4a) On-behalf-of: SAP paul.kroeher@sap.com
For non-batch backends execute_async submits the kernel I/O inline before returning. An early return while processing before inserting in inflight_requests, meant the request went untracked, the local batch list was never appended to inflight_requests, even though the request is pending in the kernel. To track it, insert into self.inflight_requests as soon as execute_async returns Ok. The completion path's find_inflight_request now matches the orphan and the bounce buffer is freed only after the kernel signals it is done. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit fa8acbd) On-behalf-of: SAP paul.kroeher@sap.com
The bounce buffer for an unaligned descriptor was allocated in execute_async and leaked on error paths, even though, for the sync case the kernel already had a pointer to the buffer. Clean this up by moving ownership of the buffer to the AlignedOperation type. To make it actually safe, stop stashing a guest memory pointer for the duration of the op. Instead, save the guest address and pass guest memory back to the complete function. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit 1b8c92dd5e3c0c58316826486ce5ee30eeb71407)) [backport: adapted to stable/v51.x; v51.x has no block/src/request.rs split, so the new aligned_operation module is added next to block/src/lib.rs and the in tree struct, alloc, free path is replaced in place.] Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> On-behalf-of: SAP paul.kroeher@sap.com
submit_batch_requests pushed each BatchRequest into the io_uring SQ in turn and used `?` to bail on the first push failure. Leaving the initial SQEs visible to the kernel — but submitter.submit() was never called, and every other call site in this file gates submit() behind a preceding sq.push() that now also fails on the full ring. This could allow a guest to DoS it's own queue or worse if the buffer is freed early. Signed-off-by: Dylan Reid <dgreid@fb.com> (cherry picked from commit ee315d2) On-behalf-of: SAP paul.kroeher@sap.com
Signed-off-by: Bo Chen <bchen@crusoe.ai> On-behalf-of: SAP paul.kroeher@sap.com
A restore snapshot is only construction input. The restore path should use it while rebuilding the VM from saved state, then discard it before later VM lifecycle operations run. Keeping it around is observable after a restored VM changes its device set. For example, a VM can be restored from a snapshot, live-migrated, and then hot-remove a device on the destination. The restored device tree no longer contains that device, but DeviceManager still carries the original device snapshot. That stale snapshot can affect later hotplug. Every added device eventually reaches a constructor that calls state_from_id(self.snapshot.as_ref(), id) or an equivalent snapshot lookup. If the stale snapshot still contains an entry with the newly added device name, the new device is created with old saved state even though the matching device was already removed from the live device tree. Clear the DeviceManager snapshot after hypervisor-specific initialization has rebuilt the interrupt controller and devices. From that point onward, device operations should behave as normal runtime operations, not as restore operations. Co-developed-by: Thomas Prescher <thomas.prescher@cyberus-technology.de> Co-developed-by: Leander Kohler <leander.kohler@cyberus-technology.de> On-behalf-of: SAP philipp.schuster@sap.com Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Add a GitHub workflow to automatically validate and merge pull requests that only bump flake.lock. This reduces manual review overhead for routine dependency updates while keeping the merge path constrained and auditable. The workflow uses a dedicated gitlint config and custom rule to ensure each eligible commit changes exactly flake.lock. Documentation is added for the required GitHub App, secrets, permissions, and branch-ruleset bypass setup. Signed-off-by: Paul Kroeher <paul.kroeher@cyberus-technology.de> On-behalf-of: SAP paul.kroeher@sap.com

for details please look into:
ci/README.auto.approve.mdPipeline tests are here