feat(agent): add LVM on file-backed loop devices (spec.fileDevices)#302
Open
duckhawk wants to merge 13 commits into
Open
feat(agent): add LVM on file-backed loop devices (spec.fileDevices)#302duckhawk wants to merge 13 commits into
duckhawk wants to merge 13 commits into
Conversation
Agents: Claude Opus 4.6 (ведущий, авто-режим)
…er marker provisionFileDevices used to allocate a fresh loop minor on every retry because `losetup --find --show` does not deduplicate, leaking loop devices until the system-wide limit. It also left half-attached files on the disk whenever any later step (CreatePV, vgcreate) failed because there was no compensating cleanup. cleanupFileDevices walked status.nodes[].fileDevices only, so a crash between fallocate and the next discoverer pass would leak the backing files forever. It also swallowed losetup errors, allowing the finalizer to be removed while a busy loop device was still alive. The basename was tied to the slice index (`sds-<lvg>-<i>.img`), so any reorder/insert/delete in `spec.fileDevices` silently retargeted an existing entry to a different file. This commit rewrites both routines on top of two helpers (BuildFileDevicePath / IsManagedFileDevicePath) that derive the basename from a SHA-256 prefix of (directory, size). The pattern doubles as the owner marker: the cleanup refuses to rm any path whose basename does not match, which protects unrelated host files (libvirt qcow2, snap loops, …) from being clobbered if a foreign loop PV ever slips into status. Concrete changes: - provisionFileDevices now queries FindLoopDeviceByFile first and reuses the existing loop device if there is one; otherwise it tracks every file it creates and every loop it attaches, and rolls them back in LIFO order when any step in the loop fails. - cleanupFileDevices walks the union of spec.fileDevices and status.nodes[].fileDevices, returns an error when any losetup -d fails (so the finalizer stays in place and the reconcile retries), and refuses to act on paths that do not match the managed pattern. - validateFileDevice rejects relative directories and `..` segments — defense in depth against runaway fallocate in arbitrary locations. - reconcileLVGDeleteFunc now propagates the cleanup error, sets TypeVGConfigurationApplied=False/Terminating, and keeps the finalizer until the next pass succeeds. Agents: Claude Opus 4.7 (ведущий, авто-режим)
…n on failure ReattachFileDevices used to be fire-and-forget: a missing backing file or an EIO on losetup --find --show was logged at Warning level and the function returned cleanly. ActivateAllManagedVGs then proceeded to bring up VGs whose PVs were still missing, which left the affected volume groups in a degraded state with no user-visible signal — the first symptom was usually a confusing "device busy" or "no such file" when a PVC tried to mount much later. The function now collects every per-device failure, joins them with errors.Join, and returns a non-nil error if anything failed. The startup runnable in cmd/main.go skips ActivateAllManagedVGs for this pass when the reattach failed and logs the error so it is visible in pod logs and pickable by alerting on agent error rate. Agents: Claude Opus 4.7 (ведущий, авто-режим)
The discoverer used to run a private `nsenter -t 1 -m -- /bin/cat /sys/block/<loop>/loop/backing_file` per loop PV per discovery cycle. This duplicated the nsenter argv that already lives in nsentrerExpendedArgs, bypassed exec.CommandContext (no timeout, the reconciler would hang on a stuck cat), and ran an external process from the discoverer's domain layer — outside the Commands interface that all other host commands flow through and that tests mock. This commit folds the read into a new Commands method, GetLoopBackingFile, implemented on top of `losetup --noheadings --output BACK-FILE <loopDev>`. The discoverer now calls it with a 10s context.WithTimeout, the gomock recorder is regenerated, and fileDevice discovery is wired through the owner marker added in the previous commit: any loop PV whose basename does not match the managed `sds-<lvg>-<hash>.img` pattern (or whose VG carries no storage.deckhouse.io/lvmVolumeGroupName tag) is skipped with a warning instead of being written into status. While here, drop the redundant `Nodes: nil, FileDeviceNodes: nil` literal that was assigned and then immediately overwritten, and the now-unused bytes/os/exec imports in discoverer.go. Agents: Claude Opus 4.7 (ведущий, авто-режим)
CRD validation now rejects in-place edits of fileDevices[].directory and fileDevices[].size via `x-kubernetes-validations: self == oldSelf`. docs/FAQ.md already promised "No resize: file device size cannot be changed after creation", but nothing actually enforced it: increasing size would fallocate-extend the file while the loop device kept its captured (old) size, and shrinking would leave a file larger than the spec entry asked for. With the basename derived from (directory, size) this would also silently retarget the entry to a different file. The Go side gets a small bit of housekeeping: LVMVolumeGroupTemplate in lvm_volume_group_set.go now uses `json:"blockDeviceSelector,omitempty"`, matching the change applied to LVMVolumeGroupSpec in this PR. Without it, marshalling a template with no selector would emit `"blockDeviceSelector":null` while marshalling the spec form would omit the field entirely. The struct field alignment of both files is re-`gofmt`-ed for the columns to line up after the json-tag change. Agents: Claude Opus 4.7 (ведущий, авто-режим)
…ming Adds unit tests for the safety properties added in the previous commits, all driven through gomock against the Commands interface: - BuildFileDevicePath / IsManagedFileDevicePath: determinism, that different (directory, size) tuples produce different basenames, and that foreign paths (libvirt qcow2, snap squashfs, a path with a too short or non-hex hash) are correctly rejected even when an empty lvgName is passed. - ReattachFileDevices: already-attached files are skipped (no fresh losetup --find --show, which would have leaked loops); a per-device losetup failure does not abort the loop and the joined error surfaces every failing file path; FindLoopDeviceByFile errors are propagated. - provisionFileDevices: reuses an existing loop attachment, and on partial failure rolls back both the file and the loop in LIFO order so a retry sees a clean slate. - cleanupFileDevices: walks the union of spec.fileDevices and status.nodes[].fileDevices (so files created but never reflected in status are still removed), returns an error when losetup -d fails (so the finalizer cannot be removed while a busy loop is still alive), and refuses to act on paths whose basename does not match the managed pattern. - validateFileDevice: rejects relative directories and `..` segments. Agents: Claude Opus 4.7 (ведущий, авто-режим)
POSIX text files end with a newline; YAML tools and shell editors otherwise show "No newline at end of file" on every diff. Agents: Claude Opus 4.7 (ведущий, авто-режим)
7ae49b3 to
ec49aec
Compare
Agents: Claude Opus 4.7 (ведущий, авто-режим)
Add context and command timeouts for loop/file-device operations, validate duplicate spec entries and directory writability, restore host LVM loop global_filter, fix CRD fileDevices list map semantics, and update tests/mocks.
Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
Transition rule перенесена с полей элементов map-list на spec/lvmVolumeGroupTemplate: oldSelf на item-level внутри fileDevices отклонялся apiserver как uncorrelatable и превышал CEL cost budget. Добавлены maxItems и maxLength для снижения стоимости правил. Agents: Composer 2.5 (ведущий, авто-режим)
reattachFileDevicesAtStartup использует time.Duration; без импорта падает сборка agent-golang-artifact. Agents: Composer 2.5 (ведущий, авто-режим)
Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Add support for creating LVM Volume Groups on top of file-backed loop devices via a new
spec.fileDevicesfield inLVMVolumeGroupCRD. The agent creates preallocated files (fallocate), attaches them as loop devices (losetup), and uses them as LVM Physical Volumes. File devices can be used standalone or combined withblockDeviceSelectorin a single VG.Key changes:
LVMVolumeGroupandLVMVolumeGroupSetCRDs withspec.fileDevicesandstatus.nodes[].fileDevicesloopfromLVMGlobalFilterandForeignDeviceBasePrefixesto allow LVM to see managed loop PVsCreateFileDevice,SetupLoopDevice,DetachLoopDevice,FindLoopDeviceByFile,RemoveFileDevicestatus.nodes[].fileDevicesNodeGroupConfigurationto remove LVM loop blacklist (multipath blacklist retained)Why do we need it, and what problem does it solve?
When dedicated block devices are not available on a node, there is no way to create an LVM Volume Group using the current
sds-node-configurator. This feature allows users to allocate part of an existing filesystem for LVM by specifying a directory and size inspec.fileDevices, enabling LVM-based storage provisioning without dedicated disks.What is the expected result?
LVMVolumeGroupwithspec.fileDevicesprovisions preallocated files, attaches them as loop devices, creates PVs and a VGstatus.nodes[].fileDevicesreflects the file path, loop device, size, and PV UUID for each file deviceLVMVolumeGroupwith file devices detaches loop devices and removes backing filesfileDevicesandblockDeviceSelectorcan be used together in a single VGstorage.deckhouse.io/enabled=truetag) are safely ignoredChecklist