Skip to content

feat(agent): add LVM on file-backed loop devices (spec.fileDevices)#302

Open
duckhawk wants to merge 13 commits into
mainfrom
feat/lvm-file-backed-devices
Open

feat(agent): add LVM on file-backed loop devices (spec.fileDevices)#302
duckhawk wants to merge 13 commits into
mainfrom
feat/lvm-file-backed-devices

Conversation

@duckhawk

Copy link
Copy Markdown
Member

Description

Add support for creating LVM Volume Groups on top of file-backed loop devices via a new spec.fileDevices field in LVMVolumeGroup CRD. The agent creates preallocated files (fallocate), attaches them as loop devices (losetup), and uses them as LVM Physical Volumes. File devices can be used standalone or combined with blockDeviceSelector in a single VG.

Key changes:

  • Extended LVMVolumeGroup and LVMVolumeGroupSet CRDs with spec.fileDevices and status.nodes[].fileDevices
  • Removed loop from LVMGlobalFilter and ForeignDeviceBasePrefixes to allow LVM to see managed loop PVs
  • Added 5 new agent commands: CreateFileDevice, SetupLoopDevice, DetachLoopDevice, FindLoopDeviceByFile, RemoveFileDevice
  • Implemented file device provisioning in create flow and cleanup in delete flow
  • Added startup recovery: agent re-attaches loop devices before VG activation after node reboot
  • Extended discoverer to reflect file-backed PVs in status.nodes[].fileDevices
  • Updated NodeGroupConfiguration to remove LVM loop blacklist (multipath blacklist retained)

Why do we need it, and what problem does it solve?

When dedicated block devices are not available on a node, there is no way to create an LVM Volume Group using the current sds-node-configurator. This feature allows users to allocate part of an existing filesystem for LVM by specifying a directory and size in spec.fileDevices, enabling LVM-based storage provisioning without dedicated disks.

What is the expected result?

  • Creating an LVMVolumeGroup with spec.fileDevices provisions preallocated files, attaches them as loop devices, creates PVs and a VG
  • status.nodes[].fileDevices reflects the file path, loop device, size, and PV UUID for each file device
  • After a node reboot, the agent automatically re-attaches loop devices before activating VGs
  • Deleting an LVMVolumeGroup with file devices detaches loop devices and removes backing files
  • fileDevices and blockDeviceSelector can be used together in a single VG
  • Loop devices that are NOT managed by the agent (no storage.deckhouse.io/enabled=true tag) are safely ignored

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

duckhawk added 7 commits June 23, 2026 18:30
Agents: Claude Opus 4.6 (ведущий, авто-режим)
…er marker

provisionFileDevices used to allocate a fresh loop minor on every retry
because `losetup --find --show` does not deduplicate, leaking loop
devices until the system-wide limit. It also left half-attached files
on the disk whenever any later step (CreatePV, vgcreate) failed because
there was no compensating cleanup.

cleanupFileDevices walked status.nodes[].fileDevices only, so a crash
between fallocate and the next discoverer pass would leak the backing
files forever. It also swallowed losetup errors, allowing the finalizer
to be removed while a busy loop device was still alive.

The basename was tied to the slice index (`sds-<lvg>-<i>.img`), so any
reorder/insert/delete in `spec.fileDevices` silently retargeted an
existing entry to a different file.

This commit rewrites both routines on top of two helpers
(BuildFileDevicePath / IsManagedFileDevicePath) that derive the basename
from a SHA-256 prefix of (directory, size). The pattern doubles as the
owner marker: the cleanup refuses to rm any path whose basename does
not match, which protects unrelated host files (libvirt qcow2, snap
loops, …) from being clobbered if a foreign loop PV ever slips into
status.

Concrete changes:

- provisionFileDevices now queries FindLoopDeviceByFile first and reuses
  the existing loop device if there is one; otherwise it tracks every
  file it creates and every loop it attaches, and rolls them back in
  LIFO order when any step in the loop fails.
- cleanupFileDevices walks the union of spec.fileDevices and
  status.nodes[].fileDevices, returns an error when any losetup -d
  fails (so the finalizer stays in place and the reconcile retries),
  and refuses to act on paths that do not match the managed pattern.
- validateFileDevice rejects relative directories and `..` segments —
  defense in depth against runaway fallocate in arbitrary locations.
- reconcileLVGDeleteFunc now propagates the cleanup error, sets
  TypeVGConfigurationApplied=False/Terminating, and keeps the
  finalizer until the next pass succeeds.

Agents: Claude Opus 4.7 (ведущий, авто-режим)
…n on failure

ReattachFileDevices used to be fire-and-forget: a missing backing file
or an EIO on losetup --find --show was logged at Warning level and the
function returned cleanly. ActivateAllManagedVGs then proceeded to
bring up VGs whose PVs were still missing, which left the affected
volume groups in a degraded state with no user-visible signal — the
first symptom was usually a confusing "device busy" or "no such file"
when a PVC tried to mount much later.

The function now collects every per-device failure, joins them with
errors.Join, and returns a non-nil error if anything failed. The
startup runnable in cmd/main.go skips ActivateAllManagedVGs for this
pass when the reattach failed and logs the error so it is visible in
pod logs and pickable by alerting on agent error rate.

Agents: Claude Opus 4.7 (ведущий, авто-режим)
The discoverer used to run a private `nsenter -t 1 -m -- /bin/cat
/sys/block/<loop>/loop/backing_file` per loop PV per discovery cycle.
This duplicated the nsenter argv that already lives in
nsentrerExpendedArgs, bypassed exec.CommandContext (no timeout, the
reconciler would hang on a stuck cat), and ran an external process
from the discoverer's domain layer — outside the Commands interface
that all other host commands flow through and that tests mock.

This commit folds the read into a new Commands method,
GetLoopBackingFile, implemented on top of `losetup --noheadings
--output BACK-FILE <loopDev>`. The discoverer now calls it with a
10s context.WithTimeout, the gomock recorder is regenerated, and
fileDevice discovery is wired through the owner marker added in the
previous commit: any loop PV whose basename does not match the
managed `sds-<lvg>-<hash>.img` pattern (or whose VG carries no
storage.deckhouse.io/lvmVolumeGroupName tag) is skipped with a
warning instead of being written into status.

While here, drop the redundant `Nodes: nil, FileDeviceNodes: nil`
literal that was assigned and then immediately overwritten, and the
now-unused bytes/os/exec imports in discoverer.go.

Agents: Claude Opus 4.7 (ведущий, авто-режим)
CRD validation now rejects in-place edits of fileDevices[].directory
and fileDevices[].size via `x-kubernetes-validations: self == oldSelf`.
docs/FAQ.md already promised "No resize: file device size cannot be
changed after creation", but nothing actually enforced it: increasing
size would fallocate-extend the file while the loop device kept its
captured (old) size, and shrinking would leave a file larger than the
spec entry asked for. With the basename derived from (directory, size)
this would also silently retarget the entry to a different file.

The Go side gets a small bit of housekeeping: LVMVolumeGroupTemplate
in lvm_volume_group_set.go now uses `json:"blockDeviceSelector,omitempty"`,
matching the change applied to LVMVolumeGroupSpec in this PR. Without
it, marshalling a template with no selector would emit
`"blockDeviceSelector":null` while marshalling the spec form would
omit the field entirely.

The struct field alignment of both files is re-`gofmt`-ed for the
columns to line up after the json-tag change.

Agents: Claude Opus 4.7 (ведущий, авто-режим)
…ming

Adds unit tests for the safety properties added in the previous
commits, all driven through gomock against the Commands interface:

- BuildFileDevicePath / IsManagedFileDevicePath: determinism, that
  different (directory, size) tuples produce different basenames, and
  that foreign paths (libvirt qcow2, snap squashfs, a path with a too
  short or non-hex hash) are correctly rejected even when an empty
  lvgName is passed.

- ReattachFileDevices: already-attached files are skipped (no fresh
  losetup --find --show, which would have leaked loops); a per-device
  losetup failure does not abort the loop and the joined error
  surfaces every failing file path; FindLoopDeviceByFile errors are
  propagated.

- provisionFileDevices: reuses an existing loop attachment, and on
  partial failure rolls back both the file and the loop in LIFO
  order so a retry sees a clean slate.

- cleanupFileDevices: walks the union of spec.fileDevices and
  status.nodes[].fileDevices (so files created but never reflected
  in status are still removed), returns an error when losetup -d
  fails (so the finalizer cannot be removed while a busy loop is
  still alive), and refuses to act on paths whose basename does not
  match the managed pattern.

- validateFileDevice: rejects relative directories and `..` segments.

Agents: Claude Opus 4.7 (ведущий, авто-режим)
POSIX text files end with a newline; YAML tools and shell editors
otherwise show "No newline at end of file" on every diff.

Agents: Claude Opus 4.7 (ведущий, авто-режим)
@duckhawk duckhawk force-pushed the feat/lvm-file-backed-devices branch from 7ae49b3 to ec49aec Compare June 23, 2026 15:31
duckhawk added 6 commits June 23, 2026 18:38
Agents: Claude Opus 4.7 (ведущий, авто-режим)
Add context and command timeouts for loop/file-device operations,
validate duplicate spec entries and directory writability, restore host
LVM loop global_filter, fix CRD fileDevices list map semantics, and
update tests/mocks.
Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
Transition rule перенесена с полей элементов map-list на spec/lvmVolumeGroupTemplate:
oldSelf на item-level внутри fileDevices отклонялся apiserver как uncorrelatable
и превышал CEL cost budget. Добавлены maxItems и maxLength для снижения стоимости правил.

Agents: Composer 2.5 (ведущий, авто-режим)
reattachFileDevicesAtStartup использует time.Duration; без импорта
падает сборка agent-golang-artifact.

Agents: Composer 2.5 (ведущий, авто-режим)
Signed-off-by: v.oleynikov <vasily.oleynikov@flant.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant