fix(lvg): align thin-pool and PV resize sizing with LVM#223
fix(lvg): align thin-pool and PV resize sizing with LVM#223IvanOgurchenok wants to merge 16 commits into
Conversation
d848281 to
5336a94
Compare
E2E Smoke Tests Results ❌Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
eda8df8 to
5fd3a28
Compare
E2E Smoke Tests Results ❌Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
E2E Smoke Tests Results ❌Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
2a7beba to
89f107c
Compare
E2E Smoke Tests Results ❌Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
E2E Smoke Tests Results ✅Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
There was a problem hiding this comment.
Summary
Reviewing the fixes for Bugs 1–3 from BUG_REPORT_THINPOOL_SIZE.md (percentage thin-pool off the real VGSize, exact pe_start in resizePVIfNeeded, per-pool extent tolerance in validateLVGFor*Func). The logic of all three fixes is correct and well explained in the bug report and the commit messages. The comments below group the findings that should ideally be addressed before merge.
Severity overview
| Severity | Count |
|---|---|
| Critical | 1 |
| High | 5 |
| Medium | 6 |
| Low | 2 |
| Nit (PR-wide, below) | 3 |
PR-wide nits (not attached inline)
- PR checklist empty. The "The code is covered by unit tests / e2e tests passed / Documentation updated / Changes were tested manually" boxes are still unchecked. Tick those that apply and mark docs N/A if nothing changed — it's the easiest signal reviewers have that the author walked through the list.
- Explicit link to
efa5f61. The PR description referencesresizeDeltaand commitefa5f61. GitHub resolves the hash, but a full URL (https://github.com/deckhouse/sds-node-configurator/commit/efa5f61) is friendlier in changelogs and IDEs.
Positive observations
- Each of the three fixes lives in its own focused commit (
9449b2f,1cfb2d3,e9261c1) with a clear message. - Using
pv.PEStartfrompvsinstead of the heuristicDevSize - PVSize > extentis the right side of the LVM contract —pe_startis exactly the offset where physical extents begin, andDevSize - (PVSize + peStart) >= extentcorrectly signals unallocated space beyond the current PV. vgSizeinreconcileLVGCreateFuncis hoisted out of the per-pool loop; thevgAfterCreate != nilbranch and the fallback Warning keep the code path visible in logs (see the inline comment on that fallback for a follow-up).PVData.PEStart resource.Quantitymatches the rest of the type and round-trips throughpvs --units B --nosuffixcorrectly.- Table-driven sub-tests
with_100_percent_thin_pools_returns_truekeep the existing test style (see the inline comments on how to make them actually load-bearing).
The rest is style / trade-off and can be discussed inline.
532fac1 to
4991879
Compare
E2E Smoke Tests Results ❌Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
d51a401 to
8f71f5a
Compare
E2E Smoke Tests Results ❌Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
E2E Smoke Tests Results ✅Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
Since commit efa5f61 removed the resizeDelta tolerance in favor of exact extent-aligned size comparisons, the controller started failing validation checks when applying percentage-based thin-pool sizes. It originally calculated the requested bytes from raw BlockDevice capacities (and redundantly re-calculated it on every loop iteration) instead of the actual VG space. The VG space is smaller due to metadata reservations and extent alignments. This updates reconcileLVGCreateFunc to query the actual VG size from LVM after creation, and evaluates requested pool bytes based on this real size before creating pools. This eliminates discrepancies between the creation and validation stages. It also moves the fallback calculation outside of the thin-pools loop. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
This addresses additional regressions caused by the removal of resizeDelta: 1. Infinite pvresize loop: The controller falsely interpreted physical loss to LVM metadata as raw disk expansion. It now fetches exact pe_start values from LVM to accurately determine the PV metadata header size and verify if physical BlockDevices genuinely expanded. 2. False-positive validation on 100% pools: Added an extent-sized tolerance multiplier to pre-creation validation, allowing mathematically rounded full-disk requests to pass without failing "Required space" checks. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Add unit coverage for 100% thin-pool validation in create and update flows, and extend the E2E scenario to verify VirtualDisk expansion updates BlockDevice and VG size. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Improve LVG validation and create/update flow safety around extent tolerance, cached VG extent usage, and transient GetVG failures. Split pvresize E2E into a dedicated spec and add regression coverage for tolerance and pe_start parsing. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Rebase the LVG pvresize scenario onto the current main test structure while keeping the review-driven checks for PV growth, absence of PVResizeFailed, and stable pvresize convergence. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Align PVData fields with gofmt/gci expectations so the agent linter no longer fails on the PEStart-related type update. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Add regression tests for reconcileLVGCreateFunc so percent thin-pool sizing is driven by the actual VG size after vgcreate and transient GetVG failures trigger a safe retry instead of continuing on raw device size. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Added missing resources used by Common Scheduler + Sds Node Configurator E2E tests in nested cluster: - localstorageclasses.storage.deckhouse.io - core: namespaces, events, configmaps, secrets, leases - deckhouse.io: modules, moduleconfigs, modulepulloverrides This eliminates potential RBAC issues during setup/cleanup. Nested cluster is ephemeral, so broad permissions are safe. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Use the selected BlockDevice metadata label as the stable selector and include LVG conditions in the readiness failure message so CI failures expose the pending reason. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Move pvresize helper functions into the split E2E helper file and remove the duplicate pvresize scenario left by rebasing the old monolithic test layout. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Remove duplicated e2e validation scenario, keep e2e RBAC aligned with main for verification, and normalize test comments before rerunning the full e2e suite. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Run pvs with sudo in the pvresize e2e check and include command output in failures so CI logs show the real LVM or permission error. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
3420ab6 to
3b8487b
Compare
E2E Smoke Tests Results ✅Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
Allow existing thin pools to be up to one extent above the aligned requested size so LVM rounding does not leave LVMVolumeGroups unapplied. Extend e2e coverage for VGConfigurationApplied and percent LVMLogicalVolume growth after vgextend. Signed-off-by: Ivan Ogurchenok <ivan.ogurchenok@flant.com>
E2E Smoke Tests Results ❌Module: sds-node-configurator
Tests were executed via storage-e2e (TestSdsNodeConfigurator: Common Scheduler Extender, then Sds Node Configurator). Artifacts: Test logs are available in workflow artifacts. |
| extentTolerance := int64(len(lvg.Spec.ThinPools)) * extentSizeVal | ||
| allowedSpace := totalVGSize.Value() + extentTolerance | ||
|
|
||
| if totalThinPoolSize > allowedSpace { | ||
| r.log.Trace(fmt.Sprintf("[validateLVGForCreateFunc] total thin pool size: %s, allowed space: %d", resource.NewQuantity(totalThinPoolSize, resource.BinarySI).String(), allowedSpace)) | ||
| r.log.Warning(fmt.Sprintf("[validateLVGForCreateFunc] requested thin pool size is more than VG total size for the LVMVolumeGroup %s", lvg.Name)) | ||
| reason.WriteString(fmt.Sprintf("Required space for thin-pools %d is more than VG size %d.", totalThinPoolSize, totalVGSize.Value())) | ||
| reason.WriteString(fmt.Sprintf("Required space for thin-pools %d is more than allowed space %d (VG size %d + extent tolerance %d). ", totalThinPoolSize, allowedSpace, totalVGSize.Value(), extentTolerance)) |
There was a problem hiding this comment.
Architectural concern — extent tolerance reintroduced into validation.
These create/update validators gate thin-pool sizing with an extent-sized tolerance layered on top of the deterministic AlignSizeToExtent:
- create:
allowedSpace = totalVGSize + len(lvg.Spec.ThinPools)*extent(here) - update, adding pools:
allowedFreeSpace = totalFreeSpace + addingPoolsCount*extent(L964–L967) - update, existing pool:
allowedActualSize = alignedTpSize + extent(L940)
The extent-alignment refactoring (#201) deliberately replaced the fixed resizeDelta tolerance with deterministic up-alignment, precisely to stop relying on a slack value. The design rationale explicitly rejected the "just increase the tolerance" alternative, because a tolerance masks real discrepancies and scales with the number of objects.
In the create path allowedSpace grows with pool count, so N pools can each overshoot by up to one extent and still pass validation — only to fail later at lvcreate. That is the same scaling-slack behavior the refactoring set out to remove.
Question: was a deterministic alternative considered instead of a tolerance — e.g. comparing the sum of aligned pool sizes against the (aligned) VG size, or capping each pool at the remaining VG (min(aligned, remaining)), mirroring the create path's alignedTpSize >= vgSize → CreateThinPoolFullVGSpace? That keeps the request→validate→create round-trip deterministic and avoids reintroducing the slack #201 eliminated. If the tolerance is intentional, a short comment explaining why one extent per pool is the correct, bounded allowance would help future readers.
|
Scope question: are relative sizes of This PR changes the LVG reconciler (thin-pool create/validate), adds I'm asking to make the scope explicit: if "relative LV sizes" is also meant to cover percent-sized LLVs (e.g. If the intent is strictly thin-pool sizing + PV resize, this is fine as-is — just flagging so reviewers don't assume the LLV side is covered by this change. |
Description
This PR fixes LVMVolumeGroup sizing regressions introduced after
efa5f61c53801579ea84f696f2a81292e3de47c6removed the oldresizeDeltatolerance and switched the controller to strict LVM-derived size comparisons.The change aligns thin-pool percentage sizing, PV resize detection, and validation tolerance with the actual usable LVM sizes reported by
vgs/pvs. It also extends the regression coverage around 100% thin-pool requests, extent rounding, and PV resize behavior. The E2E cluster configuration uses RedOS for the primary nodes so the scenario is exercised on the target OS image.Why do we need it, and what problem does it solve?
LVM reports usable PV/VG size after metadata overhead and extent alignment, while the controller still used raw block device sizes in several paths. That mismatch could reject valid percentage-based thin-pool requests, repeatedly trigger
pvresizeon unchanged devices, or fail validation only because the aligned size was rounded by an extent.What is the expected result?
After applying this PR:
99%and100%are calculated from the real created VG size;pvresizeattempts caused by the normal LVM metadata gap;pvresizelog entry and then stabilizes.Checklist