Skip to content

OCPBUGS-63326: Ensure /boot dir has enough space during reclaim#1316

Open
CrystalChun wants to merge 7 commits intoopenshift:masterfrom
CrystalChun:reclaim-fix
Open

OCPBUGS-63326: Ensure /boot dir has enough space during reclaim#1316
CrystalChun wants to merge 7 commits intoopenshift:masterfrom
CrystalChun:reclaim-fix

Conversation

@CrystalChun
Copy link
Contributor

@CrystalChun CrystalChun commented Feb 6, 2026

Reclaim can fail if there's not enough space to write the boot artifacts. Recently, OCP 4.19 and later has taken up more space in the /boot directory, which prevents the Agent from writing all of the artifacts needed for reclaim.

To resolve this, the Agent will check if there's enough space in the /boot directory and if there isn't, it will try to remove the ostree by running rpm-ostree cleanup --os=rhcos -r.

Summary by CodeRabbit

  • Improvements
    • Boot artifact downloads now use a temporary staging area and are moved atomically to avoid touching existing host files.
    • Bootloader config creation is more robust and written to a safe temporary location with clearer success messages.
    • Added disk space checks with optional reclamation, permission-preserving file moves/copies, and more verbose progress logging to improve reliability.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 6, 2026
@openshift-ci
Copy link

openshift-ci bot commented Feb 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Feb 6, 2026
@openshift-ci-robot
Copy link

@CrystalChun: This pull request references Jira Issue OCPBUGS-63326, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Reclaim can fail if there's not enough space to write the boot artifacts. Recently, OCP 4.19 and later has taken up more space in the /boot directory, which prevents the Agent from writing all of the artifacts needed for reclaim.

To resolve this, the Agent will check if there's enough space in the /boot directory and if there isn't, it will remove files and folders from /boot/ostree directory (as these are no longer needed when reclaiming).

Includes retry logic in case this fails the first time around.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Feb 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Introduces a folders struct and refactors boot artifact handling: downloads kernel/initrd into a temporary download folder, creates bootloader config in temp, checks/reclaims boot space, then copies/moves artifacts into final boot locations with permission preservation and added verbose logging. (36 words)

Changes

Cohort / File(s) Summary
Boot artifacts flow & helpers
src/commands/actions/download_boot_artifacts_cmd.go
Added internal folders struct and new folder constants; refactored run flow to use folders; added helpers: downloadArtifacts, ensureBootHasSpace, getFreeSpace, calculateBootArtifactsSize, reclaimBootFolderSpace, moveFilesToBootFolder, moveFiles, copyFile; updated createBootLoaderConfig to write to temp boot folder and derive kernel/initrd paths from hostArtifactsFolder; kernel/initrd now download to tempBootArtifactsFolder/download; added verbose logging and boot space checks; preserves permissions when moving files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test file contains broken function calls with mismatched signatures that will fail to compile. createFolders() and createBootLoaderConfig() calls have incorrect parameter types and counts. Update test function calls to match implementation signatures: pass *folders struct to createFolders() and createBootLoaderConfig(), and add descriptive assertion messages.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: adding space validation and reclamation logic for the /boot directory during the reclaim process.
Stable And Deterministic Test Names ✅ Passed All test names in download_boot_artifacts_cmd_test.go are stable and deterministic with static descriptive strings and no dynamic values.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 6, 2026
@openshift-ci
Copy link

openshift-ci bot commented Feb 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: CrystalChun
Once this PR has been reviewed and has the lgtm label, please assign danmanor for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@CrystalChun CrystalChun force-pushed the reclaim-fix branch 2 times, most recently from 7cc8468 to b1f69c6 Compare February 10, 2026 18:41

const (
retryDownloadAmount = 5
minFreeSpaceReq = 115 * 1024 * 1024 // 115MB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get the actual value from HEAD requests to the download endpoints? I assume this could also change.

@CrystalChun CrystalChun force-pushed the reclaim-fix branch 3 times, most recently from 96f0759 to ac04bfa Compare February 11, 2026 00:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
src/commands/actions/download_boot_artifacts_cmd.go (1)

249-263: removeFiles is defined but never called (dead code)

The function is flagged as unused by the static analyser. filePath at line 104 is constructed and logged but never passed to removeFiles or any other cleanup call — the actual cleanup is delegated entirely to rpm-ostree. Either wire removeFiles into the cleanup flow (perhaps as a fallback) or remove it to keep the build clean.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 249 - 263,
remove the dead helper or wire it as a fallback: either delete the
removeFiles(folder string) function entirely, or call it where filePath is
constructed (use the same filePath variable) as a cleanup fallback when the
rpm-ostree cleanup path fails; if choosing the fallback, invoke
removeFiles(filePath) in the error branch after rpm-ostree returns an error and
log any error returned from removeFiles; ensure references to removeFiles,
filePath and rpm-ostree are updated accordingly and handle/propagate errors
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 110-113: The call to the shell builtin is using util.Execute which
invokes exec.Command and will fail; replace the util.Execute("unset container")
invocation with util.ExecuteShell("unset container") so the command runs in a
shell (mirror the retry path), updating the call that assigns stdout, stderr,
exitCode (and any error handling) in download_boot_artifacts_cmd.go to use
util.ExecuteShell instead of util.Execute.
- Around line 107-109: The short declaration in the syscall.Mount call shadows
the outer err; change the `if err := syscall.Mount(sysrootFolder, sysrootFolder,
"", syscall.MS_REMOUNT, ""); err != nil { ... }` to assign into the existing
error variable (e.g. `err = syscall.Mount(...)` then `if err != nil { return
fmt.Errorf(...) }`) or use a distinct name like `mountErr` so the outer `err` is
not shadowed; keep the fmt.Errorf call unchanged but reference the non-shadowing
error variable.
- Around line 80-89: The run() loop currently swallows all errors and always
returns nil; update it so that it returns the last non-nil error from
runDownloadBootArtifacts when all retryCmdAmount attempts fail, and avoid
calling time.Sleep(defaultCmdRetryDelay) after the final attempt. Specifically,
track the error returned by runDownloadBootArtifacts inside the loop
(referencing runDownloadBootArtifacts, retryCmdAmount, defaultCmdRetryDelay),
only sleep when i < retryCmdAmount-1, and after the loop return the recorded
error instead of nil so callers can detect failure.
- Around line 114-123: Replace the ineffectual re-assignment of stdout with _
when running util.ExecutePrivileged("rpm-ostree", "cleanup", "--os=rhcos", "-r")
before checking exitCode, and restructure the logging so success messages are
only emitted on successful exit codes: call util.ExecutePrivileged once, if
exitCode != 0 call util.ExecuteShell("unset container") and retry (assigning to
_, stderr, exitCode), then if the retry exitCode == 0 call
log.Infof("Successfully removed rhcos") else call log.Errorf with both stdout
and stderr from the failed attempt(s) using their respective variables (and
avoid printing unconditional success messages); reference variables stdout,
stderr, exitCode and functions util.ExecutePrivileged, util.ExecuteShell,
log.Errorf, log.Infof to locate the code to change.
- Around line 241-247: getSize currently uses os.Stat(folder).Size() which
returns directory inode size (~4KB) and causes cleanup to always run; replace
its implementation to call syscall.Statfs (or unix.Statfs on non-deprecated
platform package) for the mount containing folder and compute available bytes as
stat.Bavail * uint64(stat.Bsize) (convert to int64) so getSize(folder) returns
real available filesystem bytes; update error handling to wrap syscall failures
and keep the same function signature so callers like getSize (used by the
free-space check) get correct values.

---

Nitpick comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 249-263: remove the dead helper or wire it as a fallback: either
delete the removeFiles(folder string) function entirely, or call it where
filePath is constructed (use the same filePath variable) as a cleanup fallback
when the rpm-ostree cleanup path fails; if choosing the fallback, invoke
removeFiles(filePath) in the error branch after rpm-ostree returns an error and
log any error returned from removeFiles; ensure references to removeFiles,
filePath and rpm-ostree are updated accordingly and handle/propagate errors
consistently.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4f8a09b and fe03e0f.

📒 Files selected for processing (1)
  • src/commands/actions/download_boot_artifacts_cmd.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/commands/actions/download_boot_artifacts_cmd.go (4)

200-205: 🛠️ Refactor suggestion | 🟠 Major

createFolderIfNotExist swallows non-IsNotExist errors from os.Stat; the Stat check is unnecessary.

If os.Stat fails with an error other than IsNotExist (e.g., permission denied), the function silently returns nil without creating the directory. Since os.MkdirAll is already a no-op when the directory exists, the Stat guard adds no value and introduces a subtle bug.

♻️ Simplify
 func createFolderIfNotExist(folder string) error {
-	if _, err := os.Stat(folder); os.IsNotExist(err) {
-		return os.MkdirAll(folder, 0755)
-	}
-	return nil
+	return os.MkdirAll(folder, 0755)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 200 - 205,
The function createFolderIfNotExist currently swallows non-IsNotExist errors
from os.Stat; replace the Stat check by directly calling os.MkdirAll(folder,
0755) and returning its error so permission or other filesystem errors are
propagated; update the function body to simply call and return os.MkdirAll for
the given folder.

80-113: ⚠️ Potential issue | 🔴 Critical

Downloaded artifacts are never moved from temp to their final host paths — bootloader config will reference nonexistent files.

Artifacts are downloaded to tempDownloadFolder (e.g., /tmp/boot_artifacts/download/vmlinuz), and the bootloader config is written to tempBootArtifactsFolder. However, the bootloader config references paths under hostArtifactsFolder (lines 185–186), and there is no step that copies or moves files from the temp locations to the host /boot paths.

As written, the bootloader will attempt to load kernel/initrd from hostArtifactsFolder where nothing was placed.

I understand this is a draft/WIP PR, but flagging this so it isn't overlooked — the flow needs a step (likely after the space-check/cleanup mentioned in the PR objectives) to move artifacts from tempDownloadFolderhostArtifactsFolder and the config from tempBootArtifactsFolderbootLoaderFolder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 80 - 113,
Downloaded artifacts are left in temp locations and never moved to the host boot
paths, so createBootLoaderConfig ends up referencing files that don't exist
under hostArtifactsFolder; after downloadArtifacts returns (in run) add a step
that moves or copies files from folders.tempDownloadFolder into
folders.hostArtifactsFolder and moves the generated config from
folders.tempBootArtifactsFolder into folders.bootLoaderFolder (respecting
permissions/ownership and handling existing files), or implement a helper like
moveArtifacts(tempDownloadFolder, hostArtifactsFolder) and
moveBootConfig(tempBootArtifactsFolder, bootLoaderFolder) and call them before
createBootLoaderConfig/*or*/ modify downloadArtifacts to place files directly
into hostArtifactsFolder while still using temp folders for validation. Ensure
to reference folders.hostArtifactsFolder, folders.tempDownloadFolder,
folders.tempBootArtifactsFolder, and the
run/downloadArtifacts/createBootLoaderConfig functions when making the change.

154-182: ⚠️ Potential issue | 🔴 Critical

Pre-existing: nil-pointer dereference on res.StatusCode and response body leak in retry loop.

Two bugs in the (unchanged) download function:

  1. NPE (line 163): When httpClient.Get fails with a connection error, res is nil. The code falls through to line 163 and dereferences res.StatusCode, causing a panic.

  2. Response body leak: On each retry iteration where the response has a non-2xx status, res.Body is never closed before the next iteration.

🐛 Proposed fix
 func download(httpClient *http.Client, filePath, url string, retry int) error {
 	var downloadErr error
 	var res *http.Response
 	for attempts := 0; attempts < retry; attempts++ {
 		res, downloadErr = httpClient.Get(url)
-		if downloadErr == nil && (res.StatusCode >= 200 && res.StatusCode < 300) {
+		if downloadErr == nil && res != nil && (res.StatusCode >= 200 && res.StatusCode < 300) {
 			break
 		}
-		downloadErr = fmt.Errorf("failed downloading boot artifact from %s, status code received: %d, attempt %d/%d, download error: %w",
-			url, res.StatusCode, attempts, retry, downloadErr)
+		statusCode := 0
+		if res != nil {
+			statusCode = res.StatusCode
+			res.Body.Close()
+		}
+		downloadErr = fmt.Errorf("failed downloading boot artifact from %s, status code received: %d, attempt %d/%d, download error: %w",
+			url, statusCode, attempts, retry, downloadErr)
 		log.Warn(downloadErr.Error())
 		time.Sleep(defaultDownloadRetryDelay)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 154 - 182,
The download function can dereference a nil response and leak response bodies
during retries; update the download loop in download(httpClient *http.Client,
filePath, url string, retry int) to: after each httpClient.Get call check if res
!= nil and if so defer or explicitly close res.Body before the next iteration
when the status is non-2xx, and only read res.StatusCode when res is non-nil
(use a local status variable or guard the access); also ensure the error message
you build (downloadErr) correctly wraps the underlying error even when res is
nil (avoid using res.StatusCode in that case) so you do not panic and you close
any non-nil res.Body on failed attempts to avoid leaking connections.

184-192: ⚠️ Potential issue | 🔴 Critical

Bootloader config embeds container-relative /host/… paths instead of boot-relative paths.

The current implementation uses folders.hostArtifactsFolder (e.g., /host/boot/discovery/vmlinuz) for kernel and initrd paths in the bootloader config. However, at boot time, the bootloader resolves paths relative to /boot, not /host/boot. The config should reference /boot/discovery/vmlinuz instead.

Additionally, the test file is out of sync with this change—the existing tests (lines 114-115 of download_boot_artifacts_cmd_test.go) expect the config to contain paths constructed with the artifactsFolder constant (/boot/discovery/...), not hostArtifactsFolder.

🐛 Proposed fix
 func createBootLoaderConfig(rootfsUrl string, folders *folders) error {
-	kernelPath := path.Join(folders.hostArtifactsFolder, kernelFile)
-	initrdPath := path.Join(folders.hostArtifactsFolder, initrdFile)
+	kernelPath := path.Join(artifactsFolder, kernelFile)
+	initrdPath := path.Join(artifactsFolder, initrdFile)
 	bootLoaderConfigFile := path.Join(folders.tempBootArtifactsFolder, bootLoaderConfigFileName)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 184 - 192,
The bootloader config generator createBootLoaderConfig currently builds
kernelPath and initrdPath from folders.hostArtifactsFolder (container /host/…
paths) but must use boot-relative paths (folders.artifactsFolder or the
artifactsFolder constant, e.g. /boot/discovery/...) so the bootloader sees
/boot/... at boot time; update kernelPath and initrdPath to be
path.Join(folders.artifactsFolder, kernelFile) and
path.Join(folders.artifactsFolder, initrdFile) and keep the arch-specific
templates bootLoaderConfigTemplate and bootLoaderConfigTemplateS390x unchanged,
then update the corresponding test expectations in the test that asserts the
generated config (previously expecting artifactsFolder paths) so they match the
new /boot/... values.
🧹 Nitpick comments (1)
src/commands/actions/download_boot_artifacts_cmd.go (1)

48-57: Variable folders shadows type folders; constant names collide with struct field names.

At line 91, folders := &folders{…} shadows the type with the variable. Additionally, the constants bootLoaderFolder (line 63) and tempBootArtifactsFolder (line 64) share names with struct fields, making lines 93–94 read ambiguously (e.g., bootLoaderFolder: path.Join(…, bootLoaderFolder)).

Consider renaming the type (e.g., artifactFolders or bootFolders) or the variable (e.g., dirs) to remove ambiguity:

-	folders := &folders{
+	dirs := &folders{

And update downstream references accordingly.

Also applies to: 91-96

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 48 - 57,
The struct type named "folders" collides with the local variable created as
"folders := &folders{...}" and its field names (bootLoaderFolder,
tempBootArtifactsFolder) collide with constants of the same names; rename either
the type (e.g., artifactFolders or bootFolders) or the variable (e.g., dirs or
folderVars) so the type and variable are distinct, update the constructor site
where "folders := &folders{...}" is created to use the new identifier, and
update all downstream uses (references to the type, the variable, and field
initializations that currently read like bootLoaderFolder: path.Join(...,
bootLoaderFolder)) to the new names to remove ambiguity (ensure references to
the constants remain correct).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 200-205: The function createFolderIfNotExist currently swallows
non-IsNotExist errors from os.Stat; replace the Stat check by directly calling
os.MkdirAll(folder, 0755) and returning its error so permission or other
filesystem errors are propagated; update the function body to simply call and
return os.MkdirAll for the given folder.
- Around line 80-113: Downloaded artifacts are left in temp locations and never
moved to the host boot paths, so createBootLoaderConfig ends up referencing
files that don't exist under hostArtifactsFolder; after downloadArtifacts
returns (in run) add a step that moves or copies files from
folders.tempDownloadFolder into folders.hostArtifactsFolder and moves the
generated config from folders.tempBootArtifactsFolder into
folders.bootLoaderFolder (respecting permissions/ownership and handling existing
files), or implement a helper like moveArtifacts(tempDownloadFolder,
hostArtifactsFolder) and moveBootConfig(tempBootArtifactsFolder,
bootLoaderFolder) and call them before createBootLoaderConfig/*or*/ modify
downloadArtifacts to place files directly into hostArtifactsFolder while still
using temp folders for validation. Ensure to reference
folders.hostArtifactsFolder, folders.tempDownloadFolder,
folders.tempBootArtifactsFolder, and the
run/downloadArtifacts/createBootLoaderConfig functions when making the change.
- Around line 154-182: The download function can dereference a nil response and
leak response bodies during retries; update the download loop in
download(httpClient *http.Client, filePath, url string, retry int) to: after
each httpClient.Get call check if res != nil and if so defer or explicitly close
res.Body before the next iteration when the status is non-2xx, and only read
res.StatusCode when res is non-nil (use a local status variable or guard the
access); also ensure the error message you build (downloadErr) correctly wraps
the underlying error even when res is nil (avoid using res.StatusCode in that
case) so you do not panic and you close any non-nil res.Body on failed attempts
to avoid leaking connections.
- Around line 184-192: The bootloader config generator createBootLoaderConfig
currently builds kernelPath and initrdPath from folders.hostArtifactsFolder
(container /host/… paths) but must use boot-relative paths
(folders.artifactsFolder or the artifactsFolder constant, e.g.
/boot/discovery/...) so the bootloader sees /boot/... at boot time; update
kernelPath and initrdPath to be path.Join(folders.artifactsFolder, kernelFile)
and path.Join(folders.artifactsFolder, initrdFile) and keep the arch-specific
templates bootLoaderConfigTemplate and bootLoaderConfigTemplateS390x unchanged,
then update the corresponding test expectations in the test that asserts the
generated config (previously expecting artifactsFolder paths) so they match the
new /boot/... values.

---

Nitpick comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 48-57: The struct type named "folders" collides with the local
variable created as "folders := &folders{...}" and its field names
(bootLoaderFolder, tempBootArtifactsFolder) collide with constants of the same
names; rename either the type (e.g., artifactFolders or bootFolders) or the
variable (e.g., dirs or folderVars) so the type and variable are distinct,
update the constructor site where "folders := &folders{...}" is created to use
the new identifier, and update all downstream uses (references to the type, the
variable, and field initializations that currently read like bootLoaderFolder:
path.Join(..., bootLoaderFolder)) to the new names to remove ambiguity (ensure
references to the constants remain correct).

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between fe03e0f and a713909.

📒 Files selected for processing (1)
  • src/commands/actions/download_boot_artifacts_cmd.go

@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 25, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/commands/actions/download_boot_artifacts_cmd.go (2)

297-317: moveFiles only copies — source files are never deleted

The function is named moveFiles but after copyFile succeeds, the source file is never removed. This is semantically copyFiles. While /tmp is ephemeral and won't cause a user-visible problem here, the naming mismatch is a maintenance hazard and briefly doubles the storage footprint on the source filesystem during the transfer.

♻️ Proposed fix
 		if err := copyFile(sourcePath, destPath); err != nil {
 			return fmt.Errorf("failed to copy file %s to %s: %w", file.Name(), destinationFolder, err)
 		}
+		if err := os.Remove(sourcePath); err != nil {
+			log.Warnf("failed to remove source file %s after copy: %v", sourcePath, err)
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 297 - 317,
The function moveFiles currently only copies files via copyFile but never
deletes the originals; update moveFiles so that after a successful copy (the
block where copyFile(sourcePath, destPath) returns nil) it removes the source
file (os.Remove(sourcePath)) and returns/wraps any removal error (or logs it)
instead of leaving stale files; ensure you only remove after copyFile succeeds
and reference the existing sourcePath/destPath variables and copyFile call when
adding the removal and error handling.

94-100: folders variable shadows the folders type within run()

Assigning folders := &folders{...} makes the folders identifier refer to the variable for the rest of run(), shadowing the type. While Go allows this and the type isn't needed again in the function, it's a readability hazard that govet may flag depending on lint config. Consider a distinct variable name such as f or artifactDirs.

♻️ Proposed fix
-	folders := &folders{
+	f := &folders{
 		bootFolder:              bootFolder,
 		hostArtifactsFolder:     path.Join(*req.HostFsMountDir, artifactsFolder),
 		bootLoaderFolder:        path.Join(*req.HostFsMountDir, bootLoaderFolder),
 		tempBootArtifactsFolder: tempBootArtifactsFolder,
 		tempDownloadFolder:      path.Join(tempBootArtifactsFolder, "download"),
 	}

-	if err := createFolders(folders); err != nil {
+	if err := createFolders(f); err != nil {
 ...
-	if err := downloadArtifacts(req, caCertPath, folders); err != nil {
+	if err := downloadArtifacts(req, caCertPath, f); err != nil {
 ...
 // (and so on for all uses of `folders` in run())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 94 - 100,
The local variable declaration "folders := &folders{...}" in run() shadows the
folders type; rename the variable (e.g., to "f" or "artifactDirs") and update
all subsequent references in run() (the allocation site and any uses) to the new
variable name so the type name is not shadowed and readability/lint warnings are
avoided; ensure the symbol "folders" (the type) remains unchanged and only the
variable name is modified.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 117-130: After reclaimBootFolderSpace() succeeds you must re-check
available space and implement retry logic for the reclaim operation; modify the
flow around doesBootFolderHaveEnoughSpace, reclaimBootFolderSpace and
moveFilesToBootFolder so that: 1) if initial doesBootFolderHaveEnoughSpace
returns false, call reclaimBootFolderSpace in a retry loop with a limited number
of attempts and small backoff, 2) after each successful reclaim attempt call
doesBootFolderHaveEnoughSpace again and break if true, and 3) if space is still
insufficient after retries, return a clear error and do not call
moveFilesToBootFolder. Ensure the logic references the same functions
(doesBootFolderHaveEnoughSpace, reclaimBootFolderSpace, moveFilesToBootFolder)
so the re-verification and retry behavior is applied where the current checks
occur.
- Around line 152-163: doesBootFolderHaveEnoughSpace incorrectly uses
calculateSize(folders.tempDownloadFolder) which calls syscall.Statfs and returns
filesystem-wide used space; replace that call with a directory-walk that sums
the sizes of files under folders.tempDownloadFolder (e.g., using
filepath.WalkDir or Walk and os.FileInfo.Size), skipping directories and
handling errors, and use that sum as artifactsSize while keeping the existing
calculateSize(folders.bootFolder) call for freeSpace; update
doesBootFolderHaveEnoughSpace to compare freeSpace > artifactsSize and ensure
errors from the walk are returned upward (so reclaimBootFolderSpace only runs
when truly needed).
- Around line 263-273: In calculateSize, the short declaration "if err :=
syscall.Statfs(folder, &stat); err != nil {" shadows the named return variable
err; change the code to assign to the existing named return (use "if err =
syscall.Statfs(folder, &stat); err != nil {") or remove the named return and use
a local err variable consistently so the syscall.Statfs call does not shadow the
function-level err; update the if statement in calculateSize accordingly.

---

Nitpick comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 297-317: The function moveFiles currently only copies files via
copyFile but never deletes the originals; update moveFiles so that after a
successful copy (the block where copyFile(sourcePath, destPath) returns nil) it
removes the source file (os.Remove(sourcePath)) and returns/wraps any removal
error (or logs it) instead of leaving stale files; ensure you only remove after
copyFile succeeds and reference the existing sourcePath/destPath variables and
copyFile call when adding the removal and error handling.
- Around line 94-100: The local variable declaration "folders := &folders{...}"
in run() shadows the folders type; rename the variable (e.g., to "f" or
"artifactDirs") and update all subsequent references in run() (the allocation
site and any uses) to the new variable name so the type name is not shadowed and
readability/lint warnings are avoided; ensure the symbol "folders" (the type)
remains unchanged and only the variable name is modified.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between a713909 and 21cf90b.

📒 Files selected for processing (1)
  • src/commands/actions/download_boot_artifacts_cmd.go

Comment on lines +117 to +130
bootFolderHasEnoughSpace, err := doesBootFolderHaveEnoughSpace(folders)
if err != nil {
return fmt.Errorf("failed to check if boot folder has enough space: %w", err)
}
if !bootFolderHasEnoughSpace {
log.Warnf("Boot folder does not have enough space, attempting to reclaim space")
if err := reclaimBootFolderSpace(); err != nil {
return fmt.Errorf("failed to reclaim boot folder space: %w", err)
}
}

if err := moveFilesToBootFolder(folders); err != nil {
return fmt.Errorf("failed to move files to boot folder: %w", err)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

No space re-verification after reclaimBootFolderSpace

After reclaimBootFolderSpace() succeeds, execution falls straight into moveFilesToBootFolder without confirming that the reclaim actually freed enough space. If rpm-ostree cleanup only partially recovers space, the subsequent file move to /boot may still fail with out-of-disk-space errors, making the reclaim logic ineffective.

The PR description also mentions retry logic for the cleanup step, but the current implementation has no retry.

🐛 Proposed fix
 	if !bootFolderHasEnoughSpace {
 		log.Warnf("Boot folder does not have enough space, attempting to reclaim space")
 		if err := reclaimBootFolderSpace(); err != nil {
 			return fmt.Errorf("failed to reclaim boot folder space: %w", err)
 		}
+		// Re-verify that reclaim freed sufficient space before proceeding.
+		bootFolderHasEnoughSpace, err = doesBootFolderHaveEnoughSpace(folders)
+		if err != nil {
+			return fmt.Errorf("failed to re-check boot folder space after reclaim: %w", err)
+		}
+		if !bootFolderHasEnoughSpace {
+			return fmt.Errorf("boot folder still does not have enough space after reclaim")
+		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bootFolderHasEnoughSpace, err := doesBootFolderHaveEnoughSpace(folders)
if err != nil {
return fmt.Errorf("failed to check if boot folder has enough space: %w", err)
}
if !bootFolderHasEnoughSpace {
log.Warnf("Boot folder does not have enough space, attempting to reclaim space")
if err := reclaimBootFolderSpace(); err != nil {
return fmt.Errorf("failed to reclaim boot folder space: %w", err)
}
}
if err := moveFilesToBootFolder(folders); err != nil {
return fmt.Errorf("failed to move files to boot folder: %w", err)
}
bootFolderHasEnoughSpace, err := doesBootFolderHaveEnoughSpace(folders)
if err != nil {
return fmt.Errorf("failed to check if boot folder has enough space: %w", err)
}
if !bootFolderHasEnoughSpace {
log.Warnf("Boot folder does not have enough space, attempting to reclaim space")
if err := reclaimBootFolderSpace(); err != nil {
return fmt.Errorf("failed to reclaim boot folder space: %w", err)
}
// Re-verify that reclaim freed sufficient space before proceeding.
bootFolderHasEnoughSpace, err = doesBootFolderHaveEnoughSpace(folders)
if err != nil {
return fmt.Errorf("failed to re-check boot folder space after reclaim: %w", err)
}
if !bootFolderHasEnoughSpace {
return fmt.Errorf("boot folder still does not have enough space after reclaim")
}
}
if err := moveFilesToBootFolder(folders); err != nil {
return fmt.Errorf("failed to move files to boot folder: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 117 - 130,
After reclaimBootFolderSpace() succeeds you must re-check available space and
implement retry logic for the reclaim operation; modify the flow around
doesBootFolderHaveEnoughSpace, reclaimBootFolderSpace and moveFilesToBootFolder
so that: 1) if initial doesBootFolderHaveEnoughSpace returns false, call
reclaimBootFolderSpace in a retry loop with a limited number of attempts and
small backoff, 2) after each successful reclaim attempt call
doesBootFolderHaveEnoughSpace again and break if true, and 3) if space is still
insufficient after retries, return a clear error and do not call
moveFilesToBootFolder. Ensure the logic references the same functions
(doesBootFolderHaveEnoughSpace, reclaimBootFolderSpace, moveFilesToBootFolder)
so the re-verification and retry behavior is applied where the current checks
occur.

@CrystalChun CrystalChun force-pushed the reclaim-fix branch 2 times, most recently from cc6f163 to adc3c0d Compare February 25, 2026 22:59
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/commands/actions/download_boot_artifacts_cmd.go (1)

117-126: ⚠️ Potential issue | 🟠 Major

Re-check /boot capacity after reclaim and add bounded reclaim retries.

If reclaim partially succeeds, the code still proceeds to copy into /boot without confirming enough space. This can fail later during artifact moves. Also, reclaim is attempted only once.

💡 Proposed fix
+const (
+	reclaimRetryAmount = 3
+	reclaimRetryDelay  = 5 * time.Second
+)
+
 	bootFolderHasEnoughSpace, err := doesBootFolderHaveEnoughSpace(folders)
 	if err != nil {
 		return fmt.Errorf("failed to check if boot folder has enough space: %w", err)
 	}
 	if !bootFolderHasEnoughSpace {
 		log.Warnf("Boot folder does not have enough space, attempting to reclaim space")
-		if err := reclaimBootFolderSpace(); err != nil {
-			return fmt.Errorf("failed to reclaim boot folder space: %w", err)
+		for attempt := 1; attempt <= reclaimRetryAmount && !bootFolderHasEnoughSpace; attempt++ {
+			if err := reclaimBootFolderSpace(); err != nil {
+				if attempt == reclaimRetryAmount {
+					return fmt.Errorf("failed to reclaim boot folder space after %d attempts: %w", attempt, err)
+				}
+				log.WithError(err).Warnf("Reclaim attempt %d/%d failed", attempt, reclaimRetryAmount)
+			}
+
+			bootFolderHasEnoughSpace, err = doesBootFolderHaveEnoughSpace(folders)
+			if err != nil {
+				return fmt.Errorf("failed to re-check boot folder space: %w", err)
+			}
+			if !bootFolderHasEnoughSpace && attempt < reclaimRetryAmount {
+				time.Sleep(reclaimRetryDelay)
+			}
+		}
+		if !bootFolderHasEnoughSpace {
+			return fmt.Errorf("boot folder still does not have enough space after reclaim attempts")
 		}
 	}

Also applies to: 128-130

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 117 - 126,
After calling reclaimBootFolderSpace() when
doesBootFolderHaveEnoughSpace(folders) returns false, re-check capacity by
calling doesBootFolderHaveEnoughSpace(folders) again and only proceed if it
returns true; implement a bounded retry loop around reclaimBootFolderSpace()
(e.g., maxRetries with a short backoff) so reclaim is attempted multiple times
before erroring, and return a clear error if after retries the boot folder still
lacks space. Ensure this change is applied to the same logic paths in
download_boot_artifacts_cmd.go where reclaim is invoked (referencing
doesBootFolderHaveEnoughSpace and reclaimBootFolderSpace).
🧹 Nitpick comments (1)
src/commands/actions/download_boot_artifacts_cmd.go (1)

324-344: moveFiles currently copies but never removes source files.

The implementation doesn’t match the function name and leaves temp files behind. Either rename to copyFiles or delete the source after successful copy.

💡 Proposed fix (true move semantics)
 func moveFiles(sourceFolder, destinationFolder string) error {
 	files, err := os.ReadDir(sourceFolder)
 	if err != nil {
 		return fmt.Errorf("failed to read %s: %w", sourceFolder, err)
 	}
 	for _, file := range files {
 		if file.IsDir() {
 			log.Debugf("Skipping directory %s", file.Name())
 			continue
 		}
 		log.Debugf("Moving file %s from %s to %s", file.Name(), sourceFolder, destinationFolder)
 		sourcePath := path.Join(sourceFolder, file.Name())
 		destPath := path.Join(destinationFolder, file.Name())

 		if err := copyFile(sourcePath, destPath); err != nil {
 			return fmt.Errorf("failed to copy file %s to %s: %w", file.Name(), destinationFolder, err)
 		}
+		if err := os.Remove(sourcePath); err != nil {
+			return fmt.Errorf("failed to remove source file %s after copy: %w", sourcePath, err)
+		}

 	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 324 - 344,
The function moveFiles currently only copies files using copyFile but does not
remove the originals; update moveFiles to perform true move semantics by
removing each source file after a successful copy: after calling
copyFile(sourcePath, destPath) in moveFiles, call os.Remove(sourcePath) and
return or wrap any os.Remove error (include file names in the error message) so
failures are visible; ensure directories are still skipped (file.IsDir()) and
that errors from copyFile and os.Remove are returned to the caller.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Line 163: The current space check uses a strict greater-than (return freeSpace
> artifactsSize, nil) which rejects exact-fit cases; change the comparison to
inclusive (use >=) so a freeSpace equal to artifactsSize is considered
sufficient, and update any related tests or callers expecting strict behavior to
reflect the inclusive check on the freeSpace and artifactsSize variables.

---

Duplicate comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 117-126: After calling reclaimBootFolderSpace() when
doesBootFolderHaveEnoughSpace(folders) returns false, re-check capacity by
calling doesBootFolderHaveEnoughSpace(folders) again and only proceed if it
returns true; implement a bounded retry loop around reclaimBootFolderSpace()
(e.g., maxRetries with a short backoff) so reclaim is attempted multiple times
before erroring, and return a clear error if after retries the boot folder still
lacks space. Ensure this change is applied to the same logic paths in
download_boot_artifacts_cmd.go where reclaim is invoked (referencing
doesBootFolderHaveEnoughSpace and reclaimBootFolderSpace).

---

Nitpick comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 324-344: The function moveFiles currently only copies files using
copyFile but does not remove the originals; update moveFiles to perform true
move semantics by removing each source file after a successful copy: after
calling copyFile(sourcePath, destPath) in moveFiles, call os.Remove(sourcePath)
and return or wrap any os.Remove error (include file names in the error message)
so failures are visible; ensure directories are still skipped (file.IsDir()) and
that errors from copyFile and os.Remove are returned to the caller.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 21cf90b and adc3c0d.

📒 Files selected for processing (1)
  • src/commands/actions/download_boot_artifacts_cmd.go

return false, fmt.Errorf("failed to get free space of boot folder [%s]: %w", folders.bootFolder, err)
}
log.Infof("Free space in boot folder: %d", freeSpace)
return freeSpace > artifactsSize, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use inclusive comparison for exact-fit space checks.

Exact-fit capacity should pass. Strict > can trigger unnecessary reclaim/failure when free space equals artifact size.

💡 Proposed fix
-	return freeSpace > artifactsSize, nil
+	return freeSpace >= artifactsSize, nil
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return freeSpace > artifactsSize, nil
return freeSpace >= artifactsSize, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` at line 163, The current
space check uses a strict greater-than (return freeSpace > artifactsSize, nil)
which rejects exact-fit cases; change the comparison to inclusive (use >=) so a
freeSpace equal to artifactsSize is considered sufficient, and update any
related tests or callers expecting strict behavior to reflect the inclusive
check on the freeSpace and artifactsSize variables.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/commands/actions/download_boot_artifacts_cmd.go (1)

158-160: ⚠️ Potential issue | 🟡 Minor

Exact-fit space check still uses strict > — unnecessary reclaim when freeSpace == artifactsSize.

Line 158 should use >=; the post-reclaim check at line 172 already correctly uses < (allowing exact fit), creating an asymmetry where exact fit before reclaim triggers the expensive cleanup unnecessarily.

🐛 Proposed fix
-	if freeSpace > artifactsSize {
+	if freeSpace >= artifactsSize {
		return nil
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 158 - 160,
The pre-reclaim space check uses a strict '>' which forces reclaim when
freeSpace == artifactsSize; change the condition in the
download_boot_artifacts_cmd.go check comparing freeSpace and artifactsSize from
'>' to '>=' so exact-fit skips unnecessary cleanup (this mirrors the later
post-reclaim '<' check and avoids an unnecessary expensive reclaim).
🧹 Nitpick comments (1)
src/commands/actions/download_boot_artifacts_cmd.go (1)

336-356: Temp directories are never cleaned up after a successful copy.

After moveFilesToBootFolder completes, /tmp/boot_artifacts and its contents remain. On repeated invocations (e.g., after a reboot/retry), stale artifacts from a prior run could interfere.

♻️ Proposed fix
 func moveFilesToBootFolder(folders *folders) error {
 	if err := moveFiles(folders.tempDownloadFolder, folders.hostArtifactsFolder); err != nil {
 		return fmt.Errorf("failed moving files from %s to %s: %w", folders.tempDownloadFolder, folders.hostArtifactsFolder, err)
 	}
 	if err := moveFiles(folders.tempBootArtifactsFolder, folders.bootLoaderFolder); err != nil {
 		return fmt.Errorf("failed moving files from %s to %s: %w", folders.tempBootArtifactsFolder, folders.bootLoaderFolder, err)
 	}
+	if err := os.RemoveAll(folders.tempBootArtifactsFolder); err != nil {
+		log.Warnf("Failed to clean up temp folder %s: %v", folders.tempBootArtifactsFolder, err)
+	}
 	log.Infof("Successfully moved files to /boot folder.")
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 336 - 356,
The temp source directory (e.g., /tmp/boot_artifacts) is never removed after
files are copied; modify moveFiles (or the caller moveFilesToBootFolder) so that
after all files are successfully copied it removes the sourceFolder via
os.RemoveAll and handles any removal error (log or return as appropriate)
instead of leaving stale artifacts; ensure removal happens only when the copy
loop completes without error to avoid losing data on partial failures and
reference the moveFiles function and moveFilesToBootFolder caller to implement
this cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 358-383: The copyFile function risks lost data because it never
fsyncs the destination; after the io.Copy and again after setting permissions
you should call destFile.Sync() (checking and returning any error) to ensure
data and metadata are flushed to disk before returning; keep the existing
deferred destFile.Close() but add error handling for destFile.Sync() (return
fmt.Errorf with context) both immediately after the copy and after os.Chmod so
boot-critical files are durable.
- Around line 314-322: The reclaimBootFolderSpace function currently calls
util.ExecutePrivileged("rpm-ostree", "cleanup", "--os=rhcos", "-r") only once;
add retry logic around that call (in reclaimBootFolderSpace) to attempt the
command multiple times (e.g., 3 attempts) with a short backoff between attempts
(time.Sleep) and log each attempt via log.Debugf/log.Warnf; on success return
nil immediately, on repeated failure return the final fmt.Errorf including
stdout/stderr/exitCode from util.ExecutePrivileged so callers can see the last
error. Ensure you reference the util.ExecutePrivileged call and preserve the
existing log messages (including the successful log.Info on success).

---

Duplicate comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 158-160: The pre-reclaim space check uses a strict '>' which
forces reclaim when freeSpace == artifactsSize; change the condition in the
download_boot_artifacts_cmd.go check comparing freeSpace and artifactsSize from
'>' to '>=' so exact-fit skips unnecessary cleanup (this mirrors the later
post-reclaim '<' check and avoids an unnecessary expensive reclaim).

---

Nitpick comments:
In `@src/commands/actions/download_boot_artifacts_cmd.go`:
- Around line 336-356: The temp source directory (e.g., /tmp/boot_artifacts) is
never removed after files are copied; modify moveFiles (or the caller
moveFilesToBootFolder) so that after all files are successfully copied it
removes the sourceFolder via os.RemoveAll and handles any removal error (log or
return as appropriate) instead of leaving stale artifacts; ensure removal
happens only when the copy loop completes without error to avoid losing data on
partial failures and reference the moveFiles function and moveFilesToBootFolder
caller to implement this cleanup.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between adc3c0d and 5c808aa.

📒 Files selected for processing (1)
  • src/commands/actions/download_boot_artifacts_cmd.go

Comment on lines +314 to +322
func reclaimBootFolderSpace() error {
stdout, stderr, exitCode := util.ExecutePrivileged("rpm-ostree", "cleanup", "--os=rhcos", "-r")
log.Debugf("Cleanup RHCOS stdout: %s\nstderr: %s\nexitCode: %d", stdout, stderr, exitCode)
if exitCode != 0 {
return fmt.Errorf("Cleanup command for RHCOS failed: %s: %s", stdout, stderr)
}
log.Info("Successfully cleaned up RHCOS")
return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing retry logic for rpm-ostree cleanup — contradicts PR description.

The PR description explicitly states "Includes retry logic to attempt the cleanup again if the first attempt fails," but reclaimBootFolderSpace runs rpm-ostree cleanup exactly once. Transient failures (e.g., RPM database lock contention) will surface as permanent errors with no recovery opportunity.

🐛 Proposed fix
+const reclaimRetryAmount = 3
+const defaultReclaimRetryDelay = 30 * time.Second
+
 func reclaimBootFolderSpace() error {
-	stdout, stderr, exitCode := util.ExecutePrivileged("rpm-ostree", "cleanup", "--os=rhcos", "-r")
-	log.Debugf("Cleanup RHCOS stdout: %s\nstderr: %s\nexitCode: %d", stdout, stderr, exitCode)
-	if exitCode != 0 {
-		return fmt.Errorf("Cleanup command for RHCOS failed: %s: %s", stdout, stderr)
-	}
-	log.Info("Successfully cleaned up RHCOS")
-	return nil
+	var lastErr error
+	for i := 0; i < reclaimRetryAmount; i++ {
+		stdout, stderr, exitCode := util.ExecutePrivileged("rpm-ostree", "cleanup", "--os=rhcos", "-r")
+		log.Debugf("Cleanup RHCOS stdout: %s\nstderr: %s\nexitCode: %d", stdout, stderr, exitCode)
+		if exitCode == 0 {
+			log.Info("Successfully cleaned up RHCOS")
+			return nil
+		}
+		lastErr = fmt.Errorf("cleanup command for RHCOS failed: %s: %s", stdout, stderr)
+		log.WithError(lastErr).Errorf("Failed to reclaim boot space (attempt %d/%d)", i+1, reclaimRetryAmount)
+		if i < reclaimRetryAmount-1 {
+			time.Sleep(defaultReclaimRetryDelay)
+		}
+	}
+	return lastErr
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 314 - 322,
The reclaimBootFolderSpace function currently calls
util.ExecutePrivileged("rpm-ostree", "cleanup", "--os=rhcos", "-r") only once;
add retry logic around that call (in reclaimBootFolderSpace) to attempt the
command multiple times (e.g., 3 attempts) with a short backoff between attempts
(time.Sleep) and log each attempt via log.Debugf/log.Warnf; on success return
nil immediately, on repeated failure return the final fmt.Errorf including
stdout/stderr/exitCode from util.ExecutePrivileged so callers can see the last
error. Ensure you reference the util.ExecutePrivileged call and preserve the
existing log messages (including the successful log.Info on success).

Comment on lines +358 to 383
func copyFile(src, dst string) error {
sourceFile, err := os.Open(src)
if err != nil {
return fmt.Errorf("failed to open source file %s: %w", src, err)
}
defer sourceFile.Close()

destFile, err := os.Create(dst)
if err != nil {
return fmt.Errorf("failed to create destination file %s: %w", dst, err)
}
defer destFile.Close()

if _, err := io.Copy(destFile, sourceFile); err != nil {
return fmt.Errorf("failed to copy data from %s to %s: %w", src, dst, err)
}

// Preserve file permissions
sourceInfo, err := os.Stat(src)
if err != nil {
return fmt.Errorf("failed to stat source file %s: %w", src, err)
}
if err := os.Chmod(dst, sourceInfo.Mode()); err != nil {
return fmt.Errorf("failed to set permissions on %s: %w", dst, err)
}
return nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

copyFile doesn't sync data to disk before closing — data loss risk for boot-critical files.

io.Copy fills the kernel page cache, and the deferred destFile.Close() may not flush dirty pages to the underlying storage. If the system reboots immediately after the copy (exactly the use case here), the kernel, initrd, or bootloader config could be partially missing.

🐛 Proposed fix — call `Sync()` before returning
 	if _, err := io.Copy(destFile, sourceFile); err != nil {
 		return fmt.Errorf("failed to copy data from %s to %s: %w", src, dst, err)
 	}

+	if err := destFile.Sync(); err != nil {
+		return fmt.Errorf("failed to sync data to %s: %w", dst, err)
+	}
+
 	// Preserve file permissions
 	sourceInfo, err := os.Stat(src)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/commands/actions/download_boot_artifacts_cmd.go` around lines 358 - 383,
The copyFile function risks lost data because it never fsyncs the destination;
after the io.Copy and again after setting permissions you should call
destFile.Sync() (checking and returning any error) to ensure data and metadata
are flushed to disk before returning; keep the existing deferred
destFile.Close() but add error handling for destFile.Sync() (return fmt.Errorf
with context) both immediately after the copy and after os.Chmod so
boot-critical files are durable.

@CrystalChun CrystalChun force-pushed the reclaim-fix branch 2 times, most recently from e71b6ec to 31dcbbf Compare February 26, 2026 21:37
@CrystalChun CrystalChun marked this pull request as ready for review February 27, 2026 21:09
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2026
@CrystalChun
Copy link
Contributor Author

Requires assisted-service PR openshift/assisted-service#9934

@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 34.21053% with 100 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.96%. Comparing base (4f8a09b) to head (8b99fd4).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...rc/commands/actions/download_boot_artifacts_cmd.go 34.21% 93 Missing and 7 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1316      +/-   ##
==========================================
- Coverage   56.52%   55.96%   -0.56%     
==========================================
  Files          88       88              
  Lines        4315     4440     +125     
==========================================
+ Hits         2439     2485      +46     
- Misses       1699     1772      +73     
- Partials      177      183       +6     
Files with missing lines Coverage Δ
...rc/commands/actions/download_boot_artifacts_cmd.go 30.27% <34.21%> (+10.92%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@CrystalChun CrystalChun force-pushed the reclaim-fix branch 2 times, most recently from 657e75c to 94905d2 Compare March 2, 2026 21:51
… folder

Ensures we have the artifacts ready to be moved to the /boot folder
without failing to download to the /boot folder in case it's
full. Putting the artifacts in one location allows us to easily
calculate its size later on.
@CrystalChun CrystalChun force-pushed the reclaim-fix branch 7 times, most recently from 5d4ced3 to d19b603 Compare March 3, 2026 00:50
Calculates the space required and checks if the /boot directory has
space. If it doesn't, it attempts to reclaim it by running
rpm-ostree cleanup.
Once we've determined there's enough space, we should move
all of the artifacts to the final destination in `/boot`
In the event the step processor creates this step more than
once, we don't want to redo any of the previous steps so
we should indicate if it's already successful.
@CrystalChun CrystalChun force-pushed the reclaim-fix branch 2 times, most recently from 4389042 to 568eccf Compare March 4, 2026 01:38
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 4, 2026
@CrystalChun CrystalChun force-pushed the reclaim-fix branch 2 times, most recently from be8fa0a to e129e4f Compare March 4, 2026 13:56
Assisted-by: Claude Code/claude-sonnet-4.5
@openshift-ci
Copy link

openshift-ci bot commented Mar 4, 2026

@CrystalChun: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants