Skip to content

Commit 73c4206

Browse files
committed
backup(nas): move backup-mode policy + stdout markers from script to wrapper
Address abh1sar review on PR #13074 (nasbackup.sh:155, :193, :358; LibvirtTake BackupCommandWrapper.java:124). The script was carrying caller-side policy: arg validation, fallback decisions, and stdout markers that the wrapper had to parse out before the size-parsing logic could run. Move that policy into Java and use dedicated exit codes for the signals the wrapper needs. Script (scripts/vm/hypervisor/kvm/nasbackup.sh): * Drop the per-mode required-args checks (the wrapper now pre-validates). * Replace the INCREMENTAL_FALLBACK stdout marker with exit code 21 (EXIT_INCREMENTAL_UNSUPPORTED): emitted when the running-VM path can't re-register the parent checkpoint, and when the stopped-VM path was asked for incremental. The wrapper retries the script as a full backup and sets incrementalFallback on the BackupAnswer. * Replace the BITMAP_CREATED stdout marker with exit code 22 (EXIT_BITMAP_NOT_SEEDED), emitted only by the stopped-VM path when qemu-img bitmap --add failed for every source disk. Backup file is valid but no usable bitmap exists on the host; wrapper records bitmapCreated=null so NASBackupProvider clears active_checkpoint_id and the next backup starts a fresh full chain. Running-VM success path no longer needs a marker — libvirt's backup-begin atomically creates the checkpoint. LibvirtTakeBackupCommandWrapper.java: * Pre-validate incremental args (mode-vs-bitmapNew/Parent/parentPaths) before invoking the script. Returns a failed BackupAnswer on missing args, keeping the script agnostic to caller policy. * Extract runBackupScript() so the same code can fire the retry-as-full after EXIT_INCREMENTAL_UNSUPPORTED without duplicating arg assembly. * On EXIT_INCREMENTAL_UNSUPPORTED + requestedMode==incremental, re-invoke with mode=full and only --bitmap-new (drop --bitmap-parent/--parent-paths); set incrementalFallback=true on the eventual answer. * On EXIT_BITMAP_NOT_SEEDED, treat as success but set bitmapCreated=null. * Drop the stdout-marker stripping loop (markers no longer emitted), and the separate BITMAP_CREATED parsing — bitmapCreated mirrors command.getBitmapNew() unless the not-seeded exit code says otherwise. NASBackupProvider.java: * Refresh the two comment blocks that referenced the old BITMAP_CREATED= stdout signal to describe the new exit-code path. No behaviour change in this file.
1 parent 096bef1 commit 73c4206

3 files changed

Lines changed: 152 additions & 96 deletions

File tree

plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -406,12 +406,12 @@ private List<String> composeParentBackupPaths(Backup parent, long vmId) {
406406
* other providers can implement their own chain semantics without schema changes.
407407
*/
408408
private void persistChainMetadata(Backup backup, ChainDecision decision, String bitmapFromAgent) {
409-
// Only persist nas.bitmap_name when the agent confirmed it via BITMAP_CREATED=. If we
410-
// fall back to decision.bitmapNew when the agent didn't emit BITMAP_CREATED= (e.g.,
411-
// stopped-VM path where the qemu-img pre-seed failed, or running-VM path where libvirt
412-
// backup-begin succeeded but the bitmap line wasn't surfaced for any reason), we'd
413-
// anchor the next incremental on a bitmap that doesn't exist on the host. Better to
414-
// leave it empty so the next backup sees no checkpoint and starts a fresh full chain.
409+
// Only persist nas.bitmap_name when the agent confirmed the bitmap exists on the host.
410+
// The agent wrapper sets bitmapFromAgent=null when nasbackup.sh exits
411+
// EXIT_BITMAP_NOT_SEEDED (=22) — currently only the stopped-VM path where qemu-img
412+
// bitmap --add failed on every source disk. Anchoring the next incremental on a
413+
// bitmap that doesn't exist would force a non-recoverable failure, so we leave the
414+
// detail empty and let the next backup start a fresh full chain.
415415
if (bitmapFromAgent != null && !bitmapFromAgent.isEmpty()) {
416416
backupDetailsDao.persist(new BackupDetailVO(backup.getId(), NASBackupChainKeys.BITMAP_NAME, bitmapFromAgent, true));
417417
}
@@ -562,10 +562,11 @@ public Pair<Boolean, Backup> takeBackup(final VirtualMachine vm, Boolean quiesce
562562
persistChainMetadata(backupVO, effective, answer.getBitmapCreated());
563563
// Pin the VM's active_checkpoint_id to whichever bitmap the agent actually
564564
// created. This is the only valid parent for the next incremental (see
565-
// decideChain). For the stopped-VM offline path BITMAP_CREATED is null —
566-
// no bitmap exists on the host, so we also clear any stale detail from a
567-
// prior online backup. Either way, after this step the detail accurately
568-
// reflects what's on the running QEMU (or absence thereof).
565+
// decideChain). When the agent wrapper sets bitmapCreated=null (script exited
566+
// EXIT_BITMAP_NOT_SEEDED — stopped-VM path where qemu-img bitmap --add failed),
567+
// no bitmap exists on the host, so we also clear any stale detail from a prior
568+
// online backup. Either way, after this step the detail accurately reflects
569+
// what's on the running QEMU (or absence thereof).
569570
String confirmedBitmap = answer.getBitmapCreated();
570571
if (confirmedBitmap != null) {
571572
upsertVmActiveCheckpoint(vm.getId(), confirmedBitmap);

plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java

Lines changed: 111 additions & 51 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,14 @@
4242
@ResourceWrapper(handles = TakeBackupCommand.class)
4343
public class LibvirtTakeBackupCommandWrapper extends CommandWrapper<TakeBackupCommand, Answer, LibvirtComputingResource> {
4444
private static final Integer EXIT_CLEANUP_FAILED = 20;
45+
private static final Integer EXIT_INCREMENTAL_UNSUPPORTED = 21;
46+
// Backup file is valid, but the host has no bitmap to anchor the next incremental on.
47+
// Used by the stopped-VM path when qemu-img bitmap --add failed for every source disk.
48+
private static final Integer EXIT_BITMAP_NOT_SEEDED = 22;
49+
50+
private static final String MODE_FULL = "full";
51+
private static final String MODE_INCREMENTAL = "incremental";
52+
4553
@Override
4654
public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvirtComputingResource) {
4755
final String vmName = command.getVmName();
@@ -54,6 +62,13 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir
5462
KVMStoragePoolManager storagePoolMgr = libvirtComputingResource.getStoragePoolMgr();
5563
int timeout = command.getWait() > 0 ? command.getWait() * 1000 : libvirtComputingResource.getCmdsTimeout();
5664

65+
// Pre-validate incremental args here rather than relying on the script to error out.
66+
// Keeps the script agnostic to caller policy (it just does what it's told).
67+
String validationError = validateBackupArgs(command);
68+
if (validationError != null) {
69+
return new BackupAnswer(command, false, validationError);
70+
}
71+
5772
List<String> diskPaths = new ArrayList<>();
5873
if (Objects.nonNull(volumePaths)) {
5974
for (int idx = 0; idx < volumePaths.size(); idx++) {
@@ -69,6 +84,59 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir
6984
}
7085
}
7186

87+
final String requestedMode = command.getMode();
88+
Pair<Integer, String> result = runBackupScript(libvirtComputingResource, command, vmName, backupRepoType, backupRepoAddress,
89+
mountOptions, backupPath, diskPaths, requestedMode,
90+
command.getBitmapNew(), command.getBitmapParent(), command.getParentPaths(), timeout);
91+
92+
boolean incrementalFallback = false;
93+
if (result.first() == EXIT_INCREMENTAL_UNSUPPORTED && MODE_INCREMENTAL.equals(requestedMode)) {
94+
// Script told us the incremental can't proceed (parent checkpoint can't be
95+
// re-registered, or VM is stopped). Re-invoke as full with the same --bitmap-new
96+
// so the chain restarts cleanly. Drop --bitmap-parent + --parent-paths since
97+
// they no longer apply.
98+
logger.info("nasbackup.sh signalled incremental unsupported for VM " + vmName + " — retrying as full");
99+
result = runBackupScript(libvirtComputingResource, command, vmName, backupRepoType, backupRepoAddress,
100+
mountOptions, backupPath, diskPaths, MODE_FULL,
101+
command.getBitmapNew(), null, null, timeout);
102+
incrementalFallback = true;
103+
}
104+
105+
boolean bitmapSeeded = true;
106+
if (result.first() == EXIT_BITMAP_NOT_SEEDED) {
107+
// Backup file is valid; the host just has no bitmap. Treat as success but
108+
// mark bitmapCreated=null so the orchestrator clears active_checkpoint_id.
109+
bitmapSeeded = false;
110+
} else if (result.first() != 0) {
111+
logger.debug("Failed to take VM backup: " + result.second());
112+
BackupAnswer answer = new BackupAnswer(command, false, result.second().trim());
113+
if (result.first() == EXIT_CLEANUP_FAILED) {
114+
logger.debug("Backup cleanup failed");
115+
answer.setNeedsCleanup(true);
116+
}
117+
return answer;
118+
}
119+
120+
String stdout = result.second().trim();
121+
long backupSize = parseBackupSize(stdout, diskPaths);
122+
123+
BackupAnswer answer = new BackupAnswer(command, true, stdout);
124+
answer.setSize(backupSize);
125+
// bitmapCreated mirrors what we asked the script to create — except when the
126+
// script exited EXIT_BITMAP_NOT_SEEDED, in which case the host has no bitmap
127+
// and the orchestrator must clear active_checkpoint_id.
128+
answer.setBitmapCreated(bitmapSeeded ? command.getBitmapNew() : null);
129+
answer.setIncrementalFallback(incrementalFallback);
130+
return answer;
131+
}
132+
133+
/**
134+
* Run nasbackup.sh once with the given args. Returns the exit code + captured stdout.
135+
*/
136+
private Pair<Integer, String> runBackupScript(LibvirtComputingResource libvirtComputingResource,
137+
TakeBackupCommand command, String vmName, String backupRepoType, String backupRepoAddress,
138+
String mountOptions, String backupPath, List<String> diskPaths, String mode,
139+
String bitmapNew, String bitmapParent, List<String> parentPaths, int timeout) {
72140
List<String> argv = new ArrayList<>(Arrays.asList(
73141
libvirtComputingResource.getNasBackupPath(),
74142
"-o", "backup",
@@ -80,83 +148,75 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir
80148
"-q", command.getQuiesce() != null && command.getQuiesce() ? "true" : "false",
81149
"-d", diskPaths.isEmpty() ? "" : String.join(",", diskPaths)
82150
));
83-
// Incremental NAS backup args (only added when the orchestrator asked for full/inc mode).
84-
if (command.getMode() != null && !command.getMode().isEmpty()) {
151+
if (mode != null && !mode.isEmpty()) {
85152
argv.add("-M");
86-
argv.add(command.getMode());
153+
argv.add(mode);
87154
}
88-
if (command.getBitmapNew() != null && !command.getBitmapNew().isEmpty()) {
155+
if (bitmapNew != null && !bitmapNew.isEmpty()) {
89156
argv.add("--bitmap-new");
90-
argv.add(command.getBitmapNew());
157+
argv.add(bitmapNew);
91158
}
92-
if (command.getBitmapParent() != null && !command.getBitmapParent().isEmpty()) {
159+
if (bitmapParent != null && !bitmapParent.isEmpty()) {
93160
argv.add("--bitmap-parent");
94-
argv.add(command.getBitmapParent());
161+
argv.add(bitmapParent);
95162
}
96-
if (command.getParentPaths() != null && !command.getParentPaths().isEmpty()) {
163+
if (parentPaths != null && !parentPaths.isEmpty()) {
97164
argv.add("--parent-paths");
98-
argv.add(String.join(",", command.getParentPaths()));
165+
argv.add(String.join(",", parentPaths));
99166
}
100167

101168
List<String[]> commands = new ArrayList<>();
102169
commands.add(argv.toArray(new String[0]));
170+
return Script.executePipedCommands(commands, timeout);
171+
}
103172

104-
Pair<Integer, String> result = Script.executePipedCommands(commands, timeout);
105-
106-
if (result.first() != 0) {
107-
logger.debug("Failed to take VM backup: " + result.second());
108-
BackupAnswer answer = new BackupAnswer(command, false, result.second().trim());
109-
if (result.first() == EXIT_CLEANUP_FAILED) {
110-
logger.debug("Backup cleanup failed");
111-
answer.setNeedsCleanup(true);
112-
}
113-
return answer;
173+
/**
174+
* Return a human-readable validation error string, or {@code null} if the command's
175+
* incremental-backup args are internally consistent.
176+
*/
177+
private String validateBackupArgs(TakeBackupCommand command) {
178+
String mode = command.getMode();
179+
if (mode == null || mode.isEmpty()) {
180+
return null; // legacy full-only — no extra args expected
114181
}
115-
116-
// Strip out our incremental marker lines before parsing size, so the legacy
117-
// numeric-suffix parser keeps working.
118-
String stdout = result.second().trim();
119-
String bitmapCreated = null;
120-
boolean incrementalFallback = false;
121-
StringBuilder filtered = new StringBuilder();
122-
for (String line : stdout.split("\n")) {
123-
String trimmed = line.trim();
124-
if (trimmed.startsWith("BITMAP_CREATED=")) {
125-
// The marker only confirms the bitmap was actually created on the disks
126-
// (it isn't, e.g., for stopped-VM RBD/LINSTOR sources). The name itself is
127-
// the value we already passed via --bitmap-new, so use that rather than
128-
// re-parsing the echoed value.
129-
bitmapCreated = command.getBitmapNew();
130-
continue;
182+
if (MODE_INCREMENTAL.equals(mode)) {
183+
if (command.getBitmapNew() == null || command.getBitmapNew().isEmpty()) {
184+
return "incremental mode requires bitmapNew";
131185
}
132-
if (trimmed.startsWith("INCREMENTAL_FALLBACK=")) {
133-
incrementalFallback = true;
134-
continue;
186+
if (command.getBitmapParent() == null || command.getBitmapParent().isEmpty()) {
187+
return "incremental mode requires bitmapParent";
135188
}
136-
if (filtered.length() > 0) {
137-
filtered.append("\n");
189+
if (command.getParentPaths() == null || command.getParentPaths().isEmpty()) {
190+
return "incremental mode requires parentPaths";
138191
}
139-
filtered.append(line);
192+
return null;
140193
}
141-
String numericOutput = filtered.toString().trim();
194+
if (MODE_FULL.equals(mode)) {
195+
if (command.getBitmapNew() == null || command.getBitmapNew().isEmpty()) {
196+
return "full mode requires bitmapNew (the bitmap to create for the next incremental)";
197+
}
198+
return null;
199+
}
200+
return "Unknown backup mode: " + mode;
201+
}
142202

203+
/**
204+
* Sum the per-disk size lines emitted by nasbackup.sh. Single-volume mode emits one
205+
* line containing just the byte count; multi-volume mode emits one line per disk
206+
* whose first whitespace-separated token is the byte count.
207+
*/
208+
private long parseBackupSize(String stdout, List<String> diskPaths) {
143209
long backupSize = 0L;
144210
if (CollectionUtils.isNullOrEmpty(diskPaths)) {
145-
List<String> outputLines = Arrays.asList(numericOutput.split("\n"));
211+
List<String> outputLines = Arrays.asList(stdout.split("\n"));
146212
if (!outputLines.isEmpty()) {
147213
backupSize = Long.parseLong(outputLines.get(outputLines.size() - 1).trim());
148214
}
149215
} else {
150-
String[] outputLines = numericOutput.split("\n");
151-
for(String line : outputLines) {
216+
for (String line : stdout.split("\n")) {
152217
backupSize = backupSize + Long.parseLong(line.split(" ")[0].trim());
153218
}
154219
}
155-
156-
BackupAnswer answer = new BackupAnswer(command, true, stdout);
157-
answer.setSize(backupSize);
158-
answer.setBitmapCreated(bitmapCreated);
159-
answer.setIncrementalFallback(incrementalFallback);
160-
return answer;
220+
return backupSize;
161221
}
162222
}

scripts/vm/hypervisor/kvm/nasbackup.sh

Lines changed: 30 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,11 @@ logFile="/var/log/cloudstack/agent/agent.log"
4949

5050
EXIT_CLEANUP_FAILED=20
5151
EXIT_INCREMENTAL_UNSUPPORTED=21
52+
# Stopped-VM backup succeeded but the qemu-img bitmap pre-seed failed on every disk
53+
# (e.g. RBD/LINSTOR sources, or the bitmap --add operation errored). The backup file
54+
# is valid; the wrapper records bitmapCreated=null so the orchestrator clears the
55+
# VM's active_checkpoint_id and the next backup starts a fresh full chain.
56+
EXIT_BITMAP_NOT_SEEDED=22
5257

5358
log() {
5459
[[ "$verb" -eq 1 ]] && builtin echo "$@"
@@ -128,23 +133,12 @@ backup_running_vm() {
128133

129134
# Determine effective mode for this run.
130135
# Legacy callers (no -M argument) get the original full-only behavior with no checkpoint.
136+
# The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required args before
137+
# invoking the script; the case below is a defensive fallback for direct invocations.
131138
local effective_mode="${MODE:-legacy-full}"
132139
local make_checkpoint=0
133140
case "$effective_mode" in
134-
incremental)
135-
if [[ -z "$BITMAP_PARENT" || -z "$BITMAP_NEW" || -z "$PARENT_PATHS" ]]; then
136-
echo "incremental mode requires --bitmap-parent, --bitmap-new, and --parent-paths"
137-
cleanup
138-
exit 1
139-
fi
140-
make_checkpoint=1
141-
;;
142-
full)
143-
if [[ -z "$BITMAP_NEW" ]]; then
144-
echo "full mode requires --bitmap-new (the bitmap to create for the next incremental)"
145-
cleanup
146-
exit 1
147-
fi
141+
incremental|full)
148142
make_checkpoint=1
149143
;;
150144
legacy-full)
@@ -174,10 +168,12 @@ backup_running_vm() {
174168
: # parent checkpoint re-registered; the incremental can proceed against it
175169
else
176170
# No saved checkpoint XML (e.g. a backup taken before this fix) or redefine failed.
177-
# Fall back to a full so the chain restarts cleanly instead of failing the backup.
178-
echo "INCREMENTAL_FALLBACK=full (parent checkpoint $BITMAP_PARENT could not be re-registered)"
179-
effective_mode="full"
180-
BITMAP_PARENT=""
171+
# Signal the Java wrapper to retry as a full backup so the chain restarts cleanly
172+
# instead of failing the backup. The wrapper is responsible for the retry and for
173+
# recording incrementalFallback=true on the resulting BackupAnswer.
174+
log -e "incremental: parent checkpoint $BITMAP_PARENT could not be re-registered — exiting $EXIT_INCREMENTAL_UNSUPPORTED for caller-driven fallback"
175+
cleanup
176+
exit $EXIT_INCREMENTAL_UNSUPPORTED
181177
fi
182178
fi
183179
fi
@@ -338,9 +334,9 @@ backup_running_vm() {
338334
# Persist the FULL checkpoint XML next to this backup so a later incremental can
339335
# re-register this checkpoint with --redefine after the VM restarts (which wipes
340336
# libvirt's checkpoint registry but leaves the bitmap on the qcow2).
337+
# The Java wrapper records bitmapCreated from its own --bitmap-new arg on success,
338+
# so no stdout signal is needed here.
341339
virsh -c qemu:///system checkpoint-dumpxml "$VM" "$BITMAP_NEW" > "$dest/$BITMAP_NEW.checkpoint.xml" 2>/dev/null || true
342-
# Echo the bitmap name on its own line so the Java caller can capture it for backup_details.
343-
echo "BITMAP_CREATED=$BITMAP_NEW"
344340
fi
345341

346342
umount $mount_point
@@ -349,13 +345,11 @@ backup_running_vm() {
349345

350346
backup_stopped_vm() {
351347
# Stopped VMs cannot use libvirt's backup-begin (no QEMU process). Take a full
352-
# backup via qemu-img convert. If the caller asked for incremental, fall back
353-
# to full and signal the fallback so the orchestrator can record it as a full
354-
# in the chain.
348+
# backup via qemu-img convert. If the caller asked for incremental, signal the
349+
# Java wrapper to retry as full and record the fallback on the BackupAnswer.
355350
if [[ "$MODE" == "incremental" ]]; then
356-
# Emit on stdout so Script.executePipedCommands in LibvirtTakeBackupCommandWrapper
357-
# can parse it and record the backup as FULL.
358-
echo "INCREMENTAL_FALLBACK=full (VM stopped — incremental requires running VM)"
351+
log -e "incremental: VM stopped — exiting $EXIT_INCREMENTAL_UNSUPPORTED for caller-driven fallback to full"
352+
exit $EXIT_INCREMENTAL_UNSUPPORTED
359353
fi
360354

361355
mount_operation
@@ -397,16 +391,17 @@ backup_stopped_vm() {
397391
done
398392
sync
399393

400-
# Surface the bitmap name we created so the orchestrator can persist it as
401-
# the VM's active_checkpoint_id. Empty when sources weren't file-backed or
402-
# qemu-img bitmap failed — orchestrator handles either case.
403-
# Stdout (not stderr) so Script.executePipedCommands in the Java wrapper
404-
# can parse it — matches the backup_running_vm path.
405-
if [[ "$bitmap_seeded" == "1" ]]; then
406-
echo "BITMAP_CREATED=$BITMAP_NEW"
407-
fi
408-
409394
ls -l --numeric-uid-gid $dest | awk '{print $5}'
395+
396+
# Signal to the Java wrapper whether a usable bitmap exists on the host. When the
397+
# caller asked for BITMAP_NEW but we couldn't pre-seed it (RBD/LINSTOR sources, or
398+
# qemu-img bitmap --add failed on every disk), exit $EXIT_BITMAP_NOT_SEEDED so the
399+
# wrapper records bitmapCreated=null. The orchestrator (NASBackupProvider) then
400+
# clears the VM's active_checkpoint_id and the next backup starts a fresh chain.
401+
if [[ -n "$BITMAP_NEW" && "$bitmap_seeded" != "1" ]]; then
402+
log -e "stopped-VM backup: bitmap pre-seed skipped or failed for all disks — exiting $EXIT_BITMAP_NOT_SEEDED so wrapper clears active_checkpoint_id"
403+
exit $EXIT_BITMAP_NOT_SEEDED
404+
fi
410405
}
411406

412407
delete_backup() {

0 commit comments

Comments
 (0)