Five fixes for plugin reliability against current openclaw + non-trivial state #7
Open
tommyjoseph wants to merge 5 commits into
`readJsonFileWithFallback` and `writeJsonFileAtomically` were exported from the top-level `openclaw/plugin-sdk` in openclaw 2026.2 (281 names exported there) but moved into the `openclaw/plugin-sdk/json-store` submodule in openclaw 2026.4 as part of a tree-shaking refactor (only 5 names left at top level). Without this fix the plugin throws `TypeError: ... is not a function` on the first scheduled push when running against openclaw 2026.4+, and the gateway never produces a backup. The peer/dev dependency range is bumped to `>=2026.4.0` to make the new SDK layout requirement explicit. Verified against openclaw 2026.5.7. Tests: 70 pass.
The b2 client built the request path as `/${bucket}/${key}` and used
that raw string both for SigV4 canonical-request signing and for the
fetch URL. Node's `fetch` URL-encodes spaces (and other reserved
characters) before transmission, but the signer signed the unencoded
path. B2 recomputes the canonical request from the encoded path it
receives, gets a different signature, and rejects the request with
HTTP 403 `AccessDenied — Signature validation failed`.
Symptom: any object key containing a space (or `+`, `'`, `(`, `)`,
`*`, `!`, etc.) failed with 403, while UUID-named keys succeeded.
In a real workload that means user-named files in `workspace/` like
`Marketing Image 1.png` cause the entire scheduled push to abort,
since the first 403 propagates out of `Promise.all`.
Fix: introduce `s3EncodePath` which URI-encodes each segment per the
S3 SigV4 rules (preserves `/` between segments, encodes the AWS-
extended set `' ( ) * !` on top of `encodeURIComponent`'s defaults).
Use the encoded form in both signing and the fetch URL so the two
strings stay in sync.
Tests: added 9 unit tests for `s3EncodePath` covering spaces, the
AWS-extended set, unicode, unreserved chars, slashes, and edge cases.
70 → 79 tests pass.
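The encoding rule described above can be sketched as follows. This is a minimal illustration built only from the rules stated in the commit message (split on `/`, apply `encodeURIComponent`, then percent-encode `! ' ( ) *`); the actual `s3EncodePath` in this PR may differ in detail.

```typescript
// Sketch only: percent-encode each path segment per the S3 SigV4 rules,
// leaving the "/" separators between segments untouched.
function s3EncodePath(path: string): string {
  return path
    .split("/")
    .map((segment) =>
      // encodeURIComponent leaves ! ' ( ) * unescaped; SigV4 requires them encoded.
      encodeURIComponent(segment).replace(
        /[!'()*]/g,
        (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase(),
      ),
    )
    .join("/");
}

// The same encoded string would be used for both signing and the fetch URL.
const encoded = s3EncodePath("/my-bucket/workspace/Marketing Image 1.png");
console.log(encoded);
```

Because the signer and the request URL share one encoded string, the path B2 receives matches the path that was signed.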
A scheduled push of a non-trivial state directory makes thousands of serial putObject calls. With the previous code, a single transient failure (Node `fetch failed`, B2 5xx, B2 429, or B2 400 IncompleteBody from a connection drop mid-upload) caused `Promise.all` to reject and aborted the entire push, wasting all upload progress and forcing the next cron fire to retry everything from scratch.
This commit introduces `putObjectWithRetry` (in a new `retry.ts` module), which wraps b2.putObject with exponential backoff and jitter:
- 6 attempts (initial + 5 retries)
- 500ms base delay, doubling, capped at 15s
- ±25% jitter
- Retries `fetch failed`, HTTP 5xx, 408, 429, and 400 IncompleteBody
- Propagates other errors (401/403/404/InvalidRequest/etc.) immediately
`push.ts` is updated to use this wrapper for both the per-file upload loop and the final manifest upload.
Tests: 13 unit tests cover both `isRetryablePutError` (retry decision matrix) and `putObjectWithRetry` (success path, retry-then-succeed, non-retryable propagation, exhaustion, fetch failed, IncompleteBody vs InvalidRequest disambiguation).
79 → 92 tests pass.
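The retry policy above could look roughly like the sketch below. The error shape (`PutError`), the generic `withRetry` wrapper, and the choice to apply the cap before the jitter are all assumptions made for illustration; the real `retry.ts` may classify errors and order these steps differently.

```typescript
// Hypothetical error shape; the plugin's real b2 client errors may differ.
interface PutError {
  message: string;
  status?: number; // HTTP status, when a response was received
  code?: string; // error code such as "IncompleteBody"
}

const MAX_ATTEMPTS = 6; // initial attempt + 5 retries
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 15000;
const JITTER = 0.25; // ±25%

function isRetryablePutError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { message, status, code } = err as Partial<PutError>;
  if (status === undefined) {
    // No HTTP response at all: only a network-level "fetch failed" is retried.
    return typeof message === "string" && message.includes("fetch failed");
  }
  if ((status >= 500 && status <= 599) || status === 408 || status === 429) return true;
  return status === 400 && code === "IncompleteBody";
}

// retryIndex is 0 for the first retry; random is injectable for deterministic tests.
function backoffDelayMs(retryIndex: number, random: () => number = Math.random): number {
  const capped = Math.min(BASE_DELAY_MS * Math.pow(2, retryIndex), MAX_DELAY_MS);
  const factor = 1 + (random() * 2 - 1) * JITTER; // in [0.75, 1.25)
  return Math.round(capped * factor);
}

async function withRetry<T>(
  op: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on the last attempt or on any non-retryable error.
      if (attempt >= MAX_ATTEMPTS || !isRetryablePutError(err)) throw err;
      await sleep(backoffDelayMs(attempt - 1));
    }
  }
}
```

With six attempts and a 500ms base, the nominal delays before jitter are 500, 1000, 2000, 4000, and 8000 ms, so under these assumptions the 15s cap only matters if the attempt count is raised.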
The gather phase walks the state directory and snapshots a list of file paths. The upload phase then reads each file in turn. If another subsystem (an LCM compactor, a session-cleanup hook, a user `rm`, etc.) deletes a file between those two phases, the upload's `readFile` rejects with `ENOENT` and aborts the whole push via `Promise.all`. This is a real condition in any agent that maintains its own session files in the background; losing 15+ minutes of upload progress to a single deleted JSONL file is not acceptable.
Fix: catch ENOENT specifically inside the per-file upload task, log a debug line, and skip the file. Other read errors (EACCES, EIO, etc.) still propagate. The next push naturally picks up the new state.
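A sketch of the skip-on-ENOENT pattern described above. The helper names (`isMissingFileError`, `readOrSkip`) and the injected `read` and `debug` parameters are hypothetical, chosen so the example runs without touching disk; in the plugin the reader would presumably be `readFile` from `node:fs/promises`.

```typescript
// True only for errors carrying code "ENOENT", as Node's fs errors do.
function isMissingFileError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "ENOENT"
  );
}

// Reads a file for upload; returns undefined if it vanished since the gather phase.
// Any other read error (EACCES, EIO, ...) is rethrown so the push still fails loudly.
async function readOrSkip(
  path: string,
  read: (path: string) => Promise<Uint8Array>,
  debug: (msg: string) => void = () => {},
): Promise<Uint8Array | undefined> {
  try {
    return await read(path);
  } catch (err) {
    if (isMissingFileError(err)) {
      debug(`skipping ${path}: file vanished between gather and upload`);
      return undefined;
    }
    throw err;
  }
}
```

The per-file upload task would then return early when `readOrSkip` yields `undefined`, leaving the remaining uploads in the `Promise.all` unaffected.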
B2 rejects object keys containing characters with codes <32 with HTTP 400 `InvalidRequest` (`File names must not contain unicode characters with codes less than 32`). The plugin's previous behavior was to send the file anyway, which aborted the entire push the moment B2 saw the first such filename. In practice this happened for paths produced by upstream document/email-attachment pipelines that didn't strip trailing CR/LF from attachment names. The underlying bug belongs to those pipelines, but the backup plugin should not lose the rest of the snapshot just because one filename is malformed.
Fix: filter at gather time inside `shouldInclude`.
Tests cover trailing CR, embedded newline, tab inside a path segment, NUL byte, and a control char in an otherwise-included filename. 92 → 93 tests pass.
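The gather-time filter amounts to a check like the one below. `hasControlChar` is a hypothetical helper name; the PR only states that `shouldInclude` rejects any path containing a char code below 32.

```typescript
// Returns true if any character in the path has a code below 32
// (CR, LF, tab, NUL, and the other ASCII control characters).
function hasControlChar(path: string): boolean {
  for (let i = 0; i < path.length; i++) {
    if (path.charCodeAt(i) < 32) return true;
  }
  return false;
}

// A space is code 32, so ordinary user-named files are still included.
console.log(hasControlChar("workspace/Marketing Image 1.png")); // false
console.log(hasControlChar("attachments/invoice.pdf\r")); // true
```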
Five independent fixes, one per commit, each addressing a distinct bug discovered while bringing this plugin up against openclaw 2026.4 / 2026.5 with a real ~25k-file state directory. Tests grow from 70 → 93 (all green). End-to-end verified with a ~3.7 GB encrypted snapshot landing successfully in B2 after the patches.
Closes
Commits (in order)
1. Fix import path for json-store helpers; bump openclaw peer range
   `readJsonFileWithFallback`, `writeJsonFileAtomically` moved to `openclaw/plugin-sdk/json-store`; `OpenClawPluginService`, `OpenClawPluginServiceContext` moved to `openclaw/plugin-sdk/core`. Bumps peer/dev `openclaw` to `>=2026.4.0` to make the layout requirement explicit. Without this the plugin throws `TypeError: ... is not a function` on every push.
2. URL-encode S3 object keys for SigV4 + wire URL
   `putObject`/`getObject`/`deleteObject` now use a new `s3EncodePath` helper for both signing and the fetch URL. Without this any object key with a space, `+`, `'`, `(`, `)`, `*`, or `!` returns 403 AccessDenied. +9 unit tests for `s3EncodePath`.
3. Add exponential-backoff retry around b2.putObject calls
   New `src/retry.ts` module. 6 attempts, 500ms base, 15s cap, ±25% jitter. Retries `fetch failed`, HTTP 5xx, 408, 429, 400 IncompleteBody; propagates other errors. Both putObject sites in `push.ts` use it. +13 unit tests covering retry decisions and behavior.
4. Skip ENOENT during upload (file vanished between gather and read)
   Inside the per-file upload task, catch `ENOENT` from `readFile` and skip the file. Other read errors still propagate. The next push naturally picks up the new state.
5. Skip files whose paths contain ASCII control characters
   `shouldInclude` now also rejects any path with a char code <32. Real workloads see this for filenames produced by upstream pipelines that didn't strip CR/LF. +1 test covering several control-char shapes.
Test plan
- `pnpm test` → 93 passed, 0 failed (was 70 baseline)
- `pnpm build` → clean
- `pnpm typecheck` → no new errors (one pre-existing `index.ts:69` tool signature error remains, unrelated to these patches)
Notes for review