fix(meshcore): serialize 0x8C binary-response consumers to prevent reply theft (#3667) #3722
…ply theft (#3667)

request_regions and request_telemetry both await the shared PUSH_CODE_BINARY_RESPONSE (0x8C) push. That push carries only a 4-byte correlation tag and no responder pubkey. request_regions sends via the raw sendToRadioFrame path, whose Sent ack has no expectedAckCrc, so it can't obtain a tag and falls back to accepting the first 0x8C reply it sees. If a request_telemetry binary request (same 0x8C push, e.g. the periodic remote-telemetry scheduler) overlaps it on the same connection, the two race for that first reply and mis-attribute each other's payload: a telemetry CayenneLPP body gets parsed as a regions list, or vice versa.

Fix: chain every 0x8C-awaiting op on a single per-instance promise (runExclusiveBinaryResponse) so exactly one is listening at a time. The lock is an instance field, so it is inherently source-aware: each source owns its own MeshCoreNativeBackend and physical connection, so same-connection 0x8C ops serialize while concurrent sources never block each other. The lock is held only until each op's listeners tear down (bounded by its internal timeout), so it always releases.

Defense-in-depth: the regions parser now drops any token that isn't printable ASCII, so a stray non-regions payload can't render garbage region chips.

Adds a regression test firing request_regions and request_telemetry concurrently and asserting each resolves with its own payload with no overlap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
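The defense-in-depth filter described in the commit message can be illustrated in isolation. This is a minimal sketch: `parseRegionTokens` is a hypothetical name, and the comma delimiter is an assumption, since the source does not show how the regions payload is tokenized. Only the predicate (non-empty, printable ASCII 0x20 to 0x7E) is taken from the PR.

```typescript
// Hypothetical helper: the function name and the comma delimiter are
// assumptions for illustration. Only the filter predicate mirrors the PR.
function parseRegionTokens(raw: string): string[] {
  return raw
    .split(",")
    .map((s) => s.trim())
    // Keep non-empty tokens made solely of printable ASCII (0x20 to 0x7E).
    .filter((s) => s.length > 0 && /^[\x20-\x7e]+$/.test(s));
}

// A well-formed list passes through unchanged.
console.log(parseRegionTokens("saxony,region-b"));

// A token containing control or non-ASCII characters is dropped,
// as is the empty token left by the trailing comma.
console.log(parseRegionTokens("saxony,\u0001g\u00ff,"));
```

The filter does not detect a mis-attributed payload; it only keeps garbage from reaching the UI, which is why the PR treats it as a secondary guard behind the serializer.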
Claude finished @Yeraze's task

PR Review: fix(meshcore): serialize 0x8C binary-response consumers
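Summary

This PR fixes a real race condition where request_regions and request_telemetry race for the same shared 0x8C reply and can mis-attribute each other's payload.

Code Quality & Best Practices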
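private runExclusiveBinaryResponse<T>(fn: () => Promise<T>): Promise<T> {
  // Start fn only after the previous op has settled (resolved or rejected).
  const run = this.binaryResponseChain.then(fn, fn);
  // Swallow the outcome so a rejection never blocks later ops in the chain.
  this.binaryResponseChain = run.then(() => {}, () => {});
  return run;
}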
One subtle issue worth flagging:

Defense-in-depth filter for

.filter((s) => s.length > 0 && /^[\x20-\x7e]+$/.test(s));

The regex is correct (printable ASCII). The

Potential Issues

1.
const onErr = () => { cleanup(); reject(new Error('Device rejected regions request')); };
2.
const responseData: Uint8Array = await this.runExclusiveBinaryResponse(
  () => c.sendBinaryRequest(publicKey, [reqType]),
);
3. Test mock
(conn as any).sendBinaryRequest = (_pubkey: Uint8Array, _req: number[]) =>
  new Promise<Uint8Array>((resolve) => {
    ...
    resolve(telemetryBytes);
  });

The mock resolves with the raw

Performance Considerations
Security Considerations

No security concerns. The ASCII filter on regions output is a good guard against injecting control characters into the UI.

Test Coverage

The new test (lines 318–381) is well-structured:
Minor observation: The test name mentions

Missing edge case: There's no test for what happens when the first op (e.g.,

Verdict

Approve. The fix correctly addresses the stated race condition using a well-understood concurrency primitive. Code is clean, well-commented, and the test adequately covers the key scenario. The one gap (no timeout-recovery test for chain advancement) is minor and the logic is straightforwardly correct from the implementation.
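The chaining pattern reviewed above can be exercised outside the backend. The sketch below is a stand-in, not the project's class: `SerializedBackend`, `fakeOp`, and the timings are hypothetical, and only the body of `runExclusiveBinaryResponse` follows the snippet quoted in the review.

```typescript
// Stand-in for a backend that owns one connection. Everything except the
// body of runExclusiveBinaryResponse is hypothetical.
class SerializedBackend {
  // Tail of the queue; starts out already settled.
  private binaryResponseChain: Promise<void> = Promise.resolve();

  runExclusiveBinaryResponse<T>(fn: () => Promise<T>): Promise<T> {
    const run = this.binaryResponseChain.then(fn, fn);
    this.binaryResponseChain = run.then(() => {}, () => {});
    return run;
  }
}

const callLog: string[] = [];
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Simulated op: records when it sends and when its reply arrives.
async function fakeOp(name: string, ms: number): Promise<string> {
  callLog.push(`${name}:send`);
  await sleep(ms);
  callLog.push(`${name}:reply`);
  return name;
}

const backend = new SerializedBackend();

// The second op is faster, so without the lock its reply would land first.
const serialized = Promise.all([
  backend.runExclusiveBinaryResponse(() => fakeOp("regions", 20)),
  backend.runExclusiveBinaryResponse(() => fakeOp("telemetry", 5)),
]).then((results) => ({ results, callLog }));

serialized.then((out) => console.log(out.results, out.callLog));
```

Because the second call only starts once the first has settled, the log shows each send followed directly by its own reply. A separate `SerializedBackend` instance would have its own chain, which is the per-source behavior the PR relies on.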
Addresses the Claude review's one in-scope suggestion on #3722: add a timeout-recovery test for runExclusiveBinaryResponse. Verifies that when request_regions times out (no BinaryResponse ever arrives), the queued request_telemetry still proceeds, proving the serializer releases on rejection, not just resolution, so a dead repeater can't wedge the connection's 0x8C queue.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
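The release-on-rejection property this commit tests can be sketched as follows. `makeSerializer` and `neverReplies` are hypothetical names and the 10 ms timeout is arbitrary; the chaining line is the same pattern as in the PR, reproduced here so the example runs on its own.

```typescript
// Same chaining pattern as the PR's serializer, wrapped in a hypothetical
// factory so the example is self-contained.
function makeSerializer() {
  let chain: Promise<void> = Promise.resolve();
  return function runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const run = chain.then(fn, fn);
    // Swallow both outcomes so a rejection cannot wedge later callers.
    chain = run.then(() => {}, () => {});
    return run;
  };
}

// Hypothetical op that never receives a reply and rejects after `ms`.
function neverReplies(ms: number): Promise<never> {
  return new Promise<never>((_resolve, reject) => {
    setTimeout(() => reject(new Error("timed out waiting for 0x8C reply")), ms);
  });
}

const runExclusive = makeSerializer();

// First op times out; its rejection is mapped to a string for inspection.
const timedOut = runExclusive(() => neverReplies(10)).catch(
  (e: Error) => `rejected: ${e.message}`,
);

// Second op is queued behind the first and must still run.
const queued = runExclusive(async () => "telemetry ok");

const outcome = Promise.all([timedOut, queued]);
outcome.then((r) => console.log(r));
```

The caller of the first op still sees the rejection; only the internal chain swallows it, so the queue position is released without hiding the error.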
Thanks for the thorough review. Triage of the points raised:
CI is green; the added test brings the file to 24 passing.
Claude finished @Yeraze's task

PR Review: fix(meshcore): serialize 0x8C binary-response consumers
Follow-up Review (latest commit:
| Scenario | Test |
|---|---|
| Concurrent 0x8C ops don't steal each other's reply | serializes overlapping 0x8C consumers... (line 318) |
| Chain advances after timeout/rejection | advances the 0x8C chain when an op times out... (line 383) |
The implementation is correct, the test suite is comprehensive, and all previously raised concerns have been addressed or explicitly scoped out. The one minor nit (30ms timeout may be tight under heavy CI load) is not a blocker.
Approve.
…e-reject each other (#3725) (#3728)

* fix(meshcore): unified radio-op lock so raw-frame commands can't false-reject each other (#3725)

Several native-backend handlers send a raw frame and then once()-race the device's shared, untagged Ok/Sent/Err ack: discover_path, discover_nodes, request_regions, and set_device_time. Because the Err channel carries no correlation tag, a concurrent command's Err can fire the wrong handler's once(Err) and false-reject an unrelated command. This is the command-ack counterpart to the 0x8C reply-theft race fixed in #3722.

Generalize the per-instance 0x8C serializer into a single unified radio-op lock (runExclusiveRadioOp, issue #3725 option 2) that covers BOTH the command-ack window (Ok/Sent/Err) and the 0x8C reply window, and wrap all five fragile handlers in it. One chain, one mental model; the only long holder is request_regions' BinaryResponse wait, which it must hold anyway to keep request_telemetry from stealing its reply. The lock stays an instance field, so it remains per-source: concurrent sources never block each other.

Because library commands (e.g. send_message via sendTextMessage) are not locked and can still emit Err on the shared channel, request_regions now ignores Err once its Sent ack has arrived, so its multi-second reply wait is not exposed to a foreign Err. The discover_*/set_device_time ack windows are sub-millisecond.

discover_nodes' pendingDiscoverTag is now set inside the lock, just before send, so a queued discovery can't clobber a running one's tag.

Tests: request_regions ignores a post-Sent foreign Err; a command-ack op (discover_path) serializes against request_regions under the unified lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

* test(meshcore): add set_device_time serialization test under the unified lock

Addresses the Claude review's one completeness gap on #3728: set_device_time was wrapped in runExclusiveRadioOp but lacked a serialization test. Mirrors the discover_path test (concurrent request_regions, assert no ack-channel overlap), using a non-overlap pairing assertion since set_device_time has no await before the lock so acquisition order is microtask-dependent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
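The "ignore Err once Sent has arrived" rule from the commit above can be sketched with a plain Node `EventEmitter`. This is an illustration under assumptions: the event names `"Sent"`, `"Err"`, and `"BinaryResponse"`, the function `requestWithErrGate`, and the timeout value are hypothetical, since the real connection's event constants and handler are not shown in the source.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical handler: rejects on an Err that arrives before Sent, but
// treats any Err after Sent as belonging to some other, unlocked command.
function requestWithErrGate(
  conn: EventEmitter,
  send: () => void,
  timeoutMs: number,
): Promise<Uint8Array> {
  return new Promise<Uint8Array>((resolve, reject) => {
    let sentSeen = false;

    const cleanup = () => {
      clearTimeout(timer);
      conn.off("Sent", onSent);
      conn.off("Err", onErr);
      conn.off("BinaryResponse", onReply);
    };
    const onSent = () => {
      sentSeen = true;
    };
    const onErr = () => {
      if (sentSeen) return; // foreign Err during the reply wait: ignore
      cleanup();
      reject(new Error("Device rejected request"));
    };
    const onReply = (data: Uint8Array) => {
      cleanup();
      resolve(data);
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error("Timed out waiting for reply"));
    }, timeoutMs);

    conn.on("Sent", onSent);
    conn.on("Err", onErr);
    conn.on("BinaryResponse", onReply);
    send();
  });
}

// Usage: an Err after Sent does not reject; the later reply resolves.
const conn = new EventEmitter();
const gated = requestWithErrGate(conn, () => {}, 1000);
conn.emit("Sent");
conn.emit("Err");
conn.emit("BinaryResponse", new Uint8Array([1, 2, 3]));
gated.then((d) => console.log(Array.from(d)));
```

The gate narrows the window in which a foreign Err can false-reject the request to the short interval before Sent, which is the interval the unified lock protects against other locked handlers.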
Problem
request_regions and request_telemetry both await the shared PUSH_CODE_BINARY_RESPONSE (0x8C) push, which carries only a 4-byte correlation tag and no responder pubkey. request_regions sends via the raw sendToRadioFrame path, whose Sent ack has no expectedAckCrc, so it can't obtain a tag and falls back to accepting the first 0x8C reply it sees.

If a request_telemetry binary request (same 0x8C push, e.g. the periodic remote-telemetry scheduler) overlaps it on the same connection, the two race for that first reply and mis-attribute each other's payload: a telemetry CayenneLPP body gets parsed as a regions list, or vice versa. Worst case is garbage/empty region chips on a manual, user-triggered action.

This is a latent issue that survives the already-merged timeout fix (#3702 / 3a035711): that fix made the null-tag fallback work, but the fallback is tag-blind and the 0x8C event is shared.

Fix

- runExclusiveBinaryResponse: chains every 0x8C-awaiting op on a single per-instance promise so exactly one is listening at a time. Wraps both request_regions and request_telemetry.
- The lock is an instance field, so it is inherently source-aware: each source owns its own MeshCoreNativeBackend + physical connection. Same-connection 0x8C ops serialize; concurrent sources never block each other. A module-level global would have wrongly coupled sources; this doesn't.

Test

Adds a regression test firing request_regions and request_telemetry concurrently on one connection, asserting (a) each resolves with its own payload (regions = ['saxony'], not the telemetry bytes) and (b) the call log shows no overlap: each :send is immediately followed by its own :reply.

Verified: 481 MeshCore backend/manager tests pass (success=true); tsc clean for the changed file.

🤖 Generated with Claude Code
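The non-overlap assertion described in the Test section can be expressed as a small predicate over the call log. This is a sketch only: `hasNoOverlap` and the op names are hypothetical, and the `<op>:send` / `<op>:reply` entry format is inferred from the description above rather than copied from the test file.

```typescript
// Hypothetical helper: true when every "<op>:send" entry is immediately
// followed by the matching "<op>:reply", i.e. no two ops overlapped.
function hasNoOverlap(callLog: string[]): boolean {
  if (callLog.length % 2 !== 0) return false;
  for (let i = 0; i < callLog.length; i += 2) {
    const [op, phase] = callLog[i].split(":");
    if (phase !== "send" || callLog[i + 1] !== `${op}:reply`) return false;
  }
  return true;
}

// Serialized ops: each send is paired with its own reply.
console.log(
  hasNoOverlap(["regions:send", "regions:reply", "telemetry:send", "telemetry:reply"]),
);

// Interleaved ops: the second send lands before the first reply.
console.log(
  hasNoOverlap(["regions:send", "telemetry:send", "telemetry:reply", "regions:reply"]),
);
```

A pairing check like this does not depend on which op acquires the lock first, so it stays valid when acquisition order is microtask-dependent.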