fix(meshcore): unified radio-op lock so raw-frame commands can't false-reject each other (#3725) #3728
…e-reject each other (#3725)

Several native-backend handlers send a raw frame and then once()-race the device's shared, untagged Ok/Sent/Err ack: discover_path, discover_nodes, request_regions, and set_device_time. Because the Err channel carries no correlation tag, a concurrent command's Err can fire the wrong handler's once(Err) and false-reject an unrelated command. This is the command-ack counterpart to the 0x8C reply-theft race fixed in #3722.

Generalize the per-instance 0x8C serializer into a single unified radio-op lock (runExclusiveRadioOp, issue #3725 option 2) that covers BOTH the command-ack window (Ok/Sent/Err) and the 0x8C reply window, and wrap all five fragile handlers (the four above plus request_telemetry) in it. One chain, one mental model; the only long holder is request_regions' BinaryResponse wait, which it must hold anyway to keep request_telemetry from stealing its reply. The lock stays an instance field, so it remains per-source: concurrent sources never block each other.

Because library commands (e.g. send_message via sendTextMessage) are not locked and can still emit Err on the shared channel, request_regions now ignores Err once its Sent ack has arrived, so its multi-second reply wait is not exposed to a foreign Err. The discover_*/set_device_time ack windows are sub-millisecond.

discover_nodes' pendingDiscoverTag is now set inside the lock, just before send, so a queued discovery can't clobber a running one's tag.

Tests: request_regions ignores a post-Sent foreign Err; a command-ack op (discover_path) serializes against request_regions under the unified lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
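The false-reject described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the backend's code: `radio`, `awaitAck`, and `demoFalseReject` are hypothetical names, and a plain Node `EventEmitter` stands in for the device's shared ack channel.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for the device connection: acks are broadcast on
// shared, untagged channels, so every listener sees every ack.
const radio = new EventEmitter();

// Resolves on the first `okEvent`, rejects on the first "Err".
// Nothing ties an Err to the command that actually caused it.
function awaitAck(label: string, okEvent: "Ok" | "Sent"): Promise<string> {
  return new Promise((resolve, reject) => {
    const onOk = () => {
      radio.off("Err", onErr);
      resolve(`${label}: acked`);
    };
    const onErr = () => {
      radio.off(okEvent, onOk);
      reject(new Error(`${label}: rejected`));
    };
    radio.once(okEvent, onOk);
    radio.once("Err", onErr);
  });
}

// Turn a settled promise into a printable outcome string.
const settle = (p: Promise<string>) =>
  p.then((value) => value, (err: Error) => err.message);

async function demoFalseReject(): Promise<string[]> {
  // Two commands are in flight at once, each waiting on its own ack.
  const pathAck = awaitAck("discover_path", "Sent");
  const timeAck = awaitAck("set_device_time", "Ok");

  // Suppose the device rejects only set_device_time. The single untagged
  // Err still fires BOTH once("Err") listeners.
  radio.emit("Err");

  return Promise.all([settle(pathAck), settle(timeAck)]);
}
```

In this sketch both promises reject, even though only one command was refused; serializing the ack windows removes the overlap that makes this possible.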
|
Claude finished @Yeraze's task

PR Review: fix(meshcore): unified radio-op lock (#3725)

Summary

This PR introduces a unified per-instance mutex ( …

Code Quality & Design

The implementation is idiomatic and correct. The chain-continuation pattern:

    const run = this.radioOpChain.then(fn, fn);
    this.radioOpChain = run.then(() => {}, () => {});
    return run;

correctly (a) passes …

Per-source isolation is inherent, because …

Potential Issue: …
|
…ied lock

Addresses the Claude review's one completeness gap on #3728: set_device_time was wrapped in runExclusiveRadioOp but lacked a serialization test. Mirrors the discover_path test (concurrent request_regions, assert no ack-channel overlap), using a non-overlap pairing assertion, since set_device_time has no await before the lock and acquisition order is therefore microtask-dependent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
|
Thanks for the review. The one completeness gap, no serialization test for …

The other points were all self-resolved in your review (resolvePublicKey-outside-lock is intentional; the unlocked- …
|
Claude finished @Yeraze's task

PR Review: fix(meshcore): unified radio-op lock (#3725)

Summary

This PR introduces …

The implementation is correct and the design is sound. Notes below in descending order of significance.

Code Quality & Design

The chain-continuation pattern is a well-known async mutex idiom and is implemented correctly: …

No issues here.

Potential Issue: …
|
Closes #3725.
Problem
Four native-backend handlers send a raw frame via `sendToRadioFrame` and then `once()`-race the device's shared, untagged `Ok`/`Sent`/`Err` ack:

- `discover_path`: `Sent` vs `Err`
- `discover_nodes`: `Ok` vs `Err`
- `request_regions`: `Sent` (then `BinaryResponse`) vs `Err`
- `set_device_time`: `Ok` vs `Err`

Because the `Err` channel carries no correlation tag, a concurrent command's `Err` can fire the wrong handler's `once(Err)` and false-reject an unrelated command. This is the command-ack counterpart to the `0x8C` reply-theft race fixed in #3722.

Fix: unified lock (issue #3725, option 2)

Generalize the per-instance `0x8C` serializer (`runExclusiveBinaryResponse`) into a single `runExclusiveRadioOp` that covers both the command-ack window (`Ok`/`Sent`/`Err`) and the `0x8C` reply window, and wrap all five fragile handlers (the four above + `request_telemetry`) in it. One chain, one mental model.

- The only long holder is `request_regions`' `BinaryResponse` wait, which it must hold anyway to keep `request_telemetry` from stealing its reply. The `discover_*`/`set_device_time` windows are sub-millisecond.
- Library commands (e.g. `send_message` via `sendTextMessage`) aren't locked and can still emit `Err` on the shared channel. So `request_regions` now ignores `Err` once its `Sent` ack has arrived; its multi-second reply wait is no longer exposed to a foreign `Err` (this was the bot's exact #3722-review scenario). The brief `discover_*`/`set_device_time` ack windows are left as-is.
- `discover_nodes`' `pendingDiscoverTag` is now set inside the lock, just before send, so a queued discovery can't clobber a running one's tag while its `0x8E` responses are still arriving.
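For readers unfamiliar with promise-chain locks, here is a minimal, self-contained sketch of how a `runExclusiveRadioOp`-style serializer behaves. The three-line chaining idiom is the one quoted in the review; the `RadioSource` class, the `sleep` helper, and the demo operations are hypothetical stand-ins, not the backend's actual handlers.

```typescript
// Hypothetical per-source manager. Only the chaining idiom inside
// runExclusiveRadioOp mirrors the snippet quoted in the review.
class RadioSource {
  // Tail of the queue: by construction this promise never rejects.
  private radioOpChain: Promise<void> = Promise.resolve();

  runExclusiveRadioOp<T>(fn: () => Promise<T>): Promise<T> {
    // Start fn only after every previously queued op has settled.
    const run = this.radioOpChain.then(fn, fn);
    // Swallow the outcome so one failed op cannot wedge the queue.
    this.radioOpChain = run.then(() => {}, () => {});
    // The caller still sees fn's own result or rejection.
    return run;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demoSerialization(): Promise<string[]> {
  const source = new RadioSource();
  const log: string[] = [];

  // Build a fake radio op that records when it starts and ends.
  const op = (name: string, ms: number) => async () => {
    log.push(`${name}:start`);
    await sleep(ms);
    log.push(`${name}:end`);
    return name;
  };

  // Both ops are requested in the same tick; the lock runs them in order.
  await Promise.all([
    source.runExclusiveRadioOp(op("request_regions", 20)),
    source.runExclusiveRadioOp(op("discover_path", 1)),
  ]);

  // log: request_regions:start, request_regions:end,
  //      discover_path:start, discover_path:end
  return log;
}
```

Because the chain is an instance field, two `RadioSource` objects in this sketch queue independently, which matches the per-source isolation described above.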
- `request_regions` ignores a post-`Sent` foreign `Err` and still resolves with its region list (fails without the gate).
- A command-ack op (`discover_path`) serializes against `request_regions` under the unified lock, asserted via call-log ordering (no overlap on the ack channel).

486 MeshCore backend/manager tests pass (`success=true`); `tsc` clean for the changed file.

🤖 Generated with Claude Code
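The post-`Sent` `Err` gate exercised by the first test can be sketched roughly as follows. This is an illustration under stated assumptions: the `requestRegions` signature, the event names on a plain `EventEmitter`, the string-array payload, and the region names are all hypothetical, and a real handler would also need a reply timeout, which is omitted here.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical handler: waits for Sent, then for a BinaryResponse reply.
function requestRegions(radio: EventEmitter, send: () => void): Promise<string[]> {
  return new Promise((resolve, reject) => {
    let sentSeen = false;

    const cleanup = () => {
      radio.off("Sent", onSent);
      radio.off("Err", onErr);
      radio.off("BinaryResponse", onReply);
    };
    const onSent = () => {
      sentSeen = true;
    };
    const onErr = () => {
      // Once Sent has arrived our command was accepted, so a later Err
      // must belong to some other command on the shared channel.
      if (sentSeen) return;
      cleanup();
      reject(new Error("request_regions rejected"));
    };
    const onReply = (regions: string[]) => {
      cleanup();
      resolve(regions);
    };

    radio.once("Sent", onSent);
    radio.on("Err", onErr); // stays attached so later Errs can be ignored
    radio.on("BinaryResponse", onReply);
    send();
  });
}

async function demoGate(): Promise<string[]> {
  const radio = new EventEmitter();
  const pending = requestRegions(radio, () => {});
  radio.emit("Sent"); // our ack
  radio.emit("Err"); // foreign Err, e.g. from an unlocked library command
  radio.emit("BinaryResponse", ["region-a", "region-b"]); // made-up regions
  return pending; // resolves with ["region-a", "region-b"]
}
```

Without the `sentSeen` check, the foreign `Err` in `demoGate` would reject the request before its reply arrived, which is the failure the first test guards against.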