@@ -78,48 +80,59 @@ This achieves Ken's property that **outputs are only externalized after successf
78
80
79
81
`RemoteHandle` persists messages to `remotePending` before transmitting for a different reason: to enable retransmission on recovery if the transmission or ACK is lost. This is part of the at-least-once delivery mechanism, not the output validity mechanism.
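The persist-before-transmit pattern can be sketched as follows. This is a minimal illustration, not the actual `RemoteHandle` code: the class name, the in-memory `Map` standing in for persistent storage, and the per-message ACK are all assumptions made for the example.

```javascript
// Hypothetical sender. Only the name `remotePending` comes from the text above;
// everything else is illustrative.
class PendingSender {
  remotePending = new Map(); // seq -> message; stands in for persistent storage
  #nextSeq = 1;
  #transmit;

  constructor(transmit) {
    this.#transmit = transmit;
  }

  send(message) {
    const seq = this.#nextSeq++;
    this.remotePending.set(seq, message); // persist first
    this.#transmit(seq, message); // then transmit; the message or its ACK may be lost
    return seq;
  }

  // Assumed per-message ACK: the entry is dropped once the peer confirms it.
  handleAck(seq) {
    this.remotePending.delete(seq);
  }

  // On recovery, resend everything that was never acknowledged.
  retransmitPending() {
    for (const [seq, message] of this.remotePending) {
      this.#transmit(seq, message);
    }
  }
}

const wire = [];
const sender = new PendingSender((seq, message) => wire.push([seq, message]));
sender.send('hello');
sender.send('world');
sender.handleAck(1); // only seq 1 is acknowledged
sender.retransmitPending(); // seq 2 is sent again
```

Because unacknowledged messages may be sent more than once, this gives at-least-once delivery on its own; the receiver is responsible for discarding duplicates.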
80
82
81
-
### Remaining Gaps (Receive Side)
83
+
### Receive-Side Implementation (Issue #808)
82
84
83
-
The remaining gaps are on the **receive side** of remote messaging. Code review of `RemoteHandle.handleRemoteMessage()` revealed specific bugs:
85
+
The receive side of remote messaging implements Ken's exactly-once delivery guarantee through transactional message processing with duplicate detection.
84
86
85
-
#### 1. No Duplicate Detection (Bug)
87
+
#### Duplicate Detection
86
88
87
-
Ken maintains a `Done` table ensuring each message is delivered to the application **at most once**.
89
+
Ken maintains a `Done` table ensuring each message is delivered to the application **at most once**. Our implementation achieves this by checking `seq <= highestReceivedSeq` before processing:
this.#handleRemoteDeliver(params); // Always runs, even for duplicates!
100
97
```
101
98
102
-
**Problem**: There is no deduplication check. Even when `seq <= highestReceivedSeq`, the message is processed. After a crash and retransmit, duplicate messages will be delivered to the vat.
99
+
After a crash and retransmit, duplicate messages are detected and ignored.
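As a rough illustration of the check, the sketch below tracks `highestReceivedSeq` in memory and drops any message at or below it. The class name and the `delivered` array are hypothetical stand-ins for the real handler and the vat's run queue.

```javascript
// Hypothetical receiver illustrating the `seq <= highestReceivedSeq` check.
class DedupReceiver {
  #highestReceivedSeq = 0;
  delivered = []; // stands in for deliveries to the vat

  handleRemoteMessage(seq, params) {
    if (seq <= this.#highestReceivedSeq) {
      return false; // duplicate: already processed, ignore it
    }
    this.delivered.push(params);
    this.#highestReceivedSeq = seq;
    return true;
  }
}

const receiver = new DedupReceiver();
receiver.handleRemoteMessage(1, 'a');
receiver.handleRemoteMessage(2, 'b');
// The sender retransmits seq 2 after a crash; the receiver ignores it.
const retransmitAccepted = receiver.handleRemoteMessage(2, 'b');
```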
100
+
101
+
#### Transactional Message Processing
103
102
104
-
#### 2. Wrong Persistence Order (Bug)
103
+
Message processing is wrapped in a database savepoint to ensure atomicity:
**What's needed**: Process the message first (add to run queue), then persist `highestReceivedSeq`. Ideally these should be atomic.
129
+
This achieves atomicity: if a crash occurs before commit, both the run queue entry and the sequence update roll back together. The remote retransmits, and we process it correctly.
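The rollback behaviour can be sketched with an in-memory store. Everything here is an assumption made for illustration: `MemoryStore`, its savepoint methods, and the `failBeforeCommit` flag are hypothetical, and a real crash would not run a `catch` block; the database would simply discard the uncommitted work.

```javascript
// Hypothetical store: a savepoint is a copy of the state taken before processing.
class MemoryStore {
  state = { runQueue: [], highestReceivedSeq: 0 };
  #snapshots = new Map();

  createSavepoint(name) {
    this.#snapshots.set(name, {
      runQueue: [...this.state.runQueue],
      highestReceivedSeq: this.state.highestReceivedSeq,
    });
  }

  rollbackSavepoint(name) {
    this.state = this.#snapshots.get(name);
    this.#snapshots.delete(name);
  }

  releaseSavepoint(name) {
    this.#snapshots.delete(name);
  }
}

function receive(store, seq, message, failBeforeCommit = false) {
  if (seq <= store.state.highestReceivedSeq) {
    return 'duplicate';
  }
  store.createSavepoint('receive');
  try {
    store.state.runQueue.push(message); // process: add to run queue
    if (failBeforeCommit) {
      throw new Error('simulated crash before commit');
    }
    store.state.highestReceivedSeq = seq; // persist the sequence number
    store.releaseSavepoint('receive'); // commit
    return 'processed';
  } catch {
    store.rollbackSavepoint('receive'); // both changes are undone together
    return 'rolled back';
  }
}

const store = new MemoryStore();
const firstAttempt = receive(store, 1, 'deliver:1', true);
const retransmit = receive(store, 1, 'deliver:1');
const duplicate = receive(store, 1, 'deliver:1');
```

After the failed first attempt the run queue is empty and the sequence number is unchanged, so the retransmitted message is processed exactly once and any later copy is dropped.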
117
130
118
-
#### 3. FIFO Enforcement on Receive (Not a Gap)
131
+
#### FIFO Ordering
119
132
120
133
Ken enforces per-sender FIFO ordering via `next_ready()` which only delivers the next expected sequence number.
121
134
122
-
**Our situation**: We use TCP-based transports (libp2p streams) which guarantee in-order delivery during normal operation. Out-of-order arrival only occurs after a crash when the sender retransmits. With proper deduplication (fix #1 above), retransmitted messages for already-processed sequence numbers will be dropped, maintaining FIFO semantics.
135
+
**Our situation**: We use TCP-based transports (libp2p streams) which guarantee in-order delivery during normal operation. Out-of-order arrival only occurs after a crash when the sender retransmits. With duplicate detection, retransmitted messages for already-processed sequence numbers are dropped, maintaining FIFO semantics.
123
136
124
137
Therefore, explicit receive-side reordering is not required given our transport guarantees.
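To make the argument concrete, the sketch below applies the duplicate check to an arrival stream in which the sender retransmits sequence numbers 2 and 3 after a crash. The helper name is hypothetical. Note that it neither buffers nor detects gaps: it relies on the transport delivering new messages in order, as described above.

```javascript
// Hypothetical helper: returns the sequence numbers that would reach the vat.
function deliveredSequence(arrivals, highestReceivedSeq = 0) {
  const delivered = [];
  let highest = highestReceivedSeq;
  for (const seq of arrivals) {
    if (seq <= highest) {
      continue; // retransmitted message that was already processed
    }
    delivered.push(seq);
    highest = seq;
  }
  return delivered;
}

// 1, 2, 3 arrive; the sender crashes and resends 2 and 3 before sending 4.
const afterRetransmit = deliveredSequence([1, 2, 3, 2, 3, 4]);
```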
125
138
@@ -134,45 +147,12 @@ Therefore, explicit receive-side reordering is not required given our transport
134
147
| Consistent frontier | **Yes** | Each kernel's checkpoint is independent |
135
148
| Local recovery | **Yes** | Crashes don't affect other processes |
136
149
| Sender-based logging | **Yes** | Messages persisted in remotePending until ACKed |
This achieves atomicity without restructuring the existing message handling code. If a crash occurs before commit, both the run queue entry and the sequence update roll back together - the remote retransmits, and we process it correctly.
170
-
171
-
The transaction approach is simpler than reordering because `handleRemoteMessage` handles multiple message types (`deliver`, `resolve`, `gc`) with different processing paths, and reference slots require translation before persistence.
172
-
173
153
## Architectural Summary
174
154
175
-
**Send side (achieved with crank buffering):**
155
+
**Send side (crank buffering):**
176
156
```
177
157
Vat Crank:
178
158
vat processes message → syscalls buffer outputs
@@ -186,34 +166,31 @@ Later (separate operation):
186
166
187
167
The key insight: by the time RemoteHandle sees a message, the originating crank has already committed. Output validity is achieved.
188
168
189
-
**Receive side (bugs to fix):**
169
+
**Receive side (transactional processing):**
190
170
```
191
-
Current (buggy):
192
-
receive from network
193
-
→ persist highestReceivedSeq (WRONG: too early)
194
-
→ process message unconditionally (WRONG: no dedup)
195
-
→ add to run queue
196
-
197
-
Fixed (wrap in transaction):
198
-
receive from network
199
-
→ begin transaction
200
-
→ check seq <= highestReceivedSeq (skip if duplicate)
201
-
→ process message, add to run queue
202
-
→ persist highestReceivedSeq
203
-
→ commit transaction
171
+
receive from network
172
+
→ check seq <= highestReceivedSeq (skip if duplicate)
173
+
→ begin transaction (savepoint)
174
+
→ process message, add to run queue
175
+
→ persist highestReceivedSeq
176
+
→ commit transaction (release savepoint)
204
177
```
205
178
179
+
If a crash occurs before commit, both the run queue entry and the sequence update roll back together. The remote retransmits, and we process it correctly.