- last_bgp_reported_at updated on every write (not only on transitions)
- telemetry agent writes on state change or after periodic refresh interval (~1h)
- removed user.status == Activated validation constraint
- instruction variant changed from 94 to TBD (94-103 are taken)
- expanded UserBGPSession alternative with rejection rationale
- removed resolved open question about periodic reconfirmation writes
@@ -75,18 +78,21 @@ After each BGP socket collection tick in `collectBGPStateSnapshot`:
 1. Fetch activated users for this device from the serviceability program.
 2. Map each user to its BGP peer IP: `overlay_dst_ip = user.TunnelNet[0:4]`, last octet +1.
 3. For each user: Up if a socket with matching RemoteIP exists, Down otherwise.
-4. Enqueue one `SetUserBGPStatus` transaction per user into a non-blocking background
-worker. The worker retries failed submissions independently so that a single RPC
-error or congested transaction does not delay other users or block the collection
-tick. The metrics publisher keypair is already loaded in the telemetry agent.
+4. For each user, submit `SetUserBGPStatus` if: (a) the computed status differs from
+the last known onchain value, or (b) the last write was more than a configurable
+interval ago (e.g., 1h), to keep `last_bgp_reported_at` fresh for staleness
+detection. Submissions are enqueued into a non-blocking background worker that
+retries independently so that a single RPC error does not delay other users or
+block the collection tick. The metrics publisher keypair is already loaded in the
+telemetry agent.

 The raw TCP snapshot upload to S3 continues unchanged.

 ## Impact

 - Serviceability program: one new instruction, seventeen new bytes on User accounts (1 byte `bgp_status` + 8 bytes `last_bgp_up_at` + 8 bytes `last_bgp_reported_at`).
-- Telemetry agent: one extra RPC call per collection tick to fetch users; N transactions
-per tick (one per activated user on the device).
+- Telemetry agent: one extra RPC call per collection tick to fetch users; up to N transactions
+per tick (one per user whose status changed, or whose periodic refresh interval has elapsed).
 - Read SDKs (Go, TypeScript, Python): update User deserialization for the new field.

 ## Security Considerations
@@ -131,15 +137,10 @@ On a devnet device with at least one activated user:
 - Should there be a grace period before marking a session `Down`? A single missed tick
 due to a transient collection error would incorrectly transition an active user to
 `Down`. One option is to require N consecutive `Down` observations before writing.
-- Since the agent only writes on status changes, `last_bgp_reported_at` will not
-advance for stable sessions, making it impossible to distinguish a healthy long-lived
-`Up` session from a silent agent. Should the agent periodically send a reconfirmation
-write (e.g., every N days) even when the status has not changed, to keep
-`last_bgp_reported_at` fresh and preserve staleness detection?
 - Should we implement per-user rate limiting to prevent RPC saturation caused by
 constant BGP flaps? A user cycling Up/Down rapidly would generate a transaction on
-every tick; a cooldown window or minimum time-between-writes per user account could
-bound the worst-case submission rate.
+every state-change; a cooldown window or minimum time-between-writes per user account
+could bound the worst-case submission rate.
 - How should recurring circuit flaps be handled? A user whose BGP session repeatedly
 drops and recovers within short windows may indicate an unstable circuit rather than
 a transient error. Should the data model track a flap counter or a flap rate to