agent: simplify config polling by removing GetConfigHash RPC

Remove the separate GetConfigHash RPC endpoint and simplify the agent's
config polling logic based on PR feedback. The agent now:
- Fetches the full config every 5 seconds (as before)
- Computes a SHA256 hash locally to detect changes
- Only applies the config when it has changed or after a 60s timeout

This achieves the same CPU/load reduction goals with a simpler architecture:
- No duplicate logic between GetConfig and GetConfigHash
- Same performance (config only applied when changed)
- ~400 lines of code removed

The optimization benefits remain: EOS devices aren't hammered with
unchanged configs, reducing device CPU usage while maintaining the
5-second polling interval for responsiveness.
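
As an illustration of the local change detection described above, here is a minimal Go sketch. It is not taken from the commit: the function names `hashConfig` and `configChanged`, the hex encoding of the digest, and the sample config strings are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashConfig returns the hex-encoded SHA-256 digest of a config payload.
// The name and the hex encoding are illustrative assumptions.
func hashConfig(config []byte) string {
	sum := sha256.Sum256(config)
	return hex.EncodeToString(sum[:])
}

// configChanged reports whether a freshly fetched config differs from the
// last applied one, comparing digests instead of full payloads.
func configChanged(lastAppliedHash string, fetched []byte) bool {
	return hashConfig(fetched) != lastAppliedHash
}

func main() {
	// Hypothetical config contents, used only to exercise the helpers.
	applied := hashConfig([]byte("hostname dzd01\n"))

	fmt.Println(configChanged(applied, []byte("hostname dzd01\n"))) // false: identical bytes
	fmt.Println(configChanged(applied, []byte("hostname dzd02\n"))) // true: content differs
}
```

Only the digest of the last applied config needs to be kept between polls, so the agent does not have to retain the previous payload to detect a change.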
@@ -33,8 +36,6 @@ All notable changes to this project will be documented in this file.
- Add onchain parent DZD discovery to geoprobe-agent: periodically queries the Geolocation program for this probe's parent devices and resolves their metrics publisher keys from Serviceability, replacing the need for static `--parent-dzd` CLI flags. Static parents from CLI are merged with onchain parents, with onchain taking precedence for duplicate keys.
- Optimize inbound probe-measured RTT accuracy: pre-sign both TWAMP probes before network I/O so probe 1 fires immediately after reply 0 with no signing delay, measure Tx-to-Rx interval (reply 0 Tx → probe 1 Rx) instead of Rx-to-Rx to exclude processing overhead on both sides, use kernel `SO_TIMESTAMPNS` receive timestamps on the reflector, and add a 15ms busy-poll window on the sender to avoid scheduler wakeup latency
- Optimize outbound probe RTT accuracy: send a staggered warmup probe on a separate socket 2ms before the measurement probe to wake the reflector's thread, then take the min RTT of both
-- Device agents
-- Reduce config agent network and CPU usage by checking config checksums every 5 seconds, and reducing full config check frequency to 1m
controlplane/controller/README.md (43 additions, 79 deletions)
@@ -6,93 +6,57 @@ The controller generates device configurations from Solana smart contract state
### Agent-Controller Communication Flow
-The controller provides two gRPC endpoints, GetConfig and GetConfigHash, that the config agent (in ../agent/) uses to detect and apply configuration changes. The agent polls the controller every 5 seconds by default.
+The controller provides a gRPC endpoint (GetConfig) that returns both the configuration and its hash. The agent polls the controller every 5 seconds, but only applies the configuration to the EOS device when it has changed (based on hash comparison) or after a 60-second timeout.
-The design includes two optimizations:
-1. Applying configuration to an Arista EOS device causes the EOS ConfigAgent process CPU to spike, so the agent only applies the config when the config generated by the controller is different than the last polling cycle
-2. To make success more likely on lossy networks, GetConfigHash returns only the hash (64 bytes) instead of the full config (~50KB+)
+The design includes an optimization to reduce EOS device CPU usage:
+- Applying configuration to an Arista EOS device causes the EOS ConfigAgent process CPU to spike
+- The agent computes a SHA256 hash of the received config and only applies it when:
+  1. The hash differs from the last applied configuration, OR
+  2. 60 seconds have elapsed since the last application (as a safety measure)