Split TCP RR infrastructure from the workload #1905
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1785a95bb9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
/// Flow count is *not* configured here — it is owned by the
/// `tcp_rr` blackhole and communicated to the generator over the control port
/// during startup.
Preserve existing tcp_rr flow configs
When an existing tcp_rr generator config contains the previously documented flows field, this new schema rejects it because Config still uses deny_unknown_fields; moving the field solely to the blackhole means those user configs fail at startup rather than running, despite /workspace/lading/AGENTS.md stating that user configs need stability. Please keep accepting the generator-side field during a migration or otherwise translate it instead of making existing experiment configs invalid.
It is fine. We do not have many users for this.
/// Control-channel handshake: server writes `flows` to the accepted control
/// connection as a 2-byte big-endian `u16` and closes; client reads the same
/// 2 bytes after connecting. Internal protocol — no magic / version byte.
This newly added doc comment contains a non-ASCII em dash, and /workspace/lading/AGENTS.md requires US-ASCII only in code and documentation. There are multiple new occurrences in this file, so replace these punctuation characters with ASCII equivalents before landing.
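A minimal sketch of the control-channel handshake described in the doc comment above, using only `std::net`: the server accepts one control connection, writes `flows` as a 2-byte big-endian `u16`, and closes; the client connects and reads exactly 2 bytes. The function names `serve_flows` and `fetch_flows` and the 5-second `HANDSHAKE_TIMEOUT` value are hypothetical; the PR's actual code is not reproduced here.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

// Hypothetical value; the PR's real HANDSHAKE_TIMEOUT is not shown here.
const HANDSHAKE_TIMEOUT: Duration = Duration::from_secs(5);

/// Server side: accept one control connection and write `flows` as a
/// 2-byte big-endian u16. The connection closes when `conn` is dropped.
fn serve_flows(listener: &TcpListener, flows: u16) -> io::Result<()> {
    let (mut conn, _peer) = listener.accept()?;
    // Bound the write so a peer that connects but never reads cannot hang us.
    conn.set_write_timeout(Some(HANDSHAKE_TIMEOUT))?;
    conn.write_all(&flows.to_be_bytes())?;
    Ok(())
}

/// Client side: connect, then read exactly 2 bytes and decode them.
fn fetch_flows(addr: SocketAddr) -> io::Result<u16> {
    let mut conn = TcpStream::connect(addr)?;
    conn.set_read_timeout(Some(HANDSHAKE_TIMEOUT))?;
    let mut buf = [0u8; 2];
    // read_exact fails with UnexpectedEof if the server closes early.
    conn.read_exact(&mut buf)?;
    Ok(u16::from_be_bytes(buf))
}

fn main() -> io::Result<()> {
    // Port 0 asks the OS for any free loopback port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || serve_flows(&listener, 300));
    let flows = fetch_flows(addr)?;
    server.join().expect("server thread panicked")?;
    println!("flows = {flows}");
    Ok(())
}
```

Because the payload is a `u16`, this framing caps the flow count at 65,535, and `read_exact` turns a short read (server closed after one byte) into an error rather than a truncated value.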
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f74a091536
// against a generator that connects but never reads.
conn.set_write_timeout(Some(HANDSHAKE_TIMEOUT))
    .expect("set_write_timeout on accepted TcpStream must succeed");
conn.write_all(&flows_bytes)?;
Clean up workers when control handshake fails
If the accepted control connection closes or resets before the 2-byte handshake is written (for example, the generator is killed right after connecting, or a probe hits the control port), this ? returns from run_server before shutdown_flag is set and before the worker/metrics handles are joined. Dropping those JoinHandles detaches the data listener threads, leaving sockets and metrics work running even though the blackhole task reported an error, which can make subsequent runs in the same process collide with the still-bound data port.
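One way to address this review point, sketched under assumptions: capture the handshake result instead of returning early with `?`, then set the shutdown flag and join every worker before reporting the error. `run_server` here is a simplified hypothetical with a pluggable `handshake` closure; `spawn_worker` and the `exited` counter (present only so the example can show that every thread was joined) are likewise illustrative, and the PR's real data listeners and metrics handles are not modeled.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a data-listener worker: spins until shutdown
/// is signalled, then records that it exited.
fn spawn_worker(shutdown: Arc<AtomicBool>, exited: Arc<AtomicUsize>) -> JoinHandle<()> {
    thread::spawn(move || {
        while !shutdown.load(Ordering::Acquire) {
            thread::sleep(Duration::from_millis(1));
        }
        exited.fetch_add(1, Ordering::SeqCst);
    })
}

/// Runs `handshake`, then signals shutdown and joins every worker on all
/// exit paths before reporting the handshake result.
fn run_server<F>(workers: usize, exited: &Arc<AtomicUsize>, handshake: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    let shutdown = Arc::new(AtomicBool::new(false));
    let handles: Vec<JoinHandle<()>> = (0..workers)
        .map(|_| spawn_worker(Arc::clone(&shutdown), Arc::clone(exited)))
        .collect();

    // Capture the result instead of propagating it with `?`, which would
    // drop (and thereby detach) the JoinHandles while workers keep running.
    let result = handshake();

    shutdown.store(true, Ordering::Release);
    let mut panicked = false;
    for handle in handles {
        panicked |= handle.join().is_err();
    }

    // Only now, with every worker joined, surface the handshake error.
    result?;
    if panicked {
        return Err(io::Error::new(io::ErrorKind::Other, "worker thread panicked"));
    }
    Ok(())
}

fn main() {
    let exited = Arc::new(AtomicUsize::new(0));
    // Simulate a generator that resets the control connection mid-handshake.
    let result = run_server(4, &exited, || {
        Err(io::Error::new(io::ErrorKind::ConnectionReset, "peer reset"))
    });
    assert!(result.is_err());
    // Every worker was joined before run_server returned.
    assert_eq!(exited.load(Ordering::SeqCst), 4);
}
```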
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86bb9725e0
registry
    .reregister(&mut flow.stream, flow.token, interest)
    .expect("reregister of a live, owned flow must succeed");
Handle poll registry errors without panicking
When mio returns an io::Error from reregister (and the same new expect pattern is used for deregister just below), this worker thread panics instead of converting the flow to an error/removal path. In a live run, a registry error would stop that thread's event loop and only surface as ThreadPanicked after shutdown, which can corrupt the experiment's results; it also violates the repository rule in AGENTS.md that recoverable errors must not panic.
We do not consider this a recoverable error. A live flow must always allow reregistration.
What does this PR do?
This PR refactors the tcp_rr workload to split the infrastructure elements into a separate file. This will allow subsequent workloads to reuse the same thread and flow management code.
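The refactor described above can be pictured as a seam between shared flow management and per-workload behavior. The sketch below assumes a hypothetical `Workload` trait and a `drive` loop; it deliberately ignores threads, sockets, and mio and only illustrates how a second workload could reuse the same driver by implementing the trait. None of these names come from the PR.

```rust
/// Hypothetical per-flow behavior that a workload plugs into the shared
/// infrastructure. A request/response workload like tcp_rr fits this shape.
trait Workload {
    /// Produce the next bytes to send on `flow`.
    fn next_request(&mut self, flow: usize) -> Vec<u8>;
    /// Inspect the peer's reply on `flow`; return true if it is valid.
    fn on_response(&mut self, flow: usize, reply: &[u8]) -> bool;
}

/// Hypothetical shared driver: owns flow bookkeeping and the round loop,
/// delegating every payload decision to the workload. Returns the number
/// of valid responses observed.
fn drive<W, T>(workload: &mut W, flows: usize, rounds: usize, mut transport: T) -> usize
where
    W: Workload,
    T: FnMut(usize, &[u8]) -> Vec<u8>,
{
    let mut ok = 0;
    for _ in 0..rounds {
        for flow in 0..flows {
            let req = workload.next_request(flow);
            let reply = transport(flow, &req);
            if workload.on_response(flow, &reply) {
                ok += 1;
            }
        }
    }
    ok
}

/// A toy request/response workload: fixed-size request, expects an echo.
struct EchoRr {
    size: usize,
}

impl Workload for EchoRr {
    fn next_request(&mut self, flow: usize) -> Vec<u8> {
        vec![flow as u8; self.size]
    }
    fn on_response(&mut self, flow: usize, reply: &[u8]) -> bool {
        reply.len() == self.size && reply.iter().all(|&b| b == flow as u8)
    }
}

fn main() {
    let mut wl = EchoRr { size: 8 };
    // An in-memory "transport" that echoes the request, standing in for sockets.
    let ok = drive(&mut wl, 4, 10, |_flow, req| req.to_vec());
    println!("{ok} valid responses");
}
```

A second workload would supply its own `Workload` impl (a different payload or response check) without touching `drive`.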
Motivation
Related issues
Additional Notes
PR stack
[1] #1905 <-- This
[2] #1906