feat: Docker-based E2E test framework with chaos testing #1462
AlexCheema wants to merge 23 commits into main from
Conversation
Add a Python/asyncio E2E test framework that spins up 2-node exo clusters in Docker Compose and verifies cluster formation, discovery, election, and API health. Includes a no-internet chaos test using DNS blocking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The runner was running out of disk space during the Docker image build (Rust compilation + Python deps). Remove unused toolchains first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clean up Rust target/ and cargo registry after uv sync in the same RUN command so build artifacts aren't committed to the layer (~1-2 GB saved). Also remove more unused toolchains from the CI runner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use iptables to block all outbound traffic except private subnets and multicast (for mDNS discovery). Verify internet is blocked by curling huggingface.co from inside each container and checking exo logs for "Internet connectivity: False". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
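A minimal sketch of what such a rule set could look like, expressed as a list of iptables command strings (the exact subnets and rule ordering the test uses are assumptions for illustration, not the committed firewall script):

```python
# Illustrative no-internet rule set: allow outbound traffic to private
# subnets and the mDNS multicast range, drop everything else.
PRIVATE_SUBNETS = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
MDNS_MULTICAST = "224.0.0.0/4"  # covers 224.0.0.251 used by mDNS

def iptables_rules() -> list[str]:
    """Build the ACCEPT rules first, then a default-deny DROP."""
    rules = [
        f"iptables -A OUTPUT -d {net} -j ACCEPT"
        for net in PRIVATE_SUBNETS + [MDNS_MULTICAST]
    ]
    rules.append("iptables -A OUTPUT -j DROP")  # everything else is blocked
    return rules
```

With rules in this order, node-to-node traffic and mDNS discovery keep working while any attempt to reach huggingface.co (or anything else public) is dropped.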
Launch mlx-community/Qwen3-0.6B-4bit on the cluster, send a chat completion with seed=42 and temperature=0, and verify the output matches a committed snapshot. Tests inference determinism end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
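The deterministic request can be sketched as follows — the payload follows the OpenAI-compatible chat completions schema, but the helper names and base-URL handling here are illustrative, not exo's actual test code:

```python
# Sketch of the snapshot test's chat request: fixed seed and zero
# temperature so repeated runs produce identical output.
import json
import urllib.request

def build_chat_request(model: str, prompt: str, seed: int = 42,
                       temperature: float = 0.0, max_tokens: int = 32) -> dict:
    """Build an OpenAI-style chat completion payload with a pinned seed."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "seed": seed,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def send_chat(base_url: str, payload: dict) -> str:
    """POST the payload to the cluster's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

payload = build_chat_request("mlx-community/Qwen3-0.6B-4bit", "What is 2+2?")
```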
MLX CPU inference on x86_64 is too slow for CI runners (~10min+ for a single request). Mark the inference snapshot test as slow so it's skipped by default. Run with --slow or E2E_SLOW=1 on Apple Silicon. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…st collection
The tests/start_distributed_test.py script calls sys.exit() at module level, which crashes pytest collection. Exclude it via collect_ignore.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add e2e/snapshot.py with assert_snapshot() for deterministic regression testing. On first run, saves inference output as the expected snapshot. On subsequent runs, compares against it with unified diff on mismatch. Set UPDATE_SNAPSHOTS=1 or pass --update-snapshots to regenerate. Refactor test_inference_snapshot.py to use the shared infrastructure and drop temperature=0 in favor of seed-only determinism. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nd edge cases
Expand e2e snapshot coverage beyond the single 'What is 2+2?' test:
- test_snapshot_code_gen.py: code generation prompt (max_tokens=64)
- test_snapshot_reasoning.py: step-by-step math reasoning (max_tokens=64)
- test_snapshot_long_output.py: longer response with max_tokens=128
- test_snapshot_edge.py: single word, special chars, and unicode prompts
All use seed=42 and the shared assert_snapshot() infrastructure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MLX already supports x86 CPU via mlx[cpu] and the Dockerfile has the
GCC workaround for CPU JIT. The only barriers were the 'slow' markers
causing tests to be skipped in CI.
Changes:
- Remove 'slow' marker from all snapshot tests so they run by default
- Make snapshots architecture-aware (snapshots/{arch}/{name}.json) since
floating-point results differ between x86_64 and arm64
- Store architecture in snapshot metadata
- Increase CI timeout from 30 to 45 minutes for model download + CPU inference
- Update docstrings to remove Apple Silicon requirement
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
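The architecture-aware path scheme above can be sketched in a few lines (the helper name is illustrative):

```python
# Snapshot path resolution: snapshots/{arch}/{name}.json, because
# floating-point inference results differ between x86_64 and arm64.
import platform
from pathlib import Path

def snapshot_path(name: str, root: Path = Path("snapshots")) -> Path:
    arch = platform.machine()  # e.g. "x86_64", "arm64", or "aarch64"
    return root / arch / f"{name}.json"
```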
Pre-build the Docker image using docker/build-push-action with GitHub Actions cache (type=gha). On cache hit, the image loads from cache instead of rebuilding (~12min → seconds).
Changes:
- CI: set up buildx, build image with --cache-from/--cache-to type=gha
- docker-compose.yml: add image tag (exo-e2e:latest) so compose uses the pre-built image instead of rebuilding
- conftest.py: Cluster.build() skips if exo-e2e:latest already exists (pre-built in CI), falls back to docker compose build for local dev
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add e2e snapshot test that exercises 3 different model architectures
to catch model-specific regressions:
- SmolLM2-135M-Instruct (tiny llama, bf16, ~269MB)
- Llama-3.2-1B-Instruct-4bit (small llama, 4bit, ~730MB)
- gemma-2-2b-it-4bit (gemma2 architecture, 4bit, ~1.5GB)
Each model gets its own snapshot file. All use the same prompt
("What is the capital of France?"), seed=42, max_tokens=32.
Also adds model cards for SmolLM2-135M-Instruct and gemma-2-2b-it-4bit
(Llama-3.2-1B-Instruct-4bit already had one).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two issues prevented MLX CPU from working on x86_64 in Docker:
1. Missing BLAS/LAPACK libraries: MLX CPU backend requires libblas-dev, liblapack-dev, and liblapacke-dev on Linux. Added to apt-get install.
2. g++ wrapper ordering: The -fpermissive wrapper for GCC 14 was installed AFTER uv sync, but MLX may compile extensions during install. Moved the wrapper BEFORE uv sync so both build-time and runtime JIT compilation benefit from the fix.
MLX publishes manylinux_2_35_x86_64 wheels, so this uses the native CPU backend — no alternative inference framework needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add proactive monitoring to detect runner process death and unresponsiveness:
- Health check loop polls is_alive() every 1s, detects unexpected exits
- Counter-based heartbeat detects frozen/unresponsive processes
- Emits RunnerFailed event and releases pending task waiters on failure
- Add EXO_RUNNER_MUST_DIE debug trigger for testing abrupt process death
- Add chaos E2E test that kills runner mid-inference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lection
Add root conftest.py to exclude tests/start_distributed_test.py from pytest collection (it calls sys.exit at module level). Fix ruff lint issues (import sorting, f-string without placeholders, lambda loop variable capture) and apply nix fmt formatting to e2e files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snapshot tests do MLX inference on x86 CPU in Docker which takes >600s per test, causing the 45-minute CI job to timeout. Only cluster_formation and no_internet (non-inference tests) should run in CI. Inference snapshot tests can be run locally with --slow or E2E_SLOW=1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Scope e2e workflow to only trigger on pushes to the e2e-tests branch (not every branch push)
- Add temperature=0 to remaining snapshot test chat calls for deterministic output
- Make assert_snapshot fail when no baseline exists instead of silently creating one — baselines must be explicitly generated with UPDATE_SNAPSHOTS=1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docker mDNS discovery can be slow on first boot in CI, causing cluster_formation to timeout on "Nodes discovered each other" while subsequent tests pass fine. Retry failed tests once before counting them as real failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
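The retry-once policy amounts to a small wrapper — a flaky failure only counts if the test fails twice in a row. Names here are illustrative, not the actual runner code:

```python
# Run a test, retrying once before counting it as a real failure.
from typing import Callable

def run_with_retry(test: Callable[[], None], retries: int = 1) -> bool:
    """Return True if the test passes within retries + 1 attempts."""
    for attempt in range(retries + 1):
        try:
            test()
            return True
        except Exception:
            if attempt == retries:
                return False
    return False

attempts = {"n": 0}

def flaky():
    """Simulates slow first-boot mDNS: fails only on the first attempt."""
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("nodes not discovered yet")

def always_fails():
    raise RuntimeError("real failure")
```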
After merging main (api cancellation #1276), the RunnerSupervisor dataclass requires a _cancel_sender field. Update the test helper to create and pass this channel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review: Docker-based E2E test framework with chaos testing

Large, well-engineered PR (24 files, +1399/-5, 23 commits) adding E2E test infrastructure plus a RunnerSupervisor health check.

CI Failures
e2e jobs (FAILING); aarch64-darwin (HANGING ~6 hours): root cause is the

E2E Framework — Well-Designed

E2E Tests — Good Coverage

RunnerSupervisor Health Check — Well-Designed, Minor Concerns
Good:

Concerns:

Security Note
Debug triggers (

Minor

Verdict
Substantial, well-engineered addition to the project. The E2E framework is clean and the RunnerSupervisor health check is sound. Two blockers:

Once those are resolved, recommend squashing the 23 commits and merging. Review only — not a merge approval.
## Summary

- `MpReceiver.close()` did not unblock threads stuck on `queue.get()` in `receive_async()`, causing abandoned threads (via `abandon_on_cancel=True`) to keep the Python process alive indefinitely after tests pass
- This caused the `aarch64-darwin` CI jobs in PR #1462 to hang for ~6 hours until the GitHub Actions timeout killed them
- Sends an `_MpEndOfStream` sentinel before closing the buffer, mirroring what `MpSender.close()` already does

## Test plan

- [x] `uv run basedpyright` — 0 errors
- [x] `uv run ruff check` — clean
- [x] `nix fmt` — 0 changed
- [x] `uv run pytest` — 188 passed, 1 skipped in 12s (no hang)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: rltakashige <rl.takashige@gmail.com>
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
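The fix can be reproduced with a toy sketch (assuming a plain `queue.Queue`; the `Receiver` class and `_EndOfStream` sentinel are illustrative stand-ins, not exo's actual `MpReceiver`): a thread blocked on `get()` only wakes because `close()` enqueues the sentinel first.

```python
# Toy end-of-stream sentinel: close() pushes a sentinel so any thread
# blocked on queue.get() wakes up instead of hanging forever.
import queue
import threading

class _EndOfStream:
    """Sentinel marking stream close, mirroring the sender side."""

class Receiver:
    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()

    def receive(self):
        item = self._q.get()           # blocks until data or sentinel
        if isinstance(item, _EndOfStream):
            return None                # stream closed
        return item

    def close(self) -> None:
        self._q.put(_EndOfStream())    # unblocks any waiting receive()

rx = Receiver()
results = []
t = threading.Thread(target=lambda: results.append(rx.receive()))
t.start()
rx.close()        # without the sentinel, t would block on get() forever
t.join(timeout=5)
```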
**Enabling peers to be discovered in environments where mDNS is unavailable (SSH sessions, headless servers, Docker).**

## Motivation

Exo discovers peers exclusively via mDNS, which works great on a local network but breaks once you move beyond a single L2 broadcast domain:

- SSH sessions on macOS — TCC blocks mDNS multicast from non-GUI sessions (#1488)
- Headless servers/rack machines — #1682 ("DGX Spark does not find other nodes")
- Docker Compose — mDNS is often unavailable across container networks; e.g. #1462 (E2E test framework) needs an alternative

Related work: #1488 (working implementation by @AlexCheema, closed because SSH had a GUI workaround), #1023 (Headscale WAN, closed due to merge conflicts), #1656 (discovery cleanup, open).

This PR introduces an optional bootstrap mechanism for peer discovery while leaving the existing mDNS behavior unchanged.

## Changes

Adds two new CLI flags:

- `--bootstrap-peers` (env: `EXO_BOOTSTRAP_PEERS`) — comma-separated libp2p multiaddrs to dial on startup and retry periodically
- `--libp2p-port` — fixed TCP port for libp2p to listen on (default: OS-assigned). Required when using bootstrap peers, so other nodes know which port to dial.

8 files:

- `rust/networking/src/discovery.rs`: store bootstrap addrs, dial in existing retry loop
- `rust/networking/src/swarm.rs`: thread `bootstrap_peers` parameter to `Behaviour`
- `rust/networking/examples/chatroom.rs`: updated call site for new `create_swarm` signature
- `rust/networking/tests/bootstrap_peers.rs`: integration tests
- `rust/exo_pyo3_bindings/src/networking.rs`: accept optional `bootstrap_peers` in PyO3 constructor
- `rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi`: update type stub
- `src/exo/routing/router.py`: pass peers to `NetworkingHandle`
- `src/exo/main.py`: `--bootstrap-peers` CLI arg + `EXO_BOOTSTRAP_PEERS` env var

## Why It Works

Bootstrap peers are dialed in the existing retry loop — the same path taken by mDNS-discovered peers. The swarm handles connection, Noise handshake, and gossipsub mesh joining from there. A PeerId is intentionally not required in the multiaddr; the Noise handshake discovers it.

Docker Compose example:

```yaml
services:
  exo-1:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-2/tcp/30000"
  exo-2:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-1/tcp/30000"
```

## Test Plan

### Manual Testing

<details>
<summary>Docker Compose config</summary>

```yaml
services:
  exo-node1:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node1
    hostname: exo-node1
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.3/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52415:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.2
    deploy:
      resources:
        limits:
          memory: 4g
  exo-node2:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node2
    hostname: exo-node2
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.2/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52416:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.3
    deploy:
      resources:
        limits:
          memory: 4g
networks:
  bootstrap-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.20.0/24
```

</details>

Two containers on a bridge network (`172.30.20.0/24`), fixed IPs, `--libp2p-port 30000`, cross-referencing `--bootstrap-peers`. Both nodes found each other, established a connection, then ran the election protocol.

### Automated Testing

4 Rust integration tests in `rust/networking/tests/bootstrap_peers.rs` (`cargo test -p networking`):

| Test | What it verifies | Result |
|------|------------------|--------|
| `two_nodes_connect_via_bootstrap_peers` | Node B discovers Node A via bootstrap addr (real TCP connection) | PASS |
| `create_swarm_with_empty_bootstrap_peers` | Backward compatibility — no bootstrap peers works | PASS |
| `create_swarm_ignores_invalid_bootstrap_addrs` | Invalid multiaddrs silently filtered | PASS |
| `create_swarm_with_fixed_port` | `listen_port` parameter works | PASS |

All 4 pass. The connection test takes ~6s.

Signed-off-by: DeepZima <deepzima@outlook.com>
Co-authored-by: Evan <evanev7@gmail.com>
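The flag parsing with silent filtering of invalid entries (as exercised by `create_swarm_ignores_invalid_bootstrap_addrs`) can be sketched in Python — the validity check here is a simplified stand-in for real multiaddr parsing, and the function name is illustrative:

```python
# Parse EXO_BOOTSTRAP_PEERS: comma-separated libp2p multiaddrs,
# silently dropping entries that don't look like valid addresses.
def parse_bootstrap_peers(raw: str) -> list[str]:
    addrs = [a.strip() for a in raw.split(",") if a.strip()]
    # A real implementation would parse each entry as a Multiaddr;
    # here we only keep entries shaped like /ip4/.../tcp/... paths.
    return [a for a in addrs if a.startswith("/") and "/tcp/" in a]
```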
Motivation
We had no end-to-end testing for exo clusters. Unit tests can't catch issues in node discovery, master election, or multi-node coordination. We also need a framework for chaos testing (network failures, partitions, etc.) to build confidence in cluster resilience.
Changes
Adds a Python/asyncio E2E test framework that spins up 2-node exo clusters in Docker Compose:
- `e2e/Dockerfile` — Multi-stage build: Node.js dashboard → Rust nightly + Python 3.13. Cleans up Rust build artifacts to keep the image small. Includes a g++ wrapper for MLX CPU JIT compatibility with GCC 14.
- `e2e/conftest.py` — `Cluster` class wrapping docker compose: build, start, stop, logs, exec, `place_model`, `chat`. Async context manager with automatic cleanup.
- `e2e/run_all.py` — Test runner discovering `test_*.py` files. Supports `--slow`/`E2E_SLOW=1` to include inference tests.
- `test_cluster_formation` — Nodes discover each other via mDNS, elect a master, API responds.
- `test_no_internet` — iptables blocks all outbound traffic except private subnets and multicast. Verifies the cluster forms without internet and confirms connectivity is actually blocked (curl + exo's own "Internet connectivity: False" log).
- `test_inference_snapshot` (slow) — Launches `mlx-community/Qwen3-0.6B-4bit`, sends a chat completion with `seed=42, temperature=0`, verifies output matches a committed snapshot. Skipped in CI (x86 MLX CPU too slow); runs on Apple Silicon with `--slow`.
- `.github/workflows/e2e.yml` — CI workflow on push/PR. Frees disk space before the Docker build (Rust compilation is heavy).
Why It Works
- iptables is used for internet blocking rather than Docker's `internal: true` (which blocks multicast, breaking mDNS).
- `NET_ADMIN` capability lets containers set up firewall rules before starting exo.
- `.venv/bin/exo` is called directly instead of `uv run`, avoiding PyPI resolution at container startup.
- `mx.random.seed()` with `temperature=0` for reproducible output.
Test Plan
Manual Testing
All 3 tests pass locally on macOS (Apple Silicon, Docker Desktop):
Automated Testing
CI runs `cluster_formation` and `no_internet` (2/3 passed, 1 skipped): https://github.com/exo-explore/exo/actions/runs/21961324819