Summary
Following the README quickstart on a single Mac (M4 Pro, macOS Sequoia) — i.e. running parallax run and parallax join on the same physical machine — the worker repeatedly fails to discover the scheduler via libp2p's mDNS, even though both processes are on 127.0.0.1. This makes the single-host setup path effectively unreliable for Mac users.
This is a separate issue from #459 (vllm_version handshake fix); both surface from the same single-host Mac flow.
Environment
- Hardware: Mac mini M4 Pro, 64 GB unified memory
- OS: macOS 15.x (Sequoia)
- Network: home Wi-Fi, single en1 interface (192.168.110.16/24), default route healthy
- parallax: HEAD 328c99f (also reproducible on current main)
- Same machine runs both parallax run -m Qwen/Qwen3-0.6B -n 1 and parallax join
Symptom
Worker log floods with:
ERROR libp2p_mdns::behaviour::iface: error sending packet on iface address
No route to host (os error 65) address=192.168.110.16
[parallax] WARNING server.py:883 No peers found or scheduler peer id not found, waiting for 1 second.
... (repeats 20+ times) ...
[parallax] ERROR server.py:491 Failed to get scheduler peer id
[parallax] ERROR server.py:500 Failed to build lattica
[parallax] ERROR launch.py:234 Timeout waiting for layer allocation from scheduler
RuntimeError: Failed to get layer allocation from scheduler
Scheduler is alive and listening on 127.0.0.1:3001 throughout; worker simply cannot reach it via libp2p discovery.
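For anyone reproducing this, a plain TCP connect is enough to confirm the scheduler port is reachable independently of libp2p. This is a minimal sketch, not part of parallax; the helper name is_listening is hypothetical, and the host and port are the ones from the log above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection performs a full TCP handshake; success means
        # some process is accepting connections on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable addresses.
        return False


if __name__ == "__main__":
    # Address taken from this report; adjust if the scheduler uses another port.
    print("scheduler reachable over TCP:", is_listening("127.0.0.1", 3001))
```

If this prints True while the worker still times out, the failure is in peer discovery rather than in basic loopback connectivity.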
What I tried (all failed)
- parallax join (default, --scheduler-addr auto): mDNS discovery times out
- parallax join -s 12D3Koo... (explicit scheduler peer id): the worker knows the ID but still cannot establish a connection
- parallax join -s ... -r (with --use-relay): the scheduler is not on the relay either; both ends would need -r, and that path routes a same-host setup through the public network
Interestingly, the same setup occasionally did work earlier in the same session (probably libp2p DHT / bootstrap luck), but is now consistently broken. The non-determinism makes the README quickstart frustrating for first-time Mac users.
Why this happens (best-effort analysis)
- macOS routing table shows the multicast route to 224.0.0.251 (mDNS) is present and en1 is healthy
- Yet libp2p's mDNS behaviour (libp2p_mdns) reports "No route to host" (os error 65, which is EHOSTUNREACH on macOS) for outbound multicast on the active interface
- Common candidates: macOS App Sandbox restrictions on multicast send, IPv4/IPv6 source-address selection mismatch, or rust-libp2p mDNS interface-enumeration quirks
- I have not pinpointed the exact macOS-side cause — but the user-facing root issue is that parallax requires libp2p to work even for same-host scheduler + worker, exposing every libp2p-on-macOS papercut to single-machine users
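One way to narrow this down is to attempt the same kind of multicast send with a plain UDP socket, outside libp2p. The sketch below is a diagnostic aid under my own assumptions, not something parallax or libp2p ships: probe_multicast_send and ProbeResult are hypothetical names, and the one-byte payload is deliberately not a valid mDNS message.

```python
import errno
import socket
from typing import NamedTuple, Optional

MDNS_GROUP = "224.0.0.251"  # IPv4 mDNS multicast group
MDNS_PORT = 5353            # standard mDNS UDP port


class ProbeResult(NamedTuple):
    ok: bool
    errno_code: Optional[int]
    errno_name: Optional[str]


def probe_multicast_send(iface_addr: str) -> ProbeResult:
    """Try one outbound UDP datagram to the mDNS group from iface_addr."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            # Pin the outgoing multicast interface by its IPv4 address.
            s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                         socket.inet_aton(iface_addr))
            # TTL 1 keeps the datagram on the local link.
            s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
            s.sendto(b"\x00", (MDNS_GROUP, MDNS_PORT))
        return ProbeResult(True, None, None)
    except OSError as exc:
        code = exc.errno
        # errno names are platform-specific, so look them up at runtime.
        name = errno.errorcode.get(code) if code is not None else None
        return ProbeResult(False, code, name)


if __name__ == "__main__":
    # Interface address from this report; replace with your own.
    print(probe_multicast_send("192.168.110.16"))
```

If the plain socket fails with the same errno as the libp2p log, the cause is more likely on the macOS side (permissions or routing) than in rust-libp2p's interface handling. If it succeeds, the libp2p-specific candidates become more plausible.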
Proposal: --local-only mode
For single-host deployments (which is also what the dashboard "Get Your Nodes Running" page implicitly guides new users through), it would be valuable to have a mode that bypasses libp2p entirely:
- parallax run --local-only writes the scheduler endpoint info (e.g. socket path or 127.0.0.1:PORT) to a well-known location like ~/.parallax/scheduler.json
- parallax join --local-only (or auto-detect on same host) reads that file and connects via Unix socket or localhost TCP
- The existing handshake / RPC protocol is reused over the simpler transport
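To make the endpoint-discovery part concrete, here is a rough sketch of the file-based handoff using localhost TCP. Everything in it is an assumption for discussion: the function names, the JSON fields (host, port, pid), and the atomic-rename approach are mine, and none of it exists in parallax today.

```python
import json
import os
import socket
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical location taken from the proposal; not an existing parallax path.
ENDPOINT_FILE = Path.home() / ".parallax" / "scheduler.json"


def publish_endpoint(host: str, port: int, path: Path = ENDPOINT_FILE) -> None:
    """Scheduler side: write the endpoint atomically so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"host": host, "port": port, "pid": os.getpid()}
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)  # rename within one directory is atomic on POSIX


def discover_endpoint(path: Path = ENDPOINT_FILE) -> Optional[Tuple[str, int]]:
    """Worker side: return (host, port), or None if no usable endpoint file exists."""
    try:
        data = json.loads(path.read_text())
        return str(data["host"]), int(data["port"])
    except (OSError, ValueError, KeyError, TypeError):
        return None


def connect_local(path: Path = ENDPOINT_FILE,
                  timeout: float = 2.0) -> Optional[socket.socket]:
    """Open a TCP connection to the published endpoint, or return None."""
    endpoint = discover_endpoint(path)
    if endpoint is None:
        return None
    try:
        return socket.create_connection(endpoint, timeout=timeout)
    except OSError:
        # Likely a stale file left behind by a scheduler that has exited.
        return None
```

The worker treats a missing, malformed, or stale file the same way (returns None), so it could fall back to the existing libp2p path in those cases. Open questions this sketch does not answer: cleanup of the file on scheduler exit, multiple schedulers on one host, and file permissions.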
Benefits:
- README quickstart "just works" on a fresh Mac with no network tweaks
- No dependency on the macOS multicast stack for the common single-machine demo case
- Doesn't change anything for multi-host clusters — they keep using libp2p
Related
--local-only direction; would prefer feedback on the proposed endpoint-discovery mechanism before coding