[macOS] Single-host scheduler + worker setup fails due to libp2p mDNS unreliability; propose --local-only mode #460

@iamagenius00

Summary

When following the README quickstart on a single Mac (M4 Pro, macOS Sequoia), i.e. running parallax run and parallax join on the same physical machine, the worker repeatedly fails to discover the scheduler via libp2p's mDNS, even though both processes run on the same host and the scheduler is listening on 127.0.0.1. This makes the single-host setup path unreliable for Mac users.

This is a separate issue from #459 (vllm_version handshake fix); both surface from the same single-host Mac flow.

Environment

  • Hardware: Mac mini M4 Pro, 64 GB unified memory
  • OS: macOS 15.x (Sequoia)
  • Network: home Wi-Fi, single en1 interface (192.168.110.16/24), default route healthy
  • parallax: HEAD 328c99f (also reproducible on current main)
  • Same machine runs both parallax run -m Qwen/Qwen3-0.6B -n 1 and parallax join

Symptom

Worker log floods with:

ERROR libp2p_mdns::behaviour::iface: error sending packet on iface address
    No route to host (os error 65) address=192.168.110.16
[parallax] WARNING server.py:883 No peers found or scheduler peer id not found, waiting for 1 second.
... (repeats 20+ times) ...
[parallax] ERROR server.py:491 Failed to get scheduler peer id
[parallax] ERROR server.py:500 Failed to build lattica
[parallax] ERROR launch.py:234 Timeout waiting for layer allocation from scheduler
RuntimeError: Failed to get layer allocation from scheduler

Scheduler is alive and listening on 127.0.0.1:3001 throughout; worker simply cannot reach it via libp2p discovery.
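
To confirm the scheduler side independently of libp2p discovery, a plain TCP probe is enough. The following is a minimal sketch, assuming the scheduler accepts TCP connections on 127.0.0.1:3001 as observed above; the helper name is_listening is hypothetical and not part of parallax.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 3001 is the scheduler port observed in this report; adjust if yours differs.
    print("scheduler reachable:", is_listening("127.0.0.1", 3001))
```

If this reports the port as reachable while the worker still times out, the failure is in peer discovery rather than in the scheduler process itself.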

What I tried (all failed)

  1. parallax join (default, --scheduler-addr auto) — mDNS discovery times out
  2. parallax join -s 12D3Koo... (explicit scheduler peer id) — worker knows the ID but still cannot establish a connection
  3. parallax join -s ... -r (with --use-relay): fails because the scheduler was not started with relay enabled; both ends would need -r, and that path would route same-host traffic through the public network

Interestingly, the same setup occasionally did work earlier in the same session (probably libp2p DHT / bootstrap luck), but is now consistently broken. The non-determinism makes the README quickstart frustrating for first-time Mac users.

Why this happens (best-effort analysis)

  • macOS routing table shows the multicast route to 224.0.0.251 (mDNS) is present and en1 is healthy
  • Yet libp2p's mDNS behaviour (the libp2p_mdns crate, per the log above) reports EHOSTUNREACH ("No route to host", os error 65 on macOS) for outbound multicast on the active interface
  • Common candidates: macOS App Sandbox restrictions on multicast send, IPv4/IPv6 source-address selection mismatch, or rust-libp2p mDNS interface-enumeration quirks
  • I have not pinpointed the exact macOS-side cause — but the user-facing root issue is that parallax requires libp2p to work even for same-host scheduler + worker, exposing every libp2p-on-macOS papercut to single-machine users
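
The multicast failure can be checked outside of parallax with a few lines of Python. The sketch below sends one UDP datagram to the IPv4 mDNS group through a chosen interface address and reports the errno if the send fails. It approximates the operation the log line complains about and is not an exact reproduction of libp2p_mdns's socket setup; the function name probe_multicast_send is hypothetical.

```python
import errno
import socket
from typing import Optional

MDNS_GROUP = "224.0.0.251"  # IPv4 mDNS multicast group
MDNS_PORT = 5353


def probe_multicast_send(iface_addr: str) -> Optional[str]:
    """Send one UDP datagram to the mDNS group via iface_addr.

    Returns None on success, or a string such as "EHOSTUNREACH (65)" on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Pin outbound multicast to the interface that owns iface_addr.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_addr))
        # 12 zero bytes form an empty DNS header (no questions, no answers).
        sock.sendto(b"\x00" * 12, (MDNS_GROUP, MDNS_PORT))
        return None
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return f"{name} ({exc.errno})"
    finally:
        sock.close()


if __name__ == "__main__":
    # 192.168.110.16 is the en1 address from this report; substitute your own.
    print(probe_multicast_send("192.168.110.16"))
```

Running it from the same terminal that launches parallax join would show whether the error is specific to libp2p or affects any process sending multicast from that context.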

Proposal: --local-only mode

For single-host deployments (which is also what the dashboard "Get Your Nodes Running" page implicitly guides new users through), it would be valuable to have a mode that bypasses libp2p entirely:

  • parallax run --local-only writes scheduler endpoint info (e.g. socket path or 127.0.0.1:PORT) to a well-known location like ~/.parallax/scheduler.json
  • parallax join --local-only (or auto-detect on same host) reads that file and connects via Unix socket or localhost TCP
  • The existing handshake / RPC protocol is reused over the simpler transport

Benefits:

  • README quickstart "just works" on a fresh Mac with no network tweaks
  • No dependency on the macOS multicast stack for the common single-machine demo case
  • Doesn't change anything for multi-host clusters — they keep using libp2p
