[macOS] Single-host scheduler + worker setup fails due to libp2p mDNS unreliability; propose --local-only mode

## Summary

Following the README quickstart on a single Mac (M4 Pro, macOS Sequoia) — i.e. running `parallax run` and `parallax join` on the **same physical machine** — the worker repeatedly fails to discover the scheduler via libp2p's mDNS, even though both processes are on `127.0.0.1`. This makes the single-host setup path effectively unreliable for Mac users.

This is a separate issue from #459 (vllm_version handshake fix); both surface from the same single-host Mac flow.

## Environment

- Hardware: Mac mini M4 Pro, 64 GB unified memory
- OS: macOS 15.x (Sequoia)
- Network: home Wi-Fi, single en1 interface (192.168.110.16/24), default route healthy
- parallax: HEAD `328c99f` (also reproducible on current main)
- Same machine runs both `parallax run -m Qwen/Qwen3-0.6B -n 1` and `parallax join`

## Symptom

Worker log floods with:

```
ERROR libp2p_mdns::behaviour::iface: error sending packet on iface address
    No route to host (os error 65) address=192.168.110.16
[parallax] WARNING server.py:883 No peers found or scheduler peer id not found, waiting for 1 second.
... (repeats 20+ times) ...
[parallax] ERROR server.py:491 Failed to get scheduler peer id
[parallax] ERROR server.py:500 Failed to build lattica
[parallax] ERROR launch.py:234 Timeout waiting for layer allocation from scheduler
RuntimeError: Failed to get layer allocation from scheduler
```

Scheduler is alive and listening on `127.0.0.1:3001` throughout; worker simply cannot reach it via libp2p discovery.

## What I tried (all failed)

1. `parallax join` (default, `--scheduler-addr auto`) — mDNS discovery times out
2. `parallax join -s 12D3Koo...` (explicit scheduler peer id) — worker knows the ID but still cannot establish a connection
3. `parallax join -s ... -r` (with `--use-relay`) fails as well: the scheduler was not started with relay, so both ends would need `-r`, and that would route a same-host connection through the public network

Interestingly, the same setup occasionally *did* work earlier in the same session (probably libp2p DHT / bootstrap luck), but is now consistently broken. The non-determinism makes the README quickstart frustrating for first-time Mac users.

## Why this happens (best-effort analysis)

- macOS routing table shows the multicast route to `224.0.0.251` (mDNS) is present and `en1` is healthy
- Yet rust-libp2p's mDNS behaviour (`libp2p_mdns` in the log) reports `EHOSTUNREACH` (os error 65, "No route to host" on macOS) for outbound multicast on the active interface
- Common candidates: macOS App Sandbox restrictions on multicast send, IPv4/IPv6 source-address selection mismatch, or `rust-libp2p` mDNS interface-enumeration quirks
- I have not pinpointed the exact macOS-side cause — but the user-facing root issue is that **parallax requires libp2p to work even for same-host scheduler + worker**, exposing every libp2p-on-macOS papercut to single-machine users
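One way to narrow this down would be to send an mDNS packet from a plain socket, outside libp2p, on the same interface. The sketch below is a hypothetical diagnostic helper, not part of parallax or libp2p: it sends a single PTR query for `_services._dns-sd._udp.local` to `224.0.0.251:5353` with the outgoing interface pinned to the `en1` address from the environment above, and returns the errno the OS reports. If it also fails with errno 65, the cause is most likely on the macOS side; if it succeeds, rust-libp2p's interface handling becomes the more likely suspect. Errno values are platform-specific, so 65 only means `EHOSTUNREACH` on macOS.

```python
import errno
import socket
import struct
from typing import Optional

MDNS_GROUP = ("224.0.0.251", 5353)  # standard IPv4 mDNS group and port
IFACE_ADDR = "192.168.110.16"       # en1 address from this report; change to your own


def build_mdns_query(name: str, qtype: int = 12) -> bytes:
    """Build a one-question DNS query packet (qtype 12 = PTR, class 1 = IN)."""
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)  # id, flags, qd, an, ns, ar counts
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!2H", qtype, 1)


def probe_multicast_send(iface_addr: str) -> Optional[int]:
    """Send one mDNS query out of iface_addr; return None on success, else the errno."""
    packet = build_mdns_query("_services._dns-sd._udp.local")
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
            # Pin the outgoing interface, similar to the per-interface send in the log.
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                            socket.inet_aton(iface_addr))
            sock.sendto(packet, MDNS_GROUP)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    err = probe_multicast_send(IFACE_ADDR)
    if err is None:
        print(f"multicast send via {IFACE_ADDR} OK")
    else:
        print(f"multicast send via {IFACE_ADDR} failed: errno {err} "
              f"({errno.errorcode.get(err, '?')})")
```

Running it with the same Python/terminal that launches `parallax join` keeps the comparison as close as possible to the failing process.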

## Proposal: `--local-only` mode

For single-host deployments (which is also what the dashboard "Get Your Nodes Running" page implicitly guides new users through), it would be valuable to have a mode that bypasses libp2p entirely:

- `parallax run --local-only` writes scheduler endpoint info (e.g. socket path or `127.0.0.1:PORT`) to a well-known location like `~/.parallax/scheduler.json`
- `parallax join --local-only` (or auto-detect on same host) reads that file and connects via Unix socket or localhost TCP
- The existing handshake / RPC protocol is reused over the simpler transport
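The endpoint-discovery step could look roughly like the sketch below. Everything in it is an assumption for discussion: the function names, the JSON fields (`version`, `transport`, `host`, `port`, `pid`) and the stale-file check are hypothetical and not part of parallax today; only the `~/.parallax/scheduler.json` location comes from the proposal. It covers the localhost TCP variant only, not the Unix-socket one.

```python
import json
import os
import socket
import tempfile
from pathlib import Path
from typing import Optional


def default_endpoint_file() -> Path:
    # Location suggested in the proposal; resolved lazily so HOME is read at call time.
    return Path.home() / ".parallax" / "scheduler.json"


def write_scheduler_endpoint(port: int, host: str = "127.0.0.1",
                             path: Optional[Path] = None) -> Path:
    """What a hypothetical `parallax run --local-only` might do once it is listening."""
    path = path or default_endpoint_file()
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"version": 1, "transport": "tcp", "host": host,
              "port": port, "pid": os.getpid()}
    # Write to a temp file in the same directory, then rename it into place, so a
    # worker starting concurrently never reads a half-written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".scheduler-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(record, f)
        os.replace(tmp, path)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise
    return path


def _pid_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_scheduler_endpoint(path: Optional[Path] = None) -> Optional[dict]:
    """What a hypothetical `parallax join --local-only` might do; None = fall back to libp2p."""
    path = path or default_endpoint_file()
    try:
        record = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(record, dict) or record.get("version") != 1:
        return None  # unknown schema
    pid = record.get("pid")
    if not isinstance(pid, int) or pid <= 0 or not _pid_alive(pid):
        return None  # malformed, or left behind by a scheduler that has exited
    return record


def scheduler_reachable(record: dict, timeout: float = 1.0) -> bool:
    """Check that the recorded localhost TCP endpoint actually accepts connections."""
    try:
        with socket.create_connection((record["host"], record["port"]), timeout=timeout):
            return True
    except OSError:
        return False
```

In this shape, `join --local-only` would call `read_scheduler_endpoint()` and `scheduler_reachable(...)`, and fall back to the existing libp2p path if either check fails. The pid check is there so a file left by a crashed scheduler does not send workers to a dead port.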

Benefits:
- README quickstart "just works" on a fresh Mac with no network tweaks
- No dependency on the macOS multicast stack for the common single-machine demo case
- Doesn't change anything for multi-host clusters — they keep using libp2p

## Related

- #459 (vllm_version handshake) — also surfaces from the same single-host Mac path
- Happy to draft a PR if maintainers agree on the `--local-only` direction; would prefer feedback on the proposed endpoint-discovery mechanism before coding