feat(cluster): RDMA smoke test + encrypted pipeline inference over Thunderbolt by anupsv · Pull Request #192 · Layr-Labs/d-inference

anupsv · 2026-05-20T19:11:01Z

Summary

This PR adds the complete RDMA pipeline inference stack with SE-authenticated encrypted tensor transfer over Thunderbolt, plus fully automated peer discovery.

RDMA connectivity smoke test (rdma-connection-test/) — standalone Swift package that verifies jaccl/RDMA works between two Apple Silicon Macs over Thunderbolt
Encrypted pipeline inference infrastructure — full SE-authenticated, AES-256-GCM-encrypted pipeline-parallel inference stack in ProviderCore/P2P/
Coordinator-verified SE key pinning — replaces TOFU disk files with Keychain storage backed by coordinator MDM attestation
RDMA auto-discovery — darkbloom serve --rdma-enabled automatically finds the Thunderbolt peer without manual serial pairing

Why not plain RDMA for tensor transfer?

RDMA (mlx_distributed_send/recv) is a black-box C API — we can't wrap it with encryption. An attacker with root could inject a librdma.dylib shim and intercept activation tensors in transit. Instead:

jaccl / RDMA is used only for collective ops (allreduce) where confidentiality is less critical
Activation + token transfer goes over ThunderboltLink TCP with AES-256-GCM (~0.3 ms for 32 MB)

New files

`ProviderCore/P2P/`

File	Role
`ThunderboltLink.swift`	TCP framing over `wiredEthernet` (bridge0), port 7777, 8-byte LE length prefix
`MLXDistributed.swift`	Swift wrappers for jaccl C API
`ClusterControlMessage.swift`	Wire types, `ClusterFrame` encoding, `ClusterHealth`, `ClusterError`
`ClusterHandshake.swift`	SE mutual auth + ephemeral X25519 ECDH + HKDF session key
`ClusterPeerKeychain.swift`	Stores peer SE keys in macOS Keychain (`kSecAttrAccessibleWhenUnlockedThisDeviceOnly`)
`ClusterCoordinatorClient.swift`	Coordinator HTTP client: `fetchPeerSEKey` + `fetchRDMAPeers`
`TensorCrypto.swift`	AES-256-GCM seal/open for `MLXArray`
`ClusterSession.swift`	Rank 0 actor (connect loop, ping loop, auto-reconnect) + Rank 1 `ClusterPeer`
`EncryptedPipelineInference.swift`	`EncryptedPipelineEngine` (rank 0) + `EncryptedPipelineServer` (rank 1)
`ClusterDiscovery.swift`	NEW: RDMA auto-discovery actor (NWPathMonitor + ARP + coordinator lookup)
`RDMACapability.swift`	NEW: M5+ chip + `rdma_ctl status` capability gate (downgrades `--rdma-enabled` when hardware can't honor it)
`Security/AttestationBuilder.swift`	Add `rdmaEnabled` field to the signed `AttestationBlob`; builder takes effective value as a parameter

Coordinator

File	Change
`protocol/messages.go`	Add `rdma_enabled` to `RegisterMessage`
`attestation/attestation.go`	Add `RDMAEnabled` to `AttestationBlob` + `VerificationResult`; include in `marshalSortedJSON`
`registry/registry.go`	Add `RDMAEnabled` to `Provider`; add `SetRDMAEnabled()` (coordinator pins to attested value on mismatch); `ListRDMAEnabledPeersForCaller()` enforces symmetric-access + Attested + Status filters
`api/cluster_handlers.go`	`GET /v1/cluster/peer-key` + `GET /v1/cluster/rdma-peers` (rdma-peers returns 403 unless caller owns an Attested RDMA provider)
`api/provider.go`	After `attestation.Verify`, reconcile `msg.RDMAEnabled` vs `result.RDMAEnabled` — attested value wins, mismatch logged
`api/server.go`	Register both cluster endpoints
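The `marshalSortedJSON` change matters because signer and verifier must serialize the blob to the same bytes before the SE signature is checked. A minimal sketch of key-sorted serialization, assuming a flat map of fields: `canonicalJSON` is a hypothetical stand-in, not the coordinator's function, and the field values are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonicalJSON is a hypothetical stand-in for marshalSortedJSON.
// encoding/json emits map keys in sorted order, so the output does
// not depend on the order in which fields were inserted.
func canonicalJSON(fields map[string]any) ([]byte, error) {
	return json.Marshal(fields)
}

func main() {
	blob := map[string]any{
		"secureBootEnabled": true,
		"rdmaEnabled":       true,
		"rdmaDisabled":      false,
	}
	out, err := canonicalJSON(blob)
	if err != nil {
		panic(err)
	}
	// Prints {"rdmaDisabled":false,"rdmaEnabled":true,"secureBootEnabled":true}
	fmt.Println(string(out))
}
```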

Protocol flows

Handshake (once per connection)

Rank 0 → Rank 1:  HandshakeHello { se_pub₀, x25519_pub₀, nonce₀, sig₀ }
                  sig₀ = SE_sign(SHA256(x25519_pub₀ ∥ nonce₀))

Rank 1 → Rank 0:  HandshakeAck   { se_pub₁, x25519_pub₁, nonce₁, sig₁ }
                  sig₁ = SE_sign(SHA256(nonce₀ ∥ nonce₁ ∥ x25519_pub₁))

Session key = HKDF-SHA256(X25519(sk_local, pk_peer), salt: nonce₀ ∥ nonce₁)

SE key pinning is coordinator-verified: keys come from MDM attestation records,
stored in macOS Keychain (kSecAttrAccessibleWhenUnlockedThisDeviceOnly).

Inference (per token)

Rank 0 → Rank 1:  inferenceStep  { AES-GCM(seqLen ∥ activation_tensor) }
Rank 1 → Rank 0:  inferenceToken { AES-GCM(tokenID) }
… repeat until stop sentinel: inferenceStep { AES-GCM(seqLen = -1) }
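A rough Go sketch of sealing one inference step, using the wire format given in the PR's commit message (nonce(12) + ciphertext + tag(16)). The 4-byte little-endian encoding of seqLen and the `seal`/`open` helper names are assumptions for illustration; the actual `TensorCrypto.swift` layout may differ.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
)

// seal encrypts seqLen || activation and returns nonce || ciphertext || tag.
func seal(key []byte, seqLen int32, activation []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	plain := make([]byte, 4, 4+len(activation))
	binary.LittleEndian.PutUint32(plain, uint32(seqLen)) // assumed encoding
	plain = append(plain, activation...)
	// Passing nonce as dst prepends it to the sealed output.
	return gcm.Seal(nonce, nonce, plain, nil), nil
}

// open authenticates and decrypts a frame produced by seal.
func open(key, frame []byte) (int32, []byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return 0, nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return 0, nil, err
	}
	if len(frame) < gcm.NonceSize()+gcm.Overhead()+4 {
		return 0, nil, errors.New("frame too short")
	}
	nonce, ct := frame[:gcm.NonceSize()], frame[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return 0, nil, err
	}
	return int32(binary.LittleEndian.Uint32(plain[:4])), plain[4:], nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	frame, err := seal(key, 2, []byte{0xde, 0xad, 0xbe, 0xef})
	if err != nil {
		panic(err)
	}
	seqLen, activation, err := open(key, frame)
	if err != nil {
		panic(err)
	}
	fmt.Println(seqLen, len(activation), len(frame))
}
```

Because seqLen sits inside the authenticated plaintext, the stop sentinel (seqLen = -1) cannot be forged or stripped without failing the GCM tag check.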

RDMA auto-discovery flow

The diagram below covers the flow end to end, including the capability gate (the binary refuses to honor --rdma-enabled on pre-M5 hardware), the SE-signed rdmaEnabled field in the attestation blob, and the coordinator's symmetric-access gate on /v1/cluster/rdma-peers.

sequenceDiagram
    participant MacA as Mac A (M5, --rdma-enabled)
    participant SE_A as SE (Mac A)
    participant Coord as Coordinator
    participant MacB as Mac B (M5, --rdma-enabled)

    rect rgb(230, 240, 255)
        Note over MacA,MacB: Registration — capability gate runs once, locally
        Note over MacA: RDMACapability.isAvailable()<br/>chip = M5+? rdma_ctl = enabled?<br/>(else effectiveRDMA = false)
        MacA->>SE_A: sign AttestationBlob { ..., rdmaEnabled: true }
        SE_A-->>MacA: signature
        MacA->>Coord: register { rdma_enabled: true, attestation }
        Coord->>Coord: Verify SE signature<br/>Check msg.rdma_enabled == blob.rdmaEnabled<br/>(mismatch → registry pinned to attested value)
        Coord->>Coord: SetAttested(true) — provider now peer-list eligible
        MacB->>Coord: register { rdma_enabled: true, attestation }
    end

    rect rgb(235, 250, 235)
        Note over MacA: Thunderbolt cable plugged in
        Note over MacA: NWPathMonitor fires (wiredEthernet up)
        MacA->>MacA: getifaddrs → own IP (169.254.x.x)
        MacA->>Coord: GET /v1/cluster/rdma-peers (Bearer JWT)
        Note over Coord: Symmetric gate:<br/>caller account must own an<br/>Attested + Online RDMA provider<br/>(else 403)<br/>Filter: rdma && serial && seKey<br/>&& Attested && Status ∉ {Untrusted, Offline}<br/>Excludes caller's own providers
        Coord-->>MacA: [{ serial: MacB, se_public_key, trust_level }]
        MacA->>MacA: arp -a -i bridge100 → peer IP
        MacA->>MacA: Keychain.store(peerSEKey, peerIP)
    end

    rect rgb(255, 245, 230)
        Note over MacA,MacB: Rank election (deterministic, no coordination)
        alt own IP < peer IP (rank 0)
            MacA->>MacA: ClusterSession.start() → connect to MacB
            Note over MacA,MacB: ClusterHandshake + ECDH session key
            MacA-->>MacB: ping every 5 s
        else own IP > peer IP (rank 1)
            MacA->>MacA: ClusterPeer.serve() → listen for MacB
        end
    end
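The rank election step can be sketched as a pure function of the two link-local addresses. The diagram only says "own IP < peer IP"; comparing the addresses numerically, as below, is an assumption, and `electRank` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/netip"
)

// electRank returns 0 when own sorts before peer and 1 when it sorts
// after. Equal addresses are an error because no rank can be chosen.
func electRank(own, peer string) (int, error) {
	a, err := netip.ParseAddr(own)
	if err != nil {
		return 0, err
	}
	b, err := netip.ParseAddr(peer)
	if err != nil {
		return 0, err
	}
	switch c := a.Compare(b); {
	case c < 0:
		return 0, nil
	case c > 0:
		return 1, nil
	default:
		return 0, fmt.Errorf("own and peer address are both %s", a)
	}
}

func main() {
	rank, err := electRank("169.254.58.74", "169.254.106.209")
	if err != nil {
		panic(err)
	}
	fmt.Println("rank:", rank)
}
```

Both Macs must use the same ordering. A numeric comparison puts 169.254.58.74 before 169.254.106.209, while a plain string comparison would order them the other way, so mixing the two could leave both peers claiming the same rank.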

Capability gate + tamper rejection

The paths below show how the system handles operators who pass --rdma-enabled on unsupported hardware, OS-tampered providers that lie in the unsigned registration field, and forged attestation blobs.

sequenceDiagram
    participant Op as Operator
    participant Bin as Provider binary
    participant SE as Secure Enclave
    participant Coord as Coordinator

    rect rgb(255, 240, 240)
        Note over Op,Bin: Path A — operator on non-M5 (M3/M4) passes --rdma-enabled
        Op->>Bin: darkbloom serve --rdma-enabled
        Bin->>Bin: RDMACapability.unavailableReason()<br/>"this Mac is not M5 or later"
        Bin-->>Op: stderr: "--rdma-enabled ignored: ..."
        Note over Bin: effectiveRDMA = false
        Bin->>SE: sign blob { rdmaEnabled: false }
        Bin->>Coord: register { rdma_enabled: false, attestation }
        Note over Coord: Provider excluded from peer list<br/>(rdma == false in registry)
    end

    rect rgb(255, 240, 240)
        Note over Bin,Coord: Path B — OS-tampered: flip msg.rdma_enabled but SE only signs effective value
        Bin->>SE: sign blob { rdmaEnabled: false }<br/>(capability gate ran honestly)
        SE-->>Bin: signature
        Note over Bin: Tampered shim flips unsigned field
        Bin->>Coord: register { rdma_enabled: true, attestation }
        Coord->>Coord: Verify SE signature ✓<br/>msg.rdma_enabled=true ≠ blob.rdmaEnabled=false
        Coord->>Coord: WARN log + SetRDMAEnabled(false)<br/>(attested value wins)
        Note over Coord: Registry RDMAEnabled = false<br/>provider not in peer list
    end

    rect rgb(255, 240, 240)
        Note over Bin,Coord: Path C — Bad-signature attestation (forged blob)
        Bin->>Coord: register { rdma_enabled: true, tampered attestation }
        Coord->>Coord: Verify SE signature ✗<br/>SetAttestationResult(Valid=false)<br/>MarkUntrusted (if policy)
        Note over Coord: AttestationResult.SerialNumber & PublicKey<br/>are populated (pre-sigcheck) but Attested=false<br/>and Status=Untrusted →<br/>peer-list filter excludes
    end
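Path B reduces to a small reconciliation rule: the attested value always wins and a mismatch is logged. A minimal sketch, assuming the signature has already been verified; `Provider` here is a cut-down stand-in and `reconcileRDMA` is a hypothetical name, not the code in `api/provider.go`.

```go
package main

import (
	"fmt"
	"log"
)

// Provider is a cut-down stand-in for the registry entry.
type Provider struct {
	RDMAEnabled bool
}

// reconcileRDMA pins the registry to the SE-attested value and reports
// whether the unsigned registration field disagreed with it.
func reconcileRDMA(p *Provider, claimed, attested bool) (mismatch bool) {
	mismatch = claimed != attested
	if mismatch {
		log.Printf("WARN rdma_enabled mismatch: claimed=%t attested=%t", claimed, attested)
	}
	p.RDMAEnabled = attested // attested value wins
	return mismatch
}

func main() {
	p := &Provider{}
	// Tampered shim flipped the unsigned field to true; the blob says false.
	mismatch := reconcileRDMA(p, true, false)
	fmt.Println("mismatch:", mismatch, "registry RDMAEnabled:", p.RDMAEnabled)
}
```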

Full handshake + inference flow

sequenceDiagram
    participant R0 as Rank 0
    participant SE0 as SE (Rank 0)
    participant SE1 as SE (Rank 1)
    participant R1 as Rank 1

    rect rgb(230, 255, 230)
        Note over R0,R1: Handshake
        R0->>SE0: sign SHA256(x25519_pub₀ ∥ nonce₀)
        SE0-->>R0: sig₀
        R0->>R1: HandshakeHello {se_pub₀, x25519_pub₀, nonce₀, sig₀}
        Note right of R1: Keychain verify se_pub₀
        R1->>SE1: verify sig₀
        R1->>SE1: sign SHA256(nonce₀ ∥ nonce₁ ∥ x25519_pub₁)
        SE1-->>R1: sig₁
        R1->>R0: HandshakeAck {se_pub₁, x25519_pub₁, nonce₁, sig₁}
        Note left of R0: Keychain verify se_pub₁, verify sig₁
        R0->>R0: sessionKey = HKDF(X25519(sk₀,pk₁), nonce₀∥nonce₁)
        R1->>R1: sessionKey = HKDF(X25519(sk₁,pk₀), nonce₀∥nonce₁)
    end

    rect rgb(255, 250, 220)
        Note over R0,R1: Ping loop (every 5 s)
        loop Health monitor
            R0->>R1: ping
            R1-->>R0: pong {modelLoaded, memoryPressure}
        end
    end

    rect rgb(255, 235, 230)
        Note over R0,R1: Inference
        loop Each decode step
            R0->>R0: forward layers [0, splitLayer)
            R0->>R1: inferenceStep {AES-GCM(seqLen ∥ activation)}
            R1->>R1: forward layers [splitLayer, N) + argmax
            R1-->>R0: inferenceToken {AES-GCM(tokenID)}
        end
        R0->>R1: inferenceStep {AES-GCM(seqLen=-1)}
    end

Reconnect flow

3 missed pings → health = .unavailable → conn.cancel()
sleep(10 s) → TCP reconnect → new ephemeral X25519 pair → fresh handshake
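The teardown condition can be modeled as a small counter, kept free of timers so it is easy to test. The sketch assumes the three missed pings must be consecutive, which the PR does not state explicitly; `healthTracker` and `recordPing` are hypothetical names.

```go
package main

import "fmt"

// maxMissedPings matches the "3 missed pings" threshold in the flow above.
const maxMissedPings = 3

// healthTracker counts consecutive missed pings (an assumption; the
// PR does not say whether a pong resets the count).
type healthTracker struct {
	missed int
}

// recordPing returns true when the connection should be torn down
// and a reconnect with a fresh handshake scheduled.
func (h *healthTracker) recordPing(gotPong bool) bool {
	if gotPong {
		h.missed = 0
		return false
	}
	h.missed++
	return h.missed >= maxMissedPings
}

func main() {
	var h healthTracker
	for i, pong := range []bool{true, false, false, false} {
		if h.recordPing(pong) {
			fmt.Println("tear down after ping", i)
		}
	}
}
```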

Usage

Auto-discovery (new)

# Both Macs — plug in Thunderbolt cable, then:
darkbloom serve --rdma-enabled
# ClusterDiscovery auto-detects the peer, elects rank, establishes session.

Manual setup (still supported)

darkbloom cluster setup --peer-serial C02XXXXXXX --peer-ip 169.254.58.74
darkbloom cluster run

DoS protection / tamper resistance

Defense in depth — every layer below has to be defeated to enroll a malicious peer:

Operator opt-in. Providers that start without --rdma-enabled send rdma_enabled: false. The coordinator never includes them in GET /v1/cluster/rdma-peers. A Thunderbolt connection from a non-RDMA peer is rejected during handshake (SE key not pinned → ClusterError.peerSEKeyNotPinned).
Hardware capability gate. RDMACapability.isAvailable() checks chip family is M5+ AND /usr/bin/rdma_ctl status == "enabled". If either fails, the binary downgrades effectiveRDMA to false regardless of what the operator typed, prints a warning, and signs rdmaEnabled: false into the attestation blob.
SE-signed rdmaEnabled. The flag lives inside the SE-signed AttestationBlob (alphabetical position between rdmaDisabled and secureBootEnabled). A compromised OS that tries to flip the unsigned RegisterMessage.rdma_enabled post-hoc creates a mismatch; the coordinator detects it (provider.go: SetRDMAEnabled(attested value)) and pins the registry to the attested value, logging the discrepancy.
Coordinator symmetric-access gate. GET /v1/cluster/rdma-peers requires the calling account to own at least one currently-connected RDMA-enabled provider. Consumers and accounts without a matching provider receive 403. The caller's own providers are excluded from the response.
Attestation-validity filter. The peer list excludes any provider where Attested != true OR Status ∈ {Untrusted, Offline}. This closes a subtle window where attestation.Verify populates SerialNumber/PublicKey on the result before checking the signature — a tampered provider with valid-looking identifiers would otherwise satisfy the presence-of-identifiers check.
SE mutual-auth handshake. Even if all the above were somehow bypassed, the cluster handshake still requires possession of the SE private key matching the pinned public key. A wrong-machine connection fails closed (peerSEKeyNotPinned or signature mismatch).
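The symmetric-access gate and the attestation-validity filter can be sketched together. This is a simplified model, not the code in `registry/registry.go`: the types, `eligible`, and `listRDMAPeers` are hypothetical, and treating "the caller's own providers" as "providers on the caller's account" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

type Status int

const (
	Online Status = iota
	Untrusted
	Offline
)

type Provider struct {
	Account, Serial, SEPublicKey string
	RDMAEnabled, Attested        bool
	Status                       Status
}

var errForbidden = errors.New("403: caller owns no attested RDMA provider")

// eligible applies the filter: rdma && serial && seKey && Attested
// && Status not in {Untrusted, Offline}.
func eligible(p Provider) bool {
	return p.RDMAEnabled && p.Serial != "" && p.SEPublicKey != "" &&
		p.Attested && p.Status != Untrusted && p.Status != Offline
}

// listRDMAPeers returns eligible providers on other accounts, but only
// if the caller's account owns at least one eligible provider itself.
func listRDMAPeers(all []Provider, caller string) ([]Provider, error) {
	owns := false
	for _, p := range all {
		if p.Account == caller && eligible(p) {
			owns = true
			break
		}
	}
	if !owns {
		return nil, errForbidden
	}
	var peers []Provider
	for _, p := range all {
		if p.Account != caller && eligible(p) {
			peers = append(peers, p)
		}
	}
	return peers, nil
}

func main() {
	all := []Provider{
		{Account: "a", Serial: "S1", SEPublicKey: "k1", RDMAEnabled: true, Attested: true},
		{Account: "b", Serial: "S2", SEPublicKey: "k2", RDMAEnabled: true, Attested: true},
		{Account: "b", Serial: "S3", SEPublicKey: "k3", RDMAEnabled: true, Attested: false},
	}
	peers, err := listRDMAPeers(all, "a")
	fmt.Println(len(peers), err)
}
```

Checking `Attested` explicitly, rather than only the presence of a serial and key, is what excludes the third provider in the example even though its identifiers look valid.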

…in-backed

Reflects commit 4a0dae5 (PersistentEnclaveKey.swift).

Key changes:
- TB-003 how_it_works: document Security framework persistent key, access group SLDQ2GJ6TL.io.darkbloom.provider, kSecAttrIsPermanent, and the errSecMissingEntitlement fallback behaviour on patched binaries
- TB-003 current_limitations: add two new limitations: team-scoped cross-binary keychain access, and silent ephemeral fallback that defers rejection to the coordinator rather than failing at the process boundary
- TB-009 how_it_works: rewrite SE key lifecycle section to reflect persistent identity across restarts; rotation now requires explicit keychain deletion
- T-013 (binary tampering) mitigations: add keychain access group enforcement as a fourth, implemented mitigation; update detection_hint
- T-033 (attestation replay) affected_files: add PersistentEnclaveKey.swift and AttestationSigner.swift; update mitigation wording
- T-035 (repudiation after rotation) description, mitigations, detection_hint: reframe rotation as an explicit operator action rather than automatic per launch; note kSecAttrIsPermanent as a positive mitigation; add open finding that coordinator cannot detect opportunistic keychain delete + re-registration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Minimal Swift package that tests jaccl/RDMA connectivity between two Apple Silicon Macs over Thunderbolt without any model stack.

rdma-ping initializes a DistributedGroup via jaccl (MLX distributed backend), runs N rounds of all_sum across both ranks, verifies correctness, and reports per-round latency.

Tested live: two M5 Max (128 GB) over Thunderbolt at ~40 µs avg latency, 20/20 rounds correct.

Usage:

# Mac A (rank 0):
./rdma-ping --rank 0 --coordinator 169.254.106.209:9999 --rdma-device rdma_en2
# Mac B (rank 1) simultaneously:
./rdma-ping --rank 1 --coordinator 169.254.106.209:9999 --rdma-device rdma_en1

Requires macOS 26.2+, RDMA enabled in Recovery, Thunderbolt cable, and mlx.metallib symlinked next to the binary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-20T19:11:57Z

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-20 19:14 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	4.855s
Throughput	0.8 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.16s	2.16s	2.16s	2.16s
parse	4	27µs	36µs	50µs	50µs
reserve	4	4ms	4ms	5ms	5ms
route	4	534µs	607µs	662µs	662µs
coordinator_to_provider	4	2.153s	2.154s	2.154s	2.154s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=26.75µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=50µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=3.5335ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=5.232ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	20
Success	4
Errors	16
Total Duration	2.776s
Throughput	1.4 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.772s	2.774s	2.776s	2.776s
parse	4	16µs	16µs	20µs	20µs
reserve	4	3ms	3ms	4ms	4ms
route	4	358µs	354µs	386µs	386µs
coordinator_to_provider	4	2.001s	2.002s	2.003s	2.003s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=15.5µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=20µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.6335ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.86ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	4	0.5 GB
mlx-community/gemma-3-270m-4bit	3	0.2 GB

Metric	Value
Total Requests	50
Success	50
Errors	0
Total Duration	10.39s
Throughput	4.8 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	50	593ms	3ms	3.517s	3.532s
parse	48	17µs	15µs	30µs	44µs
reserve	48	1ms	1ms	2ms	3ms
route	48	390µs	380µs	527µs	627µs
coordinator_to_provider	50	489ms	1ms	3.482s	3.523s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=16.875µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=30µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.27002ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.152ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	12
Errors	48
Total Duration	3.321s
Throughput	3.6 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.122s	2.135s	2.145s	2.145s
parse	12	13µs	11µs	25µs	25µs
reserve	12	3ms	3ms	4ms	4ms
route	12	1ms	1ms	1ms	1ms
coordinator_to_provider	12	2.114s	2.128s	2.138s	2.138s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=13.333µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=25µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.75475ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=4.279ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	40
Success	4
Errors	36
Total Duration	2.761s
Throughput	1.4 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	1.948s	1.948s	1.948s	1.948s
parse	4	13µs	13µs	22µs	22µs
reserve	4	2ms	2ms	2ms	2ms
route	4	1ms	1ms	1ms	1ms
coordinator_to_provider	4	1.942s	1.942s	1.942s	1.942s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=13µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=22µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.23725ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.466ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	60
Errors	0
Total Duration	4.867s
Throughput	12.3 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	60	345ms	5ms	2.054s	2.054s
parse	60	18µs	15µs	35µs	112µs
reserve	60	1ms	1ms	2ms	3ms
route	60	420µs	381µs	711µs	851µs
coordinator_to_provider	60	342ms	3ms	2.047s	2.048s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=17.7µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=35µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.199033ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.437ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.272s
Throughput	1.2 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.312s	2.312s	2.313s	2.313s
parse	4	20µs	18µs	33µs	33µs
reserve	4	2ms	2ms	3ms	3ms
route	4	1ms	1ms	2ms	2ms
coordinator_to_provider	4	2.306s	2.306s	2.306s	2.306s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=19.75µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=33µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.309ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.745ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	3.968s
Throughput	7.6 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	716ms	6ms	2.174s	2.174s
parse	30	19µs	17µs	38µs	39µs
reserve	30	2ms	1ms	4ms	6ms
route	30	475µs	441µs	704µs	768µs
coordinator_to_provider	30	710ms	4ms	2.164s	2.164s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=18.7µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=38µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.0087ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=4.25ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	5	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	4.22s
Throughput	7.1 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	779ms	4ms	2.399s	2.399s
parse	30	18µs	18µs	31µs	42µs
reserve	30	2ms	1ms	4ms	4ms
route	30	1ms	0s	1ms	2ms
coordinator_to_provider	30	775ms	1ms	2.392s	2.392s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=18.3µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=31µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.702566ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.934ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	100
Success	12
Errors	88
Total Duration	3.188s
Throughput	3.8 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.091s	2.081s	2.115s	2.115s
parse	12	104µs	97µs	175µs	175µs
reserve	12	8ms	8ms	10ms	10ms
route	12	17ms	17ms	18ms	18ms
coordinator_to_provider	12	2.051s	2.041s	2.075s	2.075s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=104.083µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=175µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=8.257ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=9.557ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

github-actions · 2026-05-20T19:20:40Z

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-20 19:22 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.846s
Throughput	1.0 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	1.747s	1.747s	1.747s	1.747s
parse	4	28µs	21µs	58µs	58µs
reserve	4	4ms	5ms	6ms	6ms
route	4	577µs	619µs	656µs	656µs
coordinator_to_provider	4	1.737s	1.738s	1.739s	1.739s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=27.75µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=58µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=4.038ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=5.946ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	20
Success	4
Errors	16
Total Duration	2.333s
Throughput	1.7 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.33s	2.331s	2.333s	2.333s
parse	4	21µs	24µs	25µs	25µs
reserve	4	3ms	4ms	5ms	5ms
route	4	482µs	439µs	673µs	673µs
coordinator_to_provider	4	1.744s	1.744s	1.745s	1.745s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=21.25µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=25µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=3.26275ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=4.548ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	4	0.5 GB
mlx-community/gemma-3-270m-4bit	3	0.2 GB

Metric	Value
Total Requests	50
Success	50
Errors	0
Total Duration	9.473s
Throughput	5.3 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	50	547ms	3ms	3.297s	3.326s
parse	49	21µs	16µs	56µs	123µs
reserve	49	2ms	1ms	8ms	8ms
route	49	0s	0s	1ms	1ms
coordinator_to_provider	50	492ms	1ms	3.282s	3.31s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=21.244µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=56µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.869775ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=7.718ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	12
Errors	48
Total Duration	3.426s
Throughput	3.5 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.244s	2.241s	2.258s	2.258s
parse	12	13µs	14µs	26µs	26µs
reserve	12	3ms	3ms	4ms	4ms
route	12	1ms	1ms	1ms	1ms
coordinator_to_provider	12	2.235s	2.232s	2.25s	2.25s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=13.416µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=26µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=3.30775ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.987ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	40
Success	4
Errors	36
Total Duration	3.136s
Throughput	1.3 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.09s	2.09s	2.09s	2.09s
parse	4	14µs	16µs	20µs	20µs
reserve	4	2ms	2ms	2ms	2ms
route	4	593µs	606µs	649µs	649µs
coordinator_to_provider	4	2.084s	2.084s	2.084s	2.084s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=13.75µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=20µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.2685ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.347ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	60
Errors	0
Total Duration	5.188s
Throughput	11.6 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	60	370ms	5ms	2.207s	2.207s
parse	60	17µs	14µs	32µs	44µs
reserve	60	2ms	1ms	5ms	5ms
route	60	423µs	407µs	699µs	801µs
coordinator_to_provider	60	367ms	3ms	2.195s	2.2s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=17.1µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=32µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.524766ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=4.558ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.317s
Throughput	1.2 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.204s	2.204s	2.204s	2.204s
parse	4	25µs	26µs	31µs	31µs
reserve	4	3ms	3ms	3ms	3ms
route	4	708µs	732µs	813µs	813µs
coordinator_to_provider	4	2.194s	2.195s	2.195s	2.195s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=24.5µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=31µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=3.23925ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.462ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	4.167s
Throughput	7.2 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	735ms	6ms	2.202s	2.202s
parse	30	15µs	14µs	29µs	34µs
reserve	30	2ms	1ms	3ms	3ms
route	30	1ms	0s	1ms	1ms
coordinator_to_provider	30	731ms	3ms	2.196s	2.196s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=15.233µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=29µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.560833ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.56ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	5	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	4.32s
Throughput	6.9 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	773ms	4ms	2.362s	2.362s
parse	30	18µs	16µs	33µs	51µs
reserve	30	2ms	1ms	6ms	6ms
route	30	1ms	0s	1ms	1ms
coordinator_to_provider	30	767ms	1ms	2.349s	2.352s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=17.766µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=33µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.2053ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=6.304ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	100
Success	12
Errors	88
Total Duration	3.047s
Throughput	3.9 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.126s	2.134s	2.134s	2.134s
parse	12	89µs	86µs	119µs	119µs
reserve	12	8ms	8ms	9ms	9ms
route	12	18ms	19ms	19ms	19ms
coordinator_to_provider	12	2.092s	2.098s	2.099s	2.099s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=89.083µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=119µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=7.937333ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=8.881ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

Implements secure two-Mac pipeline-parallel inference using SE-backed mutual authentication and AES-256-GCM for all tensor data in transit. RDMA is used only for jaccl-based collective ops (allreduce); the pipeline activation/token transfer path runs over ThunderboltLink TCP with cryptographic protection, making it immune to librdma shim injection and process memory attacks.

New P2P infrastructure:
- ThunderboltLink: TCP framing layer over wiredEthernet (bridge0), port 7777, 8-byte LE length-prefixed frames
- MLXDistributed: Swift wrappers for jaccl C API (DistributedGroup, allSum, allGather, distributedSend/recv)
- ClusterControlMessage: wire types (HandshakeHello/Ack, PongPayload), ClusterFrame encoding, ClusterHealth, ClusterError
- ClusterHandshake: SE mutual auth + ephemeral X25519 ECDH session key (HKDF-SHA256, nonce0||nonce1 salt, "darkbloom-cluster-session-v1" info). TOFU pinning: peer SE key stored at /etc/darkbloom/cluster-peer-{IP}.sekey on first connection and verified on all subsequent connections
- TensorCrypto: AES-256-GCM seal/open for MLXArray activations and token IDs. Wire format: nonce(12) + ciphertext + tag(16). Shape derived as [1, seqLen, hiddenDim]; seqLen is embedded in the plaintext
- ClusterSession: ClusterSession actor (rank 0) with connect loop, 5 s ping loop, 3-missed-ping teardown, and auto-reconnect with fresh handshake each time. ClusterPeer class (rank 1) dispatches ping/pong and inference frames
- EncryptedPipelineInference: EncryptedPipelineEngine actor (rank 0) and EncryptedPipelineServer class (rank 1). Per-step retry: 3 attempts × 3 s delay, then ClusterError.serviceUnavailable (HTTP 429)
- PipelineInference: jaccl-based pipeline engine (kept for reference; encrypted path is preferred for production)

CLI additions (darkbloom cluster):
- cluster setup: writes jaccl devices JSON + env file to /etc/darkbloom
- cluster devices: lists RDMA interfaces via ibv_devices
- cluster ping: jaccl all_sum smoke test with latency reporting
- cluster run: establishes ClusterSession (rank 0) or ClusterPeer (rank 1) and prints health status; EncryptedPipelineEngine wired once model loading is integrated
- link: ThunderboltLink bandwidth test (send/receive over bridge0)

Package.swift: add MLX as direct dependency of darkbloom executable (ClusterPing uses MLXArray + eval inline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-20T20:42:42Z

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-21 02:27 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.801s
Throughput	1.1 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	1.732s	1.732s	1.732s	1.732s
parse	4	51µs	65µs	73µs	73µs
reserve	4	6ms	7ms	9ms	9ms
route	4	748µs	795µs	907µs	907µs
coordinator_to_provider	4	1.719s	1.72s	1.722s	1.722s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=50.5µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=73µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=6.2945ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=8.742ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	20
Success	4
Errors	16
Total Duration	2.715s
Throughput	1.5 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.688s	2.712s	2.714s	2.714s
parse	4	180µs	80µs	580µs	580µs
reserve	4	8ms	10ms	10ms	10ms
route	4	732µs	861µs	877µs	877µs
coordinator_to_provider	4	1.711s	1.711s	1.714s	1.714s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=179.5µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=580µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=7.84075ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=10.017ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	4	0.5 GB
mlx-community/gemma-3-270m-4bit	3	0.2 GB

Metric	Value
Total Requests	50
Success	50
Errors	0
Total Duration	10.747s
Throughput	4.7 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	50	629ms	3ms	3.834s	3.872s
parse	49	25µs	17µs	53µs	278µs
reserve	49	1ms	1ms	2ms	3ms
route	49	0s	0s	1ms	1ms
coordinator_to_provider	50	575ms	1ms	3.826s	3.864s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=24.877µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=53µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.430632ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.357ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	12
Errors	48
Total Duration	3.467s
Throughput	3.5 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.239s	2.24s	2.255s	2.255s
parse	12	15µs	16µs	27µs	27µs
reserve	12	2ms	2ms	3ms	3ms
route	12	441µs	396µs	764µs	764µs
coordinator_to_provider	12	2.23s	2.234s	2.249s	2.249s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=15µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=27µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.869333ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.652ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	40
Success	6
Errors	34
Total Duration	3.534s
Throughput	1.7 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	6	2.685s	2.35s	3.355s	3.355s
parse	6	15µs	16µs	22µs	22µs
reserve	6	2ms	3ms	3ms	3ms
route	6	1.115s	1ms	3.344s	3.344s
queue_wait	2	3.344s	3.344s	3.344s	3.344s
dispatch	2	34µs	39µs	39µs	39µs
coordinator_to_provider	6	1.564s	2.344s	2.344s	2.344s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=15.166µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=22µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.412666ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.55ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:mean<=5ms	PASS	mean=33.5µs (threshold=5ms)
dispatch:p95<=50ms	PASS	p95=39µs (threshold=50ms)

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	60
Errors	0
Total Duration	5.305s
Throughput	11.3 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	60	353ms	4ms	2.1s	2.101s
parse	60	17µs	15µs	33µs	55µs
reserve	60	1ms	1ms	3ms	3ms
route	60	415µs	393µs	664µs	926µs
coordinator_to_provider	60	350ms	3ms	2.092s	2.095s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=16.65µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=33µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.242933ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.652ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	2.78s
Throughput	1.4 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.034s	2.034s	2.034s	2.034s
parse	4	16µs	15µs	26µs	26µs
reserve	4	2ms	2ms	2ms	2ms
route	4	502µs	546µs	616µs	616µs
coordinator_to_provider	4	2.028s	2.028s	2.028s	2.028s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=15.75µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=26µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.97925ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.143ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	3.848s
Throughput	7.8 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	720ms	6ms	2.161s	2.161s
parse	30	16µs	13µs	30µs	55µs
reserve	30	2ms	1ms	3ms	4ms
route	30	439µs	400µs	752µs	898µs
coordinator_to_provider	30	716ms	3ms	2.154s	2.155s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=15.733µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=30µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.5179ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.622ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	5	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	3.924s
Throughput	7.6 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	727ms	4ms	2.22s	2.22s
parse	30	18µs	18µs	29µs	31µs
reserve	30	1ms	1ms	2ms	3ms
route	30	447µs	415µs	709µs	732µs
coordinator_to_provider	30	723ms	1ms	2.211s	2.215s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=17.933µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=29µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.474333ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.448ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	100
Success	12
Errors	88
Total Duration	3.179s
Throughput	3.8 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.059s	2.058s	2.078s	2.078s
parse	12	94µs	88µs	178µs	178µs
reserve	12	9ms	9ms	10ms	10ms
route	12	18ms	19ms	19ms	19ms
coordinator_to_provider	12	2.018s	2.017s	2.038s	2.038s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=94.083µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=178µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=8.718583ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=9.656ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

Replaces the TOFU (Trust On First Use) disk-file approach for cluster handshake SE key pinning with coordinator-verified distribution via the macOS Keychain. The previous approach stored the peer's SE public key in /etc/darkbloom/cluster-peer-{IP}.sekey on first connection. Root on either Mac could overwrite the file to accept a rogue peer.

The new approach:
1. `darkbloom cluster setup --peer-serial <serial> --peer-ip <ip>` fetches the peer's SE public key from the coordinator, which verified it through Apple MDM attestation (MDA cert chain signed by Apple Enterprise Attestation Root CA).
2. The fetched key is stored in the macOS Keychain with kSecAttrAccessibleWhenUnlockedThisDeviceOnly; modifying it requires SE access, not just root.
3. ClusterHandshake.verifyAgainstKeychain() compares the SE key received during every handshake against the Keychain entry. An unknown key or a mismatch throws ClusterError.peerSEKeyMismatch and the connection is rejected immediately.

Coordinator (Go):
- GET /v1/cluster/peer-key?serial=<serial> (requirePrivyAuth). Queries the persistent store, so it works even when the peer is offline. Returns { serial, se_public_key, trust_level, mda_verified }.

Swift (new files):
- ClusterPeerKeychain: store/load/delete Keychain entries keyed by peer Thunderbolt IP (service "io.darkbloom.cluster.peer-sekey")
- ClusterCoordinatorClient: HTTP client for /v1/cluster/peer-key; converts wss:// coordinator WebSocket URL to https:// automatically

Swift (updated files):
- ClusterHandshake: pinOrVerify (TOFU) → verifyAgainstKeychain. Removed pinnedKeyPath and all disk-file I/O
- ClusterControlMessage: add ClusterError.peerSEKeyNotPinned
- ClusterCommand (ClusterSetup): add --peer-serial and --peer-ip flags; fetch + pin peer SE key as the first step of `cluster setup`

Usage (run on both Macs before `cluster run`):
Mac A: darkbloom cluster setup --rank 0 --peer-serial <B serial> \
         --peer-ip 169.254.58.74 --coordinator :9999
Mac B: darkbloom cluster setup --rank 1 --peer-serial <A serial> \
         --peer-ip 169.254.106.209 --coordinator 169.254.106.209:9999

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Providers started with `darkbloom serve --rdma-enabled` register themselves with the coordinator as RDMA-capable. Thunderbolt-connected peers automatically discover each other without manual `cluster setup`:

Coordinator:
- Add `rdma_enabled` field to RegisterMessage (protocol)
- Add `RDMAEnabled` field to registry.Provider; set from registration
- Add ListRDMAEnabledPeers() to registry: returns connected providers that have RDMAEnabled=true and completed SE attestation
- Add GET /v1/cluster/rdma-peers endpoint (Privy-authenticated)
- Add GET /v1/cluster/rdma-peers route to server mux

Swift provider:
- Add rdmaEnabled to ProviderMessage.Register, CoordinatorClientConfig, ProviderLoopConfig; wire through codec → registration message
- Add ClusterCoordinatorClient.fetchRDMAPeers() for the new endpoint
- New ClusterDiscovery actor: NWPathMonitor watches wiredEthernet, reads own Thunderbolt IP via getifaddrs, finds peer IP via arp -a, fetches RDMA peer list from coordinator, pins SE key in Keychain, elects rank (lower IPv4 = rank 0), starts ClusterSession or ClusterPeer
- Add --rdma-enabled flag to `darkbloom serve` (Start command)
- Hot-plug: NWPathMonitor re-fires on cable connect/disconnect

Providers without --rdma-enabled are not in the RDMA list, preventing unauthorized cluster connections (DoS mitigation as designed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
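
The rank election rule (lower IPv4 = rank 0) is easy to get wrong with string comparison, so a small Go sketch may help. It assumes the comparison is numeric over the four address bytes; `electRank` is a hypothetical helper, and the two link-local addresses are sample values borrowed from the manual-setup example elsewhere in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// electRank returns 0 when own is the numerically lower IPv4 address, else 1.
func electRank(own, peer string) (int, error) {
	a, err := netip.ParseAddr(own)
	if err != nil {
		return 0, err
	}
	b, err := netip.ParseAddr(peer)
	if err != nil {
		return 0, err
	}
	if !a.Is4() || !b.Is4() {
		return 0, errors.New("rank election expects IPv4 addresses")
	}
	if a == b {
		return 0, errors.New("own and peer address are identical")
	}
	if a.Less(b) {
		return 0, nil
	}
	return 1, nil
}

func main() {
	rank, err := electRank("169.254.58.74", "169.254.106.209")
	if err != nil {
		panic(err)
	}
	fmt.Println(rank) // prints 0
}
```

Both sides run the same function with the arguments swapped, so they always arrive at complementary ranks without exchanging a message. A lexicographic string comparison would pick the opposite result for this pair, since "169.254.106.209" sorts before "169.254.58.74" as text.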

Closes a class of holes in the RDMA auto-discovery surface where the rdma_enabled flag was operator-controlled and the peer-list endpoint was readable by any authenticated user.

Provider (Swift):
- New RDMACapability: chip family M5+ AND /usr/bin/rdma_ctl=enabled. Operator-passed --rdma-enabled is downgraded to false locally when the hardware can't honor it, with a stderr warning.
- AttestationBlob now signs rdmaEnabled (alphabetical, between rdmaDisabled and secureBootEnabled). AttestationBuilder takes the effective value as a parameter.
- StartCommand wires effectiveRDMA through ProviderLoopConfig so the ClusterDiscovery actor and the SE-signed blob carry the same value.

Coordinator (Go):
- AttestationBlob + VerificationResult gain RDMAEnabled (always-emit, matches Swift's non-optional Bool encoding). marshalSortedJSON updated for canonical-JSON parity.
- After attestation.Verify succeeds, msg.RDMAEnabled vs result.RDMAEnabled is reconciled: registry pinned to the attested value via SetRDMAEnabled, mismatches logged.
- GET /v1/cluster/rdma-peers now requires the caller to own a currently-connected RDMA-enabled provider (was: any Privy JWT). Non-providers get 403; caller's own providers are excluded from the response.
- ListRDMAEnabledPeersForCaller filter also requires Attested=true AND Status not in {Untrusted, Offline}. Closes a window where attestation.Verify populates SerialNumber/PublicKey *before* signature verification. Without the Attested gate, tampered providers with valid identifiers could appear in the peer list.

Tests:
- Swift: chip-family gate, rdma_ctl-missing fail-closed, signed blob round-trip, strict-decode requirement.
- Go: VerificationResult.RDMAEnabled round-trip, SE-signed bit tamper detection, SetRDMAEnabled overwrite + no-op, 9 peer-list symmetric-gate scenarios (non-provider/empty/self-tampered/non-RDMA/unattested/tampered/untrusted/offline cases).

Threat model:
- TB-010 (provider_to_provider_cluster) added, T-041..T-045 cover ARP positional matching, ARP retry cancellability, silent re-pin, JWT in memory, peer-list disclosure. T-045 mitigations note both the symmetric-access gate and the Attested+Status filter as implemented.

Plus a small gofmt fix in cluster_handlers.go (pre-existing trailing newline) that was blocking CI.

Picks up Layr-Labs/mlx-swift#2 which exposes the Cmlx library product and enables the jaccl distributed backend that provider-swift's ClusterSession / EncryptedPipelineInference depend on. Without this bump, CI's fresh `swift build -c debug` fails with: product 'Cmlx' required by package 'provider-swift' target 'ProviderCore' not found in package 'mlx-swift'. Tracking issue for the upstream deviation: #193.

Picks up Layr-Labs/mlx-swift-lm#24 which adds the callPartial methods on LlamaModelInner and LlamaModel that EncryptedPipelineInference and PipelineInference need for two-rank pipeline-parallel inference. Without this bump, CI's `swift build` fails with: error: value of type 'LlamaModel' has no member 'callPartial' Related: #193 (upstream mlx-swift distributed deviation tracker).

Force-pushes on Layr-Labs/mlx-swift#2 and Layr-Labs/mlx-swift-lm#24 landed new SHAs (fa6a4e8, c2fbbdc) — bump the submodule pointers to match.

github-actions · 2026-05-21T04:22:49Z

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-21 04:24 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.869s
Throughput	1.0 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	1.752s	1.752s	1.752s	1.752s
parse	4	29µs	28µs	42µs	42µs
reserve	4	4ms	4ms	6ms	6ms
route	4	555µs	567µs	601µs	601µs
coordinator_to_provider	4	1.742s	1.743s	1.744s	1.744s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=29µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=42µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=3.91425ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=5.568ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	20
Success	4
Errors	16
Total Duration	2.384s
Throughput	1.7 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.382s	2.382s	2.384s	2.384s
parse	4	29µs	34µs	35µs	35µs
reserve	4	4ms	5ms	7ms	7ms
route	4	681µs	626µs	995µs	995µs
coordinator_to_provider	4	1.762s	1.762s	1.764s	1.764s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=28.5µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=35µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=4.48575ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=6.713ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	4	0.5 GB
mlx-community/gemma-3-270m-4bit	3	0.2 GB

Metric	Value
Total Requests	50
Success	50
Errors	0
Total Duration	9.56s
Throughput	5.2 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	50	554ms	3ms	3.334s	3.36s
parse	48	20µs	17µs	47µs	56µs
reserve	48	1ms	1ms	4ms	5ms
route	48	1ms	0s	1ms	1ms
coordinator_to_provider	50	450ms	1ms	3.322s	3.35s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=20.062µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=47µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.457ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.517ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	12
Errors	48
Total Duration	3.528s
Throughput	3.4 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.099s	2.104s	2.116s	2.116s
parse	12	16µs	15µs	38µs	38µs
reserve	12	3ms	3ms	4ms	4ms
route	12	1ms	1ms	1ms	1ms
coordinator_to_provider	12	2.09s	2.097s	2.11s	2.11s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=16µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=38µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.762666ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=4.386ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	40
Success	4
Errors	36
Total Duration	3.209s
Throughput	1.2 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.155s	2.155s	2.155s	2.155s
parse	4	18µs	26µs	27µs	27µs
reserve	4	2ms	2ms	3ms	3ms
route	4	621µs	733µs	758µs	758µs
coordinator_to_provider	4	2.149s	2.149s	2.149s	2.149s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=18µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=27µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.38625ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.868ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	60
Errors	0
Total Duration	4.838s
Throughput	12.4 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	60	353ms	5ms	2.101s	2.102s
parse	60	21µs	15µs	46µs	191µs
reserve	60	1ms	1ms	3ms	3ms
route	60	0s	0s	1ms	2ms
coordinator_to_provider	60	349ms	3ms	2.094s	2.096s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=20.883µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=46µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.400433ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.909ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.237s
Throughput	1.2 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.218s	2.218s	2.218s	2.218s
parse	4	15µs	15µs	25µs	25µs
reserve	4	2ms	2ms	2ms	2ms
route	4	646µs	651µs	944µs	944µs
coordinator_to_provider	4	2.212s	2.212s	2.213s	2.213s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=14.5µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=25µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.14825ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.34ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	4.201s
Throughput	7.1 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	753ms	6ms	2.27s	2.27s
parse	30	15µs	14µs	24µs	30µs
reserve	30	2ms	1ms	3ms	3ms
route	30	422µs	404µs	596µs	649µs
coordinator_to_provider	30	749ms	4ms	2.264s	2.265s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=14.566µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=24µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.502633ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.516ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	5	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	4.23s
Throughput	7.1 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	769ms	4ms	2.328s	2.329s
parse	30	19µs	17µs	35µs	43µs
reserve	30	2ms	1ms	3ms	4ms
route	30	420µs	398µs	593µs	819µs
coordinator_to_provider	30	765ms	1ms	2.321s	2.324s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=18.666µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=35µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.6024ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.454ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	100
Success	12
Errors	88
Total Duration	3.484s
Throughput	3.4 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.269s	2.275s	2.296s	2.296s
parse	12	143µs	127µs	228µs	228µs
reserve	12	11ms	11ms	12ms	12ms
route	12	19ms	19ms	19ms	19ms
coordinator_to_provider	12	2.217s	2.222s	2.24s	2.24s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=142.75µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=228µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=10.824583ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=12.326ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

anupsv and others added 2 commits May 15, 2026 16:26

anupsv requested a review from Gajesh2007 May 20, 2026 19:19

Merge branch 'master' into rdma-connection-test

6eec617

anupsv changed the title ~~feat(rdma): standalone RDMA connection smoke test~~ feat(cluster): RDMA smoke test + encrypted pipeline inference over Thunderbolt May 20, 2026

anupsv added 3 commits May 20, 2026 19:41

anupsv force-pushed the rdma-connection-test branch from 7c75cc7 to 16ea9f5 Compare May 21, 2026 04:21

chore(submodule): re-point to re-signed fork commits

65cd3d9

Force-pushes on Layr-Labs/mlx-swift#2 and Layr-Labs/mlx-swift-lm#24 landed new SHAs (fa6a4e8, c2fbbdc) — bump the submodule pointers to match.

anupsv mentioned this pull request May 22, 2026

feat(cluster): TP-default dispatch + TensorParallelEngine scaffold #194

Open

5 tasks

anupsv wants to merge 10 commits into master from rdma-connection-test
