
feat(cluster): RDMA smoke test + encrypted pipeline inference over Thunderbolt#192

Open: anupsv wants to merge 10 commits into master from rdma-connection-test

anupsv (Contributor) commented May 20, 2026

Summary

This PR adds the complete RDMA pipeline inference stack with SE-authenticated encrypted tensor transfer over Thunderbolt, plus fully automated peer discovery.

  1. RDMA connectivity smoke test (rdma-connection-test/) — standalone Swift package that verifies jaccl/RDMA works between two Apple Silicon Macs over Thunderbolt
  2. Encrypted pipeline inference infrastructure — full SE-authenticated, AES-256-GCM-encrypted pipeline-parallel inference stack in ProviderCore/P2P/
  3. Coordinator-verified SE key pinning — replaces TOFU disk files with Keychain storage backed by coordinator MDM attestation
  4. RDMA auto-discovery: darkbloom serve --rdma-enabled automatically finds the Thunderbolt peer without manual serial pairing

Why not plain RDMA for tensor transfer?

RDMA (mlx_distributed_send/recv) is a black-box C API — we can't wrap it with encryption. An attacker with root could inject a librdma.dylib shim and intercept activation tensors in transit. Instead:

  • jaccl / RDMA is used only for collective ops (allreduce) where confidentiality is less critical
  • Activation + token transfer goes over ThunderboltLink TCP with AES-256-GCM (~0.3 ms for 32 MB)

New files

ProviderCore/P2P/

  • ThunderboltLink.swift: TCP framing over wiredEthernet (bridge0), port 7777, 8-byte little-endian length prefix
  • MLXDistributed.swift: Swift wrappers for the jaccl C API
  • ClusterControlMessage.swift: wire types, ClusterFrame encoding, ClusterHealth, ClusterError
  • ClusterHandshake.swift: SE mutual auth + ephemeral X25519 ECDH + HKDF session key
  • ClusterPeerKeychain.swift: stores peer SE keys in the macOS Keychain (kSecAttrAccessibleWhenUnlockedThisDeviceOnly)
  • ClusterCoordinatorClient.swift: coordinator HTTP client (fetchPeerSEKey + fetchRDMAPeers)
  • TensorCrypto.swift: AES-256-GCM seal/open for MLXArray
  • ClusterSession.swift: rank 0 actor (connect loop, ping loop, auto-reconnect) + rank 1 ClusterPeer
  • EncryptedPipelineInference.swift: EncryptedPipelineEngine (rank 0) + EncryptedPipelineServer (rank 1)
  • ClusterDiscovery.swift (NEW): RDMA auto-discovery actor (NWPathMonitor + ARP + coordinator lookup)
  • RDMACapability.swift (NEW): M5+ chip + rdma_ctl status capability gate (downgrades --rdma-enabled when the hardware can't honor it)
  • Security/AttestationBuilder.swift: adds an rdmaEnabled field to the signed AttestationBlob; the builder takes the effective value as a parameter

Coordinator

  • protocol/messages.go: add rdma_enabled to RegisterMessage
  • attestation/attestation.go: add RDMAEnabled to AttestationBlob + VerificationResult; include it in marshalSortedJSON
  • registry/registry.go: add RDMAEnabled to Provider; add SetRDMAEnabled() (the coordinator pins the registry to the attested value on mismatch); ListRDMAEnabledPeersForCaller() enforces the symmetric-access, Attested, and Status filters
  • api/cluster_handlers.go: GET /v1/cluster/peer-key + GET /v1/cluster/rdma-peers (rdma-peers returns 403 unless the caller owns an Attested RDMA provider)
  • api/provider.go: after attestation.Verify, reconcile msg.RDMAEnabled against result.RDMAEnabled; the attested value wins and the mismatch is logged
  • api/server.go: register both cluster endpoints
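The reconcile step in api/provider.go (attested value wins, mismatch logged) reduces to a small pure function. A minimal Go sketch, assuming simplified stand-ins for RegisterMessage and VerificationResult; the type names, the serial parameter, and the fail-closed choice for an invalid signature are assumptions for illustration, not the coordinator's actual code.

```go
package main

import "log"

// registerMessage is a simplified, hypothetical stand-in for the unsigned RegisterMessage.
type registerMessage struct {
	RDMAEnabled bool
}

// verificationResult is a simplified, hypothetical stand-in for attestation.Verify output.
type verificationResult struct {
	Valid       bool // SE signature over the AttestationBlob checked out
	RDMAEnabled bool // rdmaEnabled as read from the signed blob
}

// reconcileRDMA returns the RDMA flag the registry should store for a provider.
// The signed value always wins; a disagreement is logged for follow-up.
func reconcileRDMA(msg registerMessage, res verificationResult, serial string) bool {
	if !res.Valid {
		// Assumption: with no trustworthy signed value, fail closed.
		return false
	}
	if msg.RDMAEnabled != res.RDMAEnabled {
		log.Printf("provider %s: register says rdma_enabled=%v, attestation says %v; pinning to attested value",
			serial, msg.RDMAEnabled, res.RDMAEnabled)
	}
	return res.RDMAEnabled
}
```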

Protocol flows

Handshake (once per connection)

Rank 0 → Rank 1:  HandshakeHello { se_pub₀, x25519_pub₀, nonce₀, sig₀ }
                  sig₀ = SE_sign(SHA256(x25519_pub₀ ∥ nonce₀))

Rank 1 → Rank 0:  HandshakeAck   { se_pub₁, x25519_pub₁, nonce₁, sig₁ }
                  sig₁ = SE_sign(SHA256(nonce₀ ∥ nonce₁ ∥ x25519_pub₁))

Session key = HKDF-SHA256(X25519(sk_local, pk_peer), salt: nonce₀ ∥ nonce₁)

SE key pinning is coordinator-verified: keys come from MDM attestation records,
stored in macOS Keychain (kSecAttrAccessibleWhenUnlockedThisDeviceOnly).

Inference (per token)

Rank 0 → Rank 1:  inferenceStep  { AES-GCM(seqLen ∥ activation_tensor) }
Rank 1 → Rank 0:  inferenceToken { AES-GCM(tokenID) }
… repeat until stop sentinel: inferenceStep { AES-GCM(seqLen = -1) }
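One inferenceStep payload can be sketched as plaintext seqLen ∥ activation sealed with AES-256-GCM in the nonce(12) + ciphertext + tag(16) layout given in the TensorCrypto commit message. In this Go sketch the int32 little-endian seqLen, the float32 activation encoding, and the random per-message nonce are assumptions; the real code works on MLXArray and may use a different dtype or nonce scheme.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"errors"
	"math"
)

// stopSentinel marks the end of a generation (inferenceStep with seqLen = -1).
const stopSentinel int32 = -1

// encodeStep builds the plaintext seqLen ∥ activation.
// Assumed layout: int32 little-endian seqLen, then float32 little-endian values.
func encodeStep(seqLen int32, activation []float32) []byte {
	buf := make([]byte, 4+4*len(activation))
	binary.LittleEndian.PutUint32(buf[:4], uint32(seqLen))
	for i, v := range activation {
		binary.LittleEndian.PutUint32(buf[4+4*i:], math.Float32bits(v))
	}
	return buf
}

// decodeStep reverses encodeStep.
func decodeStep(pt []byte) (int32, []float32, error) {
	if len(pt) < 4 || (len(pt)-4)%4 != 0 {
		return 0, nil, errors.New("malformed step plaintext")
	}
	seqLen := int32(binary.LittleEndian.Uint32(pt[:4]))
	act := make([]float32, (len(pt)-4)/4)
	for i := range act {
		act[i] = math.Float32frombits(binary.LittleEndian.Uint32(pt[4+4*i:]))
	}
	return seqLen, act, nil
}

func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256-GCM needs a 32-byte session key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealFrame returns nonce(12) ∥ ciphertext ∥ tag(16); Seal appends the tag.
func sealFrame(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openFrame splits off the nonce, then authenticates and decrypts the rest.
func openFrame(key, frame []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(frame) < gcm.NonceSize()+gcm.Overhead() {
		return nil, errors.New("frame too short")
	}
	nonce, ct := frame[:gcm.NonceSize()], frame[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}
```

Flipping any byte of a sealed frame makes openFrame fail, which is what stops an on-path process from silently altering activations. Random 96-bit nonces are only safe for a bounded number of messages per key; a per-direction counter is the usual alternative.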

RDMA auto-discovery flow

End-to-end, including the capability gate (binary refuses to honor --rdma-enabled on non-M5 hardware), the SE-signed rdmaEnabled field in the attestation blob, and the coordinator's symmetric-access gate on /v1/cluster/rdma-peers.

sequenceDiagram
    participant MacA as Mac A (M5, --rdma-enabled)
    participant SE_A as SE (Mac A)
    participant Coord as Coordinator
    participant MacB as Mac B (M5, --rdma-enabled)

    rect rgb(230, 240, 255)
        Note over MacA,MacB: Registration — capability gate runs once, locally
        Note over MacA: RDMACapability.isAvailable()<br/>chip = M5+? rdma_ctl = enabled?<br/>(else effectiveRDMA = false)
        MacA->>SE_A: sign AttestationBlob { ..., rdmaEnabled: true }
        SE_A-->>MacA: signature
        MacA->>Coord: register { rdma_enabled: true, attestation }
        Coord->>Coord: Verify SE signature<br/>Check msg.rdma_enabled == blob.rdmaEnabled<br/>(mismatch → registry pinned to attested value)
        Coord->>Coord: SetAttested(true) — provider now peer-list eligible
        MacB->>Coord: register { rdma_enabled: true, attestation }
    end

    rect rgb(235, 250, 235)
        Note over MacA: Thunderbolt cable plugged in
        Note over MacA: NWPathMonitor fires (wiredEthernet up)
        MacA->>MacA: getifaddrs → own IP (169.254.x.x)
        MacA->>Coord: GET /v1/cluster/rdma-peers (Bearer JWT)
        Note over Coord: Symmetric gate:<br/>caller account must own an<br/>Attested + Online RDMA provider<br/>(else 403)<br/>Filter: rdma && serial && seKey<br/>&& Attested && Status ∉ {Untrusted, Offline}<br/>Excludes caller's own providers
        Coord-->>MacA: [{ serial: MacB, se_public_key, trust_level }]
        MacA->>MacA: arp -a -i bridge100 → peer IP
        MacA->>MacA: Keychain.store(peerSEKey, peerIP)
    end

    rect rgb(255, 245, 230)
        Note over MacA,MacB: Rank election (deterministic, no coordination)
        alt own IP < peer IP (rank 0)
            MacA->>MacA: ClusterSession.start() → connect to MacB
            Note over MacA,MacB: ClusterHandshake + ECDH session key
            MacA-->>MacB: ping every 5 s
        else own IP > peer IP (rank 1)
            MacA->>MacA: ClusterPeer.serve() → listen for MacB
        end
    end
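The rank election rule (own IP < peer IP means rank 0) only works if both sides compare addresses numerically; a plain string comparison would order 169.254.9.1 after 169.254.10.1. A small Go sketch, assuming IPv4 link-local addresses and a hypothetical electRank helper that refuses to elect when both addresses are equal.

```go
package main

import (
	"bytes"
	"errors"
	"net"
)

// electRank returns 0 if ownIP < peerIP as unsigned 32-bit IPv4 values, else 1.
// Byte-wise comparison of the 4-byte big-endian form equals numeric comparison.
func electRank(ownIP, peerIP string) (int, error) {
	own := net.ParseIP(ownIP).To4()
	peer := net.ParseIP(peerIP).To4()
	if own == nil || peer == nil {
		return 0, errors.New("both addresses must be IPv4")
	}
	switch bytes.Compare(own, peer) {
	case -1:
		return 0, nil // lower address connects (ClusterSession)
	case 1:
		return 1, nil // higher address listens (ClusterPeer)
	default:
		return 0, errors.New("own IP equals peer IP; cannot elect a rank")
	}
}
```

Both Macs run the same rule on the same pair of addresses, so they reach opposite ranks without exchanging any messages.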

Capability gate + tamper rejection

How the system handles operators who misuse --rdma-enabled and OS-tampered providers that lie in the unsigned registration field.

sequenceDiagram
    participant Op as Operator
    participant Bin as Provider binary
    participant SE as Secure Enclave
    participant Coord as Coordinator

    rect rgb(255, 240, 240)
        Note over Op,Bin: Path A — operator on non-M5 (M3/M4) passes --rdma-enabled
        Op->>Bin: darkbloom serve --rdma-enabled
        Bin->>Bin: RDMACapability.unavailableReason()<br/>"this Mac is not M5 or later"
        Bin-->>Op: stderr: "--rdma-enabled ignored: ..."
        Note over Bin: effectiveRDMA = false
        Bin->>SE: sign blob { rdmaEnabled: false }
        Bin->>Coord: register { rdma_enabled: false, attestation }
        Note over Coord: Provider excluded from peer list<br/>(rdma == false in registry)
    end

    rect rgb(255, 240, 240)
        Note over Bin,Coord: Path B — OS-tampered: flip msg.rdma_enabled but SE only signs effective value
        Bin->>SE: sign blob { rdmaEnabled: false }<br/>(capability gate ran honestly)
        SE-->>Bin: signature
        Note over Bin: Tampered shim flips unsigned field
        Bin->>Coord: register { rdma_enabled: true, attestation }
        Coord->>Coord: Verify SE signature ✓<br/>msg.rdma_enabled=true ≠ blob.rdmaEnabled=false
        Coord->>Coord: WARN log + SetRDMAEnabled(false)<br/>(attested value wins)
        Note over Coord: Registry RDMAEnabled = false<br/>provider not in peer list
    end

    rect rgb(255, 240, 240)
        Note over Bin,Coord: Path C — Bad-signature attestation (forged blob)
        Bin->>Coord: register { rdma_enabled: true, tampered attestation }
        Coord->>Coord: Verify SE signature ✗<br/>SetAttestationResult(Valid=false)<br/>MarkUntrusted (if policy)
        Note over Coord: AttestationResult.SerialNumber & PublicKey<br/>are populated (pre-sigcheck) but Attested=false<br/>and Status=Untrusted →<br/>peer-list filter excludes
    end
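The Path A downgrade can be sketched as a pure function over two strings: the chip brand and the rdma_ctl status output. Assumptions in this Go sketch: the brand string looks like the value of sysctl machdep.cpu.brand_string (for example "Apple M5 Max"), rdma_ctl status prints exactly "enabled" when RDMA is on, and unavailableReason/effectiveRDMA are hypothetical stand-ins for RDMACapability.

```go
package main

import (
	"regexp"
	"strconv"
	"strings"
)

// chipPattern extracts the generation number from a brand like "Apple M5 Max".
var chipPattern = regexp.MustCompile(`^Apple M(\d+)\b`)

// unavailableReason returns "" when RDMA can be honored, else a human-readable reason.
func unavailableReason(chipBrand, rdmaCtlStatus string) string {
	m := chipPattern.FindStringSubmatch(strings.TrimSpace(chipBrand))
	if m == nil {
		return "this Mac is not M5 or later"
	}
	gen, err := strconv.Atoi(m[1])
	if err != nil || gen < 5 {
		return "this Mac is not M5 or later"
	}
	if strings.TrimSpace(rdmaCtlStatus) != "enabled" {
		return "RDMA is not enabled (rdma_ctl status is not \"enabled\")"
	}
	return ""
}

// effectiveRDMA combines the operator's flag with the hardware gate and returns
// the value to sign into the AttestationBlob plus an optional stderr warning.
func effectiveRDMA(requested bool, chipBrand, rdmaCtlStatus string) (bool, string) {
	if !requested {
		return false, ""
	}
	if reason := unavailableReason(chipBrand, rdmaCtlStatus); reason != "" {
		return false, "--rdma-enabled ignored: " + reason
	}
	return true, ""
}
```

Only the effective value should reach the AttestationBlob, so whatever the operator typed, the SE signs what the hardware can actually do.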

Full handshake + inference flow

sequenceDiagram
    participant R0 as Rank 0
    participant SE0 as SE (Rank 0)
    participant SE1 as SE (Rank 1)
    participant R1 as Rank 1

    rect rgb(230, 255, 230)
        Note over R0,R1: Handshake
        R0->>SE0: sign SHA256(x25519_pub₀ ∥ nonce₀)
        SE0-->>R0: sig₀
        R0->>R1: HandshakeHello {se_pub₀, x25519_pub₀, nonce₀, sig₀}
        Note right of R1: Keychain verify se_pub₀
        R1->>SE1: verify sig₀
        R1->>SE1: sign SHA256(nonce₀ ∥ nonce₁ ∥ x25519_pub₁)
        SE1-->>R1: sig₁
        R1->>R0: HandshakeAck {se_pub₁, x25519_pub₁, nonce₁, sig₁}
        Note left of R0: Keychain verify se_pub₁, verify sig₁
        R0->>R0: sessionKey = HKDF(X25519(sk₀,pk₁), nonce₀∥nonce₁)
        R1->>R1: sessionKey = HKDF(X25519(sk₁,pk₀), nonce₀∥nonce₁)
    end

    rect rgb(255, 250, 220)
        Note over R0,R1: Ping loop (every 5 s)
        loop Health monitor
            R0->>R1: ping
            R1-->>R0: pong {modelLoaded, memoryPressure}
        end
    end

    rect rgb(255, 235, 230)
        Note over R0,R1: Inference
        loop Each decode step
            R0->>R0: forward layers [0, splitLayer)
            R0->>R1: inferenceStep {AES-GCM(seqLen ∥ activation)}
            R1->>R1: forward layers [splitLayer, N) + argmax
            R1-->>R0: inferenceToken {AES-GCM(tokenID)}
        end
        R0->>R1: inferenceStep {AES-GCM(seqLen=-1)}
    end

Reconnect flow

3 missed pings → health = .unavailable → conn.cancel()
sleep(10 s) → TCP reconnect → new ephemeral X25519 pair → fresh handshake

Usage

Auto-discovery (new)

# Both Macs — plug in Thunderbolt cable, then:
darkbloom serve --rdma-enabled
# ClusterDiscovery auto-detects the peer, elects rank, establishes session.

Manual setup (still supported)

darkbloom cluster setup --peer-serial C02XXXXXXX --peer-ip 169.254.58.74
darkbloom cluster run

DoS protection / tamper resistance

Defense in depth — every layer below has to be defeated to enroll a malicious peer:

  1. Operator opt-in. Providers that start without --rdma-enabled send rdma_enabled: false. The coordinator never includes them in GET /v1/cluster/rdma-peers. A Thunderbolt connection from a non-RDMA peer is rejected during handshake (SE key not pinned → ClusterError.peerSEKeyNotPinned).
  2. Hardware capability gate. RDMACapability.isAvailable() checks chip family is M5+ AND /usr/bin/rdma_ctl status == "enabled". If either fails, the binary downgrades effectiveRDMA to false regardless of what the operator typed, prints a warning, and signs rdmaEnabled: false into the attestation blob.
  3. SE-signed rdmaEnabled. The flag lives inside the SE-signed AttestationBlob (alphabetical position between rdmaDisabled and secureBootEnabled). A compromised OS that tries to flip the unsigned RegisterMessage.rdma_enabled post-hoc creates a mismatch; the coordinator detects it (provider.go: SetRDMAEnabled(attested value)) and pins the registry to the attested value, logging the discrepancy.
  4. Coordinator symmetric-access gate. GET /v1/cluster/rdma-peers requires the calling account to own at least one currently connected RDMA-enabled provider. Consumers and accounts without a matching provider receive 403. The caller's own providers are excluded from the response.
  5. Attestation-validity filter. The peer list excludes any provider where Attested != true OR Status ∈ {Untrusted, Offline}. This closes a subtle window where attestation.Verify populates SerialNumber/PublicKey on the result before checking the signature — a tampered provider with valid-looking identifiers would otherwise satisfy the presence-of-identifiers check.
  6. SE mutual-auth handshake. Even if all the above were somehow bypassed, the cluster handshake still requires possession of the SE private key matching the pinned public key. A wrong-machine connection fails closed (peerSEKeyNotPinned or signature mismatch).
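Layers 4 and 5 together amount to one access check plus one filter on the coordinator. A Go sketch with a hypothetical, simplified provider record and status enum; the field names mirror the auto-discovery diagram (rdma, serial, seKey, Attested, Status) but are not the registry's real types.

```go
package main

import "errors"

type status int

const (
	online status = iota
	offline
	untrusted
)

// provider is a hypothetical, trimmed-down registry record.
type provider struct {
	AccountID   string
	Serial      string
	SEPublicKey string
	RDMAEnabled bool
	Attested    bool
	Status      status
}

var errForbidden = errors.New("403: caller owns no attested, online RDMA provider")

// eligible applies the layer-5 filter: rdma && serial && seKey && Attested
// && Status not in {Untrusted, Offline}.
func eligible(p provider) bool {
	return p.RDMAEnabled && p.Serial != "" && p.SEPublicKey != "" &&
		p.Attested && p.Status != untrusted && p.Status != offline
}

// listRDMAPeersForCaller applies the layer-4 symmetric gate, then returns
// eligible providers owned by other accounts.
func listRDMAPeersForCaller(caller string, all []provider) ([]provider, error) {
	ownsEligible := false
	for _, p := range all {
		if p.AccountID == caller && eligible(p) {
			ownsEligible = true
			break
		}
	}
	if !ownsEligible {
		return nil, errForbidden
	}
	var peers []provider
	for _, p := range all {
		if p.AccountID != caller && eligible(p) {
			peers = append(peers, p)
		}
	}
	return peers, nil
}
```

Running the same eligible check on the caller's own providers is what makes the gate symmetric: an account can only discover peers that it could itself be discovered as.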

anupsv and others added 2 commits May 15, 2026 16:26
…in-backed

Reflects commit 4a0dae5 (PersistentEnclaveKey.swift). Key changes:

- TB-003 how_it_works: document Security framework persistent key, access group
  SLDQ2GJ6TL.io.darkbloom.provider, kSecAttrIsPermanent, and the
  errSecMissingEntitlement fallback behaviour on patched binaries
- TB-003 current_limitations: add two new limitations — team-scoped cross-binary
  keychain access, and silent ephemeral fallback that defers rejection to the
  coordinator rather than failing at the process boundary
- TB-009 how_it_works: rewrite SE key lifecycle section to reflect persistent
  identity across restarts; rotation now requires explicit keychain deletion
- T-013 (binary tampering) mitigations: add keychain access group enforcement as
  a fourth, implemented mitigation; update detection_hint
- T-033 (attestation replay) affected_files: add PersistentEnclaveKey.swift and
  AttestationSigner.swift; update mitigation wording
- T-035 (repudiation after rotation) description, mitigations, detection_hint:
  reframe rotation as an explicit operator action rather than automatic per
  launch; note kSecAttrIsPermanent as a positive mitigation; add open finding
  that coordinator cannot detect opportunistic keychain delete + re-registration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Minimal Swift package that tests jaccl/RDMA connectivity between two
Apple Silicon Macs over Thunderbolt without any model stack.

rdma-ping initializes a DistributedGroup via jaccl (MLX distributed
backend), runs N rounds of all_sum across both ranks, verifies
correctness, and reports per-round latency.

Tested live: two M5 Max (128 GB) over Thunderbolt at ~40 µs avg
latency, 20/20 rounds correct.

Usage:
  # Mac A (rank 0):
  ./rdma-ping --rank 0 --coordinator 169.254.106.209:9999 --rdma-device rdma_en2

  # Mac B (rank 1) simultaneously:
  ./rdma-ping --rank 1 --coordinator 169.254.106.209:9999 --rdma-device rdma_en1

Requires macOS 26.2+, RDMA enabled in Recovery, Thunderbolt cable,
and mlx.metallib symlinked next to the binary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vercel Bot commented May 20, 2026

The latest updates on your projects:

  • d-inference: Ready (Preview), updated May 21, 2026 4:23am UTC
  • d-inference-console-ui-dev: Ready (Preview), updated May 21, 2026 4:23am UTC
  • d-inference-landing: Ready (Preview), updated May 21, 2026 4:23am UTC


github-actions Bot commented May 20, 2026

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-20 19:14 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 4.855s
Throughput 0.8 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.16s 2.16s 2.16s 2.16s
parse 4 27µs 36µs 50µs 50µs
reserve 4 4ms 4ms 5ms 5ms
route 4 534µs 607µs 662µs 662µs
coordinator_to_provider 4 2.153s 2.154s 2.154s 2.154s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=26.75µs (threshold=1ms)
parse:p95<=5ms PASS p95=50µs (threshold=5ms)
reserve:mean<=50ms PASS mean=3.5335ms (threshold=50ms)
reserve:p95<=200ms PASS p95=5.232ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 20
Success 4
Errors 16
Total Duration 2.776s
Throughput 1.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.772s 2.774s 2.776s 2.776s
parse 4 16µs 16µs 20µs 20µs
reserve 4 3ms 3ms 4ms 4ms
route 4 358µs 354µs 386µs 386µs
coordinator_to_provider 4 2.001s 2.002s 2.003s 2.003s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=15.5µs (threshold=1ms)
parse:p95<=5ms PASS p95=20µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.6335ms (threshold=50ms)
reserve:p95<=200ms PASS p95=3.86ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 4 0.5 GB
mlx-community/gemma-3-270m-4bit 3 0.2 GB
Metric Value
Total Requests 50
Success 50
Errors 0
Total Duration 10.39s
Throughput 4.8 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 50 593ms 3ms 3.517s 3.532s
parse 48 17µs 15µs 30µs 44µs
reserve 48 1ms 1ms 2ms 3ms
route 48 390µs 380µs 527µs 627µs
coordinator_to_provider 50 489ms 1ms 3.482s 3.523s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=16.875µs (threshold=1ms)
parse:p95<=5ms PASS p95=30µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.27002ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.152ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 12
Errors 48
Total Duration 3.321s
Throughput 3.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.122s 2.135s 2.145s 2.145s
parse 12 13µs 11µs 25µs 25µs
reserve 12 3ms 3ms 4ms 4ms
route 12 1ms 1ms 1ms 1ms
coordinator_to_provider 12 2.114s 2.128s 2.138s 2.138s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=13.333µs (threshold=1ms)
parse:p95<=5ms PASS p95=25µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.75475ms (threshold=50ms)
reserve:p95<=200ms PASS p95=4.279ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 40
Success 4
Errors 36
Total Duration 2.761s
Throughput 1.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 1.948s 1.948s 1.948s 1.948s
parse 4 13µs 13µs 22µs 22µs
reserve 4 2ms 2ms 2ms 2ms
route 4 1ms 1ms 1ms 1ms
coordinator_to_provider 4 1.942s 1.942s 1.942s 1.942s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=13µs (threshold=1ms)
parse:p95<=5ms PASS p95=22µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.23725ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.466ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 4.867s
Throughput 12.3 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 345ms 5ms 2.054s 2.054s
parse 60 18µs 15µs 35µs 112µs
reserve 60 1ms 1ms 2ms 3ms
route 60 420µs 381µs 711µs 851µs
coordinator_to_provider 60 342ms 3ms 2.047s 2.048s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=17.7µs (threshold=1ms)
parse:p95<=5ms PASS p95=35µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.199033ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.437ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 3.272s
Throughput 1.2 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.312s 2.312s 2.313s 2.313s
parse 4 20µs 18µs 33µs 33µs
reserve 4 2ms 2ms 3ms 3ms
route 4 1ms 1ms 2ms 2ms
coordinator_to_provider 4 2.306s 2.306s 2.306s 2.306s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=19.75µs (threshold=1ms)
parse:p95<=5ms PASS p95=33µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.309ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.745ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 3.968s
Throughput 7.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 716ms 6ms 2.174s 2.174s
parse 30 19µs 17µs 38µs 39µs
reserve 30 2ms 1ms 4ms 6ms
route 30 475µs 441µs 704µs 768µs
coordinator_to_provider 30 710ms 4ms 2.164s 2.164s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=18.7µs (threshold=1ms)
parse:p95<=5ms PASS p95=38µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.0087ms (threshold=50ms)
reserve:p95<=200ms PASS p95=4.25ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 5 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 4.22s
Throughput 7.1 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 779ms 4ms 2.399s 2.399s
parse 30 18µs 18µs 31µs 42µs
reserve 30 2ms 1ms 4ms 4ms
route 30 1ms 0s 1ms 2ms
coordinator_to_provider 30 775ms 1ms 2.392s 2.392s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=18.3µs (threshold=1ms)
parse:p95<=5ms PASS p95=31µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.702566ms (threshold=50ms)
reserve:p95<=200ms PASS p95=3.934ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 100
Success 12
Errors 88
Total Duration 3.188s
Throughput 3.8 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.091s 2.081s 2.115s 2.115s
parse 12 104µs 97µs 175µs 175µs
reserve 12 8ms 8ms 10ms 10ms
route 12 17ms 17ms 18ms 18ms
coordinator_to_provider 12 2.051s 2.041s 2.075s 2.075s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=104.083µs (threshold=1ms)
parse:p95<=5ms PASS p95=175µs (threshold=5ms)
reserve:mean<=50ms PASS mean=8.257ms (threshold=50ms)
reserve:p95<=200ms PASS p95=9.557ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

anupsv requested a review from Gajesh2007 May 20, 2026 19:19
github-actions Bot commented May 20, 2026

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-20 19:22 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 3.846s
Throughput 1.0 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 1.747s 1.747s 1.747s 1.747s
parse 4 28µs 21µs 58µs 58µs
reserve 4 4ms 5ms 6ms 6ms
route 4 577µs 619µs 656µs 656µs
coordinator_to_provider 4 1.737s 1.738s 1.739s 1.739s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=27.75µs (threshold=1ms)
parse:p95<=5ms PASS p95=58µs (threshold=5ms)
reserve:mean<=50ms PASS mean=4.038ms (threshold=50ms)
reserve:p95<=200ms PASS p95=5.946ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 20
Success 4
Errors 16
Total Duration 2.333s
Throughput 1.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.33s 2.331s 2.333s 2.333s
parse 4 21µs 24µs 25µs 25µs
reserve 4 3ms 4ms 5ms 5ms
route 4 482µs 439µs 673µs 673µs
coordinator_to_provider 4 1.744s 1.744s 1.745s 1.745s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=21.25µs (threshold=1ms)
parse:p95<=5ms PASS p95=25µs (threshold=5ms)
reserve:mean<=50ms PASS mean=3.26275ms (threshold=50ms)
reserve:p95<=200ms PASS p95=4.548ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 4 0.5 GB
mlx-community/gemma-3-270m-4bit 3 0.2 GB
Metric Value
Total Requests 50
Success 50
Errors 0
Total Duration 9.473s
Throughput 5.3 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 50 547ms 3ms 3.297s 3.326s
parse 49 21µs 16µs 56µs 123µs
reserve 49 2ms 1ms 8ms 8ms
route 49 0s 0s 1ms 1ms
coordinator_to_provider 50 492ms 1ms 3.282s 3.31s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=21.244µs (threshold=1ms)
parse:p95<=5ms PASS p95=56µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.869775ms (threshold=50ms)
reserve:p95<=200ms PASS p95=7.718ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 12
Errors 48
Total Duration 3.426s
Throughput 3.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.244s 2.241s 2.258s 2.258s
parse 12 13µs 14µs 26µs 26µs
reserve 12 3ms 3ms 4ms 4ms
route 12 1ms 1ms 1ms 1ms
coordinator_to_provider 12 2.235s 2.232s 2.25s 2.25s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=13.416µs (threshold=1ms)
parse:p95<=5ms PASS p95=26µs (threshold=5ms)
reserve:mean<=50ms PASS mean=3.30775ms (threshold=50ms)
reserve:p95<=200ms PASS p95=3.987ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 40
Success 4
Errors 36
Total Duration 3.136s
Throughput 1.3 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.09s 2.09s 2.09s 2.09s
parse 4 14µs 16µs 20µs 20µs
reserve 4 2ms 2ms 2ms 2ms
route 4 593µs 606µs 649µs 649µs
coordinator_to_provider 4 2.084s 2.084s 2.084s 2.084s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=13.75µs (threshold=1ms)
parse:p95<=5ms PASS p95=20µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.2685ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.347ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 5.188s
Throughput 11.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 370ms 5ms 2.207s 2.207s
parse 60 17µs 14µs 32µs 44µs
reserve 60 2ms 1ms 5ms 5ms
route 60 423µs 407µs 699µs 801µs
coordinator_to_provider 60 367ms 3ms 2.195s 2.2s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=17.1µs (threshold=1ms)
parse:p95<=5ms PASS p95=32µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.524766ms (threshold=50ms)
reserve:p95<=200ms PASS p95=4.558ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 3.317s
Throughput 1.2 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.204s 2.204s 2.204s 2.204s
parse 4 25µs 26µs 31µs 31µs
reserve 4 3ms 3ms 3ms 3ms
route 4 708µs 732µs 813µs 813µs
coordinator_to_provider 4 2.194s 2.195s 2.195s 2.195s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=24.5µs (threshold=1ms)
parse:p95<=5ms PASS p95=31µs (threshold=5ms)
reserve:mean<=50ms PASS mean=3.23925ms (threshold=50ms)
reserve:p95<=200ms PASS p95=3.462ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 4.167s
Throughput 7.2 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 735ms 6ms 2.202s 2.202s
parse 30 15µs 14µs 29µs 34µs
reserve 30 2ms 1ms 3ms 3ms
route 30 1ms 0s 1ms 1ms
coordinator_to_provider 30 731ms 3ms 2.196s 2.196s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=15.233µs (threshold=1ms)
parse:p95<=5ms PASS p95=29µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.560833ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.56ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 5 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 4.32s
Throughput 6.9 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 773ms 4ms 2.362s 2.362s
parse 30 18µs 16µs 33µs 51µs
reserve 30 2ms 1ms 6ms 6ms
route 30 1ms 0s 1ms 1ms
coordinator_to_provider 30 767ms 1ms 2.349s 2.352s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=17.766µs (threshold=1ms)
parse:p95<=5ms PASS p95=33µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.2053ms (threshold=50ms)
reserve:p95<=200ms PASS p95=6.304ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 100
Success 12
Errors 88
Total Duration 3.047s
Throughput 3.9 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.126s 2.134s 2.134s 2.134s
parse 12 89µs 86µs 119µs 119µs
reserve 12 8ms 8ms 9ms 9ms
route 12 18ms 19ms 19ms 19ms
coordinator_to_provider 12 2.092s 2.098s 2.099s 2.099s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=89.083µs (threshold=1ms)
parse:p95<=5ms PASS p95=119µs (threshold=5ms)
reserve:mean<=50ms PASS mean=7.937333ms (threshold=50ms)
reserve:p95<=200ms PASS p95=8.881ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

Implements secure two-Mac pipeline-parallel inference using SE-backed
mutual authentication and AES-256-GCM for all pipeline tensor data
(activations and token IDs) in transit. RDMA is used only for
jaccl-based collective ops (allreduce); the pipeline activation/token
transfer path runs over ThunderboltLink TCP with cryptographic
protection, so it never passes through librdma (an injected librdma
shim cannot see it) and anything intercepted on the link is
authenticated ciphertext.
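A minimal Go sketch of the seal/open step described here, using the TensorCrypto wire layout listed in this commit (nonce(12) + ciphertext + tag(16)). The function names are hypothetical, the payload is treated as opaque serialized tensor bytes, and the random per-frame nonce is an assumption; the Swift implementation may derive nonces differently (for example from a counter).

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newAES256GCM builds an AES-256-GCM AEAD (12-byte nonce, 16-byte tag).
func newAES256GCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte session key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealFrame encrypts a serialized tensor payload and returns
// nonce(12) || ciphertext || tag(16).
func sealFrame(key, payload []byte) ([]byte, error) {
	aead, err := newAES256GCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext||tag after the nonce bytes.
	return aead.Seal(nonce, nonce, payload, nil), nil
}

// openFrame splits off the nonce, then authenticates and decrypts.
// Any bit flip in nonce, ciphertext, or tag makes Open fail.
func openFrame(key, frame []byte) ([]byte, error) {
	aead, err := newAES256GCM(key)
	if err != nil {
		return nil, err
	}
	if len(frame) < aead.NonceSize()+aead.Overhead() {
		return nil, errors.New("frame shorter than nonce + tag")
	}
	nonce, sealed := frame[:aead.NonceSize()], frame[aead.NonceSize():]
	return aead.Open(nil, nonce, sealed, nil)
}
```

With random 96-bit nonces, one session key should not seal more than roughly 2^32 frames; re-running the handshake on each reconnect, as ClusterSession does, also rotates the key.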

New P2P infrastructure:
- ThunderboltLink: TCP framing layer over wiredEthernet (bridge0), port
  7777, 8-byte LE length-prefixed frames
- MLXDistributed: Swift wrappers for jaccl C API (DistributedGroup,
  allSum, allGather, distributedSend/recv)
- ClusterControlMessage: wire types (HandshakeHello/Ack, PongPayload),
  ClusterFrame encoding, ClusterHealth, ClusterError
- ClusterHandshake: SE mutual auth + ephemeral X25519 ECDH session key
  (HKDF-SHA256, nonce0||nonce1 salt, "darkbloom-cluster-session-v1"
  info). TOFU pinning: peer SE key stored at
  /etc/darkbloom/cluster-peer-{IP}.sekey on first connection and
  verified on all subsequent connections
- TensorCrypto: AES-256-GCM seal/open for MLXArray activations and
  token IDs. Wire format: nonce(12) + ciphertext + tag(16). Shape
  derived as [1, seqLen, hiddenDim] — seqLen embedded in plaintext
- ClusterSession: ClusterSession actor (rank 0) with a connect loop,
  a 5 s ping loop, teardown after 3 missed pings, and auto-reconnect
  with a fresh handshake each time. The ClusterPeer class (rank 1)
  dispatches ping/pong and inference frames
- EncryptedPipelineInference: EncryptedPipelineEngine actor (rank 0)
  and EncryptedPipelineServer class (rank 1). Per-step retry: 3
  attempts × 3 s delay, then ClusterError.serviceUnavailable (HTTP 429)
- PipelineInference: jaccl-based pipeline engine (kept for reference;
  encrypted path is preferred for production)
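The ClusterHandshake key schedule above can be sketched in Go with the standard library: ephemeral X25519 via crypto/ecdh, then HKDF-SHA256 with salt nonce0||nonce1 and info "darkbloom-cluster-session-v1". The helper names, the 32-byte output (sized for the AES-256-GCM key), and the hand-rolled RFC 5869 HKDF are assumptions for illustration; the SE mutual-auth signatures over the handshake are omitted.

```go
package main

import (
	"crypto/ecdh"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
)

const sessionInfo = "darkbloom-cluster-session-v1"

// newEphemeral generates a fresh X25519 key pair for one handshake.
func newEphemeral() (*ecdh.PrivateKey, error) {
	return ecdh.X25519().GenerateKey(rand.Reader)
}

// hkdfSHA256 is a minimal RFC 5869 HKDF (extract, then expand) built on
// crypto/hmac; valid for length <= 255*32 bytes.
func hkdfSHA256(ikm, salt, info []byte, length int) []byte {
	extract := hmac.New(sha256.New, salt)
	extract.Write(ikm)
	prk := extract.Sum(nil)

	var okm, block []byte
	for counter := byte(1); len(okm) < length; counter++ {
		expand := hmac.New(sha256.New, prk)
		expand.Write(block) // T(i-1), empty on the first round
		expand.Write(info)
		expand.Write([]byte{counter})
		block = expand.Sum(nil)
		okm = append(okm, block...)
	}
	return okm[:length]
}

// deriveSessionKey runs on both ranks with their own private key and the
// peer's ephemeral public key; nonce0/nonce1 must be concatenated in the
// same order on both sides or the keys will not match.
func deriveSessionKey(priv *ecdh.PrivateKey, peerPub *ecdh.PublicKey, nonce0, nonce1 []byte) ([]byte, error) {
	shared, err := priv.ECDH(peerPub)
	if err != nil {
		return nil, err
	}
	salt := append(append([]byte{}, nonce0...), nonce1...)
	return hkdfSHA256(shared, salt, []byte(sessionInfo), 32), nil
}
```

Because both key pairs are ephemeral and regenerated per handshake, a later compromise of one session key does not expose earlier sessions.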

CLI additions (darkbloom cluster):
- cluster setup: writes jaccl devices JSON + env file to /etc/darkbloom
- cluster devices: lists RDMA interfaces via ibv_devices
- cluster ping: jaccl all_sum smoke test with latency reporting
- cluster run: establishes ClusterSession (rank 0) or ClusterPeer
  (rank 1) and prints health status; EncryptedPipelineEngine will be
  wired in once model loading is integrated
- link: ThunderboltLink bandwidth test (send/receive over bridge0)

Package.swift: add MLX as direct dependency of darkbloom executable
(ClusterPing uses MLXArray + eval inline)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

anupsv changed the title from "feat(rdma): standalone RDMA connection smoke test" to "feat(cluster): RDMA smoke test + encrypted pipeline inference over Thunderbolt" on May 20, 2026

github-actions Bot commented May 20, 2026

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-21 02:27 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 3.801s
Throughput 1.1 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 1.732s 1.732s 1.732s 1.732s
parse 4 51µs 65µs 73µs 73µs
reserve 4 6ms 7ms 9ms 9ms
route 4 748µs 795µs 907µs 907µs
coordinator_to_provider 4 1.719s 1.72s 1.722s 1.722s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=50.5µs (threshold=1ms)
parse:p95<=5ms PASS p95=73µs (threshold=5ms)
reserve:mean<=50ms PASS mean=6.2945ms (threshold=50ms)
reserve:p95<=200ms PASS p95=8.742ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 20
Success 4
Errors 16
Total Duration 2.715s
Throughput 1.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.688s 2.712s 2.714s 2.714s
parse 4 180µs 80µs 580µs 580µs
reserve 4 8ms 10ms 10ms 10ms
route 4 732µs 861µs 877µs 877µs
coordinator_to_provider 4 1.711s 1.711s 1.714s 1.714s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=179.5µs (threshold=1ms)
parse:p95<=5ms PASS p95=580µs (threshold=5ms)
reserve:mean<=50ms PASS mean=7.84075ms (threshold=50ms)
reserve:p95<=200ms PASS p95=10.017ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 4 0.5 GB
mlx-community/gemma-3-270m-4bit 3 0.2 GB
Metric Value
Total Requests 50
Success 50
Errors 0
Total Duration 10.747s
Throughput 4.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 50 629ms 3ms 3.834s 3.872s
parse 49 25µs 17µs 53µs 278µs
reserve 49 1ms 1ms 2ms 3ms
route 49 0s 0s 1ms 1ms
coordinator_to_provider 50 575ms 1ms 3.826s 3.864s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=24.877µs (threshold=1ms)
parse:p95<=5ms PASS p95=53µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.430632ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.357ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 12
Errors 48
Total Duration 3.467s
Throughput 3.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.239s 2.24s 2.255s 2.255s
parse 12 15µs 16µs 27µs 27µs
reserve 12 2ms 2ms 3ms 3ms
route 12 441µs 396µs 764µs 764µs
coordinator_to_provider 12 2.23s 2.234s 2.249s 2.249s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=15µs (threshold=1ms)
parse:p95<=5ms PASS p95=27µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.869333ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.652ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 40
Success 6
Errors 34
Total Duration 3.534s
Throughput 1.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 6 2.685s 2.35s 3.355s 3.355s
parse 6 15µs 16µs 22µs 22µs
reserve 6 2ms 3ms 3ms 3ms
route 6 1.115s 1ms 3.344s 3.344s
queue_wait 2 3.344s 3.344s 3.344s 3.344s
dispatch 2 34µs 39µs 39µs 39µs
coordinator_to_provider 6 1.564s 2.344s 2.344s 2.344s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=15.166µs (threshold=1ms)
parse:p95<=5ms PASS p95=22µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.412666ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.55ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:mean<=5ms PASS mean=33.5µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=39µs (threshold=50ms)

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 5.305s
Throughput 11.3 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 353ms 4ms 2.1s 2.101s
parse 60 17µs 15µs 33µs 55µs
reserve 60 1ms 1ms 3ms 3ms
route 60 415µs 393µs 664µs 926µs
coordinator_to_provider 60 350ms 3ms 2.092s 2.095s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=16.65µs (threshold=1ms)
parse:p95<=5ms PASS p95=33µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.242933ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.652ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 2.78s
Throughput 1.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.034s 2.034s 2.034s 2.034s
parse 4 16µs 15µs 26µs 26µs
reserve 4 2ms 2ms 2ms 2ms
route 4 502µs 546µs 616µs 616µs
coordinator_to_provider 4 2.028s 2.028s 2.028s 2.028s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=15.75µs (threshold=1ms)
parse:p95<=5ms PASS p95=26µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.97925ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.143ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 3.848s
Throughput 7.8 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 720ms 6ms 2.161s 2.161s
parse 30 16µs 13µs 30µs 55µs
reserve 30 2ms 1ms 3ms 4ms
route 30 439µs 400µs 752µs 898µs
coordinator_to_provider 30 716ms 3ms 2.154s 2.155s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=15.733µs (threshold=1ms)
parse:p95<=5ms PASS p95=30µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.5179ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.622ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 5 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 3.924s
Throughput 7.6 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 727ms 4ms 2.22s 2.22s
parse 30 18µs 18µs 29µs 31µs
reserve 30 1ms 1ms 2ms 3ms
route 30 447µs 415µs 709µs 732µs
coordinator_to_provider 30 723ms 1ms 2.211s 2.215s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=17.933µs (threshold=1ms)
parse:p95<=5ms PASS p95=29µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.474333ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.448ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 100
Success 12
Errors 88
Total Duration 3.179s
Throughput 3.8 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.059s 2.058s 2.078s 2.078s
parse 12 94µs 88µs 178µs 178µs
reserve 12 9ms 9ms 10ms 10ms
route 12 18ms 19ms 19ms 19ms
coordinator_to_provider 12 2.018s 2.017s 2.038s 2.038s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=94.083µs (threshold=1ms)
parse:p95<=5ms PASS p95=178µs (threshold=5ms)
reserve:mean<=50ms PASS mean=8.718583ms (threshold=50ms)
reserve:p95<=200ms PASS p95=9.656ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

Replaces the TOFU (Trust On First Use) disk-file approach for cluster
handshake SE key pinning with coordinator-verified distribution via the
macOS Keychain.

The previous approach stored the peer's SE public key in
/etc/darkbloom/cluster-peer-{IP}.sekey on first connection. Root on
either Mac could overwrite the file to accept a rogue peer. The new
approach:

  1. `darkbloom cluster setup --peer-serial <serial> --peer-ip <ip>`
     fetches the peer's SE public key from the coordinator, which
     verified it through Apple MDM attestation (MDA cert chain signed
     by Apple Enterprise Attestation Root CA).

  2. The fetched key is stored in the macOS Keychain with
     kSecAttrAccessibleWhenUnlockedThisDeviceOnly, so it is not
     migrated to other devices and is guarded by Keychain access
     control instead of a root-writable file.

  3. ClusterHandshake.verifyAgainstKeychain() compares the SE key
     received during every handshake against the Keychain entry.
     A peer with no Keychain entry throws ClusterError.peerSEKeyNotPinned,
     a key mismatch throws ClusterError.peerSEKeyMismatch, and in either
     case the connection is rejected immediately.

Coordinator (Go):
- GET /v1/cluster/peer-key?serial=<serial>  (requirePrivyAuth)
  Queries the persistent store — works even when the peer is offline.
  Returns { serial, se_public_key, trust_level, mda_verified }.

Swift — new files:
- ClusterPeerKeychain: store/load/delete Keychain entries keyed by
  peer Thunderbolt IP (service "io.darkbloom.cluster.peer-sekey")
- ClusterCoordinatorClient: HTTP client for /v1/cluster/peer-key;
  converts wss:// coordinator WebSocket URL to https:// automatically
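The wss:// to https:// conversion in ClusterCoordinatorClient might look like the following Go sketch. Treating the REST API as rooted at the coordinator host (dropping the WebSocket path) and mapping plain ws:// to http:// are assumptions; `peerKeyURL` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
)

// peerKeyURL turns the coordinator WebSocket URL into the URL for
// GET /v1/cluster/peer-key?serial=<serial>.
func peerKeyURL(coordinatorWS, serial string) (string, error) {
	u, err := url.Parse(coordinatorWS)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "wss":
		u.Scheme = "https"
	case "ws":
		u.Scheme = "http" // assumption: e.g. a local test coordinator
	default:
		return "", fmt.Errorf("unexpected coordinator scheme %q", u.Scheme)
	}
	// Assumption: REST endpoints live at the host root, not under the WS path.
	u.Path, u.RawPath, u.Fragment = "/v1/cluster/peer-key", "", ""
	u.RawQuery = url.Values{"serial": {serial}}.Encode()
	return u.String(), nil
}
```

Using url.Values for the query means a serial containing reserved characters is escaped rather than spliced into the URL.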

Swift — updated files:
- ClusterHandshake: pinOrVerify (TOFU) → verifyAgainstKeychain
  Removed pinnedKeyPath and all disk-file I/O
- ClusterControlMessage: add ClusterError.peerSEKeyNotPinned
- ClusterCommand (ClusterSetup): add --peer-serial and --peer-ip flags;
  fetch + pin peer SE key as the first step of `cluster setup`

Usage (run on both Macs before `cluster run`):
  Mac A:  darkbloom cluster setup --rank 0 --peer-serial <B serial> \
              --peer-ip 169.254.58.74 --coordinator :9999
  Mac B:  darkbloom cluster setup --rank 1 --peer-serial <A serial> \
              --peer-ip 169.254.106.209 --coordinator 169.254.106.209:9999

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Providers started with `darkbloom serve --rdma-enabled` register
themselves with the coordinator as RDMA-capable. Thunderbolt-connected
peers automatically discover each other without manual `cluster setup`:

Coordinator:
- Add `rdma_enabled` field to RegisterMessage (protocol)
- Add `RDMAEnabled` field to registry.Provider; set from registration
- Add ListRDMAEnabledPeers() to registry — returns connected providers
  that have RDMAEnabled=true and completed SE attestation
- Add GET /v1/cluster/rdma-peers endpoint (Privy-authenticated)
- Add GET /v1/cluster/rdma-peers route to server mux

Swift provider:
- Add rdmaEnabled to ProviderMessage.Register, CoordinatorClientConfig,
  ProviderLoopConfig; wire through codec → registration message
- Add ClusterCoordinatorClient.fetchRDMAPeers() for the new endpoint
- New ClusterDiscovery actor: NWPathMonitor watches wiredEthernet, reads
  own Thunderbolt IP via getifaddrs, finds peer IP via arp -a, fetches
  RDMA peer list from coordinator, pins SE key in Keychain, elects rank
  (lower IPv4 = rank 0), starts ClusterSession or ClusterPeer
- Add --rdma-enabled flag to `darkbloom serve` (Start command)
- Hot-plug: NWPathMonitor re-fires on cable connect/disconnect
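The rank rule in ClusterDiscovery (lower IPv4 = rank 0) must compare addresses numerically, not as strings. A Go sketch using net/netip; `electRank` is a hypothetical name, and rejecting equal or non-IPv4 addresses is an assumption about how the edge cases should fail.

```go
package main

import (
	"errors"
	"net/netip"
)

// electRank returns 0 if selfIP is numerically lower than peerIP, else 1.
func electRank(selfIP, peerIP string) (int, error) {
	self, err := netip.ParseAddr(selfIP)
	if err != nil {
		return 0, err
	}
	peer, err := netip.ParseAddr(peerIP)
	if err != nil {
		return 0, err
	}
	if !self.Is4() || !peer.Is4() {
		return 0, errors.New("rank election expects IPv4 addresses")
	}
	// Compare orders addresses byte by byte: -1, 0, or +1.
	switch self.Compare(peer) {
	case -1:
		return 0, nil
	case 1:
		return 1, nil
	default:
		return 0, errors.New("self and peer report the same address")
	}
}
```

Comparing the dotted strings instead would rank 169.254.106.209 below 169.254.58.74, because "1" sorts before "5"; both Macs would still agree, but on the opposite assignment from the stated rule.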

Providers started without --rdma-enabled do not appear in the RDMA
peer list, so they cannot be targeted for cluster connections (an
intended DoS mitigation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

anupsv added 3 commits May 20, 2026 19:41

Closes a class of holes in the RDMA auto-discovery surface where the
rdma_enabled flag was operator-controlled and the peer-list endpoint
was readable by any authenticated user.

Provider (Swift):
- New RDMACapability: requires chip family M5 or later AND
  /usr/bin/rdma_ctl reporting enabled. An operator-passed
  --rdma-enabled is downgraded to false locally, with a stderr
  warning, when the hardware can't honor it.
- AttestationBlob now signs rdmaEnabled (alphabetical, between
  rdmaDisabled and secureBootEnabled). AttestationBuilder takes the
  effective value as a parameter.
- StartCommand wires effectiveRDMA through ProviderLoopConfig so the
  ClusterDiscovery actor and the SE-signed blob carry the same value.

Coordinator (Go):
- AttestationBlob + VerificationResult gain RDMAEnabled (always-emit,
  matches Swift's non-optional Bool encoding). marshalSortedJSON
  updated for canonical-JSON parity.
- After attestation.Verify succeeds, msg.RDMAEnabled is reconciled
  against result.RDMAEnabled: the registry is pinned to the attested
  value via SetRDMAEnabled and any mismatch is logged.
- GET /v1/cluster/rdma-peers now requires the caller to own a
  currently-connected RDMA-enabled provider (was: any Privy JWT).
  Non-providers get 403; caller's own providers are excluded from
  the response.
- ListRDMAEnabledPeersForCaller also requires Attested=true AND
  Status not in {Untrusted, Offline}. This closes a window in which
  attestation.Verify populates SerialNumber/PublicKey *before*
  signature verification; without the Attested gate, tampered
  providers with valid identifiers could appear in the peer list.

Tests:
- Swift: chip-family gate, rdma_ctl-missing fail-closed, signed
  blob round-trip, strict-decode requirement.
- Go: VerificationResult.RDMAEnabled round-trip, SE-signed bit
  tamper detection, SetRDMAEnabled overwrite + no-op, 9 peer-list
  symmetric-gate scenarios (non-provider/empty/self-tampered/
  non-RDMA/unattested/tampered/untrusted/offline cases).

Threat model:
- TB-010 (provider_to_provider_cluster) added, T-041..T-045 cover
  ARP positional matching, ARP retry cancellability, silent re-pin,
  JWT in memory, peer-list disclosure. T-045 mitigations note both
  the symmetric-access gate and the Attested+Status filter as
  implemented.

Plus a small gofmt fix in cluster_handlers.go (pre-existing trailing
newline) that was blocking CI.

Picks up Layr-Labs/mlx-swift#2 which exposes the Cmlx library product
and enables the jaccl distributed backend that provider-swift's
ClusterSession / EncryptedPipelineInference depend on.

Without this bump, CI's fresh `swift build -c debug` fails with:
  product 'Cmlx' required by package 'provider-swift' target 'ProviderCore'
  not found in package 'mlx-swift'.

Tracking issue for the upstream deviation: #193.

Picks up Layr-Labs/mlx-swift-lm#24 which adds the callPartial methods
on LlamaModelInner and LlamaModel that EncryptedPipelineInference and
PipelineInference need for two-rank pipeline-parallel inference.

Without this bump, CI's `swift build` fails with:
  error: value of type 'LlamaModel' has no member 'callPartial'

Related: #193 (upstream mlx-swift distributed deviation tracker).

Force-pushes on Layr-Labs/mlx-swift#2 and Layr-Labs/mlx-swift-lm#24
landed new SHAs (fa6a4e8, c2fbbdc) — bump the submodule pointers to
match.

github-actions Bot commented May 21, 2026

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-21 04:24 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 3.869s
Throughput 1.0 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 1.752s 1.752s 1.752s 1.752s
parse 4 29µs 28µs 42µs 42µs
reserve 4 4ms 4ms 6ms 6ms
route 4 555µs 567µs 601µs 601µs
coordinator_to_provider 4 1.742s 1.743s 1.744s 1.744s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=29µs (threshold=1ms)
parse:p95<=5ms PASS p95=42µs (threshold=5ms)
reserve:mean<=50ms PASS mean=3.91425ms (threshold=50ms)
reserve:p95<=200ms PASS p95=5.568ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 20
Success 4
Errors 16
Total Duration 2.384s
Throughput 1.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.382s 2.382s 2.384s 2.384s
parse 4 29µs 34µs 35µs 35µs
reserve 4 4ms 5ms 7ms 7ms
route 4 681µs 626µs 995µs 995µs
coordinator_to_provider 4 1.762s 1.762s 1.764s 1.764s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=28.5µs (threshold=1ms)
parse:p95<=5ms PASS p95=35µs (threshold=5ms)
reserve:mean<=50ms PASS mean=4.48575ms (threshold=50ms)
reserve:p95<=200ms PASS p95=6.713ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 4 0.5 GB
mlx-community/gemma-3-270m-4bit 3 0.2 GB
Metric Value
Total Requests 50
Success 50
Errors 0
Total Duration 9.56s
Throughput 5.2 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 50 554ms 3ms 3.334s 3.36s
parse 48 20µs 17µs 47µs 56µs
reserve 48 1ms 1ms 4ms 5ms
route 48 1ms 0s 1ms 1ms
coordinator_to_provider 50 450ms 1ms 3.322s 3.35s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=20.062µs (threshold=1ms)
parse:p95<=5ms PASS p95=47µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.457ms (threshold=50ms)
reserve:p95<=200ms PASS p95=3.517ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 12
Errors 48
Total Duration 3.528s
Throughput 3.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.099s 2.104s 2.116s 2.116s
parse 12 16µs 15µs 38µs 38µs
reserve 12 3ms 3ms 4ms 4ms
route 12 1ms 1ms 1ms 1ms
coordinator_to_provider 12 2.09s 2.097s 2.11s 2.11s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=16µs (threshold=1ms)
parse:p95<=5ms PASS p95=38µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.762666ms (threshold=50ms)
reserve:p95<=200ms PASS p95=4.386ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 40
Success 4
Errors 36
Total Duration 3.209s
Throughput 1.2 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.155s 2.155s 2.155s 2.155s
parse 4 18µs 26µs 27µs 27µs
reserve 4 2ms 2ms 3ms 3ms
route 4 621µs 733µs 758µs 758µs
coordinator_to_provider 4 2.149s 2.149s 2.149s 2.149s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=18µs (threshold=1ms)
parse:p95<=5ms PASS p95=27µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.38625ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.868ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 4.838s
Throughput 12.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 353ms 5ms 2.101s 2.102s
parse 60 21µs 15µs 46µs 191µs
reserve 60 1ms 1ms 3ms 3ms
route 60 0s 0s 1ms 2ms
coordinator_to_provider 60 349ms 3ms 2.094s 2.096s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=20.883µs (threshold=1ms)
parse:p95<=5ms PASS p95=46µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.400433ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.909ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB
Metric Value
Total Requests 30
Success 4
Errors 26
Total Duration 3.237s
Throughput 1.2 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 4 2.218s 2.218s 2.218s 2.218s
parse 4 15µs 15µs 25µs 25µs
reserve 4 2ms 2ms 2ms 2ms
route 4 646µs 651µs 944µs 944µs
coordinator_to_provider 4 2.212s 2.212s 2.213s 2.213s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=14.5µs (threshold=1ms)
parse:p95<=5ms PASS p95=25µs (threshold=5ms)
reserve:mean<=50ms PASS mean=2.14825ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.34ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 4.201s
Throughput 7.1 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 753ms 6ms 2.27s 2.27s
parse 30 15µs 14µs 24µs 30µs
reserve 30 2ms 1ms 3ms 3ms
route 30 422µs 404µs 596µs 649µs
coordinator_to_provider 30 749ms 4ms 2.264s 2.265s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=14.566µs (threshold=1ms)
parse:p95<=5ms PASS p95=24µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.502633ms (threshold=50ms)
reserve:p95<=200ms PASS p95=2.516ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 5 0.5 GB
Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 4.23s
Throughput 7.1 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 769ms 4ms 2.328s 2.329s
parse 30 19µs 17µs 35µs 43µs
reserve 30 2ms 1ms 3ms 4ms
route 30 420µs 398µs 593µs 819µs
coordinator_to_provider 30 765ms 1ms 2.321s 2.324s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=18.666µs (threshold=1ms)
parse:p95<=5ms PASS p95=35µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.6024ms (threshold=50ms)
reserve:p95<=200ms PASS p95=3.454ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB
Metric Value
Total Requests 100
Success 12
Errors 88
Total Duration 3.484s
Throughput 3.4 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 12 2.269s 2.275s 2.296s 2.296s
parse 12 143µs 127µs 228µs 228µs
reserve 12 11ms 11ms 12ms 12ms
route 12 19ms 19ms 19ms 19ms
coordinator_to_provider 12 2.217s 2.222s 2.24s 2.24s

Assertion Report: FAIL

Assertion Result Detail
parse:mean<=1ms PASS mean=142.75µs (threshold=1ms)
parse:p95<=5ms PASS p95=228µs (threshold=5ms)
reserve:mean<=50ms PASS mean=10.824583ms (threshold=50ms)
reserve:p95<=200ms PASS p95=12.326ms (threshold=200ms)
encrypt:present FAIL no data for segment encrypt
dispatch:present FAIL no data for segment dispatch
