
Architecture

This document explains how the library operates. It complements the spec (PROTOCOL.md) with high-level structure, data flows, and failure handling.

Contents:

  • System overview
  • Bootstrapping
  • Publishing
  • Bubble detection and merging
  • Data model
  • Failure modes
  • Tuning

System overview

Components:

  • iroh endpoint and gossip
  • Auto-discovery (bootstrap loop)
  • Publisher (background actor)
  • Bubble merge (background actor)
  • Message overlap merge (background actor)
  • DHT client (mutable records)
  • Crypto (signing, encryption, secret rotation)
flowchart LR
  subgraph App
    A[Your App]
  end

  subgraph Iroh
    E[Endpoint]
    G[Gossip]
  end

  subgraph AutoDiscovery
    B[Bootstrap Loop]
    P[Publisher]
    BM[Bubble Merge]
    MO[Message Overlap Merge]
  end

  subgraph DHT
    D[(Mainline DHT)]
  end

  A --> E --> G
  G <---> B
  G <---> P
  G <---> BM
  G <---> MO
  B <--> D
  P <--> D
  BM <--> D
  MO <--> D

Node lifecycle:

  • Start iroh endpoint
  • Start gossip
  • Auto-discovery:
    • Join topic, attempt bootstrap, connect
    • Spawn publisher, bubble merge, and message overlap merge actors on success

State machine:

stateDiagram-v2
  [*] --> Idle
  Idle --> Discovering
  Discovering --> Joining
  Joining --> Joined
  Joined --> Publishing
  Publishing --> Joined : interval + jitter
  Discovering --> Discovering : retry/jitter
  Joining --> Discovering : no peers

Bootstrapping

Goal: connect to at least one topic peer.

Sequence:

sequenceDiagram
  participant Node
  participant DHT
  participant Gossip

  Node->>Gossip: subscribe(topic_hash)
  Note over Node: optionally publish startup record
  Node->>DHT: get_mutable(signing_pub, salt, 10s timeout)
  DHT-->>Node: encrypted records (0..N)
  Node->>Node: decrypt, verify, filter(not self)
  alt candidates exist
    loop each candidate
      Node->>Gossip: join_peers([pub_key])
      Node->>Node: sleep per_peer_join_settle_time (100ms)
      Gossip-->>Node: NeighborUp?
    end
    Node->>Node: final wait join_confirmation_wait_time (500ms)
  else no candidates
    Node->>Node: maybe publish own (rate-limited)
    Node->>Node: sleep no_peers_retry_interval (1500ms)
  end
  Node->>Node: joined? if yes, spawn publisher + merge actors

Key points:

  • First iteration: optionally check older records first (check_older_records_first_on_startup).
  • Both unix_minute and unix_minute - 1 records are always fetched.
  • Pacing avoids bursts and "bubbles."
  • Keep trying until joined.
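
The minute arithmetic above can be sketched as follows. A `unix_minute` is assumed to be the Unix timestamp in seconds divided by 60; the helper names are illustrative, not the library's API:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Unix minute for a given Unix timestamp, shifted by `offset` minutes.
/// (Illustrative helper; the library's actual signature may differ.)
fn unix_minute_at(unix_secs: u64, offset: i64) -> u64 {
    ((unix_secs / 60) as i64 + offset) as u64
}

/// Current unix_minute with an offset, as used by the bootstrap loop.
fn unix_minute(offset: i64) -> u64 {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_secs();
    unix_minute_at(now, offset)
}

fn main() {
    // The loop always fetches the current and the previous minute's records.
    let current = unix_minute(0);
    println!("fetch records for minutes {} and {}", current, current - 1);
}
```

Deriving record keys from the minute is what makes discovery rendezvous-free: every node computes the same lookup targets from the topic and the clock alone.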

Pseudocode:

loop:
  if joined(): return sender, receiver

  minute = first_attempt && check_last_minute_first ? -1 : 0
  recs = get_records(unix_minute(minute) - 1) + get_records(unix_minute(minute))

  if recs.is_empty():
    maybe_publish_this_minute()
    sleep(no_peers_retry_interval = 1500ms)
    continue

  for peer in extract_bootstrap_nodes(recs):
    if joined(): break
    join_peer(peer)
    sleep(per_peer_join_settle_time = 100ms)

  sleep(join_confirmation_wait_time = 500ms)
  if joined(): return
  maybe_publish_this_minute()
  sleep(discovery_poll_interval = 2000ms)

Publishing

Goal: publish active participation without overloading the DHT.

Flow:

flowchart TD
  A[Tick] --> B[Get minute=now]
  B --> C[Get existing records]
  C --> E{"records >= max_bootstrap_records (5)?"}
  E -- Yes --> F[Skip - rate-limited]
  E -- No --> G[Build record: peers + msg hashes]
  G --> H[Sign + Encrypt]
  H --> I[Publish to DHT]
  I --> J[Reset ticker: base_interval + random jitter]

Pseudocode:

// Publisher actor loop (interval: base_interval + random jitter)
on tick:
  records = get_records(unix_minute(0))
  if records.len >= max_bootstrap_records(5): return

  rec = make_record(neighbors(<=5), last_hashes(<=5))
  enc = encrypt(sign(rec))
  publish(enc)
  reset_ticker(base_interval + random(0, max_jitter))
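
The two decisions in that loop, the per-minute rate-limit gate and the jittered ticker reset, can be isolated as below. The function names are illustrative, and the jitter is passed in rather than drawn randomly to keep the sketch deterministic:

```rust
use std::time::Duration;

/// Rate-limit gate: skip publishing once the current minute already holds
/// `max_bootstrap_records` records (default 5).
fn should_publish(existing_records: usize, max_bootstrap_records: usize) -> bool {
    existing_records < max_bootstrap_records
}

/// Next tick interval: base plus a jitter in [0, max_jitter]. The real
/// publisher draws the jitter randomly on every reset.
fn next_interval(base: Duration, jitter: Duration) -> Duration {
    base + jitter
}

fn main() {
    assert!(should_publish(3, 5));  // room left this minute: publish
    assert!(!should_publish(5, 5)); // cap reached: skip, rate-limited
    let d = next_interval(Duration::from_secs(30), Duration::from_millis(750));
    println!("next publish tick in {:?}", d);
}
```

The jitter matters because many nodes share the same minute boundary; without it, their publish attempts would synchronize into bursts against the DHT.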

Bubble detection and merging

Signal 1: small cluster (neighbors < min_neighbors, default 4).

  • Extract peer ids from discovered records.
  • Exclude zeros, self, current neighbors.
  • Join up to max_join_peer_count (default 4).
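
The candidate filtering for Signal 1 amounts to set subtraction plus a cap. A minimal sketch, assuming 32-byte peer ids with zero-filled padding slots (types and names are illustrative, not the library's internals):

```rust
use std::collections::HashSet;

type PeerId = [u8; 32];

/// Filter bootstrap-record peer ids down to join candidates:
/// drop zero-filled placeholders, ourselves, and current neighbors,
/// then cap at `max_join_peer_count` (default 4).
fn join_candidates(
    record_peers: &[PeerId],
    self_id: &PeerId,
    neighbors: &HashSet<PeerId>,
    max_join_peer_count: usize,
) -> Vec<PeerId> {
    let mut seen = HashSet::new();
    record_peers
        .iter()
        .filter(|p| **p != [0u8; 32])        // zero-filled slots are padding
        .filter(|p| *p != self_id)           // never join ourselves
        .filter(|p| !neighbors.contains(*p)) // already connected
        .filter(|p| seen.insert(**p))        // de-duplicate across records
        .take(max_join_peer_count)
        .copied()
        .collect()
}

fn main() {
    let me = [1u8; 32];
    let neighbor = [2u8; 32];
    let fresh = [3u8; 32];
    let neighbors: HashSet<PeerId> = [neighbor].into_iter().collect();
    let peers = [me, [0u8; 32], neighbor, fresh, fresh];
    // Only the unknown, non-zero, non-self peer survives the filters.
    assert_eq!(join_candidates(&peers, &me, &neighbors, 4), vec![fresh]);
}
```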

Signal 2: non-overlapping message sets.

  • Compare local last_message_hashes with others.
  • If disjoint, collect publisher + peers from those records.
  • Attempt joins to bridge partitions.
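
The disjointness test for Signal 2 is a plain set check. A sketch, again assuming zero-padded fixed-size hash slots (helper name is illustrative):

```rust
use std::collections::HashSet;

type MsgHash = [u8; 32];

/// Signal 2: none of our recent message hashes appear in a record's hashes,
/// suggesting its publisher sits in a different partition ("bubble").
fn is_disjoint_record(local: &HashSet<MsgHash>, record_hashes: &[MsgHash]) -> bool {
    record_hashes
        .iter()
        .filter(|h| **h != [0u8; 32]) // ignore zero padding in fixed-size records
        .all(|h| !local.contains(h))
}

fn main() {
    let local: HashSet<MsgHash> = [[1u8; 32], [2u8; 32]].into_iter().collect();
    assert!(!is_disjoint_record(&local, &[[2u8; 32], [9u8; 32]])); // overlap: same partition
    assert!(is_disjoint_record(&local, &[[7u8; 32], [0u8; 32]]));  // disjoint: try to bridge
}
```

This is why the decision graph first requires `local_msgs >= 1`: with no local hashes, every record is vacuously disjoint and the signal carries no information.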

Decision graph:

flowchart LR
  A[Tick] --> B{"neighbors < min_neighbors?"}
  B -- Yes --> C[Join peers from records]
  B -- No --> D{"local_msgs >= 1?"}
  D -- No --> E[Sleep until next tick]
  D -- Yes --> F{overlap with others?}
  F -- No --> G[Join from non-overlapping records]
  F -- Yes --> E

Data model

Record (summary):

  • topic hash (32)
  • unix_minute (u64)
  • pub_key (publisher ed25519 public key)
  • content (serialized GossipRecordContent: active_peers + last_message_hashes)
  • signature (64)

EncryptedRecord:

  • encrypted_record (Vec<u8>)
  • encrypted_decryption_key (Vec<u8>)

Diagram:

classDiagram
  class Record {
    +topic: [u8;32]
    +unix_minute: u64
    +pub_key: [u8;32]
    +content: GossipRecordContent
    +signature: [u8;64]
  }

  class GossipRecordContent {
    +active_peers: [[u8;32];5]
    +last_message_hashes: [[u8;32];5]
  }

  class EncryptedRecord {
    +encrypted_record: Vec<u8>
    +encrypted_decryption_key: Vec<u8>
  }

  Record --* GossipRecordContent : content deserializes to
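
The fixed-size fields above allow a back-of-envelope size estimate. The structs below mirror the diagram with `repr(C)` so the arithmetic is visible; the library's actual serialization format is defined by the spec, not by this struct layout:

```rust
/// Mirror of the record layout (illustrative only; wire format per PROTOCOL.md).
#[repr(C)]
struct GossipRecordContent {
    active_peers: [[u8; 32]; 5],        // up to 5 neighbors, zero-padded
    last_message_hashes: [[u8; 32]; 5], // up to 5 recent message hashes
}

#[repr(C)]
struct Record {
    topic: [u8; 32],
    unix_minute: u64,
    pub_key: [u8; 32],
    content: GossipRecordContent,
    signature: [u8; 64],
}

fn main() {
    // Plaintext size: 32 + 8 + 32 + (5 + 5) * 32 + 64 = 456 bytes.
    assert_eq!(std::mem::size_of::<GossipRecordContent>(), 320);
    assert_eq!(std::mem::size_of::<Record>(), 456);
}
```

At roughly half a kilobyte of plaintext per record, the per-minute cap of 5 records keeps the total DHT payload per topic-minute modest.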

Key derivation:

flowchart TD
  T[topic_hash] --> A[SHA512 topic+minute]
  M[unix_minute] --> A
  A --> S[signing_keypair seed -> Ed25519]

  T --> L["salt = SHA512('salt' + topic + minute)[..32]"]
  M --> L

  T --> R[secret_rotation topic,minute,initial_secret_hash]
  M --> R
  R --> E[encryption_keypair seed -> Ed25519]
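
The derivation pattern above is domain separation: the same (topic, minute) inputs feed several hashes distinguished by a prefix. The sketch below shows only that pattern, substituting std's `DefaultHasher` for SHA-512 (Rust's standard library has no SHA-512; this stand-in is NOT cryptographic and the real derivation is defined in PROTOCOL.md):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for SHA-512; illustrates domain separation only, not security.
fn h(parts: &[&[u8]]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for p in parts {
        p.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let topic = [7u8; 32];
    let minute: u64 = 28_333_333;

    // Seed for the per-minute signing keypair: H(topic || minute).
    let signing_seed = h(&[&topic, &minute.to_be_bytes()]);
    // DHT salt: H("salt" || topic || minute). The "salt" prefix keeps it
    // distinct from the signing seed even though the other inputs match.
    let salt = h(&[b"salt", &topic, &minute.to_be_bytes()]);

    assert_ne!(signing_seed, salt); // different domains, different outputs
    // Same inputs always derive the same values, so all nodes agree.
    assert_eq!(signing_seed, h(&[&topic, &minute.to_be_bytes()]));
}
```

Because every value is recomputed from topic and minute, keys and salts rotate each minute without any coordination between nodes.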

Failure modes

  • DHT get timeout:
    • Return empty set; continue loop.
  • Decrypt/verify failure:
    • Drop record; proceed.
  • Publish failure:
    • DHT layer retries with jittered intervals (3 retries, 5s base + 0-10s jitter).
  • Join failure:
    • Continue to next peer; final 500ms wait; loop.
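
The publish-failure retry policy can be written out as a delay schedule. The jitter draws are passed in to keep the sketch deterministic; in the real DHT layer they are random in the stated 0-10s range:

```rust
use std::time::Duration;

/// Jittered retry schedule for DHT publish failures: one delay per retry,
/// each base + jitter with jitter in [0, max_jitter].
fn retry_delays(base: Duration, jitters: &[Duration]) -> Vec<Duration> {
    jitters.iter().map(|j| base + *j).collect()
}

fn main() {
    // 3 retries, 5s base, 0-10s jitter (example draws shown).
    let delays = retry_delays(
        Duration::from_secs(5),
        &[Duration::from_secs(2), Duration::from_secs(9), Duration::from_secs(0)],
    );
    assert_eq!(delays.len(), 3);
    // Every delay falls in the documented [5s, 15s] window.
    assert!(delays.iter().all(|d| *d >= Duration::from_secs(5) && *d <= Duration::from_secs(15)));
}
```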

Tuning

  • Per-minute cap (records >= max_bootstrap_records, default 5) gates publishing.
  • Per-peer pacing (100ms) reduces bursts.
  • No-peers retry (1500ms) and discovery poll (2000ms) stabilize DHT load.
  • Message window size (5 peers, 5 hashes) is a trade-off:
    • Larger window = better visibility, larger records.
    • Smaller window = lower bandwidth, less overlap detection.

Parameters (all configurable):

  • max_bootstrap_records (default 5)
  • max_join_peer_count (default 4)
  • min_neighbors for bubble merge (default 4)
  • DHT timeouts, retry count, and jitter
  • Bootstrap timing: no_peers_retry, per_peer_settle, join_confirmation, discovery_poll
  • Publisher timing: initial_delay, base_interval, max_jitter
  • Merge timing: base_interval, max_jitter (separate for bubble and overlap)
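
A hypothetical grouping of the bootstrap-related tunables with the defaults documented above; the struct and field names are illustrative, not the library's configuration API:

```rust
use std::time::Duration;

/// Illustrative bundle of the documented defaults (not the real config type).
struct Tunables {
    max_bootstrap_records: usize,
    max_join_peer_count: usize,
    min_neighbors: usize,
    no_peers_retry_interval: Duration,
    per_peer_join_settle_time: Duration,
    join_confirmation_wait_time: Duration,
    discovery_poll_interval: Duration,
}

impl Default for Tunables {
    fn default() -> Self {
        Self {
            max_bootstrap_records: 5,
            max_join_peer_count: 4,
            min_neighbors: 4,
            no_peers_retry_interval: Duration::from_millis(1500),
            per_peer_join_settle_time: Duration::from_millis(100),
            join_confirmation_wait_time: Duration::from_millis(500),
            discovery_poll_interval: Duration::from_millis(2000),
        }
    }
}

fn main() {
    let t = Tunables::default();
    assert_eq!(t.max_bootstrap_records, 5);
    assert_eq!(t.per_peer_join_settle_time, Duration::from_millis(100));
}
```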