Merge it #1

Open
santyr wants to merge 201 commits into hexdaemon:main from lightning-goats:main
Conversation

@santyr santyr commented Feb 6, 2026

No description provided.

Owner

@hexdaemon hexdaemon left a comment

LGTM. Approving.

santyr and others added 29 commits February 6, 2026 08:57
…mpotency

Protocol hardening: version tolerance + deterministic idempotency
…mpotency

Protocol hardening: Phases B+C+D (version tolerance, idempotency, reliable delivery)
…on-idempotency

Hardening: Phase B/C protocol versioning, idempotency & bug fixes
Added an image to enhance the article's visual appeal.
…on-idempotency

Comprehensive hardening: P0/P1 bug fixes, thread safety, security
…on-idempotency

fix: P2/P3 hardening across 13 modules
contribution_ratio was never synced from the ledger to hive_members,
last_seen only updated on connect/disconnect events, and addresses
were never captured at join time. This fixes all three root causes
plus initializes presence tracking at join so uptime_pct accumulates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hive-status RPC only returned tier/joined_at/pubkey for our membership,
so cl-revenue-ops revenue-hive-status showed null for these fields (Issue #36).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ats-addresses

fix: resolve stale member stats and null addresses (#59, #60)
…mat, determinism, dedup

- Bug 1 (Critical): calculate_our_balance now uses identical MemberContribution
  conversion as compute_settlement_plan (proper uptime normalization, int casting,
  rebalance_costs inclusion)
- Bug 2 (Critical): Period format standardized to YYYY-WW across routing_pool.py
  and rpc_commands.py (was YYYY-WNN, mismatched settlement format)
- Bug 3: settle_period atomicity check changed from `if ok is False` to `if not ok`
  to catch None/0 returns from record_pool_distribution
- Bug 4: generate_payments sort now includes peer_id tie-breaker for deterministic
  payment ordering, matching generate_payment_plan
- Bug 5: capital_score now reflects weighted_capacity instead of uptime_pct
- Bug 6: asyncio event loop in settlement_loop wrapped in try/finally to ensure
  loop.close() on exceptions
- Bug 8: Revenue deduplication by payment_hash (application-level check + UNIQUE
  constraint + index on pool_revenue table)
- Bug 9: Removed snapshot_contributions() side-effects from read-only paths
  (get_pool_status, calculate_distribution)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
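
The tie-breaker in Bug 4 can be illustrated with a minimal sketch. The tuple layout and the `order_payments` name are assumptions for illustration only, not the actual `generate_payments` code; the point is that adding `peer_id` as a secondary sort key makes the order independent of input order.

```python
from typing import List, Tuple

# Hypothetical payment record for illustration: (peer_id, amount_sats).
Payment = Tuple[str, int]


def order_payments(payments: List[Payment]) -> List[Payment]:
    """Sort largest amount first; peer_id breaks ties deterministically."""
    return sorted(payments, key=lambda p: (-p[1], p[0]))


# Two nodes that received the same payments in a different order
# still end up with an identical plan.
node_a = [("03bb", 500), ("02aa", 500), ("02cc", 900)]
node_b = list(reversed(node_a))
assert order_payments(node_a) == order_payments(node_b)
```

Without the second key, equal amounts would keep their input order (Python's sort is stable), so two members could derive different payment sequences from the same data.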
…ification, signed ACKs

CRITICAL:
- Add ban check to handle_hello/handle_attest (prevents ban evasion via rejoin)
- Add timestamp freshness checks to 23 message handlers with per-type age limits
  (GOSSIP 1hr, INTENT 10min, SETTLEMENT 24hr, INTELLIGENCE 2hr)
- 5-minute future clock skew tolerance

HIGH:
- Add cryptographic signature verification to 13 previously unsigned handlers
  (health_report, liquidity_need/snapshot, route_probe/batch,
   peer_reputation_snapshot, task_request/response, splice_init_request/response,
   splice_update/signed/abort)
- MSG_ACK now signed: create_msg_ack accepts rpc for signing,
  handle_msg_ack verifies signature (backward-compatible)

MODERATE:
- Increase relay dedup window from 300s to 3600s (covers freshness windows)
- Increase MAX_SEEN_MESSAGES from 10000 to 50000

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
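
A freshness check of the kind described above might look like the following sketch. The age limits and the 5-minute skew come from the commit message; the function name, the dictionary layout, and the choice to reject unknown message types are assumptions.

```python
import time
from typing import Optional

# Per-type maximum ages in seconds, as listed in the commit message.
MAX_AGE_SECONDS = {
    "GOSSIP": 3600,         # 1 hour
    "INTENT": 600,          # 10 minutes
    "SETTLEMENT": 86400,    # 24 hours
    "INTELLIGENCE": 7200,   # 2 hours
}
MAX_FUTURE_SKEW_SECONDS = 300  # 5-minute tolerance for sender clock skew


def is_fresh(msg_type: str, timestamp: int, now: Optional[float] = None) -> bool:
    """Return True if the timestamp falls inside the window for msg_type."""
    if now is None:
        now = time.time()
    max_age = MAX_AGE_SECONDS.get(msg_type)
    if max_age is None:
        return False  # assumption of this sketch: unknown types fail closed
    if timestamp > now + MAX_FUTURE_SKEW_SECONDS:
        return False  # too far in the future
    return now - timestamp <= max_age
```

This also shows why the relay dedup window was raised to 3600s: a message that is still considered fresh should not have aged out of the duplicate filter.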
CRITICAL: Replace 9 unsafe plugin.rpc calls with safe_plugin.rpc
- handle_expansion_nominate/elect/decline: checkmessage() and getinfo()
- hive_calculate_size: listchannels() and listfunds()
- hive_test_intent: getinfo()
- hive_test_pending_action: listchannels() and getinfo()

These bypassed the RPC_LOCK thread serialization, risking race conditions
when background threads make concurrent RPC calls to lightningd.

CRITICAL: Fix direct dict access on RPC results
- init(): getinfo()['id'] → getinfo().get('id', '') — could crash startup
- hive_test_intent: getinfo()['id'] → .get('id', '')
- hive_test_pending_action: getinfo()['id'] → .get('id', '')
- member_ids set comprehension: m['peer_id'] → m.get('peer_id', '')

HIGH: Wrap unprotected signmessage vote signing in try-except
- _propose_settlement_gaming_ban: vote signing had no error handling
- hive_propose_ban: vote signing had no error handling
Both could crash if signmessage RPC fails after proposal creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
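
The two patterns above, serializing RPC calls behind one lock and reading results with `.get()`, can be sketched as follows. `SerializedRpc` and `fake_backend` are hypothetical stand-ins; the real `safe_plugin.rpc` wrapper is not shown in this PR page and may differ.

```python
import threading
from typing import Any, Callable, Dict

RPC_LOCK = threading.Lock()  # one lock shared by every caller


class SerializedRpc:
    """Hypothetical wrapper: at most one RPC call is in flight at a time."""

    def __init__(self, call: Callable[[str], Any]) -> None:
        self._call = call

    def call(self, method: str) -> Dict[str, Any]:
        with RPC_LOCK:
            result = self._call(method)
        # Normalize unexpected results so callers can always use .get().
        return result if isinstance(result, dict) else {}


def fake_backend(method: str) -> Dict[str, Any]:
    # Stand-in for lightningd; this response deliberately lacks "id".
    return {"alias": "demo"}


rpc = SerializedRpc(fake_backend)
node_id = rpc.call("getinfo").get("id", "")  # "" instead of a KeyError
```

With `getinfo()['id']` the missing key would raise during startup; the `.get('id', '')` form degrades to an empty string that the caller can check.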
…safety

- strategic_positioning: fix AttributeError crashes (fleet_coverage, target_capacity_sats, value_score → correct attribute names)
- cooperative_expansion: fix TOCTOU in join_remote_round (atomic check-and-set), negative liquidity score (clamp to 0), deterministic election tie-breaker (peer_id), use-after-free in handle_decline (capture decline_count in local), state validation in handle_elect, prune unbounded _recent_opens/_target_cooldowns
- governance: add threading.Lock for failsafe budget TOCTOU race (atomic check-execute-update)
- settlement: cap remainder allocation to len(frac_order) preventing cyclic wrapping
- bridge: fix double record_failure() on timeout (subprocess.TimeoutExpired → TimeoutError chain)
- liquidity_coordinator: fix MCF assignment ID collision (include channel suffixes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
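
The governance fix (atomic check-execute-update under a `threading.Lock`) can be sketched as below. `FailsafeBudget` and its fields are hypothetical; the sketch only reserves budget inside the lock, whereas the real code may also execute the action there.

```python
import threading
from typing import List


class FailsafeBudget:
    """Hypothetical budget tracker: check and update happen under one lock."""

    def __init__(self, limit_sats: int) -> None:
        self._limit = limit_sats
        self._spent = 0
        self._lock = threading.Lock()

    def try_spend(self, amount_sats: int) -> bool:
        with self._lock:
            # No other thread can change _spent between check and update.
            if amount_sats <= 0 or self._spent + amount_sats > self._limit:
                return False
            self._spent += amount_sats
            return True

    @property
    def spent(self) -> int:
        with self._lock:
            return self._spent


budget = FailsafeBudget(limit_sats=1000)
results: List[bool] = []


def worker() -> None:
    results.append(budget.try_spend(300))


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten threads each request 300 sats against a 1000-sat limit; only three can succeed regardless of scheduling, which is the property the TOCTOU race violated.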
- Remove trustedcoin plugin (explorer-only Bitcoin backend)
- Add vitality plugin v0.4.5 for plugin health monitoring
- Update Docker image version to 2.2.7
- vitality auto-restarts failed plugins, improving production uptime

Ref: lightning-goats/cl-hive
…orrectness

P0 crashes fixed:
- channel_rationalization: _get_topology_snapshot() → get_topology_snapshot()
- network_metrics: same AttributeError crash on nonexistent private method
- fee_coordination: TypeError when TemporalPattern.hour_of_day/day_of_week is None
- task_manager: crash on None target/amount_sats in _execute_expand_task

P1 logic errors fixed:
- channel_rationalization: self.analyzer → self.rationalizer.redundancy_analyzer
- channel_rationalization: r.owner_id → r.owner_member, r.freed_capacity_sats → r.freed_capital_sats
- channel_rationalization: self.our_pubkey → self._our_pubkey
- fee_coordination: day_of_week == -1 → is None for pattern matching
- planner: listpeerchannels(target) → listpeerchannels(id=target)
- planner: guard for None return from create_intent before accessing .intent_id
- yield_metrics: net_revenue now subtracts total_cost (including open_cost) not just rebalance_cost
- routing_intelligence: int() wrap on float avg_capacity_sats to match type annotation
- mcf_solver: reverse edges now properly filtered via is_reverse flag instead of cost_ppm < 0

P2 edge cases fixed:
- mcf_solver: solution_valid false when no solution exists (was reporting true)
- peer_reputation: force_close_count uses max() not sum() across reporters
- peer_reputation: filter None from unique_reporters set
- network_metrics: use hive_connections not external topology for "not connected to"
- yield_metrics: clamp depletion_risk and saturation_risk to [0, 1.0]
- yield_metrics: init _remote_yield_metrics in __init__ instead of hasattr
- channel_rationalization: init _remote_coverage/_remote_close_proposals in __init__
- channel_rationalization: guard ZeroDivisionError on empty topology
- health_aggregator: round() instead of int() for health score truncation
- planner: clamp negative ratio in channel size calculation
- fee_coordination: min strength floor (0.1) for route markers preserving failure signal
- fee_intelligence: filter None from reporters list
- quality_scorer: Tuple[bool, str] type hint for Python 3.8 compat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add threading.Lock to AdaptiveFeeController, StigmergicCoordinator,
  MyceliumDefenseSystem, TimeBasedFeeAdjuster, FeeCoordinationManager
  to protect shared state from concurrent modification
- Add threading.Lock to VPNTransportManager with snapshot-swap pattern
  for atomic reconfiguration and protected stats/peer state
- Route task_manager._execute_expand_task through governance engine
  instead of directly calling rpc.fundchannel (security: fail closed)
- Fix outbox retry: parse/serialize errors now fail permanently instead
  of retrying indefinitely with backoff
- Add cache bounds: cap _remote_pheromones (500 peers),
  _markers (1000 routes), _peer_stats (500 peers),
  _remote_yield_metrics (200 peers), _flow_history (500 channels)
- Add stale key eviction to rate limiters in peer_reputation,
  routing_intelligence, liquidity_coordinator, task_manager

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
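
A size-capped cache like the ones listed above could be built as in this sketch. The commit does not say which eviction policy is used, so evicting the least recently written entry is an assumption, and `BoundedCache` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    """Hypothetical capped mapping that evicts the oldest entry when full."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max = max_entries
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)  # refresh position on update
        self._data[key] = value
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the oldest entry

    def get(self, key: Hashable) -> Optional[Any]:
        return self._data.get(key)

    def __len__(self) -> int:
        return len(self._data)


cache = BoundedCache(max_entries=3)
for peer, level in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    cache.put(peer, level)
```

In a multi-threaded plugin the `put` and `get` calls would still need to run under the owning object's lock; the cap only bounds memory.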
…-revenue-ops

5 bugs fixed in the cooperative fee coordination flow:

- Non-salient fee changes now correctly revert to current_fee (was returning
  the modified fee even when salience filter said "not worth changing")
- pheromone_levels RPC now returns list under "pheromone_levels" key with
  field names matching cl-revenue-ops expectations (level, above_threshold)
- New hive-record-routing-outcome RPC for pheromone updates when
  source/destination are unavailable (fallback was calling read-only
  hive-pheromone-levels with invalid write params)
- Health multiplier comments corrected to match actual math ranges

These bugs combined meant the pheromone-based adaptive fee learning signal
was completely non-functional — routing outcomes were never recorded as
pheromone updates, and pheromone levels were unreadable by cl-revenue-ops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ting, MCF

Critical fixes:
- CircularFlow.cycle → CircularFlow.members: AttributeError crash in
  get_shareable_circular_flows and get_all_circular_flow_alerts
- BFS fleet path finding used shared external peers as connectivity proxy
  instead of checking actual direct channels between members (phantom routes)
- LiquidityCoordinator._lock defined but never acquired — all shared
  mutable state unprotected from concurrent access

Medium fixes:
- MCFCircuitBreaker not thread-safe (added threading.Lock)
- MCF get_total_demand only counted inbound needs — fleets with only
  outbound needs never triggered optimization
- receive_mcf_assignment could exceed MAX_MCF_ASSIGNMENTS if cleanup
  didn't free space (now rejects)
- Empty string peers from failed channel lookups polluted circular
  flow detection graph
- to_us_msat not converted to int before comparison (Msat type safety)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… encapsulation

- create_mcf_ack_message() called with 4 extra args (TypeError on every ACK)
- create_mcf_completion_message() called with 7 extra args (TypeError on every completion)
- ctx.state_manager AttributeError in rebalance_hubs/rebalance_path (safe getattr)
- execute_hive_circular_rebalance missing permission check for fund movements
- get_mcf_optimized_path ignoring to_channel parameter (wrong assignment match)
- _check_stuck_mcf_assignments reaching into private dict (encapsulated with lock)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…defensive copies

State Manager:
- _validate_state_entry() no longer silently mutates input dict (available > capacity now rejected)
- update_peer_state() makes defensive copies of fee_policy, topology, capabilities
- Caps available_sats at capacity_sats in update_peer_state()
- load_from_database() and _load_state_from_db() now use from_dict() for consistent field handling

Planner:
- Added missing feerate gate to _propose_expansion() (documented but never implemented)
- Fixed cfg.market_share_cap_pct crash → getattr(cfg, 'market_share_cap_pct', 0.20)
- Fixed cfg.governance_mode crash → getattr(cfg, 'governance_mode', 'advisor')

Gossip:
- Added timestamp freshness check: rejects messages >1hr old or >5min in future

23 new tests, 1225 total passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…te_entry()

Prevents unbounded arrays, non-string entries, and oversized capability
strings from being accepted via gossip or FULL_SYNC messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gnments()

Added get_all_assignments() method to LiquidityCoordinator and updated
the mcf_assignments RPC to use it instead of reaching into private dict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add vitality-amboss=true to docker-entrypoint.sh config generation
- Add vitality-watch-channels=true for channel health monitoring
- Add vitality-expiring-htlcs=50 for HTLC expiry warnings
- Update Dockerfile comment to document Amboss integration
…AttributeError

Critical fixes across 5 modules:

- mcf_solver: MCFCircuitBreaker.get_status() race condition — can_execute()
  called outside lock returned stale value; refactored to _can_execute_unlocked()
  called atomically within lock
- liquidity_coordinator: 8 thread safety fixes — missing locks on get_status(),
  get_pending_mcf_assignments(), get_mcf_assignment(), update_mcf_assignment_status(),
  create_mcf_ack_message(), create_mcf_completion_message(), get_mcf_status();
  deadlock fix (non-reentrant lock + nested call); new claim_pending_assignment()
  atomic method to prevent TOCTOU double-claim race
- cl-hive.py: _send_mcf_ack() TypeError — create_mcf_ack_message() takes no
  params but was called with 4 positional args; sendcustommsg keyword args fix;
  broadcast_intent_abort NameError (plugin → safe_plugin); missing coordinator
  check in handle_mcf_completion_report; TOCTOU claim race replaced with atomic
  claim_pending_assignment()
- cost_reduction: CircularFlow AttributeError (cf.members_count → cf.cycle_count);
  hub scoring division-by-zero guard; record_mcf_ack() thread safety with
  dedicated lock and proper __init__ initialization
- intent_manager: get_intent_stats() race — _remote_intents read without lock

25 new tests covering all fixes including concurrent access verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
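
The `_can_execute_unlocked()` refactor follows a common pattern for non-reentrant locks: public methods take the lock, and a private helper assumes the lock is already held. The sketch below is a simplified, hypothetical circuit breaker, not the actual `MCFCircuitBreaker`.

```python
import threading
import time
from typing import Any, Dict, Optional


class CircuitBreaker:
    """Simplified breaker: opens after N failures, retries after a delay."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 60.0) -> None:
        self._lock = threading.Lock()  # non-reentrant on purpose
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._failures = 0
        self._opened_at: Optional[float] = None

    def _can_execute_unlocked(self, now: float) -> bool:
        # Caller must already hold self._lock.
        if self._opened_at is None:
            return True
        return now - self._opened_at >= self._reset_after

    def can_execute(self) -> bool:
        with self._lock:
            return self._can_execute_unlocked(time.monotonic())

    def record_failure(self) -> None:
        with self._lock:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = time.monotonic()

    def get_status(self) -> Dict[str, Any]:
        with self._lock:
            # Calling self.can_execute() here would deadlock on the
            # non-reentrant lock; calling it outside the lock would let
            # the two fields disagree. The unlocked helper avoids both.
            return {
                "failures": self._failures,
                "can_execute": self._can_execute_unlocked(time.monotonic()),
            }
```

The same split explains the deadlock fix in liquidity_coordinator: a method that holds a `threading.Lock` must not call another method that acquires it again.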
santyr and others added 30 commits February 17, 2026 10:21
- 00-INDEX.md: Mark Reputation Schema as Phase 1 Implemented, update capital figures
- CLAUDE.md: Add did_credentials.py to module list (40 modules), add 2 new DB tables
  (48 total), add did_maintenance_loop (9 background loops), update test count (1826)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
15 management schema categories with 5-dimension danger scoring engine,
management credential lifecycle (issue/revoke/list), receipt recording,
command validation, and tier-based authorization. 92 new tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CRITICAL fixes:
- C1: Fail-closed signature verification (reject when no RPC)
- C2: Reject empty-signature revocations (fail-closed)
- C3: Fix _schema_matches prefix boundary (hive:fee-policy/* no longer
  matches hive:fee-policy-extended/v1)
- C4: Make ManagementCredential frozen (prevent post-issuance mutation)
- C6: Strengthen pubkey validation (66-char hex, 02/03 prefix required)

HIGH fixes:
- H1: Reject NaN/Infinity metric values in validate_metrics_for_profile
- H2: Enforce period_start < period_end in issue_credential
- H3: Bound _aggregation_cache to 10k entries with LRU eviction
- H4: Validate valid_days > 0 in management credential issuance
- H5: Enforce MAX_ALLOWED_SCHEMAS_LEN/MAX_CONSTRAINTS_LEN size limits
- H6: Require "advanced" tier for set_bulk and circular_rebalance

MEDIUM fixes:
- M1: Check rowcount in revoke_management_credential (return False if
  no rows updated)
- M2: Require credential_id in handle_credential_present (no UUID
  generation for missing IDs)
- M3: Include credential_id in management signing payload
- M4: Remove unused threading import from management_schemas.py
- M5: Remove unused threading.local() from did_credentials.py

Tests: 1946 passed (28 new tests covering all fixes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
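
Fixes C3 and C6 can be sketched as below. Both functions are hypothetical reconstructions from the commit text; the real `_schema_matches` and pubkey validator may handle more cases.

```python
import re

# 66 hex characters: a 02/03 prefix followed by 64 hex digits.
_PUBKEY_RE = re.compile(r"0[23][0-9a-fA-F]{64}")


def is_valid_pubkey(pubkey: object) -> bool:
    """Accept only compressed secp256k1 public keys in hex form."""
    return isinstance(pubkey, str) and _PUBKEY_RE.fullmatch(pubkey) is not None


def schema_matches(pattern: str, schema: str) -> bool:
    """Match exactly, or by prefix when the pattern ends in '/*'."""
    if pattern.endswith("/*"):
        # Keep the trailing '/' so the match stops at a path boundary.
        prefix = pattern[:-1]
        return schema.startswith(prefix)
    return pattern == schema
```

Stripping the whole `/*` suffix and comparing with `startswith("hive:fee-policy")` is what would let `hive:fee-policy-extended/v1` slip through; keeping the slash closes that gap.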
Without the `global` keyword, DIDCredentialManager was assigned to a
local variable in init() and immediately garbage-collected. The module-
level global remained None, rendering the entire Phase 1 DID system
inert: all handlers, the maintenance loop, and all hive-did-* RPC
commands silently no-oped.

Found by wiring audit (agent ad0ab8f).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
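
The failure mode described above is easy to reproduce in isolation. The names below (`Manager`, `manager`, `init_buggy`, `init_fixed`) are hypothetical stand-ins for `DIDCredentialManager` and the plugin's `init()`.

```python
from typing import Optional


class Manager:
    """Stand-in for a manager object created during plugin init."""


manager: Optional[Manager] = None  # module-level global


def init_buggy() -> None:
    # Without `global`, this binds a new local name and the module-level
    # variable stays None once the function returns.
    manager = Manager()  # noqa: F841 (intentionally unused local)


def init_fixed() -> None:
    global manager
    manager = Manager()


init_buggy()
after_buggy = manager  # still None
init_fixed()
after_fixed = manager  # now a Manager instance
```

Handlers that guard on `if manager is None: return` would silently do nothing after `init_buggy()`, which matches the "silently no-oped" symptom.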
- Enable PRAGMA foreign_keys=ON per-connection so the FK on
  management_receipts→management_credentials is actually enforced
- Fix revoke_did_credential to check cursor.rowcount > 0 (was always
  returning True even when no rows matched)

Found by DB audit (agent abdfcc6).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
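
Both database fixes can be demonstrated with the standard `sqlite3` module. The table names follow the commit message, but the column layout and helper functions are assumptions for illustration.

```python
import sqlite3


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite leaves foreign keys off by default; enable per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


conn = connect()
conn.execute(
    "CREATE TABLE management_credentials ("
    "credential_id TEXT PRIMARY KEY, revoked INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE management_receipts ("
    "id INTEGER PRIMARY KEY, "
    "credential_id TEXT NOT NULL "
    "REFERENCES management_credentials(credential_id))"
)
conn.execute("INSERT INTO management_credentials (credential_id) VALUES ('cred-1')")


def revoke(conn: sqlite3.Connection, credential_id: str) -> bool:
    cur = conn.execute(
        "UPDATE management_credentials SET revoked = 1 "
        "WHERE credential_id = ? AND revoked = 0",
        (credential_id,),
    )
    return cur.rowcount > 0  # False when nothing matched


def record_receipt(conn: sqlite3.Connection, credential_id: str) -> bool:
    try:
        conn.execute(
            "INSERT INTO management_receipts (credential_id) VALUES (?)",
            (credential_id,),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # foreign key violation
```

Without the PRAGMA, the insert for an unknown credential would succeed and leave an orphaned receipt.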
Fix 4 issues from protocol messages audit:
- idempotency: use credential_id+issuer_id for REVOKE dedup (not event_id)
- protocol: require credential_id in validate_did_credential_present
- protocol: enforce size limits on metrics/evidence in validation
- rpc_commands: apply domain filter when listing by issuer_id

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use 33-byte compressed pubkey (not 32-byte x-only) for Boltz v2 API
- Prefer rest_url from node config (was falling back to localhost)
- Replace httpx with subprocess+curl for CLN REST calls (httpx fails on self-signed certs over WireGuard)
- Add retry logic (3 attempts) for CLN curl calls
- Better error reporting (rc, stderr, stdout)

Tested: 3M sat reverse swap pwBh29N6u3KX completed successfully.
Phase 3 of the DID Ecosystem adds management credential gossip,
auto-issuance, rebroadcast, and reputation integration across the
hive coordination layer.

New protocol messages:
- MGMT_CREDENTIAL_PRESENT (32887): Share management credentials
- MGMT_CREDENTIAL_REVOKE (32889): Announce mgmt credential revocation

New features:
- Management credential gossip handlers with signature verification
- Auto-issue hive:node credentials from peer state/contribution data
- Periodic rebroadcast of own credentials to fleet (4h cycle)
- Enhanced did_maintenance_loop with auto-issue + rebroadcast

Module integrations:
- Planner: reputation-weighted expansion scoring (recognized+ boost)
- Membership: reputation as supplementary fast-track promotion signal
- Settlement: reputation tier metadata in contribution gathering
- MCP server: 10 new tools for DID/management credential operations

Tests: 81 new tests in test_did_protocol.py (2027 total, all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- C2: checkmessage isinstance(dict) guard now fails closed (was silently
  skipping signature verification on non-dict results) in did_credentials.py
  and management_schemas.py
- C3: MCP hive_mgmt_credential_issue was sending node_id param that the
  RPC handler doesn't accept — removed from tool def and handler
- H1: empty pubkey from checkmessage now fails closed (was bypassing
  issuer identity check) — changed 'if pubkey and' to 'if not pubkey or'
- H2: _score_metrics now inverts avg_fee_ppm and response_time_ms
  normalization (lower values = better score, not higher)
- M: credential_id max-length (64) and signature min-length (10)
  checks added to both REVOKE protocol validators; issued_at validated
  in handle_credential_present

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
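
The fail-closed checks in C2 and H1 can be sketched as a single helper. Core Lightning's `checkmessage` returns a dict with `verified` and `pubkey` fields; the function name and the exact control flow here are assumptions, not the code in did_credentials.py.

```python
from typing import Any


def issuer_verified(check_result: Any, expected_issuer: str) -> bool:
    """Return True only when every verification step positively succeeds."""
    if not isinstance(check_result, dict):
        return False  # C2: a non-dict result is a failure, not a skip
    if not check_result.get("verified"):
        return False
    pubkey = check_result.get("pubkey")
    # H1: `if pubkey and pubkey != expected` lets an empty pubkey through;
    # `if not pubkey or ...` rejects it.
    if not pubkey or pubkey != expected_issuer:
        return False
    return True
```

The design choice is that every unexpected shape maps to rejection, so a malformed RPC response can never be mistaken for a valid signature.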
Both DIDCredentialManager and ManagementSchemaRegistry were initialized
with rpc=safe_rpc, but safe_rpc was never defined anywhere in the
codebase. This would crash init() with NameError, preventing the entire
plugin from starting. Changed to plugin.rpc to match every other module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: implement Phase 4 — Cashu Task Escrow + Extended Settlements; remove boltz-loopout

Phase 4A adds CashuEscrowManager with per-mint circuit breakers, HTLC secret
management (encrypted at rest), danger-based pricing, 4 ticket types
(single/batch/milestone/performance), and signed task execution receipts.

Phase 4B extends SettlementManager with 9 settlement type handlers, bilateral
and multilateral NettingEngine, BondManager (post/slash/refund with
time-weighted staking), DisputeResolver (deterministic stake-weighted panel
selection), and credit tier integration.

Adds 7 protocol messages (32891-32903), 6 DB tables, 13 RPC commands, 113
tests (2140 total, 0 failures). Removes boltz-loopout.py API script in favor
of boltz-client.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: complete phase 4/5 integration and phase 6 planning artifacts

* audit: close remaining phase 1-5 medium findings

* db: auto-migrate legacy settlement_bonds schema on startup

---------

Co-authored-by: santyr <6dcea3ab-e73b-4cd2-8278-d949995d101f@bolverker.anonaddy.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: implement Phase 4 — Cashu Task Escrow + Extended Settlements; remove boltz-loopout

Phase 4A adds CashuEscrowManager with per-mint circuit breakers, HTLC secret
management (encrypted at rest), danger-based pricing, 4 ticket types
(single/batch/milestone/performance), and signed task execution receipts.

Phase 4B extends SettlementManager with 9 settlement type handlers, bilateral
and multilateral NettingEngine, BondManager (post/slash/refund with
time-weighted staking), DisputeResolver (deterministic stake-weighted panel
selection), and credit tier integration.

Adds 7 protocol messages (32891-32903), 6 DB tables, 13 RPC commands, 113
tests (2140 total, 0 failures). Removes boltz-loopout.py API script in favor
of boltz-client.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: complete phase 4/5 integration and phase 6 planning artifacts

* audit: close remaining phase 1-5 medium findings

* db: auto-migrate legacy settlement_bonds schema on startup

* Fix settlement pool period handling and proposal integrity

---------

Co-authored-by: santyr <6dcea3ab-e73b-4cd2-8278-d949995d101f@bolverker.anonaddy.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ging (#73)

* feat: implement Phase 4 — Cashu Task Escrow + Extended Settlements; remove boltz-loopout

Phase 4A adds CashuEscrowManager with per-mint circuit breakers, HTLC secret
management (encrypted at rest), danger-based pricing, 4 ticket types
(single/batch/milestone/performance), and signed task execution receipts.

Phase 4B extends SettlementManager with 9 settlement type handlers, bilateral
and multilateral NettingEngine, BondManager (post/slash/refund with
time-weighted staking), DisputeResolver (deterministic stake-weighted panel
selection), and credit tier integration.

Adds 7 protocol messages (32891-32903), 6 DB tables, 13 RPC commands, 113
tests (2140 total, 0 failures). Removes boltz-loopout.py API script in favor
of boltz-client.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: complete phase 4/5 integration and phase 6 planning artifacts

* audit: close remaining phase 1-5 medium findings

* db: auto-migrate legacy settlement_bonds schema on startup

* Fix settlement pool period handling and proposal integrity

* fix(routing-pool): normalize uptime and correct snapshot capacity logging

---------

Co-authored-by: santyr <6dcea3ab-e73b-4cd2-8278-d949995d101f@bolverker.anonaddy.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Adds boltzd as an optional supervised service (BOLTZ_ENABLED=false by
default). Entrypoint auto-generates boltz.toml with correct CLN gRPC
cert paths and symlinks datadir for boltzcli convenience.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove 57 docs files now canonical in lightning-goats/hive-docs.
Keep only operational docs (JOINING_THE_HIVE, MCP_SERVER) and
pointer README in cl-hive. Update all dangling references in
README.md, MOLTY.md, and docker/README.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(docker): add optional Phase 6 plugin scaffolding (comms + archon)

Wire cl-hive-comms and cl-hive-archon as opt-in plugins: baked into
Docker image but disabled by default. Adds HIVE_COMMS_ENABLED /
HIVE_ARCHON_ENABLED env flags, entrypoint load-order logic, plugin
detection in cl-hive.py (hive-phase6-plugins RPC), config validation,
and manual-install-archon.sh for local dev containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve Phase 6 gate blockers (H-7 settlement, test stubs, audit status)

- Gate settlement execution behind approval instead of auto-executing
  BOLT12 payments (H-7 remediation)
- Remove 14 stub tests from test_feerate_gate.py, add assertions to
  edge case tests (23 tests, all with real assertions)
- Add remediation status table to audit document (all 9 HIGH resolved)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: pin Phase 6 plugin versions and add detection tests

- Pin cl-hive-comms and cl-hive-archon Docker defaults to v0.1.0
  instead of main branch (PH6-M3)
- Add test_phase6_detection.py with 11 tests covering sibling plugin
  detection, fallback paths, and error handling (PH6-M2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct cl_revenue_ops version pin and add non-root TODO

- Fix CL_REVENUE_OPS_VERSION v2.2.4 → v2.2.5 to match actual release
- Add TODO comment for future non-root container refactor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: implement Phase 6 transport handover (comms delegation)

Add dual-mode transport: cl-hive delegates Nostr transport to cl-hive-comms
when present (Coordinated Mode), falls back to internal transport when absent
(Monolith Mode). Implements the core wiring from 17-PHASE6-HANDOVER-PLAN.md.

Key changes:
- ExternalCommsTransport wraps comms RPCs with CircuitBreaker (3 failures → open)
- Transport mode selection in init() based on phase6 plugin detection
- hive-inject-packet RPC for comms→hive inbound message flow
- Inbound pump thread drains injected packets to DM callbacks
- TransportInterface ABC for type-safe polymorphism

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add RemoteArchonIdentity adapter for Phase 6 identity delegation

Wire identity adapter into init() — delegates signing to cl-hive-archon
when present, falls back to LocalIdentity (CLN HSM) when absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: audit round 2 — identity adapter tests and unused import cleanup

- Add 21 tests for IdentityInterface, LocalIdentity, RemoteArchonIdentity
- Remove unused Optional import from identity_adapter.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: audit round 3 — pump logging, transport validation, response checks

- Log exceptions in _external_transport_pump() instead of bare pass
- Validate payload structure in inject_packet() (must be dict)
- Validate pubkey hex format in get_identity() RPC response
- Validate recipient_pubkey non-empty in send_dm()
- Log DM callback exceptions in ExternalCommsTransport.process_inbound()
- Add re import for hex validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: audit round 4 — inject_packet return value, sign_message safety, envelope guard

- ExternalCommsTransport.inject_packet returns bool for queue-full detection
- LocalIdentity.sign_message wrapped in try-except to prevent RPC crash
- hive-inject-packet RPC checks inject_packet return and reports queue-full
- Fix envelope pubkey None→empty string in process_inbound

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: dual-funded channel open with single-funded fallback

All channel opens now attempt v2 (dual-funded) first via
fundpsbt → openchannel_init → openchannel_update → signpsbt → openchannel_signed,
with proper cleanup (openchannel_abort + unreserveinputs) on failure,
then fall back to standard fundchannel. Unified through _open_channel() helper
used by rpc_commands, task_manager, and hive-open-channel RPC.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* phase6: harden injected packet ingest and archon identity info

* feat(docker): non-root lightning user, supervisor hardening, advisor tuning

- Create dedicated 'lightning' user; move plugin/data paths from /root to /home/lightning
- Add chown step in entrypoint for data directories
- Add explicit user=root to supervisord programs needing elevated privileges
- Disable vitality-amboss option (unavailable in current build)
- Advisor: add MAB exploration guidance, profitable-channel protection, 3x margin rule

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add full cl_revenue_ops revenue RPC parity to MCP server

* feat(mcp): add boltz backup and mnemonic verify tools

Expose revenue-boltz-backup and revenue-boltz-backup-verify as MCP
tools for programmatic access to boltzd swap mnemonic and backup state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ins (#78)

- Boltz client: tarball uses linux_amd64 (underscore) but install
  referenced linux-amd64 (hyphen), causing silent install failure
- Add PyNaCl>=1.5.0 to pip install (required by cl-hive-comms)
- Update Phase 6 plugin version pins from v0.1.0 scaffold to main

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…lock contention

plugin.log() and RPC responses share a single write_lock on stdout. With
16 message handler threads + 9 background loops, background threads
monopolize the lock and starve the IO thread, causing hive-status and
other RPC commands to hang for 15-20s. Replace per-message lock
acquisition with a queue-based writer that batches all pending log
messages into a single write_lock acquisition every 50ms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
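
A queue-based writer of this kind might look like the sketch below. `BatchedLogWriter` and `write_batch` are hypothetical; the 50ms interval comes from the commit message. Producers only enqueue, and a single writer thread takes the shared lock once per batch.

```python
import queue
import threading
from typing import Callable, List


class BatchedLogWriter:
    """Hypothetical writer: drains queued messages every `interval` seconds."""

    def __init__(self, write_batch: Callable[[List[str]], None],
                 interval: float = 0.05) -> None:
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._write_batch = write_batch
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def log(self, message: str) -> None:
        self._queue.put(message)  # producers never touch the write lock

    def _flush(self) -> None:
        batch: List[str] = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._write_batch(batch)  # one lock acquisition per batch

    def _run(self) -> None:
        while not self._stop.wait(self._interval):
            self._flush()
        self._flush()  # final drain on shutdown

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


write_lock = threading.Lock()  # stands in for the shared stdout lock
written: List[List[str]] = []


def write_batch(batch: List[str]) -> None:
    with write_lock:
        written.append(list(batch))


writer = BatchedLogWriter(write_batch)
writer.start()
for i in range(5):
    writer.log(f"msg {i}")
writer.stop()
```

The trade-off is up to one interval of added log latency in exchange for far fewer lock acquisitions competing with RPC responses.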
Multiple MCP tool handlers were making 3-7 sequential await node.call()
RPCs where all calls were independent. Because each call carries its own
30s timeout, the worst-case latencies added up (for example 7 × 30s = 210s),
causing regular MCP communication timeouts.

Converted to asyncio.gather() with return_exceptions=True:
- handle_advisor_record_snapshot: 7 sequential → 1 gather (210s → 30s)
- handle_stagnant_channels (both defs): per-channel RPC loops → batch
  gather (1830s/690s → 30s)
- read_resource fleet handlers: sequential per-node loops → parallel
  per-node gathers (450s → 30s)
- handle_hive_node_diagnostic: 4 sequential → 1 gather (120s → 30s)
- handle_revenue_ops_health: 4 sequential → 1 gather (120s → 30s)
- handle_advisor_get_peer_intel: 3 sequential → 1 gather (90s → 30s)
- handle_set_fees: 2 sequential guard checks → 1 gather (60s → 30s)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
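
The conversion pattern can be sketched with a stand-in for `node.call()`. `fake_call` and `fetch_all` are hypothetical; `asyncio.gather(..., return_exceptions=True)` is the standard-library call named in the commit.

```python
import asyncio
from typing import Any, Dict, List


async def fake_call(method: str, delay: float = 0.01) -> Dict[str, Any]:
    """Stand-in for an RPC call; 'broken' simulates a failing RPC."""
    await asyncio.sleep(delay)
    if method == "broken":
        raise RuntimeError("rpc failed")
    return {"method": method}


async def fetch_all(methods: List[str]) -> Dict[str, Any]:
    # All calls run concurrently, so the worst case is one timeout
    # rather than the sum of all of them.
    results = await asyncio.gather(
        *(fake_call(m) for m in methods),
        return_exceptions=True,
    )
    out: Dict[str, Any] = {}
    for method, result in zip(methods, results):
        if isinstance(result, BaseException):
            out[method] = {"error": str(result)}  # one failure stays local
        else:
            out[method] = result
    return out


snapshot = asyncio.run(fetch_all(["getinfo", "listfunds", "broken"]))
```

With `return_exceptions=True`, a failing call is returned as an exception object in its slot instead of aborting the whole gather, so each handler can still report partial results.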
Continue RPC parallelization effort from a305dcc, targeting 4 more
handlers that stacked sequential 30s-timeout calls:

- handle_channel_deep_dive: move listnodes/getinfo fallbacks into
  initial gather (4+2 sequential → 6 parallel), saves up to 60s
- handle_rebalance_diagnostic: gather plugin-list, rebalance-debug,
  and sling-status speculatively in parallel (3 sequential → 1 gather)
- handle_revenue_status: fetch revenue-status and fee-intel-query
  in parallel (2 sequential → 1 gather)
- handle_config_recommend: fetch revenue-dashboard and revenue-config
  in parallel (2 sequential → 1 gather)

handle_revenue_rebalance left as-is: its RPCs have genuine data
dependencies (retry depends on failure, sling-stats verifies completed
rebalance) that prevent parallelization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Continue RPC parallelization effort, targeting 2 more handlers:

- handle_revenue_profitability: fetch profitability and fee-intel-query
  in parallel (2 sequential → 1 gather), saves up to 30s
- handle_revenue_competitor_analysis: fetch fee-intel-query and
  listchannels in parallel for single-peer path (2 sequential →
  1 gather), saves up to 30s

handle_propose_promotion and handle_vote_promotion left as-is: they
have true data dependencies (second call needs peer_id from first).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Final batch of RPC parallelization:

- read_resource per-node status: 4 sequential RPCs (hive-status,
  getinfo, listfunds, pending-actions) → 1 gather, saves up to 90s
- handle_run_settlement_cycle: snapshot + calculate in parallel
  (2 sequential → 1 gather), saves up to 30s

handle_enrich_proposal left as-is: true data dependency (needs
peer_id from pending-actions result before enriching).

This completes the MCP RPC parallelization effort:
  16 handlers fixed across 4 commits, eliminating all sequential
  RPC anti-patterns where calls were independent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
on_forward_event makes synchronous listfunds() RPC calls per forward,
blocking cl-hive's IO thread. On 40-peer nodes with active routing,
this queues up ALL incoming RPC calls (hive-record-flow, hive-status,
etc.) for 15+ seconds, causing cl-revenue-ops timeouts.

on_peer_connected similarly blocks on listpeers() and sendcustommsg().

Both handlers now submit work to the existing _msg_executor thread pool,
returning immediately so the IO thread can process RPC requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
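
Offloading handler work to a thread pool can be sketched as follows. `_msg_executor` is named in the commit; the handler bodies, the pool size, and the error callback are assumptions for illustration.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Dict, List

_msg_executor = ThreadPoolExecutor(max_workers=4)  # pool size is assumed
processed: List[int] = []
errors: List[str] = []
done = threading.Event()


def _process_forward(event: Dict[str, Any]) -> None:
    # The slow part (for example a blocking RPC) runs on a worker thread.
    processed.append(event["id"])


def _on_done(future: "Future[None]") -> None:
    # Exceptions raised in a worker are otherwise silently dropped.
    exc = future.exception()
    if exc is not None:
        errors.append(repr(exc))
    done.set()


def on_forward_event(event: Dict[str, Any]) -> None:
    """Return immediately; the IO thread stays free to serve RPC requests."""
    future = _msg_executor.submit(_process_forward, event)
    future.add_done_callback(_on_done)


on_forward_event({"id": 1})
done.wait(timeout=2.0)
```

Attaching a done callback is one way to keep failures visible after the work leaves the calling thread.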
Implemented 3 priority fixes from settlement audit:

1. Bug #3 (CRITICAL): Fee reports now saved regardless of broadcast threshold
   - Previously: database.save_fee_report() only called when broadcast threshold met
   - Impact: Low-traffic nodes (like nexus-01) showed 0 fees in settlement
   - Fix: Always save fee report to database on every forward, independent of gossip broadcast
   - Location: cl-hive.py line ~3926 in _update_and_broadcast_fees()

2. Bug #1 (HIGH): Local node presence initialized for uptime tracking
   - Previously: Local node never recorded its own presence, showed 0% uptime
   - Impact: Fair share calculations undervalued local node contribution (10% weight)
   - Fix: Initialize presence for our_pubkey on plugin startup
   - Location: cl-hive.py line ~1836 in init()

3. Sling command mismatch: Fixed for v4.2.0 compatibility
   - Previously: hive-sling-status called 'sling-status' (old command)
   - Impact: RPC error 'Unknown command' since sling v4.2.0 renamed to 'sling-stats'
   - Fix: Update command name to match sling v4.2.0 API
   - Location: cl-hive.py line ~13478

All fixes pass a py_compile syntax check (no runtime tests were run).

Refs: docs/settlement-audit-2026-02-23.md
2 participants