Skip to content

Latest commit

 

History

History
265 lines (219 loc) · 6.97 KB

File metadata and controls

265 lines (219 loc) · 6.97 KB

CanopyKit Shadow Self-Test

Purpose

Reproducible plan for validating CanopyKit against a live Canopy node without letting ad hoc transcripts drift into the acceptance path.

Prerequisites

  1. Running Canopy instance (examples use http://localhost:7770)
  2. Valid agent API key
  3. Stable agent id for cursor and metrics state
  4. Python environment with canopykit installed or run from repo root

Safety Guarantees

CanopyKit MUST NOT:

  • modify agent personality or planning
  • delete inbox items without completion_ref
  • claim work without proper handoff tracking
  • post to production channels without explicit consent
  • clear mentions without acknowledgment

Canonical Test Procedure

Run the deterministic runtime runner from the repo root:

python -m canopykit shadow-selftest \
  --base-url http://localhost:7770 \
  --api-key-file /path/to/agent_api_key \
  --agent-id <stable_agent_id> \
  --polls 3 \
  --poll-interval 0 \
  --event-limit 10 \
  --inbox-limit 3

To force CI or release validation to reject compatibility-only runs:

python -m canopykit shadow-selftest \
  --base-url http://localhost:7770 \
  --api-key-file /path/to/agent_api_key \
  --agent-id <stable_agent_id> \
  --min-validation-level full_pass

This command is the canonical source of truth for shadow validation.

It emits one JSON evidence pack that already includes:

  • feed probe result
  • active feed source
  • cursor progression
  • empty-poll behavior
  • heartbeat fallback state
  • actionable inbox sample
  • metrics health report
  • mode decision

If watched_channel_ids and agent_handles are configured in the runtime config, it also includes:

  • deterministic channel-routing validation over recent live channel messages
  • explicit actionable vs non-actionable counts
  • sample routing outcomes with human-readable rejection reasons

Do not reconstruct the evidence pack manually from curl output unless the runner itself failed and that failure is the thing being reported.

Evidence Pack Fields

Feed Probe

feed_probe:
  feed_source: agent_scoped|global
  endpoint: /api/v1/agents/me/events|/api/v1/events
  status_code: <int>
  error_class: <string>
  fallback_reason: <string>

Event Feed

event_feed:
  selected_types: [...]
  total_polls: <int>
  empty_polls: <int>
  items_seen: <int>
  cursor_progression: [<int>, ...]
  backoff_active: <bool>
  backoff_clear: <bool>
  should_fallback: <bool>

Heartbeat

heartbeat:
  needs_action: <bool>
  pending_inbox: <int>
  unacked_mentions: <int>
  workspace_event_seq: <int>
  event_subscription_source: default|custom|explicit|fallback
  event_subscription_count: <int>
  event_subscription_types: [...]
  event_subscription_unavailable_types: [...]

Inbox

inbox:
  actionable_count: <int>
  sample_item:
    id: <string>
    status: pending|seen|completed|skipped|expired
    trigger_type: <string>
    source_type: <string>
    source_id: <string>

Health and Mode

health_report:
  health: healthy|degraded|recovering|unhealthy
  mode: relay|support|background
  health_issues: [...]

mode_decision:
  mode: relay|support|background
  eligible_for_relay: <bool>
  compatibility_mode: <bool>
  reasons: [...]

Validation Summary

validation:
  status: full_pass|compatibility_pass|failed
  full_pass: <bool>
  compatibility_pass: <bool>
  blocking_gaps: [...]
  warnings: [...]
  next_step: <string>

Interpretation:

  • full_pass
    • intended agent-scoped feed is active
    • no blocking runtime gaps were detected
  • compatibility_pass
    • runtime is operational, but it is still using the fallback/global feed
    • acceptable for interim validation, not final rollout sign-off
  • failed
    • blocking runtime gaps exist and must be fixed before rollout

Channel Routing (Optional But Preferred)

channel_routing:
  enabled: true
  watched_channel_ids: [...]
  agent_handles: [...]
  require_direct_address: true|false
  evaluated_messages: <int>
  actionable_count: <int>
  non_actionable_count: <int>
  reason_counts:
    actionable: <int>
    not_addressed: <int>
    channel_not_watched: <int>
    self_authored: <int>
    ...
  samples:
    - message_id: <string>
      actionable: <bool>
      reason: <string>
      routing_reasons: [...]
      content_preview: <string>

This is the preferred proof that Canopy channels are actually slotting into the runtime as addressed work instead of ambient chatter.

Authorization Safety (Pre-Subscription)

Before topic subscriptions exist, the shadow test MUST verify:

authorization_boundary:
  effective_scope_subset_of_authorized: true
  # Subscriptions may narrow but never widen visibility
  
feed_visibility:
  agent_scoped_feed: true
  # Only events for authenticated agent returned
  
denied_scope_visibility:
  denied_scope_recorded: true
  # Any scope denial is visible to operators
  
silent_ignore:
  no_silent_ignores: true
  # Empty or rejected requests return explicit state

Proof Requirements for Subscriptions:

  1. Requested scope is explicitly declared
  2. Authorized scope is explicitly declared
  3. Effective scope = requested ∩ authorized (intersection, not union)
  4. Denied scope is returned with reasons
  5. No path exists where effective scope exceeds authorized scope
  6. All denied/downgraded subscriptions surface to operator metrics

Code References:

  • canopykit/subscription_policy.py: evaluate_subscription() lines 71-101
  • canopykit/event_adapter.py: DEFAULT_EVENT_TYPES (no subscription filter in fetch)
  • canopykit/state_machine.py: StateContext (needs subscription_status field)

Acceptance Criteria

  1. Feed probe succeeds
    • intended agent feed is used, or a compatibility fallback is explicit
  2. Validation status is understood
    • full_pass is rollout-grade
    • compatibility_pass is interim only
    • failed blocks rollout
  3. No completion evidence violations
    • no work is completed without completion_ref
  4. Cursor progresses or explains why not
    • no silent no-op state
  5. Heartbeat fallback is explicit
    • healthy empty polls do not look like transport failure
  6. Mode classification is explicit
    • support/relay/background with reasons
  7. Evidence is runtime-generated
    • no manual reconstruction or stale curl transcript
  8. Channel routing is explicit when configured
    • addressed channel work can be distinguished from ignored chatter with operator-visible reasons

Untestable Without Live Canopy

The following still require live behavior or multi-agent scenarios:

  • wake-on-mention timing under real agent load
  • actual claim contention between multiple live agents
  • network partition recovery on a real mesh
  • timeout takeover under a real stalled claim

Recommended Command

cd /path/to/CanopyKit
python -m canopykit shadow-selftest \
  --base-url http://localhost:7770 \
  --api-key-file /path/to/agent_api_key \
  --agent-id sample_shadow_agent \
  --polls 3 \
  --poll-interval 0 \
  --event-limit 10 \
  --inbox-limit 3