feat(cli): soul diff enhancements — --require-same-did, --fail-on, diff-driver-install (#217) #243

Open
sshekhar563 wants to merge 11 commits into qbtrix:main from sshekhar563:feat/soul-diff-enhancements

@sshekhar563

Summary

Implements items 1, 3, and 4 from #217 as a single feat(cli) PR, as suggested in the issue description.

1. soul diff-driver-install (trivial)

Configures .gitattributes (*.soul diff=soul) and git config (diff.soul.command) so git diff, git log -p, and PR tools show soul-level diffs natively.

  • Supports --global (user-level) and --local (repo-level, default)
  • Idempotent: re-running doesn't duplicate the .gitattributes line
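The idempotency guarantee for the .gitattributes step can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name `ensure_gitattributes_line` is hypothetical, and only the `*.soul diff=soul` line comes from the description above. The git config step is left out because the value of `diff.soul.command` is not stated here.

```python
import tempfile
from pathlib import Path

ATTR_LINE = "*.soul diff=soul"  # the line named in the PR description


def ensure_gitattributes_line(repo_root: Path, line: str = ATTR_LINE) -> bool:
    """Append `line` to .gitattributes unless it is already there.

    Hypothetical helper. Returns True when the file was changed.
    """
    path = repo_root / ".gitattributes"
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in (entry.strip() for entry in existing.splitlines()):
        return False  # already configured, so a re-run is a no-op
    # Keep the new entry on its own line even if the file lacks a trailing newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + line + "\n", encoding="utf-8")
    return True


# Run the install step twice against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = ensure_gitattributes_line(root)
    second = ensure_gitattributes_line(root)
    content = (root / ".gitattributes").read_text(encoding="utf-8")
```

The second call leaves the file untouched, which is the behaviour the bullet above describes.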

3. --require-same-did foot-gun guard (trivial)

Exits non-zero when left and right souls have different DIDs. Prevents accidental diffs between unrelated souls.

  • Override with --allow-cross-did for intentional cross-DID comparison
  • Without the flag, different DIDs are allowed (backward-compatible)
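The flag interaction described above can be sketched as a small decision function. The function name, the `did:soul:` identifiers, and the specific exit code 1 are assumptions made for illustration; the description only promises a non-zero exit.

```python
def same_did_exit_code(
    left_did: str,
    right_did: str,
    *,
    require_same_did: bool = False,
    allow_cross_did: bool = False,
) -> int:
    """Return the exit code the guard would contribute (hypothetical helper).

    0 means the diff may proceed; 1 (an assumed value) means the guard tripped.
    """
    if not require_same_did:
        return 0  # backward-compatible default: DIDs are not compared
    if allow_cross_did:
        return 0  # explicit override for intentional cross-DID comparison
    return 0 if left_did == right_did else 1


# Illustrative DIDs; the real identifier format is not shown in this PR.
default_mode = same_did_exit_code("did:soul:a", "did:soul:b")
guarded = same_did_exit_code("did:soul:a", "did:soul:b", require_same_did=True)
overridden = same_did_exit_code(
    "did:soul:a", "did:soul:b", require_same_did=True, allow_cross_did=True
)
```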

4. --fail-on <category> CI guard (small)

Exits non-zero when the named change category has count > 0 in the diff summary.

# Fail CI if any memories were added
soul diff before.soul after.soul --fail-on memory.added

# Fail on multiple categories
soul diff a.soul b.soul --fail-on memory.added --fail-on identity
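A rough sketch of how such a guard could map summary counts to an exit code is shown below. The helper name and the shape of the summary dict are assumptions. The category names `memory.added` and `identity` come from the examples above, and the exit code 2 for unknown categories follows the commit message in this PR; exit code 1 for a tripped category is an assumed value for "non-zero".

```python
def fail_on_exit_code(
    summary: dict[str, int],
    fail_on: list[str],
    known_categories: set[str],
) -> int:
    """Hypothetical helper: compute the exit code for repeated --fail-on flags."""
    unknown = [name for name in fail_on if name not in known_categories]
    if unknown:
        return 2  # fail fast on typos rather than silently passing CI
    tripped = [name for name in fail_on if summary.get(name, 0) > 0]
    return 1 if tripped else 0


# Only two of the categories are named in this PR; the rest are omitted here.
known = {"memory.added", "identity"}
summary = {"memory.added": 3, "identity": 0}

code_added = fail_on_exit_code(summary, ["memory.added"], known)
code_identity = fail_on_exit_code(summary, ["identity"], known)
code_typo = fail_on_exit_code(summary, ["memory.addded"], known)
```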

prakashUXtech and others added 11 commits April 30, 2026 07:46
…st-0.4.0

chore: sync dev with main post-0.4.0
Three new commands wrap the org-level SQLite WAL journal as a CLI for
shell hooks, CI scripts, and non-Python runtimes:

- soul journal init <path> — bootstrap a standalone journal file
  (no root soul, no scope tree, no founder).
- soul journal append <path> — write one event from flags, or batch
  JSONL events from stdin. Echoes the committed EventEntry to stdout
  with backend-assigned seq + prev_hash so callers can chain
  causation ids.
- soul journal query <path> — filter by --action / --action-prefix
  (trailing-dot tolerated), --scope, --correlation-id, --since/--until,
  plus --at <iso> for point-in-time replay. Rich table by default;
  --json emits a parseable JSON array.

Foundation for replacing memory-heavy soul-sync.sh hooks with
structured, queryable journal events.

Closes qbtrix#189
Read-only command compares two soul files at the soul level — not the
byte level. Sections covered: identity, OCEAN/DNA, state, core memory,
memories per layer + per domain, bond (default + per-user), skills,
trust chain, self-model, evolution.

Memory diff strategy: by id. Added = in right not left, removed = in
left not right, modified = same id different fields. Superseded
memories are filtered from the modified list by default since they
still live in the file; --include-superseded surfaces the chain
explicitly.
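The by-id strategy can be sketched with plain set operations. This is an illustration under assumptions, not the `diff_souls` implementation: memories are modelled as dicts keyed by id, and `superseded_by` is a hypothetical field name standing in for however a superseded memory is marked.

```python
def diff_memories(
    left: dict[str, dict],
    right: dict[str, dict],
    include_superseded: bool = False,
) -> dict[str, list[str]]:
    """Hypothetical helper: classify memory ids as added, removed, or modified."""
    added = sorted(right.keys() - left.keys())    # in right, not in left
    removed = sorted(left.keys() - right.keys())  # in left, not in right
    modified = sorted(
        mem_id for mem_id in left.keys() & right.keys()
        if left[mem_id] != right[mem_id]          # same id, different fields
    )
    if not include_superseded:
        # Superseded memories still live in the file, so hide them by default.
        modified = [m for m in modified if not right[m].get("superseded_by")]
    return {"added": added, "removed": removed, "modified": modified}


left = {
    "m1": {"content": "likes tea"},
    "m2": {"content": "lives in Oslo"},
    "m3": {"content": "old fact"},
}
right = {
    "m1": {"content": "likes green tea"},
    "m3": {"content": "old fact", "superseded_by": "m4"},
    "m4": {"content": "new fact"},
}

default_diff = diff_memories(left, right)
full_diff = diff_memories(left, right, include_superseded=True)
```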

Output formats: text (Rich panel, sections omitted when empty),
--format json (full SoulDiff Pydantic dump for tooling), --format
markdown (paste-ready table for PR bodies). Other flags: --section
narrows to one section (with hyphen/underscore aliases), --summary-only
collapses to per-section counts.

Schema mismatch raises SchemaMismatchError → exit 1 with a clean
message pointing at `soul migrate`.

New public Python API at soul_protocol.runtime: diff_souls, SoulDiff,
SchemaMismatchError. The SoulDiff model is fully Pydantic-roundtripable
so PR review bots and CI checks can consume it.

Closes qbtrix#191
…rediction-error gating (qbtrix#192) (qbtrix#209)

Drafts the v0.5.0 RFC for six brain-aligned memory operations: confirm,
update, supersede (extends 0.4.0), forget (semantics shift to weight
decay), purge (new hard-delete with backup), and reinstate. Adds the
schema additions (retrieval_weight, supersedes back-edge,
prediction_error, revisions), recall changes (weight filter +
provenance), trust chain hooks, CLI/MCP surface, a 2-3 hour spike
scope for daily-use validation before the production build, the open
questions for captain review, the 0.4 → 0.5 migration walkthrough,
the SPEC.md follow-up stanzas, and the cog-sci references.
…rix#160) (qbtrix#211)

Add a soul-aware eval framework: a YAML-driven format and runner that
seeds the soul with explicit state (memories, OCEAN, bonds, mood, energy)
before each case runs, so behaviour can be measured against a known
starting point rather than being treated as a stateless function.

New modules
- src/soul_protocol/eval/{schema,runner,scoring}.py — Pydantic schema
  with five scoring kinds (keyword, regex, semantic, judge, structural),
  the run_eval orchestrator, plus run_eval_against_soul for the MCP
  variant that runs against the live soul without re-birthing.
- src/soul_protocol/cli/eval_cmd.py — `soul eval` command. Runs one
  spec or every .yaml under a directory; --json, --filter,
  --judge-engine, --verbose options. Exit 0 on all-pass (skips OK),
  1 on any failure or spec error.
- soul_eval MCP tool (src/soul_protocol/mcp/server.py) — runs a YAML
  spec against the active soul. seed block ignored; live state is the
  seed. Accepts yaml_path or yaml_string.

Shipped examples (tests/eval_examples/)
- personality_expression.yaml — high-openness OCEAN seed surfaces
  creative memories
- memory_recall_filtering.yaml — multi-user attribution prevents
  cross-user bleed
- domain_isolation.yaml — domain-scoped recall stays inside its
  domain (qbtrix#41)
- bond_strength_effect.yaml — bonded-visibility memories gate on
  bond_threshold
- trust_chain_provenance.yaml — observe → recall side-effects flow
  through

Tests: 66 new under tests/test_eval/ (schema, runner, examples, cli,
mcp). Wired into pytest as smoke tests so the example specs never
drift. Total: 2537 → 2603.

Docs: full schema reference at docs/eval-format.md; soul eval section
in cli-reference.md; soul_eval section in mcp-server.md; Evaluation
section in api-reference.md; Unreleased note in CHANGELOG.md.
…oad typing + key rotation tests (qbtrix#210)

Closes qbtrix#199, qbtrix#200, qbtrix#205, qbtrix#204.

* qbtrix#199 — verify_chain rejects entries whose timestamp predates the
  previous entry's timestamp by more than 60s (skew tolerance), closing
  a backdating gap at the chain head.
* qbtrix#200 — _canonical_json no longer silently stringifies non-JSON-native
  types via default=str. A strict default raises TypeError with an
  actionable message so hash-determinism cannot drift across Python
  versions.
* qbtrix#205 — compute_payload_hash refuses BaseModel inputs at the public
  entry point. Callers must pre-serialize via model_dump(mode='json')
  so a BaseModel and a dict cannot accidentally produce different
  hashes for the same logical payload.
* qbtrix#204 — Keystore gains previous_public_keys allow-list, persisted as
  keys/previous.keys (newline-separated base64). Soul.verify_chain
  accepts entries whose public_key matches either the current key or
  any in the allow-list, enabling key rotation. Default empty list
  preserves the v0.4.0 strict-current-key behavior.
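Two of the checks above lend themselves to a short sketch. The names `canonical_json` and `timestamp_ok` are hypothetical stand-ins rather than the library's `_canonical_json` or `verify_chain`, and the encoding choices (sorted keys, compact separators, UTF-8) are assumptions; only the strict-default behaviour and the 60s skew tolerance come from the text above.

```python
import json
from datetime import datetime, timedelta, timezone

SKEW_TOLERANCE = timedelta(seconds=60)  # tolerance stated in the commit message


def _strict_default(obj):
    # Called by json.dumps only for values it cannot encode natively.
    raise TypeError(
        f"{type(obj).__name__} is not JSON-native; convert it explicitly "
        "(for example with model_dump(mode='json')) before hashing"
    )


def canonical_json(payload) -> bytes:
    """Deterministic encoding that refuses to stringify unknown types."""
    text = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), default=_strict_default
    )
    return text.encode("utf-8")


def timestamp_ok(previous: datetime, current: datetime) -> bool:
    """Accept an entry unless it predates the previous one by more than 60s."""
    return current >= previous - SKEW_TOLERANCE


prev_ts = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
encoded = canonical_json({"b": 1, "a": 2})
```

With `default=str`, a `datetime` in the payload would have been hashed through its string form; here it raises instead, so the caller has to decide on a representation.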

Existing chain-append payloads in runtime/soul.py and runtime/bond.py
were audited under the strict canonical JSON rule; they were already
JSON-native dicts so they keep hashing cleanly.

Tests: 34 new test cases in test_verification_hardening.py and
test_key_rotation.py covering monotonicity, strict JSON refusal,
BaseModel guard, mixed-signer chains, allow-list mechanics, and
keystore round-trips through directory + archive layouts.

Docs: CHANGELOG Unreleased section, docs/trust-chain.md threat model
+ key management + on-disk layout, docs/SPEC.md §10A.6 Verification
contract + §10A.7 Identity binding.
…ilure logging (qbtrix#201, qbtrix#202) (qbtrix#213)

- TrustEntry gains a non-cryptographic ``summary`` field excluded from
  the canonical bytes used for ``compute_entry_hash`` and signing.
- TrustChainManager.append accepts an optional ``summary=`` parameter;
  when omitted, an action-keyed default formatter registry covers the
  actions Soul emits (memory.write, memory.forget, memory.supersede,
  bond.strengthen, bond.weaken, evolution.proposed/applied,
  learning.event).
- ``Soul.audit_log()`` rows include ``summary``. ``soul audit`` Rich
  table adds a Summary column; ``--no-summary`` restores the 0.4.0
  hash-only view. JSON output always carries summary.
- ``Soul._safe_append_chain`` splits the log path: verification-only
  (no public key, _PublicOnlyProvider) stays at DEBUG; an unexpected
  exception during ``append`` now logs at WARNING under the
  ``runtime.chain_append_skipped`` event with action, error type,
  error message, and soul name. BondRegistry's on_change callback
  failure path follows the same pattern under
  ``runtime.bond_callback_failed``.
- evolution.applied and learning.event Soul callsites pass an explicit
  summary because their on-chain payloads don't carry the keys the
  registry default expects.
- Tests: 39 new (registry coverage, manager-level summary behaviour,
  cryptographic-exclusion guarantee, back-compat read of pre-qbtrix#201
  chains, Soul-level integration, read-only soul DEBUG-only behaviour,
  unexpected-exception WARNING shape, BondRegistry callback failure,
  long-horizon 50+ ops with flaky provider).
- Docs: trust-chain.md (summary excluded from signed bytes),
  cli-reference.md (Summary column + --no-summary flag),
  api-reference.md (TrustEntry.summary + TrustChainManager.append
  signature), SPEC.md §10A.1 (summary as non-cryptographic field +
  §10A.2 canonical encoding clarification), CHANGELOG.md Unreleased.
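The cryptographic-exclusion guarantee can be illustrated with a toy hash over an entry dict. This is a sketch under assumptions, not `compute_entry_hash`: the entry fields other than `summary`, the use of SHA-256, and the JSON encoding are all hypothetical.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """Hypothetical helper: hash every field except the human-readable summary."""
    signed_fields = {key: value for key, value in entry.items() if key != "summary"}
    canonical = json.dumps(signed_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"seq": 1, "action": "memory.write", "payload_hash": "abc"}
with_summary = {**base, "summary": "wrote one memory"}
reworded = {**base, "summary": "different wording"}
tampered = {**with_summary, "action": "memory.forget"}
```

Because the summary never enters the hashed bytes, rewording it leaves the hash unchanged, while changing any signed field such as the action does not.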
…) (qbtrix#214)

Adds a configurable cap on trust-chain length, enforced at append time.
When the cap is reached, every non-genesis entry is compressed into a
single signed `chain.pruned` marker and the chain resumes growing from
there. Genesis (seq=0) is always preserved.

The verifier gains exactly one carve-out from strict seq monotonicity:
entries with action == `chain.pruned` may have a seq strictly greater
than `prev.seq + 1`. Every other action remains strictly monotonic, so
a tampered chain that injects a forged seq gap still fails verification.
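The carve-out can be sketched as a single predicate. The function name and arguments are hypothetical; the rule itself (strict `prev.seq + 1` for every action except `chain.pruned`, which may jump ahead) follows the paragraph above.

```python
PRUNE_ACTION = "chain.pruned"


def seq_ok(prev_seq: int, entry_seq: int, action: str) -> bool:
    """Hypothetical helper: check one entry's seq against its predecessor."""
    if action == PRUNE_ACTION:
        # The marker stands in for the compressed entries, so a gap is allowed,
        # but the sequence must still move forward.
        return entry_seq >= prev_seq + 1
    return entry_seq == prev_seq + 1  # every other action stays strict


marker_after_genesis = seq_ok(0, 500, "chain.pruned")
forged_gap = seq_ok(0, 500, "memory.write")
normal_step = seq_ok(500, 501, "memory.write")
```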

Surfaces:
  - Biorhythms.trust_chain_max_entries: int = 0 (unbounded by default)
  - TrustChainManager.prune(keep, *, reason) and dry_run_prune(keep)
  - soul prune-chain CLI (dry-run by default; --apply to mutate)
  - soul_prune_chain MCP tool (apply=False by default)

Spec extension lands as SPEC.md §10A.10 (optional pruning extension).
The full archival design — separate trust_chain/archive/ directory with
checkpoint entries — is deferred to v0.5.x. This release is the
touch-time stub.
…ff-driver-install (qbtrix#217)

Implements items 1, 3, and 4 from issue qbtrix#217 as a single feat(cli) PR:

1. soul diff-driver-install (trivial):
   - Configures .gitattributes (*.soul diff=soul) and git config
     (diff.soul.command) so git diff/log -p/PR tools show soul-level
     diffs natively.
   - Supports --global (user-level) and --local (repo-level, default).
   - Idempotent: re-running doesn't duplicate the .gitattributes line.

3. --require-same-did foot-gun guard (trivial):
   - Exits non-zero when left and right souls have different DIDs.
   - Prevents accidental diffs between unrelated souls.
   - Override with --allow-cross-did for intentional cross-DID comparison.

4. --fail-on <category> CI guard (small):
   - Exits non-zero when the named change category has count > 0.
   - Repeatable: --fail-on memory.added --fail-on identity.
   - 15 categories mapped from SoulDiff.summary() keys.
   - Unknown categories fail fast (exit 2) with a list of known names.
   - Diff output still renders before failing, so CI logs stay useful.

Tests: 11 new tests covering all three features (27 total, all passing).
@github-actions

This PR has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in 7 days if no further activity occurs. If you're still working on this, please push an update or leave a comment.

@github-actions bot added the stale label on May 29, 2026