Skip to content

Add Pi (earendil-works) as the sixth searchable backend#28

Open
tony wants to merge 7 commits into
masterfrom
pi-support
Open

Add Pi (earendil-works) as the sixth searchable backend#28
tony wants to merge 7 commits into
masterfrom
pi-support

Conversation

@tony
Copy link
Copy Markdown
Owner

@tony tony commented May 30, 2026

Summary

  • Add Pi (the earendil-works "Pi Agent Harness") as agentgrep's sixth backend, searchable from both the CLI and the MCP server.
  • Parse Pi's append-only JSONL session transcripts — user turns surface as prompts, assistant and tool turns as history, with the model lifted from assistant turns. Pi keeps no separate prompt-history log or SQLite index, so the transcript is the whole searchable surface (the structural twin of the Claude Code backend).
  • Discover sessions under both layouts: the default PI_CODING_AGENT_DIR tree (~/.pi/agent/sessions/--<cwd>--/<ts>_<uuid>.jsonl) and the flat PI_CODING_AGENT_SESSION_DIR override, where the working directory is recovered from the session header.
  • Catalogue every on-disk Pi store — one searchable session store plus documentary descriptors for settings, models, themes, tools, managed binaries, prompt templates, the debug log, and the npm extension root; auth.json is documented but never indexed.
  • Document the backend with a reference page, a support-matrix entry, and a storage-catalogue section.

Changes by area

Search core (src/agentgrep/)

  • store_catalog.py: _PI_STORESpi.sessions (primary chat, searched, with discovery specs for both layouts) plus nine documentary rows; catalogue version bumped.
  • __init__.py: discover_pi_sources (multi-root: agent dir + flat session-dir override) and parse_pi_session_file, which walks the SessionEntry tagged union and reuses build_search_record for the role→kind split.
  • stores.py / query/registry.py: pi added to the agent-name literal and the query agent enum.
  • mcp/: pi in the MCP agent literals and selectors, pi.sessions_jsonl.v1 registered in KNOWN_ADAPTERS, and Pi named in the server instructions.

Docs

  • docs/backends/pi.md: path layout, both env overrides, and the pi.sessions record schema.
  • docs/backends/index.md, docs/dev/storage-catalog.md, README.md, docs/index.md: Pi added to the support matrix and agent lists.

Tests

  • tests/test_agentgrep.py: discovery for both layouts, an end-to-end search, a NamedTuple + test_id parametrized parse matrix over every entry type, and a v1 timestamp-fallback test.
  • tests/samples/pi/pi.sessions/example.jsonl: a session fixture, registered in PRIMARY_FIXTURES.

Design decisions

  • Role-derived prompt/history split, no synthesis: Pi has no separate prompt log, but build_search_record already maps role ∈ {user, human} → prompt, so user turns become --type prompts results straight from the transcript — consistent with the structurally identical Claude Code backend.
  • Bespoke parser over the generic walker: iter_message_candidates would descend into the inner unix-millisecond message.timestamp and lose the clean entry-level ISO timestamp, so parse_pi_session_file walks entries explicitly, preferring the entry timestamp and falling back to the millisecond value only for headerless v1 files.
  • Two discovery roots: PI_CODING_AGENT_SESSION_DIR stores sessions flat with no per-cwd subdirectory, so it is resolved as a second named root and the cwd comes from the session header; per-descriptor dedup collapses the roots when the override is unset.
  • Exhaustive catalogue, selective search: every on-disk Pi artifact is catalogued for completeness, but only the session transcript is searched by default, and auth.json is marked private and never enumerated.

Verification

Search the real Pi sessions:

$ uv run agentgrep grep --agent pi analyze

Every catalogued adapter id is advertised by the MCP layer:

$ uv run pytest tests/test_stores.py::test_runtime_adapter_ids_match_catalogue_discovery

Credentials are catalogued as private (never enumerated):

$ rg -n "pi.auth" src/agentgrep/store_catalog.py

Test plan

  • uv run ruff check . / uv run ruff format . — lint and format clean
  • uv run ty check — strict type checking passes
  • uv run pytest --reruns 0 — full suite green, including the new pi tests
  • just build-docs — docs build clean (pi backend page, support matrix, coverage grid)
  • test_discover_pi_sources_honours_pi_coding_agent_dirPI_CODING_AGENT_DIR used verbatim
  • test_discover_pi_sources_session_dir_override_is_flat — flat session-dir layout, cwd from header
  • test_search_pi_sessions — user→prompt, assistant→history with model lifted
  • test_parse_pi_session_entry — every entry type maps correctly or is skipped
  • test_parse_pi_session_v1_uses_unix_ms_timestamp_fallback — v1 millisecond fallback

Closes #25

tony added 7 commits May 30, 2026 20:27
why: Adding earendil-works/pi (issue #25) as a searchable backend
starts with teaching every agent-name surface that "pi" exists. This
commit is inert on its own — pi is a recognized agent with no stores
yet, so discovery and parsing are never reached — which keeps the
literal change isolated from the catalog/discovery/parser layers that
follow.

what:
- Add "pi" to the AgentName literal in stores.py, __init__.py, and
  mcp/_library.py, plus the AgentSelector literal and AGENT_CHOICES.
- Add "pi" to the five MCP model agent literals and the query
  registry's agent enum_values (and its docstring values line).
- Mention Pi in the MCP server-instruction header and trigger scope.
- Add "pi" to the package description and keywords.
why: The catalogue is agentgrep's single source of truth for where
agent data lives and what shape it takes. pi gets one searchable store
(its JSONL session transcripts) plus documentary descriptors for every
other on-disk artifact it can create, so the catalogue stays a complete
inventory even for data agentgrep never searches.

what:
- Add _PI_OBSERVED_AT and the _PI_STORES tuple: pi.sessions
  (PRIMARY_CHAT, searched, two discovery specs for the default nested
  layout and the flat PI_CODING_AGENT_SESSION_DIR override) plus nine
  documentary rows (settings, auth [PRIVATE credentials], models,
  themes, tools, bin, prompts, debug_log, extensions_npm).
- Splice _PI_STORES into CATALOG, bump catalog_version to 10, and
  advance captured_at to the pi observation date.
- Register pi.sessions_jsonl.v1 in the MCP KNOWN_ADAPTERS tuple.
- Add "pi" to the test-side KNOWN_AGENTS so the catalogue invariants
  cover the new rows.
why: With pi in the catalogue, discovery needs to resolve pi's data
directory and enumerate its session files. pi diverges from the other
backends in two ways: PI_CODING_AGENT_DIR already includes the agent
segment (used verbatim, default ~/.pi/agent), and the optional
PI_CODING_AGENT_SESSION_DIR points at the sessions directory directly,
where files land flat with no per-cwd subdirectory.

what:
- Add discover_pi_sources, resolving the agent dir via resolve_env_root
  and the session dir via _resolve_optional_root, then handing both to
  discover_from_catalog as named roots ("default", "pi_session") so the
  nested and flat layouts are both covered. Reuses the Codex/Cursor
  multi-root pattern; per-descriptor dedup collapses the roots when the
  override is unset.
- Wire the "pi" branch into discover_sources.
why: Discovered pi sessions need an adapter to turn their tagged-union
entries into normalized search records. pi keeps everything in one
append-only JSONL file, so the parser must walk the entry tree and lift
the conversation turns, summaries, and session name while ignoring
metadata-only entries.

what:
- Add parse_pi_session_file plus _pi_message_candidate and
  _pi_entry_text helpers. message entries become MessageCandidates fed
  through build_search_record so user turns map to prompts and the rest
  to history; compaction/branch_summary summaries and session_info
  names are emitted as history text; model/thinking/custom/label
  entries are skipped.
- Prefer the entry-level ISO timestamp, falling back to the inner
  unix-milliseconds message timestamp for v1 entries; capture the
  session id and cwd from the header for record metadata.
- Dispatch pi.sessions_jsonl.v1 in iter_source_records and register it
  in ITER_SOURCE_RECORD_ADAPTERS.
why: The pi backend needs the same test depth as the Claude and Grok
backends — discovery under both layouts, the env overrides, and the
per-entry parse behaviour — so regressions surface before release.

what:
- Add a pi.sessions fixture and register it in PRIMARY_FIXTURES so the
  catalogue's primary-store invariant covers pi.
- Cover PI_CODING_AGENT_DIR (verbatim override) and the flat
  PI_CODING_AGENT_SESSION_DIR layout (cwd recovered from the header),
  plus an end-to-end search asserting user->prompt and
  assistant->history with the model lifted.
- Add a NamedTuple + test_id parametrized parse test over every entry
  type (message roles, compaction/branch summaries, session name, and
  the skipped metadata-only entries) and a v1 unix-ms timestamp
  fallback test.
why: Each backend gets a reference page documenting its layout, env
overrides, and record schemas, and the support matrix and agent lists
must name pi so readers can find it.

what:
- Add docs/backends/pi.md: base path and both env overrides, the
  per-working-directory session layout and the flat session-dir
  override, the pi.sessions record schema, and the documentary stores.
- Add a Pi card and toctree entry to the backend index and a Pi
  section to the storage-catalogue dev page.
- Name Pi in the README and docs landing agent lists.
- Extend the backend-grid and coverage-grid doc tests to cover pi.
why: Record the new pi backend for the unreleased version so readers
know agentgrep now searches earendil-works/pi.

what:
- Add a "Pi backend (#25)" deliverable under What's new for the
  unreleased 0.1.0a8 section, describing the single JSONL session
  store, the two env overrides, and the catalogued documentary stores.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

backend: add pi (earendil-works/pi) as a sixth searchable backend

1 participant