Add Pi (earendil-works) as the sixth searchable backend #28
Open
tony wants to merge 7 commits into
why: Adding earendil-works/pi (issue #25) as a searchable backend starts with teaching every agent-name surface that "pi" exists. This commit is inert on its own: pi is a recognized agent with no stores yet, so discovery and parsing are never reached. That keeps the literal change isolated from the catalog/discovery/parser layers that follow.
what:
- Add "pi" to the AgentName literal in stores.py, __init__.py, and mcp/_library.py, plus the AgentSelector literal and AGENT_CHOICES.
- Add "pi" to the five MCP model agent literals and the query registry's agent enum_values (and its docstring values line).
- Mention Pi in the MCP server-instruction header and trigger scope.
- Add "pi" to the package description and keywords.
why: The catalogue is agentgrep's single source of truth for where agent data lives and what shape it takes. pi gets one searchable store (its JSONL session transcripts) plus documentary descriptors for every other on-disk artifact it can create, so the catalogue stays a complete inventory even for data agentgrep never searches.
what:
- Add _PI_OBSERVED_AT and the _PI_STORES tuple: pi.sessions (PRIMARY_CHAT, searched, two discovery specs for the default nested layout and the flat PI_CODING_AGENT_SESSION_DIR override) plus nine documentary rows (settings, auth [PRIVATE credentials], models, themes, tools, bin, prompts, debug_log, extensions_npm).
- Splice _PI_STORES into CATALOG, bump catalog_version to 10, and advance captured_at to the pi observation date.
- Register pi.sessions_jsonl.v1 in the MCP KNOWN_ADAPTERS tuple.
- Add "pi" to the test-side KNOWN_AGENTS so the catalogue invariants cover the new rows.
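The split between the one searched store and the documentary rows can be pictured with a small sketch. The `StoreSketch` dataclass, its field names, and `searchable_store_ids` are hypothetical illustrations, not agentgrep's actual catalogue types. The `pi.settings` id is also an assumption: it extends the `pi.<name>` pattern seen in `pi.sessions` and `pi.auth`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreSketch:
    """Hypothetical stand-in for a catalogue row; not agentgrep's real type."""

    store_id: str
    searched: bool  # True only for stores the search core reads
    private: bool = False  # True for credential-bearing files


# Illustrative subset: the one searched store plus two documentary rows.
PI_STORES_SKETCH = (
    StoreSketch("pi.sessions", searched=True),
    StoreSketch("pi.settings", searched=False),  # assumed id
    StoreSketch("pi.auth", searched=False, private=True),
)


def searchable_store_ids(stores: tuple[StoreSketch, ...]) -> list[str]:
    """Return ids of stores that are searched and not marked private."""
    return [s.store_id for s in stores if s.searched and not s.private]
```

Under these assumptions only `pi.sessions` is returned, while the documentary rows stay in the inventory without ever being read.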
why: With pi in the catalogue, discovery needs to resolve pi's data
directory and enumerate its session files. pi diverges from the other
backends in two ways: PI_CODING_AGENT_DIR already includes the agent
segment (used verbatim, default ~/.pi/agent), and the optional
PI_CODING_AGENT_SESSION_DIR points at the sessions directory directly,
where files land flat with no per-cwd subdirectory.
what:
- Add discover_pi_sources, resolving the agent dir via resolve_env_root
and the session dir via _resolve_optional_root, then handing both to
discover_from_catalog as named roots ("default", "pi_session") so the
nested and flat layouts are both covered. Reuses the Codex/Cursor
multi-root pattern; per-descriptor dedup collapses the roots when the
override is unset.
- Wire the "pi" branch into discover_sources.
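The two-root resolution described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `resolve_pi_roots` and `list_pi_session_files` are hypothetical names, the glob patterns are inferred from the nested and flat layouts, and the sketch simply omits the second root when the override is unset rather than relying on per-descriptor dedup.

```python
from collections.abc import Mapping
from pathlib import Path


def resolve_pi_roots(env: Mapping[str, str]) -> dict[str, Path]:
    """Map root names to directories (hypothetical helper)."""
    # PI_CODING_AGENT_DIR already includes the agent segment: use it verbatim.
    agent_dir = Path(env.get("PI_CODING_AGENT_DIR") or "~/.pi/agent").expanduser()
    roots = {"default": agent_dir}
    session_dir = env.get("PI_CODING_AGENT_SESSION_DIR")
    if session_dir:
        # Assumption: this sketch drops the second root when the override is unset.
        roots["pi_session"] = Path(session_dir).expanduser()
    return roots


def list_pi_session_files(roots: Mapping[str, Path]) -> list[Path]:
    """Enumerate session files for both layouts, dropping duplicates."""
    candidates: list[Path] = []
    nested = roots["default"] / "sessions"
    if nested.is_dir():
        # Default layout: one subdirectory per working directory.
        candidates.extend(sorted(nested.glob("*/*.jsonl")))
    flat = roots.get("pi_session")
    if flat is not None and flat.is_dir():
        # Override layout: files sit directly in the session directory.
        candidates.extend(sorted(flat.glob("*.jsonl")))
    seen: set[Path] = set()
    unique: list[Path] = []
    for path in candidates:
        key = path.resolve()
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique
```

Keying the dedup on the resolved path means a file reachable through both roots is reported once, which is the behaviour the commit attributes to per-descriptor dedup.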
why: Discovered pi sessions need an adapter to turn their tagged-union entries into normalized search records. pi keeps everything in one append-only JSONL file, so the parser must walk the entry tree and lift the conversation turns, summaries, and session name while ignoring metadata-only entries.
what:
- Add parse_pi_session_file plus _pi_message_candidate and _pi_entry_text helpers. message entries become MessageCandidates fed through build_search_record so user turns map to prompts and the rest to history; compaction/branch_summary summaries and session_info names are emitted as history text; model/thinking/custom/label entries are skipped.
- Prefer the entry-level ISO timestamp, falling back to the inner unix-milliseconds message timestamp for v1 entries; capture the session id and cwd from the header for record metadata.
- Dispatch pi.sessions_jsonl.v1 in iter_source_records and register it in ITER_SOURCE_RECORD_ADAPTERS.
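The entry walk described above can be illustrated with a minimal sketch. It is not `parse_pi_session_file` itself: `entry_to_record` and `iter_records` are hypothetical names, and the JSON field names (`type`, `message`, `role`, `content`, `summary`, `name`) are assumptions about the entry shape rather than the confirmed pi schema. Any entry type not listed is treated as metadata-only and skipped.

```python
import json
from collections.abc import Iterable, Iterator
from typing import Optional

PROMPT_ROLES = {"user", "human"}
SUMMARY_TYPES = {"compaction", "branch_summary"}


def entry_to_record(entry: dict) -> Optional[tuple[str, str]]:
    """Return (kind, text) for a searchable entry, or None to skip it."""
    entry_type = entry.get("type")
    if entry_type == "message":
        message = entry.get("message") or {}
        text = message.get("content")  # assumed field name
        if not isinstance(text, str) or not text:
            return None
        kind = "prompt" if message.get("role") in PROMPT_ROLES else "history"
        return kind, text
    if entry_type in SUMMARY_TYPES:
        text = entry.get("summary")  # assumed field name
    elif entry_type == "session_info":
        text = entry.get("name")  # assumed field name
    else:
        return None  # metadata-only entry types are skipped
    return ("history", text) if isinstance(text, str) and text else None


def iter_records(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Walk JSONL lines, ignoring blank lines and malformed JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict):
            record = entry_to_record(entry)
            if record is not None:
                yield record
```

Returning `None` for unknown types keeps the sketch tolerant of entry kinds added later, at the cost of silently ignoring them.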
why: The pi backend needs the same test depth as the Claude and Grok backends (discovery under both layouts, the env overrides, and the per-entry parse behaviour) so regressions surface before release.
what:
- Add a pi.sessions fixture and register it in PRIMARY_FIXTURES so the catalogue's primary-store invariant covers pi.
- Cover PI_CODING_AGENT_DIR (verbatim override) and the flat PI_CODING_AGENT_SESSION_DIR layout (cwd recovered from the header), plus an end-to-end search asserting user->prompt and assistant->history with the model lifted.
- Add a NamedTuple + test_id parametrized parse test over every entry type (message roles, compaction/branch summaries, session name, and the skipped metadata-only entries) and a v1 unix-ms timestamp fallback test.
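The timestamp preference that the fallback test exercises can be sketched as below. This is an illustration only: `entry_timestamp` is a hypothetical helper, the `timestamp` and `message.timestamp` field names are assumed, and treating a timezone-less ISO string as UTC is a choice made for the sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def entry_timestamp(entry: dict) -> Optional[datetime]:
    """Prefer the entry-level ISO timestamp, else the inner unix-ms value."""
    raw = entry.get("timestamp")
    if isinstance(raw, str):
        try:
            # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
            parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            parsed = None
        if parsed is not None:
            if parsed.tzinfo is None:
                # Assumption: naive timestamps are interpreted as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    message = entry.get("message")
    millis = message.get("timestamp") if isinstance(message, dict) else None
    if isinstance(millis, (int, float)) and not isinstance(millis, bool):
        # v1 fallback: milliseconds since the Unix epoch.
        return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return None
```

An entry carrying both values resolves to the ISO one, so the millisecond path only runs when the entry-level field is missing or unparseable.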
why: Each backend gets a reference page documenting its layout, env overrides, and record schemas, and the support matrix and agent lists must name pi so readers can find it.
what:
- Add docs/backends/pi.md: base path and both env overrides, the per-working-directory session layout and the flat session-dir override, the pi.sessions record schema, and the documentary stores.
- Add a Pi card and toctree entry to the backend index and a Pi section to the storage-catalogue dev page.
- Name Pi in the README and docs landing agent lists.
- Extend the backend-grid and coverage-grid doc tests to cover pi.
why: Record the new pi backend for the unreleased version so readers know agentgrep now searches earendil-works/pi.
what:
- Add a "Pi backend (#25)" deliverable under What's new for the unreleased 0.1.0a8 section, describing the single JSONL session store, the two env overrides, and the catalogued documentary stores.
Summary
- Discovery covers the default `PI_CODING_AGENT_DIR` tree (`~/.pi/agent/sessions/--<cwd>--/<ts>_<uuid>.jsonl`) and the flat `PI_CODING_AGENT_SESSION_DIR` override, where the working directory is recovered from the session header.
- `auth.json` is documented but never indexed.

Changes by area
Search core (`src/agentgrep/`)
- `store_catalog.py`: `_PI_STORES`, with `pi.sessions` (primary chat, searched, with discovery specs for both layouts) plus nine documentary rows; catalogue version bumped.
- `__init__.py`: `discover_pi_sources` (multi-root: agent dir + flat session-dir override) and `parse_pi_session_file`, which walks the `SessionEntry` tagged union and reuses `build_search_record` for the role→kind split.
- `stores.py` / `query/registry.py`: `pi` added to the agent-name literal and the query `agent` enum.
- `mcp/`: `pi` in the MCP agent literals and selectors, `pi.sessions_jsonl.v1` registered in `KNOWN_ADAPTERS`, and Pi named in the server instructions.

Docs
- `docs/backends/pi.md`: path layout, both env overrides, and the `pi.sessions` record schema.
- `docs/backends/index.md`, `docs/dev/storage-catalog.md`, `README.md`, `docs/index.md`: Pi added to the support matrix and agent lists.

Tests
- `tests/test_agentgrep.py`: discovery for both layouts, an end-to-end search, a `NamedTuple` + `test_id` parametrized parse matrix over every entry type, and a v1 timestamp-fallback test.
- `tests/samples/pi/pi.sessions/example.jsonl`: a session fixture, registered in `PRIMARY_FIXTURES`.

Design decisions
- `build_search_record` already maps `role ∈ {user, human} → prompt`, so user turns become `--type prompts` results straight from the transcript, consistent with the structurally identical Claude Code backend.
- `iter_message_candidates` would descend into the inner unix-millisecond `message.timestamp` and lose the clean entry-level ISO timestamp, so `parse_pi_session_file` walks entries explicitly, preferring the entry timestamp and falling back to the millisecond value only for headerless v1 files.
- `PI_CODING_AGENT_SESSION_DIR` stores sessions flat with no per-cwd subdirectory, so it is resolved as a second named root and the cwd comes from the session header; per-descriptor dedup collapses the roots when the override is unset.
- `auth.json` is marked private and never enumerated.

Verification
Search the real Pi sessions:

    $ uv run agentgrep grep --agent pi analyze

Every catalogued adapter id is advertised by the MCP layer:

    $ uv run pytest tests/test_stores.py::test_runtime_adapter_ids_match_catalogue_discovery

Credentials are catalogued as private (never enumerated):

    $ rg -n "pi.auth" src/agentgrep/store_catalog.py

Test plan
- `uv run ruff check .` / `uv run ruff format .`: lint and format clean
- `uv run ty check`: strict type checking passes
- `uv run pytest --reruns 0`: full suite green, including the new pi tests
- `just build-docs`: docs build clean (pi backend page, support matrix, coverage grid)
- `test_discover_pi_sources_honours_pi_coding_agent_dir`: `PI_CODING_AGENT_DIR` used verbatim
- `test_discover_pi_sources_session_dir_override_is_flat`: flat session-dir layout, cwd from header
- `test_search_pi_sessions`: user→prompt, assistant→history with model lifted
- `test_parse_pi_session_entry`: every entry type maps correctly or is skipped
- `test_parse_pi_session_v1_uses_unix_ms_timestamp_fallback`: v1 millisecond fallback

Closes #25