Optimize all-agent discovery planning #41
Merged
why: Broad all-agent lookups spend most of their time enriching every discovered source with version metadata before search planning can prune those handles. Search and find only need parseable handles, while inventory surfaces still need metadata-rich version evidence.
what:
- Add a typed discovery version-detail mode with metadata-free, catalog, and shape paths.
- Cache Codex client-version metadata once per discovery pass when version detail is requested.
- Route search, find, event streams, and lightweight MCP discovery through metadata-free discovery while keeping source listing metadata-rich.
- Cover fast entrypoints and version-detail modes with parametrized tests.
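The split between metadata-free and metadata-rich discovery, together with the once-per-pass client-version cache, can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `VersionDetail`, `DiscoveryPass`, and `probe_client_version` are hypothetical names, and because the difference between the catalog and shape paths is not spelled out here, the sketch treats both simply as "version detail requested".

```python
from __future__ import annotations

import enum
from dataclasses import dataclass, field
from typing import Callable


class VersionDetail(enum.Enum):
    """How much version evidence a discovery pass collects (hypothetical names)."""

    NONE = "none"  # metadata-free: parseable handles only
    CATALOG = "catalog"  # version detail requested
    SHAPE = "shape"  # version detail requested


@dataclass
class DiscoveryPass:
    """One discovery pass; caches the client version for the life of the pass."""

    detail: VersionDetail
    probe_client_version: Callable[[], str]  # assumed to be the expensive call
    probe_calls: int = field(default=0, init=False)
    _client_version: str | None = field(default=None, init=False)

    def client_version(self) -> str | None:
        # Metadata-free passes never pay for the probe.
        if self.detail is VersionDetail.NONE:
            return None
        # Probe at most once per pass, then reuse the cached value.
        if self._client_version is None:
            self.probe_calls += 1
            self._client_version = self.probe_client_version()
        return self._client_version

    def discover(self, paths: list[str]) -> list[dict[str, str | None]]:
        # Every handle shares the same cached version lookup.
        return [{"path": p, "client_version": self.client_version()} for p in paths]
```

Under these assumptions, a search or find entrypoint would construct the pass with `VersionDetail.NONE`, while a source-listing surface would ask for `VersionDetail.SHAPE` and pay for a single probe regardless of how many sources it discovers.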
why: Search and find now use metadata-free discovery while source inventory keeps rich version evidence. The docs need to describe that cost split without tying it to benchmark configuration changes.
what:
- Document metadata-free discovery for search/find and shape-rich discovery for source listing.
- Note Codex client-version caching in backend docs.
- Clarify the benchmark defaults prose without adding benchmark definitions.
why: The fast-discovery work needs repeatable benchmarks for broad all-agent prompt and conversation lookup paths. Keeping the benchmark definitions separate from documentation makes the performance harness changes independently reviewable.
what:
- Add all-agent prompt and conversation grep benchmarks with JSON output.
- Add all-agent search and find benchmarks that exercise the prompt lookup path.
why: The benchmark harness imports click.exceptions directly and is documented as a standalone PEP 723 script. Declaring click explicitly keeps uv-run script execution and test imports aligned with the code's actual imports.
what:
- Add click to the benchmark script dependency block.
- Add click to the dev dependency group for direct script imports in tests.
- Refresh uv.lock for the new direct dev dependency edge.
why: Prompt searches were still discovering every default conversation source before the planner narrowed them down. Passing the coarse search scope into discovery avoids those filesystem walks while preserving the existing prompt-history fallback behavior for agents without dedicated prompt history.
what:
- Add store-role filtering to source discovery before descriptor filesystem walks.
- Route search entrypoints through a scope-aware discovery helper.
- Cover prompt, conversation, and all-scope discovery planning with parametrized tests.
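The role filtering described above can be sketched as a pure function that narrows the descriptor catalogue before anything touches the filesystem. This is only an illustration under stated assumptions: `Descriptor`, `descriptors_for_scope`, the role strings `"prompt_history"` and `"chat"`, and the agent names are all hypothetical and may not match the project's own types.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """A catalogue entry describing one store (hypothetical shape)."""

    agent: str
    role: str  # assumed values: "prompt_history" or "chat"
    root: str


def descriptors_for_scope(
    descriptors: list[Descriptor], scope: str
) -> list[Descriptor]:
    """Pick the descriptors worth walking for a coarse search scope."""
    if scope == "all":
        # All scope keeps the full default-search catalogue.
        return list(descriptors)
    if scope == "conversation":
        # Conversation scope discovers chat stores directly.
        return [d for d in descriptors if d.role == "chat"]
    if scope == "prompt":
        # Prefer prompt-history stores; fall back to chat stores only for
        # agents that have no prompt-history source at all.
        with_history = {d.agent for d in descriptors if d.role == "prompt_history"}
        return [
            d
            for d in descriptors
            if d.role == "prompt_history"
            or (d.role == "chat" and d.agent not in with_history)
        ]
    raise ValueError(f"unknown scope: {scope!r}")


CATALOGUE = [
    Descriptor("agent-a", "prompt_history", "/tmp/a/history"),
    Descriptor("agent-a", "chat", "/tmp/a/sessions"),
    Descriptor("agent-b", "chat", "/tmp/b/sessions"),
]
```

With this catalogue, a prompt-scope search would walk only agent-a's history store and agent-b's chat store, skipping agent-a's chat store entirely.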
why: Search now narrows catalogue roles before walking the filesystem. The storage catalogue docs should describe that user-visible planning behavior separately from the engine commit.
what:
- Document prompt-scope prompt-history discovery and per-agent transcript fallback.
- Document direct chat-role discovery for conversation scope.
- Clarify that all scope keeps the full default-search catalogue.
why: All-agent benchmark names and descriptions should not hide command caps. Putting the limit in both surfaces makes benchmark output and --commands selection transparent without requiring readers to inspect the command string.
what:
- Rename capped all-agent benchmark IDs with limit suffixes.
- Replace capped-at wording with direct limit text in benchmark descriptions.
why: Capped benchmarks are useful for stable comparisons and profiling, but hidden caps make list-commands output misleading. Grep caps also need to use grep vocabulary so max-count benchmarks do not look like generic result limits.
what:
- Rename capped benchmark keys so grep uses max-count-N and search/find use limit-N.
- Use --max-count in grep benchmark commands instead of -m.
- Add bounded profile-* benchmarks for broader lookup bottleneck runs.
- Add committed-config regression coverage for benchmark cap names, descriptions, and long grep flags.
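A regression check along these lines could look like the sketch below. It is an assumption-heavy illustration: the benchmark keys, descriptions, the placeholder command name `tool`, and the `--limit` flag spelling are hypothetical, and the committed config's real structure is not shown in this PR text.

```python
from __future__ import annotations

import shlex

# Long flag -> key suffix prefix. "--limit" is an assumed spelling.
CAP_FLAGS = (("--max-count", "max-count"), ("--limit", "limit"))


def cap_suffix(command: str) -> str | None:
    """Return the suffix a capped command's key should carry, else None."""
    argv = shlex.split(command)
    for i, arg in enumerate(argv):
        for flag, prefix in CAP_FLAGS:
            if arg == flag and i + 1 < len(argv):
                return f"{prefix}-{argv[i + 1]}"
            if arg.startswith(flag + "="):
                return f"{prefix}-{arg.split('=', 1)[1]}"
    return None


def check_benchmark(key: str, description: str, command: str) -> list[str]:
    """Collect naming problems for one benchmark entry."""
    problems: list[str] = []
    if "-m" in shlex.split(command):
        problems.append("use --max-count instead of -m")
    suffix = cap_suffix(command)
    if suffix is not None:
        if not key.endswith(suffix):
            problems.append(f"key should end with {suffix}")
        cap_value = suffix.rsplit("-", 1)[1]
        if cap_value not in description:
            problems.append(f"description should mention the cap {cap_value}")
    return problems
```

For example, `check_benchmark("all-agent-grep-max-count-50", "grep all agents, max-count 50", "tool grep --max-count 50 needle")` returns an empty list, while a key that omits the suffix or a command that uses `-m` is reported.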
why: Benchmark caps are part of the developer-facing contract for interpreting timings. The docs should make clear when a bench is a repeatable comparison point versus a bounded profiling probe.
what:
- Document benchmark key and description cap naming for max-count and limit flags.
- Explain why committed grep benches use the long --max-count flag.
- Describe profile-* benches as bounded bottleneck probes rather than primary time-series rows.
tony added a commit that referenced this pull request on May 31, 2026
why: PR #41 changes broad lookup behavior and benchmark transparency in ways that should be visible in the next prerelease notes.
what:
- Add an unreleased 0.1.0a15 changelog entry for faster scoped discovery planning.
- Mention explicit benchmark caps without embedding machine-specific timing data.
why: PR #41 changes broad lookup behavior and benchmark transparency. The unreleased changelog should describe that contribution without claiming the shape of the next release.
what:
- Add an unreleased changelog entry for faster scoped discovery planning.
- Mention explicit benchmark caps without embedding machine-specific timing data.
Summary
Closes #40.
This PR makes broad all-agent lookup paths faster by separating metadata-rich source inventory from latency-sensitive search/find discovery. Normal search and find entrypoints now use metadata-free discovery, while source listing and diagnostic inventory surfaces keep version evidence available.
It also pushes coarse search scope into discovery planning: prompt searches enumerate prompt-history stores first and only fall back to conversation stores for agents without prompt-history sources, while conversation scope discovers chat stores directly.
What changed
- Discovery version-detail modes: none, catalog, and shape.
- click dependency for standalone uv run scripts/benchmark.py ... use.

Benchmark evidence
Post-change local timing checks with command output discarded:
Internal profiling after the planner change:
For comparison, the issue baseline showed broad all-agent discovery around 2.48s to 3.03s before search planning.
Verification
Ran before the final commit:
$ rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run ty check; uv run py.test --reruns 0 -vvv; just build-docs;
Result:
Also ran:
$ uv run scripts/benchmark.py list-commands
Confirmed capped all-agent benchmark IDs and descriptions expose their limits.