Optimize all-agent discovery planning #41

Merged
tony merged 10 commits into master from streamline-01
May 31, 2026

Conversation

@tony commented May 31, 2026

Summary

Closes #40.

This PR makes broad all-agent lookup paths faster by separating metadata-rich source inventory from latency-sensitive search/find discovery. Normal search and find entrypoints now use metadata-free discovery, while source listing and diagnostic inventory surfaces keep version evidence available.

It also pushes coarse search scope into discovery planning: prompt searches enumerate prompt-history stores first and only fall back to conversation stores for agents without prompt-history sources, while conversation scope discovers chat stores directly.
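
The scope rule described above can be sketched roughly as follows. This is only an illustration: the `Store` type, the role strings, and `plan_stores` are hypothetical names, not the project's API, and the sketch decides the fallback from a static catalogue rather than from what is actually found on disk.

```python
from dataclasses import dataclass

# Hypothetical role names; the PR only speaks of prompt-history and chat stores.
PROMPT_HISTORY = "prompt_history"
CHAT = "chat"


@dataclass(frozen=True)
class Store:
    agent: str
    role: str
    path: str


def plan_stores(catalog: list, scope: str) -> list:
    """Pick the stores worth walking for a coarse search scope."""
    if scope == "conversation":
        # Conversation scope goes straight to chat stores.
        return [s for s in catalog if s.role == CHAT]
    if scope == "prompt":
        # Prompt-history stores come first ...
        history = [s for s in catalog if s.role == PROMPT_HISTORY]
        covered = {s.agent for s in history}
        # ... and chat stores are a fallback only for agents without one.
        fallback = [
            s for s in catalog if s.role == CHAT and s.agent not in covered
        ]
        return history + fallback
    # Any other scope ("all") keeps the full catalogue.
    return list(catalog)
```

With a catalogue holding a prompt-history store and a chat store for one agent, plus a chat-only second agent, the prompt scope would walk the first agent's history store and the second agent's chat store, and skip the first agent's chat store entirely.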

What changed

  • Add discovery detail levels: none, catalog, and shape.
  • Route CLI/MCP search and find paths through the metadata-free path.
  • Keep source listing and inventory APIs on metadata-rich discovery.
  • Add role-filtered source discovery so prompt/conversation scope avoids unnecessary store walks.
  • Add benchmark definitions for broad all-agent lookup paths and make all capped benchmark names/descriptions include their limits.
  • Document fast discovery behavior and scoped discovery planning.
  • Declare the benchmark script's direct click dependency for standalone uv run scripts/benchmark.py ... use.

Benchmark evidence

Post-change local timing checks with command output discarded:

grep-all-prompts-json-limit-50          0.731s
grep-all-conversations-json-limit-50    1.816s
search-all-prompts-json-limit-50        0.758s
find-all-prompts-json-limit-500         0.727s
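
For readers who want to reproduce this kind of check without the benchmark harness, a minimal sketch of timing one command with its output discarded might look like this. `time_command` is a hypothetical helper, not part of `scripts/benchmark.py`, and a single run like this is noisier than a proper benchmark.

```python
import subprocess
import sys
import time


def time_command(argv: list) -> float:
    """Return wall-clock seconds for one run, discarding the command's output."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True,  # fail loudly instead of timing a broken command
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the real CLI call.
    elapsed = time_command([sys.executable, "-c", "print('x' * 10_000)"])
    print(f"{elapsed:.3f}s")
```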

Internal profiling after the planner change:

discover_sources_for_search_prompts     0.309s, 71 sources
run_search_query_prompts                0.513s, 50 records
plan_prompts                            0.068s, 24 sources
collect_prompts                         0.132s, 50 records

For comparison, the issue baseline showed broad all-agent discovery around 2.48s to 3.03s before search planning.

Verification

Ran before the final commit:

$ rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run ty check; uv run py.test --reruns 0 -vvv; just build-docs;

Result:

All checks passed
83 files left unchanged
All checks passed
880 passed in 51.13s
build succeeded

Also ran:

$ uv run scripts/benchmark.py list-commands

Confirmed capped all-agent benchmark IDs and descriptions expose their limits.

tony added 7 commits May 31, 2026 14:01
why: Broad all-agent lookups spend most of their time enriching every discovered source with version metadata before search planning can prune those handles. Search and find only need parseable handles, while inventory surfaces still need metadata-rich version evidence.
what:
- Add a typed discovery version-detail mode with metadata-free, catalog, and shape paths.
- Cache Codex client-version metadata once per discovery pass when version detail is requested.
- Route search, find, event streams, and lightweight MCP discovery through metadata-free discovery while keeping source listing metadata-rich.
- Cover fast entrypoints and version-detail modes with parametrized tests.
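
A minimal sketch of the idea in this commit, under stated assumptions: `VersionDetail`, `SourceHandle`, and `discover` are hypothetical names, and because the PR text does not spell out how the catalog and shape levels differ, the sketch treats both as "enrich with version metadata".

```python
import enum
from dataclasses import dataclass
from typing import Callable, List, Optional


class VersionDetail(enum.Enum):
    NONE = "none"        # metadata-free: parseable handles only
    CATALOG = "catalog"
    SHAPE = "shape"


@dataclass(frozen=True)
class SourceHandle:
    path: str
    client_version: Optional[str] = None


def discover(
    paths: List[str],
    detail: VersionDetail,
    read_client_version: Callable[[], str],
) -> List[SourceHandle]:
    """Build handles, paying for version metadata only when it is requested."""
    if detail is VersionDetail.NONE:
        # Fast path for search/find: never touch the metadata reader.
        return [SourceHandle(p) for p in paths]
    # Read the client version once per discovery pass, not once per source.
    cached = read_client_version()
    return [SourceHandle(p, cached) for p in paths]
```

The point of the split is that the expensive reader is never called on the fast path, and is called exactly once per pass on the metadata-rich paths.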

why: Search and find now use metadata-free discovery while source inventory keeps rich version evidence. The docs need to describe that cost split without tying it to benchmark configuration changes.
what:
- Document metadata-free discovery for search/find and shape-rich discovery for source listing.
- Note Codex client-version caching in backend docs.
- Clarify the benchmark defaults prose without adding benchmark definitions.

why: The fast-discovery work needs repeatable benchmarks for broad all-agent prompt and conversation lookup paths. Keeping the benchmark definitions separate from documentation makes the performance harness changes independently reviewable.
what:
- Add all-agent prompt and conversation grep benchmarks with JSON output.
- Add all-agent search and find benchmarks that exercise the prompt lookup path.

why: The benchmark harness imports click.exceptions directly and is documented as a standalone PEP 723 script. Declaring click explicitly keeps uv-run script execution and test imports aligned with the code's actual imports.
what:
- Add click to the benchmark script dependency block.
- Add click to the dev dependency group for direct script imports in tests.
- Refresh uv.lock for the new direct dev dependency edge.
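
For context, a PEP 723 inline metadata block at the top of a script has the general shape below. This is only an illustration of the format: the real block in `scripts/benchmark.py` is not shown in this PR and may list other dependencies or a `requires-python` constraint.

```python
# /// script
# dependencies = [
#     "click",
# ]
# ///
```

Tools such as `uv run` read this comment block to build an environment for the script, which is why an import the script performs directly needs to be listed there.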

why: Prompt searches were still discovering every default conversation source before the planner narrowed them down. Passing the coarse search scope into discovery avoids those filesystem walks while preserving the existing prompt-history fallback behavior for agents without dedicated prompt history.
what:
- Add store-role filtering to source discovery before descriptor filesystem walks.
- Route search entrypoints through a scope-aware discovery helper.
- Cover prompt, conversation, and all-scope discovery planning with parametrized tests.

why: Search now narrows catalogue roles before walking the filesystem. The storage catalogue docs should describe that user-visible planning behavior separately from the engine commit.
what:
- Document prompt-scope prompt-history discovery and per-agent transcript fallback.
- Document direct chat-role discovery for conversation scope.
- Clarify that all scope keeps the full default-search catalogue.

why: All-agent benchmark names and descriptions should not hide command caps. Putting the limit in both surfaces makes benchmark output and --commands selection transparent without requiring readers to inspect the command string.
what:
- Rename capped all-agent benchmark IDs with limit suffixes.
- Replace capped-at wording with direct limit text in benchmark descriptions.

why: Capped benchmarks are useful for stable comparisons and profiling, but hidden caps make list-commands output misleading. Grep caps also need to use grep vocabulary so max-count benchmarks do not look like generic result limits.

what:
- Rename capped benchmark keys so grep uses max-count-N and search/find use limit-N.
- Use --max-count in grep benchmark commands instead of -m.
- Add bounded profile-* benchmarks for broader lookup bottleneck runs.
- Add committed-config regression coverage for benchmark cap names, descriptions, and long grep flags.
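
The naming rule in this commit lends itself to a small consistency check. The sketch below is one way such a regression test could work, not the project's actual test: `cap_suffix` and `key_matches_cap` are hypothetical helpers, the `tool` command name is a placeholder, and `--limit` is assumed to be the spelling of the search/find cap flag.

```python
import shlex
from typing import Optional

# Cap flag -> prefix used in the benchmark key suffix.
# "--max-count" appears in the commit text; "--limit" is an assumption.
CAP_FLAGS = {"--max-count": "max-count", "--limit": "limit"}


def cap_suffix(command: str) -> Optional[str]:
    """Return e.g. 'max-count-50' for a command containing '--max-count 50'."""
    tokens = shlex.split(command)
    for i, token in enumerate(tokens):
        flag, sep, value = token.partition("=")
        if flag not in CAP_FLAGS:
            continue
        if not sep:
            # Space-separated form: the value is the next token, if any.
            if i + 1 >= len(tokens):
                return None
            value = tokens[i + 1]
        return f"{CAP_FLAGS[flag]}-{value}"
    return None


def key_matches_cap(key: str, command: str) -> bool:
    """A capped command must advertise its cap at the end of its key."""
    suffix = cap_suffix(command)
    return suffix is None or key.endswith(suffix)
```

A check along these lines would reject a grep benchmark that passes `--max-count 50` but is still keyed with a `limit-50` suffix, which is the mismatch the rename is meant to remove.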

why: Benchmark caps are part of the developer-facing contract for interpreting timings. The docs should make clear when a bench is a repeatable comparison point versus a bounded profiling probe.

what:
- Document benchmark key and description cap naming for max-count and limit flags.
- Explain why committed grep benches use the long --max-count flag.
- Describe profile-* benches as bounded bottleneck probes rather than primary time-series rows.
tony added a commit that referenced this pull request May 31, 2026
why: PR #41 changes broad lookup behavior and benchmark transparency in ways that should be visible in the next prerelease notes.
what:
- Add an unreleased 0.1.0a15 changelog entry for faster scoped discovery planning.
- Mention explicit benchmark caps without embedding machine-specific timing data.

why: PR #41 changes broad lookup behavior and benchmark transparency. The unreleased changelog should describe that contribution without claiming the shape of the next release.
what:
- Add an unreleased changelog entry for faster scoped discovery planning.
- Mention explicit benchmark caps without embedding machine-specific timing data.
@tony tony force-pushed the streamline-01 branch from 7ee9ccb to a0402dc Compare May 31, 2026 21:14
@tony tony merged commit 32e8c06 into master May 31, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

Profile and optimize all-agent discovery bottlenecks
