feat(agents): implement dynamic chunking and recursive map-reduce in …#217
vakrahul wants to merge 9 commits into
Code Review
This pull request introduces a dynamic chunking and map-reduce pipeline to the SummarizerAgent to handle large payloads, alongside exponential backoff retry logic for rate limits. It also adds a context window registry to dynamically determine chunk sizes based on the active model. The review feedback highlights several critical and high-severity issues: a potential infinite loop/O(N) chunking bug when processing very long strings, excessively large chunk sizes for models with massive context windows, sequential execution of chunk summaries that causes high latency, and fragile provider detection when LangChain models are wrapped in helper classes. Addressing these issues by capping chunk sizes, processing chunks concurrently, and robustly unwrapping models will significantly improve the reliability and performance of the summarizer.
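The chunk-size concern can be illustrated with a minimal sketch. It assumes the 80% ratio from the PR description; the cap value and the function name `safe_chunk_tokens` are hypothetical and chosen only for illustration.

```python
# Ratio taken from the PR description; the cap is an assumed value, not from the PR.
SAFE_THRESHOLD_RATIO = 0.8
MAX_CHUNK_TOKENS_CAP = 16_000


def safe_chunk_tokens(context_window: int,
                      ratio: float = SAFE_THRESHOLD_RATIO,
                      cap: int = MAX_CHUNK_TOKENS_CAP) -> int:
    """Return a per-chunk token budget: a fraction of the window, never above the cap."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    # The ratio keeps headroom for prompts and output; the cap stops very large
    # context windows from producing very large single chunks.
    return max(1, min(int(context_window * ratio), cap))


print(safe_chunk_tokens(8_192))      # small window: the ratio decides
print(safe_chunk_tokens(1_000_000))  # huge window: the cap decides
```

With a cap in place, a model with a very large context window still gets chunks of bounded size, which keeps per-call latency predictable.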
…summarizer and few changes
| Filename | Overview |
|---|---|
| src/agents/summarizer.py | Adds recursive map-reduce summarization with backoff retry. Core logic is sound; the token estimator has a known accuracy gap for non-ASCII text that could cause oversized chunks for multilingual input. |
| src/models/registry.py | Adds context window mapping and lookup. Partial-match logic is well-structured (sorted by descending key length). Missing gpt-4-32k entry causes the 8,192-token GPT-4 limit to be applied to the 32,768-token variant via prefix match. |
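The registry finding can be reproduced with a small sketch of a longest-prefix lookup. The table contents, the fallback value, and the name `lookup_context_window` are assumptions for illustration; the actual `get_model_context_window()` in `src/models/registry.py` may be implemented differently.

```python
from typing import Dict, Optional

# Illustrative subset only; not the real registry contents.
CONTEXT_WINDOWS: Dict[str, int] = {
    "gpt-4": 8_192,
    "gpt-4-32k": 32_768,
}
DEFAULT_CONTEXT_WINDOW = 4_096  # assumed fallback for unknown models


def lookup_context_window(model_name: str,
                          table: Optional[Dict[str, int]] = None) -> int:
    """Exact match first, then the longest registered key that prefixes the name."""
    table = CONTEXT_WINDOWS if table is None else table
    name = model_name.lower()
    if name in table:
        return table[name]
    # Longest keys first, so "gpt-4-32k" wins over "gpt-4" when both are present.
    for key in sorted(table, key=len, reverse=True):
        if name.startswith(key):
            return table[key]
    return DEFAULT_CONTEXT_WINDOW


# With the 32k entry present, the variant resolves correctly.
print(lookup_context_window("gpt-4-32k-0613"))
# Without it, the shorter "gpt-4" prefix matches and the smaller limit is used.
print(lookup_context_window("gpt-4-32k-0613", {"gpt-4": 8_192}))
```

Sorting by descending key length only helps when the more specific key is actually registered, which is why the missing entry matters.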
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([arun: user_query + agent_response]) --> B[pack_summary_query]
B --> C[_recursive_summarize depth=0]
C --> D{depth >= MAX_RECURSION_DEPTH = 3?}
D -- Yes --> E[Truncate to MAX_CHUNK_TOKENS*4 chars and call model directly]
E --> Z([Return summary])
D -- No --> F{estimated_tokens <= MAX_CHUNK_TOKENS?}
F -- Yes base case --> G[_build_messages + _call_model_with_retry]
G --> Z
F -- No --> H[_chunk_payload overlapping word-based split]
H --> I[asyncio.gather all chunks return_exceptions=True]
I --> J{Any exceptions?}
J -- Yes --> K[raise exceptions-0]
J -- No --> L[Join partial summaries with separator]
L --> M[_recursive_summarize depth+1]
M --> C
subgraph retry [_call_model_with_retry per chunk]
R1[attempt 0-2] --> R2{Success?}
R2 -- Yes --> R3([Return str])
R2 -- No: non-rate-limit --> R4([Raise immediately])
R2 -- No: rate-limit last attempt --> R4
R2 -- No: rate-limit not last --> R5[asyncio.sleep backoff 1s to 2s to 4s]
R5 --> R1
end
G --> retry
I --> retry
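The retry subgraph above can be sketched roughly as follows. `RateLimitError` is a stand-in exception and `call_with_retry` is a hypothetical name; the PR's `_call_model_with_retry` may detect rate limits differently. The `sleep` parameter is injectable only so the sketch can be exercised without real delays.

```python
import asyncio
from typing import Awaitable, Callable


class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit or quota error (hypothetical)."""


async def call_with_retry(
    call: Callable[[], Awaitable[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Retry only on rate limits, doubling the delay after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            # Last attempt: give up and surface the rate-limit error.
            if attempt == max_attempts - 1:
                raise
            await sleep(base_delay * 2 ** attempt)
        # Any other exception propagates immediately without a retry.
    raise RuntimeError("max_attempts must be at least 1")
```

Note that in this sketch three attempts produce at most two sleeps (1s, then 2s); a 4s delay would only occur with a fourth attempt.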
Reviews (6): Last reviewed commit: "Update src/agents/summarizer.py"
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@vakrahul pls remove the comments from the PR, they are not needed 😁
@ved015 done
Removed commented sections and cleaned up formatting.
@ved015 sorry man, I didn't check
Thanks a lot for working on this and for pushing fixes after the review comments. The underlying problem is definitely valid: large normal ingest inputs should not be sent through one giant summarizer pass.
After reviewing this more deeply, I’m going to take a different architectural approach here. Instead of adding recursive chunking inside Because of that direction change, I’m going to close this PR and implement the revised approach myself.
Really appreciate the contribution and the thought you put into this. Sorry for the change in direction here. We’d still love to see more contributions from you.
Summary
This PR introduces dynamic, token-aware chunking and highly resilient recursive state management to the Summarizer agent workflow. It also enhances the model registry to provide safe context window limits for all supported providers, ensuring we never exceed token maximums during memory retrieval.
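As a rough illustration of the chunk-then-recurse idea, here is a minimal sketch. The 4-characters-per-token estimate, the `words_per_chunk` and `overlap` parameters, and the injected `summarize` callable are assumptions for the example, not the exact code in `src/agents/summarizer.py`.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_RECURSION_DEPTH = 3  # value shown in the review flowchart


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Split text into word windows of `size` that share `overlap` words."""
    if not 0 <= overlap < size:
        # A step of zero or less would never advance through the text.
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end
    return chunks


async def recursive_summarize(
    text: str,
    summarize: Callable[[str], Awaitable[str]],
    max_chunk_tokens: int,
    words_per_chunk: int,
    overlap: int,
    depth: int = 0,
) -> str:
    if depth >= MAX_RECURSION_DEPTH:
        # Depth cap: truncate and make one final call.
        return await summarize(text[: max_chunk_tokens * 4])
    if estimate_tokens(text) <= max_chunk_tokens:
        return await summarize(text)  # base case: fits in one call
    chunks = chunk_words(text, words_per_chunk, overlap)
    # Map step: summarize all chunks concurrently.
    results = await asyncio.gather(
        *(summarize(c) for c in chunks), return_exceptions=True
    )
    for r in results:
        if isinstance(r, BaseException):
            raise r
    # Reduce step: join the partial summaries and recurse one level deeper.
    return await recursive_summarize(
        "\n\n".join(results), summarize, max_chunk_tokens,
        words_per_chunk, overlap, depth + 1,
    )
```

In practice `summarize` would wrap the model call (with retry); passing it in keeps the sketch testable with a stub.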
Motivation / Problem
Currently, when the Summarizer agent deals with massive context payloads or extensive historical logs, we risk hitting hard API token limits or suffering from LLM "lost in the middle" degradation. This PR aims to extract high-fidelity context without dropping critical data points.
Closes #216
Changes
- `src/models/registry.py`: Added `get_model_context_window()` to map and return context limits across all supported providers (Claude, OpenAI, Gemini, DeepSeek, Groq, Ollama, Bedrock) with exact and partial matching.
- `src/agents/summarizer.py`: Updated agent initialization to calculate a safe chunk size at 80% (`SAFE_THRESHOLD_RATIO`) of the active model's context window.
- `src/agents/summarizer.py`: Replaced standard summarization with a recursive map-reduce loop that slices large inputs into semantic, overlapping chunks and combines partial summaries.
- `src/agents/summarizer.py`: Implemented an exponential backoff (1s → 2s → 4s) with up to 3 retry attempts to elegantly handle rate limits and quota errors.
Testing
- `pytest tests/unit`
- `pytest tests/integration`

    # Run unit tests to verify the agent's new recursive loop
    uv run pytest tests/unit/test_agents.py

Checklist
- `ruff check .` and `black --check .` locally with no errors
- `CHANGELOG.md` if this is a user-visible change