This repository was archived by the owner on Jun 3, 2026. It is now read-only.

feat(agents): implement dynamic chunking and recursive map-reduce in …#217

Closed
vakrahul wants to merge 9 commits into
XortexAI:mainfrom
vakrahul:feature/dynamic-summarizer

Conversation

@vakrahul
Contributor

@vakrahul vakrahul commented Jun 2, 2026

Summary

This PR introduces dynamic, token-aware chunking and highly resilient recursive state management to the Summarizer agent workflow. It also enhances the model registry to provide safe context window limits for all supported providers, ensuring we never exceed token maximums during memory retrieval.

Motivation / Problem

Currently, when the Summarizer agent handles very large context payloads or long historical logs, we risk hitting hard API token limits or suffering LLM "lost in the middle" degradation. Chunking the input keeps each model call within the active model's limits, so we extract high-fidelity context without dropping critical data points.

Closes #216

Changes

  • src/models/registry.py: Added get_model_context_window() to map and return context limits across all supported providers (Claude, OpenAI, Gemini, DeepSeek, Groq, Ollama, Bedrock) with exact and partial matching.
  • src/agents/summarizer.py: Updated agent initialization to calculate a safe chunk size at 80% (SAFE_THRESHOLD_RATIO) of the active model's context window.
  • src/agents/summarizer.py: Replaced standard summarization with a recursive map-reduce loop that slices large inputs into semantic, overlapping chunks and combines partial summaries.
  • src/agents/summarizer.py: Implemented an exponential backoff (1s → 2s → 4s) with up to 3 retry attempts to elegantly handle rate limits and quota errors.
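
As a rough illustration of the retry behavior described in the last bullet, here is a minimal sketch rather than the PR's actual code. `RateLimitError`, `call_with_backoff`, and the injectable `sleep` parameter are hypothetical names introduced for this example; a real provider raises its own rate-limit exception type.

```python
import asyncio
from typing import Awaitable, Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit / quota error."""


async def call_with_backoff(
    call: Callable[[], Awaitable[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Retry `call` on rate-limit errors, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            # On the last attempt there is nothing left to retry: re-raise.
            if attempt == max_attempts - 1:
                raise
            # Waits of base_delay, 2 * base_delay, 4 * base_delay, ...
            await sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")
```

Under this reading, three attempts sleep at most twice (1s, then 2s); a 4s wait would only be reached if a fourth attempt were allowed. Any other exception type propagates immediately without a retry.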

Testing

  • Unit tests added / updated (pytest tests/unit)
  • Integration tests pass (pytest tests/integration)
  • Tested manually — steps below:
# Run unit tests to verify the agent's new recursive loop
uv run pytest tests/unit/test_agents.py

Checklist

  • I ran ruff check . and black --check . locally with no errors
  • I updated CHANGELOG.md if this is a user-visible change

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a dynamic chunking and map-reduce pipeline to the SummarizerAgent to handle large payloads, alongside exponential backoff retry logic for rate limits. It also adds a context window registry to dynamically determine chunk sizes based on the active model. The review feedback highlights several critical and high-severity issues: a potential infinite loop/O(N) chunking bug when processing very long strings, excessively large chunk sizes for models with massive context windows, sequential execution of chunk summaries that causes high latency, and fragile provider detection when LangChain models are wrapped in helper classes. Addressing these issues by capping chunk sizes, processing chunks concurrently, and robustly unwrapping models will significantly improve the reliability and performance of the summarizer.
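
Two of the suggestions above, capping the chunk size and making sure the chunker always advances, could look roughly like the sketch below. It is an assumption-laden illustration, not the reviewed code: `capped_chunk_tokens`, `chunk_words`, and the `hard_cap` value of 16000 are hypothetical, and only the 80% ratio comes from the PR description.

```python
def capped_chunk_tokens(
    context_window: int, ratio: float = 0.8, hard_cap: int = 16_000
) -> int:
    """Chunk budget: a fraction of the context window, bounded by a hard cap.

    The cap (a hypothetical value) keeps models with very large context
    windows from producing chunks so big that they defeat chunking.
    """
    return min(int(context_window * ratio), hard_cap)


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word-based chunks in a single pass."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Clamp the overlap so the step is always >= 1; a zero or negative
    # step is what would make a naive loop spin forever.
    overlap = min(max(overlap, 0), size - 1)
    step = size - overlap
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final chunk already reaches the end of the text
    return chunks
```

Splitting the text into words once and slicing by index keeps the work linear in the input length, apart from the overlap that is deliberately repeated.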

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/agents/summarizer.py
Comment thread src/agents/summarizer.py Outdated
Comment thread src/agents/summarizer.py Outdated
Comment thread src/agents/summarizer.py
@greptile-apps
Contributor

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR introduces a recursive map-reduce summarization pipeline and a get_model_context_window() registry lookup to make the SummarizerAgent token-aware across all supported providers. It also adds exponential-backoff retry logic for rate-limit errors on individual chunk API calls.

  • src/models/registry.py: Adds _CONTEXT_WINDOWS dict and get_model_context_window() with exact + longest-prefix partial matching to retrieve per-model context window sizes.
  • src/agents/summarizer.py: Replaces the single _call_model call with _recursive_summarize(), a depth-bounded map-reduce loop that chunks oversized payloads, summarizes them concurrently, and collapses partial summaries up to MAX_RECURSION_DEPTH = 3; also overrides _detect_provider() for improved provider detection including ChatOllama, ChatDeepSeek, and ChatMimo.
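
The exact plus longest-prefix lookup described for the registry might be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `lookup_context_window`, the two-entry table, and the fallback of 4096 are hypothetical; the 8192 and 32768 figures are the ones quoted in this review.

```python
# Illustrative table only; the real _CONTEXT_WINDOWS dict covers many providers.
CONTEXT_WINDOWS = {
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}


def lookup_context_window(
    model_name: str,
    table: dict[str, int] = CONTEXT_WINDOWS,
    default: int = 4096,  # hypothetical fallback for unknown models
) -> int:
    """Return the context window via exact match, then longest prefix."""
    name = model_name.lower()
    if name in table:
        return table[name]
    # Try longer keys first so "gpt-4-32k" wins over "gpt-4".
    for key in sorted(table, key=len, reverse=True):
        if name.startswith(key):
            return table[key]
    return default
```

With both entries present, `lookup_context_window("gpt-4-32k-0613")` resolves to 32768. If the `gpt-4-32k` key is missing, the same call falls through to the `gpt-4` prefix and returns 8192.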

Confidence Score: 4/5

Safe to merge for English-only workloads; multilingual input could produce oversized chunks due to the character-based token estimator, and gpt-4-32k users will see unnecessarily small chunk sizes until the registry is updated.

The recursive map-reduce logic, backoff retry, and partial-match registry lookup all work correctly for the common cases. Two gaps remain: the token estimator uses a fixed 4-chars-per-token ratio that significantly underestimates CJK/emoji content (potentially causing API context-limit errors), and the OpenAI registry omits gpt-4-32k so partial matching assigns it the 8 192-token GPT-4 limit instead of its actual 32 768-token window.
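
To make the token-estimation gap concrete, the sketch below compares a fixed 4-characters-per-token estimate with a crude script-aware heuristic. Both functions are hypothetical illustrations; counting every non-ASCII character as one token is an assumption chosen to err on the high side, and a provider's real tokenizer would be more accurate.

```python
def estimate_tokens_naive(text: str) -> int:
    """Fixed ratio: roughly one token per four characters (rounded up)."""
    return -(-len(text) // 4)


def estimate_tokens_script_aware(text: str) -> int:
    """Heuristic: ASCII at four chars per token, everything else at one.

    Assumption for illustration only: each non-ASCII character (CJK,
    emoji, ...) is budgeted as a full token.
    """
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    other_chars = len(text) - ascii_chars
    return -(-ascii_chars // 4) + other_chars


english = "abcd" * 10     # 40 ASCII characters
japanese = "日本語" * 100   # 300 CJK characters

print(estimate_tokens_naive(english), estimate_tokens_script_aware(english))
print(estimate_tokens_naive(japanese), estimate_tokens_script_aware(japanese))
```

For the ASCII sample the two estimates agree. For the CJK sample the fixed ratio reports a quarter of the heuristic's count, which is the kind of underestimate that lets an oversized chunk through.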

Both changed files warrant a second look: src/agents/summarizer.py for the token estimation accuracy, and src/models/registry.py for the missing gpt-4-32k entry.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agents/summarizer.py | Adds recursive map-reduce summarization with backoff retry. Core logic is sound; the token estimator has a known accuracy gap for non-ASCII text that could cause oversized chunks for multilingual input. |
| src/models/registry.py | Adds context window mapping and lookup. Partial-match logic is well-structured (sorted by descending key length). Missing gpt-4-32k entry causes the 8 192-token GPT-4 limit to be applied to the 32 768-token variant via prefix match. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([arun: user_query + agent_response]) --> B[pack_summary_query]
    B --> C[_recursive_summarize depth=0]
    C --> D{depth >= MAX_RECURSION_DEPTH = 3?}
    D -- Yes --> E[Truncate to MAX_CHUNK_TOKENS*4 chars and call model directly]
    E --> Z([Return summary])
    D -- No --> F{estimated_tokens <= MAX_CHUNK_TOKENS?}
    F -- Yes base case --> G[_build_messages + _call_model_with_retry]
    G --> Z
    F -- No --> H[_chunk_payload overlapping word-based split]
    H --> I[asyncio.gather all chunks return_exceptions=True]
    I --> J{Any exceptions?}
    J -- Yes --> K[raise exceptions-0]
    J -- No --> L[Join partial summaries with separator]
    L --> M[_recursive_summarize depth+1]
    M --> C
    subgraph retry [_call_model_with_retry per chunk]
        R1[attempt 0-2] --> R2{Success?}
        R2 -- Yes --> R3([Return str])
        R2 -- No: non-rate-limit --> R4([Raise immediately])
        R2 -- No: rate-limit last attempt --> R4
        R2 -- No: rate-limit not last --> R5[asyncio.sleep backoff 1s to 2s to 4s]
        R5 --> R1
    end
    G --> retry
    I --> retry
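
The control flow in the flowchart can be approximated by the sketch below. It is a simplified reading, not the PR's code: `recursive_summarize`, `chunk_payload`, the separator string, and the tiny `MAX_CHUNK_TOKENS` value are assumptions for demonstration, and the `summarize` argument stands in for the retry-wrapped model call.

```python
import asyncio
from typing import Awaitable, Callable

MAX_RECURSION_DEPTH = 3   # value shown in the flowchart
MAX_CHUNK_TOKENS = 50     # hypothetical, kept tiny for demonstration
SEPARATOR = "\n---\n"     # hypothetical separator between partial summaries


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # fixed four-characters-per-token estimate


def chunk_payload(text: str, size: int = 30, overlap: int = 5) -> list[str]:
    """Overlapping word-based split (sizes are illustrative)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


async def recursive_summarize(
    text: str,
    summarize: Callable[[str], Awaitable[str]],
    depth: int = 0,
) -> str:
    # Depth guard: truncate and make one direct call.
    if depth >= MAX_RECURSION_DEPTH:
        return await summarize(text[: MAX_CHUNK_TOKENS * 4])
    # Base case: the payload already fits in one chunk.
    if estimate_tokens(text) <= MAX_CHUNK_TOKENS:
        return await summarize(text)
    # Map: summarize all chunks concurrently.
    results = await asyncio.gather(
        *(summarize(chunk) for chunk in chunk_payload(text)),
        return_exceptions=True,
    )
    for result in results:
        if isinstance(result, BaseException):
            raise result  # surface the first failure
    # Reduce: join the partial summaries and recurse one level deeper.
    return await recursive_summarize(SEPARATOR.join(results), summarize, depth + 1)
```

Because each level only recurses on the joined partial summaries, the payload shrinks as long as summaries are shorter than their chunks. The depth guard covers the case where they are not.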

Reviews (6): Last reviewed commit: "Update src/agents/summarizer.py" | Re-trigger Greptile

Comment thread src/agents/summarizer.py Outdated
Comment thread src/models/registry.py Outdated
Comment thread src/agents/summarizer.py
vakrahul and others added 2 commits June 2, 2026 11:17
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Comment thread src/models/registry.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@ved015
Member

ved015 commented Jun 2, 2026

@vakrahul please remove the comments from the PR; they are not needed 😁

@vakrahul
Contributor Author

vakrahul commented Jun 2, 2026

@ved015 done

@ved015
Member

ved015 commented Jun 2, 2026

> @ved015 done

@vakrahul I can still see them; I think you didn't push your commit.

Member

@ved015 ved015 left a comment

Remove comments

@vakrahul
Contributor Author

vakrahul commented Jun 2, 2026

> @ved015 done
>
> @vakrahul I can still see them; I think you didn't push your commit.

Yeah, you're right.

@vakrahul
Contributor Author

vakrahul commented Jun 2, 2026

@ved015 sorry man, I didn't check

Comment thread src/agents/summarizer.py Outdated
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@ved015
Member

ved015 commented Jun 2, 2026

Thanks a lot for working on this and for pushing fixes after the review comments. The underlying problem is definitely valid: large normal ingest inputs should not be sent through one giant summarizer pass.

After reviewing this more deeply, I’m going to take a different architectural approach here. Instead of adding recursive chunking inside SummarizerAgent, I want to handle this at the ingest pipeline level by auto-escalating large LOW-mode inputs into a chunked/high-effort path, likely using paired user/assistant excerpts so the existing flow stays consistent.

Because of that direction change, I’m going to close this PR and implement the revised approach myself. Really appreciate the contribution and the thought you put into this.

Sorry for the change in direction here. We’d still love to see more contributions from you.

@ved015 ved015 closed this Jun 2, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Implement Dynamic Chunking & Recursive State Management in the Summarizer Agent

2 participants