feat(agents): implement dynamic chunking and recursive map-reduce in … by vakrahul · Pull Request #217 · XortexAI/XMem

vakrahul · 2026-06-02T05:30:26Z

Summary

This PR adds dynamic, token-aware chunking and a recursive map-reduce loop with retry handling to the Summarizer agent workflow. It also extends the model registry to return context window limits for all supported providers, so chunk sizes can be derived from the active model's limit instead of exceeding it during memory retrieval.

Motivation / Problem

Currently, when the Summarizer agent deals with massive context payloads or extensive historical logs, we risk hitting hard API token limits or suffering from LLM "lost in the middle" degradation. Chunking the input to fit the model's context window is intended to preserve high-fidelity context without dropping critical data points.

Closes #216

Changes

src/models/registry.py: Added get_model_context_window() to map and return context limits across all supported providers (Claude, OpenAI, Gemini, DeepSeek, Groq, Ollama, Bedrock) with exact and partial matching.
src/agents/summarizer.py: Updated agent initialization to calculate a safe chunk size at 80% (SAFE_THRESHOLD_RATIO) of the active model's context window.
src/agents/summarizer.py: Replaced standard summarization with a recursive map-reduce loop that slices large inputs into semantic, overlapping chunks and combines partial summaries.
src/agents/summarizer.py: Implemented an exponential backoff (1s → 2s → 4s) with up to 3 retry attempts to elegantly handle rate limits and quota errors.
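
The backoff behaviour in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `RateLimitError`, `call_with_backoff`, and the injectable `sleep` parameter are hypothetical names introduced here, and the real agent presumably inspects provider-specific exceptions instead.

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit or quota error."""


async def call_with_backoff(call, max_attempts=3, base_delay=1.0, sleep=asyncio.sleep):
    """Await call() and retry on RateLimitError, doubling the delay each time.

    Any other exception propagates immediately without a retry.
    """
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            await sleep(base_delay * 2 ** attempt)  # 1s after the first failure, then 2s
```

With three attempts in total, this sketch sleeps at most twice (1s, then 2s); reaching the 4s step would need a fourth attempt. The PR text does not say whether the initial call counts toward the three attempts, so that detail is left open here.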

Testing

Unit tests added / updated (pytest tests/unit)
Integration tests pass (pytest tests/integration)
Tested manually — steps below:

# Run unit tests to verify the agent's new recursive loop
uv run pytest tests/unit/test_agents.py

Checklist

I ran ruff check . and black --check . locally with no errors
I updated CHANGELOG.md if this is a user-visible change

…summarizer

gemini-code-assist

Code Review

This pull request introduces a dynamic chunking and map-reduce pipeline to the SummarizerAgent to handle large payloads, alongside exponential backoff retry logic for rate limits. It also adds a context window registry to dynamically determine chunk sizes based on the active model. The review feedback highlights several critical and high-severity issues: a potential infinite loop/O(N) chunking bug when processing very long strings, excessively large chunk sizes for models with massive context windows, sequential execution of chunk summaries that causes high latency, and fragile provider detection when LangChain models are wrapped in helper classes. Addressing these issues by capping chunk sizes, processing chunks concurrently, and robustly unwrapping models will significantly improve the reliability and performance of the summarizer.
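
Two of the points raised in this review, capping chunk sizes and making sure the chunker always advances, could look something like the sketch below. Only `SAFE_THRESHOLD_RATIO` comes from the PR description; `MAX_CHUNK_TOKENS_CAP`, its value, and both function names are assumptions made for illustration.

```python
SAFE_THRESHOLD_RATIO = 0.8     # ratio named in the PR description
MAX_CHUNK_TOKENS_CAP = 16_000  # hypothetical ceiling, not taken from the PR


def safe_chunk_tokens(context_window: int) -> int:
    """80% of the context window, capped so very large windows don't yield huge chunks."""
    return min(int(context_window * SAFE_THRESHOLD_RATIO), MAX_CHUNK_TOKENS_CAP)


def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into overlapping word-based chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Clamp the overlap below the chunk size so the step is always >= 1;
    # a step of zero is what would make a naive loop spin forever.
    overlap = max(0, min(overlap, chunk_size - 1))
    step = chunk_size - overlap
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the input
    return chunks
```

Note that a long string with no whitespace is a single "word" here and comes back as one oversized chunk; handling that case would need a character-level fallback, which this sketch leaves out.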

…summarizer and few changes

greptile-apps · 2026-06-02T05:46:55Z

Greptile Summary

This PR introduces a recursive map-reduce summarization pipeline and a get_model_context_window() registry lookup to make the SummarizerAgent token-aware across all supported providers. It also adds exponential-backoff retry logic for rate-limit errors on individual chunk API calls.

src/models/registry.py: Adds _CONTEXT_WINDOWS dict and get_model_context_window() with exact + longest-prefix partial matching to retrieve per-model context window sizes.
src/agents/summarizer.py: Replaces the single _call_model call with _recursive_summarize(), a depth-bounded map-reduce loop that chunks oversized payloads, summarizes them concurrently, and collapses partial summaries up to MAX_RECURSION_DEPTH = 3; also overrides _detect_provider() for improved provider detection including ChatOllama, ChatDeepSeek, and ChatMimo.
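
The exact-then-longest-prefix lookup described for the registry might be sketched like this. The two GPT-4 figures are the ones quoted in this review; the function name `lookup_context_window`, the `table` parameter, the two-entry table, and the fallback value are all assumptions made for illustration, and "prefix" is read here as `str.startswith`.

```python
# Illustrative subset; the real _CONTEXT_WINDOWS covers many more models.
CONTEXT_WINDOWS = {
    "gpt-4": 8_192,
    "gpt-4-32k": 32_768,
}
DEFAULT_CONTEXT_WINDOW = 4_096  # hypothetical fallback, not from the PR


def lookup_context_window(model_name: str, table: dict[str, int] = CONTEXT_WINDOWS) -> int:
    """Return the context window for model_name: exact match first, then longest prefix."""
    name = model_name.lower()
    if name in table:
        return table[name]
    # Try longer keys first so "gpt-4-32k" wins over "gpt-4" for "gpt-4-32k-0613".
    for key in sorted(table, key=len, reverse=True):
        if name.startswith(key):
            return table[key]
    return DEFAULT_CONTEXT_WINDOW


# With the 32k entry missing, the same model name falls through to the 8,192 limit.
without_32k = {"gpt-4": 8_192}
print(lookup_context_window("gpt-4-32k-0613"))               # uses the 32k entry
print(lookup_context_window("gpt-4-32k-0613", without_32k))  # falls back to gpt-4
```

Sorting by descending key length is what keeps a short key from shadowing a more specific one; it cannot help when the specific key is absent from the table altogether.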

Confidence Score: 4/5

Safe to merge for English-only workloads; multilingual input could produce oversized chunks due to the character-based token estimator, and gpt-4-32k users will see unnecessarily small chunk sizes until the registry is updated.

The recursive map-reduce logic, backoff retry, and partial-match registry lookup all work correctly for the common cases. Two gaps remain: the token estimator uses a fixed 4-chars-per-token ratio that significantly underestimates CJK/emoji content (potentially causing API context-limit errors), and the OpenAI registry omits gpt-4-32k so partial matching assigns it the 8 192-token GPT-4 limit instead of its actual 32 768-token window.

Both changed files warrant a second look: src/agents/summarizer.py for the token estimation accuracy, and src/models/registry.py for the missing gpt-4-32k entry.
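
To make the estimator gap concrete, here is a small comparison between the fixed 4-characters-per-token ratio mentioned above and a more conservative heuristic. The second function is a hypothetical alternative, not something proposed in the PR, and its "one token per non-ASCII character" rule is only a rough upper-bound assumption; real tokenizers vary by model.

```python
def estimate_tokens_fixed(text: str) -> int:
    """Fixed ratio described in the review: roughly 4 characters per token."""
    return len(text) // 4


def estimate_tokens_script_aware(text: str) -> int:
    """Hypothetical alternative: ASCII at 4 chars/token, every other character as 1 token."""
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    other_chars = len(text) - ascii_chars
    return ascii_chars // 4 + other_chars


english = "a" * 40
cjk = "你好世界" * 10  # 40 characters, none of them ASCII

print(estimate_tokens_fixed(english), estimate_tokens_script_aware(english))
print(estimate_tokens_fixed(cjk), estimate_tokens_script_aware(cjk))
```

Both estimators agree on the ASCII string, but for the CJK string the fixed ratio reports a quarter of what the conservative heuristic does, which is the kind of underestimate that could let an oversized chunk through.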

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agents/summarizer.py | Adds recursive map-reduce summarization with backoff retry. Core logic is sound; the token estimator has a known accuracy gap for non-ASCII text that could cause oversized chunks for multilingual input. |
| src/models/registry.py | Adds context window mapping and lookup. Partial-match logic is well-structured (sorted by descending key length). Missing gpt-4-32k entry causes the 8 192-token GPT-4 limit to be applied to the 32 768-token variant via prefix match. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([arun: user_query + agent_response]) --> B[pack_summary_query]
    B --> C[_recursive_summarize depth=0]
    C --> D{depth >= MAX_RECURSION_DEPTH = 3?}
    D -- Yes --> E[Truncate to MAX_CHUNK_TOKENS*4 chars and call model directly]
    E --> Z([Return summary])
    D -- No --> F{estimated_tokens <= MAX_CHUNK_TOKENS?}
    F -- Yes base case --> G[_build_messages + _call_model_with_retry]
    G --> Z
    F -- No --> H[_chunk_payload overlapping word-based split]
    H --> I[asyncio.gather all chunks return_exceptions=True]
    I --> J{Any exceptions?}
    J -- Yes --> K[raise exceptions-0]
    J -- No --> L[Join partial summaries with separator]
    L --> M[_recursive_summarize depth+1]
    M --> C
    subgraph retry [_call_model_with_retry per chunk]
        R1[attempt 0-2] --> R2{Success?}
        R2 -- Yes --> R3([Return str])
        R2 -- No: non-rate-limit --> R4([Raise immediately])
        R2 -- No: rate-limit last attempt --> R4
        R2 -- No: rate-limit not last --> R5[asyncio.sleep backoff 1s to 2s to 4s]
        R5 --> R1
    end
    G --> retry
    I --> retry
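
The control flow in the flowchart above can be approximated with the following sketch. It follows the same order of checks (depth cap, base case, then chunk, gather, and recurse), but the function names, the tiny `MAX_CHUNK_TOKENS` value, the greedy chunker without overlap, and the injected `summarize` coroutine are all simplifications assumed for illustration; retry handling is omitted.

```python
import asyncio

MAX_RECURSION_DEPTH = 3  # value shown in the flowchart
MAX_CHUNK_TOKENS = 50    # deliberately small, illustrative budget


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # fixed 4-chars-per-token estimate


def chunk_payload(text: str, max_chars: int) -> list[str]:
    """Greedy word-based split; overlap is left out to keep the sketch short."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


async def recursive_summarize(text: str, summarize, depth: int = 0) -> str:
    max_chars = MAX_CHUNK_TOKENS * 4
    if depth >= MAX_RECURSION_DEPTH:
        return await summarize(text[:max_chars])  # depth cap: truncate and call once
    if estimate_tokens(text) <= MAX_CHUNK_TOKENS:
        return await summarize(text)              # base case: fits in one call
    # Map step: summarize all chunks concurrently.
    results = await asyncio.gather(
        *(summarize(chunk) for chunk in chunk_payload(text, max_chars)),
        return_exceptions=True,
    )
    for result in results:
        if isinstance(result, BaseException):
            raise result                          # surface the first failure
    # Reduce step: join the partial summaries and recurse one level deeper.
    return await recursive_summarize("\n\n".join(results), summarize, depth + 1)
```

Because `return_exceptions=True` is used, one failed chunk does not cancel the others; the first exception is re-raised only after every chunk has finished. The depth cap bounds the recursion even if the partial summaries never shrink below the budget.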

Reviews (6): Last reviewed commit: "Update src/agents/summarizer.py"

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ved015 · 2026-06-02T07:08:33Z

@vakrahul pls remove the comments from the PR they are not needed😁

vakrahul · 2026-06-02T07:19:05Z

@ved015 done

ved015 · 2026-06-02T07:24:19Z

> @ved015 done

@vakrahul I can still see them; I think you didn't push your commit.

ved015

Remove comments

vakrahul · 2026-06-02T07:42:32Z

> > @ved015 done
>
> @vakrahul I can still see them; I think you didn't push your commit.

Yeah, you're right.

Removed commented sections and cleaned up formatting.

vakrahul · 2026-06-02T07:48:47Z

@ved015 sorry man, I didn't check

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ved015 · 2026-06-02T08:40:20Z

Thanks a lot for working on this and for pushing fixes after the review comments. The underlying problem is definitely valid: large normal ingest inputs should not be sent through one giant summarizer pass.

After reviewing this more deeply, I’m going to take a different architectural approach here. Instead of adding recursive chunking inside SummarizerAgent, I want to handle this at the ingest pipeline level by auto-escalating large LOW-mode inputs into a chunked/high-effort path, likely using paired user/assistant excerpts so the existing flow stays consistent.

Because of that direction change, I’m going to close this PR and implement the revised approach myself. Really appreciate the contribution and the thought you put into this.

Sorry for the change in direction here. We’d still love to see more contributions from you.

feat(agents): implement dynamic chunking and recursive map-reduce in …

5a9beb0

…summarizer

vakrahul requested review from ishaanxgupta and ved015 as code owners June 2, 2026 05:30

github-actions Bot added the agents label Jun 2, 2026

gemini-code-assist Bot reviewed Jun 2, 2026

Comment thread src/agents/summarizer.py

Comment thread src/agents/summarizer.py Outdated

Comment thread src/agents/summarizer.py Outdated

Comment thread src/agents/summarizer.py

feat(agents): implement dynamic chunking and recursive map-reduce in …

206c45d

…summarizer and few changes

greptile-apps Bot reviewed Jun 2, 2026

Comment thread src/agents/summarizer.py Outdated

Comment thread src/models/registry.py Outdated

Comment thread src/agents/summarizer.py

vakrahul and others added 2 commits June 2, 2026 11:17

Update src/agents/summarizer.py

9248d51

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Update src/agents/summarizer.py

9d327dc

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ishaanxgupta assigned ved015 Jun 2, 2026

Update src/models/registry.py

e5e0634

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps Bot reviewed Jun 2, 2026

Comment thread src/models/registry.py

Update src/models/registry.py

ca9992b

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ved015 suggested changes Jun 2, 2026

vakrahul added 2 commits June 2, 2026 13:15

Refactor SummarizerAgent for dynamic chunking

2de7f31

Refactor registry.py by removing comments and whitespace

747033e

Removed commented sections and cleaned up formatting.

greptile-apps Bot reviewed Jun 2, 2026

Comment thread src/agents/summarizer.py Outdated

Update src/agents/summarizer.py

8110e25

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

ved015 closed this Jun 2, 2026
