Batch rerank http requests by query #2109

Open
ChrisJar wants to merge 2 commits into NVIDIA:main from ChrisJar:rerank-http

@ChrisJar ChrisJar commented May 22, 2026

Description

Groups reranker HTTP requests by query so each request sends all candidate passages together.

Raises bo767 evaluation throughput from 1.28 to 1.79 queries per second (roughly a 40% increase).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

@ChrisJar ChrisJar requested review from a team as code owners May 22, 2026 22:29
@ChrisJar ChrisJar requested a review from drobison00 May 22, 2026 22:29

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR optimizes the HTTP reranker path in _rerank_batch by grouping rows that share the same query and sending all their passages in a single endpoint request, reducing the request count from one-per-row to one-per-unique-query. Both previous review concerns (silent score misalignment on short endpoint responses and undocumented unhashable-query fallback) have been addressed with an explicit RuntimeError guard and a logger.warning, respectively.

  • Batching logic (rerank.py): replaces the per-row _rerank_via_endpoint loop with a groups dict keyed by query, then fans scores back to per-row positions using the stored indices list.
  • Alignment guard: raises RuntimeError when the endpoint returns a different number of scores than documents sent, making silent misalignment impossible.
  • Test suite (test_nemotron_rerank_v2.py): adds four new / updated test cases covering batched same-query requests, multi-query dispatch, the mismatch error, and the unhashable-query warning path.
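The grouping and fan-out described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in rerank.py: the function name `rerank_grouped`, the `score_fn` callback (standing in for `_rerank_via_endpoint`), and the use of a fresh `object()` as the fallback key for unhashable queries are all assumptions made for the example.

```python
import logging

logger = logging.getLogger(__name__)


def rerank_grouped(queries, documents, score_fn):
    """Score (query, document) rows with one score_fn call per unique query.

    score_fn(query, docs) is a hypothetical stand-in for the HTTP call; it
    must return one score per document, in the same order as docs.
    """
    groups = {}  # key -> (query, row indices, documents); dicts keep insertion order
    for idx, (query, doc) in enumerate(zip(queries, documents)):
        try:
            hash(query)
            key = query
        except TypeError:
            # Assumed fallback: an unhashable query gets its own single-row group.
            logger.warning("Unhashable query at row %d; sending it separately", idx)
            key = object()
        _, indices, docs = groups.setdefault(key, (query, [], []))
        indices.append(idx)
        docs.append(doc)

    scores = [float("-inf")] * len(documents)
    for query, indices, docs in groups.values():
        group_scores = score_fn(query, docs)
        if len(group_scores) != len(docs):
            # Refuse to guess which documents the returned scores belong to.
            raise RuntimeError(
                f"Expected {len(docs)} scores, got {len(group_scores)}"
            )
        for row, score in zip(indices, group_scores):
            scores[row] = score  # fan each score back to its original row
    return scores


# Usage with a fake endpoint that records each request it receives.
calls = []


def fake_endpoint(query, docs):
    calls.append((query, list(docs)))
    return [float(len(d)) for d in docs]


scores = rerank_grouped(["q1", "q2", "q1"], ["a", "bb", "ccc"], fake_endpoint)
```

With three rows and two unique queries, the fake endpoint is called twice rather than three times, and the scores still line up with the original row order.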

Confidence Score: 5/5

Safe to merge — the batching logic is correct, score alignment is preserved, and both previously identified edge cases are now explicitly guarded.

The grouping-by-query approach correctly preserves per-row score alignment because _rerank_via_endpoint always returns a list whose length equals the number of input documents (pre-initialized with -inf). The explicit length check prevents any silent truncation. Dictionary insertion-order (Python 3.7+) guarantees stable group ordering. Tests cover the batched happy path, multi-query dispatch, the mismatch error, and the unhashable-query fallback.
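To illustrate why pre-initializing with -inf keeps the returned list the same length as the input, here is a small sketch. The response shape (a list of items with `index` and `score` fields) and the helper name `scores_from_rankings` are assumptions for the example; the actual endpoint payload and the body of `_rerank_via_endpoint` are not shown in this PR page.

```python
def scores_from_rankings(rankings, n_docs):
    """Map a (hypothetical) list of {"index", "score"} items onto n_docs slots.

    Documents the endpoint did not mention keep the -inf placeholder, so the
    result always has exactly n_docs entries.
    """
    scores = [float("-inf")] * n_docs
    for item in rankings:
        idx = item["index"]
        if 0 <= idx < n_docs:  # ignore out-of-range indices instead of failing
            scores[idx] = float(item["score"])
    return scores


# The endpoint ranked documents 2 and 0 but said nothing about document 1.
example = scores_from_rankings(
    [{"index": 2, "score": 0.9}, {"index": 0, "score": 0.1}], n_docs=3
)
```

Here `example` has three entries, with the unmentioned document left at -inf so it sorts last under a descending sort.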

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/rerank/rerank.py | Replaces per-row HTTP calls with grouped batching by query; adds an explicit RuntimeError guard for score-count mismatches and a warning for unhashable queries. Logic and alignment are correct. |
| nemo_retriever/tests/test_nemotron_rerank_v2.py | Adds four targeted tests for batching, multi-query dispatch, score-count mismatch, and unhashable-query fallback; existing sort-descending test updated to match the new batched response shape. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_rerank_batch(batch_df)"] --> B{rerank_invoke_url set?}
    B -- No --> C[Local model: score_pairs]
    B -- Yes --> D["Build groups dict keyed by query"]
    D --> E{Query is hashable?}
    E -- No --> F["Warn and use unique fallback key"]
    E -- Yes --> G["Use query as key"]
    F --> H["groups.setdefault: accumulate indices and docs"]
    G --> H
    H --> I["For each group: _rerank_via_endpoint(query, all_docs)"]
    I --> J{score count matches doc count?}
    J -- No --> K["raise RuntimeError: score alignment is broken"]
    J -- Yes --> L["Assign scores back to original row positions"]
    L --> M["Attach score column to DataFrame"]
    C --> M
    M --> N{sort_results?}
    N -- Yes --> O[Sort descending]
    N -- No --> P[Return unchanged order]

Reviews (2): Last reviewed commit: "Address greptile"

Comment thread nemo_retriever/src/nemo_retriever/rerank/rerank.py
Comment thread nemo_retriever/src/nemo_retriever/rerank/rerank.py