chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy #9290

Open
zhongdaor-nv wants to merge 1 commit into main from zhongdaor/dis-1926-research-vllm-parser-test-coverage-gaps

@zhongdaor-nv zhongdaor-nv commented May 7, 2026

Overview

Output of DIS-1926 — bidirectional diff between vLLM and Dynamo tool-parser test corpora, mapped onto the new PARSER_CASES.md taxonomy (PR #9127). Doc-only changes; no source touched.

Unblocks DIS-1906 (cross-impl parser parity harness) by giving it a stable, accurate label set with per-test bucket assignments.

What's in this PR

lib/parsers/PARSER_CASES.md (+96 / −12)

Taxonomy refinements driven by gaps the audit surfaced:

  • Split PARSER.fmt.1 ↔ PARSER.fmt.5. Old CASE.21 (and an earlier draft of PARSER.fmt.1) conflated function-name surface concerns with argument-envelope shape concerns. Now:
    • PARSER.fmt.1 — function-name surface only (allowed identifier chars, functions.NAME vs bare NAME, malformed-ID rejection).
    • PARSER.fmt.5 — argument-envelope shape: native call-ID preservation (Kimi K2 PR #32768), JSON field-order tolerance ({name, arguments} vs {arguments, name}), arguments↔parameters key alias.
  • Broaden PARSER.fmt.3 examples — beyond Kimi K2 singular vs plural section tokens, document Mistral pre-v11 vs v11+ wire format, Llama 3 with vs without <|python_tag|>, Hermes qwen25 registry alias.
  • New Known production gaps section — flags Mistral v11+ wire format ([TOOL_CALLS]name{...args} name-then-object) as a parser-implementation gap. Dynamo's ToolCallConfig::mistral() currently only handles pre-v11 (JSON-array
    body); vLLM tests v11 extensively. v11 is the current Mistral-Small / Mistral-Large production path.
  • Promote regex-timeout / failure containment to Universal Gaps — vLLM has explicit test_regex_timeout_handling for llama3_json / llama4_pythonic / pythonic and *_streaming_exception_returns_none for Mistral; Dynamo relies on
    Rust regex linear-time guarantees but does not pin failure-containment paths.
  • Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis.
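
To make the three PARSER.fmt.5 sub-axes concrete, here is a minimal Python sketch of an envelope normalizer. It is an illustration only, not Dynamo's or vLLM's implementation: `normalize_envelope` is a hypothetical helper, the `id` field name is an assumption, and treating a missing argument key as an empty object is a choice made for the sketch.

```python
import json
from typing import Any, Optional


def normalize_envelope(raw: str) -> Optional[dict[str, Any]]:
    """Normalize one JSON tool-call object (hypothetical helper).

    Returns None when the text is not a usable tool-call envelope.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
        return None

    # Key alias: accept either "arguments" or "parameters".
    if "arguments" in obj:
        args = obj["arguments"]
    elif "parameters" in obj:
        args = obj["parameters"]
    else:
        args = {}  # assumption: a missing key means "no arguments"

    # Field-order tolerance comes for free: lookups are by key, so
    # {name, arguments} and {arguments, name} produce the same result.
    call: dict[str, Any] = {"name": obj["name"], "arguments": args}

    # Native call-ID preservation: pass the ID through only if one was emitted.
    if isinstance(obj.get("id"), str):
        call["id"] = obj["id"]
    return call
```

Because the top-level `name` is read by key, an argument that happens to be called `name` (for example `{"arguments": {"name": "x"}, "name": "f"}`) does not shadow the function name.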

Summary by CodeRabbit

  • Documentation
    • Updated internal parser implementation guidelines and test coverage documentation to clarify format-conditional variants and argument-envelope shape conventions across different model formats.

… refinement

Output of DIS-1926 (research vLLM parser test coverage gaps). Two doc-only
changes; no source touched. `cargo check -p dynamo-parsers --tests` passes.

PARSER_CASES.md (+96/-12 lines):

- Split PARSER.fmt.1 (function-name surface) from new PARSER.fmt.5
  (argument-envelope shape: native call-ID preservation, JSON field-order
  tolerance, arguments↔parameters key alias). The old CASE.21 conflated
  both axes; review caught the conflation in fmt.1 and required a clean
  split.
- Broaden PARSER.fmt.3 examples beyond Kimi K2 to include Mistral
  pre-v11 vs v11+ wire formats, Llama 3 with vs without `<|python_tag|>`,
  Hermes `qwen25` registry alias.
- Add `Known production gaps` section flagging Mistral v11+ wire format
  (`[TOOL_CALLS]name{...args}` name-then-object) — Dynamo's
  `ToolCallConfig::mistral()` only handles pre-v11 JSON-array body, while
  vLLM tests v11 extensively. v11 is the current Mistral-Small / Large
  production path. Largest single Dynamo parser gap surfaced by the audit.
- Promote regex-timeout / parser-exception containment to Universal Gaps
  (vLLM has explicit `test_regex_timeout_handling` for llama3_json /
  llama4_pythonic / pythonic and `*_streaming_exception_returns_none` for
  Mistral; Dynamo relies on Rust regex linear-time guarantees but does not
  pin failure-containment paths).
- Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis.
- Update Applicability summary and `Adding a new parser` minimum viable set
  to cover fmt.{1..5}.
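
As a rough picture of what pinning a failure-containment path could look like, the Python sketch below wraps a parser so that any exception degrades to plain content with no tool calls. Everything here is hypothetical (`ParseResult`, `contained`, `flaky_parser`), and the fallback contract is an assumption; the vLLM streaming tests named above return `None` instead.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParseResult:
    """Hypothetical result type: leftover text plus any extracted calls."""
    content: str
    tool_calls: list = field(default_factory=list)


def contained(parse: Callable[[str], ParseResult]) -> Callable[[str], ParseResult]:
    """Wrap a parser so a failure never escapes to the caller."""
    def wrapper(text: str) -> ParseResult:
        try:
            return parse(text)
        except Exception:
            # Assumed contract: on any parser failure, hand the raw text
            # back as ordinary content and report zero tool calls.
            return ParseResult(content=text)
    return wrapper


def flaky_parser(text: str) -> ParseResult:
    """Stand-in for a parser whose regex step times out."""
    raise TimeoutError("simulated regex timeout")


safe_parse = contained(flaky_parser)
result = safe_parse("<tool_call>unterminated")
assert result.tool_calls == []
assert result.content == "<tool_call>unterminated"
```

A test that pins this path asserts on the fallback value itself, so a later change that lets the exception propagate fails loudly.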

VLLM_TEST_AUDIT.md (new file, 906 lines, 493 distinct test rows):

- Bidirectional audit of vLLM tool-parser tests (`tests/tool_parsers/*`,
  `tests/tool_use/*`, `tests/entrypoints/openai/tool_parsers/*`) at vLLM
  commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b vs Dynamo `main`.
- Every row carries a clickable GitHub link to the test source plus the
  PARSER_CASES.md bucket(s) it belongs to. Re-bucketed against the new
  PR #9127 taxonomy:
    - 244 streaming rows split per-row into PARSER.stream.{1,2,3,4}
      (single-call / multi-call / partial-token / termination)
    - 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs
      PARSER.fmt.5 (argument-shape)
    - Out-of-PARSER-scope buckets relocated: CASE.{11,18,25} →
      FRONTEND.{1,3,5,6}; CASE.12 → PIPELINE.finish_reason; CASE.{9,10,17}
      → REASONING.batch.{1,2}; CASE.20 → `// helper`; CASE.16 →
      inline-regression annotation; CASE.26 dissolved into
      PARSER.batch.4 (impl-defined recovery contract)
- Two mis-bucketings caught and fixed during review:
  FunctionGemma::test_multiple_tool_calls and
  Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both labeled
  CASE.1 but assert len(tool_calls) == 2 — corrected to PARSER.batch.2.
- Four bucket-assignment refinements caught by review:
  test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID
  surface, just parallel-distinctness); test_invalid_funcall_id_skipped
  (Kimi K2) moves fmt.5 → fmt.1 (validation, not preservation);
  3 Mistral `argument_before_name*` parametrized rows gain fmt.5 (canonical
  field-order swap test set referenced by PARSER_CASES.md).
- Staleness banner at top documents the re-bucketing transformation and
  flags the 2 mis-bucket fixes for traceability.
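
The two mis-bucketings above were both visible from the asserted call count, so a small lint could catch that class of error. The Python sketch below is one hypothetical way to do it, assuming PARSER.batch.1 is the single-call bucket and PARSER.batch.2 the multi-call bucket (which is what the correction implies); `suggested_batch_bucket` is not part of either codebase.

```python
import re
from typing import Optional

# Matches e.g. "assert len(tool_calls) == 2" or "assert len(result.tool_calls) == 1".
_CALL_COUNT = re.compile(
    r"assert\s+len\(\s*(?:\w+\.)*tool_calls\s*\)\s*==\s*(\d+)"
)


def suggested_batch_bucket(test_source: str) -> Optional[str]:
    """Suggest a batch bucket from the largest asserted tool-call count."""
    counts = [int(n) for n in _CALL_COUNT.findall(test_source)]
    if not counts:
        return None  # no call-count assertion, nothing to suggest
    largest = max(counts)
    if largest >= 2:
        return "PARSER.batch.2"
    if largest == 1:
        return "PARSER.batch.1"
    return None  # zero-call assertions belong to some other bucket


# A test body shaped like the mis-labeled FunctionGemma / Gemma4 rows.
body = "result = parser.extract(text)\nassert len(result.tool_calls) == 2\n"
assert suggested_batch_bucket(body) == "PARSER.batch.2"
```

The suggestion is only a hint for a reviewer: a row may legitimately carry several buckets, and the regex sees only literal `len(...) == N` assertions.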

Top-3 P0 gap status post-delta:

1. Mistral v11 wire format — STILL OPEN (parser doesn't exist).
2. PARSER.stream.{1..4} parser-tier — partial; DSv4 + Gemma 4 added
   coverage (PR #8946 + #8852); Kimi K2 / Qwen3 / Hermes / Pythonic /
   Mistral parser-tier streaming tests are still a gap.
3. CASE.25 / FRONTEND.3 (`adjust_request`) — CLOSED for 7 families via
   28 new tests in `lib/llm/tests/tool_choice.rs` (PR #8946 + #9035).
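
For gap 1, the difference between the two Mistral wire shapes can be sketched in a few lines of Python. This only illustrates the shapes described above (pre-v11 JSON-array body vs v11+ `[TOOL_CALLS]name{...args}`); it is not the Dynamo or vLLM parser. `parse_mistral` is a hypothetical name, and repeating the marker once per call in v11+ is an assumption.

```python
import json

MARKER = "[TOOL_CALLS]"


def parse_mistral(text: str) -> list[dict]:
    """Parse either Mistral wire shape into [{"name", "arguments"}, ...]."""
    _, found, body = text.partition(MARKER)
    if not found:
        return []  # no tool-call marker: plain content
    body = body.lstrip()
    decoder = json.JSONDecoder()

    if body.startswith("["):
        # pre-v11: the marker is followed by a JSON array of call objects.
        calls, _ = decoder.raw_decode(body)
        return [{"name": c["name"], "arguments": c["arguments"]} for c in calls]

    # v11+: bare function name, then a JSON object of arguments.
    # Assumption: each further call repeats the marker.
    # Limitation: an argument string containing the marker would be split.
    calls = []
    for segment in body.split(MARKER):
        brace = segment.find("{")
        if brace == -1:
            continue  # no argument object in this segment
        name = segment[:brace].strip()
        args, _ = decoder.raw_decode(segment, brace)
        calls.append({"name": name, "arguments": args})
    return calls


old_style = '[TOOL_CALLS][{"name": "add", "arguments": {"a": 1, "b": 2}}]'
new_style = '[TOOL_CALLS]add{"a": 1, "b": 2}'
assert parse_mistral(old_style) == parse_mistral(new_style)
```

Malformed JSON raises `json.JSONDecodeError` here; a production parser would need to contain that failure rather than let it propagate.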

Closing PRs since 2026-05-05 baseline (cf87277..b9418d3): #8888
(9 silent-drop recoveries), #8946 (DSv4 + Kimi K2 coverage), #9035
(top-N CASE.6+ quartet), #8852 (Gemma 4 family), #9127 (taxonomy rename).

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
@github-actions github-actions Bot added the chore and documentation labels May 7, 2026

github-actions Bot commented May 7, 2026

@zhongdaor-nv
Contributor Author

Note: lib/parsers/VLLM_TEST_AUDIT.md is only for reviewers to check correctness. I will remove this file before merging.

@zhongdaor-nv zhongdaor-nv marked this pull request as ready for review May 7, 2026 23:37
@zhongdaor-nv zhongdaor-nv changed the title from "chore(parsers): DIS-1926 — bidirectional vLLM↔Dynamo audit + taxonomy…" to "chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy" May 7, 2026

coderabbitai Bot commented May 7, 2026

Walkthrough

This PR expands the tool-call parser taxonomy documentation by introducing PARSER.fmt.5 for argument-envelope shape conventions, detailing wire-format variant coverage in PARSER.fmt.3, clarifying the scope of PARSER.fmt.1, and updating all cross-references and applicability checklists to reflect the new taxonomy dimension.

Changes

Parser Taxonomy: Argument-Shape & Wire-Format

| Layer | File(s) | Summary |
| --- | --- | --- |
| Introduce PARSER.fmt.5 & Universal Gaps | `lib/parsers/PARSER_CASES.md` | First mention of PARSER.fmt.5 argument-envelope conventions; expands universal gaps section with regex-timeout/exception guidance and Mistral v11+ production gap note. |
| Define PARSER.fmt.5 Argument-Shape Conventions | `lib/parsers/PARSER_CASES.md` | Full PARSER.fmt.5 section defining three argument-envelope sub-axes: native call-ID preservation, JSON field-order tolerance (including arguments key named name edge case), and argument-key aliasing (arguments vs parameters), with references to named parametrized tests. |
| Expand PARSER.fmt.3 Wire-Format Variants | `lib/parsers/PARSER_CASES.md` | Detailed PARSER.fmt.3 documentation enumerating multiple wire-format spellings across Kimi K2, Mistral (pre-v11 vs v11+), Llama 3 (python_tag fence presence), and Hermes (qwen25 alias), emphasizing active-config registration constraints. |
| Clarify Format-Related Scope & Batch.1 Reference | `lib/parsers/PARSER_CASES.md` | Clarifies that PARSER.fmt.1 covers only function-name surface and excludes argument-envelope shape (covered by PARSER.fmt.5); updates PARSER.batch.1 happy-path requirements to assert native ToolCall.id preservation with PARSER.fmt.5 cross-reference. |
| Update Applicability Summary & New-Parser Checklist | `lib/parsers/PARSER_CASES.md` | Updates applicability summary table and new-parser-checklist to include PARSER.fmt.{1..5} coverage; explicitly calls out PARSER.fmt.5 requirement for JSON-family parsers. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The PR title mentions 'vllm parser tests' and 'taxonomy', but the actual changes are documentation-only updates to PARSER_CASES.md taxonomy definitions themselves, not mapping of vLLM tests to the taxonomy. | Revise the title to reflect that this is a documentation update to the PARSER_CASES.md taxonomy (e.g., 'docs(parsers): Update PARSER_CASES taxonomy with fmt.5 and audit findings' or similar). |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description includes all required template sections: Overview (with linked issue context), detailed breakdown of What's in this PR with specific taxonomy changes, and implicit related-issue reference through DIS-1926/PR #9127. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
lib/parsers/PARSER_CASES.md (1)

121-126: ⚡ Quick win

Add direct GitHub links for cited PR references.

You reference incidents as PR #...; please also include clickable GitHub URLs for those references to improve audit traceability in docs.

As per coding guidelines: "`**/*.md`: Markdown documentation may reference Linear tickets for internal context, but should prefer to also include the matching GitHub link when one exists."

Also applies to: 329-333, 429-432

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/parsers/PARSER_CASES.md` around lines 121 - 126, Update the Markdown
references that cite PR numbers to include direct clickable GitHub URLs: replace
occurrences like "PR `#32768`" (and the other cited ranges around lines 329-333
and 429-432) with the full GitHub PR link for the corresponding repository/PR,
keeping the existing text (e.g., "vLLM Kimi K2 PR `#32768`") but appending or
replacing with "([vLLM#32768](https://github.com/vllm-project/vllm/pull/32768))" or
the correct repo/PR URL; ensure the sentences mentioning PARSER.fmt.5, Kimi K2,
and any other PR references consistently include the clickable link to satisfy
the Markdown guideline.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 75224add-ba0e-4dc8-a6ae-d509f3be65f2

📥 Commits

Reviewing files that changed from the base of the PR and between 73bc969 and e03a365.

📒 Files selected for processing (2)
  • lib/parsers/PARSER_CASES.md
  • lib/parsers/VLLM_TEST_AUDIT.md

@zhongdaor-nv zhongdaor-nv requested a review from a team as a code owner May 8, 2026 00:21
@zhongdaor-nv zhongdaor-nv force-pushed the zhongdaor/dis-1926-research-vllm-parser-test-coverage-gaps branch from 55fb950 to e03a365 Compare May 8, 2026 00:34
@ayushag-nv ayushag-nv left a comment

Please remove VLLM_TEST_AUDIT.md. LGTM.

Labels

chore, documentation, size/XL

2 participants