chore(parsers): Mapping vllm parser tests to new PARSER_CASES.md taxonomy #9290

zhongdaor-nv wants to merge 1 commit into main from …
… refinement

Output of DIS-1926 (research vLLM parser test coverage gaps). Two doc-only changes; no source touched. `cargo check -p dynamo-parsers --tests` passes.

PARSER_CASES.md (+96/-12 lines):

- Split PARSER.fmt.1 (function-name surface) from the new PARSER.fmt.5 (argument-envelope shape: native call-ID preservation, JSON field-order tolerance, `arguments` ↔ `parameters` key alias). The old CASE.21 conflated both axes; review caught the conflation in fmt.1 and required a clean split.
- Broaden PARSER.fmt.3 examples beyond Kimi K2 to include Mistral pre-v11 vs v11+ wire formats, Llama 3 with vs without `<|python_tag|>`, and the Hermes `qwen25` registry alias.
- Add a `Known production gaps` section flagging the Mistral v11+ wire format (`[TOOL_CALLS]name{...args}`, name-then-object). Dynamo's `ToolCallConfig::mistral()` only handles the pre-v11 JSON-array body, while vLLM tests v11 extensively. v11 is the current Mistral-Small / Large production path, and this is the largest single Dynamo parser gap surfaced by the audit.
- Promote regex-timeout / parser-exception containment to Universal Gaps. vLLM has explicit `test_regex_timeout_handling` for llama3_json / llama4_pythonic / pythonic and `*_streaming_exception_returns_none` for Mistral; Dynamo relies on Rust regex linear-time guarantees but does not pin failure-containment paths.
- Cross-ref PARSER.batch.1 happy-path → PARSER.fmt.5 native-ID sub-axis.
- Update the Applicability summary and the `Adding a new parser` minimum viable set to cover fmt.{1..5}.

VLLM_TEST_AUDIT.md (new file, 906 lines, 493 distinct test rows):

- Bidirectional audit of vLLM tool-parser tests (`tests/tool_parsers/*`, `tests/tool_use/*`, `tests/entrypoints/openai/tool_parsers/*`) at vLLM commit b53c507bc91f87e28b03e9b54bbff7c76e97d58b vs Dynamo `main`.
- Every row carries a clickable GitHub link to the test source plus the PARSER_CASES.md bucket(s) it belongs to.
- Re-bucketed against the new PR #9127 taxonomy:
  - 244 streaming rows split per-row into PARSER.stream.{1,2,3,4} (single-call / multi-call / partial-token / termination).
  - 26 fmt rows split per-row into PARSER.fmt.1 (function-name) vs PARSER.fmt.5 (argument-shape).
  - Out-of-PARSER-scope buckets relocated: CASE.{11,18,25} → FRONTEND.{1,3,5,6}; CASE.12 → PIPELINE.finish_reason; CASE.{9,10,17} → REASONING.batch.{1,2}; CASE.20 → `// helper`; CASE.16 → inline-regression annotation; CASE.26 dissolved into PARSER.batch.4 (impl-defined recovery contract).
- Two mis-bucketings caught and fixed during review: FunctionGemma::test_multiple_tool_calls and Gemma4::TestExtractToolCalls.test_multiple_tool_calls were both labeled CASE.1 but assert `len(tool_calls) == 2`; both are corrected to PARSER.batch.2.
- Four bucket-assignment refinements caught by review: test_unique_tool_call_ids (DSv3.2) drops fmt.5 (no native call-ID surface, just parallel-distinctness); test_invalid_funcall_id_skipped (Kimi K2) moves fmt.5 → fmt.1 (validation, not preservation); 3 Mistral `argument_before_name*` parametrized rows gain fmt.5 (the canonical field-order swap test set referenced by PARSER_CASES.md).
- A staleness banner at the top documents the re-bucketing transformation and flags the 2 mis-bucket fixes for traceability.

Top-3 P0 gap status post-delta:

1. Mistral v11 wire format: STILL OPEN (parser doesn't exist).
2. PARSER.stream.{1..4} parser-tier: partial. DSv4 + Gemma 4 added coverage (PR #8946 + #8852); Kimi K2 / Qwen3 / Hermes / Pythonic / Mistral parser-tier streaming tests are still a gap.
3. CASE.25 / FRONTEND.3 (`adjust_request`): CLOSED for 7 families via 28 new tests in `lib/llm/tests/tool_choice.rs` (PR #8946 + #9035).

Closing PRs since the 2026-05-05 baseline (cf87277..b9418d3): #8888 (9 silent-drop recoveries), #8946 (DSv4 + Kimi K2 coverage), #9035 (top-N CASE.6+ quartet), #8852 (Gemma 4 family), #9127 (taxonomy rename).

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
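The `PARSER.fmt.5` axis bundles three argument-envelope behaviors: native call-ID preservation, JSON field-order tolerance, and the `arguments` ↔ `parameters` key alias. The Rust sketch below illustrates what a parser satisfying all three might do once a tool-call object has been decoded. It is not Dynamo's implementation. `NormalizedCall` and `normalize_envelope` are hypothetical names. The input is assumed to be already-decoded key/value pairs in wire order (names and IDs as plain strings, arguments as raw JSON text), so the example needs no JSON crate.

```rust
/// A tool call after argument-envelope normalization
/// (hypothetical type, not an existing Dynamo API).
#[derive(Debug, PartialEq)]
pub struct NormalizedCall {
    /// Native call ID emitted by the model, preserved when present.
    pub id: Option<String>,
    /// Function name.
    pub name: String,
    /// Raw JSON text of the argument object.
    pub arguments: String,
}

/// Normalize a decoded tool-call object given as (key, value) pairs in
/// wire order. Returns None when the name or the argument payload is missing.
pub fn normalize_envelope(fields: &[(&str, &str)]) -> Option<NormalizedCall> {
    // Look keys up by name rather than by position, so that
    // {name, arguments} and {arguments, name} normalize identically.
    let get = |key: &str| {
        fields
            .iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| v.to_string())
    };
    let name = get("name")?;
    // Accept `parameters` as an alias; `arguments` wins if both appear
    // (an assumption of this sketch).
    let arguments = get("arguments").or_else(|| get("parameters"))?;
    // Carry the native ID through untouched instead of minting a new one.
    let id = get("id");
    Some(NormalizedCall { id, name, arguments })
}
```

Looking keys up by name instead of by position is what makes the field-order swap harmless; the precedence of `arguments` over `parameters` is a choice made for this sketch, not a documented contract.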
Note: `lib/parsers/VLLM_TEST_AUDIT.md` is only for reviewers to check correctness. I will remove this file before merging.
Walkthrough

This PR expands the tool-call parser taxonomy documentation by introducing …

Changes

Parser Taxonomy: Argument-Shape & Wire-Format
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
🧹 Nitpick comments (1)
lib/parsers/PARSER_CASES.md (1)
121-126: ⚡ Quick win

Add direct GitHub links for cited PR references.

You reference incidents as `PR #...`; please also include clickable GitHub URLs for those references to improve audit traceability in docs.

As per coding guidelines: "`**/*.md`: Markdown documentation may reference Linear tickets for internal context, but should prefer to also include the matching GitHub link when one exists."

Also applies to: 329-333, 429-432
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/parsers/PARSER_CASES.md` around lines 121-126, update the Markdown references that cite PR numbers to include direct clickable GitHub URLs: replace occurrences like "PR `#32768`" (and the other cited ranges around lines 329-333 and 429-432) with the full GitHub PR link for the corresponding repository/PR, keeping the existing text (e.g., "vLLM Kimi K2 PR `#32768`") but appending or replacing with "([vLLM#32768](https://github.com/vllm-project/vllm/pull/32768))" or the correct repo/PR URL; ensure the sentences mentioning PARSER.fmt.5, Kimi K2, and any other PR references consistently include the clickable link to satisfy the Markdown guideline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 75224add-ba0e-4dc8-a6ae-d509f3be65f2
📒 Files selected for processing (2)
lib/parsers/PARSER_CASES.md
lib/parsers/VLLM_TEST_AUDIT.md
Branch updated from 55fb950 to e03a365.
ayushag-nv left a comment:
Please remove VLLM_TEST_AUDIT.md. LGTM.
Overview
Output of DIS-1926: a bidirectional diff between vLLM and Dynamo tool-parser test corpora, mapped onto the new `PARSER_CASES.md` taxonomy (PR #9127). Doc-only changes; no source touched.

Unblocks DIS-1906 (cross-impl parser parity harness) by giving it a stable, accurate label set with per-test bucket assignments.
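One way to picture a bidirectional diff is as two set differences over (parser family, bucket) coverage cells: cells vLLM covers that Dynamo does not are Dynamo gaps, and the reverse direction shows coverage vLLM lacks. A minimal Rust sketch under that simplifying assumption follows; `CoverageCell`, `CoverageDiff`, and `bidirectional_diff` are hypothetical names, the family labels are illustrative, and the actual audit works per test row rather than per aggregated cell.

```rust
use std::collections::BTreeSet;

/// One coverage cell: (parser family, PARSER_CASES.md bucket).
/// Hypothetical shape for illustration only.
type CoverageCell = (&'static str, &'static str);

/// Result of a bidirectional coverage diff between two test corpora.
#[derive(Debug, PartialEq)]
pub struct CoverageDiff {
    /// Covered by vLLM tests but not by Dynamo tests (Dynamo gaps).
    pub dynamo_gaps: BTreeSet<CoverageCell>,
    /// Covered by Dynamo tests but not by vLLM tests.
    pub vllm_gaps: BTreeSet<CoverageCell>,
}

/// Compute both directions of the diff with plain set differences.
pub fn bidirectional_diff(
    vllm: &BTreeSet<CoverageCell>,
    dynamo: &BTreeSet<CoverageCell>,
) -> CoverageDiff {
    CoverageDiff {
        dynamo_gaps: vllm.difference(dynamo).copied().collect(),
        vllm_gaps: dynamo.difference(vllm).copied().collect(),
    }
}
```

Using `BTreeSet` keeps both gap lists sorted, so a report generated from them stays stable from run to run.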
What's in this PR
`lib/parsers/PARSER_CASES.md` (+96 / −12)

Taxonomy refinements driven by gaps the audit surfaced:
- Split `PARSER.fmt.1` → `PARSER.fmt.5`. Old `CASE.21` (and an earlier draft of `PARSER.fmt.1`) conflated function-name surface concerns with argument-envelope shape concerns. Now:
  - `PARSER.fmt.1`: function-name surface only (allowed identifier chars, `functions.NAME` vs bare `NAME`, malformed-ID rejection).
  - `PARSER.fmt.5`: argument-envelope shape, covering native call-ID preservation (Kimi K2 PR #32768), JSON field-order tolerance (`{name, arguments}` vs `{arguments, name}`), and the `arguments` ↔ `parameters` key alias.
- Broadened `PARSER.fmt.3` examples: beyond Kimi K2 singular vs plural section tokens, document Mistral pre-v11 vs v11+ wire format, Llama 3 with vs without `<|python_tag|>`, and the Hermes `qwen25` registry alias.
- New `Known production gaps` section: flags the Mistral v11+ wire format (`[TOOL_CALLS]name{...args}`, name-then-object) as a parser-implementation gap. Dynamo's `ToolCallConfig::mistral()` currently only handles pre-v11 (JSON-array body); vLLM tests v11 extensively. v11 is the current Mistral-Small / Mistral-Large production path.
- Promoted regex-timeout / parser-exception containment to Universal Gaps: vLLM has explicit `test_regex_timeout_handling` for `llama3_json` / `llama4_pythonic` / `pythonic` and `*_streaming_exception_returns_none` for Mistral; Dynamo relies on Rust regex linear-time guarantees but does not pin failure-containment paths.
- Cross-ref `PARSER.batch.1` happy-path → `PARSER.fmt.5` native-ID sub-axis.
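The two Mistral wire formats named in the `Known production gaps` item differ in what follows the `[TOOL_CALLS]` marker: pre-v11 puts a JSON array there, while v11+ puts a bare function name followed by its argument object. The Rust sketch below shows one hedged way to tell them apart and split v11+ calls. `MistralWire` and `classify_mistral` are hypothetical names, not `ToolCallConfig::mistral()` or any existing Dynamo API. The sketch assumes the output starts with the marker and that each v11+ call carries its own `[TOOL_CALLS]` prefix, and it does not validate the argument JSON.

```rust
/// The two Mistral tool-call wire formats described in PARSER_CASES.md
/// (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum MistralWire<'a> {
    /// Pre-v11: `[TOOL_CALLS]` followed by a JSON array of call objects.
    PreV11 { json_array: &'a str },
    /// v11+: `[TOOL_CALLS]name{...args}` per call, kept as
    /// (function name, raw argument-object text) pairs.
    V11 { calls: Vec<(&'a str, &'a str)> },
}

const TOOL_CALLS: &str = "[TOOL_CALLS]";

/// Classify a complete (non-streaming) model output. Returns None when the
/// output lacks a leading marker or a v11+ segment has no name or no `{`.
pub fn classify_mistral(output: &str) -> Option<MistralWire<'_>> {
    let body = output.trim_start().strip_prefix(TOOL_CALLS)?;
    // Pre-v11 bodies open a JSON array right after the marker.
    if body.trim_start().starts_with('[') {
        return Some(MistralWire::PreV11 { json_array: body.trim() });
    }
    // v11+: assume each call repeats the marker, then name-then-object.
    let mut calls = Vec::new();
    for segment in body.split(TOOL_CALLS) {
        let segment = segment.trim();
        let brace = segment.find('{')?;
        let (name, args) = segment.split_at(brace);
        let name = name.trim();
        if name.is_empty() {
            return None;
        }
        calls.push((name, args));
    }
    Some(MistralWire::V11 { calls })
}
```

Splitting on the marker text mishandles an argument string that itself contains `[TOOL_CALLS]`; a production parser would likely need token boundaries or a JSON-aware scan instead of plain text search.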