docs(parity): expand harness to all 19 Dynamo parser families by keivenchang · Pull Request #9261 · ai-dynamo/dynamo

keivenchang · 2026-05-07T16:40:19Z

Overview:

#9186 shipped the M2 cross-impl parity harness with 7 parser families; this PR adds the 12 remaining families Dynamo registers that have at least one upstream peer, bringing the matrix to full registry coverage.

Details:

How is this fixed? New fixtures for pythonic, gemma4, deepseek_v3 (legacy), deepseek_v4, deepseek_v3_2, hermes, qwen25, mistral, jamba, llama3_json, phi4, and nemotron_nano; matching vLLM keys and SGLang detectors wired through tests/parity/parser/{vllm,sglang}.py; fixtures regenerated via the live Dynamo binding; cross-impl divergences recorded as KNOWN_DIVERGENCES xfails.
Divergence categories surfaced: impl-defined recovery contracts on PARSER.batch.4/.5 (malformed args / missing end-token), trailing-text drops on PARSER.batch.8 (XML-style families), and format-detection failures on sglang/mistral/* and sglang/qwen25/* (root cause not investigated here; cells recorded as X so the matrix reflects observed behavior).
No parser-side code change — lib/parsers/ untouched. Doc + harness wiring only.

Coverage matrix (as in `tests/parity/README.md`)

Cell legend: ✓ = both vLLM and SGLang match Dynamo; V = vLLM diverges (xfailed); S = SGLang diverges; VS = both diverge; n/a = no peer parser. Column legend: bN = PARSER.batch.N (1 happy-path, 2 multiple, 3 no-tool, 4 malformed, 5 missing-end-token, 6 empty-args, 7 complex-args, 8 interleaved-text, 9 empty, 10 duplicate).

family	b1	b2	b3	b4	b5	b6	b7	b8	b9	b10
Top-N models
deepseek_v4 § (DeepSeek V4)	✓	✓	✓	✓	V	✓	V	✓	✓	✓
gemma4 § (Gemma 4)	✓	✓	✓	✓	✓	✓	✓	V	✓	✓
glm47 (GLM 5.1)	✓	✓	✓	✓	VS	✓	✓	V	✓	✓
harmony † (gpt-oss)	S	S	✓	S	S	S	S	S	✓	S
kimi_k2 (Kimi K2.6)	✓	✓	✓	S	✓	✓	✓	VS	✓	✓
minimax_m2 (MiniMax 2.7)	✓	✓	✓	V	VS	✓	✓	VS	✓	✓
nemotron_deci †§ (Nemotron)	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a
qwen3_coder (Qwen 3.5)	✓	✓	✓	✓	✓	✓	✓	VS	✓	✓
Others
deepseek_v3	✓	✓	✓	V	VS	✓	✓	S	✓	✓
deepseek_v3_1	✓	✓	✓	V	VS	✓	✓	S	✓	✓
deepseek_v3_2	✓	✓	✓	✓	V	✓	V	S	✓	✓
hermes	✓	✓	✓	VS	✓	✓	✓	V	✓	✓
jamba §	✓	✓	✓	✓	V	✓	✓	V	✓	✓
llama3_json §	✓	V	✓	V	V	✓	✓	✓	✓	V
mistral	S	S	✓	VS	S	S	S	VS	✓	S
nemotron_nano †§	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a	n/a
phi4 §	✓	✓	✓	✓	V	✓	V	V	✓	✓
pythonic	✓	✓	✓	✓	✓	✓	✓	VS	✓	✓
qwen25 †	S	S	✓	S	S	S	S	S	✓	S

† = no vLLM peer (or vLLM returns UNAVAILABLE at runtime — harmony needs token IDs); § = no SGLang peer.

Verification:

Smoke against pinned containers — vllm 120/30/40, sglang 83/37/70, dynamo 190/190 (passed/xfailed/skipped); skips are families without a peer on that side.

Where should the reviewer start?

tests/parity/README.md (matrix), then tests/parity/parser/test_parity_parser.py (KNOWN_DIVERGENCES block).

Related Issues:

Relates to DIS-1906

/coderabbit profile chill

coderabbitai · 2026-05-07T16:42:57Z

Walkthrough

This PR updates the parity test suite documentation to reflect expanded parser family coverage. The README was revised to display three additional families in the results table, increase aggregate cell counts from 140 to 200, and adjust narrative text to correctly reference the broader fixture corpus and its implications for test case overlap.

Changes

Parser Family Coverage Documentation Update

Layer / File(s)	Summary
Parity Status Table `tests/parity/README.md`	The "Current parity status (M2, batch mode)" results table extended to include `pythonic`, `gemma4`, and `deepseek_v3` families; aggregate cell count increased from 140 to 200 excluding Dynamo oracle, with corresponding revisions to parity, xfail, and n/a tallies.
Documentation References `tests/parity/README.md`	Narrative text updated to state reviewers can grep case IDs across all 10 families; overlap metric against Rust suite revised from ~70 to ~100 cases to align with current fixture expansion.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title accurately reflects the main change—expanding parity harness coverage to all 19 Dynamo parser families—matching the PR objectives.
Description check	✅ Passed	The PR description provides comprehensive details including overview, implementation specifics, coverage matrix, verification results, and related issues.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/parity/README.md (2)
329-329: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the case count to match the expanded coverage.

Line 329 still references "~70 black-box tests", which was accurate for 7 families × 10 cases. With 10 families now covered, this should be ~100 to match line 380's updated reference.
📝 Proposed fix
-— for the ~70 black-box "given input X, parser returns Y" tests,
+— for the ~100 black-box "given input X, parser returns Y" tests,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/parity/README.md` at line 329, Update the README text that still reads
"~70 black-box \"given input X, parser returns Y\" tests" to reflect the
expanded coverage; change that phrase to "~100 black-box \"given input X, parser
returns Y\" tests" (or "about 100") so it matches the updated reference on line
380 and the current 10 families × 10 cases count; locate the exact string in
tests/parity/README.md and replace it accordingly.
102-102: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the test count to reflect the expanded coverage.

The cost metric still references 210 tests, which was correct for 7 families (7 families × 3 implementations × 10 cases = 210). With the addition of pythonic, gemma4, and deepseek_v3, the total is now 10 families × 3 implementations × 10 cases = 300 tests.
📊 Proposed fix
-| **Cost** | (TBD) | ~3 s for 210 tests | ~60 s for 30 tests (server boot dominates) |
+| **Cost** | (TBD) | ~3 s for 300 tests | ~60 s for 30 tests (server boot dominates) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/parity/README.md` at line 102, The README table's Cost row still lists
"210 tests"; update the test count to reflect the new total of 300 tests (10
families × 3 implementations × 10 cases) by replacing "210 tests" with "300
tests" in the table cell (the line starting with "| **Cost** | (TBD) | ~3 s for
210 tests | ..."). Ensure the text exactly reflects "300 tests".

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/parity/README.md`:
- Line 329: Update the README text that still reads "~70 black-box \"given input
X, parser returns Y\" tests" to reflect the expanded coverage; change that
phrase to "~100 black-box \"given input X, parser returns Y\" tests" (or "about
100") so it matches the updated reference on line 380 and the current 10
families × 10 cases count; locate the exact string in tests/parity/README.md and
replace it accordingly.
- Line 102: The README table's Cost row still lists "210 tests"; update the test
count to reflect the new total of 300 tests (10 families × 3 implementations ×
10 cases) by replacing "210 tests" with "300 tests" in the table cell (the line
starting with "| **Cost** | (TBD) | ~3 s for 210 tests | ..."). Ensure the text
exactly reflects "300 tests".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f478b20b-d2f4-4675-9294-b97fade3e24e

📥 Commits

Reviewing files that changed from the base of the PR and between f5a9463 and a7d8090.

📒 Files selected for processing (1)

tests/parity/README.md

keivenchang · 2026-05-07T18:24:36Z

@coderabbitai both outside-diff findings caught — fixed in 51ddee6, but with larger numbers than your suggestions because the matrix grew further in-PR after a7d8090:

Line 102 cost row: 210 → 570 (19 families × 3 impls × 10 cases, not 300 — added hermes, qwen25, mistral, jamba, llama3_json, phi4, nemotron_nano, deepseek_v3_2 after your snapshot).
Line 346 black-box test count: ~70 → ~190 (19 families × 10 cases, not ~100).

Same fix shape, scaled up. Please verify and resolve.

coderabbitai · 2026-05-07T18:25:14Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current state of the README around the claimed fix locations
grep -n "570\|210\|300\|cost\|190\|~70\|~100\|families" tests/parity/README.md | head -60
echo "---LINE 95-115---"
sed -n '95,115p' tests/parity/README.md
echo "---LINE 340-355---"
sed -n '340,355p' tests/parity/README.md

Length of output: 2558

@keivenchang: Both fixes verified in the current branch ✅

Line 102 → ~3 s for 570 tests (19 × 3 × 10 = 570) ✓
Line 346 → ~190 black-box "given input X, parser returns Y" tests (19 × 10 = 190) ✓

One minor stale reference I noticed while scanning: Line 250 still reads "across all 10 families" while Line 144 now correctly states "19 parser families total". Not blocking, but worth a one-word fix if you do another pass.

Resolving the original comments now.

[resolve]

furionw

Thank you

#9186 shipped the M2 cross-impl parity harness with 7 parser families; this PR adds the 12 remaining families Dynamo registers that have at least one upstream peer in vLLM or SGLang, bringing the matrix to full registry coverage. * New family fixtures (10 cases each): pythonic, gemma4, deepseek_v3 (legacy), deepseek_v4, deepseek_v3_2, hermes, qwen25, mistral, jamba, llama3_json, phi4, nemotron_nano. Two with no upstream peer (nemotron_deci, nemotron_nano) listed for completeness. * Wrapper map additions in tests/parity/parser/{vllm,sglang}.py for the matching peer parsers / detectors. * KNOWN_DIVERGENCES: ~50 new entries from smoke runs in vllm + sglang containers. Three categories: impl-defined recovery contracts on PARSER.batch.4/5 (malformed args / missing end-token), trailing-text drops on PARSER.batch.8 (XML-style families), and format-detection failures on sglang/mistral/* and sglang/qwen25/* (those detectors return calls=[] on the bare wire formats Dynamo emits — root cause not investigated here, cells recorded as X so the matrix reflects observed behavior). * README matrix restructured into a single table with bold Top-N models / Others section dividers; family names use compact bN column headers with a bN = PARSER.batch.N legend above; row labels carry (model name) annotations on the Top-N tier; whole-impl-na rows carry footnote markers (no vLLM peer / no SGLang peer). No parser-side code change — lib/parsers/ untouched. Doc + harness wiring only. Smoke against pinned containers (vllm 0.20.1, sglang 0.5.10.post1): * vllm: 120 passed / 30 xfailed / 40 skipped * sglang: 83 passed / 37 xfailed / 70 skipped * dynamo: 190/190 passed (oracle) Tally (excluding dynamo): 380 cells across 19 families x 10 cases x 2 impls -- 203 parity, 67 divergence (xfailed), 110 n/a. Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

keivenchang self-assigned this May 7, 2026

pull-request-size Bot added the size/S label May 7, 2026

github-actions Bot added documentation Improvements or additions to documentation docs labels May 7, 2026

dynamo-ops reviewed May 7, 2026

View reviewed changes

Comment thread tests/parity/README.md Outdated

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

pull-request-size Bot added size/M and removed size/S labels May 7, 2026

keivenchang changed the title ~~docs(parity): refresh coverage matrix for 3 families merged in #9186~~ docs(parity): refresh coverage matrix for 3 families May 7, 2026

keivenchang changed the title ~~docs(parity): refresh coverage matrix for 3 families~~ docs(parity): refresh coverage matrix for parser families May 7, 2026

keivenchang changed the title ~~docs(parity): refresh coverage matrix for parser families~~ docs(parser): refresh coverage matrix for parser families May 7, 2026

keivenchang requested review from a team as code owners May 7, 2026 18:00

pull-request-size Bot added size/L and removed size/M labels May 7, 2026

copy-pr-bot Bot temporarily deployed to GITLAB May 7, 2026 18:00 Inactive

pull-request-size Bot added size/XXL and removed size/L labels May 7, 2026

copy-pr-bot Bot temporarily deployed to GITLAB May 7, 2026 18:15 Inactive

keivenchang changed the title ~~docs(parser): refresh coverage matrix for parser families~~ docs(parity): expand harness to all 19 Dynamo parser families May 7, 2026

copy-pr-bot Bot temporarily deployed to GITLAB May 7, 2026 18:44 Inactive

furionw approved these changes May 7, 2026

View reviewed changes

keivenchang force-pushed the keivenchang/DIS-1906__matrix-update-3-new-families branch from 905c817 to 3abad8a Compare May 7, 2026 18:56

copy-pr-bot Bot temporarily deployed to GITLAB May 7, 2026 18:56 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB May 7, 2026 19:48 Inactive

keivenchang marked this pull request as draft May 8, 2026 02:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(parity): expand harness to all 19 Dynamo parser families#9261

docs(parity): expand harness to all 19 Dynamo parser families#9261
keivenchang wants to merge 1 commit intomainfrom
keivenchang/DIS-1906__matrix-update-3-new-families

keivenchang commented May 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

coderabbitai Bot commented May 7, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

keivenchang commented May 7, 2026

Uh oh!

coderabbitai Bot commented May 7, 2026

Uh oh!

furionw left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

keivenchang commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Coverage matrix (as in tests/parity/README.md)

Verification:

Where should the reviewer start?

Related Issues:

Uh oh!

Uh oh!

coderabbitai Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

keivenchang commented May 7, 2026

Uh oh!

coderabbitai Bot commented May 7, 2026

Uh oh!

furionw left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

keivenchang commented May 7, 2026 •

edited

Loading

Coverage matrix (as in `tests/parity/README.md`)

coderabbitai Bot commented May 7, 2026 •

edited

Loading