
test: split vLLM tests by component for parallel CI fan-out#9293

Open
nv-rinig wants to merge 10 commits into main from
rini/ops-4856

Conversation


@nv-rinig nv-rinig commented May 7, 2026

Overview:

Split the vLLM test suite into component-based buckets (core, multimodal, router, kvbm) so CI can fan out one job per (backend × component), reducing PR pipeline time. Adds the core marker, tags vLLM tests in tests/** with exactly one component marker, and updates pr.yaml to run 9 vLLM test jobs in parallel instead of 2 sequential ones.

This is the first cut of OPS-4856. Scope is limited to tests/** and vLLM only — sglang/trtllm and components/src/dynamo/<backend>/tests/** are deferred to follow-up PRs.

Details:

New marker: core

  • Registered in pyproject.toml.
  • Documented in tests/README.md and .ai/pytest-guidelines.md (a component marker is now required for backend tests).

Tests tagged in tests/** — every vLLM test now carries exactly one of {core, multimodal, router, kvbm}:

  • multimodal — tests/serve/test_vllm_omni.py, multimodal configs in test_vllm.py/test_vllm_xpu.py, factory tag in tests/utils/multimodal.py.
  • router — test_router_e2e_with_vllm.py, agg-router* configs, test_lora_aggregated_router.
  • kvbm — backfilled missing markers across tests/kvbm_integration/ and tests/dependencies/test_kvbm_imports.py.
  • core — residual: serve, frontend, fault_tolerance, predownload, parser, canary, etc.
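The tagging convention above can be sketched as a module-level pytestmark; the marker names are from this PR, but the checker helper is purely illustrative, not something that exists in the repo:

```python
import pytest

# Illustrative sketch: a vLLM test module carries its framework marker
# plus exactly one component marker from the new bucket set.
pytestmark = [
    pytest.mark.vllm,  # framework marker
    pytest.mark.core,  # component marker: one of core/multimodal/router/kvbm
]

COMPONENT_MARKERS = {"core", "multimodal", "router", "kvbm"}


def component_markers(marks):
    """Return the component marker names present in a pytestmark list."""
    return [m.name for m in marks if m.name in COMPONENT_MARKERS]
```

A lint step (hypothetical) could then assert `len(component_markers(pytestmark)) == 1` to enforce the "exactly one" rule.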

CI changes (.github/workflows/pr.yaml) — 2 vLLM test jobs → 9:

  • vllm-test-cpu — gpu_0 only (renamed from vllm-test).
  • vllm-test-{core,multimodal,router,kvbm} — gpu_1 fan-out.
  • vllm-multi-gpu-{core,multimodal,router,kvbm} — (gpu_2 or gpu_4) fan-out.
  • run_gpu_parallel_tests is enabled for buckets that contain profiled_vram_gib-tagged tests (core, multimodal, router); kvbm has none, so it stays sequential.
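A quick sanity check of the fan-out arithmetic — the job names below mirror the buckets described above, but the real job definitions live in pr.yaml:

```python
# Sketch of the 2 -> 9 job expansion: one CPU job plus one job per
# (gpu tier x component) bucket.
components = ["core", "multimodal", "router", "kvbm"]

single_gpu = [f"vllm-test-{c}" for c in components]      # gpu_1 fan-out
multi_gpu = [f"vllm-multi-gpu-{c}" for c in components]  # gpu_2/gpu_4 fan-out
jobs = ["vllm-test-cpu"] + single_gpu + multi_gpu

print(len(jobs))  # 9 jobs replace the previous 2 sequential ones
```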

Misc cleanup bundled in:

  • Bumped the black pre-commit pin from 23.1.0 to 26.3.1 (older versions crash on Python 3.12+ via removed ast.Str).
  • Added warning filters in pyproject.toml for RequestsDependencyWarning and a blanket DeprecationWarning ignore so collection survives transient dep mismatches on newer Pythons.
  • Replaced one usage of deprecated asyncio.iscoroutinefunction with inspect.iscoroutinefunction in lib/bindings/python/tests/test_deprecated_enable_nats.py (Python 3.16 prep).
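The swap is mechanical; a minimal before/after check (function names here are illustrative, not from the repo):

```python
import inspect


async def fetch_status():
    return "ok"


def sync_status():
    return "ok"


# inspect.iscoroutinefunction is the forward-compatible spelling;
# asyncio.iscoroutinefunction is deprecated and slated for removal.
assert inspect.iscoroutinefunction(fetch_status)
assert not inspect.iscoroutinefunction(sync_status)
```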

Where should the reviewer start?

  1. pyproject.toml — core marker registration plus warning-filter additions.
  2. .github/workflows/pr.yaml — the new 9-job vLLM fan-out structure. Most reviewable risk lives here.
  3. .ai/pytest-guidelines.md — convention update so contributors and AI agents writing new tests follow the component-marker rule going forward.
  4. Per-file marker additions — mostly mechanical; heaviest are tests/serve/test_vllm.py / test_vllm_xpu.py (per-config marks) and tests/fault_tolerance/migration/test_vllm.py.

Related Issues:

  • Addresses Linear ticket OPS-4856

Follow-up TODOs (not in this PR):

  • Apply the same component split to sglang/trtllm.
  • Tag components/src/dynamo/<backend>/tests/** (intentionally out of OPS-4856 scope).
  • Possibly peel fault_tolerance off into its own bucket once we measure runtime — core ∩ fault_tolerance is ~397 of 526 core tests, mostly the migration parametrize matrix.


Summary by CodeRabbit

Release Notes

  • New Features

    • Added new multimodal inference tests for base64-encoded image URLs and frontend decoding paths
  • Testing & CI

    • Reorganized CI pipeline with component-focused test jobs (CPU, core, multimodal, router, KVBM)
    • Introduced new test component marker system for improved test categorization and organization
  • Chores

    • Updated code formatting tool to latest version

@nv-rinig nv-rinig requested review from a team as code owners May 7, 2026 23:45

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions Bot added the test, documentation (Improvements or additions to documentation), actions, and xpu labels on May 7, 2026

coderabbitai Bot commented May 7, 2026


Walkthrough

This pull request introduces a mandatory pytest component marker system (core, multimodal, router, kvbm) to classify tests by functionality, refactors the CI pipeline to fan out vLLM test jobs per component, and applies these markers across the test suite to enable component-specific test execution in CI.

Changes

Pytest Component Marker System & CI Pipeline Refactoring

Layer / File(s) Summary
Marker Definition & Configuration
pyproject.toml
Adds core pytest marker definition and ignores RequestsDependencyWarning and DeprecationWarning to prevent test failures from transitive dependency warnings.
Marker Documentation & Guidelines
.ai/pytest-guidelines.md, tests/README.md
Documents mandatory component marker requirement (pick exactly one of core/multimodal/router/kvbm), clarifies relationship to framework markers (vllm/trtllm/sglang), explains CI behavior when framework marker present without component marker, and updates marker reference examples.
CI Pipeline Job Decomposition
.github/workflows/pr.yaml
Replaces single vllm-test and vllm-multi-gpu-test jobs with component-specific jobs: vllm-test-cpu, vllm-test-{core,multimodal,router,kvbm}, and vllm-multi-gpu-{core,multimodal,router,kvbm}. Updates backend-status-check, image-copy, cleanup, and reporting job dependencies accordingly.
Core Component Test Tagging
tests/basic/*, tests/fault_tolerance/*, tests/frontend/*, tests/gpu_memory_service/*, tests/serve/test_vllm.py, tests/test_predownload_models.py
Applies @pytest.mark.core to test modules and VLLMConfig entries across fault-tolerance, frontend, GPU memory, and serve test suites to classify tests exercising core/runtime functionality.
Multimodal/Router/KVBM Test Tagging
tests/kvbm_integration/*, tests/parity/parser/test_parity_parser.py, tests/serve/test_vllm.py, tests/serve/test_vllm_omni.py, tests/serve/test_vllm_xpu.py
Applies @pytest.mark.multimodal, @pytest.mark.router, and @pytest.mark.kvbm markers to test files, test functions, and VLLMConfig entries for feature-specific test execution. Introduces new multimodal base64 end-to-end tests in serve suite and updates parity parser to use multi-line marker format.
Tool Updates & Utilities
.pre-commit-config.yaml, lib/bindings/python/tests/test_deprecated_enable_nats.py, tests/utils/multimodal.py
Updates black hook from 23.1.0 to 26.3.1 for Python 3.12 compatibility, switches coroutine detection from asyncio.iscoroutinefunction to inspect.iscoroutinefunction, and extends multimodal config utility to automatically apply pytest.mark.multimodal marker.

🎯 Review effort: 3 (Moderate) | ⏱️ Estimated review time: ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 60.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed: the title clearly and concisely summarizes the main objective (splitting vLLM tests by component for parallel CI fan-out), which aligns with the primary changes across all modified files.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Description check — ✅ Passed: the PR description is comprehensive and follows the template structure, providing overview, detailed changes, reviewer guidance, and related issues.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/kvbm_integration/test_consolidator_router_e2e.py (1)

36-57: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dead pytestmark — the intended vllm/trtllm markers never reach the effective marker set, silently excluding these tests from CI component jobs

Python last-write-wins: the assignment at lines 36-40 is overwritten by the one at lines 50-57. The effective pytestmark is therefore the one at lines 50-57, which contains neither pytest.mark.vllm nor pytest.mark.trtllm. None of the test methods in this file carry those framework markers either.

As a result, any CI job that filters with -m "vllm and kvbm" or -m "trtllm and kvbm" will silently skip every test in this file — exactly the opposite of what the PR intends.

🐛 Proposed fix
-pytestmark = [
-    pytest.mark.kvbm,
-    pytest.mark.vllm,
-    pytest.mark.trtllm,
-]
-
 # Build list of available engines for parameterization
 AVAILABLE_ENGINES = []
 if HAS_VLLM:
     AVAILABLE_ENGINES.append("vllm")
 if HAS_TRTLLM:
     AVAILABLE_ENGINES.append("trtllm")

-# Test markers
 pytestmark = [
     pytest.mark.kvbm,
     pytest.mark.e2e,
     pytest.mark.slow,
     pytest.mark.gpu_1,
     pytest.mark.pre_merge,
+    pytest.mark.vllm,
+    pytest.mark.trtllm,
     pytest.mark.skipif(not (HAS_VLLM or HAS_TRTLLM), reason="requires vllm or trtllm"),
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/kvbm_integration/test_consolidator_router_e2e.py` around lines 36 - 57,
The file defines two separate pytestmark assignments so the initial
pytest.mark.vllm / pytest.mark.trtllm assignment is overwritten; update the
final pytestmark construction to include the vllm/trtllm markers conditionally
instead of reassigning, e.g. keep AVAILABLE_ENGINES logic and then build the
single pytestmark list (or extend the existing pytestmark list) by adding
pytest.mark.vllm when HAS_VLLM is true and pytest.mark.trtllm when HAS_TRTLLM is
true so the final pytestmark used by the tests contains those framework markers.
tests/README.md (1)

124-137: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the non-vLLM examples to include a component marker.

This table now says backend tests must carry exactly one of core|multimodal|router|kvbm, but the SGLang/TRT-LLM examples below still show framework-only marker blocks. That teaches authors a shape that CI component jobs will silently drop.

As per coding guidelines, .ai/pytest-guidelines.md: "if a test has only a framework marker but no component marker, it’s silently dropped from CI component bucket jobs."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/README.md` around lines 124 - 137, The README examples for non-vLLM
frameworks (e.g., the SGLang and TRT-LLM example blocks) lack a required
component marker; update those example test snippets to include exactly one
component marker (choose from pytest.mark.core, pytest.mark.multimodal,
pytest.mark.router, or pytest.mark.kvbm) alongside the existing framework
markers (e.g., pytest.mark.sglang, pytest.mark.trtllm) so they match the table
guidance and CI component-bucketing rules.
tests/serve/test_vllm_xpu.py (1)

615-620: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add model and sizing markers to the standalone multimodal tests.

Both tests build VLLMConfig(..., marks=[]) inside the body, so params_with_model_mark() never annotates them. That means predownload_models cannot prefetch the checkpoint, and the scheduler has no VRAM/KV metadata for these runs.

As per coding guidelines, tests/**/*.py: "Use the @pytest.mark.model("org/model-name") marker to declare HF models used in tests" and "Tests that load a model must carry both @pytest.mark.profiled_vram_gib(N) marker and a framework-specific KV marker ...".

Also applies to: 672-677

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/serve/test_vllm_xpu.py` around lines 615 - 620, The two standalone
multimodal tests that construct VLLMConfig(...) inside their bodies lack
external model and sizing markers, so params_with_model_mark() cannot annotate
them; update each test (the ones marked with `@pytest.mark.vllm`
`@pytest.mark.multimodal` `@pytest.mark.e2e` `@pytest.mark.xpu_2` `@pytest.mark.nightly`
...) to include a pytest model marker (e.g.,
`@pytest.mark.model`("hf-org/model-name")) plus the required sizing markers:
`@pytest.mark.profiled_vram_gib`(N) with the correct GiB value and the
framework-specific KV marker used in our repo for vllm (e.g.,
`@pytest.mark.kv_vllm` or the equivalent), so predownload_models and the scheduler
receive checkpoint and VRAM/KV metadata; leave the VLLMConfig(...) construction
inside the test body unchanged.
🧹 Nitpick comments (2)
tests/test_predownload_models.py (1)

25-56: ⚡ Quick win

sglang and trtllm parametrize variants are missing a component marker

The vllm variants now have pytest.mark.core, but the sglang and trtllm variants (lines 33–38 and 49–55) still carry only their framework markers with no component marker. Per the coding guidelines, any test with a vllm/trtllm/sglang framework marker must have exactly one component marker. These predownload tests belong in core.

🔧 Proposed fix
         pytest.param(
             "predownload_models_sglang_gpu1",
-            marks=[pytest.mark.sglang, pytest.mark.e2e, pytest.mark.gpu_1],
+            marks=[pytest.mark.sglang, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_1],
         ),
         pytest.param(
             "predownload_models_trtllm_gpu1",
-            marks=[pytest.mark.trtllm, pytest.mark.e2e, pytest.mark.gpu_1],
+            marks=[pytest.mark.trtllm, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_1],
         ),
         pytest.param(
             "predownload_models_sglang_gpu2",
-            marks=[pytest.mark.sglang, pytest.mark.e2e, pytest.mark.gpu_2],
+            marks=[pytest.mark.sglang, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_2],
         ),
         pytest.param(
             "predownload_models_trtllm_gpu2",
-            marks=[pytest.mark.trtllm, pytest.mark.e2e, pytest.mark.gpu_2],
+            marks=[pytest.mark.trtllm, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_2],
         ),

As per coding guidelines: "When a test has a framework marker (vllm, trtllm, or sglang), it must have exactly one component marker: multimodal, router, kvbm, or core."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_predownload_models.py` around lines 25 - 56, The sglang/trtllm
parametrized test entries ("predownload_models_sglang_gpu1",
"predownload_models_trtllm_gpu1", "predownload_models_sglang_gpu2",
"predownload_models_trtllm_gpu2") are missing a component marker; update each
pytest.param's marks list to include pytest.mark.core so that every test with a
framework marker (sglang or trtllm) has exactly one component marker (core)
matching the vllm entries.
.ai/pytest-guidelines.md (1)

328-332: ⚡ Quick win

The "Example" section at the end of the file still shows a non-compliant pattern

The GOOD block in the timeout section (lines 328-332) now correctly shows @pytest.mark.vllm + @pytest.mark.core, but the standalone "Example" section at lines 348–357 still has only @pytest.mark.vllm with no component marker — contradicting the new rule documented just above it. A developer skimming to the example section will copy a non-compliant template.

📝 Proposed fix for the example section
 @pytest.mark.pre_merge
 @pytest.mark.gpu_1
 @pytest.mark.e2e
 @pytest.mark.vllm
+@pytest.mark.core  # component bucket: pick one of core, multimodal, router, kvbm
 @pytest.mark.model("Qwen/Qwen3-0.6B")
 @pytest.mark.timeout(300)
 def test_vllm_aggregated(start_serve_deployment):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.ai/pytest-guidelines.md around lines 328 - 332, The example block under the
"Example" section still shows only `@pytest.mark.vllm` and must be updated to
include a component marker per the rule; update the standalone example to match
the GOOD pattern by adding a component marker (e.g., `@pytest.mark.core` or the
appropriate component such as multimodal/router/kvbm) alongside
`@pytest.mark.vllm` in that example snippet so the template is compliant with the
documented requirement.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/pr.yaml:
- Around line 625-633: The vllm-copy-to-acr job is allowed to run once only CPU
tests pass, so add the GPU-split test jobs as dependencies and require them to
be successful or skipped before promoting the image; update the job's needs list
to include vllm-test-core, vllm-test-router, vllm-test-kvbm (and any other
vllm-test-* GPU jobs) and extend the if condition to check that each of those
needs.<result> is 'success' or 'skipped' (similar to how vllm-test-cpu is
checked) so the job cannot proceed until all split GPU bucket tests finish
validating the PR.

In `@pyproject.toml`:
- Around line 236-238: Remove the blanket "ignore::DeprecationWarning" pytest
filter and replace it with targeted ignores for specific third-party warnings
(for example by adding entries like "ignore:SomeThirdPartyClassName:module.path"
or "ignore:deprecated function X:third.party.module") so that pytest keeps
deprecation warnings from our own code as errors; locate the existing filter
string "ignore::DeprecationWarning" in pyproject.toml and replace it with
narrowly-scoped "ignore:" entries and/or comments explaining each exception per
the project guideline.
- Line 273: The pytest `core` marker is registered in pyproject.toml but not
added in tests/conftest.py, causing strict marker enforcement to fail; update
the pytest_configure function in tests/conftest.py to call
config.addinivalue_line for the "core: marks tests that exercise core/runtime
functionality (residual bucket — disjoint from multimodal/router/kvbm/planner)"
marker alongside the existing addinivalue_line calls so the marker is defined in
both places (look for the pytest_configure function and the other
config.addinivalue_line entries to mirror).

In `@tests/kvbm_integration/test_chunked_prefill.py`:
- Around line 20-24: The first pytestmark assignment (the list containing
pytest.mark.kvbm and pytest.mark.vllm) is dead because it gets overwritten later
by another pytestmark; remove the redundant early assignment so there is only
one pytestmark in the module, or merge any missing markers into the existing
pytestmark variable; specifically update/remove the initial pytestmark
declaration so only the later pytestmark remains (refer to the pytestmark
variable in this test module).

In `@tests/kvbm_integration/test_kvbm_vllm_integration.py`:
- Around line 23-27: The module-level pytest mark assignment named pytestmark is
declared twice in this file (the earlier assignment is immediately overwritten
by the later one), so remove the dead redundancy by deleting the first
pytestmark = [ pytest.mark.kvbm, pytest.mark.vllm ] block; keep the later
pytestmark that already includes both pytest.mark.kvbm and pytest.mark.vllm so
the tests retain the intended marks (look for the pytestmark variable in this
test_kvbm_vllm_integration.py module to locate and remove the redundant earlier
assignment).

In `@tests/README.md`:
- Line 117: The README table's "Test Type [required]" cell currently lists
"unit, integration, e2e, benchmark, performance, stress" which incorrectly
treats benchmark/performance/stress as required; change that cell so the
required set only includes "unit, integration, e2e" and move or mention
"benchmark, performance, stress" as optional/extra markers elsewhere in the row
or a footnote to match the repo rule that every test must have at least one of
unit/integration/e2e; update the table cell content string "unit, integration,
e2e, benchmark, performance, stress" to "unit, integration, e2e" and add a short
parenthetical or separate column/footnote listing the extra markers.

In `@tests/serve/test_vllm_xpu.py`:
- Around line 483-488: The test's marks list (in tests/serve/test_vllm_xpu.py
where pytest marks like pytest.mark.xpu_1, pytest.mark.multimodal etc. are
declared for the multimodal_video_agg test) is missing the required resource
profiling markers; add pytest.mark.profiled_vram_gib(<appropriate_gib>) and the
framework KV cache marker (e.g.,
pytest.mark.requested_vllm_kv_cache_bytes(<appropriate_bytes>)) to the same
marks list so the scheduler can safely size and budget the model during
pre-merge runs.

In `@tests/serve/test_vllm.py`:
- Around line 683-807: Both standalone tests (test_multimodal_b64 and
test_multimodal_b64_frontend_decoding) build VLLMConfig with marks=[] so
params_with_model_mark() is skipped; add the missing pytest markers: annotate
each test function with `@pytest.mark.model`("org/model-name") using the
appropriate HF model string (e.g., same strings passed to
VLLMConfig.model/script_args) and add `@pytest.mark.profiled_vram_gib`(N) plus the
framework KV marker `@pytest.mark.requested_vllm_kv_cache_bytes`(BYTES) with
suitable values for the GPU fan-out; update the decorators above
test_multimodal_b64 to include the model and both VRAM/KV markers and likewise
add `@pytest.mark.model` and the VRAM/KV markers above
test_multimodal_b64_frontend_decoding to ensure params_with_model_mark() runs
and the new GPU fan-out receives the required metadata.
- Around line 683-807: The tests use base64.b64encode and
get_multimodal_test_image_bytes but those symbols are not imported; add the
missing imports (import base64 and the get_multimodal_test_image_bytes helper
from its test helpers module) at the top of the module so base64.b64encode(...)
and get_multimodal_test_image_bytes() resolve and Ruff F821 is fixed, keeping
imports grouped at module-level per guidelines.

---

Outside diff comments:
In `@tests/kvbm_integration/test_consolidator_router_e2e.py`:
- Around line 36-57: The file defines two separate pytestmark assignments so the
initial pytest.mark.vllm / pytest.mark.trtllm assignment is overwritten; update
the final pytestmark construction to include the vllm/trtllm markers
conditionally instead of reassigning, e.g. keep AVAILABLE_ENGINES logic and then
build the single pytestmark list (or extend the existing pytestmark list) by
adding pytest.mark.vllm when HAS_VLLM is true and pytest.mark.trtllm when
HAS_TRTLLM is true so the final pytestmark used by the tests contains those
framework markers.

In `@tests/README.md`:
- Around line 124-137: The README examples for non-vLLM frameworks (e.g., the
SGLang and TRT-LLM example blocks) lack a required component marker; update
those example test snippets to include exactly one component marker (choose from
pytest.mark.core, pytest.mark.multimodal, pytest.mark.router, or
pytest.mark.kvbm) alongside the existing framework markers (e.g.,
pytest.mark.sglang, pytest.mark.trtllm) so they match the table guidance and CI
component-bucketing rules.

In `@tests/serve/test_vllm_xpu.py`:
- Around line 615-620: The two standalone multimodal tests that construct
VLLMConfig(...) inside their bodies lack external model and sizing markers, so
params_with_model_mark() cannot annotate them; update each test (the ones marked
with `@pytest.mark.vllm` `@pytest.mark.multimodal` `@pytest.mark.e2e`
`@pytest.mark.xpu_2` `@pytest.mark.nightly` ...) to include a pytest model marker
(e.g., `@pytest.mark.model`("hf-org/model-name")) plus the required sizing
markers: `@pytest.mark.profiled_vram_gib`(N) with the correct GiB value and the
framework-specific KV marker used in our repo for vllm (e.g.,
`@pytest.mark.kv_vllm` or the equivalent), so predownload_models and the scheduler
receive checkpoint and VRAM/KV metadata; leave the VLLMConfig(...) construction
inside the test body unchanged.

---

Nitpick comments:
In @.ai/pytest-guidelines.md:
- Around line 328-332: The example block under the "Example" section still shows
only `@pytest.mark.vllm` and must be updated to include a component marker per the
rule; update the standalone example to match the GOOD pattern by adding a
component marker (e.g., `@pytest.mark.core` or the appropriate component such as
multimodal/router/kvbm) alongside `@pytest.mark.vllm` in that example snippet so
the template is compliant with the documented requirement.

In `@tests/test_predownload_models.py`:
- Around line 25-56: The sglang/trtllm parametrized test entries
("predownload_models_sglang_gpu1", "predownload_models_trtllm_gpu1",
"predownload_models_sglang_gpu2", "predownload_models_trtllm_gpu2") are missing
a component marker; update each pytest.param's marks list to include
pytest.mark.core so that every test with a framework marker (sglang or trtllm)
has exactly one component marker (core) matching the vllm entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d3701ef3-c615-49cb-b96a-4a610499caa4

📥 Commits

Reviewing files that changed from the base of the PR and between 73bc969 and 6ed1ec7.

📒 Files selected for processing (30)
  • .ai/pytest-guidelines.md
  • .github/workflows/pr.yaml
  • .pre-commit-config.yaml
  • lib/bindings/python/tests/test_deprecated_enable_nats.py
  • pyproject.toml
  • tests/README.md
  • tests/basic/test_cuda_version_consistency.py
  • tests/dependencies/test_kvbm_imports.py
  • tests/fault_tolerance/cancellation/test_vllm.py
  • tests/fault_tolerance/etcd_ha/test_vllm.py
  • tests/fault_tolerance/migration/test_vllm.py
  • tests/fault_tolerance/test_canary_rank_pause.py
  • tests/fault_tolerance/test_vllm_health_check.py
  • tests/frontend/test_prepost.py
  • tests/frontend/test_prepost_mistral.py
  • tests/frontend/test_prompt_embeds.py
  • tests/frontend/test_vllm.py
  • tests/frontend/test_vllm_prepost_integration.py
  • tests/gpu_memory_service/test_quiesce_resume.py
  • tests/gpu_memory_service/test_shadow_failover.py
  • tests/kvbm_integration/test_chunked_prefill.py
  • tests/kvbm_integration/test_consolidator_router_e2e.py
  • tests/kvbm_integration/test_determinism_agg.py
  • tests/kvbm_integration/test_kvbm_vllm_integration.py
  • tests/parity/parser/test_parity_parser.py
  • tests/serve/test_vllm.py
  • tests/serve/test_vllm_omni.py
  • tests/serve/test_vllm_xpu.py
  • tests/test_predownload_models.py
  • tests/utils/multimodal.py

Comment thread .github/workflows/pr.yaml
Comment on lines 625 to 633
 vllm-copy-to-acr:
   name: vllm-runtime # This name overlaps with other vllm jobs to group them in the UI
-  needs: [changed-files, vllm-build, vllm-test]
+  needs: [changed-files, vllm-build, vllm-test-cpu]
   if: |
     always() && !cancelled() &&
     (needs.changed-files.outputs.core == 'true' || needs.changed-files.outputs.vllm == 'true' || needs.changed-files.outputs.deploy == 'true') &&
     needs.vllm-build.result == 'success' &&
-    (needs.vllm-test.result == 'success' || needs.vllm-test.result == 'skipped')
+    (needs.vllm-test-cpu.result == 'success' || needs.vllm-test-cpu.result == 'skipped')
   uses: ./.github/workflows/shared-copy.yml

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't promote the vLLM image before the split GPU buckets finish.

vllm-copy-to-acr now advances after CPU-only coverage, so deploy tests can start while vllm-test-core, vllm-test-router, vllm-test-kvbm, etc. are still failing or pending. That burns deploy capacity on an image the PR has not finished validating.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/pr.yaml around lines 625 - 633, The vllm-copy-to-acr job
is allowed to run once only CPU tests pass, so add the GPU-split test jobs as
dependencies and require them to be successful or skipped before promoting the
image; update the job's needs list to include vllm-test-core, vllm-test-router,
vllm-test-kvbm (and any other vllm-test-* GPU jobs) and extend the if condition
to check that each of those needs.<result> is 'success' or 'skipped' (similar to
how vllm-test-cpu is checked) so the job cannot proceed until all split GPU
bucket tests finish validating the PR.

Comment thread pyproject.toml
Comment on lines +236 to +238
# Ignore all DeprecationWarnings — many come from third-party deps we don't control,
# and treating them as errors blocks pytest collection on Python version bumps.
"ignore::DeprecationWarning",

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep deprecation filtering targeted.

ignore::DeprecationWarning disables the warning-as-error policy for every deprecation, including ones introduced by our own code. That throws away a lot more signal than the existing per-warning ignores.

As per coding guidelines, pyproject.toml: "Use filterwarnings = ["error"] in pytest configuration with specific ignores for known third-party deprecations; fix root causes for new warnings or add targeted ignores with explanatory comments in pyproject.toml".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 236 - 238, Remove the blanket
"ignore::DeprecationWarning" pytest filter and replace it with targeted ignores
for specific third-party warnings (for example by adding entries like
"ignore:SomeThirdPartyClassName:module.path" or "ignore:deprecated function
X:third.party.module") so that pytest keeps deprecation warnings from our own
code as errors; locate the existing filter string "ignore::DeprecationWarning"
in pyproject.toml and replace it with narrowly-scoped "ignore:" entries and/or
comments explaining each exception per the project guideline.
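The same targeted pattern can be illustrated with the stdlib warnings machinery directly; the message text below is a hypothetical stand-in for a real third-party deprecation:

```python
import warnings

# Keep the error-by-default policy...
warnings.simplefilter("error")
# ...but carve out one specific deprecation. The message pattern is a
# hypothetical stand-in for a real third-party warning text.
warnings.filterwarnings(
    "ignore",
    message=r"legacy_call is deprecated",
    category=DeprecationWarning,
)

# The carved-out warning is silently ignored:
warnings.warn("legacy_call is deprecated", DeprecationWarning)

# Any other deprecation (e.g. from our own code) still raises:
raised = False
try:
    warnings.warn("our_api is deprecated", DeprecationWarning)
except DeprecationWarning:
    raised = True
print(raised)  # True
```

Pytest's `filterwarnings` entries use the same `action:message:category:module` matching, so each `ignore:` line in pyproject.toml can be scoped just as narrowly.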

Comment thread pyproject.toml
"performance: marks tests as performance tests",
"benchmark: marks tests as benchmark tests",
"none: marks tests that do not require a framework-specific runtime",
"core: marks tests that exercise core/runtime functionality (residual bucket — disjoint from multimodal/router/kvbm/planner)",

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

fd -p 'conftest.py' tests | while read -r file; do
  echo "=== $file ==="
  rg -n 'pytest_configure|addinivalue_line|core:' "$file" || true
done

Repository: ai-dynamo/dynamo



🏁 Script executed:

# Check the pytest_configure function in tests/conftest.py
cd tests && head -n 250 conftest.py | tail -n 100



🏁 Script executed:

# Check pyproject.toml around line 273 for the core marker
head -n 280 pyproject.toml | tail -n 20



🏁 Script executed:

# Check if there are addinivalue_line calls for markers in conftest.py
rg -A 2 'addinivalue_line.*markers' tests/conftest.py



🏁 Script executed:

# Check if any markers are registered in conftest.py at all
rg 'config.addinivalue_line|pytest.mark|pytest.ini' tests/conftest.py -A 1



🏁 Script executed:

# Check the complete pytest_configure function signature and implementation
rg -A 50 'def pytest_configure' tests/conftest.py | head -n 60



Add core marker to tests/conftest.py:pytest_configure.

The core marker is registered in pyproject.toml (line 273) but missing from tests/conftest.py:pytest_configure. Strict marker enforcement requires registration in both locations. Add an addinivalue_line call for the core marker in the pytest_configure function alongside other marker definitions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` at line 273, The pytest `core` marker is registered in
pyproject.toml but not added in tests/conftest.py, causing strict marker
enforcement to fail; update the pytest_configure function in tests/conftest.py
to call config.addinivalue_line for the "core: marks tests that exercise
core/runtime functionality (residual bucket — disjoint from
multimodal/router/kvbm/planner)" marker alongside the existing addinivalue_line
calls so the marker is defined in both places (look for the pytest_configure
function and the other config.addinivalue_line entries to mirror).

Comment on lines +20 to +24
pytestmark = [
    pytest.mark.kvbm,
    pytest.mark.vllm,
]


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dead pytestmark assignment — overwritten by the existing one at line 47

Same issue as test_kvbm_vllm_integration.py: the assignment at lines 20-23 is overwritten by the one at line 47. The second pytestmark already has both kvbm and vllm, so these lines have no effect.

🔧 Proposed fix
-pytestmark = [
-    pytest.mark.kvbm,
-    pytest.mark.vllm,
-]
-
 # Test configuration
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/kvbm_integration/test_chunked_prefill.py` around lines 20 - 24, The
first pytestmark assignment (the list containing pytest.mark.kvbm and
pytest.mark.vllm) is dead because it gets overwritten later by another
pytestmark; remove the redundant early assignment so there is only one
pytestmark in the module, or merge any missing markers into the existing
pytestmark variable; specifically update/remove the initial pytestmark
declaration so only the later pytestmark remains (refer to the pytestmark
variable in this test module).

Comment on lines +23 to +27
pytestmark = [
    pytest.mark.kvbm,
    pytest.mark.vllm,
]


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dead pytestmark assignment — overwritten by the existing one at line 48

Python assigns module-level names top-to-bottom; the assignment at lines 23-26 is immediately overwritten when execution reaches line 48. Pytest reads only the final value of module.pytestmark, so these four lines have no effect. The second pytestmark at line 48 already contains both pytest.mark.kvbm and pytest.mark.vllm, making this block entirely redundant.

🔧 Proposed fix
-pytestmark = [
-    pytest.mark.kvbm,
-    pytest.mark.vllm,
-]
-
 HAS_VLLM = check_module_available("vllm")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/kvbm_integration/test_kvbm_vllm_integration.py` around lines 23 - 27,
The module-level pytest mark assignment named pytestmark is declared twice in
this file (the earlier assignment is immediately overwritten by the later one),
so remove the dead redundancy by deleting the first pytestmark = [
pytest.mark.kvbm, pytest.mark.vllm ] block; keep the later pytestmark that
already includes both pytest.mark.kvbm and pytest.mark.vllm so the tests retain
the intended marks (look for the pytestmark variable in this
test_kvbm_vllm_integration.py module to locate and remove the redundant earlier
assignment).
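The surviving module then carries a single marker list; a minimal sketch of the intended end state (any extra markers from the later assignment would be merged into this one list):

```python
import pytest

# One module-level pytestmark; a second assignment lower in the file
# would silently replace this list, so all markers live here.
pytestmark = [
    pytest.mark.kvbm,
    pytest.mark.vllm,
]
```

Pytest applies every marker in the final `pytestmark` value to each test in the module, which is what makes the earlier duplicate assignment dead code.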

Comment thread tests/README.md
|-------------------------|------------------------------------------------------------------|------------------------------------|
| Lifecycle [required] | pre_merge, post_merge, nightly | When the test should run. Aggregate pipeline budgets: pre_merge < 30 min, post_merge < 1 hr, nightly < 3 hr. See [Pipeline Time Budgets](#pipeline-time-budgets). |
| Test Type [required] | unit, integration, e2e, benchmark, performance, stress, multimodal | Nature of the test |
| Test Type [required] | unit, integration, e2e, benchmark, performance, stress | Nature of the test |

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the required test-type set aligned with repo rules.

benchmark, performance, and stress are useful extra markers, but they do not satisfy the required type-marker requirement on their own. Listing them in the required set here makes the table stricter/clearer in the wrong direction.

As per coding guidelines, tests/**/*.py: "Every test must have at least one type marker: unit, integration, or e2e".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/README.md` at line 117, The README table's "Test Type [required]" cell
currently lists "unit, integration, e2e, benchmark, performance, stress" which
incorrectly treats benchmark/performance/stress as required; change that cell so
the required set only includes "unit, integration, e2e" and move or mention
"benchmark, performance, stress" as optional/extra markers elsewhere in the row
or a footnote to match the repo rule that every test must have at least one of
unit/integration/e2e; update the table cell content string "unit, integration,
e2e, benchmark, performance, stress" to "unit, integration, e2e" and add a short
parenthetical or separate column/footnote listing the extra markers.

Comment on lines 483 to 488
marks=[
    pytest.mark.xpu_1,
    pytest.mark.multimodal,
    pytest.mark.pre_merge,
    pytest.mark.timeout(600),  # TODO: profile to get tighter timeout
],  # TODO: profile to get max_vram

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Profile multimodal_video_agg before keeping it in pre_merge.

This config still loads a real model with only a timeout/TODO, so the new component bucket cannot size it safely. Without profiled_vram_gib(...) and requested_vllm_kv_cache_bytes(...), --max-vram-gib treats it as unsafe and the scheduler can't budget it.

As per coding guidelines, tests/**/*.py: "Tests that load a model must carry both @pytest.mark.profiled_vram_gib(N) marker and a framework-specific KV marker ...".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/serve/test_vllm_xpu.py` around lines 483 - 488, The test's marks list
(in tests/serve/test_vllm_xpu.py where pytest marks like pytest.mark.xpu_1,
pytest.mark.multimodal etc. are declared for the multimodal_video_agg test) is
missing the required resource profiling markers; add
pytest.mark.profiled_vram_gib(<appropriate_gib>) and the framework KV cache
marker (e.g., pytest.mark.requested_vllm_kv_cache_bytes(<appropriate_bytes>)) to
the same marks list so the scheduler can safely size and budget the model during
pre-merge runs.
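A sketch of the fully tagged parameter; the VRAM and KV-cache numbers below are hypothetical placeholders to be replaced with profiled values:

```python
import pytest

# Hypothetical fully tagged config entry for multimodal_video_agg;
# profiled_vram_gib / requested_vllm_kv_cache_bytes values are placeholders
# until the deployment is actually profiled.
multimodal_video_agg = pytest.param(
    "multimodal_video_agg",
    marks=[
        pytest.mark.xpu_1,
        pytest.mark.multimodal,
        pytest.mark.pre_merge,
        pytest.mark.profiled_vram_gib(24.0),  # placeholder
        pytest.mark.requested_vllm_kv_cache_bytes(1_000_000_000),  # placeholder
        pytest.mark.timeout(600),
    ],
)
```

With both markers present, the `--max-vram-gib` scheduler can budget the test instead of treating it as unsafe and forcing it to run alone.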

Comment thread tests/serve/test_vllm.py Outdated
Comment on lines +683 to +807
@pytest.mark.vllm
@pytest.mark.multimodal
@pytest.mark.e2e
@pytest.mark.gpu_1
@pytest.mark.nightly
@pytest.mark.model("Qwen/Qwen2.5-VL-7B-Instruct")
@pytest.mark.timeout(360)  # Match VLLMConfig.timeout for this multimodal deployment
def test_multimodal_b64(
    request,
    runtime_services_dynamic_ports,
    dynamo_dynamic_ports,
    predownload_models,
):
    """
    Test multimodal inference with base64 url passthrough.

    This test is separate because it loads the required image at runtime
    (not collection time), ensuring it only fails when actually executed.

    Runs on gpu_1 alongside other single-GPU multimodal tests that use the
    same model (mm_agg_qwen2.5-vl-7b). The ``@pytest.mark.model`` mark is
    kept as a safety net so the model is predownloaded even if no other
    gpu_1 config collects this model in a given CI job.
    """
    # Load B64 image at test execution time (uses real PNG even if MULTIMODAL_IMG is LFS pointer)
    b64_img = base64.b64encode(get_multimodal_test_image_bytes()).decode()

    # Create payload with B64 image
    b64_payload = chat_payload(
        [
            {
                "type": "text",
                "text": "What colors are in the following image? Respond only with the colors.",
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64_img}"},
            },
        ],
        repeat_count=1,
        expected_response=["purple"],
        max_tokens=100,
    )

    # Create test config
    config = VLLMConfig(
        name="test_multimodal_b64",
        directory=vllm_dir,
        script_name="agg_multimodal.sh",
        marks=[],  # markers at function-level
        model="Qwen/Qwen2.5-VL-7B-Instruct",
        script_args=["--model", "Qwen/Qwen2.5-VL-7B-Instruct"],
        delayed_start=0,
        timeout=360,
        request_payloads=[b64_payload],
    )

    config = dataclasses.replace(
        config, frontend_port=dynamo_dynamic_ports.frontend_port
    )
    run_serve_deployment(config, request, ports=dynamo_dynamic_ports)


@pytest.mark.vllm
@pytest.mark.multimodal
@pytest.mark.e2e
@pytest.mark.gpu_1
@pytest.mark.pre_merge
@pytest.mark.timeout(220)
def test_multimodal_b64_frontend_decoding(
    request,
    runtime_services_dynamic_ports,
    dynamo_dynamic_ports,
    predownload_models,
):
    """
    Test multimodal inference with base64 images through frontend decoding path.

    This exercises the Rust frontend image decode + NIXL RDMA transfer path
    with inline base64 data: URIs (not HTTP URLs). Verifies that the
    strip_inline_data_urls optimization does not break correctness.

    HF predownload: same model is already listed via ``@pytest.mark.model`` on
    ``test_serve_deployment[multimodal_video_agg]`` (pre_merge + gpu_1), so no
    extra ``model`` mark is needed here for PR CI.
    """
    b64_img = base64.b64encode(get_multimodal_test_image_bytes()).decode()

    b64_payload = chat_payload(
        [
            {
                "type": "text",
                "text": "What colors are in the following image? Respond only with the colors.",
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64_img}"},
            },
        ],
        repeat_count=1,
        expected_response=["green"],
        temperature=0.0,
        max_tokens=100,
    )

    config = VLLMConfig(
        name="test_multimodal_b64_frontend_decoding",
        directory=vllm_dir,
        script_name="agg_multimodal.sh",
        marks=[],
        model="Qwen/Qwen3-VL-2B-Instruct",
        script_args=[
            "--model",
            "Qwen/Qwen3-VL-2B-Instruct",
            "--frontend-decoding",
        ],
        delayed_start=0,
        timeout=220,
        request_payloads=[b64_payload],
    )

    config = dataclasses.replace(
        config, frontend_port=dynamo_dynamic_ports.frontend_port
    )
    run_serve_deployment(config, request, ports=dynamo_dynamic_ports)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Decorate the standalone multimodal tests with explicit model and VRAM/KV markers.

Because these tests build VLLMConfig(..., marks=[]) inside the function body, params_with_model_mark() never runs for them. test_multimodal_b64_frontend_decoding is missing the model marker entirely, and both tests are missing the profiled_vram_gib(...) + requested_vllm_kv_cache_bytes(...) pair that the new GPU fan-out relies on.

Suggested fix
 @pytest.mark.vllm
 @pytest.mark.multimodal
 @pytest.mark.e2e
 @pytest.mark.gpu_1
 @pytest.mark.nightly
 @pytest.mark.model("Qwen/Qwen2.5-VL-7B-Instruct")
+@pytest.mark.profiled_vram_gib(19.9)
+@pytest.mark.requested_vllm_kv_cache_bytes(922_354_000)
 @pytest.mark.timeout(360)  # Match VLLMConfig.timeout for this multimodal deployment
 def test_multimodal_b64(...):
     ...

 @pytest.mark.vllm
 @pytest.mark.multimodal
 @pytest.mark.e2e
 @pytest.mark.gpu_1
 @pytest.mark.pre_merge
+@pytest.mark.model("Qwen/Qwen3-VL-2B-Instruct")
+@pytest.mark.profiled_vram_gib(9.6)
+@pytest.mark.requested_vllm_kv_cache_bytes(1_710_490_000)
 @pytest.mark.timeout(220)
 def test_multimodal_b64_frontend_decoding(...):
     ...

As per coding guidelines, tests/**/*.py: "Use the @pytest.mark.model("org/model-name") marker to declare HF models used in tests" and "Tests that load a model must carry both @pytest.mark.profiled_vram_gib(N) marker and a framework-specific KV marker ...".

🧰 Tools
🪛 Ruff (0.15.12)

[error] 708-708: Undefined name base64

(F821)


[error] 708-708: Undefined name get_multimodal_test_image_bytes

(F821)


[error] 769-769: Undefined name base64

(F821)


[error] 769-769: Undefined name get_multimodal_test_image_bytes

(F821)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/serve/test_vllm.py` around lines 683 - 807, Both standalone tests
(test_multimodal_b64 and test_multimodal_b64_frontend_decoding) build VLLMConfig
with marks=[] so params_with_model_mark() is skipped; add the missing pytest
markers: annotate each test function with @pytest.mark.model("org/model-name")
using the appropriate HF model string (e.g., same strings passed to
VLLMConfig.model/script_args) and add @pytest.mark.profiled_vram_gib(N) plus the
framework KV marker @pytest.mark.requested_vllm_kv_cache_bytes(BYTES) with
suitable values for the GPU fan-out; update the decorators above
test_multimodal_b64 to include the model and both VRAM/KV markers and likewise
add @pytest.mark.model and the VRAM/KV markers above
test_multimodal_b64_frontend_decoding to ensure params_with_model_mark() runs
and the new GPU fan-out receives the required metadata.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add the missing multimodal helper imports.

Both new tests call base64.b64encode() and get_multimodal_test_image_bytes(), but neither symbol is imported in this module. Ruff is already reporting F821 here, and these tests will fail as soon as they execute.

Suggested fix
+import base64
 import dataclasses
 import logging
 import os
 import random
@@
-from tests.serve.conftest import MULTIMODAL_IMG_URL
+from tests.serve.conftest import MULTIMODAL_IMG_URL, get_multimodal_test_image_bytes

As per coding guidelines, .ai/python-guidelines.md: "keep imports at module top".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/serve/test_vllm.py` around lines 683 - 807, The tests use
base64.b64encode and get_multimodal_test_image_bytes but those symbols are not
imported; add the missing imports (import base64 and the
get_multimodal_test_image_bytes helper from its test helpers module) at the top
of the module so base64.b64encode(...) and get_multimodal_test_image_bytes()
resolve and Ruff F821 is fixed, keeping imports grouped at module-level per
guidelines.
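For reference, the data-URI construction both tests rely on is plain stdlib; the PNG bytes here are a hypothetical stand-in for the real test image returned by get_multimodal_test_image_bytes():

```python
import base64

# Hypothetical stand-in for get_multimodal_test_image_bytes(): just the
# 8-byte PNG signature rather than a full image.
png_bytes = b"\x89PNG\r\n\x1a\n"
b64_img = base64.b64encode(png_bytes).decode()
data_uri = f"data:image/png;base64,{b64_img}"

# Base64 round-trips losslessly, so the worker sees the exact bytes.
assert base64.b64decode(b64_img) == png_bytes
print(data_uri)  # data:image/png;base64,iVBORw0KGgo=
```

Since both helpers live at module scope, `import base64` must sit at the top of test_vllm.py rather than inside either test body.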

nv-rinig added 6 commits May 7, 2026 17:00
Signed-off-by: Rini Gupta <rinig@nvidia.com>
@pull-request-size bot added size/L and removed size/XL labels May 8, 2026
nv-rinig added 2 commits May 7, 2026 17:33
Signed-off-by: Rini Gupta <rinig@nvidia.com>
Labels: actions, documentation, size/L, test, xpu