test: split vLLM tests by component for parallel CI fan-out#9293
Conversation
Walkthrough
This pull request introduces a mandatory pytest component marker system.
Changes: Pytest Component Marker System & CI Pipeline Refactoring
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/kvbm_integration/test_consolidator_router_e2e.py (1)
36-57: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Dead `pytestmark` — the intended `vllm`/`trtllm` markers never reach the effective marker set, silently excluding these tests from CI component jobs.
Python's last-write-wins: the assignment at lines 36-40 is overwritten by the one at lines 50-57. The effective `pytestmark` is therefore the one at lines 50-57, which contains neither `pytest.mark.vllm` nor `pytest.mark.trtllm`. None of the test methods in this file carry those framework markers either. As a result, any CI job that filters with `-m "vllm and kvbm"` or `-m "trtllm and kvbm"` will silently skip every test in this file — exactly the opposite of what the PR intends.
🐛 Proposed fix
```diff
-pytestmark = [
-    pytest.mark.kvbm,
-    pytest.mark.vllm,
-    pytest.mark.trtllm,
-]
-
 # Build list of available engines for parameterization
 AVAILABLE_ENGINES = []
 if HAS_VLLM:
     AVAILABLE_ENGINES.append("vllm")
 if HAS_TRTLLM:
     AVAILABLE_ENGINES.append("trtllm")

-# Test markers
 pytestmark = [
     pytest.mark.kvbm,
     pytest.mark.e2e,
     pytest.mark.slow,
     pytest.mark.gpu_1,
     pytest.mark.pre_merge,
+    pytest.mark.vllm,
+    pytest.mark.trtllm,
     pytest.mark.skipif(not (HAS_VLLM or HAS_TRTLLM), reason="requires vllm or trtllm"),
 ]
```
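For intuition, here is a minimal, self-contained sketch of the failure mode and the conditional-extension fix the review suggests; the `HAS_VLLM`/`HAS_TRTLLM` flags below are stand-ins for the real availability checks in the test module:

```python
import pytest

HAS_VLLM = True     # stand-in for check_module_available("vllm")
HAS_TRTLLM = False  # stand-in for check_module_available("trtllm")

# BAD: module-level names are plain assignments, so the second binding
# silently replaces the first — pytest only ever sees the final value.
pytestmark = [pytest.mark.vllm, pytest.mark.trtllm]
pytestmark = [pytest.mark.kvbm]  # effective markers: kvbm only

# GOOD: build one list and extend it conditionally.
pytestmark = [pytest.mark.kvbm, pytest.mark.e2e]
if HAS_VLLM:
    pytestmark.append(pytest.mark.vllm)
if HAS_TRTLLM:
    pytestmark.append(pytest.mark.trtllm)
```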
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `tests/kvbm_integration/test_consolidator_router_e2e.py` around lines 36-57: the file defines two separate pytestmark assignments, so the initial pytest.mark.vllm / pytest.mark.trtllm assignment is overwritten; update the final pytestmark construction to include the vllm/trtllm markers conditionally instead of reassigning, e.g. keep the AVAILABLE_ENGINES logic and then build the single pytestmark list (or extend the existing pytestmark list) by adding pytest.mark.vllm when HAS_VLLM is true and pytest.mark.trtllm when HAS_TRTLLM is true, so the final pytestmark used by the tests contains those framework markers.
tests/README.md (1)
124-137: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Update the non-vLLM examples to include a component marker.
This table now says backend tests must carry exactly one of `core`|`multimodal`|`router`|`kvbm`, but the SGLang/TRT-LLM examples below still show framework-only marker blocks. That teaches authors a shape that CI component jobs will silently drop.
As per coding guidelines, `.ai/pytest-guidelines.md`: "if a test has only a framework marker but no component marker, it's silently dropped from CI component bucket jobs."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `tests/README.md` around lines 124-137: the README examples for non-vLLM frameworks (e.g., the SGLang and TRT-LLM example blocks) lack a required component marker; update those example test snippets to include exactly one component marker (choose from pytest.mark.core, pytest.mark.multimodal, pytest.mark.router, or pytest.mark.kvbm) alongside the existing framework markers (e.g., pytest.mark.sglang, pytest.mark.trtllm) so they match the table guidance and CI component-bucketing rules.
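As a sketch of what a compliant non-vLLM example could look like (the function name and body are placeholders, not the README's actual snippet), a framework marker is paired with exactly one component marker:

```python
import pytest


@pytest.mark.sglang      # framework marker
@pytest.mark.core        # exactly one component marker: core/multimodal/router/kvbm
@pytest.mark.e2e         # required test-type marker
@pytest.mark.pre_merge   # lifecycle marker
def test_sglang_aggregated_placeholder():
    # Placeholder body; a real test would start a deployment and assert on responses.
    assert True
```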
tests/serve/test_vllm_xpu.py (1)
615-620: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Add model and sizing markers to the standalone multimodal tests.
Both tests build `VLLMConfig(..., marks=[])` inside the body, so `params_with_model_mark()` never annotates them. That means `predownload_models` cannot prefetch the checkpoint, and the scheduler has no VRAM/KV metadata for these runs.
As per coding guidelines, `tests/**/*.py`: "Use the `@pytest.mark.model("org/model-name")` marker to declare HF models used in tests" and "Tests that load a model must carry both `@pytest.mark.profiled_vram_gib(N)` marker and a framework-specific KV marker ...".
Also applies to: 672-677
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/serve/test_vllm_xpu.py` around lines 615 - 620, The two standalone multimodal tests that construct VLLMConfig(...) inside their bodies lack external model and sizing markers, so params_with_model_mark() cannot annotate them; update each test (the ones marked with `@pytest.mark.vllm` `@pytest.mark.multimodal` `@pytest.mark.e2e` `@pytest.mark.xpu_2` `@pytest.mark.nightly` ...) to include a pytest model marker (e.g., `@pytest.mark.model`("hf-org/model-name")) plus the required sizing markers: `@pytest.mark.profiled_vram_gib`(N) with the correct GiB value and the framework-specific KV marker used in our repo for vllm (e.g., `@pytest.mark.kv_vllm` or the equivalent), so predownload_models and the scheduler receive checkpoint and VRAM/KV metadata; leave the VLLMConfig(...) construction inside the test body unchanged.
🧹 Nitpick comments (2)
tests/test_predownload_models.py (1)
25-56: ⚡ Quick win
`sglang` and `trtllm` parametrize variants are missing a component marker
The `vllm` variants now have `pytest.mark.core`, but the `sglang` and `trtllm` variants (lines 33–38 and 49–55) still carry only their framework markers with no component marker. Per the coding guidelines, any test with a `vllm`/`trtllm`/`sglang` framework marker must have exactly one component marker. These predownload tests belong in `core`.
🔧 Proposed fix
```diff
     pytest.param(
         "predownload_models_sglang_gpu1",
-        marks=[pytest.mark.sglang, pytest.mark.e2e, pytest.mark.gpu_1],
+        marks=[pytest.mark.sglang, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_1],
     ),
     pytest.param(
         "predownload_models_trtllm_gpu1",
-        marks=[pytest.mark.trtllm, pytest.mark.e2e, pytest.mark.gpu_1],
+        marks=[pytest.mark.trtllm, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_1],
     ),
     pytest.param(
         "predownload_models_sglang_gpu2",
-        marks=[pytest.mark.sglang, pytest.mark.e2e, pytest.mark.gpu_2],
+        marks=[pytest.mark.sglang, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_2],
     ),
     pytest.param(
         "predownload_models_trtllm_gpu2",
-        marks=[pytest.mark.trtllm, pytest.mark.e2e, pytest.mark.gpu_2],
+        marks=[pytest.mark.trtllm, pytest.mark.core, pytest.mark.e2e, pytest.mark.gpu_2],
     ),
```
vllm,trtllm, orsglang), it must have exactly one component marker:multimodal,router,kvbm, orcore."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `tests/test_predownload_models.py` around lines 25-56: the sglang/trtllm parametrized test entries ("predownload_models_sglang_gpu1", "predownload_models_trtllm_gpu1", "predownload_models_sglang_gpu2", "predownload_models_trtllm_gpu2") are missing a component marker; update each pytest.param's marks list to include pytest.mark.core so that every test with a framework marker (sglang or trtllm) has exactly one component marker (core), matching the vllm entries.
.ai/pytest-guidelines.md (1)
328-332: ⚡ Quick win
The "Example" section at the end of the file still shows a non-compliant pattern
The `GOOD` block in the timeout section (lines 328-332) now correctly shows `@pytest.mark.vllm` + `@pytest.mark.core`, but the standalone "Example" section at lines 348–357 still has only `@pytest.mark.vllm` with no component marker — contradicting the new rule documented just above it. A developer skimming to the example section will copy a non-compliant template.
📝 Proposed fix for the example section
```diff
 @pytest.mark.pre_merge
 @pytest.mark.gpu_1
 @pytest.mark.e2e
 @pytest.mark.vllm
+@pytest.mark.core  # component bucket: pick one of core, multimodal, router, kvbm
 @pytest.mark.model("Qwen/Qwen3-0.6B")
 @pytest.mark.timeout(300)
 def test_vllm_aggregated(start_serve_deployment):
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.ai/pytest-guidelines.md around lines 328 - 332, The example block under the "Example" section still shows only `@pytest.mark.vllm` and must be updated to include a component marker per the rule; update the standalone example to match the GOOD pattern by adding a component marker (e.g., `@pytest.mark.core` or the appropriate component such as multimodal/router/kvbm) alongside `@pytest.mark.vllm` in that example snippet so the template is compliant with the documented requirement.
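Since the "exactly one component marker" rule recurs throughout this review, a collection-time guard is one way to stop violations from landing silently. The following is a hypothetical conftest.py hook, not code from this PR:

```python
import pytest

COMPONENT_MARKERS = {"core", "multimodal", "router", "kvbm"}
FRAMEWORK_MARKERS = {"vllm", "trtllm", "sglang"}


def pytest_collection_modifyitems(config, items):
    """Fail collection if a framework-marked test lacks exactly one component marker."""
    errors = []
    for item in items:
        marker_names = {m.name for m in item.iter_markers()}
        if marker_names & FRAMEWORK_MARKERS:
            components = marker_names & COMPONENT_MARKERS
            if len(components) != 1:
                errors.append(
                    f"{item.nodeid}: expected exactly one component marker, "
                    f"got {sorted(components)}"
                )
    if errors:
        raise pytest.UsageError("\n".join(errors))
```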
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/pr.yaml:
- Around line 625-633: The vllm-copy-to-acr job is allowed to run once only CPU
tests pass, so add the GPU-split test jobs as dependencies and require them to
be successful or skipped before promoting the image; update the job's needs list
to include vllm-test-core, vllm-test-router, vllm-test-kvbm (and any other
vllm-test-* GPU jobs) and extend the if condition to check that each of those
needs.<result> is 'success' or 'skipped' (similar to how vllm-test-cpu is
checked) so the job cannot proceed until all split GPU bucket tests finish
validating the PR.
In `@pyproject.toml`:
- Around line 236-238: Remove the blanket "ignore::DeprecationWarning" pytest
filter and replace it with targeted ignores for specific third-party warnings
(for example by adding entries like "ignore:SomeThirdPartyClassName:module.path"
or "ignore:deprecated function X:third.party.module") so that pytest keeps
deprecation warnings from our own code as errors; locate the existing filter
string "ignore::DeprecationWarning" in pyproject.toml and replace it with
narrowly-scoped "ignore:" entries and/or comments explaining each exception per
the project guideline.
- Line 273: The pytest `core` marker is registered in pyproject.toml but not
added in tests/conftest.py, causing strict marker enforcement to fail; update
the pytest_configure function in tests/conftest.py to call
config.addinivalue_line for the "core: marks tests that exercise core/runtime
functionality (residual bucket — disjoint from multimodal/router/kvbm/planner)"
marker alongside the existing addinivalue_line calls so the marker is defined in
both places (look for the pytest_configure function and the other
config.addinivalue_line entries to mirror).
In `@tests/kvbm_integration/test_chunked_prefill.py`:
- Around line 20-24: The first pytestmark assignment (the list containing
pytest.mark.kvbm and pytest.mark.vllm) is dead because it gets overwritten later
by another pytestmark; remove the redundant early assignment so there is only
one pytestmark in the module, or merge any missing markers into the existing
pytestmark variable; specifically update/remove the initial pytestmark
declaration so only the later pytestmark remains (refer to the pytestmark
variable in this test module).
In `@tests/kvbm_integration/test_kvbm_vllm_integration.py`:
- Around line 23-27: The module-level pytest mark assignment named pytestmark is
declared twice in this file (the earlier assignment is immediately overwritten
by the later one), so remove the dead redundancy by deleting the first
pytestmark = [ pytest.mark.kvbm, pytest.mark.vllm ] block; keep the later
pytestmark that already includes both pytest.mark.kvbm and pytest.mark.vllm so
the tests retain the intended marks (look for the pytestmark variable in this
test_kvbm_vllm_integration.py module to locate and remove the redundant earlier
assignment).
In `@tests/README.md`:
- Line 117: The README table's "Test Type [required]" cell currently lists
"unit, integration, e2e, benchmark, performance, stress" which incorrectly
treats benchmark/performance/stress as required; change that cell so the
required set only includes "unit, integration, e2e" and move or mention
"benchmark, performance, stress" as optional/extra markers elsewhere in the row
or a footnote to match the repo rule that every test must have at least one of
unit/integration/e2e; update the table cell content string "unit, integration,
e2e, benchmark, performance, stress" to "unit, integration, e2e" and add a short
parenthetical or separate column/footnote listing the extra markers.
In `@tests/serve/test_vllm_xpu.py`:
- Around line 483-488: The test's marks list (in tests/serve/test_vllm_xpu.py
where pytest marks like pytest.mark.xpu_1, pytest.mark.multimodal etc. are
declared for the multimodal_video_agg test) is missing the required resource
profiling markers; add pytest.mark.profiled_vram_gib(<appropriate_gib>) and the
framework KV cache marker (e.g.,
pytest.mark.requested_vllm_kv_cache_bytes(<appropriate_bytes>)) to the same
marks list so the scheduler can safely size and budget the model during
pre-merge runs.
In `@tests/serve/test_vllm.py`:
- Around line 683-807: Both standalone tests (test_multimodal_b64 and
test_multimodal_b64_frontend_decoding) build VLLMConfig with marks=[] so
params_with_model_mark() is skipped; add the missing pytest markers: annotate
each test function with `@pytest.mark.model`("org/model-name") using the
appropriate HF model string (e.g., same strings passed to
VLLMConfig.model/script_args) and add `@pytest.mark.profiled_vram_gib`(N) plus the
framework KV marker `@pytest.mark.requested_vllm_kv_cache_bytes`(BYTES) with
suitable values for the GPU fan-out; update the decorators above
test_multimodal_b64 to include the model and both VRAM/KV markers and likewise
add `@pytest.mark.model` and the VRAM/KV markers above
test_multimodal_b64_frontend_decoding to ensure params_with_model_mark() runs
and the new GPU fan-out receives the required metadata.
- Around line 683-807: The tests use base64.b64encode and
get_multimodal_test_image_bytes but those symbols are not imported; add the
missing imports (import base64 and the get_multimodal_test_image_bytes helper
from its test helpers module) at the top of the module so base64.b64encode(...)
and get_multimodal_test_image_bytes() resolve and Ruff F821 is fixed, keeping
imports grouped at module-level per guidelines.
---
Outside diff comments:
In `@tests/kvbm_integration/test_consolidator_router_e2e.py`:
- Around line 36-57: The file defines two separate pytestmark assignments so the
initial pytest.mark.vllm / pytest.mark.trtllm assignment is overwritten; update
the final pytestmark construction to include the vllm/trtllm markers
conditionally instead of reassigning, e.g. keep AVAILABLE_ENGINES logic and then
build the single pytestmark list (or extend the existing pytestmark list) by
adding pytest.mark.vllm when HAS_VLLM is true and pytest.mark.trtllm when
HAS_TRTLLM is true so the final pytestmark used by the tests contains those
framework markers.
In `@tests/README.md`:
- Around line 124-137: The README examples for non-vLLM frameworks (e.g., the
SGLang and TRT-LLM example blocks) lack a required component marker; update
those example test snippets to include exactly one component marker (choose from
pytest.mark.core, pytest.mark.multimodal, pytest.mark.router, or
pytest.mark.kvbm) alongside the existing framework markers (e.g.,
pytest.mark.sglang, pytest.mark.trtllm) so they match the table guidance and CI
component-bucketing rules.
In `@tests/serve/test_vllm_xpu.py`:
- Around line 615-620: The two standalone multimodal tests that construct
VLLMConfig(...) inside their bodies lack external model and sizing markers, so
params_with_model_mark() cannot annotate them; update each test (the ones marked
with `@pytest.mark.vllm` `@pytest.mark.multimodal` `@pytest.mark.e2e`
`@pytest.mark.xpu_2` `@pytest.mark.nightly` ...) to include a pytest model marker
(e.g., `@pytest.mark.model`("hf-org/model-name")) plus the required sizing
markers: `@pytest.mark.profiled_vram_gib`(N) with the correct GiB value and the
framework-specific KV marker used in our repo for vllm (e.g.,
`@pytest.mark.kv_vllm` or the equivalent), so predownload_models and the scheduler
receive checkpoint and VRAM/KV metadata; leave the VLLMConfig(...) construction
inside the test body unchanged.
---
Nitpick comments:
In @.ai/pytest-guidelines.md:
- Around line 328-332: The example block under the "Example" section still shows
only `@pytest.mark.vllm` and must be updated to include a component marker per the
rule; update the standalone example to match the GOOD pattern by adding a
component marker (e.g., `@pytest.mark.core` or the appropriate component such as
multimodal/router/kvbm) alongside `@pytest.mark.vllm` in that example snippet so
the template is compliant with the documented requirement.
In `@tests/test_predownload_models.py`:
- Around line 25-56: The sglang/trtllm parametrized test entries
("predownload_models_sglang_gpu1", "predownload_models_trtllm_gpu1",
"predownload_models_sglang_gpu2", "predownload_models_trtllm_gpu2") are missing
a component marker; update each pytest.param's marks list to include
pytest.mark.core so that every test with a framework marker (sglang or trtllm)
has exactly one component marker (core) matching the vllm entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d3701ef3-c615-49cb-b96a-4a610499caa4
📒 Files selected for processing (30)
- .ai/pytest-guidelines.md
- .github/workflows/pr.yaml
- .pre-commit-config.yaml
- lib/bindings/python/tests/test_deprecated_enable_nats.py
- pyproject.toml
- tests/README.md
- tests/basic/test_cuda_version_consistency.py
- tests/dependencies/test_kvbm_imports.py
- tests/fault_tolerance/cancellation/test_vllm.py
- tests/fault_tolerance/etcd_ha/test_vllm.py
- tests/fault_tolerance/migration/test_vllm.py
- tests/fault_tolerance/test_canary_rank_pause.py
- tests/fault_tolerance/test_vllm_health_check.py
- tests/frontend/test_prepost.py
- tests/frontend/test_prepost_mistral.py
- tests/frontend/test_prompt_embeds.py
- tests/frontend/test_vllm.py
- tests/frontend/test_vllm_prepost_integration.py
- tests/gpu_memory_service/test_quiesce_resume.py
- tests/gpu_memory_service/test_shadow_failover.py
- tests/kvbm_integration/test_chunked_prefill.py
- tests/kvbm_integration/test_consolidator_router_e2e.py
- tests/kvbm_integration/test_determinism_agg.py
- tests/kvbm_integration/test_kvbm_vllm_integration.py
- tests/parity/parser/test_parity_parser.py
- tests/serve/test_vllm.py
- tests/serve/test_vllm_omni.py
- tests/serve/test_vllm_xpu.py
- tests/test_predownload_models.py
- tests/utils/multimodal.py
```diff
   vllm-copy-to-acr:
     name: vllm-runtime # This name overlaps with other vllm jobs to group them in the UI
-    needs: [changed-files, vllm-build, vllm-test]
+    needs: [changed-files, vllm-build, vllm-test-cpu]
     if: |
       always() && !cancelled() &&
       (needs.changed-files.outputs.core == 'true' || needs.changed-files.outputs.vllm == 'true' || needs.changed-files.outputs.deploy == 'true') &&
       needs.vllm-build.result == 'success' &&
-      (needs.vllm-test.result == 'success' || needs.vllm-test.result == 'skipped')
+      (needs.vllm-test-cpu.result == 'success' || needs.vllm-test-cpu.result == 'skipped')
     uses: ./.github/workflows/shared-copy.yml
```
Don't promote the vLLM image before the split GPU buckets finish.
vllm-copy-to-acr now advances after CPU-only coverage, so deploy tests can start while vllm-test-core, vllm-test-router, vllm-test-kvbm, etc. are still failing or pending. That burns deploy capacity on an image the PR has not finished validating.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/pr.yaml around lines 625 - 633, The vllm-copy-to-acr job
is allowed to run once only CPU tests pass, so add the GPU-split test jobs as
dependencies and require them to be successful or skipped before promoting the
image; update the job's needs list to include vllm-test-core, vllm-test-router,
vllm-test-kvbm (and any other vllm-test-* GPU jobs) and extend the if condition
to check that each of those needs.<result> is 'success' or 'skipped' (similar to
how vllm-test-cpu is checked) so the job cannot proceed until all split GPU
bucket tests finish validating the PR.
```diff
+    # Ignore all DeprecationWarnings — many come from third-party deps we don't control,
+    # and treating them as errors blocks pytest collection on Python version bumps.
+    "ignore::DeprecationWarning",
```
Keep deprecation filtering targeted.
`ignore::DeprecationWarning` disables the warning-as-error policy for every deprecation, including ones introduced by our own code. That throws away a lot more signal than the existing per-warning ignores.
As per coding guidelines, `pyproject.toml`: "Use `filterwarnings = ["error"]` in pytest configuration with specific ignores for known third-party deprecations; fix root causes for new warnings or add targeted ignores with explanatory comments in pyproject.toml".
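For illustration, the same blanket-versus-targeted distinction can be expressed with the stdlib warnings API; the module name below is a placeholder for whichever third-party package actually emits the warning:

```python
import warnings

# Blanket ignore (what the review objects to): hides every DeprecationWarning,
# including ones raised by first-party code.
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Targeted ignore: suppress the warning only when it originates from a
# specific third-party module ("thirdparty_pkg" is a placeholder name).
warnings.filterwarnings(
    "ignore",
    category=DeprecationWarning,
    module=r"thirdparty_pkg(\.|$)",
)
```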
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pyproject.toml` around lines 236 - 238, Remove the blanket
"ignore::DeprecationWarning" pytest filter and replace it with targeted ignores
for specific third-party warnings (for example by adding entries like
"ignore:SomeThirdPartyClassName:module.path" or "ignore:deprecated function
X:third.party.module") so that pytest keeps deprecation warnings from our own
code as errors; locate the existing filter string "ignore::DeprecationWarning"
in pyproject.toml and replace it with narrowly-scoped "ignore:" entries and/or
comments explaining each exception per the project guideline.
| "performance: marks tests as performance tests", | ||
| "benchmark: marks tests as benchmark tests", | ||
| "none: marks tests that do not require a framework-specific runtime", | ||
| "core: marks tests that exercise core/runtime functionality (residual bucket — disjoint from multimodal/router/kvbm/planner)", |
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
fd -p 'conftest.py' tests | while read -r file; do
  echo "=== $file ==="
  rg -n 'pytest_configure|addinivalue_line|core:' "$file" || true
done
```
Repository: ai-dynamo/dynamo
Length of output: 688

🏁 Script executed:
```bash
# Check the pytest_configure function in tests/conftest.py
cd tests && head -n 250 conftest.py | tail -n 100
```
Repository: ai-dynamo/dynamo
Length of output: 4320

🏁 Script executed:
```bash
# Check pyproject.toml around line 273 for the core marker
head -n 280 pyproject.toml | tail -n 20
```
Repository: ai-dynamo/dynamo
Length of output: 1859

🏁 Script executed:
```bash
# Check if there are addinivalue_line calls for markers in conftest.py
rg -A 2 'addinivalue_line.*markers' tests/conftest.py
```
Repository: ai-dynamo/dynamo
Length of output: 42

🏁 Script executed:
```bash
# Check if any markers are registered in conftest.py at all
rg 'config.addinivalue_line|pytest.mark|pytest.ini' tests/conftest.py -A 1
```
Repository: ai-dynamo/dynamo
Length of output: 903

🏁 Script executed:
```bash
# Check the complete pytest_configure function signature and implementation
rg -A 50 'def pytest_configure' tests/conftest.py | head -n 60
```
Repository: ai-dynamo/dynamo
Length of output: 2218
Add `core` marker to `tests/conftest.py:pytest_configure`.
The `core` marker is registered in `pyproject.toml` (line 273) but missing from `tests/conftest.py:pytest_configure`. Strict marker enforcement requires registration in both locations. Add an `addinivalue_line` call for the `core` marker in the `pytest_configure` function alongside the other marker definitions.
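A minimal sketch of the registration the comment asks for; the surrounding comment stands in for whatever markers the real pytest_configure already registers:

```python
# tests/conftest.py (sketch)
def pytest_configure(config):
    # ...existing marker registrations (placeholders)...
    config.addinivalue_line(
        "markers",
        "core: marks tests that exercise core/runtime functionality "
        "(residual bucket — disjoint from multimodal/router/kvbm/planner)",
    )
```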
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pyproject.toml` at line 273, The pytest `core` marker is registered in
pyproject.toml but not added in tests/conftest.py, causing strict marker
enforcement to fail; update the pytest_configure function in tests/conftest.py
to call config.addinivalue_line for the "core: marks tests that exercise
core/runtime functionality (residual bucket — disjoint from
multimodal/router/kvbm/planner)" marker alongside the existing addinivalue_line
calls so the marker is defined in both places (look for the pytest_configure
function and the other config.addinivalue_line entries to mirror).
```python
pytestmark = [
    pytest.mark.kvbm,
    pytest.mark.vllm,
]
```
Dead pytestmark assignment — overwritten by the existing one at line 47
Same issue as test_kvbm_vllm_integration.py: the assignment at lines 20-23 is overwritten by the one at line 47. The second pytestmark already has both kvbm and vllm, so these lines have no effect.
🔧 Proposed fix
```diff
-pytestmark = [
-    pytest.mark.kvbm,
-    pytest.mark.vllm,
-]
-
 # Test configuration
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/kvbm_integration/test_chunked_prefill.py` around lines 20 - 24, The
first pytestmark assignment (the list containing pytest.mark.kvbm and
pytest.mark.vllm) is dead because it gets overwritten later by another
pytestmark; remove the redundant early assignment so there is only one
pytestmark in the module, or merge any missing markers into the existing
pytestmark variable; specifically update/remove the initial pytestmark
declaration so only the later pytestmark remains (refer to the pytestmark
variable in this test module).
```python
pytestmark = [
    pytest.mark.kvbm,
    pytest.mark.vllm,
]
```
Dead pytestmark assignment — overwritten by the existing one at line 48
Python assigns module-level names top-to-bottom; the assignment at lines 23-26 is immediately overwritten when execution reaches line 48. Pytest reads only the final value of module.pytestmark, so these four lines have no effect. The second pytestmark at line 48 already contains both pytest.mark.kvbm and pytest.mark.vllm, making this block entirely redundant.
🔧 Proposed fix
```diff
-pytestmark = [
-    pytest.mark.kvbm,
-    pytest.mark.vllm,
-]
-
 HAS_VLLM = check_module_available("vllm")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-pytestmark = [
-    pytest.mark.kvbm,
-    pytest.mark.vllm,
-]
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/kvbm_integration/test_kvbm_vllm_integration.py` around lines 23 - 27,
The module-level pytest mark assignment named pytestmark is declared twice in
this file (the earlier assignment is immediately overwritten by the later one),
so remove the dead redundancy by deleting the first pytestmark = [
pytest.mark.kvbm, pytest.mark.vllm ] block; keep the later pytestmark that
already includes both pytest.mark.kvbm and pytest.mark.vllm so the tests retain
the intended marks (look for the pytestmark variable in this
test_kvbm_vllm_integration.py module to locate and remove the redundant earlier
assignment).
| |-------------------------|------------------------------------------------------------------|------------------------------------| | ||
| | Lifecycle [required] | pre_merge, post_merge, nightly | When the test should run. Aggregate pipeline budgets: pre_merge < 30 min, post_merge < 1 hr, nightly < 3 hr. See [Pipeline Time Budgets](#pipeline-time-budgets). | | ||
| | Test Type [required] | unit, integration, e2e, benchmark, performance, stress, multimodal | Nature of the test | | ||
| | Test Type [required] | unit, integration, e2e, benchmark, performance, stress | Nature of the test | |
Keep the required test-type set aligned with repo rules.
benchmark, performance, and stress are useful extra markers, but they do not satisfy the required type-marker requirement on their own. Listing them in the required set here makes the table stricter/clearer in the wrong direction.
As per coding guidelines, tests/**/*.py: "Every test must have at least one type marker: unit, integration, or e2e".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/README.md` at line 117, The README table's "Test Type [required]" cell
currently lists "unit, integration, e2e, benchmark, performance, stress" which
incorrectly treats benchmark/performance/stress as required; change that cell so
the required set only includes "unit, integration, e2e" and move or mention
"benchmark, performance, stress" as optional/extra markers elsewhere in the row
or a footnote to match the repo rule that every test must have at least one of
unit/integration/e2e; update the table cell content string "unit, integration,
e2e, benchmark, performance, stress" to "unit, integration, e2e" and add a short
parenthetical or separate column/footnote listing the extra markers.
```python
        marks=[
            pytest.mark.xpu_1,
            pytest.mark.multimodal,
            pytest.mark.pre_merge,
            pytest.mark.timeout(600),  # TODO: profile to get tighter timeout
        ],  # TODO: profile to get max_vram
```
Profile multimodal_video_agg before keeping it in pre_merge.
This config still loads a real model with only a timeout/TODO, so the new component bucket cannot size it safely. Without `profiled_vram_gib(...)` and `requested_vllm_kv_cache_bytes(...)`, `--max-vram-gib` treats it as unsafe and the scheduler can't budget it.
As per coding guidelines, `tests/**/*.py`: "Tests that load a model must carry both `@pytest.mark.profiled_vram_gib(N)` marker and a framework-specific KV marker ...".
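A sketch of the shape the fix would take; the numeric values below are placeholders that must come from actual profiling runs, not real measurements:

```python
import pytest

# Hypothetical reconstruction of the parametrize entry with profiling markers added.
multimodal_video_agg = pytest.param(
    "multimodal_video_agg",
    marks=[
        pytest.mark.xpu_1,
        pytest.mark.multimodal,
        pytest.mark.pre_merge,
        pytest.mark.profiled_vram_gib(20),  # placeholder value
        pytest.mark.requested_vllm_kv_cache_bytes(1_000_000_000),  # placeholder value
        pytest.mark.timeout(600),
    ],
)
```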
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/serve/test_vllm_xpu.py` around lines 483 - 488, The test's marks list
(in tests/serve/test_vllm_xpu.py where pytest marks like pytest.mark.xpu_1,
pytest.mark.multimodal etc. are declared for the multimodal_video_agg test) is
missing the required resource profiling markers; add
pytest.mark.profiled_vram_gib(<appropriate_gib>) and the framework KV cache
marker (e.g., pytest.mark.requested_vllm_kv_cache_bytes(<appropriate_bytes>)) to
the same marks list so the scheduler can safely size and budget the model during
pre-merge runs.
```python
@pytest.mark.vllm
@pytest.mark.multimodal
@pytest.mark.e2e
@pytest.mark.gpu_1
@pytest.mark.nightly
@pytest.mark.model("Qwen/Qwen2.5-VL-7B-Instruct")
@pytest.mark.timeout(360)  # Match VLLMConfig.timeout for this multimodal deployment
def test_multimodal_b64(
    request,
    runtime_services_dynamic_ports,
    dynamo_dynamic_ports,
    predownload_models,
):
    """
    Test multimodal inference with base64 url passthrough.

    This test is separate because it loads the required image at runtime
    (not collection time), ensuring it only fails when actually executed.

    Runs on gpu_1 alongside other single-GPU multimodal tests that use the
    same model (mm_agg_qwen2.5-vl-7b). The ``@pytest.mark.model`` mark is
    kept as a safety net so the model is predownloaded even if no other
    gpu_1 config collects this model in a given CI job.
    """
    # Load B64 image at test execution time (uses real PNG even if MULTIMODAL_IMG is LFS pointer)
    b64_img = base64.b64encode(get_multimodal_test_image_bytes()).decode()

    # Create payload with B64 image
    b64_payload = chat_payload(
        [
            {
                "type": "text",
                "text": "What colors are in the following image? Respond only with the colors.",
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64_img}"},
            },
        ],
        repeat_count=1,
        expected_response=["purple"],
        max_tokens=100,
    )

    # Create test config
    config = VLLMConfig(
        name="test_multimodal_b64",
        directory=vllm_dir,
        script_name="agg_multimodal.sh",
        marks=[],  # markers at function-level
        model="Qwen/Qwen2.5-VL-7B-Instruct",
        script_args=["--model", "Qwen/Qwen2.5-VL-7B-Instruct"],
        delayed_start=0,
        timeout=360,
        request_payloads=[b64_payload],
    )

    config = dataclasses.replace(
        config, frontend_port=dynamo_dynamic_ports.frontend_port
    )
    run_serve_deployment(config, request, ports=dynamo_dynamic_ports)


@pytest.mark.vllm
@pytest.mark.multimodal
@pytest.mark.e2e
@pytest.mark.gpu_1
@pytest.mark.pre_merge
@pytest.mark.timeout(220)
def test_multimodal_b64_frontend_decoding(
    request,
    runtime_services_dynamic_ports,
    dynamo_dynamic_ports,
    predownload_models,
):
    """
    Test multimodal inference with base64 images through frontend decoding path.

    This exercises the Rust frontend image decode + NIXL RDMA transfer path
    with inline base64 data: URIs (not HTTP URLs). Verifies that the
    strip_inline_data_urls optimization does not break correctness.

    HF predownload: same model is already listed via ``@pytest.mark.model`` on
    ``test_serve_deployment[multimodal_video_agg]`` (pre_merge + gpu_1), so no
    extra ``model`` mark is needed here for PR CI.
    """
    b64_img = base64.b64encode(get_multimodal_test_image_bytes()).decode()

    b64_payload = chat_payload(
        [
            {
                "type": "text",
                "text": "What colors are in the following image? Respond only with the colors.",
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64_img}"},
            },
        ],
        repeat_count=1,
        expected_response=["green"],
        temperature=0.0,
        max_tokens=100,
    )

    config = VLLMConfig(
        name="test_multimodal_b64_frontend_decoding",
        directory=vllm_dir,
        script_name="agg_multimodal.sh",
        marks=[],
        model="Qwen/Qwen3-VL-2B-Instruct",
        script_args=[
            "--model",
            "Qwen/Qwen3-VL-2B-Instruct",
            "--frontend-decoding",
        ],
        delayed_start=0,
        timeout=220,
        request_payloads=[b64_payload],
    )

    config = dataclasses.replace(
        config, frontend_port=dynamo_dynamic_ports.frontend_port
    )
    run_serve_deployment(config, request, ports=dynamo_dynamic_ports)
```
Decorate the standalone multimodal tests with explicit model and VRAM/KV markers.
Because these tests build VLLMConfig(..., marks=[]) inside the function body, params_with_model_mark() never runs for them. test_multimodal_b64_frontend_decoding is missing the model marker entirely, and both tests are missing the profiled_vram_gib(...) + requested_vllm_kv_cache_bytes(...) pair that the new GPU fan-out relies on.
Suggested fix
```diff
 @pytest.mark.vllm
 @pytest.mark.multimodal
 @pytest.mark.e2e
 @pytest.mark.gpu_1
 @pytest.mark.nightly
 @pytest.mark.model("Qwen/Qwen2.5-VL-7B-Instruct")
+@pytest.mark.profiled_vram_gib(19.9)
+@pytest.mark.requested_vllm_kv_cache_bytes(922_354_000)
 @pytest.mark.timeout(360)  # Match VLLMConfig.timeout for this multimodal deployment
 def test_multimodal_b64(...):
     ...

 @pytest.mark.vllm
 @pytest.mark.multimodal
 @pytest.mark.e2e
 @pytest.mark.gpu_1
 @pytest.mark.pre_merge
+@pytest.mark.model("Qwen/Qwen3-VL-2B-Instruct")
+@pytest.mark.profiled_vram_gib(9.6)
+@pytest.mark.requested_vllm_kv_cache_bytes(1_710_490_000)
 @pytest.mark.timeout(220)
 def test_multimodal_b64_frontend_decoding(...):
     ...
```
As per coding guidelines, `tests/**/*.py`: "Use the `@pytest.mark.model("org/model-name")` marker to declare HF models used in tests" and "Tests that load a model must carry both `@pytest.mark.profiled_vram_gib(N)` marker and a framework-specific KV marker ...".
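For context on why `marks=[]` sidesteps the annotation: `params_with_model_mark()` evidently derives a `model` marker from each config passed through parametrization, so a config built inside a test body never gets one. A hypothetical sketch of such a helper (assumed names and behavior, not this repo's actual implementation):

```python
import pytest


def params_with_model_mark(configs):
    """Hypothetical: wrap each config in pytest.param, adding pytest.mark.model
    derived from config.model so predownload/scheduling metadata is attached."""
    params = []
    for config in configs:
        marks = list(config.marks)
        if getattr(config, "model", None):
            marks.append(pytest.mark.model(config.model))
        params.append(pytest.param(config, id=config.name, marks=marks))
    return params
```

A config constructed inside a test function never flows through this wrapper, which is why the review asks for the markers as explicit function decorators instead.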
🧰 Tools
🪛 Ruff (0.15.12)
[error] 708-708: Undefined name `base64` (F821)
[error] 708-708: Undefined name `get_multimodal_test_image_bytes` (F821)
[error] 769-769: Undefined name `base64` (F821)
[error] 769-769: Undefined name `get_multimodal_test_image_bytes` (F821)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/serve/test_vllm.py` around lines 683 - 807, Both standalone tests
(test_multimodal_b64 and test_multimodal_b64_frontend_decoding) build VLLMConfig
with marks=[] so params_with_model_mark() is skipped; add the missing pytest
markers: annotate each test function with `@pytest.mark.model`("org/model-name")
using the appropriate HF model string (e.g., same strings passed to
VLLMConfig.model/script_args) and add `@pytest.mark.profiled_vram_gib`(N) plus the
framework KV marker `@pytest.mark.requested_vllm_kv_cache_bytes`(BYTES) with
suitable values for the GPU fan-out; update the decorators above
test_multimodal_b64 to include the model and both VRAM/KV markers and likewise
add `@pytest.mark.model` and the VRAM/KV markers above
test_multimodal_b64_frontend_decoding to ensure params_with_model_mark() runs
and the new GPU fan-out receives the required metadata.
Add the missing multimodal helper imports.
Both new tests call base64.b64encode() and get_multimodal_test_image_bytes(), but neither symbol is imported in this module. Ruff is already reporting F821 here, and these tests will fail as soon as they execute.
Suggested fix
```diff
+import base64
 import dataclasses
 import logging
 import os
 import random
@@
-from tests.serve.conftest import MULTIMODAL_IMG_URL
+from tests.serve.conftest import MULTIMODAL_IMG_URL, get_multimodal_test_image_bytes
```
As per coding guidelines, `.ai/python-guidelines.md`: "keep imports at module top".
🧰 Tools
🪛 Ruff (0.15.12)
[error] 708-708: Undefined name `base64` (F821)
[error] 708-708: Undefined name `get_multimodal_test_image_bytes` (F821)
[error] 769-769: Undefined name `base64` (F821)
[error] 769-769: Undefined name `get_multimodal_test_image_bytes` (F821)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/serve/test_vllm.py` around lines 683 - 807, The tests use
base64.b64encode and get_multimodal_test_image_bytes but those symbols are not
imported; add the missing imports (import base64 and the
get_multimodal_test_image_bytes helper from its test helpers module) at the top
of the module so base64.b64encode(...) and get_multimodal_test_image_bytes()
resolve and Ruff F821 is fixed, keeping imports grouped at module-level per
guidelines.
Signed-off-by: Rini Gupta <rinig@nvidia.com>
Overview:
Split the vLLM test suite into component-based buckets (`core`, `multimodal`, `router`, `kvbm`) so CI can fan out one job per (backend × component), reducing PR pipeline time. Adds the `core` marker, tags vLLM tests in `tests/**` with exactly one component marker, and updates `pr.yaml` to run 9 vLLM test jobs in parallel instead of 2 sequential ones.
This is the first cut of OPS-4856. Scope is limited to `tests/**` and vLLM only — sglang/trtllm and `components/src/dynamo/<backend>/tests/**` are deferred to follow-up PRs.
Details:
New marker: `core`
- Registered in `pyproject.toml`.
- Documented in `tests/README.md` and `.ai/pytest-guidelines.md` (a component marker is now required for backend tests).
Tests tagged in `tests/**` — every vLLM test now carries exactly one of {`core`, `multimodal`, `router`, `kvbm`}:
- `multimodal` — `tests/serve/test_vllm_omni.py`, multimodal configs in `test_vllm.py`/`test_vllm_xpu.py`, factory tag in `tests/utils/multimodal.py`.
- `router` — `test_router_e2e_with_vllm.py`, `agg-router*` configs, `test_lora_aggregated_router`.
- `kvbm` — backfilled missing markers across `tests/kvbm_integration/` and `tests/dependencies/test_kvbm_imports.py`.
- `core` — residual: serve, frontend, fault_tolerance, predownload, parser, canary, etc.
CI changes (`.github/workflows/pr.yaml`) — 2 vLLM test jobs → 9:
- `vllm-test-cpu` — gpu_0 only (renamed from `vllm-test`).
- `vllm-test-{core,multimodal,router,kvbm}` — gpu_1 fan-out.
- `vllm-multi-gpu-{core,multimodal,router,kvbm}` — (gpu_2 or gpu_4) fan-out.
- `run_gpu_parallel_tests` is enabled for buckets that contain `profiled_vram_gib`-tagged tests (core, multimodal, router); kvbm has none, so it stays sequential.
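For illustration, each fan-out job would select its bucket with a marker expression combining framework, component, and hardware markers; the exact invocations live in pr.yaml, and the expression below is an assumed example, shown via pytest.main:

```python
import pytest

# Assumed selection for a vllm-test-core gpu_1 job: one framework marker
# AND one component marker AND one hardware marker narrows collection to
# exactly that bucket.
pytest.main(["-m", "vllm and core and gpu_1", "tests/"])
```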
Misc cleanup bundled in:
- Bumped the `black` pre-commit pin from `23.1.0` → `26.3.1` (older versions crash on Python 3.12+ via removed `ast.Str`).
- Added warning filters in `pyproject.toml` for `RequestsDependencyWarning` and a blanket `DeprecationWarning` ignore so collection survives transient dep mismatches on newer Pythons.
- Replaced `asyncio.iscoroutinefunction` with `inspect.iscoroutinefunction` in `lib/bindings/python/tests/test_deprecated_enable_nats.py` (Python 3.16 prep).
Where should the reviewer start?
- `pyproject.toml` — `core` marker registration plus warning-filter additions.
- `.github/workflows/pr.yaml` — the new 9-job vLLM fan-out structure. Most reviewable risk lives here.
- `.ai/pytest-guidelines.md` — convention update so contributors and AI agents writing new tests follow the component-marker rule going forward.
- `tests/serve/test_vllm.py`/`test_vllm_xpu.py` (per-config marks) and `tests/fault_tolerance/migration/test_vllm.py`.
Related Issues:
Follow-up TODOs (not in this PR):
- Tag `components/src/dynamo/<backend>/tests/**` (intentionally out of OPS-4856 scope).
- Split `fault_tolerance` off into its own bucket once we measure runtime — `core ∩ fault_tolerance` is ~397 of 526 core tests, mostly the migration parametrize matrix.
Summary by CodeRabbit
Release Notes
New Features
Testing & CI
Chores