[train] Enable RayPrometheusStatLogger for async vLLM engine #900

kanwang · 2026-01-20T14:55:14Z

Add support for vLLM's RayPrometheusStatLogger to enable metrics collection via Ray's metrics infrastructure for the async vLLM inference engine.

Changes:

Add enable_ray_prometheus_stats config option (default: false)
Pass config through main_base.py and ray_wrapped_inference_engine.py
Modify AsyncVLLMInferenceEngine to create and use RayPrometheusStatLogger when enabled (requires vLLM >= 0.9.0)
Add unit tests for the new feature

Usage:
generator: async_engine: true enable_ray_prometheus_stats: true

Add support for vLLM's RayPrometheusStatLogger to enable metrics collection via Ray's metrics infrastructure for the async vLLM inference engine. Changes: - Add `enable_ray_prometheus_stats` config option (default: false) - Pass config through main_base.py and ray_wrapped_inference_engine.py - Modify AsyncVLLMInferenceEngine to create and use RayPrometheusStatLogger when enabled (requires vLLM >= 0.9.0) - Add unit tests for the new feature Usage: generator: async_engine: true enable_ray_prometheus_stats: true Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request adds support for vLLM's RayPrometheusStatLogger to enable metrics collection for the async vLLM engine. The changes include adding a new configuration option, passing it through the necessary components, and conditionally enabling the stat logger in AsyncVLLMInferenceEngine. The implementation correctly handles different vLLM versions by using a try-except block for the import. Unit tests have been added to verify the new functionality.

My review focuses on improving documentation clarity and test coverage. I've suggested updating a docstring to match the implementation and adding a test case for the failure scenario where the required vLLM module is not available.

skyrl-train/skyrl_train/inference_engines/vllm/vllm_engine.py

gemini-code-assist · 2026-01-20T14:58:56Z

skyrl-train/tests/cpu/inf_engines/vllm/test_ray_prometheus_stats.py

+import sys
+
+
+class TestRayPrometheusStatLoggers:


It's great that you've added tests for the success case. To make the tests more robust, consider adding a test case for the failure scenario where RayPrometheusStatLogger cannot be imported (e.g., on older vLLM versions). This would verify that the method correctly returns None and logs a warning.

You could add a test like this to TestRayPrometheusStatLoggers:

def test_create_ray_prometheus_stat_loggers_v1_unavailable(self): """Test that None is returned when vLLM v1 API is not available.""" # By setting the module to None in sys.modules, the import will fail. with patch.dict(sys.modules, {"vllm.v1.metrics.ray_wrappers": None}): from skyrl_train.inference_engines.vllm.vllm_engine import AsyncVLLMInferenceEngine # Create a minimal instance without actually initializing the engine engine = object.__new__(AsyncVLLMInferenceEngine) with patch("skyrl_train.inference_engines.vllm.vllm_engine.logger") as mock_logger: result = engine._create_ray_prometheus_stat_loggers() assert result is None mock_logger.warning.assert_called_once() assert "not available in this vLLM version" in mock_logger.warning.call_args[0][0]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Test verifies that _create_ray_prometheus_stat_loggers returns None and logs a warning when vLLM v1 API is not available (e.g., on older vLLM versions). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CharlieFRuan · 2026-01-22T05:35:27Z

@kanwang really appreciate the PR! Could you run the SkyRL/format.sh for the CI? Thank you!

kanwang · 2026-01-22T18:29:00Z

@kanwang really appreciate the PR! Could you run the SkyRL/format.sh for the CI? Thank you!

Thanks! just did.

CharlieFRuan · 2026-01-22T20:44:10Z

Ah I see the CI still failing due to vllm not being available in the CPU test (only available in GPU test). I can fix it myself

gemini-code-assist bot reviewed Jan 20, 2026

View reviewed changes

kanwang and others added 2 commits January 20, 2026 10:32

Update skyrl-train/skyrl_train/inference_engines/vllm/vllm_engine.py

6213a78

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Add test for RayPrometheusStatLogger import failure scenario

34cee4b

Test verifies that _create_ray_prometheus_stat_loggers returns None and logs a warning when vLLM v1 API is not available (e.g., on older vLLM versions). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

CharlieFRuan self-assigned this Jan 22, 2026

fix format

78952e9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[train] Enable RayPrometheusStatLogger for async vLLM engine #900

[train] Enable RayPrometheusStatLogger for async vLLM engine #900

kanwang commented Jan 20, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

gemini-code-assist bot Jan 20, 2026

Uh oh!

CharlieFRuan commented Jan 22, 2026

Uh oh!

kanwang commented Jan 22, 2026

Uh oh!

CharlieFRuan commented Jan 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[train] Enable RayPrometheusStatLogger for async vLLM engine #900

Are you sure you want to change the base?

[train] Enable RayPrometheusStatLogger for async vLLM engine #900

Conversation

kanwang commented Jan 20, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist bot Jan 20, 2026

Choose a reason for hiding this comment

Uh oh!

CharlieFRuan commented Jan 22, 2026

Uh oh!

kanwang commented Jan 22, 2026

Uh oh!

CharlieFRuan commented Jan 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants