
Conversation

@kanwang commented Jan 20, 2026

Add support for vLLM's RayPrometheusStatLogger to enable metrics collection via Ray's metrics infrastructure for the async vLLM inference engine.

Changes:

  • Add enable_ray_prometheus_stats config option (default: false)
  • Pass config through main_base.py and ray_wrapped_inference_engine.py
  • Modify AsyncVLLMInferenceEngine to create and use RayPrometheusStatLogger when enabled (requires vLLM >= 0.9.0)
  • Add unit tests for the new feature

Usage:
generator:
  async_engine: true
  enable_ray_prometheus_stats: true
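For illustration, the guarded import that the change list describes plausibly looks like the sketch below. The method name, module path, and warning text are taken from the test discussed later in this thread; the return value (a list of logger factories) and the exact wiring into the engine are assumptions, not the merged code.

def _create_ray_prometheus_stat_loggers(self):
    """Build Ray-backed Prometheus stat loggers, or return None if unsupported."""
    try:
        # The Ray wrapper lives in vLLM's v1 metrics API (vLLM >= 0.9.0).
        from vllm.v1.metrics.ray_wrappers import RayPrometheusStatLogger
    except ImportError:
        logger.warning(
            "RayPrometheusStatLogger is not available in this vLLM version; "
            "falling back to default stat logging."
        )
        return None
    # Assumption: the async engine accepts a list of stat-logger classes/factories.
    return [RayPrometheusStatLogger]

When enable_ray_prometheus_stats is false (the default), this method is presumably never called and the engine behaves as before.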


@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request adds support for vLLM's RayPrometheusStatLogger to enable metrics collection for the async vLLM engine. The changes include adding a new configuration option, passing it through the necessary components, and conditionally enabling the stat logger in AsyncVLLMInferenceEngine. The implementation correctly handles different vLLM versions by using a try-except block for the import. Unit tests have been added to verify the new functionality.

My review focuses on improving documentation clarity and test coverage. I've suggested updating a docstring to match the implementation and adding a test case for the failure scenario where the required vLLM module is not available.

import sys


class TestRayPrometheusStatLoggers:

Severity: medium

It's great that you've added tests for the success case. To make the tests more robust, consider adding a test case for the failure scenario where RayPrometheusStatLogger cannot be imported (e.g., on older vLLM versions). This would verify that the method correctly returns None and logs a warning.

You could add a test like this to TestRayPrometheusStatLoggers:

def test_create_ray_prometheus_stat_loggers_v1_unavailable(self):
    """Test that None is returned when vLLM v1 API is not available."""
    # Mapping the module name to None in sys.modules makes any import of it
    # raise ImportError (requires `import sys` and `from unittest.mock import patch`).
    with patch.dict(sys.modules, {"vllm.v1.metrics.ray_wrappers": None}):
        from skyrl_train.inference_engines.vllm.vllm_engine import AsyncVLLMInferenceEngine

        # Create a minimal instance without actually initializing the engine
        engine = object.__new__(AsyncVLLMInferenceEngine)

        with patch("skyrl_train.inference_engines.vllm.vllm_engine.logger") as mock_logger:
            result = engine._create_ray_prometheus_stat_loggers()

            assert result is None
            mock_logger.warning.assert_called_once()
            assert "not available in this vLLM version" in mock_logger.warning.call_args[0][0]

kanwang and others added 2 commits January 20, 2026 10:32
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Test verifies that _create_ray_prometheus_stat_loggers returns None
and logs a warning when vLLM v1 API is not available (e.g., on older
vLLM versions).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@CharlieFRuan self-assigned this Jan 22, 2026
@CharlieFRuan (Collaborator) commented

@kanwang, really appreciate the PR! Could you run SkyRL/format.sh for the CI? Thank you!

@kanwang (Author) commented Jan 22, 2026

> @kanwang, really appreciate the PR! Could you run SkyRL/format.sh for the CI? Thank you!

Thanks! Just did.

@CharlieFRuan (Collaborator) commented

Ah, I see the CI is still failing because vllm is not available in the CPU tests (it is only installed for the GPU tests). I can fix it myself.
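For reference, a common way to keep such tests green on runners without vLLM (a sketch of one option, not necessarily the fix that was applied) is pytest's module-level import guard:

import pytest

# Skips every test in this module when vllm cannot be imported
# (e.g., on a CPU-only CI runner); returns the imported module otherwise.
vllm = pytest.importorskip("vllm")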
