Conversation

@kouroshHakha (Contributor)

Summary: Adds RemoteInferenceClient, a lightweight, fully serializable HTTP client that wraps inference server APIs. This client replaces the old InferenceEngineInterface for HTTP-based inference and can work with any HTTP-compatible inference backend (vLLM, sglang-router, Ray Serve LLM, etc.).

Key Features:

  • Serializable: Can be pickled and passed between Ray actors/processes
  • Two URL types: proxy_url for the data plane (round-robin / sticky-session router), server_urls for the control plane (fan-out)
  • Data plane: generate(), chat_completion(), completion(), tokenize(), detokenize()
  • Control plane: pause(), resume(), sleep(), wake_up(), reset_prefix_cache()
  • Weight sync: init_weight_transfer(), update_weights(), finalize_weight_update()
  • PauseMode enum: Forward-compatible with vLLM RFC #32103 pause modes
  • Built-in retry on abort: Handles stop_reason="abort" during weight sync
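
A minimal usage sketch of the features listed above. The class and method names come from this summary; the constructor fields, argument names, and payloads are assumptions for illustration, not the actual signatures (those live in `remote_inference_client.py`):

```python
# Hypothetical sketch; payload shapes and constructor fields are assumed.
import asyncio

from skyrl_train.inference_servers.remote_inference_client import RemoteInferenceClient

async def main() -> None:
    client = RemoteInferenceClient(
        proxy_url="http://router:8000",  # data plane: round-robin / sticky sessions
        server_urls=[                    # control plane: fanned out to every server
            "http://server-0:8001",
            "http://server-1:8001",
        ],
    )

    # Data plane calls are routed through the proxy.
    await client.tokenize("hello world")
    # Requests aborted during weight sync (stop_reason="abort") are retried.
    await client.chat_completion(messages=[{"role": "user", "content": "hi"}])

    # Control plane + weight sync, fanned out to all server_urls.
    await client.pause()
    await client.init_weight_transfer(...)  # payload elided; see the client module
    await client.update_weights(...)
    await client.finalize_weight_update()
    await client.resume()

asyncio.run(main())
```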

Comparison vs InferenceEngineInterface + InferenceEngineClient:

  • Serializable - Just URLs, no Ray actors/tokenizers/thread events
  • No local tokenizer - Uses /tokenize endpoint instead
  • Server-side routing - Router handles session routing via X-Session-ID header
  • Simplified parallelism - Single get_world_size() vs separate tp_size(), pp_size(), dp_size()
  • No ABC hierarchy - Simple dataclass with async methods
  • Backend-agnostic - Works with any HTTP server (vLLM, sglang, Ray Serve LLM)
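
As a sketch of the serializability point: because the client is plain data (just URLs), it can be pickled and handed to Ray actors by value. `RolloutWorker` is an invented consumer here, and the `proxy_url` attribute access assumes the dataclass field named above:

```python
import pickle

import ray

from skyrl_train.inference_servers.remote_inference_client import RemoteInferenceClient

@ray.remote
class RolloutWorker:  # hypothetical consumer actor
    def __init__(self, client: RemoteInferenceClient):
        # The client crosses the process boundary as plain data; no Ray
        # handles, tokenizers, or thread events need to survive the trip.
        self.client = client

client = RemoteInferenceClient(proxy_url="http://router:8000", server_urls=[])
assert pickle.loads(pickle.dumps(client)).proxy_url == client.proxy_url
workers = [RolloutWorker.remote(client) for _ in range(4)]
```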

Files Added:

  • skyrl_train/inference_servers/remote_inference_client.py - The client implementation
  • tests/cpu/inference_servers/test_remote_inference_client.py - Unit tests

Next: Integration with training code via setup_inference() hook in BasePPOExp.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha changed the title from "PR 1/N: Inference Server Refactor -- RemoteInferenceClient" to "[skyrl-train][refactor] 2/N Inference Server Refactor -- RemoteInferenceClient" on Jan 20, 2026

@gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the inference engine by replacing the InferenceEngineInterface with a new RemoteInferenceClient for HTTP-based inference, introducing new modules for common utilities, protocols, server groups, and a robust router. While the changes are well-structured and include comprehensive unit and GPU CI tests, they introduce significant security risks. The most critical issue is the use of pickle.loads in the vLLM worker extension, which provides a direct path to Remote Code Execution (RCE). Additionally, the lack of authentication on sensitive control-plane and weight-synchronization endpoints in both the router and the server actor exposes the cluster to unauthorized control and potential weight hijacking. These security concerns must be addressed before deployment in untrusted network environments.


```python
# Unpickle init_info to restore the original object type
assert isinstance(init_info, bytes), f"Expected bytes, got {type(init_info).__name__}"
init_info = pickle.loads(init_info)
```

security-critical, critical

The init_weight_update_communicator method uses pickle.loads() to deserialize init_info. This is a critical security vulnerability because pickle is inherently insecure and can be exploited to execute arbitrary code during deserialization. An attacker who can trigger this RPC call with a malicious payload can achieve Remote Code Execution (RCE) on all vLLM workers.

Recommendation: Replace pickle with a secure serialization format such as JSON. Since BroadcastInitInfo is a dataclass, it can be easily converted to and from a JSON-compatible dictionary.
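
A minimal sketch of the JSON alternative, assuming BroadcastInitInfo is a flat dataclass with JSON-compatible fields (the field names here are invented for illustration; the real dataclass lives in skyrl_train.weight_sync):

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class BroadcastInitInfo:
    # Illustrative fields only.
    master_address: str
    master_port: int
    world_size: int

def serialize(info: BroadcastInitInfo) -> bytes:
    # JSON carries only data, so deserialization cannot execute code.
    return json.dumps(asdict(info)).encode()

def deserialize(payload: bytes) -> BroadcastInitInfo:
    return BroadcastInitInfo(**json.loads(payload))
```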


```python
# Unpickle request to restore the original object type
assert isinstance(request, bytes), f"Expected bytes, got {type(request).__name__}"
request = pickle.loads(request)
```

security-critical, critical

The load_weights method uses pickle.loads() to deserialize the request object. Similar to the vulnerability in init_weight_update_communicator, this allows for arbitrary code execution via a crafted pickle stream.

Recommendation: Use a secure serialization format like JSON instead of pickle.
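
If pickle must be kept in the short term, a restricted Unpickler that only admits the expected classes narrows (though does not eliminate) the attack surface. A sketch using the standard-library pattern; the allow-list entry is an assumption about the module path:

```python
import io
import pickle

ALLOWED = {
    # Restrict to exactly the classes this RPC expects (path assumed here).
    ("skyrl_train.weight_sync", "BroadcastWeightUpdateRequest"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Refuse to resolve any global not on the allow-list.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")
        return super().find_class(module, name)

def safe_loads(payload: bytes):
    return RestrictedUnpickler(io.BytesIO(payload)).load()
```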

Comment on lines +113 to +145
```python
def _build_app(self) -> FastAPI:
    """Build the FastAPI app with proxy routes."""
    app = FastAPI(
        title="SkyRL Inference Router",
        docs_url=None,
        redoc_url=None,
        openapi_url=None,
    )

    @app.get("/health")
    async def health():
        """Router health check (doesn't proxy to backends)."""
        return {"status": "healthy"}

    @app.get("/servers")
    async def list_servers():
        """Return list of server URLs."""
        return {"servers": self._server_urls}

    @app.get("/get_server_info")
    async def get_server_info():
        """Fetch server info from all servers, return mapping."""
        return await self._fan_out_get("/get_server_info")

    # Catch-all: proxy everything else to backends
    @app.api_route(
        "/{path:path}",
        methods=["GET", "POST", "PUT", "DELETE", "PATCH", "OPTIONS", "HEAD"],
    )
    async def proxy(request: Request, path: str):
        return await self._proxy_request(request, f"/{path}")

    return app
```

security-high, high

The InferenceRouter exposes several sensitive control plane routes (e.g., /pause, /resume, /init_weight_transfer, /update_weights) without any authentication or authorization mechanism. This allows any user with network access to the router to disrupt the inference service or potentially hijack model weights by pointing workers to a malicious master node.

Recommendation: Implement an authentication mechanism, such as API keys or OAuth2, and ensure the router validates credentials before processing or proxying requests to these sensitive endpoints.
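
One possible shape for this, sketched with FastAPI's built-in API-key helpers. The header name, environment variable, and the /pause handler body are assumptions, not the PR's actual implementation:

```python
import os
import secrets

from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

def require_api_key(key: str | None = Security(api_key_header)) -> None:
    # The key is distributed out-of-band to trusted training components.
    expected = os.environ.get("SKYRL_ROUTER_API_KEY", "")
    if not (key and expected and secrets.compare_digest(key, expected)):
        raise HTTPException(status_code=403, detail="invalid or missing API key")

app = FastAPI()

@app.post("/pause", dependencies=[Depends(require_api_key)])
async def pause():
    return {"status": "paused"}
```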

Comment on lines +298 to +353
"""Add custom SkyRL endpoints to the FastAPI app."""
engine = self._engine

@app.get("/get_server_info")
async def _get_server_info():
"""Return server parallelism info."""
return self._get_extended_server_info()

# TODO (Kourosh): After https://github.com/vllm-project/vllm/pull/
# 31943/ is merged, use the native API.
@app.post("/init_weight_transfer")
async def _init_weight_transfer(request: Request):
"""Initialize weight sync process group."""
from skyrl_train.weight_sync import BroadcastInitInfo

data = await request.json()
init_info = BroadcastInitInfo(**data).for_engine(
engine_index=self._server_idx,
tp_size=self._cli_args.tensor_parallel_size,
pp_size=self._cli_args.pipeline_parallel_size,
)
pickled_init_info = pickle.dumps(init_info)

await engine.collective_rpc(
"init_weight_update_communicator",
args=(pickled_init_info,),
)
return {"status": "ok"}

@app.post("/update_weights")
async def _update_weights(request: Request):
"""Update model weights via NCCL broadcast."""
from skyrl_train.weight_sync import BroadcastWeightUpdateRequest

data = await request.json()
weight_request = BroadcastWeightUpdateRequest(**data)
pickled_request = pickle.dumps(weight_request)

await engine.collective_rpc(
"load_weights",
args=(pickled_request,),
)
return {"status": "ok"}

@app.post("/finalize_weight_update")
async def _finalize_weight_update(request: Request):
"""
Finalize weight update - post-processing hook.

Currently a no-op, reserved for future use e.g. Quantization
See https://github.com/vllm-project/vllm/issues/31848 for more
details.
"""
# No-op for now - placeholder for future post-processing
return {"status": "ok"}


security-high, high

The VLLMServerActor adds custom endpoints for weight synchronization and cluster management directly to the FastAPI application without any authentication. These endpoints trigger sensitive operations, including the insecure pickle.loads calls in the workers.

Recommendation: Protect these custom endpoints with an authentication layer (e.g., FastAPI dependencies or middleware) to ensure only authorized training components can trigger these operations.
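
A middleware variant that guards only the custom routes might look like the following. The path set, header name, and environment variable are assumptions for illustration:

```python
import os
import secrets

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

SENSITIVE_PATHS = {"/init_weight_transfer", "/update_weights", "/finalize_weight_update"}

app = FastAPI()

@app.middleware("http")
async def guard_sensitive_routes(request: Request, call_next):
    # Only the weight-sync endpoints require the shared key; everything
    # else (e.g. /health, generation routes) passes through untouched.
    if request.url.path in SENSITIVE_PATHS:
        key = request.headers.get("X-API-Key", "")
        expected = os.environ.get("SKYRL_SERVER_API_KEY", "")
        if not (key and expected and secrets.compare_digest(key, expected)):
            return JSONResponse(status_code=403, content={"detail": "forbidden"})
    return await call_next(request)
```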

```python
strategy_cls = init_info.strategy_type()

if hasattr(self, "_weight_receiver") and self._weight_receiver is not None:
    # TODO(haochen): we should get rid of this flag and override existing receiver.
```

medium

The check if hasattr(self, "_weight_receiver") and self._weight_receiver is not None: can be simplified to if self._weight_receiver is not None:, as _weight_receiver is initialized to None in __init__ or will be set by create_receiver.

```python
if self._weight_receiver is not None:
```

@CharlieFRuan self-assigned this Jan 21, 2026