
Conversation

@pcmoritz (Collaborator) commented Jan 13, 2026

The engine can e.g. be started with

uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-4B --backend "skyrl_train"

and then you can e.g. run

uv run --with wandb --with tinker sl_loop.py base_url=http://localhost:8000 model_name=Qwen/Qwen3-8B lora_rank=1

@pcmoritz pcmoritz added the tx label Jan 13, 2026
@gemini-code-assist bot (Contributor) left a comment
Code Review

This pull request introduces a new SkyRL-train backend for supervised training. The changes include updating project dependencies in pyproject.toml and adding the new backend implementation in skyrl-tx/tx/tinker/backends/skyrl_train.py. While this is a good starting point for the new backend, my review has identified several issues that need to be addressed. The most critical issue is in the forward_backward method, which is currently a stub and does not perform a backward pass or return actual losses, preventing any training from occurring. Other significant issues include the use of hardcoded paths and hyperparameters, potentially incorrect token padding, and breaking encapsulation by accessing private members of a library class. Addressing these points will be crucial for the backend to be functional and maintainable.
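The review's central complaint is that forward_backward is a stub that returns placeholder data. The contract it should satisfy can be illustrated with a minimal sketch; the dataclass and function names below are assumptions for illustration, not skyrl-train's actual API:

```python
from dataclasses import dataclass


@dataclass
class ForwardBackwardResult:
    """Hypothetical result shape: real per-token values, not placeholders."""
    logprobs: list[float]          # per-token log-probabilities from the model
    elementwise_loss: list[float]  # per-token loss values


def forward_backward_result(per_token_logprobs: list[float]) -> ForwardBackwardResult:
    # A stub would return zeros; a functional backend derives the loss from
    # the model's log-probabilities (for cross-entropy, loss = -logprob).
    return ForwardBackwardResult(
        logprobs=per_token_logprobs,
        elementwise_loss=[-lp for lp in per_token_logprobs],
    )
```

In the real backend these values would come out of the actors' backward pass rather than being computed client-side.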

Comment on lines 192 to 197
ray.get([actor.save_checkpoint.remote(output_path) for actor in self._actor_group._actor_handlers])

def load_checkpoint(self, checkpoint_path, model_id: str) -> None:
if model_id != self._model_id:
raise ValueError(f"Model {model_id} not found")
ray.get([actor.load_checkpoint.remote(Path(checkpoint_path)) for actor in self._actor_group._actor_handlers])

Severity: medium

Accessing the private member _actor_handlers of PPORayActorGroup breaks encapsulation and makes the code dependent on the internal implementation of the skyrl-train library. This could lead to breakages if the library is updated. It would be more robust to use a public API from PPORayActorGroup for this purpose, or request one if it doesn't exist.
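One way to address this would be a public broadcast helper on the actor group itself. This is a hypothetical sketch of such an API (the class body, `run_on_all_actors`, and the constructor shape are assumptions, not skyrl-train's actual implementation):

```python
class PPORayActorGroup:
    """Sketch: expose a public method so callers never touch _actor_handlers."""

    def __init__(self, actor_handlers):
        self._actor_handlers = actor_handlers

    def run_on_all_actors(self, method_name: str, *args, **kwargs):
        """Invoke `method_name` on every actor; return the list of Ray futures."""
        return [
            getattr(actor, method_name).remote(*args, **kwargs)
            for actor in self._actor_handlers
        ]
```

The backend's checkpoint code could then call `ray.get(group.run_on_all_actors("save_checkpoint", output_path))` without depending on the library's internals.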

@vercel bot commented Jan 25, 2026

@pcmoritz is attempting to deploy a commit to the "Tyler's projects" team on Vercel.

A member of the Team first needs to authorize it.

@pcmoritz pcmoritz changed the title [tx] [WIP] Add SkyRL-train backend [tx] Add SkyRL-train backend Jan 25, 2026
@pcmoritz pcmoritz changed the title [tx] Add SkyRL-train backend [tx] Add experimental SkyRL-train backend that supports SFT Jan 25, 2026
tyler-griggs added a commit that referenced this pull request Jan 26, 2026
…ropy")

Enables supervised fine-tuning using the Tinker-compatible API.

Changes:
- ppo_utils.py: Add CROSS_ENTROPY loss type and cross_entropy_loss() function
- worker.py: Add SFT code path that returns per-token logprobs and elementwise_loss
- worker_dispatch.py: Add loss_fn and loss_fn_config params to forward_backward()
- dispatch.py: Update MeshDispatch to pass through kwargs (loss_fn, loss_fn_config)
- replay_buffer.py: Make action_log_probs optional in Experience
- worker_utils.py: Use .get() for optional fields; handle non-scalar metrics

New:
- examples/sft/: Minimal SFT example demonstrating the API

This enables PR #871 (SkyRL-train backend for Tinker) to return proper
per-token values instead of placeholder data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
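The per-token cross-entropy loss the commit adds to ppo_utils.py can be illustrated with a minimal pure-Python sketch. This is an assumption-laden illustration of the math only; the actual `cross_entropy_loss()` in skyrl-train presumably operates on batched torch tensors:

```python
import math


def cross_entropy_loss(logits: list[list[float]], targets: list[int]) -> list[float]:
    """Per-token cross-entropy: -log softmax(logits)[target] at each position.

    logits: one [vocab]-sized list of floats per token position.
    targets: one target token id per position.
    Returns the per-token losses (the elementwise_loss the commit describes).
    """
    losses = []
    for row, tgt in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # log-partition
        losses.append(log_z - row[tgt])  # -log p(target)
    return losses
```

Returning the losses per token (rather than a single scalar) is what lets the Tinker-compatible API hand back real per-token values instead of placeholders.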