
Pipeline training support + priority based rollout scheduler #358

Merged
mayinghan merged 11 commits into main from pipeline-training-support
Dec 9, 2025
Conversation

@mayinghan (Collaborator) commented Dec 5, 2025

This PR adds support for 1) a priority-based rollout scheduler and 2) a micro-batch output data buffer.

Example run script:
rm -rf test2/ && PYTHONPATH=./ EP_MICRO_BATCH_OUTPUT_SIZE=2 EP_USE_PRIORITY_SCHEDULER=1 EP_NO_UPLOAD=1 python -m pytest -sv tests/pytest/test_rollout_scheduler.py::test_rollout_scheduler --ep-output-dir ./test2

The output directory structure looks like:
(screenshot of the output directory structure, omitted)

Priority-based rollout scheduler with rewrite speculation support.

  • Rollout tasks are scheduled so that rollouts from the same group are executed at the same time.
  • A new parameter, in_group_microbatch_size, supports rolling out K samples (K < rollout_n) at a time and feeding their responses into the "prediction" field of the later mini-batches, enabling rewrite speculation to accelerate the rollout.
  • The evaluation function runs in a non-blocking way in a background task pool.
  • This can help achieve
    • a better KV cache hit rate
    • compatibility with rewrite speculation to speed up inference.
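The in_group_microbatch_size flow described above can be sketched as follows. This is a minimal illustration, not the PR's code: the rollout function, its predictions parameter, and the way hints are threaded between micro-batches are all assumptions.

```python
def rollout(prompts, predictions=None):
    # Stand-in for an inference call. With rewrite speculation, the
    # `predictions` hints would let the server verify/rewrite a prior
    # response instead of decoding every token from scratch.
    return [f"resp({p})" for p in prompts]

def rollout_group(prompt: str, rollout_n: int, k: int) -> list:
    """Roll out `rollout_n` samples for one prompt, K at a time, feeding
    each finished micro-batch's responses to the next as speculation hints."""
    responses = []
    hints = None
    for start in range(0, rollout_n, k):
        batch = [prompt] * min(k, rollout_n - start)
        out = rollout(batch, predictions=hints)
        responses.extend(out)
        hints = out  # prior responses seed speculation for the next micro-batch
    return responses

print(len(rollout_group("q1", rollout_n=4, k=2)))  # → 4
```

Because every micro-batch reuses the same prompt prefix, scheduling them back to back also keeps that prefix hot in the KV cache.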

MicroBatchDataBuffer

  • Once a full group of rollout_n samples is ready, it is pushed to the output buffer.
  • The output buffer writes to disk once the micro-batch size condition is met; the result is directly consumed by the trainer to kick off a training step.
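A minimal synchronous sketch of the buffer-then-flush behavior described above. The class name, file naming, and internals here are illustrative assumptions; the PR's actual MicroBatchDataBuffer (in eval_protocol/pytest/buffer.py) is async and driven by num_runs and output_path_template.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

class MicroBatchBufferSketch:
    """Hold per-group samples until a full group of `rollout_n` arrives,
    then flush `micro_batch_size` complete groups to a JSONL file."""

    def __init__(self, rollout_n: int, micro_batch_size: int, out_dir: Path):
        self.rollout_n = rollout_n
        self.micro_batch_size = micro_batch_size  # complete groups per flush
        self.out_dir = out_dir
        self.pending = defaultdict(list)          # group_id -> samples so far
        self.ready_groups = []
        self.flush_count = 0

    def add_result(self, group_id: int, sample: dict) -> None:
        self.pending[group_id].append(sample)
        if len(self.pending[group_id]) == self.rollout_n:
            # full group of rollout_n samples -> move to the output side
            self.ready_groups.append(self.pending.pop(group_id))
            if len(self.ready_groups) >= self.micro_batch_size:
                self._flush()

    def _flush(self) -> Path:
        path = self.out_dir / f"microbatch_{self.flush_count}.jsonl"
        with path.open("w") as f:
            for group in self.ready_groups:
                for sample in group:
                    f.write(json.dumps(sample) + "\n")
        self.ready_groups.clear()
        self.flush_count += 1
        return path

out_dir = Path(tempfile.mkdtemp())
buf = MicroBatchBufferSketch(rollout_n=2, micro_batch_size=1, out_dir=out_dir)
buf.add_result(0, {"group": 0, "run": 0})
buf.add_result(0, {"group": 0, "run": 1})  # completes the group -> flush
print(sorted(p.name for p in out_dir.iterdir()))  # → ['microbatch_0.jsonl']
```

The trainer can then tail the output directory and kick off a training step per flushed file.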

Note

Adds a priority-based rollout scheduler (with optional speculation) and a micro-batch output buffer, wired into evaluation_test via env flags and covered by new tests.

  • Core (pytest):
    • Priority Scheduler: New eval_protocol/pytest/priority_scheduler.py
      • PriorityRolloutScheduler with priority-queued micro-batches, per-row run batching, background eval execution, and optional rewrite speculation (via ENABLE_SPECULATION).
      • execute_priority_rollouts(...) entrypoint; respects max_concurrent_rollouts/max_concurrent_evaluations.
    • Micro-batch Buffer: New eval_protocol/pytest/buffer.py
      • MicroBatchDataBuffer buffers per-sample results until all num_runs complete; flushes JSONL batches to disk (output_path_template).
    • Integration: evaluation_test.py
      • Optional scheduler path gated by EP_USE_PRIORITY_SCHEDULER (disabled for MCPGymRolloutProcessor).
      • Optional buffering via EP_MICRO_BATCH_OUTPUT_SIZE and EP_OUTPUT_DIR (auto-close on completion).
      • Aggregates run_index from execution_metadata.extra to populate all_results; invokes postprocess accordingly.
  • Validation:
    • validate_signature.py: remove the groupwise requirement of at least 2 completion_params.
  • Tests:
    • tests/test_priority_scheduler.py: unit tests for basic execution, concurrency limits, priority ordering, worker scaling, and groupwise behavior.
    • tests/pytest/test_rollout_scheduler.py: pytest-style tests for pointwise and groupwise modes using the new scheduler.
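The env-flag wiring summarized above can be illustrated with a small helper. The flag names (EP_USE_PRIORITY_SCHEDULER, EP_MICRO_BATCH_OUTPUT_SIZE, EP_OUTPUT_DIR) come from this PR, but the parsing and defaults in this sketch are assumptions, not the code in evaluation_test.py.

```python
def scheduler_config(env: dict) -> dict:
    """Sketch of how the integration might interpret the env flags named
    in this PR. Defaults here are illustrative."""
    use_priority = env.get("EP_USE_PRIORITY_SCHEDULER", "0") == "1"
    raw = env.get("EP_MICRO_BATCH_OUTPUT_SIZE")
    return {
        "use_priority_scheduler": use_priority,
        # buffering is only enabled when a micro-batch size is given
        "micro_batch_output_size": int(raw) if raw else None,
        "output_dir": env.get("EP_OUTPUT_DIR"),
    }

cfg = scheduler_config({"EP_USE_PRIORITY_SCHEDULER": "1",
                        "EP_MICRO_BATCH_OUTPUT_SIZE": "2"})
print(cfg)
```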

Written by Cursor Bugbot for commit b29af96.

@mayinghan mayinghan marked this pull request as ready for review December 5, 2025 07:47
        await self.output_buffer.add_result(row)
else:
    self._post_process_result(eval_res)
    await self.output_buffer.add_result(eval_res)
Cursor Bugbot:
Bug: Results skip post-processing when output buffer disabled

The _post_process_result method is only called inside the if self.output_buffer: conditional block. When the mini batch buffer feature is not configured (i.e., output_buffer is None), results are appended to self.results but _post_process_result is never invoked. This skips critical operations: add_cost_metrics() is not called, eval_metadata.status remains stuck at "RUNNING" instead of being updated to finished/error, and results are never logged via active_logger.log().

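The fix Bugbot describes amounts to hoisting post-processing out of the buffer conditional. This sketch is illustrative: member names mirror the snippet above, but the class itself is a stand-in, not the PR's scheduler.

```python
class ResultHandler:
    """Minimal model of the corrected control flow: post-processing runs
    for every result; only buffering stays conditional."""

    def __init__(self, output_buffer=None):
        self.output_buffer = output_buffer  # None when buffering is off
        self.results = []
        self.post_processed = []

    def _post_process_result(self, res):
        # stands in for add_cost_metrics(), the eval_metadata.status
        # update, and active_logger.log()
        self.post_processed.append(res)

    def handle(self, res):
        self._post_process_result(res)      # always runs now
        if self.output_buffer is not None:  # buffering remains optional
            self.output_buffer.append(res)
        else:
            self.results.append(res)

h = ResultHandler(output_buffer=None)
h.handle({"row": 1})
print(len(h.post_processed))  # → 1: no longer skipped when buffering is off
```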

full_group = self.groups_buffer.pop(task.row_index)
t = asyncio.create_task(_run_eval(full_group))
self.background_tasks.add(t)
t.add_done_callback(self.background_tasks.discard)
Cursor Bugbot:

Bug: Priority scheduler missing "all" mode evaluation handling

The _process_task method only handles mode == "pointwise" (line 216) and mode == "groupwise" (line 233) for triggering evaluations. However, EvaluationTestMode includes "all" as a valid mode. When mode="all" is used with the priority scheduler enabled, rollouts complete but evaluations are never triggered, silently skipping the evaluation step entirely.

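One way to close the gap Bugbot describes is to make "all" trigger both evaluation paths and to fail loudly on unknown modes rather than falling through silently. This dispatch function is a simplified illustration, not the _process_task implementation.

```python
def dispatch_eval(mode: str, triggered: list) -> None:
    """Record which evaluation paths a given mode should trigger.
    'all' runs both; an unrecognized mode raises instead of no-opping."""
    if mode in ("pointwise", "all"):
        triggered.append("pointwise-eval")
    if mode in ("groupwise", "all"):
        triggered.append("groupwise-eval")
    if not triggered:
        raise ValueError(f"unknown evaluation mode: {mode}")

calls = []
dispatch_eval("all", calls)
print(calls)  # → ['pointwise-eval', 'groupwise-eval']
```

If "all" is dropped from EvaluationTestMode instead (as the author suggests below), the raise still guards against future silent skips.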

@mayinghan (Collaborator Author):

We should remove the "all" mode; it's not that useful. cc @morgendave

"""
Represents a single unit of work for the worker pool.
Priority tuple structure: (status, row_index)
- status: 0 = High Priority (e.g., subsequent micro-batches of an already started sample)
Collaborator:

Later on, based on strategy, we could change some of the priorities.
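Python compares tuples lexicographically, which is what makes the (status, row_index) encoding in the docstring above work with a heap: status dominates, and row_index breaks ties. A quick demonstration (the task payloads are made up):

```python
import heapq

# (status, row_index, payload): status 0 = high priority (a follow-up
# micro-batch of an already started group), status 1 = a fresh group.
tasks = [
    (1, 2, "new-group-row2"),
    (0, 5, "started-group-row5"),
    (1, 1, "new-group-row1"),
]
heapq.heapify(tasks)
order = [heapq.heappop(tasks)[2] for _ in range(3)]
print(order)  # → ['started-group-row5', 'new-group-row1', 'new-group-row2']
```

Swapping in a different priority key (per the reviewer's point) only requires changing how the tuple is built.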


# Concurrency Control
self.rollout_sem = asyncio.Semaphore(max_concurrent_rollouts)
self.eval_sem = asyncio.Semaphore(max_concurrent_evaluations)
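A small demonstration of why a semaphore like eval_sem matters when the evaluator has a QPS limit: with Semaphore(2), at most two evaluations overlap no matter how many are launched at once. The task count and sleep duration are illustrative.

```python
import asyncio

async def fake_eval(sem: asyncio.Semaphore, active: list, peak: list) -> None:
    async with sem:
        active[0] += 1
        peak[0] = max(peak[0], active[0])
        await asyncio.sleep(0.01)  # stands in for a judge/LLM call
        active[0] -= 1

async def main() -> int:
    sem = asyncio.Semaphore(2)
    active, peak = [0], [0]
    # launch 8 evals at once; the semaphore admits only 2 at a time
    await asyncio.gather(*(fake_eval(sem, active, peak) for _ in range(8)))
    return peak[0]

print(asyncio.run(main()))  # → 2
```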
Collaborator:

Are rollout_sem and eval_sem duplicates? rollout_sem is not used.

@mayinghan (Collaborator Author):

My bad, forgot to delete it. There is a global semaphore used in the rollout processor, so this one is not needed in the current design.

run_id = rows_to_eval[0].execution_metadata.run_id if isinstance(rows_to_eval, list) else rows_to_eval.execution_metadata.run_id
eval_res = None

async with self.eval_sem:
Collaborator:

If eval_sem's limit is lower than the rollout concurrency, might this block and time out?

@mayinghan (Collaborator Author):

Yes, but this is needed if the user is using another environment or an LLM as the judge, where there is a QPS limit.

Collaborator:

Any retry at this level?

@mayinghan (Collaborator Author):

Not in this PR. The status quo is that we don't have a default retry around the user's evaluator function. We can discuss whether we want one with @xzrderek in a separate thread, I think.

finally:
    self.queue.task_done()

async def _process_task(self, task: RolloutTask):
Collaborator:

We can treat this as a first step, but it does not make the scheduling waste go away. Currently this does:

  1. inference call
  2. eval

Step 1 is wasteful for multi-turn rollouts; we might need some feedback to adjust concurrency. Step 2 blocks step 1 from finishing and from getting new rollouts started, which means we should optimize it first.

@mayinghan (Collaborator Author):

Synced offline: eval runs in background tasks, so it won't be blocked.

batch_results.append(result_row)
# in pointwise, we start evaluation immediately
if self.mode == "pointwise":
    t = asyncio.create_task(_run_eval(result_row))
@mayinghan (Collaborator Author):

cc @morgendave: eval is executed in a background task pool (actually there is no pool; they are just submitted as background tasks), so inference won't be blocked as before.
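The background-task idiom the author describes, which also appears in the diff earlier (create_task plus add_done_callback(discard)), can be shown in isolation. Keeping a strong reference in a set prevents the event loop from garbage-collecting an in-flight task; the eval body here is a stand-in.

```python
import asyncio

async def run_eval(results: list, i: int) -> None:
    await asyncio.sleep(0)  # yield control, as a real eval call would
    results.append(i)

async def main() -> list:
    results: list = []
    background: set = set()
    for i in range(3):
        t = asyncio.create_task(run_eval(results, i))
        background.add(t)                        # strong reference while running
        t.add_done_callback(background.discard)  # drop it once finished
    # the scheduler loop would keep pulling rollout tasks here instead of
    # awaiting each eval; we gather only to let the demo finish cleanly
    await asyncio.gather(*background)
    return sorted(results)

print(asyncio.run(main()))  # → [0, 1, 2]
```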

@mayinghan mayinghan requested a review from morgendave December 9, 2025 06:10
@mayinghan mayinghan merged commit 8219c44 into main Dec 9, 2025
15 of 16 checks passed
@mayinghan mayinghan deleted the pipeline-training-support branch December 9, 2025 23:09
