Pipeline training support + priority based rollout scheduler#358
Conversation
```python
        await self.output_buffer.add_result(row)
else:
    self._post_process_result(eval_res)
    await self.output_buffer.add_result(eval_res)
```
Bug: Results skip post-processing when output buffer disabled
The _post_process_result method is only called inside the if self.output_buffer: conditional block. When the mini batch buffer feature is not configured (i.e., output_buffer is None), results are appended to self.results but _post_process_result is never invoked. This skips critical operations: add_cost_metrics() is not called, eval_metadata.status remains stuck at "RUNNING" instead of being updated to finished/error, and results are never logged via active_logger.log().
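A minimal sketch of the kind of fix this comment suggests: hoist post-processing out of the buffer conditional so it always runs, and only branch on where the result is routed. The class and method names below are illustrative stand-ins, not the PR's actual code.

```python
import asyncio

class _Scheduler:
    """Illustrative sketch: post-process before branching on the buffer."""

    def __init__(self, output_buffer=None):
        self.output_buffer = output_buffer
        self.results = []
        self.post_processed = []

    def _post_process_result(self, res):
        # In the real scheduler this adds cost metrics, finalizes
        # eval_metadata.status, and logs via active_logger.log().
        self.post_processed.append(res)

    async def _finalize(self, res):
        self._post_process_result(res)  # always runs, buffer or not
        if self.output_buffer is not None:
            await self.output_buffer.add_result(res)
        else:
            self.results.append(res)

sched = _Scheduler(output_buffer=None)
asyncio.run(sched._finalize("row-0"))
```

With this shape, disabling the micro-batch buffer can no longer silently skip cost metrics, status updates, or logging.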
```python
full_group = self.groups_buffer.pop(task.row_index)
t = asyncio.create_task(_run_eval(full_group))
self.background_tasks.add(t)
t.add_done_callback(self.background_tasks.discard)
```
Bug: Priority scheduler missing "all" mode evaluation handling
The _process_task method only handles mode == "pointwise" (line 216) and mode == "groupwise" (line 233) for triggering evaluations. However, EvaluationTestMode includes "all" as a valid mode. When mode="all" is used with the priority scheduler enabled, rollouts complete but evaluations are never triggered, silently skipping the evaluation step entirely.
We should remove "all" mode; it's not that useful. cc @morgendave
```python
"""
Represents a single unit of work for the worker pool.
Priority tuple structure: (status, row_index)
- status: 0 = High Priority (e.g., subsequent micro-batches of an already started sample)
```
Later on, based on strategy, we could change some of the priorities.
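The `(status, row_index)` tuple works because Python compares tuples element-wise, so a lower status always wins and row index breaks ties. A small sketch with `asyncio.PriorityQueue` (the payload strings are illustrative):

```python
import asyncio

async def demo():
    # (status, row_index, payload): status 0 = follow-up micro-batches of an
    # already started sample, status 1 = starting a fresh row. Lower sorts
    # first, so in-flight samples are drained before new rows are admitted.
    q = asyncio.PriorityQueue()
    await q.put((1, 7, "start row 7"))
    await q.put((0, 3, "continue row 3"))
    await q.put((1, 2, "start row 2"))
    return [(await q.get())[2] for _ in range(3)]

order = asyncio.run(demo())
```

Changing the scheduling strategy later would then only mean changing how the status component is assigned, not the queue itself.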
```python
# Concurrency Control
self.rollout_sem = asyncio.Semaphore(max_concurrent_rollouts)
self.eval_sem = asyncio.Semaphore(max_concurrent_evaluations)
```
Are rollout_sem and eval_sem duplicates? rollout_sem is not used.
My bad, forgot to delete it. There is a global semaphore used in the rollout processor, so this one is not needed in the current design.
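The reason eval_sem stays is the rate-limit concern discussed below: a semaphore caps how many evaluator (judge) calls run at once, independent of rollout concurrency. A self-contained sketch of that pattern (the names here are illustrative):

```python
import asyncio

async def demo():
    # Cap concurrent judge calls at 2, e.g. for an LLM judge with a QPS limit.
    eval_sem = asyncio.Semaphore(2)
    active = 0
    peak = 0

    async def run_eval(i):
        nonlocal active, peak
        async with eval_sem:          # at most 2 holders at a time
            active += 1
            peak = max(peak, active)  # record the observed concurrency
            await asyncio.sleep(0.01)  # stand-in for the judge call
            active -= 1

    await asyncio.gather(*(run_eval(i) for i in range(8)))
    return peak

peak = asyncio.run(demo())
```

Even with 8 evaluations submitted, the semaphore keeps the in-flight count at its limit.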
```python
run_id = rows_to_eval[0].execution_metadata.run_id if isinstance(rows_to_eval, list) else rows_to_eval.execution_metadata.run_id
eval_res = None

async with self.eval_sem:
```
If the eval_sem max is less than the rollout concurrency, might this be blocked and time out?
Yes, but this is needed if the user is using another env or an LLM as the judge, where there is a QPS limit.
Any retry at this level?
Not in this PR. The status quo is that we don't have a default retry on the user's evaluator function. We can discuss whether we want to add it with @xzrderek in a separate thread, I think.
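If retry were added later, one common shape is a small wrapper around the evaluator with exponential backoff. This is a hypothetical sketch for the separate discussion, not something in this PR; `with_retries` and `flaky_eval` are made-up names:

```python
import asyncio

async def with_retries(fn, attempts=3, base_delay=0.05):
    # Hypothetical retry helper for a user evaluator function.
    for attempt in range(attempts):
        try:
            return await fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            await asyncio.sleep(base_delay * (2 ** attempt))  # backoff

calls = {"n": 0}

async def flaky_eval():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient judge error")
    return "score=1.0"

result = asyncio.run(with_retries(flaky_eval))
```

Whether retries belong at this level or inside the user's evaluator is exactly the open question in the thread above.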
```python
finally:
    self.queue.task_done()

async def _process_task(self, task: RolloutTask):
```
We can treat this as a first step, but this does not make the scheduling waste go away.
Currently this would do:
- inference call
- eval

Step 1 for multi-turns is wasteful; we might need some feedback to change the concurrency.
Step 2 is blocking step 1 from finishing/getting new rollouts started, which means we should optimize it first.
Synced offline: eval is running in background tasks, so it won't be blocked.
```python
batch_results.append(result_row)
# in pointwise, we start evaluation immediately
if self.mode == "pointwise":
    t = asyncio.create_task(_run_eval(result_row))
```
cc @morgendave: eval will be executed in a background task pool (actually there is no pool, they are just submitted as background tasks), so the inference won't be blocked as before.
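The pattern described here (submit eval as a task, keep a strong reference in a set, discard it on completion) can be sketched in isolation as follows; the row values and the sleep are stand-ins:

```python
import asyncio

async def demo():
    # Evals run as fire-and-forget tasks so the next inference call
    # does not wait on evaluation.
    background_tasks = set()
    done = []

    async def _run_eval(row):
        await asyncio.sleep(0.01)  # stand-in for the evaluator call
        done.append(row)

    for row in ["r0", "r1", "r2"]:
        t = asyncio.create_task(_run_eval(row))
        background_tasks.add(t)  # strong reference until completion
        t.add_done_callback(background_tasks.discard)

    # inference for later rows could proceed here; drain before shutdown
    await asyncio.gather(*background_tasks)
    return done

done = asyncio.run(demo())
```

The set is needed because `asyncio` only keeps weak references to tasks; without it, an eval task could be garbage-collected mid-flight.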
This PR adds support for 1) a priority-based rollout scheduler and 2) a micro-batch output data buffer.

Example run script:

```shell
rm -rf test2/ && PYTHONPATH=./ EP_MICRO_BATCH_OUTPUT_SIZE=2 EP_USE_PRIORITY_SCHEDULER=1 EP_NO_UPLOAD=1 python -m pytest -sv tests/pytest/test_rollout_scheduler.py::test_rollout_scheduler --ep-output-dir ./test2
```
The output dir structure looks like: (screenshot omitted)
Priority based rollout scheduler with rewrite speculation support

`in_group_microbatch_size` is added to support rolling out K samples (K < rollout_n) each time and feeding their responses into the "prediction" field of the later mini batches, to support rewrite speculation that accelerates the rollout.

MicroBatchDataBuffer
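The micro-batch buffer's behavior (accumulate per-sample results, flush a JSONL batch to disk once the batch size is reached) can be illustrated with a toy version. This is a sketch only; `MiniBatchBuffer` below is not the `MicroBatchDataBuffer` API from this PR:

```python
import json
import pathlib
import tempfile

class MiniBatchBuffer:
    """Toy sketch: hold results and flush JSONL batches of batch_size rows."""

    def __init__(self, batch_size, out_dir):
        self.batch_size = batch_size
        self.out_dir = pathlib.Path(out_dir)
        self.pending = []
        self.flushed = 0

    def add_result(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        path = self.out_dir / f"batch_{self.flushed:04d}.jsonl"
        with path.open("w") as f:
            for row in self.pending:
                f.write(json.dumps(row) + "\n")
        self.flushed += 1
        self.pending.clear()

out_dir = tempfile.mkdtemp()
buf = MiniBatchBuffer(batch_size=2, out_dir=out_dir)
for i in range(5):
    buf.add_result({"row_index": i, "score": 1.0})
buf.flush()  # flush the trailing partial batch on close
files = sorted(p.name for p in pathlib.Path(out_dir).iterdir())
```

Five results with `batch_size=2` yield two full batches plus one partial batch flushed at close, matching the "auto-close on completion" behavior described below.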
Note

Adds a priority-based rollout scheduler (with optional speculation) and a micro-batch output buffer, wired into `evaluation_test` via env flags and covered by new tests.

- `eval_protocol/pytest/priority_scheduler.py`: `PriorityRolloutScheduler` with priority-queued micro-batches, per-row run batching, background eval execution, and optional rewrite speculation (via `ENABLE_SPECULATION`). `execute_priority_rollouts(...)` entrypoint; respects `max_concurrent_rollouts` / `max_concurrent_evaluations`.
- `eval_protocol/pytest/buffer.py`: `MicroBatchDataBuffer` buffers per-sample results until all `num_runs` complete; flushes JSONL batches to disk (`output_path_template`).
- `evaluation_test.py`: `EP_USE_PRIORITY_SCHEDULER` (disabled for `MCPGymRolloutProcessor`); `EP_MICRO_BATCH_OUTPUT_SIZE` and `EP_OUTPUT_DIR` (auto-close on completion); `run_index` from `execution_metadata.extra` to populate `all_results`; invokes `postprocess` accordingly.
- `validate_signature.py`: remove the groupwise requirement for at least 2 `completion_params`.
- `tests/test_priority_scheduler.py`: unit tests for basic execution, concurrency limits, priority ordering, worker scaling, and groupwise behavior.
- `tests/pytest/test_rollout_scheduler.py`: pytest-style tests for pointwise and groupwise modes using the new scheduler.

Written by Cursor Bugbot for commit b29af96. This will update automatically on new commits.