Pipeline training support + priority based rollout scheduler #358
Merged
11 commits
- d9ab3d4: add (mayinghan)
- 2865b79: add priority rolluot scheduler (mayinghan)
- 37e0210: groupwise (mayinghan)
- ff329d8: add (mayinghan)
- b556f4e: add (mayinghan)
- 5fc935c: fix (mayinghan)
- 9219921: put it back (mayinghan)
- fae3150: add (mayinghan)
- f785514: add postprocess (mayinghan)
- 81fbc70: resolve comments and fix bugs (mayinghan)
- b29af96: fix (mayinghan)
New file, 82 lines (`@@ -0,0 +1,82 @@`):

```python
import asyncio
import os
from collections import defaultdict
from typing import List, Dict

from eval_protocol.models import EvaluationRow


class MicroBatchDataBuffer:
    """
    Buffers evaluation results and writes them to disk in minibatches.
    A sample is treated as complete, and ready to be flushed to disk, only
    after all of its runs have finished.
    """

    def __init__(self, num_runs: int, batch_size: int, output_path_template: str):
        self.num_runs = num_runs
        self.batch_size = batch_size
        self.output_path_template = output_path_template
        self.pending_samples: Dict[str, List[EvaluationRow]] = defaultdict(list)  # row_id -> runs received so far
        self.completed_samples_buffer: List[List[EvaluationRow]] = []  # one inner list per completed sample
        self.batch_index = 0
        self.lock = asyncio.Lock()

    async def add_result(self, row: EvaluationRow):
        """
        Add a single evaluation result.
        Coroutine-safe: guarded by an asyncio.Lock, which does not protect
        against concurrent calls from multiple OS threads.
        """
        async with self.lock:
            row_id = row.input_metadata.row_id
            if not row_id:
                # Should not happen in a valid EP workflow: a unique row_id is
                # required to group the runs of a sample together.
                return

            self.pending_samples[row_id].append(row)

            if len(self.pending_samples[row_id]) >= self.num_runs:
                # Sample completed (all runs finished)
                completed_rows = self.pending_samples.pop(row_id)
                self.completed_samples_buffer.append(completed_rows)

            if len(self.completed_samples_buffer) >= self.batch_size:
                await self._flush_unsafe()

    async def _flush_unsafe(self):
        """
        Not lock-protected: assumes the caller already holds self.lock.
        """
        if not self.completed_samples_buffer:
            return

        if "{index}" in self.output_path_template:
            # One file per batch
            output_path = self.output_path_template.format(index=self.batch_index)
            mode = "w"
        else:
            output_path = self.output_path_template
            mode = "a"  # Append to a single file if there is no index placeholder

        # Ensure the parent directory exists
        os.makedirs(os.path.dirname(os.path.abspath(output_path)), exist_ok=True)

        # Write flattened rows: one JSON line per run
        with open(output_path, mode) as f:
            for sample_rows in self.completed_samples_buffer:
                for row in sample_rows:
                    f.write(row.model_dump_json() + "\n")

        self.completed_samples_buffer = []
        self.batch_index += 1

    async def close(self):
        """
        Flush any remaining samples, including incomplete ones, to disk.
        """
        async with self.lock:
            # Also flush pending (incomplete) samples to avoid data loss
            if self.pending_samples:
                for rows in self.pending_samples.values():
                    self.completed_samples_buffer.append(rows)
                self.pending_samples.clear()

            if self.completed_samples_buffer:
                await self._flush_unsafe()
```
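To see where batch boundaries fall, the grouping rule in `add_result` and `close` can be replayed without any I/O. The sketch below is a hypothetical helper, `plan_batches`, that is not part of this PR. It tracks only a run count per `row_id` instead of `EvaluationRow` objects, an assumption made to keep it self-contained.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_batches(
    row_ids: List[str], num_runs: int, batch_size: int
) -> List[List[Tuple[str, int]]]:
    """Hypothetical replay of MicroBatchDataBuffer's grouping rule.

    Each input element is one finished run, identified by its row_id.
    Returns the batches that would be flushed, each a list of
    (row_id, number_of_runs_written) pairs.
    """
    pending: Dict[str, int] = defaultdict(int)  # row_id -> runs seen so far
    completed: List[Tuple[str, int]] = []
    batches: List[List[Tuple[str, int]]] = []

    for row_id in row_ids:
        if not row_id:
            continue  # rows without a row_id are silently dropped
        pending[row_id] += 1
        if pending[row_id] >= num_runs:
            # All runs arrived: the sample moves to the completed buffer
            completed.append((row_id, pending.pop(row_id)))
        if len(completed) >= batch_size:
            batches.append(completed)  # corresponds to one flush
            completed = []

    # close(): incomplete samples are appended in arrival order, then flushed
    completed.extend(pending.items())
    if completed:
        batches.append(completed)
    return batches
```

For example, with `num_runs=2` and `batch_size=2`, the run sequence `a, b, a, c, b, c, d` flushes `[a, b]` as soon as the second run of `b` arrives. On close it writes `[c (2 runs), d (1 run)]`, so the incomplete sample `d` ends up in the same batch as a complete one. Readers of the output therefore cannot rely on batch membership alone to tell whether a sample is complete.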
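Each flushed file stores one JSON object per run with no explicit sample boundary, so a consumer that needs whole samples has to regroup lines by `row_id`. The following is a minimal sketch with a hypothetical helper name, `regroup_by_row_id`. It assumes that `model_dump_json()` on `EvaluationRow` nests the id under `input_metadata.row_id`, mirroring the attribute path the buffer reads. Verify this against real output before relying on it.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def regroup_by_row_id(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group flattened JSONL rows back into per-sample lists.

    Assumption: each line is a JSON object whose row_id lives at
    record["input_metadata"]["row_id"].
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        groups[record["input_metadata"]["row_id"]].append(record)
    return dict(groups)  # keys keep first-seen order
```

An open file object iterates line by line, so `regroup_by_row_id(open(path))` works directly. A group with fewer than `num_runs` entries indicates an incomplete sample that was written by `close()`.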