Implements "toggle" from Kimi K2.5 #1676

Draft
finbarrtimbers wants to merge 10 commits into main from finbarr/toggle

@finbarrtimbers
Collaborator

No description provided.

@finbarrtimbers changed the title from Implements "toggle to Implements "toggle" from Kimi K2.5 on May 11, 2026
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request implements the Toggle reward-shaping heuristic from the Kimi K2.5 paper, which alternates between standard scaling and length-penalty phases to control response lengths during training. The implementation includes a new ToggleBudgetTracker class, configuration parameters in StreamingDataLoaderConfig, and integration into the DataPreparationActor. Feedback focuses on critical performance issues in the ToggleBudgetTracker, specifically the inefficient and redundant calculation of percentiles using np.percentile on growing lists within batch loops. It is recommended to cache these values or use streaming percentile estimators to avoid significant training degradation.

Comment on lines +1280 to +1284
def budget(self, dataset) -> float | None:
    lengths = self.lengths_per_dataset.get(self._key(dataset))
    if not lengths:
        return None
    return float(np.percentile(lengths, self.percentile))

high

The budget method calls np.percentile on a list that grows indefinitely as training progresses. This operation is $O(N \log N)$ (or $O(N)$ with selection) and is currently called for every sample in every batch via the list comprehension in maybe_apply. This will cause significant performance degradation in the DataPreparationActor thread as the history of correct response lengths grows. Consider caching the budget values once per training step or using a more efficient streaming percentile estimator.
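As a rough illustration of the second option, the sketch below keeps the length history sorted as values arrive, so a percentile query becomes an index lookup rather than a fresh pass over the whole list. It is not a true constant-memory streaming estimator: memory still grows with the history, and each insertion shifts list elements. The class name `SortedLengthHistory` and its methods are hypothetical and do not appear in the PR; the interpolation is assumed to match the default linear method of `np.percentile`.

```python
from __future__ import annotations

import bisect


class SortedLengthHistory:
    """Hypothetical helper: keeps response lengths sorted for cheap percentile queries."""

    def __init__(self) -> None:
        self._sorted: list[float] = []

    def add(self, length: float) -> None:
        # Binary search for the slot, then insert in place to keep the list sorted.
        bisect.insort(self._sorted, length)

    def percentile(self, q: float) -> float | None:
        """Return the q-th percentile (0..100) using linear interpolation."""
        n = len(self._sorted)
        if n == 0:
            return None
        pos = (n - 1) * q / 100.0  # fractional index into the sorted list
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return self._sorted[lo] + frac * (self._sorted[hi] - self._sorted[lo])
```

With this layout the per-sample query cost no longer depends on the history size, at the price of a slower `add`. Whether that trade-off helps depends on how often lengths are recorded relative to how often budgets are read.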

Comment on lines +1306 to +1313
for dataset in datasets:
    key = self._key(dataset)
    if key in seen_keys:
        continue
    seen_keys.add(key)
    budget_value = self.budget(dataset)
    if budget_value is not None:
        metrics[f"toggle/budget/{'|'.join(key)}"] = budget_value

high

This loop iterates over all samples in the batch to populate metrics, calling self.budget(dataset) for each one. Since self.budget performs an expensive percentile calculation, this is highly inefficient when many samples belong to the same dataset (which is typical in GRPO). You should iterate over unique dataset keys in the batch instead.

Suggested change
- for dataset in datasets:
-     key = self._key(dataset)
-     if key in seen_keys:
-         continue
-     seen_keys.add(key)
-     budget_value = self.budget(dataset)
-     if budget_value is not None:
-         metrics[f"toggle/budget/{'|'.join(key)}"] = budget_value
+ unique_keys = {self._key(d) for d in datasets}
+ budget_map = {key: self.budget_from_key(key) for key in unique_keys}
+ for key, budget_value in budget_map.items():
+     if budget_value is not None:
+         metrics[f"toggle/budget/{'|'.join(key)}"] = budget_value
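The suggestion relies on a `budget_from_key` method that is not part of the quoted diff. A minimal sketch of how such a method might look, combined with a per-key cache that is invalidated whenever new lengths are recorded, is shown below. The class name `CachedBudgetTracker` and the `record` method are hypothetical; only `lengths_per_dataset`, `percentile`, and the `np.percentile` call are taken from the code under review.

```python
from __future__ import annotations

import numpy as np


class CachedBudgetTracker:
    """Hypothetical stand-in for ToggleBudgetTracker with a per-key budget cache."""

    def __init__(self, percentile: float) -> None:
        self.percentile = percentile
        self.lengths_per_dataset: dict[tuple[str, ...], list[int]] = {}
        self._budget_cache: dict[tuple[str, ...], float] = {}

    def record(self, key: tuple[str, ...], lengths: list[int]) -> None:
        # New data makes the cached percentile for this key stale, so drop it.
        self.lengths_per_dataset.setdefault(key, []).extend(lengths)
        self._budget_cache.pop(key, None)

    def budget_from_key(self, key: tuple[str, ...]) -> float | None:
        if key in self._budget_cache:
            return self._budget_cache[key]
        lengths = self.lengths_per_dataset.get(key)
        if not lengths:
            return None
        # Computed at most once per key between calls to record().
        value = float(np.percentile(lengths, self.percentile))
        self._budget_cache[key] = value
        return value
```

Under this sketch, repeated lookups for the same key within a step cost one dictionary access, and the percentile is recomputed only after the history for that key changes.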

Comment on lines +1320 to +1322
budgets = np.array(
    [self.budget(d) if self.budget(d) is not None else np.inf for d in datasets], dtype=np.float64
)

high

This list comprehension makes redundant and expensive calls to self.budget(d), up to two per sample. Use a pre-calculated map of budgets for the unique datasets present in the current batch instead, to avoid re-computing the same percentile hundreds of times per step.

Suggested change
- budgets = np.array(
-     [self.budget(d) if self.budget(d) is not None else np.inf for d in datasets], dtype=np.float64
- )
+ budgets = np.array(
+     [budget_map[self._key(d)] if budget_map[self._key(d)] is not None else np.inf for d in datasets], dtype=np.float64
+ )
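Note that `budget_map` has to be built in the same scope as this expression for the suggestion to work. One way to package both steps is a small helper that builds the map from the unique keys and then expands it to a per-sample array. The function name `per_sample_budgets` and its arguments are hypothetical; `budget_from_key` stands for any callable that returns a budget or `None` for a given key.

```python
from __future__ import annotations

from typing import Callable, Hashable, Sequence

import numpy as np


def per_sample_budgets(
    keys: Sequence[Hashable],
    budget_from_key: Callable[[Hashable], float | None],
) -> np.ndarray:
    """Return one budget per sample, computing each unique key's budget once."""
    # One lookup per unique key, no matter how many samples share it.
    budget_map = {key: budget_from_key(key) for key in set(keys)}
    # Keys without a budget map to +inf, as in the original expression.
    return np.array(
        [np.inf if budget_map[key] is None else budget_map[key] for key in keys],
        dtype=np.float64,
    )
```

For a batch where many samples share a dataset, this reduces the number of percentile computations from up to two per sample to one per unique dataset key.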
