Add Trackio rollout trace logging #1697

Open
abidlabs wants to merge 2 commits into allenai:main from abidlabs:add-trackio-rollout-traces

@abidlabs abidlabs commented May 21, 2026

Hi folks! This PR adds trace logging via Trackio, the free, local-first experiment tracking library from Hugging Face 🤗

This PR follows Open-Instruct's existing rollout trace saving path. Specifically, I:

  • added StreamingDataLoaderConfig.trackio_project support to enable Trackio rollout traces
  • added logging of GRPO rollout prompt/response pairs as trackio.Trace records from the data preparation actor
  • included reward, advantage, finish reason, dataset, prompt/sample index, model step, and tool request metadata on each trace
  • added trackio_max_traces_per_step to cap trace volume per training step, plus optional trackio_space_id for remote logging
  • documented the Trackio rollout trace flags and added tests with mocked Trackio
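
As a rough illustration of how the flags above could interact (a sketch, not the code in this PR): the class name TraceConfig, the default cap of 8, and the "log the first N rollouts" policy are all assumptions made for the example. Only the three field names come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceConfig:
    """Hypothetical stand-in for the three Trackio fields named in this PR."""

    trackio_project: Optional[str] = None   # None means trace logging is off
    trackio_max_traces_per_step: int = 8    # illustrative default, not the PR's
    trackio_space_id: Optional[str] = None  # optional remote logging target


def select_trace_indices(num_rollouts: int, config: TraceConfig) -> list[int]:
    """Return the rollout indices that would be logged for one training step."""
    if config.trackio_project is None or config.trackio_max_traces_per_step <= 0:
        return []
    # Assumed policy: keep the first N rollouts of the step.
    return list(range(min(num_rollouts, config.trackio_max_traces_per_step)))
```

With a cap of 2 and five rollouts, only indices 0 and 1 would be selected; with no project set, nothing is logged.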

I tested it end-to-end and here's what it looks like:

[image]

AI assistance was used to prepare this PR.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request integrates Trackio for logging GRPO rollout traces, adding documentation, configuration settings, and a new TrackioRolloutLogger utility. Feedback highlights the need to prevent potential IndexError crashes when accessing batch metadata and suggests using a try...finally block to ensure the logger is properly closed. Additionally, the reviewer recommended using standard imports for mandatory dependencies and replacing runtime assertions with explicit conditional checks.

Comment thread open_instruct/rl_utils.py Outdated
Comment on lines +201 to +204
    if batch.indices is not None:
        metadata["dataset_index"] = batch.indices[i]
    if batch.model_steps:
        metadata["model_step"] = batch.model_steps[i]

high

There is a risk of IndexError if batch.indices or batch.model_steps have fewer elements than batch.queries. While these are typically aligned, adding a length check ensures the logging utility doesn't crash the training process if the data structure is unexpected.

Suggested change
-    if batch.indices is not None:
-        metadata["dataset_index"] = batch.indices[i]
-    if batch.model_steps:
-        metadata["model_step"] = batch.model_steps[i]
+    if batch.indices is not None and i < len(batch.indices):
+        metadata["dataset_index"] = batch.indices[i]
+    if batch.model_steps and i < len(batch.model_steps):
+        metadata["model_step"] = batch.model_steps[i]
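
The same length check could also be factored into a small helper so it is not repeated per field. This is only a sketch: get_or_none and build_metadata are hypothetical names, and the arguments stand in for batch.indices and batch.model_steps.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def get_or_none(values: Optional[Sequence[T]], i: int) -> Optional[T]:
    """Return values[i], or None when values is missing or too short."""
    if values is None or not (0 <= i < len(values)):
        return None
    return values[i]


def build_metadata(indices, model_steps, i):
    """Collect only the metadata fields that are actually available at index i."""
    metadata = {}
    dataset_index = get_or_none(indices, i)
    if dataset_index is not None:
        metadata["dataset_index"] = dataset_index
    model_step = get_or_none(model_steps, i)
    if model_step is not None:
        metadata["model_step"] = model_step
    return metadata
```

A short or missing list then simply omits the field instead of raising IndexError.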

Comment thread open_instruct/rl_utils.py
Comment on lines +209 to +218
    prompt = (
        batch.raw_queries[i]
        if batch.raw_queries is not None and batch.raw_queries[i] is not None
        else self.tokenizer.decode(batch.queries[i], skip_special_tokens=False)
    )
    response = (
        batch.decoded_responses[i]
        if batch.decoded_responses is not None and batch.decoded_responses[i] is not None
        else self.tokenizer.decode(result.responses[i], skip_special_tokens=False)
    )

high

Accessing batch.raw_queries[i] and batch.decoded_responses[i] without checking their length relative to i can lead to an IndexError. Defensive checks should be added to prevent crashing the training process due to a telemetry failure.

Suggested change
-    prompt = (
-        batch.raw_queries[i]
-        if batch.raw_queries is not None and batch.raw_queries[i] is not None
-        else self.tokenizer.decode(batch.queries[i], skip_special_tokens=False)
-    )
-    response = (
-        batch.decoded_responses[i]
-        if batch.decoded_responses is not None and batch.decoded_responses[i] is not None
-        else self.tokenizer.decode(result.responses[i], skip_special_tokens=False)
-    )
+    prompt = (
+        batch.raw_queries[i]
+        if batch.raw_queries is not None and i < len(batch.raw_queries) and batch.raw_queries[i] is not None
+        else self.tokenizer.decode(batch.queries[i], skip_special_tokens=False)
+    )
+    response = (
+        batch.decoded_responses[i]
+        if batch.decoded_responses is not None and i < len(batch.decoded_responses) and batch.decoded_responses[i] is not None
+        else self.tokenizer.decode(result.responses[i], skip_special_tokens=False)
+    )
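
A complementary option, not part of this PR, would be to isolate the whole logging call so that any telemetry error is recorded and swallowed rather than propagated into training. The sketch below assumes a hypothetical wrapper named log_traces_safely around whatever callable performs the trace logging.

```python
import logging

logger = logging.getLogger(__name__)


def log_traces_safely(log_fn, *args, **kwargs) -> bool:
    """Call log_fn and report success; never propagate its exceptions."""
    try:
        log_fn(*args, **kwargs)
        return True
    except Exception:
        # Record the failure with a traceback, then let training continue.
        logger.exception("Rollout trace logging failed; skipping this batch")
        return False
```

The trade-off is that a broad except can hide real bugs in the logger, so the per-index checks above are still worth keeping.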

Comment thread open_instruct/data_loader.py Outdated
Comment on lines 1408 to 1410
    if self.trackio_rollout_logger is not None:
        self.trackio_rollout_logger.close()
    return

medium

To ensure that the Trackio session is properly finalized and resources are released, the close() call should ideally be handled in a try...finally block covering the data preparation loop. This ensures trackio.finish() is called even if the loop terminates due to an unhandled exception, preventing orphaned logging sessions.
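
A minimal sketch of that shape, assuming a loop over callables in place of the real data preparation loop. FakeRolloutLogger and run_data_preparation are hypothetical names used only to make the example self-contained.

```python
class FakeRolloutLogger:
    """Hypothetical stand-in for the PR's logger; only close() matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_data_preparation(steps, rollout_logger=None):
    """Run each step; close the logger on normal exit and on exceptions."""
    try:
        for step in steps:
            step()
    finally:
        # Runs whether the loop finished, returned early, or raised.
        if rollout_logger is not None:
            rollout_logger.close()
```

The exception still propagates to the caller; the finally block only guarantees that close() has run first.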

Comment thread open_instruct/rl_utils.py Outdated
    ):
        self.tokenizer = tokenizer
        self.max_traces_per_step = max_traces_per_step
        self.trackio = importlib.import_module("trackio")

medium

trackio is listed as a mandatory dependency in pyproject.toml. Using importlib.import_module inside __init__ is unnecessary and less idiomatic than a standard top-level import. If trackio is intended to be an optional dependency (only required when trace logging is enabled), it should be moved to optional-dependencies in pyproject.toml. Otherwise, a standard import trackio at the top of the file is preferred.
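
If the optional-dependency route were chosen, a deferred import would typically be paired with an actionable error message. The helper below is a sketch under that assumption; load_optional_module and the extra name passed to it are hypothetical, and the usage line imports a standard-library module so the example runs without trackio installed.

```python
import importlib


def load_optional_module(name: str, extra: str):
    """Import `name` on demand, pointing at the extra that provides it."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"'{name}' is required for this feature; install the '{extra}' extra."
        ) from exc


# Usage with a module that is always available:
json_module = load_optional_module("json", extra="example")
```

For a mandatory dependency none of this is needed, and a plain top-level import is the simpler choice.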

Comment thread open_instruct/rl_utils.py Outdated
    if self.max_traces_per_step <= 0:
        return

    assert batch.scores is not None, "batch.scores must not be None when logging Trackio traces"

medium

Using assert for runtime validation is discouraged because assertions are removed when Python is run with optimizations (-O). Since batch.scores is critical for logging rewards, consider using an explicit if check to avoid a potential TypeError during indexing in production environments.
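
One way the explicit check might look, as a sketch: require_scores is a hypothetical helper, and raising ValueError is an assumed choice (skipping the batch with a warning would be another).

```python
def require_scores(scores):
    """Return scores unchanged, or raise if they are missing.

    Unlike an assert statement, this check still runs under `python -O`.
    """
    if scores is None:
        raise ValueError("batch.scores must not be None when logging Trackio traces")
    return scores
```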

@abidlabs abidlabs marked this pull request as ready for review May 21, 2026 23:07