Add Trackio rollout trace logging by abidlabs · Pull Request #1697 · allenai/open-instruct

abidlabs · 2026-05-21T22:47:51Z

Hi folks! This PR adds trace logging via Trackio, the free, local-first experiment tracking library from Hugging Face 🤗

This PR follows Open-Instruct's existing rollout trace saving path. Specifically, I:

added StreamingDataLoaderConfig.trackio_project support to enable Trackio rollout traces
added logging of GRPO rollout prompt/response pairs as trackio.Trace records from the data preparation actor
included reward, advantage, finish reason, dataset, prompt/sample index, model step, and tool request metadata on each trace
added trackio_max_traces_per_step to cap trace volume per training step, plus optional trackio_space_id for remote logging
documented the Trackio rollout trace flags and added tests with mocked Trackio
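
As a rough illustration of how the flags listed above might interact, here is a minimal sketch. It is not the PR's code: the `TraceConfig` dataclass, its default values, and the `select_trace_indices` helper are hypothetical, and logging the first N samples of a step is an assumption. Only the three field names come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceConfig:
    """Hypothetical stand-in for the Trackio fields on StreamingDataLoaderConfig."""

    trackio_project: Optional[str] = None  # assumed: None disables trace logging
    trackio_max_traces_per_step: int = 8  # assumed default, not taken from the PR
    trackio_space_id: Optional[str] = None  # optional remote logging target


def select_trace_indices(num_samples: int, config: TraceConfig) -> list[int]:
    """Return the sample indices to log for one training step, honoring the cap."""
    if config.trackio_project is None or config.trackio_max_traces_per_step <= 0:
        return []
    # Assumption: the first N samples are logged; the PR may choose differently.
    return list(range(min(num_samples, config.trackio_max_traces_per_step)))
```

With a cap of 3 and 100 rollouts in a step, only three traces would be emitted, which keeps trace volume bounded regardless of batch size.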

I tested it end-to-end and here's what it looks like:

AI assistance was used to prepare this PR.

gemini-code-assist

Code Review

This pull request integrates Trackio for logging GRPO rollout traces, adding documentation, configuration settings, and a new TrackioRolloutLogger utility. Feedback highlights the need to prevent potential IndexError crashes when accessing batch metadata and suggests using a try...finally block to ensure the logger is properly closed. Additionally, the reviewer recommended using standard imports for mandatory dependencies and replacing runtime assertions with explicit conditional checks.

gemini-code-assist · 2026-05-21T22:49:26Z

+            if batch.indices is not None:
+                metadata["dataset_index"] = batch.indices[i]
+            if batch.model_steps:
+                metadata["model_step"] = batch.model_steps[i]


There is a risk of IndexError if batch.indices or batch.model_steps have fewer elements than batch.queries. While these are typically aligned, adding a length check ensures the logging utility doesn't crash the training process if the data structure is unexpected.

Suggested change

-            if batch.indices is not None:
-                metadata["dataset_index"] = batch.indices[i]
-            if batch.model_steps:
-                metadata["model_step"] = batch.model_steps[i]
+            if batch.indices is not None and i < len(batch.indices):
+                metadata["dataset_index"] = batch.indices[i]
+            if batch.model_steps and i < len(batch.model_steps):
+                metadata["model_step"] = batch.model_steps[i]
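
The length checks in this suggestion can be exercised in isolation. The sketch below pulls them into a standalone function. `build_index_metadata` and its parameters are hypothetical names for illustration, standing in for the `batch.indices` and `batch.model_steps` fields quoted in the diff.

```python
from typing import Any, Optional, Sequence


def build_index_metadata(
    i: int,
    indices: Optional[Sequence[int]],
    model_steps: Optional[Sequence[int]],
) -> dict[str, Any]:
    """Collect per-sample metadata, skipping any field that has no entry at position i."""
    metadata: dict[str, Any] = {}
    # The bounds check turns a short or misaligned list into a missing key, not a crash.
    if indices is not None and i < len(indices):
        metadata["dataset_index"] = indices[i]
    if model_steps and i < len(model_steps):
        metadata["model_step"] = model_steps[i]
    return metadata
```

The trade-off is that a misaligned batch is logged with partial metadata instead of raising, so the misalignment itself goes unreported unless a warning is added.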

gemini-code-assist · 2026-05-21T22:49:26Z

+            prompt = (
+                batch.raw_queries[i]
+                if batch.raw_queries is not None and batch.raw_queries[i] is not None
+                else self.tokenizer.decode(batch.queries[i], skip_special_tokens=False)
+            )
+            response = (
+                batch.decoded_responses[i]
+                if batch.decoded_responses is not None and batch.decoded_responses[i] is not None
+                else self.tokenizer.decode(result.responses[i], skip_special_tokens=False)
+            )


Accessing batch.raw_queries[i] and batch.decoded_responses[i] without checking their length relative to i can lead to an IndexError. Defensive checks should be added to prevent crashing the training process due to a telemetry failure.

Suggested change

-            prompt = (
-                batch.raw_queries[i]
-                if batch.raw_queries is not None and batch.raw_queries[i] is not None
-                else self.tokenizer.decode(batch.queries[i], skip_special_tokens=False)
-            )
-            response = (
-                batch.decoded_responses[i]
-                if batch.decoded_responses is not None and batch.decoded_responses[i] is not None
-                else self.tokenizer.decode(result.responses[i], skip_special_tokens=False)
-            )
+            prompt = (
+                batch.raw_queries[i]
+                if batch.raw_queries is not None and i < len(batch.raw_queries) and batch.raw_queries[i] is not None
+                else self.tokenizer.decode(batch.queries[i], skip_special_tokens=False)
+            )
+            response = (
+                batch.decoded_responses[i]
+                if batch.decoded_responses is not None and i < len(batch.decoded_responses) and batch.decoded_responses[i] is not None
+                else self.tokenizer.decode(result.responses[i], skip_special_tokens=False)
+            )
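
Because the prompt and response branches share the same shape, the fallback could also be factored into one helper. This is only a sketch: `text_or_decode` is a hypothetical name, and `decode` stands in for a call such as `self.tokenizer.decode(..., skip_special_tokens=False)` from the diff.

```python
from typing import Callable, Optional, Sequence


def text_or_decode(
    i: int,
    cached_texts: Optional[Sequence[Optional[str]]],
    token_ids: Sequence[Sequence[int]],
    decode: Callable[[Sequence[int]], str],
) -> str:
    """Prefer the cached string for sample i; otherwise decode its token ids."""
    if cached_texts is not None and i < len(cached_texts):
        cached = cached_texts[i]
        if cached is not None:
            return cached
    # Falls through when the cache is absent, too short, or holds None at i.
    return decode(token_ids[i])
```

Note that `token_ids[i]` is still indexed without a guard, matching the suggestion, which treats the token lists as the source of truth for batch length.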

gemini-code-assist · 2026-05-21T22:49:26Z

+                if self.trackio_rollout_logger is not None:
+                    self.trackio_rollout_logger.close()
                return


To ensure that the Trackio session is properly finalized and resources are released, the close() call should ideally be handled in a try...finally block covering the data preparation loop. This ensures trackio.finish() is called even if the loop terminates due to an unhandled exception, preventing orphaned logging sessions.
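
A minimal sketch of the try...finally shape described here, under stated assumptions: `FakeRolloutLogger` is a hypothetical stand-in for the PR's logger, `run_data_preparation` is an illustrative loop and not the actor's real method, and a `None` batch is assumed to be the shutdown signal.

```python
from typing import Any, Iterable, Optional


class FakeRolloutLogger:
    """Hypothetical stand-in for the PR's Trackio rollout logger."""

    def __init__(self) -> None:
        self.logged: list[Any] = []
        self.closed = False

    def log(self, batch: Any) -> None:
        self.logged.append(batch)

    def close(self) -> None:
        # Per the review comment, the real logger would finalize the Trackio run here.
        self.closed = True


def run_data_preparation(
    batches: Iterable[Any], rollout_logger: Optional[FakeRolloutLogger] = None
) -> None:
    """Process batches until a None sentinel, always closing the logger on exit."""
    try:
        for batch in batches:
            if batch is None:  # assumed shutdown sentinel
                return
            if rollout_logger is not None:
                rollout_logger.log(batch)
    finally:
        # Runs on normal return, on the early return above, and on any exception.
        if rollout_logger is not None:
            rollout_logger.close()
```

With this shape the explicit `close()` before `return` becomes redundant, since the `finally` clause covers every exit path.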

gemini-code-assist · 2026-05-21T22:49:26Z

+    ):
+        self.tokenizer = tokenizer
+        self.max_traces_per_step = max_traces_per_step
+        self.trackio = importlib.import_module("trackio")


trackio is listed as a mandatory dependency in pyproject.toml. Using importlib.import_module inside __init__ is unnecessary and less idiomatic than a standard top-level import. If trackio is intended to be an optional dependency (only required when trace logging is enabled), it should be moved to optional-dependencies in pyproject.toml. Otherwise, a standard import trackio at the top of the file is preferred.
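
If the optional-dependency route were chosen, a lazy import would still make sense, but it should fail with an actionable message. The sketch below shows one way to do that; `load_optional_module` is a hypothetical helper, not something from the PR or from Trackio.

```python
import importlib
from types import ModuleType


def load_optional_module(name: str, feature: str) -> ModuleType:
    """Import a module on demand, raising a descriptive error if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional '{name}' package; "
            "install it to enable this feature."
        ) from exc
```

A logger constructor could then call `load_optional_module("trackio", "Trackio rollout trace logging")`, so users who never enable tracing never need the package. If the dependency stays mandatory, a plain top-level `import trackio` is simpler, as the comment says.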

gemini-code-assist · 2026-05-21T22:49:26Z

+        if self.max_traces_per_step <= 0:
+            return
+
+        assert batch.scores is not None, "batch.scores must not be None when logging Trackio traces"


Using assert for runtime validation is discouraged because assertions are removed when Python is run with optimizations (-O). Since batch.scores is critical for logging rewards, consider using an explicit if check to avoid a potential TypeError during indexing in production environments.
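
One way to replace the assertion with an explicit check is sketched below. `rewards_for_logging` is a hypothetical helper, and skipping with a warning (instead of raising) is an assumption about the desired behavior, chosen so a telemetry problem cannot stop training.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def rewards_for_logging(
    scores: Optional[Sequence[float]], max_traces_per_step: int
) -> list[float]:
    """Return the rewards to attach to traces, or an empty list if logging should be skipped."""
    if max_traces_per_step <= 0:
        return []
    # Explicit check: unlike `assert`, this still runs under `python -O`.
    if scores is None:
        logger.warning("batch.scores is None; skipping Trackio trace logging for this step")
        return []
    return [float(score) for score in scores[:max_traces_per_step]]
```

If missing scores should be treated as a hard error instead, raising `ValueError` in the `None` branch keeps the check active under `-O` while preserving the original intent of the assertion.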

Add Trackio rollout trace logging

ecfa148

gemini-code-assist Bot reviewed May 21, 2026

Address Trackio trace review feedback

0ca05f8

abidlabs marked this pull request as ready for review May 21, 2026 23:07

Add Trackio rollout trace logging #1697
abidlabs wants to merge 2 commits into allenai:main from abidlabs:add-trackio-rollout-traces
