fix: match backfilled log status to terminal status code

SunnySoldier357 · SunnySoldier357 · commit 3298857d7b2d · 2026-05-05T17:23:47.000-07:00
Bugbot pointed out that the backfill loop could pick an earlier
RUNNING/partial status log instead of the terminal one when a rollout
emits multiple status-bearing logs. The reported `code` was always
correct (it came from /status), but `message`/`details`/`extras` could
be taken from an earlier, non-terminal row, so the raised exception
would carry misleading text.

Match the log row's status code to the terminal code returned by
/status so the backfill is deterministic.
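
A minimal standalone sketch of the matching rule, for illustration only.
The helper name `pick_terminal_log`, the sample rows, and the numeric
codes are hypothetical and not part of this change; returning the first
match is also an assumption about how the loop is meant to behave.

```python
from typing import Any, Optional


def pick_terminal_log(
    logs: list[dict[str, Any]], status_code: int
) -> Optional[dict[str, Any]]:
    """Return the first log whose status code equals status_code, else None."""
    for log in logs:
        sd = log.get("status")
        # Skip rows with a missing or malformed status, and rows whose
        # code differs from the terminal code (e.g. RUNNING checkpoints).
        if isinstance(sd, dict) and sd.get("code") == status_code:
            return log
    return None


# Hypothetical rows: the code values are illustrative, not real status codes.
logs = [
    {"status": {"code": 101, "message": "still running"}},
    {"status": "not-a-dict"},
    {"status": {"code": 500, "message": "rollout failed"}},
]
match = pick_terminal_log(logs, status_code=500)
```

With the old `"code" in sd` check, the first row would have been picked
and its "still running" message attached to the terminal code.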

Made-with: Cursor
diff --git a/eval_protocol/pytest/remote_rollout_processor.py b/eval_protocol/pytest/remote_rollout_processor.py
@@ -146,9 +146,12 @@ async def _process_row(row: EvaluationRow) -> EvaluationRow:
                     completed_logs = await self._tracing_adapter.async_search_logs(
                         session, tags=[f"rollout_id:{row.execution_metadata.rollout_id}"]
                     )
+                    # Pick the log row whose status code matches the terminal
+                    # code from /status, so intermediate RUNNING checkpoints
+                    # don't poison the backfill.
                     for log in completed_logs:
                         sd = log.get("status")
-                        if sd and isinstance(sd, dict) and "code" in sd:
+                        if isinstance(sd, dict) and sd.get("code") == status_code:
                             status_message = sd.get("message", "") or ""
                             status_details = sd.get("details", []) or []
                             raw_extras = log.get("extras") or {}