Python: [BREAKING] Integrate looping into HarnessAgent#6607
Conversation
There was a problem hiding this comment.
Pull request overview
This PR makes looping an opt-in, first-class capability of the Python harness by wiring AgentLoopMiddleware as the outermost harness middleware when configured, and adds new provider-resolving todo-loop helpers to support “work until all todos are complete” scenarios (including optional mode-gating). It also adds an approval “escape hatch” so looping stops immediately on pending tool-approval requests and returns control to the caller.
Changes:
- Add
loop_should_continue,loop_next_message, andloop_max_iterationstocreate_harness_agent, wiring the loop outermost (ahead of tool approval) when enabled. - Replace the old provider-argument todo loop helper with
todos_remaining(*, modes=None)and addtodos_remaining_message, both resolving providers fromagent.context_providers. - Update samples/docs and add/adjust tests for harness loop wiring, mode-gating behavior, and approval escape hatch semantics.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| python/samples/02-agents/middleware/agent_loop_middleware_todos.py | Updates todo loop sample to use the new no-argument todos_remaining() helper. |
| python/samples/02-agents/middleware/agent_loop_middleware_report.py | Updates composed loop sample to use the new no-argument todos_remaining() helper. |
| python/samples/02-agents/harness/README.md | Documents harness looping as an available harness feature and clarifies the harness research sample description. |
| python/samples/02-agents/harness/harness_research.py | Demonstrates harness looping with mode-gated todo completion and a max-iteration safety cap. |
| python/packages/core/tests/core/test_harness_loop.py | Replaces old todo helper tests with new provider-resolving todo helper and approval escape hatch tests. |
| python/packages/core/tests/core/test_harness_agent.py | Adds tests asserting loop middleware is wired only when configured and that it is outermost relative to tool approval and user middleware. |
| python/packages/core/AGENTS.md | Updates public docs to describe new todo-loop helpers and harness integration. |
| python/packages/core/agent_framework/_harness/_loop.py | Adds approval escape hatch detection and implements the new todos_remaining / todos_remaining_message helpers. |
| python/packages/core/agent_framework/_harness/_agent.py | Extends create_harness_agent to accept loop params and wire AgentLoopMiddleware outermost when enabled. |
| python/packages/core/agent_framework/init.py | Exports todos_remaining_message at the package level. |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 5 | Confidence: 90%
✓ Correctness
The PR correctly integrates loping into HarnessAgent with proper escape-hatch semantics, middleware ordering, and a well-designed
todos_remaining/todos_remaining_messageAPI. The approval escape hatch correctly fires beforeshould_continueevaluation in both streaming and non-streaming paths. The_resolve_context_providerhelper safely resolves viagetattr(agent, 'context_providers', []). Mode gating uses case-insensitive set matching built at factory time. Middleware insertion withinsert(0, ...)correctly positions the loop outermost. All paths are covered by comprehensive tests.
✓ Security Reliability
This PR integrates loping into the harness agent with a well-designed approval escape hatch, provider resolution via context_providers, and mode-gating. The implementation is security-sound: the escape hatch correctly stops before evaluating should_continue/max_iterations (preventing HITL bypass), the loop has a safe default cap (DEFAULT_MAX_ITERATIONS=10), input validation rejects empty modes, and provider resolution is defensive (getattr with default). No injection risks, resource leaks, or unhandled failure modes were identified.
✓ Test Coverage
Test coverage for the new loop integration is thorough: harness wiring (7 tests), todos_remaining/todos_remaining_message helpers (6 tests), and the approval escape hatch (4 tests) are all well-covered with meaningful assertions. One branch in todos_remaining — the fallback when modes is specified but no AgentModeProvider is registered — is not exercised by any test.
✓ Failure Modes
The PR is well-implemented. The approval escape hatch correctly preserves the response in both streaming (holder['final'] set before the check) and non-streaming (aggregated includes the approval messages before break) paths. The todos_remaining/todos_remaining_message helpers properly guard against missing session/agent/provider by returning False/None. Provider resolution via _resolve_context_provider safely uses getattr with default [] and next() with default None. No silent failures, lost errors, or stale state issues were found.
✗ Design Approach
I found one design-level correctness issue in the new todo-loop helper API: the provider lookup strategy does not match the newly documented "works with any agent that registers a TodoProvider via context_providers" contract when more than one matching provider is present.
Flagged Issues
-
todos_remaining()can silently ignore a caller-suppliedTodoProviderwhen used withcreate_harness_agent(..., context_providers=[...]), because the harness appends extra providers after its built-ins (_harness/_agent.py:150-175) but_resolve_context_provider()always returns the first match (_harness/_loop.py:816-818). The helper needs to disambiguate duplicate providers or reject them instead of always taking the first match.
Suggestions
- Consider adding a test for
todos_remaining(modes=[...])when the agent has aTodoProviderbut noAgentModeProvider— this exercises the else-branch fallback at_loop.py:869(get_agent_mode(session)without provider-specific config).
Automated review by westey-m's agents
|
Flagged issue
Source: automated DevFlow PR review |
Motivation & Context
The core
AgentLoopMiddlewaremakes it possible to re-invoke an agent until configurable criteria are met. This PR builds on that to make looping a first-class, opt-in capability of the harness, delivering the "loop until completion" enforcement scenario for harnesses.It also adds shared
todos_remaining/todos_remaining_messagehelpers so a harness can keep working autonomously until every todo item is complete — optionally scoped to specific agent modes — which is exactly the kind of completion-enforcement behaviour the harness experience needs. This is the Python counterpart of the .NET work in #6544.Description & Review Guide
What are the major changes?
create_harness_agentgainsloop_should_continue,loop_next_message, andloop_max_iterationskeyword arguments. Whenloop_should_continueis supplied, the harness is wrapped in anAgentLoopMiddlewarewired as the outermost middleware (each iteration is a full agent run, including tool approval); when it isNone, behaviour is unchanged.should_continueor injectingnext_message, so looping is HITL-safe even when wrapped aroundToolApprovalMiddleware._loop.py, exported from the package:todos_remaining(*, modes=None)resolves theTodoProvider(andAgentModeProviderwhenmodesis set) fromagent.context_providersand loops while incomplete todos remain (modes=Noneapplies in all modes; a non-empty sequence gates by mode, case-insensitively; an empty sequence raisesValueError).todos_remaining_messageis anext_messagecallable that lists the still-open todos. These merge and replace the former provider-argument helper.harness_research.pysample demonstrates the loop scoped to"execute"mode with aloop_max_iterationssafety cap; the twoagent_loop_middleware_*samples switch to the no-argumenttodos_remaining().AGENTS.mdupdated; conforms to the new split type-checker setup (pyright on source; pyright/mypy/pyrefly/ty/zuban on tests/samples).What is the impact of these changes?
create_harness_agent(no loop unlessloop_should_continueis set).todos_remaining(provider)signature is removed and replaced bytodos_remaining(*, modes=None), which resolves the provider from the running agent'scontext_providersinstead of taking it as an argument. Callers passing a provider must update to the new form.What do you want reviewers to focus on?
todos_remainingAPI shape and its mode-gating behaviour.Related Issue
Fixes #6478
Contribution Checklist
breaking changelabel (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.