
fix(autoagent): recover stable 1.1.0 release #39

Merged
Ruth-mutua merged 2 commits into main from fix/autoagent-stability-1.1.0
May 15, 2026

Conversation

@ekizito96
Contributor

Summary

This PR recovers AutoAgent stability for the v1.1.0 release and addresses the 7 open regressions tracked in #32 through #38.

The core theme is making AutoAgent reliable as a production developer-facing orchestration primitive: schema contracts are enforced, coder agents are grounded in the real sandbox environment, trivial tasks avoid unnecessary planning loops, semantic tool lookup is normalized, AutoAgent/CustomAgent runtime boundaries are explicit, execution success is separated from task success, and sandbox cleanup is made safe around async/ZMQ edge cases.
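To illustrate the split between execution success and task success, here is a minimal sketch. `ExecutionResult` and `task_succeeded` are hypothetical names invented for this example and are not taken from the jarviscore code; the heuristics are assumptions about what a semantic check might look at.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExecutionResult:
    """Hypothetical result record; not the actual jarviscore type."""
    executed: bool                 # the code ran without raising
    output: Any = None             # whatever the code produced
    error: Optional[str] = None    # captured error text, if any


def task_succeeded(result: ExecutionResult) -> bool:
    """Return True only when the run produced a usable answer."""
    if not result.executed or result.error:
        return False
    if result.output is None:
        return False
    # An empty string, list, or dict means "ran but produced nothing".
    if isinstance(result.output, (str, list, dict)) and len(result.output) == 0:
        return False
    # A payload that reports its own failure is not a success.
    if isinstance(result.output, dict) and result.output.get("status") == "error":
        return False
    return True
```

Under these assumptions, a run that executes cleanly but returns `{}` or `{"status": "error"}` is reported as executed yet not successful.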

Closes #32
Closes #33
Closes #34
Closes #35
Closes #36
Closes #37
Closes #38

What Changed

  • Fixed structured output validation so Agent.output_schema is carried through the Kernel into CoderSubAgent and validated with Pydantic instead of silently returning invalid payloads.
  • Added sandbox environment manifests to CoderSubAgent prompts so the model sees the actual preloaded globals/modules available at runtime.
  • Added a task complexity classifier before the Planner so trivial tasks can run directly through the Kernel without unnecessary Plan → Execute → Evaluate overhead.
  • Added intent normalization before FunctionRegistry semantic search to avoid embedding misses caused by verbose user/task text.
  • Clarified the AutoAgent vs CustomAgent lifecycle boundary with p2p_responder, so only true mesh responders get background run loops.
  • Added semantic success detection so “code executed” and “task succeeded” are no longer treated as the same thing.
  • Fixed sandbox namespace cleanup by restoring __builtins__ in finally blocks for sync and async execution paths.
  • Restored AutoAgent file-writing usability by wiring the file-capable CoderSandbox.
  • Added a CoderSubAgent proof-of-work gate so code-producing tasks cannot prematurely return DONE without successful execution evidence.
  • Updated docs/versioning to release this as 1.1.0, mark 1.0.3 and 1.0.4 as broken/yanked, and document strict SemVer policy going forward.
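The `__builtins__` cleanup item in the list above can be sketched as follows. This is only an illustration of the restore-in-`finally` pattern for the synchronous path; `run_in_namespace` and the restricted builtins subset are assumptions made for the example, not the actual sandbox implementation.

```python
import builtins
from typing import Any, Dict


def run_in_namespace(code: str, namespace: Dict[str, Any]) -> Dict[str, Any]:
    """Execute code in namespace and restore __builtins__ afterwards."""
    had_builtins = "__builtins__" in namespace
    saved = namespace.get("__builtins__")
    # Restricted builtins for the duration of the run (illustrative subset).
    namespace["__builtins__"] = {
        "len": builtins.len,
        "range": builtins.range,
        "sum": builtins.sum,
    }
    try:
        exec(code, namespace)
    finally:
        # Runs on success and on failure, so the namespace never keeps
        # the restricted (or user-modified) builtins after execution.
        if had_builtins:
            namespace["__builtins__"] = saved
        else:
            namespace.pop("__builtins__", None)
    return namespace
```

An async path would presumably wrap its `await` in the same `try`/`finally`, so the restore also runs when the task is cancelled.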

Issue Coverage

Test Plan

  • pytest tests/test_issue_32.py tests/test_issue_33.py tests/test_issue_34.py tests/test_issue_35.py tests/test_issue_36.py tests/test_issue_37.py tests/test_issue_38.py
  • python test_usability.py
  • python -m compileall jarviscore
  • git diff --check HEAD^ HEAD
  • IDE lints checked on touched files

Release Notes

This should ship as 1.1.0, not another patch release, because it introduces new backward-compatible framework behavior and public primitives while recovering from the broken 1.0.3 / 1.0.4 PyPI releases.
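As a small illustration of the SemVer reasoning, the sketch below maps a change type to the next version. `next_version` and its change labels are hypothetical and exist only for this example; they are not part of the project's release tooling.

```python
def next_version(current: str, change: str) -> str:
    """Return the next SemVer string for a 'MAJOR.MINOR.PATCH' version."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "breaking":      # incompatible API change
        return f"{major + 1}.0.0"
    if change == "feature":       # new backward-compatible behavior
        return f"{major}.{minor + 1}.0"
    if change == "fix":           # backward-compatible bug fix only
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

With this rule, a backward-compatible feature on top of 1.0.4 gives `next_version("1.0.4", "feature") == "1.1.0"`, which matches the version chosen here.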

@ekizito96 ekizito96 requested a review from Ruth-mutua May 12, 2026 21:07
Comment thread jarviscore/core/agent.py
mesh_config["seed_nodes"] = seed_nodes

# Find an available port for this agent's P2P listener
# SWIM doesn't support bind_port=0, so we find a free port
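A common way to find a free port when a library cannot accept port 0 is to let the OS pick one on a throwaway socket and then hand the number on. The sketch below assumes a TCP socket on localhost; `find_free_port` is a hypothetical helper and may differ from what `agent.py` actually does.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))            # port 0 lets the OS choose
        return sock.getsockname()[1]    # the port that was assigned
```

The socket is closed before the caller binds, so another process could claim the port in between; callers that need certainty should retry on a bind failure.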
@ekizito96 ekizito96 requested a review from sangalo20 May 12, 2026 21:08
Comment thread test_usability.py
"""Promise: ValueError if system_prompt is absent."""
print("\n--- Test 1: Class Validation (system_prompt required) ---")
try:
agent = AgentMissingPrompt(agent_id="test-bad")
Comment thread jarviscore/core/agent.py
import os

if TYPE_CHECKING:
from jarviscore.p2p import PeerClient
Comment thread jarviscore/core/agent.py

if TYPE_CHECKING:
from jarviscore.p2p import PeerClient
from jarviscore.p2p.coordinator import P2PCoordinator
Comment thread jarviscore/planning/classifier.py Fixed
import logging
import os
import signal
import sys
Comment thread tests/test_issue_35.py
@@ -0,0 +1,26 @@
import pytest
from unittest.mock import AsyncMock, patch
Comment thread tests/test_issue_36.py
@@ -0,0 +1,86 @@
import pytest
import asyncio
Comment thread tests/test_issue_38.py Fixed
Comment thread test_usability.py Fixed
Comment thread jarviscore/core/agent.py

# 5. Wait for SWIM cluster to converge
# This allows SWIM gossip to sync membership
import asyncio
Restore AutoAgent usability by enforcing coder proof-of-work, wiring the file-capable CoderSandbox, and documenting the SemVer recovery from the broken 1.0.3/1.0.4 releases.
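The proof-of-work idea can be sketched as a simple gate: a code-producing task may only report DONE once at least one attempt has executed successfully. `Attempt`, `CoderState`, and `may_return_done` are hypothetical names for illustration, not the CoderSubAgent API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attempt:
    """One generated snippet and whether it executed without error."""
    code: str
    executed_ok: bool


@dataclass
class CoderState:
    """Hypothetical per-task record of execution attempts."""
    attempts: List[Attempt] = field(default_factory=list)


def may_return_done(state: CoderState, produces_code: bool) -> bool:
    """Allow DONE only when there is evidence of a successful run."""
    if not produces_code:
        return True  # nothing to execute, so no evidence is required
    return any(attempt.executed_ok for attempt in state.attempts)
```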
@ekizito96 ekizito96 force-pushed the fix/autoagent-stability-1.1.0 branch from 87b3ca1 to ff4cb42 Compare May 12, 2026 21:11
Harden AutoAgent, Kernel routing, planner/evaluator contracts, workflow status propagation, search/provider behavior, resource cleanup, and package hygiene so v1.1.0 restores a usable stable framework release.

Fixes #32.
Fixes #33.
Fixes #34.
Fixes #35.
Fixes #36.
Fixes #37.
Fixes #38.

Co-authored-by: Cursor <cursoragent@cursor.com>
finally:
await mesh.stop()

statuses = {r["step_id"] if "step_id" in r else f"s{i}": r["status"]
if hasattr(v, "model_json_schema"):
# Render Pydantic BaseModels as JSON schemas for the LLM
try:
import json
import re
import time
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List, Optional, TYPE_CHECKING, cast
@Ruth-mutua Ruth-mutua merged commit 9f17da0 into main May 15, 2026
6 checks passed