fix(autoagent): recover stable 1.1.0 release by ekizito96 · Pull Request #39 · Prescott-Data/jarviscore-framework

ekizito96 · 2026-05-12T21:07:29Z

Summary

This PR recovers AutoAgent stability for the v1.1.0 release and addresses the 7 open regressions tracked in #32 through #38.

The core theme is making AutoAgent reliable as a production developer-facing orchestration primitive: schema contracts are enforced, coder agents are grounded in the real sandbox environment, trivial tasks avoid unnecessary planning loops, semantic tool lookup is normalized, AutoAgent/CustomAgent runtime boundaries are explicit, execution success is separated from task success, and sandbox cleanup is made safe around async/ZMQ edge cases.

Closes #32
Closes #33
Closes #34
Closes #35
Closes #36
Closes #37
Closes #38

What Changed

Fixed structured output validation so Agent.output_schema is carried through the Kernel into CoderSubAgent and validated with Pydantic instead of silently returning invalid payloads.
Added sandbox environment manifests to CoderSubAgent prompts so the model sees the actual preloaded globals/modules available at runtime.
Added a task complexity classifier before the Planner so trivial tasks can run directly through the Kernel without unnecessary Plan → Execute → Evaluate overhead.
Added intent normalization before FunctionRegistry semantic search to avoid embedding misses caused by verbose user/task text.
Clarified the AutoAgent vs CustomAgent lifecycle boundary with p2p_responder, so only true mesh responders get background run loops.
Added semantic success detection so “code executed” and “task succeeded” are no longer treated as the same thing.
Fixed sandbox namespace cleanup by restoring __builtins__ in finally blocks for sync and async execution paths.
Restored AutoAgent file-writing usability by wiring the file-capable CoderSandbox.
Added a CoderSubAgent proof-of-work gate so code-producing tasks cannot prematurely return DONE without successful execution evidence.
Updated docs/versioning to release this as 1.1.0, mark 1.0.3 and 1.0.4 as broken/yanked, and document strict SemVer policy going forward.
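
The split between "code executed" and "task succeeded" can be sketched as follows. This is a minimal illustration, not the Kernel's actual implementation: the `Outcome` type, the `assess` function, and the failure markers are hypothetical names, since this PR description does not list the signals the framework checks.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical markers; the PR text does not say which signals are inspected.
FAILURE_KEYS = ("error", "errors", "exception")
FAILURE_STATUSES = {"error", "failed", "failure"}


@dataclass(frozen=True)
class Outcome:
    executed: bool   # the sandbox ran the code without raising
    succeeded: bool  # the result looks like the task's goal was met
    reason: str


def assess(executed: bool, payload: Any) -> Outcome:
    """Report execution status and task status as two separate facts."""
    if not executed:
        return Outcome(False, False, "execution failed")
    if payload is None:
        return Outcome(True, False, "no result returned")
    if isinstance(payload, Mapping):
        status = str(payload.get("status", "")).lower()
        if status in FAILURE_STATUSES:
            return Outcome(True, False, f"status={status}")
        for key in FAILURE_KEYS:
            if payload.get(key):
                return Outcome(True, False, f"payload reports {key}")
    return Outcome(True, True, "ok")
```

With this shape, a sandbox run that returns `{"status": "failed"}` is reported as executed but not succeeded, instead of being folded into a single success flag.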

Issue Coverage

#32 (AutoAgent: no output schema enforcement; identical tasks produce structurally different results): Output schema enforcement now fails fast on invalid structured results.
#33 (AutoAgent: CoderSubAgent hallucinates sandbox APIs when available objects are not explicitly manifested): CoderSubAgent now receives a real sandbox manifest instead of guessing available runtime APIs.
#34 (AutoAgent goal_oriented=True: no complexity gate; trivial tasks hang in Plan→Execute→Evaluate loop): AutoAgent now routes trivial tasks without forcing the Planner DAG.
#35 (FunctionRegistry: semantic router does not reliably match conversational re-submissions; cache hit rate near zero in practice): FunctionRegistry semantic search now uses normalized intent.
#36 (AutoAgent vs CustomAgent architectural boundary is undocumented; JarvisLifespan creates misleading parity at startup): AutoAgent and CustomAgent background lifecycle behavior is now explicit and validated.
#37 (Kernel result_handler conflates execution status with semantic status; domain failures reported as success): Semantic task failure is detected even when sandbox execution succeeds.
#38 (Sandbox namespace isolation leaks into ZMQ transport layer; KeyError: '__builtins__' on coroutine cleanup): Sandbox builtins are restored after sync/async execution to prevent coroutine cleanup crashes.
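
The cleanup pattern behind the #38 fix can be sketched like this. It is an illustration under assumptions, not the framework's sandbox code: `run_in_namespace` and the allow-list are hypothetical, and only the synchronous path is shown (the PR applies the same restore to the async path).

```python
import builtins
from typing import Any, Dict

_MISSING = object()


def run_in_namespace(code: str, namespace: Dict[str, Any]) -> Dict[str, Any]:
    """Run code with restricted builtins, then restore the original entry."""
    original = namespace.get("__builtins__", _MISSING)
    # Hypothetical allow-list; a real sandbox would choose its own.
    namespace["__builtins__"] = {"len": len, "range": range, "sum": sum}
    try:
        exec(code, namespace)
    finally:
        # Runs on success and on failure, so later code that shares this
        # namespace never sees a missing or restricted __builtins__ entry.
        if original is _MISSING:
            namespace["__builtins__"] = builtins
        else:
            namespace["__builtins__"] = original
    return namespace
```

Because the restore sits in `finally`, the entry is put back even when the executed code raises or deletes `__builtins__` itself.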

Test Plan

pytest tests/test_issue_32.py tests/test_issue_33.py tests/test_issue_34.py tests/test_issue_35.py tests/test_issue_36.py tests/test_issue_37.py tests/test_issue_38.py
python test_usability.py
python -m compileall jarviscore
git diff --check HEAD^ HEAD
IDE lints checked on touched files

Release Notes

This should ship as 1.1.0, not another patch release, because it introduces new backward-compatible framework behavior and public primitives while recovering from the broken 1.0.3 / 1.0.4 PyPI releases.

+            mesh_config["seed_nodes"] = seed_nodes
+
+        # Find an available port for this agent's P2P listener
+        # SWIM doesn't support bind_port=0, so we find a free port
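
The diff excerpt stops before the port lookup itself. One common way to find a free port, shown here only as a sketch, is to bind to port 0 and read back the port the OS assigned. `find_free_port` is a hypothetical helper, and the socket type is an assumption (a UDP-based transport would probe with `SOCK_DGRAM` instead).

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused port by binding to port 0, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]
```

The port is released when the socket closes, so another process could claim it before the real listener binds. Callers that need robustness usually retry on a bind failure.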


+    """Promise: ValueError if system_prompt is absent."""
+    print("\n--- Test 1: Class Validation (system_prompt required) ---")
+    try:
+        agent = AgentMissingPrompt(agent_id="test-bad")


+import os
+
+if TYPE_CHECKING:
+    from jarviscore.p2p import PeerClient


+
+if TYPE_CHECKING:
+    from jarviscore.p2p import PeerClient
+    from jarviscore.p2p.coordinator import P2PCoordinator


+import logging
+import os
+import signal
+import sys


@@ -0,0 +1,26 @@
+import pytest
+from unittest.mock import AsyncMock, patch


@@ -0,0 +1,86 @@
+import pytest
+import asyncio


+
+        # 5. Wait for SWIM cluster to converge
+        # This allows SWIM gossip to sync membership
+        import asyncio


Restore AutoAgent usability by enforcing coder proof-of-work, wiring the file-capable CoderSandbox, and documenting the SemVer recovery from the broken 1.0.3/1.0.4 releases.

Harden AutoAgent, Kernel routing, planner/evaluator contracts, workflow status propagation, search/provider behavior, resource cleanup, and package hygiene so v1.1.0 restores a usable stable framework release.

Fixes #32. Fixes #33. Fixes #34. Fixes #35. Fixes #36. Fixes #37. Fixes #38.

Co-authored-by: Cursor <cursoragent@cursor.com>

+        finally:
+            await mesh.stop()
+
+        statuses = {r["step_id"] if "step_id" in r else f"s{i}": r["status"]


+                    if hasattr(v, "model_json_schema"):
+                        # Render Pydantic BaseModels as JSON schemas for the LLM
+                        try:
+                            import json
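
The excerpt above is cut off after the import. As a rough sketch of the idea, a manifest builder can render any object exposing `model_json_schema` (the classmethod Pydantic v2 models provide) as a JSON schema and fall back to the type name otherwise. `describe_global`, `build_manifest`, and `InvoiceStub` are hypothetical; the stub stands in for a real Pydantic model so the example needs only the standard library.

```python
import json
from typing import Any, Dict


class InvoiceStub:
    """Stand-in exposing the classmethod that Pydantic v2 models provide."""

    @classmethod
    def model_json_schema(cls) -> Dict[str, Any]:
        return {
            "title": "Invoice",
            "type": "object",
            "properties": {"total": {"type": "number"}},
        }


def describe_global(name: str, value: Any) -> str:
    """Describe one sandbox global for inclusion in a prompt manifest."""
    if hasattr(value, "model_json_schema"):
        try:
            schema = json.dumps(value.model_json_schema(), sort_keys=True)
            return f"{name}: schema {schema}"
        except Exception:
            pass  # fall back to the generic description below
    return f"{name}: {type(value).__name__}"


def build_manifest(sandbox_globals: Dict[str, Any]) -> str:
    """List public globals, one per line, in a stable order."""
    return "\n".join(
        describe_global(name, value)
        for name, value in sorted(sandbox_globals.items())
        if not name.startswith("_")
    )
```

Calling `build_manifest({"Invoice": InvoiceStub, "retries": 3})` yields one line per public global, which is the kind of text a prompt could include so the model sees what is actually preloaded.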


 import re
 import time
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, TYPE_CHECKING, cast


ekizito96 requested a review from Ruth-mutua May 12, 2026 21:07

github-advanced-security AI found potential problems May 12, 2026


Comment thread jarviscore/core/agent.py

mesh_config["seed_nodes"] = seed_nodes

# Find an available port for this agent's P2P listener
# SWIM doesn't support bind_port=0, so we find a free port

ekizito96 requested a review from sangalo20 May 12, 2026 21:08

github-code-quality Bot found potential problems May 12, 2026


fix(autoagent): recover stable 1.1.0 release

ff4cb42


ekizito96 force-pushed the fix/autoagent-stability-1.1.0 branch from 87b3ca1 to ff4cb42 Compare May 12, 2026 21:11

github-code-quality Bot found potential problems May 14, 2026


Ruth-mutua merged commit 9f17da0 into main May 15, 2026
6 checks passed

Ruth-mutua merged 2 commits into main from fix/autoagent-stability-1.1.0

3 participants
