test(agentic): add 46 state management and orchestrator tests (all passed) #2

Open: Mog9 wants to merge 1 commit into tensormux:main from Mog9:add-agentic-loop-state-runner-tests

Mog9 (Collaborator) commented Jun 13, 2026

Continues the agentic loop test coverage from #1. This PR adds tests for the state management layer (agent_state.py) and the main orchestrator loop (agent_runner.py), plus the remaining 5 tests for tool dispatch (agent_tools.py).

46 new tests across 3 files — all passing.

test_agent_tools.py (5 new tests)

| # | Test | What it validates |
|---|------|-------------------|
| 1 | test_unknown_tool_returns_error | Unknown tool name returns error ToolResult |
| 2 | test_tool_exception_returns_error | Handler raising exception is caught, returns error |
| 3 | test_run_verify_delegates_to_verifier | Calls verify_candidate with correct args |
| 4 | test_run_benchmark_delegates_to_benchmarker | Calls benchmark_candidate |
| 5 | test_read_candidate_file_rejects_bad_path | Path traversal in read returns error |
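
For reviewers, the dispatch behavior these tests pin down can be sketched roughly as follows. This is a minimal illustration, not the code in agent_tools.py: the `ToolResult` fields, `dispatch_tool`, and `resolve_inside` are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict


@dataclass
class ToolResult:
    """Hypothetical result type; the real ToolResult may carry other fields."""
    ok: bool
    content: str


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting paths that escape it."""
    root = root.resolve()
    target = (root / relative).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes candidate directory: {relative}")
    return target


def dispatch_tool(
    handlers: Dict[str, Callable[..., str]],
    name: str,
    args: Dict[str, Any],
) -> ToolResult:
    """Look up a handler by name and turn every failure into an error result."""
    handler = handlers.get(name)
    if handler is None:
        return ToolResult(ok=False, content=f"unknown tool: {name}")
    try:
        return ToolResult(ok=True, content=handler(**args))
    except Exception as exc:  # a failing handler must not crash the loop
        return ToolResult(ok=False, content=f"{type(exc).__name__}: {exc}")
```

Returning an error result instead of raising lets the model see the failure and react to it in the next turn.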

test_agent_state.py (16 new tests)

| # | Test | What it validates |
|---|------|-------------------|
| 1 | test_init_state_creates_file | agent_state.json written to disk |
| 2 | test_init_state_truncates_transcript | Old transcript cleared on fresh start |
| 3 | test_init_state_defaults | Status=PENDING, iteration=0, cost=0.0 |
| 4 | test_save_and_load_roundtrip | State survives JSON serialization |
| 5 | test_load_returns_none_when_missing | No file returns None |
| 6 | test_load_returns_none_on_corrupt_json | Malformed file returns None |
| 7 | test_request_abort_sets_flag | abort_requested becomes True |
| 8 | test_request_abort_only_when_pending_or_running | Returns False for terminal statuses |
| 9 | test_request_abort_returns_false_when_no_state | No state file returns False |
| 10 | test_append_transcript_adds_timestamp | Each line has "at" key |
| 11 | test_append_transcript_multiple_lines | Multiple appends produce multiple lines |
| 12 | test_read_transcript_empty | No file returns [] |
| 13 | test_read_transcript_skips_blank_lines | Blank lines ignored |
| 14 | test_read_transcript_skips_invalid_json | Malformed lines ignored |
| 15 | test_state_path_format | Path is repo_root / artifact_dir / "agent_state.json" |
| 16 | test_transcript_path_format | Path is repo_root / artifact_dir / "agent_transcript.jsonl" |
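
The tolerant load and transcript behavior covered by tests 5, 6 and 10 to 14 could look something like the sketch below. The function names and signatures are assumptions made for illustration, and the real agent_state.py may differ in both; only the "at" key and the return values come from the table.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional


def load_state(path: Path) -> Optional[Dict[str, Any]]:
    """Return the parsed state, or None if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def append_transcript(path: Path, entry: Dict[str, Any]) -> None:
    """Append one JSON line, stamping it with an "at" timestamp (UTC)."""
    record = {**entry, "at": datetime.now(timezone.utc).isoformat()}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_transcript(path: Path) -> List[Dict[str, Any]]:
    """Read all valid JSON lines, skipping blank and malformed ones."""
    if not path.exists():
        return []
    entries: List[Dict[str, Any]] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # one bad line should not hide the rest of the transcript
    return entries
```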

test_agent_runner.py (25 new tests)

Cost tracking (6 tests)

| # | Test | What it validates |
|---|------|-------------------|
| 1 | test_cost_from_usage_all_zeros | Zero tokens = $0.00 |
| 2 | test_cost_from_usage_input_only | 1M input tokens = $5.00 |
| 3 | test_cost_from_usage_output_only | 1M output tokens = $25.00 |
| 4 | test_cost_from_usage_cache_write | 1M cache_write tokens = $6.25 |
| 5 | test_cost_from_usage_cache_read | 1M cache_read tokens = $0.50 |
| 6 | test_cost_from_usage_combined | Mixed tokens = correct sum |
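
The per-million-token rates asserted above imply a simple linear cost model. A minimal sketch, assuming the usage is passed as a mapping with the hypothetical key names below (the real `cost_from_usage` signature is not shown in this PR description):

```python
from typing import Mapping

# USD per one million tokens, taken from the expectations in the tests above.
# The key names are assumptions made for this sketch.
PRICE_PER_MTOK = {
    "input_tokens": 5.00,
    "output_tokens": 25.00,
    "cache_write_tokens": 6.25,
    "cache_read_tokens": 0.50,
}


def cost_from_usage(usage: Mapping[str, int]) -> float:
    """Sum the cost of each token category; missing categories count as zero."""
    return sum(
        usage.get(key, 0) * price / 1_000_000
        for key, price in PRICE_PER_MTOK.items()
    )
```

For example, 2M input tokens plus 0.5M output tokens would come to 2 × $5.00 + 0.5 × $25.00 = $22.50 under these rates.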

System prompt (3 tests)

| # | Test | What it validates |
|---|------|-------------------|
| 7 | test_build_system_prompt_has_two_blocks | Rules block + cached skill bundle block |
| 8 | test_build_system_prompt_cache_control | Second block has cache_control: ephemeral |
| 9 | test_build_system_prompt_includes_task_details | Op, GPU, dtype, shape in rules text |
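
The two-block shape checked here matches how the Anthropic Messages API accepts a list of text blocks for `system`, with `cache_control` marking a block as cacheable. A rough sketch, where the parameter names and the wording of the rules text are assumptions rather than the real `build_system_prompt`:

```python
from typing import Any, Dict, List, Sequence


def build_system_prompt(
    op: str,
    gpu: str,
    dtype: str,
    shape: Sequence[int],
    skill_bundle: str,
) -> List[Dict[str, Any]]:
    """Build a rules block followed by a cacheable skill bundle block."""
    # Hypothetical wording; the tests only require that the task details appear.
    rules = (
        f"Optimize op={op} on gpu={gpu} "
        f"with dtype={dtype} and shape={tuple(shape)}."
    )
    return [
        # Task-specific rules change per run, so they are not cached.
        {"type": "text", "text": rules},
        # The skill bundle is large and stable, so it is marked for caching.
        {
            "type": "text",
            "text": skill_bundle,
            "cache_control": {"type": "ephemeral"},
        },
    ]
```

Putting the stable bundle in its own block means only that block carries the cache marker, while the per-task rules stay out of the cached prefix.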

Loop termination (7 tests)

| # | Test | What it validates |
|---|------|-------------------|
| 10 | test_loop_errored_on_missing_api_key | No ANTHROPIC_API_KEY = ERRORED immediately |
| 11 | test_loop_errored_on_api_refusal | stop_reason="refusal" = ERRORED |
| 12 | test_loop_breaks_on_end_turn_without_tools | end_turn with no tool_use = breaks |
| 13 | test_loop_rejected_on_give_up | Agent calls give_up = REJECTED |
| 14 | test_loop_rejected_on_max_iterations | Loop exhausts iterations = REJECTED |
| 15 | test_loop_rejected_on_cost_cap | Cost exceeds cap = REJECTED |
| 16 | test_loop_errored_on_api_error | APIStatusError = ERRORED |
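
The termination rules above can be summarized in a small, self-contained sketch. It is not the loop in agent_runner.py: `Turn`, `run_loop`, the order of the checks, and the `ENDED_WITHOUT_TOOLS` placeholder are all assumptions, and a plain `Exception` stands in for `APIStatusError`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Turn:
    """Hypothetical summary of one model response."""
    stop_reason: str
    tool_calls: List[str] = field(default_factory=list)
    cost_usd: float = 0.0


def run_loop(
    call_model: Callable[[], Turn],
    api_key: Optional[str],
    max_iterations: int,
    cost_cap_usd: float,
) -> str:
    """Return the final status for a run, following the rules in the table."""
    if not api_key:
        return "ERRORED"  # missing key fails before any API call
    cost = 0.0
    for _ in range(max_iterations):
        try:
            turn = call_model()
        except Exception:  # stand-in for the SDK's APIStatusError
            return "ERRORED"
        cost += turn.cost_usd
        if turn.stop_reason == "refusal":
            return "ERRORED"
        if "give_up" in turn.tool_calls:
            return "REJECTED"
        if cost > cost_cap_usd:
            return "REJECTED"
        if turn.stop_reason == "end_turn" and not turn.tool_calls:
            break  # nothing left to dispatch
    else:
        return "REJECTED"  # iterations exhausted without a break
    # Placeholder: the status after an early break is not stated in this PR.
    return "ENDED_WITHOUT_TOOLS"
```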

State transitions (5 tests)

| # | Test | What it validates |
|---|------|-------------------|
| 17 | test_iteration_counter_increments | State.iteration matches loop count |
| 18 | test_cost_accumulates_across_turns | cost_usd grows with each API call |
| 19 | test_token_counts_accumulate | All four token counters grow correctly |
| 20 | test_verify_result_tracked_in_state | last_verify_passed/reason updated |
| 21 | test_benchmark_result_tracked_in_state | last_benchmark_passed/speedup updated |
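
As an illustration of the accumulation that tests 17 to 19 check, per-turn bookkeeping might be sketched like this. Only `iteration` and `cost_usd` are names taken from this description; the token counter fields and `record_turn` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class AgentState:
    """Hypothetical subset of the persisted state."""
    iteration: int = 0
    cost_usd: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0

    def record_turn(self, usage: Mapping[str, int], turn_cost_usd: float) -> None:
        """Fold one API call's usage and cost into the running totals."""
        self.iteration += 1
        self.cost_usd += turn_cost_usd
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        self.cache_write_tokens += usage.get("cache_write_tokens", 0)
        self.cache_read_tokens += usage.get("cache_read_tokens", 0)
```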

Edge cases (4 tests)

| # | Test | What it validates |
|---|------|-------------------|
| 22 | test_loop_continues_when_verify_fails | Verify fails = no promotion, loop continues |
| 23 | test_loop_continues_when_benchmark_fails | Benchmark fails = no promotion, loop continues |
| 24 | test_promotion_failure_does_not_crash | promote_candidate raises ValueError = handled gracefully |
| 25 | test_transcript_written_each_turn | transcript_lines increments each turn |

Results

test_agent_tools.py: 15 passed (10 from #1 + 5 new)
test_agent_state.py: 16 passed (new file)
test_agent_runner.py: 25 passed (new file)
Full suite: 149 passed, 1 skipped

Coverage summary

| Module | Tests before | Tests after |
|--------|--------------|-------------|
| agent_tools.py | 0 | 15 |
| agent_state.py | 0 | 16 |
| agent_runner.py | 0 | 25 |
| Total agentic | 0 | 56 |
