feat: decision rule prompts, theta_trust enforcement, spatial messaging, and scenario history filtering by legend5teve · Pull Request #45 · denoslab/AgentEvac

legend5teve · 2026-03-30T07:19:13Z

Summary

DECISION RULE prompt replaces BINDING CONSTRAINT for predeparture decisions — 4 ordered rules (p_danger > theta_r, heuristic signal, official order, else wait) with stop-at-first-match semantics, fixing the deadlock where no agents departed
theta_trust LLM enforcement — when theta_trust=0.0, inbox is stripped and a binding constraint prevents the LLM from citing social data; when theta_trust>0.0, calibrated percentage guidance is injected
Spatial messaging — new COMM_RADIUS_M env var enables distance-based broadcast filtering so only nearby agents receive messages
Scenario history filtering — agent self-history records are sanitized per information regime before embedding in LLM prompts (prevents forecast/advisory data leakage)
Messaging module extraction — AgentMessagingBus and OutboxMessage moved from main.py to agentevac/agents/messaging.py
Compact JSON encoder for readable run parameter logs
Sim config updates — new fire source F0_8, widened lambda bounds, updated spawn groups and SUMO network geometry
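
The stop-at-first-match ordering can be pictured in plain Python. This is a minimal sketch of the semantics only, not the prompt text: the function name `first_matching_rule`, its boolean parameters, and the returned labels are hypothetical, and it assumes that each of the first three rules leads to departure. Only the rule order is taken from the summary above.

```python
from typing import Tuple


def first_matching_rule(
    p_danger: float,
    theta_r: float,
    heuristic_signal: bool,
    official_order: bool,
) -> Tuple[str, str]:
    """Return (action, matched_rule); evaluation stops at the first match."""
    # Rule 1: perceived danger exceeds the agent's risk threshold.
    if p_danger > theta_r:
        return ("depart", "p_danger > theta_r")
    # Rule 2: the pre-computed heuristic departure signal is set.
    if heuristic_signal:
        return ("depart", "heuristic signal")
    # Rule 3: an official order is in effect.
    if official_order:
        return ("depart", "official order")
    # Rule 4: nothing matched, so the agent waits.
    return ("wait", "else wait")
```

Because exactly one branch always returns, a decision is produced on every call, which is the property the ordered rules are meant to give the predeparture prompt.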

Commits

dcf53ce — refactor: Extract messaging module, add filter_history_for_scenario(), compact JSON encoder, new tests (54 pass)
ea5e61c — feat: Overhaul departure/routing prompts with decision rules, theta_trust enforcement, spatial messaging
f17ecac — chore: Update spawn configuration and SUMO network geometry

Test plan

All 54 unit tests pass (test_scenarios.py, test_messaging.py)
Toy-model validation run: no_notice with messaging off — verify agents depart with DECISION RULE
Toy-model validation run: advice_guided with messaging off — verify advisory labels update over time
Toy-model validation run: no_notice with messaging on — verify no degenerate herd behavior

🤖 Generated with Claude Code

Changes to be committed:
- modified: agentevac/simulation/main.py
- modified: agentevac/simulation/spawn_events.py
- modified: agentevac/utils/replay.py
- modified: sumo/Repaired.netecfg
- modified: sumo/Repaired.sumocfg

Module updated: agentevac/utils/replay.py

- Fixed RouteReplay._load_schedule(...) so it only reads step and veh_id for replayable events:
  - route_change
  - departure_release
- Non-replayable events like agent_cognition and metrics_snapshot are now ignored without touching veh_id.

Cause

- The loader was accessing rec["veh_id"] before checking the event type.
- metrics_snapshot records do not have veh_id, so replay loading crashed with KeyError.

Verification

1. python3 -m py_compile agentevac/utils/replay.py passed.
2. Reproduced the failing case with a small local script:
   - one route_change
   - one agent_cognition
   - one metrics_snapshot
   - replay load now succeeds and only indexes the route-change step.
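
The fix described above amounts to checking the event type before touching `veh_id`. The following is a minimal sketch of that pattern, not the actual `RouteReplay._load_schedule` code: the function name `index_replayable`, the `"event"` key, and the shape of the returned schedule are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Event types named as replayable in the commit message.
REPLAYABLE_EVENTS = {"route_change", "departure_release"}


def index_replayable(records: Iterable[dict]) -> Dict[int, List[str]]:
    """Map step -> vehicle ids, reading veh_id only for replayable events."""
    schedule: Dict[int, List[str]] = {}
    for rec in records:
        # Filter on the event type first; records such as metrics_snapshot
        # carry no veh_id and must never reach the lookup below.
        if rec.get("event") not in REPLAYABLE_EVENTS:
            continue
        schedule.setdefault(int(rec["step"]), []).append(rec["veh_id"])
    return schedule
```

Run against the three record types used in the verification step, only the route-change step is indexed:

```python
records = [
    {"event": "route_change", "step": 3, "veh_id": "veh1"},
    {"event": "agent_cognition", "step": 3, "veh_id": "veh1"},
    {"event": "metrics_snapshot", "step": 4},
]
schedule = index_replayable(records)
```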

…s and agent communication

…t_round_timeline.py

…gs for documentation

…g, and per-agent heterogeneity

- Add compute_signal_conflict() using Jensen-Shannon divergence in belief_model.py
- Restructure all three LLM prompts (pre-departure, destination, route) to expose raw env vs. social disagreement via your_observation/neighbor_assessment/information_conflict/combined_belief fields
- Add conflict_assessment field to all Pydantic response models
- Add conflict recording to metrics (record_conflict_sample, compute_average_signal_conflict)
- Implement distance-based noise scaling (proposal Eq. 1): effective sigma scales with fire margin / reference distance via DIST_REF_M config
- Add per-agent parameter heterogeneity via sample_profile_params() with truncated normal distributions; configurable via *_SPREAD env vars (default 0 = legacy)
- Fix stale subjective_information reference in scenarios.py
- Add experiment stage scripts (stages 0-5) for RQ1/RQ2/RQ3 sweeps
- Add comprehensive tests for all new features (291 tests passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
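
For readers unfamiliar with the measure, Jensen-Shannon divergence between two discrete belief distributions can be computed as below. This is a generic textbook sketch, not the body of `compute_signal_conflict()`: the function name `js_divergence`, the use of base-2 logarithms, and the two-category example are assumptions.

```python
import math
from typing import Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits; 0 for identical inputs, at most 1."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0
        # because y is the mixture, so the ratio is always defined.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logarithms the value is bounded between 0 (own observation and neighbor assessment agree exactly) and 1 (they put all mass on different categories), which makes it convenient as a normalized conflict score.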

… recording

- Add observation-based exposure function for no_notice scenario that uses agent belief state and route length instead of route-specific fire data
- Enable expected_utility in all three scenarios (no_notice, alert_guided, advice_guided) with scenario-aware LLM policy text
- Update menu filtering to retain travel time and utility for no_notice agents
- Fix NET_FILE default from .rou.xml (route file) to .net.xml (network file), which caused EDGE_SHAPE to be empty and all exposure scores to be zero
- Fix exposure recording to fire only on decision rounds instead of every simulation step, preventing dilution of the exposure average
- Add Repaired.net.xml to repo; update SUMO configs to use local net file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e parameters Use SIM_END_TIME_S (default 1200s) to control simulation duration instead of relying on getMinExpectedNumber(), which terminated early when no agents had departed yet. Remove dummy t_0 vehicle from route file. Add --sim-end-time CLI flag and SIM_END_TIME_S env var. Update fire source growth rates and timing for more aggressive spread scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add input-hash caching (Plan C) across all 3 LLM call sites to skip redundant API calls when agent inputs haven't changed between rounds
- Add parallel LLM dispatch (Plan A) for process_pending_departures using ThreadPoolExecutor: a two-phase collect-then-process pattern fires all non-cached predeparture LLM calls concurrently (up to MAX_CONCURRENT_LLM)
- Add 4 new fields to AgentRuntimeState for cache state tracking
- Add RQ1–RQ4 experiment runner scripts for automated parameter sweeps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
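
The input-hash caching idea can be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: the class `HashedCallCache`, its attributes, and the choice of SHA-256 over a canonical JSON dump are all hypothetical, and the real cache state lives in fields on AgentRuntimeState that are not shown in this PR text.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


class HashedCallCache:
    """Re-run an expensive call only when its inputs have changed."""

    def __init__(self) -> None:
        self._last_hash: Optional[str] = None
        self._last_result: Any = None
        self.calls = 0  # number of times the wrapped call actually ran

    def get(self, inputs: Dict[str, Any], call: Callable[[Dict[str, Any]], Any]) -> Any:
        # sort_keys gives a canonical encoding, so key order does not matter.
        payload = json.dumps(inputs, sort_keys=True, default=str).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if digest != self._last_hash:
            self._last_result = call(inputs)
            self._last_hash = digest
            self.calls += 1
        return self._last_result
```

One cache instance per agent and call site would match the "inputs haven't changed between rounds" wording; whether the project keys the cache that way is not stated here.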

Stop the simulation loop as soon as every spawned vehicle has departed and arrived at its destination, instead of running until SIM_END_TIME_S. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Restructure LLM decision prompts with explicit priority levels (safety > official guidance > risk assessment), add EOC guidance_source to operator briefings, and fix early termination to check actual arrivals via metrics.arrived_count() instead of active vehicle count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…dule-plots # Conflicts: # agentevac/agents/agent_state.py # agentevac/analysis/metrics.py # agentevac/simulation/main.py # sumo/Repaired.netecfg # sumo/Repaired.sumocfg

…etwork updates

- Record per-step edge traces via getRoadID() for faithful replay of actual routes taken (not just planned routes)
- Add departure destination choice: agents pick a destination via LLM before spawning, so vehicles head the right direction from step zero (parallel LLM dispatch via ThreadPoolExecutor)
- Fix early termination to check arrived_count instead of empty vehicle list (prevents premature exit when SUMO defers vehicle insertion)
- Replay mode reads to_edge from departure records for correct initial routing
- Update SUMO network with new shelter edges (E#S0, E#S1, E#S2) and refreshed spawn_events
- Refine scenario prompt suffixes and DecisionModel schema (situation_summary field, expanded reason descriptions)
- Update forecast_layer and corresponding tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…dule-plots # Conflicts: # agentevac/simulation/main.py # sumo/Repaired.net.xml # sumo/Repaired.netecfg # sumo/Repaired.sumocfg

…selection

- Record departure destination choice in metrics via record_decision_snapshot so destination_choice_share counts all agents (not just those processed by process_vehicles)
- Replace fixed-offset vehicle selection with round-robin so all agents get mid-route LLM re-evaluation over successive decision ticks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
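
Round-robin selection can be illustrated with a cursor that advances across decision ticks. This is a minimal sketch, assuming a fixed batch size per tick; the function name `round_robin_batch` and its signature are hypothetical and do not come from the repository.

```python
from typing import List, Sequence, Tuple


def round_robin_batch(
    ids: Sequence[str], cursor: int, batch_size: int
) -> Tuple[List[str], int]:
    """Return (ids selected this tick, cursor for the next tick)."""
    if not ids or batch_size <= 0:
        return [], cursor
    n = len(ids)
    take = min(batch_size, n)  # never select the same id twice in one tick
    start = cursor % n
    batch = [ids[(start + i) % n] for i in range(take)]
    return batch, (start + take) % n
```

Because the cursor wraps around the list, every vehicle is eventually selected, whereas a fixed offset keeps returning the same subset.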

The experiment runner lacked a --sumo-seed argument, so RQ scripts passed the seed via env var prefix which broke under dash (/bin/sh). Add --sumo-seed to experiments.py and rewrite all four RQ scripts to use POSIX-compatible for-loops with the new flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Record each agent's sampled psychological parameters (theta_trust, theta_r, theta_u, gamma, lambda_e, lambda_t) and write them to an agent_profiles JSON file alongside the metrics file. Enables post-hoc verification of population heterogeneity distributions in RQ4 runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…stances The old thresholds (danger ≤100m, risky ≤300m, buffered ≤700m) were calibrated for a city-block mental model. In the actual simulation, the minimum observed margin is 619m, so every agent always classified the fire as "safe" and never perceived risk. Scale thresholds to danger ≤1200m, risky ≤2500m, buffered ≤5000m so agents meaningfully perceive fire hazard. Also scale RISK_DECAY_M from 80 to 960 to keep the exponential risk curve proportional. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
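
The rescaled thresholds can be read as a simple classifier plus a decay curve. This is a sketch under assumptions: the numeric constants are the new values quoted above, but the function names, the constant names other than RISK_DECAY_M, and the exact exponential form `exp(-margin / RISK_DECAY_M)` are illustrative guesses at what "exponential risk curve" means. Note that the decay-to-danger ratio is preserved by the change (80 / 100 = 960 / 1200 = 0.8).

```python
import math

# New thresholds from the commit message, in metres.
DANGER_M = 1200.0
RISKY_M = 2500.0
BUFFERED_M = 5000.0
RISK_DECAY_M = 960.0


def classify_margin(margin_m: float) -> str:
    """Label a fire margin using the rescaled distance thresholds."""
    if margin_m <= DANGER_M:
        return "danger"
    if margin_m <= RISKY_M:
        return "risky"
    if margin_m <= BUFFERED_M:
        return "buffered"
    return "safe"


def risk_from_margin(margin_m: float) -> float:
    """Assumed exponential decay of risk with distance; 1.0 at the fire edge."""
    return math.exp(-max(margin_m, 0.0) / RISK_DECAY_M)
```

Under these values the minimum observed margin of 619 m falls in the "danger" band.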

En-route agents in no_notice mode can now see fire on the first few edges ahead of their current position (VISUAL_LOOKAHEAD_EDGES, default 3). This adds a penalty to the current destination's exposure score, making agents more likely to switch shelters when fire blocks their route. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ling _observation_based_exposure() now prefers travel_time_s_fastest_path (minutes * 0.3) over len_edges (count * 0.15) to better reflect actual exposure duration. Falls back to edge count when travel time is unavailable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…y prompts Adds an explicit instruction to all three LLM policy locations (pre-departure, en-route destination, en-route route) requiring agents to only reference information explicitly present in the prompt data. Prevents GPT-4o-mini from fabricating neighbor behaviors, evacuation patterns, or shelter choices that cascade through the messaging system. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agents in no_notice mode now only perceive fires within FIRE_PERCEPTION_RANGE_M (default 1200m) of their position:

- Perception horizon: if no fire is in range, env_signal margins are None → observed_state="unknown" (genuine uncertainty instead of false "safe"). When fire is in range, margins are computed from visible fires only.
- Route-level fire data: all reachable menu items gain proximity_blocked_edges and proximity_min_margin_m from visible fires, enabling the utility function to differentiate destinations by hazard.
- Exposure scoring: _observation_based_exposure() adds a proximity penalty (blocked * 8.0 + margin_penalty) matching _expected_exposure weights, so routes through visible fire are strongly deprioritised.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
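
The perception horizon can be sketched as a filter over fire sources. This is illustrative only: the function name `visible_min_margin`, the representation of a fire as `(x, y, radius_m)`, the definition of margin as distance to the fire centre minus its radius, and the choice to test the range against that margin are all assumptions; only the 1200 m default and the None result for "no fire in range" come from the commit message.

```python
import math
from typing import Iterable, Optional, Tuple

FIRE_PERCEPTION_RANGE_M = 1200.0  # default stated in the commit message

Fire = Tuple[float, float, float]  # (x, y, radius_m), hypothetical layout


def visible_min_margin(
    agent_xy: Tuple[float, float],
    fires: Iterable[Fire],
    perception_range_m: float = FIRE_PERCEPTION_RANGE_M,
) -> Optional[float]:
    """Smallest margin among fires within range, or None if none is visible."""
    ax, ay = agent_xy
    margins = []
    for fx, fy, radius_m in fires:
        # Assumed margin: distance from the agent to the fire's edge.
        margin = math.hypot(fx - ax, fy - ay) - radius_m
        if margin <= perception_range_m:
            margins.append(margin)
    # None is the case that maps to observed_state="unknown".
    return min(margins) if margins else None
```

Returning None instead of a large number keeps "no fire seen" distinct from "fire seen far away", which is the distinction the commit draws between "unknown" and a false "safe".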

…nd compact JSON encoder

Functional updates:

- Extract AgentMessagingBus and OutboxMessage from main.py into agentevac/agents/messaging.py with spatial filtering support (comm_radius_m parameter for distance-based broadcast delivery)
- Add filter_history_for_scenario() in scenarios.py to sanitize agent self-history records before embedding in LLM prompts (strips forecast, advisory, fire metrics per information regime)
- Add _CompactLeafEncoder in run_parameters.py for readable parameter log output (leaf dicts rendered on single lines)
- Add agentevac/analysis/analyze_run.py run analysis utility
- Add tests for messaging spatial filtering (test_messaging.py) and history filtering (test_scenarios.py)
- Add Research_Proposal_0317.pptx to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
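
Distance-based broadcast delivery can be sketched independently of the bus itself. This is not the AgentMessagingBus API: the function name `recipients_in_range`, the position dictionary, straight-line distance, and the rule that a radius of None disables filtering are assumptions for illustration; only the `comm_radius_m` parameter name appears in the commit message.

```python
import math
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float]


def recipients_in_range(
    sender: str,
    positions: Dict[str, Position],
    comm_radius_m: Optional[float],
) -> List[str]:
    """Agents other than the sender that should receive a broadcast."""
    others = [agent for agent in positions if agent != sender]
    if comm_radius_m is None:
        # Assumed convention: no radius configured means no spatial filter.
        return sorted(others)
    sx, sy = positions[sender]
    return sorted(
        agent
        for agent in others
        if math.hypot(positions[agent][0] - sx, positions[agent][1] - sy) <= comm_radius_m
    )
```

A small usage example with three agents on a line:

```python
positions = {"a": (0.0, 0.0), "b": (300.0, 0.0), "c": (2000.0, 0.0)}
near = recipients_in_range("a", positions, comm_radius_m=500.0)
```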

…rust enforcement, and spatial messaging

Functional updates in agentevac/simulation/main.py:

DEPARTURE PROMPT:
- Replace BINDING CONSTRAINT with DECISION RULE prompt: 4 ordered rules (p_danger > theta_r, heuristic_departure_signal, official order, else wait) with stop-at-first-match semantics
- Add heuristic_departure_signal field to predeparture prompt, exposing the pre-computed departure model output (covers theta_r, urgency decay via theta_u/gamma, low-confidence precaution, and neighbor departure pressure)

THETA_TRUST LLM ENFORCEMENT (predeparture + routing prompts):
- When theta_trust=0.0: strip inbox entirely, add BINDING CONSTRAINT instructing LLM to IGNORE all neighbor/social data
- When theta_trust>0.0: inject calibrated percentage guidance (e.g., "rely 70% on own observation, 30% on neighbor messages")
- Apply _consider_pol, _belief_weigh_pol, trust_policy to all 3 routing prompt locations (destination, route, and predeparture)

SCENARIO-AWARE HISTORY:
- Filter agent_self_history through filter_history_for_scenario() before embedding in routing prompts (prevents data leakage from historical records)

SPATIAL MESSAGING:
- Add COMM_RADIUS_M env var for distance-based broadcast filtering
- Precompute SPAWN_EDGE_MIDPOINT for pre-departure agent positions
- Pass agent positions to messaging.begin_round() for spatial delivery
- Include comm_radius_m in messaging config sections of all prompts

CONFIGURATION:
- Remove inline AgentMessagingBus/OutboxMessage (now imported)
- Widen lambda_e bounds to (0.0, 100.0) and lambda_t to (0.0, 100.0)
- Add fire source F0_8 at (16348, 6801) with r0=400
- Export fire_sources and fire_events to run parameter log

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
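
The two theta_trust branches can be sketched as a function that returns both the inbox to embed and the guidance text. This is a hypothetical helper, not the code in main.py: the name `social_section`, the assumption that theta_trust lies in [0, 1], and the linear mapping from theta_trust to the neighbor percentage are illustrative; the commit message only gives the 70%/30% wording as an example.

```python
from typing import List, Tuple


def social_section(theta_trust: float, inbox: List[str]) -> Tuple[List[str], str]:
    """Return (inbox to embed in the prompt, trust guidance text)."""
    if not 0.0 <= theta_trust <= 1.0:
        raise ValueError("theta_trust is assumed to lie in [0, 1]")
    if theta_trust == 0.0:
        # Zero trust: strip the inbox and forbid use of social data.
        return [], "BINDING CONSTRAINT: IGNORE all neighbor/social data."
    # Assumed linear calibration: theta_trust is the share given to neighbors.
    neighbor_pct = round(theta_trust * 100)
    own_pct = 100 - neighbor_pct
    guidance = (
        f"rely {own_pct}% on own observation, "
        f"{neighbor_pct}% on neighbor messages"
    )
    return list(inbox), guidance
```

Stripping the inbox in addition to adding the constraint means the model cannot cite neighbor messages even if it disregards the instruction, since the data is simply absent from the prompt.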

Functional updates:

- Disable veh3 and veh4 spawn groups (commented out) to reduce toy-model agent count for focused testing
- Comment out large-scale spawn groups (veh9–veh41), reserved for future 100+ agent experiments
- Update SUMO network (Repaired.net.xml): add junction J20 and associated internal edges for improved routing near south zone
- Update Repaired.netecfg and Repaired.sumocfg timestamps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

legend5teve and others added 30 commits March 5, 2026 16:05

refactor: Change scenario prompts in agents/scenarios.py

3623b4c

chore: update the suggested routes and destination for Lytton

df2f056


feat: optimize the visualization module for plotting statistic result…

0f4ac33

…s and agent communication

Merge branch 'main' into feat/visualization-module-plots

747d1ae

feat: implement timeline analysis for evacuation in scripts/plot_agen…

4f3172f

…t_round_timeline.py

Merge branch 'main' into feat/visualization-module-plots

aaca505

chore: add test cases to cover newly added features; update doc strin…

1c7ff71

…gs for documentation

chore: update plotting scales according to actual KPI scales

44bbd68

feat: log run parameters for plotting modules

a1f935f

Merge branch 'main' into feat/visualization-module-plots

077be18

Merge branch 'main' into feat/visualization-module-plots

d73e226

feat: add early termination when all agents have evacuated

edd71e9


Merge remote-tracking branch 'origin/main' into feat/visualization-mo…

36014f4


Merge remote-tracking branch 'origin/main' into feat/visualization-mo…

7892b89


legend5teve and others added 2 commits March 30, 2026 01:14

legend5teve wants to merge 32 commits into denoslab:main from legend5teve:feat/visualization-module-plots
