
feat: decision rule prompts, theta_trust enforcement, spatial messaging, and scenario history filtering #45

Open
legend5teve wants to merge 32 commits into denoslab:main from legend5teve:feat/visualization-module-plots

Conversation

@legend5teve
Collaborator

Summary

  • DECISION RULE prompt replaces BINDING CONSTRAINT for predeparture decisions — 4 ordered rules (p_danger > theta_r, heuristic signal, official order, else wait) with stop-at-first-match semantics, fixing the deadlock where no agents departed
  • theta_trust LLM enforcement — when theta_trust=0.0, inbox is stripped and a binding constraint prevents the LLM from citing social data; when theta_trust>0.0, calibrated percentage guidance is injected
  • Spatial messaging — new COMM_RADIUS_M env var enables distance-based broadcast filtering so only nearby agents receive messages
  • Scenario history filtering — agent self-history records are sanitized per information regime before embedding in LLM prompts (prevents forecast/advisory data leakage)
  • Messaging module extraction: AgentMessagingBus and OutboxMessage moved from main.py to agentevac/agents/messaging.py
  • Compact JSON encoder for readable run parameter logs
  • Sim config updates — new fire source F0_8, widened lambda bounds, updated spawn groups and SUMO network geometry
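
The DECISION RULE itself is prompt text that the LLM is asked to follow, so the snippet below is only an illustration of the stop-at-first-match ordering in plain Python, not code from this PR. The function name, the tuple return value, and the assumption that rules 1 to 3 all resolve to "depart" are hypothetical.

```python
from typing import Tuple


def predeparture_decision(p_danger: float, theta_r: float,
                          heuristic_departure_signal: bool,
                          official_order: bool) -> Tuple[str, int]:
    """Return (action, matched_rule); evaluation stops at the first match."""
    if p_danger > theta_r:            # Rule 1: perceived danger exceeds risk threshold
        return "depart", 1
    if heuristic_departure_signal:    # Rule 2: pre-computed departure model says go
        return "depart", 2
    if official_order:                # Rule 3: an official order is in effect
        return "depart", 3
    return "wait", 4                  # Rule 4: otherwise wait


# Rule 1 wins even when later rules would also match.
print(predeparture_decision(0.8, 0.5, True, True))
```

Because the last rule is an unconditional fallback, every input maps to exactly one action, which is the property that removes the deadlock described above.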

Commits

  1. dcf53ce refactor: Extract messaging module, add filter_history_for_scenario(), compact JSON encoder, new tests (54 pass)
  2. ea5e61c feat: Overhaul departure/routing prompts with decision rules, theta_trust enforcement, spatial messaging
  3. f17ecac chore: Update spawn configuration and SUMO network geometry

Test plan

  • All 54 unit tests pass (test_scenarios.py, test_messaging.py)
  • Toy-model validation run: no_notice with messaging off — verify agents depart with DECISION RULE
  • Toy-model validation run: advice_guided with messaging off — verify advisory labels update over time
  • Toy-model validation run: no_notice with messaging on — verify no degenerate herd behavior

🤖 Generated with Claude Code

legend5teve and others added 30 commits March 5, 2026 16:05
 Changes to be committed:
	modified:   agentevac/simulation/main.py
	modified:   agentevac/simulation/spawn_events.py
	modified:   agentevac/utils/replay.py
	modified:   sumo/Repaired.netecfg
	modified:   sumo/Repaired.sumocfg
  Module updated: agentevac/utils/replay.py

  - Fixed RouteReplay._load_schedule(...) so it only reads step and veh_id for replayable events:
      - route_change
      - departure_release
  - Non-replayable events like agent_cognition and metrics_snapshot are now ignored without touching veh_id.

  Cause

  - The loader was accessing rec["veh_id"] before checking the event type.
  - metrics_snapshot records do not have veh_id, so replay loading crashed with KeyError.

  Verification

  1. python3 -m py_compile agentevac/utils/replay.py passed.
  2. Reproduced the failing case with a small local script:

  - one route_change
  - one agent_cognition
  - one metrics_snapshot
  - replay load now succeeds and only indexes the route-change step.
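
  A minimal sketch of the fixed loading order, assuming the replay log is JSON Lines and that the event type lives under an "event" key (both are assumptions; only step, veh_id, and the event names come from the description above). The function is a standalone stand-in for RouteReplay._load_schedule, not the actual method.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List

# Event types that carry a veh_id and can be replayed.
REPLAYABLE_EVENTS = {"route_change", "departure_release"}


def load_schedule(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Index replayable events by step, skipping everything else."""
    schedule: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        # Check the event type first, so records without veh_id
        # (e.g. metrics_snapshot) are never indexed by that key.
        if rec.get("event") not in REPLAYABLE_EVENTS:
            continue
        schedule[int(rec["step"])].append(rec["veh_id"])
    return dict(schedule)


log = [
    '{"event": "route_change", "step": 5, "veh_id": "veh1"}',
    '{"event": "agent_cognition", "step": 5, "veh_id": "veh1"}',
    '{"event": "metrics_snapshot", "step": 6}',
]
print(load_schedule(log))
```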
…g, and per-agent heterogeneity

- Add compute_signal_conflict() using Jensen-Shannon divergence in belief_model.py
- Restructure all three LLM prompts (pre-departure, destination, route) to expose
  raw env vs. social disagreement via your_observation/neighbor_assessment/
  information_conflict/combined_belief fields
- Add conflict_assessment field to all Pydantic response models
- Add conflict recording to metrics (record_conflict_sample, compute_average_signal_conflict)
- Implement distance-based noise scaling (proposal Eq. 1): effective sigma scales
  with fire margin / reference distance via DIST_REF_M config
- Add per-agent parameter heterogeneity via sample_profile_params() with truncated
  normal distributions; configurable via *_SPREAD env vars (default 0 = legacy)
- Fix stale subjective_information reference in scenarios.py
- Add experiment stage scripts (stages 0-5) for RQ1/RQ2/RQ3 sweeps
- Add comprehensive tests for all new features (291 tests passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
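
For reference, a sketch of a Jensen-Shannon based conflict score. The name compute_signal_conflict comes from the commit message, but the signature, the list-of-probabilities inputs, and the base-2 logarithm (which bounds the score to [0, 1]) are assumptions rather than the code in belief_model.py.

```python
import math
from typing import List, Sequence


def _normalize(dist: Sequence[float]) -> List[float]:
    total = float(sum(dist))
    return [x / total for x in dist]


def _kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    # Terms with p_i == 0 contribute nothing to the KL divergence.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def compute_signal_conflict(env_belief: Sequence[float],
                            social_belief: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two discrete beliefs, in bits."""
    p = _normalize(env_belief)
    q = _normalize(social_belief)
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * _kl_bits(p, m) + 0.5 * _kl_bits(q, m)


# Own observation says "safe"; neighbors say "danger".
print(compute_signal_conflict([0.8, 0.15, 0.05], [0.1, 0.2, 0.7]))
```

The score is symmetric, zero when both beliefs agree, and stays finite even when one belief assigns zero probability to a state, which plain KL divergence does not guarantee.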
… recording

- Add observation-based exposure function for no_notice scenario that uses
  agent belief state and route length instead of route-specific fire data
- Enable expected_utility in all three scenarios (no_notice, alert_guided,
  advice_guided) with scenario-aware LLM policy text
- Update menu filtering to retain travel time and utility for no_notice agents
- Fix NET_FILE default from .rou.xml (route file) to .net.xml (network file),
  which caused EDGE_SHAPE to be empty and all exposure scores to be zero
- Fix exposure recording to fire only on decision rounds instead of every
  simulation step, preventing dilution of the exposure average
- Add Repaired.net.xml to repo; update SUMO configs to use local net file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e parameters

Use SIM_END_TIME_S (default 1200s) to control simulation duration instead of
relying on getMinExpectedNumber(), which terminated early when no agents had
departed yet. Remove dummy t_0 vehicle from route file. Add --sim-end-time
CLI flag and SIM_END_TIME_S env var. Update fire source growth rates and
timing for more aggressive spread scenarios.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add input-hash caching (Plan C) across all 3 LLM call sites to skip
  redundant API calls when agent inputs haven't changed between rounds
- Add parallel LLM dispatch (Plan A) for process_pending_departures using
  ThreadPoolExecutor — two-phase collect-then-process pattern fires all
  non-cached predeparture LLM calls concurrently (up to MAX_CONCURRENT_LLM)
- Add 4 new fields to AgentRuntimeState for cache state tracking
- Add RQ1–RQ4 experiment runner scripts for automated parameter sweeps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
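
The collect-then-process pattern combined with input-hash caching could look roughly like the sketch below. MAX_CONCURRENT_LLM is named in the commit message; its value here, the helper names, the cache layout, and the call_llm callback are all hypothetical.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple

MAX_CONCURRENT_LLM = 4  # assumed value; the real default is not stated


def input_hash(payload: Dict[str, Any]) -> str:
    """Stable hash of the prompt inputs (keys sorted for determinism)."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def dispatch_predeparture(agent_inputs: Dict[str, Dict[str, Any]],
                          cache: Dict[str, Tuple[str, Any]],
                          call_llm: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    pending: Dict[str, Tuple[str, Dict[str, Any]]] = {}
    # Phase 1: collect. Reuse the cached decision when inputs are unchanged.
    for agent_id, payload in agent_inputs.items():
        digest = input_hash(payload)
        cached = cache.get(agent_id)
        if cached is not None and cached[0] == digest:
            results[agent_id] = cached[1]
        else:
            pending[agent_id] = (digest, payload)
    # Phase 2: process. Fire all remaining calls concurrently.
    if pending:
        with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_LLM) as pool:
            futures = {aid: pool.submit(call_llm, payload)
                       for aid, (_, payload) in pending.items()}
            for aid, future in futures.items():
                decision = future.result()
                cache[aid] = (pending[aid][0], decision)
                results[aid] = decision
    return results
```

Calling dispatch_predeparture twice with identical inputs issues LLM calls only on the first round; changing one agent's payload invalidates only that agent's cache entry.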
Stop the simulation loop as soon as every spawned vehicle has departed
and arrived at its destination, instead of running until SIM_END_TIME_S.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restructure LLM decision prompts with explicit priority levels
(safety > official guidance > risk assessment), add EOC guidance_source
to operator briefings, and fix early termination to check actual
arrivals via metrics.arrived_count() instead of active vehicle count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dule-plots

# Conflicts:
#	agentevac/agents/agent_state.py
#	agentevac/analysis/metrics.py
#	agentevac/simulation/main.py
#	sumo/Repaired.netecfg
#	sumo/Repaired.sumocfg
…etwork updates

- Record per-step edge traces via getRoadID() for faithful replay of
  actual routes taken (not just planned routes)
- Add departure destination choice: agents pick a destination via LLM
  before spawning, so vehicles head the right direction from step zero
  (parallel LLM dispatch via ThreadPoolExecutor)
- Fix early termination to check arrived_count instead of empty vehicle
  list (prevents premature exit when SUMO defers vehicle insertion)
- Replay mode reads to_edge from departure records for correct initial
  routing
- Update SUMO network with new shelter edges (E#S0, E#S1, E#S2) and
  refreshed spawn_events
- Refine scenario prompt suffixes and DecisionModel schema
  (situation_summary field, expanded reason descriptions)
- Update forecast_layer and corresponding tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dule-plots

# Conflicts:
#	agentevac/simulation/main.py
#	sumo/Repaired.net.xml
#	sumo/Repaired.netecfg
#	sumo/Repaired.sumocfg
…selection

- Record departure destination choice in metrics via record_decision_snapshot
  so destination_choice_share counts all agents (not just those processed by
  process_vehicles)
- Replace fixed-offset vehicle selection with round-robin so all agents get
  mid-route LLM re-evaluation over successive decision ticks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
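
A sketch of round-robin selection with a persistent cursor, to show why every agent is eventually re-evaluated. The function name, the cursor argument, and batch_size are hypothetical; the actual selection code in main.py may differ.

```python
from typing import List, Sequence, Tuple


def select_round_robin(vehicle_ids: Sequence[str], cursor: int,
                       batch_size: int) -> Tuple[List[str], int]:
    """Pick up to batch_size vehicles starting at cursor; return the new cursor."""
    n = len(vehicle_ids)
    if n == 0:
        return [], 0
    k = min(batch_size, n)
    chosen = [vehicle_ids[(cursor + i) % n] for i in range(k)]
    return chosen, (cursor + k) % n


vehicles = ["veh0", "veh1", "veh2", "veh3", "veh4"]
cursor = 0
for tick in range(3):
    batch, cursor = select_round_robin(vehicles, cursor, batch_size=2)
    print(tick, batch)
```

With five vehicles and a batch of two, three successive decision ticks cover all five vehicles, whereas a fixed offset would keep returning the same pair.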
The experiment runner lacked a --sumo-seed argument, so the RQ scripts
passed the seed via an env var prefix, which broke under dash (/bin/sh).
Add --sumo-seed to experiments.py and rewrite all four RQ scripts to
use POSIX-compatible for-loops with the new flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Record each agent's sampled psychological parameters (theta_trust,
theta_r, theta_u, gamma, lambda_e, lambda_t) and write them to an
agent_profiles JSON file alongside the metrics file. Enables post-hoc
verification of population heterogeneity distributions in RQ4 runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…stances

The old thresholds (danger ≤100m, risky ≤300m, buffered ≤700m) were
calibrated for a city-block mental model. In the actual simulation,
the minimum observed margin is 619m, so every agent always classified
the fire as "safe" and never perceived risk. Scale thresholds to
danger ≤1200m, risky ≤2500m, buffered ≤5000m so agents meaningfully
perceive fire hazard. Also scale RISK_DECAY_M from 80 to 960 to
keep the exponential risk curve proportional.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
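
To make the rescaled values concrete, the sketch below classifies a fire margin with the new thresholds and evaluates an exponential decay with the new RISK_DECAY_M. The threshold values and state labels are from the commit message; the constant names other than RISK_DECAY_M, the function names, and the exact form exp(-margin / RISK_DECAY_M) are assumptions.

```python
import math

# New thresholds from this commit, in metres.
DANGER_M = 1200.0
RISKY_M = 2500.0
BUFFERED_M = 5000.0
RISK_DECAY_M = 960.0


def classify_margin(margin_m: float) -> str:
    """Map the distance to the fire front onto a coarse hazard label."""
    if margin_m <= DANGER_M:
        return "danger"
    if margin_m <= RISKY_M:
        return "risky"
    if margin_m <= BUFFERED_M:
        return "buffered"
    return "safe"


def risk_from_margin(margin_m: float) -> float:
    """Assumed exponential risk curve: 1.0 at the fire front, decaying with distance."""
    return math.exp(-max(margin_m, 0.0) / RISK_DECAY_M)


# The minimum margin observed in the simulation now registers as a hazard.
print(classify_margin(619.0), round(risk_from_margin(619.0), 3))
```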
En-route agents in no_notice mode can now see fire on the first few
edges ahead of their current position (VISUAL_LOOKAHEAD_EDGES, default 3).
This adds a penalty to the current destination's exposure score, making
agents more likely to switch shelters when fire blocks their route.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ling

_observation_based_exposure() now prefers travel_time_s_fastest_path
(minutes * 0.3) over len_edges (count * 0.15) to better reflect actual
exposure duration. Falls back to edge count when travel time is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
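
The preference order can be read as the small helper below. The 0.3 and 0.15 weights and the field names are from the commit message; the helper name and the assumption that travel_time_s_fastest_path is given in seconds and converted to minutes are mine.

```python
from typing import Optional


def duration_exposure_term(travel_time_s_fastest_path: Optional[float],
                           len_edges: int) -> float:
    """Exposure contribution from how long the agent is expected to be on the road."""
    if travel_time_s_fastest_path is not None:
        minutes = travel_time_s_fastest_path / 60.0  # assumed seconds -> minutes
        return minutes * 0.3
    # Fallback when no travel time is available: scale by edge count instead.
    return len_edges * 0.15


print(duration_exposure_term(600.0, 10))
print(duration_exposure_term(None, 10))
```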
…y prompts

Adds an explicit instruction to all three LLM policy locations (pre-departure,
en-route destination, en-route route) requiring agents to only reference
information explicitly present in the prompt data. Prevents GPT-4o-mini from
fabricating neighbor behaviors, evacuation patterns, or shelter choices that
cascade through the messaging system.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agents in no_notice mode now only perceive fires within
FIRE_PERCEPTION_RANGE_M (default 1200m) of their position:

- Perception horizon: if no fire is in range, env_signal margins are
  None → observed_state="unknown" (genuine uncertainty instead of false
  "safe").  When fire is in range, margins are computed from visible
  fires only.

- Route-level fire data: all reachable menu items gain
  proximity_blocked_edges and proximity_min_margin_m from visible fires,
  enabling the utility function to differentiate destinations by hazard.

- Exposure scoring: _observation_based_exposure() adds a proximity
  penalty (blocked * 8.0 + margin_penalty) matching _expected_exposure
  weights, so routes through visible fire are strongly deprioritised.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
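
A rough sketch of the perception horizon, assuming each fire is a circle given as (x, y, radius) and that "in range" is measured from the agent to the fire edge. Both are assumptions, as is the helper name; only FIRE_PERCEPTION_RANGE_M and its 1200 m default come from the commit message.

```python
import math
from typing import Iterable, Optional, Tuple

FIRE_PERCEPTION_RANGE_M = 1200.0

Fire = Tuple[float, float, float]  # hypothetical (x, y, radius) representation


def visible_min_margin(agent_xy: Tuple[float, float],
                       fires: Iterable[Fire]) -> Optional[float]:
    """Smallest margin to any fire within perception range, or None if none is visible."""
    margins = []
    for fx, fy, radius in fires:
        margin = math.hypot(agent_xy[0] - fx, agent_xy[1] - fy) - radius
        if margin <= FIRE_PERCEPTION_RANGE_M:
            margins.append(margin)
    # None is what downstream code would map to observed_state="unknown".
    return min(margins) if margins else None


fires = [(0.0, 0.0, 400.0)]
print(visible_min_margin((1000.0, 0.0), fires))
print(visible_min_margin((5000.0, 0.0), fires))
```

Returning None rather than a large number keeps "no fire seen" distinct from "fire seen far away", which is the distinction between "unknown" and "safe" that this commit introduces.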
…nd compact JSON encoder

Functional updates:
- Extract AgentMessagingBus and OutboxMessage from main.py into
  agentevac/agents/messaging.py with spatial filtering support
  (comm_radius_m parameter for distance-based broadcast delivery)
- Add filter_history_for_scenario() in scenarios.py to sanitize
  agent self-history records before embedding in LLM prompts
  (strips forecast, advisory, fire metrics per information regime)
- Add _CompactLeafEncoder in run_parameters.py for readable
  parameter log output (leaf dicts rendered on single lines)
- Add agentevac/analysis/analyze_run.py run analysis utility
- Add tests for messaging spatial filtering (test_messaging.py)
  and history filtering (test_scenarios.py)
- Add Research_Proposal_0317.pptx to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
legend5teve and others added 2 commits March 30, 2026 01:14
…rust enforcement, and spatial messaging

Functional updates in agentevac/simulation/main.py:

DEPARTURE PROMPT:
- Replace BINDING CONSTRAINT with DECISION RULE prompt — 4 ordered
  rules (p_danger > theta_r, heuristic_departure_signal, official
  order, else wait) with stop-at-first-match semantics
- Add heuristic_departure_signal field to predeparture prompt,
  exposing the pre-computed departure model output (covers theta_r,
  urgency decay via theta_u/gamma, low-confidence precaution, and
  neighbor departure pressure)

THETA_TRUST LLM ENFORCEMENT (predeparture + routing prompts):
- When theta_trust=0.0: strip inbox entirely, add BINDING CONSTRAINT
  instructing LLM to IGNORE all neighbor/social data
- When theta_trust>0.0: inject calibrated percentage guidance
  (e.g., "rely 70% on own observation, 30% on neighbor messages")
- Apply _consider_pol, _belief_weigh_pol, trust_policy to all 3
  routing prompt locations (destination, route, and predeparture)

SCENARIO-AWARE HISTORY:
- Filter agent_self_history through filter_history_for_scenario()
  before embedding in routing prompts (prevents data leakage from
  historical records)

SPATIAL MESSAGING:
- Add COMM_RADIUS_M env var for distance-based broadcast filtering
- Precompute SPAWN_EDGE_MIDPOINT for pre-departure agent positions
- Pass agent positions to messaging.begin_round() for spatial delivery
- Include comm_radius_m in messaging config sections of all prompts

CONFIGURATION:
- Remove inline AgentMessagingBus/OutboxMessage (now imported)
- Widen lambda_e bounds to (0.0, 100.0) and lambda_t to (0.0, 100.0)
- Add fire source F0_8 at (16348, 6801) with r0=400
- Export fire_sources and fire_events to run parameter log

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
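
A hedged sketch of the two theta_trust branches. The helper name, the return shape, the exact prompt wording, and the linear mapping (theta_trust=0.3 gives "70% own, 30% neighbor") are assumptions made for illustration; the commit only states that the inbox is stripped at 0.0 and that calibrated percentages are injected otherwise.

```python
from typing import List, Tuple


def build_trust_policy(theta_trust: float, inbox: List[str]) -> Tuple[List[str], str]:
    """Return (inbox to embed in the prompt, policy text to append)."""
    theta = min(max(theta_trust, 0.0), 1.0)  # clamp to [0, 1]
    if theta == 0.0:
        # Zero trust: strip the inbox and forbid citing social data.
        return [], ("BINDING CONSTRAINT: IGNORE all neighbor/social data; "
                    "base the decision on your own observation only.")
    social_pct = round(theta * 100)
    own_pct = 100 - social_pct
    policy = (f"Rely {own_pct}% on own observation, "
              f"{social_pct}% on neighbor messages.")
    return list(inbox), policy


print(build_trust_policy(0.0, ["neighbor: leaving now"]))
print(build_trust_policy(0.3, ["neighbor: leaving now"]))
```

Stripping the inbox in addition to adding the constraint means the LLM cannot cite neighbor data even if it disregards the instruction.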
Functional updates:
- Disable veh3 and veh4 spawn groups (commented out) to reduce
  toy-model agent count for focused testing
- Comment out large-scale spawn groups (veh9–veh41) — reserved
  for future 100+ agent experiments
- Update SUMO network (Repaired.net.xml): add junction J20 and
  associated internal edges for improved routing near south zone
- Update Repaired.netecfg and Repaired.sumocfg timestamps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>