Add Among Them SDK — Phase 0/1 with cogames packaging + dev loop by aaln · Pull Request #112 · Metta-AI/bitworld

aaln · 2026-05-07T00:24:59Z

A Cursor-SDK-style Python harness for authoring Among Them policy bots. Wraps `evidencebot_v2` via in-process FFI as the default scripted policy and exposes an `instructions=` parameter plus module-swap kwargs (`voter=`, `chatter=`, `reporter=`, ...) for LLM-augmented cognition.

What's in this iteration

- `among_them/sdk/`: full Python package (~3.5k LOC)
  - `Agent.create()` / `LiveGame` / `LocalSDKPolicy` / `SDKPolicy` (cogames `MultiAgentPolicy` entrypoint composing `EvidenceBotV2NimPolicy`)
  - `_DirectiveOverrideEngine` shared between local + tournament code paths
  - `Directives` Pydantic schema parsed from natural-language instructions via the existing `cognition/llm.py` provider (deterministic keyword fallback when no API key is set; required for cogames Docker validator)
  - `among_them_sdk.package` CLI for emitting `cogames upload` bundles
  - 13 runnable examples (hello, instructions, personas, custom voter/reporter, LLM chatter, mixed modules, A/B test, win-rate loop, transcript logger, debug directives, provider switch, tournament, eight_player_game, variant_arena head-to-head)
  - 5 test files / 25+ tests passing under `uv run pytest`
- `among_them/sdk/docs/`: python-guide, tournament-submission, local-iteration-guide
- `among_them/players/sdk/DESIGN.md`: architecture + phased roadmap
- `among_them/server.nim`: drop a duplicate `liveProgressMaxTick` proc that blocked compilation
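
The deterministic keyword fallback mentioned above could look roughly like the following. This is a minimal sketch, not the SDK's parser: the `Directives` dataclass here is a hypothetical stand-in for the real Pydantic schema, and the field names (`report_eagerness`, `chat_style`) and the keyword table are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Directives:
    """Hypothetical stand-in for the SDK's Pydantic `Directives` schema."""
    report_eagerness: str = "medium"  # assumed values: "low" | "medium" | "high"
    chat_style: str = "neutral"       # assumed values: "neutral" | "quiet" | "accusatory"


# Hypothetical keyword table: (field, value, trigger phrases).
# Order matters: the first matching row for a field wins.
_KEYWORDS = (
    ("report_eagerness", "high", ("paranoid", "always report")),
    ("report_eagerness", "low", ("never report", "ignore bodies")),
    ("chat_style", "quiet", ("quiet", "silent", "say little")),
    ("chat_style", "accusatory", ("accuse", "call out")),
)


def parse_directives_fallback(instructions: str) -> Directives:
    """Map free-text instructions to directives without any LLM call.

    Pure string matching, so the same input always yields the same output,
    which is what an offline validator needs.
    """
    text = instructions.lower()
    fields = {}
    for field_name, value, phrases in _KEYWORDS:
        if field_name not in fields and any(p in text for p in phrases):
            fields[field_name] = value
    # Fields with no matching keyword keep their defaults.
    return Directives(**fields)
```

With this table, `parse_directives_fallback("Be paranoid and accuse loudly")` yields high report eagerness and an accusatory chat style, and an empty instruction string yields the defaults.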

Tournament submission status
Validated end-to-end via `cogames upload --season among-them` after two fixes:

- `SDKPolicy` now resolves `evidencebot_v2_policy` at runtime by walking up to find `among_them/players/` (cogames only puts the entry-point package's dir on `sys.path`)
- Bundle now includes `among_them/votereader.nim` (recently added Nim dep that `evidencebot_v2.nim` imports)
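
The first fix, walking up the directory tree to locate `among_them/players/`, can be sketched as below. The function names are hypothetical, and putting the located directory itself on `sys.path` is an assumption about where `evidencebot_v2_policy` lives; the PR does not show the actual resolution code.

```python
import sys
from pathlib import Path
from typing import Optional, Tuple


def find_players_dir(
    start: Path, marker: Tuple[str, ...] = ("among_them", "players")
) -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for `marker`.

    Returns the first `<ancestor>/among_them/players` that is a directory,
    or None if no ancestor contains one.
    """
    start = start.resolve()
    for base in (start, *start.parents):
        candidate = base.joinpath(*marker)
        if candidate.is_dir():
            return candidate
    return None


def add_players_to_sys_path(start: Path) -> Path:
    """Make modules under the located directory importable (assumed layout)."""
    players = find_players_dir(start)
    if players is None:
        raise ImportError(f"no among_them/players/ directory above {start}")
    entry = str(players)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return players
```

A policy module would typically call something like `add_players_to_sys_path(Path(__file__).parent)` before importing the Nim-backed policy, so the lookup does not depend on how the host populated `sys.path`.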

Known follow-ups (in flight in parallel iterations)

- LLM integration design doc (`docs/llm-integration.md`)
- Cross-game opponent modeling (`opponents/` subpackage)
- `--persona` shortcut on `eight_player_game.py`
- Voter/Chatter advisory surfacing in result block


`among_them/sdk/docs/llm-integration.md` (~4.5k words): opinionated menu of how LLMs can plug into the SDK across slow / medium / fast / config-time decision paths. Covers:
- seven architectural patterns (tool-loop, subagents, skills directory, provider routing, streaming + speculative, MCP, RAG over replays)
- a long deep-dive on chat / accusation / defense decomposed into five sub-modules with mermaid timeline
- a tournament-safety matrix
- a back-of-envelope cost model
- five ranked near-term builds
- seven open questions with recommendations

Top recommendation: `LLMReporter` first (binary decision, 3 calls/game, unique tournament-safety story: LLM at packaging time emits a decision tree the runtime evaluates without inference).

Several patterns already partially exist in the codebase: `@tool` / `ToolLoop` in `cognition/tools.py`, AI Gateway routing in `cognition/llm.py`, template banks in `modules/chatter.py`, packaging-time LLM in `cognition/instructions.py`, and the sidecar prior art in `among_them/bot-policies/sidecar/`.

Plus a one-line README cross-link.

Co-authored-by: Cursor <cursoragent@cursor.com>
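
To illustrate the `LLMReporter` idea of a decision tree that is emitted at packaging time and evaluated at runtime without inference, here is a minimal sketch. The JSON node format and the feature names (`self_suspicion`, `witnesses_nearby`) are hypothetical; the design doc's actual tree representation is not shown in this PR.

```python
import json
from typing import Any, Mapping, Union

# A tree is either a boolean leaf (report / don't report) or an inner node
# of the form {"feature", "threshold", "if_true", "if_false"} (assumed format).
Tree = Union[bool, Mapping[str, Any]]

# Stand-in for what a packaging-time LLM call might have written to the bundle.
TREE_JSON = """
{"feature": "self_suspicion", "threshold": 0.6,
 "if_true": false,
 "if_false": {"feature": "witnesses_nearby", "threshold": 1,
              "if_true": true, "if_false": false}}
"""


def should_report(tree: Tree, features: Mapping[str, float]) -> bool:
    """Evaluate the frozen tree against the current game features.

    Plain dictionary lookups and comparisons: no network, no model call.
    """
    node = tree
    while not isinstance(node, bool):
        branch = "if_true" if features[node["feature"]] >= node["threshold"] else "if_false"
        node = node[branch]
    return node


tree = json.loads(TREE_JSON)
decision = should_report(tree, {"self_suspicion": 0.2, "witnesses_nearby": 2})
```

Because the runtime side only reads a static JSON document, a bundle built this way would behave identically with or without an API key, which is the tournament-safety property the recommendation relies on.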

Persistent learning loop: every local game captures what each named opponent says, votes, and does; an analyzer rolls those observations into a typed `OpponentProfile`; profiles refine across games (monotonic confidence) and feed back into `LLMVoter` / `LLMChatter` prompts.

New package: `among_them/sdk/src/among_them_sdk/opponents/`
- `models.py`: Pydantic v2 schema (`OpponentProfile`, `ObservationEvent`, sub-profile models for chat / vote / accusation / defense / role-conditional behavior)
- `store.py`: `OpponentStore` persisting to `~/.among-them/opponents/<name>/{observations.ndjson, profile.json}` (overridable via constructor, env var, or `--store-root`)
- `collector.py`: `ObservationCollector` wired into `AgentHooks`, source-tolerant about hook payload shapes (silently drops events lacking an actor; see "in-flight" notes for why we did NOT modify `live_game.py`)
- `analyzer.py`: `analyze_opponent` / `analyze_all` with deterministic statistical fallback when no API key is set (caps confidence at 0.3), LLM path via existing `cognition/llm.py`, `merge_profiles` for monotonic intel improvement
- `bundle.py`: `freeze_profiles` + read-only `BundledProfileLookup` for the cogames Docker path (no live LLM at runtime)
- `__main__.py`: `python -m among_them_sdk.opponents` {`list`, `show`, `analyze`, `analyze-all`, `freeze`, `record`}

Consumer integration (additive, kwarg defaults preserve behavior):
- `LLMVoter(opponent_profiles=...)`: injects compact intel block into vote prompt
- `LLMChatter(opponent_profiles=...)`: same for chat composition
- `Agent.create(load_opponent_profiles=True)`: auto-loads from default store and threads through to LLM modules

Tournament packaging:
- `python -m among_them_sdk.package --profiles-from <store-dir>` writes `among_them_sdk_opponents.json` next to the bundle config and includes a `-f` flag in the printed `cogames upload` command. `SDKPolicy` reads via `BundledProfileLookup`, which is interface-compatible with the live store.

Demo + tests:
- `examples/opponent_learning_loop.py`: end-to-end loop with two modes (simulated, real). `--games 2 --no-llm` completes in ~600ms; produces 7 profiles, ~12KB tournament snapshot.
- 26 new tests (`tests/test_opponents.py`); pytest now: 51 passed / 1 skipped. `uvx ruff check` clean.

Doc: `among_them/sdk/docs/opponent-modeling.md` (~1.5k words) + README cross-link.

Known limits documented in the doc:
- LiveGame only surfaces SDK-player's own messages today; full cross-player chat capture needs a `/global` subscription (Phase 4)
- Privacy: opponent names + chat persist verbatim to disk

Co-authored-by: Cursor <cursoragent@cursor.com>
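
The statistical fallback (confidence capped at 0.3) and the monotonic merge described in this commit can be sketched as follows. This is one plausible reading rather than the code in `analyzer.py`: the `Profile` dataclass, its `vote_skip_rate` field, and the linear confidence ramp are all assumptions, while the 0.3 cap comes from the commit message.

```python
from dataclasses import dataclass
from typing import Sequence

FALLBACK_CONFIDENCE_CAP = 0.3  # from the commit message: fallback caps confidence at 0.3


@dataclass(frozen=True)
class Profile:
    """Hypothetical, much-reduced stand-in for `OpponentProfile`."""
    name: str
    observations: int
    vote_skip_rate: float
    confidence: float


def statistical_profile(name: str, votes: Sequence[str]) -> Profile:
    """Build a profile from raw vote targets without calling an LLM."""
    n = len(votes)
    skip_rate = votes.count("skip") / n if n else 0.0
    # Assumed ramp: confidence grows with sample size but never passes the cap.
    confidence = min(FALLBACK_CONFIDENCE_CAP, n / 50)
    return Profile(name, n, skip_rate, confidence)


def merge_profiles(old: Profile, new: Profile) -> Profile:
    """Pool two profiles of the same opponent; confidence never decreases."""
    if old.name != new.name:
        raise ValueError("cannot merge profiles of different opponents")
    total = old.observations + new.observations
    pooled_rate = (
        (old.vote_skip_rate * old.observations + new.vote_skip_rate * new.observations) / total
        if total
        else 0.0
    )
    return Profile(old.name, total, pooled_rate, max(old.confidence, new.confidence))
```

Taking `max` of the two confidences is what makes the merge monotonic: a later game with few observations can refine the pooled rate but cannot lower the stored confidence.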

The orchestrator's drain timeout was racing the asyncio loop teardown in `_variant_worker.py`: when SIGTERM landed mid-`asyncio.run(...)`, the post-loop `_write_metrics(...)` never ran, so per-variant metrics JSONs occasionally went missing on first invocation.

Fix: install a SIGTERM handler that snapshots `policy.engine.stats` to disk and then calls `os._exit(0)`. Pair with a longer drain wait in the orchestrator (already in `variant_arena.py`).

Verified by re-running `--games 5`: all 8 variant metrics flushed, comparison table populated. Engine signals now line up with intent: `paranoid_crewmate` (eagerness=high) passes reports, `aggressive_imposter` (eagerness=low) suppresses them.

Co-authored-by: Cursor <cursoragent@cursor.com>
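
A SIGTERM handler of the kind described in this fix might be structured as below. This is a sketch under assumptions, not the code in `_variant_worker.py`: the helper names, the `get_stats` callback, and the atomic temp-file write are illustrative choices, and `exit_fn` is injectable only so the handler can be exercised without killing the process.

```python
import json
import os
import signal
from pathlib import Path
from typing import Any, Callable, Mapping


def write_metrics(path: Path, stats: Mapping[str, Any]) -> None:
    """Write stats as JSON via a temp file so readers never see a partial file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(dict(stats)))
    os.replace(tmp, path)  # atomic rename on the same filesystem


def make_sigterm_handler(
    get_stats: Callable[[], Mapping[str, Any]],
    path: Path,
    exit_fn: Callable[[int], Any] = os._exit,
) -> Callable[[int, Any], None]:
    """Build a handler that flushes metrics, then exits immediately.

    `os._exit` skips interpreter cleanup, so a half-torn-down asyncio loop
    cannot block or swallow the exit.
    """
    def handler(signum: int, frame: Any) -> None:
        write_metrics(path, get_stats())
        exit_fn(0)
    return handler


def install_sigterm_flush(get_stats: Callable[[], Mapping[str, Any]], path: Path) -> None:
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, make_sigterm_handler(get_stats, path))
```

A worker would call something like `install_sigterm_flush(lambda: policy.engine.stats, metrics_path)` before entering `asyncio.run(...)`, so the snapshot is taken from whatever the engine has accumulated at the moment the signal arrives.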

aaln and others added 4 commits May 6, 2026 17:24

aaln wants to merge 4 commits into master from aaln/among-them-sdk
