
Add Among Them SDK — Phase 0/1 with cogames packaging + dev loop #112

Open
aaln wants to merge 4 commits into master from aaln/among-them-sdk

@aaln
Contributor

@aaln aaln commented May 7, 2026

A Cursor-SDK-style Python harness for authoring Among Them policy bots. Wraps evidencebot_v2 via in-process FFI as the default scripted policy and exposes an instructions= parameter plus module-swap kwargs (voter=, chatter=, reporter=, ...) for LLM-augmented cognition.

What's in this iteration

  • among_them/sdk/ — full Python package (~3.5k LOC)
    • Agent.create() / LiveGame / LocalSDKPolicy / SDKPolicy (SDKPolicy is the cogames MultiAgentPolicy entry point and composes EvidenceBotV2NimPolicy)
    • _DirectiveOverrideEngine shared between local + tournament code paths
    • Directives Pydantic schema parsed from natural-language instructions via the existing cognition/llm.py provider (with a deterministic keyword fallback when no API key is set, which the cogames Docker validator requires)
    • among_them_sdk.package CLI for emitting cogames upload bundles
    • 13 runnable examples (hello, instructions, personas, custom voter/reporter, LLM chatter, mixed modules, A/B test, win-rate loop, transcript logger, debug directives, provider switch, tournament, eight_player_game, variant_arena head-to-head)
    • 5 test files / 25+ tests passing under uv run pytest
  • among_them/sdk/docs/ — python-guide, tournament-submission, local-iteration-guide
  • among_them/players/sdk/DESIGN.md — architecture + phased roadmap
  • among_them/server.nim — drop a duplicate liveProgressMaxTick proc that blocked compilation
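A minimal, hypothetical sketch of what a deterministic keyword fallback for `Directives` could look like. The field name `report_eagerness`, the keyword table, and `parse_directives_fallback` are assumptions (the eagerness levels echo the `paranoid_crewmate` / `aggressive_imposter` variants mentioned in the last commit), and a stdlib dataclass stands in for the real Pydantic schema so the example has no dependencies.

```python
from dataclasses import dataclass
from typing import Literal

Level = Literal["low", "normal", "high"]

# Hypothetical keyword table; the SDK's real vocabulary is not shown in this PR.
# Order matters: the first keyword found in the text wins.
_REPORT_KEYWORDS: dict[str, Level] = {
    "paranoid": "high",
    "report everything": "high",
    "cautious": "high",
    "sneaky": "low",
    "aggressive": "low",
    "quiet": "low",
}


@dataclass(frozen=True)
class Directives:
    """Stand-in for the Pydantic `Directives` schema (fields are assumptions)."""
    report_eagerness: Level = "normal"
    source: str = "default"


def parse_directives_fallback(instructions: str) -> Directives:
    """Map free-text instructions to Directives without any LLM call."""
    text = instructions.lower()
    for keyword, level in _REPORT_KEYWORDS.items():
        if keyword in text:
            return Directives(report_eagerness=level, source=f"keyword:{keyword}")
    # No keyword matched: fall back to neutral defaults.
    return Directives()
```

Because this path never calls a provider, the same instructions always produce the same directives. That property is what lets a bundle pass validation in an environment with no API key.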

Tournament submission status
Validated end-to-end via cogames upload --season among-them after two fixes:

  • SDKPolicy now resolves evidencebot_v2_policy at runtime by walking up parent directories until it finds among_them/players/ (cogames puts only the entry-point package's directory on sys.path)
  • Bundle now includes among_them/votereader.nim (recently added Nim dep that evidencebot_v2.nim imports)
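A minimal sketch of that runtime lookup, assuming `evidencebot_v2_policy` is importable once `among_them/players/` is on `sys.path`. The helper names `find_players_dir` and `ensure_players_on_path` are hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def find_players_dir(start: Path, marker: str = "among_them/players") -> Optional[Path]:
    """Walk from `start` up through its parents until `<ancestor>/<marker>` exists."""
    start = start.resolve()
    for ancestor in (start, *start.parents):
        candidate = ancestor / marker
        if candidate.is_dir():
            return candidate
    return None  # reached the filesystem root without finding the marker


def ensure_players_on_path(start: Path) -> Optional[Path]:
    """Prepend the discovered players dir to sys.path (once) so its modules import."""
    players = find_players_dir(start)
    if players is not None and str(players) not in sys.path:
        sys.path.insert(0, str(players))
    return players
```

In an entry point this would run once before the import, for example `ensure_players_on_path(Path(__file__).parent)` followed by `import evidencebot_v2_policy`.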

Known follow-ups (in flight in parallel iterations)

  • LLM integration design doc (docs/llm-integration.md)
  • Cross-game opponent modeling (opponents/ subpackage)
  • --persona shortcut on eight_player_game.py
  • Voter/Chatter advisory surfacing in result block

aaln and others added 4 commits May 6, 2026 17:24
Co-authored-by: Cursor <cursoragent@cursor.com>
`among_them/sdk/docs/llm-integration.md` (~4.5k words): an opinionated menu
of ways LLMs can plug into the SDK across slow, medium, fast, and config-time
decision paths. Covers seven architectural patterns (tool-loop, subagents,
skills directory, provider routing, streaming + speculative, MCP, RAG over
replays), a long deep-dive on chat / accusation / defense decomposed into
five sub-modules with a Mermaid timeline, a tournament-safety matrix, a
back-of-the-envelope cost model, five ranked near-term builds, and seven open
questions with recommendations.

Top recommendation: build `LLMReporter` first. It makes a binary decision,
needs 3 calls per game, and has a unique tournament-safety story: at
packaging time the LLM emits a decision tree that the runtime evaluates
without inference.
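To make the packaging-time idea concrete, here is a hypothetical sketch of a frozen reporter decision tree and an inference-free evaluator. The node format (`feature` / `threshold` / `if_true` / `if_false`, with `report` leaves) and the feature names are assumptions for illustration. The doc describes the approach, not this schema.

```python
import json
from typing import Any, Mapping

# Hypothetical tree an LLM might emit at packaging time and ship as JSON.
TREE_JSON = """
{
  "feature": "is_imposter", "threshold": 1,
  "if_true": {
    "feature": "witnesses_nearby", "threshold": 1,
    "if_true": {"report": true},
    "if_false": {"report": false}
  },
  "if_false": {"report": true}
}
"""


def should_report(tree: Mapping[str, Any], features: Mapping[str, float]) -> bool:
    """Walk the tree, taking `if_true` when feature >= threshold.

    Only dictionary lookups and comparisons run here; there is no model call.
    Raises KeyError if the tree references a feature the caller did not supply.
    """
    node = tree
    while "report" not in node:
        value = features[node["feature"]]
        node = node["if_true"] if value >= node["threshold"] else node["if_false"]
    return bool(node["report"])


REPORTER_TREE = json.loads(TREE_JSON)
```

The expensive reasoning happens once, offline. At runtime, each decision costs a few comparisons and behaves identically in the cogames Docker environment.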

Several patterns already partially exist in the codebase: `@tool` /
`ToolLoop` in `cognition/tools.py`, AI Gateway routing in `cognition/llm.py`,
template banks in `modules/chatter.py`, packaging-time LLM in
`cognition/instructions.py`, and the sidecar prior art in
`among_them/bot-policies/sidecar/`.

Plus a one-line README cross-link.

Co-authored-by: Cursor <cursoragent@cursor.com>
Persistent learning loop: every local game captures what each named
opponent says, votes, and does; an analyzer rolls those observations
into a typed `OpponentProfile`; profiles refine across games (monotonic
confidence) and feed back into `LLMVoter` / `LLMChatter` prompts.
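One way to read "monotonic confidence" is that merging a new analysis into an existing profile can update the statistics but never lower the confidence. The sketch below is a hypothetical, simplified stand-in for `merge_profiles`. It uses a stdlib dataclass instead of the Pydantic `OpponentProfile`, and the `accusation_rate` field and the confidence-growth rule are invented. It also applies the 0.3 cap that the `analyzer.py` bullet describes for the no-API-key fallback.

```python
from dataclasses import dataclass, replace

FALLBACK_CONFIDENCE_CAP = 0.3  # cap quoted in this PR for the statistical fallback


@dataclass(frozen=True)
class OpponentProfile:
    """Simplified stand-in; the real model is a Pydantic v2 schema."""
    name: str
    games_observed: int
    accusation_rate: float  # hypothetical field: accusations per meeting
    confidence: float


def fallback_profile(name: str, games: int, accusations: int, meetings: int) -> OpponentProfile:
    """Deterministic statistical estimate with confidence capped at 0.3."""
    rate = accusations / meetings if meetings else 0.0
    # Assumed rule: confidence grows with sample size, up to the fallback cap.
    confidence = min(FALLBACK_CONFIDENCE_CAP, 0.05 * games)
    return OpponentProfile(name, games, rate, confidence)


def merge_profiles(old: OpponentProfile, new: OpponentProfile) -> OpponentProfile:
    """Combine two profiles of one opponent; confidence never decreases."""
    if old.name != new.name:
        raise ValueError("cannot merge profiles of different opponents")
    total = old.games_observed + new.games_observed
    # Game-weighted average of the behavioral statistic.
    weighted = old.accusation_rate * old.games_observed + new.accusation_rate * new.games_observed
    rate = weighted / total if total else 0.0
    return replace(old, games_observed=total, accusation_rate=rate,
                   confidence=max(old.confidence, new.confidence))
```

With `max`, a low-confidence fallback pass run later without an API key cannot erase intel gathered earlier by the LLM path.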

New package: `among_them/sdk/src/among_them_sdk/opponents/`
- `models.py` — Pydantic v2 schema (`OpponentProfile`,
  `ObservationEvent`, sub-profile models for chat / vote / accusation /
  defense / role-conditional behavior)
- `store.py` — `OpponentStore` persisting to
  `~/.among-them/opponents/<name>/{observations.ndjson, profile.json}`
  (overridable via constructor, env var, or `--store-root`)
- `collector.py` — `ObservationCollector` wired into `AgentHooks`,
  source-tolerant about hook payload shapes (silently drops events
  lacking an actor — see "in-flight" notes for why we did NOT modify
  `live_game.py`)
- `analyzer.py` — `analyze_opponent` / `analyze_all` with deterministic
  statistical fallback when no API key is set (caps confidence at 0.3),
  LLM path via existing `cognition/llm.py`, `merge_profiles` for
  monotonic intel improvement
- `bundle.py` — `freeze_profiles` + read-only `BundledProfileLookup` for
  the cogames Docker path (no live LLM at runtime)
- `__main__.py` — `python -m among_them_sdk.opponents`
  {`list`, `show`, `analyze`, `analyze-all`, `freeze`, `record`}
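The on-disk layout in the `store.py` bullet can be sketched with the standard library alone. This is a minimal, hypothetical version. The env var name `AMONG_THEM_OPPONENTS_DIR` and the method names are assumptions, since the PR only says the root is overridable via constructor, env var, or `--store-root`. The precedence shown (constructor, then env var, then default) is one reasonable choice.

```python
import json
import os
from pathlib import Path
from typing import Any, Iterator, Optional

# Hypothetical env var name; the PR says an env var exists but does not name it.
ENV_VAR = "AMONG_THEM_OPPONENTS_DIR"
DEFAULT_ROOT = Path("~/.among-them/opponents").expanduser()


class OpponentStore:
    """One directory per opponent: observations.ndjson plus profile.json."""

    def __init__(self, root: Optional[Path] = None) -> None:
        # Assumed precedence: constructor argument, then env var, then default.
        env = os.environ.get(ENV_VAR)
        if root is not None:
            self.root = Path(root)
        elif env:
            self.root = Path(env)
        else:
            self.root = DEFAULT_ROOT

    def _dir(self, name: str) -> Path:
        path = self.root / name
        path.mkdir(parents=True, exist_ok=True)
        return path

    def append_observation(self, name: str, event: dict[str, Any]) -> None:
        # NDJSON: one JSON object per line, so each game only appends.
        path = self._dir(name) / "observations.ndjson"
        with path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")

    def iter_observations(self, name: str) -> Iterator[dict[str, Any]]:
        path = self.root / name / "observations.ndjson"
        if not path.exists():
            return
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():  # tolerate blank lines
                    yield json.loads(line)

    def save_profile(self, name: str, profile: dict[str, Any]) -> None:
        path = self._dir(name) / "profile.json"
        path.write_text(json.dumps(profile, indent=2), encoding="utf-8")

    def load_profile(self, name: str) -> Optional[dict[str, Any]]:
        path = self.root / name / "profile.json"
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))
```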

Consumer integration (additive; kwarg defaults preserve existing behavior):
- `LLMVoter(opponent_profiles=...)` — injects compact intel block into
  vote prompt
- `LLMChatter(opponent_profiles=...)` — same for chat composition
- `Agent.create(load_opponent_profiles=True)` — auto-loads from default
  store and threads through to LLM modules
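A compact intel block for the vote or chat prompt could be rendered along these lines. This is hypothetical: the profile keys (`confidence`, `vote_style`, `notes`), the confidence threshold, and the line format are assumptions. When no profiles are passed, the function returns an empty string so the prompt is unchanged, in line with the note that kwarg defaults preserve behavior.

```python
from typing import Any, Mapping, Optional


def format_intel_block(
    profiles: Optional[Mapping[str, Mapping[str, Any]]],
    present: list[str],
    min_confidence: float = 0.2,
    max_lines: int = 8,
) -> str:
    """Render one terse line per opponent currently in the game."""
    if not profiles:
        return ""  # default path: no intel, prompt unchanged
    lines = []
    for name in present:
        p = profiles.get(name)
        conf = p.get("confidence", 0.0) if p is not None else 0.0
        if p is None or conf < min_confidence:
            continue  # skip unknown or low-confidence opponents
        line = f"- {name} (conf {conf:.2f}): votes {p.get('vote_style', 'unknown')}"
        if p.get("notes"):
            line += f"; {p['notes']}"
        lines.append(line)
    if not lines:
        return ""
    # Cap the block so it stays small relative to the rest of the prompt.
    return "Opponent intel:\n" + "\n".join(lines[:max_lines])
```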

Tournament packaging:
- `python -m among_them_sdk.package --profiles-from <store-dir>` writes
  `among_them_sdk_opponents.json` next to the bundle config and includes
  a `-f` flag in the printed `cogames upload` command. `SDKPolicy` reads the
  snapshot via `BundledProfileLookup`, which is interface-compatible with the live store.
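"Interface-compatible with the live store" suggests both sides satisfy one small read interface. Below is a hypothetical sketch using `typing.Protocol`. The `get_profile` method name, the snapshot's top-level `{name: profile}` JSON shape, and this `freeze_profiles` signature are assumptions, not the SDK's actual API.

```python
import json
from pathlib import Path
from types import MappingProxyType
from typing import Any, Mapping, Optional, Protocol


class ProfileLookup(Protocol):
    def get_profile(self, name: str) -> Optional[Mapping[str, Any]]: ...


def freeze_profiles(profiles: Mapping[str, Mapping[str, Any]], out: Path) -> Path:
    """Write a single JSON snapshot to ship inside the upload bundle."""
    out.write_text(json.dumps(profiles, sort_keys=True), encoding="utf-8")
    return out


class BundledProfileLookup:
    """Read-only view over the frozen snapshot: no disk writes, no LLM calls."""

    def __init__(self, snapshot: Path) -> None:
        data = json.loads(snapshot.read_text(encoding="utf-8")) if snapshot.exists() else {}
        self._profiles = MappingProxyType(data)

    def get_profile(self, name: str) -> Optional[Mapping[str, Any]]:
        profile = self._profiles.get(name)
        # Wrap in a read-only proxy so callers cannot mutate the snapshot.
        return MappingProxyType(profile) if profile is not None else None


def intel_for(lookup: ProfileLookup, name: str) -> str:
    # Callers depend only on the protocol, so a live store or the bundle both work.
    profile = lookup.get_profile(name)
    if profile is None:
        return "no intel"
    return f"{name}: conf {profile.get('confidence', 0.0):.2f}"
```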

Demo + tests:
- `examples/opponent_learning_loop.py` — end-to-end loop with two modes
  (simulated, real). `--games 2 --no-llm` completes in ~600ms and produces
  7 profiles plus a ~12KB tournament snapshot.
- 26 new tests (`tests/test_opponents.py`); pytest now: 51 passed / 1
  skipped. `uvx ruff check` clean.

Doc: `among_them/sdk/docs/opponent-modeling.md` (~1.5k words) +
README cross-link.

Known limits documented in the doc:
- LiveGame only surfaces the SDK player's own messages today; full
  cross-player chat capture needs a `/global` subscription (Phase 4)
- Privacy: opponent names + chat persist verbatim to disk

Co-authored-by: Cursor <cursoragent@cursor.com>
The orchestrator's drain timeout was racing the asyncio loop teardown
in `_variant_worker.py`: when SIGTERM landed mid-`asyncio.run(...)`,
the post-loop `_write_metrics(...)` never ran, so per-variant metrics
JSONs occasionally went missing on first invocation.

Fix: install a SIGTERM handler that snapshots `policy.engine.stats` to
disk and then calls `os._exit(0)`. This is paired with a longer drain wait
in the orchestrator (already in `variant_arena.py`).
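The SIGTERM fix can be sketched as follows. This is minimal and hypothetical: `install_sigterm_flush`, the stats callable, and the output path are illustrative names. Only the overall shape comes from the commit message: snapshot stats to disk inside the handler, then `os._exit(0)` so nothing depends on the asyncio teardown. `os._exit` skips `finally` blocks and atexit handlers, so the write must complete before it. The `exit_fn` parameter exists only so the handler can be exercised without killing the process.

```python
import json
import os
import signal
from pathlib import Path
from typing import Any, Callable


def install_sigterm_flush(
    get_stats: Callable[[], dict[str, Any]],
    metrics_path: Path,
    exit_fn: Callable[[int], None] = os._exit,
) -> Callable[[int, Any], None]:
    """Register a SIGTERM handler that writes metrics, then exits immediately.

    Must be called from the main thread (signal.signal raises ValueError otherwise).
    """

    def handler(signum: int, frame: Any) -> None:
        # Write to a temp file and rename, so readers never see partial JSON.
        tmp = metrics_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(get_stats()), encoding="utf-8")
        os.replace(tmp, metrics_path)
        # Bypass asyncio teardown, finally blocks, and atexit hooks entirely.
        exit_fn(0)

    signal.signal(signal.SIGTERM, handler)
    return handler
```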

Verified by re-running `--games 5`: metrics for all 8 variants flushed and
the comparison table populated. Engine signals now line up with intent:
`paranoid_crewmate` (eagerness=high) passes reports, while `aggressive_imposter`
(eagerness=low) suppresses them.
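The report gating exercised by `paranoid_crewmate` and `aggressive_imposter` could work roughly like the hypothetical override layer below. The scripted bot proposes an action, and the layer passes or suppresses reports according to the eagerness directive, counting both outcomes in its stats. The class name, the `Action` shape, the evidence score, and the thresholds are all assumptions; the real `_DirectiveOverrideEngine` is not shown in this PR.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Literal, Optional

Eagerness = Literal["low", "normal", "high"]

# Hypothetical minimum evidence needed before a report is let through.
REPORT_THRESHOLD: dict[str, float] = {"high": 0.0, "normal": 0.5, "low": 0.9}


@dataclass(frozen=True)
class Action:
    kind: str              # e.g. "report", "move", "task" (illustrative)
    evidence: float = 0.0  # hypothetical 0..1 score attached by the scripted bot


@dataclass
class ReportGate:
    """Toy stand-in for a directive override engine."""
    eagerness: Eagerness = "normal"
    stats: Counter = field(default_factory=Counter)

    def filter(self, proposed: Action) -> Optional[Action]:
        if proposed.kind != "report":
            return proposed  # non-report actions pass through untouched
        if proposed.evidence >= REPORT_THRESHOLD[self.eagerness]:
            self.stats["reports_passed"] += 1
            return proposed
        self.stats["reports_suppressed"] += 1
        return None  # None means the report is dropped
```

Keeping counters on the gate itself is what makes a metrics snapshot like the one in the SIGTERM handler useful: the per-variant JSON can show whether each directive actually changed behavior.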

Co-authored-by: Cursor <cursoragent@cursor.com>