Add Among Them SDK — Phase 0/1 with cogames packaging + dev loop#112
Open
aaln wants to merge 4 commits into
A Cursor-SDK-style Python harness for authoring Among Them policy bots.
Wraps `evidencebot_v2` via in-process FFI as the default scripted policy
and exposes an `instructions=` parameter plus module-swap kwargs
(`voter=`, `chatter=`, `reporter=`, ...) for LLM-augmented cognition.
What's in this iteration
- `among_them/sdk/` — full Python package (~3.5k LOC)
- `Agent.create()` / `LiveGame` / `LocalSDKPolicy` / `SDKPolicy`
(cogames `MultiAgentPolicy` entrypoint composing `EvidenceBotV2NimPolicy`)
- `_DirectiveOverrideEngine` shared between local + tournament code paths
- `Directives` Pydantic schema parsed from natural-language instructions
via the existing `cognition/llm.py` provider (deterministic keyword fallback
when no API key is set — required for cogames Docker validator)
- `among_them_sdk.package` CLI for emitting `cogames upload` bundles
- 13 runnable examples (hello, instructions, personas, custom voter/reporter,
LLM chatter, mixed modules, A/B test, win-rate loop, transcript logger,
debug directives, provider switch, tournament, eight_player_game,
variant_arena head-to-head)
- 5 test files / 25+ tests passing under `uv run pytest`
- `among_them/sdk/docs/` — python-guide, tournament-submission,
local-iteration-guide
- `among_them/players/sdk/DESIGN.md` — architecture + phased roadmap
- `among_them/server.nim` — drop a duplicate `liveProgressMaxTick` proc
that blocked compilation
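To make the "deterministic keyword fallback" concrete, here is a minimal sketch of how instructions could be mapped to directives without any LLM call. It is not the SDK's code: the `Directives` fields other than `eagerness`, the keyword table, and `parse_instructions_fallback` are hypothetical, and a plain dataclass stands in for the Pydantic schema so the snippet has no dependencies.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the SDK's Pydantic `Directives` schema.
@dataclass(frozen=True)
class Directives:
    eagerness: str = "medium"  # how readily the bot reports: low / medium / high
    talkative: bool = True     # hypothetical field, for illustration only


# Hypothetical keyword table. Entries are checked in order and the first
# match wins, so the same instructions always yield the same directives.
_EAGERNESS_KEYWORDS = (
    ("high", ("paranoid", "report everything")),
    ("low", ("lay low", "never report")),
)


def parse_instructions_fallback(instructions: str) -> Directives:
    """Map free-text instructions to Directives using keyword matching only."""
    text = instructions.lower()
    eagerness = "medium"
    for level, keywords in _EAGERNESS_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            eagerness = level
            break
    talkative = "silent" not in text
    return Directives(eagerness=eagerness, talkative=talkative)
```

Because nothing here touches the network or a random source, a validator that runs without an API key would see identical behavior on every run, which is presumably the property the Docker validator needs.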
Tournament submission status
Validated end-to-end via `cogames upload --season among-them` after
two fixes:
- `SDKPolicy` now resolves `evidencebot_v2_policy` at runtime by walking
up to find `among_them/players/` (cogames only puts the entry-point
package's dir on `sys.path`)
- Bundle now includes `among_them/votereader.nim` (recently added Nim
dep that `evidencebot_v2.nim` imports)
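The first fix can be pictured as a small upward directory walk. The sketch below is an assumption about the shape of that logic, not the code in `SDKPolicy`; `find_players_dir` and `ensure_players_on_path` are hypothetical names.

```python
import sys
from pathlib import Path
from typing import Optional


def find_players_dir(start: Path) -> Optional[Path]:
    """Walk upward from `start` until some ancestor contains among_them/players/."""
    for candidate in (start, *start.parents):
        players = candidate / "among_them" / "players"
        if players.is_dir():
            return players
    return None


def ensure_players_on_path(start: Path) -> bool:
    """Put among_them/players/ on sys.path so its modules become importable."""
    players = find_players_dir(start.resolve())
    if players is None:
        return False
    entry = str(players)
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return True
```

A policy module could call `ensure_players_on_path(Path(__file__).parent)` before attempting `import evidencebot_v2_policy`, and fall back to an error message if the walk returns `False`.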
Known follow-ups (in flight in parallel iterations)
- LLM integration design doc (`docs/llm-integration.md`)
- Cross-game opponent modeling (`opponents/` subpackage)
- `--persona` shortcut on `eight_player_game.py`
- Voter/Chatter advisory surfacing in result block
Co-authored-by: Cursor <cursoragent@cursor.com>
`among_them/sdk/docs/llm-integration.md` (~4.5k words): an opinionated menu of how LLMs can plug into the SDK across slow / medium / fast / config-time decision paths.

Covers:
- seven architectural patterns (tool-loop, subagents, skills directory, provider routing, streaming + speculative, MCP, RAG over replays)
- a long deep-dive on chat / accusation / defense, decomposed into five sub-modules with a mermaid timeline
- a tournament-safety matrix
- a back-of-envelope cost model
- five ranked near-term builds
- seven open questions with recommendations

Top recommendation: `LLMReporter` first. It is a binary decision, takes 3 calls/game, and has a unique tournament-safety story (an LLM at packaging time emits a decision tree that the runtime evaluates without inference).

Several patterns already partially exist in the codebase:
- `@tool` / `ToolLoop` in `cognition/tools.py`
- AI Gateway routing in `cognition/llm.py`
- template banks in `modules/chatter.py`
- packaging-time LLM in `cognition/instructions.py`
- the sidecar prior art in `among_them/bot-policies/sidecar/`

Plus a one-line README cross-link.

Co-authored-by: Cursor <cursoragent@cursor.com>
Persistent learning loop: every local game captures what each named
opponent says, votes, and does; an analyzer rolls those observations
into a typed `OpponentProfile`; profiles refine across games (monotonic
confidence) and feed back into `LLMVoter` / `LLMChatter` prompts.
New package: `among_them/sdk/src/among_them_sdk/opponents/`
- `models.py` — Pydantic v2 schema (`OpponentProfile`,
`ObservationEvent`, sub-profile models for chat / vote / accusation /
defense / role-conditional behavior)
- `store.py` — `OpponentStore` persisting to
`~/.among-them/opponents/<name>/{observations.ndjson, profile.json}`
(overridable via constructor, env var, or `--store-root`)
- `collector.py` — `ObservationCollector` wired into `AgentHooks`,
source-tolerant about hook payload shapes (silently drops events
lacking an actor — see "in-flight" notes for why we did NOT modify
`live_game.py`)
- `analyzer.py` — `analyze_opponent` / `analyze_all` with deterministic
statistical fallback when no API key is set (caps confidence at 0.3),
LLM path via existing `cognition/llm.py`, `merge_profiles` for
monotonic intel improvement
- `bundle.py` — `freeze_profiles` + read-only `BundledProfileLookup` for
the cogames Docker path (no live LLM at runtime)
- `__main__.py` — `python -m among_them_sdk.opponents`
{`list`, `show`, `analyze`, `analyze-all`, `freeze`, `record`}
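The statistical fallback and the monotonic merge can be illustrated with a toy version. Everything below is a sketch under stated assumptions: `ProfileSketch` is far smaller than the real `OpponentProfile`, the "0.1 per game" confidence ramp is invented for illustration, and only the 0.3 cap comes from the description above.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# From the description above: the no-API-key path caps confidence at 0.3.
FALLBACK_CONFIDENCE_CAP = 0.3


@dataclass(frozen=True)
class ProfileSketch:
    """Hypothetical, heavily reduced stand-in for OpponentProfile."""
    name: str
    games_observed: int
    vote_skip_rate: float
    confidence: float


def analyze_statistical(
    name: str, votes_per_game: List[List[Optional[str]]]
) -> ProfileSketch:
    """Build a profile from raw votes only; None means the player skipped."""
    votes = [vote for game in votes_per_game for vote in game]
    skips = sum(1 for vote in votes if vote is None)
    skip_rate = skips / len(votes) if votes else 0.0
    games = len(votes_per_game)
    # Assumed ramp: 0.1 per observed game, never above the fallback cap.
    confidence = min(FALLBACK_CONFIDENCE_CAP, 0.1 * games)
    return ProfileSketch(name, games, skip_rate, confidence)


def merge_profiles_sketch(old: ProfileSketch, new: ProfileSketch) -> ProfileSketch:
    """Keep the higher-confidence profile, so confidence never decreases."""
    best = new if new.confidence >= old.confidence else old
    return replace(best, games_observed=max(old.games_observed, new.games_observed))
```

The merge is order-independent with respect to confidence: whichever profile arrives later, the stored confidence is the maximum of the two, which is one simple way to get the "monotonic" behavior described.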
Consumer integration (additive, kwarg defaults preserve behavior):
- `LLMVoter(opponent_profiles=...)` — injects compact intel block into
vote prompt
- `LLMChatter(opponent_profiles=...)` — same for chat composition
- `Agent.create(load_opponent_profiles=True)` — auto-loads from default
store and threads through to LLM modules
Tournament packaging:
- `python -m among_them_sdk.package --profiles-from <store-dir>` writes
`among_them_sdk_opponents.json` next to the bundle config and includes
a `-f` flag in the printed `cogames upload` command. `SDKPolicy` reads
via `BundledProfileLookup` — interface-compatible with the live store.
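As a rough picture of the freeze step, the sketch below collects each `<name>/profile.json` from a store directory into a single JSON snapshot and reads it back through a read-only lookup. The function bodies and the `BundledLookupSketch` class are assumptions for illustration; only the store layout and the snapshot filename come from the description above.

```python
import json
from pathlib import Path
from types import MappingProxyType
from typing import List, Optional


def freeze_profiles(store_root: Path, out_file: Path) -> int:
    """Collect every <name>/profile.json under store_root into one snapshot."""
    snapshot = {}
    for profile_path in sorted(store_root.glob("*/profile.json")):
        name = profile_path.parent.name
        snapshot[name] = json.loads(profile_path.read_text(encoding="utf-8"))
    out_file.write_text(json.dumps(snapshot, sort_keys=True), encoding="utf-8")
    return len(snapshot)


class BundledLookupSketch:
    """Hypothetical read-only view over a frozen snapshot (no LLM, no writes)."""

    def __init__(self, snapshot_file: Path) -> None:
        data = json.loads(snapshot_file.read_text(encoding="utf-8"))
        # MappingProxyType makes the top-level mapping read-only.
        self._profiles = MappingProxyType(data)

    def get(self, name: str) -> Optional[dict]:
        return self._profiles.get(name)

    def names(self) -> List[str]:
        return sorted(self._profiles)
```

Keeping `get` and `names` as the only entry points is what would let a live store and a bundled snapshot be swapped behind the same interface.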
Demo + tests:
- `examples/opponent_learning_loop.py` — end-to-end loop with two modes
(simulated, real). `--games 2 --no-llm` completes in ~600ms; produces
7 profiles, ~12KB tournament snapshot.
- 26 new tests (`tests/test_opponents.py`); pytest now: 51 passed / 1
skipped. `uvx ruff check` clean.
Doc: `among_them/sdk/docs/opponent-modeling.md` (~1.5k words) +
README cross-link.
Known limits documented in the doc:
- LiveGame only surfaces SDK-player's own messages today; full
cross-player chat capture needs a `/global` subscription (Phase 4)
- Privacy: opponent names + chat persist verbatim to disk
Co-authored-by: Cursor <cursoragent@cursor.com>
The orchestrator's drain timeout was racing the asyncio loop teardown in `_variant_worker.py`: when SIGTERM landed mid-`asyncio.run(...)`, the post-loop `_write_metrics(...)` never ran, so per-variant metrics JSONs occasionally went missing on first invocation.

Fix: install a SIGTERM handler that snapshots `policy.engine.stats` to disk and then calls `os._exit(0)`. Pair with a longer drain wait in the orchestrator (already in `variant_arena.py`).

Verified by re-running `--games 5`: all 8 variant metrics flushed, comparison table populated. Engine signals now line up with intent: `paranoid_crewmate` (eagerness=high) passes reports, `aggressive_imposter` (eagerness=low) suppresses them.

Co-authored-by: Cursor <cursoragent@cursor.com>
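The SIGTERM fix described above follows a common pattern: flush state from inside the signal handler, then exit immediately so loop teardown cannot swallow the write. The sketch below shows that pattern in isolation; `write_metrics`, `make_sigterm_handler`, and the injectable `exit_fn` are hypothetical names, not the contents of `_variant_worker.py`.

```python
import json
import os
import signal
from pathlib import Path
from typing import Callable


def write_metrics(stats: dict, path: Path) -> None:
    """Write stats via a temp file + rename so readers never see a partial JSON."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(stats), encoding="utf-8")
    os.replace(tmp, path)


def make_sigterm_handler(
    get_stats: Callable[[], dict],
    path: Path,
    exit_fn: Callable[[int], None] = os._exit,
):
    """Build a handler that snapshots stats to disk, then exits without cleanup."""

    def _handler(signum, frame):
        write_metrics(get_stats(), path)
        # os._exit skips asyncio teardown and atexit hooks, which is the point:
        # nothing after the flush can block or race the orchestrator.
        exit_fn(0)

    return _handler


if __name__ == "__main__":
    # Illustrative wiring; in a worker, get_stats would read the engine stats.
    handler = make_sigterm_handler(lambda: {"reports": 0}, Path("metrics.json"))
    signal.signal(signal.SIGTERM, handler)
```

Making `exit_fn` a parameter is only a testing convenience: it lets the handler be invoked directly without terminating the test process.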