devonartis
diff --git a/.agents/skills/broker/SKILL.md b/.agents/skills/broker/SKILL.md
Lines changed: 68 additions & 0 deletions
diff --git a/.agents/skills/devflow-client/SKILL.md b/.agents/skills/devflow-client/SKILL.md
Lines changed: 94 additions & 0 deletions
diff --git a/.claude/skills/broker/SKILL.md b/.claude/skills/broker/SKILL.md
Lines changed: 3 additions & 5 deletions
diff --git a/.claude/skills/devflow-client/SKILL.md b/.claude/skills/devflow-client/SKILL.md
Lines changed: 6 additions & 9 deletions
diff --git a/.plans/2026-04-02-sdk-broker-gap-review.md b/.plans/2026-04-02-sdk-broker-gap-review.md
Lines changed: 1 addition & 1 deletion
diff --git a/.plans/2026-04-05-v0.3.0-phase2-cache-correctness-plan.md b/.plans/2026-04-05-v0.3.0-phase2-cache-correctness-plan.md
Lines changed: 1 addition & 3 deletions
diff --git a/.plans/ARCHIVE/tracker-demo-app.jsonl b/.plans/ARCHIVE/tracker-demo-app.jsonl
Lines changed: 17 additions & 0 deletions
diff --git a/.plans/PROMPT.md b/.plans/PROMPT.md
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,68 @@
+---
+name: broker
+description: Use when needing to start, stop, or check the AgentAuth core broker for integration testing, live verification, or acceptance tests
+---
+
+# Broker Management
+
+Manage the AgentAuth core broker Docker stack for local SDK testing.
+
+## Usage
+
+- `/broker up` — Start the broker
+- `/broker down` — Stop the broker
+- `/broker status` — Check if broker is running and healthy
+
+## Instructions
+
+Parse the argument from the skill invocation. Default to `status` if no argument given.
+
+### Configuration
+
+| Variable | Default | Override |
+|----------|---------|----------|
+| `AA_ADMIN_SECRET` | `live-test-secret-32bytes-long-ok` | Pass as second arg: `/broker up mysecret` |
+| `AA_HOST_PORT` | `8080` | Set env var before invoking |
+| Broker path | `./broker` (vendored in-repo) | — |
+
+### `up`
+
+```bash
+export AA_ADMIN_SECRET="${SECRET:-live-test-secret-32bytes-long-ok}"
+./broker/scripts/stack_up.sh
+```
+
+After stack_up completes, run a health check:
+
+```bash
+curl -sf http://127.0.0.1:${AA_HOST_PORT:-8080}/v1/health
+```
+
+Report success or failure clearly. If health check fails, wait 3 seconds and retry once — the broker may need a moment after `docker compose up -d`.
+
+### `down`
+
+```bash
+./broker/scripts/stack_down.sh
+```
+
+### `status`
+
+```bash
+curl -sf http://127.0.0.1:${AA_HOST_PORT:-8080}/v1/health
+```
+
+Report whether the broker is reachable. If not, suggest `/broker up`.
+
+## Output Format
+
+Always announce the action and result:
+
+```
+Broker: [action] — [result]
+```
+
+Examples:
+- `Broker: up — healthy at http://127.0.0.1:8080`
+- `Broker: down — stack removed`
+- `Broker: status — not reachable (run /broker up)`
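Note on the `up` flow in this file: the "check health, wait 3 seconds, retry once" rule can be sketched in Python for use from SDK test helpers. This is a minimal sketch, not part of the commit. The function names (`fetch_ok`, `broker_healthy`, `health_url`) are hypothetical, and the only facts taken from the skill are the `/v1/health` path, the `AA_HOST_PORT` default of `8080`, and the single retry after 3 seconds.

```python
import http.client
import os
import time
import urllib.error
import urllib.request
from typing import Callable


def health_url() -> str:
    """Build the health URL from AA_HOST_PORT, defaulting to 8080 as the skill does."""
    port = os.environ.get("AA_HOST_PORT", "8080")
    return f"http://127.0.0.1:{port}/v1/health"


def fetch_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True only for an HTTP 2xx answer, roughly what `curl -sf` reports."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeouts, and 4xx/5xx responses all count as unhealthy.
        return False


def broker_healthy(
    url: str,
    probe: Callable[[str], bool] = fetch_ok,
    retries: int = 1,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe once, then retry up to `retries` times, sleeping `delay` seconds before each retry."""
    for attempt in range(retries + 1):
        if probe(url):
            return True
        if attempt < retries:
            sleep(delay)
    return False
```

The `probe` and `sleep` parameters are injectable so the retry logic can be unit-tested without a running broker or a real 3-second wait.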
@@ -0,0 +1,94 @@
+---
+name: devflow-client
+description: >
+  Use when starting any development work on AgentAuth Python SDK — loads the
+  Development Flow, checks tracker state, and tells you which step to execute next.
+  Trigger on: "start dev", "what's next", "resume work", "continue",
+  "where are we", "pick up where we left off", any development request.
+  No council steps, Python-specific gates.
+---
+
+# AgentAuth Python SDK — Development Flow
+
+Start here for any development work. This skill loads context and tells you
+what to do next.
+
+## Instructions
+
+1. Read these files in order:
+   - `MEMORY.md` (repo root)
+   - `FLOW.md` (repo root) — if it doesn't exist or has no current step, start at Step 1
+   - `.plans/tracker.jsonl` (current state of all stories and tasks) — create if missing
+
+2. From FLOW.md + tracker, identify the current step:
+
+| Step | What | Skill | Model | Done when |
+|------|------|-------|-------|-----------|
+| 1 | Brainstorm | `superpowers:brainstorming` | **opus** | Design doc in `.plans/designs/` |
+| 2 | Write Spec | Follow `.plans/SPEC-TEMPLATE.md` | **opus** | Spec in `.plans/specs/` |
+| 3 | Impl Plan | `superpowers:writing-plans` | **opus** | Plan in `.plans/` with tasks |
+| 4 | Acceptance Tests | Write stories in `tests/sdk-core/` | **opus** | Stories with Who/What/Why/How/Expected |
+| 5 | Register Tracker | Update `.plans/tracker.jsonl` | any | All stories + tasks registered |
+| 6 | Code | `superpowers:executing-plans` | **sonnet** | All tasks PASS, gates green |
+| 7 | Review | `superpowers:requesting-code-review` + `writing-plans` | **sonnet** / **opus** | Findings documented + fix plan written |
+| 7.5 | Fix Findings | `superpowers:executing-plans` | **sonnet** | Fix plan complete, gates green |
+| 8 | Live Test | `superpowers:verification-before-completion` | **sonnet** | Integration tests PASS against live broker |
+| 9 | Merge | `superpowers:finishing-a-development-branch` | any | Human approved, merged to `main` |
+
+**No council steps.** This is a client SDK — faster iteration, fewer review gates.
+
+**Step 7:** Reviewer produces findings AND a fix plan. No ad-hoc fixes.
+
+**Step 6 + 7.5:** Use `executing-plans` for all coding — even small fixes.
+
+3. Announce: "Dev Flow (Python SDK): Step N — [step name]. [X/Y tasks done]. Next: [action]."
+
+4. Invoke the relevant superpowers skill if one is listed.
+
+## API Source of Truth
+
+The broker API contract lives in-repo (vendored, frozen):
+- **API contract:** `broker/docs/api.md` — see `broker/VENDOR.md` for provenance
+
+Read the API doc before writing or modifying any HTTP call in the SDK.
+
+## Gates (run after every commit)
+
+```bash
+uv run ruff check .                    # G1: lint
+uv run mypy --strict src/              # G2: type check
+uv run pytest tests/unit/              # G3: unit tests
+```
+
+All three must PASS before moving to the next task.
+
+## Contamination Check
+
+After any HITL removal work:
+```bash
+grep -ri "hitl\|approval\|oidc\|federation\|sidecar" src/ tests/
+```
+Must return nothing.
+
+## Live Broker Testing
+
+Integration and acceptance tests require a running broker. Use the in-repo vendored copy:
+```bash
+export AA_ADMIN_SECRET="live-test-secret-32bytes-long-ok"
+./broker/scripts/stack_up.sh
+```
+
+Then run SDK integration tests:
+```bash
+uv run pytest -m integration
+```
+
+## Rules
+
+- Branch from `main`. Feature branches: `feature/*`, fix branches: `fix/*`.
+- Plans save to `.plans/`, specs to `.plans/specs/`, designs to `.plans/designs/`.
+- Update tracker when story/task status changes.
+- **Run gates after each commit.** Fix failures before moving on.
+- **Update `CHANGELOG.md` with every user-facing change** — same commit as the code.
+- **Strict types everywhere** — no untyped variables, parameters, or returns.
+- **`uv` only** — never pip, poetry, or conda.
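Note on the gates in this file: the three commands under "Gates (run after every commit)" can be wrapped in a small runner. The sketch below is illustrative and not part of the commit. The names `GATES`, `run_command`, and `run_gates` are hypothetical, and stopping at the first failing gate is an assumption, since the skill only states that all three must pass.

```python
import subprocess
from typing import Callable, Sequence

# Gate commands copied from the skill's "Gates" section.
GATES: list[tuple[str, list[str]]] = [
    ("G1: lint", ["uv", "run", "ruff", "check", "."]),
    ("G2: type check", ["uv", "run", "mypy", "--strict", "src/"]),
    ("G3: unit tests", ["uv", "run", "pytest", "tests/unit/"]),
]


def run_command(argv: Sequence[str]) -> int:
    """Run one gate command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def run_gates(
    run: Callable[[Sequence[str]], int] = run_command,
) -> list[tuple[str, bool]]:
    """Run gates in order; stop at the first failure (an assumed policy)."""
    results: list[tuple[str, bool]] = []
    for name, argv in GATES:
        passed = run(argv) == 0
        results.append((name, passed))
        if not passed:
            break
    return results


def all_green(results: list[tuple[str, bool]]) -> bool:
    """True only when every gate ran and passed."""
    return len(results) == len(GATES) and all(ok for _, ok in results)
```

Passing a fake `run` callable makes the ordering and stop-on-failure behaviour testable without invoking `uv`.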
@@ -23,14 +23,13 @@ Parse the argument from the skill invocation. Default to `status` if no argument
 |----------|---------|----------|
 | `AA_ADMIN_SECRET` | `live-test-secret-32bytes-long-ok` | Pass as second arg: `/broker up mysecret` |
 | `AA_HOST_PORT` | `8080` | Set env var before invoking |
-| Core project path | `~/proj/agentauth-core` | — |
+| Broker path | `./broker` (vendored in-repo) | — |
 
 ### `up`
 
 ```bash
 export AA_ADMIN_SECRET="${SECRET:-live-test-secret-32bytes-long-ok}"
-cd ~/proj/agentauth-core
-./scripts/stack_up.sh
+./broker/scripts/stack_up.sh
 ```
 
 After stack_up completes, run a health check:
@@ -44,8 +43,7 @@ Report success or failure clearly. If health check fails, wait 3 seconds and ret
 ### `down`
 
 ```bash
-cd ~/proj/agentauth-core
-./scripts/stack_down.sh
+./broker/scripts/stack_down.sh
 ```
 
 ### `status`
 
@@ -5,7 +5,7 @@ description: >
   Development Flow, checks tracker state, and tells you which step to execute next.
   Trigger on: "start dev", "what's next", "resume work", "continue",
   "where are we", "pick up where we left off", any development request.
-  Adapted from agentauth-core's devflow — no council steps, Python-specific gates.
+  No council steps, Python-specific gates.
 ---
 
 # AgentAuth Python SDK — Development Flow
@@ -45,12 +45,10 @@ what to do next.
 
 4. Invoke the relevant superpowers skill if one is listed.
 
-## Parent Project Context
+## API Source of Truth
 
-The API source of truth lives in the parent project:
-- **API contract:** `~/proj/agentauth-core/docs/api.md`
-- **Design doc:** `~/proj/agentauth-core/.plans/designs/2026-04-01-python-sdk-repo-design.md`
-- **Strategic decisions:** `~/proj/agentauth-core/FLOW.md`
+The broker API contract lives in-repo (vendored, frozen):
+- **API contract:** `broker/docs/api.md` — see `broker/VENDOR.md` for provenance
 
 Read the API doc before writing or modifying any HTTP call in the SDK.
 
@@ -74,11 +72,10 @@ Must return nothing.
 
 ## Live Broker Testing
 
-Integration and acceptance tests require a running core broker:
+Integration and acceptance tests require a running broker. Use the in-repo vendored copy:
 ```bash
-cd ~/proj/agentauth-core
 export AA_ADMIN_SECRET="live-test-secret-32bytes-long-ok"
-./scripts/stack_up.sh
+./broker/scripts/stack_up.sh
 ```
 
 Then run SDK integration tests:
 
@@ -3,7 +3,7 @@
 > **Date:** 2026-04-02
 > **Status:** Reviewed — Codex adversarial review added findings 12–15
 > **Scope:** Every field the broker returns vs what the Python SDK exposes, drops, or hides.
-> **Source of truth:** Broker handlers in `agentauth-core/internal/handler/` and `agentauth-core/internal/admin/`, `agentauth-core/internal/app/`. API spec: `agentauth-core/docs/api.md`.
+> **Source of truth:** Broker handlers in `broker/internal/handler/`, `broker/internal/admin/`, `broker/internal/app/` (vendored). API spec: `broker/docs/api.md`.
 
 ---
 
 
@@ -863,10 +863,8 @@ Expected: all PASS.
 
 First ensure broker is up:
 ```bash
-cd ~/proj/agentauth-core
 export AA_ADMIN_SECRET="live-test-secret-32bytes-long-ok"
-./scripts/stack_up.sh
-cd -
+./broker/scripts/stack_up.sh
 ```
 
 Then:
 
@@ -0,0 +1,17 @@
+{"type":"story","id":"DEMO-PC1","title":"Broker Is Running and Accessible","classification":"PRECONDITION","status":"NOT_VERIFIED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-PC2","title":"Anthropic API Key Is Valid","classification":"PRECONDITION","status":"NOT_VERIFIED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-PC3","title":"Demo App Starts Successfully","classification":"PRECONDITION","status":"NOT_VERIFIED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S1","title":"Pipeline Processes All 12 Transactions","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S2","title":"Each Agent Gets Correctly Scoped Credential","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S3","title":"Prompt Injection Contained by Credential Layer","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S4","title":"Report Writer Never Sees Raw Transactions","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S5","title":"Delegation Chain Shows Scope Attenuation","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S6","title":"Audit Trail Has Verifiable Hash Chain","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S7","title":"All Tokens Revoked After Pipeline Completes","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S8","title":"Startup Fails Clearly When Dependencies Missing","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"story","id":"DEMO-S9","title":"Dashboard Shows Real-Time Token Lifecycle","classification":"ACCEPTANCE","status":"NOT_STARTED","spec":".plans/specs/2026-04-01-demo-app-spec.md"}
+{"type":"step","id":"STEP-1","title":"Brainstorm","status":"DONE","note":"Design v2 approved - real LLM pipeline, not showcase booth"}
+{"type":"step","id":"STEP-2","title":"Write Spec","status":"DONE","note":"Rewritten against v2 design"}
+{"type":"step","id":"STEP-3","title":"Impl Plan","status":"DONE","note":"Plan saved to .plans/2026-04-01-demo-app-plan.md — 10 tasks"}
+{"type":"step","id":"STEP-4","title":"Acceptance Tests","status":"DONE","note":"12 stories (3 PC + 9 ACC) in tests/demo-app/user-stories.md"}
+{"type":"step","id":"STEP-5","title":"Register Tracker","status":"DONE","note":"This file"}
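Note on the tracker format: each line above is one JSON object with at least `type` and `status` fields, so progress can be summarized by counting records per type and status. The sketch below is illustrative and not part of the commit. `summarize_tracker` is a hypothetical helper, and it assumes every record carries both fields, as all 17 archived records do.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable


def summarize_tracker(lines: Iterable[str]) -> dict[str, Counter[str]]:
    """Count tracker records by type, then by status. Blank lines are skipped."""
    summary: dict[str, Counter[str]] = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        # Assumes "type" and "status" exist, as in the archived tracker above.
        summary.setdefault(record["type"], Counter())[record["status"]] += 1
    return summary


def progress(summary: dict[str, Counter[str]], kind: str, done: str = "DONE") -> str:
    """Format an "X/Y" progress string for one record type."""
    counts = summary.get(kind, Counter())
    return f"{counts[done]}/{sum(counts.values())}"
```

Applied to the archived file above, this would report 3 `NOT_VERIFIED` and 9 `NOT_STARTED` stories and 5 `DONE` steps, so `progress(summary, "step")` gives `5/5`.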
@@ -0,0 +1,48 @@
+Gemma-4 "felt" like it was reading because it was actually trying to build a mental model, which is why it hit the context limit.
+
+ ### The Solution: The "Decomposition" Strategy
+
+ Instead of asking the agent to "Read the code and write the SDK," you must break this into a multi-stage pipeline. You need to separate
+ Discovery from Design from Implementation.
+
+ Here is the exact workflow you should use with me (or any agent) to get that Python SDK skeleton:
+
+ #### Stage 1: The "Knowledge Extraction" Phase (Building the Map)
+
+ Do not ask for code yet. Ask for a "System Map."
+ - Prompt: "I want to build a Python SDK for this Go application. First, I don't want any code. I want you to explore the Go codebase and
+ create a high-level 'Logic Map.' Identify: 1. The primary entry points (APIs/CLI), 2. The core data structures (the 'Models'), and 3. The
+ lifecycle of a request/command. Use ls, grep, and read to build this map. Output it as a Markdown summary."
+ - Goal: This forces the model to use its tools to build a compressed version of the logic in its own "memory" without filling the context
+ with raw code yet.
+
+ #### Stage 2: The "Interface Definition" Phase (The Contract)
+
+ Once you have the map, define the "Contract."
+ - Prompt: "Based on the Logic Map we just created, let's define the Python Interface. I want a list of Python Classes and Methods that
+ would allow a developer to do [X, Y, and Z] without knowing the Go implementation. Do not write the implementation logic; just write the
+ Class signatures and Docstrings in Python."
+ - Goal: This creates the "Skeleton" you mentioned. By focusing on signatures and docstrings, you aren't wasting tokens on implementation
+ details.
+
+ #### Stage 3: The "Incremental Implementation" Phase (The Heavy Lifting)
+
+ Now, you implement the skeleton piece by piece. This is where most people fail by trying to do it all at once.
+ - Prompt: "Now, let's implement the Client class from our skeleton. To do this accurately, please read internal/transport/client.go and
+ pkg/models/user.go in the Go repo. Ensure the Python Client handles the authentication flow exactly as the Go version does. Use asyncio
+ for the Python implementation."
+ - Goal: You are providing the "source of truth" (the specific Go files) only when the model is ready to write the specific Python code.
+
+ ### Why this works for Gemma-4 specifically:
+
+ 1. Context Management: By breaking it into stages, you clear the "noise" of the Go implementation once the "Map" is built. You aren't
+ carrying 50 Go files in the context while trying to write 10 Python files.
+ 2. Precision: You are directing the "Attention" of the model. Instead of "Look at everything," you are saying "Look at these 2 files to
+ solve this 1 specific class."
+ 3. Verification: In Stages 1 and 2, you can check the model's work. If the "Logic Map" is wrong, you catch it before it writes a single
+ line of broken Python.
+
+ If you want to start right now, give me this command:
+
+ │ "I need to build a Python SDK for a Go application. Let's start with Stage 1. Explore the current directory, identify the core Go
+ │ modules, and give me a summary of how the application's main logic flows. Don't write any Python yet; just build the map."
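Note on Stage 2 of the prompt above: a "signatures and docstrings only" skeleton might look like the sketch below. It is illustrative and not part of the commit. Every name in it (`User`, `Client`, `authenticate`, `get_user`, and their parameters) is hypothetical; the real surface would come from the Stage 1 map and the Go sources named in the prompt.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical model; real fields would be taken from the Stage 1 Logic Map."""

    id: str
    name: str


class Client:
    """Hypothetical SDK entry point: signatures and docstrings only (Stage 2)."""

    def __init__(self, base_url: str, timeout: float = 10.0) -> None:
        self.base_url = base_url
        self.timeout = timeout

    async def authenticate(self, secret: str) -> str:
        """Exchange a secret for a token. Body is deferred to Stage 3."""
        raise NotImplementedError

    async def get_user(self, user_id: str) -> User:
        """Fetch one user by id. Body is deferred to Stage 3."""
        raise NotImplementedError


def is_stub(client: Client) -> bool:
    """Return True while `get_user` is still an unimplemented Stage 2 stub."""
    try:
        asyncio.run(client.get_user("example-id"))
    except NotImplementedError:
        return True
    return False
```

Raising `NotImplementedError` keeps the skeleton importable and type-checkable while making it obvious which methods Stage 3 still has to fill in.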