fix: mode-aware response handling and PGP block stripping #27
Merged
4 commits
f5adc6a fix: mode-aware response handling and PGP block stripping (closes #26) (KHAEntertainment)
85d7561 chore: condense TASKS.md to actionable task list (KHAEntertainment)
b11d3c0 fix(#26): strip PGP before --output write, not just parse_and_write_f… (KHAEntertainment)
cf8b9c8 docs: remove hardcoded versions from CLAUDE.md, fix TASKS.md status (KHAEntertainment)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| --- | ||
| name: grok-multi-agent-api | ||
| description: xAI Grok Multi-Agent API reference for developing and maintaining this plugin. Triggers: "multi-agent api", "grok api", "agent_count", "reasoning effort", "openai sdk usage", "grok-4.20-multi-agent", "api configuration" | ||
| version: 1.0.0 | ||
| --- | ||
|
|
||
| # xAI Grok 4.20 Multi-Agent API Reference | ||
|
|
||
| Reference for the Realtime Multi-agent Research API that this plugin wraps. Use this when modifying `src/bridge/grok_bridge.py`, `src/agent/grok_agent.py`, or any bridge code that communicates with xAI/OpenRouter. | ||
|
|
||
| ## Model ID | ||
|
|
||
| ``` | ||
| grok-4.20-multi-agent | ||
| ``` | ||
|
|
||
| > **Note:** This plugin currently uses `x-ai/grok-4.20-multi-agent-beta` via OpenRouter. The direct xAI API uses `grok-4.20-multi-agent`. Both refer to the same underlying model. | ||
|
|
||
| ## API Endpoints | ||
|
|
||
| | Provider | Base URL | Endpoint | | ||
| |----------|----------|----------| | ||
| | xAI Direct | `https://api.x.ai/v1` | `/responses` | | ||
| | OpenRouter | `https://openrouter.ai/api/v1` | `/chat/completions` | | ||
|
|
||
| **This plugin uses OpenRouter** as the gateway. The bridge sends requests to OpenRouter which proxies to xAI. | ||
|
|
||
| ## Agent Count Configuration | ||
|
|
||
| | SDK / API | Parameter | 4 Agents | 16 Agents | | ||
| |-----------|-----------|----------|-----------| | ||
| | xAI SDK | `agent_count` | `4` | `16` | | ||
| | OpenAI SDK | `reasoning.effort` | `"low"` or `"medium"` | `"high"` or `"xhigh"` | | ||
| | Vercel AI SDK | `reasoningEffort` | `"low"` or `"medium"` | `"high"` or `"xhigh"` | | ||
| | REST API | `reasoning.effort` | `"low"` or `"medium"` | `"high"` or `"xhigh"` | | ||
|
|
||
| - **4 agents**: Quick research, focused queries, lower cost | ||
| - **16 agents**: Deep research, complex multi-faceted topics, higher token usage | ||
|
|
||
| In this plugin's bridge code (`grok_bridge.py`), agent count is sent as `extra_body={"agent_count": N}` via the OpenAI SDK. | ||
|
|
||
| ## Built-in Tools | ||
|
|
||
| xAI provides server-side tools that can be enabled per request: | ||
|
|
||
| | Tool | Description | | ||
| |------|-------------| | ||
| | `web_search` | Web search | | ||
| | `x_search` | X/Twitter search | | ||
| | `code_execution` | Code execution | | ||
| | `collections_search` | Collections search | | ||
|
|
||
| When enabled, the server runs the agent loop automatically, invoking tools until the final answer is generated. These incur additional cost. | ||
|
|
||
| **Important for this plugin:** The bridge currently does NOT pass through built-in tools — it uses the agents for pure reasoning over provided file context. If adding tool support, pass them in the `tools` parameter. | ||
|
|
||
| ## Output Behavior | ||
|
|
||
| - Only the **leader agent's** final response and tool calls are returned to the caller | ||
| - Sub-agent state (intermediate reasoning, tool calls, outputs) is encrypted | ||
| - Encrypted sub-agent state is included only when `use_encrypted_content=True` (xAI SDK) | ||
| - This keeps default responses clean while preserving context for multi-turn | ||
|
|
||
| ## Multi-turn Conversations | ||
|
|
||
| Use `previous_response_id` to chain turns. The agents use prior context for more targeted follow-up answers. | ||
|
|
||
| ## API Limitations | ||
|
|
||
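| - **No Chat Completions API** on the direct xAI endpoint: use the Responses API (`/responses`) or the xAI SDK there. This plugin's OpenRouter path uses `/chat/completions` instead, as listed under API Endpoints | ||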
| - **No `max_tokens`** — parameter is not supported | ||
| - **No client-side/custom tools** — only built-in tools and remote MCP tools supported | ||
| - **Only leader output exposed** — sub-agent details are encrypted unless explicitly requested | ||
|
|
||
| ## Example: Direct xAI API (Python OpenAI SDK) | ||
|
|
||
| ```python | ||
| import os | ||
| from openai import OpenAI | ||
|
|
||
| client = OpenAI( | ||
| api_key=os.getenv("XAI_API_KEY"), | ||
| base_url="https://api.x.ai/v1", | ||
| ) | ||
|
|
||
| # 4-agent setup | ||
| response = client.responses.create( | ||
| model="grok-4.20-multi-agent", | ||
| reasoning={"effort": "low"}, | ||
| input=[ | ||
| {"role": "user", "content": "Analyze this code..."}, | ||
| ], | ||
| ) | ||
|
|
||
| # 16-agent setup | ||
| response = client.responses.create( | ||
| model="grok-4.20-multi-agent", | ||
| reasoning={"effort": "high"}, | ||
| input=[ | ||
| {"role": "user", "content": "Deep analysis..."}, | ||
| ], | ||
| ) | ||
| ``` | ||
|
|
||
| ## Example: Via OpenRouter (This Plugin's Path) | ||
|
|
||
| ```python | ||
| import os | ||
| from openai import OpenAI | ||
|
|
||
| client = OpenAI( | ||
| api_key=os.getenv("OPENROUTER_API_KEY"), | ||
| base_url="https://openrouter.ai/api/v1", | ||
| ) | ||
|
|
||
| response = client.chat.completions.create( | ||
| model="x-ai/grok-4.20-multi-agent-beta", | ||
| extra_body={"agent_count": 4}, # or 16 | ||
| messages=[ | ||
| {"role": "system", "content": "You are..."}, | ||
| {"role": "user", "content": "Analyze..."}, | ||
| ], | ||
| ) | ||
| ``` | ||
|
|
||
| ## Prompting Best Practices | ||
|
|
||
| When constructing system prompts for the multi-agent model: | ||
|
|
||
| 1. **Set scope and depth explicitly** — "Compare X across dimensions A, B, C" not "Tell me about X" | ||
| 2. **Request structured output** — "Present as a comparison table with categories..." | ||
| 3. **Specify sources/perspectives** — "Cite academic papers from 2024-2025" | ||
| 4. **Break complex research into turns** — Start broad, narrow with follow-ups | ||
| 5. **Provide context** — Include relevant constraints and prior knowledge | ||
|
|
||
| ## Pricing Considerations | ||
|
|
||
| All tokens from **both leader and sub-agents** are billed (input, output, reasoning). Server-side tool calls by any agent also count. A single multi-agent request may use significantly more tokens than a standard request. Monitor via `usage` and `server_side_tool_usage` fields. | ||
|
|
||
| ## Streaming | ||
|
|
||
| The xAI SDK supports streaming with `include=["verbose_streaming"]`: | ||
|
|
||
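| ```python | ||
| # xAI SDK (xai_sdk), not the OpenAI SDK | ||
| import os | ||
| from xai_sdk import Client | ||
| from xai_sdk.chat import user | ||
| client = Client(api_key=os.getenv("XAI_API_KEY")) | ||
| chat = client.chat.create( | ||
|     model="grok-4.20-multi-agent", | ||
|     include=["verbose_streaming"], | ||
| ) | ||
| chat.append(user("Research question...")) | ||
| # Each iteration yields the accumulated response and the latest delta | ||
| for response, chunk in chat.stream(): | ||
|     if chunk.content: | ||
|         print(chunk.content, end="", flush=True) | ||
| ``` | ||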
|
|
||
| This plugin's bridge does not currently stream — it waits for the full response. Streaming support would require changes to `grok_bridge.py:call_grok()` and `src/bridge/index.js`. | ||
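
The agent-count table in the file above pairs each `reasoning.effort` value with a swarm size. A minimal sketch of that mapping as a lookup follows. The helper names (`agent_count_for_effort`, `openrouter_extra_body`) are hypothetical and do not come from `grok_bridge.py`; only the `extra_body={"agent_count": N}` shape is taken from the document.

```python
# Hypothetical helpers; names are illustrative, not taken from grok_bridge.py.
EFFORT_TO_AGENTS = {"low": 4, "medium": 4, "high": 16, "xhigh": 16}


def agent_count_for_effort(effort: str) -> int:
    """Return the swarm size the table associates with a reasoning effort."""
    try:
        return EFFORT_TO_AGENTS[effort.lower()]
    except KeyError:
        raise ValueError(f"unknown reasoning effort: {effort!r}") from None


def openrouter_extra_body(effort: str = "low") -> dict:
    """Build the extra_body payload the bridge is described as sending."""
    return {"agent_count": agent_count_for_effort(effort)}


print(openrouter_extra_body("high"))  # {'agent_count': 16}
```

Rejecting unknown effort values up front, rather than silently defaulting to 4 agents, is a design choice of this sketch; the plugin may well handle it differently.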
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| # CLAUDE.md | ||
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Project Overview | ||
|
|
||
| A dual-platform plugin (Claude Code + OpenClaw) that bridges to xAI's **Grok 4.20 Multi-Agent Beta** via the **OpenRouter** API. It gives AI coding agents access to Grok's multi-agent swarm (4 or 16 agents) with ~2M token context for code analysis, refactoring, and generation. | ||
|
|
||
| ## Build & Development Commands | ||
|
|
||
| ```bash | ||
| # Build (copies Python bridge + Node wrapper to dist/) | ||
| npm run build | ||
|
|
||
| # Test (only checks CLI --help flag) | ||
| npm test | ||
|
|
||
| # Lint | ||
| npm run lint | ||
|
|
||
| # Clean | ||
| npm run clean | ||
|
|
||
| # Install to local platforms | ||
| ./install.sh openclaw # copies to ~/.openclaw/ | ||
| ./install.sh claude # copies to ~/.claude/plugins/grok-swarm/ | ||
| ./install.sh both # both platforms | ||
|
|
||
| # Python deps | ||
| pip3 install -r requirements.txt | ||
| ``` | ||
|
|
||
| Requires Node.js >= 18 and Python 3.8+. | ||
|
|
||
| ## Architecture | ||
|
|
||
| Layered bridge pattern — each layer has a single responsibility: | ||
|
|
||
| ``` | ||
| Plugin Layer (TypeScript/manifests) | ||
| ↓ registers tools and skills | ||
| CLI Wrapper (Node.js — src/bridge/index.js) | ||
| ↓ timeout enforcement, process spawning | ||
| Python Bridge (src/bridge/grok_bridge.py) | ||
| ↓ OpenAI SDK → OpenRouter API | ||
| xAI Grok 4.20 Multi-Agent Beta | ||
| ``` | ||
|
|
||
| **Key modules:** | ||
|
|
||
| - `src/bridge/grok_bridge.py` — Core API logic: key resolution, mode-based system prompts, file context assembly, code block parsing. The `call_grok()` function is the central entry point. | ||
| - `src/bridge/cli.py` — Unified CLI that dispatches to grok_bridge with argparse. | ||
| - `src/bridge/apply.py` — Parses annotated code blocks and writes files to disk. Supports three annotation formats: `lang:path`, `FILE:` marker, and `# filename.py` comments. | ||
| - `src/bridge/index.js` — Node.js wrapper that enforces timeouts on Python subprocess. | ||
| - `src/bridge/oauth_setup.py` — PKCE OAuth flow for OpenRouter (keeps keys out of LLM context). | ||
| - `src/bridge/usage_tracker.py` — Persistent token/cost tracking. | ||
| - `src/agent/grok_agent.py` — Autonomous loop: discover files → call Grok → apply changes → verify → iterate. | ||
| - `src/shared/patterns.py` — Centralized regex patterns for filename detection, shared between bridge and agent. | ||
| - `src/plugin/index.ts` — OpenClaw plugin: registers `grok_swarm` (single call) and `grok_swarm_agent` (autonomous loop) tools. | ||
|
|
||
| ## API Key Resolution Priority | ||
|
|
||
| `grok_bridge.py:get_api_key()` checks in order: | ||
| 1. `OPENROUTER_API_KEY` environment variable | ||
| 2. `~/.config/grok-swarm/config.json` | ||
| 3. `~/.claude/grok-swarm.local.md` | ||
| 4. OpenClaw auth profiles | ||
|
|
||
| ## Thinking Levels | ||
|
|
||
| - **Low** (default): 4-agent swarm — faster, cheaper | ||
| - **High**: 16-agent swarm — triggered by phrases like "16 agent swarm", "high thinking mode", or `--thinking high` | ||
|
|
||
| ## File Annotation Formats | ||
|
|
||
| Code blocks can be annotated three ways for `apply.py` to write them: | ||
| 1. Fenced block with language:path — ` ```python:src/main.py ` | ||
| 2. `FILE: path/to/file.py` marker inside the block | ||
| 3. Comment header — `# filename.py` (uses `shared/patterns.py` regex) | ||
|
|
||
| ## Task Tracking | ||
|
|
||
| Uses **bd (beads)** — not TodoWrite or markdown lists: | ||
| ```bash | ||
| bd ready # Find available work | ||
| bd show <id> # View issue details | ||
| bd update <id> --claim | ||
| bd close <id> | ||
| ``` | ||
|
|
||
| ## Code Duplication Note | ||
|
|
||
| `skills/grok-refactor/bridge/` and `skills/grok-refactor/shared/` are copies of `src/bridge/` and `src/shared/` respectively (not symlinks). Changes to bridge/shared code must be applied in both locations. | ||
|
|
||
| ## Version Locations | ||
|
|
||
| Version is defined in multiple places and must be kept in sync: `package.json`, `VERSION`, `pyproject.toml`, `CLAWHUB.md`, `.claude-plugin/marketplace.json`, and `platforms/claude/.claude-plugin/plugin.json`. Use `<VERSION>` as the canonical placeholder when referencing version numbers. |
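
The first two annotation formats listed under File Annotation Formats can be illustrated with a small parser. This is a rough sketch and not the code in `apply.py`: the function name, the regexes, and the assumption that a `FILE:` marker sits on the first line of the block are all illustrative. The third format (comment header) is left out because the regex in `shared/patterns.py` is not shown here.

```python
import re

FENCE = "`" * 3  # built at runtime so this example does not nest literal fences

# Opening fence with optional "lang" and optional ":path", then the block body.
BLOCK_RE = re.compile(
    FENCE + r"([\w+-]*)(?::(\S+))?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)
# Assumed marker shape: "FILE: path" (optionally behind "#") on a line of its own.
FILE_MARKER_RE = re.compile(r"^\s*(?:#\s*)?FILE:\s*(\S+)\s*$")


def extract_annotated_blocks(text: str) -> dict:
    """Map target path -> code for blocks annotated as lang:path or FILE: path."""
    files = {}
    for _lang, path, body in BLOCK_RE.findall(text):
        lines = body.splitlines()
        if not path and lines:
            marker = FILE_MARKER_RE.match(lines[0])
            if marker:
                # Use the marker's path and drop the marker line from the code.
                path, lines = marker.group(1), lines[1:]
        if path:
            files[path] = "\n".join(lines).rstrip("\n") + "\n"
    return files
```

Blocks with no recognizable path are skipped rather than written anywhere, which keeps unannotated snippets in a model response from touching the disk.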
🧩 Analysis chain

🌐 Web query:
For the current Python OpenAI SDK, what are the valid streaming call patterns? Is `client.chat.create(...).stream()` valid? Also confirm whether examples using `os.getenv(...)` require `import os`.

💡 Result:

1) Chat Completions streaming (Python SDK)

Valid pattern (SSE token streaming): this is driven by the `stream` request parameter on the Chat Completions create call. [1]

Notes:
- `client.chat.completions.create(...)` (not `client.chat.create(...)`). [2]
- `client.chat.completions.with_streaming_response.create(...)` (context manager + `iter_lines()`, etc.). [3]

2) Responses API streaming (Python SDK)

Valid pattern (SSE events): `client.responses.create(..., stream=True)` (and `async for` with `AsyncOpenAI`). [3][4]

Is `client.chat.create(...).stream()` valid?

No. `client.chat.completions.create(..., stream=True)` is the supported streaming shape for Chat Completions (and `client.responses.create(..., stream=True)` for Responses). [2][3] `.stream()` is not the documented Python call pattern for those endpoints. [2][3]

Do examples using `os.getenv(...)` require `import os`?

Yes. `os.getenv(...)` requires `import os` (or `import os as ...`) in that module. The official Python examples that use env vars explicitly `import os` (often using `os.environ.get(...)`, which likewise requires `import os`). [3]

Sources:
[1] https://platform.openai.com/docs/api-reference/chat/create/
[2] https://github.com/openai/openai-python/blob/main/api.md
[3] https://github.com/openai/openai-python
[4] https://platform.openai.com/docs/guides/streaming-responses

Fix missing `import os` and correct the streaming API pattern.

The code snippet uses `os.getenv()` on line 109 without importing `os` at the top of the file; this will fail at runtime with a `NameError`. Additionally, the streaming examples in this section (lines 107–113, also 141–151) likely use `client.chat.create(...).stream()`, which isn't a valid pattern in the Python OpenAI SDK. The correct approach is to pass `stream=True` to the create call itself: `client.chat.completions.create(..., stream=True)`. Think of it like the difference between asking for a fire hose with `.stream()` at the end versus telling the vendor upfront you want the streaming version of the product.

Add `import os` at the top of the file and update the streaming calls to use the `stream=True` parameter instead of the `.stream()` method suffix.
@coderabbitai Not addressed (reviewer appears to be incorrect):
- The reviewer wants `import os` added at the top of the file, but this is a Markdown documentation file, not Python code. The `import os` is correctly placed inside a code example.
- The reviewer also wants to change `client.chat.create().stream()` to `client.chat.completions.create(..., stream=True)`, but this file documents the xAI SDK, which uses the `chat.create().stream()` pattern. The OpenAI SDK uses `stream=True` as a parameter.

The core PGP fix is correct and pushed. The SKILL.md comments seem to conflate xAI SDK examples with OpenAI SDK patterns.

Ready to re-review PR #27 when you get a chance.
🧩 Analysis chain

🌐 Web query:
💡 Result:

In the official `xai-sdk` Python SDK, you stream from the chat object returned by `client.chat.create(...)`: `chat.stream()` yields `(response, chunk)` pairs where `chunk.content` is the text delta and `response` is the accumulating final message. [1][2]

🌐 Web query:
💡 Result:

xAI Python SDK (`client.chat.create(...).stream()`)
- Create a chat (`chat = client.chat.create(...)`), `chat.append(...)`, then stream via `chat.stream()`.
- The stream yields `(response, chunk)`:
  - `chunk` contains text deltas (print `chunk.content` as they arrive)
  - `response` is auto-accumulated as streaming progresses (after the loop, `response` is the full message; you typically `chat.append(response)`). [1]

OpenAI Python SDK (`stream=True`)

You stream by passing `stream=True` on the request (no `(response, chunk)` tuple).

A) Responses API (recommended)
- `stream = client.responses.create(..., stream=True)`
- Events are distinguished by `type` (e.g., `response.output_text.delta`, `response.completed`, `error`). You handle deltas by filtering events. [2]

B) Chat Completions API (legacy)
- `stream = client.chat.completions.create(..., stream=True)`
- Deltas arrive in `chunk.choices[0].delta.content` (you generally accumulate into a string yourself). [2]

In short: xAI streaming is a chat-session method that yields (final-so-far response, delta chunk), while OpenAI streaming is a request flag (`stream=True`) that yields events/chunks only (Responses: typed events by `event.type`; Chat Completions: `choices[0].delta`). [1][2]

@KHAEntertainment, you are absolutely correct on both counts. I apologize for the inaccurate review comments.
- The `import os` suggestion was wrong. `SKILL.md` is a Markdown documentation file. The `os.getenv()` usage lives inside Python code example blocks that already include `import os`; there's no "top of the file" to add an import to in a Markdown context.
- `chat.stream()` is the correct xAI SDK pattern. In the official `xai-sdk` Python SDK, streaming is done from the chat object returned by `client.chat.create(...)` by calling `chat.stream()`. The xAI SDK uses a stateful chat helper (`chat = client.chat.create(...)`), then iterates via `chat.stream()`, which yields `(response, chunk)` tuples where `chunk` contains text deltas and `response` is auto-accumulated. In contrast, the OpenAI SDK streams by passing `stream=True` on the request itself. These are fundamentally different streaming patterns; I incorrectly conflated the two.

The `SKILL.md` documentation is correct as-written. Sorry for the noise!
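
The difference between the two streaming shapes discussed in this thread can be shown without calling either service. The sketch below uses stand-in classes and generators (`Chunk`, `Response`, `xai_style_stream`, `openai_style_stream`), all of which are hypothetical and belong to neither SDK. It only mimics who accumulates the text: the library in the xAI-style loop, the caller in the OpenAI-style loop.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Chunk:
    content: str  # text delta for this step


@dataclass
class Response:
    content: str = ""  # text accumulated so far


def xai_style_stream(deltas: List[str]) -> Iterator[Tuple[Response, Chunk]]:
    """Stand-in for chat.stream(): yields (response-so-far, delta) pairs."""
    response = Response()
    for delta in deltas:
        response.content += delta
        yield response, Chunk(delta)


def openai_style_stream(deltas: List[str]) -> Iterator[Chunk]:
    """Stand-in for a stream=True iterator: yields deltas only."""
    for delta in deltas:
        yield Chunk(delta)


deltas = ["Hel", "lo", "!"]

# xAI-style: the library accumulates; the last response holds the full text.
response = Response()
for response, chunk in xai_style_stream(deltas):
    pass
xai_text = response.content

# OpenAI-style: the caller joins the deltas into the final text.
openai_text = "".join(chunk.content for chunk in openai_style_stream(deltas))
```

Both loops end with the same text; the practical difference is that code written for one shape cannot iterate the other without an adapter.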