Add Clawd voice agent with vision and Solana trading tools #2

Merged: x402agent merged 2 commits into newnew from claude/voice-agent-trading-2dMQJ on May 15, 2026

@x402agent
Owner

What does this PR do?

Introduces Clawd, a voice-first Solana trading copilot with two implementations:

  1. AssemblyAI Voice Agent API (voice-agent/) — Browser-based agent with real-time voice I/O, camera/screen vision, and trading tools via a Node.js backend.
  2. LiveKit Agents (livekit-agent/) — Python-based agent for LiveKit deployments with AssemblyAI STT, OpenAI LLM, Cartesia TTS, and Claude vision.

Both agents share the same core capabilities:

  • Token pricing via Jupiter API
  • Wallet balance queries via Solana RPC
  • Swap quotes from Jupiter (v6) and DFlow Trading API (/order endpoint)
  • Network status (slot, TPS) and priority fee estimates
  • Vision analysis of camera/screen frames using Claude
  • Conversational trading interface with short, opinionated responses

Type of change

  • New feature
  • Bug fix
  • Documentation update
  • Refactoring
  • Other

Key files

Voice Agent (AssemblyAI):

  • voice-agent/server.js — Express backend, token minting, tool dispatch
  • voice-agent/tools.js — Jupiter, Solana RPC, DFlow, Claude vision integrations
  • voice-agent/web/voice-agent.js — Browser client with mic/camera, WebSocket to AssemblyAI and local tools
  • voice-agent/web/index.html — UI with video feed and transcript
  • voice-agent/web/pcm-processor.js — AudioWorklet for PCM16 resampling
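
The PCM16 resampling step named above can be illustrated with a minimal sketch. It is not the code in pcm-processor.js: the function names `resampleLinear` and `floatToPcm16` are hypothetical, the sample rates are parameters rather than values taken from this PR, and linear interpolation is used only for brevity (it applies no anti-aliasing filter).

```javascript
// Hypothetical helpers, not taken from pcm-processor.js.

// Resample mono Float32 samples by linear interpolation.
// Assumption: no low-pass filtering, so downsampling may alias.
function resampleLinear(input, inRate, outRate) {
  if (inRate === outRate) return Float32Array.from(input);
  const outLength = Math.floor((input.length * outRate) / inRate);
  const output = new Float32Array(outLength);
  const step = inRate / outRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM, clamping overshoot.
function floatToPcm16(input) {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    output[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return output;
}
```

Inside an AudioWorklet, the `process()` callback would typically run each input block through these two steps and post the resulting `Int16Array` buffer to the main thread.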

LiveKit Agent (Python):

  • livekit-agent/agent.py — LiveKit agent with function tools and video frame sampling
  • livekit-agent/tools.py — Async tool implementations (Jupiter, Solana RPC, DFlow, Claude vision)

Configuration & Docs:

  • .env.example files for both implementations
  • README.md files with setup and architecture diagrams
  • JSON config files for agent marketplaces

Checklist

  • Code follows project conventions (background terminals, Solana-specific agents)
  • Both implementations tested locally (AssemblyAI and LiveKit pipelines)
  • Documentation includes setup, architecture, and tool reference
  • Environment variables documented in .env.example

https://claude.ai/code/session_01FstevKPZkS83jiScpTizd2

claude added 2 commits May 15, 2026 20:36
Browser-based voice agent on AssemblyAI Voice Agent API. Streams mic/24kHz
TTS, pushes camera or screen frames every 1.5s, and calls live tools:
Jupiter price + swap quote, Solana RPC balance/slot/TPS, and Claude vision
for chart and screen analysis. Includes src/solana-clawd-voice-agent.json
matching the library schema.
LiveKit Python agent at livekit-agent/ pairs AssemblyAI Universal-3 Pro
Streaming, OpenAI GPT-4.1, Cartesia Sonic-3, Silero VAD, and LiveKit BVC
noise cancellation. Subscribes to the first remote video track and samples
frames into a shared buffer for the analyze_vision tool (Claude Haiku 4.5).

Both agents (Node browser and LiveKit Python) now route trades through
DFlow's /order endpoint as the primary source — route plan, price impact,
execution mode, and a signable transaction — with Jupiter retained as a
fallback. Added get_priority_fees via DFlow and configurable
SOLANA_RPC_URL covering mainnet-beta, Helius, Triton, Ankr, and GenesysGo.

Adds src/solana-clawd-livekit-voice-agent.json describing the LiveKit
pipeline, vision config, tool catalog, and providers.
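
The primary/fallback routing described in the commit message can be sketched as follows. This is an illustration only: `getSwapQuote`, `fetchDflowOrder`, and `fetchJupiterQuote` are hypothetical names, and the fetchers are injected so the sketch assumes nothing about the real DFlow or Jupiter request and response shapes.

```javascript
// Hypothetical sketch: try the primary source first, fall back on failure.
// The two fetchers are supplied by the caller; their signatures are assumptions.
async function getSwapQuote(params, { fetchDflowOrder, fetchJupiterQuote }) {
  try {
    const order = await fetchDflowOrder(params);
    return { source: "dflow", quote: order };
  } catch (primaryError) {
    const primaryReason = String(primaryError?.message ?? primaryError);
    try {
      const quote = await fetchJupiterQuote(params);
      // Record why the primary source was skipped so callers can log it.
      return { source: "jupiter", quote, fallbackReason: primaryReason };
    } catch (fallbackError) {
      const fallbackReason = String(fallbackError?.message ?? fallbackError);
      throw new Error(
        `Both quote sources failed: dflow (${primaryReason}); jupiter (${fallbackReason})`
      );
    }
  }
}
```

Returning the `source` alongside the quote lets the voice layer say which venue produced the numbers it reads out.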
@x402agent x402agent merged commit 9eae7dd into newnew May 15, 2026
1 of 3 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9ffd7297a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +226 to +230
toolWs.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "tool.result") {
    pendingTools.push({ call_id: msg.call_id, result: msg.result });
  }

P1: Flush late tool results for the current turn

When a server-side tool call finishes after the agent's short transition reply has already emitted reply.done (for example, Claude vision or a slow Jupiter/DFlow request), this handler appends the result after the only flush point has passed. AssemblyAI then does not receive the result for that call until a later turn's reply.done, so the current tool call stalls or gets stale output. Track in-flight tool promises and await/flush them in the matching reply.done, or explicitly handle results that arrive after the gate has passed.
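
One possible shape for the second option (handling results that arrive after the gate has passed) is sketched below. The helper `createToolGate` and its method names are hypothetical, and `send` stands in for whatever the client uses to forward a tool result to AssemblyAI; the actual message format is not assumed here.

```javascript
// Hypothetical sketch: buffer tool results until reply.done, and send late
// results immediately instead of waiting for a later turn.
function createToolGate(send) {
  const pending = new Set(); // call_ids still running on the tool server
  let ready = [];            // finished results not yet sent
  let gatePassed = false;    // has reply.done fired for the current reply?

  function flush() {
    const batch = ready;
    ready = [];
    for (const item of batch) send(item);
  }

  return {
    // Call when a new agent reply begins, so results are buffered again.
    onReplyStarted() { gatePassed = false; },
    // Call when the agent requests a tool.
    onToolCall(callId) { pending.add(callId); },
    // Call from the toolWs "message" handler for tool.result messages.
    onToolResult(callId, result) {
      if (!pending.delete(callId)) return; // unknown or duplicate result
      ready.push({ call_id: callId, result });
      if (gatePassed) flush(); // late result: flush now
    },
    // Call when reply.done arrives.
    onReplyDone() { gatePassed = true; flush(); },
    pendingCount() { return pending.size; },
  };
}
```

With this shape, the existing handler would call `gate.onToolResult(msg.call_id, msg.result)` instead of pushing to `pendingTools` directly, and `pendingCount()` offers a way to detect calls that never return.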

Comment thread livekit-agent/agent.py

async def _consume_video(track: rtc.Track, tools: ClawdTools) -> None:
    """Sample frames from a remote video track into the shared latest-frame buffer."""
    stream = rtc.VideoStream(track)

P2: Request an RGBA video stream before decoding frames

For LiveKit camera/screen tracks whose native frame format is not RGBA, this stream is created without a requested format, but _consume_video later decodes every frame with Image.frombytes("RGBA", ...). Those frames raise in the decode block and are silently skipped, leaving analyze_vision with no image. Request rtc.VideoBufferType.RGBA from VideoStream, or convert each frame before passing its bytes to PIL.