Add Clawd voice agent with vision and Solana trading tools #2
Browser-based voice agent on AssemblyAI Voice Agent API. Streams mic/24kHz TTS, pushes camera or screen frames every 1.5s, and calls live tools: Jupiter price + swap quote, Solana RPC balance/slot/TPS, and Claude vision for chart and screen analysis. Includes src/solana-clawd-voice-agent.json matching the library schema.
LiveKit Python agent at livekit-agent/ pairs AssemblyAI Universal-3 Pro Streaming, OpenAI GPT-4.1, Cartesia Sonic-3, Silero VAD, and LiveKit BVC noise cancellation. It subscribes to the first remote video track and samples frames into a shared buffer for the analyze_vision tool (Claude Haiku 4.5). Both agents (Node browser and LiveKit Python) now route trades through DFlow's /order endpoint as the primary source, which returns a route plan, price impact, execution mode, and a signable transaction; Jupiter is retained as a fallback. Also adds get_priority_fees via DFlow and a configurable SOLANA_RPC_URL covering mainnet-beta, Helius, Triton, Ankr, and GenesysGo. Adds src/solana-clawd-livekit-voice-agent.json describing the LiveKit pipeline, vision config, tool catalog, and providers.
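The primary-then-fallback routing described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `quoteWithFallback` and the `{ name, getQuote }` source shape are hypothetical names, and the actual DFlow and Jupiter request and response formats are not shown here.

```javascript
// Sketch only: try quote sources in priority order and return the first success.
// The `sources` entries ({ name, getQuote }) are hypothetical stand-ins,
// not the API of the PR's tools.js.
async function quoteWithFallback(request, sources) {
  const errors = [];
  for (const { name, getQuote } of sources) {
    try {
      const quote = await getQuote(request);
      // Tag the result so callers can tell which source answered.
      return { source: name, quote };
    } catch (err) {
      // Record the failure and fall through to the next source.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`all quote sources failed: ${errors.join("; ")}`);
}
```

A caller would pass the DFlow-backed function first and the Jupiter-backed one second, so that Jupiter is consulted only when the DFlow call throws.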
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9ffd7297a
    toolWs.addEventListener("message", (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "tool.result") {
        pendingTools.push({ call_id: msg.call_id, result: msg.result });
      }
Flush late tool results for the current turn
When a server-side tool call finishes after the agent's short transition reply has already emitted reply.done (for example, Claude vision or a slow Jupiter/DFlow request), this handler appends the result after the only flush point has passed. AssemblyAI then does not receive the result for that call until a later turn's reply.done, so the current tool call stalls or gets stale output. Track in-flight tool promises and await/flush them in the matching reply.done, or explicitly handle results that arrive after the gate has passed.
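One way to handle results that arrive after the gate has passed is sketched below. It is a minimal illustration under assumptions, not the reviewed code: `createToolResultGate`, its method names, and the `send` callback are hypothetical, and the message shape the Voice Agent API expects for tool results is not shown in this PR excerpt.

```javascript
// Sketch only: queue tool results until reply.done, and forward results that
// arrive after reply.done immediately instead of leaving them for a later turn.
// `send` is a hypothetical callback that delivers one result to the voice session.
function createToolResultGate(send) {
  const pending = [];          // results that arrived before reply.done
  const awaiting = new Set();  // call_ids issued but not yet answered
  const late = new Set();      // call_ids whose reply.done has already passed

  return {
    // Call when the agent issues a tool call.
    onToolCall(callId) {
      awaiting.add(callId);
    },
    // Call from the "tool.result" message handler.
    onToolResult(callId, result) {
      awaiting.delete(callId);
      const item = { call_id: callId, result };
      if (late.delete(callId)) {
        send(item);            // gate already passed: forward right away
      } else {
        pending.push(item);    // normal path: wait for reply.done
      }
    },
    // Call when reply.done fires for the current turn.
    onReplyDone() {
      for (const item of pending.splice(0)) send(item);
      // Anything still unanswered will now arrive late.
      for (const callId of awaiting) late.add(callId);
    },
  };
}
```

The design choice here is to key the late path on call_id rather than on a single boolean, so that a slow call from one turn does not change how results for a later turn's calls are queued.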
    async def _consume_video(track: rtc.Track, tools: ClawdTools) -> None:
        """Sample frames from a remote video track into the shared latest-frame buffer."""
        stream = rtc.VideoStream(track)
Request an RGBA video stream before decoding frames
This stream is created without a requested format, but _consume_video later decodes every frame as Image.frombytes("RGBA", ...). For LiveKit camera/screen tracks whose native frame format is not RGBA, those frames raise in the decode block and are silently skipped, leaving analyze_vision with no image. Request rtc.VideoBufferType.RGBA from VideoStream, or convert each frame before passing its bytes to PIL.
What does this PR do?
Introduces Clawd, a voice-first Solana trading copilot with two implementations:
- voice-agent/: Browser-based agent with real-time voice I/O, camera/screen vision, and trading tools via a Node.js backend.
- livekit-agent/: Python-based agent for LiveKit deployments with AssemblyAI STT, OpenAI LLM, Cartesia TTS, and Claude vision.

Both agents share the same core capabilities:
- ... /order endpoint)

Type of change
Key files
Voice Agent (AssemblyAI):
- voice-agent/server.js: Express backend, token minting, tool dispatch
- voice-agent/tools.js: Jupiter, Solana RPC, DFlow, Claude vision integrations
- voice-agent/web/voice-agent.js: Browser client with mic/camera, WebSocket to AssemblyAI and local tools
- voice-agent/web/index.html: UI with video feed and transcript
- voice-agent/web/pcm-processor.js: AudioWorklet for PCM16 resampling

LiveKit Agent (Python):
- livekit-agent/agent.py: LiveKit agent with function tools and video frame sampling
- livekit-agent/tools.py: Async tool implementations (Jupiter, Solana RPC, DFlow, Claude vision)

Configuration & Docs:
- .env.example files for both implementations
- README.md files with setup and architecture diagrams

Checklist
- ... .env.example

https://claude.ai/code/session_01FstevKPZkS83jiScpTizd2