Add Clawd voice agent with vision and Solana trading tools #2
Merged
2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -49,3 +49,9 @@ sperax-ai-agents 2/ | |
| .DS_Store | ||
| __MACOSX | ||
|
|
||
|
|
||
| # Python | ||
| __pycache__/ | ||
| *.pyc | ||
| livekit-agent/.env | ||
| livekit-agent/.env.local | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| LIVEKIT_URL=wss://your-project.livekit.cloud | ||
| LIVEKIT_API_KEY=your_livekit_api_key | ||
| LIVEKIT_API_SECRET=your_livekit_api_secret | ||
| ASSEMBLYAI_API_KEY=your_assemblyai_key | ||
| ANTHROPIC_API_KEY=your_anthropic_key | ||
| OPENAI_API_KEY=your_openai_key | ||
| CARTESIA_API_KEY=your_cartesia_key | ||
| DFLOW_API_KEY= | ||
| DFLOW_API_URL=https://quote-api.dflow.net | ||
|
|
||
| # Primary Solana RPC. Examples: | ||
| # https://api.mainnet-beta.solana.com | ||
| # https://mainnet.helius-rpc.com/?api-key=YOUR_HELIUS_KEY | ||
| # https://api.solana.fm | ||
| # https://rpc.ankr.com/solana | ||
| # https://ssc-dao.genesysgo.net | ||
| SOLANA_RPC_URL=https://api.mainnet-beta.solana.com | ||
| SOLANA_RPC_URLS= |
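
The two RPC variables above could be combined into one ordered endpoint list along these lines. This is a minimal sketch under assumptions: the PR's `tools.py` is not shown in this diff, so `load_rpc_endpoints` and `DEFAULT_RPC` are hypothetical names, and the primary-first, de-duplicated ordering is one reasonable reading of "comma-separated fallbacks", not confirmed behavior.

```python
from __future__ import annotations

import os
from typing import List, Mapping, Optional

# Assumed default, matching the value shipped in .env.example.
DEFAULT_RPC = "https://api.mainnet-beta.solana.com"


def load_rpc_endpoints(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the primary RPC URL followed by de-duplicated fallbacks."""
    env = os.environ if env is None else env
    primary = env.get("SOLANA_RPC_URL", "").strip() or DEFAULT_RPC
    # SOLANA_RPC_URLS is comma-separated; tolerate spaces and empty entries.
    fallbacks = [u.strip() for u in env.get("SOLANA_RPC_URLS", "").split(",")]
    ordered: List[str] = []
    for url in [primary, *fallbacks]:
        if url and url not in ordered:
            ordered.append(url)
    return ordered
```

A caller would try the endpoints in list order and move to the next one on a connection error or timeout.
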
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # 🦞 Clawd LiveKit Voice Agent | ||
|
|
||
| A Python LiveKit agent that handles voice, vision, and Solana trading. Built on the LiveKit Agents SDK with AssemblyAI Universal-3 Pro Streaming for STT, OpenAI GPT-4.1 for reasoning, Cartesia Sonic-3 for TTS, and Claude Haiku 4.5 for vision. | ||
|
|
||
| ## Pipeline | ||
|
|
||
| | Stage | Provider | | ||
| | --- | --- | | ||
| | STT | AssemblyAI `u3-rt-pro` (punctuation-based EOT) | | ||
| | Turn detection | AssemblyAI STT (`min_turn_silence=100`, `max_turn_silence=1000`) | | ||
| | LLM | OpenAI `gpt-4.1` | | ||
| | TTS | Cartesia `sonic-3` | | ||
| | Noise cancellation | LiveKit BVC | | ||
| | Vision | Anthropic Claude `haiku-4.5` | | ||
| | Trading | DFlow Trading API `/order` (primary), Jupiter (price + comparison) | | ||
| | RPC | Configurable: mainnet beta, Helius, Triton, Ankr, etc. | | ||
|
|
||
| ## Tools | ||
|
|
||
| | Tool | What it does | | ||
| | --- | --- | | ||
| | `get_token_price` | Jupiter price for symbol or mint | | ||
| | `get_wallet_balance` | SOL balance via Solana RPC | | ||
| | `quote_swap` | Jupiter v6 swap quote | | ||
| | `quote_dflow_order` | DFlow `/order` quote with route plan, price impact, execution mode | | ||
| | `get_priority_fees` | Live DFlow priority fee estimates | | ||
| | `get_network_status` | Slot and recent TPS | | ||
| | `analyze_vision` | Claude vision over the latest video frame | | ||
| | `list_supported_tokens` | Known symbols | | ||
|
|
||
| ## Quick start | ||
|
|
||
| ```bash | ||
| cd livekit-agent | ||
| cp .env.example .env | ||
| # fill in keys (see below) | ||
| pip install -r requirements.txt | ||
| python agent.py download-files # silero, turn detector, noise cancellation | ||
| python agent.py dev | ||
| ``` | ||
|
|
||
| Then connect via the [LiveKit Agents Playground](https://agents-playground.livekit.io) or your own LiveKit frontend. | ||
|
|
||
| ## Required env vars | ||
|
|
||
| | Var | Required | Notes | | ||
| | --- | --- | --- | | ||
| | `LIVEKIT_URL` | yes | `wss://<project>.livekit.cloud` | | ||
| | `LIVEKIT_API_KEY` | yes | | | ||
| | `LIVEKIT_API_SECRET` | yes | | | ||
| | `ASSEMBLYAI_API_KEY` | yes | STT | | ||
| | `OPENAI_API_KEY` | yes | LLM | | ||
| | `CARTESIA_API_KEY` | yes | TTS | | ||
| | `ANTHROPIC_API_KEY` | for vision | Falls back to "vision unavailable" if missing | | ||
| | `DFLOW_API_KEY` | for DFlow trading | Falls back to Jupiter only if missing | | ||
| | `SOLANA_RPC_URL` | optional | Defaults to mainnet beta | | ||
| | `SOLANA_RPC_URLS` | optional | Comma-separated fallbacks | | ||
|
|
||
| ## Deploy | ||
|
|
||
| ```bash | ||
| lk agent create | ||
| ``` | ||
|
|
||
| Registers and deploys to LiveKit Cloud. See [LiveKit Agents docs](https://docs.livekit.io/agents/) for production deployment options. | ||
|
|
||
| ## Notes | ||
|
|
||
| - The agent quotes trades. It does not sign or submit. The user signs the `transaction` returned by `/order` in their own wallet. | ||
| - Vision frames are sampled at ~1Hz from the first subscribed remote video track. `analyze_vision` always uses the latest. | ||
| - The agent uses STT-driven turn detection (recommended for U3 Pro Streaming). `min_turn_silence=100`, `max_turn_silence=1000`. | ||
| - For dictation of long entities like email or wallet addresses, raise `max_turn_silence` mid-stream via `stt.update_options(...)`. |
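
The ~1 Hz sampling described in the notes can be isolated into a small throttle. This is a sketch, not the PR's code: `FrameThrottle` and its injectable `clock` are hypothetical names, introduced so the timing logic can be exercised without a live video track.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Accept at most one frame per `interval` seconds."""

    def __init__(
        self,
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._interval = interval
        self._clock = clock
        self._last: Optional[float] = None  # time of the last accepted frame

    def accept(self) -> bool:
        """Return True if enough time has passed to keep the current frame."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            return False  # too soon since the last accepted frame: drop it
        self._last = now
        return True
```

Inside a frame loop, each incoming frame would be decoded only when `accept()` returns `True`, so decoding cost stays bounded regardless of the track's frame rate.
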
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,222 @@ | ||
| """Clawd: LiveKit voice agent with vision + Solana trading tools. | ||
|
|
||
| STT : AssemblyAI Universal-3 Pro Streaming (u3-rt-pro) | ||
| LLM : OpenAI gpt-4.1 | ||
| TTS : Cartesia Sonic-3 | ||
| VAD : Silero | ||
| Turn : AssemblyAI punctuation-based EOT (STT-driven) | ||
| """ | ||
| from __future__ import annotations | ||
|
|
||
| import asyncio | ||
| import logging | ||
| from typing import Annotated | ||
|
|
||
| from dotenv import load_dotenv | ||
| from livekit import agents, rtc | ||
| from livekit.agents import ( | ||
| Agent, | ||
| AgentSession, | ||
| AgentServer, | ||
| JobContext, | ||
| RoomInputOptions, | ||
| TurnHandlingOptions, | ||
| function_tool, | ||
| ) | ||
| from livekit.plugins import assemblyai, cartesia, noise_cancellation, openai, silero | ||
| from PIL import Image | ||
|
|
||
| from tools import ClawdTools | ||
|
|
||
| load_dotenv(".env.local") | ||
| load_dotenv() | ||
|
|
||
| log = logging.getLogger("clawd") | ||
|
|
||
| INSTRUCTIONS = """You are Clawd, a Solana trading copilot on a voice call. You see what the user shows on camera or screen and help with trades. | ||
|
|
||
| BE SHORT. Keep replies to one or two sentences. If a reply has a comma, see if it can stop at the comma. | ||
|
|
||
| You're a trader on a call, not a feature tour. Have opinions. You can be a little dry. Don't hedge everything. | ||
|
|
||
| Never say: "certainly", "absolutely", "happy to help", "great question", "I'd be happy to", "let me walk you through". | ||
|
|
||
| Tools you can use: | ||
| - get_token_price for live USD prices | ||
| - get_wallet_balance for SOL balances by address | ||
| - quote_dflow_order is the PRIMARY routing for any "what would I get if I swapped X for Y" question. Returns the route plan, price impact, execution mode, and a signable transaction when a wallet is provided. | ||
| - quote_swap is the Jupiter fallback. Use it for cross-checks or when DFlow has no route. | ||
| - get_priority_fees for current micro-lamports per CU at medium/high/very-high tiers | ||
| - get_network_status for Solana slot and TPS | ||
| - analyze_vision whenever the user asks "what do you see", "look at this", "what's on my screen", or references a chart | ||
| - list_supported_tokens if the user asks which symbols you know | ||
|
|
||
| While a tool runs, say "one sec" or "checking" - never longer. | ||
|
|
||
| Read prices naturally: "around 142 dollars", not "142.3847". Read addresses by their first three and last three characters unless asked to spell them. | ||
| No markdown, no bullets. Plain spoken sentences only. | ||
|
|
||
| You CANNOT sign transactions or move funds. If asked to actually execute a trade, say you can quote it but the user has to sign in their wallet. | ||
| You CANNOT look things up on the internet beyond your tools. If asked about news or off-chain data, say so and offer what you can do. | ||
| """ | ||
|
|
||
| GREETING = "Hey, Clawd here. What are we trading?" | ||
|
|
||
|
|
||
| class ClawdAgent(Agent): | ||
| def __init__(self, tools: ClawdTools) -> None: | ||
| super().__init__(instructions=INSTRUCTIONS) | ||
| self._tools = tools | ||
|
|
||
| @function_tool | ||
| async def get_token_price( | ||
| self, | ||
| token: Annotated[str, "Token symbol (SOL, USDC, JUP, BONK, WIF, JTO, PYTH) or mint address."], | ||
| ) -> dict: | ||
| """Get the current USD price of a Solana token.""" | ||
| return await self._tools.get_token_price(token) | ||
|
|
||
| @function_tool | ||
| async def get_wallet_balance( | ||
| self, address: Annotated[str, "Solana wallet public key."] | ||
| ) -> dict: | ||
| """Get the SOL balance for a Solana wallet.""" | ||
| return await self._tools.get_wallet_balance(address) | ||
|
|
||
| @function_tool | ||
| async def quote_swap( | ||
| self, | ||
| input_token: Annotated[str, "Input token symbol or mint."], | ||
| output_token: Annotated[str, "Output token symbol or mint."], | ||
| amount: Annotated[float, "Amount of input token in whole units."], | ||
| slippage_bps: Annotated[int, "Slippage tolerance in bps (50 = 0.5%)."] = 50, | ||
| ) -> dict: | ||
| """Get a Jupiter v6 swap quote between two Solana tokens.""" | ||
| return await self._tools.quote_swap(input_token, output_token, amount, slippage_bps) | ||
|
|
||
| @function_tool | ||
| async def quote_dflow_order( | ||
| self, | ||
| input_token: Annotated[str, "Input token symbol or mint."], | ||
| output_token: Annotated[str, "Output token symbol or mint."], | ||
| amount: Annotated[float, "Amount of input token in whole units."], | ||
| slippage_bps: Annotated[int, "Max slippage in bps. Omit for auto."] = None, | ||
| user_public_key: Annotated[ | ||
| str, | ||
| "Optional user wallet pubkey. When provided, response includes a signable transaction.", | ||
| ] = None, | ||
| ) -> dict: | ||
| """Get a DFlow Trading API /order quote — the primary routing source for swaps.""" | ||
| return await self._tools.quote_dflow_order( | ||
| input_token, output_token, amount, slippage_bps, user_public_key | ||
| ) | ||
|
|
||
| @function_tool | ||
| async def get_priority_fees(self) -> dict: | ||
| """Get live Solana priority fee tiers (micro-lamports per CU) via DFlow.""" | ||
| return await self._tools.get_priority_fees() | ||
|
|
||
| @function_tool | ||
| async def get_network_status(self) -> dict: | ||
| """Get the current Solana slot height and recent TPS.""" | ||
| return await self._tools.get_network_status() | ||
|
|
||
| @function_tool | ||
| async def list_supported_tokens(self) -> dict: | ||
| """List the token symbols this agent knows by name without a mint.""" | ||
| return await self._tools.list_supported_tokens() | ||
|
|
||
| @function_tool | ||
| async def analyze_vision( | ||
| self, | ||
| question: Annotated[str, "What specifically to focus on in the user's camera/screen frame."], | ||
| ) -> dict: | ||
| """Describe what the user is currently showing on camera or screen. | ||
|
|
||
| Use this whenever the user asks "what do you see", "look at this", "check | ||
| my chart", or references something on screen. | ||
| """ | ||
| return await self._tools.analyze_vision(question) | ||
|
|
||
|
|
||
| async def _consume_video(track: rtc.Track, tools: ClawdTools) -> None: | ||
| """Sample frames from a remote video track into the shared latest-frame buffer.""" | ||
| stream = rtc.VideoStream(track) | ||
| last_capture = 0.0 | ||
| interval = 1.0 | ||
| try: | ||
| async for ev in stream: | ||
| now = asyncio.get_running_loop().time() | ||
| if now - last_capture < interval: | ||
| continue | ||
| last_capture = now | ||
| frame = ev.frame | ||
| try: | ||
| pil = Image.frombytes( | ||
| "RGBA", (frame.width, frame.height), frame.data, "raw", "RGBA" | ||
| ) | ||
| except Exception: | ||
| continue | ||
| tools.frame.update_from_pil(pil) | ||
| finally: | ||
| await stream.aclose() | ||
|
|
||
|
|
||
| def _attach_vision(ctx: JobContext, tools: ClawdTools) -> None: | ||
| @ctx.room.on("track_subscribed") | ||
| def _on_track( | ||
| track: rtc.Track, | ||
| publication: rtc.TrackPublication, | ||
| participant: rtc.RemoteParticipant, | ||
| ) -> None: | ||
| if track.kind == rtc.TrackKind.KIND_VIDEO: | ||
| log.info("subscribed to video track from %s", participant.identity) | ||
| asyncio.create_task(_consume_video(track, tools)) | ||
|
|
||
|
|
||
| server = AgentServer() | ||
|
|
||
|
|
||
| @server.rtc_session(agent_name="clawd-voice-agent") | ||
| async def clawd(ctx: JobContext) -> None: | ||
| tools = ClawdTools() | ||
| _attach_vision(ctx, tools) | ||
|
|
||
| session = AgentSession( | ||
| stt=assemblyai.STT( | ||
| model="u3-rt-pro", | ||
| min_turn_silence=100, | ||
| max_turn_silence=1000, | ||
| vad_threshold=0.3, | ||
| keyterms_prompt=[ | ||
| "Solana", "Jupiter", "Raydium", "Orca", "Phoenix", | ||
| "USDC", "SOL", "BONK", "JUP", "WIF", "JTO", "PYTH", | ||
| "Clawd", "AssemblyAI", | ||
| ], | ||
| ), | ||
| llm=openai.LLM(model="gpt-4.1"), | ||
| tts=cartesia.TTS(model="sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"), | ||
| vad=silero.VAD.load(activation_threshold=0.3), | ||
| turn_handling=TurnHandlingOptions( | ||
| turn_detection="stt", | ||
| endpointing={"min_delay": 0}, | ||
| ), | ||
| ) | ||
|
|
||
| try: | ||
| await session.start( | ||
| room=ctx.room, | ||
| agent=ClawdAgent(tools), | ||
| room_input_options=RoomInputOptions( | ||
| video_enabled=True, | ||
| noise_cancellation=noise_cancellation.BVC(), | ||
| ), | ||
| ) | ||
| await session.generate_reply(instructions=f'Say exactly: "{GREETING}"') | ||
| await ctx.wait_for_disconnect() | ||
| finally: | ||
| await tools.aclose() | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| agents.cli.run_app(server) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| livekit-agents[assemblyai,silero,openai,cartesia,turn-detector,noise-cancellation]~=1.5 | ||
| livekit~=1.0 | ||
| python-dotenv>=1.0.0 | ||
| solana>=0.34.0 | ||
| solders>=0.21.0 | ||
| anthropic>=0.39.0 | ||
| aiohttp>=3.9.0 | ||
| Pillow>=10.0.0 |
For LiveKit camera/screen tracks whose native frame format is not RGBA, this stream is created without a requested format, but `_consume_video` later decodes every frame as `Image.frombytes("RGBA", ...)`; those frames raise in the decode block and are silently skipped, leaving `analyze_vision` with no image. Request `rtc.VideoBufferType.RGBA` from `VideoStream` or convert each frame before passing its bytes to PIL.
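
The per-frame conversion suggested above might look like the sketch below. It is written against a duck-typed frame so it runs without LiveKit installed: `rgba_payload` is a hypothetical helper, and the assumption is that the frame object exposes `convert(fmt)`, `width`, `height`, and `data`, as `rtc.VideoFrame` does in current LiveKit Python SDK releases.

```python
from typing import Any, Tuple


def rgba_payload(frame: Any, rgba_format: Any) -> Tuple[int, int, bytes]:
    """Convert a frame to RGBA and return (width, height, raw bytes).

    `frame` needs `.convert(fmt)`, `.width`, `.height`, and `.data`.
    """
    converted = frame.convert(rgba_format)
    data = bytes(converted.data)
    expected = converted.width * converted.height * 4  # 4 bytes per RGBA pixel
    if len(data) != expected:
        # Fail loudly instead of silently skipping the frame.
        raise ValueError(f"expected {expected} RGBA bytes, got {len(data)}")
    return converted.width, converted.height, data
```

In `_consume_video` this would be called as `rgba_payload(ev.frame, rtc.VideoBufferType.RGBA)`, with the result passed to `Image.frombytes("RGBA", (width, height), data)`. The other option named in the comment is to request the format up front with `rtc.VideoStream(track, format=rtc.VideoBufferType.RGBA)`, which avoids the extra call per frame. Either way, logging the decode failure instead of a bare `continue` would make this class of bug visible.
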