Merged
6 changes: 6 additions & 0 deletions .gitignore
@@ -49,3 +49,9 @@ sperax-ai-agents 2/
.DS_Store
__MACOSX


# Python
__pycache__/
*.pyc
livekit-agent/.env
livekit-agent/.env.local
18 changes: 18 additions & 0 deletions livekit-agent/.env.example
@@ -0,0 +1,18 @@
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
ASSEMBLYAI_API_KEY=your_assemblyai_key
ANTHROPIC_API_KEY=your_anthropic_key
OPENAI_API_KEY=your_openai_key
CARTESIA_API_KEY=your_cartesia_key
DFLOW_API_KEY=
DFLOW_API_URL=https://quote-api.dflow.net

# Primary Solana RPC. Examples:
# https://api.mainnet-beta.solana.com
# https://mainnet.helius-rpc.com/?api-key=YOUR_HELIUS_KEY
# https://api.solana.fm
# https://rpc.ankr.com/solana
# https://ssc-dao.genesysgo.net
SOLANA_RPC_URL=https://api.mainnet-beta.solana.com
SOLANA_RPC_URLS=
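`SOLANA_RPC_URL` is the primary endpoint and `SOLANA_RPC_URLS` holds comma-separated fallbacks. `tools.py` is not part of this diff, so how the two are combined is not shown. Below is a minimal sketch that assumes the primary is tried first, that blank entries are ignored, and that duplicates are dropped. `resolve_rpc_urls` is a hypothetical helper, not code from the PR.

```python
from __future__ import annotations

import os

DEFAULT_RPC = "https://api.mainnet-beta.solana.com"


def resolve_rpc_urls(env: dict[str, str] | None = None) -> list[str]:
    """Return RPC endpoints in try-order: primary first, then fallbacks.

    Hypothetical helper; the ordering and dedup rules are assumptions.
    """
    env = dict(os.environ) if env is None else env
    primary = env.get("SOLANA_RPC_URL", "").strip() or DEFAULT_RPC
    fallbacks = env.get("SOLANA_RPC_URLS", "")
    urls: list[str] = []
    for url in [primary, *fallbacks.split(",")]:
        url = url.strip()
        if url and url not in urls:  # skip blanks and duplicates
            urls.append(url)
    return urls
```

A caller could walk this list and move on to the next URL when a request fails or times out.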
72 changes: 72 additions & 0 deletions livekit-agent/README.md
@@ -0,0 +1,72 @@
# 🦞 Clawd LiveKit Voice Agent

A Python LiveKit agent that handles voice, vision, and Solana trading. Built on the LiveKit Agents SDK with AssemblyAI Universal-3 Pro Streaming for STT, OpenAI GPT-4.1 for reasoning, Cartesia Sonic-3 for TTS, and Claude Haiku 4.5 for vision.

## Pipeline

| Stage | Provider |
| --- | --- |
| STT | AssemblyAI `u3-rt-pro` (punctuation-based EOT) |
| Turn detection | AssemblyAI STT (`min_turn_silence=100`, `max_turn_silence=1000`) |
| LLM | OpenAI `gpt-4.1` |
| TTS | Cartesia `sonic-3` |
| VAD | Silero (`activation_threshold=0.3`) |
| Noise cancellation | LiveKit BVC |
| Vision | Anthropic Claude `haiku-4.5` |
| Trading | DFlow Trading API `/order` (primary), Jupiter (price + comparison) |
| RPC | Configurable: mainnet beta, Helius, Triton, Ankr, etc. |
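Turn detection depends on two silence thresholds. The model below is a simplified sketch of how they might interact, not AssemblyAI's actual server-side logic. It assumes a turn may end early, after `min_turn_silence`, only when the transcript ends in terminal punctuation, and that `max_turn_silence` ends the turn regardless. It also assumes silence is measured in milliseconds. `turn_has_ended` is a hypothetical name.

```python
TERMINAL_PUNCTUATION = (".", "?", "!")


def turn_has_ended(transcript: str, silence_ms: int,
                   min_turn_silence: int = 100,
                   max_turn_silence: int = 1000) -> bool:
    """Simplified punctuation-based end-of-turn check (an assumption)."""
    if silence_ms >= max_turn_silence:
        return True  # hard cap: end the turn regardless of punctuation
    if silence_ms >= min_turn_silence:
        # Early exit only when the sentence already looks finished.
        return transcript.rstrip().endswith(TERMINAL_PUNCTUATION)
    return False
```

Under this model, raising `max_turn_silence` only lengthens the wait for unpunctuated fragments, such as a wallet address dictated in chunks. Finished sentences still close quickly.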

## Tools

| Tool | What it does |
| --- | --- |
| `get_token_price` | Jupiter price for symbol or mint |
| `get_wallet_balance` | SOL balance via Solana RPC |
| `quote_swap` | Jupiter v6 swap quote |
| `quote_dflow_order` | DFlow `/order` quote with route plan, price impact, execution mode |
| `get_priority_fees` | Live DFlow priority fee estimates |
| `get_network_status` | Slot and recent TPS |
| `analyze_vision` | Claude vision over the latest video frame |
| `list_supported_tokens` | Known symbols |
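The swap tools take `amount` in whole token units. Jupiter's quote endpoint, however, expects an integer amount in the token's smallest unit (lamports for SOL) along with `slippageBps`. The conversion happens in `tools.py`, which is not in this diff. The sketch below assumes that conversion is done with exact decimal arithmetic. Its decimals table covers only SOL (9) and USDC (6), and `to_base_units` and `bps_to_percent` are hypothetical names.

```python
from decimal import ROUND_DOWN, Decimal

# Known mint decimals (SOL = 9, USDC = 6); other tokens would need a lookup
# and raise KeyError here.
DECIMALS = {"SOL": 9, "USDC": 6}


def to_base_units(amount: float, symbol: str) -> int:
    """Convert a whole-unit amount (e.g. 1.5 SOL) to integer base units."""
    decimals = DECIMALS[symbol.upper()]
    # Go through str() so 0.1 stays 0.1 instead of its binary approximation,
    # then round down so we never quote more than the user asked for.
    scaled = Decimal(str(amount)).scaleb(decimals)
    return int(scaled.to_integral_value(rounding=ROUND_DOWN))


def bps_to_percent(bps: int) -> float:
    """50 bps -> 0.5 percent, matching the tool's slippage description."""
    return bps / 100
```

Rounding down keeps a quote from ever asking for slightly more than the stated amount.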

## Quick start

```bash
cd livekit-agent
cp .env.example .env
# fill in keys (see below)
pip install -r requirements.txt
python agent.py download-files # silero, turn detector, noise cancellation
python agent.py dev
```

Then connect via the [LiveKit Agents Playground](https://agents-playground.livekit.io) or your own LiveKit frontend.

## Required env vars

| Var | Required | Notes |
| --- | --- | --- |
| `LIVEKIT_URL` | yes | `wss://<project>.livekit.cloud` |
| `LIVEKIT_API_KEY` | yes | |
| `LIVEKIT_API_SECRET` | yes | |
| `ASSEMBLYAI_API_KEY` | yes | STT |
| `OPENAI_API_KEY` | yes | LLM |
| `CARTESIA_API_KEY` | yes | TTS |
| `ANTHROPIC_API_KEY` | for vision | Falls back to "vision unavailable" if missing |
| `DFLOW_API_KEY` | for DFlow trading | Falls back to Jupiter-only quotes if missing |
| `SOLANA_RPC_URL` | optional | Defaults to mainnet beta |
| `SOLANA_RPC_URLS` | optional | Comma-separated fallbacks |
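The table splits the variables into hard requirements and optional features that degrade gracefully. A startup check along those lines might look like the sketch below. `check_env` is hypothetical and not part of the PR. It treats an empty value as missing because `.env.example` ships `DFLOW_API_KEY=` blank.

```python
from __future__ import annotations

import os

REQUIRED = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
            "ASSEMBLYAI_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY")


def check_env(env: dict[str, str] | None = None) -> dict[str, bool]:
    """Fail fast on missing required keys; report which optional features are on.

    Hypothetical startup check; empty strings count as missing.
    """
    env = dict(os.environ) if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return {
        "vision": bool(env.get("ANTHROPIC_API_KEY")),  # False -> "vision unavailable"
        "dflow": bool(env.get("DFLOW_API_KEY")),  # False -> Jupiter-only quotes
    }
```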

## Deploy

```bash
lk agent create
```

Registers and deploys to LiveKit Cloud. See [LiveKit Agents docs](https://docs.livekit.io/agents/) for production deployment options.

## Notes

- The agent quotes trades. It does not sign or submit. The user signs the `transaction` returned by `/order` in their own wallet.
- Vision frames are sampled at ~1 Hz from each subscribed remote video track into a shared buffer. `analyze_vision` always uses the latest frame.
- The agent uses STT-driven turn detection (recommended for U3 Pro Streaming). `min_turn_silence=100`, `max_turn_silence=1000`.
- For dictation of long entities like email or wallet addresses, raise `max_turn_silence` mid-stream via `stt.update_options(...)`.
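The ~1 Hz sampling keeps only the newest frame and drops everything in between. Below is a dependency-free sketch of that throttle-and-overwrite pattern, mirroring the interval check in `_consume_video`. `LatestFrameSampler` is a hypothetical class, and the explicit `now` argument exists only to make it testable.

```python
from __future__ import annotations

import time


class LatestFrameSampler:
    """Keep only the newest frame, accepting at most one per `interval` seconds."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._last_capture: float | None = None
        self.latest = None

    def offer(self, frame, now: float | None = None) -> bool:
        """Store `frame` if enough time has passed; return whether it was kept."""
        now = time.monotonic() if now is None else now
        if self._last_capture is not None and now - self._last_capture < self.interval:
            return False  # too soon: drop this frame
        self._last_capture = now
        self.latest = frame  # overwrite; older frames are never queued
        return True
```

Overwriting rather than queueing bounds memory use and means a vision call never analyzes a stale backlog.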
222 changes: 222 additions & 0 deletions livekit-agent/agent.py
@@ -0,0 +1,222 @@
"""Clawd: LiveKit voice agent with vision + Solana trading tools.

STT : AssemblyAI Universal-3 Pro Streaming (u3-rt-pro)
LLM : OpenAI gpt-4.1
TTS : Cartesia Sonic-3
VAD : Silero
Turn : AssemblyAI punctuation-based EOT (STT-driven)
"""
from __future__ import annotations

import asyncio
import logging
from typing import Annotated

from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import (
    Agent,
    AgentSession,
    AgentServer,
    JobContext,
    RoomInputOptions,
    TurnHandlingOptions,
    function_tool,
)
from livekit.plugins import assemblyai, cartesia, noise_cancellation, openai, silero
from PIL import Image

from tools import ClawdTools

load_dotenv(".env.local")
load_dotenv()

log = logging.getLogger("clawd")

INSTRUCTIONS = """You are Clawd, a Solana trading copilot on a voice call. You see what the user shows on camera or screen and help with trades.

BE SHORT. Keep replies to one or two sentences. If a reply has a comma, see if it can stop at the comma.

You're a trader on a call, not a feature tour. Have opinions. You can be a little dry. Don't hedge everything.

Never say: "certainly", "absolutely", "happy to help", "great question", "I'd be happy to", "let me walk you through".

Tools you can use:
- get_token_price for live USD prices
- get_wallet_balance for SOL balances by address
- quote_dflow_order is the PRIMARY routing for any "what would I get if I swapped X for Y" question. Returns the route plan, price impact, execution mode, and a signable transaction when a wallet is provided.
- quote_swap is the Jupiter fallback. Use it for cross-checks or when DFlow has no route.
- get_priority_fees for current micro-lamports per CU at medium/high/very-high tiers
- get_network_status for Solana slot and TPS
- analyze_vision whenever the user asks "what do you see", "look at this", "what's on my screen", or references a chart
- list_supported_tokens if the user asks which symbols you know

While a tool runs, say "one sec" or "checking" - never longer.

Read prices naturally: "around 142 dollars", not "142.3847". Read addresses by their first three and last three characters unless asked to spell them.
No markdown, no bullets. Plain spoken sentences only.

You CANNOT sign transactions or move funds. If asked to actually execute a trade, say you can quote it but the user has to sign in their wallet.
You CANNOT look things up on the internet beyond your tools. If asked about news or off-chain data, say so and offer what you can do.
"""

GREETING = "Hey, Clawd here. What are we trading?"


class ClawdAgent(Agent):
    def __init__(self, tools: ClawdTools) -> None:
        super().__init__(instructions=INSTRUCTIONS)
        self._tools = tools

    @function_tool
    async def get_token_price(
        self,
        token: Annotated[str, "Token symbol (SOL, USDC, JUP, BONK, WIF, JTO, PYTH) or mint address."],
    ) -> dict:
        """Get the current USD price of a Solana token."""
        return await self._tools.get_token_price(token)

    @function_tool
    async def get_wallet_balance(
        self, address: Annotated[str, "Solana wallet public key."]
    ) -> dict:
        """Get the SOL balance for a Solana wallet."""
        return await self._tools.get_wallet_balance(address)

    @function_tool
    async def quote_swap(
        self,
        input_token: Annotated[str, "Input token symbol or mint."],
        output_token: Annotated[str, "Output token symbol or mint."],
        amount: Annotated[float, "Amount of input token in whole units."],
        slippage_bps: Annotated[int, "Slippage tolerance in bps (50 = 0.5%)."] = 50,
    ) -> dict:
        """Get a Jupiter v6 swap quote between two Solana tokens."""
        return await self._tools.quote_swap(input_token, output_token, amount, slippage_bps)

    @function_tool
    async def quote_dflow_order(
        self,
        input_token: Annotated[str, "Input token symbol or mint."],
        output_token: Annotated[str, "Output token symbol or mint."],
        amount: Annotated[float, "Amount of input token in whole units."],
        slippage_bps: Annotated[int, "Max slippage in bps. Omit for auto."] = None,
        user_public_key: Annotated[
            str,
            "Optional user wallet pubkey. When provided, response includes a signable transaction.",
        ] = None,
    ) -> dict:
        """Get a DFlow Trading API /order quote — the primary routing source for swaps."""
        return await self._tools.quote_dflow_order(
            input_token, output_token, amount, slippage_bps, user_public_key
        )

    @function_tool
    async def get_priority_fees(self) -> dict:
        """Get live Solana priority fee tiers (micro-lamports per CU) via DFlow."""
        return await self._tools.get_priority_fees()

    @function_tool
    async def get_network_status(self) -> dict:
        """Get the current Solana slot height and recent TPS."""
        return await self._tools.get_network_status()

    @function_tool
    async def list_supported_tokens(self) -> dict:
        """List the token symbols this agent knows by name without a mint."""
        return await self._tools.list_supported_tokens()

    @function_tool
    async def analyze_vision(
        self,
        question: Annotated[str, "What specifically to focus on in the user's camera/screen frame."],
    ) -> dict:
        """Describe what the user is currently showing on camera or screen.

        Use this whenever the user asks "what do you see", "look at this", "check
        my chart", or references something on screen.
        """
        return await self._tools.analyze_vision(question)


async def _consume_video(track: rtc.Track, tools: ClawdTools) -> None:
    """Sample frames from a remote video track into the shared latest-frame buffer."""
    # Request RGBA so every frame can be decoded by PIL below, whatever the
    # track's native format is.
    stream = rtc.VideoStream(track, format=rtc.VideoBufferType.RGBA)
    last_capture = 0.0
    interval = 1.0  # seconds between sampled frames (~1 Hz)
    try:
        async for ev in stream:
            now = asyncio.get_running_loop().time()
            if now - last_capture < interval:
                continue  # drop frames between samples
            last_capture = now
            frame = ev.frame
            try:
                pil = Image.frombytes(
                    "RGBA", (frame.width, frame.height), bytes(frame.data), "raw", "RGBA"
                )
            except Exception:
                log.debug("skipping undecodable video frame", exc_info=True)
                continue
            tools.frame.update_from_pil(pil)
    finally:
        await stream.aclose()

Review comment (P2): Request an RGBA video stream before decoding frames

For LiveKit camera/screen tracks whose native frame format is not RGBA, this stream is created without a requested format but _consume_video later decodes every frame as Image.frombytes("RGBA", ...); those frames raise in the decode block and are silently skipped, leaving analyze_vision with no image. Request rtc.VideoBufferType.RGBA from VideoStream or convert each frame before passing its bytes to PIL.


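# Strong references to running frame consumers; asyncio only keeps weak ones.
_background_tasks: set[asyncio.Task] = set()


def _attach_vision(ctx: JobContext, tools: ClawdTools) -> None:
    @ctx.room.on("track_subscribed")
    def _on_track(
        track: rtc.Track,
        publication: rtc.TrackPublication,
        participant: rtc.RemoteParticipant,
    ) -> None:
        if track.kind == rtc.TrackKind.KIND_VIDEO:
            log.info("subscribed to video track from %s", participant.identity)
            # Keep a reference so the task isn't garbage-collected mid-stream.
            task = asyncio.create_task(_consume_video(track, tools))
            _background_tasks.add(task)
            task.add_done_callback(_background_tasks.discard)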


server = AgentServer()


@server.rtc_session(agent_name="clawd-voice-agent")
async def clawd(ctx: JobContext) -> None:
    tools = ClawdTools()
    _attach_vision(ctx, tools)

    session = AgentSession(
        stt=assemblyai.STT(
            model="u3-rt-pro",
            min_turn_silence=100,
            max_turn_silence=1000,
            vad_threshold=0.3,
            keyterms_prompt=[
                "Solana", "Jupiter", "Raydium", "Orca", "Phoenix",
                "USDC", "SOL", "BONK", "JUP", "WIF", "JTO", "PYTH",
                "Clawd", "AssemblyAI",
            ],
        ),
        llm=openai.LLM(model="gpt-4.1"),
        tts=cartesia.TTS(model="sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
        vad=silero.VAD.load(activation_threshold=0.3),
        turn_handling=TurnHandlingOptions(
            turn_detection="stt",
            endpointing={"min_delay": 0},
        ),
    )

    try:
        await session.start(
            room=ctx.room,
            agent=ClawdAgent(tools),
            room_input_options=RoomInputOptions(
                video_enabled=True,
                noise_cancellation=noise_cancellation.BVC(),
            ),
        )
        await session.generate_reply(instructions=f'Say exactly: "{GREETING}"')
        await ctx.wait_for_disconnect()
    finally:
        await tools.aclose()


if __name__ == "__main__":
    agents.cli.run_app(server)
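The prompt asks the LLM to read prices as "around 142 dollars" and to read addresses by their first and last three characters. To guarantee that formatting rather than just request it, tool results could be pre-formatted before they reach the model. The sketch below uses hypothetical helpers that are not in the PR. It assumes two significant figures are enough for tokens priced under 10 dollars and that an address of six characters or fewer is read in full.

```python
import math


def speak_address(address: str) -> str:
    """Shorten an address to its first and last three characters."""
    if len(address) <= 6:
        return address
    return f"{address[:3]} ... {address[-3:]}"


def speak_price(usd: float) -> str:
    """Round a USD price the way the prompt asks for ("around 142 dollars")."""
    if usd <= 0:
        return "around zero dollars"
    if usd >= 10:
        return f"around {round(usd)} dollars"
    # Below 10 dollars keep two significant figures in plain decimal notation,
    # so a low-priced token is not read as "0" or in scientific notation.
    decimals = 1 - math.floor(math.log10(usd))
    return f"around {usd:.{decimals}f} dollars"
```

Formatting in code rather than in the prompt trades some flexibility for predictability. The LLM can still phrase the sentence, but it never sees a long mint or a price with more than four significant figures.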
8 changes: 8 additions & 0 deletions livekit-agent/requirements.txt
@@ -0,0 +1,8 @@
livekit-agents[assemblyai,silero,openai,cartesia,turn-detector,noise-cancellation]~=1.5
livekit~=1.0
python-dotenv>=1.0.0
solana>=0.34.0
solders>=0.21.0
anthropic>=0.39.0
aiohttp>=3.9.0
Pillow>=10.0.0