A TypeScript library for adding real-time Voice AI to web applications with a workflow and package structure inspired by TanStack AI.
Status: In development (experimental). APIs may change.
This repo provides:
- A small, provider-agnostic voice protocol and WebSocket server handler (`ai-voice`)
- A browser client/runtime for mic capture + audio playback + streaming (`ai-voice-client`)
- A React hook for integrating Voice AI into UI (`ai-voice-react`)
- Provider adapters for:
  - Gemini Live (`ai-voice-gemini`)
  - OpenAI Realtime (`ai-voice-openai`)
Provider IDs:

- Gemini Live (`gemini-live`)
- OpenAI Realtime (`openai-realtime`)
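These IDs can be thought of as a string union. A hypothetical sketch of the mapping between IDs and display names (the library exports a `VoiceProviderId` type from `ai-voice`, whose actual definition may differ):

```typescript
// Hypothetical sketch only: the real VoiceProviderId type is exported
// from ai-voice and may be defined differently.
type VoiceProviderIdSketch = 'gemini-live' | 'openai-realtime'

// Map a provider ID to its human-readable name.
function labelFor(id: VoiceProviderIdSketch): string {
  return id === 'gemini-live' ? 'Gemini Live' : 'OpenAI Realtime'
}

console.log(labelFor('gemini-live')) // Gemini Live
```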
Packages:

- `packages/ai-voice` - Protocol types (`ClientToServerPacket`, `ServerToClientPacket`); WS upgrade handler and registry (`attachVoiceChatWsUpgradeHandlerByProviderId`, `prepareVoiceChat`)
- `packages/ai-voice-client` - Browser runtime that streams audio to/from the server (`createVoiceChatProxyRuntime`)
- `packages/ai-voice-react` - React integration via `useVoiceChat`
- `packages/ai-voice-gemini` - Gemini Live adapter (`geminiLive`)
- `packages/ai-voice-openai` - OpenAI Realtime adapter (`openaiRealtime`)
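As a rough illustration of what a provider-agnostic packet protocol can look like, here is a hypothetical discriminated-union sketch; the actual `ClientToServerPacket`/`ServerToClientPacket` definitions live in `packages/ai-voice` and may differ in shape and naming:

```typescript
// Illustrative only: these variants are assumptions, not the library's
// real protocol. They show the discriminated-union style such a
// provider-agnostic protocol typically uses.
type ClientToServerPacketSketch =
  | { type: 'audio-chunk'; data: string } // e.g. base64-encoded PCM (assumed)
  | { type: 'stop' }

type ServerToClientPacketSketch =
  | { type: 'audio-chunk'; data: string }
  | { type: 'transcript'; text: string }
  | { type: 'error'; message: string }

// Narrowing on `type` gives exhaustive, type-safe handling per variant.
function describePacket(p: ServerToClientPacketSketch): string {
  switch (p.type) {
    case 'audio-chunk':
      return `audio (${p.data.length} b64 chars)`
    case 'transcript':
      return `transcript: ${p.text}`
    case 'error':
      return `error: ${p.message}`
  }
}

console.log(describePacket({ type: 'transcript', text: 'hello' }))
// transcript: hello
```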
Use the Vite plugin to attach the WebSocket upgrade handler.

```ts
import { defineConfig } from 'vite'
import { voiceVitePlugin } from 'ai-voice'

export default defineConfig({
  plugins: [
    // ...other plugins
    voiceVitePlugin(),
  ],
})
```

Register the provider adapter as a WebSocket handler.
```ts
import { prepareVoiceChat, type VoiceChatPrepareRouteResponse } from 'ai-voice'
import { geminiLive } from 'ai-voice-gemini'
import { getClipboardTextToolDef, getTimeServer } from './tools'

export async function POST(request: Request) {
  const tools = [getTimeServer, getClipboardTextToolDef]
  const providerId = prepareVoiceChat({
    adapter: geminiLive({
      model: process.env.GEMINI_MODEL || 'gemini-live-2.5-flash-preview',
    }),
    tools,
  })
  const resBody: VoiceChatPrepareRouteResponse = {
    providerId,
    wsPath: '/api/voice/ws',
  }
  return new Response(JSON.stringify(resBody), {
    status: 200,
    headers: { 'Content-Type': 'application/json' },
  })
}
```

Call `useVoiceChat` to initialize the voice session. It will first prepare the session by calling the prepare route, then automatically connect to the WebSocket and manage the voice chat lifecycle.
```tsx
import { useMemo, useState } from 'react'
import { useVoiceChat } from 'ai-voice-react'
import { GEMINI_LIVE_PROVIDER_ID } from 'ai-voice-gemini'
import type { GeminiLiveSessionOptions } from 'ai-voice-gemini'
import type { VoiceProviderId } from 'ai-voice'

export function VoicePage() {
  const [provider, setProvider] = useState<VoiceProviderId>(GEMINI_LIVE_PROVIDER_ID)
  const [geminiLiveOptions, setGeminiLiveOptions] = useState<GeminiLiveSessionOptions>({
    dialogType: 'standard',
    includeThoughts: false,
  })
  const tools = useMemo(() => [], [])
  const session = useVoiceChat({
    prepareRoute: '/api/voice/prepare',
    sessionOptions: geminiLiveOptions,
    tools,
  })
  return (
    <div>
      <button onClick={() => session.start()} disabled={session.isLoading}>
        Start
      </button>
      <button onClick={() => session.stop()} disabled={!session.isActive}>
        Stop
      </button>
    </div>
  )
}
```

Environment variables:

- Gemini Live: `GEMINI_API_KEY`, `GEMINI_MODEL` (optional)
- OpenAI Realtime: `OPENAI_API_KEY`, `OPENAI_REALTIME_MODEL` (optional)
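For local development these are typically supplied via a `.env` file. A sketch with placeholder values (the Gemini model name is the fallback used in the prepare-route snippet above; all keys are fake):

```shell
# Example .env -- placeholder values, not real credentials.
GEMINI_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key
# Optional model overrides (provider defaults apply when unset):
GEMINI_MODEL=gemini-live-2.5-flash-preview
```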
```sh
pnpm i
pnpm -r build
pnpm typecheck
```

To see it running, check `example/start`.