erkamkavak/tanstack-voice-ai

TanStack Voice AI

A TypeScript library for adding real-time Voice AI to web applications with a workflow and package structure inspired by TanStack AI.

Status: In development (experimental). APIs may change.

This repo provides:

  • A small, provider-agnostic voice protocol and WebSocket server handler (ai-voice)
  • A browser client/runtime for mic capture + audio playback + streaming (ai-voice-client)
  • A React hook for integrating Voice AI into UI (ai-voice-react)
  • Provider adapters for:
    • Gemini Live (ai-voice-gemini)
    • OpenAI Realtime (ai-voice-openai)

Supported providers

  • gemini-live
  • openai-realtime

Packages

  • packages/ai-voice

    • Protocol types (ClientToServerPacket, ServerToClientPacket)
    • WS upgrade handler and registry (attachVoiceChatWsUpgradeHandlerByProviderId, prepareVoiceChat)
  • packages/ai-voice-client

    • Browser runtime that streams audio to/from the server (createVoiceChatProxyRuntime)
  • packages/ai-voice-react

    • React integration via useVoiceChat
  • packages/ai-voice-gemini

    • Gemini Live adapter (geminiLive)
  • packages/ai-voice-openai

    • OpenAI Realtime adapter (openaiRealtime)
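The real ClientToServerPacket and ServerToClientPacket definitions live in packages/ai-voice; the hypothetical sketch below only illustrates the discriminated-union pattern such a protocol typically uses, and its variant names and fields are assumptions, not the library's actual types.

```typescript
// Hypothetical packet protocol sketch — NOT the actual types from
// packages/ai-voice; variant names and fields are illustrative only.
type ClientPacket =
  | { type: 'audio-chunk'; data: string } // assumed: base64-encoded audio frame
  | { type: 'stop' }

type ServerPacket =
  | { type: 'audio-chunk'; data: string }
  | { type: 'transcript'; text: string }
  | { type: 'error'; message: string }

// Exhaustive switch over the union: the compiler flags any unhandled variant.
function describePacket(packet: ServerPacket): string {
  switch (packet.type) {
    case 'audio-chunk':
      return `audio (${packet.data.length} base64 chars)`
    case 'transcript':
      return `transcript: ${packet.text}`
    case 'error':
      return `error: ${packet.message}`
  }
}
```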

Example usage

1) Server: Vite plugin

Use the Vite plugin to attach the WebSocket upgrade handler.

import { defineConfig } from 'vite'
import { voiceVitePlugin } from 'ai-voice'

export default defineConfig({
  plugins: [
    ...
    voiceVitePlugin(),
  ],
})

2) Server: Prepare route

Register the provider adapter as a WebSocket handler, then return the provider id and WS path to the client.

import { prepareVoiceChat, type VoiceChatPrepareRouteResponse } from 'ai-voice'
import { geminiLive } from 'ai-voice-gemini'
import { getClipboardTextToolDef, getTimeServer } from './tools'

export async function POST(request: Request) {
  const tools = [getTimeServer, getClipboardTextToolDef]

  const providerId = prepareVoiceChat({
    adapter: geminiLive({
      model: process.env.GEMINI_MODEL || 'gemini-live-2.5-flash-preview',
    }),
    tools,
  })

  const resBody: VoiceChatPrepareRouteResponse = {
    providerId,
    wsPath: '/api/voice/ws',
  }

  return new Response(JSON.stringify(resBody), {
    status: 200,
    headers: { 'Content-Type': 'application/json' },
  })
}
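The client consumes this JSON body before opening the WebSocket. A minimal sketch of validating it on the client side, based on the VoiceChatPrepareRouteResponse shape built above (the helper name is our own, not part of the library):

```typescript
// Mirrors the { providerId, wsPath } body returned by the prepare route.
interface PrepareResponse {
  providerId: string
  wsPath: string
}

// Hypothetical helper: narrow an untyped fetch result to PrepareResponse,
// failing loudly on a malformed body instead of connecting blindly.
function parsePrepareResponse(body: unknown): PrepareResponse {
  if (typeof body !== 'object' || body === null) {
    throw new Error('prepare route did not return an object')
  }
  const { providerId, wsPath } = body as Record<string, unknown>
  if (typeof providerId !== 'string' || typeof wsPath !== 'string') {
    throw new Error('prepare response missing providerId or wsPath')
  }
  return { providerId, wsPath }
}
```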

3) Client: React (useVoiceChat)

Call useVoiceChat to initialize the voice session. It will first prepare the session by calling the prepare route, then automatically connect to the WebSocket and manage the voice chat lifecycle.

import { useMemo, useState } from 'react'
import { useVoiceChat } from 'ai-voice-react'
import { GEMINI_LIVE_PROVIDER_ID } from 'ai-voice-gemini'
import type { GeminiLiveSessionOptions } from 'ai-voice-gemini'
import type { VoiceProviderId } from 'ai-voice'

export function VoicePage() {
  const [provider, setProvider] = useState<VoiceProviderId>(GEMINI_LIVE_PROVIDER_ID)

  const [geminiLiveOptions, setGeminiLiveOptions] = useState<GeminiLiveSessionOptions>({
    dialogType: 'standard',
    includeThoughts: false,
  })

  const tools = useMemo(() => [], [])

  const session = useVoiceChat({
    prepareRoute: '/api/voice/prepare',
    sessionOptions: geminiLiveOptions,
    tools,
  })

  return (
    <div>
      <button onClick={() => session.start()} disabled={session.isLoading}>
        Start
      </button>
      <button onClick={() => session.stop()} disabled={!session.isActive}>
        Stop
      </button>
    </div>
  )
}
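The isLoading and isActive flags the buttons read reflect a session lifecycle that useVoiceChat manages internally. As a hypothetical illustration only (the hook's real internal states may differ), that lifecycle can be modeled as a small state machine:

```typescript
// Hypothetical lifecycle model — illustrative, not the hook's actual
// implementation. States roughly map to the flags used above.
type SessionState = 'idle' | 'preparing' | 'active' | 'stopped'
type SessionEvent = 'start' | 'ready' | 'stop'

function transition(state: SessionState, event: SessionEvent): SessionState {
  switch (event) {
    case 'start': // start() kicks off the prepare-route call
      return state === 'idle' || state === 'stopped' ? 'preparing' : state
    case 'ready': // WebSocket connected, audio flowing
      return state === 'preparing' ? 'active' : state
    case 'stop': // stop() tears the session down
      return state === 'active' ? 'stopped' : state
  }
}

const isLoading = (s: SessionState) => s === 'preparing'
const isActive = (s: SessionState) => s === 'active'
```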

Environment variables

  • Gemini Live

    • GEMINI_API_KEY
    • GEMINI_MODEL (optional)
  • OpenAI Realtime

    • OPENAI_API_KEY
    • OPENAI_REALTIME_MODEL (optional)
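A sketch of a server-side .env covering both adapters. The key values are placeholders; the Gemini model shown is the fallback used in the server example above, while the OpenAI model is left as a placeholder since no default is stated here.

```shell
# .env — example values only; substitute real keys
GEMINI_API_KEY=your-gemini-api-key
GEMINI_MODEL=gemini-live-2.5-flash-preview
OPENAI_API_KEY=your-openai-api-key
OPENAI_REALTIME_MODEL=your-openai-realtime-model
```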

Development

pnpm i
pnpm -r build
pnpm typecheck

For a running example, see example/start.

