A Next.js web application demonstrating real-time conversational AI capabilities using Agora's Real-Time Engagement SDKs. This demo showcases voice-first interactions with live transcriptions, multi-device audio input support, and an Agent ready to help you with your Agora build.
This application demonstrates how to build a production-ready conversational AI interface with:
- Real-time voice conversations with AI agents powered by Agora's Conversational AI Engine
- RTM-based messaging for reliable real-time transcriptions and agent state updates
- Live text transcriptions with streaming message updates and visual status indicators
- Advanced audio controls including device selection and visual feedback
- Modern UX patterns like smart auto-scrolling, mobile responsiveness, and accessibility features
- Flexible backend integration supporting multiple LLM providers (OpenAI, Anthropic, etc.) and TTS via ElevenLabs
- Official Agora toolkit integration for robust conversation management
- Guide.md - Complete step-by-step guide on how to build this application from scratch.
- Text Streaming Guide - Deep dive into real-time conversation transcriptions using the toolkit and UI kit.
Before you begin, ensure you have the following installed:
You must have an Agora account and a project to use this application.
- Clone the repository:
git clone https://github.com/AgoraIO-Community/conversational-ai-nextjs-client
cd conversational-ai-nextjs-client- Install dependencies:
pnpm install- Create a
.env.localfile in the root directory and add your environment variables:
cp env.local.example .env.localThe following environment variables are required:
NEXT_PUBLIC_AGORA_APP_ID- Your Agora App IDNEXT_AGORA_APP_CERTIFICATE- Your Agora App CertificateNEXT_PUBLIC_AGENT_UID- UID assigned to the AI agent in the RTC channel
NEXT_LLM_URL- Any OpenAI-compatible endpoint (OpenAI, Azure, Groq, etc.)NEXT_LLM_API_KEY- LLM API key
NEXT_DEEPGRAM_API_KEY- Deepgram API key
NEXT_ELEVENLABS_API_KEY- ElevenLabs API key
Non-sensitive settings (model names, voice ID, language, etc.) are set directly in app/api/invite-agent/route.ts — edit them there.
- Run the development server:
pnpm dev- Open your browser and navigate to
http://localhost:3000to see the application in action.
This project is configured for quick deployments to Vercel.
This will:
- Clone the repository to your GitHub account
- Create a new project on Vercel
- Prompt you to fill in the required environment variables:
- Required: Agora credentials (
NEXT_PUBLIC_AGORA_APP_ID,NEXT_AGORA_APP_CERTIFICATE,NEXT_PUBLIC_AGENT_UID) - Required: LLM endpoint and API key (
NEXT_LLM_URL,NEXT_LLM_API_KEY) - Required: Deepgram API key (
NEXT_DEEPGRAM_API_KEY) and ElevenLabs API key (NEXT_ELEVENLABS_API_KEY)
- Required: Agora credentials (
- Deploy the application automatically
- Microphone Toggle: Easy-to-use button to enable/disable your microphone
- Device Selection: Choose from multiple microphone inputs with the microphone selector dropdown
- Hot-Swap Support: Automatically detects when devices are plugged in/unplugged
- Audio Visualization: Real-time visual feedback showing microphone input levels
- Live Transcriptions: See what you say and the AI's responses in real-time as text
- Message Status Indicators: Visual feedback for in-progress, completed, and interrupted messages
- Smart Auto-Scroll: Automatically scrolls to new messages while preserving scroll position when reviewing history
- Mobile-Responsive Chat UI: Collapsible chat window that adapts to different screen sizes
- Desktop Auto-Open: Chat window automatically opens on first message (desktop only)
- Message Persistence: Full conversation history maintained throughout the session
- Custom LLM Integration: Connect your preferred LLM (OpenAI, Anthropic, etc.)
- ElevenLabs TTS: High-quality voice synthesis with ElevenLabs
- Modern Turn Detection: Advanced turn-taking with configurable interrupt behavior
- RTM Data Channel: Reliable message delivery with metrics and error reporting
- Token Management: Automatic token renewal for both RTC and RTM to prevent disconnections
- Agent Lifecycle: Agent is invited when you click "Try it now!"; End Conversation button stops the agent and closes the session
- Official Toolkit: Uses Agora's ConversationalAIAPI for robust conversation management
- Audio Visualizations: Animated frequency bars for both user and AI audio
- Connection Status: Real-time connection indicators
- Error Handling: Graceful error messages and recovery options
- Accessibility: ARIA labels and keyboard-friendly controls
Browse and select voices at: https://elevenlabs.io/app/voice-lab
Set your chosen voice ID in the ELEVENLABS_VOICE_ID constant in app/api/invite-agent/route.ts.
The application is built with a modular component architecture:
LandingPage.tsx: Entry point that invites the agent when you click "Try it now!" and manages the conversation lifecycle with proper agent cleanup on endConversationComponent.tsx: Main conversation container handling RTC and RTM connections, audio/text streaming, and the End Conversation flowMicrophoneSelector.tsx: Dropdown component for selecting audio input devices with hot-swap support
agora-agent-uikit: Pre-built conversation UI components used directly inConversationComponent:AudioVisualizer— animated frequency bars that respond to the agent's audio trackConvoTextStream— floating chat panel showing live and completed transcript turnsMicButtonWithVisualizer(fromagora-agent-uikit/rtc) — mic button with built-in Web Audio visualizationtranscriptToMessageList— converts toolkit transcript items into UI-ready message objects
agora-agent-client-toolkit:AgoraVoiceAIclass that subscribes to RTM transcript events and normalizes them into a simple message listagora-agent-server-sdk: Server-side SDK used in API routes to start and stop the AI agent
lib/utils.ts: Helper functions including the shadcncnclass merge utilitytypes/conversation.ts: TypeScript type definitions for conversation data structures
Contributions are welcome! Please feel free to submit a Pull Request.
The application provides the following API endpoints:
- Endpoint:
/api/generate-agora-token - Method: GET
- Query Parameters:
uid(optional) - User ID (defaults to 0)channel(optional) - Channel name (auto-generated if not provided)
- Response: Returns token (with RTC + RTM privileges), uid, and channel information
- Endpoint:
/api/invite-agent - Method: POST
- Body:
{
requester_id: string;
channel_name: string;
input_modalities?: string[];
output_modalities?: string[];
}- Endpoint:
/api/stop-conversation - Method: POST
- Body:
{
agent_id: string;
}The text streaming feature uses agora-agent-client-toolkit with RTM for reliable real-time transcriptions:
- RTM Client establishes a real-time messaging connection alongside RTC for audio
AgoraVoiceAI(fromagora-agent-client-toolkit) subscribes to the RTM channel, processes transcript events, and emitsTRANSCRIPT_UPDATEDConversationComponenthandles the event, remaps local user UIDs, and updates React state with separated in-progress and completed turnsConvoTextStream(fromagora-agent-uikit) renders the chat panel with smart scrolling and streaming indicators
Key features:
- Dual RTC + RTM tokens for secure access to both audio and messaging channels
- Audio PTS metadata enabled for accurate transcription timing synchronization
- Modern turn detection with configurable interrupt behavior
- Proper resource cleanup when conversations end
The MicrophoneSelector component provides:
- Device enumeration via
AgoraRTC.getMicrophones() - Hot-swap detection through
AgoraRTC.onMicrophoneChangedcallbacks - Seamless switching using
localMicrophoneTrack.setDevice(deviceId) - Automatic fallback when the current device is disconnected
AudioVisualizer and MicButtonWithVisualizer (from agora-agent-uikit) use the Web Audio API:
- Connects to the Agora audio track's
MediaStreamvia anAnalyserNode - Uses
getByteFrequencyData()to extract frequency information - Animates visual bars using
requestAnimationFramefor smooth 60fps updates
This application uses a dual-channel architecture for optimal performance:
- RTC (Real-Time Communication): Handles high-quality audio streaming between users and AI agents
- RTM (Real-Time Messaging): Delivers transcriptions, agent state updates, metrics, and error messages
- Dual Token Authentication: Single token provides secure access to both RTC and RTM services
- Audio PTS Metadata: Enables precise synchronization between audio playback and transcription display
agora-agent-client-toolkit:AgoraVoiceAIclass managing the complete transcript lifecycle- Event-Driven Architecture: Real-time updates for transcripts, agent state changes, and system events
- Turn Detection: Modern voice activity detection with configurable interrupt behavior
- Resource Cleanup: Automatic cleanup of RTC, RTM, and agent resources when conversations end
- Reliable message delivery through dedicated RTM channel
- Access to real-time agent metrics and error reporting
- Better timing synchronization for natural conversation flow
- Proper resource management preventing memory leaks
- Modern API patterns following Agora best practices