Conversational AI: Dev Advocate Agent Demo

A Next.js web application demonstrating real-time conversational AI capabilities using Agora's Real-Time Engagement SDKs. This demo showcases voice-first interactions with live transcriptions, multi-device audio input support, and an Agent ready to help you with your Agora build.

Overview

This application demonstrates how to build a production-ready conversational AI interface with:

Real-time voice conversations with AI agents powered by Agora's Conversational AI Engine
RTM-based messaging for reliable real-time transcriptions and agent state updates
Live text transcriptions with streaming message updates and visual status indicators
Advanced audio controls including device selection and visual feedback
Modern UX patterns like smart auto-scrolling, mobile responsiveness, and accessibility features
Flexible backend integration supporting multiple LLM providers (OpenAI, Anthropic, etc.) and TTS via ElevenLabs
Official Agora toolkit integration for robust conversation management

Guides and Documentation

Guide.md - Complete step-by-step guide on how to build this application from scratch.
Text Streaming Guide - Deep dive into real-time conversation transcriptions using the toolkit and UI kit.

Prerequisites

Before you begin, ensure you have the following installed:

Node.js (version 22.x or higher)
pnpm (version 8.x or higher)

You must have an Agora account and a project to use this application.

Agora Account

Installation

Clone the repository:

git clone https://github.com/AgoraIO-Community/conversational-ai-nextjs-client
cd conversational-ai-nextjs-client

Install dependencies:

pnpm install

Create a .env.local file in the root directory and add your environment variables:

cp env.local.example .env.local

The following environment variables are required:

Agora

NEXT_PUBLIC_AGORA_APP_ID - Your Agora App ID
NEXT_AGORA_APP_CERTIFICATE - Your Agora App Certificate
NEXT_PUBLIC_AGENT_UID - UID assigned to the AI agent in the RTC channel

LLM

NEXT_LLM_URL - Any OpenAI-compatible endpoint (OpenAI, Azure, Groq, etc.)
NEXT_LLM_API_KEY - LLM API key

ASR

NEXT_DEEPGRAM_API_KEY - Deepgram API key

TTS

NEXT_ELEVENLABS_API_KEY - ElevenLabs API key

Non-sensitive settings (model names, voice ID, language, etc.) are set directly in app/api/invite-agent/route.ts — edit them there.
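
A missing variable usually shows up only as a failed agent invite, so it can help to check the environment early. The following is a minimal sketch, not code from this repository: the helper name findMissingEnvVars and the EnvSource type are hypothetical, and only the variable names come from the list above.

```typescript
// Required variable names, copied from the list above.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_AGORA_APP_ID",
  "NEXT_AGORA_APP_CERTIFICATE",
  "NEXT_PUBLIC_AGENT_UID",
  "NEXT_LLM_URL",
  "NEXT_LLM_API_KEY",
  "NEXT_DEEPGRAM_API_KEY",
  "NEXT_ELEVENLABS_API_KEY",
] as const;

// Any object shaped like process.env (assumption: values are strings or undefined).
type EnvSource = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables that are
// missing or blank in the given environment object.
function findMissingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a server-side module you could call findMissingEnvVars(process.env) and log or throw when the returned list is not empty.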

Run the development server:

pnpm dev

Open your browser and navigate to http://localhost:3000 to see the application in action.

Deployment to Vercel

This project is configured for quick deployment to Vercel.

Deploying with Vercel will:

Clone the repository to your GitHub account
Create a new project on Vercel
Prompt you to fill in the required environment variables:
- Required: Agora credentials (NEXT_PUBLIC_AGORA_APP_ID, NEXT_AGORA_APP_CERTIFICATE, NEXT_PUBLIC_AGENT_UID)
- Required: LLM endpoint and API key (NEXT_LLM_URL, NEXT_LLM_API_KEY)
- Required: Deepgram API key (NEXT_DEEPGRAM_API_KEY) and ElevenLabs API key (NEXT_ELEVENLABS_API_KEY)
Deploy the application automatically

Features

Audio Input Control

Microphone Toggle: Easy-to-use button to enable/disable your microphone
Device Selection: Choose from multiple microphone inputs with the microphone selector dropdown
Hot-Swap Support: Automatically detects when devices are plugged in/unplugged
Audio Visualization: Real-time visual feedback showing microphone input levels

Real-Time Text Streaming

Live Transcriptions: See what you say and the AI's responses in real-time as text
Message Status Indicators: Visual feedback for in-progress, completed, and interrupted messages
Smart Auto-Scroll: Automatically scrolls to new messages while preserving scroll position when reviewing history
Mobile-Responsive Chat UI: Collapsible chat window that adapts to different screen sizes
Desktop Auto-Open: Chat window automatically opens on first message (desktop only)
Message Persistence: Full conversation history maintained throughout the session
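
The smart auto-scroll behaviour described above is commonly implemented by checking whether the reader is already near the bottom of the list before scrolling to a new message. This is a rough sketch of that check, not the logic used by ConvoTextStream: the ScrollMetrics shape, the isNearBottom name, and the 100px threshold are all assumptions.

```typescript
// Subset of the scroll properties exposed by a scrollable DOM element.
interface ScrollMetrics {
  scrollTop: number;    // pixels scrolled from the top
  clientHeight: number; // visible height of the container
  scrollHeight: number; // total height of the content
}

// Hypothetical helper: true when the viewport is within thresholdPx of the
// bottom, meaning a new message can scroll into view without interrupting
// someone who has scrolled up to review history.
function isNearBottom(metrics: ScrollMetrics, thresholdPx: number = 100): boolean {
  const distanceFromBottom =
    metrics.scrollHeight - metrics.scrollTop - metrics.clientHeight;
  return distanceFromBottom <= thresholdPx;
}
```

A chat component would typically evaluate this before appending a message and scroll to the end only when it returns true.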

AI Conversation Engine

Custom LLM Integration: Connect your preferred LLM (OpenAI, Anthropic, etc.)
ElevenLabs TTS: High-quality voice synthesis with ElevenLabs
Modern Turn Detection: Advanced turn-taking with configurable interrupt behavior
RTM Data Channel: Reliable message delivery with metrics and error reporting
Token Management: Automatic token renewal for both RTC and RTM to prevent disconnections
Agent Lifecycle: The agent is invited when you click "Try it now!"; the End Conversation button stops the agent and closes the session
Official Toolkit: Uses Agora's ConversationalAIAPI for robust conversation management

User Experience

Audio Visualizations: Animated frequency bars for both user and AI audio
Connection Status: Real-time connection indicators
Error Handling: Graceful error messages and recovery options
Accessibility: ARIA labels and keyboard-friendly controls

Voice Options

ElevenLabs

Browse and select voices at: https://elevenlabs.io/app/voice-lab

Set your chosen voice ID in the ELEVENLABS_VOICE_ID constant in app/api/invite-agent/route.ts.

Key Components

The application is built with a modular component architecture:

Core Components

LandingPage.tsx: Entry point that invites the agent when you click "Try it now!" and manages the conversation lifecycle with proper agent cleanup on end
ConversationComponent.tsx: Main conversation container handling RTC and RTM connections, audio/text streaming, and the End Conversation flow
MicrophoneSelector.tsx: Dropdown component for selecting audio input devices with hot-swap support

Agora Packages

agora-agent-uikit: Pre-built conversation UI components used directly in ConversationComponent:
- AudioVisualizer — animated frequency bars that respond to the agent's audio track
- ConvoTextStream — floating chat panel showing live and completed transcript turns
- MicButtonWithVisualizer (from agora-agent-uikit/rtc) — mic button with built-in Web Audio visualization
- transcriptToMessageList — converts toolkit transcript items into UI-ready message objects
agora-agent-client-toolkit: AgoraVoiceAI class that subscribes to RTM transcript events and normalizes them into a simple message list
agora-agent-server-sdk: Server-side SDK used in API routes to start and stop the AI agent

Utilities

lib/utils.ts: Helper functions including the shadcn cn class merge utility
types/conversation.ts: TypeScript type definitions for conversation data structures

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

API Endpoints

The application provides the following API endpoints:

Generate Agora Token

Endpoint: /api/generate-agora-token
Method: GET
Query Parameters:
- uid (optional) - User ID (defaults to 0)
- channel (optional) - Channel name (auto-generated if not provided)
Response: Returns token (with RTC + RTM privileges), uid, and channel information
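
As an illustration of the two optional query parameters, here is a small sketch that builds the request URL. The TokenRequest type and the buildTokenUrl helper are hypothetical names introduced for this example; only the endpoint path and the uid and channel parameters come from the description above.

```typescript
// Both fields are optional, matching the endpoint description.
interface TokenRequest {
  uid?: number;
  channel?: string;
}

// Hypothetical helper: builds the GET URL, leaving out any parameter that
// was not supplied so the server can apply its own defaults.
function buildTokenUrl(request: TokenRequest = {}): string {
  const base = "/api/generate-agora-token";
  const parts: string[] = [];
  if (request.uid !== undefined) {
    parts.push("uid=" + encodeURIComponent(String(request.uid)));
  }
  if (request.channel) {
    parts.push("channel=" + encodeURIComponent(request.channel));
  }
  return parts.length > 0 ? base + "?" + parts.join("&") : base;
}
```

The resulting string can be passed to fetch from the browser, and the JSON response read for the token, uid, and channel values.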

Invite Agent

Endpoint: /api/invite-agent
Method: POST
Body:

{
  requester_id: string;
  channel_name: string;
  input_modalities?: string[];
  output_modalities?: string[];
}
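
The body shape above can be checked at runtime before the agent is started. The sketch below shows one way to do that with a type guard. The function names are hypothetical and the actual route may validate differently or not at all.

```typescript
// Request body for /api/invite-agent, as documented above.
interface InviteAgentRequest {
  requester_id: string;
  channel_name: string;
  input_modalities?: string[];
  output_modalities?: string[];
}

// True when the value is an array containing only strings.
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hypothetical type guard: checks the two required string fields and, when
// present, that the optional modality lists are string arrays.
function isInviteAgentRequest(value: unknown): value is InviteAgentRequest {
  if (typeof value !== "object" || value === null) return false;
  const body = value as Record<string, unknown>;
  return (
    typeof body.requester_id === "string" &&
    typeof body.channel_name === "string" &&
    (body.input_modalities === undefined || isStringArray(body.input_modalities)) &&
    (body.output_modalities === undefined || isStringArray(body.output_modalities))
  );
}
```

A route handler could return a 400 response when the guard fails, before any agent is started.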

Stop Conversation

Endpoint: /api/stop-conversation
Method: POST
Body:

{
  agent_id: string;
}

Technical Implementation Details

Text Streaming Architecture

The text streaming feature uses agora-agent-client-toolkit with RTM for reliable real-time transcriptions:

RTM Client establishes a real-time messaging connection alongside RTC for audio
AgoraVoiceAI (from agora-agent-client-toolkit) subscribes to the RTM channel, processes transcript events, and emits TRANSCRIPT_UPDATED
ConversationComponent handles the event, remaps local user UIDs, and updates React state with separated in-progress and completed turns
ConvoTextStream (from agora-agent-uikit) renders the chat panel with smart scrolling and streaming indicators
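
To make the "separated in-progress and completed turns" step concrete, here is a simplified sketch of partitioning a transcript list by status. The TranscriptTurn shape and the splitTurns name are assumptions made for illustration and do not reflect the toolkit's actual types. Treating interrupted turns as finished is also an assumption.

```typescript
// Status values mirror the indicators listed under Features.
type TurnStatus = "in-progress" | "completed" | "interrupted";

// Hypothetical, simplified transcript item.
interface TranscriptTurn {
  turnId: number;
  uid: string;
  text: string;
  status: TurnStatus;
}

interface SplitTurns {
  inProgress: TranscriptTurn[];
  finished: TranscriptTurn[];
}

// Separates turns that are still streaming from turns that will no longer
// change, preserving the original order within each group.
function splitTurns(turns: TranscriptTurn[]): SplitTurns {
  const inProgress: TranscriptTurn[] = [];
  const finished: TranscriptTurn[] = [];
  for (const turn of turns) {
    if (turn.status === "in-progress") {
      inProgress.push(turn);
    } else {
      finished.push(turn);
    }
  }
  return { inProgress, finished };
}
```

Keeping the two groups in separate React state lets the streaming turn re-render often while finished turns stay stable.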

Key features:

A single token carrying both RTC and RTM privileges for secure access to the audio and messaging channels
Audio PTS metadata enabled for accurate transcription timing synchronization
Modern turn detection with configurable interrupt behavior
Proper resource cleanup when conversations end

Microphone Device Management

The MicrophoneSelector component provides:

Device enumeration via AgoraRTC.getMicrophones()
Hot-swap detection through AgoraRTC.onMicrophoneChanged callbacks
Seamless switching using localMicrophoneTrack.setDevice(deviceId)
Automatic fallback when the current device is disconnected
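
The fallback step can be illustrated with a small selection function that runs whenever the device list changes. This is a sketch under assumptions: the MicDevice shape is a reduced stand-in for the browser's MediaDeviceInfo, the pickMicrophone name is hypothetical, and falling back to the first available device is one possible policy, not necessarily the one MicrophoneSelector uses.

```typescript
// Reduced stand-in for MediaDeviceInfo (assumption for this sketch).
interface MicDevice {
  deviceId: string;
  label: string;
}

// Hypothetical helper: keeps the current device while it is still plugged
// in, otherwise falls back to the first available device, or null when no
// microphone remains.
function pickMicrophone(
  available: MicDevice[],
  currentId: string | null
): MicDevice | null {
  for (const device of available) {
    if (device.deviceId === currentId) return device;
  }
  return available.length > 0 ? available[0] : null;
}
```

In the component, the chosen deviceId would then be passed to localMicrophoneTrack.setDevice(deviceId) after each device change notification.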

Audio Visualization

AudioVisualizer and MicButtonWithVisualizer (from agora-agent-uikit) use the Web Audio API:

Connects to the Agora audio track's MediaStream via an AnalyserNode
Uses getByteFrequencyData() to extract frequency information
Animates visual bars using requestAnimationFrame, which updates at the display refresh rate (typically 60fps)
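
The step from raw frequency data to bar heights can be sketched as a pure function. This is an illustrative reduction, not the uikit's implementation: the toBarLevels name and the simple averaging strategy are assumptions. The input matches what getByteFrequencyData() writes, a Uint8Array with values from 0 to 255.

```typescript
// Hypothetical helper: averages groups of adjacent frequency bins into
// barCount levels, each normalized to the range 0..1. Bins left over after
// an even split are ignored.
function toBarLevels(frequencyData: Uint8Array, barCount: number): number[] {
  const levels: number[] = [];
  if (barCount <= 0) return levels;
  const binsPerBar = Math.floor(frequencyData.length / barCount);
  if (binsPerBar === 0) return levels;
  for (let bar = 0; bar < barCount; bar++) {
    let sum = 0;
    for (let i = 0; i < binsPerBar; i++) {
      sum += frequencyData[bar * binsPerBar + i];
    }
    // Average the group, then scale from the byte range to 0..1.
    levels.push(sum / binsPerBar / 255);
  }
  return levels;
}
```

In the browser, the array would be sized to analyser.frequencyBinCount, filled with analyser.getByteFrequencyData(data) inside a requestAnimationFrame callback, and each level mapped to a bar height.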

Architecture

This application uses a dual-channel architecture for optimal performance:

RTC + RTM Integration

RTC (Real-Time Communication): Handles high-quality audio streaming between users and AI agents
RTM (Real-Time Messaging): Delivers transcriptions, agent state updates, metrics, and error messages
Combined Token Authentication: A single token provides secure access to both RTC and RTM services
Audio PTS Metadata: Enables precise synchronization between audio playback and transcription display

Conversation Management

agora-agent-client-toolkit: AgoraVoiceAI class managing the complete transcript lifecycle
Event-Driven Architecture: Real-time updates for transcripts, agent state changes, and system events
Turn Detection: Modern voice activity detection with configurable interrupt behavior
Resource Cleanup: Automatic cleanup of RTC, RTM, and agent resources when conversations end

Benefits

Reliable message delivery through dedicated RTM channel
Access to real-time agent metrics and error reporting
Better timing synchronization for natural conversation flow
Proper resource management preventing memory leaks
Modern API patterns following Agora best practices
