- OpenAI API Key: Set in the .env file or as a Cloudflare Worker secret
- Development Server: Run npm run dev
- Browser: Modern browser with WebSocket and Web Audio API support
- Microphone: Required only for voice chat testing
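The browser prerequisites above can be checked up front with a small helper. This is a minimal sketch, not part of the app: `BrowserGlobals` and `missingCapabilities` are hypothetical names, and the globals are passed in as an argument so the check can also run outside a browser.

```typescript
// Minimal shape of the browser globals this check inspects.
// Passing them in (instead of reading `window` directly) keeps the
// function testable outside a browser.
interface BrowserGlobals {
  WebSocket?: unknown;
  AudioContext?: unknown;
  webkitAudioContext?: unknown; // prefixed name used by older Safari
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns the names of missing capabilities. An empty array means the
// prerequisites are met. The microphone API is only checked when
// voice chat is going to be tested.
function missingCapabilities(g: BrowserGlobals, needVoice: boolean): string[] {
  const missing: string[] = [];
  if (typeof g.WebSocket !== "function") {
    missing.push("WebSocket");
  }
  const hasAudio =
    typeof g.AudioContext === "function" ||
    typeof g.webkitAudioContext === "function";
  if (!hasAudio) {
    missing.push("Web Audio API");
  }
  if (needVoice && typeof g.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("getUserMedia (microphone)");
  }
  return missing;
}
```

In a browser console, passing `window` as the first argument and `true` as the second should return an empty array when voice testing is possible.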
Objective: Verify text chat works without any voice functionality
Steps:
- Open http://localhost:8787/chat in browser
- Verify initial state shows "Ready" status
- Verify empty state message shows both text and voice options
- Type "Hello, can you help me?" in the text input
- Press Enter to send
Expected Results:
- ✅ WebSocket connects automatically (status changes to "Connected")
- ✅ User message appears in transcript immediately
- ✅ AI response appears in transcript
- ✅ Text input remains available for next message
- ✅ "Start Voice Chat" button is visible
Pass Criteria: An entire conversation can be held using only text
Objective: Verify voice chat can be initiated directly
Steps:
- Open fresh page (or refresh)
- Click "Start Voice Chat" button
- Grant microphone permission when prompted
- Say "Hello, this is a test"
- Wait for AI response
Expected Results:
- ✅ Microphone permission prompt appears
- ✅ Status changes to "Connected"
- ✅ Recording indicator appears with a pulsing dot
- ✅ Speech is transcribed and appears in transcript
- ✅ AI responds with voice (audio plays)
- ✅ AI response transcription appears in transcript
- ✅ Button changes to "Stop Voice"
Pass Criteria: Voice conversation works end-to-end
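When voice audio plays but transcription fails, the sample format sent upstream is a common suspect. The Web Audio API delivers samples as floats in the range -1 to 1, and 16-bit PCM is one of the input formats the OpenAI Realtime API accepts. Whether this app uses that format is an assumption, and `floatTo16BitPCM` is a hypothetical helper name; the sketch only shows the conversion itself.

```typescript
// Convert Web Audio samples (floats nominally in [-1, 1]) to signed
// 16-bit PCM samples.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first: microphone input can overshoot the nominal range.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // The negative and positive halves scale differently because a
    // 16-bit integer spans -32768..32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```

Comparing a few converted samples against the captured floats is a quick way to rule out clipping or scaling errors before looking at the network layer.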
Objective: Verify smooth transition from text to voice in same session
Steps:
- Start with text chat (send 2-3 messages)
- Click "Start Voice Chat" button
- Grant microphone permission
- Speak a message
- Wait for AI voice response
Expected Results:
- ✅ Previous text messages remain in transcript
- ✅ Voice mode activates without interrupting conversation
- ✅ WebSocket connection maintained (not reconnected)
- ✅ Voice transcription appears after previous text messages
- ✅ Conversation context is preserved
Pass Criteria: Seamless transition with full context retention
Objective: Verify the user can return to text chat after using voice
Steps:
- Start voice chat and have 1-2 voice exchanges
- Click "Stop Voice" button
- Type a text message
- Send the message
Expected Results:
- ✅ Microphone is released (indicator disappears)
- ✅ "Start Voice Chat" button reappears
- ✅ WebSocket remains connected
- ✅ Text message sends successfully
- ✅ AI responds with text only
- ✅ Previous voice messages remain in transcript
Pass Criteria: Can continue conversation with text after stopping voice
Objective: Verify multiple switches between text and voice
Steps:
- Start with text (send 1 message)
- Start voice chat (say 1 thing)
- Stop voice (send 1 text message)
- Start voice again (say 1 thing)
- Stop voice (send 1 text message)
Expected Results:
- ✅ Each transition works smoothly
- ✅ Full conversation history maintained
- ✅ No connection interruptions
- ✅ Resources properly acquired/released
- ✅ AI maintains conversation context throughout
Pass Criteria: Can switch modes multiple times without issues
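The behaviour the mode-switching scenarios expect can be summarised as a small in-memory model: one connection per session, a microphone that is acquired and released on demand, and a transcript that only grows. This is a sketch for reasoning about the expected results, not the app's code; `SessionModel` and all of its members are hypothetical.

```typescript
type Role = "user" | "assistant";
type Mode = "text" | "voice";

interface TranscriptEntry {
  role: Role;
  mode: Mode;
  text: string;
}

// Hypothetical model of one chat session.
class SessionModel {
  status: "Ready" | "Connected" = "Ready";
  micActive = false;
  connectionCount = 0; // how many times a socket was opened
  transcript: TranscriptEntry[] = [];

  // Opens a connection only if none exists yet.
  private ensureConnected(): void {
    if (this.status === "Ready") {
      this.status = "Connected";
      this.connectionCount++;
    }
  }

  sendText(text: string): void {
    this.ensureConnected();
    this.transcript.push({ role: "user", mode: "text", text });
  }

  startVoice(): void {
    this.ensureConnected(); // reuses the socket if one is already open
    this.micActive = true;
  }

  speak(text: string): void {
    if (!this.micActive) {
      throw new Error("voice mode is not active");
    }
    this.transcript.push({ role: "user", mode: "voice", text });
  }

  stopVoice(): void {
    this.micActive = false; // releases the microphone, keeps the socket
  }

  // Label the voice button is expected to show in the current state.
  get voiceButtonLabel(): string {
    return this.micActive ? "Stop Voice" : "Start Voice Chat";
  }
}
```

Replaying the five steps of the multiple-switch scenario against this model leaves `connectionCount` at 1, `micActive` false, and five user entries in `transcript`, which mirrors what the tester should observe in the UI.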
Objective: Verify complete disconnection
Steps:
- Start any type of chat (text or voice)
- Have a conversation with multiple messages
- Click "End Session" button
- Try to send a new message
Expected Results:
- ✅ Status changes to "Ready"
- ✅ Transcript is cleared
- ✅ All resources released
- ✅ New message triggers fresh connection
- ✅ Previous conversation is lost (as expected)
Pass Criteria: Clean disconnection and fresh start capability
Objective: Verify keyboard interactions work correctly
Steps:
- Open chat interface
- Type a message
- Press Enter (should send)
- Type another message
- Press Shift+Enter (should create new line)
- Press Enter again (should send the message, including the line break)
Expected Results:
- ✅ Enter sends message
- ✅ Shift+Enter creates new line
- ✅ Multi-line messages send correctly
- ✅ Text input clears after send
- ✅ Focus remains on input after send
Pass Criteria: Keyboard shortcuts work as documented
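The keyboard rules being tested reduce to a small decision function. The sketch below assumes the input handler looks at `key`, `shiftKey`, and `isComposing` (all real `KeyboardEvent` properties); `keyAction` is a hypothetical name, and ignoring blank drafts and IME composition are assumptions rather than documented behaviour.

```typescript
// Subset of KeyboardEvent fields the decision depends on.
interface KeyInput {
  key: string;
  shiftKey: boolean;
  isComposing?: boolean;
}

type KeyAction = "send" | "newline" | "none";

// Decide what a key press in the text input should do.
function keyAction(e: KeyInput, draft: string): KeyAction {
  if (e.key !== "Enter") return "none";
  // During IME composition, Enter confirms the candidate text
  // instead of submitting (assumed behaviour).
  if (e.isComposing) return "none";
  if (e.shiftKey) return "newline";
  // Assumed: a draft containing only whitespace is not sent.
  return draft.trim().length > 0 ? "send" : "none";
}
```

A draft that already contains a line break, such as `"line 1\nline 2"`, still yields `"send"` on a plain Enter, which is the multi-line case in the steps above.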
Objective: Verify graceful handling of denied microphone access
Steps:
- Open chat interface
- Click "Start Voice Chat"
- Deny microphone permission
- Try to send a text message
Expected Results:
- ✅ Error message appears explaining microphone issue
- ✅ Can still use text chat
- ✅ Can try voice again later
- ✅ No application crash
Pass Criteria: Graceful fallback to text-only mode
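When permission is denied, `navigator.mediaDevices.getUserMedia` rejects with a `DOMException` whose `name` identifies the cause. A minimal sketch of turning that name into the error message the tester should see follows; `micErrorMessage` and the message wording are hypothetical, not taken from the app.

```typescript
// Map the `name` of a getUserMedia rejection to a user-facing message.
// Every branch keeps text chat available, matching the fallback the
// scenario expects.
function micErrorMessage(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError": // permission denied by the user or by policy
      return "Microphone access was denied. You can keep chatting by text and retry voice later.";
    case "NotFoundError": // no audio input device present
      return "No microphone was found. You can keep chatting by text.";
    case "NotReadableError": // device exists but could not be opened
      return "The microphone could not be read (it may be in use). You can keep chatting by text.";
    default:
      return "Voice chat could not start. You can keep chatting by text.";
  }
}
```

In the app, the function would typically be called from the `catch` block around the `getUserMedia({ audio: true })` call, with the caught error's `name` as the argument.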
Objective: Verify handling of connection interruptions
Steps:
- Start text chat
- Disable network (simulate disconnect)
- Try to send a message
- Re-enable network
- Try to send another message
Expected Results:
- ✅ Error message shows connection issue
- ✅ Status updates to reflect disconnection
- ✅ Reconnection attempt on next message
- ✅ Connection restored successfully
Pass Criteria: Handles disconnection gracefully and can recover
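The expected recovery behaviour, where a reconnection is attempted on the next message rather than in the background, can be sketched as follows. `LazyConnection` and `SocketLike` are hypothetical names, and connecting is modelled as synchronous for brevity; a real WebSocket opens asynchronously.

```typescript
type ConnStatus = "Ready" | "Connected" | "Disconnected";

// Minimal stand-in for a socket, so the logic can run without a network.
interface SocketLike {
  isOpen(): boolean;
  send(data: string): void;
}

class LazyConnection {
  status: ConnStatus = "Ready";
  lastError: string | null = null;
  private socket: SocketLike | null = null;
  private connect: () => SocketLike;

  constructor(connect: () => SocketLike) {
    this.connect = connect;
  }

  // Returns true if the message was handed to an open socket.
  send(message: string): boolean {
    try {
      let socket = this.socket;
      if (socket === null || !socket.isOpen()) {
        socket = this.connect(); // (re)connect attempt on this message
        this.socket = socket;
      }
      socket.send(message);
      this.status = "Connected";
      this.lastError = null;
      return true;
    } catch (err) {
      // Surface the failure so the UI can show an error and update status.
      this.socket = null;
      this.status = "Disconnected";
      this.lastError = err instanceof Error ? err.message : String(err);
      return false;
    }
  }
}
```

With a `connect` function that throws while the network is down, the sequence send, go offline, send, go online, send produces success, failure with `status` set to `"Disconnected"`, then success on a second connection, which matches the expected results above.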
Objective: Verify proper handling of interruptions
Steps:
- Start voice chat
- Speak a question that triggers long AI response
- Interrupt by speaking again mid-response
- Verify AI stops current response and processes new input
Expected Results:
- ✅ AI stops speaking when interrupted
- ✅ Audio queue is cleared
- ✅ New input is processed
- ✅ Transcript shows both the partial (interrupted) response and the new response
- ✅ No audio overlap or glitches
Pass Criteria: Interruptions handled cleanly with proper cancellation
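The cancellation behaviour checked here can be modelled as a playback queue that is emptied as soon as the user starts speaking, while the partial response stays in the transcript. This is a sketch under assumptions: `PlaybackModel` and its method names are hypothetical, and audio chunks are represented by string ids only.

```typescript
interface AssistantTurn {
  text: string;
  interrupted: boolean;
}

class PlaybackModel {
  queue: string[] = []; // ids of audio chunks waiting to be played
  turns: AssistantTurn[] = []; // assistant responses shown in the transcript
  private current: AssistantTurn | null = null;

  // Called for each streamed piece of an assistant response.
  onAssistantDelta(chunkId: string, textDelta: string): void {
    let turn = this.current;
    if (turn === null) {
      turn = { text: "", interrupted: false };
      this.current = turn;
      this.turns.push(turn);
    }
    turn.text += textDelta;
    this.queue.push(chunkId);
  }

  // Called when a chunk has finished playing.
  onChunkPlayed(): void {
    this.queue.shift();
  }

  // Called when the response completes normally.
  onAssistantDone(): void {
    this.current = null;
  }

  // Called when the user starts speaking mid-response.
  onUserSpeechStarted(): void {
    this.queue.length = 0; // drop audio that has not played yet
    if (this.current !== null) {
      this.current.interrupted = true; // keep the partial text, flag it
      this.current = null;
    }
  }
}
```

After an interruption the queue is empty, so audio from the old response cannot overlap with the new one, and `turns` holds the flagged partial response followed by the new response.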
Test in the following browsers:
- Chrome/Edge (latest)
- Firefox (latest)
- Safari (latest)
- Mobile Chrome (Android)
- Mobile Safari (iOS)
Monitor the following during extended use:
- Memory usage stays stable
- No memory leaks after multiple sessions
- Audio playback remains smooth
- WebSocket connection stable over time
- UI remains responsive during audio streaming
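"Memory usage stays stable" is easier to judge with a number. One option, sketched below, is to record the heap size after each session (for example from DevTools heap snapshots) and fit a straight line through the samples. `heapSlope` and `looksLikeLeak` are hypothetical helpers, and the growth threshold is the tester's choice.

```typescript
// Least-squares slope of heap size against sample index.
// The result is in the same unit as the samples, per sample.
function heapSlope(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return 0; // a trend needs at least two points
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * ((samples[i] ?? 0) - meanY);
    den += (i - meanX) ** 2;
  }
  return num / den;
}

// Flags a run whose average growth per sample exceeds the threshold.
function looksLikeLeak(samples: number[], maxGrowthPerSample: number): boolean {
  return heapSlope(samples) > maxGrowthPerSample;
}
```

Samples that fluctuate around a constant give a slope near zero, while steady growth from session to session gives a clearly positive slope.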
Check the following for accessibility:
- Keyboard navigation works throughout
- Screen readers can access transcript
- Status indicators are announced
- Error messages are clear and helpful
- All interactive elements have proper labels
Known limitations:
- Browser Support: Requires modern browser with WebSocket and Web Audio API
- Microphone Access: Voice features require user permission
- API Costs: OpenAI Realtime API usage incurs costs per session
- Network Requirements: Requires stable internet for real-time streaming
- Context Window: Long conversations may exceed context limits
When reporting issues, include:
- Browser and version
- Steps to reproduce
- Expected vs actual behavior
- Console errors (if any)
- Network conditions
- Test scenario being performed
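The checklist above maps directly onto a small record type, which helps keep reports consistent. This is only a sketch; `IssueReport` and `formatIssue` are hypothetical names, and the output layout is one possible choice.

```typescript
// One field per item in the reporting checklist.
interface IssueReport {
  browser: string; // name and version
  scenario: string; // test scenario being performed
  steps: string[]; // steps to reproduce, in order
  expected: string;
  actual: string;
  consoleErrors: string[]; // empty if none
  network: string; // network conditions
}

// Render a report as plain text for pasting into an issue tracker.
function formatIssue(r: IssueReport): string {
  const errors = r.consoleErrors.length > 0 ? r.consoleErrors.join("; ") : "none";
  const lines = [
    `Browser: ${r.browser}`,
    `Scenario: ${r.scenario}`,
    "Steps to reproduce:",
    ...r.steps.map((s, i) => `  ${i + 1}. ${s}`),
    `Expected: ${r.expected}`,
    `Actual: ${r.actual}`,
    `Console errors: ${errors}`,
    `Network conditions: ${r.network}`,
  ];
  return lines.join("\n");
}
```

Because every field is required by the type, a report that omits a checklist item fails type checking instead of being filed incomplete.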