- Make sure your backend is running:
  cd backend && ./start-local-dev.sh
- Make sure your frontend is running:
  cd frontend && npx expo start
- Open the app on your device/simulator
- Navigate to the chat screen
- Look for the microphone button (🎤) in the input area
Goal: Verify the STT feature works end-to-end
What to say: "Hello, this is a test of the speech to text feature"
Expected result:
- ✅ Microphone button should respond when pressed
- ✅ Recording should start (visual feedback)
- ✅ After speaking, text should appear in the chat input
- ✅ You should be able to send the transcribed message
Goal: Test quick, simple commands
Test phrases:
- "What's the weather like?"
- "Tell me a joke"
- "Help me with coding"
- "Explain quantum computing"
Expected result:
- ✅ All phrases should be accurately transcribed
- ✅ Transcription should be fast (< 3 seconds)
- ✅ No garbled or missing text
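To check the speed expectation with a number rather than by feel, you can time the transcription call. This is a minimal sketch: `timed_transcription` and `fake_transcribe` are hypothetical names, and the stand-in function should be replaced with whatever actually calls your STT backend.

```python
import time
from typing import Callable, Tuple

SPEED_BUDGET_S = 3.0  # the "< 3 seconds" expectation for short phrases


def timed_transcription(
    transcribe: Callable[[bytes], str], audio: bytes
) -> Tuple[str, float]:
    """Run a transcription callable and return (text, elapsed_seconds)."""
    start = time.perf_counter()
    text = transcribe(audio)
    elapsed = time.perf_counter() - start
    return text, elapsed


def fake_transcribe(audio: bytes) -> str:
    # Stand-in for the real STT call (assumption); swap in your backend request.
    return "tell me a joke"


text, elapsed = timed_transcription(fake_transcribe, b"\x00" * 16)
print(f"{text!r} took {elapsed:.3f}s, within budget: {elapsed < SPEED_BUDGET_S}")
```

Run it a few times and ignore the first measurement, since the model may still be loading on the first request.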
Goal: Test accuracy with longer content
What to say: "I would like you to help me understand how machine learning algorithms work, specifically focusing on neural networks and their applications in natural language processing"
Expected result:
- ✅ Should handle longer sentences well
- ✅ Technical terms should be transcribed accurately
- ✅ Punctuation and context should be preserved
Goal: Test transcription of numbers and technical content
What to say: "The meeting is scheduled for March 15th at 3:30 PM. The project budget is $50,000 and we need to deliver by Q2 2024"
Expected result:
- ✅ Numbers should be transcribed correctly
- ✅ Dates and times should be accurate
- ✅ Currency amounts and technical terms (such as Q2) should be transcribed correctly
Goal: Test robustness in different environments
Test setup:
- Try recording in a quiet room first
- Then try with some background noise (TV, music, etc.)
What to say: "Testing speech recognition with background noise"
Expected result:
- ✅ Should still work reasonably well with moderate noise
- ✅ Accuracy may drop somewhat, but transcription should remain usable
Goal: Test language detection
Test phrases:
- "Hola, ¿cómo estás?" (Spanish)
- "Bonjour, comment allez-vous?" (French)
- "Hello, how are you?" (English)
Expected result:
- ✅ Should auto-detect language correctly
- ✅ Non-English should be transcribed accurately
- ✅ Language switching should work seamlessly
Goal: Test system limits and error handling
Test scenarios:
- Very short recording: Just say "Hi"
- Very long recording: Speak for 30+ seconds
- Silence: Record with no speech
- Interruption: Start recording, then stop immediately
Expected results:
- ✅ Short recordings should work (minimum 1 second)
- ✅ Long recordings should be handled gracefully
- ✅ Silent recordings should give appropriate feedback
- ✅ Interrupted recordings should not crash the app
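The edge cases above can also be checked offline against raw audio samples. The sketch below assumes 16-bit PCM samples and uses a hypothetical `classify_recording` helper; the silence threshold is an assumed value, not something taken from the app, so tune it for your microphone.

```python
import math
from typing import Sequence

MIN_DURATION_S = 1.0   # from the "minimum 1 second" expectation above
SILENCE_RMS = 500.0    # assumed threshold for 16-bit PCM; tune as needed


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of the samples (0.0 for an empty clip)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def classify_recording(samples: Sequence[int], sample_rate: int) -> str:
    """Label a clip as 'too_short', 'silent', or 'ok'."""
    duration = len(samples) / sample_rate
    if duration < MIN_DURATION_S:
        return "too_short"
    if rms(samples) < SILENCE_RMS:
        return "silent"
    return "ok"


rate = 16000
one_second_of_silence = [0] * rate
one_second_tone = [
    int(10000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(rate)
]
print(classify_recording(one_second_of_silence, rate))
print(classify_recording(one_second_tone, rate))
print(classify_recording([0] * 100, rate))
```

An interrupted recording will usually fall into the `too_short` bucket, which is the case where the app should show feedback instead of crashing.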
Goal: Test the complete user experience
Steps to test:
- Tap microphone button
- Speak clearly
- Wait for transcription
- Review the transcribed text
- Edit if needed
- Send the message
- Verify the message appears in chat
Expected result:
- ✅ Smooth, intuitive flow
- ✅ Clear visual feedback during recording
- ✅ Easy to edit transcribed text
- ✅ Seamless integration with chat
Solutions:
- Check microphone permissions in your device settings
- Make sure the app has audio recording permissions
- Try restarting the app
Solutions:
- Verify backend is running on port 8000
- Check that the STT service is properly configured
- Look at backend logs for errors
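A quick way to confirm that something is listening on port 8000 is a plain TCP connection attempt. This is a sketch with a hypothetical `is_port_open` helper. It assumes the backend runs on the same machine (`127.0.0.1`), and it only shows that the port accepts connections, not that the STT service behind it is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("backend reachable:", is_port_open("127.0.0.1", 8000))
```

If you test on a physical device, replace `127.0.0.1` with your computer's LAN address, since the phone cannot reach your machine through localhost.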
Solutions:
- Speak clearly and at moderate pace
- Reduce background noise
- Try shorter phrases
- Check microphone quality
Solutions:
- Verify expo-audio is properly configured for WAV output
- Check that the backend expects WAV format
- Look for format mismatch errors in logs
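If you can get hold of a recorded file (for example by saving an upload on the backend), Python's standard `wave` module can tell you whether it really is a WAV file and what its parameters are. This is a sketch: `describe_wav` and `looks_like_wav` are hypothetical helper names, and `recording.wav` is a placeholder path.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the WAV header and return its basic parameters."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def looks_like_wav(path: str) -> bool:
    """Return False when the file cannot be parsed as WAV."""
    try:
        describe_wav(path)
        return True
    except (wave.Error, EOFError):
        # wave.Error: wrong or unsupported header; EOFError: truncated file.
        return False


# Example (placeholder path):
# print(describe_wav("recording.wav"))
```

A file that fails this check while carrying a `.wav` extension is a strong hint that the recorder is producing a different container than the backend expects.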
✅ Test Passed If:
- All test scenarios complete without crashes
- Transcription accuracy is >80% for clear speech
- Response time is <5 seconds for typical phrases
- UI provides clear feedback during recording
- Integration with chat works seamlessly
- Error handling works gracefully
❌ Test Failed If:
- App crashes during recording
- Transcription never appears
- Severe accuracy issues (>50% errors)
- Very slow response times (>10 seconds)
- UI becomes unresponsive
- Integration breaks the chat flow
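The accuracy thresholds above are easier to apply with a concrete measure. One common choice is word error rate (WER): the word-level edit distance between what you said and what was transcribed, divided by the number of words you said. The sketch below treats accuracy as `1 - WER`, which is an assumption about how the percentages in this guide are meant; `word_error_rate` and `normalize` are hypothetical helper names.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation (keeping apostrophes), split into words."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


spoken = "Hello, this is a test of the speech to text feature"
transcribed = "hello this is a test of speech to text"
wer = word_error_rate(spoken, transcribed)
print(f"WER: {wer:.2%}, accuracy: {max(0.0, 1 - wer):.2%}")
```

Note that number-heavy phrases can score badly even when the transcript is acceptable (for example "$50,000" versus "fifty thousand dollars"), so read those results by eye as well.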
- If all tests pass: The STT feature is ready for production use
- If some tests fail: Note which scenarios failed and we can debug them
- Performance issues: We can optimize the Whisper model or recording settings
- Accuracy issues: We can fine-tune the audio recording parameters
Happy Testing! 🎤✨
Remember: The first few tests might be slower as the Whisper model loads. Subsequent tests should be faster.