Skip to content

Latest commit

 

History

History
226 lines (143 loc) · 5.54 KB

File metadata and controls

226 lines (143 loc) · 5.54 KB

🎤 Speech-to-Text Testing Conversation

🚀 Quick Setup

  1. Make sure your backend is running: cd backend && ./start-local-dev.sh
  2. Make sure your frontend is running: cd frontend && npx expo start
  3. Open the app on your device/simulator
  4. Navigate to the chat screen
  5. Look for the microphone button (🎤) in the input area

🧪 Test Scenarios

Test 1: Basic Functionality

Goal: Verify the STT feature works end-to-end

What to say: "Hello, this is a test of the speech to text feature"

Expected result:

  • ✅ Microphone button should respond when pressed
  • ✅ Recording should start (visual feedback)
  • ✅ After speaking, text should appear in the chat input
  • ✅ You should be able to send the transcribed message

Test 2: Short Phrases

Goal: Test quick, simple commands

Test phrases:

  1. "What's the weather like?"
  2. "Tell me a joke"
  3. "Help me with coding"
  4. "Explain quantum computing"

Expected result:

  • ✅ All phrases should be accurately transcribed
  • ✅ Transcription should be fast (< 3 seconds)
  • ✅ No garbled or missing text

Test 3: Longer, Complex Sentences

Goal: Test accuracy with longer content

What to say: "I would like you to help me understand how machine learning algorithms work, specifically focusing on neural networks and their applications in natural language processing"

Expected result:

  • ✅ Should handle longer sentences well
  • ✅ Technical terms should be transcribed accurately
  • ✅ Punctuation and context should be preserved

Test 4: Numbers and Special Characters

Goal: Test transcription of numbers and technical content

What to say: "The meeting is scheduled for March 15th at 3:30 PM. The project budget is $50,000 and we need to deliver by Q2 2024"

Expected result:

  • ✅ Numbers should be transcribed correctly
  • ✅ Dates and times should be accurate
  • ✅ Currency and technical terms should work

Test 5: Background Noise Handling

Goal: Test robustness in different environments

Test setup:

  • Try recording in a quiet room first
  • Then try with some background noise (TV, music, etc.)

What to say: "Testing speech recognition with background noise"

Expected result:

  • ✅ Should still work reasonably well with moderate noise
  • ✅ May have some accuracy reduction but should be functional

Test 6: Multiple Languages (if applicable)

Goal: Test language detection

Test phrases:

  1. "Hola, ¿cómo estás?" (Spanish)
  2. "Bonjour, comment allez-vous?" (French)
  3. "Hello, how are you?" (English)

Expected result:

  • ✅ Should auto-detect language correctly
  • ✅ Non-English should be transcribed accurately
  • ✅ Language switching should work seamlessly

Test 7: Edge Cases

Goal: Test system limits and error handling

Test scenarios:

  1. Very short recording: Just say "Hi"
  2. Very long recording: Speak for 30+ seconds
  3. Silence: Record with no speech
  4. Interruption: Start recording, then stop immediately

Expected results:

  • ✅ Short recordings should work (minimum 1 second)
  • ✅ Long recordings should be handled gracefully
  • ✅ Silent recordings should give appropriate feedback
  • ✅ Interrupted recordings should not crash the app

Test 8: UI/UX Flow

Goal: Test the complete user experience

Steps to test:

  1. Tap microphone button
  2. Speak clearly
  3. Wait for transcription
  4. Review the transcribed text
  5. Edit if needed
  6. Send the message
  7. Verify the message appears in chat

Expected result:

  • ✅ Smooth, intuitive flow
  • ✅ Clear visual feedback during recording
  • ✅ Easy to edit transcribed text
  • ✅ Seamless integration with chat

🐛 Troubleshooting Common Issues

Issue: "Failed to start recording"

Solutions:

  • Check microphone permissions in your device settings
  • Make sure the app has audio recording permissions
  • Try restarting the app

Issue: "STT request failed: 404"

Solutions:

  • Verify backend is running on port 8000
  • Check that the STT service is properly configured
  • Look at backend logs for errors

Issue: Poor transcription accuracy

Solutions:

  • Speak clearly and at moderate pace
  • Reduce background noise
  • Try shorter phrases
  • Check microphone quality

Issue: No audio format supported

Solutions:

  • Verify expo-audio is properly configured for WAV output
  • Check that the backend expects WAV format
  • Look for format mismatch errors in logs

📊 Success Criteria

✅ Test Passed If:

  • All test scenarios complete without crashes
  • Transcription accuracy is >80% for clear speech
  • Response time is <5 seconds for typical phrases
  • UI provides clear feedback during recording
  • Integration with chat works seamlessly
  • Error handling works gracefully

❌ Test Failed If:

  • App crashes during recording
  • Transcription never appears
  • Severe accuracy issues (>50% errors)
  • Very slow response times (>10 seconds)
  • UI becomes unresponsive
  • Integration breaks the chat flow

🎯 Next Steps After Testing

  1. If all tests pass: The STT feature is ready for production use
  2. If some tests fail: Note which scenarios failed and we can debug them
  3. Performance issues: We can optimize the Whisper model or recording settings
  4. Accuracy issues: We can fine-tune the audio recording parameters

Happy Testing! 🎤✨

Remember: The first few tests might be slower as the Whisper model loads. Subsequent tests should be faster.