Allow voice input for future prompts / output speech explaining what was done#34

Open
simonzheng wants to merge 1 commit into main from pr34

@simonzheng (Collaborator)

Summary:
Allow voice input for future prompts / output speech explaining what was done

# Cursor Agent Prompt: Add Voice Input to Claude Code Go

## Objective
Add a voice input feature using Expo Speech Recognition to the Claude Code Go app, placing a speech button next to the existing text input for prompt submission.

## Background
Claude Code Go is an Expo React Native app that allows users to browse directories and run prompts in a git repository, leveraging Claude Code for code generation. Currently, the app only supports text input for prompts, but we want to add voice dictation as an alternative input method.

## Requirements
1. Add a microphone button next to the existing text input field
2. Implement speech-to-text functionality using Expo Speech Recognition
3. Show visual feedback during recording
4. Allow users to cancel recording
5. Maintain compatibility with the existing text input workflow

## Technical Implementation

### 1. Install Dependencies
First, install the Expo Speech Recognition package. It includes native code, so the app needs a development build rather than Expo Go:

```bash
npx expo install expo-speech-recognition
```

### 2. Add Permissions
Update the app.json file to request microphone permissions:

```json
{
  "expo": {
    "plugins": [
      [
        "expo-speech-recognition",
        {
          "microphonePermission": "Allow $(PRODUCT_NAME) to access your microphone"
        }
      ]
    ],
    "android": {
      "permissions": ["RECORD_AUDIO"]
    }
  }
}
```
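
Declaring the permission only makes the system prompt possible; the app still has to react to the user's answer at runtime. The sketch below is one way to turn an Expo-style permission response (an object with `granted` and `canAskAgain` fields) into a UI decision. The helper name `describePermission`, the `action` values, and the message strings are hypothetical and not part of any library.

```javascript
// Hypothetical helper: map a permission response to what the UI should do.
// Assumes the response has boolean `granted` and `canAskAgain` fields.
function describePermission({ granted, canAskAgain }) {
  if (granted) {
    return { canRecord: true, action: null, message: null };
  }
  if (canAskAgain) {
    // The system dialog can still be shown, so offer to ask again.
    return {
      canRecord: false,
      action: 'request',
      message: 'Microphone access is needed for voice input.',
    };
  }
  // The user denied permanently; only the system settings can change that.
  return {
    canRecord: false,
    action: 'open-settings',
    message: 'Enable microphone access in system settings to use voice input.',
  };
}
```

The component could show `message` next to the text input and keep the text workflow fully usable in every case.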

### 3. Component Implementation
Find the component file where your text input exists and implement the speech button with the following features:

- A microphone button that toggles speech recognition
- Visual feedback during recording
- Error handling
- Integration with the existing text input

### 4. Code Implementation
Create a custom hook for speech recognition:

```jsx
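// hooks/useSpeechRecognition.js
// Uses the module and event-hook exports documented by expo-speech-recognition;
// verify the names against the version you install.
import { useState, useEffect, useRef } from 'react';
import {
  ExpoSpeechRecognitionModule,
  useSpeechRecognitionEvent,
} from 'expo-speech-recognition';

export const useSpeechRecognition = () => {
  const [isListening, setIsListening] = useState(false);
  const [speechText, setSpeechText] = useState('');
  const [hasPermission, setHasPermission] = useState(false);
  const [error, setError] = useState(null);
  // Mirrors isListening so the unmount cleanup does not read a stale closure.
  const isListeningRef = useRef(false);

  const updateListening = (value) => {
    isListeningRef.current = value;
    setIsListening(value);
  };

  // Follow the recognizer's own lifecycle events instead of assuming it started.
  useSpeechRecognitionEvent('start', () => updateListening(true));
  useSpeechRecognitionEvent('end', () => updateListening(false));

  useSpeechRecognitionEvent('result', (event) => {
    const transcript = event.results[0]?.transcript;
    if (transcript) {
      setSpeechText(transcript);
    }
  });

  useSpeechRecognitionEvent('error', (event) => {
    setError(event.message || 'Speech recognition failed');
    updateListening(false);
  });

  useEffect(() => {
    const checkPermissions = async () => {
      try {
        const { granted } = await ExpoSpeechRecognitionModule.requestPermissionsAsync();
        setHasPermission(granted);
      } catch (err) {
        setError('Permission check failed');
        console.error(err);
      }
    };

    checkPermissions();
    return () => {
      // Discard any session still running when the component unmounts.
      if (isListeningRef.current) {
        ExpoSpeechRecognitionModule.abort();
      }
    };
  }, []);

  const startListening = async () => {
    try {
      if (!hasPermission) {
        const { granted } = await ExpoSpeechRecognitionModule.requestPermissionsAsync();
        if (!granted) {
          setError('Microphone permission denied');
          return;
        }
        setHasPermission(true);
      }

      setSpeechText('');
      setError(null);

      // With interimResults off, each result event carries a final transcript.
      ExpoSpeechRecognitionModule.start({
        lang: 'en-US',
        interimResults: false,
      });
    } catch (err) {
      setError('Failed to start speech recognition');
      console.error(err);
    }
  };

  const stopListening = async () => {
    try {
      // The 'end' event resets isListening once the recognizer has stopped.
      ExpoSpeechRecognitionModule.stop();
    } catch (err) {
      console.error(err);
    }
  };

  const toggleListening = async () => {
    if (isListening) {
      await stopListening();
    } else {
      await startListening();
    }
  };

  return {
    isListening,
    speechText,
    hasPermission,
    error,
    startListening,
    stopListening,
    toggleListening,
  };
};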
```

Modify your prompt input component to include the voice button:

```jsx
// components/PromptInput.jsx
import React, { useState, useEffect } from 'react';
import { View, TextInput, TouchableOpacity, StyleSheet, ActivityIndicator, Text } from 'react-native';
import { FontAwesome } from '@expo/vector-icons';
import { useSpeechRecognition } from '../hooks/useSpeechRecognition';

export const PromptInput = ({ onSubmit }) => {
  const [prompt, setPrompt] = useState('');
  const {
    isListening,
    speechText,
    hasPermission,
    error,
    toggleListening,
    stopListening,
  } = useSpeechRecognition();

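  useEffect(() => {
    if (speechText) {
      // Append the transcript, separated by a space when the field already has text.
      setPrompt(prev => (prev ? `${prev.trimEnd()} ${speechText}` : speechText));
    }
  }, [speechText]);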

  const handleSubmit = () => {
    if (prompt.trim()) {
      onSubmit(prompt.trim());
      setPrompt('');
    }
  };

  return (
    <View style={styles.container}>
      <View style={styles.inputContainer}>
        <TextInput
          style={styles.input}
          value={prompt}
          onChangeText={setPrompt}
          placeholder="Enter your prompt..."
          multiline
        />
        <View style={styles.buttons}>
          <TouchableOpacity
            style={[styles.iconButton, isListening && styles.recording]}
            onPress={toggleListening}
          >
            {/* Slashed icon only while permission is missing; tapping re-requests it */}
            <FontAwesome
              name={hasPermission ? 'microphone' : 'microphone-slash'}
              size={24}
              color={isListening ? '#ff4f4f' : '#333'}
            />
          </TouchableOpacity>

          <TouchableOpacity
            style={styles.sendButton}
            onPress={handleSubmit}
            disabled={!prompt.trim()}
          >
            <FontAwesome name="send" size={20} color="#fff" />
          </TouchableOpacity>
        </View>
      </View>

      {isListening && (
        <View style={styles.listeningIndicator}>
          <ActivityIndicator size="small" color="#ff4f4f" />
          <Text style={styles.listeningText}>Listening...</Text>
          <TouchableOpacity onPress={stopListening}>
            <Text style={styles.cancelText}>Cancel</Text>
          </TouchableOpacity>
        </View>
      )}

      {error && <Text style={styles.errorText}>{error}</Text>}
    </View>
  );
};

const styles = StyleSheet.create({
  container: {
    padding: 12,
    borderTopWidth: 1,
    borderTopColor: '#e0e0e0',
    backgroundColor: '#fff',
  },
  inputContainer: {
    flexDirection: 'row',
    alignItems: 'flex-end',
  },
  input: {
    flex: 1,
    borderWidth: 1,
    borderColor: '#ddd',
    borderRadius: 8,
    padding: 10,
    maxHeight: 120,
    fontSize: 16,
  },
  buttons: {
    flexDirection: 'row',
    marginLeft: 8,
    alignItems: 'center',
  },
  iconButton: {
    padding: 10,
    borderRadius: 20,
    marginRight: 8,
    backgroundColor: '#f0f0f0',
  },
  recording: {
    backgroundColor: '#ffe0e0',
  },
  sendButton: {
    backgroundColor: '#007BFF',
    borderRadius: 20,
    padding: 10,
    alignItems: 'center',
    justifyContent: 'center',
  },
  listeningIndicator: {
    flexDirection: 'row',
    alignItems: 'center',
    padding: 8,
    marginTop: 8,
    backgroundColor: '#f8f8f8',
    borderRadius: 8,
  },
  listeningText: {
    marginLeft: 8,
    color: '#333',
    flex: 1,
  },
  cancelText: {
    color: '#ff4f4f',
    fontWeight: 'bold',
  },
  errorText: {
    color: '#ff4f4f',
    marginTop: 8,
  },
});
```

### 5. Register the Component
Update your main screen or layout to use the enhanced PromptInput component:

```jsx
// screens/PromptScreen.jsx
import React from 'react';
import { View, StyleSheet } from 'react-native';
import { PromptInput } from '../components/PromptInput';

export const PromptScreen = () => {
  const handleSubmit = async (prompt) => {
    // Your existing logic to handle the prompt submission
    console.log('Processing prompt:', prompt);
    // Call your API or Claude Code here
  };

  return (
    <View style={styles.container}>
      {/* Other components */}
      <PromptInput onSubmit={handleSubmit} />
    </View>
  );
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#fff',
  },
});
```

## Testing
1. Test on both iOS and Android devices
2. Verify permissions work correctly
3. Test both text input and voice input
4. Ensure voice input properly populates the text field
5. Confirm error handling works as expected
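
Some of these checks can run without a device if the listening state is modeled as a pure function. The sketch below is an assumption about how that could look, not the code in this PR: `speechReducer`, `initialSpeechState`, and the event shapes (`start`, `result`, `error`, `end`) are hypothetical names chosen for illustration.

```javascript
// Hypothetical state model for voice input, testable without a microphone.
const initialSpeechState = { isListening: false, speechText: '', error: null };

function speechReducer(state, event) {
  switch (event.type) {
    case 'start':
      // A new session clears the previous transcript and error.
      return { isListening: true, speechText: '', error: null };
    case 'result':
      return { ...state, speechText: event.transcript };
    case 'error':
      // Errors end the session but keep whatever was already transcribed.
      return { ...state, isListening: false, error: event.message };
    case 'end':
      return { ...state, isListening: false };
    default:
      return state;
  }
}

// Replay a scripted session and inspect the final state.
const session = [
  { type: 'start' },
  { type: 'result', transcript: 'add a login screen' },
  { type: 'end' },
];
const finalState = session.reduce(speechReducer, initialSpeechState);
console.log(finalState);
```

Scripted sessions like this cover the "voice input populates the text field" and "error handling" items; permission prompts and real audio still need testing on iOS and Android hardware.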

## Additional Considerations
- Add loading indicators during speech processing
- Consider implementing a time limit for voice input
- Add sound effects or haptic feedback when starting/stopping recording
- Consider adding a voice level indicator during recording
- Implement a way to handle different languages
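
For the language item, one option is to derive the recognition locale from the device locale and fall back to `en-US`, the value the hook currently hardcodes. The sketch below assumes the list of supported locale tags is obtained elsewhere; `resolveSpeechLocale` and its parameters are hypothetical.

```javascript
// Hypothetical helper: choose a recognition locale from the device locale.
// `supported` is an assumed list of BCP 47 tags the recognizer accepts.
function resolveSpeechLocale(deviceLocale, supported, fallback = 'en-US') {
  if (!deviceLocale) return fallback;
  // Normalize "fr_CA" style tags to "fr-ca" for comparison.
  const wanted = deviceLocale.replace('_', '-').toLowerCase();

  // 1. Exact match, e.g. "fr-CA" -> "fr-CA".
  const exact = supported.find((tag) => tag.toLowerCase() === wanted);
  if (exact) return exact;

  // 2. Same base language, e.g. "fr-CH" -> "fr-FR".
  const language = wanted.split('-')[0];
  const sameLanguage = supported.find(
    (tag) => tag.toLowerCase().split('-')[0] === language
  );
  return sameLanguage || fallback;
}
```

The returned tag would replace the hardcoded language value passed when starting recognition.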

## Edge Cases
- Handle microphone permission denials gracefully
- Consider offline functionality
- Handle long speech inputs that might exceed text limits
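
For long speech inputs, a small pure helper can merge each transcript into the prompt and enforce a cap. This is a minimal sketch under assumptions: `appendTranscript` is a hypothetical name, and the default limit of 2000 characters is a placeholder, since the PR does not state the actual text limit.

```javascript
// Hypothetical helper: merge a recognized transcript into the prompt text,
// capping the result at maxLength characters (the default is an assumed value).
function appendTranscript(prompt, transcript, maxLength = 2000) {
  const addition = transcript.trim();
  if (!addition) return prompt;

  const base = prompt.trimEnd();
  // Never truncate text the user already has; just refuse to add more.
  if (base.length >= maxLength) return prompt;

  const merged = base ? `${base} ${addition}` : addition;
  if (merged.length <= maxLength) return merged;

  // Cut at the limit, then back up to the last whole word when there is one.
  const cut = merged.slice(0, maxLength);
  const lastSpace = cut.lastIndexOf(' ');
  return lastSpace > base.length ? cut.slice(0, lastSpace) : cut;
}
```

In the prompt input component this could replace the inline string concatenation, for example `setPrompt(prev => appendTranscript(prev, speechText))`.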



Test Plan: