// Use this only if you need real-time VAD-detected PCM float frames.
}
})
}
@@ -99,10 +105,12 @@ class MainActivity : AppCompatActivity() {
```

### **2. Understanding `setVADCallback`**

`setVADCallback` registers a callback that is notified when voice activity starts, continues, or ends.

- `onVoiceStart()`: Triggered when voice is detected.
- `onVoiceEnd(wavData: ByteArray?)`: Triggered when voice stops, providing a WAV file as a byte array.
- `onVoiceDidContinue(pcmFloatData: ByteArray?)`: Triggered during speech, providing real-time PCM float frames. Use this only if you need real-time audio data while speech is in progress.

This enables real-time processing of voice input, allowing applications to act on detected speech events.
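
If you consume `onVoiceDidContinue`, the `ByteArray` usually has to be reinterpreted as float samples before it can be analyzed. The following is a minimal sketch, not part of the library: `pcmBytesToFloats` and `peakAmplitude` are hypothetical helper names, and the layout (32-bit IEEE 754 floats, little-endian) is an assumption that should be checked against the data the callback actually delivers.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import kotlin.math.abs

// Hypothetical helper (not a library API): reinterpret raw bytes as 32-bit float samples.
// Assumption: samples are little-endian; pass a different ByteOrder if that does not hold.
fun pcmBytesToFloats(
    pcmFloatData: ByteArray?,
    order: ByteOrder = ByteOrder.LITTLE_ENDIAN
): FloatArray {
    if (pcmFloatData == null) return FloatArray(0)
    // Integer division drops any trailing partial sample.
    val sampleCount = pcmFloatData.size / Float.SIZE_BYTES
    val samples = FloatArray(sampleCount)
    ByteBuffer.wrap(pcmFloatData, 0, sampleCount * Float.SIZE_BYTES)
        .order(order)
        .asFloatBuffer()
        .get(samples)
    return samples
}

// Hypothetical helper: largest absolute sample value, e.g. for a simple level meter.
fun peakAmplitude(samples: FloatArray): Float =
    samples.maxOfOrNull { abs(it) } ?: 0f
```

Inside the callback, something like `peakAmplitude(pcmBytesToFloats(pcmFloatData))` would then give a per-frame level without waiting for `onVoiceEnd`.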
@@ -111,14 +119,16 @@ This enables real-time processing of voice input, allowing applications to act o
## Configuration Options

### Sample Rates

You can set the audio sample rate using `setSamplerate`:

- **Start detection probability threshold (0.7)**: The VAD model must predict a speech probability above this threshold to trigger voice start.
- **End detection probability threshold (0.7)**: The VAD model must predict a speech probability below this threshold to trigger voice end.
- **True positive ratio for voice start (0.5)**: 50% of frames in a given window must be speech for voice activity to begin.
- **False positive ratio for voice end (0.95)**: 95% of frames in a given window must be silence for voice activity to end.
- **Start frame count (10 frames ≈ 0.32s)**: Number of frames required to confirm voice activity.
- **End frame count (57 frames ≈ 1.824s)**: Number of frames required to confirm silence before stopping voice detection.
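
To make the interaction of these six values concrete, here is an illustrative sketch of a start/end decision loop. It is not the library's implementation: `VadConfig`, `VadEvent`, and `VadStateMachine` are hypothetical names, and the sliding-window interpretation of "a given window" is an assumption. The stated durations imply roughly 32 ms per frame (0.32 s / 10 frames).

```kotlin
// Illustrative sketch only; the library's internal logic may differ.
// Defaults mirror the values listed above.
data class VadConfig(
    val startThreshold: Float = 0.7f,    // probability above this counts as a speech frame
    val endThreshold: Float = 0.7f,      // probability below this counts as a silence frame
    val startSpeechRatio: Float = 0.5f,  // share of speech frames needed to start
    val endSilenceRatio: Float = 0.95f,  // share of silence frames needed to end
    val startFrameCount: Int = 10,       // window length while idle
    val endFrameCount: Int = 57          // window length while speaking
)

enum class VadEvent { NONE, VOICE_START, VOICE_END }

class VadStateMachine(private val config: VadConfig = VadConfig()) {
    // Assumption: a sliding window over the most recent frames.
    private val window = ArrayDeque<Boolean>()

    var isSpeaking = false
        private set

    fun onFrame(speechProbability: Float): VadEvent {
        val windowSize = if (isSpeaking) config.endFrameCount else config.startFrameCount
        // While idle we count speech frames; while speaking we count silence frames.
        val hit = if (isSpeaking) speechProbability < config.endThreshold
                  else speechProbability > config.startThreshold
        window.addLast(hit)
        if (window.size > windowSize) window.removeFirst()
        if (window.size < windowSize) return VadEvent.NONE

        val ratio = window.count { it }.toFloat() / windowSize
        val required = if (isSpeaking) config.endSilenceRatio else config.startSpeechRatio
        if (ratio < required) return VadEvent.NONE

        // Enough evidence collected: flip the state and start a fresh window.
        isSpeaking = !isSpeaking
        window.clear()
        return if (isSpeaking) VadEvent.VOICE_START else VadEvent.VOICE_END
    }
}
```

In this sketch, raising `startSpeechRatio` (for example `VadConfig(startSpeechRatio = 0.8f)`) makes voice start harder to trigger, while lowering `endFrameCount` ends a segment after a shorter pause.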
#### **Important Notes:**

- **Stricter VAD Detection in Silero v5**:
  Based on observations, Silero v5 appears to apply a stricter VAD detection mechanism than v4.

- **Differences in Speech Start Detection**:
  In Silero v4, speech is considered to have started if, within 10 frames (0.32s), **80%** of the frames exceed a VAD probability of 70%.
  In Silero v5, this condition is relaxed: speech is considered to have started if **50%** of the frames within 10 frames (0.32s) exceed a VAD probability of 70%.

  Adjusting Sensitivity for Voice Activity Detection

  If you need to fine-tune the sensitivity of voice segmentation, use the following function to customize the thresholds:
By adjusting these parameters, you can fine-tune the strictness of voice segmentation to better suit your application needs.

- **Silero v5 Performance**:
  The performance of the Silero v5 model may vary, and adjusting the thresholds might be necessary to achieve optimal results. There are also discussions on this topic, such as [this one](https://github.com/SYSTRAN/faster-whisper/issues/934#issuecomment-2439340290).

---

## Algorithm Explanation
### ONNX Runtime for Silero VAD
This library leverages **ONNX Runtime (C++)** to run the Silero VAD models efficiently. By utilizing ONNX Runtime, the library achieves high-performance inference across different platforms (iOS/macOS), ensuring fast and accurate voice activity detection.
### Why Use WebRTC's Audio Processing Module (APM)?
This library utilizes WebRTC's APM for several key reasons: