Commit 98a9408

Add onVoiceDidContinue callback and update documentation
1 parent a04fb66 commit 98a9408

7 files changed

Lines changed: 200 additions & 129 deletions

README.md

Lines changed: 43 additions & 23 deletions
@@ -11,15 +11,14 @@ A real-time **Voice Activity Detection (VAD)** library for **Android** using **S
 **Customizable audio sample rates (8, 16, 24, 48 kHz)**
 **Outputs WAV data with automatic sample rate conversion to 16 kHz**
 **Lightweight and optimized for Android**
-**Available via JitPack**
+**Available via JitPack**
 
 ---
 
 ## **Sample Android App Demo**
 
 Check out the sample Android app demonstrating real-time VAD:
 
-
 [Sample Android App Demo](https://github.com/user-attachments/assets/bb66e388-b0b9-4294-8e59-322b9f65ec4a)
 
 ---
@@ -29,6 +28,7 @@ Check out the sample Android app demonstrating real-time VAD:
 ### **Using JitPack**
 
 1. **Add JitPack to `settings.gradle.kts`**
+
 ```kotlin
 dependencyResolutionManagement {
 repositories {
@@ -40,9 +40,10 @@ dependencyResolutionManagement {
 ```
 
 2. **Add the dependency to `app/build.gradle.kts`**
+
 ```kotlin
 dependencies {
-implementation("com.github.helloooideeeeea:RealTimeCutVADLibraryForAndroid:1.0.2@aar")
+implementation("com.github.helloooideeeeea:RealTimeCutVADLibraryForAndroid:1.0.3@aar")
 }
 ```
 
@@ -51,6 +52,7 @@ dependencies {
 ## **Usage**
 
 ### **1. Initialize VAD in `MainActivity`**
+
 ```kotlin
 import io.codeconcept.realtimecutvadlibrary.VADWrapper
 import android.os.Bundle
@@ -62,7 +64,7 @@ class MainActivity : AppCompatActivity() {
 
 override fun onCreate(savedInstanceState: Bundle?) {
 super.onCreate(savedInstanceState)
-
+
 // Initialize VAD Wrapper
 vadWrapper = VADWrapper(this)
 vadWrapper?.setVADModel(VADWrapper.SileroModelVersion.V5)
@@ -88,6 +90,10 @@ class MainActivity : AppCompatActivity() {
 override fun onVoiceEnd(wavData: ByteArray?) {
 Log.d("VAD", "✅ onVoiceEnd() called. wavData length: ${wavData?.size ?: 0}")
 }
+
+override fun onVoiceDidContinue(pcmFloatData: ByteArray?) {
+// Use this only if you need real-time VAD-detected PCM float frames.
+}
 })
 }
 
@@ -99,10 +105,12 @@ class MainActivity : AppCompatActivity() {
 ```
 
 ### **2. Understanding `setVADCallback`**
+
 `setVADCallback` is used to register a callback that gets notified when voice activity starts or ends.
 
 - `onVoiceStart()`: Triggered when voice is detected.
 - `onVoiceEnd(wavData: ByteArray?)`: Triggered when voice stops, providing a WAV file as a byte array.
+- `onVoiceDidContinue(pcmFloatData: ByteArray?)`: Triggered during speech, providing real-time PCM float frames. Use this only if you need real-time audio data while speech is in progress.
 
 This enables real-time processing of voice input, allowing applications to act on detected speech events.
 
@@ -111,14 +119,16 @@ This enables real-time processing of voice input, allowing applications to act o
 ## Configuration Options
 
 ### Sample Rates
+
 You can set the audio sample rate using `setSamplerate`:
 
-- `.SAMPLERATE_8` (8 kHz)
+- `.SAMPLERATE_8` (8 kHz)
 - `.SAMPLERATE_16` (16 kHz)
 - `.SAMPLERATE_24` (24 kHz)
 - `.SAMPLERATE_48` (48 kHz)
 
 ### Silero Model Versions
+
 Choose between Silero model versions:
 
 - `.v4` - Silero Model Version 4
@@ -142,39 +152,44 @@ vadWrapper.setVADThreshold(0.7F, 0.7F, 0.5F, 0.95F, 10, 57)
 ```
 
 ### **Threshold Explanation**
+
 - **Start detection probability threshold (0.7)**: The VAD model must predict speech probability above this threshold to trigger voice start.
 - **End detection probability threshold (0.7)**: The VAD model must predict speech probability below this threshold to trigger voice end.
 - **True positive ratio for voice start (0.5)**: 50% of frames in a given window must be speech for voice activity to begin.
 - **False positive ratio for voice end (0.95)**: 95% of frames in a given window must be silence for voice activity to end.
 - **Start frame count (10 frames ≈ 0.32s)**: Number of frames required to confirm voice activity.
 - **End frame count (57 frames ≈ 1.824s)**: Number of frames required to confirm silence before stopping voice detection.
 
-
 #### **Important Notes:**
+
 - **Stricter VAD Detection in Silero v5**:
-Based on observations, Silero v5 appears to apply a stricter VAD detection mechanism compared to v4.
+Based on observations, Silero v5 appears to apply a stricter VAD detection mechanism compared to v4.
 
 - **Differences in Speech Start Detection**:
-In Silero v4, speech is considered to have started if, within 10 frames (0.32s), **80%** of the frames exceed a VAD probability of 70%.
-In Silero v5, this condition is relaxed, and speech is considered started if **50%** of the frames within 10 frames (0.32s) exceed a VAD probability of 70%.
-Adjusting Sensitivity for Voice Activity Detection
-If you need to fine-tune the sensitivity of voice segmentation, use the following function to customize the thresholds:
+In Silero v4, speech is considered to have started if, within 10 frames (0.32s), **80%** of the frames exceed a VAD probability of 70%.
+In Silero v5, this condition is relaxed, and speech is considered started if **50%** of the frames within 10 frames (0.32s) exceed a VAD probability of 70%.
+Adjusting Sensitivity for Voice Activity Detection
+If you need to fine-tune the sensitivity of voice segmentation, use the following function to customize the thresholds:
 
 ```java
 vadWrapper?.setVADThreshold(0.7F, 0.7F, 0.5F, 0.95F, 10, 57)
 ```
+
 By adjusting these parameters, you can fine-tune the strictness of voice segmentation to better suit your application needs.
+
 - **Silero v5 Performance**:
-The performance of Silero model v5 may vary, and adjusting the thresholds might be necessary to achieve optimal results. There are also discussions on this topic, such as [this one](https://github.com/SYSTRAN/faster-whisper/issues/934#issuecomment-2439340290).
+The performance of Silero model v5 may vary, and adjusting the thresholds might be necessary to achieve optimal results. There are also discussions on this topic, such as [this one](https://github.com/SYSTRAN/faster-whisper/issues/934#issuecomment-2439340290).
 
 ---
 
 ## Algorithm Explanation
 
 ### ONNX Runtime for Silero VAD
+
 This library leverages **ONNX Runtime (C++)** to run the Silero VAD models efficiently. By utilizing ONNX Runtime, the library achieves high-performance inference across different platforms (iOS/macOS), ensuring fast and accurate voice activity detection.
 
 ### Why Use WebRTC's Audio Processing Module (APM)?
+
 This library utilizes WebRTC's APM for several key reasons:
 
 - **High-pass Filtering**: Removes low-frequency noise.
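The threshold explanation in this hunk gives frame counts as durations (10 frames ≈ 0.32 s, 57 frames ≈ 1.824 s), which works out to 32 ms per frame. The Kotlin sketch below reproduces that arithmetic. It assumes one frame is 512 samples at 16 kHz (512 / 16000 = 0.032 s). The constants and the helpers `framesToSeconds`, `secondsToFrames`, and `requiredFrames` are hypothetical and not part of `VADWrapper`. Rounding a ratio up to a whole number of frames is also an assumption about how the library counts.

```kotlin
import kotlin.math.ceil
import kotlin.math.roundToInt

// Assumption (hypothetical constants): one VAD frame = 512 samples at 16 kHz = 32 ms,
// consistent with the README's "10 frames ≈ 0.32s" and "57 frames ≈ 1.824s".
const val SAMPLES_PER_FRAME = 512
const val VAD_SAMPLE_RATE_HZ = 16_000

// Duration covered by a window of `frames` frames, in seconds.
fun framesToSeconds(frames: Int): Double =
    frames.toDouble() * SAMPLES_PER_FRAME / VAD_SAMPLE_RATE_HZ

// Nearest whole number of frames for a desired duration (e.g. a silence timeout).
fun secondsToFrames(seconds: Double): Int =
    (seconds * VAD_SAMPLE_RATE_HZ / SAMPLES_PER_FRAME).roundToInt()

// Minimum number of qualifying frames for a given ratio, rounded up (assumption).
fun requiredFrames(windowFrames: Int, ratio: Double): Int =
    ceil(windowFrames * ratio).toInt()
```

With the values passed to `setVADThreshold(0.7F, 0.7F, 0.5F, 0.95F, 10, 57)`:

- Start: `requiredFrames(10, 0.5)` gives 5 speech frames, or 8 with the 0.8 ratio the notes attribute to v4.
- End: `requiredFrames(57, 0.95)` gives 55 silent frames.
- A 1-second silence timeout corresponds to `secondsToFrames(1.0)`, or 31 frames.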
@@ -186,15 +201,18 @@ This library utilizes WebRTC's APM for several key reasons:
 
 1. **Input Audio Configuration**: The library supports sample rates of 8 kHz, 16 kHz, 24 kHz, and 48 kHz.
 2. **Audio Preprocessing**:
+
 - The audio is split into chunks based on the sample rate.
 - APM processes these chunks with filters and gain adjustments.
 - Audio is converted to 16 kHz for Silero VAD compatibility.
 
 3. **Voice Activity Detection**:
+
 - The processed audio chunks are passed to Silero VAD.
 - VAD outputs a probability score indicating voice activity.
 
 4. **Algorithm for Voice Detection**:
+
 - **Voice Start Detection**: When the VAD probability exceeds the threshold, a pre-buffer stores audio frames to capture speech onset.
 - **Voice End Detection**: Once silence is detected over a set number of frames, recording stops, and the audio is output as WAV data.
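Step 4 above, together with the threshold explanation, describes a windowed rule. Voice starts when enough recent frames exceed one probability threshold, and ends when enough recent frames fall below another. The Kotlin sketch below shows that rule as a two-state machine. It is illustrative only and is not the library's C++ implementation. `WindowedVoiceDetector` and `Event` are hypothetical names, and the defaults mirror `setVADThreshold(0.7F, 0.7F, 0.5F, 0.95F, 10, 57)`. The pre-buffer and WAV output are omitted. The strict comparisons and the window reset after each transition are assumptions.

```kotlin
import kotlin.math.ceil

// Hypothetical sketch of a ratio-over-window start/end rule; not the library's code.
class WindowedVoiceDetector(
    private val startThreshold: Float = 0.7f,
    private val endThreshold: Float = 0.7f,
    private val startRatio: Float = 0.5f,
    private val endRatio: Float = 0.95f,
    private val startFrames: Int = 10,
    private val endFrames: Int = 57
) {
    enum class Event { NONE, VOICE_START, VOICE_END }

    // Most recent speech probabilities, oldest first.
    private val history = ArrayDeque<Float>()

    var inSpeech = false
        private set

    // Feed one speech probability per frame; returns the transition, if any.
    fun process(probability: Float): Event {
        history.addLast(probability)
        // Keep only as many frames as the larger window needs.
        while (history.size > maxOf(startFrames, endFrames)) history.removeFirst()

        if (!inSpeech) {
            if (history.size >= startFrames) {
                val speech = history.takeLast(startFrames).count { it > startThreshold }
                if (speech >= ceil(startFrames * startRatio).toInt()) {
                    inSpeech = true
                    history.clear() // assumption: the end window starts fresh
                    return Event.VOICE_START
                }
            }
        } else if (history.size >= endFrames) {
            val silence = history.takeLast(endFrames).count { it < endThreshold }
            if (silence >= ceil(endFrames * endRatio).toInt()) {
                inSpeech = false
                history.clear()
                return Event.VOICE_END
            }
        }
        return Event.NONE
    }
}
```

Suppose it receives 6 low probabilities followed by 5 high ones. It reports `VOICE_START` on the fifth high frame, when 5 of the last 10 frames are speech. After that, 57 consecutive low probabilities produce `VOICE_END`.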

@@ -218,26 +236,28 @@ config.voice_detection.enabled = false;
 ---
 
 ## **Additional Resources**
+
 - **[RealTimeCutVADCXXLibrary](https://github.com/helloooideeeeea/RealTimeCutVADCXXLibrary)**
 
 ---
 
 ## **License**
+
 This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
 
 ---
 
 ## **📌 Summary**
-| Feature | Details |
-|---------|---------|
-| **Library Name** | `RealTimeCutVADLibrary` |
-| **Platform** | Android |
-| **Voice Detection** | Real-time |
-| **Supported Models** | Silero v4 & v5 |
-| **Sample Rates** | 8kHz, 16kHz, 24kHz, 48kHz |
-| **Output Format** | WAV (16 kHz) |
-| **Noise Reduction** | WebRTC APM |
-| **Installation** | JitPack (`implementation` via Gradle) |
 
-🚀 **Now you can add real-time voice activity detection to your Android app with ease!** 🎉
+| Feature | Details |
+| -------------------- | ------------------------------------- |
+| **Library Name** | `RealTimeCutVADLibrary` |
+| **Platform** | Android |
+| **Voice Detection** | Real-time |
+| **Supported Models** | Silero v4 & v5 |
+| **Sample Rates** | 8kHz, 16kHz, 24kHz, 48kHz |
+| **Output Format** | WAV (16 kHz) |
+| **Noise Reduction** | WebRTC APM |
+| **Installation** | JitPack (`implementation` via Gradle) |
 
+🚀 **Now you can add real-time voice activity detection to your Android app with ease!** 🎉

app/src/main/java/io/codeconcept/realtimecutvadsampleapp/MainActivity.kt

Lines changed: 4 additions & 0 deletions
@@ -181,6 +181,10 @@ fun startVADProcessing(
 Log.d("VAD", "✅ onVoiceEnd() called. wavData length: ${wavData?.size ?: 0}")
 onStatusChange(RecordingStatus.RUNNING, wavData) // 🔹 Pass waveAudioData
 }
+
+override fun onVoiceDidContinue(pcmFloatData: ByteArray?) {
+// Use this only if you need real-time VAD-detected PCM float frames.
+}
 })
 
 val audioRecord = AudioRecord(
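The new `onVoiceDidContinue` stub receives `pcmFloatData` as a raw `ByteArray`. The Kotlin sketch below decodes it. It assumes the bytes are 32-bit IEEE 754 floats in little-endian order, which is the native byte order on Android's ARM and x86 ABIs. This commit does not document the layout or the sample rate. `pcmBytesToFloats` and `rms` are hypothetical helpers, not library API.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import kotlin.math.sqrt

// Assumption: pcm holds little-endian 32-bit floats. Any trailing bytes that do not
// form a complete float are ignored by asFloatBuffer().
fun pcmBytesToFloats(pcm: ByteArray): FloatArray {
    val floats = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer()
    val out = FloatArray(floats.remaining())
    floats.get(out)
    return out
}

// Root-mean-square level of the samples, e.g. to drive a live input meter.
fun rms(samples: FloatArray): Float {
    if (samples.isEmpty()) return 0f
    var sum = 0.0
    for (s in samples) sum += s.toDouble() * s
    return sqrt(sum / samples.size).toFloat()
}
```

Inside the callback, `pcmBytesToFloats(pcmFloatData ?: return)` would give the samples delivered by one call. Passing them to `rms(...)` gives a rough input level while speech is in progress.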

realtimecutvadlibrary/build.gradle.kts

Lines changed: 2 additions & 2 deletions
@@ -80,7 +80,7 @@ val jniLibsZip = file("${projectDir}/RealTimeCutVADCXXLibrary.jniLibs.zip")
 tasks.register("downloadJniLibs") {
 doLast {
 if (!jniLibsZip.exists()) {
-val url = URI("https://github.com/helloooideeeeea/RealTimeCutVADCXXLibrary/releases/download/v1.0.2/RealTimeCutVADCXXLibrary.jniLibs.zip").toURL()
+val url = URI("https://github.com/helloooideeeeea/RealTimeCutVADLibraryForXCFramework/releases/download/v1.0.7/jniLibs.zip").toURL()
 println("Downloading jniLibs from $url")
 
 url.openStream().use { input ->
@@ -114,7 +114,7 @@ afterEvaluate {
 
 groupId = "com.github.helloooideeeeea"
 artifactId = "realtimecutvadlibrary"
-version = "1.0.2"
+version = "1.0.3"
 
 pom {
 name.set("RealTimeCutVADLibrary")

realtimecutvadlibrary/src/main/cpp/c_wrapper.h

Lines changed: 6 additions & 2 deletions
@@ -19,14 +19,18 @@ typedef void* VADInstanceHandle;
 
 typedef void (*VoiceStartCallback)(void* context);
 typedef void (*VoiceEndCallback)(void* context, const uint8_t* wav_data, size_t wav_size);
+typedef void (*VoiceDidContinueCallback)(void* context, const uint8_t* pcm_float_data, size_t data_size);
 
 // Create and destroy instances
 VADInstanceHandle create_vad_instance();
 void destroy_vad_instance(VADInstanceHandle instance);
 
 // Set callbacks
-void set_vad_callback(VADInstanceHandle instance, void* context,
-VoiceStartCallback start_cb, VoiceEndCallback end_cb);
+void set_vad_callback(VADInstanceHandle instance,
+void* context,
+VoiceStartCallback start_cb,
+VoiceEndCallback end_cb,
+VoiceDidContinueCallback continue_cb);
 
 // Set parameters
 void set_vad_sample_rate(VADInstanceHandle instance, int sample_rate);
