From f7975b380fdcd03993eaf90119af700dc93f9ee1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6kmen=20G=C3=B6rgen?=
Date: Tue, 20 Jan 2026 11:17:49 +0100
Subject: [PATCH 1/7] aic v2 update.

---
 server/utilities/audio/aic-filter.mdx | 187 ++++++++++++++++++++++----
 1 file changed, 158 insertions(+), 29 deletions(-)

diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index 2db340d5..3902e4ab 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -1,14 +1,18 @@
 ---
 title: "AICFilter"
-description: "Speech improvement using ai-coustics"
+description: "Speech enhancement using ai-coustics"
 ---

 ## Overview

-`AICFilter` is an audio processor that improves users speech by reducing background noise and improving speech clarity overall. It inherits from `BaseAudioFilter` and processes audio frames to improve audio quality.
+`AICFilter` is an audio processor that enhances user speech by reducing background noise and improving speech clarity. It inherits from `BaseAudioFilter` and processes audio frames in real-time using ai-coustics' speech enhancement technology.

 To use AIC, you need a license key. Get started at [ai-coustics.com](https://ai-coustics.com/pipecat).

+
+  This documentation covers **aic-sdk v2.x**. If you're using aic-sdk v1.x, please upgrade to v2 first. See the [Python 1.3 to 2.0 Migration Guide](https://docs.ai-coustics.com/guides/migrations/python-1-3-to-2-0#quick-migration-checklist) for details on API changes.
+
+
 ## Installation

 The AIC filter requires additional dependencies:
@@ -20,81 +24,206 @@ pip install "pipecat-ai[aic]"
 ## Constructor Parameters

-  AIC license key
+  ai-coustics license key for authentication. Get your key at [developers.ai-coustics.io](https://developers.ai-coustics.io).
+
+
+
+  Model identifier to download from CDN. Required if `model_path` is not provided.
+  See [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/) for available models.
+
+  Examples: `"quail-vf-l-16khz"`, `"quail-s-16khz"`, `"quail-l-8khz"`

-
-  Model
+
+  Path to a local `.aicmodel` file. If provided, `model_id` is ignored and no download occurs.
+  Useful for offline deployments or custom models.

-
-  Enhancement level
+
+  Directory for downloading and caching models. Defaults to a cache directory in the user's home folder.
+
+
+## Methods
+
+### create_vad_analyzer
+
+Creates an `AICVADAnalyzer` that uses the AIC model's built-in voice activity detection.
+
+```python
+def create_vad_analyzer(
+    *,
+    speech_hold_duration: Optional[float] = None,
+    minimum_speech_duration: Optional[float] = None,
+    sensitivity: Optional[float] = None,
+) -> AICVADAnalyzer
+```
+
+#### VAD Parameters
+
+
+  How long VAD continues detecting speech after audio signal ends (in seconds).
+  Range: `0.0` to `20x model window length`. Default: `0.05s` (50ms).

-
-  Voice gain
+
+  Minimum duration of speech required before VAD reports speech detected (in seconds).
+  Range: `0.0` to `20x model window length`. Default: `0.0s`.

-
-  Enable noise gate
+
+  Energy threshold sensitivity. Higher values make the detector less sensitive (require more energy to count as speech).
+  Range: `1.0` to `15.0`. Formula: `Energy threshold = 10 ** (-sensitivity)`.

+### get_vad_context
+
+Returns the VAD context once the processor is initialized. Can be used to dynamically adjust VAD parameters at runtime.
+
+```python
+vad_ctx = aic_filter.get_vad_context()
+vad_ctx.set_parameter(VadParameter.Sensitivity, 8.0)
+```
+
 ## Input Frames

-  Specific control frame to toggle filtering on/off
+  Control frame to toggle filtering on/off.
 ```python
 from pipecat.frames.frames import FilterEnableFrame

-# Disable noise reduction
+# Disable speech enhancement
 await task.queue_frame(FilterEnableFrame(False))

-# Re-enable noise reduction
+# Re-enable speech enhancement
 await task.queue_frame(FilterEnableFrame(True))
 ```

-## Usage Example
+## Usage Examples
+
+### Basic Usage with AIC VAD
+
+The recommended approach is to use `AICFilter` with its built-in VAD analyzer:

 ```python
 from pipecat.audio.filters.aic_filter import AICFilter
+from pipecat.transports.services.daily import DailyTransport, DailyParams
+
+# Create the AIC filter
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_id="quail-vf-l-16khz",
+)

+# Use AIC's integrated VAD
 transport = DailyTransport(
     room_url,
     token,
-    "Respond bot",
+    "Bot",
     DailyParams(
-        audio_in_filter=AICFilter(),  # Enable AIC speech improvement
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
+        audio_in_filter=aic_filter,
+        vad_analyzer=aic_filter.create_vad_analyzer(
+            speech_hold_duration=0.05,
+            minimum_speech_duration=0.0,
+            sensitivity=6.0,
+        ),
+    ),
+)
+```
+
+### Using a Local Model
+
+For offline deployments or when you want to manage model files yourself:
+
+```python
+from pipecat.audio.filters.aic_filter import AICFilter
+
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_path="/path/to/your/model.aicmodel",
+)
+```
+
+### Custom Cache Directory
+
+Specify a custom directory for model downloads:
+
+```python
+from pipecat.audio.filters.aic_filter import AICFilter
+
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_id="quail-s-16khz",
+    model_download_dir="/opt/aic-models",
+)
+```
+
+### With Other Transports
+
+The AIC filter works with any Pipecat transport:
+
+```python
+from pipecat.audio.filters.aic_filter import AICFilter
+from pipecat.transports.websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams
+
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_id="quail-vf-l-16khz",
+)
+
+transport = FastAPIWebsocketTransport(
+    params=FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        audio_in_filter=aic_filter,
+        vad_analyzer=aic_filter.create_vad_analyzer(
+            speech_hold_duration=0.05,
+            sensitivity=6.0,
+        ),
     ),
 )
 ```

-  See the [AIC filter
-  example](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07zd-interruptible-aicoustics.py)
-  for a complete example.
+  See the [AIC filter example](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07zd-interruptible-aicoustics.py) for a complete working example.

+## Available Models
+
+Models are hosted at [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/). Common model options include:
+
+| Model ID | Sample Rate | Description |
+|----------|-------------|-------------|
+| `quail-vf-l-16khz` | 16kHz | Voice filtering, large model |
+| `quail-l-16khz` | 16kHz | Large model |
+| `quail-l-8khz` | 8kHz | Large model for telephony |
+| `quail-s-16khz` | 16kHz | Small model for low latency |
+| `quail-s-8khz` | 8kHz | Small model for telephony |
+
+Choose a model based on your sample rate requirements and latency constraints.
+
 ## Audio Flow

 ```mermaid
 graph TD
   A[AudioRawFrame] --> B[AICFilter]
-  B[AICFilter] --> C[VAD]
-  C[VAD] --> D[STT]
+  B --> C[AICVADAnalyzer]
+  C --> D[STT]
 ```

+The AIC filter enhances audio before it reaches the VAD and STT stages, improving transcription accuracy in noisy environments.
+
 ## Notes

-- Requires ai-coustics license key
-- Supports real-time audio processing
-- Handles PCM_16 audio format
+- Requires ai-coustics license key (get one at [developers.ai-coustics.io](https://developers.ai-coustics.io))
+- Models are automatically downloaded and cached on first use
+- Supports real-time audio processing with low latency
+- Handles PCM_16 audio format (int16 samples)
 - Thread-safe for pipeline processing
-- Can be dynamically enabled/disabled
-- Maintains audio quality while improving speech, including noise reduction
-- Efficient processing for low latency
+- Can be dynamically enabled/disabled via `FilterEnableFrame`
+- Integrated VAD provides better accuracy than standalone VAD when using enhancement
+- For available models, visit [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/)

From e530e1d8d5fa0d6b9ab0d708e50db6dcbe7f6881 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6kmen=20G=C3=B6rgen?=
Date: Tue, 20 Jan 2026 13:10:25 +0100
Subject: [PATCH 2/7] Update server/utilities/audio/aic-filter.mdx

Co-authored-by: Andres O. Vela
---
 server/utilities/audio/aic-filter.mdx | 1 +
 1 file changed, 1 insertion(+)

diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index 3902e4ab..1421d8a8 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -30,6 +30,7 @@ pip install "pipecat-ai[aic]"
   Model identifier to download from CDN. Required if `model_path` is not provided.
   See [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/) for available models.
+  See the [documentation](https://docs.ai-coustics.com/guides/models) for more detailed information about the models.
   Examples: `"quail-vf-l-16khz"`, `"quail-s-16khz"`, `"quail-l-8khz"`

From 5e3806d200845a9a302b6963b168de74fa8e8b65 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6kmen=20G=C3=B6rgen?=
Date: Tue, 20 Jan 2026 13:10:37 +0100
Subject: [PATCH 3/7] Update server/utilities/audio/aic-filter.mdx

Co-authored-by: Andres O. Vela
---
 server/utilities/audio/aic-filter.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index 1421d8a8..8afc6b1d 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -1,6 +1,6 @@
 ---
 title: "AICFilter"
-description: "Speech enhancement using ai-coustics"
+description: "Speech enhancement using ai-coustics' SDK"
 ---

 ## Overview

From 6418b289f6d292d299e146e284bac3474f46ac82 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6kmen=20G=C3=B6rgen?=
Date: Tue, 20 Jan 2026 13:11:46 +0100
Subject: [PATCH 4/7] Update server/utilities/audio/aic-filter.mdx

Co-authored-by: Andres O. Vela
---
 server/utilities/audio/aic-filter.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index 8afc6b1d..5baabde1 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -68,7 +68,7 @@ def create_vad_analyzer(
   Minimum duration of speech required before VAD reports speech detected (in seconds).
-  Range: `0.0` to `20x model window length`. Default: `0.0s`.
+  Range: `0.0` to `1.0`. Default: `0.0s`.

From 7383ef2a331a20c20cd85ea53d3fb39372b1de84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6kmen=20G=C3=B6rgen?=
Date: Tue, 20 Jan 2026 13:16:50 +0100
Subject: [PATCH 5/7] rollback the control frame description.

---
 server/utilities/audio/aic-filter.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index 5baabde1..fd2cbe70 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -88,7 +88,7 @@ vad_ctx.set_parameter(VadParameter.Sensitivity, 8.0)
 ## Input Frames

-  Control frame to toggle filtering on/off.
+  Specific control frame to toggle filtering on/off

 ```python
 from pipecat.frames.frames import FilterEnableFrame

From 8c3cbc91ce000af6cd56315d540f868f2c912a03 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6kmen=20G=C3=B6rgen?=
Date: Tue, 20 Jan 2026 16:01:06 +0100
Subject: [PATCH 6/7] fix parameter types.

---
 server/utilities/audio/aic-filter.mdx | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index fd2cbe70..75eb7034 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -23,11 +23,11 @@ pip install "pipecat-ai[aic]"
 ## Constructor Parameters

-
+
   ai-coustics license key for authentication. Get your key at [developers.ai-coustics.io](https://developers.ai-coustics.io).

-
+
   Model identifier to download from CDN. Required if `model_path` is not provided.
   See [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/) for available models.
   See the [documentation](https://docs.ai-coustics.com/guides/models) for more detailed information about the models.
@@ -35,12 +35,12 @@ pip install "pipecat-ai[aic]"
   Examples: `"quail-vf-l-16khz"`, `"quail-s-16khz"`, `"quail-l-8khz"`

-
+
   Path to a local `.aicmodel` file. If provided, `model_id` is ignored and no download occurs.
   Useful for offline deployments or custom models.

-
+
   Directory for downloading and caching models. Defaults to a cache directory in the user's home folder.

From 787badace6ad5416a94701b179a9bc8ac1802b05 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=B6kmen=20G=C3=B6rgen?=
Date: Tue, 20 Jan 2026 16:39:55 +0100
Subject: [PATCH 7/7] fix params, update descriptions of parameters.

---
 server/utilities/audio/aic-filter.mdx | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index 75eb7034..1b0abb7f 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -60,20 +60,20 @@ def create_vad_analyzer(
 ```

 #### VAD Parameters
-
-
-  How long VAD continues detecting speech after audio signal ends (in seconds).
-  Range: `0.0` to `20x model window length`. Default: `0.05s` (50ms).
+
+  Controls for how long the VAD continues to detect speech after the audio signal no longer contains speech (in seconds).
+  Range: `0.0` to `20x model window length`, Default (in SDK): `0.05s`

-
-  Minimum duration of speech required before VAD reports speech detected (in seconds).
-  Range: `0.0` to `1.0`. Default: `0.0s`.
+
+  Controls for how long speech needs to be present in the audio signal before the VAD considers it speech (in seconds).
+  Range: `0.0` to `1.0`, Default (in SDK): `0.0s`

-
-  Energy threshold sensitivity. Higher values make the detector less sensitive (require more energy to count as speech).
-  Range: `1.0` to `15.0`. Formula: `Energy threshold = 10 ** (-sensitivity)`.
+
+  Controls the sensitivity (energy threshold) of the VAD. This value is used by the VAD as the threshold a speech audio signal's energy has to exceed in order to be considered speech.
+  Formula: `Energy threshold = 10 ** (-sensitivity)`
+  Range: `1.0` to `15.0`, Default (in SDK): `6.0`

 ### get_vad_context
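
The `sensitivity` formula in the VAD parameters, `Energy threshold = 10 ** (-sensitivity)`, can be evaluated directly to see what its documented range means. This is a minimal sketch in plain Python that applies only that formula and the documented `1.0` to `15.0` range. The helper name `energy_threshold` is hypothetical and is not part of Pipecat or aic-sdk.

Because the exponent is negated, a higher `sensitivity` produces a lower threshold, so quieter audio can qualify as speech. The SDK default of `6.0` corresponds to a threshold of `1e-06`.

```python
def energy_threshold(sensitivity: float) -> float:
    """Return the VAD energy threshold for a given sensitivity value.

    Hypothetical helper (not part of Pipecat or aic-sdk): it only evaluates
    the documented formula  Energy threshold = 10 ** (-sensitivity).
    """
    # Documented range for `sensitivity` is 1.0 to 15.0.
    if not 1.0 <= sensitivity <= 15.0:
        raise ValueError("sensitivity must be between 1.0 and 15.0")
    return 10 ** (-sensitivity)


if __name__ == "__main__":
    # Both ends of the documented range plus the SDK default (6.0).
    for s in (1.0, 6.0, 15.0):
        print(f"sensitivity={s:>4}: energy threshold={energy_threshold(s):.0e}")
```

The sketch does not call the SDK, so it says nothing about how the VAD measures signal energy internally. It only makes the direction of the parameter explicit.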