diff --git a/server/utilities/audio/aic-filter.mdx b/server/utilities/audio/aic-filter.mdx
index 2db340d5..1b0abb7f 100644
--- a/server/utilities/audio/aic-filter.mdx
+++ b/server/utilities/audio/aic-filter.mdx
@@ -1,14 +1,18 @@
 ---
 title: "AICFilter"
-description: "Speech improvement using ai-coustics"
+description: "Speech enhancement using ai-coustics' SDK"
 ---
 
 ## Overview
 
-`AICFilter` is an audio processor that improves users speech by reducing background noise and improving speech clarity overall. It inherits from `BaseAudioFilter` and processes audio frames to improve audio quality.
+`AICFilter` is an audio processor that enhances user speech by reducing background noise and improving speech clarity. It inherits from `BaseAudioFilter` and processes audio frames in real-time using ai-coustics' speech enhancement technology. To use AIC, you need a license key. Get started at [ai-coustics.com](https://ai-coustics.com/pipecat).
+
+This documentation covers **aic-sdk v2.x**. If you're using aic-sdk v1.x, upgrade to v2 first; see the [Python 1.3 to 2.0 Migration Guide](https://docs.ai-coustics.com/guides/migrations/python-1-3-to-2-0#quick-migration-checklist) for details on the API changes.
+
 ## Installation
 
 The AIC filter requires additional dependencies:
@@ -19,26 +23,68 @@ pip install "pipecat-ai[aic]"
 
 ## Constructor Parameters
 
-  AIC license key
+`license_key`
+  ai-coustics license key for authentication. Get your key at [developers.ai-coustics.io](https://developers.ai-coustics.io).
+
+`model_id`
+  Model identifier to download from CDN. Required if `model_path` is not provided.
+  See [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/) for available models and the [model documentation](https://docs.ai-coustics.com/guides/models) for more detailed information.
+
+  Examples: `"quail-vf-l-16khz"`, `"quail-s-16khz"`, `"quail-l-8khz"`
+
-  Model
+`model_path`
+  Path to a local `.aicmodel` file. If provided, `model_id` is ignored and no download occurs.
+  Useful for offline deployments or custom models.
+
-  Enhancement level
+`model_download_dir`
+  Directory for downloading and caching models. Defaults to a cache directory in the user's home folder.
+
+## Methods
+
+### create_vad_analyzer
+
+Creates an `AICVADAnalyzer` that uses the AIC model's built-in voice activity detection.
+
+```python
+def create_vad_analyzer(
+    *,
+    speech_hold_duration: Optional[float] = None,
+    minimum_speech_duration: Optional[float] = None,
+    sensitivity: Optional[float] = None,
+) -> AICVADAnalyzer
+```
+
+#### VAD Parameters
+
+`speech_hold_duration`
+  Controls how long the VAD keeps reporting speech after the audio signal no longer contains speech (in seconds).
+  Range: `0.0` to `20x model window length`. Default (in SDK): `0.05s`
+
-  Voice gain
+`minimum_speech_duration`
+  Controls how long speech needs to be present in the audio signal before the VAD considers it speech (in seconds).
+  Range: `0.0` to `1.0`. Default (in SDK): `0.0s`
+
-  Enable noise gate
+`sensitivity`
+  Controls the sensitivity of the VAD: the energy threshold a speech audio signal has to exceed in order to be considered speech.
+  Formula: `energy threshold = 10 ** (-sensitivity)`
+  Range: `1.0` to `15.0`. Default (in SDK): `6.0`
+
+### get_vad_context
+
+Returns the VAD context once the processor is initialized. Can be used to dynamically adjust VAD parameters at runtime.
+
+```python
+vad_ctx = aic_filter.get_vad_context()
+vad_ctx.set_parameter(VadParameter.Sensitivity, 8.0)
+```
+
 ## Input Frames
@@ -47,54 +93,138 @@ pip install "pipecat-ai[aic]"
 
 ```python
 from pipecat.frames.frames import FilterEnableFrame
 
-# Disable noise reduction
+# Disable speech enhancement
 await task.queue_frame(FilterEnableFrame(False))
 
-# Re-enable noise reduction
+# Re-enable speech enhancement
 await task.queue_frame(FilterEnableFrame(True))
 ```
 
-## Usage Example
+## Usage Examples
+
+### Basic Usage with AIC VAD
+
+The recommended approach is to use `AICFilter` with its built-in VAD analyzer:
 
 ```python
+import os
+
 from pipecat.audio.filters.aic_filter import AICFilter
+from pipecat.transports.services.daily import DailyTransport, DailyParams
 
+# Create the AIC filter
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_id="quail-vf-l-16khz",
+)
+
+# Use AIC's integrated VAD
 transport = DailyTransport(
     room_url,
     token,
-    "Respond bot",
+    "Bot",
     DailyParams(
-        audio_in_filter=AICFilter(),  # Enable AIC speech improvement
         audio_in_enabled=True,
         audio_out_enabled=True,
-        vad_analyzer=SileroVADAnalyzer(),
+        audio_in_filter=aic_filter,
+        vad_analyzer=aic_filter.create_vad_analyzer(
+            speech_hold_duration=0.05,
+            minimum_speech_duration=0.0,
+            sensitivity=6.0,
+        ),
     ),
 )
 ```
+
+### Using a Local Model
+
+For offline deployments or when you want to manage model files yourself:
+
+```python
+import os
+
+from pipecat.audio.filters.aic_filter import AICFilter
+
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_path="/path/to/your/model.aicmodel",
+)
+```
+
+### Custom Cache Directory
+
+Specify a custom directory for model downloads:
+
+```python
+import os
+
+from pipecat.audio.filters.aic_filter import AICFilter
+
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_id="quail-s-16khz",
+    model_download_dir="/opt/aic-models",
+)
+```
+
+### With Other Transports
+
+The AIC filter works with any Pipecat transport:
+
+```python
+import os
+
+from pipecat.audio.filters.aic_filter import AICFilter
+from pipecat.transports.websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams
+
+aic_filter = AICFilter(
+    license_key=os.environ["AIC_SDK_LICENSE"],
+    model_id="quail-vf-l-16khz",
+)
+
+transport = FastAPIWebsocketTransport(
+    params=FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        audio_in_filter=aic_filter,
+        vad_analyzer=aic_filter.create_vad_analyzer(
+            speech_hold_duration=0.05,
+            sensitivity=6.0,
+        ),
+    ),
+)
+```
 
-  See the [AIC filter
-  example](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07zd-interruptible-aicoustics.py)
-  for a complete example.
+  See the [AIC filter example](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07zd-interruptible-aicoustics.py) for a complete working example.
 
+## Available Models
+
+Models are hosted at [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/). Common model options include:
+
+| Model ID | Sample Rate | Description |
+|----------|-------------|-------------|
+| `quail-vf-l-16khz` | 16kHz | Voice filtering, large model |
+| `quail-l-16khz` | 16kHz | Large model |
+| `quail-l-8khz` | 8kHz | Large model for telephony |
+| `quail-s-16khz` | 16kHz | Small model for low latency |
+| `quail-s-8khz` | 8kHz | Small model for telephony |
+
+Choose a model based on your sample rate requirements and latency constraints.
+
 ## Audio Flow
 
 ```mermaid
 graph TD
     A[AudioRawFrame] --> B[AICFilter]
-    B[AICFilter] --> C[VAD]
-    C[VAD] --> D[STT]
+    B --> C[AICVADAnalyzer]
+    C --> D[STT]
 ```
 
+The AIC filter enhances audio before it reaches the VAD and STT stages, improving transcription accuracy in noisy environments.
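As an illustration of the trade-offs in the model table, here is a minimal sketch of choosing a model ID by sample rate and latency preference. Note that `pick_model` is a hypothetical helper, not part of Pipecat or the AIC SDK, and it assumes the `quail-{size}-{rate}` naming pattern shown above:

```python
# Hypothetical helper (not part of Pipecat or the AIC SDK): maps a sample
# rate and latency preference to a quail model ID from the table above.
def pick_model(sample_rate_hz: int, low_latency: bool = False) -> str:
    suffix = "8khz" if sample_rate_hz <= 8000 else "16khz"
    size = "s" if low_latency else "l"  # small models trade quality for latency
    return f"quail-{size}-{suffix}"

print(pick_model(16000))                   # quail-l-16khz
print(pick_model(8000, low_latency=True))  # quail-s-8khz
```

For telephony pipelines (8kHz audio), the 8kHz models avoid unnecessary resampling.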
+
 ## Notes
 
-- Requires ai-coustics license key
-- Supports real-time audio processing
-- Handles PCM_16 audio format
+- Requires an ai-coustics license key (get one at [developers.ai-coustics.io](https://developers.ai-coustics.io))
+- Models are automatically downloaded and cached on first use
+- Supports real-time audio processing with low latency
+- Handles PCM_16 audio format (int16 samples)
 - Thread-safe for pipeline processing
-- Can be dynamically enabled/disabled
-- Maintains audio quality while improving speech, including noise reduction
-- Efficient processing for low latency
+- Can be dynamically enabled/disabled via `FilterEnableFrame`
+- Integrated VAD provides better accuracy than a standalone VAD when using enhancement
+- For available models, visit [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/)
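The sensitivity formula documented under the VAD parameters (`energy threshold = 10 ** (-sensitivity)`) can be sanity-checked in plain Python. This is an illustration only; the SDK applies the formula internally, and `vad_energy_threshold` is not a real API:

```python
# Energy threshold implied by a VAD sensitivity value, per the documented
# formula: threshold = 10 ** (-sensitivity). Valid sensitivity range: [1.0, 15.0].
def vad_energy_threshold(sensitivity: float) -> float:
    if not 1.0 <= sensitivity <= 15.0:
        raise ValueError("sensitivity must be within [1.0, 15.0]")
    return 10 ** (-sensitivity)

print(vad_energy_threshold(6.0))  # SDK default sensitivity -> 1e-06
print(vad_energy_threshold(8.0))  # higher sensitivity -> lower threshold (1e-08)
```

Higher sensitivity values lower the energy threshold, so quieter audio is classified as speech.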