
ASR tech debt cleanup: remove dead code, fix bugs, add benchmark script #460

Open
Alex-Wengg wants to merge 12 commits into main from
refactor/457-asr-manager-vocab-cleanup

Conversation


@Alex-Wengg Alex-Wengg commented Mar 28, 2026

Summary

Systematic cleanup of the ASR module addressing tech debt items from #457. Net reduction of ~430 lines while fixing real bugs and improving maintainability.

Bug fixes

  • enableFP16 silently ignored: optimizedConfiguration(enableFP16:) delegated to a shared factory that hardcoded allowLowPrecisionAccumulationOnGPU = true, ignoring the caller's parameter
  • MLArrayCache.returnArray only reset float32 data — cached arrays of other types (float16, int32) retained stale data from previous use
  • CTC model auto-detection broken: Repo.parakeetCtc110m.folderName returned "parakeet-ctc-110m" instead of "parakeet-ctc-110m-coreml" because the folderName switch fell through to a default case that stripped the -coreml suffix. Same for parakeetCtc06b.
  • Duplicate tokens at chunk merge boundary: mergeByMidpoint used <=/>= so tokens exactly at the cutoff appeared in both left and right chunks
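The cutoff fix in the last item can be sketched in a few lines. Token and the function signature here are hypothetical simplifications of the real mergeByMidpoint, which operates on decoded token/timestamp pairs; the point is that one side of the boundary must use a strict comparison:

```swift
struct Token { let id: Int; let time: Double }

// Left chunk keeps tokens strictly before the cutoff; right chunk keeps
// tokens at or after it, so a token exactly at the midpoint appears once.
func mergeByMidpoint(left: [Token], right: [Token], midpoint: Double) -> [Token] {
    left.filter { $0.time < midpoint }       // was `<=` (the bug)
        + right.filter { $0.time >= midpoint }
}

let merged = mergeByMidpoint(
    left: [Token(id: 1, time: 1.5), Token(id: 2, time: 2.0)],
    right: [Token(id: 2, time: 2.0), Token(id: 3, time: 2.5)],
    midpoint: 2.0)
print(merged.map(\.id))  // [1, 2, 3]
```

With the old `<=` on the left filter, token 2 at exactly t = 2.0 would have been emitted by both chunks.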

Dead code removal

  • Deleted ANEOptimizer indirection layer (166 lines) — was a pass-through wrapping MLModel with no optimization
  • Deleted PerformanceMonitor actor and AggregatedMetrics — never instantiated, component times hardcoded to 0
  • Deleted getFloat16Array from MLArrayCache — never called
  • Deleted sliceEncoderOutput from AsrTranscription — never called (30 lines)
  • Deleted loadWithANEOptimization from AsrModels — never called
  • Removed unused tokenTimings parameter chain through processTranscriptionResult
  • Removed unused import OSLog / import CoreML across 5 files
  • Removed nonisolated(unsafe) from SlidingWindowAsrManager (types already Sendable)

Duplication elimination

  • Extracted clearCachedCtcData() helper (replaced 3× triple-nil assignments)
  • Extracted decoderState(for:) / setDecoderState(_:for:) (replaced 4× switch blocks)
  • Extracted frameAlignedAudio() (replaced 2× duplicated frame-alignment blocks)
  • Added ASRConstants.secondsPerEncoderFrame (replaced 5× magic 0.08)
  • Replaced hardcoded 16_000 with config.sampleRate / ASRConstants.sampleRate
  • Extracted MLModelConfigurationUtils.defaultConfiguration() (replaced 5× copy-pasted config methods)
  • Extracted MLModelConfigurationUtils.defaultModelsDirectory() (replaced 3× copy-pasted directory methods)
  • Consolidated duplicate vocabularyFile / vocabularyFileArray constants
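The constants extraction above can be illustrated with a minimal sketch; the helper functions are illustrative, not the project's API, and assume one encoder frame spans 80 ms at a 16 kHz sample rate as stated:

```swift
enum ASRConstants {
    static let sampleRate = 16_000
    static let secondsPerEncoderFrame = 0.08  // one encoder frame ≈ 80 ms
}

// Frame index → timestamp, instead of scattering the magic 0.08.
func timestamp(forEncoderFrame frame: Int) -> Double {
    Double(frame) * ASRConstants.secondsPerEncoderFrame
}

// Seconds → sample count, instead of hardcoding 16_000.
func sampleCount(forSeconds seconds: Double) -> Int {
    Int(seconds * Double(ASRConstants.sampleRate))
}

print(timestamp(forEncoderFrame: 25))  // ≈ 2.0 seconds
print(sampleCount(forSeconds: 0.5))    // 8000 samples
```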

File organization

  • Moved PerformanceMetrics.swift, ProgressEmitter.swift, MLArrayCache.swift from ASR/Parakeet/ to Shared/ (used by multiple modules)
  • Renamed StreamingAudioSourceFactory → AudioSourceFactory, StreamingAudioSampleSource → AudioSampleSource (types used by both ASR and Diarizer)
  • Renamed files to match type names: SortformerDiarizerPipeline.swift → SortformerDiarizer.swift, LSEENDDiarizerAPI.swift → LSEENDDiarizer.swift, NemotronPipeline.swift → NemotronStreamingAsrManager+Pipeline.swift
  • Replaced force unwraps in RnntDecoder.swift with guard let + descriptive errors
  • Removed stale TODO about decoder state in AsrManager
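The force-unwrap replacement follows the usual guard let pattern. This sketch uses a simplified dictionary output rather than the real MLMultiArray-based decoder outputs, and the error type is hypothetical:

```swift
enum DecoderError: Error, CustomStringConvertible {
    case missingOutput(String)

    var description: String {
        switch self {
        case .missingOutput(let name): return "decoder output '\(name)' missing"
        }
    }
}

// Before: let values = outputs["logits"]!   // traps on an unexpected model
// After: surface a descriptive error the caller can handle.
func logits(from outputs: [String: [Float]]) throws -> [Float] {
    guard let values = outputs["logits"] else {
        throw DecoderError.missingOutput("logits")
    }
    return values
}

do {
    _ = try logits(from: [:])
} catch {
    print(error)  // decoder output 'logits' missing
}
```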

Benchmark script

  • Added Scripts/run_parakeet_benchmarks.sh — runs all 6 benchmarks (v3, v2, TDT-CTC-110M, CTC earnings, EOU 320ms, Nemotron 1120ms) with WER comparison against benchmarks100.md baselines and regression detection
  • Referenced from Documentation/ASR/benchmarks100.md

Verified — no regressions

| Model | Baseline | Current | Delta |
|---|---|---|---|
| Parakeet TDT v3 (0.6B) | 2.6% | 2.64% | +0.04% |
| Parakeet TDT v2 (0.6B) | 3.8% | 3.79% | -0.01% |
| CTC-TDT 110M | 3.6% | 3.56% | -0.04% |
| CTC Earnings | 16.54% | 16.51% | -0.03% |
| EOU 320ms (120M) | 7.11% | 7.11% | +0.00% |
| Nemotron 1120ms (0.6B) | 1.99% | 1.99% | +0.00% |
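The benchmark script's regression detection amounts to comparing each current WER against its baseline with a tolerance. A hypothetical sketch in Swift (the actual script is shell; the 0.1-point tolerance is an assumed value for illustration):

```swift
// Flag any model whose WER rose more than `tolerance` points over baseline.
func regressions(baseline: [String: Double],
                 current: [String: Double],
                 tolerance: Double = 0.1) -> [String] {
    current.compactMap { name, wer in
        guard let base = baseline[name] else { return nil }
        return (wer - base) > tolerance ? name : nil
    }.sorted()
}

let baseline = ["Parakeet TDT v3": 2.60, "Parakeet TDT v2": 3.80]
let current  = ["Parakeet TDT v3": 2.64, "Parakeet TDT v2": 3.79]
print(regressions(baseline: baseline, current: current))  // []
```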

Test plan

  • swift build passes
  • swift test passes (all existing tests, updated for removed dead code)
  • All 6 ASR benchmarks match baselines (100 files each)
  • swift format lint passes

…mingChunk

AsrManager is the low-level inference engine and shouldn't own vocabulary
boosting state or carry streaming semantics. This removes the duplicated
vocabulary config/rescoring from AsrManager, leaving SlidingWindowAsrManager
as the single owner. CTC head inference and cached logit storage remain in
AsrManager since they need encoder_output MLMultiArray access.

Closes #457

claude bot commented Mar 28, 2026

Claude finished @Alex-Wengg's task


PR Review: AsrManager vocabulary boosting refactor ✅

I've completed a comprehensive review of this PR focusing on code quality, architecture, and potential issues.

Review Tasks:

  • Create pending review
  • Get PR diff to understand changes
  • Analyze code changes for quality and architecture
  • Add inline comments for specific issues
  • Submit comprehensive review

Review Summary

This is a well-executed refactoring that significantly improves the codebase architecture:

✅ Key Strengths

  • Clean separation of concerns: AsrManager → low-level inference, SlidingWindowAsrManager → vocabulary boosting
  • Eliminates code duplication: 203 lines removed, single source of truth for vocabulary features
  • Semantically correct rename: transcribeStreamingChunk → transcribeChunk removes misleading streaming semantics
  • Performance preservation: Smart retention of getCachedCtcRawLogits() for benchmarks
  • Zero API breakage: Public interface unchanged for end users

📊 Code Quality Rating: ★★★★★

  • Architecture: Excellent separation of responsibilities
  • API Design: Clean, maintains compatibility
  • Memory Management: Proper cleanup, no leaks
  • Maintainability: Single ownership model is much cleaner

🧪 Testing Status

  • ✅ Build, tests, formatting all pass
  • Recommendation: Complete ASR + CTC benchmark validation before merge

Overall: APPROVE - This refactoring improves maintainability while preserving functionality and performance.


@claude claude bot left a comment


This is a well-executed refactoring that improves the codebase architecture. Here's my comprehensive review:

Excellent Changes

1. Clear Separation of Concerns

  • AsrManager: Now focused purely on low-level inference (appropriate)
  • SlidingWindowAsrManager: Single owner of vocabulary boosting (appropriate)
  • This follows the Single Responsibility Principle and eliminates code duplication

2. Method Rename is Semantically Correct

  • transcribeStreamingChunk → transcribeChunk removes misleading "streaming" semantics
  • AsrManager is the stateless inference engine, not a streaming component
  • The method preserves decoder state between calls, which is what matters

3. Smart Preservation of Performance Features

  • Correctly keeps getCachedCtcRawLogits() in AsrManager
  • CTC logit caching requires direct access to encoder_output MLMultiArray
  • This ensures benchmarks can still access raw logits efficiently

🔍 Code Quality Analysis

Architecture Quality: ★★★★★

  • Clean elimination of 59 lines from AsrManager (vocabulary state + methods)
  • 143 lines removed from AsrTranscription.swift (duplicate rescoring logic)
  • Single method call change in SlidingWindowAsrManager maintains functionality

API Design: ★★★★★

  • Public API remains unchanged for end users
  • Internal refactoring doesn't break existing consumers
  • Method renaming improves API clarity without functional changes

Memory Management: ★★★★★

  • Proper cleanup in resetState() and cleanup()
  • Removes vocabulary boosting cleanup call since state is no longer held

🧪 Testing & Validation

The test plan looks comprehensive:

  • ✅ Build compiles cleanly
  • ✅ All tests pass
  • ✅ No swift-format violations
  • ✅ No dangling references to old methods
  • ⏳ Pending: ASR benchmark WER validation
  • ⏳ Pending: CTC earnings benchmark validation

📝 Minor Observations

Variable Declaration Style

  • Good change from var result = ... to let result = ... since vocabulary rescoring is removed
  • Shows the code is now more functional (immutable results)

Documentation Impact

  • The removed methods had good documentation - ensure equivalent docs exist in SlidingWindowAsrManager

🚀 Recommendation: APPROVE

This refactoring:

  1. Improves maintainability by eliminating duplicate code paths
  2. Clarifies responsibilities between inference engine vs. high-level features
  3. Maintains performance by keeping critical optimizations in place
  4. Preserves functionality while improving code organization

The only remaining items are benchmark validations, which are appropriate to verify no regressions were introduced.

Nit: Consider running the benchmarks mentioned in the test plan to complete the validation before merging.


@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.



github-actions bot commented Mar 28, 2026

PocketTTS Smoke Test ✅

| Check | Result |
|---|---|
| Build | |
| Model download | |
| Model load | |
| Synthesis pipeline | |
| Output WAV | ✅ (183.8 KB) |

Runtime: 0m31s

Note: PocketTTS uses CoreML MLState (macOS 15) KV cache + Mimi streaming state. CI VM lacks physical GPU — audio quality may differ from Apple Silicon.


github-actions bot commented Mar 28, 2026

Qwen3-ASR int8 Smoke Test ✅

| Check | Result |
|---|---|
| Build | |
| Model download | |
| Model load | |
| Transcription pipeline | |
| Decoder size | 571 MB (vs 1.1 GB f32) |

Performance Metrics

| Metric | CI Value | Expected on Apple Silicon |
|---|---|---|
| Median RTFx | 0.05x | ~2.5x |
| Overall RTFx | 0.05x | ~2.5x |

Runtime: 4m15s

Note: CI VM lacks physical GPU — CoreML MLState (macOS 15) KV cache produces degraded results on virtualized runners. On Apple Silicon: ~1.3% WER / 2.5x RTFx.


github-actions bot commented Mar 28, 2026

Sortformer High-Latency Benchmark Results

ES2004a Performance (30.4s latency config)

| Metric | Value | Target | Status |
|---|---|---|---|
| DER | 33.4% | <35% | |
| Miss Rate | 24.4% | - | - |
| False Alarm | 0.2% | - | - |
| Speaker Error | 8.8% | - | - |
| RTFx | 11.6x | >1.0x | |
| Speakers | 4/4 | - | - |

Sortformer High-Latency • ES2004a • Runtime: 2m 23s • 2026-03-29T01:38:01.340Z


github-actions bot commented Mar 28, 2026

ASR Benchmark Results ✅

Status: All benchmarks passed

Parakeet v3 (multilingual)

| Dataset | WER Avg | WER Med | RTFx | Status |
|---|---|---|---|---|
| test-clean | 0.57% | 0.00% | 5.89x | |
| test-other | 1.19% | 0.00% | 3.78x | |

Parakeet v2 (English-optimized)

| Dataset | WER Avg | WER Med | RTFx | Status |
|---|---|---|---|---|
| test-clean | 0.80% | 0.00% | 4.99x | |
| test-other | 1.00% | 0.00% | 3.17x | |

Streaming (v3)

| Metric | Value | Description |
|---|---|---|
| WER | 0.00% | Word Error Rate in streaming mode |
| RTFx | 0.67x | Streaming real-time factor |
| Avg Chunk Time | 1.312s | Average time to process each chunk |
| Max Chunk Time | 1.495s | Maximum chunk processing time |
| First Token | 1.587s | Latency to first transcription token |
| Total Chunks | 31 | Number of chunks processed |

Streaming (v2)

| Metric | Value | Description |
|---|---|---|
| WER | 0.00% | Word Error Rate in streaming mode |
| RTFx | 0.53x | Streaming real-time factor |
| Avg Chunk Time | 1.721s | Average time to process each chunk |
| Max Chunk Time | 2.345s | Maximum chunk processing time |
| First Token | 1.655s | Latency to first transcription token |
| Total Chunks | 31 | Number of chunks processed |

Streaming tests use 5 files with 0.5s chunks to simulate real-time audio streaming

25 files per dataset • Test runtime: 6m4s • 03/28/2026, 09:36 PM EST

RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time
Processing time includes: Model inference on Apple Neural Engine, audio preprocessing, state resets between files, token-to-text conversion, and file I/O
Example: RTFx of 2.0x means 10 seconds of audio processed in 5 seconds (2x faster than real-time)

Expected RTFx Performance on Physical M1 Hardware:

• M1 Mac: ~28x (clean), ~25x (other)
• CI shows ~0.5-3x due to virtualization limitations

Testing methodology follows HuggingFace Open ASR Leaderboard
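The RTFx definition above reduces to a one-line formula; as a trivial sketch (not project code):

```swift
// RTFx = total audio duration ÷ total processing time (higher is better).
func realTimeFactor(audioSeconds: Double, processingSeconds: Double) -> Double {
    audioSeconds / processingSeconds
}

// 10 s of audio processed in 5 s → 2.0x faster than real time.
print(realTimeFactor(audioSeconds: 10, processingSeconds: 5))  // 2.0
```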


github-actions bot commented Mar 28, 2026

Speaker Diarization Benchmark Results

Speaker Diarization Performance

Evaluating "who spoke when" detection accuracy

| Metric | Value | Target | Status | Description |
|---|---|---|---|---|
| DER | 15.1% | <30% | | Diarization Error Rate (lower is better) |
| JER | 24.9% | <25% | | Jaccard Error Rate |
| RTFx | 16.14x | >1.0x | | Real-Time Factor (higher is faster) |

Diarization Pipeline Timing Breakdown

Time spent in each stage of speaker diarization

| Stage | Time (s) | % | Description |
|---|---|---|---|
| Model Download | 12.934 | 19.9 | Fetching diarization models |
| Model Compile | 5.543 | 8.5 | CoreML compilation |
| Audio Load | 0.171 | 0.3 | Loading audio file |
| Segmentation | 19.494 | 30.0 | Detecting speech regions |
| Embedding | 32.491 | 50.0 | Extracting speaker voices |
| Clustering | 12.996 | 20.0 | Grouping same speakers |
| Total | 65.032 | 100 | Full pipeline |

Speaker Diarization Research Comparison

Research baselines typically achieve 18-30% DER on standard datasets

| Method | DER | Notes |
|---|---|---|
| FluidAudio | 15.1% | On-device CoreML |
| Research baseline | 18-30% | Standard dataset performance |

Note: RTFx shown above is from GitHub Actions runner. On Apple Silicon with ANE:

  • M2 MacBook Air (2022): Runs at 150 RTFx real-time
  • Performance scales with Apple Neural Engine capabilities

🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.0s meeting audio • 65.0s diarization time • Test runtime: 3m 14s • 03/28/2026, 09:42 PM EST


github-actions bot commented Mar 28, 2026

VAD Benchmark Results

Performance Comparison

| Dataset | Accuracy | Precision | Recall | F1-Score | RTFx | Files |
|---|---|---|---|---|---|---|
| MUSAN | 92.0% | 86.2% | 100.0% | 92.6% | 723.6x faster | 50 |
| VOiCES | 92.0% | 86.2% | 100.0% | 92.6% | 695.9x faster | 50 |

Dataset Details

  • MUSAN: Music, Speech, and Noise dataset - standard VAD evaluation
  • VOiCES: Voices Obscured in Complex Environmental Settings - tests robustness in real-world conditions

✅: Average F1-Score above 70%


github-actions bot commented Mar 28, 2026

Parakeet EOU Benchmark Results ✅

Status: Benchmark passed
Chunk Size: 320ms
Files Tested: 100/100

Performance Metrics

| Metric | Value | Description |
|---|---|---|
| WER (Avg) | 7.03% | Average Word Error Rate |
| WER (Med) | 4.17% | Median Word Error Rate |
| RTFx | 6.79x | Real-time factor (higher = faster) |
| Total Audio | 470.6s | Total audio duration processed |
| Total Time | 73.0s | Total processing time |

Streaming Metrics

| Metric | Value | Description |
|---|---|---|
| Avg Chunk Time | 0.073s | Average chunk processing time |
| Max Chunk Time | 0.146s | Maximum chunk processing time |
| EOU Detections | 0 | Total End-of-Utterance detections |

Test runtime: 1m43s • 03/28/2026, 09:34 PM EST

RTFx = Real-Time Factor (higher is better) • Processing includes: Model inference, audio preprocessing, state management, and file I/O


github-actions bot commented Mar 28, 2026

Offline VBx Pipeline Results

Speaker Diarization Performance (VBx Batch Mode)

Optimal clustering with Hungarian algorithm for maximum accuracy

| Metric | Value | Target | Status | Description |
|---|---|---|---|---|
| DER | 14.5% | <20% | | Diarization Error Rate (lower is better) |
| RTFx | 3.86x | >1.0x | | Real-Time Factor (higher is faster) |

Offline VBx Pipeline Timing Breakdown

Time spent in each stage of batch diarization

| Stage | Time (s) | % | Description |
|---|---|---|---|
| Model Download | 14.305 | 5.3 | Fetching diarization models |
| Model Compile | 6.131 | 2.3 | CoreML compilation |
| Audio Load | 0.106 | 0.0 | Loading audio file |
| Segmentation | 28.320 | 10.4 | VAD + speech detection |
| Embedding | 270.530 | 99.6 | Speaker embedding extraction |
| Clustering (VBx) | 0.797 | 0.3 | Hungarian algorithm + VBx clustering |
| Total | 271.523 | 100 | Full VBx pipeline |

Speaker Diarization Research Comparison

Offline VBx achieves competitive accuracy with batch processing

| Method | DER | Mode | Description |
|---|---|---|---|
| FluidAudio (Offline) | 14.5% | VBx Batch | On-device CoreML with optimal clustering |
| FluidAudio (Streaming) | 17.7% | Chunk-based | First-occurrence speaker mapping |
| Research baseline | 18-30% | Various | Standard dataset performance |

Pipeline Details:

  • Mode: Offline VBx with Hungarian algorithm for optimal speaker-to-cluster assignment
  • Segmentation: VAD-based voice activity detection
  • Embeddings: WeSpeaker-compatible speaker embeddings
  • Clustering: PowerSet with VBx refinement
  • Accuracy: Higher than streaming due to optimal post-hoc mapping

🎯 Offline VBx Test • AMI Corpus ES2004a • 1049.0s meeting audio • 299.6s processing • Test runtime: 5m 0s • 03/28/2026, 09:46 PM EST

- Remove deprecated calculateStartFrameOffset and its tests
- Add explicit parakeetTdtCtc110m case to Repo.folderName
- Extract duplicated defaultConfiguration() and defaultModelsDirectory()
  into shared MLModelConfigurationUtils, replacing 5+3 copy-pasted methods
- Rename StreamingAudioSourceFactory/SampleSource/Error to drop misleading
  "Streaming" prefix (types are used by both ASR and Diarizer)
- Rename files to match their type names (SortformerDiarizer, LSEENDDiarizer,
  NemotronStreamingAsrManager+Pipeline)
- Remove stale TODO and duplicate vocabularyFileArray constant
- Remove outdated nonisolated(unsafe) from SlidingWindowAsrManager
- Replace force unwraps in RnntDecoder with guard let + throw
- Delete dead loadModel() and getDefaultModelsDirectory() (legacy Models/Parakeet path)
- Remove dangling doc comment from deleted property
- Rename transcribeStreaming → transcribeDiskBacked (avoids confusion with
  the real streaming API in SlidingWindowAsrManager)
- Convert getDecoderLayers() to decoderLayerCount computed property
- Move AudioSource enum from AsrManager.swift to Shared/AudioSource.swift
- Mark pure utility methods as nonisolated: normalizedTimingToken,
  calculateConfidence, sliceEncoderOutput, removeDuplicateTokenSequence
ANEOptimizer was a thin wrapper over ANEMemoryUtils in the wrong location
(ASR/Parakeet/ instead of Shared/). All callers now use ANEMemoryUtils
directly.

- Replace ANEOptimizer.createANEAlignedArray → ANEMemoryUtils.createAlignedArray
- Replace ANEOptimizer.prefetchToNeuralEngine(x) → x.prefetchToNeuralEngine()
  (MLMultiArray extension already in ANEMemoryOptimizer)
- Move convertToFloat16 to ANEMemoryUtils (throws proper ANEMemoryError)
- Move ZeroCopyFeatureProvider to Shared/ZeroCopyFeatureProvider.swift
- Inline optimalComputeUnits (always returned .cpuAndNeuralEngine),
  delete ModelType enum
- Simplify AsrModels.optimizedConfiguration to use shared utility
- Delete ANEOptimizer.swift

…te duplication

- Remove unused import OSLog and dead sliceEncoderOutput method
- Add clearCachedCtcData() helper to eliminate repeated nil assignments
- Add decoderState(for:)/setDecoderState(_:for:) to eliminate switch duplication
- Extract frameAlignedAudio() helper for duplicated frame-alignment logic
- Add ASRConstants.secondsPerEncoderFrame to replace magic number 0.08
- Replace hardcoded 16_000 with config.sampleRate
- Remove unused duration parameter from calculateConfidence
- Simplify processTranscriptionResult by removing dead tokenTimings parameter
- Replace convertTokensWithExistingTimings with simpler convertTokensToText
…ad code

- Apply enableFP16 to allowLowPrecisionAccumulationOnGPU in optimizedConfiguration
  (fixes review feedback on PR #460)
- Remove dead loadWithANEOptimization method (no callers)
- Remove unused import OSLog
- Delete PerformanceMonitor actor (never instantiated, component times
  hardcoded to 0)
- Delete AggregatedMetrics struct (only used by dead monitor)
- Remove unused imports (os, MachTaskSelfWrapper)
- Move ASRPerformanceMetrics to Shared/ (not Parakeet-specific)
- Remove dead PerformanceMonitor tests, keep ASRPerformanceMetrics tests
- Remove redundant currentStream() wrapper (callers use ensureSession)
- Fix finishSession: return early when inactive instead of creating
  an orphan stream
- Remove auto-create in resetAndPrepareNextSession (renamed to reset);
  next ensureSession() creates on demand
- Remove onTermination closure with unnecessary weak self
- Move from ASR/Parakeet/ to Shared/ (generic async stream utility)
…hared

- Remove unused `import os` and logger
- Delete dead `getFloat16Array` method and its 2 tests
- Fix `returnArray` to reset data for all types, not just float32
- Remove debug logging from hot path
- Move from ASR/Parakeet/ to Shared/ (used by ASR and shared cache)
…ff bug

- Remove unused `import CoreML` and `import OSLog`
- Replace hardcoded `sampleRate = 16000` with `ASRConstants.sampleRate`
- Replace manual frameDuration calculation with `ASRConstants.secondsPerEncoderFrame`
- Fix duplicate token at cutoff boundary in mergeByMidpoint (`<=` to `<`)
Track the benchmark orchestration script that runs all 4 Parakeet
model benchmarks (v3, v2, TDT-CTC-110M, CTC earnings) with asset
verification and sleep prevention. Link it from the benchmark results
doc for reproducibility.

Whitelist the script in .gitignore (scripts/ was ignored).
- Add explicit folderName cases for parakeetCtc110m and parakeetCtc06b
  in ModelNames.swift. The default case strips "-coreml" which broke
  auto-detection of the CTC model directory.
- Add EOU 320ms and Nemotron 1120ms streaming benchmarks to the script
- Add WER comparison table against benchmarks100.md baselines
- Fix CTC earnings to use v2 TDT (matching baseline config)
- Fix WER extraction for fields stored as percentages vs decimals

Verified: all 6 benchmarks match baselines (v3 2.6%, v2 3.8%,
TDT-CTC 3.6%, earnings 16.5%, EOU 7.11%, Nemotron 1.99%).
@Alex-Wengg Alex-Wengg changed the title Move vocabulary boosting out of AsrManager and rename transcribeStreamingChunk ASR tech debt cleanup: remove dead code, fix bugs, add benchmark script Mar 29, 2026