Metal optimizations 2 by Alex-Wengg · Pull Request #6 · FluidInference/FluidAudio

Alex-Wengg · 2025-06-30T01:44:08Z

No description provided.

github-actions · 2025-06-30T02:17:06Z

🎯 Single File Benchmark Results

Test File: ES2004a (NaNs audio)

Metric	Value	Target	Status
DER (Diarization Error Rate)	NaN%	< 30%	❌
JER (Jaccard Error Rate)	NaN%	< 25%	❌
RTF (Real-Time Factor)	NaNx	< 1.0x	❌
Speakers Detected		-	ℹ️
Benchmark Runtime	s	-	ℹ️

⚠️ Performance Below Target - Consider parameter optimization

📊 Research Comparison:

Powerset BCE (2023): 18.5% DER
EEND (2019): 25.3% DER
x-vector clustering: 28.7% DER

Automated benchmark using AMI corpus ES2004a test file

github-actions · 2025-06-30T02:21:02Z

🎯 Single File Benchmark Results

Test File: ES2004a (NaNs audio)

Metric	Value	Target	Status
DER (Diarization Error Rate)	NaN%	< 30%	❌
JER (Jaccard Error Rate)	NaN%	< 25%	❌
RTF (Real-Time Factor)	NaNx	< 1.0x	❌
Speakers Detected		-	ℹ️
Benchmark Runtime	s	-	ℹ️

⚠️ Performance Below Target - Consider parameter optimization

Automated benchmark using AMI corpus ES2004a test file

github-actions · 2025-06-30T02:22:59Z

🎯 Single File Benchmark Results

Test File: ES2004a (NaNs audio)

Metric	Value	Target	Status
DER (Diarization Error Rate)	NaN%	< 30%	❌
JER (Jaccard Error Rate)	NaN%	< 25%	❌
RTF (Real-Time Factor)	NaNx	< 1.0x	❌
Speakers Detected		-	ℹ️
Benchmark Runtime	s	-	ℹ️

⚠️ Performance Below Target - Consider parameter optimization

Automated benchmark using AMI corpus ES2004a test file

github-actions · 2025-06-30T02:26:14Z

🎯 Single File Benchmark Results

Test File: ES2004a (NaNs audio)

Metric	Value	Target	Status
DER (Diarization Error Rate)	NaN%	< 30%	❌
JER (Jaccard Error Rate)	NaN%	< 25%	❌
RTF (Real-Time Factor)	NaNx	< 1.0x	❌
Speakers Detected		-	ℹ️
Benchmark Runtime	s	-	ℹ️

⚠️ Performance Below Target - Consider parameter optimization

Automated benchmark using AMI corpus ES2004a test file

github-actions · 2025-06-30T02:56:18Z

🎯 Single File Benchmark Results

Test File: ES2004a (1049.4s audio)

Metric	Value	Target	Status
DER (Diarization Error Rate)	31.9%	< 30%	❌
JER (Jaccard Error Rate)	29.1%	< 25%	❌
RTF (Real-Time Factor)	0.03x	< 1.0x	✅
Speakers Detected	16	-	ℹ️
Benchmark Runtime	NAs	-	ℹ️

⚠️ Performance Below Target - Consider parameter optimization

Automated benchmark using AMI corpus ES2004a test file

github-actions · 2025-06-30T03:05:56Z

🎯 Single File Benchmark Results

Test File: ES2004a (1049.4s audio)

Metric	Value	Target	Status
DER (Diarization Error Rate)	34.7%	< 30%	❌
JER (Jaccard Error Rate)	39.1%	< 25%	❌
RTF (Real-Time Factor)	0.03x	< 1.0x	✅
Speakers Detected	16	-	ℹ️
Benchmark Runtime	NAs	-	ℹ️

⚠️ Performance Below Target - Consider parameter optimization

Automated benchmark using AMI corpus ES2004a test file
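
For reference when reading the tables above: a target of `< 1.0x` that 0.03x passes is consistent with RTF defined as processing time divided by audio duration, so the 1049.4 s file would have taken on the order of 30 s to process. The first four runs report `NaN`; one plausible cause is that the inputs to such ratios were missing or not numeric. The following is a minimal Python sketch of that arithmetic with a guard. The function names `real_time_factor` and `format_metric` are hypothetical and are not taken from the FluidAudio benchmark workflow.

```python
import math
from typing import Optional


def real_time_factor(processing_s: float, audio_s: float) -> Optional[float]:
    """RTF = processing time / audio duration (lower is faster).

    Returns None instead of NaN when either input is unusable, so the
    report can show an explicit "n/a" rather than "NaNx".
    """
    if not (math.isfinite(processing_s) and math.isfinite(audio_s)):
        return None
    if audio_s <= 0:
        return None
    return processing_s / audio_s


def format_metric(value: Optional[float], unit: str) -> str:
    """Render a metric for the results table, hiding missing values."""
    return "n/a" if value is None else f"{value:.2f}{unit}"


# Assumed processing time of 31.482 s, chosen only to illustrate 0.03x.
print(format_metric(real_time_factor(31.482, 1049.4), "x"))
print(format_metric(real_time_factor(float("nan"), 1049.4), "x"))
```

Returning `None` rather than propagating `NaN` is a design choice for this sketch: it lets the report distinguish "metric could not be computed" from "metric missed its target".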

### Why is this change needed?

Taking inspiration from Silero VAD's `utils_vad.py` (https://github.com/snakers4/silero-vad/blob/master/src/silero_vad/utils_vad.py), this change updates our segmentation implementation and adds support for streaming VAD.

```bash
% swift run fluidaudio vad-analyze voiceink-issue-279.wav --seconds --mode streaming
Building for debugging...
[1/1] Write swift-version--58304C5D6DBC2206.txt
Build of product 'fluidaudio' complete! (0.07s)
[00:08:02.789] [INFO] [DownloadUtils] Found silero-vad-coreml locally, no download needed
[00:08:02.812] [INFO] [DownloadUtils] Loaded model: silero-vad-unified-256ms-v6.0.0.mlmodelc
[00:08:02.812] [INFO] [VadManager] VAD model loaded successfully
[00:08:02.812] [INFO] [VadManager] VAD system initialized in 0.02s
[00:08:02.812] [INFO] [VadAnalyze] 📶 Running streaming simulation...
[00:08:02.820] [INFO] [VadAnalyze] • Speech Start at 1.200s
[00:08:02.821] [INFO] [VadAnalyze] • Speech End at 2.700s
[00:08:02.822] [INFO] [VadAnalyze] • Speech Start at 4.300s
[00:08:02.825] [INFO] [VadAnalyze] • Speech End at 7.800s
[00:08:02.828] [INFO] [VadAnalyze] • Speech Start at 13.700s
[00:08:02.830] [INFO] [VadAnalyze] • Speech End at 16.200s
[00:08:02.830] [INFO] [VadAnalyze] • Speech Start at 17.300s
[00:08:02.832] [INFO] [VadAnalyze] • Speech End at 19.000s
[00:08:02.839] [INFO] [VadAnalyze] • Speech Start at 29.600s
[00:08:02.840] [INFO] [VadAnalyze] • Speech End at 30.600s
[00:08:02.849] [INFO] [VadAnalyze] • Speech Start at 45.000s
[00:08:02.849] [INFO] [VadAnalyze] Flushing trailing silence to close open segments...
[00:08:02.850] [INFO] [VadAnalyze] • Speech End at 45.500s
[00:08:02.850] [INFO] [VadAnalyze] Streaming simulation produced 12 events

% swift run fluidaudio vad-analyze voiceink-issue-279.wav --seconds
Building for debugging...
[1/1] Write swift-version--58304C5D6DBC2206.txt
Build of product 'fluidaudio' complete! (0.07s)
[00:08:08.289] [INFO] [DownloadUtils] Found silero-vad-coreml locally, no download needed
[00:08:08.309] [INFO] [DownloadUtils] Loaded model: silero-vad-unified-256ms-v6.0.0.mlmodelc
[00:08:08.309] [INFO] [VadManager] VAD model loaded successfully
[00:08:08.309] [INFO] [VadManager] VAD system initialized in 0.02s
[00:08:08.309] [INFO] [VadAnalyze] 📍 Running offline speech segmentation...
[00:08:08.344] [INFO] [VadAnalyze] Detected 6 speech segments in 0.03s
[00:08:08.344] [INFO] [VadAnalyze] RTFx: 1369.21x (audio: 45.66s, inference: 0.03s)
[00:08:08.344] [INFO] [VadAnalyze] Segment #1: samples 18880-42560 (1.18s-2.66s)
[00:08:08.344] [INFO] [VadAnalyze] Segment #2: samples 68032-124480 (4.25s-7.78s)
[00:08:08.344] [INFO] [VadAnalyze] Segment #3: samples 219584-259648 (13.72s-16.23s)
[00:08:08.344] [INFO] [VadAnalyze] Segment #4: samples 276928-304704 (17.31s-19.04s)
[00:08:08.344] [INFO] [VadAnalyze] Segment #5: samples 473536-489024 (29.60s-30.56s)
[00:08:08.344] [INFO] [VadAnalyze] Segment #6: samples 719296-730616 (44.96s-45.66s)

% ffmpeg -i voiceink-issue-279.wav -af silencedetect=noise=-30dB:d=0.5 -f null -
ffmpeg version 8.0 Copyright (c) 2000-2025 the FFmpeg developers
built with Apple clang version 17.0.0 (clang-1700.0.13.3)
...
libavutil 60. 8.100 / 60. 8.100
libavcodec 62. 11.100 / 62. 11.100
libavformat 62. 3.100 / 62. 3.100
libavdevice 62. 1.100 / 62. 1.100
libavfilter 11. 4.100 / 11. 4.100
libswscale 9. 1.100 / 9. 1.100
libswresample 6. 1.100 / 6. 1.100
[aist#0:0/pcm_s16le @ 0xb22c38180] Guessed Channel Layout: mono
Input #0, wav, from 'voiceink-issue-279.wav':
  Duration: 00:00:45.66, bitrate: 256 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder : Lavf62.3.100
  Stream #0:0: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder : Lavc62.11.100 pcm_s16le
[silencedetect @ 0xb22c6c420] silence_start: 0
[silencedetect @ 0xb22c6c420] silence_end: 1.364 | silence_duration: 1.364
[silencedetect @ 0xb22c6c420] silence_start: 2.305687
[silencedetect @ 0xb22c6c420] silence_end: 4.394813 | silence_duration: 2.089125
[silencedetect @ 0xb22c6c420] silence_start: 7.579813
[silencedetect @ 0xb22c6c420] silence_end: 14.003938 | silence_duration: 6.424125
[silencedetect @ 0xb22c6c420] silence_start: 15.845063
[silencedetect @ 0xb22c6c420] silence_end: 17.45075 | silence_duration: 1.605687
[silencedetect @ 0xb22c6c420] silence_start: 18.692625
[silencedetect @ 0xb22c6c420] silence_end: 29.667438 | silence_duration: 10.974813
[silencedetect @ 0xb22c6c420] silence_start: 30.367563
[silencedetect @ 0xb22c6c420] silence_end: 41.412062 | silence_duration: 11.0445
[silencedetect @ 0xb22c6c420] silence_start: 41.454687
[silencedetect @ 0xb22c6c420] silence_end: 45.000813 | silence_duration: 3.546125
[out#0/null @ 0xb2300c780] video:0KiB audio:1427KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: unknown
size=N/A time=00:00:45.66 bitrate=N/A speed=8.51e+03x elapsed=0:00:00.00
```
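
The offline log reports each segment both as sample indices and as seconds, and the two agree at the 16000 Hz sample rate shown in the ffmpeg stream info. As a rough cross-check, the Python sketch below converts the offline sample ranges to seconds and measures how far the streaming start/end events fall from them. The values are copied from the logs above; the helper names `to_seconds` and `max_boundary_gap` are hypothetical and not part of FluidAudio.

```python
SAMPLE_RATE_HZ = 16_000  # from "16000 Hz, mono" in the ffmpeg output

# Offline segments as (start_sample, end_sample), copied from the log.
offline_samples = [
    (18880, 42560),
    (68032, 124480),
    (219584, 259648),
    (276928, 304704),
    (473536, 489024),
    (719296, 730616),
]

# Streaming events paired as (speech_start_s, speech_end_s), copied from the log.
streaming_s = [
    (1.2, 2.7),
    (4.3, 7.8),
    (13.7, 16.2),
    (17.3, 19.0),
    (29.6, 30.6),
    (45.0, 45.5),
]


def to_seconds(segment, rate=SAMPLE_RATE_HZ):
    """Convert a (start_sample, end_sample) pair to seconds."""
    start, end = segment
    return (start / rate, end / rate)


def max_boundary_gap(a, b):
    """Largest absolute difference between paired segment boundaries."""
    return max(
        max(abs(s1 - s2), abs(e1 - e2))
        for (s1, e1), (s2, e2) in zip(a, b)
    )


offline_s = [to_seconds(seg) for seg in offline_samples]
gap = max_boundary_gap(offline_s, streaming_s)
print(f"first offline segment: {offline_s[0][0]:.2f}s-{offline_s[0][1]:.2f}s")
print(f"largest offline/streaming boundary gap: {gap:.3f}s")
```

On these logged values the largest gap comes from the final segment, where the streaming run closes the segment at 45.500s after flushing trailing silence while the offline run extends to the end of the file.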

BrandonWeng and others added 6 commits June 29, 2025 22:00

Metal optimizations for pipeline operations

f9b1396

Benchmark runs

efab695

Sample files and downloading benchmarking files

5cbad0e

Add annotation for benchmark tests

89e66ea

Cleaning

7f756d2

cleaning part 2

33d7431

Alex-Wengg force-pushed the metal-optimizations-2 branch from 6b9fcf3 to 33d7431 Compare June 30, 2025 02:14

Alex-Wengg force-pushed the metal-optimizations-2 branch from 0172009 to 01e74c1 Compare June 30, 2025 02:21

add back test.yml

0124672

Alex-Wengg force-pushed the metal-optimizations-2 branch from 01e74c1 to 0124672 Compare June 30, 2025 02:22

fix missing main.swift stuff

2c97c40

Alex-Wengg force-pushed the metal-optimizations-2 branch from 18aefc7 to 2c97c40 Compare June 30, 2025 02:38

FluidInference deleted a comment from github-actions bot Jun 30, 2025

fix concurrency issue

e23415e

Alex-Wengg force-pushed the metal-optimizations-2 branch from af3285d to e23415e Compare June 30, 2025 03:00

BrandonWeng closed this Jul 23, 2025

BrandonWeng deleted the metal-optimizations-2 branch August 1, 2025 20:14

smdesai mentioned this pull request Feb 23, 2026

iOS 26.4 beta 1 and beta 2: BNNSGraphContextExecute_v2 error #328

Closed

Metal optimizations 2 #6
Alex-Wengg wants to merge 9 commits into main from metal-optimizations-2

2 participants