39 changes: 37 additions & 2 deletions .github/workflows/asr-benchmark.yml
@@ -198,9 +198,38 @@ jobs:
echo "EXECUTION_TIME=$EXECUTION_TIME" >> $GITHUB_OUTPUT
echo "FILES_COUNT=$MAX_FILES" >> $GITHUB_OUTPUT

# Validate RTFx values - 0 indicates benchmark failure
if [ "$CLEAN_RTFx" = "0.00" ] || [ "$CLEAN_RTFx" = "N/A" ]; then
echo "⚠️ test-clean RTFx is 0 or N/A - benchmark may have failed"
CLEAN_RTFX_FAILED=1
fi
if [ "$CLEAN_V2_RTFx" = "0.00" ] || [ "$CLEAN_V2_RTFx" = "N/A" ]; then
echo "⚠️ test-clean (v2) RTFx is 0 or N/A - benchmark may have failed"
CLEAN_V2_RTFX_FAILED=1
fi
if [ "$OTHER_RTFx" = "0.00" ] || [ "$OTHER_RTFx" = "N/A" ]; then
echo "⚠️ test-other RTFx is 0 or N/A - benchmark may have failed"
OTHER_RTFX_FAILED=1
fi
if [ "$OTHER_V2_RTFx" = "0.00" ] || [ "$OTHER_V2_RTFx" = "N/A" ]; then
echo "⚠️ test-other (v2) RTFx is 0 or N/A - benchmark may have failed"
OTHER_V2_RTFX_FAILED=1
fi
if [ "$STREAMING_RTFx" = "0.00" ] || [ "$STREAMING_RTFx" = "N/A" ]; then
echo "⚠️ streaming RTFx is 0 or N/A - benchmark may have failed"
STREAMING_RTFX_FAILED=1
fi
if [ "$STREAMING_V2_RTFx" = "0.00" ] || [ "$STREAMING_V2_RTFx" = "N/A" ]; then
echo "⚠️ streaming (v2) RTFx is 0 or N/A - benchmark may have failed"
STREAMING_V2_RTFX_FAILED=1
fi

# Report failures summary
if [ ! -z "$CLEAN_FAILED" ] || [ ! -z "$OTHER_FAILED" ] || [ ! -z "$STREAMING_FAILED" ] || \
- [ ! -z "$CLEAN_V2_FAILED" ] || [ ! -z "$OTHER_V2_FAILED" ] || [ ! -z "$STREAMING_V2_FAILED" ]; then
[ ! -z "$CLEAN_V2_FAILED" ] || [ ! -z "$OTHER_V2_FAILED" ] || [ ! -z "$STREAMING_V2_FAILED" ] || \
[ ! -z "$CLEAN_RTFX_FAILED" ] || [ ! -z "$CLEAN_V2_RTFX_FAILED" ] || \
[ ! -z "$OTHER_RTFX_FAILED" ] || [ ! -z "$OTHER_V2_RTFX_FAILED" ] || \
[ ! -z "$STREAMING_RTFX_FAILED" ] || [ ! -z "$STREAMING_V2_RTFX_FAILED" ]; then
echo "BENCHMARK_STATUS=PARTIAL_FAILURE" >> $GITHUB_OUTPUT
echo "⚠️ Some benchmarks failed:"
[ ! -z "$CLEAN_FAILED" ] && echo " - test-clean benchmark failed"
@@ -209,7 +238,13 @@ jobs:
[ ! -z "$CLEAN_V2_FAILED" ] && echo " - test-clean (v2) benchmark failed"
[ ! -z "$OTHER_V2_FAILED" ] && echo " - test-other (v2) benchmark failed"
[ ! -z "$STREAMING_V2_FAILED" ] && echo " - streaming (v2) benchmark failed"
- # Don't exit with error to allow PR comment to be posted
[ ! -z "$CLEAN_RTFX_FAILED" ] && echo " - test-clean RTFx is 0"
[ ! -z "$CLEAN_V2_RTFX_FAILED" ] && echo " - test-clean (v2) RTFx is 0"
[ ! -z "$OTHER_RTFX_FAILED" ] && echo " - test-other RTFx is 0"
[ ! -z "$OTHER_V2_RTFX_FAILED" ] && echo " - test-other (v2) RTFx is 0"
[ ! -z "$STREAMING_RTFX_FAILED" ] && echo " - streaming RTFx is 0"
[ ! -z "$STREAMING_V2_RTFX_FAILED" ] && echo " - streaming (v2) RTFx is 0"
exit 1

🔴 exit 1 prevents PR comment from being posted because Comment PR step lacks always()

The old code at this location had a deliberate comment: # Don't exit with error to allow PR comment to be posted. This PR replaces that with exit 1. The subsequent "Comment PR" step at asr-benchmark.yml:254 uses if: github.event_name == 'pull_request' which, per GitHub Actions docs, implicitly becomes if: success() && github.event_name == 'pull_request'. When exit 1 fires, success() evaluates to false and the PR comment step is skipped entirely. Benchmark results will not be visible on the PR. The other workflows that already had exit 1 and work correctly (diarizer, sortformer) use if: always() on their comment steps.
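
To make the implicit-status rule above concrete, here is a minimal shell sketch. The helper `effective_condition` is hypothetical (it is not part of the PR or of GitHub Actions) and uses a naive substring match, so it only approximates how the runner decides whether to prepend `success()`.

```shell
#!/bin/sh
# Hypothetical helper: print the condition a step effectively runs under.
# Assumption: a plain substring match is enough to spot a status check function.
effective_condition() {
  expr_text=$1
  case "$expr_text" in
    *"always()"*|*"failure()"*|*"cancelled()"*|*"success()"*)
      # A status check function is present, so nothing is prepended.
      printf '%s\n' "$expr_text"
      ;;
    *)
      # No status check function: success() is applied by default.
      printf 'success() && (%s)\n' "$expr_text"
      ;;
  esac
}

effective_condition "github.event_name == 'pull_request'"
effective_condition "always() && github.event_name == 'pull_request'"
```

The first call shows why the comment step is skipped after `exit 1`; the second shows the condition surviving unchanged once `always()` is present.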

Prompt for agents
In .github/workflows/asr-benchmark.yml, the `exit 1` at line 247 causes the Comment PR step to be skipped. Fix by changing the Comment PR step's condition at line 254 from:
  if: github.event_name == 'pull_request'
to:
  if: always() && github.event_name == 'pull_request'

This matches the pattern used in the other benchmark workflows (diarizer-benchmark.yml, sortformer-benchmark.yml), whose comment steps already use if: always() and therefore still run after a step failure.

else
echo "BENCHMARK_STATUS=SUCCESS" >> $GITHUB_OUTPUT
echo "✅ All benchmarks completed successfully"
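
The six near-identical RTFx checks added in this hunk could be driven from a small table instead. The sketch below is only an illustration: `rtfx_failed`, `RTFX_FAILURES`, and the sample values are assumptions, while the labels and the "0.00" / "N/A" tests are taken from the diff.

```shell
#!/bin/sh
# Sketch only: rtfx_failed and the sample values below are hypothetical.

# Succeeds (returns 0) when the value signals a failed benchmark,
# using the same string tests as the workflow ("0.00" or "N/A").
rtfx_failed() {
  [ "$1" = "0.00" ] || [ "$1" = "N/A" ]
}

# Sample values standing in for the variables the workflow extracts.
CLEAN_RTFx="12.34"; CLEAN_V2_RTFx="0.00"; OTHER_RTFx="N/A"
OTHER_V2_RTFx="9.87"; STREAMING_RTFx="3.21"; STREAMING_V2_RTFx="4.56"

RTFX_FAILURES=""
# The here-document keeps the loop in the current shell, so the
# accumulated list is still visible after "done".
while IFS='|' read -r label value; do
  if rtfx_failed "$value"; then
    echo "⚠️ $label RTFx is 0 or N/A - benchmark may have failed"
    RTFX_FAILURES="${RTFX_FAILURES}${label};"
  fi
done <<EOF
test-clean|$CLEAN_RTFx
test-clean (v2)|$CLEAN_V2_RTFx
test-other|$OTHER_RTFx
test-other (v2)|$OTHER_V2_RTFx
streaming|$STREAMING_RTFx
streaming (v2)|$STREAMING_V2_RTFx
EOF

echo "Failed checks: ${RTFX_FAILURES:-none}"
```

A single list of failures would also shorten the summary condition, since one non-empty test replaces the six `*_RTFX_FAILED` flags.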
7 changes: 7 additions & 0 deletions .github/workflows/diarizer-benchmark.yml
@@ -115,6 +115,13 @@ jobs:
echo "CLUSTERING_TIME=${CLUSTERING_TIME}" >> $GITHUB_OUTPUT
echo "INFERENCE_TIME=${INFERENCE_TIME}" >> $GITHUB_OUTPUT

# Validate RTFx - 0 indicates benchmark failure
if [ "$RTF" = "0" ] || [ -z "$RTF" ]; then
echo "❌ CRITICAL: RTFx is 0 or empty - benchmark failed"
echo "RTFx value: $RTF"
exit 1
fi

- name: Comment PR with Benchmark Results
if: always()
uses: actions/github-script@v7
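
The check in this hunk compares `$RTF` to the literal string "0", while other workflows in this PR compare against "0.00". The diff does not show how `RTF` is formatted, so the following is a hedged sketch of a numeric test that would treat "0", "0.0", and "0.00" alike; `rtfx_is_positive` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: rtfx_is_positive is a hypothetical helper, not part of the PR.

# Succeeds only when $1 is a plain decimal number greater than zero.
# Signs and exponents (e.g. "-1", "1e3") are deliberately rejected.
rtfx_is_positive() {
  case "$1" in
    ""|*[!0-9.]*|*.*.*|.) return 1 ;;   # empty, non-numeric, or malformed
  esac
  # awk performs the numeric comparison; exit status 0 means v > 0.
  awk -v v="$1" 'BEGIN { exit !(v + 0 > 0) }'
}

for sample in "0" "0.0" "0.00" "" "N/A" "1.25"; do
  if rtfx_is_positive "$sample"; then
    echo "'$sample' -> valid"
  else
    echo "'$sample' -> treated as benchmark failure"
  fi
done
```

Only "1.25" passes in the loop above; every other sample is reported as a failure regardless of how the zero is spelled.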
7 changes: 7 additions & 0 deletions .github/workflows/parakeet-eou-benchmark.yml
@@ -104,6 +104,13 @@ jobs:
echo "MAX_FILES=$MAX_FILES" >> $GITHUB_OUTPUT
echo "BENCHMARK_STATUS=$BENCHMARK_STATUS" >> $GITHUB_OUTPUT

# Validate RTFx - 0 or N/A indicates benchmark failure
if [ "$RTFx" = "0.00" ] || [ "$RTFx" = "N/A" ]; then
echo "❌ CRITICAL: RTFx is 0 or N/A - benchmark failed"
echo "RTFx value: $RTFx"
exit 1

🔴 exit 1 prevents PR comment because Comment PR step lacks always() in parakeet-eou-benchmark

Same issue as in asr-benchmark.yml: the new exit 1 at line 111 causes the "Comment PR" step at parakeet-eou-benchmark.yml:115 to be skipped. That step uses if: github.event_name == 'pull_request' without always(), so when the benchmark step fails, the success() check that GitHub Actions applies implicitly evaluates to false and the comment is never posted.

Prompt for agents
In .github/workflows/parakeet-eou-benchmark.yml, the exit 1 at line 111 causes the Comment PR step to be skipped. Fix by changing the Comment PR step's condition at line 115 from:
  if: github.event_name == 'pull_request'
to:
  if: always() && github.event_name == 'pull_request'

This matches the pattern used in diarizer-benchmark.yml and sortformer-benchmark.yml, whose comment steps already use if: always().

fi

- name: Comment PR
if: github.event_name == 'pull_request'
continue-on-error: true
34 changes: 34 additions & 0 deletions .github/workflows/qwen3-asr-benchmark.yml

🟡 CI cache path references old folder name qwen3-asr-0.6b-coreml that no longer matches code

The CI workflow caches ~/Library/Application Support/FluidAudio/Models/qwen3-asr-0.6b-coreml (line 30), but after the folderName default-case change in Sources/FluidAudio/ModelNames.swift:133, Repo.qwen3AsrInt8.folderName now returns qwen3-asr-0.6b/int8 instead of qwen3-asr-0.6b-coreml/int8. The models will be downloaded to a path under qwen3-asr-0.6b/ which is not covered by the CI cache configuration, so the cache will never hit and models will be re-downloaded on every CI run.

(Refers to line 30)

@@ -69,6 +69,31 @@ jobs:
echo "SMOKE_STATUS=FAILED" >> $GITHUB_OUTPUT
fi

# Extract RTFx metrics if results file exists
if [ -f qwen3_results_int8.json ]; then
MEDIAN_RTFx=$(jq -r '.summary.medianRTFx // "N/A"' qwen3_results_int8.json 2>/dev/null)
OVERALL_RTFx=$(jq -r '.summary.overallRTFx // "N/A"' qwen3_results_int8.json 2>/dev/null)

[ "$MEDIAN_RTFx" != "null" ] && [ "$MEDIAN_RTFx" != "N/A" ] && MEDIAN_RTFx=$(printf "%.2f" "$MEDIAN_RTFx") || MEDIAN_RTFx="N/A"
[ "$OVERALL_RTFx" != "null" ] && [ "$OVERALL_RTFx" != "N/A" ] && OVERALL_RTFx=$(printf "%.2f" "$OVERALL_RTFx") || OVERALL_RTFx="N/A"

echo "MEDIAN_RTFx=$MEDIAN_RTFx" >> $GITHUB_OUTPUT
echo "OVERALL_RTFx=$OVERALL_RTFx" >> $GITHUB_OUTPUT

# Fail if RTFx is 0 or N/A - indicates benchmark failure
if [ "$MEDIAN_RTFx" = "N/A" ] || [ "$MEDIAN_RTFx" = "0.00" ] || [ "$OVERALL_RTFx" = "N/A" ] || [ "$OVERALL_RTFx" = "0.00" ]; then
echo "❌ CRITICAL: RTFx is 0 or N/A - benchmark failed to produce valid results"
echo "Median RTFx: $MEDIAN_RTFx"
echo "Overall RTFx: $OVERALL_RTFx"
exit 1
fi
else
echo "❌ CRITICAL: Results file not found - benchmark failed"
echo "MEDIAN_RTFx=N/A" >> $GITHUB_OUTPUT
echo "OVERALL_RTFx=N/A" >> $GITHUB_OUTPUT
exit 1
Comment on lines +88 to +94

🔴 exit 1 prevents PR comment because Comment PR step lacks always() in qwen3-asr-benchmark

Same issue: the new exit 1 at lines 88 and 94 in the smoketest step will cause the "Comment PR" step at qwen3-asr-benchmark.yml:101 to be skipped. That step uses if: github.event_name == 'pull_request' without always(), so the implicit success() check fails and no PR comment is posted when the RTFx validation fails.

Prompt for agents
In .github/workflows/qwen3-asr-benchmark.yml, the exit 1 at lines 88 and 94 causes the Comment PR step to be skipped. Fix by changing the Comment PR step's condition at line 101 from:
  if: github.event_name == 'pull_request'
to:
  if: always() && github.event_name == 'pull_request'

This matches the pattern used in diarizer-benchmark.yml and sortformer-benchmark.yml, whose comment steps already use if: always().

fi

EXECUTION_TIME=$(( ($(date +%s) - BENCHMARK_START) / 60 ))m$(( ($(date +%s) - BENCHMARK_START) % 60 ))s
echo "EXECUTION_TIME=$EXECUTION_TIME" >> $GITHUB_OUTPUT
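
The extraction above packs validation and formatting into one `&& ... ||` chain, where the "N/A" fallback also fires if `printf` itself fails. As a hedged alternative, the same intent can be written as an explicit function; `format_rtfx` is a hypothetical name, and the inputs below stand in for what `jq` would return.

```shell
#!/bin/sh
# Sketch only: format_rtfx is a hypothetical helper, not part of the PR.

# Prints the value with two decimals, or "N/A" when it is missing,
# "null", or not a plain non-negative decimal number.
format_rtfx() {
  case "$1" in
    ""|null|N/A|*[!0-9.]*|*.*.*|.) echo "N/A" ;;
    *) awk -v v="$1" 'BEGIN { printf "%.2f\n", v }' ;;
  esac
}

# Stand-ins for the raw strings jq would produce from the results file.
MEDIAN_RTFx=$(format_rtfx "2.4567")
OVERALL_RTFx=$(format_rtfx "null")
echo "MEDIAN_RTFx=$MEDIAN_RTFx"
echo "OVERALL_RTFx=$OVERALL_RTFx"
```

With the formatting isolated, the later failure check only has to look for "N/A" or "0.00", exactly as the workflow already does.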

@@ -81,6 +106,9 @@ jobs:
const status = '${{ steps.smoketest.outputs.SMOKE_STATUS }}';
const emoji = status === 'PASSED' ? '✅' : '❌';

const medianRTFx = '${{ steps.smoketest.outputs.MEDIAN_RTFx }}';
const overallRTFx = '${{ steps.smoketest.outputs.OVERALL_RTFx }}';

const body = `## Qwen3-ASR int8 Smoke Test ${emoji}

| Check | Result |
@@ -91,6 +119,12 @@ jobs:
| Transcription pipeline | ${emoji} |
| Decoder size | 571 MB (vs 1.1 GB f32) |

### Performance Metrics
| Metric | CI Value | Expected on Apple Silicon |
|--------|----------|--------------------------|
| Median RTFx | ${medianRTFx}x | ~2.5x |
| Overall RTFx | ${overallRTFx}x | ~2.5x |

<sub>Runtime: ${{ steps.smoketest.outputs.EXECUTION_TIME }}</sub>

<sub>**Note:** CI VM lacks physical GPU — CoreML MLState (macOS 15) KV cache produces degraded results on virtualized runners. On Apple Silicon: ~1.3% WER / 2.5x RTFx.</sub>
7 changes: 7 additions & 0 deletions .github/workflows/sortformer-benchmark.yml
@@ -115,6 +115,13 @@ jobs:
echo "DETECTED=${DETECTED}" >> $GITHUB_OUTPUT
echo "GROUND_TRUTH=${GROUND_TRUTH}" >> $GITHUB_OUTPUT

# Validate RTFx - 0 indicates benchmark failure
if [ "$RTF" = "0" ] || [ -z "$RTF" ]; then
echo "❌ CRITICAL: RTFx is 0 or empty - benchmark failed"
echo "RTFx value: $RTF"
exit 1
fi

- name: Comment PR with Benchmark Results
if: always()
uses: actions/github-script@v7
26 changes: 26 additions & 0 deletions .github/workflows/vad-benchmark.yml
@@ -74,6 +74,32 @@ jobs:
--threshold 0.5 \
--output voices_vad_results.json

- name: Validate RTFx metrics
run: |
# Validate MUSAN RTFx
if [ -f musan_vad_results.json ]; then
MUSAN_RTFx=$(jq -r '.rtfx // 0' musan_vad_results.json)
if [ "$MUSAN_RTFx" = "0" ] || [ -z "$MUSAN_RTFx" ]; then
echo "❌ CRITICAL: MUSAN RTFx is 0 or empty - benchmark failed"
exit 1
fi
else
echo "❌ CRITICAL: musan_vad_results.json not found"
exit 1
fi

# Validate VOiCES RTFx
if [ -f voices_vad_results.json ]; then
VOICES_RTFx=$(jq -r '.rtfx // 0' voices_vad_results.json)
if [ "$VOICES_RTFx" = "0" ] || [ -z "$VOICES_RTFx" ]; then
echo "❌ CRITICAL: VOiCES RTFx is 0 or empty - benchmark failed"
exit 1
fi
else
echo "❌ CRITICAL: voices_vad_results.json not found"
exit 1
fi

- name: Upload results
if: always()
uses: actions/upload-artifact@v4
10 changes: 1 addition & 9 deletions Sources/FluidAudio/ModelNames.swift
@@ -129,16 +129,8 @@ public enum Repo: String, CaseIterable {
return "nemotron-streaming/560ms"
case .sortformer:
return "sortformer"
- case .lseend:
- return "ls-eend"
- case .pocketTts:
- return "pocket-tts"
- case .multilingualG2p:
- return "charsiu-g2p-byt5"
- case .parakeetTdtCtc110m:
- return "parakeet-tdt-ctc-110m"
default:
- return name
return name.replacingOccurrences(of: "-coreml", with: "")
@devin-ai-integration devin-ai-integration bot Mar 28, 2026

🔴 Changing folderName default from name to name.replacingOccurrences(of: "-coreml", with: "") silently renames cache directories for 8 repos

The refactoring intended to simplify the 4 explicitly removed cases (.lseend, .pocketTts, .multilingualG2p, .parakeetTdtCtc110m) into the default, but the default itself was changed from return name to return name.replacingOccurrences(of: "-coreml", with: ""). This changes the folderName for every repo that was previously falling through to the old default:

  • .vad: "silero-vad-coreml" → "silero-vad"
  • .parakeet: "parakeet-tdt-0.6b-v3-coreml" → "parakeet-tdt-0.6b-v3"
  • .parakeetV2: "parakeet-tdt-0.6b-v2-coreml" → "parakeet-tdt-0.6b-v2"
  • .parakeetCtc110m/.parakeetCtc06b: similarly stripped
  • .diarizer: "speaker-diarization-coreml" → "speaker-diarization"
  • .qwen3Asr/.qwen3AsrInt8: "qwen3-asr-0.6b-coreml/..." → "qwen3-asr-0.6b/..."

folderName is used pervasively to construct local model cache paths (DownloadUtils.swift:135, DownloadUtils.swift:190, AsrModels.swift:501, DiarizerModels.swift:106, etc.). This means (1) all existing cached models at old paths become orphaned and trigger unnecessary re-downloads, and (2) CI workflow cache paths still reference the old names (e.g. asr-benchmark.yml:28-29 caches parakeet-tdt-0.6b-v3-coreml but code now expects parakeet-tdt-0.6b-v3), rendering CI caches completely useless.
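
To illustrate the rename described above, the sketch below derives the new folder name the same way the changed default does (removing every "-coreml" occurrence) and shows one possible mitigation for orphaned caches. `new_folder_name` and `migrate_model_dir` are hypothetical helpers, not part of the PR, and the demo runs in a temporary directory rather than the real model cache.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and not part of the PR.

# Mirrors the new default: remove every "-coreml" occurrence from the repo name.
new_folder_name() {
  printf '%s\n' "$1" | sed 's/-coreml//g'
}

# Moves <models_dir>/<old name> to the stripped name when only the old one exists.
migrate_model_dir() {
  models_dir=$1
  old_name=$2
  new_name=$(new_folder_name "$old_name")
  if [ "$old_name" != "$new_name" ] && [ -d "$models_dir/$old_name" ] \
     && [ ! -e "$models_dir/$new_name" ]; then
    mv "$models_dir/$old_name" "$models_dir/$new_name"
    echo "migrated $old_name -> $new_name"
  fi
}

# Demonstration in a throwaway directory (delete it when done).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/silero-vad-coreml" "$demo_dir/qwen3-asr-0.6b-coreml/int8"
for name in silero-vad-coreml qwen3-asr-0.6b-coreml; do
  migrate_model_dir "$demo_dir" "$name"
done
ls "$demo_dir"
```

Whether to migrate existing directories or keep the old names is a project decision; the sketch only makes the old-to-new path mapping explicit, which is also what the CI cache paths would need to follow.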

}
}
}
2 changes: 1 addition & 1 deletion Tests/FluidAudioTests/ASR/Parakeet/ModelNamesTests.swift
@@ -125,7 +125,7 @@ final class ModelNamesTests: XCTestCase {
// Verify name (repo slug with -coreml suffix)
XCTAssertEqual(repo.name, "parakeet-tdt-ctc-110m-coreml")

- // Verify folder name (simplified local folder name)
// Verify folder name (simplified - strips -coreml suffix by default)
XCTAssertEqual(repo.folderName, "parakeet-tdt-ctc-110m")

// Should have no subpath (not a variant repo)