@gistrec gistrec commented Jan 20, 2026

Motivation

  • Avoid counting tokens on the user-facing fallback text when a transcription is empty; record zero token counts for empty transcriptions instead.
  • Keep the stored text user-friendly while basing token accounting on the raw recognition output.
  • Fix README schema formatting so the llm_tokens_by_model column aligns with other columns for readability.

Description

  • Update utils/tokens.py so tokens_by_model returns zeros for every model when text.strip() is empty and keep LLM_TOKEN_MODELS-based mapping.
  • Change schedulers/transcription.py to compute token_counts = tokens_by_model(raw_text) from the raw parse_text(result) output, then replace empty text with the friendly fallback string before persisting results.
  • Persist llm_tokens_by_model=token_counts on both successful and failed updates in update_transcription calls.
  • Adjust README.md spacing for the llm_tokens_by_model JSON column to align with other schema columns.
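The ordering described above can be sketched as follows. This is a minimal illustration, not the actual repository code: the fallback string, model names, and the whitespace-split "tokenizer" are all placeholders, since the real LLM_TOKEN_MODELS mapping and tokenizer are not shown in this PR.

```python
# Hypothetical stand-ins for values defined elsewhere in the repo.
EMPTY_FALLBACK = "No speech detected"            # assumed fallback text
LLM_TOKEN_MODELS = ("gpt-4o", "gpt-3.5-turbo")   # assumed model list


def tokens_by_model(text: str) -> dict[str, int]:
    """Per-model token counts; zeros for every model when text is blank."""
    if not text.strip():
        return {model: 0 for model in LLM_TOKEN_MODELS}
    # Stand-in tokenizer: whitespace split instead of a real encoder.
    count = len(text.split())
    return {model: count for model in LLM_TOKEN_MODELS}


def finalize_transcription(raw_text: str) -> tuple[str, dict[str, int]]:
    """Count tokens on the raw output, then apply the user-facing fallback."""
    token_counts = tokens_by_model(raw_text)  # counted BEFORE substitution
    stored_text = raw_text if raw_text.strip() else EMPTY_FALLBACK
    return stored_text, token_counts
```

The key point is the order of operations in `finalize_transcription`: counting happens on the raw recognition output, so the fallback text never inflates the persisted `llm_tokens_by_model` values.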

Testing

  • No automated tests were run for this change.
  • The modified files were checked by local static inspection and manual review before the changes were committed.
