ovos-localize is a GitHub-native translation platform designed specifically for OpenVoiceOS locale files. It uses contextual cues from skill source code to provide better translation quality.
You can use the CLI tool:
```shell
ovos-localize-cli validate /path/to/skill-repo
```

The langcodes library uses language_data to provide human-readable display names for language codes (e.g., "en-US" → "English (United States)").
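As a quick sketch of that lookup (assuming the langcodes and language_data packages are installed):

```python
import langcodes

# Resolve a BCP-47 tag and render its English display name.
# display_name() draws its strings from the language_data package.
tag = langcodes.Language.get("en-US")
print(tag.display_name())  # "English (United States)"
```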
The most common cause is a missing dev branch. The scanner defaults to cloning dev, and falls back to main then master. If none of those branches exist the clone will fail and the skill will be silently skipped. Check that the remote has at least one of those branches.
Also verify the skill has a locale/ or res/ directory with BCP-47 language subdirectories (e.g. locale/en-us/). Skills with no locale files are skipped even after a successful clone.
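An illustrative version of that check (has_locale_files is hypothetical, not the scanner's implementation; the pattern loosely matches BCP-47-style subdirectory names such as en-us):

```python
import re
from pathlib import Path

# Loose BCP-47-ish pattern: 2-3 letter language, optional region/script part.
LANG_DIR = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,4})?$", re.IGNORECASE)

def has_locale_files(skill_root: str) -> bool:
    """True if the skill has a locale/ or res/ dir with language subdirectories."""
    for name in ("locale", "res"):
        base = Path(skill_root) / name
        if base.is_dir() and any(
            d.is_dir() and LANG_DIR.match(d.name) for d in base.iterdir()
        ):
            return True
    return False
```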
Ensure you have the dev dependencies installed:
```shell
uv sync --extra dev
uv run pytest
```

Yes! ovos-localize generates six JSONL dataset families under data/datasets/, updated daily via GitHub Actions:
| Directory | Description | Key fields |
|---|---|---|
| classification/ | Intent/voc utterances with skill+intent label | lang, skill, intent, text |
| translation/ | Parallel corpora for machine translation | pair, base_texts, target_texts |
| slot_filling/ | Intent templates with slot names + entity values | template, slots, entity_values |
| response_pairs/ | (utterance, responses) pairs via AST handler analysis | utterance, responses, handler |
| tts/ | Deduplicated dialog sentences for TTS training | lang, dialog, text |
| skill_metadata/ | Multilingual skill name, description, examples, tags | name, description, examples |
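For orientation, a record in classification/ might look like this (the field names come from the table above; the sample values and exact record layout are invented):

```python
import json

# One hypothetical line from classification/<lang>.jsonl.
line = ('{"lang": "en-US", "skill": "weather", '
        '"intent": "current.weather", "text": "what is the weather like"}')

record = json.loads(line)
print(record["skill"], record["intent"])
```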
The response_pairs dataset is derived from AST analysis. At data-generation time, context_builder.py parses each skill's Python source and records context.triggers_dialog — the list of dialog file stems called by self.speak_dialog() in the intent handler. generate_response_pairs() uses this to pair intent utterances with their actual responses, without any string heuristics.
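The AST pass can be sketched roughly like this (a simplified stand-in for what context_builder.py does, not its actual code):

```python
import ast

def dialogs_spoken(handler_source: str) -> list[str]:
    """Collect dialog names passed as literals to self.speak_dialog()."""
    tree = ast.parse(handler_source)
    dialogs = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "speak_dialog"
            and node.args
            and isinstance(node.args[0], ast.Constant)
        ):
            dialogs.append(node.args[0].value)
    return dialogs
```

For example, a handler containing self.speak_dialog("hello.world") yields ["hello.world"].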
These are read-only informational pages. The onboarding (language selection) is only required for the Dashboard and skill editor pages, where user language preferences are used to filter content.
The onboarding guard was blocking all #/skill/... routes for users without a saved profile. But skill editor URLs with an explicit lang in the route (e.g. #/skill/foo/bar.entity/pt-PT) don't need a profile — the language is already in the URL. The guard now allows these through.
Previously, renderEditor() crashed with TypeError: Cannot read properties of undefined (reading 'type') when opened in create mode (no existing file). The bug was editor.dataset.fileType = fileData.type — fileData is undefined in create mode. Fixed to use the already-computed fileType variable instead.
Instead of "No source file found", the source panel shows the .intent files that reference {slotName}, with their sample utterances. This gives context for what values belong in the entity file. The panel header also changes to "Used in intents".
Click "Can't find your language? Request it" in the language selection modal. Enter the language name and its BCP-47 code (e.g. hi-IN). This opens a GitHub issue with a machine-readable payload; automation then opens a PR, which a maintainer reviews and merges. After the merge, the language appears in the UI at 0% progress, ready for contributions.
The allowlist lives in config/enabled_languages.txt — _load_enabled_languages() in scripts/generate_data.py reads it and injects those codes into coverage.json even when no locale files exist yet. The data refresh is triggered automatically on PR merge.
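A minimal sketch of that injection (simplified; see _load_enabled_languages() in scripts/generate_data.py for the real logic — the translated/total fields here are assumptions):

```python
from pathlib import Path

def load_enabled_languages(path: str = "config/enabled_languages.txt") -> set[str]:
    """Read the allowlist, skipping blank lines and '#' comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")}

def inject_into_coverage(coverage: dict, enabled: set[str]) -> dict:
    """Ensure every allowlisted code appears in coverage, even at zero progress."""
    for code in enabled:
        coverage.setdefault(code, {"translated": 0, "total": 0})
    return coverage
```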
A maintainer must review and merge the automatically-created PR before the language appears. This prevents invalid codes (e.g. klingon, xx-XX) from polluting the platform.
GitHub ignores ?labels= query parameters for users without write access to the repo, so issues created via the frontend URL arrive without any labels. The enable_new_language.yml workflow now detects language requests by title prefix (Add language:) or body content (NEW_LANGUAGE_META) in addition to the new-language label, so label-less issues are handled correctly.
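The matching rules can be sketched as (an illustration of the workflow's logic, not its actual code):

```python
def is_language_request(title: str, body: str, labels: list[str]) -> bool:
    """Match a language-request issue by label, title prefix, or body marker."""
    return (
        "new-language" in labels
        or title.startswith("Add language:")
        or "NEW_LANGUAGE_META" in body
    )
```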
After BCP-47 normalization was added (lang_utils.EXPLICIT_MAPPING), eu-EU and eu both normalize to eu-ES, and es-LM normalizes to es-419. The old files contained data tagged with deprecated codes; they were replaced by eu-ES.jsonl and es-419.jsonl.
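A sketch of that normalization (the mapping entries mirror the cases described above; the function and any other entries in lang_utils.EXPLICIT_MAPPING are illustrative):

```python
# Deprecated or ambiguous codes mapped to canonical BCP-47 tags.
EXPLICIT_MAPPING = {
    "eu-EU": "eu-ES",
    "eu": "eu-ES",
    "es-LM": "es-419",
}

def normalize_lang(code: str) -> str:
    """Map deprecated or bare codes to their canonical tags; pass others through."""
    return EXPLICIT_MAPPING.get(code, code)
```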