feat: HF dataset loader, intent expansion, and locale export #20

Draft
JarbasAl wants to merge 9 commits into dev from feat/hf-datasets-and-scripts

@JarbasAl

@JarbasAl JarbasAl commented Jun 2, 2026

Member

New module ovos_spec_tools.datasets for loading, expanding, and exporting OVOS-INTENT-2 templates from HuggingFace datasets.

What's added

ovos_spec_tools/datasets.py

  • load_dataset_templates(dataset_id, lang) — loads templates from any of three HF datasets: OpenVoiceOS/hass-intent-templates, intents-for-eval, massive-templates
  • expand_hf_template(template, expansions, max_samples) — resolves <keyword> references, (a|b) alternations, and [x] optionals into concrete utterances
  • export_to_locale(dataset_id, lang, output_dir) — writes .intent, .voc, .entity files to a standard OVOS-INTENT-2 locale tree
  • CLI entry via python -m ovos_spec_tools.datasets
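
To make the expansion behaviour concrete, here is a minimal standalone sketch of the three constructs listed above. It is not the code in this PR (the module reuses the existing expansion.py engine); the function name `expand_template_sketch`, the shape of `expansions` (keyword name mapped to a list of phrases), and the way `max_samples` simply truncates the result are all assumptions made for illustration.

```python
from __future__ import annotations

import re

KEYWORD_RE = re.compile(r"<(\w+)>")  # assumed <keyword> syntax
# Innermost (a|b) or [x] group, i.e. one with no nested brackets inside.
GROUP_RE = re.compile(r"\(([^()\[\]]*)\)|\[([^()\[\]]*)\]")


def expand_template_sketch(
    template: str,
    expansions: dict[str, list[str]],
    max_samples: int | None = None,
) -> list[str]:
    """Expand a template breadth-first into concrete utterances.

    Assumes `expansions` is acyclic; unknown keywords are left as-is.
    """
    pending = [template]
    done: list[str] = []
    while pending:
        text = pending.pop(0)
        # 1. Replace the first resolvable <keyword> with an alternation.
        kw = next(
            (m for m in KEYWORD_RE.finditer(text) if m.group(1) in expansions),
            None,
        )
        if kw is not None:
            alternation = "(" + "|".join(expansions[kw.group(1)]) + ")"
            pending.append(text[: kw.start()] + alternation + text[kw.end():])
            continue
        # 2. Expand the first innermost group, if any remains.
        group = GROUP_RE.search(text)
        if group is None:
            utterance = " ".join(text.split())  # normalise whitespace
            if utterance not in done:
                done.append(utterance)
            continue
        if group.group(1) is not None:
            options = group.group(1).split("|")  # (a|b): pick exactly one
        else:
            options = group.group(2).split("|") + [""]  # [x]: one or nothing
        for option in options:
            pending.append(text[: group.start()] + option + text[group.end():])
    return done[:max_samples]
```

For example, `expand_template_sketch("turn (on|off) [the] <thing>", {"thing": ["light", "fan"]})` yields eight utterances, and passing `max_samples=3` keeps only the first three.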

New example scripts

| Script | Purpose |
| --- | --- |
| convert_hassil_intents.py | Home Assistant hassil → OVOS locale converter |
| export_hf_dataset.py | locale → multi-config HF dataset export |
| generate_entities.py | auto-generate missing .entity files across all languages |
| reexport_recursive.py | recursively resolve nested <keyword> references |
| reexport_uniform.py | uniform list-struct schema for expansion values |
| hf_dataset.py | unified CLI for all three supported datasets |
| locale_to_hf_dataset.py | existing: OVOS locale → intents-for-eval format |
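
As an illustration of what "recursively resolve nested <keyword> references" can look like, here is a small sketch. It is not taken from reexport_recursive.py; the helper name `resolve_keyword`, the `vocab` mapping (keyword name to list of phrases), and the choice to leave a reference untouched when it would form a cycle are assumptions.

```python
import re

KEYWORD_RE = re.compile(r"<(\w+)>")  # assumed <keyword> syntax


def resolve_keyword(name, vocab, _active=()):
    """Return the phrases for `name` with nested <keyword> refs substituted.

    `_active` tracks the keywords currently being resolved so that a
    cyclic reference is left as a literal "<name>" instead of recursing
    forever. Unknown keywords are also left untouched.
    """
    if name in _active or name not in vocab:
        return [f"<{name}>"]
    resolved = []
    for phrase in vocab[name]:
        variants = [phrase]
        # dict.fromkeys de-duplicates refs while keeping their order.
        for ref in dict.fromkeys(KEYWORD_RE.findall(phrase)):
            substitutes = resolve_keyword(ref, vocab, _active + (name,))
            # Every occurrence of the same ref gets the same substitute.
            variants = [
                variant.replace(f"<{ref}>", sub)
                for variant in variants
                for sub in substitutes
            ]
        resolved.extend(variants)
    return list(dict.fromkeys(resolved))


vocab = {
    "device": ["<room> light", "fan"],
    "room": ["kitchen", "bedroom"],
}
print(resolve_keyword("device", vocab))
```

With the sample `vocab`, the nested `<room>` reference is replaced by each room name, so `device` resolves to "kitchen light", "bedroom light", and "fan".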

Other

  • pyproject.toml: optional datasets extra
  • 17 tests in test/test_datasets.py
  • AGENTS.md + TODO.md in-repo docs
  • APPENDIX.md: formal hassil→OVOS grammar mapping documentation

Usage

from ovos_spec_tools.datasets import load_dataset_templates, expand_hf_template, export_to_locale

# Load English templates
templates = load_dataset_templates('hassil-intents', lang='en')

# Expand a template
utterances = expand_hf_template(tpl, expansions, max_samples=20)

# Export to locale directory
export_to_locale('hassil-intents', lang='en', output_dir='/tmp/locale')

Verification

  • All 320 existing tests pass with no regressions
  • 17 new tests in test_datasets.py
  • Tested end-to-end with the live OpenVoiceOS/hass-intent-templates dataset (61 configs, ~224k rows)

Summary by CodeRabbit

  • New Features

    • Added HuggingFace datasets module to load, expand, and export OVOS-INTENT-2 templates from supported datasets.
    • Added template expansion functionality to generate utterance samples from templates.
    • Added locale export capability to convert dataset templates into OVOS-compatible directory structures.
  • Documentation

    • Added comprehensive datasets guide with API reference and usage examples.
    • Added Home Assistant intent conversion documentation.
  • Tests

    • Added test coverage for datasets module functionality.
  • Chores

    • Added optional datasets dependency for advanced dataset features.

Introduce ovos_spec_tools.datasets — a new module for loading,
expanding, and exporting OVOS-INTENT-2 templates from HuggingFace
datasets:

* load_dataset_templates() — load from hass-intent-templates,
  intents-for-eval, or massive-templates via datasets library
* expand_hf_template() — resolve <keyword>, (a|b), [x] into
  concrete utterances using the existing expansion.py engine
* export_to_locale() — write .intent / .voc / .entity files
  into a standard OVOS-INTENT-2 locale directory tree

Add example scripts for the full dataset generation pipeline:
* convert_hassil_intents.py — Home Assistant hassil → OVOS locale
* export_hf_dataset.py — locale → multi-config HF dataset
* generate_entities.py — auto-generate missing .entity files
* reexport_recursive.py — recursively resolve nested <keyword> refs
* reexport_uniform.py — uniform list<struct> schema for expansions
* hf_dataset.py — unified CLI for all three supported datasets

Update pyproject.toml with optional 'datasets' dependency.
Add 17 tests covering config resolution, normalization, expansion,
and locale export.
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Important

Review skipped

Too many files!

This PR contains 296 files, which is 146 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 475affe6-8f9c-4acd-bf49-c470f0a9852f

📥 Commits

Reviewing files that changed from the base of the PR and between bd54e89 and 1eb9068.

📒 Files selected for processing (296)
  • examples/hass-intent-dataset/base_locale/af/area.entity
  • examples/hass-intent-dataset/base_locale/af/color.entity
  • examples/hass-intent-dataset/base_locale/af/device_class.entity
  • examples/hass-intent-dataset/base_locale/af/domain.entity
  • examples/hass-intent-dataset/base_locale/af/floor.entity
  • examples/hass-intent-dataset/base_locale/af/name.entity
  • examples/hass-intent-dataset/base_locale/af/state.entity
  • examples/hass-intent-dataset/base_locale/ar/area.entity
  • examples/hass-intent-dataset/base_locale/ar/color.entity
  • examples/hass-intent-dataset/base_locale/ar/device_class.entity
  • examples/hass-intent-dataset/base_locale/ar/domain.entity
  • examples/hass-intent-dataset/base_locale/ar/floor.entity
  • examples/hass-intent-dataset/base_locale/ar/name.entity
  • examples/hass-intent-dataset/base_locale/ar/state.entity
  • examples/hass-intent-dataset/base_locale/bg/area.entity
  • examples/hass-intent-dataset/base_locale/bg/color.entity
  • examples/hass-intent-dataset/base_locale/bg/device_class.entity
  • examples/hass-intent-dataset/base_locale/bg/domain.entity
  • examples/hass-intent-dataset/base_locale/bg/floor.entity
  • examples/hass-intent-dataset/base_locale/bg/name.entity
  • examples/hass-intent-dataset/base_locale/bg/state.entity
  • examples/hass-intent-dataset/base_locale/bn/area.entity
  • examples/hass-intent-dataset/base_locale/bn/color.entity
  • examples/hass-intent-dataset/base_locale/bn/device_class.entity
  • examples/hass-intent-dataset/base_locale/bn/domain.entity
  • examples/hass-intent-dataset/base_locale/bn/floor.entity
  • examples/hass-intent-dataset/base_locale/bn/name.entity
  • examples/hass-intent-dataset/base_locale/bn/state.entity
  • examples/hass-intent-dataset/base_locale/ca/area.entity
  • examples/hass-intent-dataset/base_locale/ca/color.entity
  • examples/hass-intent-dataset/base_locale/ca/device_class.entity
  • examples/hass-intent-dataset/base_locale/ca/domain.entity
  • examples/hass-intent-dataset/base_locale/ca/floor.entity
  • examples/hass-intent-dataset/base_locale/ca/name.entity
  • examples/hass-intent-dataset/base_locale/ca/state.entity
  • examples/hass-intent-dataset/base_locale/cs/area.entity
  • examples/hass-intent-dataset/base_locale/cs/color.entity
  • examples/hass-intent-dataset/base_locale/cs/device_class.entity
  • examples/hass-intent-dataset/base_locale/cs/domain.entity
  • examples/hass-intent-dataset/base_locale/cs/floor.entity
  • examples/hass-intent-dataset/base_locale/cs/name.entity
  • examples/hass-intent-dataset/base_locale/cs/state.entity
  • examples/hass-intent-dataset/base_locale/cy/area.entity
  • examples/hass-intent-dataset/base_locale/cy/color.entity
  • examples/hass-intent-dataset/base_locale/cy/device_class.entity
  • examples/hass-intent-dataset/base_locale/cy/domain.entity
  • examples/hass-intent-dataset/base_locale/cy/floor.entity
  • examples/hass-intent-dataset/base_locale/cy/name.entity
  • examples/hass-intent-dataset/base_locale/cy/state.entity
  • examples/hass-intent-dataset/base_locale/da/area.entity
  • examples/hass-intent-dataset/base_locale/da/color.entity
  • examples/hass-intent-dataset/base_locale/da/device_class.entity
  • examples/hass-intent-dataset/base_locale/da/domain.entity
  • examples/hass-intent-dataset/base_locale/da/floor.entity
  • examples/hass-intent-dataset/base_locale/da/name.entity
  • examples/hass-intent-dataset/base_locale/da/state.entity
  • examples/hass-intent-dataset/base_locale/de/area.entity
  • examples/hass-intent-dataset/base_locale/de/color.entity
  • examples/hass-intent-dataset/base_locale/de/device_class.entity
  • examples/hass-intent-dataset/base_locale/de/domain.entity
  • examples/hass-intent-dataset/base_locale/de/floor.entity
  • examples/hass-intent-dataset/base_locale/de/name.entity
  • examples/hass-intent-dataset/base_locale/de/state.entity
  • examples/hass-intent-dataset/base_locale/el/area.entity
  • examples/hass-intent-dataset/base_locale/el/color.entity
  • examples/hass-intent-dataset/base_locale/el/device_class.entity
  • examples/hass-intent-dataset/base_locale/el/domain.entity
  • examples/hass-intent-dataset/base_locale/el/floor.entity
  • examples/hass-intent-dataset/base_locale/el/name.entity
  • examples/hass-intent-dataset/base_locale/el/state.entity
  • examples/hass-intent-dataset/base_locale/en/area.entity
  • examples/hass-intent-dataset/base_locale/en/color.entity
  • examples/hass-intent-dataset/base_locale/en/device_class.entity
  • examples/hass-intent-dataset/base_locale/en/domain.entity
  • examples/hass-intent-dataset/base_locale/en/floor.entity
  • examples/hass-intent-dataset/base_locale/en/name.entity
  • examples/hass-intent-dataset/base_locale/en/state.entity
  • examples/hass-intent-dataset/base_locale/es/area.entity
  • examples/hass-intent-dataset/base_locale/es/color.entity
  • examples/hass-intent-dataset/base_locale/es/device_class.entity
  • examples/hass-intent-dataset/base_locale/es/domain.entity
  • examples/hass-intent-dataset/base_locale/es/floor.entity
  • examples/hass-intent-dataset/base_locale/es/name.entity
  • examples/hass-intent-dataset/base_locale/es/state.entity
  • examples/hass-intent-dataset/base_locale/et/area.entity
  • examples/hass-intent-dataset/base_locale/et/color.entity
  • examples/hass-intent-dataset/base_locale/et/device_class.entity
  • examples/hass-intent-dataset/base_locale/et/domain.entity
  • examples/hass-intent-dataset/base_locale/et/floor.entity
  • examples/hass-intent-dataset/base_locale/et/name.entity
  • examples/hass-intent-dataset/base_locale/et/state.entity
  • examples/hass-intent-dataset/base_locale/eu/area.entity
  • examples/hass-intent-dataset/base_locale/eu/color.entity
  • examples/hass-intent-dataset/base_locale/eu/device_class.entity
  • examples/hass-intent-dataset/base_locale/eu/domain.entity
  • examples/hass-intent-dataset/base_locale/eu/floor.entity
  • examples/hass-intent-dataset/base_locale/eu/name.entity
  • examples/hass-intent-dataset/base_locale/eu/state.entity
  • examples/hass-intent-dataset/base_locale/fa/area.entity
  • examples/hass-intent-dataset/base_locale/fa/color.entity
  • examples/hass-intent-dataset/base_locale/fa/device_class.entity
  • examples/hass-intent-dataset/base_locale/fa/domain.entity
  • examples/hass-intent-dataset/base_locale/fa/floor.entity
  • examples/hass-intent-dataset/base_locale/fa/name.entity
  • examples/hass-intent-dataset/base_locale/fa/state.entity
  • examples/hass-intent-dataset/base_locale/fi/area.entity
  • examples/hass-intent-dataset/base_locale/fi/color.entity
  • examples/hass-intent-dataset/base_locale/fi/device_class.entity
  • examples/hass-intent-dataset/base_locale/fi/domain.entity
  • examples/hass-intent-dataset/base_locale/fi/floor.entity
  • examples/hass-intent-dataset/base_locale/fi/name.entity
  • examples/hass-intent-dataset/base_locale/fi/state.entity
  • examples/hass-intent-dataset/base_locale/fr/area.entity
  • examples/hass-intent-dataset/base_locale/fr/color.entity
  • examples/hass-intent-dataset/base_locale/fr/device_class.entity
  • examples/hass-intent-dataset/base_locale/fr/domain.entity
  • examples/hass-intent-dataset/base_locale/fr/floor.entity
  • examples/hass-intent-dataset/base_locale/fr/name.entity
  • examples/hass-intent-dataset/base_locale/fr/state.entity
  • examples/hass-intent-dataset/base_locale/ga/area.entity
  • examples/hass-intent-dataset/base_locale/ga/color.entity
  • examples/hass-intent-dataset/base_locale/ga/device_class.entity
  • examples/hass-intent-dataset/base_locale/ga/domain.entity
  • examples/hass-intent-dataset/base_locale/ga/floor.entity
  • examples/hass-intent-dataset/base_locale/ga/name.entity
  • examples/hass-intent-dataset/base_locale/ga/state.entity
  • examples/hass-intent-dataset/base_locale/gl/area.entity
  • examples/hass-intent-dataset/base_locale/gl/color.entity
  • examples/hass-intent-dataset/base_locale/gl/device_class.entity
  • examples/hass-intent-dataset/base_locale/gl/domain.entity
  • examples/hass-intent-dataset/base_locale/gl/floor.entity
  • examples/hass-intent-dataset/base_locale/gl/name.entity
  • examples/hass-intent-dataset/base_locale/gl/state.entity
  • examples/hass-intent-dataset/base_locale/gu/area.entity
  • examples/hass-intent-dataset/base_locale/gu/color.entity
  • examples/hass-intent-dataset/base_locale/gu/device_class.entity
  • examples/hass-intent-dataset/base_locale/gu/domain.entity
  • examples/hass-intent-dataset/base_locale/gu/floor.entity
  • examples/hass-intent-dataset/base_locale/gu/name.entity
  • examples/hass-intent-dataset/base_locale/gu/state.entity
  • examples/hass-intent-dataset/base_locale/he/area.entity
  • examples/hass-intent-dataset/base_locale/he/color.entity
  • examples/hass-intent-dataset/base_locale/he/device_class.entity
  • examples/hass-intent-dataset/base_locale/he/domain.entity
  • examples/hass-intent-dataset/base_locale/he/floor.entity
  • examples/hass-intent-dataset/base_locale/he/name.entity
  • examples/hass-intent-dataset/base_locale/he/state.entity
  • examples/hass-intent-dataset/base_locale/hi/area.entity
  • examples/hass-intent-dataset/base_locale/hi/color.entity
  • examples/hass-intent-dataset/base_locale/hi/device_class.entity
  • examples/hass-intent-dataset/base_locale/hi/domain.entity
  • examples/hass-intent-dataset/base_locale/hi/floor.entity
  • examples/hass-intent-dataset/base_locale/hi/name.entity
  • examples/hass-intent-dataset/base_locale/hi/state.entity
  • examples/hass-intent-dataset/base_locale/hr/area.entity
  • examples/hass-intent-dataset/base_locale/hr/color.entity
  • examples/hass-intent-dataset/base_locale/hr/device_class.entity
  • examples/hass-intent-dataset/base_locale/hr/domain.entity
  • examples/hass-intent-dataset/base_locale/hr/floor.entity
  • examples/hass-intent-dataset/base_locale/hr/name.entity
  • examples/hass-intent-dataset/base_locale/hr/state.entity
  • examples/hass-intent-dataset/base_locale/hu/area.entity
  • examples/hass-intent-dataset/base_locale/hu/color.entity
  • examples/hass-intent-dataset/base_locale/hu/device_class.entity
  • examples/hass-intent-dataset/base_locale/hu/domain.entity
  • examples/hass-intent-dataset/base_locale/hu/floor.entity
  • examples/hass-intent-dataset/base_locale/hu/name.entity
  • examples/hass-intent-dataset/base_locale/hu/state.entity
  • examples/hass-intent-dataset/base_locale/is/area.entity
  • examples/hass-intent-dataset/base_locale/is/color.entity
  • examples/hass-intent-dataset/base_locale/is/device_class.entity
  • examples/hass-intent-dataset/base_locale/is/domain.entity
  • examples/hass-intent-dataset/base_locale/is/floor.entity
  • examples/hass-intent-dataset/base_locale/is/name.entity
  • examples/hass-intent-dataset/base_locale/is/state.entity
  • examples/hass-intent-dataset/base_locale/it/area.entity
  • examples/hass-intent-dataset/base_locale/it/color.entity
  • examples/hass-intent-dataset/base_locale/it/device_class.entity
  • examples/hass-intent-dataset/base_locale/it/domain.entity
  • examples/hass-intent-dataset/base_locale/it/floor.entity
  • examples/hass-intent-dataset/base_locale/it/name.entity
  • examples/hass-intent-dataset/base_locale/it/state.entity
  • examples/hass-intent-dataset/base_locale/ja/area.entity
  • examples/hass-intent-dataset/base_locale/ja/color.entity
  • examples/hass-intent-dataset/base_locale/ja/device_class.entity
  • examples/hass-intent-dataset/base_locale/ja/domain.entity
  • examples/hass-intent-dataset/base_locale/ja/floor.entity
  • examples/hass-intent-dataset/base_locale/ja/name.entity
  • examples/hass-intent-dataset/base_locale/ja/state.entity
  • examples/hass-intent-dataset/base_locale/ka/area.entity
  • examples/hass-intent-dataset/base_locale/ka/color.entity
  • examples/hass-intent-dataset/base_locale/ka/device_class.entity
  • examples/hass-intent-dataset/base_locale/ka/domain.entity
  • examples/hass-intent-dataset/base_locale/ka/floor.entity
  • examples/hass-intent-dataset/base_locale/ka/name.entity
  • examples/hass-intent-dataset/base_locale/ka/state.entity
  • examples/hass-intent-dataset/base_locale/kn/area.entity
  • examples/hass-intent-dataset/base_locale/kn/color.entity
  • examples/hass-intent-dataset/base_locale/kn/device_class.entity
  • examples/hass-intent-dataset/base_locale/kn/domain.entity
  • examples/hass-intent-dataset/base_locale/kn/floor.entity
  • examples/hass-intent-dataset/base_locale/kn/name.entity
  • examples/hass-intent-dataset/base_locale/kn/state.entity
  • examples/hass-intent-dataset/base_locale/ko/area.entity
  • examples/hass-intent-dataset/base_locale/ko/color.entity
  • examples/hass-intent-dataset/base_locale/ko/device_class.entity
  • examples/hass-intent-dataset/base_locale/ko/domain.entity
  • examples/hass-intent-dataset/base_locale/ko/floor.entity
  • examples/hass-intent-dataset/base_locale/ko/name.entity
  • examples/hass-intent-dataset/base_locale/ko/state.entity
  • examples/hass-intent-dataset/base_locale/kw/area.entity
  • examples/hass-intent-dataset/base_locale/kw/color.entity
  • examples/hass-intent-dataset/base_locale/kw/device_class.entity
  • examples/hass-intent-dataset/base_locale/kw/domain.entity
  • examples/hass-intent-dataset/base_locale/kw/floor.entity
  • examples/hass-intent-dataset/base_locale/kw/name.entity
  • examples/hass-intent-dataset/base_locale/kw/state.entity
  • examples/hass-intent-dataset/base_locale/lb/area.entity
  • examples/hass-intent-dataset/base_locale/lb/color.entity
  • examples/hass-intent-dataset/base_locale/lb/device_class.entity
  • examples/hass-intent-dataset/base_locale/lb/domain.entity
  • examples/hass-intent-dataset/base_locale/lb/floor.entity
  • examples/hass-intent-dataset/base_locale/lb/name.entity
  • examples/hass-intent-dataset/base_locale/lb/state.entity
  • examples/hass-intent-dataset/base_locale/lt/area.entity
  • examples/hass-intent-dataset/base_locale/lt/color.entity
  • examples/hass-intent-dataset/base_locale/lt/device_class.entity
  • examples/hass-intent-dataset/base_locale/lt/domain.entity
  • examples/hass-intent-dataset/base_locale/lt/floor.entity
  • examples/hass-intent-dataset/base_locale/lt/name.entity
  • examples/hass-intent-dataset/base_locale/lt/state.entity
  • examples/hass-intent-dataset/base_locale/lv/area.entity
  • examples/hass-intent-dataset/base_locale/lv/color.entity
  • examples/hass-intent-dataset/base_locale/lv/device_class.entity
  • examples/hass-intent-dataset/base_locale/lv/domain.entity
  • examples/hass-intent-dataset/base_locale/lv/floor.entity
  • examples/hass-intent-dataset/base_locale/lv/name.entity
  • examples/hass-intent-dataset/base_locale/lv/state.entity
  • examples/hass-intent-dataset/base_locale/ml/area.entity
  • examples/hass-intent-dataset/base_locale/ml/color.entity
  • examples/hass-intent-dataset/base_locale/ml/device_class.entity
  • examples/hass-intent-dataset/base_locale/ml/domain.entity
  • examples/hass-intent-dataset/base_locale/ml/floor.entity
  • examples/hass-intent-dataset/base_locale/ml/name.entity
  • examples/hass-intent-dataset/base_locale/ml/state.entity
  • examples/hass-intent-dataset/base_locale/mn/area.entity
  • examples/hass-intent-dataset/base_locale/mn/color.entity
  • examples/hass-intent-dataset/base_locale/mn/device_class.entity
  • examples/hass-intent-dataset/base_locale/mn/domain.entity
  • examples/hass-intent-dataset/base_locale/mn/floor.entity
  • examples/hass-intent-dataset/base_locale/mn/name.entity
  • examples/hass-intent-dataset/base_locale/mn/state.entity
  • examples/hass-intent-dataset/base_locale/mr/area.entity
  • examples/hass-intent-dataset/base_locale/mr/color.entity
  • examples/hass-intent-dataset/base_locale/mr/device_class.entity
  • examples/hass-intent-dataset/base_locale/mr/domain.entity
  • examples/hass-intent-dataset/base_locale/mr/floor.entity
  • examples/hass-intent-dataset/base_locale/mr/name.entity
  • examples/hass-intent-dataset/base_locale/mr/state.entity
  • examples/hass-intent-dataset/base_locale/nb/area.entity
  • examples/hass-intent-dataset/base_locale/nb/color.entity
  • examples/hass-intent-dataset/base_locale/nb/device_class.entity
  • examples/hass-intent-dataset/base_locale/nb/domain.entity
  • examples/hass-intent-dataset/base_locale/nb/floor.entity
  • examples/hass-intent-dataset/base_locale/nb/name.entity
  • examples/hass-intent-dataset/base_locale/nb/state.entity
  • examples/hass-intent-dataset/base_locale/ne/area.entity
  • examples/hass-intent-dataset/base_locale/ne/color.entity
  • examples/hass-intent-dataset/base_locale/ne/device_class.entity
  • examples/hass-intent-dataset/base_locale/ne/domain.entity
  • examples/hass-intent-dataset/base_locale/ne/floor.entity
  • examples/hass-intent-dataset/base_locale/ne/name.entity
  • examples/hass-intent-dataset/base_locale/ne/state.entity
  • examples/hass-intent-dataset/base_locale/nl/area.entity
  • examples/hass-intent-dataset/base_locale/nl/color.entity
  • examples/hass-intent-dataset/base_locale/nl/device_class.entity
  • examples/hass-intent-dataset/base_locale/nl/domain.entity
  • examples/hass-intent-dataset/base_locale/nl/floor.entity
  • examples/hass-intent-dataset/base_locale/nl/name.entity
  • examples/hass-intent-dataset/base_locale/nl/state.entity
  • examples/hass-intent-dataset/base_locale/pa/area.entity
  • examples/hass-intent-dataset/base_locale/pa/color.entity
  • examples/hass-intent-dataset/base_locale/pa/device_class.entity
  • examples/hass-intent-dataset/base_locale/pa/domain.entity
  • examples/hass-intent-dataset/base_locale/pa/floor.entity
  • examples/hass-intent-dataset/base_locale/pa/name.entity
  • examples/hass-intent-dataset/base_locale/pa/state.entity
  • examples/hass-intent-dataset/base_locale/pl/area.entity
  • examples/hass-intent-dataset/base_locale/pl/color.entity
  • examples/hass-intent-dataset/base_locale/pl/device_class.entity
  • examples/hass-intent-dataset/base_locale/pl/domain.entity
  • examples/hass-intent-dataset/base_locale/pl/floor.entity
  • examples/hass-intent-dataset/base_locale/pl/name.entity
  • examples/hass-intent-dataset/base_locale/pl/state.entity
  • examples/hass-intent-dataset/base_locale/pt-BR/area.entity
  • examples/hass-intent-dataset/base_locale/pt-BR/color.entity

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds HuggingFace dataset support to ovos-spec-tools. It introduces dataset loading and OVOS-INTENT-2 locale export APIs, a large hassil-to-OVOS conversion pipeline, utility scripts for dataset generation and re-export, example CLI tools, and unit tests for the new module.

Changes

Dataset Loading and Export Feature

Layer / File(s) Summary
Core datasets module and exports
ovos_spec_tools/datasets.py, ovos_spec_tools/__init__.py, pyproject.toml
New datasets.py module provides load_dataset_templates (loads HF datasets with row normalization), expand_hf_template (expands templates into utterance samples), and export_to_locale (writes .intent, .voc, .entity files to locale structure). Conditional imports in __init__.py handle missing datasets library. Optional datasets dependency added to project.
Documentation and README updates
docs/README.md, docs/api-reference.md, docs/datasets.md
README updated from five to six capabilities with new dataset loader bullet. New comprehensive datasets.md documents supported datasets, API usage, round-trip workflows, and CLI examples. API reference adds Datasets chapter (7) with function signatures; Linting chapter shifted to 8.
Hassil to OVOS-INTENT-2 conversion pipeline
examples/hass-intent-dataset/convert_hassil_intents.py, examples/hass-intent-dataset/APPENDIX.md
Large conversion script (1813 lines) normalizes hassil grammar, inlines rule references, rewrites responses (Jinja decomposition), and streams .intent/.dialog/.voc/.entity outputs with safety caps and resumable per-target processing. Detailed APPENDIX documents grammar correspondence, naming rules, expansion semantics, slot layout validation, and response normalization with lossiness considerations.
HF dataset export and re-export utilities
examples/hass-intent-dataset/export_hf_dataset.py, examples/hass-intent-dataset/generate_entities.py, examples/hass-intent-dataset/reexport_recursive.py, examples/hass-intent-dataset/reexport_uniform.py
export_hf_dataset.py converts locale directories to HF JSONL format (templates/keywords/entities/test configs). generate_entities.py scans intents for slot placeholders and writes missing entity files with language-specific value mappings. reexport_recursive.py and reexport_uniform.py process template JSONL files, resolving <keyword> expansions from vocab files with cycle prevention and recursive substitution.
Example CLI tools for dataset workflows
examples/hf_dataset.py, examples/locale_to_hf_dataset.py
hf_dataset.py CLI loads HF dataset templates, optionally expands samples, and exports to locale directory. locale_to_hf_dataset.py CLI converts existing OVOS locale trees to HF dataset format, expanding vocab references and deriving slot examples with per-slot value caps.
Unit tests for datasets module
test/test_datasets.py
Test classes cover config resolution, domain stripping, row normalization across dataset styles, template expansion with grammar and max_samples capping, export to locale with file verification, and SUPPORTED_DATASETS registry validation.
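
The "conditional imports" row above refers to guarding an optional dependency. A minimal sketch of that pattern follows; it is not the code in ovos_spec_tools/__init__.py, and the helper names `optional_import` and `require_optional`, as well as the exact install hint, are assumptions.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require_optional(module_name, extra):
    """Import an optional dependency or fail with an actionable message.

    `extra` is the name of the pip extra that provides the module
    (assumed here to be "datasets" for the HF loader).
    """
    module = optional_import(module_name)
    if module is None:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install 'ovos-spec-tools[{extra}]'"
        )
    return module


# Importing the package stays cheap and never fails; only code paths that
# actually need the optional library call require_optional().
HAS_DATASETS = optional_import("datasets") is not None
```

Deferring the failure to the call site keeps `import ovos_spec_tools` working for users who never touch the dataset features.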

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 A rabbit hops through datasets vast,
HuggingFace templates loaded fast,
Hassil to OVOS, a winding track,
Expansion, export, no looking back!
Six things now possible, let's celebrate—
New voices loaded at a rapid rate! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.79% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: HF dataset loader, intent expansion, and locale export' directly and accurately summarizes the main changes: adding a HuggingFace dataset loader module, intent template expansion functionality, and locale directory export capability. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

github-actions Bot commented Jun 2, 2026

Automated check summary ready. 📊

I've aggregated the results of the automated checks for this PR below.

🔍 Lint

The automated checks have finished their work. 🏁

ruff: issues found — see job log

🔨 Build Tests

Checking if the code is properly tempered. ⚔️

Python Build Install Tests
3.10 ⚠️
3.11 ⚠️
3.12 ⚠️
3.13 ⚠️
3.14 ⚠️

❌ 3.10: Install OK, tests failed
❌ 3.11: Install OK, tests failed
❌ 3.12: Install OK, tests failed
❌ 3.13: Install OK, tests failed
❌ 3.14: Install OK, tests failed
Check job logs for details.

📊 Coverage

Is the code fully immunized with tests? 💉

94.0% total coverage

⚠️ Some tests failed — coverage figures may be incomplete.

Per-file coverage (10 files)
| File | Coverage | Missing lines |
| --- | --- | --- |
| ovos_spec_tools/__init__.py | 68.8% | 5 |
| ovos_spec_tools/datasets.py | 79.1% | 24 |
| ovos_spec_tools/language.py | 89.7% | 7 |
| ovos_spec_tools/message.py | 95.9% | 3 |
| ovos_spec_tools/resources.py | 97.1% | 6 |
| ovos_spec_tools/expansion.py | 98.1% | 2 |
| ovos_spec_tools/lint.py | 98.6% | 2 |
| ovos_spec_tools/dialog.py | 100.0% | 0 |
| ovos_spec_tools/prompt.py | 100.0% | 0 |
| ovos_spec_tools/version.py | 100.0% | 0 |

Full report: download the coverage-report artifact.

🔒 Security (pip-audit)

Ensuring our password hashing is up to date. 🔨

✅ No known vulnerabilities found (32 packages scanned).

🏷️ Release Preview

Ensuring our release process remains smooth and efficient. 🚂

Current: 0.7.0a1, Next: 0.8.0a1

Signal Value
Label feature
PR title feat: HF dataset loader, intent expansion, and locale export
Bump minor

✅ PR title follows conventional commit format.


🚀 Release Channel Compatibility

Predicted next version: 0.8.0a1

Channel Status Note Current Constraint
Stable Not in channel -
Testing Not in channel -
Alpha Compatible ovos-spec-tools>=0.7.0a1

📋 Repo Health

Scanning for any signs of 'orphaned' code limbs. 🦾

✅ All required files present.

Latest Version: 0.7.0a1

ovos_spec_tools/version.py — Version file
README.md — README
LICENSE.md — License file (consider renaming to LICENSE)
pyproject.toml — pyproject.toml
⚠️ setup.py — setup.py
CHANGELOG.md — Changelog
ovos_spec_tools/version.py has valid version block markers

⚖️ License Check

I've checked the genealogical tree of your licenses. 🌳

❌ License violations detected (4 packages) — review required before merging.

Dependency                          License Name                                            License Type         Misc                                    
ovos-spec-tools                     Error                                                   Error                                                        

License Type                        Found                                                  
Error                               1

License distribution: 1× Apache Software License, 1× Apache-2.0 OR BSD-2-Clause, 1× MIT, 1× MIT License

Full breakdown — 4 packages
Package Version License URL
build 1.5.0 MIT link
ovos-spec-tools ⚠️ 0.7.0a1 Apache Software License link
packaging 26.2 Apache-2.0 OR BSD-2-Clause link
pyproject_hooks 1.2.0 MIT License link

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.


An automated high-five for your latest changes! 🖐️

@JarbasAl JarbasAl marked this pull request as ready for review June 2, 2026 08:37
@github-actions github-actions Bot added feature and removed feature labels Jun 2, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

33-36: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test extra omits datasets.

CI runs test/test_datasets.py::TestExportToLocale::test_round_trip_small, which calls the real load_dataset_templates(...) and triggers ImportError because datasets isn't installed in the test environment. The primary fix is making that test hermetic (see the comment in test/test_datasets.py); however, if any test is intended to exercise the real loader, datasets must also be added here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 33 - 36, The test failure is caused by the test
importing the real datasets package via load_dataset_templates in
TestExportToLocale::test_round_trip_small (test/test_datasets.py); either make
that test hermetic by mocking/stubbing load_dataset_templates (or patching the
datasets import) so it doesn't require the real package, or if the intent is to
exercise the real loader, add "datasets" to the test extras in pyproject.toml
(the test = [...] list) so the CI environment installs it; update whichever
approach you choose and rerun tests.
🧹 Nitpick comments (4)
ovos_spec_tools/datasets.py (1)

284-289: 💤 Low value

Dead loop over <keyword> references.

This block iterates VOC_RE matches but only assigns kw and falls through with a comment — it produces no output and has no side effects. Either drop it, or implement the intended fallback (e.g. registering an empty .voc placeholder so downstream <keyword> refs resolve).
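If the fallback is worth keeping, one possible shape is sketched below. Both the `VOC_RE` pattern and the mapping-style `expansions` structure (keyword to list of values) are assumptions made for illustration; the real module may use a different regex or a list-of-dicts schema, in which case the placeholder would need to match that instead.

```python
import re

# Assumed pattern for <keyword> references; the real VOC_RE may differ.
VOC_RE = re.compile(r"<([^<>\s]+)>")


def register_placeholders(row):
    """Add an empty expansion list for each <keyword> ref that lacks one."""
    tpl = row.get("template", "")
    expansions = row.get("expansions") or {}
    for match in VOC_RE.finditer(tpl):
        # setdefault keeps any values that are already present.
        expansions.setdefault(match.group(1), [])
    if expansions:
        row["expansions"] = expansions
    return row
```

With this in place the loop has an observable effect; if no downstream code needs the placeholders, deleting the block remains the simpler option.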

♻️ Suggested removal
-        # If no expansions but the template has <keyword> refs,
-        # try to extract inline alternations
-        if not row.get("expansions"):
-            for m in VOC_RE.finditer(tpl):
-                kw = m.group(1)
-                # No expansion data available; carry on
-
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ovos_spec_tools/datasets.py` around lines 284 - 289, The loop over
VOC_RE.finditer(tpl) (inside the branch if not row.get("expansions")) is dead —
it only assigns kw and does nothing; either remove the loop entirely or
implement the intended fallback: for each match extract kw and register an empty
expansion placeholder so downstream <keyword> references resolve (e.g., ensure
row has an "expansions" mapping and add an entry for kw with an empty
list/placeholder). Update the code around VOC_RE, tpl, and row to create
row.setdefault("expansions", {})[kw] = [] (or equivalent placeholder structure
used by downstream code) so the template fallback works, or delete the whole
block if the fallback is not needed.
examples/hass-intent-dataset/reexport_recursive.py (1)

29-69: 💤 Low value

Unbounded recursive expansion.

_resolve_value produces the full cartesian set of nested <keyword> substitutions with no cap; deeply nested vocabs can blow up memory. Cycles are guarded by seen, but breadth is not. Consider a max-results cap if this runs on large corpora.
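A capped variant might look like the following sketch. It is not the script's actual `_resolve_value`: the function name, the `KEYWORD_RE` pattern, the `vocabs` mapping shape, and the default `max_results` are assumptions chosen to illustrate threading a limit through the recursion while keeping the cycle guard.

```python
import re

# Assumed <keyword> reference syntax.
KEYWORD_RE = re.compile(r"<([^<>\s]+)>")


def resolve_value(value, vocabs, max_results=1000, seen=frozenset()):
    """Expand nested <keyword> refs, returning at most max_results strings."""
    match = KEYWORD_RE.search(value)
    if match is None:
        return [value]
    prefix = value[:match.start()]
    keyword = match.group(1)
    rest = value[match.end():]

    # Expand whatever follows this reference (a sibling, so `seen` is unchanged).
    tails = resolve_value(rest, vocabs, max_results, seen)

    if keyword in seen or keyword not in vocabs:
        # Cycle or unknown keyword: keep the literal reference.
        heads = [match.group(0)]
    else:
        heads = []
        for option in vocabs[keyword]:
            remaining = max_results - len(heads)
            heads.extend(
                resolve_value(option, vocabs, remaining, seen | {keyword})
            )
            if len(heads) >= max_results:
                break

    results = []
    for head in heads:
        for tail in tails:
            results.append(prefix + head + tail)
            if len(results) >= max_results:
                return results  # cap reached: stop early
    return results
```

Because the remaining budget is passed down on every call, no intermediate list grows past `max_results`, which bounds memory as well as output size.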

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/reexport_recursive.py` around lines 29 - 69, The
recursive expansion can explode in breadth; modify _resolve_value and
_resolve_expansions to accept a max_results (or limit) parameter and enforce it
during recursion: thread the limit through calls to _resolve_value from
_resolve_expansions, stop expanding further branches once the accumulating
results list reaches the limit (check len(results) during the loop and return
early), and propagate that early-return up the recursion so the overall result
set never exceeds max_results; also apply the same cap when building flat_values
before deduplication to prevent excessive memory use.
examples/hass-intent-dataset/generate_entities.py (1)

260-263: ⚡ Quick win

Define lt state inline instead of patching after the dict.

lt.state is declared as a nested list [[...]] (line 261) and then corrected by the post-dict block at lines 366-368. Define the correct flat list directly and drop the patch.

♻️ Proposed fix
     "lt": {
-        "state": [["įjungta", "išjungta", "atidaryta", "uždaryta", "užrakinta", "atrakinta"]],
+        "state": ["įjungta", "išjungta", "atidaryta", "uždaryta", "užrakinta", "atrakinta"],
         "color": ["balta", "juoda", "raudona", "oranžinė", "geltona", "žalia", "mėlyna", "violetinė", "rudа", "rožinė", "turkio"],
     },

And remove the patch block:

-# Fix lt state
-if "lt" in _LANG_OVERRIDES:
-    _LANG_OVERRIDES["lt"]["state"] = ["įjungta", "išjungta", "atidaryta", "uždaryta", "užrakinta", "atrakinta"]

Also applies to: 366-368

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/generate_entities.py` around lines 260 - 263,
The lt entry in the languages dict defines "state" as a nested list (e.g.
[["įjungta", ...]]) but is later patched to a flat list; change the "lt" ->
"state" value to a flat list directly (remove the extra nesting) and delete the
subsequent post-dict patch that overwrites lt['state'] (the block that sets
lt['state'] after the dict). Update references to "lt" and "state" accordingly
so no later correction is needed.
examples/hass-intent-dataset/convert_hassil_intents.py (1)

1226-1226: ⚡ Quick win

Simplify nested if-block detection.

The condition "{% if" in branch[len(branch) - len(branch.lstrip()):] is unnecessarily complex. The expression len(branch) - len(branch.lstrip()) computes the number of leading whitespace characters, and slicing from that position gives branch.lstrip(). This is equivalent to:

"{% if" in branch.lstrip()

or simply:

"{% if" in branch

since the presence check doesn't require stripping.
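The equivalence can be checked quickly on a few strings. The sample inputs below are made up for illustration, and `leading_slice` is just a hypothetical name for the original slicing expression.

```python
samples = [
    "{% if x %}a{% endif %}",
    "   {% if x %}a{% endif %}",
    "\t plain text",
    "",
    "text {% if y %}",
]


def leading_slice(branch):
    # The original expression: slice off the leading whitespace by index.
    return branch[len(branch) - len(branch.lstrip()):]


# The slice is always identical to branch.lstrip().
slices_match = all(leading_slice(s) == s.lstrip() for s in samples)

# "{% if" starts with a non-whitespace character, so dropping leading
# whitespace cannot change the result of the membership test.
membership_match = all(
    ("{% if" in leading_slice(s)) == ("{% if" in s) for s in samples
)
```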

♻️ Proposed simplification
-        sub_branches = _split_jinja_if(branch) if "{% if" in branch[len(branch) - len(branch.lstrip()):] else [branch]
+        sub_branches = _split_jinja_if(branch) if "{% if" in branch else [branch]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 1226, The
conditional that detects a leading Jinja if in the assignment to sub_branches is
overly complex; simplify the check by testing the stripped or full string
instead of computing leading-whitespace length—replace the expression used in
the ternary for sub_branches (which references branch and calls
_split_jinja_if(branch)) with a simpler membership test like "{% if" in
branch.lstrip() (or "{% if" in branch) so the branch is split via
_split_jinja_if(branch) only when appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/hass-intent-dataset/convert_hassil_intents.py`:
- Line 47: The import of the sys module is unused; remove the top-level "import
sys" statement so the script no longer contains an unused import (look for the
"import sys" line in convert_hassil_intents.py and delete it).
- Line 1759: The print statement currently uses an unnecessary f-string: replace
the f-prefixed call to print (the line printing "  --check: intents with <50%
sample survival --") with a plain string literal to avoid misleading formatting
usage; locate the print(...) invocation in convert_hassil_intents.py and remove
the leading "f" so it becomes print("  --check: intents with <50% sample
survival --").
- Line 702: The Danish "da" mapping in the dictionary contains a duplicate key
"abn" (second occurrence at the shown diff); remove the redundant "abn": "open"
entry so the "da" dict only has a single "abn" mapping, ensuring no duplicate
keys remain in the dictionary (locate the "da" dict in convert_hassil_intents.py
to make the edit).

In `@examples/hass-intent-dataset/export_hf_dataset.py`:
- Line 33: Remove the unused import statement "import os" from the top-level
imports in export_hf_dataset.py (the unused symbol is the os import) so the lint
error ruff F401 is resolved; verify there are no references to os elsewhere in
the file and run the linter/CI to confirm the import removal fixes the failing
check.

In `@examples/hass-intent-dataset/generate_entities.py`:
- Around line 360-363: The dict _LANG_OVERRIDES in generate_entities.py contains
a duplicated "pl" key; remove the second "pl" entry (the block with "state" and
"color") so there is only one "pl" definition in _LANG_OVERRIDES, or if the two
differ intentionally, merge their values into the single "pl" entry instead;
ensure no duplicate keys remain to fix the ruff F601 CI failure.
- Line 44: The domain list contains a value with a leading space (" humidifier")
which will produce incorrect vocabulary entries; remove the stray space so it
reads "humidifier" and also defensively trim domain strings before writing
entities (e.g., sanitize the domain list or apply .strip()/trim() to each item)
to prevent any other leading/trailing whitespace from leaking into generated
.entity files.
- Line 460: The import for the standard library module "re" is declared at the
bottom of the module but used earlier (around line 438), causing a lint error
(E402); move the line "import re" up into the top import block with the other
imports so the module-level imports appear before any code or usage, ensuring
the name "re" is available where it's referenced.
- Around line 6-9: The file imports unused symbols causing lint/CI
failures—remove the unused imports json, os, and defaultdict from the top of
generate_entities.py and leave only the required import(s) (e.g., pathlib.Path)
after verifying Path is actually used; update the import line(s) accordingly so
only used modules are imported.

In `@examples/hass-intent-dataset/reexport_uniform.py`:
- Around line 47-54: The code builds expansions from refs returned by
_extract_keyword_refs but preserves duplicates; change the loop that creates
expansions from refs so it deduplicates while preserving order: iterate refs,
keep a local seen set, for each ref skip if already seen, otherwise look up vals
= vocabs.get(ref) and if vals append {"keyword": ref, "values": vals} and add
ref to seen, then assign row["expansions"] only if expansions is non-empty;
update the block around refs/expansions to use this seen-based dedupe.

In `@examples/hf_dataset.py`:
- Line 17: Remove the unused top-level import "sys" from examples/hf_dataset.py:
delete the "import sys" statement (it is unused and ruff F401-failing), leaving
the local "argparse" usage inside main intact; verify there are no other
references to "sys" such as in function main or elsewhere before committing.

In `@ovos_spec_tools/datasets.py`:
- Around line 54-58: The SUPPORTED_DATASETS mapping in
ovos_spec_tools/datasets.py incorrectly maps the key "hassil-intents" to
"OpenVoiceOS/hassil-intents-locale" while the module docstring and docs expect
"OpenVoiceOS/hass-intent-templates"; update the dictionary entry in
SUPPORTED_DATASETS to use "OpenVoiceOS/hass-intent-templates" for
"hassil-intents" and ensure any related references (module docstring or
consumers of SUPPORTED_DATASETS) remain consistent; run or adjust
test_urls_valid if needed to validate the full repo name rather than only the
"OpenVoiceOS/" prefix.

In `@test/test_datasets.py`:
- Around line 204-207: The test test_urls_valid is too permissive—it's only
checking for a slash and the "OpenVoiceOS/" prefix on entries from
SUPPORTED_DATASETS, which allows wrong repo names to slip through; update the
test to assert the exact expected repository names for each key in
SUPPORTED_DATASETS (or assert the set of SUPPORTED_DATASETS.values() equals an
expected set/list of repo strings) so the registry cannot silently drift—modify
test_urls_valid to compare SUPPORTED_DATASETS against the precise expected repo
names.
- Around line 171-190: The test test_round_trip_small currently calls
load_dataset_templates before the patch, causing a real dataset fetch; fix it by
removing the real call and creating a local fixture list (e.g., a small list of
dicts with intent_id and template) to use as rows, then patch
ovos_spec_tools.datasets.load_dataset_templates to return that fixture before
calling export_to_locale; ensure you derive first = fixture[:2] and use
fixture[0]["intent_id"] and fixture[0]["template"] when checking the exported
intent file so the test remains hermetic and does not hit the datasets library.

---

Outside diff comments:
In `@pyproject.toml`:
- Around line 33-36: The test failure is caused by the test importing the real
datasets package via load_dataset_templates in
TestExportToLocale::test_round_trip_small (test/test_datasets.py); either make
that test hermetic by mocking/stubbing load_dataset_templates (or patching the
datasets import) so it doesn't require the real package, or if the intent is to
exercise the real loader, add "datasets" to the test extras in pyproject.toml
(the test = [...] list) so the CI environment installs it; update whichever
approach you choose and rerun tests.

---

Nitpick comments:
In `@examples/hass-intent-dataset/convert_hassil_intents.py`:
- Line 1226: The conditional that detects a leading Jinja if in the assignment
to sub_branches is overly complex; simplify the check by testing the stripped or
full string instead of computing leading-whitespace length—replace the
expression used in the ternary for sub_branches (which references branch and
calls _split_jinja_if(branch)) with a simpler membership test like "{% if" in
branch.lstrip() (or "{% if" in branch) so the branch is split via
_split_jinja_if(branch) only when appropriate.

In `@examples/hass-intent-dataset/generate_entities.py`:
- Around line 260-263: The lt entry in the languages dict defines "state" as a
nested list (e.g. [["įjungta", ...]]) but is later patched to a flat list;
change the "lt" -> "state" value to a flat list directly (remove the extra
nesting) and delete the subsequent post-dict patch that overwrites lt['state']
(the block that sets lt['state'] after the dict). Update references to "lt" and
"state" accordingly so no later correction is needed.

In `@examples/hass-intent-dataset/reexport_recursive.py`:
- Around line 29-69: The recursive expansion can explode in breadth; modify
_resolve_value and _resolve_expansions to accept a max_results (or limit)
parameter and enforce it during recursion: thread the limit through calls to
_resolve_value from _resolve_expansions, stop expanding further branches once
the accumulating results list reaches the limit (check len(results) during the
loop and return early), and propagate that early-return up the recursion so the
overall result set never exceeds max_results; also apply the same cap when
building flat_values before deduplication to prevent excessive memory use.

In `@ovos_spec_tools/datasets.py`:
- Around line 284-289: The loop over VOC_RE.finditer(tpl) (inside the branch if
not row.get("expansions")) is dead — it only assigns kw and does nothing; either
remove the loop entirely or implement the intended fallback: for each match
extract kw and register an empty expansion placeholder so downstream <keyword>
references resolve (e.g., ensure row has an "expansions" mapping and add an
entry for kw with an empty list/placeholder). Update the code around VOC_RE,
tpl, and row to create row.setdefault("expansions", {})[kw] = [] (or equivalent
placeholder structure used by downstream code) so the template fallback works,
or delete the whole block if the fallback is not needed.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3b6dcafe-d104-4e1f-b5ff-bae3f224aa8f

📥 Commits

Reviewing files that changed from the base of the PR and between 4c24a60 and bd54e89.

📒 Files selected for processing (15)
  • docs/README.md
  • docs/api-reference.md
  • docs/datasets.md
  • examples/hass-intent-dataset/APPENDIX.md
  • examples/hass-intent-dataset/convert_hassil_intents.py
  • examples/hass-intent-dataset/export_hf_dataset.py
  • examples/hass-intent-dataset/generate_entities.py
  • examples/hass-intent-dataset/reexport_recursive.py
  • examples/hass-intent-dataset/reexport_uniform.py
  • examples/hf_dataset.py
  • examples/locale_to_hf_dataset.py
  • ovos_spec_tools/__init__.py
  • ovos_spec_tools/datasets.py
  • pyproject.toml
  • test/test_datasets.py

import hashlib
import itertools
import re
import sys


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused import.

The sys module is imported but never referenced in the script.

🧹 Proposed fix
 import hashlib
 import itertools
 import re
-import sys
 import unicodedata
 from pathlib import Path
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-import sys
+import hashlib
+import itertools
+import re
+import unicodedata
+from pathlib import Path
🧰 Tools
🪛 GitHub Actions: Lint / 0_lint _ lint.txt

[error] 47-47: Ruff check failed: F401 sys imported but unused

🪛 GitHub Actions: Lint / lint _ lint

[error] 47-47: Ruff check failed: F401 sys imported but unused

🪛 GitHub Check: lint / lint

[failure] 47-47: ruff (F401)
examples/hass-intent-dataset/convert_hassil_intents.py:47:8: F401 sys imported but unused
help: Remove unused import: sys

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 47, The
import of the sys module is unused; remove the top-level "import sys" statement
so the script no longer contains an unused import (look for the "import sys"
line in convert_hassil_intents.py and delete it).

"timer_pause": "timer_pause", "timer_unpause": "timer_unpause",
"timer": "timer", "timers": "timers",
"hilsen": "greeting", "mine_data": "my_data",
"abn": "open", "al": "all",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove duplicate dictionary key.

The key "abn" already appears at line 658 in the same "da" (Danish) dictionary. Python silently overwrites the first entry with the second. Both map to "open", so the outcome is the same, but the duplication is a code smell: it may indicate a copy-paste error and is a maintenance risk if the two values ever diverge.
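Outside of ruff, duplicate literal keys can also be found with the standard `ast` module. The sketch below is a minimal illustration, not how ruff implements F601; `find_duplicate_keys` and `SAMPLE` are hypothetical names.

```python
import ast


def find_duplicate_keys(source):
    """Return (key, line) pairs for repeated constant keys in dict literals."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # `key` is None for ** unpacking; only literal constants are checked.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    duplicates.append((key.value, key.lineno))
                seen.add(key.value)
    return duplicates


# Made-up sample mirroring the duplicated "abn" entry.
SAMPLE = '''
mapping = {
    "abn": "open",
    "al": "all",
    "abn": "open",
}
'''
```

Calling `find_duplicate_keys(SAMPLE)` reports the second `"abn"` together with its line number within the sample string.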

🔧 Proposed fix
         "timer_add": "timer_add", "timer_decrease": "timer_decrease",
         "timer_pause": "timer_pause", "timer_unpause": "timer_unpause",
         "timer": "timer", "timers": "timers",
         "hilsen": "greeting", "mine_data": "my_data",
-        "abn": "open", "al": "all",
+        "al": "all",
     },
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-"abn": "open", "al": "all",
+"al": "all",
🧰 Tools
🪛 GitHub Check: lint / lint

[failure] 702-702: ruff (F601)
examples/hass-intent-dataset/convert_hassil_intents.py:702:9: F601 Dictionary key literal "abn" repeated
help: Remove repeated key literal "abn"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 702, The
Danish "da" mapping in the dictionary contains a duplicate key "abn" (second
occurrence at the shown diff); remove the redundant "abn": "open" entry so the
"da" dict only has a single "abn" mapping, ensuring no duplicate keys remain in
the dictionary (locate the "da" dict in convert_hassil_intents.py to make the
edit).

]
thin.sort(key=lambda x: x[2] / x[1])
if thin:
    print(f"  --check: intents with <50% sample survival --")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unnecessary f-string prefix.

The string at line 1759 has no placeholders and doesn't need the f prefix.

🧹 Proposed fix
-            print(f"  --check: intents with <50% sample survival --")
+            print("  --check: intents with <50% sample survival --")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-print(f"  --check: intents with <50% sample survival --")
+print("  --check: intents with <50% sample survival --")
🧰 Tools
🪛 GitHub Check: lint / lint

[failure] 1759-1759: ruff (F541)
examples/hass-intent-dataset/convert_hassil_intents.py:1759:19: F541 f-string without any placeholders
help: Remove extraneous f prefix

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 1759, The
print statement currently uses an unnecessary f-string: replace the f-prefixed
call to print (the line printing "  --check: intents with <50% sample survival
--") with a plain string literal to avoid misleading formatting usage; locate
the print(...) invocation in convert_hassil_intents.py and remove the leading
"f" so it becomes print("  --check: intents with <50% sample survival --").

from __future__ import annotations

import json
import os


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused import os.

Flagged by the lint check (ruff F401); CI is failing on it.

🧹 Proposed fix
 import json
-import os
 import re
 import sys
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-import os
+import json
+import re
+import sys
🧰 Tools
🪛 GitHub Check: lint / lint

[failure] 33-33: ruff (F401)
examples/hass-intent-dataset/export_hf_dataset.py:33:8: F401 os imported but unused
help: Remove unused import: os

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/export_hf_dataset.py` at line 33, Remove the
unused import statement "import os" from the top-level imports in
export_hf_dataset.py (the unused symbol is the os import) so the lint error ruff
F401 is resolved; verify there are no references to os elsewhere in the file and
run the linter/CI to confirm the import removal fixes the failing check.

Comment on lines +6 to +9
import json
import os
from pathlib import Path
from collections import defaultdict


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused imports.

json, os, and collections.defaultdict are unused (ruff F401, CI failing).

🧹 Proposed fix
 from __future__ import annotations

-import json
-import os
+import re
 from pathlib import Path
-from collections import defaultdict
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-import json
-import os
-from pathlib import Path
-from collections import defaultdict
+import re
+from pathlib import Path
🧰 Tools
🪛 GitHub Check: lint / lint

[failure] 9-9: ruff (F401)
examples/hass-intent-dataset/generate_entities.py:9:25: F401 collections.defaultdict imported but unused
help: Remove unused import: collections.defaultdict


[failure] 7-7: ruff (F401)
examples/hass-intent-dataset/generate_entities.py:7:8: F401 os imported but unused
help: Remove unused import: os


[failure] 6-6: ruff (F401)
examples/hass-intent-dataset/generate_entities.py:6:8: F401 json imported but unused
help: Remove unused import: json

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/generate_entities.py` around lines 6 - 9, The
file imports unused symbols causing lint/CI failures—remove the unused imports
json, os, and defaultdict from the top of generate_entities.py and leave only
the required import(s) (e.g., pathlib.Path) after verifying Path is actually
used; update the import line(s) accordingly so only used modules are imported.

Comment on lines +47 to +54
refs = _extract_keyword_refs(template)
expansions: list[dict[str, object]] = []
for ref in refs:
    vals = vocabs.get(ref)
    if vals:
        expansions.append({"keyword": ref, "values": vals})
if expansions:
    row["expansions"] = expansions


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate expansions entries for repeated refs.

_extract_keyword_refs returns refs in document order with duplicates, so a template referencing <x> twice emits two identical {keyword, values} entries. De-duplicate while preserving order.
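The order-preserving de-duplication relies on `dict` keeping insertion order (guaranteed since Python 3.7). The sketch below shows the idea in isolation; `extract_keyword_refs` is a hypothetical stand-in for the script's `_extract_keyword_refs`, with an assumed `<keyword>` regex, and `build_expansions` is an illustrative wrapper rather than a function from the PR.

```python
import re


def extract_keyword_refs(template):
    """Stand-in helper: refs in document order, duplicates kept."""
    return re.findall(r"<([^<>\s]+)>", template)


def build_expansions(template, vocabs):
    """Build one {keyword, values} entry per distinct ref, in first-seen order."""
    # dict.fromkeys drops repeats while preserving the first occurrence order.
    refs = list(dict.fromkeys(extract_keyword_refs(template)))
    return [
        {"keyword": ref, "values": vocabs[ref]}
        for ref in refs
        if vocabs.get(ref)
    ]
```

A `set` would also remove the duplicates but would lose the document order, which is why `dict.fromkeys` is the usual idiom here.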

♻️ Proposed fix
-                refs = _extract_keyword_refs(template)
+                refs = list(dict.fromkeys(_extract_keyword_refs(template)))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-refs = _extract_keyword_refs(template)
-expansions: list[dict[str, object]] = []
-for ref in refs:
-    vals = vocabs.get(ref)
-    if vals:
-        expansions.append({"keyword": ref, "values": vals})
-if expansions:
-    row["expansions"] = expansions
+refs = list(dict.fromkeys(_extract_keyword_refs(template)))
+expansions: list[dict[str, object]] = []
+for ref in refs:
+    vals = vocabs.get(ref)
+    if vals:
+        expansions.append({"keyword": ref, "values": vals})
+if expansions:
+    row["expansions"] = expansions
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/reexport_uniform.py` around lines 47 - 54, The
code builds expansions from refs returned by _extract_keyword_refs but preserves
duplicates; change the loop that creates expansions from refs so it deduplicates
while preserving order: iterate refs, keep a local seen set, for each ref skip
if already seen, otherwise look up vals = vocabs.get(ref) and if vals append
{"keyword": ref, "values": vals} and add ref to seen, then assign
row["expansions"] only if expansions is non-empty; update the block around
refs/expansions to use this seen-based dedupe.

Comment thread examples/hf_dataset.py
"""
from __future__ import annotations

import sys


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused import sys.

Flagged by ruff F401 (CI failing); argparse is imported locally in main and sys is never used.

🧹 Proposed fix
-import sys
 from pathlib import Path
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-import sys
+from pathlib import Path
🧰 Tools
🪛 GitHub Check: lint / lint

[failure] 17-17: ruff (F401)
examples/hf_dataset.py:17:8: F401 sys imported but unused
help: Remove unused import: sys

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hf_dataset.py` at line 17, Remove the unused top-level import "sys"
from examples/hf_dataset.py: delete the "import sys" statement (it is unused and
ruff F401-failing), leaving the local "argparse" usage inside main intact;
verify there are no other references to "sys" such as in function main or
elsewhere before committing.

Comment on lines +54 to +58
SUPPORTED_DATASETS: dict[str, str] = {
"hassil-intents": "OpenVoiceOS/hassil-intents-locale",
"intents-for-eval": "OpenVoiceOS/intents-for-eval",
"massive-templates": "OpenVoiceOS/massive-templates",
}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm which repo name appears across the PR
rg -n "hassil-intents-locale|hass-intent-templates"

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 789


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "test_urls_valid" . || true
rg -n "SUPPORTED_DATASETS" ovos_spec_tools/datasets.py . || true
rg -n "docs/datasets\.md|docs/api-reference\.md|docs/README\.md" ovos_spec_tools/datasets.py . || true
python - <<'PY'
from pathlib import Path
p = Path("ovos_spec_tools/datasets.py")
print("datasets.py first 120 lines:")
print("-----")
print("\n".join(p.read_text(encoding="utf-8").splitlines()[:120]))
PY

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 6162


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '160,260p' test/test_datasets.py

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 1987


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '120,220p' ovos_spec_tools/datasets.py

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 3186


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,120p' ovos_spec_tools/datasets.py
echo "---- docs/datasets.md ----"
rg -n "hassil-intents|hass-intent-templates|SUPPORTED_DATASETS" docs/datasets.md
echo "---- docs/api-reference.md ----"
rg -n "SUPPORTED_DATASETS|hassil-intents|hass-intent-templates" docs/api-reference.md
echo "---- docs/README.md ----"
rg -n "hassil-intents|hass-intent-templates" docs/README.md
echo "---- docs/datasets.md context ----"
sed -n '1,90p' docs/datasets.md
sed -n '160,210p' docs/datasets.md

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 9136


Fix hassil-intents HF repo mismatch between registry and docs

  • ovos_spec_tools/datasets.py maps "hassil-intents" to OpenVoiceOS/hassil-intents-locale, but the module docstring and the docs (docs/datasets.md, docs/api-reference.md, docs/README.md) all document OpenVoiceOS/hass-intent-templates (per-language configs + expansions column).
  • test_urls_valid only checks OpenVoiceOS/ prefix, so it won’t catch this divergence.
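A stricter check could compare the registry against the exact expected repository ids rather than only the prefix. The sketch below is self-contained: `EXPECTED_REPOS` encodes the repo names the docs are said to use, `CURRENT_REGISTRY` copies the mapping quoted in this comment, and `registry_drift` is a hypothetical helper, not part of the module.

```python
# Expected repo ids, taking the documented names as the source of truth.
EXPECTED_REPOS = {
    "hassil-intents": "OpenVoiceOS/hass-intent-templates",
    "intents-for-eval": "OpenVoiceOS/intents-for-eval",
    "massive-templates": "OpenVoiceOS/massive-templates",
}

# Copy of the mapping as quoted in this review comment.
CURRENT_REGISTRY = {
    "hassil-intents": "OpenVoiceOS/hassil-intents-locale",
    "intents-for-eval": "OpenVoiceOS/intents-for-eval",
    "massive-templates": "OpenVoiceOS/massive-templates",
}


def registry_drift(registry, expected=EXPECTED_REPOS):
    """Return sorted keys whose repo id is missing, extra, or different."""
    keys = set(registry) | set(expected)
    return sorted(k for k in keys if registry.get(k) != expected.get(k))
```

In a test this would reduce to asserting that `registry_drift(SUPPORTED_DATASETS)` is empty, which fails on the quoted mapping and names the offending key.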
🔧 Proposed fix (docs repo is correct)
 SUPPORTED_DATASETS: dict[str, str] = {
-    "hassil-intents": "OpenVoiceOS/hassil-intents-locale",
+    "hassil-intents": "OpenVoiceOS/hass-intent-templates",
     "intents-for-eval": "OpenVoiceOS/intents-for-eval",
     "massive-templates": "OpenVoiceOS/massive-templates",
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ovos_spec_tools/datasets.py` around lines 54 - 58, The SUPPORTED_DATASETS
mapping in ovos_spec_tools/datasets.py incorrectly maps the key "hassil-intents"
to "OpenVoiceOS/hassil-intents-locale" while the module docstring and docs
expect "OpenVoiceOS/hass-intent-templates"; update the dictionary entry in
SUPPORTED_DATASETS to use "OpenVoiceOS/hass-intent-templates" for
"hassil-intents" and ensure any related references (module docstring or
consumers of SUPPORTED_DATASETS) remain consistent; run or adjust
test_urls_valid if needed to validate the full repo name rather than only the
"OpenVoiceOS/" prefix.

Comment thread test/test_datasets.py
Comment on lines +171 to +190
    def test_round_trip_small(self):
        """Load a single row, export, verify the intent file has the template."""
        rows = load_dataset_templates("hassil-intents", lang="en", streaming=False)
        # Only keep first 3 rows to make it fast
        first = [rows[0], rows[1]]

        with tempfile.TemporaryDirectory() as tmp:
            dst = Path(tmp)
            with patch(
                "ovos_spec_tools.datasets.load_dataset_templates",
                return_value=first,
            ):
                count = export_to_locale("hassil-intents", "en", dst)
            assert count == 2

            # Check that first intent file has rows[0] template
            name = rows[0]["intent_id"].split(":")[-1]
            intent_path = dst / "locale" / "en" / f"{name}.intent"
            assert intent_path.exists()
            assert rows[0]["template"] in intent_path.read_text()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test_round_trip_small makes a real, un-mocked dataset call — root cause of the CI failures.

Line 173 calls load_dataset_templates("hassil-intents", lang="en", streaming=False) before the patch, so it actually imports datasets and hits HuggingFace. This fails in CI (ImportError: The datasets library is required) and, even with the dependency installed, would make the unit test network-bound, slow, and flaky. Replace the real fetch with fixture rows so the test stays hermetic.

💚 Suggested hermetic rewrite
-    def test_round_trip_small(self):
-        """Load a single row, export, verify the intent file has the template."""
-        rows = load_dataset_templates("hassil-intents", lang="en", streaming=False)
-        # Only keep first 3 rows to make it fast
-        first = [rows[0], rows[1]]
-
-        with tempfile.TemporaryDirectory() as tmp:
+    def test_round_trip_small(self):
+        """Export fixture rows, verify the intent file has the template."""
+        rows = [
+            {"intent_id": "test:greet", "template": "<hello> {name}",
+             "slots": [], "expansions": [{"keyword": "hello", "values": ["hi"]}]},
+            {"intent_id": "test:bye", "template": "<bye> {name}",
+             "slots": [], "expansions": [{"keyword": "bye", "values": ["goodbye"]}]},
+        ]
+        first = [rows[0], rows[1]]
+
+        with tempfile.TemporaryDirectory() as tmp:
             dst = Path(tmp)
             with patch(
                 "ovos_spec_tools.datasets.load_dataset_templates",
                 return_value=first,
             ):
                 count = export_to_locale("hassil-intents", "en", dst)
             assert count == 2
 
             # Check that first intent file has rows[0] template
             name = rows[0]["intent_id"].split(":")[-1]
             intent_path = dst / "locale" / "en" / f"{name}.intent"
             assert intent_path.exists()
             assert rows[0]["template"] in intent_path.read_text()
🧰 Tools
🪛 GitHub Actions: Build Tests / 1_build _ build_tests (3.13).txt

[error] 173-173: Pytest failed: TestExportToLocale.test_round_trip_small could not load HuggingFace dataset because datasets is not installed.

🪛 GitHub Actions: Build Tests / 2_build _ build_tests (3.12).txt

[error] 173-173: Pytest failure in TestExportToLocale.test_round_trip_small when calling load_dataset_templates("hassil-intents", lang="en", streaming=False), which raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Build Tests / 3_build _ build_tests (3.14).txt

[error] 173-173: Test failure: TestExportToLocale.test_round_trip_small raised ImportError because datasets library is not installed.

🪛 GitHub Actions: Build Tests / 4_build _ build_tests (3.10).txt

[error] 173-173: Pytest failure: TestExportToLocale.test_round_trip_small failed when calling load_dataset_templates('hassil-intents', lang='en', streaming=False). Error: ImportError/ModuleNotFoundError for missing 'datasets' library.

🪛 GitHub Actions: Build Tests / 5_build _ build_tests (3.11).txt

[error] 173-173: Pytest failed: TestExportToLocale.test_round_trip_small. load_dataset_templates('hassil-intents', lang='en', streaming=False) raised ImportError because the 'datasets' library is not installed.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.10)

[error] 173-173: Failed test: TestExportToLocale.test_round_trip_small (load_dataset_templates('hassil-intents', lang='en', streaming=False)) because required dependency 'datasets' is not installed.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.11)

[error] 173-173: Failed test: TestExportToLocale.test_round_trip_small. Error: ImportError: The datasets library is required; install it with: pip install datasets.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.12)

[error] 173-173: TestExportToLocale.test_round_trip_small failed while calling load_dataset_templates('hassil-intents', lang='en', streaming=False).

🪛 GitHub Actions: Build Tests / build _ build_tests (3.13)

[error] 173-173: Failure in TestExportToLocale.test_round_trip_small: load_dataset_templates("hassil-intents", lang="en", streaming=False) raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.14)

[error] 173-173: Pytest failed: TestExportToLocale.test_round_trip_small. load_dataset_templates('hassil-intents', lang='en', streaming=False) raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Code Coverage / 0_coverage _ coverage.txt

[error] 173-173: TestExportToLocale.test_round_trip_small failed. Called load_dataset_templates("hassil-intents", lang="en", streaming=False) which raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Code Coverage / coverage _ coverage

[error] 173-173: Failure occurred when calling load_dataset_templates("hassil-intents", lang="en", streaming=False) due to missing datasets library.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_datasets.py` around lines 171 - 190, The test test_round_trip_small
currently calls load_dataset_templates before the patch, causing a real dataset
fetch; fix it by removing the real call and creating a local fixture list (e.g.,
a small list of dicts with intent_id and template) to use as rows, then patch
ovos_spec_tools.datasets.load_dataset_templates to return that fixture before
calling export_to_locale; ensure you derive first = fixture[:2] and use
fixture[0]["intent_id"] and fixture[0]["template"] when checking the exported
intent file so the test remains hermetic and does not hit the datasets library.

Comment thread test/test_datasets.py
Comment on lines +204 to +207
    def test_urls_valid(self):
        for url in SUPPORTED_DATASETS.values():
            assert "/" in url
            assert url.startswith("OpenVoiceOS/")

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

test_urls_valid is too weak to catch a wrong repo.

It only asserts the OpenVoiceOS/ prefix and a /, so it passes for both hassil-intents-locale and hass-intent-templates (see the registry mismatch flagged in datasets.py). Consider asserting the exact expected repo names so the registry can't silently drift from the docs.
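
One way to tighten the check, as a minimal sketch: pin the full mapping and report any key whose repo differs. `EXPECTED_DATASETS` and `registry_drift` are hypothetical names, not part of the PR, and the pinned repo names assume the docs are the correct source. A real test would import `SUPPORTED_DATASETS` from `ovos_spec_tools.datasets` and pass it as the first argument.

```python
# Hypothetical pinned copy of the documented repo names (assumes the docs
# are right about hass-intent-templates).
EXPECTED_DATASETS = {
    "hassil-intents": "OpenVoiceOS/hass-intent-templates",
    "intents-for-eval": "OpenVoiceOS/intents-for-eval",
    "massive-templates": "OpenVoiceOS/massive-templates",
}


def registry_drift(registry, expected):
    """Return {key: (actual, expected)} for every entry that differs.

    Keys missing on either side show up with None in that position.
    """
    keys = set(registry) | set(expected)
    return {
        key: (registry.get(key), expected.get(key))
        for key in sorted(keys)
        if registry.get(key) != expected.get(key)
    }


def test_urls_exact(registry=EXPECTED_DATASETS):
    # In the real suite, `registry` would be SUPPORTED_DATASETS.
    assert registry_drift(registry, EXPECTED_DATASETS) == {}
```

Returning the differing entries rather than a bare boolean means a failing assertion shows which key drifted and to what value.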

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_datasets.py` around lines 204 - 207, The test test_urls_valid is
too permissive—it's only checking for a slash and the "OpenVoiceOS/" prefix on
entries from SUPPORTED_DATASETS, which allows wrong repo names to slip through;
update the test to assert the exact expected repository names for each key in
SUPPORTED_DATASETS (or assert the set of SUPPORTED_DATASETS.values() equals an
expected set/list of repo strings) so the registry cannot silently drift—modify
test_urls_valid to compare SUPPORTED_DATASETS against the precise expected repo
names.

JarbasAl added 4 commits June 2, 2026 09:47
Drop all hardcoded strings from Python scripts — single source of
truth is base_locale/<lang>/<slot>.entity files:

- generate_base_locale.py: seed file with translation data for 59 languages
- base_locale/: 413 .entity files (area, name, color, state,
  device_class, domain, floor) across 59 languages
- convert_hassil_intents.py: read area.entity from base_locale
  instead of hardcoded COMMON_AREA_NAMES dict
- export_hf_dataset.py: slot examples from base_locale/, drop
  DOMAIN_DEVICE_NAMES and _extract_domain
- generate_entities.py: read from base_locale/ instead of
  _LANG_OVERRIDES and other hardcoded data

Numeric slots (brightness, percentage, temperature, ...) are still
generated programmatically as they are language-agnostic.
HA internal identifiers (device_class, domain) remain English
in all languages.

inline_keywords(template, expansions) replaces <keyword> refs with
(a|b|c) alternation groups inline — needed for engines like Padatious
that don't look up .voc files at runtime.  Handles nested refs
recursively with configurable max_values cap.

Exported from ovos_spec_tools top-level package and documented
in the API reference.
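
The inlining described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the PR: the function is named `inline_keywords_sketch` to keep it distinct from the real `inline_keywords`, `expansions` is assumed to be a plain dict mapping each keyword to a list of strings, and the handling of unknown or self-referencing keywords (left untouched) is a choice made here.

```python
import re

# Matches <keyword> references such as <turn_on>.
_KEYWORD_REF = re.compile(r"<([^<>]+)>")


def inline_keywords_sketch(template, expansions, max_values=None, _seen=()):
    """Replace each <keyword> ref with an inline (a|b|c) alternation group.

    Assumption: `expansions` maps keyword -> list of replacement strings.
    `max_values`, if given, caps how many values are inlined per keyword.
    """

    def substitute(match):
        keyword = match.group(1)
        # Unknown or cyclic refs are left as-is rather than guessed at.
        if keyword not in expansions or keyword in _seen:
            return match.group(0)
        values = expansions[keyword]
        if max_values is not None:
            values = values[:max_values]
        # Resolve refs nested inside each value before joining.
        resolved = [
            inline_keywords_sketch(value, expansions, max_values, _seen + (keyword,))
            for value in values
        ]
        return "(" + "|".join(resolved) + ")"

    return _KEYWORD_REF.sub(substitute, template)


example = {
    "turn_on": ["turn on", "switch on"],
    "light": ["<the> light", "lamp"],
    "the": ["the"],
}
print(inline_keywords_sketch("<turn_on> <light>", example))
# prints: (turn on|switch on) ((the) light|lamp)
```

Tracking the chain of keywords already being expanded in `_seen` keeps a self-referencing expansion from recursing forever. Slot placeholders such as `{name}` pass through unchanged because only `<...>` references are matched.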
@JarbasAl JarbasAl marked this pull request as draft June 27, 2026 14:59