feat: HF dataset loader, intent expansion, and locale export by JarbasAl · Pull Request #20 · OpenVoiceOS/ovos-spec-tools

JarbasAl · 2026-06-02T08:26:00Z

New module ovos_spec_tools.datasets for loading, expanding, and exporting OVOS-INTENT-2 templates from HuggingFace datasets.

What's added

`ovos_spec_tools/datasets.py`

load_dataset_templates(dataset_id, lang) — loads templates from any of three HF datasets: OpenVoiceOS/hass-intent-templates, intents-for-eval, massive-templates
expand_hf_template(template, expansions, max_samples) — resolves <keyword> refs, (a|b) alternations, [x] optionals into concrete utterances
export_to_locale(dataset_id, lang, output_dir) — writes .intent, .voc, .entity files to a standard OVOS-INTENT-2 locale tree
CLI entry via python -m ovos_spec_tools.datasets
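
To make the three grammar constructs above concrete, here is a minimal standalone sketch of this kind of expansion. It is not the code in `ovos_spec_tools/datasets.py`: the name `expand_sketch`, the regular expression, and the breadth-first order are assumptions for illustration only, and it does not guard against cyclic `<keyword>` references.

```python
import re
from collections import deque

# Matches the innermost <keyword>, (a|b) or [x] construct (illustrative pattern).
_TOKEN = re.compile(r"<(\w+)>|\(([^()\[\]<>]*)\)|\[([^()\[\]<>]*)\]")


def expand_sketch(template, expansions, max_samples=20):
    """Expand one template into at most max_samples concrete utterances."""
    queue = deque([template])
    results = []
    while queue and len(results) < max_samples:
        current = queue.popleft()
        match = _TOKEN.search(current)
        if match is None:
            # Nothing left to expand: collapse spaces left behind by dropped optionals.
            utterance = " ".join(current.split())
            if utterance not in results:
                results.append(utterance)
            continue
        keyword, group, optional = match.groups()
        if keyword is not None:
            choices = expansions[keyword]  # raises KeyError for an unknown keyword
        elif group is not None:
            choices = group.split("|")
        else:
            choices = [optional, ""]  # keep the optional text, or drop it
        for choice in choices:
            queue.append(current[:match.start()] + choice + current[match.end():])
    return results


if __name__ == "__main__":
    for line in expand_sketch("(turn|switch) on [the] <device>", {"device": ["light", "fan"]}):
        print(line)
```

The `max_samples` cap is applied while expanding rather than after, so a template with many alternations stops early instead of building the full product first.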

New example scripts

Script	Purpose
`convert_hassil_intents.py`	Home Assistant hassil → OVOS locale converter
`export_hf_dataset.py`	locale → multi-config HF dataset export
`generate_entities.py`	auto-generate missing `.entity` files across all languages
`reexport_recursive.py`	recursively resolve nested `<keyword>` references
`reexport_uniform.py`	uniform list-struct schema for expansion values
`hf_dataset.py`	unified CLI for all three supported datasets
`locale_to_hf_dataset.py`	existing: OVOS locale → intents-for-eval format

Other

pyproject.toml: optional datasets extra
17 tests in test/test_datasets.py
AGENTS.md + TODO.md in-repo docs
APPENDIX.md: formal hassil→OVOS grammar mapping documentation

Usage

from ovos_spec_tools.datasets import load_dataset_templates, expand_hf_template, export_to_locale

# Load English templates
templates = load_dataset_templates('hassil-intents', lang='en')

# Expand a template
utterances = expand_hf_template(tpl, expansions, max_samples=20)

# Export to locale directory
export_to_locale('hassil-intents', lang='en', output_dir='/tmp/locale')
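
For readers unfamiliar with the locale tree, the following is a rough sketch of what an export step of this kind might write. The `<output_dir>/<lang>/<name>.<ext>` layout is an assumption modeled on the `base_locale/<lang>/*.entity` paths in this PR, and `write_locale_sketch` with its three dict arguments is hypothetical, not the signature of `export_to_locale`.

```python
from pathlib import Path


def write_locale_sketch(output_dir, lang, intents, vocabs, entities):
    """Write one file per intent, vocabulary and entity, one entry per line.

    Each argument after lang maps a file stem to a list of lines.
    """
    lang_dir = Path(output_dir) / lang
    lang_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for suffix, mapping in ((".intent", intents), (".voc", vocabs), (".entity", entities)):
        for name, lines in mapping.items():
            path = lang_dir / f"{name}{suffix}"
            path.write_text("\n".join(lines) + "\n", encoding="utf-8")
            written.append(path)
    return written


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        for path in write_locale_sketch(
            tmp,
            "en",
            intents={"HassTurnOn": ["turn on the <light_word>"]},
            vocabs={"light_word": ["light", "lamp"]},
            entities={"area": ["kitchen", "bedroom"]},
        ):
            print(path.relative_to(tmp))
```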

Verification

All 320 existing tests pass with no regressions
17 new tests in test_datasets.py
Tested end-to-end with the live OpenVoiceOS/hass-intent-templates dataset (61 configs, ~224k rows)

Summary by CodeRabbit

New Features
- Added HuggingFace datasets module to load, expand, and export OVOS-INTENT-2 templates from supported datasets.
- Added template expansion functionality to generate utterance samples from templates.
- Added locale export capability to convert dataset templates into OVOS-compatible directory structures.
Documentation
- Added comprehensive datasets guide with API reference and usage examples.
- Added Home Assistant intent conversion documentation.
Tests
- Added test coverage for datasets module functionality.
Chores
- Added optional datasets dependency for advanced dataset features.

Introduce ovos_spec_tools.datasets, a new module for loading, expanding, and exporting OVOS-INTENT-2 templates from HuggingFace datasets:

* load_dataset_templates(): load from hass-intent-templates, intents-for-eval, or massive-templates via the datasets library
* expand_hf_template(): resolve <keyword>, (a|b), [x] into concrete utterances using the existing expansion.py engine
* export_to_locale(): write .intent / .voc / .entity files into a standard OVOS-INTENT-2 locale directory tree

Add example scripts for the full dataset generation pipeline:

* convert_hassil_intents.py: Home Assistant hassil → OVOS locale
* export_hf_dataset.py: locale → multi-config HF dataset
* generate_entities.py: auto-generate missing .entity files
* reexport_recursive.py: recursively resolve nested <keyword> refs
* reexport_uniform.py: uniform list<struct> schema for expansions
* hf_dataset.py: unified CLI for all three supported datasets

Update pyproject.toml with optional 'datasets' dependency.

Add 17 tests covering config resolution, normalization, expansion, and locale export.

coderabbitai · 2026-06-02T08:26:08Z

Important

Review skipped

Too many files!

This PR contains 296 files, which is 146 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 475affe6-8f9c-4acd-bf49-c470f0a9852f

📥 Commits

Reviewing files that changed from the base of the PR and between bd54e89 and 1eb9068.

📒 Files selected for processing (296)

examples/hass-intent-dataset/base_locale/af/area.entity
examples/hass-intent-dataset/base_locale/af/color.entity
examples/hass-intent-dataset/base_locale/af/device_class.entity
examples/hass-intent-dataset/base_locale/af/domain.entity
examples/hass-intent-dataset/base_locale/af/floor.entity
examples/hass-intent-dataset/base_locale/af/name.entity
examples/hass-intent-dataset/base_locale/af/state.entity
examples/hass-intent-dataset/base_locale/ar/area.entity
examples/hass-intent-dataset/base_locale/ar/color.entity
examples/hass-intent-dataset/base_locale/ar/device_class.entity
examples/hass-intent-dataset/base_locale/ar/domain.entity
examples/hass-intent-dataset/base_locale/ar/floor.entity
examples/hass-intent-dataset/base_locale/ar/name.entity
examples/hass-intent-dataset/base_locale/ar/state.entity
examples/hass-intent-dataset/base_locale/bg/area.entity
examples/hass-intent-dataset/base_locale/bg/color.entity
examples/hass-intent-dataset/base_locale/bg/device_class.entity
examples/hass-intent-dataset/base_locale/bg/domain.entity
examples/hass-intent-dataset/base_locale/bg/floor.entity
examples/hass-intent-dataset/base_locale/bg/name.entity
examples/hass-intent-dataset/base_locale/bg/state.entity
examples/hass-intent-dataset/base_locale/bn/area.entity
examples/hass-intent-dataset/base_locale/bn/color.entity
examples/hass-intent-dataset/base_locale/bn/device_class.entity
examples/hass-intent-dataset/base_locale/bn/domain.entity
examples/hass-intent-dataset/base_locale/bn/floor.entity
examples/hass-intent-dataset/base_locale/bn/name.entity
examples/hass-intent-dataset/base_locale/bn/state.entity
examples/hass-intent-dataset/base_locale/ca/area.entity
examples/hass-intent-dataset/base_locale/ca/color.entity
examples/hass-intent-dataset/base_locale/ca/device_class.entity
examples/hass-intent-dataset/base_locale/ca/domain.entity
examples/hass-intent-dataset/base_locale/ca/floor.entity
examples/hass-intent-dataset/base_locale/ca/name.entity
examples/hass-intent-dataset/base_locale/ca/state.entity
examples/hass-intent-dataset/base_locale/cs/area.entity
examples/hass-intent-dataset/base_locale/cs/color.entity
examples/hass-intent-dataset/base_locale/cs/device_class.entity
examples/hass-intent-dataset/base_locale/cs/domain.entity
examples/hass-intent-dataset/base_locale/cs/floor.entity
examples/hass-intent-dataset/base_locale/cs/name.entity
examples/hass-intent-dataset/base_locale/cs/state.entity
examples/hass-intent-dataset/base_locale/cy/area.entity
examples/hass-intent-dataset/base_locale/cy/color.entity
examples/hass-intent-dataset/base_locale/cy/device_class.entity
examples/hass-intent-dataset/base_locale/cy/domain.entity
examples/hass-intent-dataset/base_locale/cy/floor.entity
examples/hass-intent-dataset/base_locale/cy/name.entity
examples/hass-intent-dataset/base_locale/cy/state.entity
examples/hass-intent-dataset/base_locale/da/area.entity
examples/hass-intent-dataset/base_locale/da/color.entity
examples/hass-intent-dataset/base_locale/da/device_class.entity
examples/hass-intent-dataset/base_locale/da/domain.entity
examples/hass-intent-dataset/base_locale/da/floor.entity
examples/hass-intent-dataset/base_locale/da/name.entity
examples/hass-intent-dataset/base_locale/da/state.entity
examples/hass-intent-dataset/base_locale/de/area.entity
examples/hass-intent-dataset/base_locale/de/color.entity
examples/hass-intent-dataset/base_locale/de/device_class.entity
examples/hass-intent-dataset/base_locale/de/domain.entity
examples/hass-intent-dataset/base_locale/de/floor.entity
examples/hass-intent-dataset/base_locale/de/name.entity
examples/hass-intent-dataset/base_locale/de/state.entity
examples/hass-intent-dataset/base_locale/el/area.entity
examples/hass-intent-dataset/base_locale/el/color.entity
examples/hass-intent-dataset/base_locale/el/device_class.entity
examples/hass-intent-dataset/base_locale/el/domain.entity
examples/hass-intent-dataset/base_locale/el/floor.entity
examples/hass-intent-dataset/base_locale/el/name.entity
examples/hass-intent-dataset/base_locale/el/state.entity
examples/hass-intent-dataset/base_locale/en/area.entity
examples/hass-intent-dataset/base_locale/en/color.entity
examples/hass-intent-dataset/base_locale/en/device_class.entity
examples/hass-intent-dataset/base_locale/en/domain.entity
examples/hass-intent-dataset/base_locale/en/floor.entity
examples/hass-intent-dataset/base_locale/en/name.entity
examples/hass-intent-dataset/base_locale/en/state.entity
examples/hass-intent-dataset/base_locale/es/area.entity
examples/hass-intent-dataset/base_locale/es/color.entity
examples/hass-intent-dataset/base_locale/es/device_class.entity
examples/hass-intent-dataset/base_locale/es/domain.entity
examples/hass-intent-dataset/base_locale/es/floor.entity
examples/hass-intent-dataset/base_locale/es/name.entity
examples/hass-intent-dataset/base_locale/es/state.entity
examples/hass-intent-dataset/base_locale/et/area.entity
examples/hass-intent-dataset/base_locale/et/color.entity
examples/hass-intent-dataset/base_locale/et/device_class.entity
examples/hass-intent-dataset/base_locale/et/domain.entity
examples/hass-intent-dataset/base_locale/et/floor.entity
examples/hass-intent-dataset/base_locale/et/name.entity
examples/hass-intent-dataset/base_locale/et/state.entity
examples/hass-intent-dataset/base_locale/eu/area.entity
examples/hass-intent-dataset/base_locale/eu/color.entity
examples/hass-intent-dataset/base_locale/eu/device_class.entity
examples/hass-intent-dataset/base_locale/eu/domain.entity
examples/hass-intent-dataset/base_locale/eu/floor.entity
examples/hass-intent-dataset/base_locale/eu/name.entity
examples/hass-intent-dataset/base_locale/eu/state.entity
examples/hass-intent-dataset/base_locale/fa/area.entity
examples/hass-intent-dataset/base_locale/fa/color.entity
examples/hass-intent-dataset/base_locale/fa/device_class.entity
examples/hass-intent-dataset/base_locale/fa/domain.entity
examples/hass-intent-dataset/base_locale/fa/floor.entity
examples/hass-intent-dataset/base_locale/fa/name.entity
examples/hass-intent-dataset/base_locale/fa/state.entity
examples/hass-intent-dataset/base_locale/fi/area.entity
examples/hass-intent-dataset/base_locale/fi/color.entity
examples/hass-intent-dataset/base_locale/fi/device_class.entity
examples/hass-intent-dataset/base_locale/fi/domain.entity
examples/hass-intent-dataset/base_locale/fi/floor.entity
examples/hass-intent-dataset/base_locale/fi/name.entity
examples/hass-intent-dataset/base_locale/fi/state.entity
examples/hass-intent-dataset/base_locale/fr/area.entity
examples/hass-intent-dataset/base_locale/fr/color.entity
examples/hass-intent-dataset/base_locale/fr/device_class.entity
examples/hass-intent-dataset/base_locale/fr/domain.entity
examples/hass-intent-dataset/base_locale/fr/floor.entity
examples/hass-intent-dataset/base_locale/fr/name.entity
examples/hass-intent-dataset/base_locale/fr/state.entity
examples/hass-intent-dataset/base_locale/ga/area.entity
examples/hass-intent-dataset/base_locale/ga/color.entity
examples/hass-intent-dataset/base_locale/ga/device_class.entity
examples/hass-intent-dataset/base_locale/ga/domain.entity
examples/hass-intent-dataset/base_locale/ga/floor.entity
examples/hass-intent-dataset/base_locale/ga/name.entity
examples/hass-intent-dataset/base_locale/ga/state.entity
examples/hass-intent-dataset/base_locale/gl/area.entity
examples/hass-intent-dataset/base_locale/gl/color.entity
examples/hass-intent-dataset/base_locale/gl/device_class.entity
examples/hass-intent-dataset/base_locale/gl/domain.entity
examples/hass-intent-dataset/base_locale/gl/floor.entity
examples/hass-intent-dataset/base_locale/gl/name.entity
examples/hass-intent-dataset/base_locale/gl/state.entity
examples/hass-intent-dataset/base_locale/gu/area.entity
examples/hass-intent-dataset/base_locale/gu/color.entity
examples/hass-intent-dataset/base_locale/gu/device_class.entity
examples/hass-intent-dataset/base_locale/gu/domain.entity
examples/hass-intent-dataset/base_locale/gu/floor.entity
examples/hass-intent-dataset/base_locale/gu/name.entity
examples/hass-intent-dataset/base_locale/gu/state.entity
examples/hass-intent-dataset/base_locale/he/area.entity
examples/hass-intent-dataset/base_locale/he/color.entity
examples/hass-intent-dataset/base_locale/he/device_class.entity
examples/hass-intent-dataset/base_locale/he/domain.entity
examples/hass-intent-dataset/base_locale/he/floor.entity
examples/hass-intent-dataset/base_locale/he/name.entity
examples/hass-intent-dataset/base_locale/he/state.entity
examples/hass-intent-dataset/base_locale/hi/area.entity
examples/hass-intent-dataset/base_locale/hi/color.entity
examples/hass-intent-dataset/base_locale/hi/device_class.entity
examples/hass-intent-dataset/base_locale/hi/domain.entity
examples/hass-intent-dataset/base_locale/hi/floor.entity
examples/hass-intent-dataset/base_locale/hi/name.entity
examples/hass-intent-dataset/base_locale/hi/state.entity
examples/hass-intent-dataset/base_locale/hr/area.entity
examples/hass-intent-dataset/base_locale/hr/color.entity
examples/hass-intent-dataset/base_locale/hr/device_class.entity
examples/hass-intent-dataset/base_locale/hr/domain.entity
examples/hass-intent-dataset/base_locale/hr/floor.entity
examples/hass-intent-dataset/base_locale/hr/name.entity
examples/hass-intent-dataset/base_locale/hr/state.entity
examples/hass-intent-dataset/base_locale/hu/area.entity
examples/hass-intent-dataset/base_locale/hu/color.entity
examples/hass-intent-dataset/base_locale/hu/device_class.entity
examples/hass-intent-dataset/base_locale/hu/domain.entity
examples/hass-intent-dataset/base_locale/hu/floor.entity
examples/hass-intent-dataset/base_locale/hu/name.entity
examples/hass-intent-dataset/base_locale/hu/state.entity
examples/hass-intent-dataset/base_locale/is/area.entity
examples/hass-intent-dataset/base_locale/is/color.entity
examples/hass-intent-dataset/base_locale/is/device_class.entity
examples/hass-intent-dataset/base_locale/is/domain.entity
examples/hass-intent-dataset/base_locale/is/floor.entity
examples/hass-intent-dataset/base_locale/is/name.entity
examples/hass-intent-dataset/base_locale/is/state.entity
examples/hass-intent-dataset/base_locale/it/area.entity
examples/hass-intent-dataset/base_locale/it/color.entity
examples/hass-intent-dataset/base_locale/it/device_class.entity
examples/hass-intent-dataset/base_locale/it/domain.entity
examples/hass-intent-dataset/base_locale/it/floor.entity
examples/hass-intent-dataset/base_locale/it/name.entity
examples/hass-intent-dataset/base_locale/it/state.entity
examples/hass-intent-dataset/base_locale/ja/area.entity
examples/hass-intent-dataset/base_locale/ja/color.entity
examples/hass-intent-dataset/base_locale/ja/device_class.entity
examples/hass-intent-dataset/base_locale/ja/domain.entity
examples/hass-intent-dataset/base_locale/ja/floor.entity
examples/hass-intent-dataset/base_locale/ja/name.entity
examples/hass-intent-dataset/base_locale/ja/state.entity
examples/hass-intent-dataset/base_locale/ka/area.entity
examples/hass-intent-dataset/base_locale/ka/color.entity
examples/hass-intent-dataset/base_locale/ka/device_class.entity
examples/hass-intent-dataset/base_locale/ka/domain.entity
examples/hass-intent-dataset/base_locale/ka/floor.entity
examples/hass-intent-dataset/base_locale/ka/name.entity
examples/hass-intent-dataset/base_locale/ka/state.entity
examples/hass-intent-dataset/base_locale/kn/area.entity
examples/hass-intent-dataset/base_locale/kn/color.entity
examples/hass-intent-dataset/base_locale/kn/device_class.entity
examples/hass-intent-dataset/base_locale/kn/domain.entity
examples/hass-intent-dataset/base_locale/kn/floor.entity
examples/hass-intent-dataset/base_locale/kn/name.entity
examples/hass-intent-dataset/base_locale/kn/state.entity
examples/hass-intent-dataset/base_locale/ko/area.entity
examples/hass-intent-dataset/base_locale/ko/color.entity
examples/hass-intent-dataset/base_locale/ko/device_class.entity
examples/hass-intent-dataset/base_locale/ko/domain.entity
examples/hass-intent-dataset/base_locale/ko/floor.entity
examples/hass-intent-dataset/base_locale/ko/name.entity
examples/hass-intent-dataset/base_locale/ko/state.entity
examples/hass-intent-dataset/base_locale/kw/area.entity
examples/hass-intent-dataset/base_locale/kw/color.entity
examples/hass-intent-dataset/base_locale/kw/device_class.entity
examples/hass-intent-dataset/base_locale/kw/domain.entity
examples/hass-intent-dataset/base_locale/kw/floor.entity
examples/hass-intent-dataset/base_locale/kw/name.entity
examples/hass-intent-dataset/base_locale/kw/state.entity
examples/hass-intent-dataset/base_locale/lb/area.entity
examples/hass-intent-dataset/base_locale/lb/color.entity
examples/hass-intent-dataset/base_locale/lb/device_class.entity
examples/hass-intent-dataset/base_locale/lb/domain.entity
examples/hass-intent-dataset/base_locale/lb/floor.entity
examples/hass-intent-dataset/base_locale/lb/name.entity
examples/hass-intent-dataset/base_locale/lb/state.entity
examples/hass-intent-dataset/base_locale/lt/area.entity
examples/hass-intent-dataset/base_locale/lt/color.entity
examples/hass-intent-dataset/base_locale/lt/device_class.entity
examples/hass-intent-dataset/base_locale/lt/domain.entity
examples/hass-intent-dataset/base_locale/lt/floor.entity
examples/hass-intent-dataset/base_locale/lt/name.entity
examples/hass-intent-dataset/base_locale/lt/state.entity
examples/hass-intent-dataset/base_locale/lv/area.entity
examples/hass-intent-dataset/base_locale/lv/color.entity
examples/hass-intent-dataset/base_locale/lv/device_class.entity
examples/hass-intent-dataset/base_locale/lv/domain.entity
examples/hass-intent-dataset/base_locale/lv/floor.entity
examples/hass-intent-dataset/base_locale/lv/name.entity
examples/hass-intent-dataset/base_locale/lv/state.entity
examples/hass-intent-dataset/base_locale/ml/area.entity
examples/hass-intent-dataset/base_locale/ml/color.entity
examples/hass-intent-dataset/base_locale/ml/device_class.entity
examples/hass-intent-dataset/base_locale/ml/domain.entity
examples/hass-intent-dataset/base_locale/ml/floor.entity
examples/hass-intent-dataset/base_locale/ml/name.entity
examples/hass-intent-dataset/base_locale/ml/state.entity
examples/hass-intent-dataset/base_locale/mn/area.entity
examples/hass-intent-dataset/base_locale/mn/color.entity
examples/hass-intent-dataset/base_locale/mn/device_class.entity
examples/hass-intent-dataset/base_locale/mn/domain.entity
examples/hass-intent-dataset/base_locale/mn/floor.entity
examples/hass-intent-dataset/base_locale/mn/name.entity
examples/hass-intent-dataset/base_locale/mn/state.entity
examples/hass-intent-dataset/base_locale/mr/area.entity
examples/hass-intent-dataset/base_locale/mr/color.entity
examples/hass-intent-dataset/base_locale/mr/device_class.entity
examples/hass-intent-dataset/base_locale/mr/domain.entity
examples/hass-intent-dataset/base_locale/mr/floor.entity
examples/hass-intent-dataset/base_locale/mr/name.entity
examples/hass-intent-dataset/base_locale/mr/state.entity
examples/hass-intent-dataset/base_locale/nb/area.entity
examples/hass-intent-dataset/base_locale/nb/color.entity
examples/hass-intent-dataset/base_locale/nb/device_class.entity
examples/hass-intent-dataset/base_locale/nb/domain.entity
examples/hass-intent-dataset/base_locale/nb/floor.entity
examples/hass-intent-dataset/base_locale/nb/name.entity
examples/hass-intent-dataset/base_locale/nb/state.entity
examples/hass-intent-dataset/base_locale/ne/area.entity
examples/hass-intent-dataset/base_locale/ne/color.entity
examples/hass-intent-dataset/base_locale/ne/device_class.entity
examples/hass-intent-dataset/base_locale/ne/domain.entity
examples/hass-intent-dataset/base_locale/ne/floor.entity
examples/hass-intent-dataset/base_locale/ne/name.entity
examples/hass-intent-dataset/base_locale/ne/state.entity
examples/hass-intent-dataset/base_locale/nl/area.entity
examples/hass-intent-dataset/base_locale/nl/color.entity
examples/hass-intent-dataset/base_locale/nl/device_class.entity
examples/hass-intent-dataset/base_locale/nl/domain.entity
examples/hass-intent-dataset/base_locale/nl/floor.entity
examples/hass-intent-dataset/base_locale/nl/name.entity
examples/hass-intent-dataset/base_locale/nl/state.entity
examples/hass-intent-dataset/base_locale/pa/area.entity
examples/hass-intent-dataset/base_locale/pa/color.entity
examples/hass-intent-dataset/base_locale/pa/device_class.entity
examples/hass-intent-dataset/base_locale/pa/domain.entity
examples/hass-intent-dataset/base_locale/pa/floor.entity
examples/hass-intent-dataset/base_locale/pa/name.entity
examples/hass-intent-dataset/base_locale/pa/state.entity
examples/hass-intent-dataset/base_locale/pl/area.entity
examples/hass-intent-dataset/base_locale/pl/color.entity
examples/hass-intent-dataset/base_locale/pl/device_class.entity
examples/hass-intent-dataset/base_locale/pl/domain.entity
examples/hass-intent-dataset/base_locale/pl/floor.entity
examples/hass-intent-dataset/base_locale/pl/name.entity
examples/hass-intent-dataset/base_locale/pl/state.entity
examples/hass-intent-dataset/base_locale/pt-BR/area.entity
examples/hass-intent-dataset/base_locale/pt-BR/color.entity

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This PR adds comprehensive HuggingFace dataset support to ovos-spec-tools. It introduces dataset loading and OVOS-INTENT-2 locale export APIs, a large hassil-to-OVOS conversion pipeline, utility scripts for dataset generation and re-export, example CLI tools, and full test coverage.

Changes

Dataset Loading and Export Feature

Layer / File(s)	Summary
Core datasets module and exports `ovos_spec_tools/datasets.py`, `ovos_spec_tools/__init__.py`, `pyproject.toml`	New `datasets.py` module provides `load_dataset_templates` (loads HF datasets with row normalization), `expand_hf_template` (expands templates into utterance samples), and `export_to_locale` (writes `.intent`, `.voc`, `.entity` files to locale structure). Conditional imports in `__init__.py` handle missing `datasets` library. Optional `datasets` dependency added to project.
Documentation and README updates `docs/README.md`, `docs/api-reference.md`, `docs/datasets.md`	README updated from five to six capabilities with new dataset loader bullet. New comprehensive `datasets.md` documents supported datasets, API usage, round-trip workflows, and CLI examples. API reference adds Datasets chapter (7) with function signatures; Linting chapter shifted to 8.
Hassil to OVOS-INTENT-2 conversion pipeline `examples/hass-intent-dataset/convert_hassil_intents.py`, `examples/hass-intent-dataset/APPENDIX.md`	Large conversion script (1813 lines) normalizes hassil grammar, inlines rule references, rewrites responses (Jinja decomposition), and streams `.intent`/`.dialog`/`.voc`/`.entity` outputs with safety caps and resumable per-target processing. Detailed APPENDIX documents grammar correspondence, naming rules, expansion semantics, slot layout validation, and response normalization with lossiness considerations.
HF dataset export and re-export utilities `examples/hass-intent-dataset/export_hf_dataset.py`, `examples/hass-intent-dataset/generate_entities.py`, `examples/hass-intent-dataset/reexport_recursive.py`, `examples/hass-intent-dataset/reexport_uniform.py`	`export_hf_dataset.py` converts locale directories to HF JSONL format (templates/keywords/entities/test configs). `generate_entities.py` scans intents for slot placeholders and writes missing entity files with language-specific value mappings. `reexport_recursive.py` and `reexport_uniform.py` process template JSONL files, resolving `<keyword>` expansions from vocab files with cycle prevention and recursive substitution.
Example CLI tools for dataset workflows `examples/hf_dataset.py`, `examples/locale_to_hf_dataset.py`	`hf_dataset.py` CLI loads HF dataset templates, optionally expands samples, and exports to locale directory. `locale_to_hf_dataset.py` CLI converts existing OVOS locale trees to HF dataset format, expanding vocab references and deriving slot examples with per-slot value caps.
Unit tests for datasets module `test/test_datasets.py`	Test classes cover config resolution, domain stripping, row normalization across dataset styles, template expansion with grammar and `max_samples` capping, export to locale with file verification, and `SUPPORTED_DATASETS` registry validation.
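
The conditional-import behaviour mentioned in the first row can be pictured with a small sketch of the usual optional-dependency pattern. This is an assumption about the general approach, not a copy of `ovos_spec_tools/__init__.py`; the helper names `optional_import` and `require` are hypothetical.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, extra):
    """Fail with an actionable message only when the optional feature is used."""
    if module is None:
        raise ImportError(
            f"this feature needs an optional dependency: pip install ovos-spec-tools[{extra}]"
        )
    return module


# Importing the package stays cheap and safe; the error is deferred to first use.
hf_datasets = optional_import("datasets")

if __name__ == "__main__":
    print("datasets available:", hf_datasets is not None)
```

Deferring the error keeps `import ovos_spec_tools` working for users who never touch the dataset helpers.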

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 A rabbit hops through datasets vast,
HuggingFace templates loaded fast,
Hassil to OVOS, a winding track,
Expansion, export, no looking back!
Six things now possible, let's celebrate—
New voices loaded at a rapid rate! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.79% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: HF dataset loader, intent expansion, and locale export' directly and accurately summarizes the main changes: adding a HuggingFace dataset loader module, intent template expansion functionality, and locale directory export capability.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

github-actions · 2026-06-02T08:26:40Z

Automated check summary ready. 📊

I've aggregated the results of the automated checks for this PR below.

🔍 Lint

The automated checks have finished their work. 🏁

❌ ruff: issues found — see job log

🔨 Build Tests

Checking if the code is properly tempered. ⚔️

Python	Build	Install	Tests
3.10	✅	✅	⚠️
3.11	✅	✅	⚠️
3.12	✅	✅	⚠️
3.13	✅	✅	⚠️
3.14	✅	✅	⚠️

❌ 3.10: Install OK, tests failed
❌ 3.11: Install OK, tests failed
❌ 3.12: Install OK, tests failed
❌ 3.13: Install OK, tests failed
❌ 3.14: Install OK, tests failed
Check job logs for details.

📊 Coverage

Is the code fully immunized with tests? 💉

✅ 94.0% total coverage

⚠️ Some tests failed — coverage figures may be incomplete.

Per-file coverage (10 files)

File	Coverage	Missing lines
`ovos_spec_tools/__init__.py`	68.8%	5
`ovos_spec_tools/datasets.py`	79.1%	24
`ovos_spec_tools/language.py`	89.7%	7
`ovos_spec_tools/message.py`	95.9%	3
`ovos_spec_tools/resources.py`	97.1%	6
`ovos_spec_tools/expansion.py`	98.1%	2
`ovos_spec_tools/lint.py`	98.6%	2
`ovos_spec_tools/dialog.py`	100.0%	0
`ovos_spec_tools/prompt.py`	100.0%	0
`ovos_spec_tools/version.py`	100.0%	0

Full report: download the coverage-report artifact.

🔒 Security (pip-audit)

Ensuring our password hashing is up to date. 🔨

✅ No known vulnerabilities found (32 packages scanned).

🏷️ Release Preview

Ensuring our release process remains smooth and efficient. 🚂

Current: 0.7.0a1 → Next: 0.8.0a1

Signal	Value
Label	`feature`
PR title	`feat: HF dataset loader, intent expansion, and locale export`
Bump	minor

✅ PR title follows conventional commit format.

🚀 Release Channel Compatibility

Predicted next version: 0.8.0a1

Channel	Status	Note	Current Constraint
Stable	⚪	Not in channel	-
Testing	⚪	Not in channel	-
Alpha	✅	Compatible	`ovos-spec-tools>=0.7.0a1`

📋 Repo Health

Scanning for any signs of 'orphaned' code limbs. 🦾

✅ All required files present.

Latest Version: 0.7.0a1

✅ ovos_spec_tools/version.py — Version file
✅ README.md — README
✅ LICENSE.md — License file (consider renaming to LICENSE)
✅ pyproject.toml — pyproject.toml
⚠️ setup.py — setup.py
✅ CHANGELOG.md — Changelog
✅ ovos_spec_tools/version.py has valid version block markers

⚖️ License Check

I've checked the genealogical tree of your licenses. 🌳

❌ License violations detected (4 packages) — review required before merging.

Dependency	License Name	License Type	Misc
ovos-spec-tools	Error	Error	

License Type	Found
Error	1

License distribution: 1× Apache Software License, 1× Apache-2.0 OR BSD-2-Clause, 1× MIT, 1× MIT License

Full breakdown — 4 packages

Package	Version	License	URL
`build`	1.5.0	MIT	link
`ovos-spec-tools` ⚠️	0.7.0a1	Apache Software License	link
`packaging`	26.2	Apache-2.0 OR BSD-2-Clause	link
`pyproject_hooks`	1.2.0	MIT License	link

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

An automated high-five for your latest changes! 🖐️

coderabbitai

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pyproject.toml (1)
33-36: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test extra omits datasets.

CI runs test/test_datasets.py::TestExportToLocale::test_round_trip_small, which calls the real load_dataset_templates(...) and triggers ImportError because datasets isn't installed in the test environment. The primary fix is making that test hermetic (see the comment in test/test_datasets.py); however, if any test is intended to exercise the real loader, datasets must also be added here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 33 - 36, The test failure is caused by the test
importing the real datasets package via load_dataset_templates in
TestExportToLocale::test_round_trip_small (test/test_datasets.py); either make
that test hermetic by mocking/stubbing load_dataset_templates (or patching the
datasets import) so it doesn't require the real package, or if the intent is to
exercise the real loader, add "datasets" to the test extras in pyproject.toml
(the test = [...] list) so the CI environment installs it; update whichever
approach you choose and rerun tests.
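
As an illustration of the hermetic option, the sketch below replaces the loader with a mock so no network access or `datasets` install is needed. Everything here is a stand-in: the local `load_dataset_templates` only mimics the failure, `count_templates` is hypothetical code under test, and the row shape is assumed. In the real suite the patch target would be the dotted path where the test looks the loader up.

```python
from unittest import mock


def load_dataset_templates(dataset_id, lang):
    """Stand-in for the real loader, which needs the optional `datasets` package."""
    raise ImportError("the 'datasets' package is not installed")


def count_templates(dataset_id, lang):
    """Hypothetical code under test that depends on the loader."""
    return len(load_dataset_templates(dataset_id, lang))


def test_round_trip_small_hermetic():
    # Assumed row shape; the real normalized rows may carry other fields.
    fake_rows = [{"intent": "HassTurnOn", "template": "turn on the light"}]
    fake_loader = mock.Mock(return_value=fake_rows)
    # Swap the loader in this module's namespace for the duration of the test.
    with mock.patch.dict(globals(), {"load_dataset_templates": fake_loader}):
        assert count_templates("hass-intent-templates", "en") == 1
    fake_loader.assert_called_once_with("hass-intent-templates", "en")


if __name__ == "__main__":
    test_round_trip_small_hermetic()
    print("ok")
```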

🧹 Nitpick comments (4)

ovos_spec_tools/datasets.py (1)

284-289: 💤 Low value

Dead loop over <keyword> references.

This block iterates VOC_RE matches but only assigns kw and falls through with a comment — it produces no output and has no side effects. Either drop it, or implement the intended fallback (e.g. registering an empty .voc placeholder so downstream <keyword> refs resolve).

♻️ Suggested removal

-        # If no expansions but the template has <keyword> refs,
-        # try to extract inline alternations
-        if not row.get("expansions"):
-            for m in VOC_RE.finditer(tpl):
-                kw = m.group(1)
-                # No expansion data available; carry on
-

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ovos_spec_tools/datasets.py` around lines 284 - 289, The loop over
VOC_RE.finditer(tpl) (inside the branch if not row.get("expansions")) is dead —
it only assigns kw and does nothing; either remove the loop entirely or
implement the intended fallback: for each match extract kw and register an empty
expansion placeholder so downstream <keyword> references resolve (e.g., ensure
row has an "expansions" mapping and add an entry for kw with an empty
list/placeholder). Update the code around VOC_RE, tpl, and row to create
row.setdefault("expansions", {})[kw] = [] (or equivalent placeholder structure
used by downstream code) so the template fallback works, or delete the whole
block if the fallback is not needed.
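
If the fallback is kept rather than deleted, it could look roughly like the sketch below. This is only an illustration under stated assumptions: `VOC_RE` here is a hypothetical stand-in pattern (the module's real regex is not shown in this review), `add_placeholder_expansions` is an invented helper name, and the list-of-`{"keyword", "values"}` shape for expansions is assumed from the rows quoted elsewhere in this review rather than confirmed for this code path.

```python
import re

# Hypothetical stand-in for the module's VOC_RE; the real pattern may differ.
VOC_RE = re.compile(r"<([^<>\s]+)>")


def add_placeholder_expansions(row):
    """Return a copy of row where every <keyword> ref has an expansions entry.

    Assumes expansions is a list of {"keyword": ..., "values": [...]} dicts.
    Keywords without data get an empty values list as a placeholder.
    """
    expansions = list(row.get("expansions") or [])
    known = {entry["keyword"] for entry in expansions}
    for match in VOC_RE.finditer(row["template"]):
        kw = match.group(1)
        if kw not in known:
            expansions.append({"keyword": kw, "values": []})
            known.add(kw)
    return {**row, "expansions": expansions}


row = {"intent_id": "demo:turn_on", "template": "<turn_on> the {name} [<please>]"}
patched = add_placeholder_expansions(row)
print([entry["keyword"] for entry in patched["expansions"]])
```

Returning a copy keeps the input row untouched, which makes the helper easy to test; mutating in place would work equally well if that matches the surrounding code.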

examples/hass-intent-dataset/reexport_recursive.py (1)

29-69: 💤 Low value

Unbounded recursive expansion.

_resolve_value produces the full cartesian set of nested <keyword> substitutions with no cap; deeply nested vocabs can blow up memory. Cycles are guarded by seen, but breadth is not. Consider a max-results cap if this runs on large corpora.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/reexport_recursive.py` around lines 29 - 69, The
recursive expansion can explode in breadth; modify _resolve_value and
_resolve_expansions to accept a max_results (or limit) parameter and enforce it
during recursion: thread the limit through calls to _resolve_value from
_resolve_expansions, stop expanding further branches once the accumulating
results list reaches the limit (check len(results) during the loop and return
early), and propagate that early-return up the recursion so the overall result
set never exceeds max_results; also apply the same cap when building flat_values
before deduplication to prevent excessive memory use.
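
A breadth cap of the kind described above might be sketched as follows. This is not the script's actual `_resolve_value`; the function name, the `KEYWORD_RE` pattern, and the sample vocabulary are all assumptions made for illustration. The idea is that every recursive call receives the remaining budget, so no intermediate list ever grows past `max_results`.

```python
import re

# Hypothetical pattern for <keyword> references; the real script may differ.
KEYWORD_RE = re.compile(r"<([^<>\s]+)>")


def resolve_value(value, vocabs, max_results, seen=frozenset()):
    """Expand nested <keyword> refs in value, returning at most max_results strings."""
    if max_results <= 0:
        return []
    match = KEYWORD_RE.search(value)
    if match is None:
        return [value]
    kw = match.group(1)
    head = value[:match.start()]
    # Text after the reference is a sibling, so it keeps the original seen set.
    tails = resolve_value(value[match.end():], vocabs, max_results, seen)
    if kw in seen or kw not in vocabs:
        middles = [match.group(0)]  # leave cyclic or unknown refs as-is
    else:
        middles = []
        for option in vocabs[kw]:
            remaining = max_results - len(middles)
            if remaining <= 0:
                break  # budget exhausted, skip the remaining options
            middles.extend(resolve_value(option, vocabs, remaining, seen | {kw}))
    results = []
    for mid in middles:
        for tail in tails:
            results.append(head + mid + tail)
            if len(results) >= max_results:
                return results  # early return propagates up the recursion
    return results


VOCABS = {
    "area": ["kitchen", "<room> upstairs"],
    "room": ["bedroom", "bath"],
    "loop": ["<loop>"],
}
print(resolve_value("turn on <area>", VOCABS, 10))
print(resolve_value("turn on <area>", VOCABS, 2))
```

With a cap of 2, the second call stops after two results instead of enumerating every combination. The cap trades completeness for bounded memory, so callers that need the full set would have to pass a large limit explicitly.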

examples/hass-intent-dataset/generate_entities.py (1)

260-263: ⚡ Quick win

Define lt state inline instead of patching after the dict.

lt.state is declared as a nested list [[...]] (line 261) and then corrected by the post-dict block at lines 366-368. Define the correct flat list directly and drop the patch.

♻️ Proposed fix

     "lt": {
-        "state": [["įjungta", "išjungta", "atidaryta", "uždaryta", "užrakinta", "atrakinta"]],
+        "state": ["įjungta", "išjungta", "atidaryta", "uždaryta", "užrakinta", "atrakinta"],
         "color": ["balta", "juoda", "raudona", "oranžinė", "geltona", "žalia", "mėlyna", "violetinė", "rudа", "rožinė", "turkio"],
     },

And remove the patch block:

-# Fix lt state
-if "lt" in _LANG_OVERRIDES:
-    _LANG_OVERRIDES["lt"]["state"] = ["įjungta", "išjungta", "atidaryta", "uždaryta", "užrakinta", "atrakinta"]

Also applies to: 366-368

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/generate_entities.py` around lines 260 - 263,
The lt entry in the languages dict defines "state" as a nested list (e.g.
[["įjungta", ...]]) but is later patched to a flat list; change the "lt" ->
"state" value to a flat list directly (remove the extra nesting) and delete the
subsequent post-dict patch that overwrites lt['state'] (the block that sets
lt['state'] after the dict). Update references to "lt" and "state" accordingly
so no later correction is needed.

examples/hass-intent-dataset/convert_hassil_intents.py (1)

1226-1226: ⚡ Quick win

Simplify nested if-block detection.

The condition "{% if" in branch[len(branch) - len(branch.lstrip()):] is unnecessarily complex. The expression len(branch) - len(branch.lstrip()) computes the number of leading whitespace characters, and slicing from that position gives branch.lstrip(). This is equivalent to:
"{% if" in branch.lstrip()
or simply:
"{% if" in branch
since the presence check doesn't require stripping.
♻️ Proposed simplification
-        sub_branches = _split_jinja_if(branch) if "{% if" in branch[len(branch) - len(branch.lstrip()):] else [branch]
+        sub_branches = _split_jinja_if(branch) if "{% if" in branch else [branch]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 1226, The
conditional that checks for a Jinja if in the assignment to sub_branches is
overly complex: it slices off leading whitespace before a plain membership
test, which cannot change the result. Replace the expression used in the
ternary for sub_branches (which references branch and calls
_split_jinja_if(branch)) with the simpler membership test "{% if" in branch, so
the branch is still split via _split_jinja_if(branch) whenever it contains
"{% if".
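
The equivalence claimed in this nitpick can be checked with a few sample strings. The snippet below is only a sanity check on invented inputs, not code from the converter: it confirms that the slice equals `branch.lstrip()` and that stripping leading whitespace never changes whether `"{% if"` occurs in the string.

```python
samples = [
    "{% if x %}a{% endif %}",
    "   {% if x %}a{% endif %}",
    "\t\n plain text",
    "text before {% if x %}a{% endif %}",
    "",
    "   ",
]

for branch in samples:
    offset = len(branch) - len(branch.lstrip())
    original = "{% if" in branch[offset:]
    # The slice is exactly what lstrip() returns.
    assert branch[offset:] == branch.lstrip()
    # Membership is unaffected because "{% if" starts with a non-space character.
    assert original == ("{% if" in branch.lstrip()) == ("{% if" in branch)

checked = len(samples)
print(f"checked {checked} samples")
```

Note that all three forms are containment tests, so `"text before {% if ..."` also matches. If the intent was to detect only a leading `{% if`, `branch.lstrip().startswith("{% if")` would be a different (stricter) check; the review does not say which behavior is intended.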

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/hass-intent-dataset/convert_hassil_intents.py`:
- Line 47: The import of the sys module is unused; remove the top-level "import
sys" statement so the script no longer contains an unused import (look for the
"import sys" line in convert_hassil_intents.py and delete it).
- Line 1759: The print statement currently uses an unnecessary f-string: replace
the f-prefixed call to print (the line printing "  --check: intents with <50%
sample survival --") with a plain string literal to avoid misleading formatting
usage; locate the print(...) invocation in convert_hassil_intents.py and remove
the leading "f" so it becomes print("  --check: intents with <50% sample
survival --").
- Line 702: The Danish "da" mapping in the dictionary contains a duplicate key
"abn" (second occurrence at the shown diff); remove the redundant "abn": "open"
entry so the "da" dict only has a single "abn" mapping, ensuring no duplicate
keys remain in the dictionary (locate the "da" dict in convert_hassil_intents.py
to make the edit).

In `@examples/hass-intent-dataset/export_hf_dataset.py`:
- Line 33: Remove the unused import statement "import os" from the top-level
imports in export_hf_dataset.py (the unused symbol is the os import) so the lint
error ruff F401 is resolved; verify there are no references to os elsewhere in
the file and run the linter/CI to confirm the import removal fixes the failing
check.

In `@examples/hass-intent-dataset/generate_entities.py`:
- Around line 360-363: The dict _LANG_OVERRIDES in generate_entities.py contains
a duplicated "pl" key; remove the second "pl" entry (the block with "state" and
"color") so there is only one "pl" definition in _LANG_OVERRIDES, or if the two
differ intentionally, merge their values into the single "pl" entry instead;
ensure no duplicate keys remain to fix the ruff F601 CI failure.
- Line 44: The domain list contains a value with a leading space (" humidifier")
which will produce incorrect vocabulary entries; remove the stray space so it
reads "humidifier" and also defensively trim domain strings before writing
entities (e.g., sanitize the domain list or apply .strip()/trim() to each item)
to prevent any other leading/trailing whitespace from leaking into generated
.entity files.
- Line 460: The import for the standard library module "re" is declared at the
bottom of the module but used earlier (around line 438), causing a lint error
(E402); move the line "import re" up into the top import block with the other
imports so the module-level imports appear before any code or usage, ensuring
the name "re" is available where it's referenced.
- Around line 6-9: The file imports unused symbols causing lint/CI
failures—remove the unused imports json, os, and defaultdict from the top of
generate_entities.py and leave only the required import(s) (e.g., pathlib.Path)
after verifying Path is actually used; update the import line(s) accordingly so
only used modules are imported.

In `@examples/hass-intent-dataset/reexport_uniform.py`:
- Around line 47-54: The code builds expansions from refs returned by
_extract_keyword_refs but preserves duplicates; change the loop that creates
expansions from refs so it deduplicates while preserving order: iterate refs,
keep a local seen set, for each ref skip if already seen, otherwise look up vals
= vocabs.get(ref) and if vals append {"keyword": ref, "values": vals} and add
ref to seen, then assign row["expansions"] only if expansions is non-empty;
update the block around refs/expansions to use this seen-based dedupe.

In `@examples/hf_dataset.py`:
- Line 17: Remove the unused top-level import "sys" from examples/hf_dataset.py:
delete the "import sys" statement (it is unused and ruff F401-failing), leaving
the local "argparse" usage inside main intact; verify there are no other
references to "sys" such as in function main or elsewhere before committing.

In `@ovos_spec_tools/datasets.py`:
- Around line 54-58: The SUPPORTED_DATASETS mapping in
ovos_spec_tools/datasets.py incorrectly maps the key "hassil-intents" to
"OpenVoiceOS/hassil-intents-locale" while the module docstring and docs expect
"OpenVoiceOS/hass-intent-templates"; update the dictionary entry in
SUPPORTED_DATASETS to use "OpenVoiceOS/hass-intent-templates" for
"hassil-intents" and ensure any related references (module docstring or
consumers of SUPPORTED_DATASETS) remain consistent; run or adjust
test_urls_valid if needed to validate the full repo name rather than only the
"OpenVoiceOS/" prefix.

In `@test/test_datasets.py`:
- Around line 204-207: The test test_urls_valid is too permissive—it's only
checking for a slash and the "OpenVoiceOS/" prefix on entries from
SUPPORTED_DATASETS, which allows wrong repo names to slip through; update the
test to assert the exact expected repository names for each key in
SUPPORTED_DATASETS (or assert the set of SUPPORTED_DATASETS.values() equals an
expected set/list of repo strings) so the registry cannot silently drift—modify
test_urls_valid to compare SUPPORTED_DATASETS against the precise expected repo
names.
- Around line 171-190: The test test_round_trip_small currently calls
load_dataset_templates before the patch, causing a real dataset fetch; fix it by
removing the real call and creating a local fixture list (e.g., a small list of
dicts with intent_id and template) to use as rows, then patch
ovos_spec_tools.datasets.load_dataset_templates to return that fixture before
calling export_to_locale; ensure you derive first = fixture[:2] and use
fixture[0]["intent_id"] and fixture[0]["template"] when checking the exported
intent file so the test remains hermetic and does not hit the datasets library.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3b6dcafe-d104-4e1f-b5ff-bae3f224aa8f

📥 Commits

Reviewing files that changed from the base of the PR and between 4c24a60 and bd54e89.

📒 Files selected for processing (15)

docs/README.md
docs/api-reference.md
docs/datasets.md
examples/hass-intent-dataset/APPENDIX.md
examples/hass-intent-dataset/convert_hassil_intents.py
examples/hass-intent-dataset/export_hf_dataset.py
examples/hass-intent-dataset/generate_entities.py
examples/hass-intent-dataset/reexport_recursive.py
examples/hass-intent-dataset/reexport_uniform.py
examples/hf_dataset.py
examples/locale_to_hf_dataset.py
ovos_spec_tools/__init__.py
ovos_spec_tools/datasets.py
pyproject.toml
test/test_datasets.py

coderabbitai · 2026-06-02T08:44:24Z

+import hashlib
+import itertools
+import re
+import sys


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused import.

The sys module is imported but never referenced in the script.

🧹 Proposed fix

 import hashlib
 import itertools
 import re
-import sys
 import unicodedata
 from pathlib import Path


🧰 Tools

🪛 GitHub Actions: Lint / 0_lint _ lint.txt

[error] 47-47: Ruff check failed: F401 sys imported but unused

🪛 GitHub Actions: Lint / lint _ lint

[error] 47-47: Ruff check failed: F401 sys imported but unused

🪛 GitHub Check: lint / lint

[failure] 47-47: ruff (F401)
examples/hass-intent-dataset/convert_hassil_intents.py:47:8: F401 sys imported but unused
help: Remove unused import: sys

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 47, The import of the sys module is unused; remove the top-level "import sys" statement so the script no longer contains an unused import (look for the "import sys" line in convert_hassil_intents.py and delete it).

coderabbitai · 2026-06-02T08:44:25Z

+        "timer_pause": "timer_pause", "timer_unpause": "timer_unpause",
+        "timer": "timer", "timers": "timers",
+        "hilsen": "greeting", "mine_data": "my_data",
+        "abn": "open", "al": "all",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove duplicate dictionary key.

The key "abn" already appears at line 658 in the same "da" (Danish) dictionary. Python silently overwrites the first entry with the second. While both map to "open" (so the outcome is the same), the duplication is a code smell and may indicate copy-paste error or future maintenance risk.
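
To make the overwrite behavior concrete, the sketch below evaluates a small dict literal with a repeated key and then finds the repeat with the standard `ast` module, which is roughly the kind of check ruff's F601 performs. The mapping and the `"open_again"` value are invented so that the overwrite is visible; in the PR both entries map to `"open"`.

```python
import ast

# Invented sample: the second "abn" uses a different value to show the overwrite.
SOURCE = '''
mapping = {
    "abn": "open",
    "hilsen": "greeting",
    "abn": "open_again",
}
'''

namespace = {}
exec(SOURCE, namespace)
mapping = namespace["mapping"]
print(mapping)  # {'abn': 'open_again', 'hilsen': 'greeting'}


def duplicate_keys(code):
    """Return constant keys that appear more than once in any dict literal."""
    dupes = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                if isinstance(key, ast.Constant):
                    if key.value in seen:
                        dupes.append(key.value)
                    seen.add(key.value)
    return dupes


print(duplicate_keys(SOURCE))  # ['abn']
```

The last value wins while the key keeps its first position, so a duplicate is harmless only as long as both values stay identical.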

🔧 Proposed fix

         "timer_add": "timer_add", "timer_decrease": "timer_decrease",
         "timer_pause": "timer_pause", "timer_unpause": "timer_unpause",
         "timer": "timer", "timers": "timers",
         "hilsen": "greeting", "mine_data": "my_data",
-        "abn": "open", "al": "all",
+        "al": "all",
     },
 }


🧰 Tools

🪛 GitHub Check: lint / lint

[failure] 702-702: ruff (F601)
examples/hass-intent-dataset/convert_hassil_intents.py:702:9: F601 Dictionary key literal "abn" repeated
help: Remove repeated key literal "abn"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 702, The Danish "da" mapping in the dictionary contains a duplicate key "abn" (second occurrence at the shown diff); remove the redundant "abn": "open" entry so the "da" dict only has a single "abn" mapping, ensuring no duplicate keys remain in the dictionary (locate the "da" dict in convert_hassil_intents.py to make the edit).

coderabbitai · 2026-06-02T08:44:25Z

+        ]
+        thin.sort(key=lambda x: x[2] / x[1])
+        if thin:
+            print(f"  --check: intents with <50% sample survival --")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unnecessary f-string prefix.

The string at line 1759 has no placeholders and doesn't need the f prefix.

🧹 Proposed fix

-            print(f"  --check: intents with <50% sample survival --")
+            print("  --check: intents with <50% sample survival --")


🧰 Tools

🪛 GitHub Check: lint / lint

[failure] 1759-1759: ruff (F541)
examples/hass-intent-dataset/convert_hassil_intents.py:1759:19: F541 f-string without any placeholders
help: Remove extraneous f prefix

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/hass-intent-dataset/convert_hassil_intents.py` at line 1759, The print statement currently uses an unnecessary f-string: replace the f-prefixed call to print (the line printing " --check: intents with <50% sample survival --") with a plain string literal to avoid misleading formatting usage; locate the print(...) invocation in convert_hassil_intents.py and remove the leading "f" so it becomes print(" --check: intents with <50% sample survival --").

coderabbitai · 2026-06-02T08:44:25Z

+from __future__ import annotations
+
+import json
+import os


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused import os.

Flagged by the lint check (ruff F401); CI is failing on it.

🧹 Proposed fix

 import json
-import os
 import re
 import sys


🧰 Tools

🪛 GitHub Check: lint / lint

[failure] 33-33: ruff (F401)
examples/hass-intent-dataset/export_hf_dataset.py:33:8: F401 os imported but unused
help: Remove unused import: os

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/hass-intent-dataset/export_hf_dataset.py` at line 33, Remove the unused import statement "import os" from the top-level imports in export_hf_dataset.py (the unused symbol is the os import) so the lint error ruff F401 is resolved; verify there are no references to os elsewhere in the file and run the linter/CI to confirm the import removal fixes the failing check.

coderabbitai · 2026-06-02T08:44:25Z

+import json
+import os
+from pathlib import Path
+from collections import defaultdict


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused imports.

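json, os, and collections.defaultdict are unused (ruff F401, CI failing). The proposed fix also adds import re to the top-level import block, which covers the separate E402 finding about the late import re near line 460.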

🧹 Proposed fix

 from __future__ import annotations
-import json
-import os
+import re
 from pathlib import Path
-from collections import defaultdict


🧰 Tools

🪛 GitHub Check: lint / lint

[failure] 9-9: ruff (F401)
examples/hass-intent-dataset/generate_entities.py:9:25: F401 collections.defaultdict imported but unused
help: Remove unused import: collections.defaultdict

[failure] 7-7: ruff (F401)
examples/hass-intent-dataset/generate_entities.py:7:8: F401 os imported but unused
help: Remove unused import: os

[failure] 6-6: ruff (F401)
examples/hass-intent-dataset/generate_entities.py:6:8: F401 json imported but unused
help: Remove unused import: json

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/hass-intent-dataset/generate_entities.py` around lines 6 - 9, The file imports unused symbols causing lint/CI failures—remove the unused imports json, os, and defaultdict from the top of generate_entities.py and leave only the required import(s) (e.g., pathlib.Path) after verifying Path is actually used; update the import line(s) accordingly so only used modules are imported.

coderabbitai · 2026-06-02T08:44:25Z

+                refs = _extract_keyword_refs(template)
+                expansions: list[dict[str, object]] = []
+                for ref in refs:
+                    vals = vocabs.get(ref)
+                    if vals:
+                        expansions.append({"keyword": ref, "values": vals})
+                if expansions:
+                    row["expansions"] = expansions


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate expansions entries for repeated refs.

_extract_keyword_refs returns refs in document order with duplicates, so a template referencing <x> twice emits two identical {keyword, values} entries. De-duplicate while preserving order.
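
The order-preserving de-duplication can be illustrated in isolation. In the sketch below, `extract_keyword_refs` is a hypothetical stand-in for the script's `_extract_keyword_refs` (its real implementation is not shown here), and the template and vocabulary are invented sample data. `dict.fromkeys` keeps the first occurrence of each key in insertion order, which is what makes it suitable here.

```python
import re


def extract_keyword_refs(template):
    """Hypothetical stand-in: return <keyword> names in document order, with repeats."""
    return re.findall(r"<([^<>\s]+)>", template)


vocabs = {"turn_on": ["turn on", "switch on"], "area": ["kitchen"]}
template = "<turn_on> the light in <area> and <turn_on> the fan"

raw_refs = extract_keyword_refs(template)
refs = list(dict.fromkeys(raw_refs))  # drop repeats, keep first-seen order

expansions = [
    {"keyword": ref, "values": vocabs[ref]}
    for ref in refs
    if vocabs.get(ref)
]
print(raw_refs)  # ['turn_on', 'area', 'turn_on']
print(refs)      # ['turn_on', 'area']
```

An explicit `seen` set inside the loop gives the same result; `dict.fromkeys` is simply the shorter form of the same idea.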

♻️ Proposed fix

-                refs = _extract_keyword_refs(template)
+                refs = list(dict.fromkeys(_extract_keyword_refs(template)))


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/hass-intent-dataset/reexport_uniform.py` around lines 47 - 54, The code builds expansions from refs returned by _extract_keyword_refs but preserves duplicates; change the loop that creates expansions from refs so it deduplicates while preserving order: iterate refs, keep a local seen set, for each ref skip if already seen, otherwise look up vals = vocabs.get(ref) and if vals append {"keyword": ref, "values": vals} and add ref to seen, then assign row["expansions"] only if expansions is non-empty; update the block around refs/expansions to use this seen-based dedupe.

coderabbitai · 2026-06-02T08:44:25Z

+"""
+from __future__ import annotations
+
+import sys


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused import sys.

Flagged by ruff F401 (CI failing); argparse is imported locally in main and sys is never used.

🧹 Proposed fix

-import sys
 from pathlib import Path


🧰 Tools

🪛 GitHub Check: lint / lint

[failure] 17-17: ruff (F401)
examples/hf_dataset.py:17:8: F401 sys imported but unused
help: Remove unused import: sys

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/hf_dataset.py` at line 17, Remove the unused top-level import "sys" from examples/hf_dataset.py: delete the "import sys" statement (it is unused and ruff F401-failing), leaving the local "argparse" usage inside main intact; verify there are no other references to "sys" such as in function main or elsewhere before committing.

coderabbitai · 2026-06-02T08:44:26Z

+SUPPORTED_DATASETS: dict[str, str] = {
+    "hassil-intents": "OpenVoiceOS/hassil-intents-locale",
+    "intents-for-eval": "OpenVoiceOS/intents-for-eval",
+    "massive-templates": "OpenVoiceOS/massive-templates",
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm which repo name appears across the PR
rg -n "hassil-intents-locale|hass-intent-templates"

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 789

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "test_urls_valid" . || true
rg -n "SUPPORTED_DATASETS" ovos_spec_tools/datasets.py . || true
rg -n "docs/datasets\.md|docs/api-reference\.md|docs/README\.md" ovos_spec_tools/datasets.py . || true
python - <<'PY'
from pathlib import Path
p = Path("ovos_spec_tools/datasets.py")
print("datasets.py first 120 lines:")
print("-----")
print("\n".join(p.read_text(encoding="utf-8").splitlines()[:120]))
PY

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 6162

🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '160,260p' test/test_datasets.py

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 1987

🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '120,220p' ovos_spec_tools/datasets.py

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 3186

🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '1,120p' ovos_spec_tools/datasets.py
echo "---- docs/datasets.md ----"
rg -n "hassil-intents|hass-intent-templates|SUPPORTED_DATASETS" docs/datasets.md
echo "---- docs/api-reference.md ----"
rg -n "SUPPORTED_DATASETS|hassil-intents|hass-intent-templates" docs/api-reference.md
echo "---- docs/README.md ----"
rg -n "hassil-intents|hass-intent-templates" docs/README.md
echo "---- docs/datasets.md context ----"
sed -n '1,90p' docs/datasets.md
sed -n '160,210p' docs/datasets.md

Repository: OpenVoiceOS/ovos-spec-tools

Length of output: 9136

Fix hassil-intents HF repo mismatch between registry and docs

ovos_spec_tools/datasets.py maps "hassil-intents" to OpenVoiceOS/hassil-intents-locale, but the module docstring and the docs (docs/datasets.md, docs/api-reference.md, docs/README.md) all document OpenVoiceOS/hass-intent-templates (per-language configs + expansions column).

test_urls_valid only checks OpenVoiceOS/ prefix, so it won’t catch this divergence.
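
The gap between a prefix check and an exact comparison can be shown with a small sketch. The dictionaries below are local stand-ins copied from the values quoted in this comment, not imports from `ovos_spec_tools`, and `prefix_only_check` is an invented name approximating what the review says `test_urls_valid` currently asserts.

```python
# Registry as it appears in the PR (stand-in copy).
CURRENT_REGISTRY = {
    "hassil-intents": "OpenVoiceOS/hassil-intents-locale",
    "intents-for-eval": "OpenVoiceOS/intents-for-eval",
    "massive-templates": "OpenVoiceOS/massive-templates",
}

# Repository names the docs describe, per this review.
EXPECTED_REGISTRY = {
    "hassil-intents": "OpenVoiceOS/hass-intent-templates",
    "intents-for-eval": "OpenVoiceOS/intents-for-eval",
    "massive-templates": "OpenVoiceOS/massive-templates",
}


def prefix_only_check(registry):
    """Approximation of the permissive check: a slash and the org prefix."""
    return all("/" in repo and repo.startswith("OpenVoiceOS/") for repo in registry.values())


def exact_check(registry):
    """Stricter check: every key must map to the exact expected repository."""
    return registry == EXPECTED_REGISTRY


print(prefix_only_check(CURRENT_REGISTRY))  # True: the mismatch slips through
print(exact_check(CURRENT_REGISTRY))        # False: the mismatch is caught
```

In the real test suite the exact comparison would presumably run against the imported `SUPPORTED_DATASETS`, so that any future rename has to be made deliberately in both places.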

🔧 Proposed fix (docs repo is correct)

 SUPPORTED_DATASETS: dict[str, str] = {
-    "hassil-intents": "OpenVoiceOS/hassil-intents-locale",
+    "hassil-intents": "OpenVoiceOS/hass-intent-templates",
     "intents-for-eval": "OpenVoiceOS/intents-for-eval",
     "massive-templates": "OpenVoiceOS/massive-templates",
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ovos_spec_tools/datasets.py` around lines 54 - 58, The SUPPORTED_DATASETS mapping in ovos_spec_tools/datasets.py incorrectly maps the key "hassil-intents" to "OpenVoiceOS/hassil-intents-locale" while the module docstring and docs expect "OpenVoiceOS/hass-intent-templates"; update the dictionary entry in SUPPORTED_DATASETS to use "OpenVoiceOS/hass-intent-templates" for "hassil-intents" and ensure any related references (module docstring or consumers of SUPPORTED_DATASETS) remain consistent; run or adjust test_urls_valid if needed to validate the full repo name rather than only the "OpenVoiceOS/" prefix.

coderabbitai · 2026-06-02T08:44:26Z

+    def test_round_trip_small(self):
+        """Load a single row, export, verify the intent file has the template."""
+        rows = load_dataset_templates("hassil-intents", lang="en", streaming=False)
+        # Only keep first 3 rows to make it fast
+        first = [rows[0], rows[1]]
+
+        with tempfile.TemporaryDirectory() as tmp:
+            dst = Path(tmp)
+            with patch(
+                "ovos_spec_tools.datasets.load_dataset_templates",
+                return_value=first,
+            ):
+                count = export_to_locale("hassil-intents", "en", dst)
+            assert count == 2
+
+            # Check that first intent file has rows[0] template
+            name = rows[0]["intent_id"].split(":")[-1]
+            intent_path = dst / "locale" / "en" / f"{name}.intent"
+            assert intent_path.exists()
+            assert rows[0]["template"] in intent_path.read_text()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test_round_trip_small makes a real, un-mocked dataset call — root cause of the CI failures.

Line 173 calls load_dataset_templates("hassil-intents", lang="en", streaming=False) before the patch, so it actually imports datasets and hits HuggingFace. This fails in CI (ImportError: The datasets library is required) and, even with the dependency installed, would make the unit test network-bound, slow, and flaky. Replace the real fetch with fixture rows so the test stays hermetic.
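
The patching pattern behind a hermetic test can be demonstrated without the real package. The sketch below builds a throwaway module whose loader always raises `ImportError` (mimicking the CI environment) and whose exporter is a deliberately simplified stand-in; neither function reflects the actual implementation in `ovos_spec_tools.datasets`, and the module name `fake_datasets` is invented. Only `unittest.mock.patch.object` and the standard library are used.

```python
import tempfile
import types
from pathlib import Path
from unittest.mock import patch

# Simplified stand-in module; the real loader and exporter are more involved.
SOURCE = '''
from pathlib import Path

def load_dataset_templates(dataset_id, lang):
    raise ImportError("The datasets library is required")

def export_to_locale(dataset_id, lang, output_dir):
    rows = load_dataset_templates(dataset_id, lang)
    locale_dir = Path(output_dir) / "locale" / lang
    locale_dir.mkdir(parents=True, exist_ok=True)
    for row in rows:
        name = row["intent_id"].split(":")[-1]
        (locale_dir / f"{name}.intent").write_text(row["template"], encoding="utf-8")
    return len(rows)
'''
fake = types.ModuleType("fake_datasets")
exec(SOURCE, fake.__dict__)

# Fixture rows replace the real fetch, so nothing touches the network.
rows = [
    {"intent_id": "test:greet", "template": "<hello> {name}"},
    {"intent_id": "test:bye", "template": "<bye> {name}"},
]

with tempfile.TemporaryDirectory() as tmp:
    # Patch the name in the module where the exporter looks it up.
    with patch.object(fake, "load_dataset_templates", return_value=rows):
        count = fake.export_to_locale("hassil-intents", "en", tmp)
    exported = (Path(tmp) / "locale" / "en" / "greet.intent").read_text(encoding="utf-8")

print(count, exported)
```

The key point is ordering: the fixture is created and the patch is active before anything calls the loader, so the `ImportError` path is never reached.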

💚 Suggested hermetic rewrite

-    def test_round_trip_small(self):
-        """Load a single row, export, verify the intent file has the template."""
-        rows = load_dataset_templates("hassil-intents", lang="en", streaming=False)
-        # Only keep first 3 rows to make it fast
-        first = [rows[0], rows[1]]
-
-        with tempfile.TemporaryDirectory() as tmp:
+    def test_round_trip_small(self):
+        """Export fixture rows, verify the intent file has the template."""
+        rows = [
+            {"intent_id": "test:greet", "template": "<hello> {name}",
+             "slots": [], "expansions": [{"keyword": "hello", "values": ["hi"]}]},
+            {"intent_id": "test:bye", "template": "<bye> {name}",
+             "slots": [], "expansions": [{"keyword": "bye", "values": ["goodbye"]}]},
+        ]
+        first = [rows[0], rows[1]]
+
+        with tempfile.TemporaryDirectory() as tmp:
             dst = Path(tmp)
             with patch(
                 "ovos_spec_tools.datasets.load_dataset_templates",
                 return_value=first,
             ):
                 count = export_to_locale("hassil-intents", "en", dst)
             assert count == 2

             # Check that first intent file has rows[0] template
             name = rows[0]["intent_id"].split(":")[-1]
             intent_path = dst / "locale" / "en" / f"{name}.intent"
             assert intent_path.exists()
             assert rows[0]["template"] in intent_path.read_text()

🧰 Tools

🪛 GitHub Actions: Build Tests / 1_build _ build_tests (3.13).txt

[error] 173-173: Pytest failed: TestExportToLocale.test_round_trip_small could not load HuggingFace dataset because datasets is not installed.

🪛 GitHub Actions: Build Tests / 2_build _ build_tests (3.12).txt

[error] 173-173: Pytest failure in TestExportToLocale.test_round_trip_small when calling load_dataset_templates("hassil-intents", lang="en", streaming=False), which raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Build Tests / 3_build _ build_tests (3.14).txt

[error] 173-173: Test failure: TestExportToLocale.test_round_trip_small raised ImportError because datasets library is not installed.

🪛 GitHub Actions: Build Tests / 4_build _ build_tests (3.10).txt

[error] 173-173: Pytest failure: TestExportToLocale.test_round_trip_small failed when calling load_dataset_templates('hassil-intents', lang='en', streaming=False). Error: ImportError/ModuleNotFoundError for missing 'datasets' library.

🪛 GitHub Actions: Build Tests / 5_build _ build_tests (3.11).txt

[error] 173-173: Pytest failed: TestExportToLocale.test_round_trip_small. load_dataset_templates('hassil-intents', lang='en', streaming=False) raised ImportError because the 'datasets' library is not installed.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.10)

[error] 173-173: Failed test: TestExportToLocale.test_round_trip_small (load_dataset_templates('hassil-intents', lang='en', streaming=False)) because required dependency 'datasets' is not installed.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.11)

[error] 173-173: Failed test: TestExportToLocale.test_round_trip_small. Error: ImportError: The datasets library is required; install it with: pip install datasets.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.12)

[error] 173-173: TestExportToLocale.test_round_trip_small failed while calling load_dataset_templates('hassil-intents', lang='en', streaming=False).

🪛 GitHub Actions: Build Tests / build _ build_tests (3.13)

[error] 173-173: Failure in TestExportToLocale.test_round_trip_small: load_dataset_templates("hassil-intents", lang="en", streaming=False) raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Build Tests / build _ build_tests (3.14)

[error] 173-173: Pytest failed: TestExportToLocale.test_round_trip_small. load_dataset_templates('hassil-intents', lang='en', streaming=False) raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Code Coverage / 0_coverage _ coverage.txt

[error] 173-173: TestExportToLocale.test_round_trip_small failed. Called load_dataset_templates("hassil-intents", lang="en", streaming=False) which raised ImportError because the datasets library is not installed.

🪛 GitHub Actions: Code Coverage / coverage _ coverage

[error] 173-173: Failure occurred when calling load_dataset_templates("hassil-intents", lang="en", streaming=False) due to missing datasets library.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_datasets.py` around lines 171 - 190, The test test_round_trip_small currently calls load_dataset_templates before the patch, causing a real dataset fetch; fix it by removing the real call and creating a local fixture list (e.g., a small list of dicts with intent_id and template) to use as rows, then patch ovos_spec_tools.datasets.load_dataset_templates to return that fixture before calling export_to_locale; ensure you derive first = fixture[:2] and use fixture[0]["intent_id"] and fixture[0]["template"] when checking the exported intent file so the test remains hermetic and does not hit the datasets library.

coderabbitai · 2026-06-02T08:44:26Z

+    def test_urls_valid(self):
+        for url in SUPPORTED_DATASETS.values():
+            assert "/" in url
+            assert url.startswith("OpenVoiceOS/")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

test_urls_valid is too weak to catch a wrong repo.

It only asserts the OpenVoiceOS/ prefix and a /, so it passes for both hassil-intents-locale and hass-intent-templates (see the registry mismatch flagged in datasets.py). Consider asserting the exact expected repo names so the registry can't silently drift from the docs.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_datasets.py` around lines 204 - 207, The test test_urls_valid is too permissive—it's only checking for a slash and the "OpenVoiceOS/" prefix on entries from SUPPORTED_DATASETS, which allows wrong repo names to slip through; update the test to assert the exact expected repository names for each key in SUPPORTED_DATASETS (or assert the set of SUPPORTED_DATASETS.values() equals an expected set/list of repo strings) so the registry cannot silently drift—modify test_urls_valid to compare SUPPORTED_DATASETS against the precise expected repo names.
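
A stricter check along the lines suggested above might look like the following sketch. The `SUPPORTED_DATASETS` dict here is a local stand-in so the example runs on its own; it contains only the one key-to-repo pair this review states explicitly. The other registry entries are omitted because their exact keys are not shown in this thread, and `find_registry_drift` and `test_urls_exact` are hypothetical names.

```python
# Local stand-in for ovos_spec_tools.datasets.SUPPORTED_DATASETS (assumption:
# the real registry has more entries; only this pair is confirmed in the review).
SUPPORTED_DATASETS = {
    "hassil-intents": "OpenVoiceOS/hass-intent-templates",
}

# Exact repo names the docs promise, keyed the same way as the registry.
EXPECTED_REPOS = {
    "hassil-intents": "OpenVoiceOS/hass-intent-templates",
}


def find_registry_drift(registry, expected):
    """Return {key: (actual, expected)} for every entry that differs.

    Keys missing on either side show up with None on that side.
    """
    keys = set(registry) | set(expected)
    return {
        key: (registry.get(key), expected.get(key))
        for key in keys
        if registry.get(key) != expected.get(key)
    }


def test_urls_exact():
    # Fails on any wrong, missing, or unexpected repo name.
    assert find_registry_drift(SUPPORTED_DATASETS, EXPECTED_REPOS) == {}
```

Comparing whole mappings rather than prefixes means a value such as `OpenVoiceOS/hassil-intents-locale` is reported as drift instead of passing silently.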


Drop all hardcoded strings from Python scripts; the single source of truth is the base_locale/<lang>/<slot>.entity files:

- generate_base_locale.py: seed file with translation data for 59 languages
- base_locale/: 413 .entity files (area, name, color, state, device_class, domain, floor) across 59 languages
- convert_hassil_intents.py: read area.entity from base_locale instead of hardcoded COMMON_AREA_NAMES dict
- export_hf_dataset.py: slot examples from base_locale/, drop DOMAIN_DEVICE_NAMES and _extract_domain
- generate_entities.py: read from base_locale/ instead of _LANG_OVERRIDES and other hardcoded data

Numeric slots (brightness, percentage, temperature, ...) are still generated programmatically as they are language-agnostic. HA internal identifiers (device_class, domain) remain English in all languages.
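
As a rough illustration of reading slot values from the `base_locale/<lang>/<slot>.entity` layout described in this commit, a lookup could be sketched as below. `load_entity_values` is a hypothetical helper, not a function from the scripts in this PR. It assumes one value per line and treats blank lines and lines starting with `#` as skippable, which is also an assumption.

```python
from pathlib import Path


def load_entity_values(base_dir, lang, slot):
    """Read values from <base_dir>/<lang>/<slot>.entity, one per line.

    Returns an empty list when the file does not exist, so a missing
    translation never falls back to another language's strings.
    """
    path = Path(base_dir) / lang / f"{slot}.entity"
    if not path.is_file():
        return []
    values = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (assumed '#' prefix).
        if line and not line.startswith("#"):
            values.append(line)
    return values
```

Returning an empty list for a missing file, rather than falling back to English, matches the stated goal of keeping English strings out of other languages.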

inline_keywords(template, expansions) replaces <keyword> refs with (a|b|c) alternation groups inline — needed for engines like Padatious that don't look up .voc files at runtime. Handles nested refs recursively with configurable max_values cap. Exported from ovos_spec_tools top-level package and documented in the API reference.
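
The inlining behaviour described in this commit message can be approximated with the sketch below. It is not the reverted implementation: `inline_keywords_sketch` is a hypothetical name, `expansions` is assumed to be a plain dict mapping each keyword to a list of strings, and the `max_depth` guard against self-referencing keywords is an addition made for this example.

```python
import re

# Matches <keyword> references; keyword names may not contain '<' or '>'.
_KEYWORD_REF = re.compile(r"<([^<>]+)>")


def inline_keywords_sketch(template, expansions, max_values=10,
                           max_depth=8, _depth=0):
    """Replace <keyword> refs with (a|b|c) groups, resolving nested refs.

    Unknown keywords are left untouched. At most `max_values` alternatives
    are kept per keyword, and recursion stops at `max_depth`.
    """
    def replace(match):
        values = expansions.get(match.group(1))
        if not values or _depth >= max_depth:
            return match.group(0)  # leave the reference as written
        resolved = [
            inline_keywords_sketch(
                value, expansions,
                max_values=max_values,
                max_depth=max_depth,
                _depth=_depth + 1,
            )
            for value in values[:max_values]
        ]
        return "(" + "|".join(resolved) + ")"

    return _KEYWORD_REF.sub(replace, template)
```

For example, `inline_keywords_sketch("<hello> {name}", {"hello": ["hi", "hey"]})` yields `(hi|hey) {name}`, leaving the `{name}` slot for the engine to fill.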


github-actions Bot added the feature label Jun 2, 2026

JarbasAl added 4 commits June 2, 2026 09:31

fix: move dataset creation scripts into examples/hass-intent-dataset/

6455fbd

docs: update APPENDIX.md to reflect v3 union semantics and current caps

edb2f32

chore: remove AGENTS.md and TODO.md from PR

e2e148f

docs: add datasets chapter (7), update README and API reference

bd54e89

JarbasAl marked this pull request as ready for review June 2, 2026 08:37

github-actions Bot added feature and removed feature labels Jun 2, 2026

coderabbitai Bot reviewed Jun 2, 2026


JarbasAl added 4 commits June 2, 2026 09:47

fix: domain-aware {name} slot examples (English only), no English in …

2c1238d

…other langs

Revert "feat: add inline_keywords utility for engines without .voc su…

1eb9068

…pport" This reverts commit 0a34442.

JarbasAl marked this pull request as draft June 27, 2026 14:59

-	print(f" --check: intents with <50% sample survival --")
+	print(" --check: intents with <50% sample survival --")
