From 26e9f07062c157a5b53a609378948c3750c3b3f1 Mon Sep 17 00:00:00 2001
From: karamouche
Date: Tue, 14 Apr 2026 16:52:11 -0400
Subject: [PATCH 1/2] docs: update README to include support for German, Italian, and Spanish languages

---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index ffe25a9..6231cdb 100644
--- a/README.md
+++ b/README.md
@@ -85,10 +85,13 @@ Pipelines are defined declaratively in **YAML presets**. Each preset lists the s
 
 ## Supported languages
 
-| Code | Language       |
-| ---- | -------------- |
-| `en` | English        |
-| `fr` | French (alpha) |
+| Code | Language |
+| ---- | -------- |
+| `en` | English  |
+| `fr` | French   |
+| `de` | German   |
+| `it` | Italian  |
+| `es` | Spanish  |
 
 Unsupported language codes fall back to a safe default that applies language-independent normalization only.
 

From bf38462bb3920371eeb2912909314e91600c6fe4 Mon Sep 17 00:00:00 2001
From: karamouche
Date: Tue, 14 Apr 2026 16:57:33 -0400
Subject: [PATCH 2/2] chore: enhance pull request template for clarity and organization

---
 .github/pull_request_template.md | 83 ++++++++++++++++++++++----------
 1 file changed, 58 insertions(+), 25 deletions(-)

diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index efcc3e9..148b55f 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -1,43 +1,76 @@
 ## What does this PR do?
 
-
+
 
 ## Type of change
 
-- [ ] New language (`languages/{lang}/`)
-- [ ] New step (`steps/text/` or `steps/word/`)
-- [ ] New preset version (`presets/`)
-- [ ] Bug fix
-- [ ] Refactor / internal cleanup
-- [ ] Docs / CI
+- [ ] New language
+- [ ] Edit existing language (fix a replacement, tweak config, …)
+- [ ] New normalization step
+- [ ] Edit existing step (bug fix, behaviour change)
+- [ ] New preset version
+- [ ] Bug fix (other)
+- [ ] Refactor / docs / CI
+
+---
 
 ## Checklist
 
+**Only fill in the section(s) that match your change — delete the rest.**
+
+---
+
 ### New language
 
-- [ ] Created `languages/{lang}/` with `operators.py`, `replacements.py`, `__init__.py`
-- [ ] All word-level substitutions are in `replacements.py`, not inline in `operators.py`
-- [ ] Decorated operators class with `@register_language`
-- [ ] Added one import line to `languages/__init__.py`
-- [ ] Added unit tests in `tests/unit/languages/`
-- [ ] Added a per-language CSV in `tests/e2e/files/{preset}/` (e.g. `tests/e2e/files/gladia-3/fr.csv`)
+- [ ] Created `normalization/languages/{lang}/` with `operators.py`, `replacements.py`, `__init__.py`
+- [ ] Word substitutions are in `replacements.py` (not hardcoded in `operators.py`)
+- [ ] `LanguageConfig` is filled in with the language's data (separators, currency words, digit words, …)
+- [ ] Subclassed `LanguageOperators` — only override methods where the **logic** changes, not just the data
+- [ ] Class is decorated with `@register_language` and imported in `normalization/languages/__init__.py`
+- [ ] Unit tests added in `tests/unit/languages/`
+- [ ] E2e CSV added in `tests/e2e/files/{preset}/{lang}.csv` (e.g. `tests/e2e/files/gladia-3/fr.csv`)
+
+---
+
+### Edit existing language
+
+- [ ] New/changed word substitutions go in `replacements.py`, not inline in `operators.py`
+- [ ] If you changed a config field that can be `None`: the step reading it still handles `None` gracefully
+- [ ] Unit tests updated or added
+- [ ] E2e CSV updated if the expected output changed
+
+---
 
 ### New step
 
-- [ ] `name` class attribute is unique and matches the YAML key
-- [ ] Decorated with `@register_step`
-- [ ] Added one import line to `steps/text/__init__.py` or `steps/word/__init__.py`
-- [ ] Algorithm reads data from `operators.config.*`, no hardcoded language-specific values
-- [ ] Optional config fields are guarded with `if operators.config.field is None: return text`
-- [ ] Placeholder protect/restore pairs are both in `steps/text/placeholders.py` and `pipeline/base.py`'s `validate()` is updated
-- [ ] Added unit tests in `tests/unit/steps/`
-- [ ] Added step name to relevant preset YAMLs (new preset file if existing presets are affected)
-- [ ] If the class docstring was added or changed, ran `uv run scripts/generate_step_docs.py` to regenerate `docs/steps.md`
+- [ ] Unique `name` class attribute set (this is the key used in YAML presets)
+- [ ] Decorated with `@register_step` and imported in `steps/text/__init__.py` or `steps/word/__init__.py`
+- [ ] No hardcoded language values — read data from `operators.config.*` instead
+- [ ] If placeholder-based: protect + restore are both in `steps/text/placeholders.py` and `pipeline/base.py`'s `validate()` is updated
+- [ ] Unit tests added in `tests/unit/steps/`
+- [ ] Step name added to the relevant preset YAML — or a **new preset file** created if existing presets are affected
+- [ ] If the docstring changed: ran `uv run scripts/generate_step_docs.py`
+
+---
+
+### Edit existing step
+
+- [ ] Step `name` is unchanged — if the output changes, create a new step name + new preset instead
+- [ ] No language-specific logic or string literals added inside the step
+- [ ] Unit tests updated or added
+- [ ] If the docstring changed: ran `uv run scripts/generate_step_docs.py`
+
+---
 
 ### Preset change
 
-- [ ] Existing preset files are not modified — new behavior uses a new preset version file
+- [ ] Existing preset files are **not modified** — new behaviour goes in a new preset file
+- [ ] `pipeline.validate()` passes (runs automatically via `loader.py`)
+
+---
 
-## Tests
+## How was this tested?
 
-
+```
+uv run pytest tests/
+```