Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 58 additions & 25 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -1,43 +1,76 @@
## What does this PR do?

<!-- One-sentence summarizing of the change. -->
<!-- One sentence: what changed and why. -->

## Type of change

- [ ] New language (`languages/{lang}/`)
- [ ] New step (`steps/text/` or `steps/word/`)
- [ ] New preset version (`presets/`)
- [ ] Bug fix
- [ ] Refactor / internal cleanup
- [ ] Docs / CI
- [ ] New language
- [ ] Edit existing language (fix a replacement, tweak config, …)
- [ ] New normalization step
- [ ] Edit existing step (bug fix, behaviour change)
- [ ] New preset version
- [ ] Bug fix (other)
- [ ] Refactor / docs / CI

---

## Checklist

**Only fill in the section(s) that match your change — delete the rest.**

---

### New language

- [ ] Created `languages/{lang}/` with `operators.py`, `replacements.py`, `__init__.py`
- [ ] All word-level substitutions are in `replacements.py`, not inline in `operators.py`
- [ ] Decorated operators class with `@register_language`
- [ ] Added one import line to `languages/__init__.py`
- [ ] Added unit tests in `tests/unit/languages/`
- [ ] Added a per-language CSV in `tests/e2e/files/{preset}/` (e.g. `tests/e2e/files/gladia-3/fr.csv`)
- [ ] Created `normalization/languages/{lang}/` with `operators.py`, `replacements.py`, `__init__.py`
- [ ] Word substitutions are in `replacements.py` (not hardcoded in `operators.py`)
- [ ] `LanguageConfig` is filled in with the language's data (separators, currency words, digit words, …)
- [ ] Subclassed `LanguageOperators` — only override methods where the **logic** changes, not just the data
- [ ] Class is decorated with `@register_language` and imported in `normalization/languages/__init__.py`
- [ ] Unit tests added in `tests/unit/languages/`
- [ ] E2e CSV added in `tests/e2e/files/{preset}/{lang}.csv` (e.g. `tests/e2e/files/gladia-3/fr.csv`)

---

### Edit existing language

- [ ] New/changed word substitutions go in `replacements.py`, not inline in `operators.py`
- [ ] If you changed a config field that can be `None`: the step reading it still handles `None` gracefully
- [ ] Unit tests updated or added
- [ ] E2e CSV updated if the expected output changed

---

### New step

- [ ] `name` class attribute is unique and matches the YAML key
- [ ] Decorated with `@register_step`
- [ ] Added one import line to `steps/text/__init__.py` or `steps/word/__init__.py`
- [ ] Algorithm reads data from `operators.config.*`, no hardcoded language-specific values
- [ ] Optional config fields are guarded with `if operators.config.field is None: return text`
- [ ] Placeholder protect/restore pairs are both in `steps/text/placeholders.py` and `pipeline/base.py`'s `validate()` is updated
- [ ] Added unit tests in `tests/unit/steps/`
- [ ] Added step name to relevant preset YAMLs (new preset file if existing presets are affected)
- [ ] If the class docstring was added or changed, ran `uv run scripts/generate_step_docs.py` to regenerate `docs/steps.md`
- [ ] Unique `name` class attribute set (this is the key used in YAML presets)
- [ ] Decorated with `@register_step` and imported in `steps/text/__init__.py` or `steps/word/__init__.py`
- [ ] No hardcoded language values — read data from `operators.config.*` instead
- [ ] If placeholder-based: protect + restore are both in `steps/text/placeholders.py` and `pipeline/base.py`'s `validate()` is updated
- [ ] Unit tests added in `tests/unit/steps/`
- [ ] Step name added to the relevant preset YAML — or a **new preset file** created if existing presets are affected
- [ ] If the docstring changed: ran `uv run scripts/generate_step_docs.py`

---

### Edit existing step

- [ ] Step `name` is unchanged — if the output changes, create a new step name + new preset instead
- [ ] No language-specific logic or string literals added inside the step
- [ ] Unit tests updated or added
- [ ] If the docstring changed: ran `uv run scripts/generate_step_docs.py`

---

### Preset change

- [ ] Existing preset files are not modified — new behavior uses a new preset version file
- [ ] Existing preset files are **not modified** — new behaviour goes in a new preset file
- [ ] `pipeline.validate()` passes (runs automatically via `loader.py`)

---

## Tests
## How was this tested?

<!-- Describe what was tested and how. -->
```
uv run pytest tests/
```
Comment on lines +74 to +76
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language to the fenced code block (markdownlint MD040).

Line 74 should specify a fence language to satisfy lint and improve readability.

Proposed fix
-```
+```bash
 uv run pytest tests/
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

[warning] 74-74: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @.github/pull_request_template.md around lines 74 - 76, Add a fence language
to the Markdown code block containing "uv run pytest tests/" by changing the
opening triple backticks to include a language identifier (e.g., bash) so the
fenced block becomes "```bash" to satisfy markdownlint MD040 and improve
readability; update the fenced block near the existing "uv run pytest tests/"
snippet accordingly.


</details>

<!-- fingerprinting:phantom:triton:hawk:15d7adc9-b32d-459a-82ca-f41ecc83c053 -->

<!-- This is an auto-generated comment by CodeRabbit -->

11 changes: 7 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,10 +85,13 @@ Pipelines are defined declaratively in **YAML presets**. Each preset lists the s

## Supported languages

| Code | Language |
| ---- | -------------- |
| `en` | English |
| `fr` | French (alpha) |
| Code | Language |
| ---- | -------- |
| `en` | English |
| `fr` | French |
| `de` | German |
| `it` | Italian |
| `es` | Spanish |

Unsupported language codes fall back to a safe default that applies language-independent normalization only.

Expand Down
Loading