Releases: gladiaio/normalization
v0.2.0
What's Changed
Five new languages. This release adds full normalization support for French, Spanish, German, Italian, and Dutch, bringing the total to six supported languages. Each one includes number expansion, word- and sentence-level replacements, and language-specific handling where needed.
Dedicated number normalizers. Every new language ships with its own number-to-digit expansion logic, handling compound numbers (Dutch, German), gendered forms (Spanish, Italian), and special constructs like quatre-vingts in French.
CLI. A new gladia-normalization command lets you normalize text or files directly from the terminal — no scripting needed.
Reorganized test suite. End-to-end tests are now split into per-language CSV files for easier maintenance and contribution.
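To illustrate the kind of construct a dedicated number normalizer has to handle, here is a minimal sketch of French number expansion for values below 100, including the quatre-vingts case mentioned above. The function name `french_to_int` and the lookup tables are hypothetical and written only for this example; they are not the library's implementation, and the sketch does not validate that the phrase is well-formed French.

```python
# Illustrative sketch only: not gladia-normalization's actual code.
UNITS = {
    "zéro": 0, "un": 1, "deux": 2, "trois": 3, "quatre": 4, "cinq": 5,
    "six": 6, "sept": 7, "huit": 8, "neuf": 9, "dix": 10, "onze": 11,
    "douze": 12, "treize": 13, "quatorze": 14, "quinze": 15, "seize": 16,
}
TENS = {"vingt": 20, "trente": 30, "quarante": 40, "cinquante": 50, "soixante": 60}


def french_to_int(phrase: str) -> int:
    """Convert a French number phrase below 100 into an integer (hypothetical helper)."""
    # Hyphens and the connector "et" carry no numeric value.
    tokens = [t for t in phrase.lower().replace("-", " ").split() if t != "et"]
    total = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "quatre" and nxt in ("vingt", "vingts"):
            # "quatre-vingt(s)" is multiplicative: 4 * 20 = 80.
            total += 80
            i += 2
        elif tok in TENS:
            total += TENS[tok]
            i += 1
        elif tok in UNITS:
            total += UNITS[tok]
            i += 1
        else:
            raise ValueError(f"unknown number word: {tok!r}")
    return total


print(french_to_int("quatre-vingts"))
print(french_to_int("quatre-vingt-dix-sept"))
print(french_to_int("soixante et onze"))
```

The special case is needed because a purely additive reading of "quatre-vingts" would give 24 rather than 80; the other forms (such as "soixante-dix") compose by simple addition.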
New Contributors
- @egenthon-cmd made their first contribution in #13
Full Changelog: v0.1.1...v0.2.0
v0.1.1
What's Changed
- Compile regex patterns once for normalization speed up by @Karamouche in #12
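The general technique behind this change can be sketched as follows: build the compiled pattern objects once at import time and reuse them on every call, instead of passing pattern strings to `re.sub` each time. The rule list and the `normalize` function here are hypothetical and do not reflect the library's actual rules or API.

```python
import re

# Hypothetical rules for illustration: (pattern, replacement) pairs.
RAW_RULES = [
    (r"[^\w\s']", ""),  # drop punctuation except apostrophes
    (r"\s+", " "),      # collapse runs of whitespace
]

# Compiled once when the module is loaded, then reused on every call.
COMPILED_RULES = [(re.compile(pattern), repl) for pattern, repl in RAW_RULES]


def normalize(text: str) -> str:
    """Apply each precompiled rule in order (hypothetical helper)."""
    for pattern, repl in COMPILED_RULES:
        text = pattern.sub(repl, text)
    return text.strip().lower()


print(normalize("Hello,   World!"))
```

Python's `re` module keeps an internal cache of recently compiled patterns, so the benefit of precompiling comes mainly from skipping the per-call cache lookup and from not depending on the cache size when many rules are in play.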
Full Changelog: v0.1.0...v0.1.1
v0.1.0
🎉 First public release of gladia-normalization
We're excited to release the first version of gladia-normalization, an open-source Python library for text normalization — designed to enable fair and reproducible WER (Word Error Rate) comparisons across ASR systems.
What it does
gladia-normalization provides a modular, YAML-configured pipeline to normalize transcription text before evaluation — handling things like number formatting, punctuation, casing, and language-specific rules.
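To show why normalization matters for WER, the sketch below computes a word-level edit distance with and without a toy normalization step. Both `wer` and `toy_normalize` are hypothetical helpers written for this example; they are not part of gladia-normalization, and the toy step only lowercases and strips punctuation.

```python
import re


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def toy_normalize(text: str) -> str:
    """Stand-in for a real pipeline: lowercase and remove punctuation."""
    return re.sub(r"[^\w\s]", "", text.lower())


reference = "I have two cats."
hypothesis = "i have two cats"

print(wer(reference, hypothesis))
print(wer(toy_normalize(reference), toy_normalize(hypothesis)))
```

Without normalization, the casing of "I" and the trailing period each count as a substitution, so two of four words are scored as errors even though the transcription is correct. After normalization the two strings match exactly.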
What's included in v0.1.0
- English normalization
- Preset system — load normalization presets by name for easy reuse across languages and use cases
- Flexible language config — language-specific normalization rules that are easy to extend
Getting started
pip install gladia-normalization
→ Check out the README and Contributing Guide to get involved.
Full Changelog: https://github.com/gladiaio/normalization/commits/v0.1.0