Merged
44 changes: 44 additions & 0 deletions normalization/languages/dutch/operators.py
@@ -32,6 +32,19 @@
"m": "hem",
}

_DUTCH_DIGIT_WORDS: dict[str, str] = {
"nul": "0",
"een": "1",
"twee": "2",
"drie": "3",
"vier": "4",
"vijf": "5",
"zes": "6",
"zeven": "7",
"acht": "8",
"negen": "9",
}
Comment on lines +35 to +46

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: gladiaio/normalization; per-script output lengths omitted):

# Understand the file structure and how digit_words are used
find . -type f -name "*.py" | grep -E "(dutch|operators)" | head -20
rg "fix_dot_adjacent_number_words|protect_plus_word_before_digit_words" --type py -A 5
rg "digit_words" --type py -B 2 -A 2 | head -100

# Read the implementations of both steps
cat -n normalization/steps/text/fix_dot_adjacent_number_words.py
cat -n normalization/steps/text/protect_plus_word_before_digit_words.py

# Check Dutch tests for "een" coverage
find . -type f -name "*.py" | xargs grep -l "dutch\|Nederlandse" | grep test
cat -n tests/unit/languages/dutch_operators_test.py | head -150
rg "een" tests/unit/languages/dutch_operators_test.py -B 2 -A 2
cat -n tests/unit/languages/dutch_number_normalizer_test.py | head -100
rg "een" tests/unit/languages/dutch_number_normalizer_test.py -B 2 -A 2

# Look for step-level or integration tests covering these steps
find tests -type f -name "*.py" -exec grep -l "fix_dot_adjacent\|protect_plus_word" {} \;
find tests -type f -name "*.py" | xargs grep -l "FixDotAdjacentNumberWordsStep\|ProtectPlusWordBeforeDigitWordsStep" 2>/dev/null
rg "fix_dot_adjacent\|protect_plus_word" tests/ -l

# Review the full Dutch config
cat -n normalization/languages/dutch/operators.py

"een" matching in numeric contexts is intentional but warrants test coverage.

The functions fix_dot_adjacent_number_words and protect_plus_word_before_digit_words use word boundaries (\b) in their regex patterns, which provides protection against false positives in arbitrary text. However, "een" will still match when it appears in the specific numeric patterns these steps are designed for:

  • After "punt" (dot): "versie punt een""versie punt 1" ✓ (intended for IPs/versions)
  • After "plus" (plus): "plus een" → converts to phone context marker (intended for +1 country codes)

In these contexts the behavior is correct, but there is a narrow edge-case risk: ambiguous sentences like "Dit kost plus een euro" could be misparsed. Adding explicit tests for "een" in these numeric-context patterns would confirm the behavior is safe for the typical input corpus.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@normalization/languages/dutch/operators.py` around lines 35 - 46, add unit
tests to cover the Dutch digit-word "een" in numeric contexts handled by
fix_dot_adjacent_number_words and protect_plus_word_before_digit_words: assert
that "versie punt een" transforms "een" → "1" (dot/version/IP context) and that
"plus een" is treated as a phone-country-code context by
protect_plus_word_before_digit_words, and also add an edge-case test like "Dit
kost plus een euro" to ensure it does not incorrectly convert in ordinary
currency phrases; use the existing _DUTCH_DIGIT_WORDS mapping and the same test
harness used for other Dutch normalization tests to locate and validate
behavior.
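The requested coverage can be sketched in isolation. `normalize_dot_adjacent` below is a hypothetical stand-in for the real fix_dot_adjacent_number_words step (whose implementation is not shown in this diff), built only from the `_DUTCH_DIGIT_WORDS` mapping above:

```python
import re

_DUTCH_DIGIT_WORDS = {
    "nul": "0", "een": "1", "twee": "2", "drie": "3", "vier": "4",
    "vijf": "5", "zes": "6", "zeven": "7", "acht": "8", "negen": "9",
}

# Stand-in for fix_dot_adjacent_number_words: replace a digit word that
# directly follows the spoken dot word "punt" (simplified for illustration).
_DOT_ADJACENT = re.compile(rf"\bpunt\s+({'|'.join(_DUTCH_DIGIT_WORDS)})\b")

def normalize_dot_adjacent(text: str) -> str:
    return _DOT_ADJACENT.sub(lambda m: f"punt {_DUTCH_DIGIT_WORDS[m.group(1)]}", text)

# "een" converts in the version/IP context the step targets...
assert normalize_dot_adjacent("versie punt een") == "versie punt 1"
# ...but an ordinary indefinite article with no "punt" before it is untouched.
assert normalize_dot_adjacent("dit kost een euro") == "dit kost een euro"
```

In the actual test suite these assertions would go through the project's Dutch test harness rather than a local helper.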


DUTCH_CONFIG = LanguageConfig(
code="nl",
decimal_separator=",",
@@ -78,6 +91,37 @@
"uh",
],
sentence_replacements=DUTCH_SENTENCE_REPLACEMENTS,
digit_words=_DUTCH_DIGIT_WORDS,
number_words=[
*_DUTCH_DIGIT_WORDS,
"tien",
"elf",
"twaalf",
"dertien",
"veertien",
"vijftien",
"zestien",
"zeventien",
"achttien",
"negentien",
"twintig",
"dertig",
"veertig",
"vijftig",
"zestig",
"zeventig",
"tachtig",
"negentig",
"honderd",
"duizend",
"miljoen",
"miljoenen",
"miljard",
"miljarden",
"biljoen",
"biljoenen",
],
plus_word="plus",
)
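One detail worth noting about `number_words`: unpacking a dict with `*` inside a list literal yields its keys, so `*_DUTCH_DIGIT_WORDS` contributes the Dutch words ("nul", "een", …), not the digit strings:

```python
# Minimal demonstration with a truncated copy of the mapping.
digit_words = {"nul": "0", "een": "1", "twee": "2"}
number_words = [*digit_words, "tien", "twintig"]
# Dict unpacking preserves insertion order and takes keys only.
assert number_words == ["nul", "een", "twee", "tien", "twintig"]
assert "1" not in number_words
```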


67 changes: 67 additions & 0 deletions normalization/languages/german/number_normalizer.py
@@ -0,0 +1,67 @@
"""German number normalizer using text2num's alpha2digit.

Converts spelled-out numbers to digits (e.g. zwanzig → 20) and handles
mixed digit+word forms (e.g. 2 hundert → zwei hundert) before conversion
so alpha2digit does not misinterpret them.

A post-pass replaces words alpha2digit leaves unconverted in isolation:
- 'null' → '0' (alpha2digit skips it standalone)
- 'zwei' → '2' (alpha2digit skips it standalone and in plain noun phrases)
'ein'/'eins' are intentionally excluded — 'ein' is the German indefinite
article and cannot be safely replaced without context.
"""

import re

from text_to_num import alpha2digit

_DIGIT_TO_GERMAN: dict[str, str] = {
"0": "null",
"1": "ein",
"2": "zwei",
"3": "drei",
"4": "vier",
"5": "fünf",
"6": "sechs",
"7": "sieben",
"8": "acht",
"9": "neun",
}

_RE_MIXED_NUMBER = re.compile(
r"\b(\d+)\s+(hundert|tausend|millionen?|milliarden?|billionen?)\b",
re.IGNORECASE,
)

_RE_ZWEI = re.compile(r"\bzwei\b", re.IGNORECASE)
_RE_NULL = re.compile(r"\bnull\b", re.IGNORECASE)


def _normalize_mixed_numbers(text: str) -> str:
"""Convert '2 hundert' → 'zwei hundert' so alpha2digit yields 200, not '2 100'."""

def replace(match: re.Match) -> str:
number = match.group(1)
multiplier = match.group(2)
if len(number) == 1 and number in _DIGIT_TO_GERMAN:
return f"{_DIGIT_TO_GERMAN[number]} {multiplier}"
Comment on lines +31 to +47

⚠️ Potential issue | 🟠 Major

Fix mixed digit + German scale preprocessing for singular scale words.

Line 32 misses valid singular million/billion and Line 47 rewrites 1 milliarde as ein milliarde. That leaves common inputs like 2 million unhandled and can make 1 Million/Milliarde/Billion less alpha2digit-friendly.

🐛 Proposed fix
+_FEMININE_SINGULAR_SCALES = {"million", "milliarde", "billion"}
+
 _RE_MIXED_NUMBER = re.compile(
-    r"\b(\d+)\s+(hundert|tausend|millionen?|milliarden?|billionen?)\b",
+    r"\b(\d+)\s+(hundert|tausend|million(?:en)?|milliarde(?:n)?|billion(?:en)?)\b",
     re.IGNORECASE,
 )
         number = match.group(1)
         multiplier = match.group(2)
         if len(number) == 1 and number in _DIGIT_TO_GERMAN:
-            return f"{_DIGIT_TO_GERMAN[number]} {multiplier}"
+            digit_word = _DIGIT_TO_GERMAN[number]
+            if number == "1" and multiplier.lower() in _FEMININE_SINGULAR_SCALES:
+                digit_word = "eine"
+            return f"{digit_word} {multiplier}"
         return match.group(0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@normalization/languages/german/number_normalizer.py` around lines 31 - 47,
the regex _RE_MIXED_NUMBER should accept singular forms of the large-scale words
and the replacer in _normalize_mixed_numbers must special-case the digit "1" to
use the feminine form "eine" for feminine scales (million/millionen,
milliarde/milliarden, billion/billionen) instead of the default _DIGIT_TO_GERMAN
value that yields "ein"; update the pattern for _RE_MIXED_NUMBER to include
explicit singular variants (e.g. million, milliarde, billion as well as their
plural forms) and modify the replace(match) in _normalize_mixed_numbers to
return "eine {multiplier}" when number == "1" and multiplier is in the feminine
set, otherwise fall back to the existing mapping (so "2 million" is matched and
becomes "zwei million" and "1 milliarde" becomes "eine milliarde").

return match.group(0)

return _RE_MIXED_NUMBER.sub(replace, text)


def _fix_remaining_words(text: str) -> str:
"""Replace number words alpha2digit did not convert."""
text = _RE_ZWEI.sub("2", text)
text = _RE_NULL.sub("0", text)
return text


class GermanNumberNormalizer:
"""Convert German spelled-out numbers to digits via text2num.alpha2digit."""

def __call__(self, text: str) -> str:
text = _normalize_mixed_numbers(text)
text = alpha2digit(text, "de")
text = _fix_remaining_words(text)
return text
50 changes: 50 additions & 0 deletions normalization/languages/german/operators.py
@@ -1,10 +1,25 @@
from normalization.languages.base import LanguageConfig, LanguageOperators
from normalization.languages.german.number_normalizer import GermanNumberNormalizer
from normalization.languages.german.replacements import GERMAN_REPLACEMENTS
from normalization.languages.german.sentence_replacements import (
GERMAN_SENTENCE_REPLACEMENTS,
)
from normalization.languages.registry import register_language

_GERMAN_DIGIT_WORDS: dict[str, str] = {
"null": "0",
"ein": "1",
"eins": "1",
Comment on lines +9 to +12

⚠️ Potential issue | 🟠 Major

Keep ambiguous ein out of digit_words used by plus protection.

Line 11 makes ein a digit token while Line 79 enables the plus-word protection path. Since ProtectPlusWordBeforeDigitWordsStep consumes config.digit_words, normal phrases like plus ein bisschen can be treated as phone-plus context and later become + ein bisschen.

🐛 Proposed fix
 _GERMAN_DIGIT_WORDS: dict[str, str] = {
     "null": "0",
-    "ein": "1",
     "eins": "1",
     "zwei": "2",
     digit_words=_GERMAN_DIGIT_WORDS,
     number_words=[
+        "ein",
         *_GERMAN_DIGIT_WORDS,
         "zehn",

Also applies to: 49-50, 79-79

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@normalization/languages/german/operators.py` around lines 9 - 12, the
digit-word mapping _GERMAN_DIGIT_WORDS incorrectly includes the ambiguous token
"ein", which causes ProtectPlusWordBeforeDigitWordsStep (which consumes
config.digit_words) to misclassify normal phrases like "plus ein bisschen" as
phone-plus context; remove "ein" from _GERMAN_DIGIT_WORDS (leave "eins" if
needed) and ensure any configuration or references to config.digit_words no
longer contain the ambiguous "ein" token so plus-word protection only triggers
on unambiguous digit words.
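The misfire is easy to reproduce with a stand-in pattern; the real ProtectPlusWordBeforeDigitWordsStep is not in this diff, so the helper below only assumes it builds an alternation over config.digit_words after the plus word:

```python
import re

def plus_protection_pattern(digit_words: set[str]) -> re.Pattern:
    # Hypothetical stand-in for the step's digit-word matching; the actual
    # implementation lives in normalization/steps/text/ and is not shown here.
    words = "|".join(sorted(digit_words))
    return re.compile(rf"\bplus\s+({words})\b", re.IGNORECASE)

with_ein = plus_protection_pattern({"ein", "eins", "zwei"})
without_ein = plus_protection_pattern({"eins", "zwei"})

# With "ein" as a digit word, an ordinary phrase looks like phone context.
assert with_ein.search("plus ein bisschen") is not None
# Without it, only unambiguous digit words trigger the protection.
assert without_ein.search("plus ein bisschen") is None
assert without_ein.search("plus eins") is not None
```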

"zwei": "2",
"drei": "3",
"vier": "4",
"fünf": "5",
"sechs": "6",
"sieben": "7",
"acht": "8",
"neun": "9",
}

GERMAN_CONFIG = LanguageConfig(
code="de",
decimal_separator=",",
@@ -31,13 +46,48 @@
},
filler_words=["äh", "ähm", "hm", "also", "naja", "halt"],
sentence_replacements=GERMAN_SENTENCE_REPLACEMENTS,
digit_words=_GERMAN_DIGIT_WORDS,
number_words=[
*_GERMAN_DIGIT_WORDS,
"zehn",
"elf",
"zwölf",
"dreizehn",
"vierzehn",
"fünfzehn",
"sechzehn",
"siebzehn",
"achtzehn",
"neunzehn",
"zwanzig",
"dreißig",
"vierzig",
"fünfzig",
"sechzig",
"siebzig",
"achtzig",
"neunzig",
"hundert",
"tausend",
"million",
"millionen",
"milliarde",
"milliarden",
"billion",
"billionen",
],
plus_word="plus",
)


@register_language
class GermanOperators(LanguageOperators):
def __init__(self):
super().__init__(GERMAN_CONFIG)
self._number_normalizer = GermanNumberNormalizer()

def get_word_replacements(self) -> dict[str, str]:
return GERMAN_REPLACEMENTS

def expand_written_numbers(self, text: str) -> str:
return self._number_normalizer(text)
58 changes: 58 additions & 0 deletions normalization/languages/italian/number_normalizer.py
@@ -0,0 +1,58 @@
"""Italian number normalizer using text2num's alpha2digit.

Converts spelled-out numbers to digits (e.g. venti → 20) and handles
mixed digit+word forms (e.g. 2 cento → due cento) before conversion
so alpha2digit does not misinterpret them.

A post-pass replaces words alpha2digit leaves unconverted in isolation:
- 'uno' → '1'
- 'due' → '2'
"""

import re

from text_to_num import alpha2digit

_RE_MIXED_NUMBER = re.compile(
r"\b(\d+)\s+(cento|mila?|milioni?|miliardi?)\b",
re.IGNORECASE,
)
Comment on lines +16 to +19
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: gladiaio/normalization; per-script output lengths omitted):

#!/bin/bash
# Probe the current pattern against singular/plural thousand forms
python - <<'PY'
import re

pat = re.compile(r"\b(\d+)\s+(cento|mila?|milioni?|miliardi?)\b", re.IGNORECASE)
for text in ("2 mila", "1 mille", "2 mil"):
    print(f"{text!r}: {bool(pat.search(text))}")
PY

# Locate the Italian language files and any "mille"/"mila" usage
find . -type f -name "*.py" -o -name "*.json" | head -20
fd -e py -e json | grep -i italian | head -20
rg -i "mille|mila" --type py | head -30

# Review the Italian operators and number normalizer
cat -n normalization/languages/italian/operators.py | head -50
cat -n normalization/languages/italian/number_normalizer.py

Match "mille" explicitly instead of "mila?".

Line 17 currently matches mil/mila but not the configured Italian word mille, so inputs like 1 mille skip the mixed-number pre-pass while invalid 2 mil is accepted by the regex.

Proposed fix
 _RE_MIXED_NUMBER = re.compile(
-    r"\b(\d+)\s+(cento|mila?|milioni?|miliardi?)\b",
+    r"\b(\d+)\s+(cento|mille|mila|milioni?|miliardi?)\b",
     re.IGNORECASE,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@normalization/languages/italian/number_normalizer.py` around lines 16 - 19,
the regex _RE_MIXED_NUMBER currently uses "mila?" which matches "mil" or "mila"
and misses the correct Italian "mille"; update the pattern used in
_RE_MIXED_NUMBER so it explicitly matches "mille" and "mila" (e.g., replace the
"mila?" token with an explicit alternation like "(mille|mila)") while preserving
other alternatives (cento, milione/milioni, miliardo/miliardi) and re.IGNORECASE
to ensure inputs like "1 mille" are correctly caught by the mixed-number
pre-pass.
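The gap can be verified directly; OLD below is the pattern from this diff and NEW is the suggested replacement:

```python
import re

# Pattern currently in the diff vs. the suggested replacement.
OLD = re.compile(r"\b(\d+)\s+(cento|mila?|milioni?|miliardi?)\b", re.IGNORECASE)
NEW = re.compile(r"\b(\d+)\s+(cento|mille|mila|milioni?|miliardi?)\b", re.IGNORECASE)

assert OLD.search("1 mille") is None       # the real word "mille" slips past the pre-pass
assert OLD.search("2 mil") is not None     # while the non-word "mil" is accepted
assert NEW.search("1 mille") is not None
assert NEW.search("2 mila") is not None
assert NEW.search("2 mil") is None
```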


_RE_UNO = re.compile(r"\buno\b", re.IGNORECASE)
_RE_DUE = re.compile(r"\bdue\b", re.IGNORECASE)


def _fix_remaining_words(text: str) -> str:
"""Replace number words alpha2digit did not convert."""
text = _RE_UNO.sub("1", text)
text = _RE_DUE.sub("2", text)
return text


class ItalianNumberNormalizer:
"""Convert Italian spelled-out numbers to digits via text2num.alpha2digit.

Accepts digit_words (word→digit mapping from LanguageConfig) to derive
the digit→word mapping used for mixed-form pre-passes (e.g. '2 cento' → 'due cento').
"""

def __init__(self, digit_words: dict[str, str]) -> None:
self._digit_to_word = {v: k for k, v in digit_words.items()}

def _normalize_mixed_numbers(self, text: str) -> str:
"""Convert '2 cento' → 'due cento' so alpha2digit yields 200, not '2 100'."""

def replace(match: re.Match) -> str:
number = match.group(1)
multiplier = match.group(2)
if len(number) == 1 and number in self._digit_to_word:
return f"{self._digit_to_word[number]} {multiplier}"
return match.group(0)

return _RE_MIXED_NUMBER.sub(replace, text)

def __call__(self, text: str) -> str:
text = self._normalize_mixed_numbers(text)
text = alpha2digit(text, "it")
text = _fix_remaining_words(text)
return text
29 changes: 11 additions & 18 deletions normalization/languages/italian/operators.py
@@ -1,10 +1,12 @@
import re

from normalization.languages.base import LanguageConfig, LanguageOperators
from normalization.languages.italian.number_normalizer import ItalianNumberNormalizer
from normalization.languages.italian.replacements import ITALIAN_REPLACEMENTS
from normalization.languages.italian.sentence_replacements import (
ITALIAN_SENTENCE_REPLACEMENTS,
)
from normalization.languages.registry import register_language

-# Single digits 19: shared by digit_words and any future time/compound helpers.
+# Single digits 1-9: shared by digit_words and any future time/compound helpers.
_ONE_TO_NINE: dict[str, str] = {
"uno": "1",
"due": "2",
@@ -17,11 +19,6 @@
"nove": "9",
}

ITALIAN_SENTENCE_REPLACEMENTS: dict[str, str] = {
# Spoken percentages (“dieci per cento”) → one canonical form aligned with “%” → percento
"per cento": "percento",
}

ITALIAN_CONFIG = LanguageConfig(
code="it",
decimal_separator=",",
@@ -101,16 +98,12 @@
class ItalianOperators(LanguageOperators):
def __init__(self):
super().__init__(ITALIAN_CONFIG)

def fix_one_word_in_numeric_contexts(self, text: str) -> str:
text = re.sub(r"(\d+)\s+uno\s+uno\b", r"\1 1 1", text)
text = re.sub(r"\buno\s+uno\s+(\d)", r"1 1 \1", text)
text = re.sub(r"(\d+)\s+uno\s+(\d)", r"\1 1 \2", text)
text = re.sub(r"(\d+)\s+uno\b", r"\1 1", text)
text = re.sub(r"\b(\d+)uno\b", r"\1 1", text)
text = re.sub(r"\buno\s+(\d)", r"1 \1", text)
text = re.sub(r"^uno\s+(?=[a-z])", "1 ", text)
return text
self._number_normalizer = ItalianNumberNormalizer(
ITALIAN_CONFIG.digit_words or {}
)

def get_word_replacements(self) -> dict[str, str]:
return ITALIAN_REPLACEMENTS

def expand_written_numbers(self, text: str) -> str:
return self._number_normalizer(text)
Comment on lines +108 to +109

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (repository: gladiaio/normalization; per-script output lengths omitted):

#!/bin/bash
# Inspect where the numeric-context step is included relative to n_to_digits.
# Expectation: if fix_one_word_in_numeric_contexts can run for Italian,
# ItalianOperators should still override it.
rg -n -C3 '\bfix_one_word_in_numeric_contexts\b|\bn_to_digits\b|ExpandWrittenNumbersToDigitsStep|FixOneWordInNumericContextsStep'

grep -n "fix_one_word_in_numeric_contexts" normalization/languages/italian/operators.py
cat -n normalization/languages/italian/operators.py | head -130
Implement fix_one_word_in_numeric_contexts() for Italian.

FixOneWordInNumericContextsStep is registered in the pipeline and calls operators.fix_one_word_in_numeric_contexts(). Without an override, Italian falls back to the base no-op, while English and Spanish both implement language-specific regex patterns to convert "one"/"uno" to "1" when adjacent to digits. The new expand_written_numbers() method handles full number words but does not cover isolated "uno" in numeric contexts, so inputs like "10 uno uno" will regress to unmodified output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@normalization/languages/italian/operators.py` around lines 108 - 109, the
Italian operator needs a language-specific override of
fix_one_word_in_numeric_contexts to convert isolated "uno" to "1" when adjacent
to digits (mirroring English/Spanish implementations) because
expand_written_numbers only handles full number words; implement
operators.fix_one_word_in_numeric_contexts() to use a regex that matches
word-boundary "uno" when preceded or followed by digits (or digit sequences with
separators) and replace it with "1" while preserving surrounding
whitespace/punctuation, ensuring the method name
fix_one_word_in_numeric_contexts and the existing expand_written_numbers remain
unchanged.
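A minimal override along the lines the comment suggests could reuse a condensed subset of the regexes this PR removes (a sketch only; the project may prefer to fold this into the normalizer itself):

```python
import re

def fix_one_word_in_numeric_contexts(text: str) -> str:
    # Condensed sketch of the removed patterns: "uno" becomes "1" only when
    # it sits next to digits (or next to another digit-adjacent "uno").
    text = re.sub(r"(\d+)\s+uno\s+uno\b", r"\1 1 1", text)
    text = re.sub(r"\buno\s+uno\s+(\d)", r"1 1 \1", text)
    text = re.sub(r"(\d+)\s+uno\b", r"\1 1", text)
    text = re.sub(r"\buno\s+(\d)", r"1 \1", text)
    return text

assert fix_one_word_in_numeric_contexts("10 uno uno") == "10 1 1"
assert fix_one_word_in_numeric_contexts("uno 5") == "1 5"
# A standalone article-like "uno" with no adjacent digit is left alone.
assert fix_one_word_in_numeric_contexts("uno scontrino") == "uno scontrino"
```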

3 changes: 3 additions & 0 deletions normalization/languages/italian/sentence_replacements.py
@@ -0,0 +1,3 @@
ITALIAN_SENTENCE_REPLACEMENTS: dict[str, str] = {
"per cento": "percento",
}
13 changes: 13 additions & 0 deletions tests/e2e/files/gladia-3/de.csv
@@ -30,3 +30,16 @@ halt mal so,mal so
st. petersburg,st petersburg
6 tage krieg,sechstagekrieg
kreuzungs punkt,kreuzungspunkt
£100,100 pounds
¥500,500 yens
$20 und $30,20 dollars und 30 dollars
zwei,2
drei,3
zehn,10
zwanzig,20
dreizehn,13
hundert,100
tausend,1000
drei euro,3 euro
hundert euro,100 euro
zwanzig apfel,20 apfel
10 changes: 10 additions & 0 deletions tests/e2e/files/gladia-3/en.csv
@@ -120,3 +120,13 @@ x = 5,x equals 5
ø in Danish,o in danish
€20 or €30,20 euros or 30 euros
my name is bob,my name is bob
thirteen dogs,13 dogs
fifteen items,15 items
forty people,40 people
sixty items,60 items
seventy two,72
eighty nine,89
four hundred,400
five thousand dollars,5000 dollars
three thousand five hundred,3500
two billion people,2000000000 people
8 changes: 8 additions & 0 deletions tests/e2e/files/gladia-3/es.csv
@@ -28,3 +28,11 @@ www.gladia.io,w w w punto gladia punto io
¢25,25 céntimos
£50,50 libras
¥1000,1000 yenes
cinco manzanas,5 manzanas
cero errores,0 errores
quince personas,15 personas
treinta,30
cuarenta y cinco,45
setenta y ocho,78
quinientos,500
quince mil,15000
8 changes: 8 additions & 0 deletions tests/e2e/files/gladia-3/fr.csv
@@ -44,3 +44,11 @@ x = 5,x egal a 5
test@example.com,test arobase example point com
bonjour (euh) ami,bonjour ami
ça date d'hier,ca date d hier
seize,16
douze pommes,12 pommes
quarante,40
deux cents,200
trois mille,3000
dix-neuf,19
quatre-vingt-dix,90
soixante quinze,75