Problem
Legacy ERP systems often store city and postal code in a single field, leading to mixed data during migrations. This is a very common data quality issue.
Examples encountered in migrations:
"104 Reykjavík" (Icelandic: postal prefix)
"75001 Paris" (French: postal prefix)
"Amsterdam 1012 AB" (Dutch: postal suffix)
"3080-055 Figueira Da Foz" (Portuguese: hyphenated postal)
"13014 Kuwait" (Middle East: numeric prefix)
"London SW1A 1AA" (UK: alphanumeric suffix)
Proposed Solution
1. City/Postal Separator
Add a separate_city_postal() cleaner function that:
- Detects postal code patterns by country/region
- Separates city name from postal code
- Returns both values (tuple or dict)
from odoo_data_flow.clean import separate_city_postal
# Returns (city, postal_code) tuple
city, postal = separate_city_postal("75001 Paris", country="FR")
# city = "Paris", postal = "75001"
# Or with country detection from postal pattern
city, postal = separate_city_postal("Amsterdam 1012 AB")
# city = "Amsterdam", postal = "1012 AB"
# Handle edge cases gracefully
city, postal = separate_city_postal("Some City")
# city = "Some City", postal = ""
2. Country Code Detection/Recovery
When country code is missing, attempt to recover it from other available data:
from odoo_data_flow.clean import detect_country
# From phone number (international prefix)
country = detect_country(phone="+31 6 12345678")
# country = "NL"
# From postal code pattern
country = detect_country(postal="1012 AB")
# country = "NL"
# From city name (known major cities)
country = detect_country(city="Amsterdam")
# country = "NL"
# Combined detection (uses all available hints)
country = detect_country(
phone="+33 1 23456789",
postal="75001",
city="Paris"
)
# country = "FR"
Country-specific postal patterns to support
| Country |
Format |
Position |
Example |
| NL |
1234 AB |
suffix |
Amsterdam 1012 AB |
| BE |
1234 |
prefix |
1000 Brussel |
| DE |
12345 |
prefix |
10115 Berlin |
| FR |
75001 |
prefix |
75001 Paris |
| UK |
SW1A 1AA |
suffix |
London SW1A 1AA |
| US |
12345(-6789) |
suffix |
New York 10001 |
| PT |
1234-567 |
prefix |
3080-055 Figueira Da Foz |
| IS |
123 |
prefix |
104 Reykjavík |
| ES |
12345 |
prefix |
28001 Madrid |
| IT |
12345 |
prefix |
00100 Roma |
Phone prefix patterns for country detection
| Prefix |
Country |
| +31 |
NL |
| +32 |
BE |
| +33 |
FR |
| +44 |
GB |
| +49 |
DE |
| +1 |
US/CA |
| etc. |
... |
Use Case
Partner/address imports where:
- Source data has combined city+postal fields
- Country code is missing but can be inferred from phone/postal/city
This would complement the existing clean.zip_code() and clean.city() functions in the feature/production-ready-etl branch.
Problem
Legacy ERP systems often store city and postal code in a single field, leading to mixed data during migrations. This is a very common data quality issue.
Examples encountered in migrations:
"104 Reykjavík"(Icelandic: postal prefix)"75001 Paris"(French: postal prefix)"Amsterdam 1012 AB"(Dutch: postal suffix)"3080-055 Figueira Da Foz"(Portuguese: hyphenated postal)"13014 Kuwait"(Middle East: numeric prefix)"London SW1A 1AA"(UK: alphanumeric suffix)Proposed Solution
1. City/Postal Separator
Add a
separate_city_postal()cleaner function that:2. Country Code Detection/Recovery
When country code is missing, attempt to recover it from other available data:
Country-specific postal patterns to support
1234 AB12341234575001SW1A 1AA12345(-6789)1234-5671231234512345Phone prefix patterns for country detection
Use Case
Partner/address imports where:
This would complement the existing
clean.zip_code()andclean.city()functions in thefeature/production-ready-etlbranch.