@orangejulius

This is a small extension to our deduplication logic so that periods in names are ignored when comparing records for deduplication.

For example, a query for `3929 St Marks Avenue, Niagara Falls, ON, Canada` returns two duplicate addresses from OpenAddresses. One is sourced from a countrywide dataset and the other from a regional dataset. One has a period after the abbreviation for Saint; the other doesn't.


We should probably evaluate ignoring most or all punctuation, but this fixes a somewhat common case for now.
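
To illustrate the comparison described above, here is a minimal sketch in Python. It is not the code in this PR: the helper names `strip_periods`, `names_match`, and `strip_punctuation` are hypothetical, and the real comparison may apply other normalization steps that are not shown here.

```python
import string

def strip_periods(name: str) -> str:
    """Remove every period from a name (hypothetical helper, not the PR's code)."""
    return name.replace(".", "")

def names_match(a: str, b: str) -> bool:
    """Treat two names as equal if they differ only by periods."""
    return strip_periods(a) == strip_periods(b)

# Possible broader variant: drop all ASCII punctuation, not just periods.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(name: str) -> str:
    """Remove all ASCII punctuation characters from a name."""
    return name.translate(_PUNCTUATION_TABLE)

if __name__ == "__main__":
    print(names_match("3929 St. Marks Avenue", "3929 St Marks Avenue"))
    print(strip_punctuation("St. Mark's Avenue"))
```

Note that the broader variant also removes hyphens and apostrophes, which joins the surrounding characters together. That is one reason ignoring all punctuation would need evaluation before being adopted.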

@orangejulius changed the title from `feat(deduplication): ignore periods when comparing names` to `deduplication: ignore periods when comparing names` on Dec 29, 2025