80 changes: 80 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,85 @@
# Changelog

## 0.1.3 (2026-06-11)

Bug-fix release from a systematic stress-test of the public API surface. ~30
verified fixes and no new features beyond the index warm-up controls listed
under Performance.

**Resolution correctness.** Dotted abbreviations are no longer misclassified
as missing-value markers: `"U.S.A."` resolves to `country/USA` (it previously
resolved to an unrelated org entity) and `"U.K."` to `country/GBR`, while
genuine null markers (`#N/A`, `--`, `.`) still return no match. Mixed-case
inputs (`"fRaNcE"`, `"SUDan"`) resolve like their standard casings.
Zero-padded ISO numeric codes (`"004"`) now resolve, `to="iso_numeric"` emits
the canonical zero-padded form, and the pycountry-style `numeric` alias works
(`entity("France").numeric` → `"250"`). `snap()` accepts free-text candidate
labels as documented, alongside entity IDs. Punctuation-only inputs (`"."`,
`"?"`) return no match instead of an internal error. `EntityRecord.aliases`
no longer leaks the canonical name or duplicates.
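
A minimal sketch of the two input checks described above, using only the
standard library. It is not resolvekit's implementation: `NULL_MARKERS`,
`is_null_marker`, and `normalize_iso_numeric` are hypothetical names, the
marker set holds only the three markers this entry names, and accepting
one- or two-digit codes is an assumption.

```python
import re
from typing import Optional

# Hypothetical marker set: only the markers named in this changelog entry.
NULL_MARKERS = {"#n/a", "--", "."}


def is_null_marker(text: str) -> bool:
    """Return True for missing-value markers and punctuation-only input."""
    stripped = text.strip()
    if stripped.lower() in NULL_MARKERS:
        return True
    # Input with no letters or digits carries nothing to resolve.
    return not any(ch.isalnum() for ch in stripped)


def normalize_iso_numeric(value: str) -> Optional[str]:
    """Return the zero-padded three-digit form, or None for other input."""
    stripped = value.strip()
    # Assumption: one- and two-digit inputs are padded rather than rejected.
    if re.fullmatch(r"\d{1,3}", stripped):
        return stripped.zfill(3)
    return None
```

Because the null check asks whether any alphanumeric character is present, a
dotted abbreviation such as `"U.S.A."` passes through to resolution while
`"."` and `"?"` do not.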

**Crashes.** `bulk()` no longer crashes on polars Series input without a
pivot; `output="record"` builds primitive records that `to_polars()` accepts.
`ResolutionResult` pickles cleanly.

**Validation and errors.** Enum-like parameters are validated eagerly with
did-you-mean suggestions instead of silently accepting typos: `on_ambiguous`
(`resolve_id`), `on_missing`/`on_error`/`on_ambiguous` (`bulk`), `default_to`
types (`configure`), `confidence_threshold` and `domain` (`parse`),
`name:` language segments, and `ResolutionContext.country`.
`AmbiguousResolutionError` hints are candidate-aware (no longer suggesting
`entity_types=` when the tied candidates share one type) and `str()` previews
the top candidates. `to=<typo>` errors suggest the closest code system
instead of dumping all of them, and the `domain=`-with-auto-routing error no
longer references internals. `from_records()` reports the offending row and
column for empty name cells.
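
One way to get eager validation with did-you-mean suggestions is
`difflib.get_close_matches` from the standard library. The sketch below is
illustrative only: `validate_choice` and the values in `ON_AMBIGUOUS_CHOICES`
are hypothetical, and the library may use a different matcher and a different
exception type.

```python
import difflib

# Hypothetical allowed values; the changelog does not list the real ones.
ON_AMBIGUOUS_CHOICES = ("raise", "first", "none")


def validate_choice(name, value, choices):
    """Return value if it is allowed, else raise ValueError with a suggestion."""
    if value in choices:
        return value
    # get_close_matches ranks candidates by similarity; keep only the best one.
    close = difflib.get_close_matches(str(value), choices, n=1, cutoff=0.6)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(
        f"Invalid {name}={value!r}.{hint} Expected one of: {', '.join(choices)}."
    )
```

Calling a check like this at the top of a function, before any work starts,
is what makes a typo fail immediately instead of being silently accepted.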

**Behavior consistency.** `configure()` no longer clears settings that are
omitted from the call; passing `None` explicitly resets a setting to its
default (`cache_dir`, `default_to`). Mutating a returned result's lists no
longer corrupts the query cache. `as_of=` accepts ISO date strings on
`members_of`/`is_member`/`related`/`within`. `bulk()` pandas output preserves
`None` under pandas 3. `available_entity_types()` returns the fine-grained
types that `entity_types=` accepts. BYOD labels containing
NFKC-compatibility characters (`™`, `№`) now round-trip; existing BYOD disk
caches are rebuilt automatically.
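
The `configure()` behavior above (omitted settings are kept, an explicit
`None` resets to the default) is commonly implemented with a sentinel default.
The following is a sketch of that pattern, not the library's code: `_UNSET`,
`_DEFAULTS`, and the default values shown are assumptions made for
illustration.

```python
_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit None

# Hypothetical defaults, chosen only to make the reset visible.
_DEFAULTS = {"cache_dir": "~/.cache/example", "default_to": "iso3"}
_settings = dict(_DEFAULTS)


def configure(*, cache_dir=_UNSET, default_to=_UNSET):
    """Update only the settings that were passed; None restores the default."""
    for key, value in (("cache_dir", cache_dir), ("default_to", default_to)):
        if value is _UNSET:
            continue  # omitted from the call: leave the current value alone
        _settings[key] = _DEFAULTS[key] if value is None else value
    return dict(_settings)  # return a copy so callers cannot mutate the state
```

Returning a copy rather than the internal dictionary follows the same idea as
the query-cache fix: caller-side mutation cannot reach shared state.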

**API consistency.** Scalar `resolve()`/`resolve_id()` now coerce numeric
input the same way `bulk()` does (`resolve_id(840)` → `country/USA`; integral
floats like `840.0` from numeric dataframe columns coerce cleanly in both
paths), and non-string types raise the `TypeError` the docstring always
promised — `bool` included. `ResolutionContext(country=...)` accepts ISO
alpha-3 alongside alpha-2 (`"USA"` and `"US"` behave identically). The pandas
and polars accessors no longer convert caller mistakes into all-`None`
columns: parameter-validation errors propagate (polars previously garbled
them through `map_batches`), and `on_error` is exposed with the same
`"raise"` default as `bulk()`. `ResolutionResult.reasons`, `.candidates`,
and `.refinement_hints` are tuples now — the documented frozen contract is
real, not advisory. Commonly raised errors (`UnknownCodeSystemError`,
`UnknownOutputError`, `UnknownDomainError`, `OutputMissingError`,
`DataPackNotAvailableError`, `CrosswalkError`, `ExplainNotAvailableError`,
`NoModulesInstalledError`) are importable from top-level `resolvekit`;
`resolvekit.errors` remains the canonical home.
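
The numeric coercion rule can be sketched as follows. `coerce_query` is a
hypothetical helper, not part of resolvekit, and rejecting non-integral floats
such as `1.5` is an assumption; the changelog only states that integers and
integral floats coerce and that `bool` raises `TypeError`.

```python
def coerce_query(value):
    """Coerce a scalar query to str, mirroring the rule described above."""
    # bool is a subclass of int, so it must be rejected before the int check.
    if isinstance(value, bool):
        raise TypeError("query must be str or a number, not bool")
    if isinstance(value, str):
        return value
    if isinstance(value, int):
        return str(value)
    # Integral floats (for example 840.0 from a numeric dataframe column).
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    raise TypeError(f"unsupported query type or value: {value!r}")
```

The order of the checks matters: testing `int` first would let `True` through
as `"1"`.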

**Docs.** Corrected the `snap()` candidate guidance and the
`UnknownOutputError`/`UnknownCodeSystemError` reference entries; refreshed
stale confidence figures in the tutorials; documented that code
auto-detection is case-sensitive by design while `from_system` is
case-insensitive.

**Performance.** The SymSpell typo index is now built in a background daemon
thread during `Resolver` construction (default on), so the build cost no longer
lands on the first query that passes the exact-match tiers. Opt out with
`warm=False` on any constructor (`Resolver.auto(warm=False)`,
`Resolver.from_modules(warm=False)`, etc.) to keep construction fully lazy.
`resolvekit.warm()` and `Resolver.warm()` are new synchronous, idempotent,
thread-safe functions that build all lazy indexes and return when they're ready
— for servers or batch jobs that want deterministic readiness. The large-tier
SymSpell index (706k terms, ~6 s to build on remote-data installs) is now
cached as a locally-generated pickle under `<cache-dir>/compiled/` after its
first build (~150 MB, loads in ~1.4 s on subsequent processes), keyed by the
dictionary files and symspellpy version; existing bundled-only installs are
unaffected.
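
A sketch of the warm-up pattern described above: a daemon thread starts the
build during construction, and an explicit `warm()` call blocks until the
index is ready. `LazyIndex` and `cache_key` are hypothetical names, and the
hashing scheme is an assumption; the changelog states only that the cache is
keyed by the dictionary files and the symspellpy version.

```python
import hashlib
import threading


class LazyIndex:
    """Builds an expensive index once, optionally starting in the background."""

    def __init__(self, build, warm=True):
        self._build = build
        self._lock = threading.Lock()
        self._index = None
        self._ready = False
        if warm:
            # Daemon thread: it never blocks interpreter shutdown.
            threading.Thread(target=self.warm, daemon=True).start()

    def warm(self):
        """Synchronous, idempotent, thread-safe: returns once the index exists."""
        with self._lock:
            if not self._ready:
                self._index = self._build()
                self._ready = True
        return self._index


def cache_key(dictionary_blobs, library_version):
    """Hash the inputs that determine the index so stale caches are not reused."""
    digest = hashlib.sha256()
    digest.update(library_version.encode("utf-8"))
    for blob in dictionary_blobs:
        # Hash each blob separately so blob boundaries affect the key.
        digest.update(hashlib.sha256(blob).digest())
    return digest.hexdigest()
```

Because the background thread and an explicit `warm()` call contend for the
same lock, the build runs at most once regardless of which one gets there
first.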

## 0.1.2 (2026-06-11)

**Fixed.** `download()` crashed on a clean install with "Missing package