Merged
35 changes: 20 additions & 15 deletions internal/identity/matcher_test.go
@@ -17,8 +17,10 @@ import (
"github.com/InWheelOrg/inwheel-api/pkg/models"
)

- // fakeRepo returns a fixed candidate slice and records the categories it was
- // called with so tests can assert the compat-filter was applied.
+ // fakeRepo holds an unfiltered candidate slice and applies the compat-filter
+ // that real CandidateRepo implementations apply at the database. Tests can
+ // therefore write fixtures that include category-incompatible distractors and
+ // verify the filter excludes them.
type fakeRepo struct {
candidates []models.Place
err error
@@ -27,14 +29,25 @@ type fakeRepo struct {

func (f *fakeRepo) FindCandidates(_ context.Context, _, _, _ float64, cats []models.Category) ([]models.Place, error) {
f.lastCats = cats
- return f.candidates, f.err
+ if f.err != nil {
+ return nil, f.err
+ }
+ allowed := make(map[models.Category]bool, len(cats))
+ for _, c := range cats {
+ allowed[c] = true
+ }
+ out := make([]models.Place, 0, len(f.candidates))
+ for _, p := range f.candidates {
+ if allowed[p.Category] {
+ out = append(out, p)
+ }
+ }
+ return out, nil
}

type fixtureExpected struct {
- Kind string `json:"Kind"`
- MatchedPlaceID string `json:"MatchedPlaceID"`
- MinConfidence float64 `json:"MinConfidence"`
- MaxConfidence float64 `json:"MaxConfidence"`
+ Kind string `json:"Kind"`
+ MatchedPlaceID string `json:"MatchedPlaceID"`
}

type fixture struct {
@@ -214,14 +227,6 @@ func TestMatch_Fixtures(t *testing.T) {
t.Errorf("MatchedPlaceID = %q, want %q", d.MatchedPlaceID, f.Expected.MatchedPlaceID)
pass = false
}
- if f.Expected.MinConfidence > 0 && d.Confidence < f.Expected.MinConfidence {
- t.Errorf("Confidence = %v, want >= %v", d.Confidence, f.Expected.MinConfidence)
- pass = false
- }
- if f.Expected.MaxConfidence > 0 && d.Confidence > f.Expected.MaxConfidence {
- t.Errorf("Confidence = %v, want <= %v", d.Confidence, f.Expected.MaxConfidence)
- pass = false
- }
total++
if pass {
correct++
41 changes: 41 additions & 0 deletions internal/identity/testdata/README.md
@@ -0,0 +1,41 @@
# identity matcher fixtures

`match_fixtures.json` is the source-agnostic regression set for `identity.Match`. Each entry pairs an incoming `identity.Record` with a list of candidate places and the decision the matcher must produce. `TestMatch_Fixtures` in `../matcher_test.go` runs every entry and asserts the result.

The goal is to lock in current matcher behaviour and catch accidental regressions when scoring, normalization, blocking, or thresholds are touched. This file is **not** a tuning corpus — it does not measure aggregate precision/recall against realistic data. Per-source realism fixtures live under `internal/sources/<name>/testdata/` and arrive with each new source.

## Entry format

```json
{
"name": "short label that appears as the subtest name",
"record": { "Name": "...", "Lat": 0.0, "Lng": 0.0, "Category": "...",
"Street": "...", "HouseNumber": "..." },
"candidates": [
{ "id": "p1", "name": "...", "lat": 0.0, "lng": 0.0, "category": "...",
"tags": { "addr:street": "...", "addr:housenumber": "..." } }
],
"expected": { "Kind": "confident|low_confidence|no_match", "MatchedPlaceID": "p1" }
}
```

- `record` fields mirror `identity.Record`. Omitted string fields default to `""`.
- `candidates` are `models.Place` values. The fake repo in the test applies the category compat filter (`identity.Compatible`) before passing them to `Match`, so entries can include incompatible distractors and verify they are excluded.
- `expected.Kind` is required. `expected.MatchedPlaceID` is asserted only when non-empty; omit it for `no_match` entries.
- Confidence values are **not** asserted. Specific scores shift whenever a weight or threshold changes; only the decision band matters for a regression test.
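
As a rough illustration of the entry format, the sketch below decodes one entry with `encoding/json` and shows that omitted string fields come back as `""` and that an empty `MatchedPlaceID` signals "do not assert". The `record`, `candidate`, `expected`, and `entry` types and the `parseEntry` and `assertsPlaceID` helpers are hypothetical stand-ins written for this example; the real test decodes into `identity.Record` and `models.Place`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical mirror types, for illustration only. The real test uses
// identity.Record and models.Place instead of these.
type record struct {
	Name        string
	Lat, Lng    float64
	Category    string
	Street      string
	HouseNumber string
}

type candidate struct {
	ID       string            `json:"id"`
	Name     string            `json:"name"`
	Lat      float64           `json:"lat"`
	Lng      float64           `json:"lng"`
	Category string            `json:"category"`
	Tags     map[string]string `json:"tags"`
}

type expected struct {
	Kind           string `json:"Kind"`
	MatchedPlaceID string `json:"MatchedPlaceID"`
}

type entry struct {
	Name       string      `json:"name"`
	Record     record      `json:"record"`
	Candidates []candidate `json:"candidates"`
	Expected   expected    `json:"expected"`
}

// sample is one entry copied from match_fixtures.json.
const sample = `{
  "name": "boundary: 51 m offset, identical name → no_match (distance score clamps to 0)",
  "record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
  "candidates": [
    {"id": "p1", "name": "Pascal", "lat": 46.462341, "lng": 6.8417, "category": "cafe"}
  ],
  "expected": {"Kind": "no_match"}
}`

// parseEntry decodes a single fixture entry.
func parseEntry(raw []byte) (entry, error) {
	var e entry
	err := json.Unmarshal(raw, &e)
	return e, err
}

// assertsPlaceID reports whether the entry asks for MatchedPlaceID to be checked.
func assertsPlaceID(e entry) bool {
	return e.Expected.MatchedPlaceID != ""
}

func main() {
	e, err := parseEntry([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("kind=%s street=%q assertID=%v\n", e.Expected.Kind, e.Record.Street, assertsPlaceID(e))
}
```

Because `Street` and `HouseNumber` are absent from the sample, they decode to the empty string, and the missing `MatchedPlaceID` means only `Kind` would be compared.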

## Adding a new entry

1. Pick what behaviour the entry pins down — name normalization, distance falloff, compat filter, threshold band, tiebreak, etc. One concern per entry.
2. Compute the expected outcome by hand from the constants in `score.go` (`RadiusM`, `ConfidentThreshold`, `LowConfidenceThreshold`, the three weights). If the outcome depends on tuning being exactly what it is today, that is a signal the entry will be noisy and may need to be rewritten when thresholds change.
3. Keep coordinates in the same neighbourhood as existing entries (around `(46.4628, 6.8417)`). Compute lat offsets as `meters / 111000`; longitude offsets are not used by the current set.
4. Run `go test ./internal/identity/ -run TestMatch_Fixtures -v` and confirm the new entry passes.

## What is covered today

- Confident, low-confidence, and no-match outcomes across coordinate, name, and address signals.
- Distance falloff at 5 m, 25 m, 35 m, 49 m, and beyond `RadiusM`.
- Name normalization: diacritics, business-suffix drop, word reorder, partial overlap.
- Category compat: candidates of an incompatible category are filtered out by blocking.
- Tiebreak: stronger name beats slightly-closer distractor; argmax across multiple candidates.
- Address weight: matching street + housenumber boosts the score; mismatched address does not block a confident match driven by name and distance.
102 changes: 95 additions & 7 deletions internal/identity/testdata/match_fixtures.json
@@ -9,15 +9,15 @@
{"id": "p1", "name": "Café Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe",
"tags": {"addr:street": "Rue du Simplon", "addr:housenumber": "10"}}
],
- "expected": {"Kind": "confident", "MatchedPlaceID": "p1", "MinConfidence": 0.95}
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p1"}
},
{
"name": "low confidence: 25 m away, name match, no address",
"record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
"candidates": [
{"id": "p1", "name": "Pascal", "lat": 46.462575, "lng": 6.8417, "category": "cafe"}
],
- "expected": {"Kind": "low_confidence", "MatchedPlaceID": "p1", "MinConfidence": 0.55, "MaxConfidence": 0.80}
+ "expected": {"Kind": "low_confidence", "MatchedPlaceID": "p1"}
},
{
"name": "no match: no candidates returned",
@@ -40,28 +40,116 @@
{"id": "p1", "name": "Roma", "lat": 46.4628, "lng": 6.8417, "category": "cafe"},
{"id": "p2", "name": "Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe"}
],
- "expected": {"Kind": "confident", "MatchedPlaceID": "p2", "MinConfidence": 0.55}
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p2"}
},
{
"name": "address absent: still confident on strong name + distance",
"record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
"candidates": [
{"id": "p1", "name": "Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe"}
],
- "expected": {"Kind": "confident", "MatchedPlaceID": "p1", "MinConfidence": 0.95}
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p1"}
},
{
"name": "diacritic normalization: Café matches Cafe",
"record": {"Name": "Café Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
"candidates": [
{"id": "p1", "name": "Cafe Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe"}
],
- "expected": {"Kind": "confident", "MatchedPlaceID": "p1", "MinConfidence": 0.95}
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p1"}
},
{
- "name": "category incompatible: fake repo returns empty",
+ "name": "category incompatible: cafe candidate filtered from healthcare record",
"record": {"Name": "Pascal Pharmacy", "Lat": 46.4628, "Lng": 6.8417, "Category": "healthcare"},
- "candidates": [],
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe"}
+ ],
"expected": {"Kind": "no_match"}
+ },
+ {
+ "name": "boundary: 5 m offset, identical name, no address → confident",
+ "record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.462755, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p1"}
+ },
+ {
+ "name": "boundary: 35 m offset, identical name, no address → low_confidence",
+ "record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.462485, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "low_confidence", "MatchedPlaceID": "p1"}
+ },
+ {
+ "name": "boundary: 49 m offset, identical name, no address → no_match (just inside radius, score below floor)",
+ "record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.462359, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "no_match"}
+ },
+ {
+ "name": "boundary: 51 m offset, identical name → no_match (distance score clamps to 0)",
+ "record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.462341, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "no_match"}
+ },
+ {
+ "name": "name variation: business suffix dropped (Pascal Inc → Pascal)",
+ "record": {"Name": "Pascal Inc", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p1"}
+ },
+ {
+ "name": "name variation: word reorder (Mario Pizza ↔ Pizza Mario)",
+ "record": {"Name": "Mario Pizza", "Lat": 46.4628, "Lng": 6.8417, "Category": "restaurant"},
+ "candidates": [
+ {"id": "p1", "name": "Pizza Mario", "lat": 46.4628, "lng": 6.8417, "category": "restaurant"}
+ ],
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p1"}
+ },
+ {
+ "name": "name variation: partial token overlap (Café Pascal Bistro vs Pascal) → low_confidence",
+ "record": {"Name": "Café Pascal Bistro", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "low_confidence", "MatchedPlaceID": "p1"}
+ },
+ {
+ "name": "ambiguity: stronger name signal beats slightly-closer noisier candidate",
+ "record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.462620, "lng": 6.8417, "category": "cafe"},
+ {"id": "p2", "name": "Pascal Cafe", "lat": 46.462755, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "low_confidence", "MatchedPlaceID": "p1"}
+ },
+ {
+ "name": "category distractor: pharmacy filtered out, nearby cafe wins",
+ "record": {"Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe"},
+ "candidates": [
+ {"id": "p1", "name": "Pharmacie Pascal", "lat": 46.4628, "lng": 6.8417, "category": "healthcare"},
+ {"id": "p2", "name": "Pascal", "lat": 46.46271, "lng": 6.8417, "category": "cafe"}
+ ],
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p2"}
+ },
+ {
+ "name": "address mismatch: still confident when name + distance dominate",
+ "record": {
+ "Name": "Pascal", "Lat": 46.4628, "Lng": 6.8417, "Category": "cafe",
+ "Street": "Rue du Simplon", "HouseNumber": "10"
+ },
+ "candidates": [
+ {"id": "p1", "name": "Pascal", "lat": 46.4628, "lng": 6.8417, "category": "cafe",
+ "tags": {"addr:street": "Rue de la Gare", "addr:housenumber": "5"}}
+ ],
+ "expected": {"Kind": "confident", "MatchedPlaceID": "p1"}
}
]