feat: location/entity name-resolution accuracy (fix system-token shadowing + fuzzy/noise matching) #147
Merged
added 2 commits
June 1, 2026 02:30
…assification
The exact per-token catalog loop matched the bare system token
("Stanton") against the catalogued system row before reaching the
specific body that follows it, collapsing ~91% of real location
events to their system name (OOC_Stanton_2b_Daymar -> "Stanton").
Defer bare-system tokens to a second pass so the specific
body/place resolves first; a bare system identifier still hits the
system row.
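The two-pass ordering described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `Catalog` struct, its fields, `resolve_exact`, and `sample_catalog` are hypothetical names, the catalogue is reduced to a token-to-name map, and the fuzzy fallback that sits between the two passes is omitted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified catalogue: lowercase token -> display name,
/// plus the set of tokens that name a whole system.
struct Catalog {
    rows: HashMap<String, String>,
    systems: HashSet<String>,
}

impl Catalog {
    fn new(rows: &[(&str, &str)], systems: &[&str]) -> Self {
        Catalog {
            rows: rows
                .iter()
                .map(|(k, v)| (k.to_lowercase(), v.to_string()))
                .collect(),
            systems: systems.iter().map(|s| s.to_lowercase()).collect(),
        }
    }

    /// Pass 1 skips bare system tokens so a specific body can win.
    /// Pass 2 lets a bare system identifier still hit the system row.
    fn resolve_exact(&self, identifier: &str) -> Option<&str> {
        let tokens: Vec<String> = identifier.split('_').map(|t| t.to_lowercase()).collect();

        // Pass 1: specific bodies/places only.
        for token in &tokens {
            if self.systems.contains(token) {
                continue; // deferred to pass 2
            }
            if let Some(name) = self.rows.get(token) {
                return Some(name.as_str());
            }
        }

        // Pass 2: nothing specific matched, so fall back to the system row.
        for token in &tokens {
            if self.systems.contains(token) {
                if let Some(name) = self.rows.get(token) {
                    return Some(name.as_str());
                }
            }
        }
        None
    }
}

/// Tiny sample catalogue used only for illustration.
fn sample_catalog() -> Catalog {
    Catalog::new(&[("stanton", "Stanton"), ("daymar", "Daymar")], &["stanton"])
}
```

With this ordering, `sample_catalog().resolve_exact("OOC_Stanton_2b_Daymar")` yields "Daymar", while a bare `"Stanton"` still resolves to the system row.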
Add a distinctive-token fuzzy fallback (LocationCatalog::fuzzy_match):
idf-weighted token overlap, gated by a rarity anchor (df <= 4,
non-digit, not an affiliation word) plus a system-consistency guard.
Recovers real wiki rows the engine names differently
(Stanton4a_RayariHydro_Kaltag -> "Rayari Kaltag Research Outpost")
while rejecting uncatalogued places that only share an operator word
(RayariHydro_McGarth).
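A rough sketch of how such a gated fuzzy fallback could look, under stated assumptions: the `Row` layout, the tokenizer, the affiliation word list, the exact idf formula (`ln(N/df) + 1`), and the reading of "non-digit" as "contains no digit" are all illustrative guesses. Only the name `LocationCatalog::fuzzy_match` and the `df <= 4` threshold come from the commit message.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical operator/affiliation words that may never act as an anchor.
const AFFILIATION_WORDS: &[&str] = &["rayari", "shubin"];
/// Rarity threshold from the commit message (df <= 4).
const MAX_ANCHOR_DF: usize = 4;

struct Row {
    name: &'static str,
    system: &'static str,
}

struct LocationCatalog {
    rows: Vec<Row>,
    df: HashMap<String, usize>, // document frequency per name token
}

/// Split on non-alphanumerics, camelCase boundaries, and letter->digit boundaries.
fn tokenize(s: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut cur = String::new();
    let mut prev: Option<char> = None;
    for c in s.chars() {
        if !c.is_alphanumeric() {
            if !cur.is_empty() {
                tokens.push(std::mem::take(&mut cur));
            }
            prev = None;
            continue;
        }
        if let Some(p) = prev {
            let camel = p.is_lowercase() && c.is_uppercase();
            let to_digit = p.is_alphabetic() && c.is_ascii_digit();
            if (camel || to_digit) && !cur.is_empty() {
                tokens.push(std::mem::take(&mut cur));
            }
        }
        cur.extend(c.to_lowercase());
        prev = Some(c);
    }
    if !cur.is_empty() {
        tokens.push(cur);
    }
    tokens
}

impl LocationCatalog {
    fn new(rows: Vec<Row>) -> Self {
        let mut df: HashMap<String, usize> = HashMap::new();
        for row in &rows {
            let unique: HashSet<String> = tokenize(row.name).into_iter().collect();
            for token in unique {
                *df.entry(token).or_insert(0) += 1;
            }
        }
        LocationCatalog { rows, df }
    }

    /// Rarity anchor: rare in the catalogue, no digits, not an affiliation word.
    fn is_anchor(&self, token: &str) -> bool {
        let rare = self.df.get(token).map_or(false, |&n| n <= MAX_ANCHOR_DF);
        let has_digit = token.chars().any(|c| c.is_ascii_digit());
        let affiliation = AFFILIATION_WORDS.iter().any(|w| *w == token);
        rare && !has_digit && !affiliation
    }

    fn fuzzy_match(&self, identifier: &str) -> Option<&'static str> {
        let id_tokens: HashSet<String> = tokenize(identifier).into_iter().collect();
        let n = self.rows.len() as f64;
        // System named in the identifier, if any (used by the consistency guard).
        let id_system = self
            .rows
            .iter()
            .map(|r| r.system.to_lowercase())
            .find(|s| id_tokens.contains(s));

        let mut best: Option<(f64, &'static str)> = None;
        for row in &self.rows {
            // System-consistency guard: never match across systems.
            if let Some(sys) = &id_system {
                if *sys != row.system.to_lowercase() {
                    continue;
                }
            }
            let shared: Vec<String> = tokenize(row.name)
                .into_iter()
                .filter(|t| id_tokens.contains(t))
                .collect();
            // Gate: at least one shared token must be a rarity anchor.
            if !shared.iter().any(|t| self.is_anchor(t)) {
                continue;
            }
            // idf-weighted overlap score.
            let score: f64 = shared
                .iter()
                .map(|t| (n / self.df[t] as f64).ln() + 1.0)
                .sum();
            if best.map_or(true, |(s, _)| score > s) {
                best = Some((score, row.name));
            }
        }
        best.map(|(_, name)| name)
    }
}

/// Illustrative catalogue; only the Kaltag row name appears in the source.
fn sample_location_catalog() -> LocationCatalog {
    LocationCatalog::new(vec![
        Row { name: "Rayari Kaltag Research Outpost", system: "Stanton" },
        Row { name: "Rayari Deltana Research Outpost", system: "Stanton" },
        Row { name: "Daymar", system: "Stanton" },
        Row { name: "Ruin Station", system: "Pyro" },
    ])
}
```

In this sketch `Stanton4a_RayariHydro_Kaltag` matches because "kaltag" is a valid anchor, whereas `RayariHydro_McGarth` shares only the affiliation word "rayari" and is rejected.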
Add noise matchers (match_dynamic_marker, match_procedural_node) for
procedural/dynamic engine identifiers (ab_mine/ab_collector, *.socpak
clusters, mission/nav markers) so they get honest generic labels
instead of being title-cased into fake proper-noun places.
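As an illustration of the noise matchers, here is a minimal sketch. The function names come from the commit message, but the signatures, the returned labels, and the mission/nav marker patterns (`mission_marker`, `nav_marker`) are assumptions made for this example; `label_noise` is a hypothetical helper.

```rust
/// Generic label for procedural engine identifiers, or None if the
/// identifier does not look procedural. Labels are illustrative.
fn match_procedural_node(identifier: &str) -> Option<&'static str> {
    let id = identifier.to_ascii_lowercase();
    if id.starts_with("ab_mine") {
        return Some("Mining node");
    }
    if id.starts_with("ab_collector") {
        return Some("Collector node");
    }
    if id.ends_with(".socpak") {
        return Some("Procedural cluster");
    }
    None
}

/// Generic label for dynamic mission/nav markers.
/// The substrings checked here are hypothetical patterns.
fn match_dynamic_marker(identifier: &str) -> Option<&'static str> {
    let id = identifier.to_ascii_lowercase();
    if id.contains("mission_marker") {
        return Some("Mission marker");
    }
    if id.contains("nav_marker") {
        return Some("Navigation marker");
    }
    None
}

/// Try both noise matchers before any title-casing fallback would run.
fn label_noise(identifier: &str) -> Option<&'static str> {
    match_dynamic_marker(identifier).or_else(|| match_procedural_node(identifier))
}
```

Running these checks ahead of the title-casing fallback is what keeps an id such as `ab_mine_*` from being rendered as a proper-noun place.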
Measured on a real 43k-event tray DB: specific-place resolution
8.7% -> 92.7%; +18 distinct locations recovered by fuzzy.
…solution
Add resolveReferenceEntry(): exact case-insensitive lookup, then strip
loaner variant suffixes (_Teach/_loaner) so ARGO_MOLE_Teach resolves to
the catalogued ARGO_MOLE (vehicles ~93% -> ~100%).
For items, skip avatar/structural classes (Default, Head_*, body_*,
Shared_Scalp_*, *LensDisplay*, ...) so the attachment_received firehose
stops rendering body parts as linkable items.
Add isCosmeticItemPort() for port-based suppression.
Wire EntityLink (web) + TrayEntityLink and findEntityInBundles (tray)
through the shared resolver; keep the two mirrors in sync.
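The resolver logic can be sketched like this. The real `resolveReferenceEntry()` lives in the web and tray front ends; this is a Rust rendering for illustration only, and the catalogue shape (lowercase class name to display name), the function signatures, `resolve_item_entry`, and `sample_reference_catalog` are assumptions rather than the PR's API.

```rust
use std::collections::HashMap;

/// Loaner/variant suffixes named in the commit message, lowercased.
const VARIANT_SUFFIXES: &[&str] = &["_teach", "_loaner"];

/// Avatar/structural item classes that should never become links.
/// Covers only the patterns listed in the commit message.
fn is_avatar_or_structural_class(class_name: &str) -> bool {
    let c = class_name.to_ascii_lowercase();
    c == "default"
        || c.starts_with("head_")
        || c.starts_with("body_")
        || c.starts_with("shared_scalp_")
        || c.contains("lensdisplay")
}

/// Exact case-insensitive lookup first, then retry with a variant suffix stripped.
/// `catalog` is assumed to be keyed by lowercase class name.
fn resolve_reference_entry<'a>(
    catalog: &'a HashMap<String, String>,
    class_name: &str,
) -> Option<&'a str> {
    let key = class_name.to_ascii_lowercase();
    if let Some(hit) = catalog.get(&key) {
        return Some(hit.as_str());
    }
    for suffix in VARIANT_SUFFIXES {
        if let Some(base) = key.strip_suffix(*suffix) {
            if let Some(hit) = catalog.get(base) {
                return Some(hit.as_str());
            }
        }
    }
    None
}

/// Item variant: filter avatar/structural noise before resolving.
fn resolve_item_entry<'a>(
    catalog: &'a HashMap<String, String>,
    class_name: &str,
) -> Option<&'a str> {
    if is_avatar_or_structural_class(class_name) {
        return None;
    }
    resolve_reference_entry(catalog, class_name)
}

/// Illustrative catalogue with a single vehicle row.
fn sample_reference_catalog() -> HashMap<String, String> {
    let mut catalog = HashMap::new();
    catalog.insert("argo_mole".to_string(), "ARGO MOLE".to_string());
    catalog
}
```

Trying the exact lookup before stripping keeps any catalogued class that genuinely ends in a variant suffix from being rewritten to its base.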
Summary
Overhauls entity name-resolution accuracy after auditing the real LIVE `Game.log` + tray DB against the live wiki catalogue. The headline is a pre-existing bug: ~91% of location events were collapsing to their bare system name.

🔴 Location system-token shadowing (the big one)
The exact per-token catalog loop in `classify()` matched the leading bare system token (`Stanton`) against the catalogued system row before reaching the specific body, so `OOC_Stanton_2b_Daymar` resolved to "Stanton", not "Daymar". Measured on a real 43k-event tray DB: specific-place resolution 8.7% → 92.7%. Fix: skip bare-system tokens in the exact pass and defer them to a second pass after fuzzy, so a bare system identifier still resolves with full taxonomy.

Distinctive-token fuzzy matcher (`LocationCatalog::fuzzy_match`)
idf-weighted token-overlap fallback, gated by a rarity anchor (df ≤ 4, non-digit, non-affiliation) + a system-consistency guard. Recovers real wiki rows the engine names differently (`Stanton4a_RayariHydro_Kaltag` → "Rayari Kaltag Research Outpost") while rejecting uncatalogued places that only share an operator/affiliation word (`RayariHydro_McGarth`). Measured: +18 distinct locations recovered, 1 low-volume FP (`Pyro4…` → `Pyro3`, 3 events).

Noise classification
`match_dynamic_marker` + `match_procedural_node` give honest generic labels to procedural/dynamic engine ids (`ab_mine_*`, `*.socpak` clusters, mission/nav markers) instead of title-casing them into fake proper-noun places.

Vehicle/item variant-suffix strip + item noise filter (web + tray)
`resolveReferenceEntry()`: exact lookup, then strip loaner suffixes (`_Teach`) so `ARGO_MOLE_Teach` → `ARGO_MOLE` (vehicles ~93% → ~100%); skip avatar/structural item classes (`Default`, `Head_*`, `body_*`, …) so the `attachment_received` firehose stops rendering body parts as linkable items. Mirrored into the tray (`findEntityInBundles` + `TrayEntityLink`).

Test plan
- `starstats-core`: 289 tests (new fuzzy/noise/shadow tests), `cargo fmt --check` + `clippy -D warnings` clean
- `apps/web`: 39 tests + `tsc --noEmit` clean
- `apps/tray-ui`: 170 tests + `tsc --noEmit` clean
- `cargo check` (server + client) green

Out of scope
- Combat events (`actor_death` / `vehicle_destruction`) are absent from `Game.log` at default verbosity (confirmed zero across the live log + 40 archives). That is a game log-CVar matter, not a code issue.
- No migrations: pure query-time resolution, honoring the "derive classification at query time" invariant.
Roadmap-Item: location-entity-name-resolution-accuracy