Context
Discovered during review of the #277 fix (Unicode slug handling).
Description
_slug() strips non-ASCII characters after NFKD normalization, keeping only the ASCII portion. When a concept name contains both ASCII and non-ASCII characters, the non-ASCII part is silently dropped. Two different names that share the same ASCII prefix collide:
_slug("api_世界") # → "api"
_slug("api_你好") # → "api"
This produces identical ConceptNode.id values for semantically different concepts.
The pure-non-ASCII case is already handled by #277's hash fallback — this only affects mixed strings where the ASCII portion is non-empty.
Suggested approach
When the ASCII-only slug is shorter than the original (after normalization), append a short hash suffix to disambiguate:
if len(slug) < len(normalised.strip()):
    slug += "_" + hashlib.sha256(normalised.encode()).hexdigest()[:6]
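This preserves readability (the ASCII prefix survives) while making collisions between mixed names very unlikely. A 6-hex-digit suffix carries only 24 bits, so uniqueness is not strictly guaranteed; a longer suffix lowers the risk further.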
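To make the proposal easier to evaluate, here is a minimal, self-contained sketch of how it might behave. The function name slug_with_suffix and the cleanup step (lowercasing, collapsing non-alphanumerics to "_") are assumptions for illustration only; the real _slug() may differ, and the #277 hash fallback for pure-non-ASCII names is not reproduced here.

```python
import hashlib
import re
import unicodedata


def slug_with_suffix(name: str) -> str:
    """Hypothetical stand-in for _slug() with the proposed hash suffix."""
    normalised = unicodedata.normalize("NFKD", name)
    # Drop everything outside ASCII, as the current behaviour is described.
    ascii_only = normalised.encode("ascii", "ignore").decode("ascii")
    # Assumed cleanup step: lowercase, collapse other characters to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_only.lower()).strip("_")
    # Proposed change: if characters were lost, append a short hash
    # of the full normalised name to disambiguate.
    if len(slug) < len(normalised.strip()):
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:6]
        slug += "_" + digest
    return slug


a = slug_with_suffix("api_世界")
b = slug_with_suffix("api_你好")
plain = slug_with_suffix("api")
print(a, b, plain)
```

With this sketch, both mixed names keep the readable "api" prefix but receive different 6-character suffixes, while the plain name "api" is returned unchanged. Note that the length comparison also fires whenever the assumed cleanup step trims ASCII characters (for example trailing punctuation), so such names would gain a suffix too.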