Add slugify safe builtin for identifier-shaped expressions

## Motivation

Trans-specs that emit identifier-shaped slots from human-authored source labels routinely need to sanitize names: replace whitespace and punctuation, fold case, and ensure the leading character is valid. Concrete driver: schema-automator's EML importer (https://github.com/linkml/schema-automator/issues/208), where EML `<entityName>` and `<attributeName>` fields contain spaces, punctuation, and mixed case, but the target LinkML `class_definition.name` / `slot_definition.name` values must be valid identifiers.

The current eval helpers (`replace`, `lower`, etc., from 0f803b6) can approximate slugify via 6-10 chained calls, but the chain is fragile and obscures intent. Regex was deliberately kept out for ReDoS reasons; `slugify` is bounded-time string manipulation and doesn't open that surface.

## Proposed direction

Add `slugify(s, separator='_')` to the safe builtin set in `eval_utils.py`. Default behavior:

- ASCII-fold (Unicode → ASCII transliteration)
- Lowercase
- Collapse non-alphanumeric runs to the separator
- Strip leading and trailing separators
- Ensure leading character is non-digit (prepend separator if needed)
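
The steps above can be sketched in plain Python without regex. This is a minimal illustration, not the implementation in `eval_utils.py`: the signature mirrors the proposal, but the ASCII-fold here is an assumption (NFKD decomposition followed by dropping non-ASCII characters), which is cruder than true transliteration and silently drops characters such as `ß` that have no decomposition. It also returns `None` when nothing usable remains, as proposed later in this issue.

```python
import unicodedata
from typing import List, Optional


def slugify(s: Optional[str], separator: str = "_") -> Optional[str]:
    """Hypothetical sketch of the proposed builtin (single pass, no regex)."""
    if s is None:
        return None
    # ASCII-fold (assumed strategy): decompose accents, then drop non-ASCII.
    folded = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    parts: List[str] = []
    current: List[str] = []
    # Lowercase, then split on runs of non-alphanumeric characters.
    for ch in folded.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            parts.append("".join(current))
            current = []
    if current:
        parts.append("".join(current))
    # Nothing extractable: signal "doesn't apply" instead of raising.
    if not parts:
        return None
    # Joining the parts collapses each run to one separator and leaves
    # no leading or trailing separators.
    slug = separator.join(parts)
    # Identifiers must not start with a digit.
    if slug[0].isdigit():
        slug = separator + slug
    return slug


print(slugify("Air Temperature (°C)"))
print(slugify("2nd Sample"))
print(slugify("Café Münster", separator="-"))
```

Every step is linear in the input length, which is the property that keeps this in the bounded-time class.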

Optionally include sibling helpers in the same bounded-time class: `to_snake(s)`, `to_camel(s)`, `to_pascal(s)`. These are often needed in tandem when trans-specs target schemas whose naming conventions differ by location.
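
A rough sketch of how the sibling helpers could share one word splitter, again without regex. The `_words` helper and the choice to return `None` on empty results are assumptions made for illustration; the splitter only breaks at lower-to-upper (or digit-to-upper) boundaries, so acronym runs like `HTTPServer` stay as one word, and leading-digit handling is left out.

```python
from typing import List, Optional


def _words(s: str) -> List[str]:
    """Hypothetical helper: lowercase ASCII alphanumeric words from s."""
    words: List[str] = []
    current: List[str] = []
    prev = ""
    for ch in s:
        if not (ch.isascii() and ch.isalnum()):
            # Any other character ends the current word.
            if current:
                words.append("".join(current))
                current = []
        else:
            # Also break at a camelCase boundary such as "yN" in "entityName".
            if current and ch.isupper() and (prev.islower() or prev.isdigit()):
                words.append("".join(current))
                current = []
            current.append(ch.lower())
        prev = ch
    if current:
        words.append("".join(current))
    return words


def to_snake(s: str) -> Optional[str]:
    words = _words(s)
    return "_".join(words) if words else None


def to_camel(s: str) -> Optional[str]:
    words = _words(s)
    if not words:
        return None
    return words[0] + "".join(w.capitalize() for w in words[1:])


def to_pascal(s: str) -> Optional[str]:
    words = _words(s)
    return "".join(w.capitalize() for w in words) if words else None


print(to_snake("entityName"), to_camel("Entity name"), to_pascal("attribute_label"))
```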

## Return semantics: None on no-extractable-content

`slugify` returns `None` when the input has no extractable identifier content — empty string, all-whitespace, all-punctuation, or input that collapses to empty after sanitization. This matches linkml-map's existing expression-layer convention: `None` is the SQL-style "doesn't apply" signal, propagates through `case()` arms, and composes with `or` for fallback chains:

```yaml
range: { expr: "slugify(attributeName) or slugify(attributeLabel) or 'anonymous'" }
```

If `slugify` raised on empty input, the raise would short-circuit the trans-spec rather than letting the `or` chain do its job. Returning `None` keeps slugify composable with the rest of linkml-map's expression vocabulary.
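
To make the composition concrete, here is a small Python illustration of the same fallback chain. It assumes the expression layer's `or` behaves like Python's (falsy operands fall through to the next one); the `slugify` below is a deliberately tiny stand-in, and the `record` values are made up for the example.

```python
from typing import Optional


def slugify(s: Optional[str]) -> Optional[str]:
    """Tiny stand-in: ASCII alphanumeric words joined by '_', else None."""
    if not s:
        return None
    cleaned = "".join(
        ch.lower() if ch.isascii() and ch.isalnum() else " " for ch in s
    )
    words = cleaned.split()
    return "_".join(words) if words else None


# Hypothetical source record: the name is all punctuation, the label is usable.
record = {"attributeName": " -- ", "attributeLabel": "Water Depth (m)"}

name = (
    slugify(record.get("attributeName"))
    or slugify(record.get("attributeLabel"))
    or "anonymous"
)
print(name)

# With no usable content anywhere, the chain falls through to the literal.
empty = {"attributeName": "", "attributeLabel": None}
fallback = (
    slugify(empty.get("attributeName"))
    or slugify(empty.get("attributeLabel"))
    or "anonymous"
)
print(fallback)
```

Had the first `slugify` call raised on `" -- "`, neither the label nor the literal fallback would ever be consulted.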

## Enforcement of "this slot can't be None" lives at the schema layer

Schemas that require non-empty identifiers — class names, slot names, dictionary keys — enforce that requirement at the schema-derivation layer, not inside slugify. Reporting "this source instance has unusable data for required slot X" with full source provenance is the job of transform-time target validation (#241). slugify stays a simple, total string function; structural requirements live where they belong.

## Implementation location

`slugify` could land in `linkml-runtime`'s utility set if there's a reasonable home there, with linkml-map re-exporting it for the eval namespace. The same normalization is useful in schemasheets, schema-automator, and other tooling that today each re-roll their own variants. If linkml-runtime has no obvious home, ship it in linkml-map as a standalone module with a clear path to upstreaming later.

## Open questions

- **Separator default.** `_` vs `-`. LinkML identifier conventions skew snake-case → `_` default. Configurable per-call.
- **Unicode policy.** ASCII-fold by default (predictable, identifier-safe) with `slugify(s, allow_unicode=True)` opt-in for cases where Unicode identifiers are wanted.
- **Sibling helpers.** Bundle `to_snake` / `to_camel` / `to_pascal` here, or scope this tight to `slugify` and follow up separately?

## References / contrasts

- `0f803b6` — safe builtin registry that this extends
- #241 — handles "must-not-be-None" enforcement at the schema layer
- schema-automator's EML importer (https://github.com/linkml/schema-automator/issues/208) — concrete consumer

