
Classifier gaps: 7 new regex patterns in schemata v0.3.2 #40

@alltheseas

Context

schemata v0.3.2 introduced 9 new regex patterns that the classifier cannot map to a native op, so they fall back to regex. Two of them (`hex_alternation`, `base64_2pad`) are already covered by #38. The remaining 7 need new native ops.

Unclassified patterns

From the classifier-gate CI failure on #39:

| # | Pattern | Proposed op | Description |
|---|---------|-------------|-------------|
| 1 | `^[a-z][a-z0-9]*$` | `identifier` | lowercase identifier |
| 2 | `^[A-Z][a-zA-Z0-9]*$` | `identifier` | PascalCase identifier |
| 3 | `^[a-z][a-z0-9-]*$` | `identifier` | lowercase kebab identifier |
| 4 | `^!?[a-z][a-z0-9]*$` | `identifier` | optional `!` prefix + lowercase |
| 5 | `^!?[0-9]+$` | `optional_prefix_digits` | optional `!` prefix + digits |
| 6 | `^[a-z_]+( [a-z_]+)*$` | `space_separated_charset` | space-separated lowercase+underscore tokens |
| 7 | `^[A-Za-z][A-Za-z0-9+.-]*://` | `uri_scheme` | URI scheme prefix (no end anchor) |
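Any new native op has to accept exactly the strings its regex accepts. A minimal differential-check sketch, in Python purely for illustration (the classifier's real language and test setup are not described in this issue), compares a candidate op against the regex over sample inputs. It assumes the patterns follow ECMA-262 semantics, as JSON Schema `pattern` values do; because Python's `$` also matches just before a trailing newline, the hypothetical `regex_oracle` helper rewrites a final `$` as `\Z`. The names `PATTERNS`, `regex_oracle`, and `mismatches` are illustrative only.

```python
import re
from typing import Callable, Iterable

# The seven unclassified patterns, keyed by their number in the table above.
PATTERNS: dict[int, str] = {
    1: r"^[a-z][a-z0-9]*$",
    2: r"^[A-Z][a-zA-Z0-9]*$",
    3: r"^[a-z][a-z0-9-]*$",
    4: r"^!?[a-z][a-z0-9]*$",
    5: r"^!?[0-9]+$",
    6: r"^[a-z_]+( [a-z_]+)*$",
    7: r"^[A-Za-z][A-Za-z0-9+.-]*://",
}


def regex_oracle(pattern: str) -> Callable[[str], bool]:
    """Reference matcher for one pattern.

    Python's `$` also matches just before a trailing newline; ECMA-262 `$`
    (without the m flag) does not. Rewriting a final `$` as `\\Z` makes the
    oracle reject "abc\\n" the way an ECMA-262 engine would. Assumes a final
    `$` is always an anchor, which holds for these seven patterns.
    """
    if pattern.endswith("$"):
        pattern = pattern[:-1] + r"\Z"
    compiled = re.compile(pattern)
    return lambda s: compiled.search(s) is not None


def mismatches(
    pattern: str, native: Callable[[str], bool], inputs: Iterable[str]
) -> list[str]:
    """Return the inputs on which a candidate native op disagrees with the regex."""
    oracle = regex_oracle(pattern)
    return [s for s in inputs if oracle(s) != native(s)]
```

For example, a deliberately naive candidate for pattern 5 is caught on two inputs: `mismatches(PATTERNS[5], lambda s: s.lstrip("!").isdigit(), ["!12", "!!1", "²"])` returns `["!!1", "²"]`, because `lstrip` removes more than one `!` and `str.isdigit()` accepts non-ASCII digits.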

Approach

Patterns 1-4 share a common shape: `^[optional_prefix][first_char_class][rest_char_class]*$`. A single `identifier` op with configurable fields (optional prefix, first-character class, rest-character class) could cover all four.
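One way the configurable `identifier` op could look, as a hedged sketch: a hypothetical `Identifier` class with `first`, `rest`, and `optional_prefix` fields (these field names are assumptions, not the classifier's actual API), instantiated once per pattern. Python is used only for illustration. The match tries the body both with and without the prefix, so the prefix behaves like the regex `?` even if a prefix character also appears in the first-character class.

```python
import string
from dataclasses import dataclass

LOWER = frozenset(string.ascii_lowercase)  # [a-z]
UPPER = frozenset(string.ascii_uppercase)  # [A-Z]
DIGITS = frozenset(string.digits)          # [0-9]


@dataclass(frozen=True)
class Identifier:
    """Hypothetical `identifier` op: ^(prefix)?[first][rest]*$ over ASCII classes."""

    first: frozenset           # allowed first character of the body
    rest: frozenset            # allowed characters after the first
    optional_prefix: str = ""  # e.g. "!"; empty string means no prefix

    def _body_matches(self, s: str) -> bool:
        # [first][rest]* : one character from `first`, then zero or more from `rest`.
        return s != "" and s[0] in self.first and all(c in self.rest for c in s[1:])

    def matches(self, s: str) -> bool:
        # Try the body without the prefix first, then with one prefix stripped.
        if self._body_matches(s):
            return True
        p = self.optional_prefix
        return p != "" and s.startswith(p) and self._body_matches(s[len(p):])


# Illustrative field values for patterns 1-4 from the table.
PATTERN_1 = Identifier(first=LOWER, rest=LOWER | DIGITS)
PATTERN_2 = Identifier(first=UPPER, rest=LOWER | UPPER | DIGITS)
PATTERN_3 = Identifier(first=LOWER, rest=LOWER | DIGITS | {"-"})
PATTERN_4 = Identifier(first=LOWER, rest=LOWER | DIGITS, optional_prefix="!")
```

With these values, `PATTERN_4.matches("!abc")` and `PATTERN_4.matches("abc")` are both true, while `PATTERN_4.matches("!!abc")` is false, matching `^!?[a-z][a-z0-9]*$`.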

Pattern 5 is similar: it has the same optional `!` prefix as pattern 4, but the body is digits only (`[0-9]+`).
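A minimal sketch of `optional_prefix_digits`, written as a hypothetical Python function (the signature and the `prefix` parameter are assumptions for illustration). It checks ASCII digits explicitly rather than using `str.isdigit()`, which also accepts characters such as `²` that `[0-9]` rejects.

```python
def _ascii_digits(s: str) -> bool:
    # [0-9]+ : at least one character, each in the ASCII range "0".."9".
    return s != "" and all("0" <= c <= "9" for c in s)


def optional_prefix_digits(s: str, prefix: str = "!") -> bool:
    """Hypothetical `optional_prefix_digits` op for ^!?[0-9]+$."""
    if _ascii_digits(s):
        return True
    # Strip at most one prefix, mirroring the single `?` in the regex.
    return prefix != "" and s.startswith(prefix) and _ascii_digits(s[len(prefix):])
```

Since `[0-9]+` is equivalent to `[0-9][0-9]*`, pattern 5 could alternatively be expressed as an `identifier` instance whose first and rest classes are both digits; whether a dedicated op is worth it is a design choice left open here.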

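Pattern 6 extends the existing `space_separated_tokens` concept: one or more tokens made of `[a-z_]` characters, separated by single spaces, with no leading or trailing space.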
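A hedged sketch of `space_separated_charset` as a hypothetical Python function parameterized by the token character set. How it would share code with the existing `space_separated_tokens` op is not shown, since that op's interface is not described in this issue. The sketch assumes the separator contains no character from the charset, which holds for pattern 6.

```python
import string

LOWER_UNDERSCORE = frozenset(string.ascii_lowercase + "_")  # [a-z_]


def space_separated_charset(s: str, charset: frozenset, sep: str = " ") -> bool:
    """Hypothetical `space_separated_charset` op for ^[cs]+( [cs]+)*$.

    Assumes `sep` shares no character with `charset`, so splitting on `sep`
    recovers the tokens unambiguously.
    """
    # "".split(" ") == [""], and doubled, leading, or trailing separators
    # produce empty tokens, so requiring every token to be non-empty
    # rejects all of those cases.
    return all(tok != "" and all(c in charset for c in tok) for tok in s.split(sep))
```

For example, `space_separated_charset("read write", LOWER_UNDERSCORE)` is true, while `"read  write"` (two spaces) and `"read "` are rejected, as the regex requires.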

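Pattern 7 is a URI scheme prefix check: a letter, then any number of letters, digits, `+`, `.`, or `-`, followed by `://`. It has no end anchor, so whatever follows `://` is not checked.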
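A minimal sketch of the `uri_scheme` check as a hypothetical Python function (name and signature are illustrative). It scans the scheme characters and then requires `://` at the first character outside that set. A single greedy scan is enough because `:` is not in the scheme set, so the regex has no other position where `://` could start. The character classes coincide with the RFC 3986 `scheme` grammar, but requiring `://` means schemes like `mailto:` do not match.

```python
import string

SCHEME_FIRST = frozenset(string.ascii_letters)                        # [A-Za-z]
SCHEME_REST = frozenset(string.ascii_letters + string.digits + "+.-")  # [A-Za-z0-9+.-]


def uri_scheme_prefix(s: str) -> bool:
    """Hypothetical `uri_scheme` op for ^[A-Za-z][A-Za-z0-9+.-]*:// (no end anchor)."""
    if s == "" or s[0] not in SCHEME_FIRST:
        return False
    i = 1
    while i < len(s) and s[i] in SCHEME_REST:
        i += 1
    # No end anchor: only the "://" immediately after the scheme is checked.
    return s.startswith("://", i)
```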
