@kddnewton
Collaborator

Per #3724 (comment), thanks @Earlopain

The Unicode version has been updated upstream, which means new codepoints are now mapped to the alpha/alnum/isupper flags. We need to update our tables to match.

I'm purposefully not adding a version check here, since that would take a large amount of code. We could include different tables depending on a macro (like UNICODE_VERSION) or something to that effect, but the difference has such a minimal impact on the running of the actual parser that I don't think it's necessary.
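
For illustration only, the macro-based alternative described above might look roughly like the sketch below. Everything in it is hypothetical: the `EXAMPLE_TABLE_REVISION` macro, the `example_` names, and the tiny table are assumptions made for this example and are not taken from the actual change. The last range is a made-up placeholder standing in for newly assigned codepoints, not real Unicode data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch selecting which table revision is compiled in. */
#ifndef EXAMPLE_TABLE_REVISION
#define EXAMPLE_TABLE_REVISION 2
#endif

/* An inclusive range of codepoints that carry a given flag. */
typedef struct {
    uint32_t first;
    uint32_t last;
} example_range_t;

/* Deliberately tiny "uppercase" table, sorted by codepoint.
 * A real table would hold many more ranges. */
static const example_range_t example_upper[] = {
    { 0x0041, 0x005A }, /* A-Z */
    { 0x00C0, 0x00D6 }, /* Latin-1 capitals before the multiplication sign */
    { 0x0391, 0x03A1 }, /* Greek capitals Alpha through Rho */
#if EXAMPLE_TABLE_REVISION >= 2
    { 0xF0000, 0xF0001 }, /* placeholder for "newly added" codepoints, not real data */
#endif
};

/* Binary search over the sorted, non-overlapping ranges. */
static bool example_in_ranges(const example_range_t *ranges, size_t count, uint32_t codepoint) {
    size_t low = 0;
    size_t high = count;

    while (low < high) {
        size_t mid = low + (high - low) / 2;

        if (codepoint < ranges[mid].first) {
            high = mid;
        } else if (codepoint > ranges[mid].last) {
            low = mid + 1;
        } else {
            return true;
        }
    }

    return false;
}

/* Returns true if the codepoint is flagged as uppercase in the compiled-in table. */
bool example_is_upper(uint32_t codepoint) {
    size_t count = sizeof(example_upper) / sizeof(example_upper[0]);
    return example_in_ranges(example_upper, count, codepoint);
}
```

Building with `-DEXAMPLE_TABLE_REVISION=1` would drop the placeholder range. Every table would need this kind of conditional for each supported version, which is the extra code this PR avoids.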

@kddnewton kddnewton merged commit 63c6059 into main Nov 30, 2025
64 checks passed
@kddnewton kddnewton deleted the update-unicode branch November 30, 2025 04:23
@Earlopain
Collaborator

Ah, I get it now.

For reference, the parser gem doesn't bother with that and just uses the regex from the runtime Ruby (/[[:upper:]]/ and similar). So it's basically the same as this, and I don't think anyone ever complained.
