-
Notifications
You must be signed in to change notification settings - Fork 17
CI for unicode script property #1804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4e1ee0b to
3492880
Compare
.github/workflows/rust.yml
Outdated
| - uses: getsentry/action-setup-venv@v1.0.0 | ||
| id: venv | ||
| with: | ||
| python-version: 3.13.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we seem to have multiple inconsistent python-version in this file. Can we use the same one? Do we have to be so specific or can we say e.g. 3.13 and leave .x to setup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will open a separate issue to update the fea-rs test code/tests more generally.
| scripts = list(unicodeScriptExtensions(cp)) | ||
| print(f"{{cp}}: {{','.join(scripts)}}") | ||
| "#, | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there an advantage to doing this rather than having a .py file that can be run independently if one is so inclined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, perhaps include_str a file in resources/scripts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was the simplest approach I could think of, felt nice to just have a single file here. Can you think of a reason you might want to run this independently? It's easy enough to split out if a reason arises...
| ) | ||
| } | ||
|
|
||
| let stdout = String::from_utf8_lossy(&output.stdout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest from_utf8.expect, it shouldn't be bad utf8 but if it is we shouldn't ignore it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call!
| // Test ALL valid Unicode codepoints (0x0000 to 0x10FFFF, excluding surrogates) | ||
| //the surrogates are 0xD800..=0xDFFF | ||
| let rangea = 0x0..0xD7FF; | ||
| let rangeb = 0xF000..0x10FFFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels harder to follow than 0x0..0x10FFFF.skip(if surrogate)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is about splitting up the work into chunks to send to python, and we only want to send the start/end bound, so we don't need to write every codepoint through stdout.
rsheeter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a couple of suggestions
4507cac to
72d327d
Compare
.github/workflows/rust.yml
Outdated
| id: venv | ||
| with: | ||
| python-version: 3.13 | ||
| cache-dependency-path: resour/scripts/requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intentional or a typo?
| cache-dependency-path: resour/scripts/requirements.txt | |
| cache-dependency-path: resources/scripts/requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
This will help us be confident that we aren't failing anything as a result of a mismatch in our querying of these properties. This currently just tests unicode script extensions.
This should mean that we will fail in CI if our unicode versions don't match.
72d327d to
ab7c56b
Compare
This adds a little util binary that checks that python and rust are computing the same set of unicode script extensions for the full set of codepoints, and then adds a CI step to run that.
This job will fail if ever we are running different versions of unicode.