CI for unicode script property #1804

cmyr · 2025-12-09T16:49:14Z

This adds a small utility binary that checks that Python and Rust compute the same set of Unicode script extensions for every codepoint, and adds a CI step to run it.

This job will fail if we ever end up running different versions of Unicode.

rsheeter · 2025-12-09T17:16:52Z

.github/workflows/rust.yml

+      - uses: getsentry/action-setup-venv@v1.0.0
+        id: venv
+        with:
+          python-version: 3.13.9


We seem to have multiple inconsistent python-version values in this file. Can we use the same one? Do we have to be so specific, or can we say e.g. 3.13 and leave the .x to the setup action?

will open a separate issue to update the fea-rs test code/tests more generally.

rsheeter · 2025-12-09T17:18:32Z

fontbe/src/bin/check_unicode_props.rs

+    scripts = list(unicodeScriptExtensions(cp))
+    print(f"{{cp}}: {{','.join(scripts)}}")
+"#,
+    );


is there an advantage to doing this rather than having a .py file that can be run independently if one is so inclined?

For example, perhaps include_str! a file in resources/scripts?

this was the simplest approach I could think of, felt nice to just have a single file here. Can you think of a reason you might want to run this independently? It's easy enough to split out if a reason arises...
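For readers following along, here is a minimal sketch of the single-file approach being discussed: the Python source lives in a Rust raw string and is filled in with format!, which is why the quoted diff doubles its braces. The function name python_script, its start/end parameters, and the use of fontTools.unicodedata.script_extension are assumptions for illustration, not the PR's actual code.

```rust
/// Build the Python source for one half-open codepoint range.
///
/// Hypothetical helper: `start` and `end` are illustrative parameters,
/// and the Python body is an assumed stand-in for the real script.
fn python_script(start: u32, end: u32) -> String {
    // Inside `format!`, `{{` and `}}` emit literal braces, so the Python
    // f-string placeholders survive while `{start}` and `{end}` are filled in.
    format!(
        r#"
from fontTools.unicodedata import script_extension

for cp in range({start}, {end}):
    scripts = sorted(script_extension(chr(cp)))
    print(f"{{cp:04X}}: {{','.join(scripts)}}")
"#,
        start = start,
        end = end
    )
}

fn main() {
    // Print the generated script; a real tool would hand it to a Python
    // interpreter (for example via `python -c`) instead.
    println!("{}", python_script(0x41, 0x5B));
}
```

Splitting the script into its own file and pulling it in with include_str! would keep it runnable on its own, at the cost of passing the range some other way (for example as command-line arguments) rather than through format!.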

rsheeter · 2025-12-09T17:20:03Z

fontbe/src/bin/check_unicode_props.rs

+        )
+    }
+
+    let stdout = String::from_utf8_lossy(&output.stdout);


Suggest from_utf8(...).expect(...): it shouldn't be bad UTF-8, but if it is we shouldn't ignore it.
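A small sketch of the difference this suggestion is about, using only the standard library. The helper name decode_stdout and the sample bytes are made up for illustration; the point is that the lossy conversion silently substitutes U+FFFD, while the strict one surfaces the problem.

```rust
/// Hypothetical helper: decode captured stdout strictly, panicking with a
/// clear message if the bytes are not valid UTF-8.
fn decode_stdout(bytes: Vec<u8>) -> String {
    String::from_utf8(bytes).expect("python output should be valid UTF-8")
}

fn main() {
    // Valid output decodes unchanged.
    let good = decode_stdout(b"0041: Latn\n".to_vec());
    print!("{good}");

    // 0xFF can never appear in valid UTF-8.
    let bad: Vec<u8> = vec![0x41, 0xFF];

    // The lossy conversion hides the problem behind a replacement character.
    let lossy = String::from_utf8_lossy(&bad);
    println!("lossy: {:?}", lossy);

    // The strict conversion reports it, so `expect` would fail loudly here.
    println!("strict is_err: {}", String::from_utf8(bad).is_err());
}
```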

rsheeter · 2025-12-09T17:21:33Z

fontbe/src/bin/check_unicode_props.rs

+    // Test ALL valid Unicode codepoints (0x0000 to 0x10FFFF, excluding surrogates)
+    //the surrogates are  0xD800..=0xDFFF
+    let rangea = 0x0..0xD7FF;
+    let rangeb = 0xF000..0x10FFFF;


Feels harder to follow than 0x0..0x10FFFF.skip(if surrogate)?

this is about splitting up the work into chunks to send to python, and we only want to send the start/end bound, so we don't need to write every codepoint through stdout.
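To make the chunking idea concrete, here is a hedged sketch of splitting the non-surrogate codepoints into (start, end) bounds so that only the bounds need to cross the process boundary. It is not the PR's implementation: chunk_bounds, all_chunks, and the chunk size are hypothetical. It uses inclusive ranges, under which everything except the surrogates (0xD800..=0xDFFF) is 0x0..=0xD7FF plus 0xE000..=0x10FFFF; these bounds are written out from the Unicode definition and may not match the quoted snippet exactly.

```rust
use std::ops::RangeInclusive;

// All Unicode scalar values: everything except the surrogates 0xD800..=0xDFFF.
const BEFORE_SURROGATES: RangeInclusive<u32> = 0x0000..=0xD7FF;
const AFTER_SURROGATES: RangeInclusive<u32> = 0xE000..=0x10FFFF;

/// Split an inclusive range into inclusive (start, end) pairs that each
/// cover at most `chunk_size` codepoints. Hypothetical helper.
fn chunk_bounds(range: RangeInclusive<u32>, chunk_size: u32) -> Vec<(u32, u32)> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let (first, last) = (*range.start(), *range.end());
    let mut out = Vec::new();
    let mut start = first;
    while start <= last {
        let end = last.min(start.saturating_add(chunk_size - 1));
        out.push((start, end));
        if end == last {
            break;
        }
        start = end + 1;
    }
    out
}

/// Chunk both non-surrogate ranges; no chunk ever straddles the gap.
fn all_chunks(chunk_size: u32) -> Vec<(u32, u32)> {
    let mut chunks = chunk_bounds(BEFORE_SURROGATES, chunk_size);
    chunks.extend(chunk_bounds(AFTER_SURROGATES, chunk_size));
    chunks
}

fn main() {
    // Each pair is all that would need to be sent to the Python side.
    for (start, end) in all_chunks(0x1000).iter().take(3) {
        println!("{start:#06X}..={end:#06X}");
    }
}
```

Chunking each range separately, rather than filtering a single 0x0..=0x10FFFF range, keeps every chunk contiguous, so a pair of bounds fully describes it.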

rsheeter

LGTM with a couple of suggestions

anthrotype · 2025-12-10T14:19:29Z

.github/workflows/rust.yml

+        id: venv
+        with:
+          python-version: 3.13
+          cache-dependency-path: resour/scripts/requirements.txt


is this intentional or a typo?

Suggested change

-          cache-dependency-path: resour/scripts/requirements.txt
+          cache-dependency-path: resources/scripts/requirements.txt

good catch!

cmyr force-pushed the ci-for-unicode-script-property branch from 4e1ee0b to 3492880 Compare December 9, 2025 16:51

rsheeter reviewed Dec 9, 2025

rsheeter approved these changes Dec 9, 2025

cmyr force-pushed the ci-for-unicode-script-property branch 2 times, most recently from 4507cac to 72d327d Compare December 9, 2025 18:24

cmyr mentioned this pull request Dec 9, 2025

Update CI to run fea-rs tests against more recent fonttools/python #1806

Open

anthrotype reviewed Dec 10, 2025

cmyr added 2 commits December 12, 2025 10:55

Add utility for testing unicode properties against python

28b56fb

This will help us be confident that we aren't failing anything as a result of a mismatch in our querying of these properties. This currently just tests unicode script extensions.

[ci] Run 'check_unicode_props' binary in CI

ab7c56b

This should mean that we will fail in CI if our unicode versions don't match.

cmyr force-pushed the ci-for-unicode-script-property branch from 72d327d to ab7c56b Compare December 12, 2025 15:56

cmyr added this pull request to the merge queue Dec 12, 2025

Merged via the queue into main with commit 80461fa Dec 12, 2025
13 checks passed

cmyr deleted the ci-for-unicode-script-property branch December 12, 2025 15:59
