Add an auto-generated unicode character category file #4605
TheBlueMatt wants to merge 1 commit into lightningdevkit:main
👋 Thanks for assigning @tnull as a reviewer!
1a01b5a added detection of unicode format characters in `PrintableString`, but used a hard-coded table which may eventually become out of date. Here we switch to an auto-generated table, include all `General_Category` `Other` characters, and also ban unallocated code points. Finally, CI validates that the file is kept up to date. Written by Claude
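As a rough illustration of how a generated range table like the one described above might be consulted, here is a minimal sketch. It is not the code from this PR: the table `EXAMPLE_OTHER_RANGES` is a tiny hand-picked excerpt (a few Cc and Cf ranges), and the names `in_ranges` and `sanitize` are hypothetical. Replacing rejected characters with U+FFFD is also an assumption about what `PrintableString` does with them.

```rust
use std::cmp::Ordering;

/// Illustrative excerpt only (NOT the generated file): sorted, non-overlapping,
/// inclusive code point ranges taken from General_Category Cc and Cf.
const EXAMPLE_OTHER_RANGES: &[(u32, u32)] = &[
    (0x0000, 0x001F), // Cc: C0 controls
    (0x007F, 0x009F), // Cc: DEL and C1 controls
    (0x00AD, 0x00AD), // Cf: soft hyphen
    (0x200B, 0x200F), // Cf: zero-width and directional marks
    (0x202A, 0x202E), // Cf: bidi embedding/override controls
    (0xFEFF, 0xFEFF), // Cf: zero-width no-break space (BOM)
];

/// Hypothetical lookup: binary search over a sorted table of inclusive ranges.
fn in_ranges(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(start, end)| {
            if end < cp {
                Ordering::Less // range lies entirely below the code point
            } else if start > cp {
                Ordering::Greater // range lies entirely above the code point
            } else {
                Ordering::Equal // code point falls inside this range
            }
        })
        .is_ok()
}

/// Hypothetical sanitizer: swap every matching character for U+FFFD.
/// (Assumed behavior, used here only to show the lookup in context.)
fn sanitize(s: &str) -> String {
    s.chars()
        .map(|c| {
            if in_ranges(EXAMPLE_OTHER_RANGES, c) {
                char::REPLACEMENT_CHARACTER
            } else {
                c
            }
        })
        .collect()
}
```

The binary search depends on the table being sorted and non-overlapping, which is a property a generator script can guarantee and a CI check can re-verify by regenerating the file and comparing.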
b6f8c03 to bd75483
let is_other = is_unicode_general_category_other(c);
let is_unassigned = is_unicode_general_category_unassigned(c);
let c = if c.is_control() || is_other || is_unassigned {
Nit: `c.is_control()` is now fully redundant. It checks Cc (Control), which is already covered by `is_unicode_general_category_other` (see `0x0000..=0x001F` and `0x007F..=0x009F` in `unicode.rs`). The old code needed it because `is_format_char` only covered Cf, but the new function covers all of Cc / Cf / Cs / Co.
Not a bug (the `||` short-circuits harmlessly), but it's potentially confusing because it suggests `is_other` doesn't handle control characters.
Suggested change:
- let is_other = is_unicode_general_category_other(c);
- let is_unassigned = is_unicode_general_category_unassigned(c);
- let c = if c.is_control() || is_other || is_unassigned {
+ let c = if is_unicode_general_category_other(c) || is_unicode_general_category_unassigned(c) {
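The redundancy claim in the review comment can be checked with a short sketch. It assumes only what the comment states: that the table contains the two ranges `0x0000..=0x001F` and `0x007F..=0x009F`. The helper names `in_cc_ranges` and `first_disagreement` are hypothetical and exist only for this illustration. The sketch scans every valid `char` and looks for one where the standard library's `char::is_control` disagrees with those two ranges.

```rust
/// Hypothetical helper: the two Cc ranges quoted in the review comment.
fn in_cc_ranges(c: char) -> bool {
    matches!(c as u32, 0x0000..=0x001F | 0x007F..=0x009F)
}

/// Returns the first `char` for which `char::is_control` and the two
/// ranges disagree, or `None` if they agree on every valid `char`.
fn first_disagreement() -> Option<char> {
    (0..=char::MAX as u32)
        .filter_map(char::from_u32) // skips the surrogate gap, which is not a valid `char`
        .find(|&c| c.is_control() != in_cc_ranges(c))
}
```

If `first_disagreement()` returns `None`, any table that includes both ranges already rejects everything `c.is_control()` would, so dropping the extra call does not change behavior.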
Review Summary
I've thoroughly reviewed every file and hunk in this diff. The generator logic, generated tables, CI integration, and module wiring all check out. Beyond my prior comment, no new issues found.
Previously flagged (still applicable):
Verification notes:
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4605 +/- ##
==========================================
+ Coverage 86.09% 86.15% +0.06%
==========================================
Files 157 158 +1
Lines 108828 109327 +499
Branches 108828 109327 +499
==========================================
+ Hits 93694 94193 +499
Misses 12519 12519
Partials 2615 2615