Use ascii() instead of repr() to escape non-ASCII characters by assisted-by-ai · Pull Request #39 · Kicksecure/helper-scripts

assisted-by-ai · 2026-04-15T11:46:55Z

Summary

Changed the character display logic in unicode_show.py to use Python's ascii() function instead of repr() to ensure that all non-ASCII characters are properly escaped in the output, preventing suspicious characters from appearing literally in terminal output.

Key Changes

- Modified describe_char() function: Replaced repr(c) with ascii(c) when displaying characters that shouldn't be shown literally
- Added comprehensive test coverage: New test_printable_non_ascii_chars_are_escaped() test that validates escaping of various character types:
  - Accented letters (é)
  - Cyrillic characters
  - Combining marks
  - CJK ideographs
  - Emoji
  - Currency symbols

Implementation Details

The change addresses a critical safety issue: in Python 3, repr() escapes only non-printable characters, so printable non-ASCII characters (letters, homoglyphs, combining marks, CJK, emoji, symbols, etc.) pass through literally. Since the purpose of unicode_show is to safely display and identify suspicious Unicode characters, ascii() is used instead: it always escapes non-ASCII characters to an ASCII-only representation, so they never appear literally in terminal output.
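The difference between the two built-ins can be seen in a short standalone sketch. It is not taken from unicode_show.py; the `samples` dictionary and the `is_ascii_only()` helper are illustrative names chosen here, and the sample characters are only representative picks for each category.

```python
# Standalone illustration of repr() vs ascii(); not the project's code.
# The sample characters are written as escapes so this file itself stays ASCII.
samples = {
    "accented letter": "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
    "Cyrillic homoglyph": "\u0430",   # CYRILLIC SMALL LETTER A
    "combining mark": "\u0301",       # COMBINING ACUTE ACCENT
    "CJK ideograph": "\u4e2d",
    "emoji": "\U0001f600",
    "currency symbol": "\u20ac",      # EURO SIGN
    "zero width space": "\u200b",     # non-printable, so repr() escapes it too
}


def is_ascii_only(text: str) -> bool:
    """Return True if every code point in text is below 128."""
    return all(ord(ch) < 128 for ch in text)


for label, ch in samples.items():
    # repr() keeps printable non-ASCII characters literal; ascii() never does.
    print(
        f"{label}: repr() ASCII-only={is_ascii_only(repr(ch))}, "
        f"ascii() -> {ascii(ch)}"
    )
```

Only the zero width space is escaped by both functions, since it is the one non-printable character in the list; every other sample survives repr() unchanged.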

The test suite verifies that:

- All expected characters are properly escaped
- The output is ASCII-only
- Both stdin and file input modes work correctly

https://claude.ai/code/session_01JiGZC3R3SjVVdNbkUnXjES

describe_char used repr(c) to render suspicious characters in the description line. In Python 3, repr() only escapes characters that are not printable, so printable non-ASCII characters — letters (including Cyrillic/Greek/etc. homoglyphs), CJK, emoji, symbols, and combining marks — are passed through literally. This lets a suspicious character slip into unicode-show's own terminal output, defeating the tool's core purpose: a combining acute accent merges with the adjacent quote, a Cyrillic 'а' still reads as Latin 'a', etc. Use ascii(), which always returns an ASCII-only escaped representation, and add a regression test covering letters, homoglyphs, combining marks, CJK, emoji, and currency symbols.

assisted-by-ai wants to merge 1 commit into Kicksecure:master from assisted-by-ai:claude/unicode-bypass-bugs-BnpRK


2 participants
