escape_chars: add support for nonprintable #201

Open: rjbs wants to merge 1 commit into garu:main from rjbs:escape-nonprint

rjbs commented Jul 3, 2025

Contributor

The current options for escape_chars are not enough for my needs. Lately, I'm dumping strings like this:

domain\x{1b}username\x{1f}mailbox

Why? None of your business! (Well, actually, it's part of the db format in the Cyrus IMAP server.)

Anyway, the nonascii and nonlatin1 escape rules are very permissive. I would say that they're almost never what somebody wants. They always show the NUL byte as \0, but other control characters are passed through, meaning that they mostly become invisible without a hex dumper. Ugh! Also, both nonascii and nonlatin1 are annoying if I am dumping something with CJK, when I want to see 방탄소년단 dumped correctly, but I still want \x18 for CANCEL.

This introduces the nonprintable option for escape_chars, which will escape the 67 Unicode codepoints that are not Print, of which 65 are in the Latin-1 space. This will do much, much better at dealing with data with control characters. This is actually exactly "all the Control characters plus two extremely rare whitespace characters."
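A minimal sketch of the rule described above, not the code in this PR: `escape_nonprintable` is a hypothetical helper that replaces every character not matching `\p{Print}` with a hex escape and leaves everything else, CJK included, untouched. The `\x{..}` output format is an assumption chosen to match the sample string; Data::Printer's real output (for example `\0` for NUL) may differ.

```perl
use strict;
use warnings;
use utf8;    # the source below contains literal Hangul

# Hypothetical helper for illustration only; not Data::Printer's API.
# Escapes each character that does NOT match \p{Print} as \x{hh}.
sub escape_nonprintable {
    my ($str) = @_;
    $str =~ s/(\P{Print})/sprintf('\\x{%02x}', ord $1)/ge;
    return $str;
}

binmode STDOUT, ':encoding(UTF-8)';

# ESC (0x1b) and UNIT SEPARATOR (0x1f) become visible escapes.
print escape_nonprintable("domain\x{1b}username\x{1f}mailbox"), "\n";

# Printable non-ASCII text passes through; CANCEL (0x18) is escaped.
print escape_nonprintable("방탄소년단\x{18}"), "\n";
```

Note that under this rule a tab or newline is also escaped, since both are control characters, while a plain space is left alone.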

rjbs force-pushed the escape-nonprint branch from c7949ab to 5860e50 on July 3, 2025 22:52
rjbs marked this pull request as ready for review on July 4, 2025 19:19
rjbs commented Jul 4, 2025

Contributor Author

I think we can do better even than this, but this is so much more useful to me that I've started here.

I'm going to think about just what I want. But I think it's something like a mapping from character sets to how to escape them. When I dump things, I want \x20 to be shown as a literal space, but \x09 to be shown as a colorized probably. \x18 should be \x18 but I suspect that I want all "format" characters (ones matching /\p{Format}/, like WORD JOINER or ZERO WIDTH SPACE or INVISIBLE TIMES) to be shown as \N{...}.

(Maybe the current patch should use /[\P{Print}\p{Format}]/ but then it needs a better name.)
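A rough sketch of what such a mapping could look like, purely as an illustration: `@rules` and `escape_with_rules` are hypothetical names, the first matching rule wins, and the tab rule just emits a plain `\t` as a placeholder because colorizing is left out here. None of this is in the patch.

```perl
use strict;
use warnings;
use charnames ':full';    # for charnames::viacode

# Hypothetical rule table: [ character class, renderer ]. First match wins.
my @rules = (
    # Format characters (WORD JOINER, ZERO WIDTH SPACE, ...) as \N{NAME}.
    [ qr/\p{Format}/, sub {
        my $name = charnames::viacode(ord $_[0]);
        defined $name ? '\\N{' . $name . '}' : sprintf('\\x{%x}', ord $_[0]);
    } ],
    # Tab: placeholder rendering (assumption); a real version might colorize.
    [ qr/\t/, sub { '\\t' } ],
    # Everything else that is not Print as a hex escape.
    [ qr/\P{Print}/, sub { sprintf('\\x{%02x}', ord $_[0]) } ],
);

sub escape_with_rules {
    my ($str) = @_;
    return join '', map {
        my $ch = $_;
        my ($rule) = grep { $ch =~ $_->[0] } @rules;
        $rule ? $rule->[1]->($ch) : $ch;    # unmatched characters pass through
    } split //, $str;
}

print escape_with_rules("a b\tc\x{18}d\x{2060}e"), "\n";
# prints: a b\tc\x{18}d\N{WORD JOINER}e
```

Ordering the rules matters: the Format rule has to come before any broader class, and the tab rule before the non-Print rule, or the more specific rendering is never reached.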

rjbs commented Jul 4, 2025

Contributor Author

https://github.com/garu/Data-Printer/pull/101/files looks like it is at least similar in concept to what I was thinking.

Leont commented Nov 24, 2025

Contributor

I like this PR. It solves a real problem in the simplest way possible.


2 participants