fix: non-ASCII / UTF-8 robustness (git filenames, gain truncation, proxy capture) #2155
Open
snwsnwsnw wants to merge 4 commits into
Git escapes non-ASCII path bytes as octal \nnn by default (core.quotepath=true). rtk passed this straight through, so git status / log --name-only / diff --stat showed CJK and other non-ASCII filenames as unreadable escapes. Inject -c core.quotepath=false at the single git_cmd() chokepoint; no effect on ASCII paths.
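A minimal sketch of what injecting the option at a single chokepoint could look like. The name git_cmd() comes from the commit message above, but the signature and body here are assumptions for illustration, not rtk's actual code.

```rust
use std::process::Command;

/// Hypothetical stand-in for rtk's git_cmd(): every git invocation is built
/// here, so the quoting override only has to be added in one place.
fn git_cmd(args: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    // `-c name=value` is a top-level git option, so it has to come before
    // the subcommand (status, log, diff, ...).
    cmd.args(["-c", "core.quotepath=false"]);
    cmd.args(args);
    cmd
}
```

A caller would then write something like git_cmd(&["status", "--short"]).output() and get raw UTF-8 paths instead of octal escapes, without each call site having to remember the flag.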
gain --history and the failure summary sliced command strings by byte index (&cmd[..47] etc). A command containing multibyte UTF-8 (e.g. a non-ASCII search pattern or commit message) could be cut mid-codepoint and panic. Use the existing char-safe utils::truncate().
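To illustrate the failure mode and the fix, here is a sketch of character-based truncation. The real utils::truncate() is not shown in this PR page, so truncate_chars below is a hypothetical equivalent; its name, signature, and "..." suffix are assumptions.

```rust
/// Hypothetical char-safe truncation (illustrative, not rtk's utils::truncate).
/// Limits the result to `max_chars` characters, using "..." as a suffix
/// when the input had to be shortened.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    if max_chars < 3 {
        // No room for the ellipsis; just keep the first few characters.
        return s.chars().take(max_chars).collect();
    }
    // Iterating over chars never splits a multibyte sequence, unlike
    // `&s[..n]`, which panics when byte `n` is not a char boundary.
    let mut out: String = s.chars().take(max_chars - 3).collect();
    out.push_str("...");
    out
}
```

For a string such as "日本語日本語", byte index 4 falls inside the second character, so s.is_char_boundary(4) is false and &s[..4] would panic, whereas truncate_chars(s, 5) returns a valid five-character string.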
The proxy streaming path caps the captured copy at 1 MiB. When the cap lands inside a multibyte UTF-8 sequence, from_utf8_lossy emitted a trailing replacement char into the tracked output. decode_captured() trims an incomplete trailing sequence while keeping lossy behavior for genuinely invalid mid-stream bytes.
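The trimming step described above can be sketched as follows. The function name decode_captured() is from the commit message, but this body and the helper incomplete_tail_start are assumptions written for illustration; the actual implementation may differ.

```rust
/// Sketch of decode_captured(): drop a multibyte sequence that was cut off
/// by the capture cap, then decode the rest lossily.
fn decode_captured(bytes: &[u8]) -> String {
    let cut = incomplete_tail_start(bytes).unwrap_or(bytes.len());
    String::from_utf8_lossy(&bytes[..cut]).into_owned()
}

/// Hypothetical helper: returns the index of a trailing lead byte whose
/// sequence is shorter than its declared length, if there is one.
/// It only checks lengths, not the exact ranges of continuation bytes.
fn incomplete_tail_start(bytes: &[u8]) -> Option<usize> {
    let len = bytes.len();
    // An incomplete sequence is at most 3 bytes long (a 4-byte sequence
    // missing its last byte), so only the last 3 bytes need inspecting.
    for back in 1..=len.min(3) {
        let i = len - back;
        let b = bytes[i];
        if b & 0b1100_0000 == 0b1000_0000 {
            continue; // continuation byte, keep scanning backwards
        }
        let expected = match b {
            0xC2..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF4 => 4,
            _ => return None, // ASCII or an invalid lead byte: leave as-is
        };
        return if back < expected { Some(i) } else { None };
    }
    None
}
```

Invalid bytes in the middle of the buffer are deliberately left to from_utf8_lossy, so they still show up as U+FFFD; only a sequence truncated at the very end is removed.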
2378586 to
69b2f92
Three small, independent fixes for handling non-ASCII (CJK and other multibyte UTF-8) text. Each is in its own commit.
1. git: non-ASCII filenames shown as octal escapes
Git escapes non-ASCII path bytes as octal \nnn by default (core.quotepath=true), and rtk passed that through unchanged. Affects git status, git log --name-only, git diff --stat, etc. Fix injects -c core.quotepath=false at the single git_cmd() chokepoint. No effect on ASCII paths.
2. gain: byte-index slicing can panic on multibyte input
gain --history and the failure summary truncated command strings with &cmd[..47] / &rec.rtk_cmd[..22] / &rec.raw_command[..37]. A command containing multibyte UTF-8 (a non-ASCII search pattern, a non-ASCII commit message, etc.) can land the cut mid-codepoint and panic. Switched to the existing char-safe utils::truncate().
3. proxy: spurious replacement char at the 1 MiB capture cap
The proxy streaming path caps the captured copy at 1 MiB. When the cap lands inside a multibyte sequence, from_utf8_lossy appended a trailing U+FFFD to the tracked output. decode_captured() trims an incomplete trailing sequence while preserving lossy behavior for genuinely invalid mid-stream bytes. (User-facing stdout was already byte-exact; this only affected the captured copy used for tracking.)
Testing
cargo build --release clean. Verified before/after on a repo with CJK filenames and commit messages for #1; gain --history no longer panics for #2.