Commit ac809be

20250929_00-Docs
- Updated the cleanup-text docs with a "TODO" item to normalize some of the typographic glyphs as an option.
1 parent a1633aa commit ac809be

2 files changed

+12
-9
lines changed

.pre-commit-config.yaml

Lines changed: 0 additions & 9 deletions
This file was deleted.

docs/cleanup-text.md

Lines changed: 12 additions & 0 deletions
@@ -85,6 +85,18 @@ tests/test_all.sh --help
 - Use the `-i` flag if you need to preserve invisible Unicode characters for special use cases.
 - Use the `-n` flag if you need to suppress the final newline (rare).
 
+## TODO (Alignment & Bracket Normalization)
+
+- Add optional folding of fullwidth square brackets to ASCII in `unicodefix.transforms.clean_text`:
+  - Map the fullwidth opening and closing brackets to `[` and `]` under a new flag (e.g., `preserve_fullwidth_brackets: bool = False`).
+  - Preserve the dagger glyph `†` and inline spans (e.g., `†L147-L156`).
+  - Rationale: fixed-width terminal tables and monospace column layouts can drift when fullwidth characters are present.
+- Consider expanding flags to preserve typographic punctuation while still removing invisible/control chars:
+  - Existing: `preserve_quotes`, `preserve_dashes`
+  - Proposed: `preserve_fullwidth_brackets`, `preserve_fullwidth_variants`
+- Provide a helper for ASCII-only display normalization for terminals while retaining the original text for auditing/search.
+- Document patterns: (1) global pre-clean before render, (2) render-time folding behind a toggle.
+
 ## Changelog
 
 See `CHANGELOG.md` for a summary of recent changes.
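
The TODO added in this commit describes behaviour that does not exist yet, so the following is only a minimal sketch of how the proposed folding and display helper might look. It does not use or reproduce `unicodefix.transforms.clean_text`; the names `fold_fullwidth_brackets`, `DisplayText`, and `for_display` are hypothetical, and the set of code points treated as "fullwidth brackets" (U+FF3B/U+FF3D and U+3010/U+3011) is an assumption. Only the flag name `preserve_fullwidth_brackets` and its default come from the TODO.

```python
from dataclasses import dataclass

# Assumed mapping: which code points count as "fullwidth brackets" is a guess,
# not something the TODO specifies.
_BRACKET_FOLD = str.maketrans({
    "\uFF3B": "[",  # FULLWIDTH LEFT SQUARE BRACKET
    "\uFF3D": "]",  # FULLWIDTH RIGHT SQUARE BRACKET
    "\u3010": "[",  # LEFT BLACK LENTICULAR BRACKET
    "\u3011": "]",  # RIGHT BLACK LENTICULAR BRACKET
})


def fold_fullwidth_brackets(text: str, preserve_fullwidth_brackets: bool = False) -> str:
    """Hypothetical helper: fold bracket glyphs to ASCII unless preserved.

    The dagger glyph (U+2020) is not in the table, so it passes through.
    """
    if preserve_fullwidth_brackets:
        return text
    return text.translate(_BRACKET_FOLD)


@dataclass(frozen=True)
class DisplayText:
    original: str  # kept verbatim for auditing/search
    display: str   # folded form intended for fixed-width terminal output


def for_display(text: str, fold: bool = True) -> DisplayText:
    """Pattern (2) from the TODO: render-time folding behind a toggle."""
    display = fold_fullwidth_brackets(text, preserve_fullwidth_brackets=not fold)
    return DisplayText(original=text, display=display)
```

Pattern (1), a global pre-clean before render, would amount to calling `fold_fullwidth_brackets` once on the input and discarding the original; `for_display` instead keeps both forms so that search and auditing still see the unmodified text.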
