fix(cli): add allow_unicode=True and encoding="utf-8" to YAML I/O#1936
mnriem merged 1 commit into github:main
Pull request overview
This PR addresses Unicode handling in YAML serialization/deserialization across the CLI so non-ASCII characters are preserved (not \uXXXX-escaped) and YAML config reads/writes are no longer locale-dependent (notably fixing Windows default-encoding pitfalls).
Changes:
- Add `allow_unicode=True` to relevant `yaml.dump()` calls so YAML output preserves non-ASCII characters.
- Add `encoding="utf-8"` to YAML-related `read_text()`/`write_text()` calls.
- Add a unit test to ensure rendered frontmatter retains Unicode characters.
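To illustrate the first change, here is a minimal sketch of how PyYAML's `yaml.dump()` behaves with and without `allow_unicode=True`. The `frontmatter` dict and its Japanese description are hypothetical sample data, not taken from the repository.

```python
import yaml  # PyYAML

# Hypothetical frontmatter; the description text is illustrative only.
frontmatter = {"description": "日本語の説明"}

# Default behavior: non-ASCII characters are written as \uXXXX escapes.
escaped = yaml.dump(frontmatter)

# With allow_unicode=True the characters are written as-is.
preserved = yaml.dump(frontmatter, allow_unicode=True)

print(escaped)
print(preserved)
```

Both forms load back to the same data, so the change affects readability of the generated files rather than their meaning.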
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `tests/test_extensions.py` | Adds regression coverage ensuring YAML frontmatter rendering doesn’t escape Unicode. |
| `src/specify_cli/presets.py` | Forces UTF-8 when reading preset catalog YAML config. |
| `src/specify_cli/extensions.py` | Forces UTF-8 for extension YAML configs and enables Unicode-preserving YAML dumps for saved config. |
| `src/specify_cli/agents.py` | Enables Unicode-preserving YAML rendering for command frontmatter. |
| `src/specify_cli/__init__.py` | Forces UTF-8 read/write and enables Unicode-preserving dumps for catalog-related CLI config updates. |
Comments suppressed due to low confidence (1)
src/specify_cli/extensions.py:982
- With `encoding="utf-8"`, `config_path.read_text(...)` may raise `UnicodeDecodeError` for non-UTF-8 files. This isn’t currently caught, so the intended `ValidationError` wrapping won’t happen and a raw exception may bubble up. Consider catching `UnicodeError` (or `UnicodeDecodeError`) alongside `yaml.YAMLError`/`OSError`.

        data = yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
    except (yaml.YAMLError, OSError) as e:
        raise ValidationError(
            f"Failed to read catalog config {config_path}: {e}"
        )
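One way the suggestion above could look, as a sketch only: `UnicodeError` is added to the caught exceptions so a non-UTF-8 file is reported through the same wrapper. The `ValidationError` class and the `load_catalog_config` function name here are stand-ins defined for the example, not the project's actual definitions.

```python
from pathlib import Path

import yaml  # PyYAML


class ValidationError(Exception):
    """Stand-in for the project's ValidationError (assumed for this sketch)."""


def load_catalog_config(config_path: Path) -> dict:
    """Read a YAML config as UTF-8 and wrap any read/parse/decode failure."""
    try:
        return yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
    except (yaml.YAMLError, OSError, UnicodeError) as e:
        # UnicodeDecodeError is a subclass of UnicodeError, so files saved
        # in another encoding (e.g. cp932) are wrapped here as well.
        raise ValidationError(
            f"Failed to read catalog config {config_path}: {e}"
        ) from e
```

Catching `UnicodeError` rather than only `UnicodeDecodeError` is a slightly broader choice; either would cover the case described in the comment.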
Force-pushed from 7b0b630 to dc06382.
None of the yaml.dump() calls specify allow_unicode=True, causing non-ASCII characters in extension descriptions to be escaped to \uXXXX sequences in generated .agent.md frontmatter and config files. Add allow_unicode=True to all 6 yaml.dump() call sites, and encoding="utf-8" to all corresponding write_text() and read_text() calls to ensure consistent UTF-8 handling across platforms.
Force-pushed from dc06382 to 18721af.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Apologies, my coding agent ran a force push without my approval. Sorry for the messy history.
Description
None of the `yaml.dump()` calls specify `allow_unicode=True`, causing non-ASCII characters in extension descriptions to be escaped to `\uXXXX` sequences in generated `.agent.md` frontmatter and config files.
Additionally, most `write_text()` and `read_text()` calls for YAML config files lack `encoding="utf-8"`, making them locale-dependent. On Windows (where the default encoding is a locale code page such as cp932 or cp1252), this can cause `UnicodeEncodeError` on write or `UnicodeDecodeError` on read when non-ASCII characters are present.
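The locale dependence can be sketched without Windows by passing the code pages explicitly. In the snippet below, `encoding="cp932"` and `encoding="cp1252"` stand in for what a locale default would pick, and the file name and contents are hypothetical.

```python
import tempfile
from pathlib import Path

text = "description: 日本語の説明\n"  # hypothetical config content

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "extension.yml"

    # A file written under a cp932 default cannot be decoded as UTF-8.
    path.write_text(text, encoding="cp932")
    try:
        path.read_text(encoding="utf-8")
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

    # Under a cp1252 default, Japanese text cannot be encoded at all.
    try:
        path.write_text(text, encoding="cp1252")
        encode_failed = False
    except UnicodeEncodeError:
        encode_failed = True

    # Explicit UTF-8 on both sides round-trips regardless of the locale.
    path.write_text(text, encoding="utf-8")
    round_tripped = path.read_text(encoding="utf-8")

print(decode_failed, encode_failed, round_tripped == text)
```

Pinning `encoding="utf-8"` on every read and write removes the dependence on whichever code page the platform happens to default to.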
Reproduction
1. Put non-ASCII text in the `description` field in `extension.yml`.
2. Run `specify extension add --dev <path>`.
3. Inspect the generated `.agent.md`: the frontmatter contains `\uXXXX` escapes.
Fix
Add `allow_unicode=True` to all `yaml.dump()` calls, and `encoding="utf-8"` to all corresponding `write_text()`/`read_text()` calls.
Testing
- `uv run specify --help`
- `uv sync && uv run pytest`
AI Disclosure
The bug was discovered when installing a Japanese-language extension via `specify extension add` through GitHub Copilot (Claude Opus 4.6). Root cause analysis and fix were also generated by the same agent.