fix(cli): add allow_unicode=True and encoding="utf-8" to YAML I/O #1936

Merged
mnriem merged 1 commit into github:main from seiya-koji:fix/yaml-dump-allow-unicode
Mar 23, 2026

@seiya-koji
Contributor

Description

None of the yaml.dump() calls specify allow_unicode=True,
causing non-ASCII characters in extension descriptions to be escaped to \uXXXX sequences
in generated .agent.md frontmatter and config files.

Additionally, most write_text() and read_text() calls for YAML config files lack
encoding="utf-8", making them locale-dependent.
On Windows (where the default encoding is typically a legacy code page such as cp932 or cp1252),
this can cause UnicodeEncodeError on write or UnicodeDecodeError on read
when non-ASCII characters are present.
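
The locale dependence described above can be reproduced on any platform by passing a legacy code page explicitly. This is a minimal sketch, assuming cp1252 stands in for the Windows default; the sample text and the file name are illustrative and not taken from the repository.

```python
import tempfile
from pathlib import Path

# Illustrative content with characters outside cp1252.
text = 'description: "日本語の拡張機能"\n'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "extension.yml"

    # Simulate a locale-dependent write by naming the code page explicitly.
    try:
        path.write_text(text, encoding="cp1252")
        write_failed = False
    except UnicodeEncodeError:
        write_failed = True

    # An explicit UTF-8 round trip does not depend on the platform locale.
    path.write_text(text, encoding="utf-8")
    round_trip = path.read_text(encoding="utf-8")

    # Reading UTF-8 bytes with a legacy code page either raises or yields mojibake.
    try:
        misread = path.read_text(encoding="cp1252")
    except UnicodeDecodeError:
        misread = None
```

Either outcome of the last step is a failure from the user's point of view, which is why the read side needs the explicit encoding as much as the write side does.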

# Before (escaped)
description: "Pr\u00FCfe Konformit\u00E4t der Implementierung"

# After (fixed)
description: "Prüfe Konformität der Implementierung"

Reproduction

  1. Create an extension with non-ASCII description in extension.yml
  2. Run specify extension add --dev <path>
  3. Inspect generated .agent.md — frontmatter contains \uXXXX escapes

Fix

Add allow_unicode=True to all yaml.dump() calls, and encoding="utf-8" to all
corresponding write_text() / read_text() calls.
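
The pattern behind the fix can be sketched as follows. This is not the code from the PR: the dictionary, the file name `config.yml`, and the variable names are assumptions chosen for illustration. Only `yaml.dump(..., allow_unicode=True)` and the `encoding="utf-8"` arguments mirror what the description states.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML

data = {"description": "Prüfe Konformität der Implementierung"}

# Default behaviour: non-ASCII characters are written as escape sequences.
escaped = yaml.dump(data)

# With allow_unicode=True the characters are emitted as-is.
preserved = yaml.dump(data, allow_unicode=True)

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.yml"  # hypothetical file name
    # Explicit UTF-8 on both sides keeps the round trip locale-independent.
    config_path.write_text(preserved, encoding="utf-8")
    loaded = yaml.safe_load(config_path.read_text(encoding="utf-8"))
```

Both forms load back to the same data, so the escaped output is not invalid YAML. The change affects readability of the generated files and, together with the explicit encoding, whether the write succeeds at all on a non-UTF-8 locale.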

Testing

  • Tested locally with uv run specify --help
  • Ran existing tests with uv sync && uv run pytest
  • Tested with a sample project (if applicable)

AI Disclosure

  • I did not use AI assistance for this contribution
  • I did use AI assistance (describe below)

The bug was discovered when installing a Japanese-language extension via specify extension add
through GitHub Copilot (Claude Opus 4.6).
Root cause analysis and fix were also generated by the same agent.

Copilot AI review requested due to automatic review settings March 23, 2026 10:08
@seiya-koji seiya-koji requested a review from mnriem as a code owner March 23, 2026 10:08
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses Unicode handling in YAML serialization/deserialization across the CLI so non-ASCII characters are preserved (not \uXXXX-escaped) and YAML config reads/writes are no longer locale-dependent (notably fixing Windows default-encoding pitfalls).

Changes:

  • Add allow_unicode=True to relevant yaml.dump() calls so YAML output preserves non-ASCII characters.
  • Add encoding="utf-8" to YAML-related read_text() / write_text() calls.
  • Add a unit test to ensure rendered frontmatter retains Unicode characters.
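
A regression test of the kind listed above might look like the following sketch. The helper `render_frontmatter` is hypothetical and only stands in for whatever the project uses to render command frontmatter; the actual test added in tests/test_extensions.py is not shown on this page.

```python
import yaml  # PyYAML


def render_frontmatter(metadata: dict) -> str:
    """Hypothetical helper: render metadata as a YAML frontmatter block."""
    body = yaml.dump(metadata, allow_unicode=True, sort_keys=False)
    return f"---\n{body}---\n"


def test_frontmatter_preserves_unicode() -> None:
    rendered = render_frontmatter(
        {"description": "Prüfe Konformität der Implementierung"}
    )
    # The literal characters must survive rendering ...
    assert "Prüfe Konformität" in rendered
    # ... and no backslash escape sequences may appear in the output.
    assert "\\u" not in rendered
    assert "\\x" not in rendered


# Run directly so the sketch works without a test runner.
test_frontmatter_preserves_unicode()
```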

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_extensions.py | Adds regression coverage ensuring YAML frontmatter rendering doesn’t escape Unicode. |
| src/specify_cli/presets.py | Forces UTF-8 when reading preset catalog YAML config. |
| src/specify_cli/extensions.py | Forces UTF-8 for extension YAML configs and enables Unicode-preserving YAML dumps for saved config. |
| src/specify_cli/agents.py | Enables Unicode-preserving YAML rendering for command frontmatter. |
| src/specify_cli/__init__.py | Forces UTF-8 read/write and enables Unicode-preserving dumps for catalog-related CLI config updates. |

Comments suppressed due to low confidence (1)

src/specify_cli/extensions.py:982

  • With encoding="utf-8", config_path.read_text(...) may raise UnicodeDecodeError for non-UTF-8 files. This isn’t currently caught, so the intended ValidationError wrapping won’t happen and a raw exception may bubble up. Consider catching UnicodeError (or UnicodeDecodeError) alongside yaml.YAMLError/OSError.
            data = yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
        except (yaml.YAMLError, OSError) as e:
            raise ValidationError(
                f"Failed to read catalog config {config_path}: {e}"
            )
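
One way to act on this suggestion is sketched below. It assumes a stand-in `ValidationError` class and a hypothetical function name `load_catalog_config`; neither is copied from the repository. The relevant point is that `UnicodeDecodeError` derives from `ValueError`, not `OSError`, so the existing `except` clause would not catch it.

```python
from pathlib import Path

import yaml  # PyYAML


class ValidationError(Exception):
    """Stand-in for the project's ValidationError (assumed here)."""


def load_catalog_config(config_path: Path) -> dict:
    """Load a YAML config, wrapping read, decode, and parse failures."""
    try:
        return yaml.safe_load(config_path.read_text(encoding="utf-8")) or {}
    except (yaml.YAMLError, OSError, UnicodeError) as e:
        # UnicodeDecodeError is a UnicodeError (a ValueError subclass),
        # so it has to be listed explicitly next to OSError.
        raise ValidationError(
            f"Failed to read catalog config {config_path}: {e}"
        ) from e
```

With this shape, a config file saved in a non-UTF-8 encoding surfaces as the same `ValidationError` as a malformed or unreadable file, instead of a raw traceback.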

@seiya-koji seiya-koji force-pushed the fix/yaml-dump-allow-unicode branch from 7b0b630 to dc06382 Compare March 23, 2026 10:16
None of the yaml.dump() calls specify allow_unicode=True, causing
non-ASCII characters in extension descriptions to be escaped to
\uXXXX sequences in generated .agent.md frontmatter and config files.

Add allow_unicode=True to all 6 yaml.dump() call sites, and
encoding="utf-8" to all corresponding write_text() and read_text()
calls to ensure consistent UTF-8 handling across platforms.
@seiya-koji seiya-koji force-pushed the fix/yaml-dump-allow-unicode branch from dc06382 to 18721af Compare March 23, 2026 10:20
Copilot AI review requested due to automatic review settings March 23, 2026 10:20
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


@seiya-koji
Contributor Author

Apologies — my coding agent ran force push without my approval. Sorry for the messy history.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


@mnriem mnriem merged commit a351c82 into github:main Mar 23, 2026
15 of 16 checks passed
@seiya-koji seiya-koji deleted the fix/yaml-dump-allow-unicode branch March 23, 2026 14:07
jonasbokim added a commit to Jonas-Construction-Software/jonas-spec-kit-dev that referenced this pull request Mar 26, 2026
…nc-main-2026-03-25

* upstream/main: (90 commits)
  fix(ps1): replace null-conditional operator for PowerShell 5.1 compatibility (github#1975)
  chore: bump version to 0.4.2 (github#1973)
  feat: Auto-register ai-skills for extensions whenever applicable (github#1840)
  docs: add manual testing guide for slash command validation (github#1955)
  Add AIDE, Extensify, and Presetify to community extensions (github#1961)
  docs: add community presets section to main README (github#1960)
  docs: move community extensions table to main README for discoverability (github#1959)
  docs(readme): consolidate Community Friends sections and fix ToC anchors (github#1958)
  fix(commands): rename NFR references to success criteria in analyze and clarify (github#1935)
  Add Community Friends section to README (github#1956)
  docs: add Community Friends section with Spec Kit Assistant VS Code extension (github#1944)
  chore: bump version to 0.4.1 (github#1953)
  Add checkpoint extension (github#1947)
  fix(scripts): prioritize .specify over git for repo root detection (github#1933)
  docs: add AIDE extension demo to community projects (github#1943)
  fix(templates): add missing Assumptions section to spec template (github#1939)
  chore: bump version to 0.4.0 (github#1937)
  fix(cli): add allow_unicode=True and encoding="utf-8" to YAML I/O (github#1936)
  fix(codex): native skills fallback refresh + legacy prompt suppression (github#1930)
  feat(cli): embed core pack in wheel for offline/air-gapped deployment (github#1803)
  ...