
fix: add surrogate pair sanitization to prevent JSON API errors #1205

Open
konard wants to merge 5 commits into main from issue-1204-13c3cf8b18fb

Contributor

@konard konard commented Jan 30, 2026

Summary

Fixes #1204 — "The request body is not valid JSON: no low surrogate in string"

Root Cause

Claude Code CLI doesn't sanitize tool outputs containing lone/orphaned Unicode surrogates (U+D800-U+DFFF) before sending the JSON request body to the Anthropic API. This is a known upstream bug that remains unfixed as of January 2026.

When tool outputs contain binary data interpreted as text (e.g., compiled Rust output, .af files, mutation testing progress bars), byte sequences that decode into the surrogate range produce lone surrogates. RFC 8259 notes that such strings cannot encode Unicode characters and that software receiving them may behave unpredictably. JavaScript's JSON.stringify() serializes them without throwing, but Anthropic's server-side parser rejects them.
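To make the failure mode concrete, the sketch below builds a string with a lone high surrogate and passes it through JSON.stringify(). The helper name `hasLoneSurrogate` and the sample strings are hypothetical and not part of this PR; the regular expression is one common way to detect unpaired surrogates, not necessarily the one used in the codebase.

```javascript
// Hypothetical helper (not from this PR): true if the string contains a
// surrogate code unit that is not part of a valid high+low pair.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function hasLoneSurrogate(str) {
  return LONE_SURROGATE.test(str);
}

const valid = 'ok \uD83D\uDE00';   // complete pair (U+1F600)
const broken = 'bad \uD83D tail';  // high surrogate with no low surrogate

console.log(hasLoneSurrogate(valid));   // false
console.log(hasLoneSurrogate(broken));  // true

// JSON.stringify() does not throw on the broken string. The serialized body
// still carries the unpaired surrogate, which a strict server-side parser
// can reject.
const body = JSON.stringify({ text: broken });
console.log(body);
```

The point of the sketch is that nothing fails on the client side: the problem only surfaces when the receiving parser validates the string.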

Changes

  • src/lib.mjs: Added exported sanitizeSurrogates() utility that replaces lone surrogates with U+FFFD (Unicode replacement character)
  • src/claude.lib.mjs: Sanitize prompts before passing to Claude CLI, preventing surrogate corruption in the initial request
  • src/interactive-mode.lib.mjs: Added sanitizeSurrogates() to safeJsonStringify() replacer function, ensuring all string values in GitHub PR comments are clean
  • docs/case-studies/issue-1204/: Full case study with timeline, root cause analysis, and upstream references
  • experiments/issue-1204-surrogate-reproduction.mjs: Reproducible demonstration of the problem and fix
  • tests/test-surrogate-sanitization.mjs: 22 unit tests covering all edge cases
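The replacement behavior described for sanitizeSurrogates() can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code in src/lib.mjs: the regular expression and the pass-through of non-string values are guesses at a reasonable implementation.

```javascript
// Sketch only: the real sanitizeSurrogates() in src/lib.mjs may differ.
function sanitizeSurrogates(value) {
  // Assumption of this sketch: non-strings are returned unchanged.
  if (typeof value !== 'string') return value;
  // Match a high surrogate not followed by a low one, or a low surrogate
  // not preceded by a high one, and replace each with U+FFFD.
  return value.replace(
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    '\uFFFD'
  );
}

console.log(sanitizeSurrogates('a\uD83Db') === 'a\uFFFDb');          // true
console.log(sanitizeSurrogates('\uD83D\uDE00') === '\uD83D\uDE00');  // true (valid pair kept)
console.log(sanitizeSurrogates('\uDE00\uD83D') === '\uFFFD\uFFFD');  // true (reversed pair)
```

On runtimes that support it (ES2024, e.g. Node.js 20+), the built-in String.prototype.toWellFormed() performs the same replacement; a hand-written regular expression like the one above also works on older runtimes.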

Limitations

This fix sanitizes content that hive-mind controls (prompts, interactive mode comments). The core issue — tool output corruption within Claude Code's internal conversation history — requires an upstream fix in anthropics/claude-code. We've commented on the upstream issue with our analysis and suggested fix.
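For the content hive-mind does control, a replacer-based approach might look like the sketch below. The names `surrogateReplacer` and `stringifySanitized` and the sample object are hypothetical; the PR's own safeJsonStringify() in src/interactive-mode.lib.mjs may be structured differently.

```javascript
const LONE_SURROGATES =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// JSON.stringify() calls the replacer for every value, including nested
// ones, so each string value is cleaned before it is serialized.
function surrogateReplacer(_key, value) {
  return typeof value === 'string'
    ? value.replace(LONE_SURROGATES, '\uFFFD')
    : value;
}

// Hypothetical wrapper, standing in for a safeJsonStringify()-style helper.
function stringifySanitized(data) {
  return JSON.stringify(data, surrogateReplacer);
}

const comment = {
  body: 'progress \uDC00 bar',
  meta: { tool: 'out\uD83D' },
  count: 1,
};
const json = stringifySanitized(comment);
console.log(JSON.parse(json).meta.tool === 'out\uFFFD'); // true
```

A replacer can only change values, so property names are left as-is in this sketch; if keys can contain tool output, they would need separate handling.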

Test Plan

  • 22 new unit tests for surrogate sanitization (node tests/test-surrogate-sanitization.mjs)
  • 41 existing interactive-mode tests still pass (node tests/test-interactive-mode.mjs)
  • Experiment script demonstrates the fix (node experiments/issue-1204-surrogate-reproduction.mjs)
  • ESLint passes with no errors (npm run lint)
  • Pre-existing test suite passes (1 pre-existing failure in CLAUDE_5_HOUR_SESSION_THRESHOLD unrelated to changes)

This PR was created by the AI issue solver

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #1204
@konard konard self-assigned this Jan 30, 2026
Add sanitizeSurrogates() utility that removes lone/orphaned Unicode
surrogates (U+D800-U+DFFF) from strings before they're used in JSON
serialization or API requests. This prevents the "no low surrogate in
string" error from Anthropic's API when tool outputs contain binary
data interpreted as text.

Applied in:
- lib.mjs: exported sanitizeSurrogates() utility
- claude.lib.mjs: sanitize prompts before passing to Claude CLI
- interactive-mode.lib.mjs: sanitize strings in safeJsonStringify()

Includes case study analysis in docs/case-studies/issue-1204/,
reproducible experiment, and 22 unit tests.

Fixes #1204

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@konard konard changed the title from "[WIP] The request body is not valid JSON: no low surrogate in string: line 1 column 160309 (char 160308)" to "fix: add surrogate pair sanitization to prevent JSON API errors" Jan 30, 2026
@konard konard marked this pull request as ready for review January 30, 2026 23:21
@konard
Contributor Author

konard commented Jan 30, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $6.526134 USD
  • Calculated by Anthropic: $5.170683 USD
  • Difference: $-1.355451 (-20.77%)
    📎 Log file uploaded as Gist (1591KB)
    🔗 View complete solution draft log

The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard konard marked this pull request as draft January 31, 2026 00:22
@konard
Contributor Author

konard commented Jan 31, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-01-31T00:22:01.972Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the working session to finish before providing your feedback.

konard and others added 2 commits January 31, 2026 01:24
Add changeset file documenting the patch-level fix for issue #1204.
This changeset will trigger version bump to 1.14.2 in the next release.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@konard konard marked this pull request as ready for review January 31, 2026 00:30
@konard
Contributor Author

konard commented Jan 31, 2026

✅ PR Ready for Review

All CI checks have passed and the PR is now ready for review.

Summary of Changes

Added changeset file (.changeset/fix-surrogate-pair-sanitization.md):

  • Properly documented this as a patch release (bug fix)
  • Will trigger version bump from 1.14.1 to 1.14.2 on merge

Merged latest main branch:

All tests passing:

  • ✅ ESLint: No errors
  • ✅ Surrogate sanitization tests: 22/22 passed
  • ✅ Interactive mode tests: 41/41 passed
  • ✅ CI checks: All green

Next Steps

This PR can now be merged. Upon merge, the changeset workflow will:

  1. Create/update a "Version Packages" PR with changelog
  2. When that PR is merged, publish v1.14.2 to npm

🤖 This PR is ready for human review and merge

@konard
Contributor Author

konard commented Jan 31, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $1.111471 USD
  • Calculated by Anthropic: $0.697331 USD
  • Difference: $-0.414140 (-37.26%)
    📎 Log file uploaded as Gist (246KB)
    🔗 View complete solution draft log

The working session has now ended; feel free to review and add any feedback on the solution draft.

Development

Successfully merging this pull request may close these issues.

The request body is not valid JSON: no low surrogate in string: line 1 column 160309 (char 160308)

1 participant