Fix incorrect Content-Length for StringIO with multi-byte characters #7201

veeceey · 2026-02-10T08:35:01Z

Summary

super_len() uses seek/tell to measure the length of file-like objects such as StringIO and BytesIO. However, StringIO.tell() returns the character position, not the byte offset. For strings containing multi-byte UTF-8 characters (e.g. emoji), this produces an incorrect Content-Length header that violates RFC 9110 section 8.6.

For example, io.StringIO("\U0001F4A9") (a single emoji) previously yielded a length of 1 (the character count) instead of 4 (the UTF-8 byte count), so the request advertised Content-Length: 1 while 4 bytes were actually sent.

This is the same class of bug that was fixed for plain str bodies in #6586 -- str is encoded to UTF-8 before measuring, but StringIO was not. This PR makes StringIO handling consistent with str by reading the remaining text, encoding it to UTF-8, and measuring the byte length.
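The mismatch described above can be reproduced with the standard library alone. This is a minimal sketch, assuming CPython, where StringIO.tell() reports a position counted in characters; the variable names are illustrative and not taken from the PR.

```python
import io

# A single emoji: one character, four bytes once encoded as UTF-8.
text = "\U0001F4A9"

buf = io.StringIO(text)
buf.seek(0, io.SEEK_END)      # jump to the end, as a seek/tell length probe would
char_position = buf.tell()    # position measured in characters
buf.seek(0)                   # rewind so the buffer is still readable

byte_length = len(text.encode("utf-8"))  # what actually goes on the wire

print(char_position, byte_length)  # 1 4
```

The two numbers only agree when every character encodes to a single byte, which is why ASCII-only StringIO bodies never exposed the problem.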

Before (body is the single emoji from the example above)

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 1  ✗  (character count, not byte count)

After

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 4  ✓

Changes

src/requests/utils.py: In super_len(), detect io.StringIO and read+encode the remaining text to compute the UTF-8 byte length instead of relying on tell().
tests/test_utils.py: Added test_super_len_stringio_multibyte covering single emoji, mixed content, partially-read StringIO, and position preservation.
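The read-and-encode approach listed above might look roughly like the sketch below. It is not the actual diff from this PR: the helper name stringio_remaining_utf8_len is hypothetical, and the real change lives inside super_len() rather than in a standalone function.

```python
import io


def stringio_remaining_utf8_len(buf: io.StringIO) -> int:
    """Return the UTF-8 byte length of the unread text in buf.

    Hypothetical helper for illustration; the position is restored
    afterwards so the caller can still read the body.
    """
    pos = buf.tell()          # remember where the caller left off
    try:
        remaining = buf.read()  # text from the current position to the end
    finally:
        buf.seek(pos)         # restore the original position
    return len(remaining.encode("utf-8"))


# Partially-read buffer: "a" has been consumed, emoji + "b" remain.
body = io.StringIO("a\U0001F4A9b")
body.read(1)
remaining_bytes = stringio_remaining_utf8_len(body)
print(remaining_bytes, body.tell())  # 5 1
```

One trade-off of this approach is that the remaining text is read and encoded in full just to measure it, so the cost grows with the size of the body.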

Test plan

All existing TestSuperLen tests pass (ASCII StringIO, BytesIO, partially-read files, etc.)
New test verifies correct byte count for multi-byte characters
New test verifies correct byte count for partially-read StringIO
New test verifies file position is preserved after super_len() call

🤖 Generated with Claude Code

StringIO.tell() returns the character position, not the byte offset, so super_len() returned the wrong value for StringIO objects containing multi-byte UTF-8 characters (e.g. emoji). This caused an incorrect Content-Length header that violates RFC 9110 section 8.6.

Read the remaining text and encode it to UTF-8 to measure the true byte length, consistent with how plain str bodies are already handled.

Closes psf#6917

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
