veeceey commented Feb 10, 2026

Summary

Fixes #6917.

super_len() uses seek/tell to measure the length of file-like objects such as StringIO and BytesIO. However, StringIO.tell() returns the character position, not the byte offset. For strings containing multi-byte UTF-8 characters (e.g. emoji), this produces an incorrect Content-Length header that violates RFC 9110 section 8.6.

For example, super_len(io.StringIO("\U0001F4A9")) (a single emoji) previously returned 1 (the character count) instead of 4 (the UTF-8 byte count). The server therefore received a Content-Length: 1 header while 4 bytes were actually sent.
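A minimal way to see the discrepancy using only the standard library (the variable names are illustrative): seek a StringIO to its end and call tell(), as a seek/tell length check does. The result is one position for the emoji, while its UTF-8 encoding is four bytes.

```python
import io

text = "\U0001F4A9"  # one code point, four bytes in UTF-8
buf = io.StringIO(text)

# Measure the way a seek/tell length check does: jump to the end, read the position.
buf.seek(0, io.SEEK_END)
char_position = buf.tell()  # character offset, not a byte offset

# What actually goes over the wire once the text is encoded as UTF-8.
utf8_length = len(text.encode("utf-8"))

print(char_position, utf8_length)  # 1 4
```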

This is the same class of bug that was fixed for plain str bodies in #6586: str bodies are encoded to UTF-8 before they are measured, but StringIO bodies were not. This PR makes StringIO handling consistent with str by reading the remaining text, encoding it to UTF-8, and measuring the byte length.
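The approach can be sketched as a small standalone function. This is an illustration under stated assumptions, not the code in this PR: stringio_utf8_len is a hypothetical helper name, and it covers only the StringIO case. It records the current position, reads the unread text, restores the position, and returns the UTF-8 length of what was read.

```python
import io


def stringio_utf8_len(buf: io.StringIO) -> int:
    """Hypothetical helper (not part of requests): UTF-8 byte length of the
    unread part of ``buf``, leaving the read position unchanged."""
    position = buf.tell()  # for StringIO this is a character offset
    try:
        remaining = buf.read()  # text from the current position to the end
    finally:
        buf.seek(position)  # restore so the body can still be sent from here
    return len(remaining.encode("utf-8"))


# Usage: "hi " is 3 bytes and the emoji is 4 bytes.
buf = io.StringIO("hi \U0001F4A9")
print(stringio_utf8_len(buf))  # 7
buf.read(1)  # consume "h"
print(stringio_utf8_len(buf))  # 6
print(buf.tell())  # 1, position preserved
```

Restoring the position in a finally block keeps a partially read body usable even if read() raises. One trade-off of this sketch, compared with the seek/tell check, is that it materializes the unread text and its encoding in memory once. The strict UTF-8 encode also raises UnicodeEncodeError if the text contains lone surrogates.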

Before

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 1  ✗  (character count, not byte count)

After

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 4  ✓

Changes

  • src/requests/utils.py: In super_len(), detect io.StringIO, then read and encode the remaining text to compute its UTF-8 byte length instead of relying on tell().
  • tests/test_utils.py: Added test_super_len_stringio_multibyte covering single emoji, mixed content, partially-read StringIO, and position preservation.

Test plan

  • All existing TestSuperLen tests pass (ASCII StringIO, BytesIO, partially-read files, etc.)
  • New test verifies correct byte count for multi-byte characters
  • New test verifies correct byte count for partially-read StringIO
  • New test verifies file position is preserved after super_len() call
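A pytest-style sketch of the cases listed above. It is written against a hypothetical stand-in helper (stringio_utf8_len, not part of requests) instead of requests.utils.super_len, so it runs without the patch applied. The test names and inputs are illustrative and are not the PR's actual test code.

```python
import io


def stringio_utf8_len(buf: io.StringIO) -> int:
    # Hypothetical stand-in for the patched StringIO branch of super_len().
    position = buf.tell()
    try:
        remaining = buf.read()
    finally:
        buf.seek(position)
    return len(remaining.encode("utf-8"))


def test_single_emoji():
    assert stringio_utf8_len(io.StringIO("\U0001F4A9")) == 4


def test_mixed_content():
    # "ab" is 2 bytes, "é" (U+00E9) is 2 bytes, the emoji is 4 bytes.
    assert stringio_utf8_len(io.StringIO("ab\u00e9\U0001F4A9")) == 8


def test_partially_read():
    buf = io.StringIO("\U0001F4A9\U0001F4A9")
    buf.read(1)  # consume the first emoji: one character, four bytes
    assert stringio_utf8_len(buf) == 4


def test_position_preserved():
    buf = io.StringIO("x\U0001F4A9y")
    buf.read(1)
    before = buf.tell()
    stringio_utf8_len(buf)
    assert buf.tell() == before
    assert buf.read() == "\U0001F4A9y"
```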

🤖 Generated with Claude Code

StringIO.tell() returns the character position, not the byte offset,
so super_len() returned the wrong value for StringIO objects containing
multi-byte UTF-8 characters (e.g. emoji).  This caused an incorrect
Content-Length header that violates RFC 9110 section 8.6.

Read the remaining text and encode it to UTF-8 to measure the true
byte length, consistent with how plain str bodies are already handled.

Closes psf#6917

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Incorrect Content-Length header with StringIO body
