Fix Unicode surrogate encoding errors in email sending #2106

Merged
duckduckgrayduck merged 2 commits into MuckRock:master from heathdutton:fix-2098-unicode-email-encoding
Jan 29, 2026
@heathdutton
Contributor

Fixes #2098

The foia_send_email task fails with a UnicodeEncodeError when the email subject or body contains lone surrogate code points (U+D800 to U+DFFF), which cannot be encoded as UTF-8.

Adds a sanitize_surrogates utility function that replaces invalid surrogates before sending, applied to both subject and body in send_delayed_email.
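
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of what a helper like sanitize_surrogates could look like. The regex approach, the U+FFFD replacement character, and the `replacement` parameter are assumptions for illustration, not the merged implementation.

```python
import re

# Lone surrogate code points (U+D800 to U+DFFF) are legal inside a Python
# str but cannot be encoded as UTF-8.
_SURROGATE_RE = re.compile("[\ud800-\udfff]")


def sanitize_surrogates(text, replacement="\ufffd"):
    """Replace lone surrogates so the text can be encoded as UTF-8.

    Hypothetical sketch: the real helper in muckrock.foia.utils may differ.
    Empty or None input is returned unchanged.
    """
    if not text:
        return text
    return _SURROGATE_RE.sub(replacement, text)


if __name__ == "__main__":
    subject = "Re: records request \ud83d"  # trailing lone high surrogate
    try:
        subject.encode("utf-8")
    except UnicodeEncodeError as exc:
        print("unsanitized subject fails:", exc.reason)
    # After sanitizing, encoding succeeds.
    print(sanitize_surrogates(subject).encode("utf-8"))
```

Characters outside the Basic Multilingual Plane, such as emoji, are single code points in a Python 3 str and are left untouched; only unpaired surrogate code points are replaced.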

@duckduckgrayduck
Contributor

Seeing pylint errors:

************* Module muckrock.foia.tests.test_foia_request
muckrock/foia/tests/test_foia_request.py:715:8: C0415: Import outside toplevel (muckrock.foia.utils.sanitize_surrogates) (import-outside-toplevel)
muckrock/foia/tests/test_foia_request.py:730:8: C0415: Import outside toplevel (muckrock.foia.utils.sanitize_surrogates) (import-outside-toplevel)
muckrock/foia/tests/test_foia_request.py:739:8: C0415: Import outside toplevel (muckrock.foia.utils.sanitize_surrogates) (import-outside-toplevel)

Fixes pylint C0415 (import-outside-toplevel) errors.
@duckduckgrayduck merged commit 75ec57b into MuckRock:master on Jan 29, 2026
1 of 3 checks passed
jamditis pushed a commit to jamditis/muckrock that referenced this pull request Mar 24, 2026
The existing surrogate sanitization covers email subject and body
(PR MuckRock#2106), but attach_files_to_email() was missed. When chardet
decodes a text file attachment with a misidentified encoding, the
result can contain surrogate characters that crash email
serialization.

Also handles the edge case where chardet returns encoding=None
(e.g., for empty files), which would cause a TypeError on decode().

Addresses MuckRock#2098
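
The referenced commit's code is not shown here. As a rough sketch of the two cases its message describes, the function below decodes attachment bytes with a possibly missing encoding and then strips surrogates. The name decode_attachment_text, the UTF-8 fallback, and the use of errors="replace" are assumptions; in the real code the encoding would come from chardet.detect(raw)["encoding"], which can be None.

```python
import re
from typing import Optional

_SURROGATE_RE = re.compile("[\ud800-\udfff]")


def decode_attachment_text(
    raw: bytes, detected_encoding: Optional[str], fallback: str = "utf-8"
) -> str:
    """Decode attachment bytes defensively (hypothetical helper).

    detected_encoding stands in for chardet.detect(raw)["encoding"].
    Passing None straight to bytes.decode() raises TypeError, so a
    fallback codec is substituted first.
    """
    encoding = detected_encoding or fallback
    try:
        text = raw.decode(encoding, errors="replace")
    except LookupError:
        # The detector named a codec this Python build does not know.
        text = raw.decode(fallback, errors="replace")
    # Replace any lone surrogates so later UTF-8 serialization cannot fail.
    return _SURROGATE_RE.sub("\ufffd", text)


if __name__ == "__main__":
    print(repr(decode_attachment_text(b"", None)))  # empty file, no encoding
    print(decode_attachment_text(b"caf\xc3\xa9", "utf-8"))
```

Falling back to UTF-8 with errors="replace" trades fidelity for robustness: undecodable bytes become U+FFFD instead of aborting the send.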
Development

Successfully merging this pull request may close these issues.

foia_send_email task runs into Unicode Encoding Errors