Skip to content

Conversation

@n3rada
Copy link

@n3rada n3rada commented Dec 11, 2025

Hello maintainers 👋

The current implementation of the headers property incorrectly splits header values on commas. Many standard HTTP headers legitimately contain commas inside their values (e.g., Accept, Accept-Language, User-Agent). Splitting these values destroys their ordering, semantics, and meaning, resulting in invalid or incomplete headers being exposed to Python.

Additionally, because HTTPMessage[key] = value overwrites previous entries for the same header name, the split logic also caused the last fragment to override all earlier fragments.

Before the fix, a request with:

Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8

was exposed as:

"accept": "image/avif"

Similarly, a User-Agent such as:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36

was incorrectly truncated because commas inside parentheses were interpreted as separators.

I don't know if it was the intended way of doing it, but it solved my problem.

@n3rada n3rada requested review from a team as code owners December 11, 2025 15:56
@github-actions
Copy link

github-actions bot commented Dec 11, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@jasnell jasnell added the python Issues/PRs relating to Python Workers label Dec 11, 2025
@jasnell jasnell requested review from dom96 and hoodmane December 11, 2025 16:04
@n3rada
Copy link
Author

n3rada commented Dec 11, 2025

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Dec 11, 2025
@jasnell
Copy link
Collaborator

jasnell commented Dec 11, 2025

The splitting here definitely looks wrong. At the very least it needs to be taking quoted strings into consideration. That said, any change on this is likely to need a compatibility flag as changing this is likely breaking for existing workers.

@n3rada
Copy link
Author

n3rada commented Dec 11, 2025

On my side, this change has corrected my workers, not broken them! No more incorrectly sent UAs!

@jasnell
Copy link
Collaborator

jasnell commented Dec 11, 2025

Understood. However, it's conceivable that changing this could break other workers, which means fixing it would require a compatibility flag so that those existing workers would be unaffected by the change.

@ryanking13
Copy link
Contributor

Yes, the splitting definitely looks wrong. @hoodmane @dom96 Do we have a way to load compat flags in that Python file? It looks like the MetadataReader is only in JavaScript.

@jasnell
Copy link
Collaborator

jasnell commented Dec 12, 2025

Compat flags can be inspected using the globalThis.Cloudflare.compatibilityFlags

@hoodmane
Copy link
Contributor

Yeah maybe we should do registerJsModule("_cloudflare_compat_flags", COMPATIBILITY_FLAGS).

@jasnell
Copy link
Collaborator

jasnell commented Dec 12, 2025

If I remember correctly, I believe @anonrig has some code in the node compat stuff for parsing headers out with quoted strings handled as part of the node:http impl. Worth taking a look there for impl logic that can be adopted/duplicated.

@dom96
Copy link
Contributor

dom96 commented Jan 2, 2026

Looking at the node.js implementation I see splitHeaderValue in internal_http_utils.ts. It seems to be intended for Headers which are only ever expected to contain a single value, as it only ever returns up to the first comma. So we cannot use it for all header keys. Python's Headers are also stored in a email.message.Message which supports retrieval of all the values via get_all, so we need to split the header values up appropriately.

@n3rada
Copy link
Author

n3rada commented Jan 5, 2026

I think there’s a conceptual mix-up here between:

  1. multiple header fields with the same name (e.g. two Set-Cookie: lines), and
  2. a single header field whose field-value contains commas (e.g. Accept, Accept-Language, User-Agent, Cache-Control, etc.).

Many standard headers legitimately contain commas, and commas do not imply "multiple header instances" unless the specific header definition says so (and even then quoted strings must be respected). email.message.Message.get_all() is useful to preserve multiple header fields with the same name. It should not require (or justify) comma-tokenizing a single field-value.

So I don’t think the fix should be "split appropriately". The fix should be "do not split at all at this layer".

What do you think @dom96?

@dom96
Copy link
Contributor

dom96 commented Jan 14, 2026

Well, when I say we need to split the headers up appropriately I mean we need to handle the Set-Cookie case correctly. The rest of the headers can remain in their unsplit state.

Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a real bug, I think your fix is a good start, just needs to handle the Set-Cookie case, add a compat flag and make some changes to our tests.

If you're up for that, I can give you some hints on what changes you need:

If you want to run the test you can use bazel run //src/workerd/server/tests/python:sdk_0.28.2@

Totally understand if that's too much, just let me know and I'll open a PR, but since you created this one I wanted to give you a chance to complete it :)

@n3rada
Copy link
Author

n3rada commented Jan 14, 2026

I am keen on giving-it a try. Could you please tell me how to verify that everything is good when I want to test my changes?

@dom96
Copy link
Contributor

dom96 commented Jan 14, 2026

Running bazel run //src/workerd/server/tests/python:sdk_0.28.2@ should be enough to verify it's working correctly

@hoodmane
Copy link
Contributor

Thanks for your help with this @n3rada!

@codspeed-hq
Copy link

codspeed-hq bot commented Jan 14, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 42.52%

Comparing n3rada:main (ce34c09) with main (b24baa9)

Summary

❌ 1 regressed benchmark
✅ 158 untouched benchmarks
⏩ 38 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
Existing_LargeStream_Value 27.5 s 47.8 s -42.52%

Footnotes

  1. 38 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@n3rada
Copy link
Author

n3rada commented Jan 14, 2026

Cheers for the advice, @dom96! I tried to make the changes you asked me to.

I have tried a few things, but I am raising the white flag 🏳️.

I would need a full tutorial on how this project is managed to understand how to know whether my tests are correct or not. I will leave it to you to sort out the PR. You can add commits straight to this PR if you want to keep my original contribution, I would be please.

I'd be interested to know what I've done wrong to get this kind of CodSpeed Performance Report. And how to read it properly.

@hoodmane
Copy link
Contributor

CodSpeed Performance Report

This is just spam for what it's worth.

@dom96
Copy link
Contributor

dom96 commented Jan 16, 2026

You were very close, the surprise I think was that the test was running with an old compat date and so getSetCookie wasn't defined.

Thanks for reporting this and helping out with the fix!

Here is the PR I just created: #5915.

@dom96 dom96 closed this Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Issues/PRs relating to Python Workers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants