
fix: buffer tar.gz stream to prevent Content-Length mismatch in file upload #1112

Open

baptistecolle wants to merge 1 commit into e2b-dev:main from baptistecolle:fix/gzip-stream-mismatch

Conversation

@baptistecolle

Summary

tarFileStreamUpload in packages/js-sdk/src/template/utils.ts calls tarFileStream twice — once to calculate Content-Length by consuming the stream, then again to create the upload body. Since gzip compression is non-deterministic (internal dictionary state, portable mode timing), the second stream can produce a different byte count than the first. This causes Node's fetch to throw:

RequestContentLengthMismatchError: Request body length does not match content-length header

which surfaces as a FileUploadError.

How to reproduce

  1. Use .copy() with a directory containing many files (e.g. ~100 files)
  2. The upload intermittently fails with RequestContentLengthMismatchError
  3. Smaller single-file copies may not trigger it because the gzip variance is negligible

Fix

Buffer the tar.gz stream into memory once, then use that buffer for both the Content-Length header and the upload body. This eliminates the two-pass approach entirely (a rough sketch follows the list below).

  • Replaces the two calls to tarFileStream with a single call
  • Collects all chunks into a Uint8Array buffer
  • Uses buffer.length for content length and the buffer itself as the upload body
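A minimal sketch of the approach, with the stream type and the upload call simplified (this mirrors the diff rather than reproducing it; uploadUrl is a placeholder):

// Sketch only: consume the tar.gz stream once, then reuse the buffer for
// both the Content-Length and the request body.
async function bufferAndUpload(
  stream: AsyncIterable<Uint8Array>,
  uploadUrl: string
): Promise<void> {
  // Single pass over the compressed stream.
  const chunks: Uint8Array[] = []
  for await (const chunk of stream) {
    chunks.push(chunk)
  }

  // Concatenate so the byte count and the body come from the same
  // compression run.
  const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
  const buffer = new Uint8Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    buffer.set(chunk, offset)
    offset += chunk.length
  }

  // With a fixed-length Uint8Array body, Node's fetch derives the
  // Content-Length from buffer.length, so header and body cannot disagree.
  await fetch(uploadUrl, { method: 'PUT', body: buffer })
}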

@changeset-bot

changeset-bot bot commented Feb 1, 2026

⚠️ No Changeset found

Latest commit: 937c73c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca83cc0b41


Comment on lines 352 to 356
for await (const chunk of stream as unknown as AsyncIterable<Uint8Array>) {
chunks.push(chunk)
}
const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
const buffer = new Uint8Array(totalLength)


P2: Avoid buffering entire tarball in memory

This change now reads the full tar.gz stream into an in-memory Uint8Array before uploading. For large templates (e.g., multi‑GB directories), this can exhaust process memory or trigger GC thrashing, causing uploads to fail or the process to crash. Previously the data was streamed, which bounded memory usage. If users upload large directories, this is a regression in resource usage. Consider using a single stream pass with a tee (counting bytes while streaming) or buffering to a temp file instead of RAM.
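For reference, the temp-file variant of that suggestion could look roughly like the sketch below. This is illustrative only and not part of this PR; node:https is used here because, unlike fetch, it allows pairing an explicit Content-Length with a streamed body, and uploadUrl is a placeholder:

import { createReadStream, createWriteStream } from 'node:fs'
import { mkdtemp, rm, stat } from 'node:fs/promises'
import { request } from 'node:https'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { Readable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

async function uploadViaTempFile(
  stream: AsyncIterable<Uint8Array>,
  uploadUrl: string
): Promise<void> {
  const dir = await mkdtemp(join(tmpdir(), 'e2b-tar-'))
  const tmpPath = join(dir, 'upload.tar.gz')
  try {
    // Single compression pass, spilled to disk instead of RAM.
    await pipeline(Readable.from(stream), createWriteStream(tmpPath))

    // The on-disk size is the authoritative Content-Length.
    const { size } = await stat(tmpPath)

    await new Promise<void>((resolve, reject) => {
      const req = request(
        uploadUrl,
        { method: 'PUT', headers: { 'Content-Length': String(size) } },
        (res) => {
          res.resume()
          if (res.statusCode && res.statusCode < 300) resolve()
          else reject(new Error(`upload failed with status ${res.statusCode}`))
        }
      )
      req.on('error', reject)
      createReadStream(tmpPath).pipe(req)
    })
  } finally {
    await rm(dir, { recursive: true, force: true })
  }
}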


@baptistecolle
Author

I solved this with my new commit

@baptistecolle
Author

As requested in CONTRIBUTING.md, I also started a discussion on the Discord server to bring visibility to this issue and get feedback from the community:

https://discord.com/channels/1092455714431180995/1467502046021160961

fix: buffer tar.gz stream to prevent Content-Length mismatch in file upload

tarFileStreamUpload called tarFileStream twice — once to calculate
Content-Length by consuming the stream, then again to create the upload
body. Since gzip compression is non-deterministic (internal dictionary
state, timing), the second stream can produce a different byte count,
causing fetch to throw RequestContentLengthMismatchError.

Buffer the stream into memory on the first pass and reuse that buffer
for both the content length and the upload body.
@baptistecolle force-pushed the fix/gzip-stream-mismatch branch from ca83cc0 to 937c73c on February 1, 2026 at 12:52
@mishushakov
Member

Hey there, thanks for the PR!
I believe we fixed this issue already in JS SDK 2.10.5 (#1095).

Can you please try it and let us know if the issue still persists?

@baptistecolle
Author

baptistecolle commented Feb 3, 2026

Thanks for taking a quick look @mishushakov !

Unfortunately, the previous PR does not resolve the issue. I am currently using e2b@2.12.0.

If you want a reproducible (though non-minimal) setup, this repository should demonstrate the problem:
https://github.com/baptistecolle/bap/tree/main/app/src/e2b-template

If I am not mistaken, even with the previous fix applied, the gzip output is still non-deterministic. The zlib/gzip algorithm keeps internal state that can change between runs, so compressing the same input twice can still result in different byte outputs.

@mishushakov
Member

Okay, can you try sending the archive without the content-length header? I think it should work. I am hesitant about using temp files.

@mishushakov
Member

I tried building your template, but on my computer it ran without any issues:

0.0s  | 05:31:18 PM INFO  Requesting build for template: bap-agent-dev
1.1s  | 05:31:19 PM INFO  Template created with ID: ibb5eo1pjpaga0jnryny, Build ID: 2d44b2e3-76b7-4afd-ae6c-f4622c08361f
2.6s  | 05:31:21 PM INFO  Uploaded 'opencode.json'
2.6s  | 05:31:21 PM INFO  Uploaded 'plugins/integration-permissions.ts'
2.7s  | 05:31:21 PM INFO  Uploaded 'cli'
2.7s  | 05:31:21 PM INFO  All file uploads completed

What operating system are you using?

@noamzbr
Contributor

noamzbr commented Feb 4, 2026

@mishushakov as of yesterday (around 10 GMT), we started experiencing the same RequestContentLengthMismatchError: Request body length does not match content-length header error that originally triggered me to write the previous fix.

I can confirm that this fix (@baptistecolle 🙏) fixes the issue for us as well. It is intermittent: without the fix (either on 2.10.5 or 2.12.0) we sometimes get RequestContentLengthMismatchError, and sometimes the following undici error:

TypeError: fetch failed
  [cause]: SocketError: other side closed
  code: 'UND_ERR_SOCKET'
  socket: {
    localAddress: 'XXX',
    localPort: 55335,
    remoteAddress: '142.250.75.219',
    remotePort: 443,
    remoteFamily: 'IPv4',
    bytesWritten: 0,
    bytesRead: 0
  }
  • Node: 22.16.0
  • undici: 6.21.2
  • macOS

@baptistecolle
Author

What operating system are you using?

I am on Mac

@baptistecolle
Author

can you try sending the archive without the content-length header?

I am still seeing the same issue with the proposed fix. To be honest, I am currently traveling, so I have not had much time to dig into it today. It is possible that my attempt to remove the Content-Length header is not fully correct. I can try to spend a bit more time on it tomorrow.

@baptistecolle
Author

baptistecolle commented Feb 5, 2026

So I looked into this further, and without the Content-Length header I'm hitting request timeouts. Because of that, I wasn't able to get the "send the archive without the Content-Length header" approach to work. From what I found, for a signed "simple upload" to GCS, Content-Length is effectively required, so I don't think it's possible to remove it.

@noamzbr Which operating system are you using? Also, do you have an example template that @mishushakov could use to reproduce the issue?

@mishushakov What operating system are you on? Also, could you clarify why you’re hesitant to use temporary files? I’m just trying to explore alternative solutions.

I think the main issue is the double gzip call, which is why I’m looking for a way to avoid it. Do you have any ideas or suggestions?

@mishushakov
Member

I am using a Mac, and we test on Linux and Windows in CI. But I think we might not be catching this in tests. I have a solution in mind that uses a single multiplexed stream instead of two different streams, but I am on sick leave now and will implement it when I feel better. Thanks

@baptistecolle
Author

baptistecolle commented Feb 5, 2026

will implement when I feel better.

Thanks a lot! 🔥

(FYI, I updated my previous comment: for a signed “simple upload” to GCS, Content-Length is required, which is why I was getting a timeout.)

So yes, I think the solution is to use a single multiplexed stream instead of two separate streams (a rough sketch of that idea follows at the end of this comment).

Either way, thanks for the quick responses, @mishushakov. Let me know if I can help. Otherwise, I’ll let you handle the rest of the PR.

on sick leave now

Rest well! 🤒
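
A rough sketch of the single-pass, two-consumer idea mentioned above, as I understand it (not necessarily what #1118 implements): run the tar.gz compression once, tee the resulting web stream, fully read one branch to compute the length, and use the other branch as the upload body. Note that tee() still queues the body branch's chunks in memory while the counting branch is drained, so this avoids the gzip non-determinism but not the RAM usage.

async function measureWithTee(
  stream: ReadableStream<Uint8Array>
): Promise<{ contentLength: number; body: ReadableStream<Uint8Array> }> {
  const [countBranch, bodyBranch] = stream.tee()

  // Drain one branch only to count bytes; the other branch keeps the
  // same chunks queued for the actual upload.
  let contentLength = 0
  const reader = countBranch.getReader()
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    contentLength += value.length
  }

  return { contentLength, body: bodyBranch }
}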

@mishushakov
Member

Hey both, can you try this branch?
#1118

