fix: buffer tar.gz stream to prevent Content-Length mismatch in file upload #1112
baptistecolle wants to merge 1 commit into e2b-dev:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca83cc0b41
```ts
for await (const chunk of stream as unknown as AsyncIterable<Uint8Array>) {
  chunks.push(chunk)
}
const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
const buffer = new Uint8Array(totalLength)
```
Avoid buffering entire tarball in memory
This change now reads the full tar.gz stream into an in-memory Uint8Array before uploading. For large templates (e.g., multi‑GB directories), this can exhaust process memory or trigger GC thrashing, causing uploads to fail or the process to crash. Previously the data was streamed, which bounded memory usage. If users upload large directories, this is a regression in resource usage. Consider using a single stream pass with a tee (counting bytes while streaming) or buffering to a temp file instead of RAM.
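A minimal sketch of the first suggestion, assuming a Node Transform stream (the names here are illustrative, not from the PR): count bytes as they pass through instead of accumulating chunks in memory.

```ts
import { Transform } from 'node:stream'

// Illustrative sketch: count bytes in flight without buffering the stream.
function byteCountingTee() {
  let bytes = 0
  const stream = new Transform({
    transform(chunk: Buffer, _encoding, callback) {
      bytes += chunk.length // tally each chunk as it flows downstream
      callback(null, chunk) // pass the chunk through unchanged
    },
  })
  return { stream, count: () => bytes }
}
```

The catch is that the total is only known once the stream has been fully consumed, which is why the discussion below ends up weighing temp files and full buffering against each other.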
I solved this with my new commit
As requested in the CONTRIBUTING.md, I also started a discussion on the Discord server to bring visibility to this issue and get feedback from the community: https://discord.com/channels/1092455714431180995/1467502046021160961
…upload

tarFileStreamUpload called tarFileStream twice — once to calculate Content-Length by consuming the stream, then again to create the upload body. Since gzip compression is non-deterministic (internal dictionary state, timing), the second stream can produce a different byte count, causing fetch to throw RequestContentLengthMismatchError. Buffer the stream into memory on the first pass and reuse that buffer for both the content length and the upload body.

force-pushed from ca83cc0 to 937c73c
Hey there, thanks for the PR! Can you please try it and let us know if the issue still persists?
Thanks for taking a quick look @mishushakov! Unfortunately, the previous PR does not resolve the issue. I am currently using e2b@2.12.0. If you want a reproducible (though non-minimal) setup, this repository should demonstrate the problem:

If I am not mistaken, even with the previous fix applied, the gzip output is still non-deterministic. The zlib/gzip algorithm keeps internal state that can change between runs, so compressing the same input twice can still result in different byte outputs.
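A quick way to observe this, sketched under the assumption that tarFileStream(dir) returns a fresh async-iterable stream on each call (the helper below is illustrative):

```ts
// Drain a stream and count how many bytes it produced.
async function byteCount(stream: AsyncIterable<Uint8Array>): Promise<number> {
  let total = 0
  for await (const chunk of stream) {
    total += chunk.length
  }
  return total
}

// Two passes over the same directory can disagree:
// const first = await byteCount(tarFileStream(dir))
// const second = await byteCount(tarFileStream(dir))
// console.log(first === second ? 'identical' : `mismatch: ${first} vs ${second}`)
```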
Okay, can you try sending the archive without the Content-Length header? I think it should work. I am hesitant about using temp files.
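For reference, a hedged sketch of what that might look like with Node's fetch, which requires duplex: 'half' for streaming request bodies (uploadUrl and tarStream are placeholders, not names from the PR):

```ts
import { Readable } from 'node:stream'

// Stream the body; without a Content-Length header, undici falls back
// to chunked transfer encoding.
const res = await fetch(uploadUrl, {
  method: 'PUT',
  body: Readable.toWeb(tarStream) as unknown as BodyInit,
  duplex: 'half', // required by Node's fetch for stream bodies
} as RequestInit & { duplex: 'half' })
```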
I have tried building your template, but on my computer it ran without any issues. What operating system are you using?
@mishushakov as of yesterday (around 10 GMT), we started experiencing the same RequestContentLengthMismatchError. I can confirm that this fix (@baptistecolle 🙏) fixes the issue for us as well. It is intermittent: without the fix (either on 2.10.5 or 2.12.0) we sometimes get the RequestContentLengthMismatchError.
I am on Mac |
I am still seeing the same issue with the proposed fix. To be honest, I am currently traveling, so I have not had much time to dig into it today. It is possible that my attempt to remove the Content-Length header is not fully correct. I can try tomorrow to spend a bit more time on it.
So I looked into this further, and without the Content-Length header, I’m hitting request timeouts. Because of that, I wasn’t able to remove the “archive without the Content-Length header” part. From what I found, for a signed “simple upload” to GCS, Content-Length is effectively required, so I don’t think it’s possible to remove it.

@noamzbr Which operating system are you using? Also, do you have an example template that @mishushakov could use to reproduce the issue?

@mishushakov What operating system are you on? Also, could you clarify why you’re hesitant to use temporary files? I’m just trying to explore alternative solutions. I think the main issue is the double gzip call, which is why I’m looking for a way to avoid it. Do you have any ideas or suggestions?
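For what it's worth, a sketch of the temp-file route, under a few assumptions (a Node readable tarStream, a signed URL, and that undici accepts an explicit Content-Length as long as the streamed bytes match it):

```ts
import { createReadStream, createWriteStream } from 'node:fs'
import { mkdtemp, rm, stat } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { Readable } from 'node:stream'
import { pipeline } from 'node:stream/promises'

// Spool the gzip stream to disk in a single pass, then upload from the file.
async function uploadViaTempFile(tarStream: NodeJS.ReadableStream, url: string) {
  const dir = await mkdtemp(join(tmpdir(), 'e2b-upload-'))
  const file = join(dir, 'template.tar.gz')
  try {
    await pipeline(tarStream, createWriteStream(file)) // consume gzip exactly once
    const { size } = await stat(file) // exact byte count for Content-Length
    return await fetch(url, {
      method: 'PUT',
      headers: { 'Content-Length': String(size) },
      body: Readable.toWeb(createReadStream(file)) as unknown as BodyInit,
      duplex: 'half', // required by Node's fetch for stream bodies
    } as RequestInit & { duplex: 'half' })
  } finally {
    await rm(dir, { recursive: true, force: true }) // always clean up the temp dir
  }
}
```

This would keep memory bounded for multi-GB templates, at the cost of disk I/O and cleanup.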
I am using Mac, and we test on Linux and Windows in CI, but I think we might not be catching this in our tests. I have a solution in mind that uses a single multiplexed stream instead of two different streams, but I am on sick leave now. I will implement it when I feel better. Thanks
Thanks a lot! 🔥 (FYI, I updated my previous comment: for a signed “simple upload” to GCS, Content-Length is required, which is why I was getting a timeout.) So yes, I think the solution is to use a single multiplexed stream instead of two separate streams. Either way, thanks for the quick responses, @mishushakov. Let me know if I can help. Otherwise, I’ll let you handle the rest of the PR.
Rest well! 🤒
Hey both, can you try this branch? |
Summary
tarFileStreamUpload in packages/js-sdk/src/template/utils.ts calls tarFileStream twice — once to calculate Content-Length by consuming the stream, then again to create the upload body. Since gzip compression is non-deterministic (internal dictionary state, portable mode timing), the second stream can produce a different byte count than the first. This causes Node's fetch to throw RequestContentLengthMismatchError, which surfaces as a FileUploadError.

How to reproduce
- Use .copy() with a directory containing many files (e.g. ~100 files); a sketch follows this list
- The build intermittently fails with RequestContentLengthMismatchError
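A hypothetical repro sketch (the Template builder shape here is assumed, not taken from this PR; exact names may differ across SDK versions):

```ts
import { Template } from 'e2b'

// Copying a directory with ~100 files makes the intermittent
// Content-Length mismatch likely to surface during the upload step.
const template = Template()
  .fromImage('ubuntu:22.04')
  .copy('./dir-with-many-files', '/app')
```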
Fix

Buffer the tar.gz stream into memory once, then use that buffer for both the Content-Length header and the upload body. This eliminates the two-pass approach entirely, as sketched below:
- Replace the two tarFileStream calls with a single call
- Collect the stream into a Uint8Array buffer
- Use buffer.length for the content length and the buffer itself as the upload body
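A sketch of the buffered approach the list above describes; tarFileStream and the upload call are stand-ins for the real helpers in utils.ts, so treat the signatures as assumptions:

```ts
async function tarFileStreamUpload(dir: string, url: string): Promise<void> {
  const stream = tarFileStream(dir) // single call, so gzip runs exactly once
  const chunks: Uint8Array[] = []
  for await (const chunk of stream as unknown as AsyncIterable<Uint8Array>) {
    chunks.push(chunk)
  }
  const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
  const buffer = new Uint8Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    buffer.set(chunk, offset) // copy each chunk into the contiguous buffer
    offset += chunk.length
  }
  await fetch(url, {
    method: 'PUT',
    headers: { 'Content-Length': String(buffer.length) },
    body: buffer, // same bytes the length was computed from
  })
}
```

Because the length and the body come from the same bytes, the mismatch cannot occur; the trade-off, as the Codex review noted, is memory usage for very large templates.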