-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Support S3 unsigned body #3323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: version-3
Are you sure you want to change the base?
Support S3 unsigned body #3323
Conversation
|
Detected 1 possible performance regressions:
|
| MINIMUM_CORE_VERSION = "3.239.1" | ||
| MINIMUM_CORE_VERSION = "3.240.0" | ||
|
|
||
| # Minimum `aws-sdk-core` version for new S3 gem builds | ||
| MINIMUM_CORE_VERSION_S3 = "3.234.0" | ||
| MINIMUM_CORE_VERSION_S3 = "3.240.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self-reminder to update the minimum core version when this is ready for release.
| if @trailer_io | ||
| return @trailer_io.read(length, buf) | ||
| end | ||
| def read(_length = nil, buf = nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length parameter is ignored since we always expect 16KB (default) or whatever is set by @chunk_size. This is an internal implementation accessed by either our patch or Net::HTTP.
I could make this more dynamic but that gets tricky... the #size method needs to know the chunk size ahead of time to calculate the overhead bytes plus actual IO size. Supporting variable-length reads would break that calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit of it being dynamic? If we're always expecting 16KB and this is an established expectation, I think it's fine that we don't use the length input. Not sure how feasible this is, but we could consider removing the parameter altogether in the future as to not cause confusion when using this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I don't think we could remove the parameter altogether due to the interfaces... this IO-like object is being read by http-client (net-http in our case) and length is a required parameter. Removing the parameter will break.
What would be the benefit of it being dynamic?
In the scenario where we don't use the patch - net-http could change the way their reads work - like increasing the chunk size and etc. We are bottlenecking the cap to 16KB here. We won't know if things have changed unless net-http checks the length on their end.
| # See: https://github.com/ruby/net-http/issues/205 | ||
| def supply_default_content_type | ||
| return if Thread.current[:net_http_skip_default_content_type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could eventually get rid of this - given that net-http no longer sets the default content type in their recent patch. We may want to be move this in later versions of Ruby when the default net-http gem version is the recent one.
| # Need to discuss | ||
| expect(resp.context.http_request.body.instance_variable_get(:@io).read).to eq(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized the body gets replaced with TrailerIO when we use the unsigned body path. I'm wondering if customers ever access this body object directly? If they do, TrailerIO doesn't have all the methods that StringIO would have, which could break their code.
One option is to swap the body back to the original IO object after we finish sending the request. That way customers would see the original object if they need access later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's feasible, it may be better to swap the body back to the original IO object to minimize potentially breaking customers. However, I'm assuming most of these details should be API private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be easy to swap back to the original - given that TrailerIO has access to the original io object. On that question... I would assume so but you never know!
| next unless %w[PutObject UploadPart].include?(key) | ||
|
|
||
| operation['authType'] = 'v4-unsigned-body' | ||
| operation['unsignedPayload'] = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add test cases for unsigned body since its already covered by our test cases here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/spec/aws/plugins/checksum_algorithm_spec.rb#L24
richardwang1124
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think it looks good overall to me. Added a few comments.
| headers[header_name] = calculate_checksum( | ||
| checksum_properties[:algorithm], | ||
| body | ||
| context.http_request.body |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this change from context.http_request.body_contents to context.http_request.body intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. And yes! This was the root cause why the memory was high during testing. Calling #body_contents on the request loads the entire body into memory. It is unknown on why this have been done previously when I asked around so I think just having that body here makes sense (I do a rewind much later before we start calculating checksums).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do a rewind after calculating as well?
| if @trailer_io | ||
| return @trailer_io.read(length, buf) | ||
| end | ||
| def read(_length = nil, buf = nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit of it being dynamic? If we're always expecting 16KB and this is an established expectation, I think it's fine that we don't use the length input. Not sure how feasible this is, but we could consider removing the parameter altogether in the future as to not cause confusion when using this method.
| # Need to discuss | ||
| expect(resp.context.http_request.body.instance_variable_get(:@io).read).to eq(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's feasible, it may be better to swap the body back to the original IO object to minimize potentially breaking customers. However, I'm assuming most of these details should be API private?
| headers[header_name] = calculate_checksum( | ||
| checksum_properties[:algorithm], | ||
| body | ||
| context.http_request.body |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do a rewind after calculating as well?
| @digest = ChecksumAlgorithm.digest_for_algorithm(@algorithm) | ||
| @chunk_size = Thread.current[:net_http_override_body_stream_chunk] || MIN_CHUNK_SIZE | ||
| @overhead_bytes = calculate_overhead(@chunk_size) | ||
| @max_chunk_size = @chunk_size - @overhead_bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find max_chunk_size to be a little bit of a confusing name. Why is it less than chunk size? This is actually the chunk size that we're reading from the underlying io right? so maybe like... base_chunk_size or underlying_chunk_size or something like that?
This PR intends to improve performance for S3's
PutObjectandUploadPart.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
To make sure we include your contribution in the release notes, please make sure to add description entry for your changes in the "unreleased changes" section of the
CHANGELOG.mdfile (at corresponding gem). For the description entry, please make sure it lives in one line and starts withFeatureorIssuein the correct format.For generated code changes, please checkout below instructions first:
https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md
Thank you for your contribution!