Skip to content

Conversation

@jterapin
Copy link
Contributor

@jterapin jterapin commented Dec 6, 2025

This PR intends to improve performance for S3's PutObject and UploadPart .


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

  1. To make sure we include your contribution in the release notes, please make sure to add description entry for your changes in the "unreleased changes" section of the CHANGELOG.md file (at corresponding gem). For the description entry, please make sure it lives in one line and starts with Feature or Issue in the correct format.

  2. For generated code changes, please checkout below instructions first:
    https://github.com/aws/aws-sdk-ruby/blob/version-3/CONTRIBUTING.md

Thank you for your contribution!

@aws aws deleted a comment from github-actions bot Dec 7, 2025
@aws aws deleted a comment from github-actions bot Dec 7, 2025
@github-actions
Copy link

github-actions bot commented Dec 7, 2025

Detected 1 possible performance regressions:

  • aws-sdk-s3.put_object_small_allocated_kb - z-score regression: 81.42 -> 83.44. Z-score: 31.52

Comment on lines -12 to +15
MINIMUM_CORE_VERSION = "3.239.1"
MINIMUM_CORE_VERSION = "3.240.0"

# Minimum `aws-sdk-core` version for new S3 gem builds
MINIMUM_CORE_VERSION_S3 = "3.234.0"
MINIMUM_CORE_VERSION_S3 = "3.240.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-reminder to update the minimum core version when this is ready for release.

if @trailer_io
return @trailer_io.read(length, buf)
end
def read(_length = nil, buf = nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length parameter is ignored since we always expect 16KB (default) or whatever is set by @chunk_size. This is an internal implementation accessed by either our patch or Net::HTTP.

I could make this more dynamic but that gets tricky... the #size method needs to know the chunk size ahead of time to calculate the overhead bytes plus actual IO size. Supporting variable-length reads would break that calculation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the benefit of it being dynamic? If we're always expecting 16KB and this is an established expectation, I think it's fine that we don't use the length input. Not sure how feasible this is, but we could consider removing the parameter altogether in the future as to not cause confusion when using this method.

Copy link
Contributor Author

@jterapin jterapin Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't think we could remove the parameter altogether due to the interfaces... this IO-like object is being read by http-client (net-http in our case) and length is a required parameter. Removing the parameter will break.

What would be the benefit of it being dynamic?

In the scenario where we don't use the patch - net-http could change the way their reads work - like increasing the chunk size and etc. We are bottlenecking the cap to 16KB here. We won't know if things have changed unless net-http checks the length on their end.

Comment on lines +23 to 25
# See: https://github.com/ruby/net-http/issues/205
def supply_default_content_type
return if Thread.current[:net_http_skip_default_content_type]
Copy link
Contributor Author

@jterapin jterapin Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could eventually get rid of this - given that net-http no longer sets the default content type in their recent patch. We may want to be move this in later versions of Ruby when the default net-http gem version is the recent one.

Comment on lines +170 to +171
# Need to discuss
expect(resp.context.http_request.body.instance_variable_get(:@io).read).to eq(data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized the body gets replaced with TrailerIO when we use the unsigned body path. I'm wondering if customers ever access this body object directly? If they do, TrailerIO doesn't have all the methods that StringIO would have, which could break their code.

One option is to swap the body back to the original IO object after we finish sending the request. That way customers would see the original object if they need access later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's feasible, it may be better to swap the body back to the original IO object to minimize potentially breaking customers. However, I'm assuming most of these details should be API private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be easy to swap back to the original - given that TrailerIO has access to the original io object. On that question... I would assume so but you never know!

@jterapin jterapin marked this pull request as ready for review December 8, 2025 17:36
@jterapin jterapin changed the title [WIP] Support S3 unsigned body Support S3 unsigned body Dec 8, 2025
Comment on lines +164 to +167
next unless %w[PutObject UploadPart].include?(key)

operation['authType'] = 'v4-unsigned-body'
operation['unsignedPayload'] = true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add test cases for unsigned body since its already covered by our test cases here: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/spec/aws/plugins/checksum_algorithm_spec.rb#L24

Copy link
Contributor

@richardwang1124 richardwang1124 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think it looks good overall to me. Added a few comments.

headers[header_name] = calculate_checksum(
checksum_properties[:algorithm],
body
context.http_request.body
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change from context.http_request.body_contents to context.http_request.body intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. And yes! This was the root cause why the memory was high during testing. Calling #body_contents on the request loads the entire body into memory. It is unknown on why this have been done previously when I asked around so I think just having that body here makes sense (I do a rewind much later before we start calculating checksums).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do a rewind after calculating as well?

if @trailer_io
return @trailer_io.read(length, buf)
end
def read(_length = nil, buf = nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the benefit of it being dynamic? If we're always expecting 16KB and this is an established expectation, I think it's fine that we don't use the length input. Not sure how feasible this is, but we could consider removing the parameter altogether in the future as to not cause confusion when using this method.

Comment on lines +170 to +171
# Need to discuss
expect(resp.context.http_request.body.instance_variable_get(:@io).read).to eq(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's feasible, it may be better to swap the body back to the original IO object to minimize potentially breaking customers. However, I'm assuming most of these details should be API private?

headers[header_name] = calculate_checksum(
checksum_properties[:algorithm],
body
context.http_request.body
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do a rewind after calculating as well?

@digest = ChecksumAlgorithm.digest_for_algorithm(@algorithm)
@chunk_size = Thread.current[:net_http_override_body_stream_chunk] || MIN_CHUNK_SIZE
@overhead_bytes = calculate_overhead(@chunk_size)
@max_chunk_size = @chunk_size - @overhead_bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find max_chunk_size to be a little bit of a confusing name. Why is it less than chunk size? This is actually the chunk size that we're reading from the underlying io right? so maybe like... base_chunk_size or underlying_chunk_size or something like that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants