
feat: add S3Resource for large verifier payloads#64

Open
andrew-stelmach-fleet wants to merge 1 commit into main from feat/s3-resource

Conversation

@andrew-stelmach-fleet
Contributor

Summary

  • Adds S3Resource class (fleet._async.resources.s3) that wraps an S3 URL and provides transparent lazy download/caching of content
  • Exported as fleet.S3Resource for easy access in verifier code
  • Used by the harness to pass large payloads (conversation transcripts, up to ~200MB) to verifiers via S3 instead of through HTTP request bodies or Temporal activity params

How it works

  1. Harness uploads conversation data to S3 (see companion PR in theseus)
  2. S3 URL is passed to the verifier activity as a structured param
  3. Activity resolves the S3 URL back to conversation data before calling the verifier
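The three steps above can be sketched end to end. This is a hypothetical illustration only, not code from this PR or the theseus companion: `resolve_verifier_params` and the injected `fetch` callable are invented names, and the local `is_s3_resource_dict` helper just mirrors the assumed `{"s3_url": ...}` dict shape that `S3Resource.is_s3_resource_dict` would check.

```python
# Hypothetical helper mirroring the assumed S3Resource dict shape;
# the real check is S3Resource.is_s3_resource_dict in fleet.
def is_s3_resource_dict(value) -> bool:
    return isinstance(value, dict) and isinstance(value.get("s3_url"), str)


def resolve_verifier_params(params: dict, fetch) -> dict:
    # `fetch` is injected so the sketch runs without S3 access; in a real
    # activity it would download and JSON-parse the object at the URL.
    return {
        key: fetch(value["s3_url"]) if is_s3_resource_dict(value) else value
        for key, value in params.items()
    }


# Example with a fake fetcher standing in for the S3 download:
params = {"transcript": {"s3_url": "s3://bucket/key.json"}, "run_id": 7}
resolved = resolve_verifier_params(params, fetch=lambda url: {"loaded_from": url})
print(resolved["transcript"])  # {'loaded_from': 's3://bucket/key.json'}
```

The point of the indirection is that only the small `{"s3_url": ...}` dict travels through Temporal activity params; the ~200MB payload moves over S3.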

API

```python
from fleet import S3Resource

# Create from S3 URL
resource = S3Resource(s3_url="s3://bucket/key.json")

# Access content (lazy download, cached)
text = resource.content          # UTF-8 string
data = resource.json()           # Parsed JSON
raw = resource.content_bytes     # Raw bytes
resource.download("/tmp/data.json")  # Save to file

# Serialize/deserialize for Temporal params
d = resource.to_dict()
restored = S3Resource.from_dict(d)
S3Resource.is_s3_resource_dict(d)  # True
```

Tests

  • 31 tests in tests/test_s3_resource.py covering init, serialization, download, caching, repr
  • Run: `pytest tests/test_s3_resource.py -v`

Companion PR

  • theseus: [feat/s3-resource-verifier] — harness-side upload and activity resolution

🤖 Generated with Claude Code

Adds S3Resource class to fleet-sdk that wraps an S3 URL and provides
transparent content access. Used by the harness to pass large payloads
(e.g. conversation transcripts up to ~200MB) to verifiers via S3 instead
of through HTTP request bodies or Temporal activity params.

Key features:
- Lazy download with caching (content/json/download methods)
- Serializable via to_dict/from_dict for Temporal param passing
- is_s3_resource_dict() for detection in activity params

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

```python
return self._cached_content
data = self._download()
self._cached_content = data.decode("utf-8")
return self._cached_content
```


Dual caching doubles memory for large payloads

Medium Severity

The PR states payloads can reach ~200MB. When content or json() is accessed, _download() caches the raw bytes in _cached_bytes, and then content separately caches the decoded string in _cached_content. Both caches are retained permanently, so a 200MB S3 object results in ~400MB+ of memory (bytes + Python str, which may use even more due to internal representation). The download() and download_temp() methods also go through _download(), permanently caching the full payload in memory even when the user only needs the data written to disk.
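One possible mitigation, sketched here with a hypothetical stand-in class (`S3ResourceSketch` is not the PR's code; `_download` is a placeholder for the real S3 fetch): decode the bytes immediately and cache only the string, so the raw bytes become garbage-collectable instead of living alongside the decoded copy.

```python
class S3ResourceSketch:
    """Sketch of a cache policy that holds only the decoded string.

    Hypothetical stand-in for fleet's S3Resource: a 200MB object costs
    roughly one resident copy instead of bytes + str.
    """

    def __init__(self, s3_url: str):
        self.s3_url = s3_url
        self._cached_content = None

    def _download(self) -> bytes:
        # Placeholder for the real S3 fetch.
        raise NotImplementedError

    @property
    def content(self) -> str:
        if self._cached_content is None:
            # Decode and drop the raw bytes: they are never stored on self,
            # so they can be collected as soon as this method returns.
            self._cached_content = self._download().decode("utf-8")
        return self._cached_content
```

For `download()`/`download_temp()` the analogous fix would be streaming the fetched bytes straight to disk without populating any in-memory cache at all.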


```python
try:
    os.write(fd, data)
finally:
    os.close(fd)
```


Partial write risk in download_temp for large payloads

Medium Severity

download_temp uses os.write(fd, data) without checking its return value. Unlike file.write() used in download(), the low-level os.write() wraps the POSIX write() syscall, which can perform partial writes — returning fewer bytes than requested — due to disk-full conditions, resource limits, or signal interruption. For payloads up to ~200MB, this could silently produce a truncated temp file with no error raised.
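The standard remedy is a retry loop that keeps calling `os.write` until every byte is flushed. `write_all` below is a hypothetical replacement for the single call flagged above, not code from the PR:

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    """Write data to fd in full, retrying after partial writes.

    os.write wraps the POSIX write() syscall and may write fewer bytes
    than requested; the unwritten tail must be retried.
    """
    view = memoryview(data)
    while view:  # an empty memoryview is falsy
        written = os.write(fd, view)
        view = view[written:]


# Usage in the download_temp pattern from the review:
fd, path = tempfile.mkstemp(suffix=".json")
try:
    write_all(fd, b"x" * (1 << 20))  # 1 MiB payload
finally:
    os.close(fd)
```

Using a `memoryview` avoids copying the remaining tail on each retry, which matters at the ~200MB sizes the PR targets.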

