feat: add S3Resource for large verifier payloads #64
andrew-stelmach-fleet wants to merge 1 commit into main
Adds S3Resource class to fleet-sdk that wraps an S3 URL and provides transparent content access. Used by the harness to pass large payloads (e.g. conversation transcripts up to ~200MB) to verifiers via S3 instead of through HTTP request bodies or Temporal activity params.

Key features:
- Lazy download with caching (content/json/download methods)
- Serializable via to_dict/from_dict for Temporal param passing
- is_s3_resource_dict() for detection in activity params

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
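The three features listed above can be illustrated with a minimal sketch. This is not the PR's implementation: the class name `LazyS3ResourceSketch`, the injected `fetch` callable, the `__s3_resource__` marker key, and the helper `is_resource_dict_sketch` are all hypothetical, chosen so the example runs without network access or an S3 client.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical marker key; the serialized format used by the PR is not shown.
_MARKER = "__s3_resource__"


class LazyS3ResourceSketch:
    """Illustrative stand-in for S3Resource, not the PR's actual class."""

    def __init__(self, url: str, fetch: Callable[[str], bytes]) -> None:
        self.url = url
        # Injected downloader (an assumption) keeps the sketch offline.
        self._fetch = fetch
        self._cached_bytes: Optional[bytes] = None

    def _download(self) -> bytes:
        # Fetch at most once; later calls reuse the cached bytes.
        if self._cached_bytes is None:
            self._cached_bytes = self._fetch(self.url)
        return self._cached_bytes

    @property
    def content(self) -> str:
        return self._download().decode("utf-8")

    def json(self) -> Any:
        return json.loads(self._download())

    def to_dict(self) -> Dict[str, Any]:
        # Only the URL is serialized, never the payload itself.
        return {_MARKER: True, "url": self.url}

    @classmethod
    def from_dict(
        cls, data: Dict[str, Any], fetch: Callable[[str], bytes]
    ) -> "LazyS3ResourceSketch":
        return cls(data["url"], fetch)


def is_resource_dict_sketch(value: Any) -> bool:
    """Detect a serialized resource among otherwise plain parameters."""
    return isinstance(value, dict) and value.get(_MARKER) is True
```

Because `to_dict()` carries only the URL, the object passed between activities stays small regardless of payload size; the download happens on first access at the receiving end.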
Cursor Bugbot has reviewed your changes and found 2 potential issues.
        return self._cached_content
    data = self._download()
    self._cached_content = data.decode("utf-8")
    return self._cached_content
Dual caching doubles memory for large payloads
Medium Severity
The PR states payloads can reach ~200MB. When content or json() is accessed, _download() caches the raw bytes in _cached_bytes, and then content separately caches the decoded string in _cached_content. Both caches are retained permanently, so a 200MB S3 object results in ~400MB+ of memory (bytes + Python str, which may use even more due to internal representation). The download() and download_temp() methods also go through _download(), permanently caching the full payload in memory even when the user only needs the data written to disk.
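One possible way to address both points is sketched below, under stated assumptions: keep a single bytes cache, decode on demand instead of retaining the string, and stream to disk without touching the cache. The class `SingleCacheSketch` and the `open_stream` callable (which returns a readable binary file-like object) are hypothetical and do not reflect the PR's code.

```python
import shutil
from typing import BinaryIO, Callable, Optional


class SingleCacheSketch:
    """Illustrative only: one in-memory cache, streamed disk downloads."""

    def __init__(self, open_stream: Callable[[], BinaryIO]) -> None:
        # Hypothetical factory returning a fresh readable binary stream.
        self._open_stream = open_stream
        self._cached_bytes: Optional[bytes] = None

    def _download(self) -> bytes:
        # The raw bytes are the only copy retained in memory.
        if self._cached_bytes is None:
            with self._open_stream() as src:
                self._cached_bytes = src.read()
        return self._cached_bytes

    @property
    def content(self) -> str:
        # Decode on each access; the resulting str is not cached.
        return self._download().decode("utf-8")

    def download(self, path: str, chunk_size: int = 1024 * 1024) -> str:
        # Copy in chunks so the full payload is never held in memory.
        with self._open_stream() as src, open(path, "wb") as dst:
            shutil.copyfileobj(src, dst, chunk_size)
        return path
```

The trade-off is that repeated `content` accesses pay the decoding cost each time; callers that need the string more than once can hold their own reference to it.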
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
Partial write risk in download_temp for large payloads
Medium Severity
download_temp uses os.write(fd, data) without checking its return value. Unlike file.write() used in download(), the low-level os.write() wraps the POSIX write() syscall, which can perform partial writes — returning fewer bytes than requested — due to disk-full conditions, resource limits, or signal interruption. For payloads up to ~200MB, this could silently produce a truncated temp file with no error raised.
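A minimal sketch of one way to guard against this is to loop until every byte has been written. The helper names `write_all` and `write_temp` are hypothetical and are not part of the PR.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    """Write all of data to fd, retrying after partial writes."""
    view = memoryview(data)  # slicing a memoryview avoids copying the payload
    while view:
        written = os.write(fd, view)
        if written == 0:
            # Defensive: avoid spinning forever if no progress is made.
            raise OSError("os.write() wrote 0 bytes")
        view = view[written:]


def write_temp(data: bytes, suffix: str = "") -> str:
    """Write data to a new temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        write_all(fd, data)
    finally:
        os.close(fd)
    return path
```

An alternative is to wrap the descriptor with `os.fdopen(fd, "wb")` and call the file object's `write()`, which handles short writes internally and closes the descriptor when the file object is closed.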


Summary
- S3Resource class (fleet._async.resources.s3) that wraps an S3 URL and provides transparent lazy download/caching of content
- fleet.S3Resource for easy access in verifier code

How it works

API

Tests
- tests/test_s3_resource.py covering init, serialization, download, caching, repr
- pytest tests/test_s3_resource.py -v

Companion PR
🤖 Generated with Claude Code