Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

This pull request introduces comprehensive Cloudflare Workers platform support to the SEBS benchmarking framework, including native worker and container deployment modes, R2 object storage, Durable Objects NoSQL storage, complete request/response handlers for Python and Node.js, CLI-based Wrangler integration, benchmark applications, and supporting configuration and documentation infrastructure.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Worker as Cloudflare Worker
    participant Handler as Request Handler
    participant Storage as R2/NoSQL
    participant Proxy as Worker Proxy
    participant Backend as Backend Service
    Client->>Worker: HTTP Request
    Worker->>Handler: Parse event + metadata
    Handler->>Handler: Extract bucket/key/operation
    Handler->>Proxy: Route to Storage Proxy
    Proxy->>Backend: API Call to R2/Durable Object
    Backend-->>Proxy: Response
    Proxy-->>Handler: Data/Result
    Handler->>Handler: Compute metrics (timing/size)
    Handler-->>Worker: Structured Result
    Worker-->>Client: JSON Response (with measurements)
```

```mermaid
sequenceDiagram
    participant SEBS as SEBS Framework
    participant CLI as CloudflareCLI
    participant Docker as Docker Engine
    participant Wrangler as Wrangler Tool
    participant Cloudflare as Cloudflare API
    SEBS->>CLI: initialize()
    CLI->>Docker: Pull/Build image
    Docker-->>CLI: Image ready
    SEBS->>CLI: package_code()
    CLI->>Docker: Run container
    Docker->>Docker: npm install / pip install
    Docker-->>CLI: Return dependencies
    SEBS->>CLI: generate_wrangler_toml()
    CLI->>Docker: Write configuration
    SEBS->>CLI: deploy()
    CLI->>Wrangler: wrangler deploy
    Wrangler->>Cloudflare: Upload worker
    Cloudflare-->>Wrangler: Deployed
    Wrangler-->>CLI: Success
    CLI-->>SEBS: Worker URL
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sebs.py (1)
102-105: Update help text to reflect Cloudflare container support.

The help text states "(AWS only)" but this PR adds Cloudflare container deployment support. Update the documentation to reflect the expanded platform support.
🔎 Proposed fix
```diff
 @click.option(
     "--container-deployment/--no-container-deployment",
     default=False,
-    help="Deploy functions as containers (AWS only). When enabled, functions are packaged as container images and pushed to Amazon ECR."
+    help="Deploy functions as containers. When enabled, functions are packaged as container images (AWS: Amazon ECR, Cloudflare: Container Workers)."
 )
```
🤖 Fix all issues with AI Agents
In @benchmarks/100.webapps/120.uploader/python/function_cloudflare.py:
- Around line 8-9: Move the module-level call storage.storage.get_instance()
into the request handler so the storage client is lazily initialized; inside the
handler, try to get the singleton via storage.storage.get_instance(), and if
that raises/returns None call storage.init_instance() (or the appropriate
initializer) and then call get_instance() again, assigning the result to a local
variable (e.g., client) before using it; update any references to the
module-level client to use this local/client-in-handler to avoid import-time
failures.
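The lazy-initialization pattern described above can be sketched in isolation; the `Storage` class and its `init_instance()`/`get_instance()` methods below are stand-ins for the wrapper's actual storage module, not its real API:

```python
# Hypothetical sketch of lazy storage initialization. The Storage class
# stands in for the wrapper's actual storage module.
class Storage:
    instance = None

    @staticmethod
    def init_instance():
        Storage.instance = Storage()

    @staticmethod
    def get_instance():
        if Storage.instance is None:
            raise RuntimeError("must init storage singleton first")
        return Storage.instance


def handler(event):
    # Resolve the client inside the handler instead of at import time,
    # initializing the singleton on first use.
    try:
        client = Storage.get_instance()
    except RuntimeError:
        Storage.init_instance()
        client = Storage.get_instance()
    # ... use client to serve the request ...
    return client
```

Because the client is resolved per invocation, importing the module never fails even when initialization has not happened yet.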
In @benchmarks/100.webapps/130.crud-api/nodejs/function.js:
- Around line 6-13: The addProduct function calls the asynchronous
nosqlClient.insert but does not await it; change addProduct to be async and
await the nosqlClient.insert call (or return its promise) so the handler waits
for completion; update the function signature (addProduct) and the call to
nosqlClient.insert to use await and propagate or handle errors as appropriate to
avoid fire-and-forget behavior.
In @benchmarks/300.utilities/311.compression/nodejs/function.js:
- Around line 108-113: The archive is being written into the source directory
(archive_path = path.join(download_path, archive_name)) and may be picked up by
collectFiles while creating the tar, corrupting the output; change
createTarGzArchive invocation to write the archive outside the directory being
archived (e.g., create the tar in a temp dir or sibling path) or exclude the
archive file from collection by ensuring archive_path is outside download_path
or by passing an explicit exclude for archive_name to collectFiles; update
references around archive_name, archive_path, download_path, and the call to
createTarGzArchive (and collectFiles if applicable) so the archive isn’t placed
inside the directory being archived.
In @benchmarks/300.utilities/311.compression/nodejs/package.json:
- Around line 6-8: The package.json currently depends on "uuid": "^10.0.0" which
requires Node.js 16+, but systems.json still lists Node.js 14 as a supported
runtime; fix by either removing Node.js 14 from the supported runtimes in
systems.json (so only Node 16+ remains) or by downgrading the package.json uuid
dependency to a Node‑14 compatible version (e.g., change "uuid" to a 9.x range
like ^9.0.0); update the corresponding file (package.json for the dependency
change or systems.json for the runtime change) and ensure any CI/test configs
are consistent with the chosen option.
In @benchmarks/wrappers/cloudflare/nodejs/build.js:
- Around line 64-130: The regex-based transformations in asyncNosqlPlugin
(inside build.onLoad for function.js) are unsafe: the function declaration
replacement (`/^(\s*)function\s+(\w+)\s*\(/gm`) forces all functions async and
the line-by-line call-rewrite regex turns constructor calls like `new
UserClass(...)` into invalid `new await UserClass(...)`. Replace the in-plugin
regex approach with a proper AST-based transform run before esbuild (e.g., use
@babel/parser + @babel/traverse or SWC/Recast) to selectively: (1) add await
only to actual async call expressions to nosqlClient methods and user-defined
async functions, (2) avoid modifying constructors (skip NewExpression nodes) and
generators, and (3) preserve non-async function declarations instead of globally
prefixing with async; alternatively restrict the plugin to only operate on
explicitly annotated sections rather than arbitrary regex edits.
In @benchmarks/wrappers/cloudflare/nodejs/container/nosql.js:
- Around line 39-48: The current error handling reads the response body multiple
times causing failures; change it to read the body once into a variable (e.g.,
const body = await response.text()), then attempt to parse JSON from that body
to extract errorData.error and fall back to the raw body if parsing fails,
assigning the result to errorMsg and then throwing the Error using that single
errorMsg (operate on the same response/body and use the existing response and
errorMsg variables).
- Around line 86-99: The query method contains leftover debug console.error
statements that pollute production logs; remove all console.error lines inside
async query (which calls this._make_request) and keep the rest of the
implementation intact (construct params, await this._make_request('query',
params), compute const items = result.items || [] and return items) so behavior
is unchanged.
In @benchmarks/wrappers/cloudflare/nodejs/container/package.json:
- Around line 7-9: Update the dependency version for "@cloudflare/containers" in
package.json because "^1.0.0" has no published releases; change the version
string to a valid published range such as "^0.0.20" or the specific latest
"0.0.31", then reinstall (npm install) and update package-lock.json so container
builds succeed.
In @benchmarks/wrappers/cloudflare/nodejs/container/storage.js:
- Around line 15-18: The class contains a duplicate static field declaration
"worker_url" defined twice; remove the redundant declaration so only a single
"static worker_url = null; // Set by handler from X-Worker-URL header" remains
(keep the comment), and verify no other duplicate static declarations exist in
the class and that all references to WorkerStorage.worker_url (or similar) still
resolve.
In @benchmarks/wrappers/cloudflare/nodejs/container/worker.js:
- Around line 31-38: The query method currently ignores its keyPrefix argument
and returns all stored items; update async query(keyPrefix) to filter the
listing by the prefix (e.g., only include entries where the key startsWith
keyPrefix) before pushing values into items so queries return the intended
subset; modify the loop over this.ctx.storage.list() (the [k, v] destructuring)
to check keyPrefix (handle undefined/empty) and only push v when the key matches
the prefix.
- Around line 309-324: Remove the temporary debug console.log statements inside
the '/r2/upload' branch: delete the logs that print bucket/key, env.R2
existence, received byte length, and the putResult (the lines using console.log
before and after the put). Keep the try/catch and the error reporting
(console.error for env.R2.put errors) and retain the successful behavior (no
debug prints); locate these in the code that checks url.pathname ===
'/r2/upload' and uses request.arrayBuffer() and env.R2.put(key, data).
In @benchmarks/wrappers/cloudflare/nodejs/nosql.js:
- Around line 62-66: The update method uses a non-atomic read-modify-write (the
update function calls get then insert), which allows concurrent updates to
clobber each other; either document this limitation or make update atomic: add a
version/etag field to records and implement optimistic concurrency (in update:
loop read existing via get, compute merged, set incremented version, attempt
insert only if version matches and retry on conflict), or switch to a Cloudflare
Durable Object/transactional API to perform the merge server-side; reference the
update, get, and insert functions when applying the change.
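The optimistic-concurrency option can be sketched in plain Python; the in-memory `store` dict and the `insert_if_version` helper are hypothetical stand-ins for the Durable Object backend and its conditional-write primitive:

```python
# Sketch of optimistic concurrency with a version field. The in-memory
# dict stands in for the NoSQL backend.
store = {}

def get(key):
    return store.get(key)

def insert_if_version(key, record, expected_version):
    # Atomic only in this single-threaded sketch; a real backend needs
    # a conditional write or a Durable Object transaction.
    current = store.get(key)
    current_version = current["version"] if current else 0
    if current_version != expected_version:
        return False  # conflict: another writer got there first
    store[key] = record
    return True

def update(key, changes, max_retries=3):
    for _ in range(max_retries):
        existing = get(key) or {"version": 0}
        merged = {**existing, **changes}
        merged["version"] = existing["version"] + 1
        if insert_if_version(key, merged, existing["version"]):
            return merged
    raise RuntimeError("update conflict: retries exhausted")
```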
In @benchmarks/wrappers/cloudflare/nodejs/storage.js:
- Around line 151-176: download_stream has inconsistent handling of missing
objects: it returns null when R2 returns no object (obj == null) but throws if
no backend or if an R2 object lacks arrayBuffer()/text(); make the behavior
consistent by choosing one policy (prefer returning null for missing content)
and apply it across all branches: in download_stream, replace the final throw
and the fallback for objects without arrayBuffer/text with returning null, and
ensure the local filesystem branch also returns null rather than throwing when
the file is absent; update logic around instance.handle, obj,
obj.arrayBuffer()/obj.text(), and the fs/localPath check so every "not found"
path returns null and only unexpected failures (e.g., actual IO errors) throw.
- Around line 87-89: The download method uses this.download_stream instead of
the resolved singleton instance; update the call in async download(__bucket,
key, filepath) to use the resolved instance (storage.instance || this) by
replacing this.download_stream with instance.download_stream so the singleton
pattern is respected (refer to download and download_stream and
storage.instance).
In @benchmarks/wrappers/cloudflare/python/container/handler.py:
- Around line 135-137: The code computes an unused timing variable: it sets end
= datetime.datetime.now().timestamp() and compute_time = end - begin but never
uses compute_time; either remove those two lines (end and compute_time) or
include compute_time in the response/log (for example attach it to the response
JSON or a response header). Locate the timing code that uses begin, end and
compute_time in handler.py and either delete the end/compute_time assignments or
propagate compute_time into the outgoing response or a logging call (keeping the
begin variable if still needed elsewhere).
In @benchmarks/wrappers/cloudflare/python/container/storage.py:
- Around line 71-77: Add a timeout to the urllib.request.urlopen call and chain
the raised RuntimeError to preserve the original traceback: pass a sensible
timeout (e.g., 10 seconds) into the urllib.request.urlopen(req) invocation so
the call cannot hang indefinitely, and change the error re-raise to use "raise
RuntimeError(...)" with "from e" to chain the original exception thrown in the
except block referencing the existing exception variable e and the current
RuntimeError raise site.
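The timeout-plus-chaining pattern recommended in these items looks roughly like this; `DEFAULT_TIMEOUT` and `fetch` are illustrative names, not the wrapper's actual API:

```python
import urllib.error
import urllib.request

DEFAULT_TIMEOUT = 10  # seconds; illustrative value


def fetch(url):
    try:
        # A timeout bounds the call so it cannot hang indefinitely.
        with urllib.request.urlopen(url, timeout=DEFAULT_TIMEOUT) as resp:
            return resp.read()
    except urllib.error.URLError as e:
        # 'from e' chains the original exception so its traceback survives.
        raise RuntimeError(f"Failed to fetch {url}: {e}") from e
```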
- Around line 138-144: In _upload_with_key wrap the urlopen call with an
explicit timeout (pass the same timeout constant used elsewhere, e.g.,
timeout=DEFAULT_TIMEOUT or self._timeout) — change urllib.request.urlopen(req)
to urllib.request.urlopen(req, timeout=...) — and when re-raising the failure
convert to RuntimeError using exception chaining (raise RuntimeError(f"Failed to
upload to R2: {e}") from e) so the original traceback is preserved; keep the
existing log/print of the error and include the request/ key context if
available.
- Around line 91-101: The download_stream function currently calls
urllib.request.urlopen(url) without a timeout and re-raises RuntimeError without
exception chaining; update the call to pass the same timeout value used in
upload (e.g., urlopen(url, timeout=TIMEOUT)) and in both except blocks
(urllib.error.HTTPError and generic Exception) raise the RuntimeError using
exception chaining (raise RuntimeError(...) from e) so the original traceback is
preserved; reference the download_stream function, urllib.request.urlopen,
urllib.error.HTTPError, and the exception variable e when making these edits.
- Around line 168-201: The list request in download_directory lacks a timeout
and the raised RuntimeError loses the original exception context; add a timeout
to urllib.request.urlopen(req, timeout=...) when listing objects and ensure each
download call uses a timeout (update self.download signature to accept/forward a
timeout parameter or pass a configured timeout value), and re-raise the error
with exception chaining using "raise RuntimeError(f'Failed to download
directory: {e}') from e" so the original traceback is preserved.
In @benchmarks/wrappers/cloudflare/python/nosql.py:
- Around line 115-122: The class nosql_kv has incorrect references to nosql
instead of itself; in the static method init_instance you should instantiate and
assign nosql_kv.instance = nosql_kv() and set that instance's env (e.g.,
nosql_kv.instance.env = entry.env) rather than referencing nosql; update the
init_instance method (and any other places in this class referencing nosql) to
use the correct class name nosql_kv and the instance class attribute instance.
- Around line 8-15: The class uses the wrong identifier `nosql` instead of the
actual class `nosql_do`; change the type annotation and all
assignments/instantiation to refer to `nosql_do` (e.g., replace
`Optional["nosql"]` with `Optional["nosql_do"]`, replace `nosql.instance =
nosql()` with `nosql_do.instance = nosql_do()`) and ensure the binding
assignment uses `nosql_do.DO_BINDING_NAME` and sets `nosql_do.instance.binding =
getattr(entry.env, nosql_do.DO_BINDING_NAME)`.
- Around line 203-207: The get_instance method in class nosql_kv is creating the
wrong class: it calls nosql() (the nosql_do class) instead of instantiating
nosql_kv(); update the body of get_instance to set nosql_kv.instance = nosql_kv()
when instance is None and return nosql_kv.instance, ensuring the singleton
references the nosql_kv class rather than nosql.
- Around line 190-195: The code currently does a fragile get_res.replace("'",
"\"") on the result of run_sync(self.get_table(table_name).get(key)) before
json.loads; remove this blind replacement and instead ensure the stored value is
valid JSON (fix insert/update to write proper JSON), or if the backend returns a
Python literal use ast.literal_eval on get_res and then convert to JSON/dict, or
if get() can return an already-parsed object skip json.loads entirely; update
the logic around run_sync(self.get_table(table_name).get(key)), get_res, and the
json.loads call to handle proper parsing without string-replacement.
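The suggested parsing fallbacks, JSON first and `ast.literal_eval` only for Python-literal strings, can be sketched as:

```python
import ast
import json

def parse_stored_value(raw):
    """Parse a stored value that may be JSON, a Python literal, or
    already a dict, without blind quote replacement."""
    if raw is None or isinstance(raw, dict):
        return raw  # missing, or already parsed by the backend
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Fall back for Python-literal strings such as "{'a': 1}".
        return ast.literal_eval(raw)
```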
- Around line 161-168: The get method in nosql_kv returns the raw result instead
of a parsed dict; update nosql_kv.get to mirror query by checking for a
None/empty response and calling json.loads on the returned string before
returning it. Locate the get method that calls
run_sync(self.get_table(table_name).get(self.key_maker(primary_key,
secondary_key))) and replace the direct return with logic that handles a None
result and returns json.loads(get_res) (or the already-parsed dict) so callers
always receive a dict or None.
In @benchmarks/wrappers/cloudflare/python/storage.py:
- Line 60: The code attempts to call append on written_files which is
initialized as a set (self.written_files) causing an AttributeError; change the
call in the method that currently does self.written_files.append(filepath) to
use the set API (self.written_files.add(filepath)) or, if order/duplicates
matter, change the initializer from set() to [] and keep append—prefer using
self.written_files.add(filepath) to preserve the existing set semantics.
- Around line 111-113: get_instance currently uses invalid syntax raise "must
init storage singleton first" and is missing the @staticmethod decorator; fix by
adding @staticmethod above def get_instance() and replace the string raise with
raising a proper Exception (e.g., raise RuntimeError("must init storage
singleton first")) so the method correctly signals the error when
storage.instance is None; keep the method body referencing storage.instance as
written.
In @config/cloudflare-test.json:
- Around line 2-13: The runtime block currently declares `"runtime": {
"language": "nodejs", "version": "18" }` which is incorrect for Cloudflare
Workers; update the experiments.runtime configuration by removing the Node.js
language/version pair and instead set the Cloudflare Workers compatibility flags
(e.g., add `compatibility_date` with a date >= "2024-09-23" and `nodejs_compat:
true` when Node.js APIs are required), keeping the `"experiments"` object and
other fields intact and ensuring `nodejs_compat` is documented in the config
comment or README for future maintainers.
In @config/systems.json:
- Around line 389-391: The UUID version in config/systems.json ("packages": {
"uuid": "3.4.0" }) conflicts with the compression benchmark's package.json which
uses "uuid": "^10.0.0"; pick one canonical version and update the other to match
(either change config/systems.json uuid to "^10.0.0" or change the benchmark's
package.json uuid to "3.4.0") so all deployments use the same uuid package
version across providers; ensure you update the package.json or config entry and
run dependency resolution/tests to confirm compatibility.
In @experiments.json:
- Around line 52-54: The committed experiments.json exposes a real credential
value under the "credentials" -> "account_id" field; remove the sensitive value
from the PR and either delete the file from the branch or replace the value with
a placeholder like "<YOUR_ACCOUNT_ID>" in experiments.json, then add
experiments.json to .gitignore (or move this test output to a non-committed
sample file) to prevent future commits; ensure you also purge the secret from
the PR history or force-push a branch without the file if necessary.
In @package.json:
- Line 1: Delete the empty root package.json file (remove package.json from the
repo) because this project uses isolated benchmark packages; if a root
package.json is actually required, replace it with a meaningful file containing
the intended configuration or add a clear explanation in the repo README
indicating its purpose instead of leaving an empty {}.
In @sebs/cloudflare/cloudflare.py:
- Around line 969-981: The final line returning (directory, total_size, "") is
unreachable (there is an earlier return image_tag) and uses undefined
total_size; remove this leftover return statement or, if the intent was to
return size info, move calculation/definition of total_size into the same scope
and replace the earlier return image_tag accordingly. In short: delete the
unreachable return (directory, total_size, "") from the end of the function in
cloudflare.py (or consolidate/define total_size and adjust the earlier return to
a single consistent return value) so no undefined variable remains.
- Line 148: Credential verification and other HTTP calls in cloudflare.py are
missing timeouts which can cause indefinite hangs; define a default timeout
constant (e.g., DEFAULT_REQUEST_TIMEOUT = 10) near the module top and pass it to
all requests calls (e.g., change
requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers) to
requests.get(..., headers=headers, timeout=DEFAULT_REQUEST_TIMEOUT)) including
the worker-exists check and the workers.dev subdomain fetch, and update any
requests.post/put/delete calls similarly; optionally wrap these calls to catch
requests.exceptions.Timeout/RequestException where appropriate.
- Line 73: The initialize method uses a mutable default argument (config:
Dict[str, str] = {}); change the signature to use None as the default (e.g.,
config: Optional[Dict[str, str]] = None) and inside the initialize method set
config = {} if config is None to avoid sharing the same dict across calls;
update any type annotations/uses accordingly and ensure references to initialize
and its parameter name remain consistent.
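The mutable-default pitfall and its fix can be shown side by side; the function names here are hypothetical:

```python
from typing import Dict, Optional

def initialize_buggy(config: Dict[str, str] = {}):
    # The default dict is created once and shared by every call
    # that omits the argument.
    config.setdefault("initialized", "yes")
    return config

def initialize_fixed(config: Optional[Dict[str, str]] = None):
    # A fresh dict is created per call when no argument is given.
    if config is None:
        config = {}
    config.setdefault("initialized", "yes")
    return config
```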
In @sebs/cloudflare/durable_objects.py:
- Around line 100-113: The update_cache method uses a non-existent private
attribute self._cache_client which will raise AttributeError; change references
in update_cache to the public property self.cache_client (i.e., call
self.cache_client.update_nosql(...)) so it uses the NoSQLStorage-exposed client;
verify any other uses in the DurableObjects class for similar _cache_client
references and replace them with cache_client to remain consistent with the
parent API.
- Around line 185-203: The method remove_table mutates the inner dict while
iterating over tables.items(), which can raise RuntimeError; change the logic in
remove_table to first locate the matching benchmark and table_key without
deleting during iteration (e.g., iterate over a snapshot like
list(tables.items()) or record table_key to delete), then perform the deletion
after the loop; update references to self._tables, the local variable tables,
and the inner loop that checks table_name == name to reflect this two-step
find-then-delete approach and keep the logging/return behavior unchanged.
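The two-step find-then-delete approach can be sketched as follows; the nested-dict shape mirrors the description above, with illustrative names:

```python
def remove_table(tables, name):
    """Find the matching entry first, then delete after iteration."""
    target = None  # (benchmark, table_key) pending deletion
    for benchmark, benchmark_tables in tables.items():
        # Iterate over a snapshot so no dict is mutated mid-iteration.
        for table_key, table_name in list(benchmark_tables.items()):
            if table_name == name:
                target = (benchmark, table_key)
                break
        if target is not None:
            break
    if target is None:
        return False
    benchmark, table_key = target
    del tables[benchmark][table_key]
    return True
```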
- Around line 30-39: The constructor __init__ accepts a credentials:
CloudflareCredentials but never stores it, causing _get_auth_headers to
reference missing self._credentials; fix by assigning the passed credentials to
an instance attribute (self._credentials = credentials) in __init__ so
_get_auth_headers can use it, or if credentials truly aren't needed remove the
credentials parameter and any references to _get_auth_headers; update only the
DurableObjects class __init__ and related references (_credentials and
_get_auth_headers) to keep consistency.
In @sebs/cloudflare/function.py:
- Around line 56-62: The deserializer uses hardcoded keys "Library" and "HTTP"
that don't match the serialized trigger type strings; replace the static mapping
with the actual serialized type names by using LibraryTrigger.typename() and
HTTPTrigger.typename() (e.g. mapping = {LibraryTrigger.typename():
LibraryTrigger, HTTPTrigger.typename(): HTTPTrigger}) and then lookup
trigger_type = mapping.get(trigger["type"]); keep the assert and call
ret.add_trigger(trigger_type.deserialize(trigger)) as before so deserialization
succeeds.
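A minimal sketch of typename-keyed deserialization; the trigger classes and their typename strings below are stand-ins, and the framework's actual values may differ:

```python
# Stand-ins for the framework's LibraryTrigger/HTTPTrigger classes.
class LibraryTrigger:
    @staticmethod
    def typename():
        return "Cloudflare.LibraryTrigger"

    @classmethod
    def deserialize(cls, obj):
        return cls()


class HTTPTrigger:
    @staticmethod
    def typename():
        return "Cloudflare.HTTPTrigger"

    @classmethod
    def deserialize(cls, obj):
        return cls()


def deserialize_trigger(trigger):
    # Keying the mapping on typename() keeps it in sync with whatever
    # string the triggers serialize themselves under.
    mapping = {
        LibraryTrigger.typename(): LibraryTrigger,
        HTTPTrigger.typename(): HTTPTrigger,
    }
    trigger_type = mapping.get(trigger["type"])
    assert trigger_type is not None, f"unknown trigger type: {trigger['type']}"
    return trigger_type.deserialize(trigger)
```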
In @sebs/cloudflare/r2.py:
- Around line 81-84: The requests calls in _create_bucket, list_buckets, and
remove_bucket are missing timeouts and can hang indefinitely; update each
requests.<method> call (e.g., the POST in _create_bucket, the GET in
list_buckets, and the DELETE in remove_bucket) to pass a timeout argument (use a
shared constant or self._timeout if available, e.g., timeout=DEFAULT_TIMEOUT)
and ensure the same timeout is applied consistently across all requests calls in
this class; optionally, wrap the calls to catch requests.exceptions.Timeout and
handle/log it in the respective methods (e.g., _create_bucket, list_buckets,
remove_bucket).
- Around line 13-14: The typename() function returns the misspelled string
"Cloudlfare.R2"; update the return value in typename() to the correct service
identifier "Cloudflare.R2" so logs and identifiers use the correct spelling.
- Around line 58-60: The _create_bucket method uses a mutable default for the
buckets parameter (buckets: list[str] = []), which can cause shared-state bugs;
change the signature to accept None (e.g., buckets: Optional[list[str]] = None)
and inside _create_bucket initialize buckets = [] if buckets is None, or update
callers to always pass an explicit list, ensuring the function no longer relies
on a shared default list instance.
In @sebs/cloudflare/README.md:
- Around line 105-118: The "Container Deployment" limitation in the README is
outdated; update the "Container Deployment" and "Container Support Architecture"
sections to reflect that container support has been implemented: remove the "Not
currently implemented" limitation, document that Node.js and Python container
benchmark wrappers were added, and describe the implemented components
(container orchestrator, worker proxy for R2/Durable Objects, and provided
Dockerfiles such as the python Dockerfile) rather than listing what "would be
required"; also remove or update any references to a missing CloudflareContainer
class or missing package_code()/create_function() changes to accurately describe
the current implementation and where container integration is located.
In @sebs/cloudflare/triggers.py:
- Around line 136-142: The async_invoke method currently creates a new
ThreadPoolExecutor on every call (pool = ThreadPoolExecutor()) and never shuts
it down, leaking threads; change this to use a shared executor (e.g., a
module-level or Cloudflare trigger class attribute like _executor) and have
async_invoke submit to that shared executor instead of creating a new one, and
ensure the executor is cleanly shutdown when the application or trigger instance
is closed (implement a close()/shutdown() method or __del__ to call
_executor.shutdown(wait=True)); keep the call to submit(self.sync_invoke,
payload) and the returned Future unchanged.
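A shared class-level executor of the kind suggested can be sketched as follows; the class and method names are illustrative:

```python
from concurrent.futures import Future, ThreadPoolExecutor

class TriggerSketch:
    """Sketch of a trigger class sharing one executor across calls."""

    # Created once at class level instead of per async_invoke call.
    _executor = ThreadPoolExecutor(max_workers=8)

    def sync_invoke(self, payload):
        return {"result": payload}

    def async_invoke(self, payload) -> Future:
        # Submit to the shared pool; no per-call executor is created.
        return TriggerSketch._executor.submit(self.sync_invoke, payload)

    @classmethod
    def shutdown(cls):
        # Call once at application teardown to join worker threads.
        cls._executor.shutdown(wait=True)
```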
🟠 Major comments (18)
benchmarks/wrappers/cloudflare/nodejs/container/package.json-7-9 (1)
7-9: Update version constraint to match published releases; `^1.0.0` has no available versions.

The `@cloudflare/containers` package exists and is an official Cloudflare package, but the version constraint `"^1.0.0"` is incorrect. The latest published version is `0.0.31`, and no version `1.0.0` or higher exists. Change the version to a valid range such as `"^0.0.20"` or the specific latest `"0.0.31"` to ensure `npm install` succeeds during container builds.

experiments.json-52-54 (1)
52-54: Credentials exposed in committed file.

The `account_id` value appears to be a real Cloudflare account identifier. This file looks like test output data that should not be committed to the repository. Consider:
- Adding `experiments.json` to `.gitignore`
- Removing this file from the PR
- If this is intentional sample data, replace the `account_id` with a placeholder like `"<YOUR_ACCOUNT_ID>"`

benchmarks/300.utilities/311.compression/nodejs/function.js-108-113 (1)
108-113: Archive created inside source directory may include itself.

The archive is created at `path.join(download_path, archive_name)`, which is inside the `download_path` being archived. If `collectFiles` encounters the partially-written archive file during traversal, it will attempt to include it, leading to corrupted output or unpredictable behavior.

🔎 Proposed fix: Create archive in a separate location
```diff
 // Compress directory
 const compress_begin = Date.now();
 const archive_name = `${key}.tar.gz`;
-const archive_path = path.join(download_path, archive_name);
+const archive_path = path.join('/tmp', `${archive_name}-${uuidv4()}`);
 await createTarGzArchive(download_path, archive_path);
 const compress_end = Date.now();
```

Committable suggestion skipped: line range outside the PR's diff.
benchmarks/100.webapps/120.uploader/python/function_cloudflare.py-8-9 (1)
8-9: Module-level storage initialization may fail.

`storage.storage.get_instance()` is called at module import time. If the storage singleton hasn't been initialized via `init_instance()` yet, this will raise an exception (once the bug in `storage.py` is fixed). Consider lazy initialization inside the handler.

🔎 Proposed fix: Lazy initialization
```diff
 from . import storage
-client = storage.storage.get_instance()
+client = None
 
 SEBS_USER_AGENT = "SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2"
 
 async def do_request(url, download_path):
     headers = {'User-Agent': SEBS_USER_AGENT}
     res = await pyfetch(url, headers=headers)
     bs = await res.bytes()
     with open(download_path, 'wb') as f:
         f.write(bs)
 
 def handler(event):
+    global client
+    if client is None:
+        client = storage.storage.get_instance()
     bucket = event.get('bucket').get('bucket')
```

Committable suggestion skipped: line range outside the PR's diff.
benchmarks/300.utilities/311.compression/nodejs/package.json-6-8 (1)
6-8: Remove Node.js 14 support or downgrade uuid dependency.

The `uuid` v10.0.0 dependency requires Node.js 16 or newer, but `systems.json` configures Node.js 14 as a supported version, creating an incompatibility. Either remove Node.js 14 from the supported runtimes or downgrade `uuid` to a version compatible with Node.js 14 (e.g., v9.x).

benchmarks/wrappers/cloudflare/nodejs/container/worker.js-31-38 (1)
31-38: `query` method ignores `keyPrefix` parameter.

The `query` method accepts a `keyPrefix` parameter but doesn't use it. It returns all items from storage instead of filtering by the prefix. This could cause incorrect behavior when querying for specific subsets of data.

🔎 Proposed fix
```diff
 async query(keyPrefix) {
-    const list = await this.ctx.storage.list();
+    const list = await this.ctx.storage.list({ prefix: keyPrefix });
     const items = [];
     for (const [k, v] of list) {
         items.push(v);
     }
     return { items };
 }
```

benchmarks/wrappers/cloudflare/nodejs/container/nosql.js-39-48 (1)
39-48: Error handling consumes response body twice.

On line 43, `response.json()` consumes the response body. If that succeeds, the same line then tries to access `await response.text()` as a fallback, but this will fail because the body stream is already consumed. If `response.json()` throws (lines 44-46), then `response.text()` is called again, which will also fail.

🔎 Proposed fix
```diff
 if (!response.ok) {
-    let errorMsg;
-    try {
-        const errorData = await response.json();
-        errorMsg = errorData.error || await response.text();
-    } catch {
-        errorMsg = await response.text();
-    }
+    const errorText = await response.text();
+    let errorMsg = errorText;
+    try {
+        const errorData = JSON.parse(errorText);
+        errorMsg = errorData.error || errorText;
+    } catch {
+        // Keep errorText as-is
+    }
     throw new Error(`NoSQL operation failed: ${errorMsg}`);
 }
```

benchmarks/wrappers/cloudflare/python/container/handler.py-148-150 (1)
148-150: Fix platform-dependent `ru_maxrss` calculation for macOS compatibility.

`resource.getrusage().ru_maxrss` returns different units across platforms:
- Linux: kilobytes
- macOS/BSD: bytes
The current code divides by 1024.0, which correctly converts Linux kilobytes to megabytes but produces kilobytes instead of megabytes on macOS. Use platform detection to apply the correct conversion factor:
- Linux: divide by 1024
- macOS/BSD: divide by 1,048,576
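A minimal sketch of that normalization (the helper name is illustrative, not from the SeBS codebase):

```python
import resource
import sys

def max_rss_mb() -> float:
    """Return peak RSS in megabytes, normalizing ru_maxrss units per platform."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # macOS/BSD report ru_maxrss in bytes
        return rss / (1024.0 * 1024.0)
    # Linux reports ru_maxrss in kilobytes
    return rss / 1024.0
```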
The same issue exists in `benchmarks/wrappers/cloudflare/python/handler.py` line 111.

benchmarks/wrappers/cloudflare/nodejs/build.js (64-130) (1)
64-130: Regex-based async transformation will break constructor calls and violates esbuild best practices.

The `asyncNosqlPlugin` uses regex patterns to transform JavaScript code, which causes two critical issues:

Constructor calls fail: The pattern at lines 97–116 matches `new UserClass(...)` and transforms it to `new await UserClass(...)`, which is invalid JavaScript syntax. The `new` keyword is neither in the `controlFlow` list nor handled as a special case.

All functions become async: Line 83 converts every function declaration to `async`, which breaks:

- Functions returning non-Promise values
- Callbacks and synchronous contexts
- Generators and function-as-value patterns

Anti-pattern for esbuild: This violates esbuild plugin best practices. According to esbuild documentation, AST-heavy transforms should use a proper parser (Babel, SWC, or Recast) as a pre/post-build step rather than regex mutation inside the plugin.

Replace the regex-based transformation with either:

- An external AST transform step using `@babel/parser` + `@babel/traverse` before esbuild
- A dedicated transformer service (SWC, Recast) that preserves code semantics
- Or limit the plugin to simple text replacements on explicitly marked sections
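The constructor-call pitfall can be reproduced with a simplified stand-in for the plugin's regex (the pattern below is illustrative, not the plugin's actual one):

```javascript
// Naive regex that prefixes every call-looking token with `await`,
// mimicking the failure mode described above.
const naiveAwaitInjector = (src) =>
  src.replace(/\b([A-Za-z_$][\w$]*)\s*\(/g, 'await $1(');

const input = 'const u = new UserClass(42);';
const output = naiveAwaitInjector(input);
// The result contains `new await UserClass(42)`, which does not parse.

let valid = true;
try { new Function(output); } catch { valid = false; }
console.log(output, valid);
```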
sebs/cloudflare/durable_objects.py (30-39) (1)
30-39: `credentials` parameter is unused.

The `credentials` parameter is passed to `__init__` but never stored. The `_get_auth_headers` method references `self._credentials`, which doesn't exist. Either store the credentials or remove the parameter and the unused method.

🔎 Proposed fix

```diff
 def __init__(
     self,
     region: str,
     cache_client: Cache,
     resources: Resources,
     credentials: CloudflareCredentials,
 ):
     super().__init__(region, cache_client, resources)
+    self._credentials = credentials
     # Tables are just logical names - Durable Objects are accessed via Worker bindings
     self._tables: Dict[str, Dict[str, str]] = defaultdict(dict)
```

benchmarks/wrappers/cloudflare/python/container/storage.py (71-77) (1)
71-77: Add a timeout to `urlopen` calls to prevent indefinite hangs.

Network calls without timeouts can hang indefinitely if the worker proxy is unresponsive, causing benchmark runs to stall. Also, chain the exception with `from e` for proper tracebacks.

🔎 Proposed fix

```diff
 try:
-    with urllib.request.urlopen(req) as response:
+    with urllib.request.urlopen(req, timeout=30) as response:
         result = json.loads(response.read().decode('utf-8'))
         return result['key']
 except Exception as e:
     print(f"R2 upload error: {e}")
-    raise RuntimeError(f"Failed to upload to R2: {e}")
+    raise RuntimeError(f"Failed to upload to R2: {e}") from e
```

benchmarks/wrappers/cloudflare/python/container/storage.py (138-144) (1)
138-144: Add a timeout and exception chaining to `_upload_with_key`.

Consistent with the other methods.

🔎 Proposed fix

```diff
 try:
-    with urllib.request.urlopen(req) as response:
+    with urllib.request.urlopen(req, timeout=30) as response:
         result = json.loads(response.read().decode('utf-8'))
         print(f"[storage._upload_with_key] Upload successful, key={result['key']}")
 except Exception as e:
     print(f"R2 upload error: {e}")
-    raise RuntimeError(f"Failed to upload to R2: {e}")
+    raise RuntimeError(f"Failed to upload to R2: {e}") from e
```

benchmarks/wrappers/cloudflare/python/container/storage.py (91-101) (1)
91-101: Add a timeout and exception chaining to `download_stream`.

Same issue as upload: missing timeout and exception chaining.

🔎 Proposed fix

```diff
 try:
-    with urllib.request.urlopen(url) as response:
+    with urllib.request.urlopen(url, timeout=30) as response:
         return response.read()
 except urllib.error.HTTPError as e:
     if e.code == 404:
-        raise RuntimeError(f"Object not found: {key}")
+        raise RuntimeError(f"Object not found: {key}") from e
     else:
-        raise RuntimeError(f"Failed to download from R2: {e}")
+        raise RuntimeError(f"Failed to download from R2: {e}") from e
 except Exception as e:
     print(f"R2 download error: {e}")
-    raise RuntimeError(f"Failed to download from R2: {e}")
+    raise RuntimeError(f"Failed to download from R2: {e}") from e
```

benchmarks/wrappers/cloudflare/python/container/storage.py (168-201) (1)
168-201: Add a timeout to the `download_directory` list request and chain the exception.

The list operation and subsequent downloads should have timeouts.

🔎 Proposed fix

```diff
 try:
     req = urllib.request.Request(list_url)
     req.add_header('User-Agent', 'SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2')
-    with urllib.request.urlopen(req) as response:
+    with urllib.request.urlopen(req, timeout=30) as response:
         # ... rest of code ...
 except Exception as e:
     print(f"Error listing/downloading directory: {e}")
-    raise RuntimeError(f"Failed to download directory: {e}")
+    raise RuntimeError(f"Failed to download directory: {e}") from e
```

sebs/cloudflare/triggers.py (136-142) (1)
136-142: Thread pool executor leak: a new pool is created per invocation.

Each call to `async_invoke` creates a new `ThreadPoolExecutor` that is never shut down, leading to resource leakage. Use a shared executor or ensure proper cleanup.

🔎 Proposed fix

```diff
+_executor = None
+
+def _get_executor():
+    global _executor
+    if _executor is None:
+        _executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
+    return _executor
+
 class HTTPTrigger(Trigger):
     # ... existing code ...

     def async_invoke(self, payload: dict) -> concurrent.futures.Future:
         """
         Asynchronously invoke a Cloudflare Worker via HTTP.
         """
-        pool = concurrent.futures.ThreadPoolExecutor()
-        fut = pool.submit(self.sync_invoke, payload)
+        fut = _get_executor().submit(self.sync_invoke, payload)
         return fut
```

sebs/cloudflare/cloudflare.py (148) (1)
148: Add timeouts to `requests` calls to prevent indefinite hangs.

Multiple HTTP requests lack timeouts:

- Line 148: credential verification
- Line 1069: checking if worker exists
- Line 1259: fetching workers.dev subdomain

🔎 Proposed fixes

```diff
 # Line 148
-response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers, timeout=30)

 # Line 1069
-response = requests.get(url, headers=headers)
+response = requests.get(url, headers=headers, timeout=30)

 # Line 1259
-resp = requests.get(url, headers=headers)
+resp = requests.get(url, headers=headers, timeout=30)
```

Also applies to: 1069, 1259
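One lightweight way to enforce this project-wide is a thin wrapper; the helper name and 30-second default below are illustrative, not part of the SeBS codebase:

```python
import requests

DEFAULT_TIMEOUT = 30  # seconds; illustrative default

def get_with_timeout(url, headers=None, timeout=DEFAULT_TIMEOUT):
    """requests.get that always carries a timeout, so calls cannot hang forever."""
    return requests.get(url, headers=headers, timeout=timeout)
```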
sebs/cloudflare/r2.py (81-84) (1)
81-84: Add timeouts to all `requests` calls to prevent indefinite hangs.

HTTP requests without timeouts can block indefinitely if the server is unresponsive, causing the application to hang. This is flagged in multiple locations: `_create_bucket` (line 82), `list_buckets` (line 294), and `remove_bucket` (line 363).

🔎 Proposed fix for line 82

```diff
 create_bucket_response = requests.post(
-    create_bucket_uri, json=params, headers=self._get_auth_headers()
+    create_bucket_uri, json=params, headers=self._get_auth_headers(), timeout=30
 )
```

🔎 Proposed fix for line 294

```diff
-response = requests.get(list_buckets_uri, headers=self._get_auth_headers())
+response = requests.get(list_buckets_uri, headers=self._get_auth_headers(), timeout=30)
```

🔎 Proposed fix for line 363

```diff
-response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
+response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers(), timeout=30)
```

Also applies to: 293-294, 362-364
benchmarks/wrappers/cloudflare/python/nosql.py (190-195) (1)
190-195: Fragile string manipulation for JSON parsing.

Replacing single quotes with double quotes (`get_res.replace("\'", "\"")`) is a fragile approach that will break if the data contains escaped quotes or actual single-quote characters in values.

🔎 Proposed fix

If the data is stored as JSON, it should already use double quotes. If it's stored with single quotes, consider:

- Ensuring data is stored as valid JSON in `insert`/`update`
- Using `ast.literal_eval` for Python literal syntax
- Storing raw JSON without the quote replacement

```diff
 get_res = run_sync(self.get_table(table_name).get(key))
-get_res = get_res.replace("\'", "\"")
-##print("gr", get_res)
-
-res.append(json.loads(get_res))
+if get_res is not None:
+    res.append(json.loads(get_res))
```
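Why the replacement is fragile can be seen with a value containing an apostrophe:

```python
import json

# A stored value containing an apostrophe breaks naive quote replacement.
raw = '{"name": "O\'Brien"}'   # valid JSON as stored
naive = raw.replace("'", '"')  # corrupted to {"name": "O"Brien"}

corrupted = False
try:
    json.loads(naive)
except json.JSONDecodeError:
    corrupted = True

# Parsing the stored JSON directly works and preserves the apostrophe.
parsed = json.loads(raw)
```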
🟡 Minor comments (16)
config/cloudflare-test.json (2-13) (1)

2-13: Clarify the runtime configuration for Cloudflare Workers.

Cloudflare Workers do not run a full Node.js runtime; they run a V8-based Workers runtime. The `"language": "nodejs"` specification with `"version": "18"` does not reflect how Cloudflare Workers function. If Node.js APIs are required, enable the `nodejs_compat` compatibility flag (requires `compatibility_date >= 2024-09-23`) instead of specifying a language/version pair.

package.json (1-1) (1)
1-1: Remove the empty root package.json.

This repository uses isolated, self-contained benchmark packages in subdirectories and has no monorepo or workspace configuration that would justify a root `package.json`. Adding an empty `{}` file serves no purpose and will only create confusion for developers and potentially cause unexpected behavior with Node.js tooling. If this file is genuinely needed for a specific purpose, clarify its use; otherwise, remove it.

config/systems.json (389-391) (1)
389-391: UUID version inconsistency with the benchmark package.json.

The Cloudflare Node.js deployment specifies `uuid: "3.4.0"`, but the compression benchmark's `package.json` specifies `uuid: "^10.0.0"`. This version mismatch may cause conflicts or unexpected behavior during deployment.

🔎 Suggested fix: Align versions

Either update this to match the benchmark:

```diff
 "packages": {
-    "uuid": "3.4.0"
+    "uuid": "^10.0.0"
 }
```

Or update the benchmark's package.json to use `3.4.0` for consistency with other providers (AWS also uses `3.4.0`).

sebs/cloudflare/README.md (105-118) (1)
105-118: Documentation inconsistency: Container support status is unclear.

This section states that "Container Deployment: Not currently implemented" and lists it as a limitation. However, the PR description explicitly states:

- "Adds Node.js and Python container benchmark wrappers."
- "Added container support for Python and Node.js, container orchestrator and worker proxy for R2/Durable Objects."

Additionally, this PR includes `dockerfiles/cloudflare/python/Dockerfile`, suggesting container support is at least partially implemented.

Please update this documentation section to accurately reflect the current implementation status. If containers are implemented, remove this from limitations and update lines 299-332 (Container Support Architecture) to reflect what was actually built rather than what "would be required."

benchmarks/wrappers/cloudflare/python/container/handler.py (135-137) (1)
benchmarks/wrappers/cloudflare/python/container/handler.py-135-137 (1)
135-137: Remove unusedcompute_timevariable.The
compute_timevariable is calculated but never used. Either use it in the response or remove it.🔎 Proposed fix
# Calculate timing end = datetime.datetime.now().timestamp() - compute_time = end - beginbenchmarks/wrappers/cloudflare/nodejs/container/worker.js-309-324 (1)
309-324: Remove debug logging statements.

The R2 upload handler contains multiple `console.log` debug statements (lines 311-312, 314, 319-320) that should be removed before merging to production.

🔎 Proposed fix

```diff
 } else if (url.pathname === '/r2/upload') {
   // Upload to R2
-  console.log(`[worker.js /r2/upload] bucket=${bucket}, key=${key}`);
-  console.log(`[worker.js /r2/upload] env.R2 exists:`, !!env.R2);
   const data = await request.arrayBuffer();
-  console.log(`[worker.js /r2/upload] Received ${data.byteLength} bytes`);

   // Use the key as-is (container already generates unique keys if needed)
   try {
     const putResult = await env.R2.put(key, data);
-    console.log(`[worker.js /r2/upload] R2.put() returned:`, putResult);
-    console.log(`[worker.js /r2/upload] Successfully uploaded to R2 with key=${key}`);
   } catch (error) {
     console.error(`[worker.js /r2/upload] R2.put() error:`, error);
     throw error;
   }
```

benchmarks/wrappers/cloudflare/nodejs/container/nosql.js (86-99) (1)
86-99: Remove debug logging statements.

The `query` method contains multiple `console.error` debug statements that should be removed before merging. These add noise to production logs.

🔎 Proposed fix

```diff
 async query(tableName, primaryKey, secondaryKeyName) {
   const params = {
     table_name: tableName,
     primary_key: primaryKey,
     secondary_key_name: secondaryKeyName,
   };
   const result = await this._make_request('query', params);
-  console.error(`[nosql.query] result:`, JSON.stringify(result));
-  console.error(`[nosql.query] result.items:`, result.items);
-  console.error(`[nosql.query] Array.isArray(result.items):`, Array.isArray(result.items));
   const items = result.items || [];
-  console.error(`[nosql.query] returning items:`, items);
   return items;
 }
```

benchmarks/wrappers/cloudflare/nodejs/nosql.js (62-66) (1)
62-66: Race condition in `update`: concurrent updates may lose data.

The read-modify-write pattern in `update` is not atomic. If two concurrent updates occur, one may overwrite the other's changes. Consider documenting this limitation or using Durable Object transactions if available.

benchmarks/wrappers/cloudflare/nodejs/container/storage.js (15-18) (1)
15-18: Duplicate static field declaration: `worker_url` defined twice.

This is flagged by static analysis, and the second declaration will overwrite the first (which is redundant but may indicate a copy-paste error).

🔎 Proposed fix

```diff
   static worker_url = null;  // Set by handler from X-Worker-URL header
-
-
-  static worker_url = null;  // Set by handler from X-Worker-URL header
```

benchmarks/wrappers/cloudflare/python/container/storage.py (146-151) (1)
146-151: Potential issue: `download` doesn't handle the `/tmp` prefix like the worker version.

Comparing with `benchmarks/wrappers/cloudflare/python/storage.py` (lines 51-59), the worker version redirects writes to `/tmp` and tracks written files. This container version creates directories but doesn't ensure the path is under `/tmp`, which could fail in restricted environments.

🔎 Proposed fix

```diff
 def download(self, bucket, key, filepath):
     """Download file to disk"""
     data = self.download_stream(bucket, key)
+    # Ensure writes go to /tmp in container environment
+    if not filepath.startswith('/tmp'):
+        filepath = '/tmp' + os.path.abspath(filepath)
     os.makedirs(os.path.dirname(filepath), exist_ok=True)
     with open(filepath, 'wb') as f:
         f.write(data)
```

benchmarks/wrappers/cloudflare/python/nosql.py (161-168) (1)
161-168: `get` method in `nosql_kv` returns the raw result instead of parsed JSON.

Unlike `query`, which calls `json.loads`, the `get` method returns the raw result without parsing. This inconsistency could cause type errors for callers expecting a dict.

🔎 Proposed fix

```diff
 def get(
     self, table_name: str, primary_key: Tuple[str, str], secondary_key: Tuple[str, str]
 ) -> Optional[dict]:
     get_res = run_sync(
         self.get_table(table_name).get(
             self.key_maker(primary_key, secondary_key)
         ))
-    return get_res
+    if get_res is None:
+        return None
+    return json.loads(get_res)
```

sebs/cloudflare/r2.py (13-14) (1)
13-14: Typo in `typename()`: "Cloudlfare" should be "Cloudflare".

This typo will cause inconsistencies when the typename is used for logging or identification purposes.

🔎 Proposed fix

```diff
 @staticmethod
 def typename() -> str:
-    return "Cloudlfare.R2"
+    return "Cloudflare.R2"
```

benchmarks/wrappers/cloudflare/nodejs/storage.js (151-176) (1)
151-176: Inconsistent error handling: `download_stream` returns `null` for missing objects but then throws at line 175.

When R2 returns no object, the method returns `null` (line 156). However, if no backend is available, it throws an error (line 175). Additionally, if the object is found in R2 but has neither `arrayBuffer` nor `text` methods, it falls through to throwing even though the object exists.

Consider aligning the behavior: either always return `null` for missing objects or always throw.

🔎 Proposed fix for clearer control flow

```diff
 async download_stream(__bucket, key) {
   const instance = storage.instance || this;
   if (instance.handle) {
     const obj = await instance.handle.get(key);
-    if (!obj) return null;
+    if (!obj) {
+      throw new Error(`download_stream(): object not found in R2: ${key}`);
+    }
     // R2 object provides arrayBuffer()/text() helpers in Workers
     if (typeof obj.arrayBuffer === 'function') {
       const ab = await obj.arrayBuffer();
       return Buffer.from(ab);
     }
     if (typeof obj.text === 'function') {
       return await obj.text();
     }
-    // Fallback: return null
-    return null;
+    throw new Error(`download_stream(): R2 object has no readable content: ${key}`);
   }
   // Fallback to local filesystem
   const localPath = path.join('/tmp', key);
   if (fs && fs.existsSync(localPath)) {
     return fs.readFileSync(localPath);
   }
   throw new Error('download_stream(): object not found');
 }
```

sebs/cloudflare/cloudflare.py (73) (1)
73: Avoid mutable default argument.

Using `{}` as a default argument is a Python pitfall — the same dict instance is shared across calls.

🔎 Proposed fix

```diff
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
+        if config is None:
+            config = {}
```

benchmarks/wrappers/cloudflare/nodejs/storage.js (87-89) (1)
87-89: Use `instance` instead of `this` for consistency with the singleton pattern.

Other methods use `storage.instance || this`, but `download` calls `this.download_stream` directly. For consistency and correctness when called on the singleton, use `instance`.

🔎 Proposed fix

```diff
 async download(__bucket, key, filepath) {
   const instance = storage.instance || this;
-  const data = await this.download_stream(__bucket, key);
+  const data = await instance.download_stream(__bucket, key);
```

sebs/cloudflare/r2.py (58-60) (1)
58-60: Avoid mutable default argument.

Using `[]` as a default argument is a common Python pitfall — the same list instance is shared across all calls if not overwritten.

🔎 Proposed fix

```diff
 def _create_bucket(
-    self, name: str, buckets: list[str] = [], randomize_name: bool = False
+    self, name: str, buckets: Optional[list[str]] = None, randomize_name: bool = False
 ) -> str:
+    if buckets is None:
+        buckets = []
     for bucket_name in buckets:
```
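The pitfall itself is easy to demonstrate (hypothetical function names, for illustration only):

```python
def broken(items=[]):      # one list instance shared across all calls
    items.append("x")
    return items

def fixed(items=None):
    if items is None:
        items = []         # fresh list per call
    items.append("x")
    return items
```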
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @sebs/cloudflare/README.md:
- Around line 280-285: Add a clear limitation note to the "Platform Limitations"
section stating that Cloudflare does not support cold start detection and all
invocations will report is_cold: false; reference the handler example's
hardcoded is_cold: false and warn users this makes cold-start metrics
incomparable to platforms like AWS Lambda/Azure Functions and may skew benchmark
comparisons.
- Around line 191-250: The README response schema and metrics extraction
description are wrong; update the docs to reflect the actual handler and
extractor: note that the Python handler
(benchmarks/wrappers/cloudflare/python/handler.py) returns extra unused fields
(results_time, is_cold_worker, container_id, environ_container_id) and that
measurements are nested at result.output['result']['measurement'] rather than
using the top-level compute_time; document that HTTPTrigger.sync_invoke in
sebs/cloudflare/triggers.py ignores top-level compute_time and instead reads
cpu_time_us or cpu_time_ms to populate provider_times.execution, wall_time_us or
wall_time_ms to populate times.benchmark, is_cold from the nested measurement,
and memory_used_mb for memory metrics; remove or stop claiming top-level
compute_time is used, and explicitly list which fields are consumed vs
returned-but-unused.
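A sketch of the extraction path this instruction describes; the field names follow the review text, and the exact response structure in SeBS may differ:

```python
def extract_metrics(output: dict) -> dict:
    """Pull timing/memory fields from the nested measurement, per the review."""
    measurement = output.get("result", {}).get("measurement", {})
    # Prefer microsecond fields, fall back to millisecond variants.
    cpu_us = measurement.get(
        "cpu_time_us", measurement.get("cpu_time_ms", 0) * 1000
    )
    wall_us = measurement.get(
        "wall_time_us", measurement.get("wall_time_ms", 0) * 1000
    )
    return {
        "provider_times.execution": cpu_us,
        "times.benchmark": wall_us,
        "is_cold": measurement.get("is_cold", False),
        "memory_used_mb": measurement.get("memory_used_mb"),
    }
```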
🧹 Nitpick comments (3)
sebs/cloudflare/README.md (3)
138-142: Clarify tier-specific limitations and document paid plan differences.

The memory and CPU limits are stated as "128MB memory, 50ms CPU time on free tier," but the README doesn't mention that paid plans have different limits. This could mislead users deploying to paid accounts.
Add a subsection under "Platform Limitations" clarifying:
- Free tier: 128MB memory, 50ms CPU time
- Paid plans: Different limits (specify actual limits)
- Link to Cloudflare's pricing/limits documentation
300-306: Specify Cloudflare's container image size limit instead of leaving it vague.

Line 305 states "Ensure container image size is under Cloudflare's limits" but doesn't specify the actual limit. This forces users to refer to external documentation.
If Cloudflare publishes a container image size limit, include it explicitly (e.g., "container images must be under X MB").
70-76: Minor wording: Line 76 is slightly awkward.

"This provides the behavior of SeBS to invoke serverless functions via either library or http triggers" is grammatically unclear. Consider rewording to: "This enables SeBS to invoke serverless functions via either library or HTTP triggers."
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
sebs/cloudflare/README.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
🔇 Additional comments (1)
sebs/cloudflare/README.md (1)
313-318: All external documentation links are valid and current.

The four Cloudflare documentation URLs in the References section are accessible and reflect the current state of Cloudflare's APIs and services, including recent features like Workflows in the Bindings documentation and R2 storage classes. No updates needed.
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @sebs/cloudflare/durable_objects.py:
- Around line 100-113: The update_cache method is using a non-existent private
attribute self._cache_client which will raise AttributeError; change the call in
update_cache to use the public property self.cache_client.update_nosql(...)
(matching retrieve_cache) and keep the same arguments and logging; ensure
update_cache references self.cache_client (not self._cache_client) and that
deployment_name, benchmark and self._tables[benchmark] are passed unchanged.
- Around line 30-40: The __init__ method for the Durable Objects class accepts a
credentials: CloudflareCredentials parameter but never stores it, causing
_get_auth_headers to raise AttributeError when accessing self._credentials; fix
by assigning the parameter to an instance variable (e.g., self._credentials =
credentials) inside __init__ so _get_auth_headers and other methods can
reference it reliably.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- sebs/cloudflare/durable_objects.py
- sebs/cloudflare/function.py
- sebs/sebs.py
🚧 Files skipped from review as they are similar to previous changes (1)
- sebs/cloudflare/function.py
🧰 Additional context used
🧬 Code graph analysis (2)
sebs/sebs.py (2)
sebs/utils.py (1)
has_platform (260-261)

sebs/cloudflare/cloudflare.py (1)
Cloudflare (25-1611)
sebs/cloudflare/durable_objects.py (3)
sebs/cloudflare/config.py (12)
CloudflareCredentials (9-127), typename (32-33, 140-141, 210-211), resources (218-219), credentials (214-215), api_token (36-37), api_key (44-45), update_cache (118-121, 167-173, 248-251)

sebs/faas/config.py (1)

Resources (57-147)

sebs/cache.py (3)

Cache (37-463), get_nosql_config (177-178), update_nosql (190-193)
🪛 Ruff (0.14.10)
sebs/cloudflare/durable_objects.py
35-35: Unused method argument: credentials
(ARG002)
55-55: Avoid specifying long messages outside the exception class
(TRY003)
116-116: Unused method argument: primary_key
(ARG002)
116-116: Unused method argument: secondary_key
(ARG002)
146-146: Unused method argument: data
(ARG002)
147-147: Unused method argument: primary_key
(ARG002)
148-148: Unused method argument: secondary_key
(ARG002)
167-167: Avoid specifying long messages outside the exception class
(TRY003)
182-182: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (2)
sebs/sebs.py (1)
109-112: LGTM! Platform registration now follows a consistent pattern.

The Cloudflare registration now correctly uses `has_platform("cloudflare")`, matching the pattern used for AWS, Azure, GCP, and OpenWhisk. This addresses the previous review concern about inconsistent registration.

sebs/cloudflare/durable_objects.py (1)
185-212: LGTM! Dictionary mutation issue resolved.

The two-step approach with `list()` wrapping correctly prevents a `RuntimeError` from modifying the dictionary during iteration. The logic first locates the matching benchmark and table_key, then performs the deletion after the loop completes.
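The approved pattern can be illustrated in isolation (hypothetical table mapping, not SeBS's actual data):

```python
tables = {
    "bench1": {"t1": "cf-t1", "t2": "cf-t2"},
    "bench2": {"t3": "cf-t3"},
}

def remove_table(tables: dict, target_name: str) -> None:
    """Locate first, delete after: avoids mutating the dict mid-iteration."""
    found = None
    for benchmark, mapping in tables.items():
        for table_key, name in list(mapping.items()):
            if name == target_name:
                found = (benchmark, table_key)
                break
        if found:
            break
    # Deletion happens only after both loops have finished scanning.
    if found:
        benchmark, table_key = found
        del tables[benchmark][table_key]

remove_table(tables, "cf-t2")
```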
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @benchmarks/100.webapps/130.crud-api/nodejs/function.js:
- Around line 23-47: The total and average calculations in queryProducts are
incorrect: priceSum currently adds product.price without accounting for
product.quantity and avgPrice divides sum(prices) by sum(quantity), which is
meaningless; change the aggregation so priceSum accumulates product.price *
product.quantity (i.e., priceSum += product.price * product.quantity) and
compute avgPrice as quantitySum > 0 ? priceSum / quantitySum : 0.0 (average unit
price per item), keeping product names appended to products and returning
total_cost: priceSum and avg_price: avgPrice.
In @sebs/cloudflare/cloudflare.py:
- Around line 1253-1280: The requests.get call in the method that fetches the
workers.dev subdomain should include a timeout to avoid hanging; update the call
to requests.get(url, headers=headers, timeout=...) and choose a sensible timeout
(e.g., 5-10s), and adjust the surrounding exception handling to specifically
catch requests.Timeout and requests.RequestException where appropriate so the
method still logs via self.logging.warning and returns None on timeout or other
request errors; refer to the existing method using _get_auth_headers,
self._workers_dev_subdomain, and the current requests.get usage to locate and
modify the call.
- Around line 148-154: The requests.get call used to verify Cloudflare
credentials lacks a timeout and can hang; update the call to include a sensible
timeout (e.g., timeout=10) or use a configurable attribute like self._timeout,
i.e., change requests.get(f"{self._api_base_url}/user/tokens/verify",
headers=headers) to include timeout, and add handling for
requests.exceptions.Timeout/requests.exceptions.RequestException to raise a
clear RuntimeError if the request times out or fails; refer to the requests.get
invocation and the response/status_code check to locate where to add the timeout
and error handling.
- Around line 1209-1213: Replace the bare except that follows response.json()
with a specific exception handler for JSON parse failures: catch
json.JSONDecodeError and ValueError (and bind it to a variable, e.g., "e")
around the call to response.json() where error_data and error_msg are set, then
fall back to the existing log path (self.logging.info(f"Container
provisioning... ({elapsed}s elapsed)"))—optionally include the exception text in
the log; ensure json is imported if not already and update the except block that
currently surrounds response.json()/error_msg assignment.
- Around line 799-814: The nested function replacer closes over loop-scoped
variables control_flow and builtins causing stale bindings; fix by capturing
their current values into locals when defining replacer (e.g., assign
local_control_flow = control_flow and local_builtins = builtins just before
defining replacer and refer to those inside replacer, or bind them as default
args like def replacer(match, control_flow=control_flow, builtins=builtins):),
then keep using replacer in the re.sub call so the correct sets are used for
each iteration.
- Around line 73-83: The initialize method currently declares a mutable default
config and never uses it; change the signature to use a safe default by
replacing config: Dict[str, str] = {} with config: Optional[Dict[str, str]] =
None (or remove the parameter entirely if it’s not needed), and if you keep it,
ensure you normalize it at the top of initialize (e.g., config = config or {})
before use and pass relevant values into initialize_resources or other code
paths; keep references to the existing initialize, _verify_credentials, and
initialize_resources symbols when making the change.
- Around line 1066-1077: The bare except around response.json() in the function
that handles the worker check should be replaced with a specific exception
handler for JSON parse errors: catch json.JSONDecodeError (and optionally
ValueError for compatibility) instead of using a bare except, and handle/log the
exception (e.g., with self.logging.debug/warning) before returning None; ensure
json is imported if not already and update the try/except that wraps
response.json().get("result") to use the specific exception types and an
exception variable (e) for logging.
🧹 Nitpick comments (8)
benchmarks/100.webapps/130.crud-api/nodejs/function.js (1)
49-78: Consider adding input validation.

The handler assumes all required fields are present in `request.body` and `request.path`. If upstream wrappers don't guarantee this, the function may throw unclear errors when accessing undefined properties.

Consider adding validation for:

- Required body fields (`cart`, `product_id`, `name`, `price`, `quantity` for PUT)
- Required path parameters (`id` for GET /cart/{id})
- Data types (e.g., ensuring `price` and `quantity` are numbers)

Note: Line 67 uses `body.cart` for a GET request, which is unconventional (GET requests typically use path/query parameters), but may be intentional for this benchmark's design.

benchmarks/wrappers/cloudflare/nodejs/build.js (1)
64-130: Consider adding documentation for the post-processing logic.

The regex-based post-processing that converts dynamic `__require` calls into static imports is clever but complex. Adding comments explaining:

- Why this transformation is necessary (Cloudflare Workers ESM requirements)
- The pattern being matched (`__require<digits>("node:...")`)
- The cache mechanism

would improve maintainability.

📝 Example documentation

```diff
-// POST-PROCESS: Replace dynamic requires with static imports
+// POST-PROCESS: Replace dynamic requires with static imports
+// Cloudflare Workers require ESM with static imports for node: modules.
+// esbuild generates __require\d*("node:...") calls for dynamic imports,
+// so we transform them into static imports with a cache lookup.
 console.log('Post-processing to fix node: module imports...');
```

sebs/cloudflare/durable_objects.py (1)
183: Remove unnecessary f-string prefix.

The f-string on line 183 doesn't contain any placeholders, so the `f` prefix is unnecessary.

✨ Proposed fix

```diff
-        self.logging.info(f"Durable Objects data is managed within the Worker")
+        self.logging.info("Durable Objects data is managed within the Worker")
```

sebs/cloudflare/cloudflare.py (5)
385-391: Remove unused variable `r2_bucket_configured`.

The variable `r2_bucket_configured` is assigned on line 385 but never used afterward. It can be safely removed to clean up the code.

🧹 Proposed fix

```diff
 # Add R2 bucket binding for benchmarking files (for both native and container deployments)
-r2_bucket_configured = False
 try:
     storage = self.system_resources.get_storage()
     bucket_name = storage.get_bucket(Resources.StorageBucketType.BENCHMARKS)
     if bucket_name:
         toml_content += f"""# R2 bucket binding for benchmarking files
 # This bucket is used by fs and path polyfills to read benchmark data
 [[r2_buckets]]
 binding = "R2"
 bucket_name = "{bucket_name}"

 """
-        r2_bucket_configured = True
         self.logging.info(f"R2 bucket '{bucket_name}' will be bound to worker as 'R2'")
```
654-657: Rename unused loop variable.

The loop variable `dirnames` on line 654 is not used within the loop body. Following Python conventions, it should be renamed to `_dirnames` to indicate it's intentionally unused.

✨ Proposed fix

```diff
 # Calculate total size of the package directory
 total_size = 0
-for dirpath, dirnames, filenames in os.walk(directory):
+for dirpath, _dirnames, filenames in os.walk(directory):
     for filename in filenames:
         filepath = os.path.join(dirpath, filename)
         total_size += os.path.getsize(filepath)
```
855-866: Remove unused variable `result`.

The variable `result` assigned on line 855 from the `subprocess.run()` call is never used. Since success is verified by the lack of exception, this assignment can be removed.

🧹 Proposed fix

```diff
  # Install Node.js dependencies (needed for all containers for worker.js)
  self.logging.info(f"Installing @cloudflare/containers for worker.js orchestration in {directory}")
  try:
-     result = subprocess.run(
+     subprocess.run(
          ["npm", "install", "--production"],
          cwd=directory,
          capture_output=True,
          text=True,
          check=True,
          timeout=120
      )
      self.logging.info("npm install completed successfully")
```
924-927: Rename unused loop variable.

The loop variable `dirnames` on line 924 is not used within the loop body. Following Python conventions, it should be renamed to `_dirnames`.

✨ Proposed fix

```diff
  # Calculate package size (approximate, as it's a source directory)
  total_size = 0
- for dirpath, dirnames, filenames in os.walk(directory):
+ for dirpath, _dirnames, filenames in os.walk(directory):
      for filename in filenames:
          filepath = os.path.join(dirpath, filename)
          total_size += os.path.getsize(filepath)
```
1495-1512: Rename unused loop variable.

The loop variable `request_id` on line 1495 is not used within the loop body. Following Python conventions, it should be renamed to `_request_id` to indicate it's intentionally unused.

✨ Proposed fix

```diff
- for request_id, result in requests.items():
+ for _request_id, result in requests.items():
      # Count cold/warm starts
      if result.stats.cold_start:
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- benchmarks/100.webapps/130.crud-api/nodejs/function.js
- benchmarks/wrappers/cloudflare/nodejs/build.js
- sebs/cloudflare/README.md
- sebs/cloudflare/cloudflare.py
- sebs/cloudflare/durable_objects.py
✅ Files skipped from review due to trivial changes (1)
- sebs/cloudflare/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
sebs/cloudflare/durable_objects.py (3)
sebs/cloudflare/resources.py (1)
- config (37-38)

sebs/cloudflare/config.py (12)
- CloudflareCredentials (9-127)
- typename (32-33)
- typename (140-141)
- typename (210-211)
- resources (218-219)
- credentials (214-215)
- api_token (36-37)
- api_key (44-45)
- update_cache (118-121)
- update_cache (167-173)
- update_cache (248-251)

sebs/cache.py (3)
- Cache (37-463)
- get_nosql_config (177-178)
- update_nosql (190-193)
sebs/cloudflare/cloudflare.py (2)
sebs/cloudflare/resources.py (4)
- config (37-38)
- CloudflareSystemResources (16-95)
- get_storage (56-78)
- _get_auth_headers (40-54)

sebs/cloudflare/triggers.py (8)
- typename (19-20)
- typename (72-73)
- LibraryTrigger (7-57)
- HTTPTrigger (60-154)
- url (80-82)
- url (85-86)
- trigger_type (23-24)
- trigger_type (76-77)
🪛 Ruff (0.14.10)
sebs/cloudflare/durable_objects.py
56-56: Avoid specifying long messages outside the exception class
(TRY003)
117-117: Unused method argument: primary_key
(ARG002)
117-117: Unused method argument: secondary_key
(ARG002)
147-147: Unused method argument: data
(ARG002)
148-148: Unused method argument: primary_key
(ARG002)
149-149: Unused method argument: secondary_key
(ARG002)
168-168: Avoid specifying long messages outside the exception class
(TRY003)
183-183: f-string without any placeholders
Remove extraneous f prefix
(F541)
sebs/cloudflare/cloudflare.py
73-73: Unused method argument: config
(ARG002)
73-73: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
117-117: Do not catch blind exception: Exception
(BLE001)
128-131: Avoid specifying long messages outside the exception class
(TRY003)
134-137: Avoid specifying long messages outside the exception class
(TRY003)
148-148: Probable use of requests call without timeout
(S113)
151-154: Avoid specifying long messages outside the exception class
(TRY003)
162-162: Starting a process with a partial executable path
(S607)
174-174: Starting a process with a partial executable path
(S607)
184-184: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
184-184: Avoid specifying long messages outside the exception class
(TRY003)
186-188: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
186-188: Avoid specifying long messages outside the exception class
(TRY003)
190-190: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
190-190: Avoid specifying long messages outside the exception class
(TRY003)
196-196: Starting a process with a partial executable path
(S607)
208-208: Starting a process with a partial executable path
(S607)
218-218: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
218-218: Avoid specifying long messages outside the exception class
(TRY003)
220-222: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
220-222: Avoid specifying long messages outside the exception class
(TRY003)
224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
224-224: Avoid specifying long messages outside the exception class
(TRY003)
227-227: Unused method argument: container_uri
(ARG002)
385-385: Local variable r2_bucket_configured is assigned to but never used
Remove assignment to unused variable r2_bucket_configured
(F841)
387-387: Do not catch blind exception: Exception
(BLE001)
416-416: Avoid specifying long messages outside the exception class
(TRY003)
423-423: Unused method argument: architecture
(ARG002)
464-464: Unused method argument: is_cached
(ARG002)
482-482: Starting a process with a partial executable path
(S607)
496-496: Starting a process with a partial executable path
(S607)
507-507: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
508-508: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
508-508: Avoid specifying long messages outside the exception class
(TRY003)
510-510: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
511-511: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
511-511: Avoid specifying long messages outside the exception class
(TRY003)
513-515: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
513-515: Avoid specifying long messages outside the exception class
(TRY003)
525-525: Starting a process with a partial executable path
(S607)
533-533: Do not catch blind exception: Exception
(BLE001)
643-646: Avoid specifying long messages outside the exception class
(TRY003)
647-650: Avoid specifying long messages outside the exception class
(TRY003)
654-654: Loop control variable dirnames not used within loop body
Rename unused dirnames to _dirnames
(B007)
685-687: Avoid specifying long messages outside the exception class
(TRY003)
705-705: Avoid specifying long messages outside the exception class
(TRY003)
718-718: f-string without any placeholders
Remove extraneous f prefix
(F541)
804-804: Function definition does not bind loop variable control_flow
(B023)
804-804: Function definition does not bind loop variable builtins
(B023)
855-855: Local variable result is assigned to but never used
Remove assignment to unused variable result
(F841)
856-856: Starting a process with a partial executable path
(S607)
864-864: Do not catch blind exception: Exception
(BLE001)
865-865: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
866-866: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
866-866: Avoid specifying long messages outside the exception class
(TRY003)
924-924: Loop control variable dirnames not used within loop body
Rename unused dirnames to _dirnames
(B007)
956-956: subprocess call: check for execution of untrusted input
(S603)
957-957: Starting a process with a partial executable path
(S607)
969-969: Consider moving this statement to an else block
(TRY300)
975-975: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
976-976: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
978-978: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
978-978: Avoid specifying long messages outside the exception class
(TRY003)
1011-1011: Avoid specifying long messages outside the exception class
(TRY003)
1066-1066: Probable use of requests call without timeout
(S113)
1071-1071: Do not use bare except
(E722)
1132-1132: subprocess call: check for execution of untrusted input
(S603)
1167-1167: Consider moving this statement to an else block
(TRY300)
1170-1170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
1170-1170: Avoid specifying long messages outside the exception class
(TRY003)
1175-1175: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1176-1176: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
1178-1178: Unused method argument: package_dir
(ARG002)
1212-1212: Do not use bare except
(E722)
1256-1256: Probable use of requests call without timeout
(S113)
1278-1278: Do not catch blind exception: Exception
(BLE001)
1348-1348: Avoid specifying long messages outside the exception class
(TRY003)
1365-1365: Unused method argument: benchmark
(ARG002)
1391-1391: Unused method argument: resources
(ARG002)
1438-1438: Unused method argument: functions
(ARG002)
1438-1438: Unused method argument: code_package
(ARG002)
1458-1458: Unused method argument: start_time
(ARG002)
1459-1459: Unused method argument: end_time
(ARG002)
1495-1495: Loop control variable request_id not used within loop body
Rename unused request_id to _request_id
(B007)
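Among the most repeated findings above is TRY400 (`logging.exception` over `logging.error`). The practical difference — the traceback is captured automatically — can be seen in a stand-alone sketch; the logger name and message below are illustrative, not taken from the reviewed code:

```python
import io
import logging

# Capture log output in memory so the difference is visible
stream = io.StringIO()
logger = logging.getLogger("review-demo")
logger.addHandler(logging.StreamHandler(stream))

try:
    raise ValueError("boom")
except ValueError:
    # TRY400: logger.exception logs at ERROR level *and* appends the
    # traceback; logger.error would drop it unless exc_info=True is passed.
    logger.exception("operation failed")

output = stream.getvalue()
assert "operation failed" in output
assert "ValueError: boom" in output  # traceback text was captured
```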
🔇 Additional comments (11)
benchmarks/100.webapps/130.crud-api/nodejs/function.js (3)
6-13: Previous issue resolved.

The missing `await` on the async `insert` operation flagged in the previous review has been addressed. The function now properly awaits the insert operation, preventing fire-and-forget behavior.
15-21: LGTM!

The function correctly awaits the async `get` operation and returns the result. Implementation is clean and straightforward.
1-4: No issue found.

Line 3's `nosql.nosql.get_instance()` pattern is correct based on the module structure. The nosql.js file exports a class named `nosql`, and the double property access (the variable `nosql` containing the exports, then accessing the property `nosql` for the class) is the natural result of ES6 named exports. While the identical naming between the variable and class is unconventional, it is not a functional error.

benchmarks/wrappers/cloudflare/nodejs/build.js (5)
5-28: LGTM!

The recursive file enumeration logic correctly excludes build artifacts, test files, and configuration files while collecting source files for bundling.
30-36: LGTM!

The file copy utility correctly ensures the destination directory exists before copying.
173-173: LGTM!

The script execution is straightforward and correct.
54-60: No issues found.

The `request-polyfill.js` file exists in the wrapper directory where the build script expects it.
88-106: No action needed.

Cloudflare Workers fully supports ES2020 and later JavaScript features through its V8 engine, which is kept in sync with Chrome stable. The `es2020` target in this build configuration is appropriate and compatible.

sebs/cloudflare/durable_objects.py (3)
30-40: LGTM! Previous credential storage issue resolved.

The `__init__` method now correctly stores the `credentials` parameter as `self._credentials` (line 38), fixing the previously reported `AttributeError`. The initialization logic is sound.
101-114: LGTM! Cache client property usage is now correct.

The `update_cache` method now correctly uses `self.cache_client` (line 107) instead of the non-existent `self._cache_client`, matching the pattern in `retrieve_cache`. This resolves the previously reported `AttributeError`.
186-213: LGTM! Dictionary mutation issue resolved.

The `remove_table` method now uses a safe two-step approach: first finding the benchmark and table_key (lines 198-206, with `list()` wrappers to avoid mutation), then performing the deletion after iteration (lines 209-210). This prevents the `RuntimeError: dictionary changed size during iteration` that was previously flagged.
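The two-step pattern described above can be reproduced in isolation; the dictionary below is a toy stand-in for the cached table structure, not the actual SEBS cache:

```python
tables = {"benchmark-a": {"t1": "table-1"}, "benchmark-b": {"t2": "table-2"}}

# Step 1: locate the entry while iterating over list() snapshots,
# so the underlying dicts are never mutated mid-iteration
found = None
for benchmark, entries in list(tables.items()):
    for table_key in list(entries.keys()):
        if entries[table_key] == "table-2":
            found = (benchmark, table_key)

# Step 2: mutate only after iteration has finished
if found is not None:
    benchmark, table_key = found
    del tables[benchmark][table_key]

assert "t2" not in tables["benchmark-b"]
```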
```python
def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
    """
    Initialize the Cloudflare Workers platform.

    Args:
        config: Additional configuration parameters
        resource_prefix: Prefix for resource naming
    """
    # Verify credentials are valid
    self._verify_credentials()
    self.initialize_resources(select_prefix=resource_prefix)
```
Fix mutable default argument and unused parameter.
Line 73 has two issues:

- The `config` parameter is defined but never used in the method body.
- The mutable default `config: Dict[str, str] = {}` can cause unexpected behavior if the same dict instance is reused across calls.
🔧 Proposed fix

```diff
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
         """
         Initialize the Cloudflare Workers platform.

         Args:
             config: Additional configuration parameters
             resource_prefix: Prefix for resource naming
         """
+        if config is None:
+            config = {}
         # Verify credentials are valid
         self._verify_credentials()
```

🧰 Tools
🪛 Ruff (0.14.10)
73-73: Unused method argument: config
(ARG002)
73-73: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🤖 Prompt for AI Agents
In @sebs/cloudflare/cloudflare.py around lines 73 - 83, The initialize method
currently declares a mutable default config and never uses it; change the
signature to use a safe default by replacing config: Dict[str, str] = {} with
config: Optional[Dict[str, str]] = None (or remove the parameter entirely if
it’s not needed), and if you keep it, ensure you normalize it at the top of
initialize (e.g., config = config or {}) before use and pass relevant values
into initialize_resources or other code paths; keep references to the existing
initialize, _verify_credentials, and initialize_resources symbols when making
the change.
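The hazard behind the B006 finding is easy to demonstrate in isolation — the toy functions below are illustrative, not the reviewed method:

```python
def append_bad(item, bucket=[]):          # one list shared across all calls
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):       # fresh list per call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

assert append_bad("a") == ["a"]
assert append_bad("b") == ["a", "b"]      # state leaked from the first call
assert append_good("a") == ["a"]
assert append_good("b") == ["b"]          # independent calls
```

The default list is evaluated once at function definition, which is why every call to `append_bad` mutates the same object.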
```python
response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)

if response.status_code != 200:
    raise RuntimeError(
        f"Failed to verify Cloudflare credentials: {response.status_code} - {response.text}\n"
        f"Please check that your CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are correct."
    )
```
Add timeout to requests call for security.
The requests.get() call on line 148 lacks a timeout parameter, which can cause the application to hang indefinitely if the Cloudflare API is unresponsive. This is a security and reliability concern.
🔒 Proposed fix

```diff
- response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+ response = requests.get(
+     f"{self._api_base_url}/user/tokens/verify",
+     headers=headers,
+     timeout=30  # 30 second timeout for credential verification
+ )
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
response = requests.get(
    f"{self._api_base_url}/user/tokens/verify",
    headers=headers,
    timeout=30  # 30 second timeout for credential verification
)

if response.status_code != 200:
    raise RuntimeError(
        f"Failed to verify Cloudflare credentials: {response.status_code} - {response.text}\n"
        f"Please check that your CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are correct."
    )
```
🧰 Tools
🪛 Ruff (0.14.10)
148-148: Probable use of requests call without timeout
(S113)
151-154: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In @sebs/cloudflare/cloudflare.py around lines 148 - 154, The requests.get call
used to verify Cloudflare credentials lacks a timeout and can hang; update the
call to include a sensible timeout (e.g., timeout=10) or use a configurable
attribute like self._timeout, i.e., change
requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers) to
include timeout, and add handling for
requests.exceptions.Timeout/requests.exceptions.RequestException to raise a
clear RuntimeError if the request times out or fails; refer to the requests.get
invocation and the response/status_code check to locate where to add the timeout
and error handling.
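The shape of the fix described in the prompt — a bounded timeout plus translating the failure into a clear `RuntimeError` — can be sketched without touching the network. `slow_fetch` below is a stand-in for `requests.get` against an unresponsive endpoint; with the real library, the exception to catch is `requests.exceptions.Timeout` rather than the built-in `TimeoutError`:

```python
def fetch_with_timeout(fetch, url, timeout=10):
    """Call fetch(url, timeout=...) and translate a timeout into RuntimeError."""
    try:
        return fetch(url, timeout=timeout)
    except TimeoutError as err:
        # Chain with `from err` (B904) so the root cause stays visible
        raise RuntimeError(f"request to {url} timed out after {timeout}s") from err

def slow_fetch(url, timeout):
    # Simulates an endpoint that never responds within the deadline
    raise TimeoutError(f"no response within {timeout}s")

caught = None
try:
    fetch_with_timeout(slow_fetch, "https://example.invalid", timeout=5)
except RuntimeError as exc:
    caught = exc

assert caught is not None
assert isinstance(caught.__cause__, TimeoutError)  # original cause preserved
```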
```python
try:
    headers = self._get_auth_headers()
    url = f"{self._api_base_url}/accounts/{account_id}/workers/subdomain"
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        body = resp.json()
        sub = None
        # result may contain 'subdomain' or nested structure
        if isinstance(body, dict):
            sub = body.get("result", {}).get("subdomain")

        if sub:
            self._workers_dev_subdomain = sub
            return sub
        else:
            self.logging.warning(
                "Could not find workers.dev subdomain in API response; "
                "please enable the workers.dev subdomain in your Cloudflare dashboard."
            )
            return None
    else:
        self.logging.warning(
            f"Failed to fetch workers.dev subdomain: {resp.status_code} - {resp.text}"
        )
        return None
except Exception as e:
    self.logging.warning(f"Error fetching workers.dev subdomain: {e}")
    return None
```
Add timeout to requests call for reliability.
The requests.get() call on line 1256 lacks a timeout parameter, which can cause the application to hang indefinitely if the Cloudflare API is unresponsive. This affects reliability when fetching the workers.dev subdomain.
🔒 Proposed fix

```diff
         try:
             headers = self._get_auth_headers()
             url = f"{self._api_base_url}/accounts/{account_id}/workers/subdomain"
-            resp = requests.get(url, headers=headers)
+            resp = requests.get(url, headers=headers, timeout=30)
             if resp.status_code == 200:
```
🧰 Tools
🪛 Ruff (0.14.10)
1256-1256: Probable use of requests call without timeout
(S113)
1278-1278: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In @sebs/cloudflare/cloudflare.py around lines 1253 - 1280, The requests.get
call in the method that fetches the workers.dev subdomain should include a
timeout to avoid hanging; update the call to requests.get(url, headers=headers,
timeout=...) and choose a sensible timeout (e.g., 5-10s), and adjust the
surrounding exception handling to specifically catch requests.Timeout and
requests.RequestException where appropriate so the method still logs via
self.logging.warning and returns None on timeout or other request errors; refer
to the existing method using _get_auth_headers, self._workers_dev_subdomain, and
the current requests.get usage to locate and modify the call.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @benchmarks/wrappers/cloudflare/python/nosql.py:
- Line 192: The code is brittle because get_res is being massaged with
get_res.replace("'", "\"") before JSON parsing; stop doing that and parse the
data in its true format: ensure the producer stores JSON strings (fix the writer
to use json.dumps) or, if the KV contains Python literal reprs, replace the
fragile replace() with a safe parser (use json.loads(get_res) when data is valid
JSON, or ast.literal_eval(get_res) when Python literals are expected) and add
validation/error handling; remove the replace call and update the code that
reads/writes the KV so the stored format is unambiguous and parsed with the
proper function (reference the get_res variable and the surrounding KV
read/write logic).
- Line 91: Replace the hard assertion "assert len(keys) <= 100" with explicit
validation that raises a descriptive exception when the limit is exceeded: check
if len(keys) > 100 and raise a ValueError (or a custom exception) that includes
the actual len(keys) and the maximum allowed (100) so callers get a clear error
message instead of an abrupt assertion failure; update any docstring/comments
for the function in nosql.py that handles "keys" to reflect this explicit error
behavior.
- Line 184: Replace the hard assertion "assert len(keys) <= 100" with proper
error handling: detect when len(keys) > 100 and raise a descriptive exception
(e.g., ValueError) that includes the actual count and context (which
operation/function and parameter causing it). Apply the same pattern used to fix
nosql_do.query(): validate the keys collection early in the function that
constructs/uses the "keys" variable and raise a clear, actionable error instead
of asserting so the caller can handle it.
- Line 117: The class-level annotation for the singleton "instance" is
referencing the wrong name "nosql"; update the type hint to reference the
correct class "nosql_kv" (i.e., change the annotation on the variable instance:
Optional["nosql"] to Optional["nosql_kv"]) so the forward reference matches the
actual class name used in this module (look for the class definition named
nosql_kv and the variable instance).
- Around line 120-122: The init_instance function incorrectly constructs and
assigns nosql.instance using the nosql class; change it to instantiate and
assign nosql_kv instead (i.e., use nosql_kv() for creation and assign to
nosql.instance.env from the entry.env), ensuring init_instance references the
nosql_kv class for both the new instance and the instance property.
- Around line 204-207: get_instance currently constructs the wrong class by
calling nosql() (the nosql_do alias) instead of the KV implementation; change
the instantiation so that when nosql.instance is None you assign nosql.instance
= nosql_kv() (or the correct nosql_kv class/constructor) and return
nosql.instance, ensuring you're referencing the nosql_kv symbol used for the
key-value implementation rather than nosql().
- Line 88: Remove the leftover debug print by deleting the "print(key)" call in
benchmarks/wrappers/cloudflare/python/nosql.py; if key inspection is still
needed use the module's logger (e.g., logger.debug(...)) instead of printing to
stdout to avoid leaking debug output in production.
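The singleton shape the `init_instance`/`get_instance` prompts above restore looks roughly like the sketch below. The class name follows the prompts (`nosql_kv`), but the body is a simplified stand-in for the wrapper's actual implementation:

```python
from typing import Optional

class nosql_kv:
    # Class-level singleton slot: the annotation must name this class,
    # not a similarly named alias such as `nosql`
    instance: Optional["nosql_kv"] = None

    def __init__(self) -> None:
        self.env = None  # populated from the Workers environment at init time

    @staticmethod
    def get_instance() -> "nosql_kv":
        # Lazily construct the KV implementation, not the Durable Objects alias
        if nosql_kv.instance is None:
            nosql_kv.instance = nosql_kv()
        return nosql_kv.instance

a = nosql_kv.get_instance()
b = nosql_kv.get_instance()
assert a is b  # repeated calls share one instance
```

The bug the prompts describe is exactly a mismatch here: annotating or constructing the wrong class silently produces a second, unrelated instance.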
🧹 Nitpick comments (5)
benchmarks/wrappers/cloudflare/python/nosql.py (5)
51-51: Remove unused variable assignment.

The `put_res` variable is assigned but never used. Consider removing it or using it for error handling.

♻️ Proposed fix

```diff
  send_data = self.data_pre(data)
  k=self.key_maker(primary_key, secondary_key)
- put_res = run_sync(self.get_table(table_name).put(k, send_data))
+ run_sync(self.get_table(table_name).put(k, send_data))
  return
```
94-94: Consider implementing bulk operations.

The TODO comment indicates that bulk operations would be preferred over individual gets. This could improve performance for queries returning multiple keys.
Would you like me to help investigate or implement bulk get operations for this query method?
140-144: Remove unused variable assignment.

Similar to line 51, the `put_res` variable is assigned but never used.

♻️ Proposed fix

```diff
- put_res = (
-     run_sync(self.get_table(table_name).put(
+ run_sync(self.get_table(table_name).put(
          self.key_maker(primary_key, secondary_key),
-         json.dumps(data))
-     ))
+         json.dumps(data)))
  return
```
154-158: Remove unused variable assignment.

The `put_res` variable is assigned but never used.

♻️ Proposed fix

```diff
- put_res = run_sync(
-     self.get_table(table_name).put(
+ run_sync(
+     self.get_table(table_name).put(
          self.key_maker(primary_key, secondary_key),
-         json.dumps(data)
-     ))
+         json.dumps(data)))
  return
```
187-187: Consider implementing bulk operations.

Similar to the TODO in `nosql_do.query()`, bulk operations would improve performance.

Would you like me to help investigate or implement bulk get operations for this query method?
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
benchmarks/wrappers/cloudflare/python/nosql.py
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)
benchmarks/wrappers/cloudflare/python/container/nosql.py (8)
- init_instance (18-22)
- insert (53-66)
- update (68-81)
- get (83-92)
- query (94-103)
- delete (105-111)
- get_instance (114-117)
- nosql (11-117)
🪛 Ruff (0.14.10)
benchmarks/wrappers/cloudflare/python/nosql.py
23-23: Use explicit conversion flag
Replace with conversion flag
(RUF010)
26-26: Use explicit conversion flag
Replace with conversion flag
(RUF010)
38-38: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
40-40: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
51-51: Local variable put_res is assigned to but never used
Remove assignment to unused variable put_res
(F841)
125-125: Use explicit conversion flag
Replace with conversion flag
(RUF010)
128-128: Use explicit conversion flag
Replace with conversion flag
(RUF010)
140-140: Local variable put_res is assigned to but never used
Remove assignment to unused variable put_res
(F841)
154-154: Local variable put_res is assigned to but never used
Remove assignment to unused variable put_res
(F841)
🔇 Additional comments (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)
9-15: Previously reported critical issues have been fixed.

The class name mismatches in `nosql_do` that were flagged in the previous review have been correctly addressed. The type annotation now uses `"nosql_do"` and all instance references use `nosql_do.instance`.
```python
keys = []
for key in list_res:
    if key.startswith(prefix_key):
        print(key)
```
Remove debug print statement.
This print statement appears to be leftover debug code and should be removed for production use.
🔧 Proposed fix

```diff
  for key in list_res:
      if key.startswith(prefix_key):
-         print(key)
          keys.append(key)
```
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 88, Remove the
leftover debug print by deleting the "print(key)" call in
benchmarks/wrappers/cloudflare/python/nosql.py; if key inspection is still
needed use the module's logger (e.g., logger.debug(...)) instead of printing to
stdout to avoid leaking debug output in production.
```python
        print(key)
        keys.append(key)
##print("keys", keys)
assert len(keys) <= 100
```
Replace hard assertion with proper error handling.
The hard assertion will crash the program if more than 100 keys are found. This should raise a descriptive exception instead to provide better error handling.
🛡️ Proposed fix

```diff
  ##print("keys", keys)
- assert len(keys) <= 100
+ if len(keys) > 100:
+     raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")
```
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 91, Replace the hard
assertion "assert len(keys) <= 100" with explicit validation that raises a
descriptive exception when the limit is exceeded: check if len(keys) > 100 and
raise a ValueError (or a custom exception) that includes the actual len(keys)
and the maximum allowed (100) so callers get a clear error message instead of an
abrupt assertion failure; update any docstring/comments for the function in
nosql.py that handles "keys" to reflect this explicit error behavior.
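The assertion-to-exception change suggested above can be exercised on its own; `MAX_KEYS` and the helper name below are illustrative, not part of the reviewed wrapper:

```python
MAX_KEYS = 100

def check_key_limit(keys):
    """Raise a descriptive error instead of asserting on the key count."""
    if len(keys) > MAX_KEYS:
        raise ValueError(
            f"Query returned {len(keys)} keys, exceeding the {MAX_KEYS} key limit"
        )
    return keys

assert check_key_limit(["k1", "k2"]) == ["k1", "k2"]

failed_message = None
try:
    check_key_limit([f"k{i}" for i in range(150)])
except ValueError as exc:
    failed_message = str(exc)

assert failed_message == "Query returned 150 keys, exceeding the 100 key limit"
```

Unlike `assert`, this check survives `python -O` and gives the caller an actionable message.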
```python
for key in list_res.keys:
    keys.append(key.name)
##print("keys", keys)
assert len(keys) <= 100
```
Replace hard assertion with proper error handling.
Same issue as in nosql_do.query(): the hard assertion will crash if more than 100 keys are found. Replace with a descriptive exception.
🛡️ Proposed fix

```diff
  ##print("keys", keys)
- assert len(keys) <= 100
+ if len(keys) > 100:
+     raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")
```
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 184, Replace the hard
assertion "assert len(keys) <= 100" with proper error handling: detect when
len(keys) > 100 and raise a descriptive exception (e.g., ValueError) that
includes the actual count and context (which operation/function and parameter
causing it). Apply the same pattern used to fix nosql_do.query(): validate the
keys collection early in the function that constructs/uses the "keys" variable
and raise a clear, actionable error instead of asserting so the caller can
handle it.
```python
for key in keys:
    get_res = run_sync(self.get_table(table_name).get(key))
    get_res = get_res.replace("\'", "\"")
```
Fragile string manipulation for JSON parsing.
Replacing single quotes with double quotes to fix JSON parsing suggests the data is being stored or retrieved in an incorrect format. This approach is brittle and could fail if the data contains escaped quotes or quotes within string values.
Investigation needed
Consider the following scenarios where this will break:
- Data containing `{"name": "O'Brien"}` → becomes `{"name": "O"Brien"}` (invalid JSON)
- Data containing `{"quote": "She said 'hello'"}` → mismatched quotes

Root cause options:
- If the KV store is incorrectly storing Python `repr()` instead of JSON, fix the storage format
- If this is intentional, document why and add validation
- Consider using `ast.literal_eval()` if Python literals are expected
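To make the fallback concrete, here is a standalone sketch of a tolerant reader; the helper name `parse_stored_value` is illustrative and not part of the PR:

```python
import ast
import json

def parse_stored_value(raw: str) -> dict:
    """Parse a value read back from KV without quote rewriting.

    Tries strict JSON first; falls back to ast.literal_eval for values
    that were accidentally stored as Python repr() strings.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # literal_eval handles Python dict/list/str literals only;
        # unlike eval(), it never executes code.
        return ast.literal_eval(raw)

# JSON input parses directly; apostrophes survive intact.
print(parse_stored_value('{"name": "O\'Brien"}'))
# A repr()-style value still round-trips via literal_eval.
print(parse_stored_value("{'quote': \"She said 'hello'\"}"))
```

Fixing the writer to always use `json.dumps()` remains the cleaner solution; the fallback only papers over mixed historical data.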
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 192, The code is
brittle because get_res is being massaged with get_res.replace("'", "\"") before
JSON parsing; stop doing that and parse the data in its true format: ensure the
producer stores JSON strings (fix the writer to use json.dumps) or, if the KV
contains Python literal reprs, replace the fragile replace() with a safe parser
(use json.loads(get_res) when data is valid JSON, or ast.literal_eval(get_res)
when Python literals are expected) and add validation/error handling; remove the
replace call and update the code that reads/writes the KV so the stored format
is unambiguous and parsed with the proper function (reference the get_res
variable and the surrounding KV read/write logic).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @benchmarks/wrappers/cloudflare/python/nosql.py:
- Around line 161-168: The get method currently returns the raw stored string,
violating its Optional[dict] contract; update the get function (involving get(),
key_maker(), get_table(), run_sync()) to parse the retrieved JSON string with
json.loads before returning, handle the case where the run_sync result is None
by returning None, and propagate or wrap JSON parsing errors appropriately so
callers receive a dict or None consistent with insert/update behavior.
🧹 Nitpick comments (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)
54-63: **Clarify update vs. insert semantics.** The comment on line 54 questions whether `update` should have different behavior from `insert`. In typical NoSQL patterns, update might want to:
- Merge with existing data rather than overwrite
- Return an error if the key doesn't exist (vs. insert creating it)
- Perform atomic updates or conditional writes
If this distinction matters for your use case, the methods should be differentiated; otherwise, consider removing the comment or explicitly documenting that they're intentionally identical.
Would you like me to suggest an implementation that differentiates insert and update operations?
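For reference, one common way to differentiate the two, sketched against a plain dict standing in for the backing store (all names here are illustrative, not the PR's API):

```python
class TableSketch:
    def __init__(self):
        self._store = {}

    def insert(self, key, data):
        # Insert creates; refuse to silently overwrite an existing row.
        if key in self._store:
            raise KeyError(f"{key!r} already exists")
        self._store[key] = dict(data)

    def update(self, key, data):
        # Update merges into an existing row; a missing key is an error.
        if key not in self._store:
            raise KeyError(f"{key!r} does not exist")
        self._store[key].update(data)

t = TableSketch()
t.insert("a", {"x": 1})
t.update("a", {"y": 2})
print(t._store["a"])  # {'x': 1, 'y': 2}
```

Whether SeBS benchmarks actually need merge-on-update is the question to settle before differentiating; if they never rely on it, documenting that both operations overwrite is simpler.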
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
benchmarks/wrappers/cloudflare/python/nosql.py
🧰 Additional context used
🪛 Ruff (0.14.10)
benchmarks/wrappers/cloudflare/python/nosql.py
23-23: Use explicit conversion flag
Replace with conversion flag
(RUF010)
26-26: Use explicit conversion flag
Replace with conversion flag
(RUF010)
38-38: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
40-40: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
51-51: Local variable put_res is assigned to but never used
Remove assignment to unused variable put_res
(F841)
125-125: Use explicit conversion flag
Replace with conversion flag
(RUF010)
128-128: Use explicit conversion flag
Replace with conversion flag
(RUF010)
140-140: Local variable put_res is assigned to but never used
Remove assignment to unused variable put_res
(F841)
154-154: Local variable put_res is assigned to but never used
Remove assignment to unused variable put_res
(F841)
🔇 Additional comments (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)
29-40: **Verify pickle security implications.** Pickle deserialization can be unsafe with untrusted data. Ensure that the Durable Object storage is fully trusted and cannot be manipulated by untrusted sources. If there's any possibility of untrusted data entering this path, switch to JSON serialization instead.
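For comparison, a JSON-based version of the two conversion helpers (a standalone sketch; `data_pre`/`data_post` mirror the wrapper's names, but this is illustrative, not the PR's code):

```python
import json

def data_pre(data):
    # Serialize to a plain string that the JS side can clone safely.
    return json.dumps(data)

def data_post(data):
    # None means the key was absent in storage.
    if data is None:
        return None
    return json.loads(data)

print(data_post(data_pre({"k": [1, 2, 3]})))  # {'k': [1, 2, 3]}
```

JSON keeps deserialization safe on untrusted input, at the cost of only supporting JSON-serializable values; if Pyodide cloning forces pickle, that constraint should be stated in a comment.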
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @benchmarks/wrappers/cloudflare/python/storage.py:
- Around line 46-61: The bug is double-prefixing "/tmp" for already-absolute tmp
paths causing invalid paths; fix by (1) in download(bucket, key, filepath) add
the actual filesystem path (real_fp) to written_files instead of the requested
filepath (self.written_files.add(real_fp)) and (2) in upload(bucket, key,
filepath) when you detect filepath in written_files, only prepend "/tmp" if the
filepath does not already start with "/tmp" (e.g., if filepath in
self.written_files: filepath = filepath if filepath.startswith("/tmp") else
"/tmp" + os.path.abspath(filepath)). Ensure these changes use the existing
upload, download, and written_files symbols.
- Around line 101-109: In adownload_stream, get_res from
get_bucket(...).get(key) is an R2ObjectBody and does not have .bytes(); replace
the await get_res.bytes() call with await get_res.arrayBuffer() and convert the
returned ArrayBuffer to Python bytes by wrapping it in a JS Uint8Array (e.g.,
js.Uint8Array.new(array_buffer)) and then calling bytes(...) on that result so
the function returns proper binary data instead of breaking at runtime.
🧹 Nitpick comments (3)
benchmarks/wrappers/cloudflare/python/storage.py (3)
35-38: **Unused `bucket` parameter in `get_bucket`.** `bucket` is never used here, which triggers ARG002 and can confuse readers. If you don't plan to support multiple bindings, either drop the parameter or rename it to `_bucket` to make the intent explicit.
75-96: **Unused `put_res` from `bobj.put`.** `put_res` is assigned but never used, triggering F841 and slightly obscuring intent. Since you only care about the side effect of the PUT (and exceptions will propagate), you can drop the variable:

♻️ Simplify `aupload_stream`

```diff
-        bobj = self.get_bucket(bucket)
-        put_res = await bobj.put(unique_key, data_js)
-        return unique_key
+        bobj = self.get_bucket(bucket)
+        await bobj.put(unique_key, data_js)
+        return unique_key
```
111-116: **Duplicate `return storage.instance` is dead code.** The second `return storage.instance` is unreachable and can be removed:

✂️ Remove dead code

```diff
 def get_instance():
     if storage.instance is None:
         raise RuntimeError("must init storage singleton first")
-    return storage.instance
-    return storage.instance
+    return storage.instance
```

The current exception message is short and clear; TRY003 from Ruff is mostly stylistic here, so I'd keep it as-is unless you prefer a custom exception type.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
benchmarks/wrappers/cloudflare/python/storage.py
🧰 Additional context used
🪛 Ruff (0.14.10)
benchmarks/wrappers/cloudflare/python/storage.py
35-35: Unused method argument: bucket
(ARG002)
48-48: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
57-57: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
58-58: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
95-95: Local variable put_res is assigned to but never used
Remove assignment to unused variable put_res
(F841)
114-114: Avoid specifying long messages outside the exception class
(TRY003)
```python
def upload(self, bucket, key, filepath):
    if filepath in self.written_files:
        filepath = "/tmp" + os.path.abspath(filepath)
    with open(filepath, "rb") as f:
        unique_key = self.upload_stream(bucket, key, f.read())
    return unique_key

def download(self, bucket, key, filepath):
    data = self.download_stream(bucket, key)
    # should only allow writes to tmp dir. so do have to edit the filepath here?
    real_fp = filepath
    if not filepath.startswith("/tmp"):
        real_fp = "/tmp" + os.path.abspath(filepath)

    self.written_files.add(filepath)
    with open(real_fp, "wb") as f:
```
/tmp prefixing can generate invalid paths for already-absolute tmp files
The interaction between upload, download, and written_files has a corner case:
- `download` records the requested `filepath` in `written_files`.
- `upload` rewrites any path found in `written_files` by prepending `"/tmp" + os.path.abspath(filepath)`.
If a caller ever uses a path that already starts with /tmp (e.g., via download_directory(..., out_path="/tmp")), you’ll end up trying to open /tmp/tmp/..., which likely doesn’t exist.
You can retain the current behavior for non‑/tmp paths and avoid the double prefix with a small guard:
🔧 Suggested fix for `/tmp` double-prefix
```diff
 def upload(self, bucket, key, filepath):
-    if filepath in self.written_files:
-        filepath = "/tmp" + os.path.abspath(filepath)
+    if filepath in self.written_files and not filepath.startswith("/tmp"):
+        filepath = "/tmp" + os.path.abspath(filepath)
```

Note: Ruff's S108 warning about `/tmp` is expected in Workers, since `/tmp` is the canonical writable directory; your `os.path.abspath` usage already mitigates simple traversal issues.
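The same guard can be factored into a helper so that both `upload` and `download` normalize paths identically (a sketch; the name `to_tmp_path` is hypothetical, not part of the PR):

```python
import os

def to_tmp_path(filepath: str) -> str:
    """Map a requested path into /tmp exactly once."""
    if filepath.startswith("/tmp"):
        return filepath  # already inside the writable directory
    return "/tmp" + os.path.abspath(filepath)

print(to_tmp_path("/data/out.bin"))  # /tmp/data/out.bin
print(to_tmp_path("/tmp/out.bin"))   # unchanged: /tmp/out.bin
```

Recording the normalized path (rather than the caller's original string) in `written_files` then makes the lookup in `upload` consistent by construction.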
🧰 Tools
🪛 Ruff (0.14.10)
48-48: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
57-57: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
58-58: Probable insecure usage of temporary file or directory: "/tmp"
(S108)
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/storage.py around lines 46 - 61, The
bug is double-prefixing "/tmp" for already-absolute tmp paths causing invalid
paths; fix by (1) in download(bucket, key, filepath) add the actual filesystem
path (real_fp) to written_files instead of the requested filepath
(self.written_files.add(real_fp)) and (2) in upload(bucket, key, filepath) when
you detect filepath in written_files, only prepend "/tmp" if the filepath does
not already start with "/tmp" (e.g., if filepath in self.written_files: filepath
= filepath if filepath.startswith("/tmp") else "/tmp" +
os.path.abspath(filepath)). Ensure these changes use the existing upload,
download, and written_files symbols.
```python
async def adownload_stream(self, bucket, key):
    bobj = self.get_bucket(bucket)
    get_res = await bobj.get(key)
    if get_res == jsnull:
        print("key not stored in bucket")
        return b''
    # Always read as raw binary data (Blob/ArrayBuffer)
    data = await get_res.bytes()
    return bytes(data)
```
🧩 Analysis chain
🌐 Web query:
Cloudflare Workers R2 JavaScript API Bucket.get() return object methods bytes() arrayBuffer()
💡 Result:
Short answer: env.BUCKET.get(key) returns Promise<R2ObjectBody | R2Object | null>. When you get an R2ObjectBody it exposes a ReadableStream in .body plus convenience read helpers — arrayBuffer(), text(), json(), blob() and bodyUsed. There is no bytes() method on the R2ObjectBody; use arrayBuffer() (or blob()/text()) to read the bytes. [1][2]
Example:
- Get object and read as ArrayBuffer:
```javascript
const obj = await env.MY_BUCKET.get("file");
if (obj?.body) {
  const buf = await obj.arrayBuffer(); // ArrayBuffer of the object's bytes
  // buf.byteLength ...
}
```
Sources: Cloudflare Workers R2 API reference (R2Object / R2ObjectBody) and Download objects docs. [1][2]
🏁 Script executed:

```shell
# First, understand the file structure and locate relevant files
git ls-files | grep -E "(cloudflare|R2|storage)" | head -20
```

Repository: spcl/serverless-benchmarks
Length of output: 1122

🏁 Script executed:

```shell
# Look at the storage.py file to understand the full context
cat -n benchmarks/wrappers/cloudflare/python/storage.py
```

Repository: spcl/serverless-benchmarks
Length of output: 4927

🏁 Script executed:

```shell
# Search for usage of 'bytes()' method or 'get_res' patterns in the codebase
rg "\.bytes\(\)" --type py -B 2 -A 2
```

Repository: spcl/serverless-benchmarks
Length of output: 915

🏁 Script executed:

```shell
# Search for R2 imports or references
rg -i "r2\|bucket\.get" --type py -B 2 -A 2
```

Repository: spcl/serverless-benchmarks
Length of output: 52
Fix: R2 get() returns R2ObjectBody which has no .bytes() method
The Cloudflare R2 API for Bucket.get() returns an R2ObjectBody that exposes arrayBuffer(), text(), json(), and blob(), but not bytes(). Line 108 will fail at runtime with "bytes is not a function".
Use arrayBuffer() instead:
Suggested fix
```python
data = await get_res.arrayBuffer()
return bytes(js.Uint8Array.new(data))
```

🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/storage.py around lines 101 - 109, In
adownload_stream, get_res from get_bucket(...).get(key) is an R2ObjectBody and
does not have .bytes(); replace the await get_res.bytes() call with await
get_res.arrayBuffer() and convert the returned ArrayBuffer to Python bytes by
wrapping it in a JS Uint8Array (e.g., js.Uint8Array.new(array_buffer)) and then
calling bytes(...) on that result so the function returns proper binary data
instead of breaking at runtime.
sebs/cloudflare/README.md
Outdated
```diff
@@ -0,0 +1,474 @@
+# Cloudflare Workers Implementation for SeBS
+
+This directory contains the **complete implementation** of Cloudflare Workers support for the SeBS (Serverless Benchmarking Suite).
```
Can you please move it to docs?
I think we should merge with existing docs, possibly splitting the file - something to benchmark, something else to platforms, something else to storage, etc.
sebs/cloudflare/README.md
Outdated
```markdown
## Implementation Status

✅ **Fully Implemented** - All features are production-ready:
- Multi-language support (JavaScript, Python, Java, Go, Rust) via containers
```
I don't think we have Go and Rust support :-)
sebs/cloudflare/triggers.py
Outdated
```python
from sebs.faas.function import Trigger, ExecutionResult


class LibraryTrigger(Trigger):
```
I think you don't need this class at all; just don't implement this trigger type.
```dockerfile
COPY . .

# Create ffmpeg directory and symlink for video-processing benchmark compatibility
RUN mkdir -p /app/ffmpeg && ln -s /usr/bin/ffmpeg /app/ffmpeg/ffmpeg
```
This does not look correct - all of this should be handled by the benchmark itself.
Without this symlink the benchmark 220 is looking for ffmpeg at /app and throws:
```
FileNotFoundError: [Errno 2] No such file or directory: '/app/ffmpeg/ffmpeg'
```
ahh i just noticed the other platforms just run the init.sh file in the benchmark directory. this installs the ffpeg etc at the right locations. so i updated our code to do the same.
sebs/cloudflare/cloudflare.py
Outdated
```python
# self.logging.info("Waiting 120 seconds for benchmark initialization...")
# time.sleep(400)

# For container deployments, wait for Durable Object infrastructure to initialize
```
A safety net: sometimes provisioning of the container itself can take longer even though the /health endpoint already works.
sebs/cloudflare/cloudflare.py
Outdated
```python
        raise RuntimeError(error_msg)

    def _wait_for_durable_object_ready(self, worker_name: str, package_dir: str, env: dict):
        """Wait for container Durable Object to be fully provisioned and ready."""
```
I don't get this - it looks like we wait until at least one worker is ready and running but how is this related to Durable Object?
well the workers instantiate the durable object.... so we need to wait for the worker to be ready and thus transitively the durable object.
```python
    def _build_workers_dev_url(self, worker_name: str, account_id: Optional[str]) -> str:
        """Build a best-effort public URL for a worker.

        Prefer using the account's readable workers.dev subdomain when available
```
just if the user has not configured correctly
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@benchmarks/wrappers/cloudflare/python/nosql.py`:
- Around line 107-111: The get_instance method on nosql_do currently constructs
a nosql_do() without ensuring the required binding attribute, which leads to
AttributeError if called before init_instance; update nosql_do.get_instance to
validate that an instance exists and that instance.binding is set (or otherwise
initialized) and if not, raise a clear RuntimeError instructing callers to call
nosql_do.init_instance(...) first (or provide the needed binding), or
alternatively call the existing init_instance flow if safe; reference the
nosql_do.get_instance and nosql_do.init_instance symbols and the
instance.binding attribute in your change.
- Around line 203-207: get_instance currently constructs a nosql_kv() without
ensuring the required env attribute is set; change get_instance (and mirror the
change made for nosql_do.get_instance) so it either calls init_instance to
create the singleton with env or validates and raises a clear error if env
hasn't been initialized; specifically update nosql_kv.get_instance to return the
existing instance if set, otherwise call nosql_kv.init_instance(...) (or require
an env parameter) to create the instance with env populated, or throw an
explicit exception telling callers to call init_instance first, so subsequent
methods on the singleton won't hit AttributeError due to a missing env.
In `@docs/platforms.md`:
- Around line 227-228: Update the workers.dev example URL in the HTTP Trigger
bullet: replace the placeholder https://{name}.{account}.workers.dev with
https://{name}.{subdomain}.workers.dev in the "- **HTTP Trigger**" line so the
docs use the correct {subdomain} token instead of {account}.
- Around line 236-244: Add blank lines before and after the markdown table that
begins with "| Instance Type | vCPU | Memory | Disk |" so the table is separated
from surrounding paragraphs (specifically the "Metrics Collection" paragraph
that follows); this satisfies markdownlint rule MD058—insert one empty line
above the table and one empty line below it in docs/platforms.md.
In `@sebs/cloudflare/cloudflare.py`:
- Around line 1451-1479: The digit-prefix rule must apply to all workers: in
format_function_name, after computing formatted (and stripping hyphens), check
if formatted is non-empty and starts with a digit; if so, prefix it
appropriately — use 'container-' when container_deployment is True and 'w-' (or
another agreed native-worker prefix) when container_deployment is False —
instead of only handling the container_deployment branch; update the conditional
that currently checks "if container_deployment and formatted and
formatted[0].isdigit()" to always enforce the digit-prefix rule based on
formatted and container_deployment.
♻️ Duplicate comments (9)
benchmarks/wrappers/cloudflare/python/nosql.py (3)
86-91: **Remove debug print and replace assertion with proper error handling.** Line 88 contains a debug `print`, and the hard `assert` will crash with an unhelpful `AssertionError` if more than 100 keys are returned.

🔧 Proposed fix

```diff
     keys = []
     for key in list_res:
         if key.startswith(prefix_key):
-            print(key)
             keys.append(key)
     ##print("keys", keys)
-    assert len(keys) <= 100
+    if len(keys) > 100:
+        raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")
```
161-168: **`get` method returns raw string instead of parsed dict.** The method signature declares `-> Optional[dict]`, but it returns the raw string from KV storage. Since `insert` and `update` store data with `json.dumps()`, this method should parse the result with `json.loads()` for consistency.

🔧 Proposed fix

```diff
     def get(
         self, table_name: str, primary_key: Tuple[str, str], secondary_key: Tuple[str, str]
     ) -> Optional[dict]:
         get_res = run_sync(
             self.get_table(table_name).get(
                 self.key_maker(primary_key, secondary_key)
             ))
-        return get_res
+        if get_res is None:
+            return None
+        return json.loads(get_res)
```
180-196: **Replace assertion and fix fragile JSON parsing.** Line 184 has the same hard assertion issue. Line 192 uses brittle string replacement (`"'"` → `"\""`) which will corrupt data containing apostrophes (e.g., `{"name": "O'Brien"}`).

🔧 Proposed fix

If the `get` method is fixed to return `json.loads(get_res)`, this query method can be simplified:

```diff
     keys = []
     for key in list_res.keys:
         keys.append(key.name)
     ##print("keys", keys)
-    assert len(keys) <= 100
+    if len(keys) > 100:
+        raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")

     # todo: please use bulk sometime (it didn't work when i tried it)
     res = []
     for key in keys:
         get_res = run_sync(self.get_table(table_name).get(key))
-        get_res = get_res.replace("\'", "\"")
-        ##print("gr", get_res)
-
-        res.append(json.loads(get_res))
+        if get_res is not None:
+            res.append(json.loads(get_res))

     return res
```

The root cause of needing quote replacement suggests data may be stored incorrectly elsewhere. Verify that all writers use `json.dumps()` consistently.
json.dumps()consistently.sebs/cloudflare/cloudflare.py (6)
73-83: **Avoid mutable default & unused `config` parameter.** The default `{}` is unsafe and `config` is unused. Prefer `None` and normalize, or drop the parameter entirely.

🔧 Proposed fix

```diff
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
 @@
-        # Verify credentials are valid
+        if config is None:
+            config = {}
+        # Verify credentials are valid
         self._verify_credentials()
```
148-154: **Add timeout + request error handling for credential verification.** `requests.get()` without a timeout can hang indefinitely; handle request errors explicitly.

🔧 Proposed fix

```diff
-        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+        try:
+            response = requests.get(
+                f"{self._api_base_url}/user/tokens/verify",
+                headers=headers,
+                timeout=30,
+            )
+        except requests.exceptions.RequestException as e:
+            raise RuntimeError("Failed to verify Cloudflare credentials") from e
```
841-849: **Bind loop variables in `replacer` to avoid stale closure.** The nested function captures `control_flow`/`builtins` from the loop scope; bind them explicitly.

🐛 Proposed fix

```diff
-            def replacer(match):
+            def replacer(match, control_flow=control_flow, builtins=builtins):
```
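For context, the underlying Python behavior reduced to a standalone demo (nothing here is from the PR): closures capture variables, not values, so callables created in a loop all see the loop variable's final value unless it is frozen via a default argument.

```python
# Late binding: both closures see the final value of x.
late = [lambda: x for x in ("a", "b")]
print([f() for f in late])  # ['b', 'b']

# Default-argument binding freezes the current value per iteration.
bound = [lambda x=x: x for x in ("a", "b")]
print([f() for f in bound])  # ['a', 'b']
```

The same default-argument trick is what the proposed `replacer(match, control_flow=control_flow, builtins=builtins)` signature applies.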
1088-1094: **Don't create LibraryTriggers when the platform doesn't support them.** Docs state Library triggers aren't supported, but they're still created and returned. This is likely to surface as unsupported trigger usage at runtime.

🔧 Proposed fix

```diff
-        from sebs.cloudflare.triggers import LibraryTrigger, HTTPTrigger
+        from sebs.cloudflare.triggers import HTTPTrigger
 @@
-        library_trigger = LibraryTrigger(func_name, self)
-        library_trigger.logging_handlers = self.logging_handlers
-        worker.add_trigger(library_trigger)
+        # Library triggers are not supported for Cloudflare Workers
 @@
-        from sebs.cloudflare.triggers import LibraryTrigger, HTTPTrigger
+        from sebs.cloudflare.triggers import HTTPTrigger
 @@
-        if trigger_type == Trigger.TriggerType.LIBRARY:
-            trigger = LibraryTrigger(worker.name, self)
-            trigger.logging_handlers = self.logging_handlers
-            return trigger
-        elif trigger_type == Trigger.TriggerType.HTTP:
+        if trigger_type == Trigger.TriggerType.HTTP:
             account_id = worker.account_id or self.config.credentials.account_id
             worker_url = self._build_workers_dev_url(worker.name, account_id)
             trigger = HTTPTrigger(worker.name, worker_url)
             trigger.logging_handlers = self.logging_handlers
             return trigger
+        elif trigger_type == Trigger.TriggerType.LIBRARY:
+            raise NotImplementedError("Library triggers are not supported for Cloudflare Workers")
```

Also applies to: 1622-1635
1109-1115: **Add timeout + specific JSON error handling in `_get_worker`.**

🔧 Proposed fix

```diff
-        response = requests.get(url, headers=headers)
+        try:
+            response = requests.get(url, headers=headers, timeout=30)
+        except requests.exceptions.RequestException as e:
+            self.logging.warning(f"Failed to query worker {worker_name}: {e}")
+            return None
 @@
-        try:
-            return response.json().get("result")
-        except:
-            return None
+        try:
+            return response.json().get("result")
+        except (json.JSONDecodeError, ValueError, KeyError):
+            return None
```
1296-1322: **Add timeout + narrow exception handling in `_get_workers_dev_subdomain`.**

🔧 Proposed fix

```diff
-            resp = requests.get(url, headers=headers)
+            resp = requests.get(url, headers=headers, timeout=30)
 @@
-        except Exception as e:
+        except requests.exceptions.RequestException as e:
             self.logging.warning(f"Error fetching workers.dev subdomain: {e}")
             return None
```
🧹 Nitpick comments (2)
benchmarks/wrappers/cloudflare/python/nosql.py (1)
28-40: **Consider security implications of pickle serialization.** Using `pickle` for serialization carries inherent risks (arbitrary code execution during deserialization). While this appears to be internal Worker-to-Durable-Object communication where data is trusted, consider using JSON serialization for defense-in-depth, similar to how `nosql_kv` handles data.

♻️ Suggested alternative using JSON

```diff
-## these data conversion funcs should not be necessary. i couldn't get pyodide to clone the data otherwise
-    def data_pre(self, data):
-        return pickle.dumps(data, 0).decode("ascii")
-
-    def data_post(self, data):
-        # Handle None (key not found in storage)
-        if data is None:
-            return None
-        # Handle both string and bytes data from Durable Object storage
-        if isinstance(data, str):
-            return pickle.loads(bytes(data, "ascii"))
-        else:
-            return pickle.loads(data)
+    def data_pre(self, data):
+        return json.dumps(data)
+
+    def data_post(self, data):
+        if data is None:
+            return None
+        return json.loads(data)
```

If pickle is required due to Pyodide cloning limitations as the comment suggests, document this constraint more explicitly and ensure data sources are always trusted.
sebs/cloudflare/cloudflare.py (1)
1189-1204: **Avoid double-waiting after container deployment.** `_wait_for_durable_object_ready()` already waits; the extra 60s sleep is redundant unless readiness failed.

♻️ Proposed refactor

```diff
-        if container_deployment:
-            self.logging.info("Waiting for container Durable Object to initialize...")
-            self._wait_for_durable_object_ready(worker_name, package_dir, env)
+        if container_deployment:
+            self.logging.info("Waiting for container Durable Object to initialize...")
+            ready = self._wait_for_durable_object_ready(worker_name, package_dir, env)
 @@
-        if container_deployment:
-            self.logging.info("Waiting 60 seconds for container Durable Object to initialize...")
-            time.sleep(60)
+        if container_deployment and not ready:
+            self.logging.info("Waiting 60 seconds for container Durable Object to initialize...")
+            time.sleep(60)
```
```python
@staticmethod
def get_instance():
    if nosql_do.instance is None:
        nosql_do.instance = nosql_do()
    return nosql_do.instance
```
get_instance creates instance without required binding attribute.
If get_instance() is called before init_instance(), the created instance will lack the binding attribute, causing AttributeError on subsequent operations. Consider raising an explicit error or initializing defensively.
🔧 Proposed fix

```diff
 @staticmethod
 def get_instance():
     if nosql_do.instance is None:
-        nosql_do.instance = nosql_do()
+        raise RuntimeError("nosql_do not initialized. Call init_instance() first.")
     return nosql_do.instance
```
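As a standalone illustration of the guarded pattern (class and binding names are made up, not the PR's code):

```python
class Singleton:
    instance = None

    def __init__(self, binding):
        self.binding = binding

    @staticmethod
    def init_instance(binding):
        if Singleton.instance is None:
            Singleton.instance = Singleton(binding)
        return Singleton.instance

    @staticmethod
    def get_instance():
        # Fail loudly instead of constructing a half-initialized object.
        if Singleton.instance is None:
            raise RuntimeError("not initialized; call init_instance() first")
        return Singleton.instance

try:
    Singleton.get_instance()
except RuntimeError as e:
    print(e)  # not initialized; call init_instance() first
Singleton.init_instance("KV_BINDING")
print(Singleton.get_instance().binding)  # KV_BINDING
```

Raising early turns a confusing `AttributeError` deep inside a storage call into an actionable error at the call site.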
🤖 Prompt for AI Agents
In `@benchmarks/wrappers/cloudflare/python/nosql.py` around lines 107 - 111, The
get_instance method on nosql_do currently constructs a nosql_do() without
ensuring the required binding attribute, which leads to AttributeError if called
before init_instance; update nosql_do.get_instance to validate that an instance
exists and that instance.binding is set (or otherwise initialized) and if not,
raise a clear RuntimeError instructing callers to call
nosql_do.init_instance(...) first (or provide the needed binding), or
alternatively call the existing init_instance flow if safe; reference the
nosql_do.get_instance and nosql_do.init_instance symbols and the
instance.binding attribute in your change.
```python
@staticmethod
def get_instance():
    if nosql_kv.instance is None:
        nosql_kv.instance = nosql_kv()
    return nosql_kv.instance
```
get_instance creates instance without required env attribute.
Same issue as nosql_do.get_instance(): if called before init_instance(), the instance will lack the env attribute, causing AttributeError on subsequent operations.
🔧 Proposed fix

```diff
 @staticmethod
 def get_instance():
     if nosql_kv.instance is None:
-        nosql_kv.instance = nosql_kv()
+        raise RuntimeError("nosql_kv not initialized. Call init_instance() first.")
     return nosql_kv.instance
```
🤖 Prompt for AI Agents
In `@benchmarks/wrappers/cloudflare/python/nosql.py` around lines 203 - 207,
get_instance currently constructs a nosql_kv() without ensuring the required env
attribute is set; change get_instance (and mirror the change made for
nosql_do.get_instance) so it either calls init_instance to create the singleton
with env or validates and raises a clear error if env hasn't been initialized;
specifically update nosql_kv.get_instance to return the existing instance if
set, otherwise call nosql_kv.init_instance(...) (or require an env parameter) to
create the instance with env populated, or throw an explicit exception telling
callers to call init_instance first, so subsequent methods on the singleton
won't hit AttributeError due to a missing env.
> - **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
> - **Library Trigger**: ❌ Not currently supported
🧩 Analysis chain
🌐 Web query:
Cloudflare Workers workers.dev URL format documentation
💡 Result:
Format: <YOUR_WORKER_NAME>.<YOUR_ACCOUNT_SUBDOMAIN>.workers.dev. Workers.dev account subdomain itself is <YOUR_ACCOUNT_SUBDOMAIN>.workers.dev. Worker-name restrictions when using workers.dev: ≤63 characters, only letters/numbers/dashes, and cannot start or end with a dash. [1][2]
Preview/versioned preview URL pattern: `<VERSION_PREFIX>-<WORKER_NAME>.<YOUR_ACCOUNT_SUBDOMAIN>.workers.dev`. [3]
Sources: Cloudflare Workers docs. [1] [2] [3]
Replace the `{account}` placeholder with `{subdomain}` in the workers.dev URL.
The workers.dev URL format uses an account subdomain, not an account ID. The placeholder should reflect this to avoid confusion when users configure Workers.
✏️ Documentation fix

```diff
-- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
+- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{subdomain}.workers.dev`
```
🤖 Prompt for AI Agents
In `@docs/platforms.md` around lines 227 - 228, Update the workers.dev example URL
in the HTTP Trigger bullet: replace the placeholder
https://{name}.{account}.workers.dev with https://{name}.{subdomain}.workers.dev
in the "- **HTTP Trigger**" line so the docs use the correct {subdomain} token
instead of {account}.
```python
    @staticmethod
    def format_function_name(name: str, container_deployment: bool = False) -> str:
        """
        Format a function name to comply with Cloudflare Worker naming rules.

        Worker names must:
        - Be lowercase
        - Contain only alphanumeric characters and hyphens
        - Not start or end with a hyphen
        - Not start with a digit

        Args:
            name: The original name
            container_deployment: Whether this is a container worker (adds 'w-' prefix if name starts with digit)

        Returns:
            Formatted name
        """
        # Convert to lowercase and replace invalid characters
        formatted = name.lower().replace('_', '-').replace('.', '-')
        # Remove any characters that aren't alphanumeric or hyphen
        formatted = ''.join(c for c in formatted if c.isalnum() or c == '-')
        # Remove leading/trailing hyphens
        formatted = formatted.strip('-')
        # Ensure container worker names don't start with a digit (Cloudflare requirement)
        # Only add prefix for container workers to differentiate from native workers
        if container_deployment and formatted and formatted[0].isdigit():
            formatted = 'container-' + formatted
        return formatted
```
🧩 Analysis chain
🌐 Web query:
Cloudflare Workers script naming rules requirements
💡 Result:
Summary — Worker script name rules
- Allowed characters: letters (A–Z, a–z), digits (0–9) and dashes (-). Do not use underscores. [1][2]
- Maximum length: up to 255 characters for a Worker name. [1]
- If you use a workers.dev subdomain (workers.dev URL / DNS label): name must be ≤ 63 characters and cannot start or end with a dash. [2]
- The name is required in wrangler config (name property / script_name used in API/URLs). [1][3]
Sources: Cloudflare Workers / Wrangler documentation. [1] Wrangler configuration (name rules). [2] workers.dev routing limits. [3] Cloudflare Workers Scripts API reference.
Apply the digit-prefix rule to all workers, not just container deployments.

The function's docstring documents that worker names must not start with a digit, but the current implementation only enforces this when `container_deployment=True`. Benchmark names like `311.compression` will still produce invalid worker names for native deployments.

🔧 Suggested fix

```diff
-        if container_deployment and formatted and formatted[0].isdigit():
-            formatted = 'container-' + formatted
+        if formatted and formatted[0].isdigit():
+            prefix = "container-" if container_deployment else "w-"
+            formatted = prefix + formatted
```

🤖 Prompt for AI Agents
In `@sebs/cloudflare/cloudflare.py` around lines 1451 - 1479, The digit-prefix
rule must apply to all workers: in format_function_name, after computing
formatted (and stripping hyphens), check if formatted is non-empty and starts
with a digit; if so, prefix it appropriately — use 'container-' when
container_deployment is True and 'w-' (or another agreed native-worker prefix)
when container_deployment is False — instead of only handling the
container_deployment branch; update the conditional that currently checks "if
container_deployment and formatted and formatted[0].isdigit()" to always enforce
the digit-prefix rule based on formatted and container_deployment.
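With the suggested fix applied, the whole helper would read roughly as follows — a sketch combining the reviewed function with the extended digit-prefix rule (the `w-` prefix for native workers is the reviewer's proposal, not settled project convention):

```python
def format_function_name(name: str, container_deployment: bool = False) -> str:
    """Format a name to comply with Cloudflare Worker naming rules (sketch)."""
    # Lowercase, map underscores/dots to hyphens, drop other invalid chars
    formatted = name.lower().replace('_', '-').replace('.', '-')
    formatted = ''.join(c for c in formatted if c.isalnum() or c == '-')
    formatted = formatted.strip('-')
    # Apply the digit-prefix rule to ALL workers, not only container deployments
    if formatted and formatted[0].isdigit():
        prefix = 'container-' if container_deployment else 'w-'
        formatted = prefix + formatted
    return formatted
```

For example, `format_function_name("311.compression")` would yield `w-311-compression` for a native worker and `container-311-compression` with `container_deployment=True`, both of which satisfy the "letters, digits, dashes, no leading digit" rules cited above.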
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@sebs/cloudflare/cli.py`:
- Around line 103-110: The readiness loop in cli.py (where
docker_instance.logs(stream=True, follow=True) is iterated via dkg and
next(dkg).decode(...)) can block forever; add a timeout (e.g., max_wait_seconds)
and track elapsed time (using time.time()) inside the while True to break and
raise/return after the timeout, and also sleep briefly on StopIteration to avoid
busy looping; ensure you update the block that assigns dkg and calls next(dkg)
so it stops with a clear error/exception when the timeout elapses instead of
hanging.
In `@sebs/cloudflare/cloudflare.py`:
- Around line 354-365: The requests.get call should include a timeout and the
bare except should be replaced with specific exception handling: call
requests.get(url, headers=headers, timeout=<reasonable_seconds>) and catch
JSONDecodeError (or ValueError) when calling response.json() instead of using a
bare except; also consider catching requests.exceptions.RequestException around
the requests.get call and log errors via self.logging.warning including
response.status_code or exception details to aid debugging (refer to
requests.get, response.json(), and self.logging.warning in the surrounding
method).
In `@sebs/cloudflare/containers.py`:
- Around line 356-362: Update the exception handling and tar extraction call in
the tarfile.open block: preserve the original exception by re-raising
RuntimeError using "from e" (i.e., raise RuntimeError(...) from e) so the
exception chain is kept, and when calling tar.extractall(directory) pass the
recommended filter parameter (e.g., filter='data' or an equivalent safe filter)
to harden extraction; locate the tarfile.open(...) block and the except
Exception as e handling that uses self.logging.error/f"npm install failed: {e}"
and change the extractall call and the raise accordingly.
♻️ Duplicate comments (6)

sebs/cloudflare/cloudflare.py (4)

80-90: Fix mutable default argument.

The `config` parameter uses a mutable default `{}`, which can cause unexpected behavior if the same dict instance is reused across calls. Additionally, the parameter is unused in the method body.

♻️ Proposed fix

```diff
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
         """
         Initialize the Cloudflare Workers platform.

         Args:
             config: Additional configuration parameters
             resource_prefix: Prefix for resource naming
         """
+        if config is None:
+            config = {}
         # Verify credentials are valid
         self._verify_credentials()
```
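The hazard behind this comment is easy to demonstrate with a toy function (illustrative only, not code from this PR): a mutable default is evaluated once at definition time and then shared by every call.

```python
def bad(config={}):           # default dict is created ONCE and shared
    config.setdefault('n', 0)
    config['n'] += 1
    return config['n']

def good(config=None):        # fresh dict on every call
    if config is None:
        config = {}
    config.setdefault('n', 0)
    config['n'] += 1
    return config['n']
```

Repeated calls to `bad()` keep incrementing the same hidden dict, while `good()` starts from a clean state each time — exactly why the `Optional[...] = None` idiom is proposed above.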
155-161: Add timeout to requests call.

The `requests.get()` call lacks a timeout parameter, which can cause the application to hang indefinitely if the Cloudflare API is unresponsive.

🔒 Proposed fix

```diff
-        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+        response = requests.get(
+            f"{self._api_base_url}/user/tokens/verify",
+            headers=headers,
+            timeout=30
+        )
```

455-482: Add timeout to requests call.

The `requests.get()` call on line 458 lacks a timeout parameter, which can cause the application to hang indefinitely when fetching the workers.dev subdomain.

🔒 Proposed fix

```diff
-        resp = requests.get(url, headers=headers)
+        resp = requests.get(url, headers=headers, timeout=30)
```

610-638: Apply digit-prefix rule to all workers, not just containers.

The docstring correctly states worker names must not start with a digit, but the implementation only enforces this for `container_deployment=True`. Benchmark names like `311.compression` will produce invalid worker names for native deployments.

🐛 Proposed fix

```diff
-        if container_deployment and formatted and formatted[0].isdigit():
-            formatted = 'container-' + formatted
+        if formatted and formatted[0].isdigit():
+            prefix = "container-" if container_deployment else "w-"
+            formatted = prefix + formatted
```

docs/platforms.md (2)
229-230: Replace `{account}` with `{subdomain}` in workers.dev URL.

The workers.dev URL format uses an account subdomain, not an account ID. The placeholder should be `{subdomain}` to match Cloudflare's actual URL structure.

✏️ Proposed fix

```diff
-- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
+- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{subdomain}.workers.dev`
```

236-246: Add blank lines around the table for markdown compliance.

The table should be surrounded by blank lines to satisfy markdownlint rule MD058.

✏️ Proposed fix

```diff
 - **Memory/Timeout Configuration (Containers)**: Managed by Cloudflare, available in different tiers:
+
 | Instance Type | vCPU | Memory | Disk |
 |---------------|------|--------|------|
 | lite | 1/16 | 256 MiB | 2 GB |
 | basic | 1/4 | 1 GiB | 4 GB |
 | standard-1 | 1/2 | 4 GiB | 8 GB |
 | standard-2 | 1 | 6 GiB | 12 GB |
 | standard-3 | 2 | 8 GiB | 16 GB |
 | standard-4 | 4 | 12 GiB | 20 GB |
+
 - **Metrics Collection**: ...
```
🧹 Nitpick comments (20)

sebs/cloudflare/durable_objects.py (2)

143-172: Consider logging a warning when data seeding is skipped.

The `write_to_table` method silently skips data seeding due to platform limitations. Per a past reviewer comment, this silent skip could create a false impression that data was written. Consider adding a warning log to inform users that the operation was not performed.

♻️ Proposed fix

```diff
         table_name = self._get_table_name(benchmark, table)
         if not table_name:
             raise ValueError(f"Table {table} not found for benchmark {benchmark}")
-        # Silently skip data seeding for Cloudflare Durable Objects
-        # This is a platform limitation
-        pass
+        # Data seeding is not supported for Cloudflare Durable Objects
+        # This is a platform limitation - Durable Objects can only be written to from within Workers
+        self.logging.warning(
+            f"Skipping data seeding for table '{table}' - Cloudflare Durable Objects "
+            f"cannot be pre-populated via external API. Use 'large' benchmark size "
+            f"which creates data at runtime."
+        )
```

183-183: Remove extraneous `f` prefix from static string.

This f-string has no placeholders, making the `f` prefix unnecessary.

♻️ Proposed fix

```diff
-        self.logging.warning(f"Durable Objects data is managed within the Worker")
+        self.logging.warning("Durable Objects data is managed within the Worker")
```

sebs/cloudflare/cloudflare.py (2)
438-441: Preserve exception chain with `raise ... from e`.

When re-raising exceptions, use `raise ... from e` to preserve the exception chain for better debugging.

♻️ Proposed fix

```diff
         except RuntimeError as e:
             error_msg = f"Wrangler deployment failed for worker {worker_name}: {str(e)}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e
```

431-434: Consider removing or making the hardcoded 60-second sleep configurable.

After waiting for the Durable Object to report ready via the health endpoint, there's an additional unconditional 60-second sleep. This significantly increases deployment time. If this is truly needed as a safety net, consider making it configurable or reducing it based on observed provisioning times.
sebs/cloudflare/containers.py (4)
458-465: Preserve exception chain and use logging.exception.

When re-raising exceptions, use `raise ... from e` to preserve the exception chain. Also, `logging.exception` automatically includes traceback information.

♻️ Proposed fix

```diff
         except docker.errors.BuildError as e:
             error_msg = f"Docker build failed for {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e
         except Exception as e:
             error_msg = f"Unexpected error building Docker image {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e
```

489-490: Remove unused variables.

`consecutive_failures` and `max_consecutive_failures` are defined but never used. Remove them or implement the intended retry logic.

♻️ Proposed fix - remove unused variables

```diff
-        consecutive_failures = 0
-        max_consecutive_failures = 5
-
         while time.time() - start_time < max_wait_seconds:
```

252-252: Remove extraneous `f` prefix from static string.

This f-string has no placeholders.

♻️ Proposed fix

```diff
-        self.logging.info(f"Copied worker.js orchestration file from nodejs/container")
+        self.logging.info("Copied worker.js orchestration file from nodejs/container")
```

351-355: Prefix unused variable with underscore.

The `stat` variable from `get_archive` is unused.

♻️ Proposed fix

```diff
-        bits, stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
+        bits, _stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
```

dockerfiles/cloudflare/Dockerfile.manage (1)
24-25: Consider pinning uv version for reproducible builds.

The uv installer script fetches the latest version by default, which could lead to build inconsistencies over time. Pinning to a specific version improves reproducibility.

♻️ Example with version pinning

```diff
-RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
+RUN curl -LsSf https://astral.sh/uv/0.9.26/install.sh | sh && \
     /root/.local/bin/uv tool install workers-py
```

sebs/cloudflare/cli.py (4)
51-79: Add exception chaining for better debugging.

When re-raising exceptions within `except` blocks, chain them to preserve the original traceback. This helps with debugging by showing the full error context.

Proposed fix

```diff
             if not os.path.exists(dockerfile_path):
                 raise RuntimeError(
                     f"Dockerfile not found at {dockerfile_path}. "
                     "Cannot build Cloudflare CLI container."
-                )
+                ) from None
             # ... build logic ...
         except docker.errors.BuildError as build_error:
             raise RuntimeError(
                 f"Failed to build Docker image {full_image_name}: {build_error}"
-            )
+            ) from build_error
```

28-40: Inconsistent logging usage.

The code mixes module-level `logging` (lines 30, 34, 39, 59, 75) with `self.logging` (line 101). Since the class inherits from `LoggingBase`, prefer using `self.logging` consistently for structured logging.

Also applies to: 101-101

116-116: Use explicit `Optional` type hint.

PEP 484 prohibits implicit `Optional`. Update the type hint to be explicit.

Proposed fix

```diff
-    def execute(self, cmd: str, env: dict = None):
+    def execute(self, cmd: str, env: dict | None = None):
```

Apply the same fix to `wrangler_deploy` (line 187) and `pywrangler_deploy` (line 202).

231-244: Consider caching for faster Docker builds.

The `--no-cache` flag ensures reproducible builds but significantly increases build time. Consider making this configurable or using `--no-cache` only for production deployments.

sebs/cloudflare/workers.py (7)
8-12: Unused import: `json`.

The `json` module is imported but not used in this file.

Proposed fix

```diff
 import os
 import shutil
-import json
 import io
 import tarfile
```

62-65: Unused parameter `container_uri`.

This parameter is declared but never used. If it's a placeholder for future functionality, consider documenting it or removing it until needed.

Options

Either remove the unused parameter:

```diff
     def generate_wrangler_toml(
         self,
         worker_name: str,
         package_dir: str,
         language: str,
         account_id: str,
         benchmark_name: Optional[str] = None,
         code_package: Optional[Benchmark] = None,
-        container_uri: str = "",
     ) -> str:
```

Or add a TODO comment explaining its intended future use.

142-149: Fragile TOML writing fallback.

Catching `TypeError` to distinguish between `tomli_w.dump()` (binary) and `toml.dumps()` (text) is brittle. A `TypeError` could occur for other reasons, masking real bugs.

Proposed fix using explicit check

```diff
     # Write wrangler.toml to package directory
     toml_path = os.path.join(package_dir, "wrangler.toml")
-    try:
-        # Try tomli_w (writes binary)
-        with open(toml_path, 'wb') as f:
-            tomli_w.dump(config, f)
-    except TypeError:
-        # Fallback to toml library (writes text)
-        with open(toml_path, 'w') as f:
-            f.write(tomli_w.dumps(config))
+    if hasattr(tomli_w, 'dump'):
+        # tomli_w writes binary
+        with open(toml_path, 'wb') as f:
+            tomli_w.dump(config, f)
+    else:
+        # toml library writes text via dumps()
+        with open(toml_path, 'w') as f:
+            f.write(tomli_w.dumps(config))
```

213-215: Improve exception handling with chaining and proper logging.

Use exception chaining (`from e`) and `logging.exception()` for better debugging context.

Proposed fix

```diff
     except Exception as e:
-        self.logging.error(f"npm install in container failed: {e}")
-        raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+        self.logging.exception(f"npm install in container failed: {e}")
+        raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
```

Apply the same pattern at lines 240-242:

```diff
     except Exception as e:
-        self.logging.error(f"Failed to install esbuild: {e}")
-        raise RuntimeError(f"Failed to install esbuild: {e}")
+        self.logging.exception(f"Failed to install esbuild: {e}")
+        raise RuntimeError(f"Failed to install esbuild: {e}") from e
```

Also applies to: 240-242

208-209: `tarfile.extractall()` has potential path traversal risk.

While the tar archive comes from a controlled container, consider using the `filter` parameter (Python 3.12+) or manual member validation to mitigate path traversal attacks for defense in depth.

Proposed safer extraction (Python 3.12+)

```diff
     with tarfile.open(fileobj=file_obj) as tar:
-        tar.extractall(directory)
+        tar.extractall(directory, filter='data')
```

For older Python versions, you can validate members before extraction.

Also applies to: 236-237

154-161: Unused parameter `is_cached`.

The `is_cached` parameter is never used. It could be leveraged to skip dependency installation for cached packages, improving performance.

263-293: Consider externalizing the hardcoded Pyodide package list.

This list of ~200 packages will become outdated as Pyodide releases new versions. Pyodide publishes its supported packages via a versioned `repodata.json` at `https://cdn.jsdelivr.net/pyodide/vX.Y.Z/full/repodata.json`. Consider:

- Fetching the package list at build time from the appropriate Pyodide version's repodata.json
- Storing in a separate configuration file that can be updated alongside Pyodide version upgrades
- Adding a comment with a link to the Pyodide package index
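The exception-chaining pattern that recurs in these nitpicks can be shown in isolation; the `load` helper below is hypothetical, not code from this PR:

```python
def load(path: str) -> str:
    """Read a file, converting the low-level OSError into a domain error."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        # 'from e' keeps the original OSError reachable via __cause__,
        # so tracebacks show both the domain error and its root cause
        raise RuntimeError(f"failed to load {path}") from e
```

A caller catching the `RuntimeError` can still inspect `exc.__cause__` to see the underlying `OSError`, which is exactly what plain `raise RuntimeError(...)` inside an `except` block loses.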
```python
        # Wait for container to be ready
        while True:
            try:
                dkg = self.docker_instance.logs(stream=True, follow=True)
                next(dkg).decode("utf-8")
                break
            except StopIteration:
                pass
```
Readiness wait loop could block indefinitely.
The `while True` loop has no timeout mechanism. If the container doesn't produce logs (e.g., due to a startup failure or misconfiguration), this will hang forever.
Proposed fix with timeout

```diff
+import time
+
 # Wait for container to be ready
+max_wait_seconds = 30
+start_time = time.time()
 while True:
+    if time.time() - start_time > max_wait_seconds:
+        raise RuntimeError(
+            f"Container {self.docker_instance.id} did not become ready within {max_wait_seconds}s"
+        )
     try:
         dkg = self.docker_instance.logs(stream=True, follow=True)
         next(dkg).decode("utf-8")
         break
     except StopIteration:
-        pass
+        time.sleep(0.1)  # Brief sleep before retry
```
🤖 Prompt for AI Agents
In `@sebs/cloudflare/cli.py` around lines 103 - 110, The readiness loop in cli.py
(where docker_instance.logs(stream=True, follow=True) is iterated via dkg and
next(dkg).decode(...)) can block forever; add a timeout (e.g., max_wait_seconds)
and track elapsed time (using time.time()) inside the while True to break and
raise/return after the timeout, and also sleep briefly on StopIteration to avoid
busy looping; ensure you update the block that assigns dkg and calls next(dkg)
so it stops with a clear error/exception when the timeout elapses instead of
hanging.
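The same bounded-wait idea can be factored into a small helper that is easy to unit-test. The `log_source` factory parameter is an assumption of this sketch, standing in for `docker_instance.logs(stream=True, follow=True)`:

```python
import time

def wait_for_first_log(log_source, max_wait_seconds: float = 30.0,
                       poll_interval: float = 0.1) -> str:
    """Poll a byte-stream factory until it yields a line or the deadline passes."""
    deadline = time.time() + max_wait_seconds
    while time.time() < deadline:
        try:
            # log_source() plays the role of docker_instance.logs(stream=True, follow=True)
            return next(log_source()).decode("utf-8")
        except StopIteration:
            time.sleep(poll_interval)  # brief sleep instead of busy looping
    raise RuntimeError(f"no log output within {max_wait_seconds}s")
```

Separating the wait from the Docker client makes the timeout behavior testable with a plain iterator, without starting a real container.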
```python
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            try:
                return response.json().get("result")
            except:
                return None
        elif response.status_code == 404:
            return None
        else:
            self.logging.warning(f"Unexpected response checking worker: {response.status_code}")
            return None
```
Add timeout and replace bare except with specific exception.

Two issues:

- The `requests.get()` call lacks a timeout parameter
- The bare `except:` clause catches all exceptions including system-exiting ones

🔒 Proposed fix

```diff
-        response = requests.get(url, headers=headers)
+        response = requests.get(url, headers=headers, timeout=30)
         if response.status_code == 200:
             try:
                 return response.json().get("result")
-            except:
+            except (json.JSONDecodeError, ValueError, KeyError):
                 return None
```

🧰 Tools
354-354: Probable use of requests call without timeout
(S113)
359-359: Do not use bare except
(E722)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/cloudflare.py` around lines 354 - 365, The requests.get call
should include a timeout and the bare except should be replaced with specific
exception handling: call requests.get(url, headers=headers,
timeout=<reasonable_seconds>) and catch JSONDecodeError (or ValueError) when
calling response.json() instead of using a bare except; also consider catching
requests.exceptions.RequestException around the requests.get call and log errors
via self.logging.warning including response.status_code or exception details to
aid debugging (refer to requests.get, response.json(), and self.logging.warning
in the surrounding method).
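Putting the timeout, the narrowed `except`, and the `RequestException` handling together, the check might look like the sketch below. The injectable `session` parameter and `print` calls are assumptions added here so the pattern can be exercised without network access; the real method would use `self.logging.warning`:

```python
import json
from typing import Optional

import requests

def check_worker(url: str, headers: dict, session=requests) -> Optional[dict]:
    """Query an API for a worker, with timeout and narrow exception handling."""
    try:
        response = session.get(url, headers=headers, timeout=30)
    except requests.exceptions.RequestException as exc:
        print(f"request failed: {exc}")  # stands in for self.logging.warning
        return None
    if response.status_code == 200:
        try:
            # json.JSONDecodeError is a ValueError subclass; catch both explicitly
            return response.json().get("result")
        except (json.JSONDecodeError, ValueError):
            return None
    if response.status_code == 404:
        return None
    print(f"unexpected response checking worker: {response.status_code}")
    return None
```

Because `session` is duck-typed, a stub with a `get` method is enough to unit-test all four branches (valid JSON, invalid JSON, 404, and unexpected status).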
```python
            with tarfile.open(fileobj=file_obj) as tar:
                tar.extractall(directory)

            self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
        except Exception as e:
            self.logging.error(f"npm install failed: {e}")
            raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
```
🧩 Analysis chain
🌐 Web query:
Python tarfile extractall filter parameter security CVE-2007-4559
💡 Result:
Short answer:
- CVE-2007-4559 is an old directory‑traversal issue in Python’s tarfile.extract()/extractall() where “..” paths in a TAR can overwrite files outside the target directory. [1]
- Separately, recent (2024–2025) CVEs/alerts describe bypasses of the newer tarfile extraction "filter" feature (added in Python 3.12, defaulting to "data" in 3.14) that let specially crafted archives cause writes or metadata changes outside the extraction directory when using extract()/extractall(filter="data" or "tar"). See advisories for CVE-2024-12718 / CVE-2025-4517 and related GHSA entries. [2][3]
Mitigation summary:
- Do not extract untrusted tarballs with tarfile.extract()/extractall() without explicit, safe checks.
- Avoid using filter="data" or filter="tar" on untrusted input (or set filter="none"/omit the filter) until you’ve updated to a patched Python.
- Apply vendor/CPython security patches or upgrade to patched Python releases when available.
- As an extra defense, validate each TarInfo (sanitize path, forbid absolute paths and any .. components, reject symlinks/device nodes) and extract into a safe temporary directory, then move validated files into place.
Sources:
[1] NVD entry for CVE-2007-4559 (tarfile directory traversal).
[2] OSS‑security / Python security announcement summarizing multiple tarfile CVEs (June 2025).
[3] CVE writeups for the extraction‑filter bypass (CVE-2025-4517 / CVE-2024-12718) and related advisories.
Add exception chaining and consider the filter parameter for tarfile.extractall().

Two improvements:

- Exception chaining: Add `from e` to preserve the exception chain for better debugging (line 362).
- Tarfile filter: The tar data comes from `cli.docker_instance.get_archive()` (internal Docker container), which is a trusted source; however, adding an explicit filter parameter like `filter='data'` follows defense-in-depth principles and is recommended for Python 3.12+.
Suggested changes

```diff
             with tarfile.open(fileobj=file_obj) as tar:
-                tar.extractall(directory)
+                tar.extractall(directory, filter='data')
             self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
         except Exception as e:
             self.logging.error(f"npm install failed: {e}")
-            raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+            raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
```
🧰 Tools
🪛 Ruff (0.14.13)

- 357-357: Uses of `tarfile.extractall()` (S202)
- 360-360: Do not catch blind exception: `Exception` (BLE001)
- 361-361: Use `logging.exception` instead of `logging.error` (TRY400)
- 362-362: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
- 362-362: Avoid specifying long messages outside the exception class (TRY003)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/containers.py` around lines 356 - 362, Update the exception
handling and tar extraction call in the tarfile.open block: preserve the
original exception by re-raising RuntimeError using "from e" (i.e., raise
RuntimeError(...) from e) so the exception chain is kept, and when calling
tar.extractall(directory) pass the recommended filter parameter (e.g.,
filter='data' or an equivalent safe filter) to harden extraction; locate the
tarfile.open(...) block and the except Exception as e handling that uses
self.logging.error/f"npm install failed: {e}" and change the extractall call and
the raise accordingly.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@sebs/cloudflare/containers.py`:
- Around line 283-294: The regex replacement for "from . import " can rewrite
occurrences inside strings/comments and corrupt code; instead parse each Python
file (use the ast module) and transform ast.ImportFrom nodes where node.level ==
1 and node.module is None or '' into equivalent ast.Import nodes (preserve alias
names), then unparse the AST back to source before writing to file_path; operate
only when language_name == "python" and skip files in the existing exclusion
list (handler.py, storage.py, nosql.py, worker.py) to replicate the current
behavior.
♻️ Duplicate comments (1)

sebs/cloudflare/containers.py (1)

356-362: Add tarfile filter and exception chaining (previously flagged).

This addresses security (CVE-2007-4559 and related) and improves debugging with proper exception chaining.

♻️ Suggested fix

```diff
             with tarfile.open(fileobj=file_obj) as tar:
-                tar.extractall(directory)
+                tar.extractall(directory, filter='data')
             self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
         except Exception as e:
-            self.logging.error(f"npm install failed: {e}")
-            raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+            self.logging.exception("npm install failed")
+            raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
```
🧹 Nitpick comments (10)
sebs/cloudflare/containers.py (10)
61-70: Remove or document unused parameters.The
language(line 65) andcontainer_uri(line 69) parameters are declared but never used in the method body. Either remove them or add a comment explaining they're reserved for future use.♻️ Suggested fix
def generate_wrangler_toml( self, worker_name: str, package_dir: str, - language: str, account_id: str, benchmark_name: Optional[str] = None, code_package: Optional[Benchmark] = None, - container_uri: str = "", ) -> str:
110-122: Accessing private `_tables` attribute and missing key safety checks.

- `nosql_storage._tables` accesses a private attribute that may change without notice.
- Lines 116 and 122 assume `config['durable_objects']['bindings']` and `config['migrations'][0]['new_sqlite_classes']` exist in the template. If the template structure changes, this will raise KeyError/IndexError.

♻️ Suggested improvements

```diff
-        nosql_tables = nosql_storage._tables.get(benchmark_name, {})
+        # Consider exposing a public method like get_tables(benchmark_name)
+        nosql_tables = nosql_storage._tables.get(benchmark_name, {})

         # Add durable object bindings for each nosql table
         for table_name in nosql_tables.keys():
+            if 'durable_objects' not in config:
+                config['durable_objects'] = {'bindings': []}
+            if 'bindings' not in config['durable_objects']:
+                config['durable_objects']['bindings'] = []
             config['durable_objects']['bindings'].append({
                 'name': table_name,
                 'class_name': 'KVApiObject'
             })

         # Update migrations to include KVApiObject
+        if 'migrations' in config and config['migrations']:
+            if 'new_sqlite_classes' not in config['migrations'][0]:
+                config['migrations'][0]['new_sqlite_classes'] = []
         config['migrations'][0]['new_sqlite_classes'].append('KVApiObject')
```
143-147: Broad exception handling obscures specific failures.

Catching bare `Exception` here makes debugging difficult. Consider catching more specific exceptions (e.g., `KeyError`, `AttributeError`) or at minimum, log the exception type.

♻️ Suggested fix

```diff
-        except Exception as e:
+        except (KeyError, AttributeError) as e:
             self.logging.warning(
                 f"R2 bucket binding not configured: {e}. "
                 f"Benchmarks requiring file access will not work properly."
             )
```
250-252: Remove extraneous f-string prefix.

Line 252 uses an f-string without any placeholders.

♻️ Suggested fix

```diff
-        self.logging.info(f"Copied worker.js orchestration file from nodejs/container")
+        self.logging.info("Copied worker.js orchestration file from nodejs/container")
```
296-303: Remove dead code block.

The Node.js transformation block contains only `pass` with a comment suggesting future work. This is dead code that should be removed or implemented.

♻️ Suggested fix

```diff
-        # For Node.js containers, transform benchmark code to be async-compatible
-        # The container wrapper uses async HTTP calls, but benchmarks expect sync
-        elif language_name == "nodejs":
-            for item in os.listdir(directory):
-                if item.endswith('.js') and item not in ['handler.js', 'storage.js', 'nosql.js', 'worker.js', 'build.js', 'request-polyfill.js']:
-                    file_path = os.path.join(directory, item)
-                    # Could add transformations here if needed
-                    pass
```
351-351: Rename unused variable to indicate intent.

The `stat` variable from `get_archive()` is never used.

♻️ Suggested fix

```diff
-        bits, stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
+        bits, _stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
```
406-411: Rename unused loop variable.

The `dirnames` variable is not used within the loop body.

♻️ Suggested fix

```diff
-        for dirpath, dirnames, filenames in os.walk(directory):
+        for dirpath, _dirnames, filenames in os.walk(directory):
```
458-465: Improve exception handling with proper chaining and logging.

Use `raise ... from e` to preserve exception chains and `logging.exception` to automatically include stack traces.

♻️ Suggested fix

```diff
         except docker.errors.BuildError as e:
             error_msg = f"Docker build failed for {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e
         except Exception as e:
             error_msg = f"Unexpected error building Docker image {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e
```
467-471: Unused parameter `worker_name`.

The `worker_name` parameter is declared but never used. Consider using it in log messages for better observability or remove it.

♻️ Suggested fix - use in logging

```diff
-        self.logging.info("Checking container Durable Object readiness via health endpoint...")
+        self.logging.info(f"Checking container Durable Object readiness for {worker_name} via health endpoint...")
```

Or remove the parameter if not needed.
489-490: Remove unused circuit breaker variables.

`consecutive_failures` and `max_consecutive_failures` are assigned but never used. Either implement the intended circuit breaker logic or remove these variables.

♻️ Option 1: Remove unused variables

```diff
-        consecutive_failures = 0
-        max_consecutive_failures = 5
```

♻️ Option 2: Implement circuit breaker pattern

```diff
         consecutive_failures = 0
         max_consecutive_failures = 5

         while time.time() - start_time < max_wait_seconds:
             try:
                 response = requests.get(
                     f"{worker_url}/health",
                     timeout=60
                 )
                 if response.status_code == 200:
                     self.logging.info("Container Durable Object is ready!")
                     return True
                 elif response.status_code == 503:
+                    consecutive_failures = 0  # Reset on expected "not ready" response
                     # ... existing logging ...
                 else:
+                    consecutive_failures += 1
                     self.logging.warning(...)
+                    if consecutive_failures >= max_consecutive_failures:
+                        self.logging.error("Too many consecutive failures, aborting wait")
+                        return False
             except requests.exceptions.Timeout:
+                consecutive_failures += 1
                 # ... existing logging ...
             except requests.exceptions.RequestException as e:
+                consecutive_failures += 1
                 # ... existing logging ...
+
+            if consecutive_failures >= max_consecutive_failures:
+                self.logging.error("Too many consecutive failures, aborting wait")
+                return False
```
```python
# For Python containers, fix relative imports in benchmark code
# Containers use flat structure, so "from . import storage" must become "import storage"
if language_name == "python":
    for item in os.listdir(directory):
        if item.endswith('.py') and item not in ['handler.py', 'storage.py', 'nosql.py', 'worker.py']:
            file_path = os.path.join(directory, item)
            with open(file_path, 'r') as f:
                content = f.read()
            # Fix relative imports
            content = re.sub(r'from \. import ', 'import ', content)
            with open(file_path, 'w') as f:
                f.write(content)
```
Regex-based import rewriting may produce false positives.
The regex from \. import could match occurrences inside string literals or comments, potentially corrupting benchmark code. Consider using AST-based transformation or a more targeted approach.
♻️ Safer alternative using line-by-line check
```diff
-            # Fix relative imports
-            content = re.sub(r'from \. import ', 'import ', content)
+            # Fix relative imports (only at line start, not in strings/comments)
+            lines = content.split('\n')
+            fixed_lines = []
+            for line in lines:
+                stripped = line.lstrip()
+                if stripped.startswith('from . import '):
+                    line = line.replace('from . import ', 'import ', 1)
+                fixed_lines.append(line)
+            content = '\n'.join(fixed_lines)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# For Python containers, fix relative imports in benchmark code
# Containers use flat structure, so "from . import storage" must become "import storage"
if language_name == "python":
    for item in os.listdir(directory):
        if item.endswith('.py') and item not in ['handler.py', 'storage.py', 'nosql.py', 'worker.py']:
            file_path = os.path.join(directory, item)
            with open(file_path, 'r') as f:
                content = f.read()
            # Fix relative imports (only at line start, not in strings/comments)
            lines = content.split('\n')
            fixed_lines = []
            for line in lines:
                stripped = line.lstrip()
                if stripped.startswith('from . import '):
                    line = line.replace('from . import ', 'import ', 1)
                fixed_lines.append(line)
            content = '\n'.join(fixed_lines)
            with open(file_path, 'w') as f:
                f.write(content)
```
🤖 Prompt for AI Agents
In `@sebs/cloudflare/containers.py` around lines 283 - 294, The regex replacement
for "from . import " can rewrite occurrences inside strings/comments and corrupt
code; instead parse each Python file (use the ast module) and transform
ast.ImportFrom nodes where node.level == 1 and node.module is None or '' into
equivalent ast.Import nodes (preserve alias names), then unparse the AST back to
source before writing to file_path; operate only when language_name == "python"
and skip files in the existing exclusion list (handler.py, storage.py, nosql.py,
worker.py) to replicate the current behavior.
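As a sketch of the AST-based transformation this prompt describes (the helper name `rewrite_relative_imports` is hypothetical, and `ast.unparse` requires Python 3.9+), string literals and other non-import occurrences are left untouched, which is exactly where the regex can misfire:

```python
import ast

def rewrite_relative_imports(source: str) -> str:
    """Turn `from . import X [as Y]` into `import X [as Y]` via the AST.

    Note: ast.unparse drops comments, which is acceptable for a
    flattened deployment copy of the benchmark code.
    """
    tree = ast.parse(source)

    class Rewriter(ast.NodeTransformer):
        def visit_ImportFrom(self, node: ast.ImportFrom):
            # Only level-1 relative imports with no module: `from . import x`
            if node.level == 1 and not node.module:
                return ast.Import(names=node.names)
            return node

    tree = Rewriter().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

# The import statement is rewritten; the identical text inside a
# string literal is preserved.
print(rewrite_relative_imports('s = "from . import storage"\nfrom . import storage\n'))
```

Aliases survive because `node.names` (a list of `ast.alias`) is reused verbatim, so `from . import nosql as n` becomes `import nosql as n`.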
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@sebs/cloudflare/r2.py`:
- Around line 12-14: The typename() static method incorrectly returns
"Cloudlfare.R2"; update the return value in typename() (in sebs.cloudflare.r2:
typename) to the correct string "Cloudflare.R2" so the provider name is spelled
properly.
- Around line 347-354: The clean_bucket method currently logs a warning and
returns silently which can mislead callers (similar to download); update the
implementation to raise NotImplementedError instead of silently passing so
callers get an explicit failure. Modify the clean_bucket function (and any
similar stubs like download) to raise NotImplementedError with a clear message
such as "clean_bucket not implemented for R2 bucket {bucket_name}" and remove or
keep the warning log as supplementary; ensure the method signature and docstring
remain unchanged.
- Around line 368-380: The requests.delete call lacks a timeout; update the HTTP
call to include the same timeout used elsewhere in this module by adding the
timeout keyword argument to requests.delete(delete_bucket_uri,
headers=self._get_auth_headers(), timeout=<module timeout value>), where <module
timeout value> should be the existing timeout constant or attribute used by
other calls in this file (for example self._timeout or DEFAULT_TIMEOUT), so that
the delete request for delete_bucket_uri follows the same timeout behavior as
other requests.
- Around line 87-106: The requests.post call in the R2 bucket creation (inside
the method using create_bucket_uri and create_bucket_response) needs a timeout
to avoid hangs and the inner bare except should be replaced with specific
exception handling; add a timeout argument to requests.post (e.g. timeout=10 or
a configurable constant) and wrap the response.json() call in a try/except that
catches json.JSONDecodeError (or ValueError) only, logging
create_bucket_response.text when JSON parsing fails; also consider catching
requests.exceptions.RequestException around the POST to log and re-raise or
handle network errors instead of letting it hang.
- Around line 122-133: The download method currently only logs a warning and
returns silently, which can mislead callers; replace the silent pass with
raising a NotImplementedError (include a clear message referencing the
bucket_name and key) so callers receive an explicit failure, keep the method
signature download(self, bucket_name: str, key: str, filepath: str) unchanged
and remove the trailing pass; retain or refine the log message if desired but
ensure the NotImplementedError is raised.
- Around line 58-60: The _create_bucket function currently uses a mutable
default for buckets and includes an unused randomize_name parameter; change the
signature of _create_bucket(name: str, buckets: Optional[list[str]] = None) (or
remove the buckets default entirely) and inside the function initialize buckets
= [] if buckets is None to avoid shared mutable state, and either remove the
randomize_name parameter or keep it but mark it with a TODO and do not rely on
it; update any callers of _create_bucket to pass an explicit list if they relied
on the previous default.
- Around line 299-316: The requests.get call to list_buckets_uri needs a timeout
and the error handling must avoid a bare except and properly capture JSON parse
errors and include the exception text in the log; update the call in the method
that calls requests.get(list_buckets_uri, headers=self._get_auth_headers()) to
pass a sensible timeout (e.g. timeout=10), replace the bare except with a
specific exception handler (catch json.JSONDecodeError or ValueError as e) when
calling response.json(), and include the exception details in the
self.logging.error message (or use self.logging.exception) so the log contains
the actual parse error while still returning [] on 403.
♻️ Duplicate comments (1)
sebs/cloudflare/r2.py (1)
145-232: Significant code duplication in S3 client initialization.

The S3 client initialization logic (lines 163-170, 210-217, and 261-268) is repeated across `upload`, `upload_bytes`, and `list_bucket` methods. Extract this into a helper method for maintainability.

♻️ Proposed refactor
upload,upload_bytes, andlist_bucketmethods. Extract this into a helper method for maintainability.♻️ Proposed refactor
Add a helper method and refactor the upload methods:
```diff
+    def _get_s3_client(self):
+        """Get or create S3-compatible client for R2 operations."""
+        if not self._credentials.r2_access_key_id or not self._credentials.r2_secret_access_key:
+            return None
+
+        import boto3
+        from botocore.config import Config
+
+        account_id = self._credentials.account_id
+        return boto3.client(
+            's3',
+            endpoint_url=f'https://{account_id}.r2.cloudflarestorage.com',
+            aws_access_key_id=self._credentials.r2_access_key_id,
+            aws_secret_access_key=self._credentials.r2_secret_access_key,
+            config=Config(signature_version='s3v4'),
+            region_name='auto'
+        )
+
     def upload(self, bucket_name: str, filepath: str, key: str):
         ...
         try:
-            import boto3
-            from botocore.config import Config
-
-            account_id = self._credentials.account_id
-
-            # R2 uses S3-compatible API, but requires special configuration
-            # The endpoint is: https://<account_id>.r2.cloudflarestorage.com
-            # You need to create R2 API tokens in the Cloudflare dashboard
-
-            # Check if we have S3-compatible credentials
-            if not self._credentials.r2_access_key_id or not self._credentials.r2_secret_access_key:
+            s3_client = self._get_s3_client()
+            if s3_client is None:
                 self.logging.warning(
                     "R2 upload requires S3-compatible API credentials (r2_access_key_id, r2_secret_access_key). "
                     "File upload skipped. Set CLOUDFLARE_R2_ACCESS_KEY_ID and CLOUDFLARE_R2_SECRET_ACCESS_KEY."
                 )
                 return
-
-            s3_client = boto3.client(
-                's3',
-                endpoint_url=f'https://{account_id}.r2.cloudflarestorage.com',
-                aws_access_key_id=self._credentials.r2_access_key_id,
-                aws_secret_access_key=self._credentials.r2_secret_access_key,
-                config=Config(signature_version='s3v4'),
-                region_name='auto'
-            )
```

Based on past review comments, this duplication was previously flagged.
🧹 Nitpick comments (1)
sebs/cloudflare/r2.py (1)
20-26: Remove redundant property - already defined in parent class.

The `replace_existing` property is already defined in `PersistentStorage` (the parent class). This duplication is unnecessary and could lead to maintenance issues.

♻️ Proposed fix

```diff
-    @property
-    def replace_existing(self) -> bool:
-        return self._replace_existing
-
-    @replace_existing.setter
-    def replace_existing(self, val: bool):
-        self._replace_existing = val
-
     def __init__(
```
```python
@staticmethod
def typename() -> str:
    return "Cloudlfare.R2"
```
Typo in typename: "Cloudlfare" should be "Cloudflare".
🔧 Proposed fix
```diff
 @staticmethod
 def typename() -> str:
-    return "Cloudlfare.R2"
+    return "Cloudflare.R2"
```

📝 Committable suggestion

```python
@staticmethod
def typename() -> str:
    return "Cloudflare.R2"
```
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 12 - 14, The typename() static method
incorrectly returns "Cloudlfare.R2"; update the return value in typename() (in
sebs.cloudflare.r2: typename) to the correct string "Cloudflare.R2" so the
provider name is spelled properly.
```python
def _create_bucket(
    self, name: str, buckets: list[str] = [], randomize_name: bool = False
) -> str:
```
Fix mutable default argument and unused parameter.
Using a mutable default argument ([]) can cause unexpected behavior if the list is modified. The randomize_name parameter is also unused.
🔧 Proposed fix
```diff
 def _create_bucket(
-    self, name: str, buckets: list[str] = [], randomize_name: bool = False
+    self, name: str, buckets: Optional[List[str]] = None, randomize_name: bool = False
 ) -> str:
+    if buckets is None:
+        buckets = []
     for bucket_name in buckets:
```

If `randomize_name` is intended for future use, consider adding a TODO comment or removing it until needed.
🧰 Tools
🪛 Ruff (0.14.13)
59-59: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
59-59: Unused method argument: randomize_name
(ARG002)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 58 - 60, The _create_bucket function
currently uses a mutable default for buckets and includes an unused
randomize_name parameter; change the signature of _create_bucket(name: str,
buckets: Optional[list[str]] = None) (or remove the buckets default entirely)
and inside the function initialize buckets = [] if buckets is None to avoid
shared mutable state, and either remove the randomize_name parameter or keep it
but mark it with a TODO and do not rely on it; update any callers of
_create_bucket to pass an explicit list if they relied on the previous default.
```python
try:
    create_bucket_response = requests.post(
        create_bucket_uri, json=params, headers=self._get_auth_headers()
    )

    # Log the response for debugging
    if create_bucket_response.status_code >= 400:
        try:
            error_data = create_bucket_response.json()
            self.logging.error(
                f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                f"Response: {error_data}"
            )
        except:
            self.logging.error(
                f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                f"Response: {create_bucket_response.text}"
            )

    create_bucket_response.raise_for_status()
```
Add timeout to HTTP request and avoid bare except.
The requests.post call lacks a timeout, which could cause indefinite hangs. Also, bare except: catches all exceptions including KeyboardInterrupt and SystemExit.
🔧 Proposed fix
```diff
 try:
     create_bucket_response = requests.post(
-        create_bucket_uri, json=params, headers=self._get_auth_headers()
+        create_bucket_uri, json=params, headers=self._get_auth_headers(), timeout=30
     )

     # Log the response for debugging
     if create_bucket_response.status_code >= 400:
         try:
             error_data = create_bucket_response.json()
             self.logging.error(
                 f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                 f"Response: {error_data}"
             )
-        except:
+        except json.JSONDecodeError:
             self.logging.error(
                 f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                 f"Response: {create_bucket_response.text}"
             )
```
)🧰 Tools
🪛 Ruff (0.14.13)
88-88: Probable use of requests call without timeout
(S113)
100-100: Do not use bare except
(E722)
101-104: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 87 - 106, The requests.post call in the
R2 bucket creation (inside the method using create_bucket_uri and
create_bucket_response) needs a timeout to avoid hangs and the inner bare except
should be replaced with specific exception handling; add a timeout argument to
requests.post (e.g. timeout=10 or a configurable constant) and wrap the
response.json() call in a try/except that catches json.JSONDecodeError (or
ValueError) only, logging create_bucket_response.text when JSON parsing fails;
also consider catching requests.exceptions.RequestException around the POST to
log and re-raise or handle network errors instead of letting it hang.
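The narrow-exception pattern recommended here can be shown without any network calls; a hypothetical helper isolates the JSON-parsing step that the bare `except:` currently hides:

```python
import json

def parse_error_body(text: str):
    """Return the parsed JSON body, or the raw text if it is not JSON.

    Catching json.JSONDecodeError (a subclass of ValueError) instead of
    a bare `except:` leaves KeyboardInterrupt and SystemExit untouched,
    so Ctrl-C still works while parsing API error responses.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text

# A Cloudflare-style JSON error body parses to a dict; an HTML error
# page from an intermediary falls back to the raw text.
print(parse_error_body('{"success": false, "errors": ["no permission"]}'))
print(parse_error_body('<html>502 Bad Gateway</html>'))
```

The same shape applies to the `requests.post` call: add `timeout=` to bound the request, then log `response.text` only in the `json.JSONDecodeError` branch.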
```python
def download(self, bucket_name: str, key: str, filepath: str) -> None:
    """
    Download a file from a bucket.

    :param bucket_name:
    :param key: storage source filepath
    :param filepath: local destination filepath
    """
    # R2 requires S3-compatible access for object operations
    # For now, this is not fully implemented
    self.logging.warning(f"download not fully implemented for R2 bucket {bucket_name}")
    pass
```
Incomplete implementation may silently fail.
The download method logs a warning but returns silently, which could cause callers to assume success. Consider raising NotImplementedError to make the incomplete state explicit.
🔧 Proposed fix
```diff
 def download(self, bucket_name: str, key: str, filepath: str) -> None:
     """
     Download a file from a bucket.

     :param bucket_name:
     :param key: storage source filepath
     :param filepath: local destination filepath
     """
-    # R2 requires S3-compatible access for object operations
-    # For now, this is not fully implemented
-    self.logging.warning(f"download not fully implemented for R2 bucket {bucket_name}")
-    pass
+    raise NotImplementedError(
+        "R2 download requires S3-compatible access. This is not yet implemented."
+    )
```

📝 Committable suggestion

```python
def download(self, bucket_name: str, key: str, filepath: str) -> None:
    """
    Download a file from a bucket.

    :param bucket_name:
    :param key: storage source filepath
    :param filepath: local destination filepath
    """
    raise NotImplementedError(
        "R2 download requires S3-compatible access. This is not yet implemented."
    )
```
🧰 Tools
🪛 Ruff (0.14.13)
122-122: Unused method argument: key
(ARG002)
122-122: Unused method argument: filepath
(ARG002)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 122 - 133, The download method currently
only logs a warning and returns silently, which can mislead callers; replace the
silent pass with raising a NotImplementedError (include a clear message
referencing the bucket_name and key) so callers receive an explicit failure,
keep the method signature download(self, bucket_name: str, key: str, filepath:
str) unchanged and remove the trailing pass; retain or refine the log message if
desired but ensure the NotImplementedError is raised.
```python
try:
    response = requests.get(list_buckets_uri, headers=self._get_auth_headers())

    # Log detailed error information
    if response.status_code == 403:
        try:
            error_data = response.json()
            self.logging.error(
                f"403 Forbidden accessing R2 buckets. "
                f"Response: {error_data}. "
                f"Your API token may need 'R2 Read and Write' permissions."
            )
        except:
            self.logging.error(
                f"403 Forbidden accessing R2 buckets. "
                f"Your API token may need 'R2 Read and Write' permissions."
            )
        return []
```
Add timeout and fix error handling.
The requests.get call lacks a timeout. The bare except: and f-strings without placeholders should also be fixed.
🔧 Proposed fix
```diff
 try:
-    response = requests.get(list_buckets_uri, headers=self._get_auth_headers())
+    response = requests.get(list_buckets_uri, headers=self._get_auth_headers(), timeout=30)

     # Log detailed error information
     if response.status_code == 403:
         try:
             error_data = response.json()
             self.logging.error(
                 f"403 Forbidden accessing R2 buckets. "
                 f"Response: {error_data}. "
-                f"Your API token may need 'R2 Read and Write' permissions."
+                "Your API token may need 'R2 Read and Write' permissions."
             )
-        except:
+        except json.JSONDecodeError:
             self.logging.error(
-                f"403 Forbidden accessing R2 buckets. "
-                f"Your API token may need 'R2 Read and Write' permissions."
+                "403 Forbidden accessing R2 buckets. "
+                "Your API token may need 'R2 Read and Write' permissions."
             )
         return []
```
🪛 Ruff (0.14.13)
300-300: Probable use of requests call without timeout
(S113)
311-311: Do not use bare except
(E722)
312-315: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
313-313: f-string without any placeholders
Remove extraneous f prefix
(F541)
314-314: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 299 - 316, The requests.get call to
list_buckets_uri needs a timeout and the error handling must avoid a bare except
and properly capture JSON parse errors and include the exception text in the
log; update the call in the method that calls requests.get(list_buckets_uri,
headers=self._get_auth_headers()) to pass a sensible timeout (e.g. timeout=10),
replace the bare except with a specific exception handler (catch
json.JSONDecodeError or ValueError as e) when calling response.json(), and
include the exception details in the self.logging.error message (or use
self.logging.exception) so the log contains the actual parse error while still
returning [] on 403.
```python
def clean_bucket(self, bucket_name: str):
    """
    Remove all objects from a bucket.

    :param bucket_name:
    """
    self.logging.warning(f"clean_bucket not fully implemented for R2 bucket {bucket_name}")
    pass
```
Incomplete implementation may silently fail.
Similar to download, this stub could cause callers to assume success. Consider raising NotImplementedError.
🔧 Proposed fix
```diff
 def clean_bucket(self, bucket_name: str):
     """
     Remove all objects from a bucket.

     :param bucket_name:
     """
-    self.logging.warning(f"clean_bucket not fully implemented for R2 bucket {bucket_name}")
-    pass
+    raise NotImplementedError(
+        f"clean_bucket not yet implemented for R2 bucket {bucket_name}"
+    )
```

📝 Committable suggestion

```python
def clean_bucket(self, bucket_name: str):
    """
    Remove all objects from a bucket.

    :param bucket_name:
    """
    raise NotImplementedError(
        f"clean_bucket not yet implemented for R2 bucket {bucket_name}"
    )
```
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 347 - 354, The clean_bucket method
currently logs a warning and returns silently which can mislead callers (similar
to download); update the implementation to raise NotImplementedError instead of
silently passing so callers get an explicit failure. Modify the clean_bucket
function (and any similar stubs like download) to raise NotImplementedError with
a clear message such as "clean_bucket not implemented for R2 bucket
{bucket_name}" and remove or keep the warning log as supplementary; ensure the
method signature and docstring remain unchanged.
```python
try:
    response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
    response.raise_for_status()

    data = response.json()

    if data.get("success"):
        self.logging.info(f"Successfully deleted R2 bucket {bucket}")
    else:
        self.logging.error(f"Failed to delete R2 bucket {bucket}: {data.get('errors')}")

except requests.exceptions.RequestException as e:
    self.logging.error(f"Error deleting R2 bucket {bucket}: {e}")
```
Add timeout to HTTP request.
The requests.delete call lacks a timeout, consistent with other HTTP calls in this file.
🔧 Proposed fix
```diff
 try:
-    response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
+    response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers(), timeout=30)
     response.raise_for_status()
```
🪛 Ruff (0.14.13)
369-369: Probable use of requests call without timeout
(S113)
380-380: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 368 - 380, The requests.delete call lacks
a timeout; update the HTTP call to include the same timeout used elsewhere in
this module by adding the timeout keyword argument to
requests.delete(delete_bucket_uri, headers=self._get_auth_headers(),
timeout=<module timeout value>), where <module timeout value> should be the
existing timeout constant or attribute used by other calls in this file (for
example self._timeout or DEFAULT_TIMEOUT), so that the delete request for
delete_bucket_uri follows the same timeout behavior as other requests.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@sebs/cloudflare/cloudflare.py`:
- Around line 484-507: In _build_workers_dev_url, the last logging.warning uses
a plain string with "{name}" so worker_name is not interpolated; change that
call to use an f-string and reference {worker_name} (i.e., update the
logging.warning message to f"No account ID available; using
https://{worker_name}.workers.dev which may not be reachable.") so the actual
worker_name appears in logs.
♻️ Duplicate comments (12)
sebs/cloudflare/r2.py (7)
11-18: Fix provider typename typo.
The provider identifier is misspelled and can break lookups.

🔧 Proposed fix

```diff
 @staticmethod
 def typename() -> str:
-    return "Cloudlfare.R2"
+    return "Cloudflare.R2"
```
99-101: Avoid mutable default for buckets.
A shared list across calls can leak state; also note `randomize_name` remains unused.

🔧 Proposed fix

```diff
 def _create_bucket(
-    self, name: str, buckets: list[str] = [], randomize_name: bool = False
+    self, name: str, buckets: Optional[List[str]] = None, randomize_name: bool = False
 ) -> str:
+    if buckets is None:
+        buckets = []
```
128-145: Add timeout and narrow JSON parse handling in bucket creation.
Without a timeout the call can hang; a bare `except` hides non-JSON errors.

🔧 Proposed fix

```diff
         create_bucket_response = requests.post(
-            create_bucket_uri, json=params, headers=self._get_auth_headers()
+            create_bucket_uri,
+            json=params,
+            headers=self._get_auth_headers(),
+            timeout=30,
         )
@@
-        except:
+        except json.JSONDecodeError:
             self.logging.error(
                 f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                 f"Response: {create_bucket_response.text}"
             )
```
163-175: Make download failure explicit.
A warning + `pass` can be mistaken for success.

🔧 Proposed fix

```diff
-        self.logging.warning(f"download not fully implemented for R2 bucket {bucket_name}")
-        pass
+        raise NotImplementedError(
+            f"R2 download requires S3-compatible access and is not implemented for bucket {bucket_name}"
+        )
```
266-315: Add timeout and avoid bare except in list_buckets.
Same reliability concerns as other HTTP calls.

🔧 Proposed fix

```diff
-        response = requests.get(list_buckets_uri, headers=self._get_auth_headers())
+        response = requests.get(
+            list_buckets_uri,
+            headers=self._get_auth_headers(),
+            timeout=30,
+        )
@@
-        except:
+        except json.JSONDecodeError:
             self.logging.error(
                 f"403 Forbidden accessing R2 buckets. "
                 f"Your API token may need 'R2 Read and Write' permissions."
             )
```
327-334: Make `clean_bucket` explicit about being unimplemented.

🔧 Proposed fix

```diff
-self.logging.warning(f"clean_bucket not fully implemented for R2 bucket {bucket_name}")
-pass
+raise NotImplementedError(
+    f"clean_bucket not implemented for R2 bucket {bucket_name}"
+)
```
336-361: Add timeout to delete bucket call.

Keeps delete requests from hanging indefinitely.

🔧 Proposed fix

```diff
-response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
+response = requests.delete(
+    delete_bucket_uri,
+    headers=self._get_auth_headers(),
+    timeout=30,
+)
```

sebs/cloudflare/cloudflare.py (5)
80-91: Avoid mutable default for config (still unused).

This keeps the signature safe even if `config` stays unused.

🔧 Proposed fix

```diff
-def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
@@
-    # Verify credentials are valid
+    if config is None:
+        config = {}
+    # Verify credentials are valid
```
131-164: Add timeout and handle request failures during credential verification.

Prevents hangs and gives clearer failure context.

🔧 Proposed fix

```diff
-response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+try:
+    response = requests.get(
+        f"{self._api_base_url}/user/tokens/verify",
+        headers=headers,
+        timeout=30,
+    )
+except requests.exceptions.RequestException as e:
+    raise RuntimeError("Failed to verify Cloudflare credentials") from e
```
349-366: Add timeout and avoid bare except in worker lookup.

🔧 Proposed fix

```diff
-response = requests.get(url, headers=headers)
+response = requests.get(url, headers=headers, timeout=30)
@@
-except:
+except ValueError:
     return None
```
443-483: Add timeout and narrow exception handling for subdomain lookup.

🔧 Proposed fix

```diff
-resp = requests.get(url, headers=headers)
+resp = requests.get(url, headers=headers, timeout=30)
 if resp.status_code == 200:
-    body = resp.json()
+    try:
+        body = resp.json()
+    except ValueError:
+        self.logging.warning("Invalid JSON while fetching workers.dev subdomain")
+        return None
@@
-except Exception as e:
+except requests.exceptions.RequestException as e:
     self.logging.warning(f"Error fetching workers.dev subdomain: {e}")
     return None
```
606-635: Enforce digit-prefix rule for all workers, not just containers.

Non-container workers starting with a digit will still produce invalid names.

🔧 Proposed fix

```diff
-if container_deployment and formatted and formatted[0].isdigit():
-    formatted = 'container-' + formatted
+if formatted and formatted[0].isdigit():
+    prefix = "container-" if container_deployment else "w-"
+    formatted = prefix + formatted
```
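A quick sketch of the suggested rule in isolation (the `w-` prefix and helper name are the reviewer's proposal, not existing code, and the lowercase/hyphen normalization is assumed):

```python
def format_worker_name(name, container_deployment):
    # Normalize to a hostname-friendly form (assumed behavior).
    formatted = name.lower().replace("_", "-").replace(".", "-")
    # The review treats digit-leading names as invalid, so ANY such
    # name gets a mode-specific prefix, not only container workers.
    if formatted and formatted[0].isdigit():
        prefix = "container-" if container_deployment else "w-"
        formatted = prefix + formatted
    return formatted
```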
🧹 Nitpick comments (1)
sebs/cloudflare/cloudflare.py (1)
522-561: Prefer explicit container flag over name prefix.

Name-based detection only covers digit-prefixed names; an explicit flag is more reliable.

♻️ Suggested change

```diff
-is_container = worker.name.startswith("container-")
+is_container = container_deployment
```
```python
def _build_workers_dev_url(self, worker_name: str, account_id: Optional[str]) -> str:
    """Build a best-effort public URL for a worker.

    Prefer using the account's readable workers.dev subdomain when available
    (e.g. <name>.<subdomain>.workers.dev). If we can't obtain that, fall
    back to using the account_id as a last resort and log a warning.
    """
    if account_id:
        sub = self._get_workers_dev_subdomain(account_id)
        if sub:
            return f"https://{worker_name}.{sub}.workers.dev"
        else:
            # fallback: some code historically used account_id in the host
            self.logging.warning(
                "Using account ID in workers.dev URL as a fallback. "
                "Enable the workers.dev subdomain in Cloudflare for proper URLs."
            )
            return f"https://{worker_name}.{account_id}.workers.dev"
    # Last fallback: plain workers.dev (may not resolve without a subdomain)
    self.logging.warning(
        "No account ID available; using https://{name}.workers.dev which may not be reachable."
    )
    return f"https://{worker_name}.workers.dev"
```
|
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n sebs/cloudflare/cloudflare.py | sed -n '484,507p'
```

Repository: spcl/serverless-benchmarks

Length of output: 1458
Fix warning string to use f-string and interpolate worker_name.
Line 504 has a plain string literal with {name} placeholder that won't be interpolated. It should be an f-string using {worker_name}:
🔧 Proposed fix
```diff
 self.logging.warning(
-    "No account ID available; using https://{name}.workers.dev which may not be reachable."
+    f"No account ID available; using https://{worker_name}.workers.dev which may not be reachable."
 )
```

🤖 Prompt for AI Agents
In `@sebs/cloudflare/cloudflare.py` around lines 484 - 507, In
_build_workers_dev_url, the last logging.warning uses a plain string with
"{name}" so worker_name is not interpolated; change that call to use an f-string
and reference {worker_name} (i.e., update the logging.warning message to f"No
account ID available; using https://{worker_name}.workers.dev which may not be
reachable.") so the actual worker_name appears in logs.
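As a sanity check of the difference the fix describes (plain literal vs f-string), runnable in isolation:

```python
worker_name = "my-worker"

# Plain literal: the braces are kept verbatim, never interpolated.
plain = "https://{worker_name}.workers.dev"

# f-string: the braces are evaluated against in-scope names.
interpolated = f"https://{worker_name}.workers.dev"
```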
Actionable comments posted: 18
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
♻️ Duplicate comments (6)
docs/platforms.md (2)
242-250: ⚠️ Potential issue | 🟡 Minor — Add a blank line after the table.

MD058 is still triggered because the next bullet starts immediately after the last row.

✏️ Minimal fix

```diff
 | standard-3 | 2 | 8 GiB | 16 GB |
 | standard-4 | 4 | 12 GiB | 20 GB |
+
 - **Metrics Collection**: Uses response-based per-invocation metrics. During each function invocation, the worker handler measures performance metrics (CPU time, wall time, memory usage) and embeds them directly in the JSON response. SeBS extracts these metrics immediately from each response. When `download_metrics()` is called for postprocessing, it only aggregates the metrics that were already collected during invocations—no additional data is fetched from external services. This approach provides immediate per-invocation granularity without delays. Note that while Cloudflare does expose an Analytics Engine, it only provides aggregated metrics without individual request-level data, making it unsuitable for detailed benchmarking purposes.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/platforms.md` around lines 242 - 250, Add a blank line after the Markdown table (the block starting with "| Instance Type | vCPU | Memory | Disk |" and ending with the last table row "| standard-4 | 4 | 12 GiB | 20 GB |") so the following bullet ("**Metrics Collection**: ...") becomes separated by an empty line; this prevents MD058 by ensuring there's a blank line between the table and the next paragraph/bullet.
233-233: ⚠️ Potential issue | 🟡 Minor — Use `{subdomain}` in the workers.dev example, not `{account}`.

The hostname is derived from the account subdomain, not the account ID, so this placeholder points readers at the wrong value.

✏️ Doc fix

```diff
-- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
+- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{subdomain}.workers.dev`
```

What is the documented format of a Cloudflare workers.dev URL for a deployed Worker? Does it use the account ID or the account subdomain?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/platforms.md` at line 233, The docs currently show an incorrect placeholder in the Workers URL example: locate the string "https://{name}.{account}.workers.dev" in the HTTP Trigger example and replace the "{account}" placeholder with "{subdomain}" so it reads "https://{name}.{subdomain}.workers.dev"; also search for any other workers.dev examples using "{account}" and update them to "{subdomain}" to ensure the hostname reflects the account subdomain rather than the account ID.

sebs/cloudflare/cli.py (1)
103-110: ⚠️ Potential issue | 🟠 Major — Don't wait forever for the first log line.

A quiet startup, or a container that exits without emitting logs, will hang the CLI indefinitely here. Use a bounded timeout and poll container state/readiness instead of blocking on `next(logs(...))`.
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/cli.py` around lines 103 - 110, The current loop using docker_instance.logs(stream=True, follow=True) and next(dkg) can block forever if no log line appears; replace the infinite while True with a bounded wait: record a deadline (e.g. time.time() + timeout), poll for logs non-blockingly and poll the container state (via your docker API wrapper—e.g. inspect/get container state or attributes on the same object used as self.docker_instance) in a loop with a short sleep, return/break when the first log line is read or when the container has exited/failed, and raise a timeout/error if the deadline passes; update references to dkg and the logs call accordingly so you stop blocking on next(dkg) and instead handle StopIteration and absence of logs by checking container readiness/state and timing out.sebs/cloudflare/durable_objects.py (1)
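The bounded-wait shape the prompt describes can be sketched independently of Docker; both callables here are hypothetical stand-ins for the log stream and the container-state check:

```python
import time

def wait_for_first_log_line(read_line, is_running, timeout=60.0, poll=0.5):
    """Return the first log line, or fail fast instead of blocking forever."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        line = read_line()          # None means nothing buffered yet
        if line is not None:
            return line
        if not is_running():
            raise RuntimeError("container exited before emitting any logs")
        time.sleep(poll)
    raise TimeoutError(f"no log output within {timeout}s")
```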
165-172: ⚠️ Potential issue | 🟠 Major — Don't let seed writes succeed silently.

After the table lookup, this path just `pass`es. Callers will continue as if setup data exists, which produces invalid benchmark runs for workloads that depend on pre-populated rows.
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/durable_objects.py` around lines 165 - 172, The current code looks up table_name via self._get_table_name(...) then silently "pass" which lets callers believe seeding succeeded; replace the silent pass with a clear failure: after the table_name check, raise a descriptive exception (e.g., RuntimeError or NotImplementedError) indicating that seeding is not supported for Cloudflare Durable Objects (include benchmark and table in the message) so callers don't proceed with missing seed data; reference the existing _get_table_name call and the table_name variable to locate where to change.benchmarks/wrappers/cloudflare/python/storage.py (1)
46-51: ⚠️ Potential issue | 🟡 Minor — `/tmp` double-prefix still possible in `upload()`.

If `filepath` is already in `written_files` and starts with `/tmp`, prepending `/tmp` again creates an invalid path like `/tmp/tmp/...`. The `download()` method checks for this, but `upload()` doesn't.

🔧 Suggested fix

```diff
 def upload(self, bucket, key, filepath):
-    if filepath in self.written_files:
+    if filepath in self.written_files and not filepath.startswith("/tmp"):
         filepath = "/tmp" + os.path.abspath(filepath)
     with open(filepath, "rb") as f:
         unique_key = self.upload_stream(bucket, key, f.read())
     return unique_key
```
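The guard can be checked in isolation; this sketch (helper name hypothetical) mirrors the normalization the fix describes:

```python
import os

def resolve_upload_path(filepath, written_files):
    # Only prepend /tmp for tracked files that are not already under it,
    # avoiding invalid /tmp/tmp/... paths.
    if filepath in written_files and not filepath.startswith("/tmp"):
        return "/tmp" + os.path.abspath(filepath)
    return filepath
```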
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/storage.py` around lines 46 - 51, The upload() method can double-prefix /tmp when filepath is in written_files and already starts with /tmp; modify upload() (function upload, variable written_files) to compute the tmp path safely — e.g., take abs_path = os.path.abspath(filepath) and only prefix "/tmp" if not abs_path.startswith("/tmp") (or use os.path.join("/tmp", os.path.basename(abs_path)) depending on desired semantics), then open that resolved path for reading and call upload_stream as before; mirror the same normalization logic used in download() to avoid creating "/tmp/tmp/..." paths.

sebs/cloudflare/containers.py (1)
356-362: ⚠️ Potential issue | 🟡 Minor — Add exception chaining and consider `filter` parameter.

Per past review: add `from e` for exception chaining, and consider `filter='data'` for defense-in-depth (Python 3.12+).

🔧 Suggested fix

```diff
 with tarfile.open(fileobj=file_obj) as tar:
-    tar.extractall(directory)
+    tar.extractall(directory, filter='data')
 self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
 except Exception as e:
     self.logging.error(f"npm install failed: {e}")
-    raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+    raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
```
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/containers.py` around lines 356 - 362, In the try/except around tarfile extraction in the method that opens tar via tarfile.open(fileobj=file_obj) and calls tar.extractall(directory), change the exception re-raise to use exception chaining (raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e) and, for defense-in-depth, call tar.extractall with the filter parameter (e.g., tar.extractall(directory, filter='data')) when running on Python 3.12+; update the self.logging.error line to keep the original exception message but do not swallow the original exception so the chained raise preserves the traceback.
🟡 Minor comments (8)
dockerfiles/cloudflare/nodejs/Dockerfile-21-27 (1)
21-27: ⚠️ Potential issue | 🟡 Minor — Broad error suppression may mask genuine failures.

Using `|| true` suppresses all errors from `init.sh`, not just "resource already exists" scenarios. Download failures, permission issues, or corrupted downloads would be silently ignored, potentially causing confusing runtime errors.

Consider checking for specific expected outcomes or logging failures:

🛡️ Suggested improvement

```diff
 # Run benchmark init script if it exists (e.g., for ffmpeg in video-processing)
 # This downloads static binaries needed by the benchmark
-# Note: ignore errors from init.sh (e.g., when resources already exist)
 RUN if [ -f "init.sh" ]; then \
     chmod +x init.sh && \
-    ./init.sh /app verbose x64 || true; \
+    ./init.sh /app verbose x64 || echo "Warning: init.sh failed with exit code $?"; \
     fi
```
Verify each finding against the current code and only fix it if needed. In `@dockerfiles/cloudflare/nodejs/Dockerfile` around lines 21 - 27, The Dockerfile currently suppresses all errors from running init.sh with "|| true", which can hide real failures; replace this broad suppression by running init.sh in a small bash wrapper that captures its exit code and output (invoke "./init.sh /app verbose x64"), allow success (exit 0) or explicitly allow only known benign exit codes/messages (e.g., "resource already exists") by matching stdout/stderr, otherwise write the full error to the build logs via echo or >&2 and exit with a non-zero status so the build fails; locate the RUN that calls init.sh in the Dockerfile and implement the wrapper logic there to log errors and only ignore specific, documented conditions.

dockerfiles/cloudflare/nodejs/Dockerfile-15-16 (1)
15-16: ⚠️ Potential issue | 🟡 Minor — Use `npm install --omit=dev` instead of `--production`.

`npm install --production` is deprecated in npm 8+ and triggers warnings. Use `npm install --omit=dev` for the same behavior with modern npm versions (node:18-slim includes npm 9+).

♻️ Suggested improvement

```diff
 # Install dependencies
-RUN npm install --production
+RUN npm install --omit=dev
```
Verify each finding against the current code and only fix it if needed. In `@dockerfiles/cloudflare/nodejs/Dockerfile` around lines 15 - 16, Replace the deprecated npm flag in the Dockerfile: locate the RUN command that currently uses "npm install --production" and change it to use "npm install --omit=dev" so it avoids npm 8+ deprecation warnings while keeping dev dependencies omitted during image build; ensure no other build steps rely on the old flag and keep the invocation in the same RUN line.

benchmarks/wrappers/cloudflare/nodejs/container/handler.js-108-112 (1)
108-112: ⚠️ Potential issue | 🟡 Minor — Use key presence here, not truthiness.

`if (!event[key])` will overwrite valid body values like `0`, `false`, or `""` with query params. The merge should only fill missing keys.

✏️ Minimal fix

```diff
-if (!event[key]) {
+if (!(key in event)) {
   const intValue = parseInt(value);
   event[key] = isNaN(intValue) ? value : intValue;
 }
```
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/nodejs/container/handler.js` around lines 108 - 112, The loop merging URL query params into the event object uses truthiness check (if (!event[key])) which incorrectly overwrites valid falsy values like 0, false, or ""; change the condition to check presence instead (e.g., use "if (!(key in event))" or "if (!Object.prototype.hasOwnProperty.call(event, key))") in the for (const [key, value] of url.searchParams) block so only missing keys are filled with parsed intValue (from parseInt(value)) or the original value.

sebs/cloudflare/durable_objects.py-183-183 (1)
183-183: ⚠️ Potential issue | 🟡 Minor — Remove the stray `f` prefix on this warning.

Ruff flags this as F541, so lint will stay red until this is a plain string literal.

✏️ Minimal fix

```diff
-self.logging.warning(f"Durable Objects data is managed within the Worker")
+self.logging.warning("Durable Objects data is managed within the Worker")
```
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/durable_objects.py` at line 183, The warning call currently uses an unnecessary f-string: change the call to self.logging.warning(f"Durable Objects data is managed within the Worker") to a plain string literal by removing the leading f (i.e., self.logging.warning("Durable Objects data is managed within the Worker")); update the call in the durable objects code where the self.logging.warning invocation appears to eliminate the f-prefix so it no longer triggers the F541 lint.

benchmarks/wrappers/cloudflare/python/handler.py-126-133 (1)
126-133: ⚠️ Potential issue | 🟡 Minor — Replace bare `except` with `except Exception`.

Bare `except:` catches `SystemExit` and `KeyboardInterrupt`, which should typically propagate. Use `except Exception:` to catch only regular exceptions.

🔧 Suggested fix

```diff
 try:
     # Fetch the worker's own URL with favicon to minimize overhead
     final_url = URL.new(request.url)
     final_url.pathname = '/favicon'
     await js_fetch(str(final_url), method='HEAD')
-except:
+except Exception:
     # Ignore fetch errors
     pass
```
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/handler.py` around lines 126 - 133, The try/except block that fetches the worker URL (creating final_url via URL.new(request.url) and calling js_fetch(...)) currently uses a bare except:, which can swallow SystemExit/KeyboardInterrupt; replace it with except Exception: to only catch regular exceptions and preserve external interrupts—update the except clause in that try/except around final_url/js_fetch accordingly.benchmarks/wrappers/cloudflare/nodejs/container/storage.js-15-18 (1)
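The propagation difference is easy to demonstrate: `SystemExit` derives from `BaseException`, not `Exception`, so a narrowed handler lets it escape (standalone sketch, not code from the PR):

```python
def narrowed():
    try:
        raise SystemExit(1)
    except Exception:
        return "swallowed"   # never reached: SystemExit is not an Exception

def bare():
    try:
        raise SystemExit(1)
    except:                  # noqa: E722 - the pattern the review warns against
        return "swallowed"
```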
15-18: ⚠️ Potential issue | 🟡 Minor — Remove duplicate `static worker_url` declaration.

`worker_url` is declared twice on lines 15 and 18. Remove the duplicate.

🔧 Suggested fix

```diff
 static worker_url = null; // Set by handler from X-Worker-URL header
-
-static worker_url = null; // Set by handler from X-Worker-URL header
-
 static unique_name(name) {
```
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/nodejs/container/storage.js` around lines 15 - 18, The class has a duplicate static property declaration "worker_url" declared twice; remove the redundant declaration so only one "static worker_url = null; // Set by handler from X-Worker-URL header" remains (locate the duplicate declarations of worker_url in the class and delete the extra one).

benchmarks/wrappers/cloudflare/python/storage.py-111-116 (1)
111-116: ⚠️ Potential issue | 🟡 Minor — Remove duplicate return statement.

Line 116 is an unreachable duplicate of line 115.

🔧 Suggested fix

```diff
 @staticmethod
 def get_instance():
     if storage.instance is None:
         raise RuntimeError("must init storage singleton first")
     return storage.instance
-    return storage.instance
```
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/storage.py` around lines 111 - 116, The get_instance static method in storage has a duplicated unreachable return; inside storage.get_instance() remove the second redundant "return storage.instance" so the method only raises if instance is None and otherwise returns storage.instance (update the body of storage.get_instance to have a single return statement).

benchmarks/wrappers/cloudflare/nodejs/handler.js-65-85 (1)
65-85: ⚠️ Potential issue | 🟡 Minor — URL parameter parsing loses data when values contain `=`.

The manual parsing with `split('=')` only captures the first two parts, so a parameter like `?key=a=b` would incorrectly parse as `{key: 'a'}` instead of `{key: 'a=b'}`. Consider using `URLSearchParams` for robust parsing.

🔧 Suggested fix

```diff
 // Parse query string into event (URL parameters override/merge with body)
 // This makes it compatible with both input formats
-const urlParts = request.url.split('?');
-if (urlParts.length > 1) {
-  const query = urlParts[1];
-  const pairs = query.split('&');
-  for (const p of pairs) {
-    const [k, v] = p.split('=');
-    try {
-      if (v === undefined) {
-        event[k] = null;
-      } else if (!Number.isNaN(Number(v)) && Number.isFinite(Number(v))) {
-        // mirror Python attempt to convert to int
-        const n = Number(v);
-        event[k] = Number.isInteger(n) ? parseInt(v, 10) : n;
-      } else {
-        event[k] = decodeURIComponent(v);
-      }
-    } catch (e) {
-      event[k] = v;
-    }
-  }
-}
+const url = new URL(request.url);
+for (const [k, v] of url.searchParams) {
+  if (!Number.isNaN(Number(v)) && Number.isFinite(Number(v))) {
+    const n = Number(v);
+    event[k] = Number.isInteger(n) ? parseInt(v, 10) : n;
+  } else {
+    event[k] = v;
+  }
+}
```
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/nodejs/handler.js` around lines 65 - 85, The current manual parsing of request.url loses data when values contain '='; replace the manual split loop with URLSearchParams: create a URL (e.g. new URL(request.url, 'http://localhost')) or new URLSearchParams(rawQuery) from request.url, iterate for (const [k, v] of searchParams) and apply the same conversion logic used now (treat missing-value pairs as null, convert numeric strings to Number/parseInt, otherwise decodeURIComponent), ensuring you preserve the existing behavior of assigning to event[k]; update the block that references request.url and event to use searchParams so values containing '=' are preserved.
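The same data loss can be reproduced in Python, where `urllib.parse.parse_qsl` plays the role `URLSearchParams` plays in JS (both split each pair only on the first `=`):

```python
from urllib.parse import parse_qsl

def naive_parse(query):
    # Mirrors the buggy JS: split('=') keeps only the first two parts,
    # truncating any value that itself contains '='.
    out = {}
    for pair in query.split("&"):
        parts = pair.split("=")
        out[parts[0]] = parts[1] if len(parts) > 1 else None
    return out
```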
🧹 Nitpick comments (18)
dockerfiles/cloudflare/python/Dockerfile (2)
18-21: Error suppression could mask real failures.

The `|| true` suppresses all errors from `init.sh`, including legitimate failures (e.g., network issues during download, disk space problems). Consider being more selective about which errors to ignore.

🔧 Suggested approach to handle errors more selectively

```diff
 RUN if [ -f "init.sh" ]; then \
     chmod +x init.sh && \
-    ./init.sh /app verbose x64 || true; \
+    ./init.sh /app verbose x64 || echo "Warning: init.sh failed (may be expected if resources already exist)"; \
     fi
```

Alternatively, if specific exit codes indicate "resource exists," check for those explicitly rather than suppressing all errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dockerfiles/cloudflare/python/Dockerfile` around lines 18 - 21, The RUN step currently appends "|| true" after executing init.sh which hides all failures from init.sh; remove the unconditional suppression and instead handle expected/benign cases explicitly: update the Docker RUN (and/or the init.sh script referenced as init.sh) so that init.sh returns 0 for idempotent/known-OK conditions or document and check for specific exit codes in the RUN wrapper, e.g., test the exit code and only ignore the known harmless ones (resource-exists codes) while letting other non-zero exits fail the build; ensure init.sh is made idempotent (or exports meaningful exit codes) and that the Docker RUN will propagate real errors instead of using "|| true".
1-38: Consider adding a non-root user for improved security posture.

The container runs as root by default. While this may be acceptable for isolated benchmark containers, adding a non-root user is a security best practice that reduces the attack surface if the container is compromised.

🛡️ Suggested fix to run as non-root user

```diff
 ARG BASE_IMAGE=python:3.11-slim
 FROM ${BASE_IMAGE}
 
 # Install system dependencies needed for benchmarks
 RUN apt-get update && apt-get install -y --no-install-recommends \
     wget \
     xz-utils \
     && rm -rf /var/lib/apt/lists/*
 
+# Create non-root user
+RUN useradd --create-home --shell /bin/bash appuser
+
 WORKDIR /app
 
 # Copy all application files first
 COPY . .
 
 # Run benchmark init script if it exists (e.g., for ffmpeg in video-processing)
 # This downloads static binaries needed by the benchmark
 # Note: ignore errors from init.sh (e.g., when resources already exist)
 RUN if [ -f "init.sh" ]; then \
     chmod +x init.sh && \
     ./init.sh /app verbose x64 || true; \
-    fi
+    fi && \
+    chown -R appuser:appuser /app
 
 # Install dependencies
 # Core dependencies for wrapper modules:
 # - storage.py uses urllib (stdlib) to proxy R2 requests through worker.js
 # - nosql.py, worker.py, handler.py use stdlib only
 # Then install benchmark-specific requirements from requirements.txt
 RUN pip install --no-cache-dir --upgrade pip && \
     if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi
 
+# Switch to non-root user
+USER appuser
+
 # Expose port 8080 for container communication
 EXPOSE 8080
```
Verify each finding against the current code and only fix it if needed. In `@dockerfiles/cloudflare/python/Dockerfile` around lines 1 - 38, The Dockerfile currently runs as root; create and switch to a non-root user by adding a RUN that creates a user/group (e.g., groupadd/useradd or adduser), chown the WORKDIR (/app) and any files created during build (so init.sh, requirements.txt and installed artifacts are owned by that user), set HOME if needed, and add a USER <username> before the final CMD ["python","handler.py"]; ensure any RUN steps that require root (apt-get, pip installs) occur before changing USER and reference symbols BASE_IMAGE, WORKDIR, init.sh, requirements.txt and CMD when making the changes.

benchmarks/100.webapps/130.crud-api/nodejs/package.json (1)
7-8: Remove empty `dependencies` block unless a tool requires it.

Lines 7-8 currently declare an empty object. If there are no runtime deps, omitting the key keeps the manifest cleaner and avoids noisy diffs later.

Optional cleanup

```diff
-  "dependencies": {
-  }
+  "dependencies": {}
```

Or remove it entirely:

```diff
-  "dependencies": {
-  }
```
Verify each finding against the current code and only fix it if needed. In `@benchmarks/100.webapps/130.crud-api/nodejs/package.json` around lines 7 - 8, The package.json currently contains an empty "dependencies" object; remove the "dependencies" key entirely (or add required runtime deps if there are any) so the manifest is cleaner and avoids noisy diffs. Locate the package.json entry named "dependencies" and delete that key and its empty object value, then run any JSON linting/validation to ensure package.json remains valid; if a tool requires the key, replace the empty object with the required dependency entries instead.

benchmarks/wrappers/cloudflare/nodejs/request-polyfill.js (1)
26-57: Polyfill has a design limitation but it doesn't affect current usage.

The fetch request only executes inside `.pipe()`, so direct callback-only usage (`request(url, callback)`) would not work. However, the codebase only uses the polyfill as `request(url).pipe(file)`, so this limitation is not triggered. If callback-only invocations were added in the future, they would fail.
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/nodejs/request-polyfill.js` around lines 26 - 57, The polyfill's fetch only runs inside requestObj.pipe, so calling request(url, callback) (i.e. using the callback-only path) does nothing; update the request factory (where requestObj is created) to start the fetch immediately when a callback is provided or expose a start method so fetch runs even if pipe is never called—specifically modify the logic that builds requestObj (refer to requestObj and its pipe method) to detect a provided callback and invoke the same fetch flow (the .then/.catch handling that emits destination errors and calls callback) on creation, ensuring the callback receives the response or error even without calling pipe.

sebs/cloudflare/containers.py (3)
351-351: Prefix unused variable with underscore.

The `stat` variable from `get_archive()` is never used.

🔧 Suggested fix

```diff
-bits, stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
+bits, _stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
```
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/containers.py` at line 351, The call to cli.docker_instance.get_archive(...) assigns an unused variable stat; rename it to _stat (or prefix with an underscore) to indicate it's intentionally unused and avoid linter warnings — update the assignment in the code where get_archive is invoked (the expression using cli.docker_instance.get_archive and the left-hand variables) to use bits, _stat instead of bits, stat.
458-465: Add exception chaining in `_build_container_image_local`.

Preserve the original exception traceback using `from e`.

🔧 Suggested fix

```diff
 except docker.errors.BuildError as e:
     error_msg = f"Docker build failed for {image_tag}: {e}"
     self.logging.error(error_msg)
-    raise RuntimeError(error_msg)
+    raise RuntimeError(error_msg) from e
 except Exception as e:
     error_msg = f"Unexpected error building Docker image {image_tag}: {e}"
     self.logging.error(error_msg)
-    raise RuntimeError(error_msg)
+    raise RuntimeError(error_msg) from e
```
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/containers.py` around lines 458 - 465, In _build_container_image_local, the caught exceptions (docker.errors.BuildError and the generic Exception) are re-raised as RuntimeError without preserving the original traceback; update both except blocks to chain the original exception by using "raise RuntimeError(error_msg) from e" so the original exception (e) is included, keeping the existing logging calls (self.logging.error(error_msg)) and messages unchanged.
252-252: Remove extraneous f-string prefix.

Line 252 uses an f-string without any placeholders.

🔧 Suggested fix

```diff
-self.logging.info(f"Copied worker.js orchestration file from nodejs/container")
+self.logging.info("Copied worker.js orchestration file from nodejs/container")
```
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/containers.py` at line 252, The logging call uses an unnecessary f-string prefix; update the self.logging.info call in containers.py (the line that logs "Copied worker.js orchestration file from nodejs/container") to use a plain string (remove the leading f) since there are no placeholders, e.g., change the self.logging.info invocation to pass a normal string literal.
benchmarks/wrappers/cloudflare/python/handler.py (2)
43-43: Split multiple statements onto separate lines. Static analysis flags this as E701. Splitting improves readability.
🔧 Suggested fix
- if "favicon" in request.url: return Response("None")
+ if "favicon" in request.url:
+     return Response("None")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/handler.py` at line 43, The single-line conditional "if \"favicon\" in request.url: return Response(\"None\")" should be split into two lines to satisfy E701; replace it with a standard if block that checks "favicon" in request.url and then on the next line returns Response("None") (locate the statement in benchmarks/wrappers/cloudflare/python/handler.py and update that inline if to a multi-line if block).
154-199: Consider removing unused code. The functions `import_from_path`, `MakeAsync`, `AddAwait`, and `make_benchmark_func` appear unused (lines 94-95 are commented out). If these are no longer needed, removing them would reduce maintenance burden.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/handler.py` around lines 154 - 199, Remove the dead code related to the AST-based transformation: delete the import_from_path function, the MakeAsync class, the AddAwait class, the make_benchmark_func function and the working_dir variable (and any associated unused imports such as importlib, ast, sys, os if they are only used by those symbols); before deleting, search for any callers of import_from_path, MakeAsync, AddAwait, or make_benchmark_func to ensure nothing else depends on them, and run tests/lint to confirm there are no remaining references.
benchmarks/wrappers/cloudflare/python/container/handler.py (4)
185-189: Traceback exposure in error responses.The error response includes the full traceback, which can expose internal file paths and code structure. For a benchmarking tool this may be acceptable for debugging, but consider making this configurable or removing in production deployments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 185 - 189, The error response currently includes a full traceback (see error_response and its use before self.wfile.write in handler.py), which can leak internals; change this to make traceback inclusion configurable or remove it in production by checking a flag (e.g., an environment/config value like SHOW_TRACEBACK or handler.show_traceback) and only add traceback: traceback.format_exc() to error_response when that flag is true, otherwise return a minimal error message or generic error code; update any relevant initialization/path where the handler is created to read the configuration and set the flag accordingly.
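One way to gate traceback exposure behind a flag, as the comment suggests; `SEBS_DEBUG` is a hypothetical flag name and the payload shape is an assumption, not the handler's current API:

```python
import os
import traceback

def build_error_response(exc: Exception) -> dict:
    """Build an error payload; attach the traceback only when debugging is on.

    SEBS_DEBUG is a hypothetical flag name, not one the current handler reads.
    Call this from inside an `except` block so format_exc() sees the exception.
    """
    response = {"error": str(exc)}
    if os.environ.get("SEBS_DEBUG", "").lower() in ("1", "true", "yes"):
        response["traceback"] = traceback.format_exc()
    return response
```

With the flag unset, the response carries only the error message; internal file paths and code structure stay hidden while benchmark debugging remains one environment variable away.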
191-193: Parameter `format` shadows Python builtin. This is flagged by static analysis. While this matches the `BaseHTTPRequestHandler` signature, you can use `fmt` to avoid shadowing.
🔧 Suggested fix
- def log_message(self, format, *args):
+ def log_message(self, fmt, *args):
      # Override to use print instead of stderr
-     print(f"{self.address_string()} - {format % args}")
+     print(f"{self.address_string()} - {fmt % args}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 191 - 193, Rename the parameter in the log_message override to avoid shadowing the builtin: change the parameter name from format to fmt in the log_message(self, format, *args) definition and update the body to use fmt where it currently uses format (e.g., f"{self.address_string()} - {fmt % args}"); ensure the method name log_message and its behavior remain unchanged so it still overrides BaseHTTPRequestHandler.
38-48: Consider handling `timeout=None` explicitly in `patched_urlopen`. When `timeout` is `None` and forwarded via `**kwargs`, the original `urlopen` receives it twice if the caller also passed it in `kwargs`. The current signature has `timeout` as a positional-or-keyword arg but also passes `**kwargs`.
🔧 Suggested fix
-def patched_urlopen(url, data=None, timeout=None, **kwargs):
+def patched_urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, **kwargs):
     if isinstance(url, str):
         req = urllib.request.Request(url, data=data)
         req.add_header('User-Agent', 'SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2')
-        return original_urlopen(req, timeout=timeout, **kwargs)
+        return original_urlopen(req, timeout=timeout)
     elif isinstance(url, urllib.request.Request):
         if not url.has_header('User-Agent'):
             url.add_header('User-Agent', 'SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2')
-        return original_urlopen(url, data=data, timeout=timeout, **kwargs)
+        return original_urlopen(url, data=data, timeout=timeout)
     else:
-        return original_urlopen(url, data=data, timeout=timeout, **kwargs)
+        return original_urlopen(url, data=data, timeout=timeout)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 38 - 48, patched_urlopen may forward timeout twice if timeout is None and also present in kwargs; update patched_urlopen (the function that wraps original_urlopen) to avoid passing duplicate timeout by checking kwargs: if 'timeout' is in kwargs, do not pass the explicit timeout parameter; otherwise pass timeout only when it is not None (or remove timeout key from kwargs before calling original_urlopen). Ensure this logic is applied in all three call sites that invoke original_urlopen so you never send timeout both positionally/keyword and via **kwargs.
148-150: Memory calculation may be incorrect on macOS.
`resource.getrusage().ru_maxrss` returns kilobytes on Linux but bytes on macOS. Dividing by 1024 works for Linux but would be incorrect on macOS. Since this is a container handler (Linux), this is likely acceptable, but worth documenting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 148 - 150, The memory calculation using resource.getrusage(...).ru_maxrss assumes ru_maxrss is in kilobytes (Linux) but on macOS it is in bytes; update the logic that computes memory_mb (the variable currently set and stored to log_data['measurement']['memory_used_mb']) to branch on the platform (e.g., platform.system() or sys.platform): if Darwin/macOS divide ru_maxrss by 1024*1024 to get MB, otherwise divide by 1024 for Linux, and add a brief comment noting the container/Linux assumption for clarity.
benchmarks/wrappers/cloudflare/nodejs/container/storage.js (1)
123-137: `upload()` return value differs from `uploadStream()`. `upload()` returns `[unique_key, uploadPromise]` while `uploadStream()` returns `[passThrough, upload, unique_key]`. This inconsistency may confuse callers. Consider documenting the expected return format or aligning the APIs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/nodejs/container/storage.js` around lines 123 - 137, The upload() method currently returns [unique_key, uploadPromise] while uploadStream() returns [passThrough, upload, unique_key], causing inconsistent API; update upload() in storage.upload to match uploadStream's tuple shape (e.g., return [null, uploadPromise, unique_key] or wrap uploadPromise so callers get the same three-element array), or else align uploadStream to the two-element shape—modify the upload() return to mirror uploadStream (references: upload, uploadStream, _upload_stream_with_key, unique_name) and update any callers or documentation accordingly so both functions have the same return format.
benchmarks/wrappers/cloudflare/python/container/storage.py (1)
71-77: Use exception chaining with `raise ... from e`. When re-raising exceptions, preserve the original traceback by using `from e`. This applies to multiple exception handlers in this file.
🔧 Suggested fix for this block (apply similar pattern elsewhere)
 try:
     with urllib.request.urlopen(req) as response:
         result = json.loads(response.read().decode('utf-8'))
         return result['key']
 except Exception as e:
     print(f"R2 upload error: {e}")
-    raise RuntimeError(f"Failed to upload to R2: {e}")
+    raise RuntimeError(f"Failed to upload to R2: {e}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/container/storage.py` around lines 71 - 77, The except block that catches Exception as e (in the R2 upload logic that calls urllib.request.urlopen and returns result['key']) should re-raise the RuntimeError using exception chaining (raise RuntimeError(f"Failed to upload to R2: {e}") from e) instead of plain raise so the original traceback is preserved; apply the same change to all other exception handlers in this file that currently re-raise a new exception without using "from e" (locate blocks around the upload/download helper functions and replace their plain raises with chained raises referencing the caught exception variable).
sebs/cloudflare/cloudflare.py (2)
438-441: Chain exceptions for better debugging. When re-raising exceptions within an `except` block, use `raise ... from e` to preserve the exception chain for debugging.
🔧 Proposed fix
 except RuntimeError as e:
     error_msg = f"Wrangler deployment failed for worker {worker_name}: {e}"
     self.logging.error(error_msg)
-    raise RuntimeError(error_msg)
+    raise RuntimeError(error_msg) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/cloudflare.py` around lines 438 - 441, The except block handling RuntimeError in the deployment flow should preserve the original exception chain: in the except RuntimeError as e block where you build error_msg and call self.logging.error(error_msg) and then raise RuntimeError(error_msg), change the re-raise to use "raise RuntimeError(error_msg) from e" so the original exception (e) is chained and retained for debugging.
694-694: Rename unused loop variable. The `request_id` variable is not used within the loop body. Prefix with underscore to indicate it's intentionally unused.
🔧 Proposed fix
- for request_id, result in requests.items():
+ for _request_id, result in requests.items():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sebs/cloudflare/cloudflare.py` at line 694, Rename the unused loop variable request_id to _request_id in the for loop that iterates over requests.items() (the line "for request_id, result in requests.items():") to indicate it's intentionally unused; update the loop header in the same function in cloudflare.py and ensure no other code in the loop body references request_id (adjust references if they exist).
benchmarks/wrappers/cloudflare/python/nosql.py (1)
28-40: Consider documenting pickle security context. Static analysis flags `pickle.loads` as a security concern. While this appears to be internal data serialization (not untrusted external input), pickle deserialization can execute arbitrary code if data is ever tampered with or comes from an unexpected source.
The comment explains the technical reason for using pickle (pyodide cloning issues). Consider adding a brief note about the security assumption that stored data is trusted, or evaluate if JSON with custom encoding could work for the specific data types being serialized.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/wrappers/cloudflare/python/nosql.py` around lines 28 - 40, Add a short in-code comment above the data_pre and data_post functions explaining the security assumptions for using pickle (that stored data is trusted and not user-provided), note that pickle.loads can execute arbitrary code if data is tampered with, and recommend mitigations (validate/verify data provenance, use a digital signature/checksum, or switch to a safer serializer like JSON with custom encoding if feasible); if opting to keep pickle for pyodide cloning, explicitly document why pickle is required and any planned constraints (trusted storage, no external input) so future reviewers understand the risk and rationale.
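If pickle must stay (for the pyodide cloning reason the comment gives), one hedged mitigation is to authenticate payloads with an HMAC before unpickling. The key handling below is illustrative only; the function names `data_pre`/`data_post` are borrowed from the review prompt, and a real deployment would load the key from secret storage:

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"example-key"  # illustrative; load from config/secret storage in practice

def data_pre(value) -> bytes:
    """Serialize and prepend a 32-byte HMAC tag so tampered payloads are rejected."""
    payload = pickle.dumps(value)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def data_post(blob: bytes):
    """Verify the HMAC tag before calling pickle.loads on the payload."""
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("stored payload failed integrity check")
    return pickle.loads(payload)
```

This keeps pickle's flexibility while ensuring `pickle.loads` only ever runs on data the application itself wrote; `hmac.compare_digest` avoids timing side channels during verification.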
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24b791dd-5dea-4594-9dcf-91da8a6e9c4c
📒 Files selected for processing (37)
- benchmarks/000.microbenchmarks/040.server-reply/config.json
- benchmarks/100.webapps/120.uploader/python/function_cloudflare.py
- benchmarks/100.webapps/130.crud-api/nodejs/function.js
- benchmarks/100.webapps/130.crud-api/nodejs/package.json
- benchmarks/300.utilities/311.compression/nodejs/function.js
- benchmarks/300.utilities/311.compression/nodejs/package.json
- benchmarks/wrappers/cloudflare/nodejs/build.js
- benchmarks/wrappers/cloudflare/nodejs/container/handler.js
- benchmarks/wrappers/cloudflare/nodejs/container/nosql.js
- benchmarks/wrappers/cloudflare/nodejs/container/package.json
- benchmarks/wrappers/cloudflare/nodejs/container/storage.js
- benchmarks/wrappers/cloudflare/nodejs/container/worker.js
- benchmarks/wrappers/cloudflare/nodejs/handler.js
- benchmarks/wrappers/cloudflare/nodejs/nosql.js
- benchmarks/wrappers/cloudflare/nodejs/request-polyfill.js
- benchmarks/wrappers/cloudflare/nodejs/storage.js
- benchmarks/wrappers/cloudflare/python/container/handler.py
- benchmarks/wrappers/cloudflare/python/container/nosql.py
- benchmarks/wrappers/cloudflare/python/container/storage.py
- benchmarks/wrappers/cloudflare/python/handler.py
- benchmarks/wrappers/cloudflare/python/nosql.py
- benchmarks/wrappers/cloudflare/python/storage.py
- configs/cloudflare-test.json
- configs/systems.json
- dockerfiles/cloudflare/Dockerfile.manage
- dockerfiles/cloudflare/nodejs/Dockerfile
- dockerfiles/cloudflare/python/Dockerfile
- docs/platforms.md
- docs/storage.md
- sebs/cli.py
- sebs/cloudflare/__init__.py
- sebs/cloudflare/cli.py
- sebs/cloudflare/cloudflare.py
- sebs/cloudflare/config.py
- sebs/cloudflare/containers.py
- sebs/cloudflare/durable_objects.py
- sebs/cloudflare/function.py
🚧 Files skipped from review as they are similar to previous changes (7)
- benchmarks/300.utilities/311.compression/nodejs/package.json
- benchmarks/100.webapps/130.crud-api/nodejs/function.js
- benchmarks/wrappers/cloudflare/nodejs/container/package.json
- benchmarks/300.utilities/311.compression/nodejs/function.js
- benchmarks/wrappers/cloudflare/nodejs/container/nosql.js
- docs/storage.md
- benchmarks/000.microbenchmarks/040.server-reply/config.json
res = await pyfetch(url, headers=headers)
bs = await res.bytes()

with open(download_path, 'wb') as f:
    f.write(bs)
Validate HTTP download success before writing and uploading.
Right now any HTTP response body (including 404/500 pages) is written and then uploaded as if it were valid content. Add an explicit status check and fail fast on non-success responses.
🔧 Proposed fix
 async def do_request(url, download_path):
     headers = {'User-Agent': SEBS_USER_AGENT}
     res = await pyfetch(url, headers=headers)
+    if not res.ok:
+        raise RuntimeError(f"Download failed with status {res.status} for URL: {url}")
     bs = await res.bytes()
     with open(download_path, 'wb') as f:
         f.write(bs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/100.webapps/120.uploader/python/function_cloudflare.py` around
lines 16 - 20, The code writes and uploads whatever body pyfetch returned
without checking HTTP status; update the download flow that uses pyfetch/res and
download_path to first inspect the response status (e.g., res.status or res.ok)
and raise/return an error (fail fast) for non-2xx responses so the file is not
written or uploaded; only call res.bytes() and open/download_path for writing
when the response indicates success, and include a clear error/exception message
when rejecting non-success responses.
output_prefix = event.get('bucket').get('output')
url = event.get('object').get('url')
name = os.path.basename(url)
download_path = '/tmp/{}'.format(name)
Use a secure temporary file path instead of a predictable /tmp/<name>.
The current path is predictable and can collide across concurrent invocations. Use tempfile (mkstemp/NamedTemporaryFile) to avoid race/collision issues.
🔒 Proposed fix
import datetime
import os
+import tempfile
@@
- name = os.path.basename(url)
- download_path = '/tmp/{}'.format(name)
+ name = os.path.basename(url)
+ fd, download_path = tempfile.mkstemp(prefix="sebs-", suffix=f"-{name}", dir="/tmp")
+ os.close(fd)
🧰 Tools
🪛 Ruff (0.15.6)
[error] 28-28: Probable insecure usage of temporary file or directory: "/tmp/{}"
(S108)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/100.webapps/120.uploader/python/function_cloudflare.py` at line
28, The assignment download_path = '/tmp/{}'.format(name) uses a predictable
filename; replace it by creating a secure temp file (e.g., tempfile.mkstemp() or
tempfile.NamedTemporaryFile()) and use the returned path/filename instead of
composing /tmp/<name>. Update the code that writes/read the file to use the file
descriptor or file object from tempfile, close the fd (if using mkstemp) and
ensure proper cleanup; refer to the download_path variable and the surrounding
download/save logic when making this change.
// Extract Worker URL from header for R2 and NoSQL proxy
const workerUrl = req.headers['x-worker-url'];
if (workerUrl) {
  if (storage && storage.storage && storage.storage.set_worker_url) {
    storage.storage.set_worker_url(workerUrl);
  }
  if (nosql && nosql.nosql && nosql.nosql.set_worker_url) {
    nosql.nosql.set_worker_url(workerUrl);
  }
  console.log(`Set worker URL for R2/NoSQL proxy: ${workerUrl}`);
Don't take the storage/NoSQL proxy origin from a request header.
This header is copied straight into the backend clients, so any caller that can reach the container can redirect follow-up requests to arbitrary endpoints. Resolve the Worker origin from deployment config, or hard-validate the expected host before accepting it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/wrappers/cloudflare/nodejs/container/handler.js` around lines 74 -
83, Do not trust req.headers['x-worker-url'] directly; instead derive the worker
origin from deployment configuration or a validated allowlist before calling
storage.storage.set_worker_url or nosql.nosql.set_worker_url. Replace direct
usage of the header in the handler: remove unconditional calls that pass
workerUrl to storage.storage.set_worker_url and nosql.nosql.set_worker_url, and
implement logic that (a) reads the canonical worker origin from config (or an
environment variable) or (b) validates the header value against a strict
allowlist/regex of expected hosts and only then calls
storage.storage.set_worker_url / nosql.nosql.set_worker_url; log a warning and
skip setting the backend clients if validation fails.
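The validation step the prompt describes could look like the sketch below — written in Python for brevity even though the container handler itself is Node.js, and with the `*.workers.dev` host pattern as an assumption about the expected deployment:

```python
import re
from typing import Optional

# Hedged sketch: only accept HTTPS origins on *.workers.dev. The exact
# pattern is an assumption, not something the current handler enforces.
WORKER_URL_PATTERN = re.compile(r"^https://[a-z0-9-]+(\.[a-z0-9-]+)*\.workers\.dev$")

def validate_worker_url(header_value: Optional[str]) -> Optional[str]:
    """Accept the proxy origin only when it matches the expected host shape."""
    if not header_value:
        return None
    candidate = header_value.rstrip("/")
    if WORKER_URL_PATTERN.fullmatch(candidate):
        return candidate
    # Caller should log a warning and keep the configured origin instead.
    return None
```

Reading the canonical origin from deployment config or an environment variable remains the stronger option; header validation is a fallback when the origin genuinely has to be supplied per request.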
// For debugging: check /tmp directory before and after benchmark
const fs = require('fs');

// Call the benchmark function
const ret = await benchmarkHandler(event);

// Check what was downloaded
const tmpFiles = fs.readdirSync('/tmp');
for (const file of tmpFiles) {
  const filePath = `/tmp/${file}`;
  const stats = fs.statSync(filePath);
  if (stats.size < 500) {
    const content = fs.readFileSync(filePath, 'utf8');
  }
}
Remove the synchronous /tmp scan from the benchmark path.
This adds extra filesystem work to every invocation and will skew the latency/memory numbers SeBS is trying to measure. `content` is never used, so this looks like leftover debugging.
✏️ Minimal fix
- // For debugging: check /tmp directory before and after benchmark
- const fs = require('fs');
-
// Call the benchmark function
const ret = await benchmarkHandler(event);
-
- // Check what was downloaded
-  const tmpFiles = fs.readdirSync('/tmp');
-  for (const file of tmpFiles) {
-    const filePath = `/tmp/${file}`;
-    const stats = fs.statSync(filePath);
-    if (stats.size < 500) {
-      const content = fs.readFileSync(filePath, 'utf8');
-    }
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/wrappers/cloudflare/nodejs/container/handler.js` around lines 120
- 134, The temporary /tmp directory synchronous scan (fs.readdirSync,
fs.statSync, fs.readFileSync) after calling benchmarkHandler is leftover
debugging and should be removed because it adds I/O and alloc overhead and uses
the unused variable content; delete the entire block that builds tmpFiles and
iterates over it (the fs.* calls and the content read) from the code around the
await benchmarkHandler(event) call (refer to benchmarkHandler, tmpFiles,
fs.readdirSync, fs.statSync, fs.readFileSync) — alternatively, if you need this
for local debugging only, wrap it behind a debug flag so it is not executed in
normal benchmark runs.
async query(keyPrefix) {
  const list = await this.ctx.storage.list();
  const items = [];
  for (const [k, v] of list) {
    items.push(v);
  }
  return { items };
query currently ignores keyPrefix and returns all rows.
This breaks query semantics and can become unbounded as data grows. Filter by prefix before appending results.
🔧 Proposed fix
 async query(keyPrefix) {
   const list = await this.ctx.storage.list();
   const items = [];
+  const prefix = Array.isArray(keyPrefix) ? keyPrefix.join(':') : String(keyPrefix ?? '');
   for (const [k, v] of list) {
-    items.push(v);
+    if (!prefix || k.startsWith(prefix)) {
+      items.push(v);
+    }
   }
   return { items };
 }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async query(keyPrefix) {
  const list = await this.ctx.storage.list();
  const items = [];
  const prefix = Array.isArray(keyPrefix) ? keyPrefix.join(':') : String(keyPrefix ?? '');
  for (const [k, v] of list) {
    if (!prefix || k.startsWith(prefix)) {
      items.push(v);
    }
  }
  return { items };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@benchmarks/wrappers/cloudflare/nodejs/container/worker.js` around lines 31 -
37, The query method is ignoring the keyPrefix and returning all rows; update
the async query(keyPrefix) implementation to filter entries from
this.ctx.storage.list() by checking the key (the variable k in the for (const
[k, v] of list) loop) against keyPrefix (e.g., skip unless keyPrefix is falsy or
k.startsWith(keyPrefix)) before pushing v into the items array so only matching
rows are returned.
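The filtering semantics the fix aims for can be sketched in Python (the actual change is in worker.js); a list-valued prefix is joined with ':' to mirror composite keys:

```python
def query(store: dict, key_prefix) -> dict:
    """Return only values whose key starts with the (possibly composite) prefix.

    An empty or missing prefix still returns everything, matching the
    unfiltered fallback in the suggested worker.js fix.
    """
    prefix = ":".join(key_prefix) if isinstance(key_prefix, list) else str(key_prefix or "")
    items = [v for k, v in store.items() if not prefix or k.startswith(prefix)]
    return {"items": items}
```

Prefix filtering keeps the result set proportional to the matching keys rather than the whole table, which matters once Durable Object storage grows beyond a handful of rows.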
RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
    /root/.local/bin/uv tool install workers-py
|
|
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -nP 'curl .*install\.sh \| sh|uv tool install workers-py' dockerfiles/cloudflare/Dockerfile.manage
Repository: spcl/serverless-benchmarks
Length of output: 180
🏁 Script executed:
# Check the Dockerfile context and see if there are ARG definitions or security patterns already in use
head -n 30 dockerfiles/cloudflare/Dockerfile.manage
Repository: spcl/serverless-benchmarks
Length of output: 1292
🏁 Script executed:
# Search for other similar curl|sh patterns in the codebase
rg 'curl.*\| sh' dockerfiles/
Repository: spcl/serverless-benchmarks
Length of output: 167
🏁 Script executed:
# Check for other uv tool installations
rg 'uv tool install' dockerfiles/
Repository: spcl/serverless-benchmarks
Length of output: 158
🏁 Script executed:
# Look for existing ARG usage for security/version management in Dockerfiles
rg 'ARG.*SHA|ARG.*VERSION' dockerfiles/
Repository: spcl/serverless-benchmarks
Length of output: 1513
🏁 Script executed:
# Check if there are any comments or git history explaining the workers-py unpinned installation
git log --oneline -n 20 dockerfiles/cloudflare/Dockerfile.manage 2>/dev/null || echo "Git history not available"
Repository: spcl/serverless-benchmarks
Length of output: 160
🏁 Script executed:
# Look for any documentation or discussion about workers-py version requirements
rg 'workers-py' . --type=md --type=txt --type=py
Repository: spcl/serverless-benchmarks
Length of output: 114
Implement integrity verification and version pinning for uv and workers-py installation.
Line 24 pipes a network script directly to sh without integrity verification; Line 25 installs workers-py unpinned. Both pose supply-chain and reproducibility risks. This is inconsistent with the Docker GPG verification already implemented in this same Dockerfile (lines 11-12).
Proposed safer install pattern
+ARG UV_INSTALL_SHA256
+ARG WORKERS_PY_VERSION
-RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
- /root/.local/bin/uv tool install workers-py
+RUN curl -fsSLo /tmp/uv-install.sh https://astral.sh/uv/install.sh \
+ && echo "${UV_INSTALL_SHA256} /tmp/uv-install.sh" | sha256sum -c - \
+ && sh /tmp/uv-install.sh \
+ && /root/.local/bin/uv tool install "workers-py==${WORKERS_PY_VERSION}" \
+ && rm -f /tmp/uv-install.sh🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dockerfiles/cloudflare/Dockerfile.manage` around lines 24 - 26, Avoid piping
the remote install script directly to sh and pin versions/checksums: download
the astral installer with curl to a file (the existing RUN that invokes "curl
-LsSf https://astral.sh/uv/install.sh | sh" and the subsequent
"/root/.local/bin/uv tool install workers-py" should be changed), fetch or ARG
the expected sha256 (or GPG signature) for the installer and verify it before
executing, install a pinned uv release (call the installer with the specific uv
version or use a released binary artifact) and then install workers-py with an
explicit pinned version (e.g., use "uv tool install workers-py@<VERSION>"),
failing the build if the checksum/signature or versions do not match; reference
the installer download step and the "/root/.local/bin/uv tool install
workers-py" invocation when making these changes.
# Start the HTTP server
CMD ["node", "handler.js"]
Container runs as root user.
The static analysis tool flagged that this container runs as root (DS-0002). Running containers as root is a security risk—if the application is compromised, the attacker has full privileges within the container.
Consider adding a non-root user:
🔒 Suggested fix to run as non-root user
# Copy all application files
COPY . .
+# Create non-root user for security
+RUN groupadd -r nodejs && useradd -r -g nodejs nodejs && \
+ chown -R nodejs:nodejs /app
+
# Run benchmark init script if it exists (e.g., for ffmpeg in video-processing)
# This downloads static binaries needed by the benchmark
-# Note: ignore errors from init.sh (e.g., when resources already exist)
RUN if [ -f "init.sh" ]; then \
chmod +x init.sh && \
- ./init.sh /app verbose x64 || true; \
+ ./init.sh /app verbose x64 || echo "Warning: init.sh failed"; \
fi
+USER nodejs
+
# Expose port 8080 for container communication
EXPOSE 8080
Note: If init.sh requires root privileges (e.g., installing packages), run the USER directive after the init script.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dockerfiles/cloudflare/nodejs/Dockerfile` around lines 35 - 36, The
Dockerfile currently starts the container with CMD ["node", "handler.js"] which
runs as root; create a non-root user and group (e.g., appuser), chown the app
files and any runtime directories to that user, and add a USER appuser directive
before the CMD so the process runs unprivileged; if init.sh requires root
(package installs, apt, etc.) keep running those steps as root and move the USER
directive to after init.sh to drop privileges before CMD.
# Only override container_deployment if explicitly set via CLI
# If not in config, use CLI default (False)
if container_deployment or "container_deployment" not in config_obj.get("experiments", {}):
    update_nested_dict(config_obj, ["experiments", "container_deployment"], container_deployment)
🧩 Analysis chain
🌐 Web query:
In Click 7.1.2, for an option declared as `--foo/--no-foo` with `default=False`, can the callback distinguish between an omitted flag and explicit `--no-foo`? Is `default=None` supported for tri-state behavior?
💡 Result:
No, in Click 7.1.2, the callback cannot distinguish between an omitted flag and explicit --no-foo for an option declared as --foo/--no-foo with default=False. Both cases pass False to the callback with no mechanism to differentiate the source (omitted vs. explicit). The value is the same (False), and there is no get_parameter_source or equivalent in 7.1.2 to check if it came from the command line or default. No, default=None is not supported for tri-state behavior in boolean flags like --foo/--no-foo. Setting default=None passes None only if omitted, but --no-foo sets False, and --foo sets True. However, docs note the default is passed as-is, but for flags, behavior may differ; primary sources confirm no explicit tri-state support without custom handling, and default=None prompts differently but does not enable distinguishing in callback without additional context access (unavailable in 7.1.2).
Citations:
- 1: https://dokk.org/documentation/pallets-click/7.1.2/parameters/
- 2: https://click.palletsprojects.com/en/7.x/options/?highlight=options
- 3: https://click.palletsprojects.com/en/7.x/changelog/
- 4: https://github.com/pallets/click/tree/7.1.2
- 5: https://github.com/pallets/click/blob/7.1.2/src/click/decorators.py
- 6: Option callback always fires even when it is not specified in the command invocation pallets/click#2997
- 7: Binary flags vs `default=None`: semantics and issue with prompt pallets/click#1992
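The tri-state resolution described above can be expressed independently of Click; the function name and config shape below mirror the surrounding discussion but are illustrative, not SeBS code:

```python
def resolve_container_deployment(cli_value, experiments: dict) -> bool:
    """Resolve the effective flag from a tri-state CLI value and config.

    cli_value is None when the flag was omitted, True for an explicit
    --container-deployment, False for an explicit --no-container-deployment.
    """
    if cli_value is not None:
        # An explicit CLI choice always wins over config.
        return cli_value
    if "container_deployment" in experiments:
        # Flag omitted: fall back to whatever the config file says.
        return experiments["container_deployment"]
    # Neither CLI nor config set it: overall default.
    return False
```

With `default=False` this distinction is impossible, because an omitted flag and an explicit `--no-container-deployment` both arrive as `False` — which is exactly why the config's `container_deployment: true` cannot be overridden today.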
🏁 Script executed:
# First, find and examine the sebs/cli.py file
git ls-files | grep -E "sebs/cli\.py$"
Repository: spcl/serverless-benchmarks
Length of output: 81
🏁 Script executed:
# Read the context around lines 188-191
head -n 210 sebs/cli.py | tail -n 35
Repository: spcl/serverless-benchmarks
Length of output: 1878
🏁 Script executed:
# Find the Click option decorator for container_deployment
rg -A 5 "container.deployment.*click\.option" sebs/cli.py
Repository: spcl/serverless-benchmarks
Length of output: 52
🏁 Script executed:
# Search for the function definition that contains container_deployment parameter
rg -B 30 "container_deployment" sebs/cli.py | head -n 100
Repository: spcl/serverless-benchmarks
Length of output: 3213
🏁 Script executed:
# Search for the --container-deployment option definition
rg "container-deployment" sebs/cli.py
Length of output: 129
🏁 Script executed:
# Search for simplified_common_params to understand how options are defined
rg -B 50 "simplified_common_params" sebs/cli.py | head -n 150
Length of output: 4502
--no-container-deployment cannot override a truthy config value with the current logic.
Click passes False both when the flag is omitted and when --no-container-deployment is given explicitly. The condition `if container_deployment or "container_deployment" not in config_obj.get("experiments", {})` therefore writes the value only when the flag is True or the key is absent from the config, so an existing `container_deployment: true` in the config is left unchanged.
Suggested fix
@click.option(
    "--container-deployment/--no-container-deployment",
-   default=False,
+   default=None,
    help="Deploy functions as container images (AWS only).",
)

-if container_deployment or "container_deployment" not in config_obj.get("experiments", {}):
-    update_nested_dict(config_obj, ["experiments", "container_deployment"], container_deployment)
+if container_deployment is not None:
+    update_nested_dict(
+        config_obj, ["experiments", "container_deployment"], container_deployment
+    )
+elif "container_deployment" not in config_obj.get("experiments", {}):
+    update_nested_dict(config_obj, ["experiments", "container_deployment"], False)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sebs/cli.py` around lines 188 - 191, The current logic can't distinguish an
omitted flag from an explicit --no, so change the Click option for
container_deployment to use default=None (so omitted => None, explicit
True/False passed) and then update this check to "if container_deployment is not
None or 'container_deployment' not in config_obj.get('experiments', {})" before
calling update_nested_dict(config_obj, ['experiments','container_deployment'],
container_deployment); this ensures an explicit False from the CLI overwrites a
truthy value in config_obj while keeping the config default when the flag is
omitted.
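The merge semantics the suggested fix aims for can be sketched self-contained. Here update_nested_dict is a minimal stand-in for the SEBS helper of the same name, re-implemented only so the example runs on its own:

```python
def update_nested_dict(cfg, keys, value):
    # Minimal stand-in for SEBS's helper: create intermediate dicts as needed.
    for key in keys[:-1]:
        cfg = cfg.setdefault(key, {})
    cfg[keys[-1]] = value

def apply_flag(config_obj, container_deployment):
    """Tri-state merge: an explicit CLI value wins, the config value survives
    when the flag is omitted, and the key defaults to False otherwise."""
    if container_deployment is not None:
        update_nested_dict(
            config_obj, ["experiments", "container_deployment"], container_deployment
        )
    elif "container_deployment" not in config_obj.get("experiments", {}):
        update_nested_dict(config_obj, ["experiments", "container_deployment"], False)

cfg = {"experiments": {"container_deployment": True}}
apply_flag(cfg, False)  # explicit --no-container-deployment now overrides
print(cfg["experiments"]["container_deployment"])  # False
```

With the original `if container_deployment or ...` condition, the call with False would have been a no-op and the config's True would have silently won.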
dockerfile_path = os.path.join(
    os.path.dirname(__file__),
    "..",
    "..",
    "dockerfiles",
    "cloudflare",
    "Dockerfile.manage"
)
🧩 Analysis chain
🌐 Web query:
In docker SDK for Python 4.2.0, for images.build(path=..., dockerfile=...), must dockerfile be specified relative to the build context?
💡 Result:
Yes, in Docker SDK for Python 4.2.0, the dockerfile parameter for images.build(path=..., dockerfile=...) must be specified as a path relative to the build context (the path directory). The official documentation describes it as "path within the build context to the Dockerfile". While some users report that absolute paths may work in practice, this is not documented behavior and relative paths are required per the docs. The Docker build system itself enforces that the Dockerfile must exist within (or be reachable from) the sent build context tarball.
Citations:
- 1: https://docker-py.readthedocs.io/en/4.2.0
- 2: https://docker-py.readthedocs.io/en/7.0.0/images.html
- 3: https://docker-py.readthedocs.io/en/latest/images.html
- 4: https://docker-py.readthedocs.io/en/stable/images.html
- 5: https://stackoverflow.com/questions/65561498/docker-sdk-for-python-how-to-build-an-image-with-custom-dockerfile-and-custom-c
🏁 Script executed:
# Examine the file to understand the exact code and verify both locations
cat -n sebs/cloudflare/cli.py | sed -n '35,75p'
Length of output: 2167
Use relative path for dockerfile parameter.
images.build() requires the dockerfile parameter to be relative to the build context (path). Passing an absolute host path will fail when the Docker SDK attempts to locate the file within the sent build context.
Suggested fix
- dockerfile_path = os.path.join(
+ dockerfile_relpath = os.path.join(
+ "dockerfiles",
+ "cloudflare",
+ "Dockerfile.manage",
+ )
+ dockerfile_path = os.path.join(
os.path.dirname(__file__),
"..",
"..",
- "dockerfiles",
- "cloudflare",
- "Dockerfile.manage"
+ dockerfile_relpath,
)
@@
image, build_logs = docker_client.images.build(
path=build_path,
- dockerfile=dockerfile_path,
+ dockerfile=dockerfile_relpath,
tag=full_image_name,
rm=True,
pull=True
)

Also applies to: 62-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sebs/cloudflare/cli.py` around lines 42 - 49, The dockerfile_path is built as
an absolute host path but images.build() requires the dockerfile argument to be
relative to the build context; update the Docker build calls so you compute a
build_context (e.g., the directory to use as path) and pass dockerfile as a
relative path within that context (for example
"dockerfiles/cloudflare/Dockerfile.manage"), replacing uses of dockerfile_path
in the images.build() calls (and the similar occurrence at lines 62-68) so the
dockerfile parameter is relative to the provided build_context.
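The split the fix asks for is a pure path computation and can be sketched on its own. The function and file names below mirror the diff above but are assumptions about the surrounding code, not verified against it:

```python
import os

def docker_build_args(module_file):
    """Return (build_context, dockerfile_relpath) for docker-py's images.build()."""
    # Build context: the repository root, two directories above sebs/cloudflare/cli.py.
    build_context = os.path.abspath(
        os.path.join(os.path.dirname(module_file), "..", "..")
    )
    # images.build() resolves `dockerfile` inside the context tarball it sends
    # to the daemon, so it must stay relative to build_context, never an
    # absolute host path.
    dockerfile_relpath = os.path.join("dockerfiles", "cloudflare", "Dockerfile.manage")
    return build_context, dockerfile_relpath

ctx, rel = docker_build_args("/repo/sebs/cloudflare/cli.py")
print(ctx)  # /repo
print(rel)  # dockerfiles/cloudflare/Dockerfile.manage
```

The resulting pair would then be passed as `images.build(path=build_context, dockerfile=dockerfile_relpath, ...)`, matching the documented "path within the build context" contract.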
elif "CLOUDFLARE_API_TOKEN" in os.environ:
    ret = CloudflareCredentials(
        api_token=os.environ["CLOUDFLARE_API_TOKEN"],
        account_id=os.environ.get("CLOUDFLARE_ACCOUNT_ID"),
        r2_access_key_id=os.environ.get("CLOUDFLARE_R2_ACCESS_KEY_ID"),
        r2_secret_access_key=os.environ.get("CLOUDFLARE_R2_SECRET_ACCESS_KEY")
    )
elif "CLOUDFLARE_EMAIL" in os.environ and "CLOUDFLARE_API_KEY" in os.environ:
    ret = CloudflareCredentials(
        email=os.environ["CLOUDFLARE_EMAIL"],
        api_key=os.environ["CLOUDFLARE_API_KEY"],
        account_id=os.environ.get("CLOUDFLARE_ACCOUNT_ID"),
        r2_access_key_id=os.environ.get("CLOUDFLARE_R2_ACCESS_KEY_ID"),
        r2_secret_access_key=os.environ.get("CLOUDFLARE_R2_SECRET_ACCESS_KEY")
    )
Enforce account_id presence after deserialization.
account_id is documented as required, but env-based paths currently allow None. Fail early if it is missing to avoid delayed runtime failures.
🔧 Proposed fix
else:
raise RuntimeError(
"Cloudflare login credentials are missing! Please set "
"up environmental variables CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID, "
"or CLOUDFLARE_EMAIL, CLOUDFLARE_API_KEY, and CLOUDFLARE_ACCOUNT_ID"
)
+
+ if not ret.account_id:
+ raise RuntimeError(
+ "Cloudflare account ID is required. Please set CLOUDFLARE_ACCOUNT_ID "
+ "or provide credentials.account_id in configuration."
+ )

Also applies to: 99-103
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sebs/cloudflare/config.py` around lines 83 - 97, The env-based
CloudflareCredentials branches currently allow account_id to be None; after
constructing ret in both branches (the CloudflareCredentials instantiations that
set api_token or email/api_key), immediately validate ret.account_id and raise a
clear ValueError (or similar) if it's missing to fail fast; apply the same check
for the other identical deserialization block referenced in the comment (the
second env-path at lines 99-103) so both the token and email/api_key flows
enforce a required account_id.
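The fail-fast validation the review asks for can be sketched as below. This is a simplified model: the dataclass fields follow the snippet above, the R2 fields are dropped for brevity, and the function takes the environment as a plain mapping (the real code reads os.environ directly):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CloudflareCredentials:
    # Simplified model of the class in sebs/cloudflare/config.py.
    account_id: Optional[str] = None
    api_token: Optional[str] = None
    email: Optional[str] = None
    api_key: Optional[str] = None

def credentials_from_env(env):
    """Deserialize from an environment mapping, failing fast on a missing account_id."""
    if "CLOUDFLARE_API_TOKEN" in env:
        ret = CloudflareCredentials(
            api_token=env["CLOUDFLARE_API_TOKEN"],
            account_id=env.get("CLOUDFLARE_ACCOUNT_ID"),
        )
    elif "CLOUDFLARE_EMAIL" in env and "CLOUDFLARE_API_KEY" in env:
        ret = CloudflareCredentials(
            email=env["CLOUDFLARE_EMAIL"],
            api_key=env["CLOUDFLARE_API_KEY"],
            account_id=env.get("CLOUDFLARE_ACCOUNT_ID"),
        )
    else:
        raise RuntimeError("Cloudflare login credentials are missing!")
    # Validate once, after all branches, so both auth flows enforce account_id.
    if not ret.account_id:
        raise RuntimeError(
            "Cloudflare account ID is required. Set CLOUDFLARE_ACCOUNT_ID "
            "or provide credentials.account_id in configuration."
        )
    return ret
```

Placing the check after the if/elif/else chain keeps it in one spot rather than duplicating it inside each credential branch.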
Summary by CodeRabbit
Release Notes
New Features
Documentation
Configuration