SeBS Cloudflare Compatibility#274

Open
userlaurin wants to merge 69 commits into spcl:master from userlaurin:master

Conversation

@userlaurin (Contributor) commented Jan 5, 2026

  • Node.js worker benchmark wrapper
  • Python worker benchmark wrapper
  • Node.js container benchmark wrapper
  • Python container benchmark wrapper
  • Cloudflare SeBS implementation
  • Dockerfiles for Cloudflare containers

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Cloudflare Workers as a serverless deployment platform with both native and container-based deployment options.
    • Integrated Cloudflare R2 storage backend for persistent data operations.
    • Added Durable Objects support for stateful NoSQL storage on Cloudflare.
    • Introduced new benchmark functions for cloud storage uploads, CRUD operations, and data compression.
  • Documentation

    • Added comprehensive Cloudflare platform configuration and usage guides.
    • Documented Cloudflare storage options and integration details.
  • Configuration

    • Expanded system configurations to include Cloudflare deployment settings.
    • Added test configuration templates for Cloudflare benchmarking.


coderabbitai bot commented Jan 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This pull request introduces comprehensive Cloudflare Workers platform support to the SeBS benchmarking framework, including native worker and container deployment modes, R2 object storage, Durable Objects NoSQL storage, complete request/response handlers for Python and Node.js, CLI-based Wrangler integration, benchmark applications, and supporting configuration and documentation infrastructure.

Changes

Cohort / File(s) Summary
Core Configuration & Credentials
sebs/cloudflare/config.py, sebs/cloudflare/__init__.py
Introduces CloudflareConfig, CloudflareCredentials, and CloudflareResources classes with initialization, serialization, caching, and environment variable deserialization patterns.
Platform Implementation
sebs/cloudflare/cloudflare.py, sebs/cloudflare/function.py, sebs/cloudflare/resources.py
Implements the main Cloudflare platform class orchestrating deployments, credential verification, package bundling, worker lifecycle management, URL construction, metrics extraction, and trigger creation. Includes CloudflareWorker function wrapper and system resources adapter.
Deployment Orchestration
sebs/cloudflare/workers.py, sebs/cloudflare/containers.py, sebs/cloudflare/cli.py
Provides deployment handlers for native workers and container workers, including Wrangler configuration generation, code packaging with dependency management, Docker image building/caching, and CloudflareCLI Docker-based tooling wrapper.
Storage Backends
sebs/cloudflare/r2.py, sebs/cloudflare/durable_objects.py
Implements R2 (S3-compatible object storage) and Durable Objects (NoSQL) storage adapters with authentication, bucket/table management, and data operations.
Trigger Support
sebs/cloudflare/triggers.py
Adds HTTPTrigger class enabling HTTP-based worker invocation with measurement extraction and both sync/async invocation patterns.
Native Worker Handlers
benchmarks/wrappers/cloudflare/nodejs/handler.js, benchmarks/wrappers/cloudflare/nodejs/nosql.js, benchmarks/wrappers/cloudflare/nodejs/storage.js, benchmarks/wrappers/cloudflare/nodejs/request-polyfill.js
Implements Node.js request/response handler for workers with NoSQL client, R2 storage adapter, request polyfill for Node.js APIs, and comprehensive timing/measurement instrumentation.
Native Worker Python
benchmarks/wrappers/cloudflare/python/handler.py, benchmarks/wrappers/cloudflare/python/nosql.py, benchmarks/wrappers/cloudflare/python/storage.py
Implements Python request/response handler with NoSQL and R2 storage adapters using HTTP proxies and async/await transformation patterns.
Container Worker Infrastructure
benchmarks/wrappers/cloudflare/nodejs/container/handler.js, benchmarks/wrappers/cloudflare/nodejs/container/nosql.js, benchmarks/wrappers/cloudflare/nodejs/container/storage.js, benchmarks/wrappers/cloudflare/nodejs/container/package.json, benchmarks/wrappers/cloudflare/nodejs/container/worker.js, benchmarks/wrappers/cloudflare/nodejs/build.js
Provides Node.js container worker implementation with HTTP server, R2 and NoSQL proxies via Durable Objects, build tooling with esbuild, and dynamic import transformation.
Container Worker Python
benchmarks/wrappers/cloudflare/python/container/handler.py, benchmarks/wrappers/cloudflare/python/container/nosql.py, benchmarks/wrappers/cloudflare/python/container/storage.py
Implements Python container worker HTTP server with NoSQL and R2 storage adapters communicating via worker proxies.
Benchmark Applications
benchmarks/100.webapps/120.uploader/python/function_cloudflare.py, benchmarks/100.webapps/130.crud-api/nodejs/function.js, benchmarks/100.webapps/130.crud-api/nodejs/package.json, benchmarks/300.utilities/311.compression/nodejs/function.js, benchmarks/300.utilities/311.compression/nodejs/package.json
Adds Cloudflare-specific benchmark implementations: file upload with Pyodide, Node.js CRUD API with NoSQL backend, and directory compression with tar/gzip archiving.
Wrangler Templates
templates/wrangler-worker.toml, templates/wrangler-container.toml
Provides TOML scaffolds for native workers and container workers with Durable Object bindings, migrations, and configuration placeholders.
Docker Infrastructure
dockerfiles/cloudflare/nodejs/Dockerfile, dockerfiles/cloudflare/python/Dockerfile, dockerfiles/cloudflare/Dockerfile.manage
Adds container images for Node.js and Python workers, plus management container with Wrangler, Docker CLI, and Python tooling.
System Configuration
configs/systems.json, configs/cloudflare-test.json
Introduces Cloudflare platform configuration defining base/container images, deployment modes, language support (Python/Node.js), and test configuration with credentials placeholders.
Documentation
docs/platforms.md, docs/storage.md
Adds comprehensive Cloudflare platform documentation covering credentials, language support, triggers, limitations, metrics, and storage configuration for R2 and Durable Objects.
Framework Integration
sebs/config.py, sebs/faas/config.py, sebs/sebs.py, sebs/cli.py, sebs/experiments/config.py, benchmarks/000.microbenchmarks/040.server-reply/config.json
Integrates Cloudflare into framework: adds container image config lookup, optional Cloudflare config deserialization, deployment registration, CLI option, container deployment field handling, and removes Node.js from microbenchmark languages.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Worker as Cloudflare Worker
    participant Handler as Request Handler
    participant Storage as R2/NoSQL
    participant Proxy as Worker Proxy
    participant Backend as Backend Service

    Client->>Worker: HTTP Request
    Worker->>Handler: Parse event + metadata
    Handler->>Handler: Extract bucket/key/operation
    Handler->>Proxy: Route to Storage Proxy
    Proxy->>Backend: API Call to R2/Durable Object
    Backend-->>Proxy: Response
    Proxy-->>Handler: Data/Result
    Handler->>Handler: Compute metrics (timing/size)
    Handler-->>Worker: Structured Result
    Worker-->>Client: JSON Response (with measurements)
sequenceDiagram
    participant SEBS as SEBS Framework
    participant CLI as CloudflareCLI
    participant Docker as Docker Engine
    participant Wrangler as Wrangler Tool
    participant Cloudflare as Cloudflare API

    SEBS->>CLI: initialize()
    CLI->>Docker: Pull/Build image
    Docker-->>CLI: Image ready
    SEBS->>CLI: package_code()
    CLI->>Docker: Run container
    Docker->>Docker: npm install / pip install
    Docker-->>CLI: Return dependencies
    SEBS->>CLI: generate_wrangler_toml()
    CLI->>Docker: Write configuration
    SEBS->>CLI: deploy()
    CLI->>Wrangler: wrangler deploy
    Wrangler->>Cloudflare: Upload worker
    Cloudflare-->>Wrangler: Deployed
    Wrangler-->>CLI: Success
    CLI-->>SEBS: Worker URL

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 Burrow deep through clouds so high,
Workers dance as workers fly,
R2 stores with Durable grace,
New Cloudflare benchmarks find their place! ☁️✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.83%, which is below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'SeBS Cloudflare Compatibility' accurately reflects the main objective of the PR, which adds comprehensive Cloudflare Workers support to the SeBS benchmarking framework.



@userlaurin userlaurin marked this pull request as ready for review January 6, 2026 11:53
@userlaurin userlaurin changed the title Draft PR for Cloudflare Serverless Benchmarking SeBS Cloudflare Compatibility & 311 Benchmark Translation to Nodejs Jan 6, 2026
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sebs.py (1)

102-105: Update help text to reflect Cloudflare container support.

The help text states "(AWS only)" but this PR adds Cloudflare container deployment support. Update the documentation to reflect the expanded platform support.

🔎 Proposed fix
     @click.option(
         "--container-deployment/--no-container-deployment",
         default=False,
-        help="Deploy functions as containers (AWS only). When enabled, functions are packaged as container images and pushed to Amazon ECR."
+        help="Deploy functions as containers. When enabled, functions are packaged as container images (AWS: Amazon ECR, Cloudflare: Container Workers)."
     )
🤖 Fix all issues with AI Agents
In @benchmarks/100.webapps/120.uploader/python/function_cloudflare.py:
- Around lines 8-9: Move the module-level call storage.storage.get_instance()
into the request handler so the storage client is initialized lazily. Inside
the handler, try to obtain the singleton via storage.storage.get_instance();
if that raises or returns None, call storage.init_instance() (or the
appropriate initializer) and then call get_instance() again, assigning the
result to a local variable (e.g., client). Update any references to the
module-level client to use this handler-local variable, which avoids
import-time failures.
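The lazy-initialization pattern this comment describes can be sketched as follows; the minimal `storage` class with `init_instance`/`get_instance` is a stand-in mirroring the wrapper's singleton API, not the actual SeBS code:

```python
# Hypothetical stand-in for the wrapper's storage singleton.
class storage:
    instance = None

    @staticmethod
    def init_instance():
        storage.instance = storage()

    @staticmethod
    def get_instance():
        return storage.instance


def handler(event):
    # Resolve the client at request time, not at import time, so a
    # missing environment no longer breaks module import.
    client = storage.get_instance()
    if client is None:
        storage.init_instance()
        client = storage.get_instance()
    # ... use `client` for bucket operations ...
    return client
```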

In @benchmarks/100.webapps/130.crud-api/nodejs/function.js:
- Around lines 6-13: The addProduct function calls the asynchronous
nosqlClient.insert without awaiting it. Make addProduct async and await the
nosqlClient.insert call (or return its promise) so the handler waits for
completion, and propagate or handle errors as appropriate to avoid
fire-and-forget behavior.

In @benchmarks/300.utilities/311.compression/nodejs/function.js:
- Around lines 108-113: The archive is written into the source directory
(archive_path = path.join(download_path, archive_name)), so collectFiles may
pick up the partially written archive while the tar is being created,
corrupting the output. Either write the archive outside the directory being
archived (e.g., a temp dir or sibling path) or pass an explicit exclude for
archive_name to collectFiles; update the references to archive_name,
archive_path, download_path, and the createTarGzArchive call (and collectFiles,
if applicable) accordingly.

In @benchmarks/300.utilities/311.compression/nodejs/package.json:
- Around line 6-8: The package.json currently depends on "uuid": "^10.0.0" which
requires Node.js 16+, but systems.json still lists Node.js 14 as a supported
runtime; fix by either removing Node.js 14 from the supported runtimes in
systems.json (so only Node 16+ remains) or by downgrading the package.json uuid
dependency to a Node‑14 compatible version (e.g., change "uuid" to a 9.x range
like ^9.0.0); update the corresponding file (package.json for the dependency
change or systems.json for the runtime change) and ensure any CI/test configs
are consistent with the chosen option.

In @benchmarks/wrappers/cloudflare/nodejs/build.js:
- Around lines 64-130: The regex-based transformations in asyncNosqlPlugin
(inside build.onLoad for function.js) are unsafe: the function-declaration
replacement (`/^(\s*)function\s+(\w+)\s*\(/gm`) forces every function async,
and the line-by-line call-rewrite regex turns constructor calls like `new
UserClass(...)` into invalid `new await UserClass(...)`. Replace the in-plugin
regex approach with a proper AST-based transform run before esbuild (e.g.,
@babel/parser + @babel/traverse, or SWC/Recast) that (1) adds await only to
actual async calls to nosqlClient methods and user-defined async functions,
(2) skips NewExpression nodes and generators, and (3) leaves non-async
function declarations untouched. Alternatively, restrict the plugin to
explicitly annotated sections rather than arbitrary regex edits.

In @benchmarks/wrappers/cloudflare/nodejs/container/nosql.js:
- Around line 39-48: The current error handling reads the response body multiple
times causing failures; change it to read the body once into a variable (e.g.,
const body = await response.text()), then attempt to parse JSON from that body
to extract errorData.error and fall back to the raw body if parsing fails,
assigning the result to errorMsg and then throwing the Error using that single
errorMsg (operate on the same response/body and use the existing response and
errorMsg variables).
- Around line 86-99: The query method contains leftover debug console.error
statements that pollute production logs; remove all console.error lines inside
async query (which calls this._make_request) and keep the rest of the
implementation intact (construct params, await this._make_request('query',
params), compute const items = result.items || [] and return items) so behavior
is unchanged.

In @benchmarks/wrappers/cloudflare/nodejs/container/package.json:
- Around line 7-9: Update the dependency version for "@cloudflare/containers" in
package.json because "^1.0.0" has no published releases; change the version
string to a valid published range such as "^0.0.20" or the specific latest
"0.0.31", then reinstall (npm install) and update package-lock.json so container
builds succeed.

In @benchmarks/wrappers/cloudflare/nodejs/container/storage.js:
- Around line 15-18: The class contains a duplicate static field declaration
"worker_url" defined twice; remove the redundant declaration so only a single
"static worker_url = null; // Set by handler from X-Worker-URL header" remains
(keep the comment), and verify no other duplicate static declarations exist in
the class and that all references to WorkerStorage.worker_url (or similar) still
resolve.

In @benchmarks/wrappers/cloudflare/nodejs/container/worker.js:
- Around line 31-38: The query method currently ignores its keyPrefix argument
and returns all stored items; update async query(keyPrefix) to filter the
listing by the prefix (e.g., only include entries where the key startsWith
keyPrefix) before pushing values into items so queries return the intended
subset; modify the loop over this.ctx.storage.list() (the [k, v] destructuring)
to check keyPrefix (handle undefined/empty) and only push v when the key matches
the prefix.
- Around line 309-324: Remove the temporary debug console.log statements inside
the '/r2/upload' branch: delete the logs that print bucket/key, env.R2
existence, received byte length, and the putResult (the lines using console.log
before and after the put). Keep the try/catch and the error reporting
(console.error for env.R2.put errors) and retain the successful behavior (no
debug prints); locate these in the code that checks url.pathname ===
'/r2/upload' and uses request.arrayBuffer() and env.R2.put(key, data).

In @benchmarks/wrappers/cloudflare/nodejs/nosql.js:
- Around lines 62-66: The update method performs a non-atomic read-modify-write
(get followed by insert), so concurrent updates can clobber each other. Either
document this limitation or make update atomic: add a version/etag field to
records and implement optimistic concurrency (in update, read the existing
record via get, compute the merged record with an incremented version, attempt
the insert only if the version still matches, and retry on conflict), or
switch to a Cloudflare Durable Object/transactional API that performs the
merge server-side.

In @benchmarks/wrappers/cloudflare/nodejs/storage.js:
- Around lines 151-176: download_stream handles missing objects inconsistently:
it returns null when R2 returns no object, but throws when no backend is
configured or when an R2 object lacks arrayBuffer()/text(). Pick one policy
(preferably returning null for missing content) and apply it across all
branches: replace the final throw and the fallback for objects without
arrayBuffer/text with return null, and make the local filesystem branch return
null rather than throw when the file is absent, so that every "not found" path
returns null and only unexpected failures (e.g., actual IO errors) throw.
- Around lines 87-89: The download method calls this.download_stream instead of
using the resolved singleton. In async download(__bucket, key, filepath),
replace this.download_stream with instance.download_stream (where instance is
storage.instance || this) so the singleton pattern is respected.

In @benchmarks/wrappers/cloudflare/python/container/handler.py:
- Around lines 135-137: The code computes an unused timing value: it sets
end = datetime.datetime.now().timestamp() and compute_time = end - begin but
never uses compute_time. Either remove the end and compute_time assignments,
or include compute_time in the response or a log entry (e.g., attach it to the
response JSON or a response header), keeping the begin variable if it is still
needed elsewhere.

In @benchmarks/wrappers/cloudflare/python/container/storage.py:
- Around line 71-77: Add a timeout to the urllib.request.urlopen call and chain
the raised RuntimeError to preserve the original traceback: pass a sensible
timeout (e.g., 10 seconds) into the urllib.request.urlopen(req) invocation so
the call cannot hang indefinitely, and change the error re-raise to use "raise
RuntimeError(...)" with "from e" to chain the original exception thrown in the
except block referencing the existing exception variable e and the current
RuntimeError raise site.
- Around line 138-144: In _upload_with_key wrap the urlopen call with an
explicit timeout (pass the same timeout constant used elsewhere, e.g.,
timeout=DEFAULT_TIMEOUT or self._timeout) — change urllib.request.urlopen(req)
to urllib.request.urlopen(req, timeout=...) — and when re-raising the failure
convert to RuntimeError using exception chaining (raise RuntimeError(f"Failed to
upload to R2: {e}") from e) so the original traceback is preserved; keep the
existing log/print of the error and include the request/ key context if
available.
- Around line 91-101: The download_stream function currently calls
urllib.request.urlopen(url) without a timeout and re-raises RuntimeError without
exception chaining; update the call to pass the same timeout value used in
upload (e.g., urlopen(url, timeout=TIMEOUT)) and in both except blocks
(urllib.error.HTTPError and generic Exception) raise the RuntimeError using
exception chaining (raise RuntimeError(...) from e) so the original traceback is
preserved; reference the download_stream function, urllib.request.urlopen,
urllib.error.HTTPError, and the exception variable e when making these edits.
- Around line 168-201: The list request in download_directory lacks a timeout
and the raised RuntimeError loses the original exception context; add a timeout
to urllib.request.urlopen(req, timeout=...) when listing objects and ensure each
download call uses a timeout (update self.download signature to accept/forward a
timeout parameter or pass a configured timeout value), and re-raise the error
with exception chaining using "raise RuntimeError(f'Failed to download
directory: {e}') from e" so the original traceback is preserved.
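The four comments above all ask for the same two changes. A minimal sketch, assuming a module-level `DEFAULT_TIMEOUT` of 10 seconds and an illustrative download helper rather than the wrapper's actual method:

```python
import urllib.error
import urllib.request

DEFAULT_TIMEOUT = 10  # seconds; shared by all urlopen calls in the module


def download_stream(url: str) -> bytes:
    try:
        # The timeout prevents an unresponsive endpoint from hanging forever.
        with urllib.request.urlopen(url, timeout=DEFAULT_TIMEOUT) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        # "from e" chains the original exception so its traceback survives.
        raise RuntimeError(f"Failed to download {url}: HTTP {e.code}") from e
    except Exception as e:
        raise RuntimeError(f"Failed to download {url}: {e}") from e
```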

In @benchmarks/wrappers/cloudflare/python/nosql.py:
- Around line 115-122: The class nosql_kv has incorrect references to nosql
instead of itself; in the static method init_instance you should instantiate and
assign nosql_kv.instance = nosql_kv() and set that instance's env (e.g.,
nosql_kv.instance.env = entry.env) rather than referencing nosql; update the
init_instance method (and any other places in this class referencing nosql) to
use the correct class name nosql_kv and the instance class attribute instance.
- Around line 8-15: The class uses the wrong identifier `nosql` instead of the
actual class `nosql_do`; change the type annotation and all
assignments/instantiation to refer to `nosql_do` (e.g., replace
`Optional["nosql"]` with `Optional["nosql_do"]`, replace `nosql.instance =
nosql()` with `nosql_do.instance = nosql_do()`) and ensure the binding
assignment uses `nosql_do.DO_BINDING_NAME` and sets `nosql_do.instance.binding =
getattr(entry.env, nosql_do.DO_BINDING_NAME)`.
- Around lines 203-207: The get_instance method in class nosql_kv instantiates
the wrong class: it calls nosql() (the nosql_do class) instead of nosql_kv().
Update the body so that when the instance is None it assigns a new nosql_kv()
to the class-level instance attribute and returns it, ensuring the singleton
references nosql_kv rather than nosql.
- Around lines 190-195: The code performs a fragile get_res.replace("'", "\"")
on the result of run_sync(self.get_table(table_name).get(key)) before
json.loads. Remove this blind replacement: either ensure the stored value is
valid JSON (fix insert/update to write proper JSON), or, if the backend
returns a Python literal, use ast.literal_eval on get_res and convert the
result to a dict, or, if get() can return an already-parsed object, skip
json.loads entirely.
- Around line 161-168: The get method in nosql_kv returns the raw result instead
of a parsed dict; update nosql_kv.get to mirror query by checking for a
None/empty response and calling json.loads on the returned string before
returning it. Locate the get method that calls
run_sync(self.get_table(table_name).get(self.key_maker(primary_key,
secondary_key))) and replace the direct return with logic that handles a None
result and returns json.loads(get_res) (or the already-parsed dict) so callers
always receive a dict or None.
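The naming fixes above reduce to one rule: each singleton class instantiates and stores itself, never a sibling class. A sketch using the two class names from the wrapper, with everything except the singleton wiring elided:

```python
from typing import Optional


class nosql_do:
    instance: Optional["nosql_do"] = None

    @staticmethod
    def get_instance() -> "nosql_do":
        if nosql_do.instance is None:
            nosql_do.instance = nosql_do()  # its own class, never nosql_kv
        return nosql_do.instance


class nosql_kv:
    instance: Optional["nosql_kv"] = None

    @staticmethod
    def get_instance() -> "nosql_kv":
        if nosql_kv.instance is None:
            nosql_kv.instance = nosql_kv()  # its own class, never nosql_do
        return nosql_kv.instance
```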

In @benchmarks/wrappers/cloudflare/python/storage.py:
- Line 60: The code attempts to call append on written_files which is
initialized as a set (self.written_files) causing an AttributeError; change the
call in the method that currently does self.written_files.append(filepath) to
use the set API (self.written_files.add(filepath)) or, if order/duplicates
matter, change the initializer from set() to [] and keep append—prefer using
self.written_files.add(filepath) to preserve the existing set semantics.
- Around line 111-113: get_instance currently uses invalid syntax raise "must
init storage singleton first" and is missing the @staticmethod decorator; fix by
adding @staticmethod above def get_instance() and replace the string raise with
raising a proper Exception (e.g., raise RuntimeError("must init storage
singleton first")) so the method correctly signals the error when
storage.instance is None; keep the method body referencing storage.instance as
written.
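Both fixes combined in one sketch; the `storage` class and `written_files` field mirror the wrapper under review, but the method name `record_write` and the overall shape are illustrative, not the real implementation:

```python
from typing import Optional


class storage:
    instance: Optional["storage"] = None

    def __init__(self):
        self.written_files = set()

    def record_write(self, filepath: str):
        # Sets expose add(), not append(); duplicates are silently dropped.
        self.written_files.add(filepath)

    @staticmethod
    def get_instance() -> "storage":
        # Raising a bare string is invalid in Python 3; raise an exception class.
        if storage.instance is None:
            raise RuntimeError("must init storage singleton first")
        return storage.instance
```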

In @config/cloudflare-test.json:
- Around line 2-13: The runtime block currently declares `"runtime": {
"language": "nodejs", "version": "18" }` which is incorrect for Cloudflare
Workers; update the experiments.runtime configuration by removing the Node.js
language/version pair and instead set the Cloudflare Workers compatibility flags
(e.g., add `compatibility_date` with a date >= "2024-09-23" and `nodejs_compat:
true` when Node.js APIs are required), keeping the `"experiments"` object and
other fields intact and ensuring `nodejs_compat` is documented in the config
comment or README for future maintainers.

In @config/systems.json:
- Around line 389-391: The UUID version in config/systems.json ("packages": {
"uuid": "3.4.0" }) conflicts with the compression benchmark's package.json which
uses "uuid": "^10.0.0"; pick one canonical version and update the other to match
(either change config/systems.json uuid to "^10.0.0" or change the benchmark's
package.json uuid to "3.4.0") so all deployments use the same uuid package
version across providers; ensure you update the package.json or config entry and
run dependency resolution/tests to confirm compatibility.

In @experiments.json:
- Around line 52-54: The committed experiments.json exposes a real credential
value under the "credentials" -> "account_id" field; remove the sensitive value
from the PR and either delete the file from the branch or replace the value with
a placeholder like "<YOUR_ACCOUNT_ID>" in experiments.json, then add
experiments.json to .gitignore (or move this test output to a non-committed
sample file) to prevent future commits; ensure you also purge the secret from
the PR history or force-push a branch without the file if necessary.

In @package.json:
- Line 1: Delete the empty root package.json file (remove package.json from the
repo) because this project uses isolated benchmark packages; if a root
package.json is actually required, replace it with a meaningful file containing
the intended configuration or add a clear explanation in the repo README
indicating its purpose instead of leaving an empty {}.

In @sebs/cloudflare/cloudflare.py:
- Around lines 969-981: The final return (directory, total_size, "") is
unreachable (an earlier return image_tag exits first) and references an
undefined total_size. Either delete this leftover return statement, or, if
size information was intended, compute total_size in scope and consolidate the
two returns into a single consistent return value so no undefined variable
remains.
- Line 148: Credential verification and other HTTP calls in cloudflare.py are
missing timeouts which can cause indefinite hangs; define a default timeout
constant (e.g., DEFAULT_REQUEST_TIMEOUT = 10) near the module top and pass it to
all requests calls (e.g., change
requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers) to
requests.get(..., headers=headers, timeout=DEFAULT_REQUEST_TIMEOUT)) including
the worker-exists check and the workers.dev subdomain fetch, and update any
requests.post/put/delete calls similarly; optionally wrap these calls to catch
requests.exceptions.Timeout/RequestException where appropriate.
- Line 73: The initialize method uses a mutable default argument (config:
Dict[str, str] = {}); change the signature to use None as the default (e.g.,
config: Optional[Dict[str, str]] = None) and inside the initialize method set
config = {} if config is None to avoid sharing the same dict across calls;
update any type annotations/uses accordingly and ensure references to initialize
and its parameter name remain consistent.
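The mutable-default fix can be sketched as below; `initialize` is a simplified stand-in for the platform method, and the HTTP-timeout change follows the same shape, passing `timeout=DEFAULT_REQUEST_TIMEOUT` to every `requests` call:

```python
from typing import Dict, Optional

DEFAULT_REQUEST_TIMEOUT = 10  # seconds, intended for all requests.* calls


def initialize(config: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # `config: Dict[str, str] = {}` would create ONE dict shared by every
    # call that omits the argument; None plus a fresh dict avoids that.
    if config is None:
        config = {}
    config.setdefault("initialized", "true")
    return config
```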

In @sebs/cloudflare/durable_objects.py:
- Around line 100-113: The update_cache method uses a non-existent private
attribute self._cache_client which will raise AttributeError; change references
in update_cache to the public property self.cache_client (i.e., call
self.cache_client.update_nosql(...)) so it uses the NoSQLStorage-exposed client;
verify any other uses in the DurableObjects class for similar _cache_client
references and replace them with cache_client to remain consistent with the
parent API.
- Around line 185-203: The method remove_table mutates the inner dict while
iterating over tables.items(), which can raise RuntimeError; change the logic in
remove_table to first locate the matching benchmark and table_key without
deleting during iteration (e.g., iterate over a snapshot like
list(tables.items()) or record table_key to delete), then perform the deletion
after the loop; update references to self._tables, the local variable tables,
and the inner loop that checks table_name == name to reflect this two-step
find-then-delete approach and keep the logging/return behavior unchanged.
- Around line 30-39: The constructor __init__ accepts a credentials:
CloudflareCredentials but never stores it, causing _get_auth_headers to
reference missing self._credentials; fix by assigning the passed credentials to
an instance attribute (self._credentials = credentials) in __init__ so
_get_auth_headers can use it, or if credentials truly aren't needed remove the
credentials parameter and any references to _get_auth_headers; update only the
DurableObjects class __init__ and related references (_credentials and
_get_auth_headers) to keep consistency.
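The find-then-delete pattern requested for remove_table above, sketched with illustrative data shapes (a benchmark mapped to {table_key: table_name} entries):

```python
def remove_table(tables: dict, benchmark: str, name: str) -> bool:
    # Pass 1: locate the key without mutating the dict being iterated.
    to_delete = None
    for table_key, table_name in tables.get(benchmark, {}).items():
        if table_name == name:
            to_delete = table_key
            break
    # Pass 2: delete outside the loop, avoiding
    # "RuntimeError: dictionary changed size during iteration".
    if to_delete is not None:
        del tables[benchmark][to_delete]
        return True
    return False
```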

In @sebs/cloudflare/function.py:
- Around line 56-62: The deserializer uses hardcoded keys "Library" and "HTTP"
that don't match the serialized trigger type strings; replace the static mapping
with the actual serialized type names by using LibraryTrigger.typename() and
HTTPTrigger.typename() (e.g. mapping = {LibraryTrigger.typename():
LibraryTrigger, HTTPTrigger.typename(): HTTPTrigger}) and then lookup
trigger_type = mapping.get(trigger["type"]); keep the assert and call
ret.add_trigger(trigger_type.deserialize(trigger)) as before so deserialization
succeeds.

In @sebs/cloudflare/r2.py:
- Around line 81-84: The requests calls in _create_bucket, list_buckets, and
remove_bucket are missing timeouts and can hang indefinitely; update each
requests.<method> call (e.g., the POST in _create_bucket, the GET in
list_buckets, and the DELETE in remove_bucket) to pass a timeout argument (use a
shared constant or self._timeout if available, e.g., timeout=DEFAULT_TIMEOUT)
and ensure the same timeout is applied consistently across all requests calls in
this class; optionally, wrap the calls to catch requests.exceptions.Timeout and
handle/log it in the respective methods (e.g., _create_bucket, list_buckets,
remove_bucket).
- Around line 13-14: The typename() function returns the misspelled string
"Cloudlfare.R2"; update the return value in typename() to the correct service
identifier "Cloudflare.R2" so logs and identifiers use the correct spelling.
- Around line 58-60: The _create_bucket method uses a mutable default for the
buckets parameter (buckets: list[str] = []), which can cause shared-state bugs;
change the signature to accept None (e.g., buckets: Optional[list[str]] = None)
and inside _create_bucket initialize buckets = [] if buckets is None, or update
callers to always pass an explicit list, ensuring the function no longer relies
on a shared default list instance.
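The mutable-default pitfall flagged for _create_bucket can be demonstrated with a minimal pair of functions (names are illustrative, not the real r2.py signatures):

```python
from typing import List, Optional

def create_bucket_buggy(name: str, buckets: List[str] = []) -> List[str]:
    # Pitfall: this single default list object is shared by every call.
    buckets.append(name)
    return buckets

def create_bucket_fixed(name: str, buckets: Optional[List[str]] = None) -> List[str]:
    if buckets is None:
        buckets = []  # fresh list on each call
    buckets.append(name)
    return buckets
```

Calling the buggy version twice without arguments accumulates state across calls, while the fixed version starts from an empty list each time.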

In @sebs/cloudflare/README.md:
- Around line 105-118: The "Container Deployment" limitation in the README is
outdated; update the "Container Deployment" and "Container Support Architecture"
sections to reflect that container support has been implemented: remove the "Not
currently implemented" limitation, document that Node.js and Python container
benchmark wrappers were added, and describe the implemented components
(container orchestrator, worker proxy for R2/Durable Objects, and provided
Dockerfiles such as the python Dockerfile) rather than listing what "would be
required"; also remove or update any references to a missing CloudflareContainer
class or missing package_code()/create_function() changes to accurately describe
the current implementation and where container integration is located.

In @sebs/cloudflare/triggers.py:
- Around line 136-142: The async_invoke method currently creates a new
ThreadPoolExecutor on every call (pool = ThreadPoolExecutor()) and never shuts
it down, leaking threads; change this to use a shared executor (e.g., a
module-level or Cloudflare trigger class attribute like _executor) and have
async_invoke submit to that shared executor instead of creating a new one, and
ensure the executor is cleanly shutdown when the application or trigger instance
is closed (implement a close()/shutdown() method or __del__ to call
_executor.shutdown(wait=True)); keep the call to submit(self.sync_invoke,
payload) and the returned Future unchanged.
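A shared class-level executor with an explicit shutdown, as the instruction asks for, might look like this minimal sketch (the class name and the stubbed sync_invoke are placeholders, not the actual triggers.py code):

```python
import concurrent.futures
from typing import Optional

class TriggerSketch:
    _executor: Optional[concurrent.futures.ThreadPoolExecutor] = None

    @classmethod
    def _get_executor(cls) -> concurrent.futures.ThreadPoolExecutor:
        # Lazily create one pool shared by all trigger instances.
        if cls._executor is None:
            cls._executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
        return cls._executor

    def sync_invoke(self, payload: dict) -> dict:
        # Stand-in for the real HTTP invocation.
        return {"echo": payload}

    def async_invoke(self, payload: dict) -> concurrent.futures.Future:
        # Submit to the shared pool instead of creating a new one per call.
        return self._get_executor().submit(self.sync_invoke, payload)

    @classmethod
    def shutdown(cls) -> None:
        if cls._executor is not None:
            cls._executor.shutdown(wait=True)
            cls._executor = None
```

The returned Future behaves exactly as before; the only change is that threads are pooled once and released deterministically via shutdown().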
🟠 Major comments (18)
benchmarks/wrappers/cloudflare/nodejs/container/package.json-7-9 (1)

7-9: Update version constraint to match published releases; ^1.0.0 has no available versions.

The @cloudflare/containers package exists and is an official Cloudflare package, but the version constraint "^1.0.0" is incorrect. The latest published version is 0.0.31, and no version 1.0.0 or higher exists. Change the version to a valid range such as "^0.0.20" or the specific latest "0.0.31" to ensure npm install succeeds during container builds.

experiments.json-52-54 (1)

52-54: Credentials exposed in committed file.

The account_id value appears to be a real Cloudflare account identifier. This file looks like test output data that should not be committed to the repository. Consider:

  1. Adding experiments.json to .gitignore
  2. Removing this file from the PR
  3. If this is intentional sample data, replace the account_id with a placeholder like "<YOUR_ACCOUNT_ID>"
benchmarks/300.utilities/311.compression/nodejs/function.js-108-113 (1)

108-113: Archive created inside source directory may include itself.

The archive is created at path.join(download_path, archive_name), which is inside the download_path being archived. If collectFiles encounters the partially-written archive file during traversal, it will attempt to include it, leading to corrupted output or unpredictable behavior.

🔎 Proposed fix: Create archive in a separate location
   // Compress directory
   const compress_begin = Date.now();
   const archive_name = `${key}.tar.gz`;
-  const archive_path = path.join(download_path, archive_name);
+  const archive_path = path.join('/tmp', `${archive_name}-${uuidv4()}`);
   await createTarGzArchive(download_path, archive_path);
   const compress_end = Date.now();

Committable suggestion skipped: line range outside the PR's diff.

benchmarks/100.webapps/120.uploader/python/function_cloudflare.py-8-9 (1)

8-9: Module-level storage initialization may fail.

storage.storage.get_instance() is called at module import time. If the storage singleton hasn't been initialized via init_instance() yet, this will raise an exception (once the bug in storage.py is fixed). Consider lazy initialization inside the handler.

🔎 Proposed fix: Lazy initialization
 from . import storage
-client = storage.storage.get_instance()
+client = None
 
 SEBS_USER_AGENT = "SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2"
 
 async def do_request(url, download_path):
     headers = {'User-Agent': SEBS_USER_AGENT}
 
     res = await pyfetch(url, headers=headers)
     bs = await res.bytes()
     
     with open(download_path, 'wb') as f:
         f.write(bs)
 
 def handler(event):
+    global client
+    if client is None:
+        client = storage.storage.get_instance()
 
     bucket = event.get('bucket').get('bucket')

Committable suggestion skipped: line range outside the PR's diff.

benchmarks/300.utilities/311.compression/nodejs/package.json-6-8 (1)

6-8: Remove Node.js 14 support or downgrade uuid dependency.

The uuid v10.0.0 dependency requires Node.js 16 or newer, but systems.json configures Node.js 14 as a supported version, creating an incompatibility. Either remove Node.js 14 from the supported runtimes or downgrade uuid to a version compatible with Node.js 14 (e.g., v9.x).

benchmarks/wrappers/cloudflare/nodejs/container/worker.js-31-38 (1)

31-38: query method ignores keyPrefix parameter.

The query method accepts a keyPrefix parameter but doesn't use it. It returns all items from storage instead of filtering by the prefix. This could cause incorrect behavior when querying for specific subsets of data.

🔎 Proposed fix
   async query(keyPrefix) {
-    const list = await this.ctx.storage.list();
+    const list = await this.ctx.storage.list({ prefix: keyPrefix });
     const items = [];
     for (const [k, v] of list) {
       items.push(v);
     }
     return { items };
   }
benchmarks/wrappers/cloudflare/nodejs/container/nosql.js-39-48 (1)

39-48: Error handling consumes response body twice.

response.json() consumes the response body. If it succeeds but errorData.error is falsy, the await response.text() fallback then fails because the body stream is already consumed. If response.json() throws instead, the catch branch calls response.text(), which fails for the same reason.

🔎 Proposed fix
       if (!response.ok) {
-        let errorMsg;
-        try {
-          const errorData = await response.json();
-          errorMsg = errorData.error || await response.text();
-        } catch {
-          errorMsg = await response.text();
-        }
+        const errorText = await response.text();
+        let errorMsg = errorText;
+        try {
+          const errorData = JSON.parse(errorText);
+          errorMsg = errorData.error || errorText;
+        } catch {
+          // Keep errorText as-is
+        }
         throw new Error(`NoSQL operation failed: ${errorMsg}`);
       }
benchmarks/wrappers/cloudflare/python/container/handler.py-148-150 (1)

148-150: Fix platform-dependent ru_maxrss calculation for macOS compatibility.

resource.getrusage().ru_maxrss returns different units across platforms:

  • Linux: kilobytes
  • macOS/BSD: bytes

The current code divides by 1024.0, which correctly converts Linux kilobytes to megabytes but produces kilobytes instead of megabytes on macOS. Use platform detection to apply the correct conversion factor:

  • Linux: divide by 1024
  • macOS/BSD: divide by 1,048,576

This same issue exists in benchmarks/wrappers/cloudflare/python/handler.py line 111.
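The platform-dependent conversion can be isolated in one helper. This is a sketch of the suggested fix; the function name is illustrative:

```python
import sys

def maxrss_to_mb(ru_maxrss: int, platform: str = sys.platform) -> float:
    # Linux reports ru_maxrss in kilobytes; macOS/BSD report bytes.
    if platform == "darwin" or platform.startswith(("freebsd", "openbsd", "netbsd")):
        return ru_maxrss / (1024.0 * 1024.0)
    return ru_maxrss / 1024.0

# Usage with the real counter (Unix only):
# import resource
# mb = maxrss_to_mb(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```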

benchmarks/wrappers/cloudflare/nodejs/build.js-64-130 (1)

64-130: Regex-based async transformation will break constructor calls and violates esbuild best practices.

The asyncNosqlPlugin uses regex patterns to transform JavaScript code, which causes two critical issues:

  1. Constructor calls fail: The pattern at lines 97–116 matches new UserClass(...) and transforms it to new await UserClass(...), which is invalid JavaScript syntax. The new keyword is neither in the controlFlow list nor handled as a special case.

  2. All functions become async: Line 83 converts every function declaration to async, which breaks:

    • Functions returning non-Promise values
    • Callbacks and synchronous contexts
    • Generators and function-as-value patterns
  3. Anti-pattern for esbuild: This violates esbuild plugin best practices. According to esbuild documentation, AST-heavy transforms should use a proper parser (Babel, SWC, or Recast) as a pre/post-build step rather than regex mutation inside the plugin.

Replace the regex-based transformation with either:

  • An external AST transform step using @babel/parser + @babel/traverse before esbuild
  • A dedicated transformer service (SWC, Recast) that preserves code semantics
  • Or limit the plugin to simple text replacements on explicitly marked sections
sebs/cloudflare/durable_objects.py-30-39 (1)

30-39: credentials parameter is unused.

The credentials parameter is passed to __init__ but never stored. The _get_auth_headers method references self._credentials which doesn't exist. Either store the credentials or remove the parameter and the unused method.

🔎 Proposed fix
     def __init__(
         self,
         region: str,
         cache_client: Cache,
         resources: Resources,
         credentials: CloudflareCredentials,
     ):
         super().__init__(region, cache_client, resources)
+        self._credentials = credentials
         # Tables are just logical names - Durable Objects are accessed via Worker bindings
         self._tables: Dict[str, Dict[str, str]] = defaultdict(dict)
benchmarks/wrappers/cloudflare/python/container/storage.py-71-77 (1)

71-77: Add timeout to urlopen calls to prevent indefinite hangs.

Network calls without timeouts can hang indefinitely if the worker proxy is unresponsive, causing benchmark runs to stall. Also, chain the exception with from e for proper traceback.

🔎 Proposed fix
         try:
-            with urllib.request.urlopen(req) as response:
+            with urllib.request.urlopen(req, timeout=30) as response:
                 result = json.loads(response.read().decode('utf-8'))
                 return result['key']
         except Exception as e:
             print(f"R2 upload error: {e}")
-            raise RuntimeError(f"Failed to upload to R2: {e}")
+            raise RuntimeError(f"Failed to upload to R2: {e}") from e
benchmarks/wrappers/cloudflare/python/container/storage.py-138-144 (1)

138-144: Add timeout and exception chaining to _upload_with_key.

Consistent with other methods.

🔎 Proposed fix
         try:
-            with urllib.request.urlopen(req) as response:
+            with urllib.request.urlopen(req, timeout=30) as response:
                 result = json.loads(response.read().decode('utf-8'))
                 print(f"[storage._upload_with_key] Upload successful, key={result['key']}")
         except Exception as e:
             print(f"R2 upload error: {e}")
-            raise RuntimeError(f"Failed to upload to R2: {e}")
+            raise RuntimeError(f"Failed to upload to R2: {e}") from e
benchmarks/wrappers/cloudflare/python/container/storage.py-91-101 (1)

91-101: Add timeout and exception chaining to download_stream.

Same issue as upload - missing timeout and exception chaining.

🔎 Proposed fix
         try:
-            with urllib.request.urlopen(url) as response:
+            with urllib.request.urlopen(url, timeout=30) as response:
                 return response.read()
         except urllib.error.HTTPError as e:
             if e.code == 404:
-                raise RuntimeError(f"Object not found: {key}")
+                raise RuntimeError(f"Object not found: {key}") from e
             else:
-                raise RuntimeError(f"Failed to download from R2: {e}")
+                raise RuntimeError(f"Failed to download from R2: {e}") from e
         except Exception as e:
             print(f"R2 download error: {e}")
-            raise RuntimeError(f"Failed to download from R2: {e}")
+            raise RuntimeError(f"Failed to download from R2: {e}") from e
benchmarks/wrappers/cloudflare/python/container/storage.py-168-201 (1)

168-201: Add timeout to download_directory list request and chain exception.

The list operation and subsequent downloads should have timeouts.

🔎 Proposed fix
         try:
             req = urllib.request.Request(list_url)
             req.add_header('User-Agent', 'SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2')
             
-            with urllib.request.urlopen(req) as response:
+            with urllib.request.urlopen(req, timeout=30) as response:
                 # ... rest of code ...
                 
         except Exception as e:
             print(f"Error listing/downloading directory: {e}")
-            raise RuntimeError(f"Failed to download directory: {e}")
+            raise RuntimeError(f"Failed to download directory: {e}") from e
sebs/cloudflare/triggers.py-136-142 (1)

136-142: Thread pool executor leak: new pool created per invocation.

Each call to async_invoke creates a new ThreadPoolExecutor that is never shut down, leading to resource leakage. Use a shared executor or ensure proper cleanup.

🔎 Proposed fix
+_executor = None
+
+def _get_executor():
+    global _executor
+    if _executor is None:
+        _executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
+    return _executor
+
 class HTTPTrigger(Trigger):
     # ... existing code ...
     
     def async_invoke(self, payload: dict) -> concurrent.futures.Future:
         """
         Asynchronously invoke a Cloudflare Worker via HTTP.
         """
-        pool = concurrent.futures.ThreadPoolExecutor()
-        fut = pool.submit(self.sync_invoke, payload)
+        fut = _get_executor().submit(self.sync_invoke, payload)
         return fut
sebs/cloudflare/cloudflare.py-148-148 (1)

148-148: Add timeouts to requests calls to prevent indefinite hangs.

Multiple HTTP requests lack timeouts:

  • Line 148: credential verification
  • Line 1069: checking if worker exists
  • Line 1259: fetching workers.dev subdomain
🔎 Proposed fixes
# Line 148
-        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers, timeout=30)

# Line 1069
-        response = requests.get(url, headers=headers)
+        response = requests.get(url, headers=headers, timeout=30)

# Line 1259
-            resp = requests.get(url, headers=headers)
+            resp = requests.get(url, headers=headers, timeout=30)

Also applies to: 1069-1069, 1259-1259

sebs/cloudflare/r2.py-81-84 (1)

81-84: Add timeouts to all requests calls to prevent indefinite hangs.

HTTP requests without timeouts can block indefinitely if the server is unresponsive, causing the application to hang. This is flagged in multiple locations: _create_bucket (line 82), list_buckets (line 294), and remove_bucket (line 363).

🔎 Proposed fix for line 82
             create_bucket_response = requests.post(
-                create_bucket_uri, json=params, headers=self._get_auth_headers()
+                create_bucket_uri, json=params, headers=self._get_auth_headers(), timeout=30
             )
🔎 Proposed fix for line 294
-            response = requests.get(list_buckets_uri, headers=self._get_auth_headers())
+            response = requests.get(list_buckets_uri, headers=self._get_auth_headers(), timeout=30)
🔎 Proposed fix for line 363
-            response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
+            response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers(), timeout=30)

Also applies to: 293-294, 362-364
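The shared-constant approach mentioned above can be factored into a tiny helper so every call site applies the same default without repeating the literal. The constant value and helper name are illustrative:

```python
DEFAULT_TIMEOUT = 30  # seconds; illustrative default, tune for the deployment

def with_timeout(kwargs: dict, timeout: float = DEFAULT_TIMEOUT) -> dict:
    # Return a copy of the request kwargs with a timeout applied,
    # unless the caller explicitly set one.
    merged = dict(kwargs)
    merged.setdefault("timeout", timeout)
    return merged

# Usage sketch (assumes the requests library):
# requests.post(url, **with_timeout({"json": params, "headers": headers}))
```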

benchmarks/wrappers/cloudflare/python/nosql.py-190-195 (1)

190-195: Fragile string manipulation for JSON parsing.

Replacing single quotes with double quotes (get_res.replace("\'", "\"")) is a fragile approach that will break if the data contains escaped quotes or actual single-quote characters in values.

🔎 Proposed fix

If the data is stored as JSON, it should already use double quotes. If it's stored with single quotes, consider:

  1. Ensuring data is stored as valid JSON in insert/update
  2. Using ast.literal_eval for Python literal syntax
  3. Storing raw JSON without the quote replacement
             get_res = run_sync(self.get_table(table_name).get(key))
-            get_res = get_res.replace("\'", "\"")
-            ##print("gr", get_res)
-        
-            res.append(json.loads(get_res))
+            if get_res is not None:
+                res.append(json.loads(get_res))
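A robust decoder covering both options 1 and 2 above might look like this sketch (the function name is illustrative):

```python
import ast
import json

def decode_record(raw: str):
    # Prefer valid JSON: store values with json.dumps on insert/update
    # so json.loads works directly.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Legacy Python-repr payloads: parse literals safely
        # instead of swapping quote characters.
        return ast.literal_eval(raw)
```

By contrast, the naive raw.replace("'", '"') corrupts any value containing an apostrophe, such as "it's".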
🟡 Minor comments (16)
config/cloudflare-test.json-2-13 (1)

2-13: Clarify the runtime configuration for Cloudflare Workers.

Cloudflare Workers do not run a full Node.js runtime; they run a V8-based Workers runtime. The "language": "nodejs" specification with "version": "18" does not reflect how Cloudflare Workers function. If Node.js APIs are required, enable the nodejs_compat compatibility flag (requires compatibility_date >= 2024-09-23) instead of specifying a language/version pair.

package.json-1-1 (1)

1-1: Remove the empty root package.json.

This repository uses isolated, self-contained benchmark packages in subdirectories and has no monorepo or workspace configuration that would justify a root package.json. Adding an empty {} file serves no purpose and will only create confusion for developers and potentially cause unexpected behavior with Node.js tooling. If this file is genuinely needed for a specific purpose, clarify its use; otherwise, remove it.

config/systems.json-389-391 (1)

389-391: UUID version inconsistency with benchmark package.json.

The Cloudflare Node.js deployment specifies uuid: "3.4.0", but the compression benchmark's package.json specifies uuid: "^10.0.0". This version mismatch may cause conflicts or unexpected behavior during deployment.

🔎 Suggested fix: Align versions

Either update this to match the benchmark:

         "packages": {
-          "uuid": "3.4.0"
+          "uuid": "^10.0.0"
         }

Or update the benchmark's package.json to use 3.4.0 for consistency with other providers (AWS also uses 3.4.0).

sebs/cloudflare/README.md-105-118 (1)

105-118: Documentation inconsistency: Container support status is unclear.

This section states that "Container Deployment: Not currently implemented" and lists it as a limitation. However, the PR description explicitly states:

  • "Adds Node.js and Python container benchmark wrappers."
  • "Adds Node.js and Python container benchmark wrappers."
  • "Added container support for Python and Node.js, container orchestrator and worker proxy for R2/Durable Objects."

Additionally, this PR includes dockerfiles/cloudflare/python/Dockerfile, suggesting container support is at least partially implemented.

Please update this documentation section to accurately reflect the current implementation status. If containers are implemented, remove this from limitations and update lines 299-332 (Container Support Architecture) to reflect what was actually built rather than what "would be required."

benchmarks/wrappers/cloudflare/python/container/handler.py-135-137 (1)

135-137: Remove unused compute_time variable.

The compute_time variable is calculated but never used. Either use it in the response or remove it.

🔎 Proposed fix
             # Calculate timing
             end = datetime.datetime.now().timestamp()
-            compute_time = end - begin
benchmarks/wrappers/cloudflare/nodejs/container/worker.js-309-324 (1)

309-324: Remove debug logging statements.

The R2 upload handler contains multiple console.log debug statements (lines 311-312, 314, 319-320) that should be removed before merging to production.

🔎 Proposed fix
     } else if (url.pathname === '/r2/upload') {
       // Upload to R2
-      console.log(`[worker.js /r2/upload] bucket=${bucket}, key=${key}`);
-      console.log(`[worker.js /r2/upload] env.R2 exists:`, !!env.R2);
       const data = await request.arrayBuffer();
-      console.log(`[worker.js /r2/upload] Received ${data.byteLength} bytes`);
       
       // Use the key as-is (container already generates unique keys if needed)
       try {
         const putResult = await env.R2.put(key, data);
-        console.log(`[worker.js /r2/upload] R2.put() returned:`, putResult);
-        console.log(`[worker.js /r2/upload] Successfully uploaded to R2 with key=${key}`);
       } catch (error) {
         console.error(`[worker.js /r2/upload] R2.put() error:`, error);
         throw error;
       }
benchmarks/wrappers/cloudflare/nodejs/container/nosql.js-86-99 (1)

86-99: Remove debug logging statements.

The query method contains multiple console.error debug statements that should be removed before merging. These add noise to production logs.

🔎 Proposed fix
   async query(tableName, primaryKey, secondaryKeyName) {
     const params = {
       table_name: tableName,
       primary_key: primaryKey,
       secondary_key_name: secondaryKeyName,
     };
     const result = await this._make_request('query', params);
-    console.error(`[nosql.query] result:`, JSON.stringify(result));
-    console.error(`[nosql.query] result.items:`, result.items);
-    console.error(`[nosql.query] Array.isArray(result.items):`, Array.isArray(result.items));
     const items = result.items || [];
-    console.error(`[nosql.query] returning items:`, items);
     return items;
   }
benchmarks/wrappers/cloudflare/nodejs/nosql.js-62-66 (1)

62-66: Race condition in update: concurrent updates may lose data.

The read-modify-write pattern in update is not atomic. If two concurrent updates occur, one may overwrite the other's changes. Consider documenting this limitation or using Durable Object transactions if available.

benchmarks/wrappers/cloudflare/nodejs/container/storage.js-15-18 (1)

15-18: Duplicate static field declaration: worker_url defined twice.

This is flagged by static analysis and will cause the second declaration to overwrite the first (which is redundant but may indicate copy-paste error).

🔎 Proposed fix
   static worker_url = null; // Set by handler from X-Worker-URL header
-
-  
-  static worker_url = null; // Set by handler from X-Worker-URL header
benchmarks/wrappers/cloudflare/python/container/storage.py-146-151 (1)

146-151: Potential issue: download doesn't handle /tmp prefix like the worker version.

Comparing with benchmarks/wrappers/cloudflare/python/storage.py (lines 51-59), the worker version redirects writes to /tmp and tracks written files. This container version creates directories but doesn't ensure the path is under /tmp, which could fail in restricted environments.

🔎 Proposed fix
     def download(self, bucket, key, filepath):
         """Download file to disk"""
         data = self.download_stream(bucket, key)
+        # Ensure writes go to /tmp in container environment
+        if not filepath.startswith('/tmp'):
+            filepath = '/tmp' + os.path.abspath(filepath)
         os.makedirs(os.path.dirname(filepath), exist_ok=True)
         with open(filepath, 'wb') as f:
             f.write(data)
benchmarks/wrappers/cloudflare/python/nosql.py-161-168 (1)

161-168: get method in nosql_kv returns raw result instead of parsed JSON.

Unlike query which calls json.loads, the get method returns the raw result without parsing. This inconsistency could cause type errors for callers expecting a dict.

🔎 Proposed fix
     def get(
         self, table_name: str, primary_key: Tuple[str, str], secondary_key: Tuple[str, str]
     ) -> Optional[dict]:
         get_res = run_sync(
             self.get_table(table_name).get(
                 self.key_maker(primary_key, secondary_key)
             ))
-        return get_res
+        if get_res is None:
+            return None
+        return json.loads(get_res)
sebs/cloudflare/r2.py-13-14 (1)

13-14: Typo in typename(): "Cloudlfare" should be "Cloudflare".

This typo will cause inconsistencies when the typename is used for logging or identification purposes.

🔎 Proposed fix
     @staticmethod
     def typename() -> str:
-        return "Cloudlfare.R2"
+        return "Cloudflare.R2"
benchmarks/wrappers/cloudflare/nodejs/storage.js-151-176 (1)

151-176: Inconsistent error handling: download_stream returns null for missing objects but then throws at line 175.

When R2 returns no object, the method returns null (line 156). However, if no backend is available, it throws an error (line 175). Additionally, if the object is found in R2 but has neither arrayBuffer nor text methods, it falls through to throwing even though the object exists.

Consider aligning the behavior — either always return null for missing objects or always throw.

🔎 Proposed fix for clearer control flow
   async download_stream(__bucket, key) {
     const instance = storage.instance || this;
     
     if (instance.handle) {
       const obj = await instance.handle.get(key);
-      if (!obj) return null;
+      if (!obj) {
+        throw new Error(`download_stream(): object not found in R2: ${key}`);
+      }
       // R2 object provides arrayBuffer()/text() helpers in Workers
       if (typeof obj.arrayBuffer === 'function') {
         const ab = await obj.arrayBuffer();
         return Buffer.from(ab);
       }
       if (typeof obj.text === 'function') {
         return await obj.text();
       }
-      // Fallback: return null
-      return null;
+      throw new Error(`download_stream(): R2 object has no readable content: ${key}`);
     }

     // Fallback to local filesystem
     const localPath = path.join('/tmp', key);
     if (fs && fs.existsSync(localPath)) {
       return fs.readFileSync(localPath);
     }

     throw new Error('download_stream(): object not found');
   }
sebs/cloudflare/cloudflare.py-73-73 (1)

73-73: Avoid mutable default argument.

Using {} as a default argument is a Python pitfall — the same dict instance is shared across calls.

🔎 Proposed fix
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
+        if config is None:
+            config = {}
benchmarks/wrappers/cloudflare/nodejs/storage.js-87-89 (1)

87-89: Use instance instead of this for consistency with the singleton pattern.

Other methods use storage.instance || this but download calls this.download_stream directly. For consistency and correctness when called on the singleton, use instance.

🔎 Proposed fix
   async download(__bucket, key, filepath) {
     const instance = storage.instance || this;
-    const data = await this.download_stream(__bucket, key);
+    const data = await instance.download_stream(__bucket, key);
sebs/cloudflare/r2.py-58-60 (1)

58-60: Avoid mutable default argument.

Using [] as a default argument is a common Python pitfall — the same list instance is shared across all calls if not overwritten.

🔎 Proposed fix
     def _create_bucket(
-        self, name: str, buckets: list[str] = [], randomize_name: bool = False
+        self, name: str, buckets: Optional[list[str]] = None, randomize_name: bool = False
     ) -> str:
+        if buckets is None:
+            buckets = []
         for bucket_name in buckets:


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @sebs/cloudflare/README.md:
- Around line 280-285: Add a clear limitation note to the "Platform Limitations"
section stating that Cloudflare does not support cold start detection and all
invocations will report is_cold: false; reference the handler example's
hardcoded is_cold: false and warn users this makes cold-start metrics
incomparable to platforms like AWS Lambda/Azure Functions and may skew benchmark
comparisons.
- Around line 191-250: The README response schema and metrics extraction
description are wrong; update the docs to reflect the actual handler and
extractor: note that the Python handler
(benchmarks/wrappers/cloudflare/python/handler.py) returns extra unused fields
(results_time, is_cold_worker, container_id, environ_container_id) and that
measurements are nested at result.output['result']['measurement'] rather than
using the top-level compute_time; document that HTTPTrigger.sync_invoke in
sebs/cloudflare/triggers.py ignores top-level compute_time and instead reads
cpu_time_us or cpu_time_ms to populate provider_times.execution, wall_time_us or
wall_time_ms to populate times.benchmark, is_cold from the nested measurement,
and memory_used_mb for memory metrics; remove or stop claiming top-level
compute_time is used, and explicitly list which fields are consumed vs
returned-but-unused.
🧹 Nitpick comments (3)
sebs/cloudflare/README.md (3)

138-142: Clarify tier-specific limitations and document paid plan differences.

The memory and CPU limits are stated as "128MB memory, 50ms CPU time on free tier," but the README doesn't mention that paid plans have different limits. This could mislead users deploying to paid accounts.

Add a subsection under "Platform Limitations" clarifying:

  • Free tier: 128MB memory, 50ms CPU time
  • Paid plans: Different limits (specify actual limits)
  • Link to Cloudflare's pricing/limits documentation

300-306: Specify Cloudflare's container image size limit instead of leaving it vague.

Line 305 states "Ensure container image size is under Cloudflare's limits" but doesn't specify the actual limit. This forces users to refer to external documentation.

If Cloudflare publishes a container image size limit, include it explicitly (e.g., "container images must be under X MB").


70-76: Minor wording: Line 76 is slightly awkward.

"This provides the behavior of SeBS to invoke serverless functions via either library or http triggers" is grammatically unclear. Consider rewording to: "This enables SeBS to invoke serverless functions via either library or HTTP triggers."

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50cf891 and 6115f9c.

📒 Files selected for processing (1)
  • sebs/cloudflare/README.md
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
🔇 Additional comments (1)
sebs/cloudflare/README.md (1)

313-318: All external documentation links are valid and current.

The four Cloudflare documentation URLs in the References section are accessible and reflect the current state of Cloudflare's APIs and services, including recent features like Workflows in the Bindings documentation and R2 storage classes. No updates needed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @sebs/cloudflare/durable_objects.py:
- Around line 100-113: The update_cache method is using a non-existent private
attribute self._cache_client which will raise AttributeError; change the call in
update_cache to use the public property self.cache_client.update_nosql(...)
(matching retrieve_cache) and keep the same arguments and logging; ensure
update_cache references self.cache_client (not self._cache_client) and that
deployment_name, benchmark and self._tables[benchmark] are passed unchanged.
- Around line 30-40: The __init__ method for the Durable Objects class accepts a
credentials: CloudflareCredentials parameter but never stores it, causing
_get_auth_headers to raise AttributeError when accessing self._credentials; fix
by assigning the parameter to an instance variable (e.g., self._credentials =
credentials) inside __init__ so _get_auth_headers and other methods can
reference it reliably.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6115f9c and 5049c42.

📒 Files selected for processing (3)
  • sebs/cloudflare/durable_objects.py
  • sebs/cloudflare/function.py
  • sebs/sebs.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • sebs/cloudflare/function.py
🧰 Additional context used
🧬 Code graph analysis (2)
sebs/sebs.py (2)
sebs/utils.py (1)
  • has_platform (260-261)
sebs/cloudflare/cloudflare.py (1)
  • Cloudflare (25-1611)
sebs/cloudflare/durable_objects.py (3)
sebs/cloudflare/config.py (12)
  • CloudflareCredentials (9-127)
  • typename (32-33)
  • typename (140-141)
  • typename (210-211)
  • resources (218-219)
  • credentials (214-215)
  • api_token (36-37)
  • email (40-41)
  • api_key (44-45)
  • update_cache (118-121)
  • update_cache (167-173)
  • update_cache (248-251)
sebs/faas/config.py (1)
  • Resources (57-147)
sebs/cache.py (3)
  • Cache (37-463)
  • get_nosql_config (177-178)
  • update_nosql (190-193)
🪛 Ruff (0.14.10)
sebs/cloudflare/durable_objects.py

35-35: Unused method argument: credentials

(ARG002)


55-55: Avoid specifying long messages outside the exception class

(TRY003)


116-116: Unused method argument: primary_key

(ARG002)


116-116: Unused method argument: secondary_key

(ARG002)


146-146: Unused method argument: data

(ARG002)


147-147: Unused method argument: primary_key

(ARG002)


148-148: Unused method argument: secondary_key

(ARG002)


167-167: Avoid specifying long messages outside the exception class

(TRY003)


182-182: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (2)
sebs/sebs.py (1)

109-112: LGTM! Platform registration now follows consistent pattern.

The Cloudflare registration now correctly uses has_platform("cloudflare"), matching the pattern used for AWS, Azure, GCP, and OpenWhisk. This addresses the previous review concern about inconsistent registration.

sebs/cloudflare/durable_objects.py (1)

185-212: LGTM! Dictionary mutation issue resolved.

The two-step approach with list() wrapping correctly prevents RuntimeError from modifying the dictionary during iteration. The logic first locates the matching benchmark and table_key, then performs deletion after the loop completes.

@userlaurin userlaurin changed the title SeBS Cloudflare Compatibility & 311 Benchmark Translation to Nodejs SeBS Cloudflare Compatibility & 311/130 Benchmark Translation to Nodejs Jan 6, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In @benchmarks/100.webapps/130.crud-api/nodejs/function.js:
- Around line 23-47: The total and average calculations in queryProducts are
incorrect: priceSum currently adds product.price without accounting for
product.quantity and avgPrice divides sum(prices) by sum(quantity), which is
meaningless; change the aggregation so priceSum accumulates product.price *
product.quantity (i.e., priceSum += product.price * product.quantity) and
compute avgPrice as quantitySum > 0 ? priceSum / quantitySum : 0.0 (average unit
price per item), keeping product names appended to products and returning
total_cost: priceSum and avg_price: avgPrice.
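The corrected aggregation can be sketched in a few lines of Python (the product data here is illustrative, not taken from the benchmark):

```python
# Weighted aggregation as described in the fix: total cost weights each
# price by its quantity, and the average is total cost / total quantity.
products = [
    {"name": "pen", "price": 2.0, "quantity": 3},
    {"name": "pad", "price": 5.0, "quantity": 1},
]

price_sum = sum(p["price"] * p["quantity"] for p in products)   # total cost
quantity_sum = sum(p["quantity"] for p in products)
avg_price = price_sum / quantity_sum if quantity_sum > 0 else 0.0

print(price_sum)   # 11.0  (2*3 + 5*1)
print(avg_price)   # 2.75  (11.0 / 4)
```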

In @sebs/cloudflare/cloudflare.py:
- Around line 1253-1280: The requests.get call in the method that fetches the
workers.dev subdomain should include a timeout to avoid hanging; update the call
to requests.get(url, headers=headers, timeout=...) and choose a sensible timeout
(e.g., 5-10s), and adjust the surrounding exception handling to specifically
catch requests.Timeout and requests.RequestException where appropriate so the
method still logs via self.logging.warning and returns None on timeout or other
request errors; refer to the existing method using _get_auth_headers,
self._workers_dev_subdomain, and the current requests.get usage to locate and
modify the call.
- Around line 148-154: The requests.get call used to verify Cloudflare
credentials lacks a timeout and can hang; update the call to include a sensible
timeout (e.g., timeout=10) or use a configurable attribute like self._timeout,
i.e., change requests.get(f"{self._api_base_url}/user/tokens/verify",
headers=headers) to include timeout, and add handling for
requests.exceptions.Timeout/requests.exceptions.RequestException to raise a
clear RuntimeError if the request times out or fails; refer to the requests.get
invocation and the response/status_code check to locate where to add the timeout
and error handling.
- Around line 1209-1213: Replace the bare except that follows response.json()
with a specific exception handler for JSON parse failures: catch
json.JSONDecodeError and ValueError (and bind it to a variable, e.g., "e")
around the call to response.json() where error_data and error_msg are set, then
fall back to the existing log path (self.logging.info(f"Container
provisioning... ({elapsed}s elapsed)"))—optionally include the exception text in
the log; ensure json is imported if not already and update the except block that
currently surrounds response.json()/error_msg assignment.
- Around line 799-814: The nested function replacer closes over loop-scoped
variables control_flow and builtins causing stale bindings; fix by capturing
their current values into locals when defining replacer (e.g., assign
local_control_flow = control_flow and local_builtins = builtins just before
defining replacer and refer to those inside replacer, or bind them as default
args like def replacer(match, control_flow=control_flow, builtins=builtins):),
then keep using replacer in the re.sub call so the correct sets are used for
each iteration.
- Around line 73-83: The initialize method currently declares a mutable default
config and never uses it; change the signature to use a safe default by
replacing config: Dict[str, str] = {} with config: Optional[Dict[str, str]] =
None (or remove the parameter entirely if it’s not needed), and if you keep it,
ensure you normalize it at the top of initialize (e.g., config = config or {})
before use and pass relevant values into initialize_resources or other code
paths; keep references to the existing initialize, _verify_credentials, and
initialize_resources symbols when making the change.
- Around line 1066-1077: The bare except around response.json() in the function
that handles the worker check should be replaced with a specific exception
handler for JSON parse errors: catch json.JSONDecodeError (and optionally
ValueError for compatibility) instead of using a bare except, and handle/log the
exception (e.g., with self.logging.debug/warning) before returning None; ensure
json is imported if not already and update the try/except that wraps
response.json().get("result") to use the specific exception types and an
exception variable (e) for logging.
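The closure issue flagged for replacer (around lines 799-814) is Python's late-binding behavior: closures capture variables, not values, so every replacer defined in a loop sees the loop's final sets. A minimal sketch of the pitfall and the default-argument fix (the word sets are placeholders, not the actual control_flow/builtins sets):

```python
import re

# Closures capture variables, not values. Binding the current value as a
# default argument snapshots it at definition time, once per iteration.
funcs = []
for words in ({"if", "for"}, {"print", "len"}):
    def replacer(match, words=words):  # default arg freezes this iteration's set
        token = match.group(0)
        return token.upper() if token in words else token
    funcs.append(replacer)

# Each replacer keeps the set from its own iteration:
first, second = funcs
print(re.sub(r"\w+", first, "if x"))   # "IF x"
print(re.sub(r"\w+", second, "if x"))  # "if x" -- 'if' not in the second set
```

Without the `words=words` default, both replacers would see `{"print", "len"}`, the loop's final value.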
🧹 Nitpick comments (8)
benchmarks/100.webapps/130.crud-api/nodejs/function.js (1)

49-78: Consider adding input validation.

The handler assumes all required fields are present in request.body and request.path. If upstream wrappers don't guarantee this, the function may throw unclear errors when accessing undefined properties.

Consider adding validation for:

  • Required body fields (cart, product_id, name, price, quantity for PUT)
  • Required path parameters (id for GET /cart/{id})
  • Data types (e.g., ensuring price and quantity are numbers)

Note: Line 67 uses body.cart for a GET request, which is unconventional (GET requests typically use path/query parameters), but may be intentional for this benchmark's design.

benchmarks/wrappers/cloudflare/nodejs/build.js (1)

108-155: Consider adding documentation for the post-processing logic.

The regex-based post-processing that converts dynamic __require calls into static imports is clever but complex. Adding comments explaining:

  • Why this transformation is necessary (Cloudflare Workers ESM requirements)
  • The pattern being matched (__require<digits>("node:..."))
  • The cache mechanism

would improve maintainability.

📝 Example documentation
       
-      // POST-PROCESS: Replace dynamic requires with static imports
+      // POST-PROCESS: Replace dynamic requires with static imports
+      // Cloudflare Workers require ESM with static imports for node: modules.
+      // esbuild generates __require\d*("node:...") calls for dynamic imports,
+      // so we transform them into static imports with a cache lookup.
       console.log('Post-processing to fix node: module imports...');
sebs/cloudflare/durable_objects.py (1)

183-183: Remove unnecessary f-string prefix.

The f-string on line 183 doesn't contain any placeholders, so the f prefix is unnecessary.

✨ Proposed fix
-        self.logging.info(f"Durable Objects data is managed within the Worker")
+        self.logging.info("Durable Objects data is managed within the Worker")
sebs/cloudflare/cloudflare.py (5)

385-391: Remove unused variable r2_bucket_configured.

The variable r2_bucket_configured is assigned on line 385 but never used afterward. This can be safely removed to clean up the code.

🧹 Proposed fix
         # Add R2 bucket binding for benchmarking files (for both native and container deployments)
-        r2_bucket_configured = False
         try:
             storage = self.system_resources.get_storage()
             bucket_name = storage.get_bucket(Resources.StorageBucketType.BENCHMARKS)
             if bucket_name:
                 toml_content += f"""# R2 bucket binding for benchmarking files
 # This bucket is used by fs and path polyfills to read benchmark data
 [[r2_buckets]]
 binding = "R2"
 bucket_name = "{bucket_name}"
 
 """
-                r2_bucket_configured = True
                 self.logging.info(f"R2 bucket '{bucket_name}' will be bound to worker as 'R2'")

654-657: Rename unused loop variable.

The loop variable dirnames on line 654 is not used within the loop body. Following Python conventions, it should be renamed to _dirnames to indicate it's intentionally unused.

✨ Proposed fix
         # Calculate total size of the package directory
         total_size = 0
-        for dirpath, dirnames, filenames in os.walk(directory):
+        for dirpath, _dirnames, filenames in os.walk(directory):
             for filename in filenames:
                 filepath = os.path.join(dirpath, filename)
                 total_size += os.path.getsize(filepath)

855-866: Remove unused variable result.

The variable result assigned on line 855 from the subprocess.run() call is never used. Since the success is verified by the lack of exception, this assignment can be removed.

🧹 Proposed fix
         # Install Node.js dependencies (needed for all containers for worker.js)
         self.logging.info(f"Installing @cloudflare/containers for worker.js orchestration in {directory}")
         try:
-            result = subprocess.run(
+            subprocess.run(
                 ["npm", "install", "--production"],
                 cwd=directory,
                 capture_output=True,
                 text=True,
                 check=True,
                 timeout=120
             )
             self.logging.info("npm install completed successfully")

924-927: Rename unused loop variable.

The loop variable dirnames on line 924 is not used within the loop body. Following Python conventions, it should be renamed to _dirnames.

✨ Proposed fix
         # Calculate package size (approximate, as it's a source directory)
         total_size = 0
-        for dirpath, dirnames, filenames in os.walk(directory):
+        for dirpath, _dirnames, filenames in os.walk(directory):
             for filename in filenames:
                 filepath = os.path.join(dirpath, filename)
                 total_size += os.path.getsize(filepath)

1495-1512: Rename unused loop variable.

The loop variable request_id on line 1495 is not used within the loop body. Following Python conventions, it should be renamed to _request_id to indicate it's intentionally unused.

✨ Proposed fix
-        for request_id, result in requests.items():
+        for _request_id, result in requests.items():
             # Count cold/warm starts
             if result.stats.cold_start:
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5049c42 and 8516207.

📒 Files selected for processing (5)
  • benchmarks/100.webapps/130.crud-api/nodejs/function.js
  • benchmarks/wrappers/cloudflare/nodejs/build.js
  • sebs/cloudflare/README.md
  • sebs/cloudflare/cloudflare.py
  • sebs/cloudflare/durable_objects.py
✅ Files skipped from review due to trivial changes (1)
  • sebs/cloudflare/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
sebs/cloudflare/durable_objects.py (3)
sebs/cloudflare/resources.py (1)
  • config (37-38)
sebs/cloudflare/config.py (12)
  • CloudflareCredentials (9-127)
  • typename (32-33)
  • typename (140-141)
  • typename (210-211)
  • resources (218-219)
  • credentials (214-215)
  • api_token (36-37)
  • email (40-41)
  • api_key (44-45)
  • update_cache (118-121)
  • update_cache (167-173)
  • update_cache (248-251)
sebs/cache.py (3)
  • Cache (37-463)
  • get_nosql_config (177-178)
  • update_nosql (190-193)
sebs/cloudflare/cloudflare.py (2)
sebs/cloudflare/resources.py (4)
  • config (37-38)
  • CloudflareSystemResources (16-95)
  • get_storage (56-78)
  • _get_auth_headers (40-54)
sebs/cloudflare/triggers.py (8)
  • typename (19-20)
  • typename (72-73)
  • LibraryTrigger (7-57)
  • HTTPTrigger (60-154)
  • url (80-82)
  • url (85-86)
  • trigger_type (23-24)
  • trigger_type (76-77)
🪛 Ruff (0.14.10)
sebs/cloudflare/durable_objects.py

56-56: Avoid specifying long messages outside the exception class

(TRY003)


117-117: Unused method argument: primary_key

(ARG002)


117-117: Unused method argument: secondary_key

(ARG002)


147-147: Unused method argument: data

(ARG002)


148-148: Unused method argument: primary_key

(ARG002)


149-149: Unused method argument: secondary_key

(ARG002)


168-168: Avoid specifying long messages outside the exception class

(TRY003)


183-183: f-string without any placeholders

Remove extraneous f prefix

(F541)

sebs/cloudflare/cloudflare.py

73-73: Unused method argument: config

(ARG002)


73-73: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


117-117: Do not catch blind exception: Exception

(BLE001)


128-131: Avoid specifying long messages outside the exception class

(TRY003)


134-137: Avoid specifying long messages outside the exception class

(TRY003)


148-148: Probable use of requests call without timeout

(S113)


151-154: Avoid specifying long messages outside the exception class

(TRY003)


162-162: Starting a process with a partial executable path

(S607)


174-174: Starting a process with a partial executable path

(S607)


184-184: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


184-184: Avoid specifying long messages outside the exception class

(TRY003)


186-188: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


186-188: Avoid specifying long messages outside the exception class

(TRY003)


190-190: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


190-190: Avoid specifying long messages outside the exception class

(TRY003)


196-196: Starting a process with a partial executable path

(S607)


208-208: Starting a process with a partial executable path

(S607)


218-218: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


218-218: Avoid specifying long messages outside the exception class

(TRY003)


220-222: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


220-222: Avoid specifying long messages outside the exception class

(TRY003)


224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


224-224: Avoid specifying long messages outside the exception class

(TRY003)


227-227: Unused method argument: container_uri

(ARG002)


385-385: Local variable r2_bucket_configured is assigned to but never used

Remove assignment to unused variable r2_bucket_configured

(F841)


387-387: Do not catch blind exception: Exception

(BLE001)


416-416: Avoid specifying long messages outside the exception class

(TRY003)


423-423: Unused method argument: architecture

(ARG002)


464-464: Unused method argument: is_cached

(ARG002)


482-482: Starting a process with a partial executable path

(S607)


496-496: Starting a process with a partial executable path

(S607)


507-507: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


508-508: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


508-508: Avoid specifying long messages outside the exception class

(TRY003)


510-510: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


511-511: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


511-511: Avoid specifying long messages outside the exception class

(TRY003)


513-515: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


513-515: Avoid specifying long messages outside the exception class

(TRY003)


525-525: Starting a process with a partial executable path

(S607)


533-533: Do not catch blind exception: Exception

(BLE001)


643-646: Avoid specifying long messages outside the exception class

(TRY003)


647-650: Avoid specifying long messages outside the exception class

(TRY003)


654-654: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


685-687: Avoid specifying long messages outside the exception class

(TRY003)


705-705: Avoid specifying long messages outside the exception class

(TRY003)


718-718: f-string without any placeholders

Remove extraneous f prefix

(F541)


804-804: Function definition does not bind loop variable control_flow

(B023)


804-804: Function definition does not bind loop variable builtins

(B023)


855-855: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


856-856: Starting a process with a partial executable path

(S607)


864-864: Do not catch blind exception: Exception

(BLE001)


865-865: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


866-866: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


866-866: Avoid specifying long messages outside the exception class

(TRY003)


924-924: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)


956-956: subprocess call: check for execution of untrusted input

(S603)


957-957: Starting a process with a partial executable path

(S607)


969-969: Consider moving this statement to an else block

(TRY300)


975-975: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


976-976: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


978-978: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


978-978: Avoid specifying long messages outside the exception class

(TRY003)


1011-1011: Avoid specifying long messages outside the exception class

(TRY003)


1066-1066: Probable use of requests call without timeout

(S113)


1071-1071: Do not use bare except

(E722)


1132-1132: subprocess call: check for execution of untrusted input

(S603)


1167-1167: Consider moving this statement to an else block

(TRY300)


1170-1170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


1170-1170: Avoid specifying long messages outside the exception class

(TRY003)


1175-1175: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


1176-1176: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


1178-1178: Unused method argument: package_dir

(ARG002)


1212-1212: Do not use bare except

(E722)


1256-1256: Probable use of requests call without timeout

(S113)


1278-1278: Do not catch blind exception: Exception

(BLE001)


1348-1348: Avoid specifying long messages outside the exception class

(TRY003)


1365-1365: Unused method argument: benchmark

(ARG002)


1391-1391: Unused method argument: resources

(ARG002)


1438-1438: Unused method argument: functions

(ARG002)


1438-1438: Unused method argument: code_package

(ARG002)


1458-1458: Unused method argument: start_time

(ARG002)


1459-1459: Unused method argument: end_time

(ARG002)


1495-1495: Loop control variable request_id not used within loop body

Rename unused request_id to _request_id

(B007)

🔇 Additional comments (11)
benchmarks/100.webapps/130.crud-api/nodejs/function.js (3)

6-13: Previous issue resolved.

The missing await on the async insert operation flagged in the previous review has been addressed. The function now properly awaits the insert operation, preventing fire-and-forget behavior.


15-21: LGTM!

The function correctly awaits the async get operation and returns the result. Implementation is clean and straightforward.


1-4: No issue found. Line 3's nosql.nosql.get_instance() pattern is correct based on the module structure. The nosql.js file exports a class named nosql, so the double property access is expected: the variable nosql holds the module's exports, and its nosql property is the class. While giving the variable and the class identical names is unconventional, it is not a functional error.

benchmarks/wrappers/cloudflare/nodejs/build.js (5)

5-28: LGTM!

The recursive file enumeration logic correctly excludes build artifacts, test files, and configuration files while collecting source files for bundling.


30-36: LGTM!

The file copy utility correctly ensures the destination directory exists before copying.


173-173: LGTM!

The script execution is straightforward and correct.


54-60: No issues found. The request-polyfill.js file exists in the wrapper directory where the build script expects it.


88-106: No action needed. Cloudflare Workers fully supports ES2020 and all later JavaScript features through its V8 engine, which is kept in sync with Chrome stable. The es2020 target in this build configuration is appropriate and compatible.

sebs/cloudflare/durable_objects.py (3)

30-40: LGTM! Previous credential storage issue resolved.

The __init__ method now correctly stores the credentials parameter as self._credentials (line 38), fixing the previously reported AttributeError. The initialization logic is sound.


101-114: LGTM! Cache client property usage is now correct.

The update_cache method now correctly uses self.cache_client (line 107) instead of the non-existent self._cache_client, matching the pattern in retrieve_cache. This resolves the previously reported AttributeError.


186-213: LGTM! Dictionary mutation issue resolved.

The remove_table method now uses a safe two-step approach: first finding the benchmark and table_key (lines 198-206 with list() wrappers to avoid mutation), then performing the deletion after iteration (lines 209-210). This prevents the RuntimeError: dictionary changed size during iteration that was previously flagged.

Comment on lines +73 to +83
def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
"""
Initialize the Cloudflare Workers platform.

Args:
config: Additional configuration parameters
resource_prefix: Prefix for resource naming
"""
# Verify credentials are valid
self._verify_credentials()
self.initialize_resources(select_prefix=resource_prefix)

⚠️ Potential issue | 🟡 Minor

Fix mutable default argument and unused parameter.

Line 73 has two issues:

  1. The config parameter is defined but never used in the method body.
  2. The mutable default config: Dict[str, str] = {} can cause unexpected behavior if the same dict instance is reused across calls.
🔧 Proposed fix
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
         """
         Initialize the Cloudflare Workers platform.
 
         Args:
             config: Additional configuration parameters
             resource_prefix: Prefix for resource naming
         """
+        if config is None:
+            config = {}
         # Verify credentials are valid
         self._verify_credentials()
🧰 Tools
🪛 Ruff (0.14.10)

73-73: Unused method argument: config

(ARG002)


73-73: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In @sebs/cloudflare/cloudflare.py around lines 73 - 83, The initialize method
currently declares a mutable default config and never uses it; change the
signature to use a safe default by replacing config: Dict[str, str] = {} with
config: Optional[Dict[str, str]] = None (or remove the parameter entirely if
it’s not needed), and if you keep it, ensure you normalize it at the top of
initialize (e.g., config = config or {}) before use and pass relevant values
into initialize_resources or other code paths; keep references to the existing
initialize, _verify_credentials, and initialize_resources symbols when making
the change.
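The hazard behind this fix is that a mutable default is created once, at function definition time, and shared across every call. A minimal sketch (bad/good are illustrative names, not SeBS code):

```python
def bad(config={}):            # one dict instance shared across every call
    config.setdefault("n", 0)
    config["n"] += 1
    return config["n"]

def good(config=None):
    config = config or {}      # fresh dict per call
    config.setdefault("n", 0)
    config["n"] += 1
    return config["n"]

print(bad(), bad())    # 1 2  -- state leaks between calls
print(good(), good())  # 1 1
```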

Comment on lines +148 to +154
response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)

if response.status_code != 200:
raise RuntimeError(
f"Failed to verify Cloudflare credentials: {response.status_code} - {response.text}\n"
f"Please check that your CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are correct."
)

⚠️ Potential issue | 🟠 Major

Add timeout to requests call for security.

The requests.get() call on line 148 lacks a timeout parameter, which can cause the application to hang indefinitely if the Cloudflare API is unresponsive. This is a security and reliability concern.

🔒 Proposed fix
-        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+        response = requests.get(
+            f"{self._api_base_url}/user/tokens/verify",
+            headers=headers,
+            timeout=30  # 30 second timeout for credential verification
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
if response.status_code != 200:
raise RuntimeError(
f"Failed to verify Cloudflare credentials: {response.status_code} - {response.text}\n"
f"Please check that your CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are correct."
)
response = requests.get(
f"{self._api_base_url}/user/tokens/verify",
headers=headers,
timeout=30 # 30 second timeout for credential verification
)
if response.status_code != 200:
raise RuntimeError(
f"Failed to verify Cloudflare credentials: {response.status_code} - {response.text}\n"
f"Please check that your CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID are correct."
)
🧰 Tools
🪛 Ruff (0.14.10)

148-148: Probable use of requests call without timeout

(S113)


151-154: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In @sebs/cloudflare/cloudflare.py around lines 148 - 154, The requests.get call
used to verify Cloudflare credentials lacks a timeout and can hang; update the
call to include a sensible timeout (e.g., timeout=10) or use a configurable
attribute like self._timeout, i.e., change
requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers) to
include timeout, and add handling for
requests.exceptions.Timeout/requests.exceptions.RequestException to raise a
clear RuntimeError if the request times out or fails; refer to the requests.get
invocation and the response/status_code check to locate where to add the timeout
and error handling.
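The failure mode and the fix can be demonstrated end-to-end against a deliberately slow local server (this sketch assumes the third-party requests package, which SeBS already depends on; the handler and URL are illustrative):

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

class SlowHandler(BaseHTTPRequestHandler):
    """Simulates an unresponsive API by sleeping before replying."""
    def do_GET(self):
        time.sleep(2)
        try:
            self.send_response(200)
            self.end_headers()
        except Exception:
            pass  # client may have given up already
    def log_message(self, *args):
        pass  # silence request logging

server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

try:
    requests.get(url, timeout=0.5)  # bounded wait instead of hanging forever
    outcome = "responded"
except requests.exceptions.Timeout:
    outcome = "timed out"
except requests.exceptions.RequestException as e:
    outcome = f"failed: {e}"
finally:
    server.shutdown()

print(outcome)  # timed out
```

Without the timeout argument, the same call would block until the server responded or the TCP connection died.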

Comment on lines +1253 to +1280
try:
headers = self._get_auth_headers()
url = f"{self._api_base_url}/accounts/{account_id}/workers/subdomain"
resp = requests.get(url, headers=headers)
if resp.status_code == 200:
body = resp.json()
sub = None
# result may contain 'subdomain' or nested structure
if isinstance(body, dict):
sub = body.get("result", {}).get("subdomain")

if sub:
self._workers_dev_subdomain = sub
return sub
else:
self.logging.warning(
"Could not find workers.dev subdomain in API response; "
"please enable the workers.dev subdomain in your Cloudflare dashboard."
)
return None
else:
self.logging.warning(
f"Failed to fetch workers.dev subdomain: {resp.status_code} - {resp.text}"
)
return None
except Exception as e:
self.logging.warning(f"Error fetching workers.dev subdomain: {e}")
return None

⚠️ Potential issue | 🟠 Major

Add timeout to requests call for reliability.

The requests.get() call on line 1256 lacks a timeout parameter, which can cause the application to hang indefinitely if the Cloudflare API is unresponsive. This affects reliability when fetching the workers.dev subdomain.

🔒 Proposed fix
         try:
             headers = self._get_auth_headers()
             url = f"{self._api_base_url}/accounts/{account_id}/workers/subdomain"
-            resp = requests.get(url, headers=headers)
+            resp = requests.get(url, headers=headers, timeout=30)
             if resp.status_code == 200:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
headers = self._get_auth_headers()
url = f"{self._api_base_url}/accounts/{account_id}/workers/subdomain"
resp = requests.get(url, headers=headers, timeout=30)
if resp.status_code == 200:
body = resp.json()
sub = None
# result may contain 'subdomain' or nested structure
if isinstance(body, dict):
sub = body.get("result", {}).get("subdomain")
if sub:
self._workers_dev_subdomain = sub
return sub
else:
self.logging.warning(
"Could not find workers.dev subdomain in API response; "
"please enable the workers.dev subdomain in your Cloudflare dashboard."
)
return None
else:
self.logging.warning(
f"Failed to fetch workers.dev subdomain: {resp.status_code} - {resp.text}"
)
return None
except Exception as e:
self.logging.warning(f"Error fetching workers.dev subdomain: {e}")
return None
🧰 Tools
🪛 Ruff (0.14.10)

1256-1256: Probable use of requests call without timeout

(S113)


1278-1278: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In @sebs/cloudflare/cloudflare.py around lines 1253 - 1280, The requests.get
call in the method that fetches the workers.dev subdomain should include a
timeout to avoid hanging; update the call to requests.get(url, headers=headers,
timeout=...) and choose a sensible timeout (e.g., 5-10s), and adjust the
surrounding exception handling to specifically catch requests.Timeout and
requests.RequestException where appropriate so the method still logs via
self.logging.warning and returns None on timeout or other request errors; refer
to the existing method using _get_auth_headers, self._workers_dev_subdomain, and
the current requests.get usage to locate and modify the call.
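The same pattern applied to the subdomain lookup, sketched as a standalone function. The name is hypothetical; the real code is a method that logs via `self.logging` and caches into `self._workers_dev_subdomain`. Here failures log a warning and return `None` rather than raising, matching the original best-effort behavior:

```python
import logging

import requests

def fetch_workers_dev_subdomain(api_base_url, account_id, headers, timeout=10.0):
    """Best-effort lookup of the workers.dev subdomain; logs and returns None on failure."""
    url = f"{api_base_url}/accounts/{account_id}/workers/subdomain"
    try:
        resp = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.Timeout:
        logging.warning("Timed out fetching workers.dev subdomain after %ss", timeout)
        return None
    except requests.exceptions.RequestException as exc:
        logging.warning("Error fetching workers.dev subdomain: %s", exc)
        return None
    if resp.status_code != 200:
        logging.warning("Failed to fetch workers.dev subdomain: %s", resp.status_code)
        return None
    body = resp.json()
    # result may contain 'subdomain' or a nested structure
    return body.get("result", {}).get("subdomain") if isinstance(body, dict) else None
```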

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In @benchmarks/wrappers/cloudflare/python/nosql.py:
- Line 192: The code is brittle because get_res is being massaged with
get_res.replace("'", "\"") before JSON parsing; stop doing that and parse the
data in its true format: ensure the producer stores JSON strings (fix the writer
to use json.dumps) or, if the KV contains Python literal reprs, replace the
fragile replace() with a safe parser (use json.loads(get_res) when data is valid
JSON, or ast.literal_eval(get_res) when Python literals are expected) and add
validation/error handling; remove the replace call and update the code that
reads/writes the KV so the stored format is unambiguous and parsed with the
proper function (reference the get_res variable and the surrounding KV
read/write logic).
- Line 91: Replace the hard assertion "assert len(keys) <= 100" with explicit
validation that raises a descriptive exception when the limit is exceeded: check
if len(keys) > 100 and raise a ValueError (or a custom exception) that includes
the actual len(keys) and the maximum allowed (100) so callers get a clear error
message instead of an abrupt assertion failure; update any docstring/comments
for the function in nosql.py that handles "keys" to reflect this explicit error
behavior.
- Line 184: Replace the hard assertion "assert len(keys) <= 100" with proper
error handling: detect when len(keys) > 100 and raise a descriptive exception
(e.g., ValueError) that includes the actual count and context (which
operation/function and parameter causing it). Apply the same pattern used to fix
nosql_do.query(): validate the keys collection early in the function that
constructs/uses the "keys" variable and raise a clear, actionable error instead
of asserting so the caller can handle it.
- Line 117: The class-level annotation for the singleton "instance" is
referencing the wrong name "nosql"; update the type hint to reference the
correct class "nosql_kv" (i.e., change the annotation on the variable instance:
Optional["nosql"] to Optional["nosql_kv"]) so the forward reference matches the
actual class name used in this module (look for the class definition named
nosql_kv and the variable instance).
- Around line 120-122: The init_instance function incorrectly constructs and
assigns nosql.instance using the nosql class; change it to instantiate and
assign nosql_kv instead (i.e., use nosql_kv() for creation and assign to
nosql.instance.env from the entry.env), ensuring init_instance references the
nosql_kv class for both the new instance and the instance property.
- Around line 204-207: get_instance currently constructs the wrong class by
calling nosql() (the nosql_do alias) instead of the KV implementation; change
the instantiation so that when nosql.instance is None you assign nosql.instance
= nosql_kv() (or the correct nosql_kv class/constructor) and return
nosql.instance, ensuring you're referencing the nosql_kv symbol used for the
key-value implementation rather than nosql().
- Line 88: Remove the leftover debug print by deleting the "print(key)" call in
benchmarks/wrappers/cloudflare/python/nosql.py; if key inspection is still
needed use the module's logger (e.g., logger.debug(...)) instead of printing to
stdout to avoid leaking debug output in production.
🧹 Nitpick comments (5)
benchmarks/wrappers/cloudflare/python/nosql.py (5)

51-51: Remove unused variable assignment.

The put_res variable is assigned but never used. Consider removing it or using it for error handling.

♻️ Proposed fix
     send_data = self.data_pre(data)
     k=self.key_maker(primary_key, secondary_key)
-    put_res = run_sync(self.get_table(table_name).put(k, send_data))
+    run_sync(self.get_table(table_name).put(k, send_data))
     return

94-94: Consider implementing bulk operations.

The TODO comment indicates that bulk operations would be preferred over individual gets. This could improve performance for queries returning multiple keys.

Would you like me to help investigate or implement bulk get operations for this query method?
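For the bulk-operation TODO, one option is to issue the per-key gets concurrently rather than awaiting them one by one. A minimal sketch with a fake table; the real `get_table(...).get(...)` binding is asynchronous as well, though in the Workers Pyodide runtime it is driven through `run_sync` rather than `asyncio.run`:

```python
import asyncio

async def bulk_get(table, keys):
    """Fetch all keys concurrently; results preserve the order of `keys`."""
    return await asyncio.gather(*(table.get(k) for k in keys))

class FakeTable:
    """Stand-in for a KV/Durable Object binding with an async get()."""
    def __init__(self, data):
        self._data = data

    async def get(self, key):
        return self._data.get(key)

table = FakeTable({"a": 1, "b": 2})
results = asyncio.run(bulk_get(table, ["a", "b", "missing"]))  # [1, 2, None]
```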


140-144: Remove unused variable assignment.

Similar to line 51, the put_res variable is assigned but never used.

♻️ Proposed fix
-    put_res = (
-        run_sync(self.get_table(table_name).put(
+    run_sync(self.get_table(table_name).put(
             self.key_maker(primary_key, secondary_key),
-            json.dumps(data))
-        ))
+            json.dumps(data)))
     return

154-158: Remove unused variable assignment.

The put_res variable is assigned but never used.

♻️ Proposed fix
-    put_res = run_sync(
-        self.get_table(table_name).put(
+    run_sync(
+        self.get_table(table_name).put(
             self.key_maker(primary_key, secondary_key),
-            json.dumps(data)
-        ))
+            json.dumps(data)))
     return

187-187: Consider implementing bulk operations.

Similar to the TODO in nosql_do.query(), bulk operations would improve performance.

Would you like me to help investigate or implement bulk get operations for this query method?

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8516207 and 8b7003b.

📒 Files selected for processing (1)
  • benchmarks/wrappers/cloudflare/python/nosql.py
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)
benchmarks/wrappers/cloudflare/python/container/nosql.py (8)
  • init_instance (18-22)
  • insert (53-66)
  • update (68-81)
  • get (83-92)
  • query (94-103)
  • delete (105-111)
  • get_instance (114-117)
  • nosql (11-117)
🪛 Ruff (0.14.10)
benchmarks/wrappers/cloudflare/python/nosql.py

23-23: Use explicit conversion flag

Replace with conversion flag

(RUF010)


26-26: Use explicit conversion flag

Replace with conversion flag

(RUF010)


38-38: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


40-40: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


51-51: Local variable put_res is assigned to but never used

Remove assignment to unused variable put_res

(F841)


125-125: Use explicit conversion flag

Replace with conversion flag

(RUF010)


128-128: Use explicit conversion flag

Replace with conversion flag

(RUF010)


140-140: Local variable put_res is assigned to but never used

Remove assignment to unused variable put_res

(F841)


154-154: Local variable put_res is assigned to but never used

Remove assignment to unused variable put_res

(F841)

🔇 Additional comments (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)

9-15: Previously reported critical issues have been fixed.

The class name mismatches in nosql_do that were flagged in the previous review have been correctly addressed. The type annotation now uses "nosql_do" and all instance references use nosql_do.instance.

keys = []
for key in list_res:
if key.startswith(prefix_key):
print(key)

⚠️ Potential issue | 🟡 Minor

Remove debug print statement.

This print statement appears to be leftover debug code and should be removed for production use.

🔧 Proposed fix
     for key in list_res:
         if key.startswith(prefix_key):
-            print(key)
             keys.append(key)
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 88, Remove the
leftover debug print by deleting the "print(key)" call in
benchmarks/wrappers/cloudflare/python/nosql.py; if key inspection is still
needed use the module's logger (e.g., logger.debug(...)) instead of printing to
stdout to avoid leaking debug output in production.

print(key)
keys.append(key)
##print("keys", keys)
assert len(keys) <= 100

⚠️ Potential issue | 🟠 Major

Replace hard assertion with proper error handling.

The hard assertion will crash the program if more than 100 keys are found. This should raise a descriptive exception instead to provide better error handling.

🛡️ Proposed fix
     ##print("keys", keys)
-    assert len(keys) <= 100
+    if len(keys) > 100:
+        raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 91, Replace the hard
assertion "assert len(keys) <= 100" with explicit validation that raises a
descriptive exception when the limit is exceeded: check if len(keys) > 100 and
raise a ValueError (or a custom exception) that includes the actual len(keys)
and the maximum allowed (100) so callers get a clear error message instead of an
abrupt assertion failure; update any docstring/comments for the function in
nosql.py that handles "keys" to reflect this explicit error behavior.

for key in list_res.keys:
keys.append(key.name)
##print("keys", keys)
assert len(keys) <= 100

⚠️ Potential issue | 🟠 Major

Replace hard assertion with proper error handling.

Same issue as in nosql_do.query(): the hard assertion will crash if more than 100 keys are found. Replace with a descriptive exception.

🛡️ Proposed fix
     ##print("keys", keys)
-    assert len(keys) <= 100
+    if len(keys) > 100:
+        raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")
📝 Committable suggestion


Suggested change
assert len(keys) <= 100
if len(keys) > 100:
raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 184, Replace the hard
assertion "assert len(keys) <= 100" with proper error handling: detect when
len(keys) > 100 and raise a descriptive exception (e.g., ValueError) that
includes the actual count and context (which operation/function and parameter
causing it). Apply the same pattern used to fix nosql_do.query(): validate the
keys collection early in the function that constructs/uses the "keys" variable
and raise a clear, actionable error instead of asserting so the caller can
handle it.
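The assertion replacement suggested for both query paths can be factored into one small helper; `validate_key_count` is a hypothetical name, not something in the wrapper today:

```python
def validate_key_count(keys, limit=100):
    """Raise a descriptive error instead of a bare assert when too many keys match."""
    if len(keys) > limit:
        raise ValueError(
            f"Query returned {len(keys)} keys, exceeding the {limit}-key limit"
        )
    return keys
```

Unlike `assert`, this check survives `python -O` (which strips assertions) and gives callers an exception type they can catch.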

for key in keys:

get_res = run_sync(self.get_table(table_name).get(key))
get_res = get_res.replace("\'", "\"")

⚠️ Potential issue | 🟠 Major

Fragile string manipulation for JSON parsing.

Replacing single quotes with double quotes to fix JSON parsing suggests the data is being stored or retrieved in an incorrect format. This approach is brittle and could fail if the data contains escaped quotes or quotes within string values.

Investigation needed

Consider the following scenarios where this will break:

  • Data containing: {"name": "O'Brien"} → becomes {"name": "O"Brien"} (invalid JSON)
  • Data containing: {"quote": "She said 'hello'"} → mismatched quotes

Root cause options:

  1. If the KV store is incorrectly storing Python repr() instead of JSON, fix the storage format
  2. If this is intentional, document why and add validation
  3. Consider using ast.literal_eval() if Python literals are expected
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/nosql.py at line 192, The code is
brittle because get_res is being massaged with get_res.replace("'", "\"") before
JSON parsing; stop doing that and parse the data in its true format: ensure the
producer stores JSON strings (fix the writer to use json.dumps) or, if the KV
contains Python literal reprs, replace the fragile replace() with a safe parser
(use json.loads(get_res) when data is valid JSON, or ast.literal_eval(get_res)
when Python literals are expected) and add validation/error handling; remove the
replace call and update the code that reads/writes the KV so the stored format
is unambiguous and parsed with the proper function (reference the get_res
variable and the surrounding KV read/write logic).
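A small demonstration of why the quote substitution is fragile, alongside the two safe alternatives named above (a JSON round-trip, or `ast.literal_eval` for legacy Python-repr values):

```python
import ast
import json

record = {"name": "O'Brien"}

# Fragile path: store the Python repr, then patch quotes before json.loads.
stored_repr = str(record)                 # -> {'name': "O'Brien"}
patched = stored_repr.replace("'", '"')   # the apostrophe corrupts the payload
try:
    json.loads(patched)
    patched_parses = True
except json.JSONDecodeError:
    patched_parses = False

# Safe path 1: write JSON, read JSON -- no string surgery needed.
assert json.loads(json.dumps(record)) == record

# Safe path 2: if legacy values really are Python reprs, parse them as such.
assert ast.literal_eval(stored_repr) == record
```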

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @benchmarks/wrappers/cloudflare/python/nosql.py:
- Around line 161-168: The get method currently returns the raw stored string,
violating its Optional[dict] contract; update the get function (involving get(),
key_maker(), get_table(), run_sync()) to parse the retrieved JSON string with
json.loads before returning, handle the case where the run_sync result is None
by returning None, and propagate or wrap JSON parsing errors appropriately so
callers receive a dict or None consistent with insert/update behavior.
🧹 Nitpick comments (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)

54-63: Clarify update vs. insert semantics.

The comment on line 54 questions whether update should have different behavior from insert. In typical NoSQL patterns, update might want to:

  • Merge with existing data rather than overwrite
  • Return an error if the key doesn't exist (vs. insert creating it)
  • Perform atomic updates or conditional writes

If this distinction matters for your use case, the methods should be differentiated; otherwise, consider removing the comment or explicitly documenting that they're intentionally identical.

Would you like me to suggest an implementation that differentiates insert and update operations?
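If the distinction is worth keeping, differentiated semantics could look like the in-memory sketch below (create-only insert, merge-style update). This is one possible contract, not what the wrapper currently implements:

```python
class KVTable:
    """In-memory sketch: insert creates, update merges into an existing record."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        if key in self._data:
            raise KeyError(f"insert: key {key!r} already exists")
        self._data[key] = dict(value)

    def update(self, key, changes):
        if key not in self._data:
            raise KeyError(f"update: key {key!r} does not exist")
        self._data[key].update(changes)  # merge rather than overwrite

    def get(self, key):
        return self._data.get(key)
```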

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b7003b and 5d9e36d.

📒 Files selected for processing (1)
  • benchmarks/wrappers/cloudflare/python/nosql.py
🧰 Additional context used
🪛 Ruff (0.14.10)
benchmarks/wrappers/cloudflare/python/nosql.py

23-23: Use explicit conversion flag

Replace with conversion flag

(RUF010)


26-26: Use explicit conversion flag

Replace with conversion flag

(RUF010)


38-38: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


40-40: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


51-51: Local variable put_res is assigned to but never used

Remove assignment to unused variable put_res

(F841)


125-125: Use explicit conversion flag

Replace with conversion flag

(RUF010)


128-128: Use explicit conversion flag

Replace with conversion flag

(RUF010)


140-140: Local variable put_res is assigned to but never used

Remove assignment to unused variable put_res

(F841)


154-154: Local variable put_res is assigned to but never used

Remove assignment to unused variable put_res

(F841)

🔇 Additional comments (1)
benchmarks/wrappers/cloudflare/python/nosql.py (1)

29-40: Verify pickle security implications.

Pickle deserialization can be unsafe with untrusted data. Ensure that the Durable Object storage is fully trusted and cannot be manipulated by untrusted sources. If there's any possibility of untrusted data entering this path, switch to JSON serialization instead.
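If the stored payloads are JSON-compatible dicts, the pickle pair can be swapped for JSON with no loss of functionality. The `data_pre`/`data_post` names mirror the serializer hooks visible in the diffs above, but this replacement is a suggestion, not the current implementation:

```python
import json

def data_pre(data: dict) -> str:
    """Serialize for storage; JSON is safe to deserialize even if the store is tampered with."""
    return json.dumps(data)

def data_post(raw: str) -> dict:
    """Deserialize a stored value; malformed input raises ValueError instead of executing code."""
    return json.loads(raw)

roundtrip = data_post(data_pre({"user": "alice", "count": 3}))
```

`json.JSONDecodeError` subclasses `ValueError`, so existing `except ValueError` handlers keep working.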

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @benchmarks/wrappers/cloudflare/python/storage.py:
- Around line 46-61: The bug is double-prefixing "/tmp" for already-absolute tmp
paths causing invalid paths; fix by (1) in download(bucket, key, filepath) add
the actual filesystem path (real_fp) to written_files instead of the requested
filepath (self.written_files.add(real_fp)) and (2) in upload(bucket, key,
filepath) when you detect filepath in written_files, only prepend "/tmp" if the
filepath does not already start with "/tmp" (e.g., if filepath in
self.written_files: filepath = filepath if filepath.startswith("/tmp") else
"/tmp" + os.path.abspath(filepath)). Ensure these changes use the existing
upload, download, and written_files symbols.
- Around line 101-109: In adownload_stream, get_res from
get_bucket(...).get(key) is an R2ObjectBody and does not have .bytes(); replace
the await get_res.bytes() call with await get_res.arrayBuffer() and convert the
returned ArrayBuffer to Python bytes by wrapping it in a JS Uint8Array (e.g.,
js.Uint8Array.new(array_buffer)) and then calling bytes(...) on that result so
the function returns proper binary data instead of breaking at runtime.
🧹 Nitpick comments (3)
benchmarks/wrappers/cloudflare/python/storage.py (3)

35-38: Unused bucket parameter in get_bucket

bucket is never used here, which triggers ARG002 and can confuse readers. If you don’t plan to support multiple bindings, either drop the parameter or rename it to _bucket to make the intent explicit.


75-96: Unused put_res from bobj.put

put_res is assigned but never used, triggering F841 and slightly obscuring intent. Since you only care about the side effect of the PUT (and exceptions will propagate), you can drop the variable:

♻️ Simplify `aupload_stream`
-        bobj = self.get_bucket(bucket)
-        put_res = await bobj.put(unique_key, data_js)
-        return unique_key
+        bobj = self.get_bucket(bucket)
+        await bobj.put(unique_key, data_js)
+        return unique_key

111-116: Duplicate return storage.instance is dead code

The second return storage.instance is unreachable and can be removed:

✂️ Remove dead code
     def get_instance():
         if storage.instance is None:
             raise RuntimeError("must init storage singleton first")
-        return storage.instance
-        return storage.instance
+        return storage.instance

The current exception message is short and clear; TRY003 from Ruff is mostly stylistic here, so I’d keep it as-is unless you prefer a custom exception type.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d9e36d and 8d4cc0f.

📒 Files selected for processing (1)
  • benchmarks/wrappers/cloudflare/python/storage.py
🧰 Additional context used
🪛 Ruff (0.14.10)
benchmarks/wrappers/cloudflare/python/storage.py

35-35: Unused method argument: bucket

(ARG002)


48-48: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


57-57: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


58-58: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


95-95: Local variable put_res is assigned to but never used

Remove assignment to unused variable put_res

(F841)


114-114: Avoid specifying long messages outside the exception class

(TRY003)

Comment on lines +46 to +61
def upload(self, bucket, key, filepath):
if filepath in self.written_files:
filepath = "/tmp" + os.path.abspath(filepath)
with open(filepath, "rb") as f:
unique_key = self.upload_stream(bucket, key, f.read())
return unique_key

def download(self, bucket, key, filepath):
data = self.download_stream(bucket, key)
# should only allow writes to tmp dir. so do have to edit the filepath here?
real_fp = filepath
if not filepath.startswith("/tmp"):
real_fp = "/tmp" + os.path.abspath(filepath)

self.written_files.add(filepath)
with open(real_fp, "wb") as f:

⚠️ Potential issue | 🟠 Major

/tmp prefixing can generate invalid paths for already-absolute tmp files

The interaction between upload, download, and written_files has a corner case:

  • download records the requested filepath in written_files.
  • upload rewrites any path found in written_files by prepending "/tmp" + os.path.abspath(filepath).

If a caller ever uses a path that already starts with /tmp (e.g., via download_directory(..., out_path="/tmp")), you’ll end up trying to open /tmp/tmp/..., which likely doesn’t exist.

You can retain the current behavior for non‑/tmp paths and avoid the double prefix with a small guard:

🔧 Suggested fix for `/tmp` double-prefix
     def upload(self, bucket, key, filepath):
-        if filepath in self.written_files:
-            filepath = "/tmp" + os.path.abspath(filepath)
+        if filepath in self.written_files and not filepath.startswith("/tmp"):
+            filepath = "/tmp" + os.path.abspath(filepath)

Note: Ruff’s S108 warning about /tmp is expected in Workers, since /tmp is the canonical writable directory; your os.path.abspath usage already mitigates simple traversal issues.

🧰 Tools
🪛 Ruff (0.14.10)

48-48: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


57-57: Probable insecure usage of temporary file or directory: "/tmp"

(S108)


58-58: Probable insecure usage of temporary file or directory: "/tmp"

(S108)

🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/storage.py around lines 46 - 61, The
bug is double-prefixing "/tmp" for already-absolute tmp paths causing invalid
paths; fix by (1) in download(bucket, key, filepath) add the actual filesystem
path (real_fp) to written_files instead of the requested filepath
(self.written_files.add(real_fp)) and (2) in upload(bucket, key, filepath) when
you detect filepath in written_files, only prepend "/tmp" if the filepath does
not already start with "/tmp" (e.g., if filepath in self.written_files: filepath
= filepath if filepath.startswith("/tmp") else "/tmp" +
os.path.abspath(filepath)). Ensure these changes use the existing upload,
download, and written_files symbols.
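The guard can be expressed as a single helper used by both `upload` and `download`, so the prefixing rule lives in one place (the helper name is hypothetical):

```python
import os

def to_tmp_path(filepath: str) -> str:
    """Map a path into /tmp exactly once, avoiding the /tmp/tmp/... double prefix."""
    if filepath.startswith("/tmp"):
        return filepath
    return "/tmp" + os.path.abspath(filepath)
```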

Comment on lines +101 to +109
async def adownload_stream(self, bucket, key):
bobj = self.get_bucket(bucket)
get_res = await bobj.get(key)
if get_res == jsnull:
print("key not stored in bucket")
return b''
# Always read as raw binary data (Blob/ArrayBuffer)
data = await get_res.bytes()
return bytes(data)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Cloudflare Workers R2 JavaScript API Bucket.get() return object methods bytes() arrayBuffer()

💡 Result:

Short answer: env.BUCKET.get(key) returns Promise<R2ObjectBody | R2Object | null>. When you get an R2ObjectBody it exposes a ReadableStream in .body plus convenience read helpers — arrayBuffer(), text(), json(), blob() and bodyUsed. There is no bytes() method on the R2ObjectBody; use arrayBuffer() (or blob()/text()) to read the bytes. [1][2]

Example:

  • Get object and read as ArrayBuffer:
    const obj = await env.MY_BUCKET.get("file");
    if (obj?.body) {
    const buf = await obj.arrayBuffer(); // ArrayBuffer of the object's bytes
    // buf.byteLength ...
    }

Sources: Cloudflare Workers R2 API reference (R2Object / R2ObjectBody) and Download objects docs. [1][2]


🏁 Script executed:

# First, understand the file structure and locate relevant files
git ls-files | grep -E "(cloudflare|R2|storage)" | head -20

Repository: spcl/serverless-benchmarks

Length of output: 1122


🏁 Script executed:

# Look at the storage.py file to understand the full context
cat -n benchmarks/wrappers/cloudflare/python/storage.py

Repository: spcl/serverless-benchmarks

Length of output: 4927


🏁 Script executed:

# Search for usage of 'bytes()' method or 'get_res' patterns in the codebase
rg "\.bytes\(\)" --type py -B 2 -A 2

Repository: spcl/serverless-benchmarks

Length of output: 915


🏁 Script executed:

# Search for R2 imports or references
rg -i "r2\|bucket\.get" --type py -B 2 -A 2

Repository: spcl/serverless-benchmarks

Length of output: 52


Fix: R2 get() returns R2ObjectBody which has no .bytes() method

The Cloudflare R2 API for Bucket.get() returns an R2ObjectBody that exposes arrayBuffer(), text(), json(), and blob(), but not bytes(). Line 108 will fail at runtime with "bytes is not a function".

Use arrayBuffer() instead:

Suggested fix
data = await get_res.arrayBuffer()
return bytes(js.Uint8Array.new(data))
🤖 Prompt for AI Agents
In @benchmarks/wrappers/cloudflare/python/storage.py around lines 101 - 109, In
adownload_stream, get_res from get_bucket(...).get(key) is an R2ObjectBody and
does not have .bytes(); replace the await get_res.bytes() call with await
get_res.arrayBuffer() and convert the returned ArrayBuffer to Python bytes by
wrapping it in a JS Uint8Array (e.g., js.Uint8Array.new(array_buffer)) and then
calling bytes(...) on that result so the function returns proper binary data
instead of breaking at runtime.

@@ -0,0 +1,474 @@
# Cloudflare Workers Implementation for SeBS

This directory contains the **complete implementation** of Cloudflare Workers support for the SeBS (Serverless Benchmarking Suite).
Collaborator


Can you please move it to docs?

Collaborator


I think we should merge with existing docs, possibly splitting the file - something to benchmark, something else to platforms, something else to storage, etc.

## Implementation Status

✅ **Fully Implemented** - All features are production-ready:
- Multi-language support (JavaScript, Python, Java, Go, Rust) via containers
Collaborator


I don't think we have Go and Rust support :-)

from sebs.faas.function import Trigger, ExecutionResult


class LibraryTrigger(Trigger):
Collaborator


I think you don't need this class at all; just don't implement this trigger type.

Contributor Author


removed.

COPY . .

# Create ffmpeg directory and symlink for video-processing benchmark compatibility
RUN mkdir -p /app/ffmpeg && ln -s /usr/bin/ffmpeg /app/ffmpeg/ffmpeg
Collaborator


This does not look correct - all of this should be handled by the benchmark itself.

Contributor Author


Without this symlink the benchmark 220 is looking for ffmpeg at /app and throws:

FileNotFoundError: [Errno 2] No such file or directory: '/app/ffmpeg/ffmpeg'\n'}

Contributor Author


ahh i just noticed the other platforms just run the init.sh file in the benchmark directory. this installs ffmpeg etc. at the right locations, so i updated our code to do the same.

# self.logging.info("Waiting 120 seconds for benchmark initialization...")
# time.sleep(400)

# For container deployments, wait for Durable Object infrastructure to initialize
Collaborator


Didn't we do it earlier?

Contributor Author


a safety net: provisioning of the container itself can sometimes take longer even though the /health endpoint already responds

raise RuntimeError(error_msg)

def _wait_for_durable_object_ready(self, worker_name: str, package_dir: str, env: dict):
"""Wait for container Durable Object to be fully provisioned and ready."""
Collaborator

I don't get this - it looks like we wait until at least one worker is ready and running but how is this related to Durable Object?

Contributor Author

The workers instantiate the Durable Object, so we need to wait for the worker to be ready and thus, transitively, for the Durable Object.

def _build_workers_dev_url(self, worker_name: str, account_id: Optional[str]) -> str:
"""Build a best-effort public URL for a worker.

Prefer using the account's readable workers.dev subdomain when available
Collaborator

When does it fail?

Contributor Author

Only when the user has not configured the account correctly.
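The URL-building logic under discussion can be sketched as a standalone helper. This is a minimal illustration, not the PR's actual implementation; the function name and the fallback-to-account-ID behavior are assumptions based on the docstring quoted above.

```python
from typing import Optional

def build_workers_dev_url(worker_name: str,
                          subdomain: Optional[str],
                          account_id: Optional[str]) -> str:
    """Best-effort public URL for a worker.

    Prefers the account's workers.dev subdomain when available; falls
    back to the account ID, which only resolves if it happens to match
    the subdomain.
    """
    label = subdomain or account_id
    if label is None:
        # The failure mode mentioned above: nothing configured at all.
        raise ValueError("No workers.dev subdomain or account ID configured")
    return f"https://{worker_name}.{label}.workers.dev"
```

With a configured subdomain, `build_workers_dev_url("bench", "myteam", None)` yields `https://bench.myteam.workers.dev`; with neither value set it raises, surfacing the misconfiguration early instead of producing a dead URL.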

@coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@benchmarks/wrappers/cloudflare/python/nosql.py`:
- Around line 107-111: The get_instance method on nosql_do currently constructs
a nosql_do() without ensuring the required binding attribute, which leads to
AttributeError if called before init_instance; update nosql_do.get_instance to
validate that an instance exists and that instance.binding is set (or otherwise
initialized) and if not, raise a clear RuntimeError instructing callers to call
nosql_do.init_instance(...) first (or provide the needed binding), or
alternatively call the existing init_instance flow if safe; reference the
nosql_do.get_instance and nosql_do.init_instance symbols and the
instance.binding attribute in your change.
- Around line 203-207: get_instance currently constructs a nosql_kv() without
ensuring the required env attribute is set; change get_instance (and mirror the
change made for nosql_do.get_instance) so it either calls init_instance to
create the singleton with env or validates and raises a clear error if env
hasn't been initialized; specifically update nosql_kv.get_instance to return the
existing instance if set, otherwise call nosql_kv.init_instance(...) (or require
an env parameter) to create the instance with env populated, or throw an
explicit exception telling callers to call init_instance first, so subsequent
methods on the singleton won't hit AttributeError due to a missing env.

In `@docs/platforms.md`:
- Around line 227-228: Update the workers.dev example URL in the HTTP Trigger
bullet: replace the placeholder https://{name}.{account}.workers.dev with
https://{name}.{subdomain}.workers.dev in the "- **HTTP Trigger**" line so the
docs use the correct {subdomain} token instead of {account}.
- Around line 236-244: Add blank lines before and after the markdown table that
begins with "| Instance Type | vCPU | Memory | Disk |" so the table is separated
from surrounding paragraphs (specifically the "Metrics Collection" paragraph
that follows); this satisfies markdownlint rule MD058—insert one empty line
above the table and one empty line below it in docs/platforms.md.

In `@sebs/cloudflare/cloudflare.py`:
- Around line 1451-1479: The digit-prefix rule must apply to all workers: in
format_function_name, after computing formatted (and stripping hyphens), check
if formatted is non-empty and starts with a digit; if so, prefix it
appropriately — use 'container-' when container_deployment is True and 'w-' (or
another agreed native-worker prefix) when container_deployment is False —
instead of only handling the container_deployment branch; update the conditional
that currently checks "if container_deployment and formatted and
formatted[0].isdigit()" to always enforce the digit-prefix rule based on
formatted and container_deployment.
♻️ Duplicate comments (9)
benchmarks/wrappers/cloudflare/python/nosql.py (3)

86-91: Remove debug print and replace assertion with proper error handling.

Line 88 contains a debug print statement that should be removed. Line 91 uses a hard assert that will crash with an unhelpful AssertionError if more than 100 keys are returned.

🔧 Proposed fix
         keys = []
         for key in list_res:
             if key.startswith(prefix_key):
-                print(key)
                 keys.append(key)
         ##print("keys", keys)
-        assert len(keys) <= 100
+        if len(keys) > 100:
+            raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")

161-168: get method returns raw string instead of parsed dict.

The method signature declares -> Optional[dict], but it returns the raw string from KV storage. Since insert and update store data with json.dumps(), this method should parse the result with json.loads() for consistency.

🔧 Proposed fix
     def get(
         self, table_name: str, primary_key: Tuple[str, str], secondary_key: Tuple[str, str]
     ) -> Optional[dict]:
         get_res = run_sync(
             self.get_table(table_name).get(
                 self.key_maker(primary_key, secondary_key)
             ))
-        return get_res
+        if get_res is None:
+            return None
+        return json.loads(get_res)

180-196: Replace assertion and fix fragile JSON parsing.

Line 184 has the same hard assertion issue. Line 192 uses brittle string replacement (swapping `'` for `"`) which will corrupt data containing apostrophes (e.g., {"name": "O'Brien"}).

🔧 Proposed fix

If the get method is fixed to return json.loads(get_res), this query method can be simplified:

         keys = []
         for key in list_res.keys:
             keys.append(key.name)
         ##print("keys", keys)
-        assert len(keys) <= 100
+        if len(keys) > 100:
+            raise ValueError(f"Query returned {len(keys)} keys, exceeding the 100 key limit")


         # todo: please use bulk sometime (it didn't work when i tried it)
         res = []
         for key in keys:
             
             get_res = run_sync(self.get_table(table_name).get(key))
-            get_res = get_res.replace("\'", "\"")
-            ##print("gr", get_res)
-        
-            res.append(json.loads(get_res))
+            if get_res is not None:
+                res.append(json.loads(get_res))
         return res

The root cause of needing quote replacement suggests data may be stored incorrectly elsewhere. Verify that all writers use json.dumps() consistently.
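The suspected root cause can be demonstrated in isolation: if a writer stores a dict via `str(...)`, the result uses Python repr quoting, which `json.loads` rejects, and the naive quote swap then corrupts apostrophes. This is a standalone sketch of the failure mode, not code from the PR.

```python
import json

record = {"name": "O'Brien", "age": 41}

# Storing the Python repr produces single-quoted keys, which is not JSON:
as_repr = str(record)
try:
    json.loads(as_repr)
except json.JSONDecodeError:
    print("repr is not valid JSON")

# The naive quote swap then breaks any value containing an apostrophe:
mangled = as_repr.replace("'", '"')   # -> {"name": "O"Brien", ...}
try:
    json.loads(mangled)
except json.JSONDecodeError:
    print("quote replacement corrupted the apostrophe")

# Writing with json.dumps round-trips cleanly and needs no replacement:
assert json.loads(json.dumps(record)) == record
```

If all writers consistently use `json.dumps()`, the `replace` hack in `query` becomes unnecessary.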

sebs/cloudflare/cloudflare.py (6)

73-83: Avoid mutable default & unused config parameter.
The default {} is unsafe and config is unused. Prefer None and normalize, or drop the parameter entirely.

🔧 Proposed fix
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
@@
-        # Verify credentials are valid
+        if config is None:
+            config = {}
+        # Verify credentials are valid
         self._verify_credentials()
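Why the mutable default matters can be shown with a self-contained example (hypothetical function names; not the PR's code): the default dict is created once at definition time, so state leaks between calls.

```python
def initialize_bad(config={}):
    # The same dict object is reused across every call without an argument
    config.setdefault("calls", 0)
    config["calls"] += 1
    return config

first = initialize_bad()
second = initialize_bad()
assert second["calls"] == 2   # state leaked from the first call
assert first is second        # both names refer to the single default dict

def initialize_good(config=None):
    if config is None:
        config = {}           # a fresh dict on every call
    config.setdefault("calls", 0)
    config["calls"] += 1
    return config

assert initialize_good()["calls"] == 1
assert initialize_good()["calls"] == 1
```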

148-154: Add timeout + request error handling for credential verification.
requests.get() without a timeout can hang indefinitely; handle request errors explicitly.

🔧 Proposed fix
-        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+        try:
+            response = requests.get(
+                f"{self._api_base_url}/user/tokens/verify",
+                headers=headers,
+                timeout=30,
+            )
+        except requests.exceptions.RequestException as e:
+            raise RuntimeError("Failed to verify Cloudflare credentials") from e

841-849: Bind loop variables in replacer to avoid stale closure.
The nested function captures control_flow/builtins from the loop scope; bind them explicitly.

🐛 Proposed fix
-                        def replacer(match):
+                        def replacer(match, control_flow=control_flow, builtins=builtins):
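The stale-closure hazard behind this fix is Python's late binding: closures capture variables, not values, so every function created in a loop sees the loop variable's final value. Binding through a default argument freezes the value at definition time, which is what the one-line fix above does. A minimal demonstration:

```python
patterns = ["for", "while"]

# Late binding: both lambdas read kw *when called*, after the loop ended.
late = []
for kw in patterns:
    late.append(lambda m: f"<{kw}>")
assert [f(None) for f in late] == ["<while>", "<while>"]

# Default-argument binding: each lambda freezes the current value of kw.
bound = []
for kw in patterns:
    bound.append(lambda m, kw=kw: f"<{kw}>")
assert [f(None) for f in bound] == ["<for>", "<while>"]
```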

1088-1094: Don’t create LibraryTriggers when the platform doesn’t support them.
Docs state Library triggers aren’t supported, but they’re still created and returned. This is likely to surface as unsupported trigger usage at runtime.

🔧 Proposed fix
-        from sebs.cloudflare.triggers import LibraryTrigger, HTTPTrigger
+        from sebs.cloudflare.triggers import HTTPTrigger
@@
-        library_trigger = LibraryTrigger(func_name, self)
-        library_trigger.logging_handlers = self.logging_handlers
-        worker.add_trigger(library_trigger)
+        # Library triggers are not supported for Cloudflare Workers
@@
-        from sebs.cloudflare.triggers import LibraryTrigger, HTTPTrigger
+        from sebs.cloudflare.triggers import HTTPTrigger
@@
-        if trigger_type == Trigger.TriggerType.LIBRARY:
-            trigger = LibraryTrigger(worker.name, self)
-            trigger.logging_handlers = self.logging_handlers
-            return trigger
-        elif trigger_type == Trigger.TriggerType.HTTP:
+        if trigger_type == Trigger.TriggerType.HTTP:
             account_id = worker.account_id or self.config.credentials.account_id
             worker_url = self._build_workers_dev_url(worker.name, account_id)
             trigger = HTTPTrigger(worker.name, worker_url)
             trigger.logging_handlers = self.logging_handlers
             return trigger
+        elif trigger_type == Trigger.TriggerType.LIBRARY:
+            raise NotImplementedError("Library triggers are not supported for Cloudflare Workers")

Also applies to: 1622-1635


1109-1115: Add timeout + specific JSON error handling in _get_worker.

🔧 Proposed fix
-        response = requests.get(url, headers=headers)
+        try:
+            response = requests.get(url, headers=headers, timeout=30)
+        except requests.exceptions.RequestException as e:
+            self.logging.warning(f"Failed to query worker {worker_name}: {e}")
+            return None
@@
-            try:
-                return response.json().get("result")
-            except:
-                return None
+            try:
+                return response.json().get("result")
+            except (json.JSONDecodeError, ValueError, KeyError):
+                return None

1296-1322: Add timeout + narrow exception handling in _get_workers_dev_subdomain.

🔧 Proposed fix
-            resp = requests.get(url, headers=headers)
+            resp = requests.get(url, headers=headers, timeout=30)
@@
-        except Exception as e:
+        except requests.exceptions.RequestException as e:
             self.logging.warning(f"Error fetching workers.dev subdomain: {e}")
             return None
🧹 Nitpick comments (2)
benchmarks/wrappers/cloudflare/python/nosql.py (1)

28-40: Consider security implications of pickle serialization.

Using pickle for serialization carries inherent risks (arbitrary code execution during deserialization). While this appears to be internal Worker-to-Durable-Object communication where data is trusted, consider using JSON serialization for defense-in-depth, similar to how nosql_kv handles data.

♻️ Suggested alternative using JSON
-## these data conversion funcs should not be necessary. i couldn't get pyodide to clone the data otherwise
-    def data_pre(self, data):
-        return pickle.dumps(data, 0).decode("ascii")
-
-    def data_post(self, data):
-        # Handle None (key not found in storage)
-        if data is None:
-            return None
-        # Handle both string and bytes data from Durable Object storage
-        if isinstance(data, str):
-            return pickle.loads(bytes(data, "ascii"))
-        else:
-            return pickle.loads(data)
+    def data_pre(self, data):
+        return json.dumps(data)
+
+    def data_post(self, data):
+        if data is None:
+            return None
+        return json.loads(data)

If pickle is required due to Pyodide cloning limitations as the comment suggests, document this constraint more explicitly and ensure data sources are always trusted.
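The trade-off between the two serialization strategies can be seen in a small round-trip sketch (standalone illustration, not the wrapper's code): protocol-0 pickle is pure ASCII, so it survives string transport as `data_pre`/`data_post` rely on, while JSON offers the same round-trip for JSON-serializable payloads without deserialization-time code execution risk.

```python
import json
import pickle

value = {"key": [1, 2, 3], "nested": {"ok": True}}

# Current approach: protocol-0 pickle output is printable ASCII, so it
# can be shipped as a string and restored on the other side.
wire = pickle.dumps(value, 0).decode("ascii")
assert pickle.loads(wire.encode("ascii")) == value

# Safer alternative for JSON-serializable payloads: no arbitrary-code
# risk when deserializing untrusted input.
assert json.loads(json.dumps(value)) == value
```

JSON only covers JSON-native types (no tuples, sets, or custom classes), which may be the Pyodide-related constraint the comment alludes to.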

sebs/cloudflare/cloudflare.py (1)

1189-1204: Avoid double-waiting after container deployment.
_wait_for_durable_object_ready() already waits; the extra 60s sleep is redundant unless readiness failed.

♻️ Proposed refactor
-            if container_deployment:
-                self.logging.info("Waiting for container Durable Object to initialize...")
-                self._wait_for_durable_object_ready(worker_name, package_dir, env)
+            if container_deployment:
+                self.logging.info("Waiting for container Durable Object to initialize...")
+                ready = self._wait_for_durable_object_ready(worker_name, package_dir, env)
@@
-            if container_deployment:
-                self.logging.info("Waiting 60 seconds for container Durable Object to initialize...")
-                time.sleep(60)
+            if container_deployment and not ready:
+                self.logging.info("Waiting 60 seconds for container Durable Object to initialize...")
+                time.sleep(60)

Comment on lines +107 to +111
@staticmethod
def get_instance():
if nosql_do.instance is None:
nosql_do.instance = nosql_do()
return nosql_do.instance
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

get_instance creates instance without required binding attribute.

If get_instance() is called before init_instance(), the created instance will lack the binding attribute, causing AttributeError on subsequent operations. Consider raising an explicit error or initializing defensively.

🔧 Proposed fix
     `@staticmethod`
     def get_instance():
         if nosql_do.instance is None:
-            nosql_do.instance = nosql_do()
+            raise RuntimeError("nosql_do not initialized. Call init_instance() first.")
         return nosql_do.instance
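The fail-fast singleton pattern proposed above can be exercised in isolation. This is a generic sketch with a hypothetical class name, not the wrapper's actual code:

```python
class NoSQLClient:
    """Singleton whose required state is injected once at startup."""

    _instance = None

    def __init__(self, binding):
        self.binding = binding

    @classmethod
    def init_instance(cls, binding):
        if cls._instance is None:
            cls._instance = cls(binding)
        return cls._instance

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            # Fail fast instead of returning a half-initialized object
            # that raises AttributeError later.
            raise RuntimeError("NoSQLClient not initialized. Call init_instance() first.")
        return cls._instance

try:
    NoSQLClient.get_instance()
except RuntimeError as e:
    print("expected:", e)

client = NoSQLClient.init_instance(binding="DO_BINDING")
assert NoSQLClient.get_instance() is client
assert client.binding == "DO_BINDING"
```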

Comment on lines +203 to +207
@staticmethod
def get_instance():
if nosql_kv.instance is None:
nosql_kv.instance = nosql_kv()
return nosql_kv.instance

⚠️ Potential issue | 🟠 Major

get_instance creates instance without required env attribute.

Same issue as nosql_do.get_instance(): if called before init_instance(), the instance will lack the env attribute, causing AttributeError on subsequent operations.

🔧 Proposed fix
     `@staticmethod`
     def get_instance():
         if nosql_kv.instance is None:
-            nosql_kv.instance = nosql_kv()
+            raise RuntimeError("nosql_kv not initialized. Call init_instance() first.")
         return nosql_kv.instance

Comment on lines +227 to +228
- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
- **Library Trigger**: ❌ Not currently supported

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Cloudflare Workers workers.dev URL format documentation

💡 Result:

Format: <YOUR_WORKER_NAME>.<YOUR_ACCOUNT_SUBDOMAIN>.workers.dev. Workers.dev account subdomain itself is <YOUR_ACCOUNT_SUBDOMAIN>.workers.dev. Worker-name restrictions when using workers.dev: ≤63 characters, only letters/numbers/dashes, and cannot start or end with a dash. [1][2]

Preview/versioned preview URL pattern: <VERSION_PREFIX>-<WORKER_NAME>.<YOUR_ACCOUNT_SUBDOMAIN>.workers.dev. [3]

Sources: Cloudflare Workers docs. [1] [2] [3]


Replace {account} placeholder with {subdomain} in workers.dev URL.
The workers.dev URL format uses an account subdomain, not an account ID. The placeholder should reflect this to avoid confusion when users configure Workers.

✏️ Documentation fix
-- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
+- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{subdomain}.workers.dev`

Comment on lines +1451 to +1479
@staticmethod
def format_function_name(name: str, container_deployment: bool = False) -> str:
"""
Format a function name to comply with Cloudflare Worker naming rules.

Worker names must:
- Be lowercase
- Contain only alphanumeric characters and hyphens
- Not start or end with a hyphen
- Not start with a digit

Args:
name: The original name
container_deployment: Whether this is a container worker (adds 'w-' prefix if name starts with digit)

Returns:
Formatted name
"""
# Convert to lowercase and replace invalid characters
formatted = name.lower().replace('_', '-').replace('.', '-')
# Remove any characters that aren't alphanumeric or hyphen
formatted = ''.join(c for c in formatted if c.isalnum() or c == '-')
# Remove leading/trailing hyphens
formatted = formatted.strip('-')
# Ensure container worker names don't start with a digit (Cloudflare requirement)
# Only add prefix for container workers to differentiate from native workers
if container_deployment and formatted and formatted[0].isdigit():
formatted = 'container-' + formatted
return formatted

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && git ls-files | head -20

Repository: spcl/serverless-benchmarks

Length of output: 138


🏁 Script executed:

cat sebs/cloudflare/cloudflare.py | wc -l

Repository: spcl/serverless-benchmarks

Length of output: 74


🏁 Script executed:

sed -n '1451,1479p' sebs/cloudflare/cloudflare.py

Repository: spcl/serverless-benchmarks

Length of output: 1348


🏁 Script executed:

rg "format_function_name" --type py -B 2 -A 3

Repository: spcl/serverless-benchmarks

Length of output: 2585


🌐 Web query:

Cloudflare Workers script naming rules requirements

💡 Result:

Summary — Worker script name rules

  • Allowed characters: letters (A–Z, a–z), digits (0–9) and dashes (-). Do not use underscores. [1][2]
  • Maximum length: up to 255 characters for a Worker name. [1]
  • If you use a workers.dev subdomain (workers.dev URL / DNS label): name must be ≤ 63 characters and cannot start or end with a dash. [2]
  • The name is required in wrangler config (name property / script_name used in API/URLs). [1][3]

Sources: Cloudflare Workers / Wrangler documentation. [1] Wrangler configuration (name rules). [2] workers.dev routing limits. [3] Cloudflare Workers Scripts API reference.


Apply the digit-prefix rule to all workers, not just container deployments.
The function's docstring documents that worker names must not start with a digit, but the current implementation only enforces this when container_deployment=True. Benchmark names like 311.compression will still produce invalid worker names for native deployments.

🔧 Suggested fix
-        if container_deployment and formatted and formatted[0].isdigit():
-            formatted = 'container-' + formatted
+        if formatted and formatted[0].isdigit():
+            prefix = "container-" if container_deployment else "w-"
+            formatted = prefix + formatted
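Under the reviewer's suggestion, the corrected function looks like the sketch below (the `w-` prefix for native workers is the suggestion's own proposal, not settled behavior):

```python
def format_function_name(name: str, container_deployment: bool = False) -> str:
    """Normalize a name to Cloudflare Worker rules: lowercase,
    alphanumerics and hyphens only, no leading/trailing hyphen,
    and no leading digit for any deployment mode."""
    formatted = name.lower().replace("_", "-").replace(".", "-")
    formatted = "".join(c for c in formatted if c.isalnum() or c == "-")
    formatted = formatted.strip("-")
    if formatted and formatted[0].isdigit():
        # Apply the digit prefix to both modes, keeping the two
        # namespaces distinct so native and container workers
        # for the same benchmark do not collide.
        prefix = "container-" if container_deployment else "w-"
        formatted = prefix + formatted
    return formatted

assert format_function_name("311.compression") == "w-311-compression"
assert format_function_name("311.compression", container_deployment=True) == "container-311-compression"
assert format_function_name("My_Bench.v2") == "my-bench-v2"
```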

@coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@sebs/cloudflare/cli.py`:
- Around line 103-110: The readiness loop in cli.py (where
docker_instance.logs(stream=True, follow=True) is iterated via dkg and
next(dkg).decode(...)) can block forever; add a timeout (e.g., max_wait_seconds)
and track elapsed time (using time.time()) inside the while True to break and
raise/return after the timeout, and also sleep briefly on StopIteration to avoid
busy looping; ensure you update the block that assigns dkg and calls next(dkg)
so it stops with a clear error/exception when the timeout elapses instead of
hanging.

In `@sebs/cloudflare/cloudflare.py`:
- Around line 354-365: The requests.get call should include a timeout and the
bare except should be replaced with specific exception handling: call
requests.get(url, headers=headers, timeout=<reasonable_seconds>) and catch
JSONDecodeError (or ValueError) when calling response.json() instead of using a
bare except; also consider catching requests.exceptions.RequestException around
the requests.get call and log errors via self.logging.warning including
response.status_code or exception details to aid debugging (refer to
requests.get, response.json(), and self.logging.warning in the surrounding
method).

In `@sebs/cloudflare/containers.py`:
- Around line 356-362: Update the exception handling and tar extraction call in
the tarfile.open block: preserve the original exception by re-raising
RuntimeError using "from e" (i.e., raise RuntimeError(...) from e) so the
exception chain is kept, and when calling tar.extractall(directory) pass the
recommended filter parameter (e.g., filter='data' or an equivalent safe filter)
to harden extraction; locate the tarfile.open(...) block and the except
Exception as e handling that uses self.logging.error/f"npm install failed: {e}"
and change the extractall call and the raise accordingly.
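The bounded readiness wait requested for `cli.py` can be sketched as follows. The function and the "Ready on" marker are assumptions for illustration (the marker mimics `wrangler dev` output); the real loop iterates Docker's streaming logs:

```python
import time

def wait_for_ready(log_lines, marker: str = "Ready on",
                   max_wait_seconds: float = 60.0) -> bool:
    """Scan a log-line iterator for a readiness marker, giving up
    after max_wait_seconds instead of looping forever."""
    deadline = time.monotonic() + max_wait_seconds
    for raw in log_lines:
        line = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
        if marker in line:
            return True
        if time.monotonic() > deadline:
            raise TimeoutError(f"'{marker}' not seen within {max_wait_seconds}s")
    return False  # stream ended without the marker

logs = iter([b"starting wrangler dev...\n", b"Ready on http://localhost:8787\n"])
assert wait_for_ready(logs, max_wait_seconds=5) is True
```

Note the caveat: the deadline is only checked between lines, so a `next()` that blocks indefinitely on a silent stream still hangs; a fully robust version needs a non-blocking read or a watchdog thread.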
♻️ Duplicate comments (6)
sebs/cloudflare/cloudflare.py (4)

80-90: Fix mutable default argument.

The config parameter uses a mutable default {} which can cause unexpected behavior if the same dict instance is reused across calls. Additionally, the parameter is unused in the method body.

♻️ Proposed fix
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
         """
         Initialize the Cloudflare Workers platform.
 
         Args:
             config: Additional configuration parameters
             resource_prefix: Prefix for resource naming
         """
+        if config is None:
+            config = {}
         # Verify credentials are valid
         self._verify_credentials()

155-161: Add timeout to requests call.

The requests.get() call lacks a timeout parameter, which can cause the application to hang indefinitely if the Cloudflare API is unresponsive.

🔒 Proposed fix
-        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+        response = requests.get(
+            f"{self._api_base_url}/user/tokens/verify",
+            headers=headers,
+            timeout=30
+        )

455-482: Add timeout to requests call.

The requests.get() call on line 458 lacks a timeout parameter, which can cause the application to hang indefinitely when fetching the workers.dev subdomain.

🔒 Proposed fix
-            resp = requests.get(url, headers=headers)
+            resp = requests.get(url, headers=headers, timeout=30)

610-638: Apply digit-prefix rule to all workers, not just containers.

The docstring correctly states worker names must not start with a digit, but the implementation only enforces this for container_deployment=True. Benchmark names like 311.compression will produce invalid worker names for native deployments.

🐛 Proposed fix
-        if container_deployment and formatted and formatted[0].isdigit():
-            formatted = 'container-' + formatted
+        if formatted and formatted[0].isdigit():
+            prefix = "container-" if container_deployment else "w-"
+            formatted = prefix + formatted
docs/platforms.md (2)

229-230: Replace {account} with {subdomain} in workers.dev URL.

The workers.dev URL format uses an account subdomain, not an account ID. The placeholder should be {subdomain} to match Cloudflare's actual URL structure.

✏️ Proposed fix
-- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
+- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{subdomain}.workers.dev`

236-246: Add blank lines around the table for markdown compliance.

The table should be surrounded by blank lines to satisfy markdownlint rule MD058.

✏️ Proposed fix
 - **Memory/Timeout Configuration (Containers)**: Managed by Cloudflare, available in different tiers:
 
+
   | Instance Type | vCPU | Memory | Disk |
   |---------------|------|--------|------|
   | lite | 1/16 | 256 MiB | 2 GB |
   | basic | 1/4 | 1 GiB | 4 GB |
   | standard-1 | 1/2 | 4 GiB | 8 GB |
   | standard-2 | 1 | 6 GiB | 12 GB |
   | standard-3 | 2 | 8 GiB | 16 GB |
   | standard-4 | 4 | 12 GiB | 20 GB |
+
 - **Metrics Collection**: ...
🧹 Nitpick comments (20)
sebs/cloudflare/durable_objects.py (2)

143-172: Consider logging a warning when data seeding is skipped.

The write_to_table method silently skips data seeding due to platform limitations. Per a past reviewer comment, this silent skip could create a false impression that data was written. Consider adding a warning log to inform users that the operation was not performed.

♻️ Proposed fix
         table_name = self._get_table_name(benchmark, table)
         
         if not table_name:
             raise ValueError(f"Table {table} not found for benchmark {benchmark}")
         
-        # Silently skip data seeding for Cloudflare Durable Objects
-        # This is a platform limitation
-        pass
+        # Data seeding is not supported for Cloudflare Durable Objects
+        # This is a platform limitation - Durable Objects can only be written to from within Workers
+        self.logging.warning(
+            f"Skipping data seeding for table '{table}' - Cloudflare Durable Objects "
+            f"cannot be pre-populated via external API. Use 'large' benchmark size "
+            f"which creates data at runtime."
+        )

183-183: Remove extraneous f prefix from static string.

This f-string has no placeholders, making the f prefix unnecessary.

♻️ Proposed fix
-        self.logging.warning(f"Durable Objects data is managed within the Worker")
+        self.logging.warning("Durable Objects data is managed within the Worker")
sebs/cloudflare/cloudflare.py (2)

438-441: Preserve exception chain with raise ... from e.

When re-raising exceptions, use raise ... from e to preserve the exception chain for better debugging.

♻️ Proposed fix
         except RuntimeError as e:
             error_msg = f"Wrangler deployment failed for worker {worker_name}: {str(e)}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e

431-434: Consider removing or making the hardcoded 60-second sleep configurable.

After waiting for the Durable Object to report ready via the health endpoint, there's an additional unconditional 60-second sleep. This significantly increases deployment time. If this is truly needed as a safety net, consider making it configurable or reducing it based on observed provisioning times.

sebs/cloudflare/containers.py (4)

458-465: Preserve exception chain and use logging.exception.

When re-raising exceptions, use raise ... from e to preserve the exception chain. Also, logging.exception automatically includes traceback information.

♻️ Proposed fix
         except docker.errors.BuildError as e:
             error_msg = f"Docker build failed for {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e
         except Exception as e:
             error_msg = f"Unexpected error building Docker image {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e

489-490: Remove unused variables.

consecutive_failures and max_consecutive_failures are defined but never used. Remove them or implement the intended retry logic.

♻️ Proposed fix - remove unused variables
-        consecutive_failures = 0
-        max_consecutive_failures = 5
-        
         while time.time() - start_time < max_wait_seconds:

252-252: Remove extraneous f prefix from static string.

This f-string has no placeholders.

♻️ Proposed fix
-            self.logging.info(f"Copied worker.js orchestration file from nodejs/container")
+            self.logging.info("Copied worker.js orchestration file from nodejs/container")

351-355: Prefix unused variable with underscore.

The stat variable from get_archive is unused.

♻️ Proposed fix
-            bits, stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
+            bits, _stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
dockerfiles/cloudflare/Dockerfile.manage (1)

24-25: Consider pinning uv version for reproducible builds.

The uv installer script fetches the latest version by default, which could lead to build inconsistencies over time. Pinning to a specific version improves reproducibility.

♻️ Example with version pinning
-RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
+RUN curl -LsSf https://astral.sh/uv/0.9.26/install.sh | sh && \
     /root/.local/bin/uv tool install workers-py
sebs/cloudflare/cli.py (4)

51-79: Add exception chaining for better debugging.

When re-raising exceptions within except blocks, chain them to preserve the original traceback. This helps with debugging by showing the full error context.

Proposed fix
                 if not os.path.exists(dockerfile_path):
                     raise RuntimeError(
                         f"Dockerfile not found at {dockerfile_path}. "
                         "Cannot build Cloudflare CLI container."
-                    )
+                    ) from None
                 
                 # ... build logic ...
                 
                 except docker.errors.BuildError as build_error:
                     raise RuntimeError(
                         f"Failed to build Docker image {full_image_name}: {build_error}"
-                    )
+                    ) from build_error

28-40: Inconsistent logging usage.

The code mixes module-level logging (lines 30, 34, 39, 59, 75) with self.logging (line 101). Since the class inherits from LoggingBase, prefer using self.logging consistently for structured logging.

Also applies to: 101-101


116-116: Use explicit Optional type hint.

PEP 484 prohibits implicit Optional. Update the type hint to be explicit.

Proposed fix
-    def execute(self, cmd: str, env: dict = None):
+    def execute(self, cmd: str, env: dict | None = None):

Apply the same fix to wrangler_deploy (line 187) and pywrangler_deploy (line 202).


231-244: Consider caching for faster Docker builds.

The --no-cache flag ensures reproducible builds but significantly increases build time. Consider making this configurable or using --no-cache only for production deployments.
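One way to make this configurable (a sketch; `production` is a hypothetical flag, and `nocache` is the docker-py keyword corresponding to `--no-cache`):

```python
def docker_build_kwargs(production: bool = False) -> dict:
    """Assemble kwargs for docker-py's images.build().

    Only disable the layer cache for production deployments, so that
    iterative local builds stay fast while releases remain reproducible.
    """
    kwargs = {"rm": True}
    if production:
        kwargs["nocache"] = True
    return kwargs
```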

sebs/cloudflare/workers.py (7)

8-12: Unused import: json.

The json module is imported but not used in this file.

Proposed fix
 import os
 import shutil
-import json
 import io
 import tarfile

62-65: Unused parameter container_uri.

This parameter is declared but never used. If it's a placeholder for future functionality, consider documenting it or removing it until needed.

Options

Either remove the unused parameter:

     def generate_wrangler_toml(
         self,
         worker_name: str,
         package_dir: str,
         language: str,
         account_id: str,
         benchmark_name: Optional[str] = None,
         code_package: Optional[Benchmark] = None,
-        container_uri: str = "",
     ) -> str:

Or add a TODO comment explaining its intended future use.


142-149: Fragile TOML writing fallback.

Catching TypeError to distinguish between tomli_w.dump() (binary) and toml.dumps() (text) is brittle. A TypeError could occur for other reasons, masking real bugs.

Proposed fix using explicit check
         # Write wrangler.toml to package directory
         toml_path = os.path.join(package_dir, "wrangler.toml")
-        try:
-            # Try tomli_w (writes binary)
-            with open(toml_path, 'wb') as f:
-                tomli_w.dump(config, f)
-        except TypeError:
-            # Fallback to toml library (writes text)
-            with open(toml_path, 'w') as f:
-                f.write(tomli_w.dumps(config))
+        if hasattr(tomli_w, 'dump'):
+            # tomli_w writes binary
+            with open(toml_path, 'wb') as f:
+                tomli_w.dump(config, f)
+        else:
+            # toml library writes text via dumps()
+            with open(toml_path, 'w') as f:
+                f.write(tomli_w.dumps(config))

213-215: Improve exception handling with chaining and proper logging.

Use exception chaining (from e) and logging.exception() for better debugging context.

Proposed fix
                 except Exception as e:
-                    self.logging.error(f"npm install in container failed: {e}")
-                    raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+                    self.logging.exception(f"npm install in container failed: {e}")
+                    raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e

Apply the same pattern at lines 240-242:

                     except Exception as e:
-                        self.logging.error(f"Failed to install esbuild: {e}")
-                        raise RuntimeError(f"Failed to install esbuild: {e}")
+                        self.logging.exception(f"Failed to install esbuild: {e}")
+                        raise RuntimeError(f"Failed to install esbuild: {e}") from e

Also applies to: 240-242


208-209: tarfile.extractall() has potential path traversal risk.

While the tar archive comes from a controlled container, consider using the filter parameter (Python 3.12+) or manual member validation to mitigate path traversal attacks for defense in depth.

Proposed safer extraction (Python 3.12+)
                     with tarfile.open(fileobj=file_obj) as tar:
-                        tar.extractall(directory)
+                        tar.extractall(directory, filter='data')

For older Python versions, you can validate members before extraction.

Also applies to: 236-237


154-161: Unused parameter is_cached.

The is_cached parameter is never used. It could be leveraged to skip dependency installation for cached packages, improving performance.
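A minimal sketch of how `is_cached` could short-circuit the install step (the helper name and the `node_modules` check are hypothetical, not from the PR):

```python
import os

def needs_dependency_install(directory: str, is_cached: bool) -> bool:
    """Return True when dependencies must be (re)installed.

    Skip the expensive npm install only when the package was restored
    from cache AND the cached node_modules directory actually exists.
    """
    node_modules = os.path.join(directory, "node_modules")
    return not (is_cached and os.path.isdir(node_modules))
```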


263-293: Consider externalizing the hardcoded Pyodide package list.

This list of ~200 packages will become outdated as Pyodide releases new versions. Pyodide publishes its supported packages via a versioned repodata.json at https://cdn.jsdelivr.net/pyodide/vX.Y.Z/full/repodata.json. Consider:

  1. Fetching the package list at build time from the appropriate Pyodide version's repodata.json
  2. Storing in a separate configuration file that can be updated alongside Pyodide version upgrades
  3. Adding a comment with a link to the Pyodide package index
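A sketch of option 1, fetching the index at build time and extracting package names (the URL follows the pattern cited above; the pinned version and parsing helper are assumptions):

```python
import json
import urllib.request

PYODIDE_VERSION = "0.26.2"  # assumption: pin alongside the runtime's Pyodide version
REPODATA_URL = f"https://cdn.jsdelivr.net/pyodide/v{PYODIDE_VERSION}/full/repodata.json"

def package_names(repodata: dict) -> set:
    """Extract the set of bundled package names from a repodata dict."""
    return set(repodata.get("packages", {}).keys())

def fetch_pyodide_packages(url: str = REPODATA_URL) -> set:
    # Network call; in a build step the result would be written to a
    # checked-in configuration file rather than fetched at runtime.
    with urllib.request.urlopen(url) as resp:
        return package_names(json.load(resp))
```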

Comment on lines +103 to +110
# Wait for container to be ready
while True:
    try:
        dkg = self.docker_instance.logs(stream=True, follow=True)
        next(dkg).decode("utf-8")
        break
    except StopIteration:
        pass

⚠️ Potential issue | 🟠 Major

Readiness wait loop could block indefinitely.

The while True loop has no timeout mechanism. If the container doesn't produce logs (e.g., due to a startup failure or misconfiguration), this will hang forever.

Proposed fix with timeout
+        import time
+        
         # Wait for container to be ready
+        max_wait_seconds = 30
+        start_time = time.time()
         while True:
+            if time.time() - start_time > max_wait_seconds:
+                raise RuntimeError(
+                    f"Container {self.docker_instance.id} did not become ready within {max_wait_seconds}s"
+                )
             try:
                 dkg = self.docker_instance.logs(stream=True, follow=True)
                 next(dkg).decode("utf-8")
                 break
             except StopIteration:
-                pass
+                time.sleep(0.1)  # Brief sleep before retry
🤖 Prompt for AI Agents
In `@sebs/cloudflare/cli.py` around lines 103 - 110, The readiness loop in cli.py
(where docker_instance.logs(stream=True, follow=True) is iterated via dkg and
next(dkg).decode(...)) can block forever; add a timeout (e.g., max_wait_seconds)
and track elapsed time (using time.time()) inside the while True to break and
raise/return after the timeout, and also sleep briefly on StopIteration to avoid
busy looping; ensure you update the block that assigns dkg and calls next(dkg)
so it stops with a clear error/exception when the timeout elapses instead of
hanging.

Comment on lines +354 to +365
response = requests.get(url, headers=headers)

if response.status_code == 200:
    try:
        return response.json().get("result")
    except:
        return None
elif response.status_code == 404:
    return None
else:
    self.logging.warning(f"Unexpected response checking worker: {response.status_code}")
    return None

⚠️ Potential issue | 🟠 Major

Add timeout and replace bare except with specific exception.

Two issues:

  1. The requests.get() call lacks a timeout parameter
  2. The bare except: clause catches all exceptions including system-exiting ones
🔒 Proposed fix
-        response = requests.get(url, headers=headers)
+        response = requests.get(url, headers=headers, timeout=30)
 
         if response.status_code == 200:
             try:
                 return response.json().get("result")
-            except:
+            except (json.JSONDecodeError, ValueError, KeyError):
                 return None
🧰 Tools
🪛 Ruff (0.14.13)

354-354: Probable use of requests call without timeout

(S113)


359-359: Do not use bare except

(E722)

🤖 Prompt for AI Agents
In `@sebs/cloudflare/cloudflare.py` around lines 354 - 365, The requests.get call
should include a timeout and the bare except should be replaced with specific
exception handling: call requests.get(url, headers=headers,
timeout=<reasonable_seconds>) and catch JSONDecodeError (or ValueError) when
calling response.json() instead of using a bare except; also consider catching
requests.exceptions.RequestException around the requests.get call and log errors
via self.logging.warning including response.status_code or exception details to
aid debugging (refer to requests.get, response.json(), and self.logging.warning
in the surrounding method).

Comment on lines +356 to +362
with tarfile.open(fileobj=file_obj) as tar:
tar.extractall(directory)

self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
except Exception as e:
self.logging.error(f"npm install failed: {e}")
raise RuntimeError(f"Failed to install Node.js dependencies: {e}")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Python tarfile extractall filter parameter security CVE-2007-4559

💡 Result:

Short answer:

  • CVE-2007-4559 is an old directory‑traversal issue in Python’s tarfile.extract()/extractall() where “..” paths in a TAR can overwrite files outside the target directory. [1]
  • Separately, recent (2024–2025) CVEs/alerts describe bypasses of the newer tarfile extraction "filter" feature (added in Python 3.12, defaulting to "data" in 3.14) that let specially crafted archives cause writes or metadata changes outside the extraction directory when using extract()/extractall(filter="data" or "tar"). See advisories for CVE-2024-12718 / CVE-2025-4517 and related GHSA entries. [2][3]

Mitigation summary:

  • Do not extract untrusted tarballs with tarfile.extract()/extractall() without explicit, safe checks.
  • Avoid using filter="data" or filter="tar" on untrusted input (or set filter="none"/omit the filter) until you’ve updated to a patched Python.
  • Apply vendor/CPython security patches or upgrade to patched Python releases when available.
  • As an extra defense, validate each TarInfo (sanitize path, forbid absolute paths and any .. components, reject symlinks/device nodes) and extract into a safe temporary directory, then move validated files into place.

Sources:
[1] NVD entry for CVE-2007-4559 (tarfile directory traversal).
[2] OSS‑security / Python security announcement summarizing multiple tarfile CVEs (June 2025).
[3] CVE writeups for the extraction‑filter bypass (CVE-2025-4517 / CVE-2024-12718) and related advisories.


🏁 Script executed:

# Locate and view the actual code in the repository
find . -name "containers.py" -path "*/cloudflare/*" | head -5

Repository: spcl/serverless-benchmarks

Length of output: 101


🏁 Script executed:

# View the actual code at lines 356-362
sed -n '350,365p' ./sebs/cloudflare/containers.py

Repository: spcl/serverless-benchmarks

Length of output: 845


🏁 Script executed:

# Check Python version requirements and context around the code
# Look for Python version specifications in the repo
find . -name "setup.py" -o -name "pyproject.toml" -o -name ".python-version" | head -10

Repository: spcl/serverless-benchmarks

Length of output: 159


🏁 Script executed:

# Check how the tar data is sourced and if it's trusted
sed -n '340,370p' ./sebs/cloudflare/containers.py | cat -n

Repository: spcl/serverless-benchmarks

Length of output: 1830


🏁 Script executed:

# Check Python version requirements in the project
cat pyproject.toml 2>/dev/null || cat setup.py 2>/dev/null | grep -A 5 "python" | head -15

Repository: spcl/serverless-benchmarks

Length of output: 156


🏁 Script executed:

# Look for other tarfile.extractall usages to see if there's a pattern
rg "tarfile.extractall|tar\.extractall" -B 2 -A 2

Repository: spcl/serverless-benchmarks

Length of output: 1686


Add exception chaining and consider filter parameter for tarfile.extractall().

Two improvements:

  1. Exception chaining: Add from e to preserve the exception chain for better debugging (line 362).
  2. Tarfile filter: The tar data comes from cli.docker_instance.get_archive() (internal Docker container), which is a trusted source; however, adding an explicit filter parameter like filter='data' follows defense-in-depth principles and is recommended for Python 3.12+.
Suggested changes
             with tarfile.open(fileobj=file_obj) as tar:
-                tar.extractall(directory)
+                tar.extractall(directory, filter='data')
             
             self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
         except Exception as e:
             self.logging.error(f"npm install failed: {e}")
-            raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+            raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
🧰 Tools
🪛 Ruff (0.14.13)

357-357: Uses of tarfile.extractall()

(S202)


360-360: Do not catch blind exception: Exception

(BLE001)


361-361: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


362-362: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


362-362: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@sebs/cloudflare/containers.py` around lines 356 - 362, Update the exception
handling and tar extraction call in the tarfile.open block: preserve the
original exception by re-raising RuntimeError using "from e" (i.e., raise
RuntimeError(...) from e) so the exception chain is kept, and when calling
tar.extractall(directory) pass the recommended filter parameter (e.g.,
filter='data' or an equivalent safe filter) to harden extraction; locate the
tarfile.open(...) block and the except Exception as e handling that uses
self.logging.error/f"npm install failed: {e}" and change the extractall call and
the raise accordingly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@sebs/cloudflare/containers.py`:
- Around line 283-294: The regex replacement for "from . import " can rewrite
occurrences inside strings/comments and corrupt code; instead parse each Python
file (use the ast module) and transform ast.ImportFrom nodes where node.level ==
1 and node.module is None or '' into equivalent ast.Import nodes (preserve alias
names), then unparse the AST back to source before writing to file_path; operate
only when language_name == "python" and skip files in the existing exclusion
list (handler.py, storage.py, nosql.py, worker.py) to replicate the current
behavior.
♻️ Duplicate comments (1)
sebs/cloudflare/containers.py (1)

356-362: Add tarfile filter and exception chaining (previously flagged).

This addresses security (CVE-2007-4559 and related) and improves debugging with proper exception chaining.

♻️ Suggested fix
             with tarfile.open(fileobj=file_obj) as tar:
-                tar.extractall(directory)
+                tar.extractall(directory, filter='data')
             
             self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
         except Exception as e:
-            self.logging.error(f"npm install failed: {e}")
-            raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+            self.logging.exception("npm install failed")
+            raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
🧹 Nitpick comments (10)
sebs/cloudflare/containers.py (10)

61-70: Remove or document unused parameters.

The language (line 65) and container_uri (line 69) parameters are declared but never used in the method body. Either remove them or add a comment explaining they're reserved for future use.

♻️ Suggested fix
     def generate_wrangler_toml(
         self,
         worker_name: str,
         package_dir: str,
-        language: str,
         account_id: str,
         benchmark_name: Optional[str] = None,
         code_package: Optional[Benchmark] = None,
-        container_uri: str = "",
     ) -> str:

110-122: Accessing private _tables attribute and missing key safety checks.

  1. nosql_storage._tables accesses a private attribute that may change without notice.
  2. Lines 116 and 122 assume config['durable_objects']['bindings'] and config['migrations'][0]['new_sqlite_classes'] exist in the template. If the template structure changes, this will raise KeyError/IndexError.
♻️ Suggested improvements
-                nosql_tables = nosql_storage._tables.get(benchmark_name, {})
+                # Consider exposing a public method like get_tables(benchmark_name)
+                nosql_tables = nosql_storage._tables.get(benchmark_name, {})
                 
                 # Add durable object bindings for each nosql table
                 for table_name in nosql_tables.keys():
+                    if 'durable_objects' not in config:
+                        config['durable_objects'] = {'bindings': []}
+                    if 'bindings' not in config['durable_objects']:
+                        config['durable_objects']['bindings'] = []
                     config['durable_objects']['bindings'].append({
                         'name': table_name,
                         'class_name': 'KVApiObject'
                     })
                 
                 # Update migrations to include KVApiObject
+                if 'migrations' in config and config['migrations']:
+                    if 'new_sqlite_classes' not in config['migrations'][0]:
+                        config['migrations'][0]['new_sqlite_classes'] = []
                 config['migrations'][0]['new_sqlite_classes'].append('KVApiObject')

143-147: Broad exception handling obscures specific failures.

Catching bare Exception here makes debugging difficult. Consider catching more specific exceptions (e.g., KeyError, AttributeError) or at minimum, log the exception type.

♻️ Suggested fix
-        except Exception as e:
+        except (KeyError, AttributeError) as e:
             self.logging.warning(
                 f"R2 bucket binding not configured: {e}. "
                 f"Benchmarks requiring file access will not work properly."
             )

250-252: Remove extraneous f-string prefix.

Line 252 uses an f-string without any placeholders.

♻️ Suggested fix
-            self.logging.info(f"Copied worker.js orchestration file from nodejs/container")
+            self.logging.info("Copied worker.js orchestration file from nodejs/container")

296-303: Remove dead code block.

The Node.js transformation block contains only pass with a comment suggesting future work. This is dead code that should be removed or implemented.

♻️ Suggested fix
-        # For Node.js containers, transform benchmark code to be async-compatible
-        # The container wrapper uses async HTTP calls, but benchmarks expect sync
-        elif language_name == "nodejs":
-            for item in os.listdir(directory):
-                if item.endswith('.js') and item not in ['handler.js', 'storage.js', 'nosql.js', 'worker.js', 'build.js', 'request-polyfill.js']:
-                    file_path = os.path.join(directory, item)
-                    # Could add transformations here if needed
-                    pass

351-351: Rename unused variable to indicate intent.

The stat variable from get_archive() is never used.

♻️ Suggested fix
-            bits, stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
+            bits, _stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")

406-411: Rename unused loop variable.

The dirnames variable is not used within the loop body.

♻️ Suggested fix
-        for dirpath, dirnames, filenames in os.walk(directory):
+        for dirpath, _dirnames, filenames in os.walk(directory):

458-465: Improve exception handling with proper chaining and logging.

Use raise ... from e to preserve exception chains and logging.exception to automatically include stack traces.

♻️ Suggested fix
         except docker.errors.BuildError as e:
             error_msg = f"Docker build failed for {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e
         except Exception as e:
             error_msg = f"Unexpected error building Docker image {image_tag}: {e}"
-            self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            self.logging.exception(error_msg)
+            raise RuntimeError(error_msg) from e

467-471: Unused parameter worker_name.

The worker_name parameter is declared but never used. Consider using it in log messages for better observability or remove it.

♻️ Suggested fix - use in logging
-        self.logging.info("Checking container Durable Object readiness via health endpoint...")
+        self.logging.info(f"Checking container Durable Object readiness for {worker_name} via health endpoint...")

Or remove the parameter if not needed.


489-490: Remove unused circuit breaker variables.

consecutive_failures and max_consecutive_failures are assigned but never used. Either implement the intended circuit breaker logic or remove these variables.

♻️ Option 1: Remove unused variables
-        consecutive_failures = 0
-        max_consecutive_failures = 5
♻️ Option 2: Implement circuit breaker pattern
         consecutive_failures = 0
         max_consecutive_failures = 5
         
         while time.time() - start_time < max_wait_seconds:
             try:
                 response = requests.get(
                     f"{worker_url}/health",
                     timeout=60
                 )
                 
                 if response.status_code == 200:
                     self.logging.info("Container Durable Object is ready!")
                     return True
                 elif response.status_code == 503:
+                    consecutive_failures = 0  # Reset on expected "not ready" response
                     # ... existing logging ...
                 else:
+                    consecutive_failures += 1
                     self.logging.warning(...)
+                    if consecutive_failures >= max_consecutive_failures:
+                        self.logging.error("Too many consecutive failures, aborting wait")
+                        return False
                     
             except requests.exceptions.Timeout:
+                consecutive_failures += 1
                 # ... existing logging ...
             except requests.exceptions.RequestException as e:
+                consecutive_failures += 1
                 # ... existing logging ...
+            
+            if consecutive_failures >= max_consecutive_failures:
+                self.logging.error("Too many consecutive failures, aborting wait")
+                return False

Comment on lines +283 to +294
# For Python containers, fix relative imports in benchmark code
# Containers use flat structure, so "from . import storage" must become "import storage"
if language_name == "python":
    for item in os.listdir(directory):
        if item.endswith('.py') and item not in ['handler.py', 'storage.py', 'nosql.py', 'worker.py']:
            file_path = os.path.join(directory, item)
            with open(file_path, 'r') as f:
                content = f.read()
            # Fix relative imports
            content = re.sub(r'from \. import ', 'import ', content)
            with open(file_path, 'w') as f:
                f.write(content)

⚠️ Potential issue | 🟡 Minor

Regex-based import rewriting may produce false positives.

The regex from \. import could match occurrences inside string literals or comments, potentially corrupting benchmark code. Consider using AST-based transformation or a more targeted approach.

♻️ Safer alternative using line-by-line check
-                    # Fix relative imports
-                    content = re.sub(r'from \. import ', 'import ', content)
+                    # Fix relative imports (only at line start, not in strings/comments)
+                    lines = content.split('\n')
+                    fixed_lines = []
+                    for line in lines:
+                        stripped = line.lstrip()
+                        if stripped.startswith('from . import '):
+                            line = line.replace('from . import ', 'import ', 1)
+                        fixed_lines.append(line)
+                    content = '\n'.join(fixed_lines)
🤖 Prompt for AI Agents
In `@sebs/cloudflare/containers.py` around lines 283 - 294, The regex replacement
for "from . import " can rewrite occurrences inside strings/comments and corrupt
code; instead parse each Python file (use the ast module) and transform
ast.ImportFrom nodes where node.level == 1 and node.module is None or '' into
equivalent ast.Import nodes (preserve alias names), then unparse the AST back to
source before writing to file_path; operate only when language_name == "python"
and skip files in the existing exclusion list (handler.py, storage.py, nosql.py,
worker.py) to replicate the current behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@sebs/cloudflare/r2.py`:
- Around line 12-14: The typename() static method incorrectly returns
"Cloudlfare.R2"; update the return value in typename() (in sebs.cloudflare.r2:
typename) to the correct string "Cloudflare.R2" so the provider name is spelled
properly.
- Around line 347-354: The clean_bucket method currently logs a warning and
returns silently which can mislead callers (similar to download); update the
implementation to raise NotImplementedError instead of silently passing so
callers get an explicit failure. Modify the clean_bucket function (and any
similar stubs like download) to raise NotImplementedError with a clear message
such as "clean_bucket not implemented for R2 bucket {bucket_name}" and remove or
keep the warning log as supplementary; ensure the method signature and docstring
remain unchanged.
- Around line 368-380: The requests.delete call lacks a timeout; update the HTTP
call to include the same timeout used elsewhere in this module by adding the
timeout keyword argument to requests.delete(delete_bucket_uri,
headers=self._get_auth_headers(), timeout=<module timeout value>), where <module
timeout value> should be the existing timeout constant or attribute used by
other calls in this file (for example self._timeout or DEFAULT_TIMEOUT), so that
the delete request for delete_bucket_uri follows the same timeout behavior as
other requests.
- Around line 87-106: The requests.post call in the R2 bucket creation (inside
the method using create_bucket_uri and create_bucket_response) needs a timeout
to avoid hangs and the inner bare except should be replaced with specific
exception handling; add a timeout argument to requests.post (e.g. timeout=10 or
a configurable constant) and wrap the response.json() call in a try/except that
catches json.JSONDecodeError (or ValueError) only, logging
create_bucket_response.text when JSON parsing fails; also consider catching
requests.exceptions.RequestException around the POST to log and re-raise or
handle network errors instead of letting it hang.
- Around line 122-133: The download method currently only logs a warning and
returns silently, which can mislead callers; replace the silent pass with
raising a NotImplementedError (include a clear message referencing the
bucket_name and key) so callers receive an explicit failure, keep the method
signature download(self, bucket_name: str, key: str, filepath: str) unchanged
and remove the trailing pass; retain or refine the log message if desired but
ensure the NotImplementedError is raised.
- Around line 58-60: The _create_bucket function currently uses a mutable
default for buckets and includes an unused randomize_name parameter; change the
signature of _create_bucket(name: str, buckets: Optional[list[str]] = None) (or
remove the buckets default entirely) and inside the function initialize buckets
= [] if buckets is None to avoid shared mutable state, and either remove the
randomize_name parameter or keep it but mark it with a TODO and do not rely on
it; update any callers of _create_bucket to pass an explicit list if they relied
on the previous default.
- Around line 299-316: The requests.get call to list_buckets_uri needs a timeout
and the error handling must avoid a bare except and properly capture JSON parse
errors and include the exception text in the log; update the call in the method
that calls requests.get(list_buckets_uri, headers=self._get_auth_headers()) to
pass a sensible timeout (e.g. timeout=10), replace the bare except with a
specific exception handler (catch json.JSONDecodeError or ValueError as e) when
calling response.json(), and include the exception details in the
self.logging.error message (or use self.logging.exception) so the log contains
the actual parse error while still returning [] on 403.
♻️ Duplicate comments (1)
sebs/cloudflare/r2.py (1)

145-232: Significant code duplication in S3 client initialization.

The S3 client initialization logic (lines 163-170, 210-217, and 261-268) is repeated across upload, upload_bytes, and list_bucket methods. Extract this into a helper method for maintainability.

♻️ Proposed refactor

Add a helper method and refactor the upload methods:

+    def _get_s3_client(self):
+        """Get or create S3-compatible client for R2 operations."""
+        if not self._credentials.r2_access_key_id or not self._credentials.r2_secret_access_key:
+            return None
+        
+        import boto3
+        from botocore.config import Config
+        
+        account_id = self._credentials.account_id
+        return boto3.client(
+            's3',
+            endpoint_url=f'https://{account_id}.r2.cloudflarestorage.com',
+            aws_access_key_id=self._credentials.r2_access_key_id,
+            aws_secret_access_key=self._credentials.r2_secret_access_key,
+            config=Config(signature_version='s3v4'),
+            region_name='auto'
+        )
+
     def upload(self, bucket_name: str, filepath: str, key: str):
         ...
         try:
-            import boto3
-            from botocore.config import Config
-            
-            account_id = self._credentials.account_id
-            
-            # R2 uses S3-compatible API, but requires special configuration
-            # The endpoint is: https://<account_id>.r2.cloudflarestorage.com
-            # You need to create R2 API tokens in the Cloudflare dashboard
-            
-            # Check if we have S3-compatible credentials
-            if not self._credentials.r2_access_key_id or not self._credentials.r2_secret_access_key:
+            s3_client = self._get_s3_client()
+            if s3_client is None:
                 self.logging.warning(
                     "R2 upload requires S3-compatible API credentials (r2_access_key_id, r2_secret_access_key). "
                     "File upload skipped. Set CLOUDFLARE_R2_ACCESS_KEY_ID and CLOUDFLARE_R2_SECRET_ACCESS_KEY."
                 )
                 return
-            
-            s3_client = boto3.client(
-                's3',
-                endpoint_url=f'https://{account_id}.r2.cloudflarestorage.com',
-                aws_access_key_id=self._credentials.r2_access_key_id,
-                aws_secret_access_key=self._credentials.r2_secret_access_key,
-                config=Config(signature_version='s3v4'),
-                region_name='auto'
-            )

Based on past review comments, this duplication was previously flagged.

🧹 Nitpick comments (1)
sebs/cloudflare/r2.py (1)

20-26: Remove redundant property - already defined in parent class.

The replace_existing property is already defined in PersistentStorage (the parent class). This duplication is unnecessary and could lead to maintenance issues.

♻️ Proposed fix
-    @property
-    def replace_existing(self) -> bool:
-        return self._replace_existing
-
-    @replace_existing.setter
-    def replace_existing(self, val: bool):
-        self._replace_existing = val
-
     def __init__(

Comment on lines +12 to +14
@staticmethod
def typename() -> str:
return "Cloudlfare.R2"

⚠️ Potential issue | 🟡 Minor

Typo in typename: "Cloudlfare" should be "Cloudflare".

🔧 Proposed fix
     @staticmethod
     def typename() -> str:
-        return "Cloudlfare.R2"
+        return "Cloudflare.R2"
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 12 - 14, The typename() static method
incorrectly returns "Cloudlfare.R2"; update the return value in typename() (in
sebs.cloudflare.r2: typename) to the correct string "Cloudflare.R2" so the
provider name is spelled properly.

Comment on lines +58 to +60
def _create_bucket(
self, name: str, buckets: list[str] = [], randomize_name: bool = False
) -> str:

⚠️ Potential issue | 🟡 Minor

Fix mutable default argument and unused parameter.

Using a mutable default argument ([]) can cause unexpected behavior if the list is modified. The randomize_name parameter is also unused.

🔧 Proposed fix
     def _create_bucket(
-        self, name: str, buckets: list[str] = [], randomize_name: bool = False
+        self, name: str, buckets: Optional[List[str]] = None, randomize_name: bool = False
     ) -> str:
+        if buckets is None:
+            buckets = []
         for bucket_name in buckets:

If randomize_name is intended for future use, consider adding a TODO comment or removing it until needed.
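For reference, the pitfall behind B006 is easy to reproduce in isolation: the default list is created once, when the function is defined, and is then shared by every call that omits the argument.

```python
def append_item(item, items=[]):  # BUG: default list is evaluated once
    items.append(item)
    return items

first = append_item("a")
second = append_item("b")   # silently reuses the same list object

def append_item_safe(item, items=None):
    if items is None:       # fresh list on every call
        items = []
    items.append(item)
    return items

ok_first = append_item_safe("a")
ok_second = append_item_safe("b")
```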

🧰 Tools
🪛 Ruff (0.14.13)

59-59: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


59-59: Unused method argument: randomize_name

(ARG002)

🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 58 - 60, The _create_bucket function
currently uses a mutable default for buckets and includes an unused
randomize_name parameter; change the signature of _create_bucket(name: str,
buckets: Optional[list[str]] = None) (or remove the buckets default entirely)
and inside the function initialize buckets = [] if buckets is None to avoid
shared mutable state, and either remove the randomize_name parameter or keep it
but mark it with a TODO and do not rely on it; update any callers of
_create_bucket to pass an explicit list if they relied on the previous default.

Comment on lines +87 to +106
try:
create_bucket_response = requests.post(
create_bucket_uri, json=params, headers=self._get_auth_headers()
)

# Log the response for debugging
if create_bucket_response.status_code >= 400:
try:
error_data = create_bucket_response.json()
self.logging.error(
f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
f"Response: {error_data}"
)
except:
self.logging.error(
f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
f"Response: {create_bucket_response.text}"
)

create_bucket_response.raise_for_status()

⚠️ Potential issue | 🟠 Major

Add timeout to HTTP request and avoid bare except.

The requests.post call lacks a timeout, which could cause indefinite hangs. Also, bare except: catches all exceptions including KeyboardInterrupt and SystemExit.

🔧 Proposed fix
         try:
             create_bucket_response = requests.post(
-                create_bucket_uri, json=params, headers=self._get_auth_headers()
+                create_bucket_uri, json=params, headers=self._get_auth_headers(), timeout=30
             )
             
             # Log the response for debugging
             if create_bucket_response.status_code >= 400:
                 try:
                     error_data = create_bucket_response.json()
                     self.logging.error(
                         f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                         f"Response: {error_data}"
                     )
-                except:
+                except json.JSONDecodeError:
                     self.logging.error(
                         f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                         f"Response: {create_bucket_response.text}"
                     )
🧰 Tools
🪛 Ruff (0.14.13)

88-88: Probable use of requests call without timeout

(S113)


100-100: Do not use bare except

(E722)


101-104: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 87 - 106, The requests.post call in the
R2 bucket creation (inside the method using create_bucket_uri and
create_bucket_response) needs a timeout to avoid hangs and the inner bare except
should be replaced with specific exception handling; add a timeout argument to
requests.post (e.g. timeout=10 or a configurable constant) and wrap the
response.json() call in a try/except that catches json.JSONDecodeError (or
ValueError) only, logging create_bucket_response.text when JSON parsing fails;
also consider catching requests.exceptions.RequestException around the POST to
log and re-raise or handle network errors instead of letting it hang.
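The narrower exception handling can be sketched standalone (the function name is illustrative, not from the PR):

```python
import json

def parse_response_body(raw: str):
    """Return parsed JSON, falling back to the raw text only on parse errors."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Only JSON parse failures are swallowed; KeyboardInterrupt,
        # SystemExit, and unrelated bugs still propagate, unlike a bare except.
        return raw

ok = parse_response_body('{"success": true}')
fallback = parse_response_body("<html>502 Bad Gateway</html>")
```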

Comment on lines +122 to +133
def download(self, bucket_name: str, key: str, filepath: str) -> None:
"""
Download a file from a bucket.

:param bucket_name:
:param key: storage source filepath
:param filepath: local destination filepath
"""
# R2 requires S3-compatible access for object operations
# For now, this is not fully implemented
self.logging.warning(f"download not fully implemented for R2 bucket {bucket_name}")
pass

⚠️ Potential issue | 🟠 Major

Incomplete implementation may silently fail.

The download method logs a warning but returns silently, which could cause callers to assume success. Consider raising NotImplementedError to make the incomplete state explicit.

🔧 Proposed fix
     def download(self, bucket_name: str, key: str, filepath: str) -> None:
         """
         Download a file from a bucket.

         :param bucket_name:
         :param key: storage source filepath
         :param filepath: local destination filepath
         """
-        # R2 requires S3-compatible access for object operations
-        # For now, this is not fully implemented
-        self.logging.warning(f"download not fully implemented for R2 bucket {bucket_name}")
-        pass
+        raise NotImplementedError(
+            "R2 download requires S3-compatible access. This is not yet implemented."
+        )
🧰 Tools
🪛 Ruff (0.14.13)

122-122: Unused method argument: key

(ARG002)


122-122: Unused method argument: filepath

(ARG002)

🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 122 - 133, The download method currently
only logs a warning and returns silently, which can mislead callers; replace the
silent pass with raising a NotImplementedError (include a clear message
referencing the bucket_name and key) so callers receive an explicit failure,
keep the method signature download(self, bucket_name: str, key: str, filepath:
str) unchanged and remove the trailing pass; retain or refine the log message if
desired but ensure the NotImplementedError is raised.
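The caller-visible difference between the two stub styles can be shown with a small standalone sketch (class names are illustrative):

```python
class SilentStub:
    def download(self, bucket: str, key: str, filepath: str) -> None:
        pass  # caller cannot distinguish this from a successful download

class ExplicitStub:
    def download(self, bucket: str, key: str, filepath: str) -> None:
        raise NotImplementedError(
            f"download not implemented for bucket {bucket}, key {key}"
        )

silent_result = SilentStub().download("data", "obj.bin", "/tmp/obj.bin")

try:
    ExplicitStub().download("data", "obj.bin", "/tmp/obj.bin")
    failed = False
except NotImplementedError as err:
    failed = True
    message = str(err)
```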

Comment on lines +299 to +316
try:
response = requests.get(list_buckets_uri, headers=self._get_auth_headers())

# Log detailed error information
if response.status_code == 403:
try:
error_data = response.json()
self.logging.error(
f"403 Forbidden accessing R2 buckets. "
f"Response: {error_data}. "
f"Your API token may need 'R2 Read and Write' permissions."
)
except:
self.logging.error(
f"403 Forbidden accessing R2 buckets. "
f"Your API token may need 'R2 Read and Write' permissions."
)
return []

⚠️ Potential issue | 🟠 Major

Add timeout and fix error handling.

The requests.get call lacks a timeout. The bare except: and f-strings without placeholders should also be fixed.

🔧 Proposed fix
         try:
-            response = requests.get(list_buckets_uri, headers=self._get_auth_headers())
+            response = requests.get(list_buckets_uri, headers=self._get_auth_headers(), timeout=30)
             
             # Log detailed error information
             if response.status_code == 403:
                 try:
                     error_data = response.json()
                     self.logging.error(
                         f"403 Forbidden accessing R2 buckets. "
                         f"Response: {error_data}. "
-                        f"Your API token may need 'R2 Read and Write' permissions."
+                        "Your API token may need 'R2 Read and Write' permissions."
                     )
-                except:
+                except json.JSONDecodeError:
                     self.logging.error(
-                        f"403 Forbidden accessing R2 buckets. "
-                        f"Your API token may need 'R2 Read and Write' permissions."
+                        "403 Forbidden accessing R2 buckets. "
+                        "Your API token may need 'R2 Read and Write' permissions."
                     )
                 return []
🧰 Tools
🪛 Ruff (0.14.13)

300-300: Probable use of requests call without timeout

(S113)


311-311: Do not use bare except

(E722)


312-315: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


313-313: f-string without any placeholders

Remove extraneous f prefix

(F541)


314-314: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 299 - 316, The requests.get call to
list_buckets_uri needs a timeout and the error handling must avoid a bare except
and properly capture JSON parse errors and include the exception text in the
log; update the call in the method that calls requests.get(list_buckets_uri,
headers=self._get_auth_headers()) to pass a sensible timeout (e.g. timeout=10),
replace the bare except with a specific exception handler (catch
json.JSONDecodeError or ValueError as e) when calling response.json(), and
include the exception details in the self.logging.error message (or use
self.logging.exception) so the log contains the actual parse error while still
returning [] on 403.

Comment on lines +347 to +354
def clean_bucket(self, bucket_name: str):
"""
Remove all objects from a bucket.

:param bucket_name:
"""
self.logging.warning(f"clean_bucket not fully implemented for R2 bucket {bucket_name}")
pass

⚠️ Potential issue | 🟠 Major

Incomplete implementation may silently fail.

Similar to download, this stub could cause callers to assume success. Consider raising NotImplementedError.

🔧 Proposed fix
     def clean_bucket(self, bucket_name: str):
         """
         Remove all objects from a bucket.
         
         :param bucket_name:
         """
-        self.logging.warning(f"clean_bucket not fully implemented for R2 bucket {bucket_name}")
-        pass
+        raise NotImplementedError(
+            f"clean_bucket not yet implemented for R2 bucket {bucket_name}"
+        )
🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 347 - 354, The clean_bucket method
currently logs a warning and returns silently which can mislead callers (similar
to download); update the implementation to raise NotImplementedError instead of
silently passing so callers get an explicit failure. Modify the clean_bucket
function (and any similar stubs like download) to raise NotImplementedError with
a clear message such as "clean_bucket not implemented for R2 bucket
{bucket_name}" and remove or keep the warning log as supplementary; ensure the
method signature and docstring remain unchanged.

Comment on lines +368 to +380
try:
response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
response.raise_for_status()

data = response.json()

if data.get("success"):
self.logging.info(f"Successfully deleted R2 bucket {bucket}")
else:
self.logging.error(f"Failed to delete R2 bucket {bucket}: {data.get('errors')}")

except requests.exceptions.RequestException as e:
self.logging.error(f"Error deleting R2 bucket {bucket}: {e}")

⚠️ Potential issue | 🟠 Major

Add timeout to HTTP request.

The requests.delete call lacks a timeout, consistent with other HTTP calls in this file.

🔧 Proposed fix
         try:
-            response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
+            response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers(), timeout=30)
             response.raise_for_status()
🧰 Tools
🪛 Ruff (0.14.13)

369-369: Probable use of requests call without timeout

(S113)


380-380: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents
In `@sebs/cloudflare/r2.py` around lines 368 - 380, The requests.delete call lacks
a timeout; update the HTTP call to include the same timeout used elsewhere in
this module by adding the timeout keyword argument to
requests.delete(delete_bucket_uri, headers=self._get_auth_headers(),
timeout=<module timeout value>), where <module timeout value> should be the
existing timeout constant or attribute used by other calls in this file (for
example self._timeout or DEFAULT_TIMEOUT), so that the delete request for
delete_bucket_uri follows the same timeout behavior as other requests.
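One way to realize the "module timeout value" idea is a single constant merged into every call's keyword arguments; a sketch under assumed names (`DEFAULT_TIMEOUT` and `http_kwargs` are illustrative, not from the PR):

```python
DEFAULT_TIMEOUT = 30  # seconds; one constant shared by all calls in the module

def http_kwargs(**kwargs):
    """Guarantee a timeout on every requests.* call without overriding callers."""
    kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
    return kwargs

# Usage sketch:
#   requests.delete(delete_bucket_uri, **http_kwargs(headers=auth_headers))
defaults = http_kwargs(headers={"Authorization": "Bearer ..."})
explicit = http_kwargs(timeout=5)  # caller-provided timeout wins
```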


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@sebs/cloudflare/cloudflare.py`:
- Around line 484-507: In _build_workers_dev_url, the last logging.warning uses
a plain string with "{name}" so worker_name is not interpolated; change that
call to use an f-string and reference {worker_name} (i.e., update the
logging.warning message to f"No account ID available; using
https://{worker_name}.workers.dev which may not be reachable.") so the actual
worker_name appears in logs.
♻️ Duplicate comments (12)
sebs/cloudflare/r2.py (7)

11-18: Fix provider typename typo.
The provider identifier is misspelled and can break lookups.

🔧 Proposed fix
     @staticmethod
     def typename() -> str:
-        return "Cloudlfare.R2"
+        return "Cloudflare.R2"

99-101: Avoid mutable default for buckets.
A shared list across calls can leak state; also note randomize_name remains unused.

🔧 Proposed fix
     def _create_bucket(
-        self, name: str, buckets: list[str] = [], randomize_name: bool = False
+        self, name: str, buckets: Optional[List[str]] = None, randomize_name: bool = False
     ) -> str:
+        if buckets is None:
+            buckets = []

128-145: Add timeout and narrow JSON parse handling in bucket creation.
Without a timeout the call can hang; bare except hides non‑JSON errors.

🔧 Proposed fix
             create_bucket_response = requests.post(
-                create_bucket_uri, json=params, headers=self._get_auth_headers()
+                create_bucket_uri,
+                json=params,
+                headers=self._get_auth_headers(),
+                timeout=30,
             )
@@
-                except:
+                except json.JSONDecodeError:
                     self.logging.error(
                         f"R2 bucket creation failed. Status: {create_bucket_response.status_code}, "
                         f"Response: {create_bucket_response.text}"
                     )

163-175: Make download failure explicit.
A warning + pass can be mistaken for success.

🔧 Proposed fix
-        self.logging.warning(f"download not fully implemented for R2 bucket {bucket_name}")
-        pass
+        raise NotImplementedError(
+            f"R2 download requires S3-compatible access and is not implemented for bucket {bucket_name}"
+        )

266-315: Add timeout and avoid bare except in list_buckets.
Same reliability concerns as other HTTP calls.

🔧 Proposed fix
-            response = requests.get(list_buckets_uri, headers=self._get_auth_headers())
+            response = requests.get(
+                list_buckets_uri,
+                headers=self._get_auth_headers(),
+                timeout=30,
+            )
@@
-                except:
+                except json.JSONDecodeError:
                     self.logging.error(
                         f"403 Forbidden accessing R2 buckets. "
                         f"Your API token may need 'R2 Read and Write' permissions."
                     )

327-334: Make clean_bucket explicit about being unimplemented.

🔧 Proposed fix
-        self.logging.warning(f"clean_bucket not fully implemented for R2 bucket {bucket_name}")
-        pass
+        raise NotImplementedError(
+            f"clean_bucket not implemented for R2 bucket {bucket_name}"
+        )

336-361: Add timeout to delete bucket call.
Keeps delete requests from hanging indefinitely.

🔧 Proposed fix
-            response = requests.delete(delete_bucket_uri, headers=self._get_auth_headers())
+            response = requests.delete(
+                delete_bucket_uri,
+                headers=self._get_auth_headers(),
+                timeout=30,
+            )
sebs/cloudflare/cloudflare.py (5)

80-91: Avoid mutable default for config (still unused).
This keeps the signature safe even if config stays unused.

🔧 Proposed fix
-    def initialize(self, config: Dict[str, str] = {}, resource_prefix: Optional[str] = None):
+    def initialize(self, config: Optional[Dict[str, str]] = None, resource_prefix: Optional[str] = None):
@@
-        # Verify credentials are valid
+        if config is None:
+            config = {}
+        # Verify credentials are valid

131-164: Add timeout and handle request failures during credential verification.
Prevents hangs and gives clearer failure context.

🔧 Proposed fix
-        response = requests.get(f"{self._api_base_url}/user/tokens/verify", headers=headers)
+        try:
+            response = requests.get(
+                f"{self._api_base_url}/user/tokens/verify",
+                headers=headers,
+                timeout=30,
+            )
+        except requests.exceptions.RequestException as e:
+            raise RuntimeError("Failed to verify Cloudflare credentials") from e

349-366: Add timeout and avoid bare except in worker lookup.

🔧 Proposed fix
-        response = requests.get(url, headers=headers)
+        response = requests.get(url, headers=headers, timeout=30)
@@
-            except:
+            except ValueError:
                 return None

443-483: Add timeout and narrow exception handling for subdomain lookup.

🔧 Proposed fix
-            resp = requests.get(url, headers=headers)
+            resp = requests.get(url, headers=headers, timeout=30)
             if resp.status_code == 200:
-                body = resp.json()
+                try:
+                    body = resp.json()
+                except ValueError:
+                    self.logging.warning("Invalid JSON while fetching workers.dev subdomain")
+                    return None
@@
-        except Exception as e:
+        except requests.exceptions.RequestException as e:
             self.logging.warning(f"Error fetching workers.dev subdomain: {e}")
             return None

606-635: Enforce digit-prefix rule for all workers, not just containers.
Non-container workers starting with a digit will still produce invalid names.

🔧 Proposed fix
-        if container_deployment and formatted and formatted[0].isdigit():
-            formatted = 'container-' + formatted
+        if formatted and formatted[0].isdigit():
+            prefix = "container-" if container_deployment else "w-"
+            formatted = prefix + formatted
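A standalone sketch of the broadened prefixing rule (the function name and exact normalization are illustrative, not the PR's code):

```python
def format_worker_name(name: str, container_deployment: bool = False) -> str:
    """Normalize a benchmark name and prefix it when it starts with a digit,
    which would otherwise yield an invalid worker name per the rule above."""
    formatted = name.lower().replace("_", "-")
    if formatted and formatted[0].isdigit():
        prefix = "container-" if container_deployment else "w-"
        formatted = prefix + formatted
    return formatted
```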
🧹 Nitpick comments (1)
sebs/cloudflare/cloudflare.py (1)

522-561: Prefer explicit container flag over name prefix.
Name-based detection only covers digit-prefixed names; an explicit flag is more reliable.

♻️ Suggested change
-        is_container = worker.name.startswith("container-")
+        is_container = container_deployment

Comment on lines +484 to +507
def _build_workers_dev_url(self, worker_name: str, account_id: Optional[str]) -> str:
"""Build a best-effort public URL for a worker.

Prefer using the account's readable workers.dev subdomain when available
(e.g. <name>.<subdomain>.workers.dev). If we can't obtain that, fall
back to using the account_id as a last resort and log a warning.
"""
if account_id:
sub = self._get_workers_dev_subdomain(account_id)
if sub:
return f"https://{worker_name}.{sub}.workers.dev"
else:
# fallback: some code historically used account_id in the host
self.logging.warning(
"Using account ID in workers.dev URL as a fallback. "
"Enable the workers.dev subdomain in Cloudflare for proper URLs."
)
return f"https://{worker_name}.{account_id}.workers.dev"
# Last fallback: plain workers.dev (may not resolve without a subdomain)
self.logging.warning(
"No account ID available; using https://{name}.workers.dev which may not be reachable."
)
return f"https://{worker_name}.workers.dev"


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n sebs/cloudflare/cloudflare.py | sed -n '484,507p'

Repository: spcl/serverless-benchmarks

Length of output: 1458


Fix warning string to use f-string and interpolate worker_name.

Line 504 has a plain string literal with {name} placeholder that won't be interpolated. It should be an f-string using {worker_name}:

🔧 Proposed fix
         self.logging.warning(
-            "No account ID available; using https://{name}.workers.dev which may not be reachable."
+            f"No account ID available; using https://{worker_name}.workers.dev which may not be reachable."
         )
🤖 Prompt for AI Agents
In `@sebs/cloudflare/cloudflare.py` around lines 484 - 507, In
_build_workers_dev_url, the last logging.warning uses a plain string with
"{name}" so worker_name is not interpolated; change that call to use an f-string
and reference {worker_name} (i.e., update the logging.warning message to f"No
account ID available; using https://{worker_name}.workers.dev which may not be
reachable.") so the actual worker_name appears in logs.
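For readers unfamiliar with the distinction, a tiny standalone Python sketch (the worker name here is invented) shows why the plain literal logs the braces verbatim while the f-string interpolates:

```python
worker_name = "my-worker"

# Plain string: the {name} placeholder is emitted literally, never interpolated.
plain = "No account ID available; using https://{name}.workers.dev"

# f-string: the variable value is substituted into the message.
interpolated = f"No account ID available; using https://{worker_name}.workers.dev"

print(plain)         # braces survive verbatim
print(interpolated)  # contains the actual worker name
```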

@userlaurin userlaurin requested a review from mcopik January 19, 2026 10:41
@userlaurin userlaurin changed the title SeBS Cloudflare Compatibility & 311/130 Benchmark Translation to Nodejs SeBS Cloudflare Compatibility Jan 22, 2026
laurin and others added 28 commits March 18, 2026 10:25
…val suggestion, function deserialize suggestion
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
… variable at compile time, as wrangler does not provide docker build args
…n dockerfile to download and link static binary dependencies
…ing their own package & deployment process orchestrated by main entry point cloudflare.py

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

♻️ Duplicate comments (6)
docs/platforms.md (2)

242-250: ⚠️ Potential issue | 🟡 Minor

Add a blank line after the table.

MD058 is still triggered because the next bullet starts immediately after the last row.

✏️ Minimal fix
   | standard-3 | 2 | 8 GiB | 16 GB |
   | standard-4 | 4 | 12 GiB | 20 GB |
+
 - **Metrics Collection**: Uses response-based per-invocation metrics. During each function invocation, the worker handler measures performance metrics (CPU time, wall time, memory usage) and embeds them directly in the JSON response. SeBS extracts these metrics immediately from each response. When `download_metrics()` is called for postprocessing, it only aggregates the metrics that were already collected during invocations—no additional data is fetched from external services. This approach provides immediate per-invocation granularity without delays. Note that while Cloudflare does expose an Analytics Engine, it only provides aggregated metrics without individual request-level data, making it unsuitable for detailed benchmarking purposes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/platforms.md` around lines 242 - 250, Add a blank line after the
Markdown table (the block starting with "| Instance Type | vCPU | Memory | Disk
|" and ending with the last table row "| standard-4 | 4 | 12 GiB | 20 GB |") so
the following bullet ("**Metrics Collection**: ...") becomes separated by an
empty line; this prevents MD058 by ensuring there's a blank line between the
table and the next paragraph/bullet.

233-233: ⚠️ Potential issue | 🟡 Minor

Use {subdomain} in the workers.dev example, not {account}.

The hostname is derived from the account subdomain, not the account ID, so this placeholder points readers at the wrong value.

✏️ Doc fix
-- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{account}.workers.dev`
+- **HTTP Trigger**: ✅ Fully supported - Workers are automatically accessible at `https://{name}.{subdomain}.workers.dev`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/platforms.md` at line 233, The docs currently show an incorrect
placeholder in the Workers URL example: locate the string
"https://{name}.{account}.workers.dev" in the HTTP Trigger example and replace
the "{account}" placeholder with "{subdomain}" so it reads
"https://{name}.{subdomain}.workers.dev"; also search for any other workers.dev
examples using "{account}" and update them to "{subdomain}" to ensure the
hostname reflects the account subdomain rather than the account ID.
sebs/cloudflare/cli.py (1)

103-110: ⚠️ Potential issue | 🟠 Major

Don't wait forever for the first log line.

A quiet startup, or a container that exits without emitting logs, will hang the CLI indefinitely here. Use a bounded timeout and poll container state/readiness instead of blocking on next(logs(...)).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/cli.py` around lines 103 - 110, The current loop using
docker_instance.logs(stream=True, follow=True) and next(dkg) can block forever
if no log line appears; replace the infinite while True with a bounded wait:
record a deadline (e.g. time.time() + timeout), poll for logs non-blockingly and
poll the container state (via your docker API wrapper—e.g. inspect/get container
state or attributes on the same object used as self.docker_instance) in a loop
with a short sleep, return/break when the first log line is read or when the
container has exited/failed, and raise a timeout/error if the deadline passes;
update references to dkg and the logs call accordingly so you stop blocking on
next(dkg) and instead handle StopIteration and absence of logs by checking
container readiness/state and timing out.
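A minimal sketch of the bounded-wait pattern the comment asks for; `get_log_line` and `container_exited` are hypothetical stand-ins for the Docker SDK calls used in cli.py:

```python
import time

def wait_for_first_log(get_log_line, container_exited, timeout=30.0, poll=0.5):
    """Wait for a first log line with a bounded deadline.

    get_log_line: non-blocking callable returning a line or None.
    container_exited: callable returning True once the container stopped.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        line = get_log_line()
        if line is not None:
            return line
        if container_exited():
            raise RuntimeError("container exited before emitting any logs")
        time.sleep(poll)
    raise TimeoutError(f"no log output within {timeout} seconds")
```

Unlike `next(logs(...))`, this loop can never block longer than `timeout`, and it surfaces a dead container instead of waiting on logs that will never arrive.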
sebs/cloudflare/durable_objects.py (1)

165-172: ⚠️ Potential issue | 🟠 Major

Don't let seed writes succeed silently.

After the table lookup, this path just passes. Callers will continue as if setup data exists, which produces invalid benchmark runs for workloads that depend on pre-populated rows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/durable_objects.py` around lines 165 - 172, The current code
looks up table_name via self._get_table_name(...) then silently "pass" which
lets callers believe seeding succeeded; replace the silent pass with a clear
failure: after the table_name check, raise a descriptive exception (e.g.,
RuntimeError or NotImplementedError) indicating that seeding is not supported
for Cloudflare Durable Objects (include benchmark and table in the message) so
callers don't proceed with missing seed data; reference the existing
_get_table_name call and the table_name variable to locate where to change.
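A sketch of the fail-loudly behavior, with a hypothetical `write_seed` function standing in for the Durable Objects method (names and message text are illustrative, not the actual SeBS API):

```python
class DurableObjectsSeedError(RuntimeError):
    """Raised when a caller tries to seed data the backend cannot accept."""

def write_seed(benchmark: str, table: str, data: dict) -> None:
    # Instead of silently passing, fail with a descriptive error so
    # benchmark runs cannot proceed with missing pre-populated rows.
    raise DurableObjectsSeedError(
        f"Seeding is not supported for Durable Objects "
        f"(benchmark={benchmark}, table={table}); data is managed in the Worker."
    )
```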
benchmarks/wrappers/cloudflare/python/storage.py (1)

46-51: ⚠️ Potential issue | 🟡 Minor

/tmp double-prefix still possible in upload().

If filepath is already in written_files and starts with /tmp, prepending /tmp again creates an invalid path like /tmp/tmp/.... The download() method checks for this, but upload() doesn't.

🔧 Suggested fix
     def upload(self, bucket, key, filepath):
-        if filepath in self.written_files:
+        if filepath in self.written_files and not filepath.startswith("/tmp"):
             filepath = "/tmp" + os.path.abspath(filepath)
         with open(filepath, "rb") as f:
             unique_key = self.upload_stream(bucket, key, f.read())
         return unique_key
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/storage.py` around lines 46 - 51, The
upload() method can double-prefix /tmp when filepath is in written_files and
already starts with /tmp; modify upload() (function upload, variable
written_files) to compute the tmp path safely — e.g., take abs_path =
os.path.abspath(filepath) and only prefix "/tmp" if not
abs_path.startswith("/tmp") (or use os.path.join("/tmp",
os.path.basename(abs_path)) depending on desired semantics), then open that
resolved path for reading and call upload_stream as before; mirror the same
normalization logic used in download() to avoid creating "/tmp/tmp/..." paths.
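The normalization can be isolated into a small helper, a sketch of the logic `download()` already applies:

```python
import os

def resolve_tmp_path(filepath: str) -> str:
    """Prefix /tmp exactly once, avoiding /tmp/tmp/... paths."""
    abs_path = os.path.abspath(filepath)
    if abs_path.startswith("/tmp"):
        return abs_path
    return "/tmp" + abs_path
```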
sebs/cloudflare/containers.py (1)

356-362: ⚠️ Potential issue | 🟡 Minor

Add exception chaining and consider filter parameter.

Per past review: add from e for exception chaining, and consider filter='data' for defense-in-depth (Python 3.12+).

🔧 Suggested fix
             with tarfile.open(fileobj=file_obj) as tar:
-                tar.extractall(directory)
+                tar.extractall(directory, filter='data')
             
             self.logging.info(f"Downloaded node_modules to {directory} for wrangler deployment")
         except Exception as e:
             self.logging.error(f"npm install failed: {e}")
-            raise RuntimeError(f"Failed to install Node.js dependencies: {e}")
+            raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/containers.py` around lines 356 - 362, In the try/except
around tarfile extraction in the method that opens tar via
tarfile.open(fileobj=file_obj) and calls tar.extractall(directory), change the
exception re-raise to use exception chaining (raise RuntimeError(f"Failed to
install Node.js dependencies: {e}") from e) and, for defense-in-depth, call
tar.extractall with the filter parameter (e.g., tar.extractall(directory,
filter='data')) when running on Python 3.12+; update the self.logging.error line
to keep the original exception message but do not swallow the original exception
so the chained raise preserves the traceback.
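The effect of `from e` can be seen in isolation; the OSError here is a stand-in for the real npm failure:

```python
def install_dependencies():
    try:
        raise OSError("npm install failed: network unreachable")  # simulated failure
    except Exception as e:
        # "from e" records the original error as __cause__, so the
        # full traceback chain survives the re-raise.
        raise RuntimeError(f"Failed to install Node.js dependencies: {e}") from e
```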
🟡 Minor comments (8)
dockerfiles/cloudflare/nodejs/Dockerfile-21-27 (1)

21-27: ⚠️ Potential issue | 🟡 Minor

Broad error suppression may mask genuine failures.

Using || true suppresses all errors from init.sh, not just "resource already exists" scenarios. Download failures, permission issues, or corrupted downloads would be silently ignored, potentially causing confusing runtime errors.

Consider checking for specific expected outcomes or logging failures:

🛡️ Suggested improvement
 # Run benchmark init script if it exists (e.g., for ffmpeg in video-processing)
 # This downloads static binaries needed by the benchmark
-# Note: ignore errors from init.sh (e.g., when resources already exist)
 RUN if [ -f "init.sh" ]; then \
         chmod +x init.sh && \
-        ./init.sh /app verbose x64 || true; \
+        ./init.sh /app verbose x64 || echo "Warning: init.sh failed with exit code $?"; \
     fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/cloudflare/nodejs/Dockerfile` around lines 21 - 27, The
Dockerfile currently suppresses all errors from running init.sh with "|| true",
which can hide real failures; replace this broad suppression by running init.sh
in a small bash wrapper that captures its exit code and output (invoke
"./init.sh /app verbose x64"), allow success (exit 0) or explicitly allow only
known benign exit codes/messages (e.g., "resource already exists") by matching
stdout/stderr, otherwise write the full error to the build logs via echo or >&2
and exit with a non-zero status so the build fails; locate the RUN that calls
init.sh in the Dockerfile and implement the wrapper logic there to log errors
and only ignore specific, documented conditions.
dockerfiles/cloudflare/nodejs/Dockerfile-15-16 (1)

15-16: ⚠️ Potential issue | 🟡 Minor

Use npm install --omit=dev instead of --production.

npm install --production is deprecated in npm 8+ and triggers warnings. Use npm install --omit=dev for the same behavior with modern npm versions (node:18-slim includes npm 9+).

♻️ Suggested improvement
 # Install dependencies
-RUN npm install --production
+RUN npm install --omit=dev
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/cloudflare/nodejs/Dockerfile` around lines 15 - 16, Replace the
deprecated npm flag in the Dockerfile: locate the RUN command that currently
uses "npm install --production" and change it to use "npm install --omit=dev" so
it avoids npm 8+ deprecation warnings while keeping dev dependencies omitted
during image build; ensure no other build steps rely on the old flag and keep
the invocation in the same RUN line.
benchmarks/wrappers/cloudflare/nodejs/container/handler.js-108-112 (1)

108-112: ⚠️ Potential issue | 🟡 Minor

Use key presence here, not truthiness.

if (!event[key]) will overwrite valid body values like 0, false, or "" with query params. The merge should only fill missing keys.

✏️ Minimal fix
-      if (!event[key]) {
+      if (!(key in event)) {
         const intValue = parseInt(value);
         event[key] = isNaN(intValue) ? value : intValue;
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/container/handler.js` around lines 108
- 112, The loop merging URL query params into the event object uses truthiness
check (if (!event[key])) which incorrectly overwrites valid falsy values like 0,
false, or ""; change the condition to check presence instead (e.g., use "if
(!(key in event))" or "if (!Object.prototype.hasOwnProperty.call(event, key))")
in the for (const [key, value] of url.searchParams) block so only missing keys
are filled with parsed intValue (from parseInt(value)) or the original value.
sebs/cloudflare/durable_objects.py-183-183 (1)

183-183: ⚠️ Potential issue | 🟡 Minor

Remove the stray f prefix on this warning.

Ruff flags this as F541, so lint will stay red until this is a plain string literal.

✏️ Minimal fix
-        self.logging.warning(f"Durable Objects data is managed within the Worker")
+        self.logging.warning("Durable Objects data is managed within the Worker")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/durable_objects.py` at line 183, The warning call currently
uses an unnecessary f-string: change the call to self.logging.warning(f"Durable
Objects data is managed within the Worker") to a plain string literal by
removing the leading f (i.e., self.logging.warning("Durable Objects data is
managed within the Worker")); update the call in the durable objects code where
the self.logging.warning invocation appears to eliminate the f-prefix so it no
longer triggers the F541 lint.
benchmarks/wrappers/cloudflare/python/handler.py-126-133 (1)

126-133: ⚠️ Potential issue | 🟡 Minor

Replace bare except with except Exception.

Bare except: catches SystemExit and KeyboardInterrupt, which should typically propagate. Use except Exception: to catch only regular exceptions.

🔧 Suggested fix
             try:
                 # Fetch the worker's own URL with favicon to minimize overhead
                 final_url = URL.new(request.url)
                 final_url.pathname = '/favicon'
                 await js_fetch(str(final_url), method='HEAD')
-            except:
+            except Exception:
                 # Ignore fetch errors
                 pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/handler.py` around lines 126 - 133, The
try/except block that fetches the worker URL (creating final_url via
URL.new(request.url) and calling js_fetch(...)) currently uses a bare except:,
which can swallow SystemExit/KeyboardInterrupt; replace it with except
Exception: to only catch regular exceptions and preserve external
interrupts—update the except clause in that try/except around final_url/js_fetch
accordingly.
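Why the distinction matters: SystemExit derives from BaseException, not Exception, so only the bare clause traps it:

```python
def swallow_everything():
    try:
        raise SystemExit(3)
    except:            # bare except: also traps SystemExit / KeyboardInterrupt
        return "swallowed"

def swallow_errors_only():
    try:
        raise SystemExit(3)
    except Exception:  # SystemExit is not an Exception subclass, so it escapes
        return "swallowed"
```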
benchmarks/wrappers/cloudflare/nodejs/container/storage.js-15-18 (1)

15-18: ⚠️ Potential issue | 🟡 Minor

Remove duplicate static worker_url declaration.

worker_url is declared twice on lines 15 and 18. Remove the duplicate.

🔧 Suggested fix
   static worker_url = null; // Set by handler from X-Worker-URL header

-  
-  static worker_url = null; // Set by handler from X-Worker-URL header
-
   static unique_name(name) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/container/storage.js` around lines 15 -
18, The class has a duplicate static property declaration "worker_url" declared
twice; remove the redundant declaration so only one "static worker_url = null;
// Set by handler from X-Worker-URL header" remains (locate the duplicate
declarations of worker_url in the class and delete the extra one).
benchmarks/wrappers/cloudflare/python/storage.py-111-116 (1)

111-116: ⚠️ Potential issue | 🟡 Minor

Remove duplicate return statement.

Line 116 is an unreachable duplicate of line 115.

🔧 Suggested fix
    @staticmethod
     def get_instance():
         if storage.instance is None:
             raise RuntimeError("must init storage singleton first")
         return storage.instance
-        return storage.instance
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/storage.py` around lines 111 - 116, The
get_instance static method in storage has a duplicated unreachable return;
inside storage.get_instance() remove the second redundant "return
storage.instance" so the method only raises if instance is None and otherwise
returns storage.instance (update the body of storage.get_instance to have a
single return statement).
benchmarks/wrappers/cloudflare/nodejs/handler.js-65-85 (1)

65-85: ⚠️ Potential issue | 🟡 Minor

URL parameter parsing loses data when values contain =.

The manual parsing with split('=') only captures the first two parts, so a parameter like ?key=a=b would incorrectly parse as {key: 'a'} instead of {key: 'a=b'}. Consider using URLSearchParams for robust parsing.

🔧 Suggested fix
     // Parse query string into event (URL parameters override/merge with body)
     // This makes it compatible with both input formats
-    const urlParts = request.url.split('?');
-    if (urlParts.length > 1) {
-      const query = urlParts[1];
-      const pairs = query.split('&');
-      for (const p of pairs) {
-        const [k, v] = p.split('=');
-        try {
-          if (v === undefined) {
-            event[k] = null;
-          } else if (!Number.isNaN(Number(v)) && Number.isFinite(Number(v))) {
-            // mirror Python attempt to convert to int
-            const n = Number(v);
-            event[k] = Number.isInteger(n) ? parseInt(v, 10) : n;
-          } else {
-            event[k] = decodeURIComponent(v);
-          }
-        } catch (e) {
-          event[k] = v;
-        }
-      }
-    }
+    const url = new URL(request.url);
+    for (const [k, v] of url.searchParams) {
+      if (!Number.isNaN(Number(v)) && Number.isFinite(Number(v))) {
+        const n = Number(v);
+        event[k] = Number.isInteger(n) ? parseInt(v, 10) : n;
+      } else {
+        event[k] = v;
+      }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/handler.js` around lines 65 - 85, The
current manual parsing of request.url loses data when values contain '=';
replace the manual split loop with URLSearchParams: create a URL (e.g. new
URL(request.url, 'http://localhost')) or new URLSearchParams(rawQuery) from
request.url, iterate for (const [k, v] of searchParams) and apply the same
conversion logic used now (treat missing-value pairs as null, convert numeric
strings to Number/parseInt, otherwise decodeURIComponent), ensuring you preserve
the existing behavior of assigning to event[k]; update the block that references
request.url and event to use searchParams so values containing '=' are
preserved.
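Python's urllib shows the analogous robust behavior: splitting on every `=` truncates values, while a proper query parser partitions on the first `=` only:

```python
from urllib.parse import parse_qsl

query = "key=a=b&n=42"

# Naive split: everything after the second '=' is lost.
naive = dict(p.split("=")[:2] for p in query.split("&"))

# parse_qsl partitions each pair on the first '=' only.
robust = dict(parse_qsl(query))
```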
🧹 Nitpick comments (18)
dockerfiles/cloudflare/python/Dockerfile (2)

18-21: Error suppression could mask real failures.

The || true suppresses all errors from init.sh, including legitimate failures (e.g., network issues during download, disk space problems). Consider being more selective about which errors to ignore.

🔧 Suggested approach to handle errors more selectively
 RUN if [ -f "init.sh" ]; then \
         chmod +x init.sh && \
-        ./init.sh /app verbose x64 || true; \
+        ./init.sh /app verbose x64 || echo "Warning: init.sh failed (may be expected if resources already exist)"; \
     fi

Alternatively, if specific exit codes indicate "resource exists," check for those explicitly rather than suppressing all errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/cloudflare/python/Dockerfile` around lines 18 - 21, The RUN step
currently appends "|| true" after executing init.sh which hides all failures
from init.sh; remove the unconditional suppression and instead handle
expected/benign cases explicitly: update the Docker RUN (and/or the init.sh
script referenced as init.sh) so that init.sh returns 0 for idempotent/known-OK
conditions or document and check for specific exit codes in the RUN wrapper,
e.g., test the exit code and only ignore the known harmless ones
(resource-exists codes) while letting other non-zero exits fail the build;
ensure init.sh is made idempotent (or exports meaningful exit codes) and that
the Docker RUN will propagate real errors instead of using "|| true".

1-38: Consider adding a non-root user for improved security posture.

The container runs as root by default. While this may be acceptable for isolated benchmark containers, adding a non-root user is a security best practice that reduces the attack surface if the container is compromised.

🛡️ Suggested fix to run as non-root user
 ARG BASE_IMAGE=python:3.11-slim
 FROM ${BASE_IMAGE}
 
 # Install system dependencies needed for benchmarks
 RUN apt-get update && apt-get install -y --no-install-recommends \
     wget \
     xz-utils \
     && rm -rf /var/lib/apt/lists/*
 
+# Create non-root user
+RUN useradd --create-home --shell /bin/bash appuser
+
 WORKDIR /app
 
 # Copy all application files first
 COPY . .
 
 # Run benchmark init script if it exists (e.g., for ffmpeg in video-processing)
 # This downloads static binaries needed by the benchmark
 # Note: ignore errors from init.sh (e.g., when resources already exist)
 RUN if [ -f "init.sh" ]; then \
         chmod +x init.sh && \
         ./init.sh /app verbose x64 || true; \
-    fi
+    fi && \
+    chown -R appuser:appuser /app
 
 # Install dependencies
 # Core dependencies for wrapper modules:
 # - storage.py uses urllib (stdlib) to proxy R2 requests through worker.js
 # - nosql.py, worker.py, handler.py use stdlib only
 # Then install benchmark-specific requirements from requirements.txt
 RUN pip install --no-cache-dir --upgrade pip && \
     if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi
 
+# Switch to non-root user
+USER appuser
+
 # Expose port 8080 for container communication
 EXPOSE 8080
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/cloudflare/python/Dockerfile` around lines 1 - 38, The Dockerfile
currently runs as root; create and switch to a non-root user by adding a RUN
that creates a user/group (e.g., groupadd/useradd or adduser), chown the WORKDIR
(/app) and any files created during build (so init.sh, requirements.txt and
installed artifacts are owned by that user), set HOME if needed, and add a USER
<username> before the final CMD ["python","handler.py"]; ensure any RUN steps
that require root (apt-get, pip installs) occur before changing USER and
reference symbols BASE_IMAGE, WORKDIR, init.sh, requirements.txt and CMD when
making the changes.
benchmarks/100.webapps/130.crud-api/nodejs/package.json (1)

7-8: Remove empty dependencies block unless a tool requires it.

Line 7-Line 8 currently declare an empty object. If there are no runtime deps, omitting the key keeps the manifest cleaner and avoids noisy diffs later.

Optional cleanup
-  "dependencies": {
-  }
+  "dependencies": {}

Or remove it entirely:

-  "dependencies": {
-  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/100.webapps/130.crud-api/nodejs/package.json` around lines 7 - 8,
The package.json currently contains an empty "dependencies" object; remove the
"dependencies" key entirely (or add required runtime deps if there are any) so
the manifest is cleaner and avoids noisy diffs. Locate the package.json entry
named "dependencies" and delete that key and its empty object value, then run
any JSON linting/validation to ensure package.json remains valid; if a tool
requires the key, replace the empty object with the required dependency entries
instead.
benchmarks/wrappers/cloudflare/nodejs/request-polyfill.js (1)

26-57: Polyfill has a design limitation but it doesn't affect current usage.

The fetch request only executes inside .pipe(), so direct callback-only usage (request(url, callback)) would not work. However, the codebase only uses the polyfill as request(url).pipe(file), so this limitation is not triggered. If callback-only invocations were added in the future, they would fail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/request-polyfill.js` around lines 26 -
57, The polyfill's fetch only runs inside requestObj.pipe, so calling
request(url, callback) (i.e. using the callback-only path) does nothing; update
the request factory (where requestObj is created) to start the fetch immediately
when a callback is provided or expose a start method so fetch runs even if pipe
is never called—specifically modify the logic that builds requestObj (refer to
requestObj and its pipe method) to detect a provided callback and invoke the
same fetch flow (the .then/.catch handling that emits destination errors and
calls callback) on creation, ensuring the callback receives the response or
error even without calling pipe.
sebs/cloudflare/containers.py (3)

351-351: Prefix unused variable with underscore.

The stat variable from get_archive() is never used.

🔧 Suggested fix
-            bits, stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
+            bits, _stat = cli.docker_instance.get_archive(f"{container_path}/node_modules")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/containers.py` at line 351, The call to
cli.docker_instance.get_archive(...) assigns an unused variable stat; rename it
to _stat (or prefix with an underscore) to indicate it's intentionally unused
and avoid linter warnings — update the assignment in the code where get_archive
is invoked (the expression using cli.docker_instance.get_archive and the
left-hand variables) to use bits, _stat instead of bits, stat.

458-465: Add exception chaining in _build_container_image_local.

Preserve the original exception traceback using from e.

🔧 Suggested fix
         except docker.errors.BuildError as e:
             error_msg = f"Docker build failed for {image_tag}: {e}"
             self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            raise RuntimeError(error_msg) from e
         except Exception as e:
             error_msg = f"Unexpected error building Docker image {image_tag}: {e}"
             self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            raise RuntimeError(error_msg) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/containers.py` around lines 458 - 465, In
_build_container_image_local, the caught exceptions (docker.errors.BuildError
and the generic Exception) are re-raised as RuntimeError without preserving the
original traceback; update both except blocks to chain the original exception by
using "raise RuntimeError(error_msg) from e" so the original exception (e) is
included, keeping the existing logging calls (self.logging.error(error_msg)) and
messages unchanged.

252-252: Remove extraneous f-string prefix.

Line 252 uses an f-string without any placeholders.

🔧 Suggested fix
-            self.logging.info(f"Copied worker.js orchestration file from nodejs/container")
+            self.logging.info("Copied worker.js orchestration file from nodejs/container")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/containers.py` at line 252, The logging call uses an
unnecessary f-string prefix; update the self.logging.info call in containers.py
(the line that logs "Copied worker.js orchestration file from nodejs/container")
to use a plain string (remove the leading f) since there are no placeholders,
e.g., change the self.logging.info invocation to pass a normal string literal.
benchmarks/wrappers/cloudflare/python/handler.py (2)

43-43: Split multiple statements onto separate lines.

Static analysis flags this as E701. Splitting improves readability.

🔧 Suggested fix
-        if "favicon" in request.url: return Response("None")
+        if "favicon" in request.url:
+            return Response("None")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/handler.py` at line 43, The single-line
conditional "if \"favicon\" in request.url: return Response(\"None\")" should be
split into two lines to satisfy E701; replace it with a standard if block that
checks "favicon" in request.url and then on the next line returns
Response("None") (locate the statement in
benchmarks/wrappers/cloudflare/python/handler.py and update that inline if to a
multi-line if block).

154-199: Consider removing unused code.

The functions import_from_path, MakeAsync, AddAwait, and make_benchmark_func appear unused (lines 94-95 are commented out). If these are no longer needed, removing them would reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/handler.py` around lines 154 - 199,
Remove the dead code related to the AST-based transformation: delete the
import_from_path function, the MakeAsync class, the AddAwait class, the
make_benchmark_func function and the working_dir variable (and any associated
unused imports such as importlib, ast, sys, os if they are only used by those
symbols); before deleting, search for any callers of import_from_path,
MakeAsync, AddAwait, or make_benchmark_func to ensure nothing else depends on
them, and run tests/lint to confirm there are no remaining references.
benchmarks/wrappers/cloudflare/python/container/handler.py (4)

185-189: Traceback exposure in error responses.

The error response includes the full traceback, which can expose internal file paths and code structure. For a benchmarking tool this may be acceptable for debugging, but consider making this configurable or removing in production deployments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 185
- 189, The error response currently includes a full traceback (see
error_response and its use before self.wfile.write in handler.py), which can
leak internals; change this to make traceback inclusion configurable or remove
it in production by checking a flag (e.g., an environment/config value like
SHOW_TRACEBACK or handler.show_traceback) and only add traceback:
traceback.format_exc() to error_response when that flag is true, otherwise
return a minimal error message or generic error code; update any relevant
initialization/path where the handler is created to read the configuration and
set the flag accordingly.

191-193: Parameter format shadows Python builtin.

This is flagged by static analysis. While this matches the BaseHTTPRequestHandler signature, you can use fmt to avoid shadowing.

🔧 Suggested fix
-    def log_message(self, format, *args):
+    def log_message(self, fmt, *args):
         # Override to use print instead of stderr
-        print(f"{self.address_string()} - {format % args}")
+        print(f"{self.address_string()} - {fmt % args}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 191
- 193, Rename the parameter in the log_message override to avoid shadowing the
builtin: change the parameter name from format to fmt in the log_message(self,
format, *args) definition and update the body to use fmt where it currently uses
format (e.g., f"{self.address_string()} - {fmt % args}"); ensure the method name
log_message and its behavior remain unchanged so it still overrides
BaseHTTPRequestHandler.

38-48: Consider handling timeout=None explicitly in patched_urlopen.

When timeout is left at its None default, it is still forwarded explicitly to original_urlopen, overriding urllib's socket._GLOBAL_DEFAULT_TIMEOUT sentinel; the patched function therefore behaves differently from the original whenever the caller supplies no timeout.

🔧 Suggested fix
-def patched_urlopen(url, data=None, timeout=None, **kwargs):
+def patched_urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, **kwargs):
     if isinstance(url, str):
         req = urllib.request.Request(url, data=data)
         req.add_header('User-Agent', 'SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2')
-        return original_urlopen(req, timeout=timeout, **kwargs)
+        return original_urlopen(req, timeout=timeout)
     elif isinstance(url, urllib.request.Request):
         if not url.has_header('User-Agent'):
             url.add_header('User-Agent', 'SeBS/1.2 (https://github.com/spcl/serverless-benchmarks) SeBS Benchmark Suite/1.2')
-        return original_urlopen(url, data=data, timeout=timeout, **kwargs)
+        return original_urlopen(url, data=data, timeout=timeout)
     else:
-        return original_urlopen(url, data=data, timeout=timeout, **kwargs)
+        return original_urlopen(url, data=data, timeout=timeout)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 38 -
48, patched_urlopen may forward timeout twice if timeout is None and also
present in kwargs; update patched_urlopen (the function that wraps
original_urlopen) to avoid passing duplicate timeout by checking kwargs: if
'timeout' is in kwargs, do not pass the explicit timeout parameter; otherwise
pass timeout only when it is not None (or remove timeout key from kwargs before
calling original_urlopen). Ensure this logic is applied in all three call sites
that invoke original_urlopen so you never send timeout both positionally/keyword
and via **kwargs.

148-150: Memory calculation may be incorrect on macOS.

resource.getrusage().ru_maxrss returns kilobytes on Linux but bytes on macOS. Dividing by 1024 works for Linux but would be incorrect on macOS. Since this is a container handler (Linux), this is likely acceptable, but worth documenting.
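A platform-aware conversion could look like the sketch below; the helper name is illustrative, and the unit difference (kilobytes on Linux, bytes on macOS) is the documented behavior of `getrusage`:

```python
import sys

def maxrss_to_mb(ru_maxrss: int, platform: str = sys.platform) -> float:
    # ru_maxrss units differ by OS: bytes on macOS ("darwin"), kilobytes on Linux.
    if platform == "darwin":
        return ru_maxrss / (1024 * 1024)
    # Container handlers run on Linux, where the value is in kilobytes.
    return ru_maxrss / 1024
```

Even if the container path only ever runs on Linux, the branch doubles as documentation of the assumption.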

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/container/handler.py` around lines 148
- 150, The memory calculation using resource.getrusage(...).ru_maxrss assumes
ru_maxrss is in kilobytes (Linux) but on macOS it is in bytes; update the logic
that computes memory_mb (the variable currently set and stored to
log_data['measurement']['memory_used_mb']) to branch on the platform (e.g.,
platform.system() or sys.platform): if Darwin/macOS divide ru_maxrss by
1024*1024 to get MB, otherwise divide by 1024 for Linux, and add a brief comment
noting the container/Linux assumption for clarity.
benchmarks/wrappers/cloudflare/nodejs/container/storage.js (1)

123-137: upload() return value differs from uploadStream().

upload() returns [unique_key, uploadPromise] while uploadStream() returns [passThrough, upload, unique_key]. This inconsistency may confuse callers. Consider documenting the expected return format or aligning the APIs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/container/storage.js` around lines 123
- 137, The upload() method currently returns [unique_key, uploadPromise] while
uploadStream() returns [passThrough, upload, unique_key], causing inconsistent
API; update upload() in storage.upload to match uploadStream's tuple shape
(e.g., return [null, uploadPromise, unique_key] or wrap uploadPromise so callers
get the same three-element array), or else align uploadStream to the two-element
shape—modify the upload() return to mirror uploadStream (references: upload,
uploadStream, _upload_stream_with_key, unique_name) and update any callers or
documentation accordingly so both functions have the same return format.
benchmarks/wrappers/cloudflare/python/container/storage.py (1)

71-77: Use exception chaining with raise ... from e.

When re-raising exceptions, preserve the original traceback by using from e. This applies to multiple exception handlers in this file.

🔧 Suggested fix for this block (apply similar pattern elsewhere)
         try:
             with urllib.request.urlopen(req) as response:
                 result = json.loads(response.read().decode('utf-8'))
                 return result['key']
         except Exception as e:
             print(f"R2 upload error: {e}")
-            raise RuntimeError(f"Failed to upload to R2: {e}")
+            raise RuntimeError(f"Failed to upload to R2: {e}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/container/storage.py` around lines 71 -
77, The except block that catches Exception as e (in the R2 upload logic that
calls urllib.request.urlopen and returns result['key']) should re-raise the
RuntimeError using exception chaining (raise RuntimeError(f"Failed to upload to
R2: {e}") from e) instead of plain raise so the original traceback is preserved;
apply the same change to all other exception handlers in this file that
currently re-raise a new exception without using "from e" (locate blocks around
the upload/download helper functions and replace their plain raises with chained
raises referencing the caught exception variable).
sebs/cloudflare/cloudflare.py (2)

438-441: Chain exceptions for better debugging.

When re-raising exceptions within an except block, use raise ... from e to preserve the exception chain for debugging.

🔧 Proposed fix
         except RuntimeError as e:
             error_msg = f"Wrangler deployment failed for worker {worker_name}: {e}"
             self.logging.error(error_msg)
-            raise RuntimeError(error_msg)
+            raise RuntimeError(error_msg) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/cloudflare.py` around lines 438 - 441, The except block
handling RuntimeError in the deployment flow should preserve the original
exception chain: in the except RuntimeError as e block where you build error_msg
and call self.logging.error(error_msg) and then raise RuntimeError(error_msg),
change the re-raise to use "raise RuntimeError(error_msg) from e" so the
original exception (e) is chained and retained for debugging.

694-694: Rename unused loop variable.

The request_id variable is not used within the loop body. Prefix with underscore to indicate it's intentionally unused.

🔧 Proposed fix
-        for request_id, result in requests.items():
+        for _request_id, result in requests.items():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cloudflare/cloudflare.py` at line 694, Rename the unused loop variable
request_id to _request_id in the for loop that iterates over requests.items()
(the line "for request_id, result in requests.items():") to indicate it's
intentionally unused; update the loop header in the same function in
cloudflare.py and ensure no other code in the loop body references request_id
(adjust references if they exist).
benchmarks/wrappers/cloudflare/python/nosql.py (1)

28-40: Consider documenting pickle security context.

Static analysis flags pickle.loads as a security concern. While this appears to be internal data serialization (not untrusted external input), pickle deserialization can execute arbitrary code if data is ever tampered with or comes from an unexpected source.

The comment explains the technical reason for using pickle (pyodide cloning issues). Consider adding a brief note about the security assumption that stored data is trusted, or evaluate if JSON with custom encoding could work for the specific data types being serialized.
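If pickle must stay, one mitigation is to authenticate the blob with an HMAC before deserializing. A minimal sketch follows; the key handling and helper names are assumptions (a real deployment would provision the secret out of band):

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: provisioned securely

def dumps_signed(obj) -> bytes:
    # Prepend a SHA-256 HMAC tag (32 bytes) to the pickled payload.
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def loads_signed(blob: bytes):
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("pickle payload failed HMAC verification")
    return pickle.loads(payload)
```

This keeps pickle's fidelity for the pyodide cloning issue while refusing to deserialize data that was tampered with in storage.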

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/python/nosql.py` around lines 28 - 40, Add a
short in-code comment above the data_pre and data_post functions explaining the
security assumptions for using pickle (that stored data is trusted and not
user-provided), note that pickle.loads can execute arbitrary code if data is
tampered with, and recommend mitigations (validate/verify data provenance, use a
digital signature/checksum, or switch to a safer serializer like JSON with
custom encoding if feasible); if opting to keep pickle for pyodide cloning,
explicitly document why pickle is required and any planned constraints (trusted
storage, no external input) so future reviewers understand the risk and
rationale.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24b791dd-5dea-4594-9dcf-91da8a6e9c4c

📥 Commits

Reviewing files that changed from the base of the PR and between 50c9822 and 9dd0a6e.

📒 Files selected for processing (37)
  • benchmarks/000.microbenchmarks/040.server-reply/config.json
  • benchmarks/100.webapps/120.uploader/python/function_cloudflare.py
  • benchmarks/100.webapps/130.crud-api/nodejs/function.js
  • benchmarks/100.webapps/130.crud-api/nodejs/package.json
  • benchmarks/300.utilities/311.compression/nodejs/function.js
  • benchmarks/300.utilities/311.compression/nodejs/package.json
  • benchmarks/wrappers/cloudflare/nodejs/build.js
  • benchmarks/wrappers/cloudflare/nodejs/container/handler.js
  • benchmarks/wrappers/cloudflare/nodejs/container/nosql.js
  • benchmarks/wrappers/cloudflare/nodejs/container/package.json
  • benchmarks/wrappers/cloudflare/nodejs/container/storage.js
  • benchmarks/wrappers/cloudflare/nodejs/container/worker.js
  • benchmarks/wrappers/cloudflare/nodejs/handler.js
  • benchmarks/wrappers/cloudflare/nodejs/nosql.js
  • benchmarks/wrappers/cloudflare/nodejs/request-polyfill.js
  • benchmarks/wrappers/cloudflare/nodejs/storage.js
  • benchmarks/wrappers/cloudflare/python/container/handler.py
  • benchmarks/wrappers/cloudflare/python/container/nosql.py
  • benchmarks/wrappers/cloudflare/python/container/storage.py
  • benchmarks/wrappers/cloudflare/python/handler.py
  • benchmarks/wrappers/cloudflare/python/nosql.py
  • benchmarks/wrappers/cloudflare/python/storage.py
  • configs/cloudflare-test.json
  • configs/systems.json
  • dockerfiles/cloudflare/Dockerfile.manage
  • dockerfiles/cloudflare/nodejs/Dockerfile
  • dockerfiles/cloudflare/python/Dockerfile
  • docs/platforms.md
  • docs/storage.md
  • sebs/cli.py
  • sebs/cloudflare/__init__.py
  • sebs/cloudflare/cli.py
  • sebs/cloudflare/cloudflare.py
  • sebs/cloudflare/config.py
  • sebs/cloudflare/containers.py
  • sebs/cloudflare/durable_objects.py
  • sebs/cloudflare/function.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • benchmarks/300.utilities/311.compression/nodejs/package.json
  • benchmarks/100.webapps/130.crud-api/nodejs/function.js
  • benchmarks/wrappers/cloudflare/nodejs/container/package.json
  • benchmarks/300.utilities/311.compression/nodejs/function.js
  • benchmarks/wrappers/cloudflare/nodejs/container/nosql.js
  • docs/storage.md
  • benchmarks/000.microbenchmarks/040.server-reply/config.json

Comment on lines +16 to +20
res = await pyfetch(url, headers=headers)
bs = await res.bytes()

with open(download_path, 'wb') as f:
    f.write(bs)

⚠️ Potential issue | 🟠 Major

Validate HTTP download success before writing and uploading.

Right now any HTTP response body (including 404/500 pages) is written and then uploaded as if it were valid content. Add an explicit status check and fail fast on non-success responses.

🔧 Proposed fix
 async def do_request(url, download_path):
     headers = {'User-Agent': SEBS_USER_AGENT}

     res = await pyfetch(url, headers=headers)
+    if not res.ok:
+        raise RuntimeError(f"Download failed with status {res.status} for URL: {url}")
     bs = await res.bytes()
     
     with open(download_path, 'wb') as f:
         f.write(bs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/100.webapps/120.uploader/python/function_cloudflare.py` around
lines 16 - 20, The code writes and uploads whatever body pyfetch returned
without checking HTTP status; update the download flow that uses pyfetch/res and
download_path to first inspect the response status (e.g., res.status or res.ok)
and raise/return an error (fail fast) for non-2xx responses so the file is not
written or uploaded; only call res.bytes() and open/download_path for writing
when the response indicates success, and include a clear error/exception message
when rejecting non-success responses.

output_prefix = event.get('bucket').get('output')
url = event.get('object').get('url')
name = os.path.basename(url)
download_path = '/tmp/{}'.format(name)

⚠️ Potential issue | 🟠 Major

Use a secure temporary file path instead of a predictable /tmp/<name>.

The current path is predictable and can collide across concurrent invocations. Use tempfile (mkstemp/NamedTemporaryFile) to avoid race/collision issues.

🔒 Proposed fix
 import datetime
 import os
+import tempfile
@@
-    name = os.path.basename(url)
-    download_path = '/tmp/{}'.format(name)
+    name = os.path.basename(url)
+    fd, download_path = tempfile.mkstemp(prefix="sebs-", suffix=f"-{name}", dir="/tmp")
+    os.close(fd)
🧰 Tools
🪛 Ruff (0.15.6)

[error] 28-28: Probable insecure usage of temporary file or directory: "/tmp/{}"

(S108)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/100.webapps/120.uploader/python/function_cloudflare.py` at line
28, The assignment download_path = '/tmp/{}'.format(name) uses a predictable
filename; replace it by creating a secure temp file (e.g., tempfile.mkstemp() or
tempfile.NamedTemporaryFile()) and use the returned path/filename instead of
composing /tmp/<name>. Update the code that writes/read the file to use the file
descriptor or file object from tempfile, close the fd (if using mkstemp) and
ensure proper cleanup; refer to the download_path variable and the surrounding
download/save logic when making this change.

Comment on lines +74 to +83
// Extract Worker URL from header for R2 and NoSQL proxy
const workerUrl = req.headers['x-worker-url'];
if (workerUrl) {
if (storage && storage.storage && storage.storage.set_worker_url) {
storage.storage.set_worker_url(workerUrl);
}
if (nosql && nosql.nosql && nosql.nosql.set_worker_url) {
nosql.nosql.set_worker_url(workerUrl);
}
console.log(`Set worker URL for R2/NoSQL proxy: ${workerUrl}`);

⚠️ Potential issue | 🟠 Major

Don't take the storage/NoSQL proxy origin from a request header.

This header is copied straight into the backend clients, so any caller that can reach the container can redirect follow-up requests to arbitrary endpoints. Resolve the Worker origin from deployment config, or hard-validate the expected host before accepting it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/container/handler.js` around lines 74 -
83, Do not trust req.headers['x-worker-url'] directly; instead derive the worker
origin from deployment configuration or a validated allowlist before calling
storage.storage.set_worker_url or nosql.nosql.set_worker_url. Replace direct
usage of the header in the handler: remove unconditional calls that pass
workerUrl to storage.storage.set_worker_url and nosql.nosql.set_worker_url, and
implement logic that (a) reads the canonical worker origin from config (or an
environment variable) or (b) validates the header value against a strict
allowlist/regex of expected hosts and only then calls
storage.storage.set_worker_url / nosql.nosql.set_worker_url; log a warning and
skip setting the backend clients if validation fails.

Comment on lines +120 to +134
// For debugging: check /tmp directory before and after benchmark
const fs = require('fs');

// Call the benchmark function
const ret = await benchmarkHandler(event);

// Check what was downloaded
const tmpFiles = fs.readdirSync('/tmp');
for (const file of tmpFiles) {
  const filePath = `/tmp/${file}`;
  const stats = fs.statSync(filePath);
  if (stats.size < 500) {
    const content = fs.readFileSync(filePath, 'utf8');
  }
}

⚠️ Potential issue | 🟠 Major

Remove the synchronous /tmp scan from the benchmark path.

This adds extra filesystem work to every invocation and will skew the latency/memory numbers SeBS is trying to measure. content is never used, so this looks like leftover debugging.

✏️ Minimal fix
-    // For debugging: check /tmp directory before and after benchmark
-    const fs = require('fs');
-
     // Call the benchmark function
     const ret = await benchmarkHandler(event);
-    
-    // Check what was downloaded
-    const tmpFiles = fs.readdirSync('/tmp');
-    for (const file of tmpFiles) {
-      const filePath = `/tmp/${file}`;
-      const stats = fs.statSync(filePath);
-      if (stats.size < 500) {
-        const content = fs.readFileSync(filePath, 'utf8');
-      }
-    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/container/handler.js` around lines 120
- 134, The temporary /tmp directory synchronous scan (fs.readdirSync,
fs.statSync, fs.readFileSync) after calling benchmarkHandler is leftover
debugging and should be removed because it adds I/O and alloc overhead and uses
the unused variable content; delete the entire block that builds tmpFiles and
iterates over it (the fs.* calls and the content read) from the code around the
await benchmarkHandler(event) call (refer to benchmarkHandler, tmpFiles,
fs.readdirSync, fs.statSync, fs.readFileSync) — alternatively, if you need this
for local debugging only, wrap it behind a debug flag so it is not executed in
normal benchmark runs.

Comment on lines +31 to +37
async query(keyPrefix) {
  const list = await this.ctx.storage.list();
  const items = [];
  for (const [k, v] of list) {
    items.push(v);
  }
  return { items };

⚠️ Potential issue | 🟠 Major

query currently ignores keyPrefix and returns all rows.

This breaks query semantics and can become unbounded as data grows. Filter by prefix before appending results.

🔧 Proposed fix
   async query(keyPrefix) {
     const list = await this.ctx.storage.list();
     const items = [];
+    const prefix = Array.isArray(keyPrefix) ? keyPrefix.join(':') : String(keyPrefix ?? '');
     for (const [k, v] of list) {
-      items.push(v);
+      if (!prefix || k.startsWith(prefix)) {
+        items.push(v);
+      }
     }
     return { items };
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/wrappers/cloudflare/nodejs/container/worker.js` around lines 31 -
37, The query method is ignoring the keyPrefix and returning all rows; update
the async query(keyPrefix) implementation to filter entries from
this.ctx.storage.list() by checking the key (the variable k in the for (const
[k, v] of list) loop) against keyPrefix (e.g., skip unless keyPrefix is falsy or
k.startsWith(keyPrefix)) before pushing v into the items array so only matching
rows are returned.

Comment on lines +24 to +26
RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
    /root/.local/bin/uv tool install workers-py


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -nP 'curl .*install\.sh \| sh|uv tool install workers-py' dockerfiles/cloudflare/Dockerfile.manage

Repository: spcl/serverless-benchmarks

Length of output: 180


🏁 Script executed:

# Check the Dockerfile context and see if there are ARG definitions or security patterns already in use
head -n 30 dockerfiles/cloudflare/Dockerfile.manage

Repository: spcl/serverless-benchmarks

Length of output: 1292


🏁 Script executed:

# Search for other similar curl|sh patterns in the codebase
rg 'curl.*\| sh' dockerfiles/

Repository: spcl/serverless-benchmarks

Length of output: 167


🏁 Script executed:

# Check for other uv tool installations
rg 'uv tool install' dockerfiles/

Repository: spcl/serverless-benchmarks

Length of output: 158


🏁 Script executed:

# Look for existing ARG usage for security/version management in Dockerfiles
rg 'ARG.*SHA|ARG.*VERSION' dockerfiles/

Repository: spcl/serverless-benchmarks

Length of output: 1513


🏁 Script executed:

# Check if there are any comments or git history explaining the workers-py unpinned installation
git log --oneline -n 20 dockerfiles/cloudflare/Dockerfile.manage 2>/dev/null || echo "Git history not available"

Repository: spcl/serverless-benchmarks

Length of output: 160


🏁 Script executed:

# Look for any documentation or discussion about workers-py version requirements
rg 'workers-py' . --type=md --type=txt --type=py

Repository: spcl/serverless-benchmarks

Length of output: 114


Implement integrity verification and version pinning for uv and workers-py installation.

Line 24 pipes a network script directly to sh without integrity verification; Line 25 installs workers-py unpinned. Both pose supply-chain and reproducibility risks. This is inconsistent with the Docker GPG verification already implemented in this same Dockerfile (lines 11-12).

Proposed safer install pattern
+ARG UV_INSTALL_SHA256
+ARG WORKERS_PY_VERSION
-RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
-    /root/.local/bin/uv tool install workers-py
+RUN curl -fsSLo /tmp/uv-install.sh https://astral.sh/uv/install.sh \
+    && echo "${UV_INSTALL_SHA256}  /tmp/uv-install.sh" | sha256sum -c - \
+    && sh /tmp/uv-install.sh \
+    && /root/.local/bin/uv tool install "workers-py==${WORKERS_PY_VERSION}" \
+    && rm -f /tmp/uv-install.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/cloudflare/Dockerfile.manage` around lines 24 - 26, Avoid piping
the remote install script directly to sh and pin versions/checksums: download
the astral installer with curl to a file (the existing RUN that invokes "curl
-LsSf https://astral.sh/uv/install.sh | sh" and the subsequent
"/root/.local/bin/uv tool install workers-py" should be changed), fetch or ARG
the expected sha256 (or GPG signature) for the installer and verify it before
executing, install a pinned uv release (call the installer with the specific uv
version or use a released binary artifact) and then install workers-py with an
explicit pinned version (e.g., use "uv tool install workers-py@<VERSION>"),
failing the build if the checksum/signature or versions do not match; reference
the installer download step and the "/root/.local/bin/uv tool install
workers-py" invocation when making these changes.

Comment on lines +35 to +36
# Start the HTTP server
CMD ["node", "handler.js"]

⚠️ Potential issue | 🟠 Major

Container runs as root user.

The static analysis tool flagged that this container runs as root (DS-0002). Running containers as root is a security risk—if the application is compromised, the attacker has full privileges within the container.

Consider adding a non-root user:

🔒 Suggested fix to run as non-root user
 # Copy all application files
 COPY . .

+# Create non-root user for security
+RUN groupadd -r nodejs && useradd -r -g nodejs nodejs && \
+    chown -R nodejs:nodejs /app
+
 # Run benchmark init script if it exists (e.g., for ffmpeg in video-processing)
 # This downloads static binaries needed by the benchmark
-# Note: ignore errors from init.sh (e.g., when resources already exist)
 RUN if [ -f "init.sh" ]; then \
         chmod +x init.sh && \
-        ./init.sh /app verbose x64 || true; \
+        ./init.sh /app verbose x64 || echo "Warning: init.sh failed"; \
     fi

+USER nodejs
+
 # Expose port 8080 for container communication
 EXPOSE 8080

Note: If init.sh requires root privileges (e.g., installing packages), place the USER directive after the init script.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/cloudflare/nodejs/Dockerfile` around lines 35 - 36, The
Dockerfile currently starts the container with CMD ["node", "handler.js"] which
runs as root; create a non-root user and group (e.g., appuser), chown the app
files and any runtime directories to that user, and add a USER appuser directive
before the CMD so the process runs unprivileged; if init.sh requires root
(package installs, apt, etc.) keep running those steps as root and move the USER
directive to after init.sh to drop privileges before CMD.

Comment on lines +188 to +191
# Only override container_deployment if explicitly set via CLI
# If not in config, use CLI default (False)
if container_deployment or "container_deployment" not in config_obj.get("experiments", {}):
    update_nested_dict(config_obj, ["experiments", "container_deployment"], container_deployment)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Click 7.1.2, for an option declared as --foo/--no-foo with default=False, can the callback distinguish between an omitted flag and explicit --no-foo? Is default=None supported for tri-state behavior?

💡 Result:

No. In Click 7.1.2, for an option declared as --foo/--no-foo with default=False, the callback cannot distinguish an omitted flag from an explicit --no-foo: both pass False, and 7.1.2 has no get_parameter_source (or equivalent) to check whether a value came from the command line or the default. Tri-state behavior is possible with default=None, however: the callback then receives None when the flag is omitted, True for --foo, and False for --no-foo, so explicit CLI values can be distinguished from the omitted case with custom handling.

Citations:


🏁 Script executed:

# First, find and examine the sebs/cli.py file
git ls-files | grep -E "sebs/cli\.py$"

Repository: spcl/serverless-benchmarks

Length of output: 81


🏁 Script executed:

# Read the context around lines 188-191
head -n 210 sebs/cli.py | tail -n 35

Repository: spcl/serverless-benchmarks

Length of output: 1878


🏁 Script executed:

# Find the Click option decorator for container_deployment
rg -A 5 "container.deployment.*click\.option" sebs/cli.py

Repository: spcl/serverless-benchmarks

Length of output: 52


🏁 Script executed:

# Search for the function definition that contains container_deployment parameter
rg -B 30 "container_deployment" sebs/cli.py | head -n 100

Repository: spcl/serverless-benchmarks

Length of output: 3213


🏁 Script executed:

# Search for the --container-deployment option definition
rg "container-deployment" sebs/cli.py

Repository: spcl/serverless-benchmarks

Length of output: 129


🏁 Script executed:

# Search for simplified_common_params to understand how options are defined
rg -B 50 "simplified_common_params" sebs/cli.py | head -n 150

Repository: spcl/serverless-benchmarks

Length of output: 4502


--no-container-deployment cannot override a truthy config value with the current logic.

Click passes False both when the flag is omitted and when explicitly set to --no-container-deployment. The condition if container_deployment or "container_deployment" not in config_obj.get("experiments", {}) only updates when the key is missing from config, leaving a container_deployment: true in the config unchanged.

Suggested fix
@click.option(
    "--container-deployment/--no-container-deployment",
-   default=False,
+   default=None,
    help="Deploy functions as container images (AWS only).",
)
-if container_deployment or "container_deployment" not in config_obj.get("experiments", {}):
-    update_nested_dict(config_obj, ["experiments", "container_deployment"], container_deployment)
+if container_deployment is not None:
+    update_nested_dict(
+        config_obj, ["experiments", "container_deployment"], container_deployment
+    )
+elif "container_deployment" not in config_obj.get("experiments", {}):
+    update_nested_dict(config_obj, ["experiments", "container_deployment"], False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/cli.py` around lines 188 - 191, The current logic can't distinguish an
omitted flag from an explicit --no, so change the Click option for
container_deployment to use default=None (so omitted => None, explicit
True/False passed) and then update this check to "if container_deployment is not
None or 'container_deployment' not in config_obj.get('experiments', {})" before
calling update_nested_dict(config_obj, ['experiments','container_deployment'],
container_deployment); this ensures an explicit False from the CLI overwrites a
truthy value in config_obj while keeping the config default when the flag is
omitted.

Comment on lines +42 to +49
dockerfile_path = os.path.join(
    os.path.dirname(__file__),
    "..",
    "..",
    "dockerfiles",
    "cloudflare",
    "Dockerfile.manage"
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In docker SDK for Python 4.2.0, for images.build(path=..., dockerfile=...), must dockerfile be specified relative to the build context?

💡 Result:

Yes, in Docker SDK for Python 4.2.0, the dockerfile parameter for images.build(path=..., dockerfile=...) must be specified as a path relative to the build context (the path directory). The official documentation describes it as "path within the build context to the Dockerfile". While some users report that absolute paths may work in practice, this is not documented behavior and relative paths are required per the docs. The Docker build system itself enforces that the Dockerfile must exist within (or be reachable from) the sent build context tarball.

🏁 Script executed:

# Examine the file to understand the exact code and verify both locations
cat -n sebs/cloudflare/cli.py | sed -n '35,75p'

Repository: spcl/serverless-benchmarks

Length of output: 2167


Use relative path for dockerfile parameter.

images.build() requires the dockerfile parameter to be relative to the build context (path). Passing an absolute host path will fail when the Docker SDK attempts to locate the file within the sent build context.

Suggested fix
-                dockerfile_path = os.path.join(
+                dockerfile_relpath = os.path.join(
+                    "dockerfiles",
+                    "cloudflare",
+                    "Dockerfile.manage",
+                )
+                dockerfile_path = os.path.join(
                     os.path.dirname(__file__), 
                     "..", 
                     "..", 
-                    "dockerfiles", 
-                    "cloudflare", 
-                    "Dockerfile.manage"
+                    dockerfile_relpath,
                 )
@@
                     image, build_logs = docker_client.images.build(
                         path=build_path,
-                        dockerfile=dockerfile_path,
+                        dockerfile=dockerfile_relpath,
                         tag=full_image_name,
                         rm=True,
                         pull=True
                     )

Also applies to: 62-68

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In sebs/cloudflare/cli.py around lines 42-49: the dockerfile_path is built as
an absolute host path but images.build() requires the dockerfile argument to be
relative to the build context; update the Docker build calls so you compute a
build_context (e.g., the directory to use as path) and pass dockerfile as a
relative path within that context (for example
"dockerfiles/cloudflare/Dockerfile.manage"), replacing uses of dockerfile_path
in the images.build() calls (and the similar occurrence at lines 62-68) so the
dockerfile parameter is relative to the provided build_context.
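The split the review suggests — an absolute path for host-side checks, a context-relative path for the SDK call — can be sketched as below. The helper name split_build_args is an illustrative assumption; the real code lives in sebs/cloudflare/cli.py:

```python
import os


def split_build_args(repo_root):
    """Return (build_context, dockerfile_relpath, dockerfile_abspath).

    The Docker SDK resolves the dockerfile argument *inside* the build
    context tarball, so only the relative form may be passed to
    images.build(); the absolute form is only for host-side checks
    such as os.path.exists().
    """
    dockerfile_relpath = os.path.join("dockerfiles", "cloudflare", "Dockerfile.manage")
    dockerfile_abspath = os.path.join(repo_root, dockerfile_relpath)
    return repo_root, dockerfile_relpath, dockerfile_abspath


context, relpath, abspath = split_build_args("/tmp/repo")
assert not os.path.isabs(relpath)
assert os.path.relpath(abspath, context) == relpath

# The call would then look like (not executed here):
#   docker_client.images.build(
#       path=context,          # build context sent to the daemon
#       dockerfile=relpath,    # resolved inside that context
#       tag=full_image_name, rm=True, pull=True,
#   )
```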

Comment on lines +83 to +97
elif "CLOUDFLARE_API_TOKEN" in os.environ:
    ret = CloudflareCredentials(
        api_token=os.environ["CLOUDFLARE_API_TOKEN"],
        account_id=os.environ.get("CLOUDFLARE_ACCOUNT_ID"),
        r2_access_key_id=os.environ.get("CLOUDFLARE_R2_ACCESS_KEY_ID"),
        r2_secret_access_key=os.environ.get("CLOUDFLARE_R2_SECRET_ACCESS_KEY")
    )
elif "CLOUDFLARE_EMAIL" in os.environ and "CLOUDFLARE_API_KEY" in os.environ:
    ret = CloudflareCredentials(
        email=os.environ["CLOUDFLARE_EMAIL"],
        api_key=os.environ["CLOUDFLARE_API_KEY"],
        account_id=os.environ.get("CLOUDFLARE_ACCOUNT_ID"),
        r2_access_key_id=os.environ.get("CLOUDFLARE_R2_ACCESS_KEY_ID"),
        r2_secret_access_key=os.environ.get("CLOUDFLARE_R2_SECRET_ACCESS_KEY")
    )

⚠️ Potential issue | 🟠 Major

Enforce account_id presence after deserialization.

account_id is documented as required, but env-based paths currently allow None. Fail early if it is missing to avoid delayed runtime failures.

🔧 Proposed fix
         else:
             raise RuntimeError(
                 "Cloudflare login credentials are missing! Please set "
                 "up environmental variables CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID, "
                 "or CLOUDFLARE_EMAIL, CLOUDFLARE_API_KEY, and CLOUDFLARE_ACCOUNT_ID"
             )
+
+        if not ret.account_id:
+            raise RuntimeError(
+                "Cloudflare account ID is required. Please set CLOUDFLARE_ACCOUNT_ID "
+                "or provide credentials.account_id in configuration."
+            )

Also applies to: 99-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In sebs/cloudflare/config.py around lines 83-97: the env-based
CloudflareCredentials branches currently allow account_id to be None; after
constructing ret in both branches (the CloudflareCredentials instantiations that
set api_token or email/api_key), immediately validate ret.account_id and raise a
clear ValueError (or similar) if it's missing to fail fast; apply the same check
for the other identical deserialization block referenced in the comment (the
second env-path at lines 99-103) so both the token and email/api_key flows
enforce a required account_id.
