WIP feat (browsers): create throughput benchmark for browser providers #115

Open · kisernl wants to merge 2 commits into master from step-throughput-benchmark

kisernl commented May 7, 2026

This pull request adds a browser step throughput benchmark that measures and compares how quickly different browser providers execute a sequence of agent-style actions within a single session. It also adds a workflow for automated benchmarking, documents the benchmark, and updates configuration and reporting to support it.

Key changes:

New Benchmarking Capability

  • Added a new GitHub Actions workflow (.github/workflows/browser-throughput-benchmarks.yml) to automate browser throughput benchmarking across multiple providers, including scheduled daily runs, PR-triggered runs, and result collection/posting.
  • Introduced new npm scripts in package.json for running browser throughput benchmarks per provider and for generating SVG summary tables.

Documentation

  • Added THROUGHPUT.md to thoroughly document the new browser step throughput benchmark, including its motivation, methodology, scoring, action sequence, and limitations.

Benchmark Implementation Improvements

  • Updated src/browser/benchmark.ts to allow a configurable timeout and to correctly derive the iteration count for reporting, improving result accuracy and flexibility.
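The two changes described above (a configurable timeout, and an iteration count derived from what actually ran) could look roughly like the sketch below. All names here (`IterationResult`, `withTimeout`, `summarizeStatus`) are hypothetical and are not taken from src/browser/benchmark.ts; this only illustrates the general idea.

```typescript
// Hypothetical shape of one benchmark iteration; not the repository's real type.
interface IterationResult {
  totalMs: number;
  error?: string;
}

// Reject if `work` has not settled within `timeoutMs` (caller-supplied, so configurable).
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}

// Derive the "succeeded/total" status from the recorded results themselves,
// rather than from the iteration count that was originally requested.
function summarizeStatus(results: IterationResult[]): string {
  const succeeded = results.filter((r) => r.error === undefined).length;
  return `${succeeded}/${results.length}`;
}
```

Deriving the denominator from `results.length` means the reported status stays consistent with the data even if a run ends early.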

github-actions Bot commented May 7, 2026

Browser Benchmark Results

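| # | Provider | Score | Create | Connect | Navigate | Release | Total | Status |
|---|----------|-------|--------|---------|----------|---------|-------|--------|
| 1 | Kernel | 94.6 | 0.04s | 0.43s | 0.15s | 0.04s | 0.68s | 10/10 |
| 2 | Browserbase | 94.3 | 0.22s | 0.12s | 0.15s | 0.14s | 0.66s | 10/10 |
| 3 | Hyperbrowser | 94.1 | 0.17s | 0.16s | 0.12s | 0.09s | 0.55s | 10/10 |
| 4 | Steel | 82.8 | 0.41s | 0.61s | 0.12s | 0.13s | 1.68s | 10/10 |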

View full run · SVG available as build artifact

github-actions Bot commented May 7, 2026

Browser Throughput Benchmark Results

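| # | Provider | Score | APS (med) | Task (med) | Task (p95) | Screenshot | Status |
|---|----------|-------|-----------|------------|------------|------------|--------|
| 1 | Browserbase | 66.5 | 5.24/s | 9.55s | 9.58s | 206ms | 3/3 |
| 2 | Hyperbrowser | 53.9 | 3.72/s | 13.44s | 14.47s | 352ms | 3/3 |
| 3 | Kernel | 53.3 | 3.68/s | 13.60s | 14.84s | 421ms | 3/3 |
| 4 | Steel | 20.6 | 1.48/s | 33.89s | 35.42s | 732ms | 3/3 |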

View full run · SVG available as build artifact
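The APS (med) column in the throughput results is consistent with dividing the 50 actions of the benchmark loop by the median task time, for example 50 / 9.55s ≈ 5.24/s for Browserbase. The sketch below reproduces that arithmetic. It is an assumption about how the column relates to the task time, not the repository's actual implementation, and `actionsPerSecond` and `median` are hypothetical helpers.

```typescript
// The Copilot review summary describes a "50-action Wikipedia loop".
const ACTIONS_PER_TASK = 50;

// Median of a non-empty list of numbers (mean of the two middle values when even).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Actions per second for a task that took `taskSeconds` to run all actions.
function actionsPerSecond(
  taskSeconds: number,
  actions: number = ACTIONS_PER_TASK,
): number {
  return actions / taskSeconds;
}

// Median task times (seconds) copied from the results table.
console.log(actionsPerSecond(9.55).toFixed(2)); // 5.24 (Browserbase)
console.log(actionsPerSecond(33.89).toFixed(2)); // 1.48 (Steel)
```

With three iterations per provider, the median of per-iteration APS equals 50 divided by the median task time, so either way of computing the column would give the same number.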

github-actions Bot commented May 7, 2026

Sandbox Benchmark Results

Sequential

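| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 99.3 | 0.04s | 0.11s | 0.11s | 10/10 |
| 2 | tensorlake | 97.0 | 0.29s | 0.33s | 0.33s | 10/10 |
| 3 | daytona | 96.5 | 0.23s | 0.55s | 0.55s | 10/10 |
| 4 | upstash | 95.3 | 0.43s | 0.52s | 0.52s | 10/10 |
| 5 | archil | 95.3 | 0.40s | 0.57s | 0.57s | 10/10 |
| 6 | blaxel | 94.6 | 0.51s | 0.58s | 0.58s | 10/10 |
| 7 | e2b | 93.9 | 0.49s | 0.79s | 0.79s | 10/10 |
| 8 | vercel | 91.5 | 0.63s | 1.18s | 1.18s | 10/10 |
| 9 | runloop | 84.8 | 1.26s | 1.90s | 1.90s | 10/10 |
| 10 | hopx | 82.8 | 1.62s | 1.86s | 1.86s | 10/10 |
| 11 | cloudflare | 78.2 | 1.96s | 2.50s | 2.50s | 10/10 |
| 12 | modal | 74.6 | 1.98s | 3.38s | 3.38s | 10/10 |
| 13 | codesandbox | 73.9 | 2.52s | 2.74s | 2.74s | 10/10 |
| 14 | namespace | 72.5 | 1.90s | 2.01s | 2.01s | 9/10 |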

Staggered

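| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 99.6 | 0.04s | 0.04s | 0.04s | 10/10 |
| 2 | tensorlake | 97.0 | 0.29s | 0.32s | 0.32s | 10/10 |
| 3 | daytona | 96.9 | 0.23s | 0.43s | 0.43s | 10/10 |
| 4 | archil | 96.8 | 0.31s | 0.33s | 0.33s | 10/10 |
| 5 | upstash | 96.0 | 0.38s | 0.42s | 0.42s | 10/10 |
| 6 | blaxel | 94.7 | 0.52s | 0.56s | 0.56s | 10/10 |
| 7 | e2b | 93.9 | 0.47s | 0.82s | 0.82s | 10/10 |
| 8 | vercel | 89.0 | 0.79s | 1.56s | 1.56s | 10/10 |
| 9 | hopx | 84.3 | 1.40s | 1.82s | 1.82s | 10/10 |
| 10 | modal | 81.6 | 1.69s | 2.07s | 2.07s | 10/10 |
| 11 | namespace | 80.7 | 1.88s | 2.00s | 2.00s | 10/10 |
| 12 | cloudflare | 78.5 | 2.00s | 2.38s | 2.38s | 10/10 |
| 13 | codesandbox | 70.0 | 2.74s | 3.40s | 3.40s | 10/10 |
| 14 | runloop | 50.8 | 1.54s | 14.64s | 14.64s | 10/10 |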

Burst

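| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 99.5 | 0.04s | 0.07s | 0.07s | 10/10 |
| 2 | tensorlake | 96.7 | 0.30s | 0.38s | 0.38s | 10/10 |
| 3 | daytona | 96.2 | 0.25s | 0.58s | 0.58s | 10/10 |
| 4 | upstash | 95.4 | 0.42s | 0.53s | 0.53s | 10/10 |
| 5 | archil | 95.3 | 0.44s | 0.51s | 0.51s | 10/10 |
| 6 | blaxel | 94.0 | 0.59s | 0.62s | 0.62s | 10/10 |
| 7 | e2b | 93.8 | 0.52s | 0.78s | 0.78s | 10/10 |
| 8 | vercel | 90.7 | 0.84s | 1.06s | 1.06s | 10/10 |
| 9 | namespace | 79.2 | 1.88s | 2.38s | 2.38s | 10/10 |
| 10 | cloudflare | 74.5 | 2.06s | 3.28s | 3.28s | 10/10 |
| 11 | runloop | 73.4 | 2.31s | 3.19s | 3.19s | 10/10 |
| 12 | codesandbox | 67.1 | 3.06s | 3.64s | 3.64s | 10/10 |
| 13 | hopx | 50.9 | 1.79s | 9.60s | 9.60s | 10/10 |
| 14 | modal | 48.4 | 1.94s | 29.27s | 29.27s | 10/10 |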

View full run · SVGs available as build artifacts
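In every sandbox row above, P95 and P99 are identical. With only 10 iterations per provider, that is what a nearest-rank percentile would produce, because both ranks round up to the slowest sample. The results do not state which percentile method is used, so the sketch below is an assumption for illustration, and `percentileNearestRank` is a hypothetical name.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentileNearestRank(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Ten made-up TTI samples in seconds (not taken from any provider's run).
const tti = [0.21, 0.22, 0.23, 0.23, 0.24, 0.25, 0.26, 0.3, 0.41, 0.55];

// With 10 samples, ceil(9.5) and ceil(9.9) are both 10, so P95 and P99
// both select the maximum.
console.log(percentileNearestRank(tti, 95)); // 0.55
console.log(percentileNearestRank(tti, 99)); // 0.55
```

One consequence is that a single slow iteration out of 10 sets both tail columns, which may explain rows where the median is moderate but P95 and P99 are very large.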

github-actions Bot commented May 7, 2026

Storage Benchmark Results

1MB Files

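| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | AWS S3 | 95.7 | 0.05s | 178.7 Mbps | 0.06s | 1000/1000 |
| 2 | Cloudflare R2 | 94.9 | 0.10s | 87.8 Mbps | 0.18s | 1000/1000 |
| 3 | Tigris | 94.5 | 0.20s | 42.9 Mbps | 0.11s | 1000/1000 |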

4MB Files

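| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Cloudflare R2 | 95.2 | 0.15s | 217.7 Mbps | 0.30s | 1000/1000 |
| 2 | AWS S3 | 93.8 | 0.45s | 74.8 Mbps | 0.22s | 1000/1000 |
| 3 | Tigris | 91.6 | 1.14s | 29.4 Mbps | 0.33s | 1000/1000 |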

10MB Files

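| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Cloudflare R2 | 94.5 | 0.31s | 267.1 Mbps | 0.68s | 1000/1000 |
| 2 | AWS S3 | 90.7 | 1.36s | 61.6 Mbps | 0.46s | 1000/1000 |
| 3 | Tigris | 86.2 | 3.40s | 24.6 Mbps | 0.52s | 1000/1000 |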

16MB Files

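| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Cloudflare R2 | 94.6 | 0.42s | 321.8 Mbps | 0.71s | 1000/1000 |
| 2 | AWS S3 | 87.1 | 2.95s | 45.5 Mbps | 0.55s | 1000/1000 |
| 3 | Tigris | 83.7 | 4.39s | 30.6 Mbps | 0.53s | 1000/1000 |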

View full run · SVGs available as build artifacts
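The Throughput column in the storage tables roughly tracks file size divided by download time. The sketch below shows that conversion. It assumes "MB" means 2^20 bytes and "Mbps" means 10^6 bits per second, neither of which is stated in the results, and `throughputMbps` is a hypothetical helper. Because the displayed download times are rounded to two decimals, the output only approximates the reported figures.

```typescript
// Assumption: 1 "MB" in the table headings is 2^20 bytes.
const BYTES_PER_MB = 1024 * 1024;

// Download throughput in megabits per second (10^6 bits/s).
function throughputMbps(sizeMb: number, downloadSeconds: number): number {
  if (downloadSeconds <= 0) throw new Error("download time must be positive");
  const bits = sizeMb * BYTES_PER_MB * 8;
  return bits / downloadSeconds / 1_000_000;
}

// Cloudflare R2, 16MB files: 0.42s download from the table.
const estimate = throughputMbps(16, 0.42);
console.log(estimate.toFixed(1)); // 319.6, close to the reported 321.8 Mbps
```

The small gap between the estimate and the reported value is what would be expected if the benchmark computes throughput from unrounded timings.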

Copilot AI left a comment

Pull request overview

Adds a new “browser step throughput” benchmark mode to measure per-action performance within a single long-lived browser session, complementing the existing browser lifecycle benchmark.

Changes:

  • Introduces a new browser-throughput benchmark runner (50-action Wikipedia loop), result schema, and composite scoring.
  • Adds provider configs, SVG generation, CLI wiring, and npm scripts for running and reporting throughput benchmarks.
  • Adds a dedicated GitHub Actions workflow to run/merge throughput results and post PR comments.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.

| File | Description |
|------|-------------|
| THROUGHPUT.md | Documents the new throughput benchmark methodology, scoring, running, and scheduling. |
| src/run.ts | Adds a new browser-throughput mode to run the benchmark and write results. |
| src/merge-results.ts | Adds merge + table-printing logic for browser-throughput artifacts. |
| src/browser/throughput-types.ts | Defines result and provider config types for throughput benchmarking. |
| src/browser/throughput-scoring.ts | Implements composite scoring + sorting for throughput results. |
| src/browser/throughput-providers.ts | Adds provider definitions and session options (stealth/headless/viewport). |
| src/browser/throughput-benchmark.ts | Implements the 50-action throughput benchmark runner and JSON writer. |
| src/browser/generate-throughput-svg.ts | Generates an SVG leaderboard for throughput results. |
| results/browser-throughput/.gitkeep | Ensures the results directory exists in-repo. |
| package.json | Adds bench and SVG generation scripts for browser-throughput. |
| .github/workflows/browser-throughput-benchmarks.yml | Adds CI workflow to run, merge, render, and publish throughput benchmark results. |

Comment thread src/browser/throughput-scoring.ts Outdated
Comment thread src/run.ts
Comment thread src/browser/throughput-benchmark.ts
Comment thread THROUGHPUT.md
Comment thread THROUGHPUT.md Outdated
Comment thread src/merge-results.ts