WIP feat (browsers): create throughput benchmark for browser providers by kisernl · Pull Request #115 · computesdk/benchmarks

kisernl · 2026-05-07T15:59:25Z

This pull request introduces a browser step throughput benchmark that measures and compares how quickly different browser providers execute a sequence of agent-style actions within a single session. It adds a GitHub Actions workflow that runs the benchmark automatically, documents the methodology, and extends configuration and reporting to cover the new benchmark.

Key changes:

New Benchmarking Capability

Added a new GitHub Actions workflow (.github/workflows/browser-throughput-benchmarks.yml) that runs the throughput benchmark across multiple providers on a daily schedule and on pull requests, then collects the results and posts them as a PR comment.
Introduced new npm scripts in package.json for running browser throughput benchmarks per provider and for generating SVG summary tables. [1] [2]

Documentation

Added THROUGHPUT.md to thoroughly document the new browser step throughput benchmark, including its motivation, methodology, scoring, action sequence, and limitations.

Benchmark Implementation Improvements

Updated src/browser/benchmark.ts to allow configurable timeout and to correctly derive the iteration count for reporting, improving result accuracy and flexibility. [1] [2] [3]
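The diff for src/browser/benchmark.ts is not shown on this page. As a minimal sketch of the two ideas described above, the code below derives the reported iteration count from the results actually recorded (so a run cut short still reports an honest "succeeded/total" status) and reads a configurable timeout with a fallback. The `IterationResult` shape, the `BENCH_TIMEOUT_MS` variable name, and the 60-second default are hypothetical, not taken from the PR.

```typescript
// Hypothetical shape of one iteration's result; field names are illustrative.
interface IterationResult {
  ok: boolean;
  totalMs: number;
}

// Derive the iteration count from the recorded results rather than from the
// configured iteration count, which may not match if a run stopped early.
function summarizeStatus(results: IterationResult[]): string {
  const iterations = results.length;
  const succeeded = results.filter((r) => r.ok).length;
  return `${succeeded}/${iterations}`;
}

// Hypothetical timeout configuration: read BENCH_TIMEOUT_MS from an
// environment-like record, falling back to a default if missing or invalid.
function resolveTimeoutMs(
  env: Record<string, string | undefined>,
  fallbackMs = 60_000,
): number {
  const raw = env["BENCH_TIMEOUT_MS"];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}
```

In Node this would typically be called as `resolveTimeoutMs(process.env)`; taking a plain record keeps the helper easy to test.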

github-actions · 2026-05-07T16:00:51Z

Browser Benchmark Results

#	Provider	Score	Create	Connect	Navigate	Release	Total	Status
1	Kernel	94.6	0.04s	0.43s	0.15s	0.04s	0.68s	10/10
2	Browserbase	94.3	0.22s	0.12s	0.15s	0.14s	0.66s	10/10
3	Hyperbrowser	94.1	0.17s	0.16s	0.12s	0.09s	0.55s	10/10
4	Steel	82.8	0.41s	0.61s	0.12s	0.13s	1.68s	10/10

View full run · SVG available as build artifact

github-actions · 2026-05-07T16:01:52Z

Browser Throughput Benchmark Results

#	Provider	Score	APS (med)	Task (med)	Task (p95)	Screenshot	Status
1	Browserbase	66.5	5.24/s	9.55s	9.58s	206ms	3/3
2	Hyperbrowser	53.9	3.72/s	13.44s	14.47s	352ms	3/3
3	Kernel	53.3	3.68/s	13.60s	14.84s	421ms	3/3
4	Steel	20.6	1.48/s	33.89s	35.42s	732ms	3/3

View full run · SVG available as build artifact
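The APS (actions per second) column in the throughput table is consistent with 50 actions divided by the median task time: 50 / 9.55 s ≈ 5.24/s for Browserbase and 50 / 33.89 s ≈ 1.48/s for Steel. That relationship is inferred from the numbers above, not confirmed from the benchmark code. A small sketch of the calculation, with the 50-action count taken from the PR overview:

```typescript
// Actions per task, per the PR overview ("50-action Wikipedia loop").
const ACTIONS_PER_TASK = 50;

// Median of a non-empty list of numbers (averages the middle pair when even).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty array");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Actions per second for a task that took `taskSeconds` end to end.
function actionsPerSecond(taskSeconds: number, actions = ACTIONS_PER_TASK): number {
  if (taskSeconds <= 0) throw new Error("taskSeconds must be positive");
  return actions / taskSeconds;
}
```

With an odd number of runs (3 per provider here), the median of per-run APS equals APS computed from the median task time, because 50/t decreases monotonically in t. With an even number of runs the two can differ slightly.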

github-actions · 2026-05-07T16:01:55Z

Sandbox Benchmark Results

Sequential

#	Provider	Score	Median TTI	P95	P99	Status
1	declaw	99.3	0.04s	0.11s	0.11s	10/10
2	tensorlake	97.0	0.29s	0.33s	0.33s	10/10
3	daytona	96.5	0.23s	0.55s	0.55s	10/10
4	upstash	95.3	0.43s	0.52s	0.52s	10/10
5	archil	95.3	0.40s	0.57s	0.57s	10/10
6	blaxel	94.6	0.51s	0.58s	0.58s	10/10
7	e2b	93.9	0.49s	0.79s	0.79s	10/10
8	vercel	91.5	0.63s	1.18s	1.18s	10/10
9	runloop	84.8	1.26s	1.90s	1.90s	10/10
10	hopx	82.8	1.62s	1.86s	1.86s	10/10
11	cloudflare	78.2	1.96s	2.50s	2.50s	10/10
12	modal	74.6	1.98s	3.38s	3.38s	10/10
13	codesandbox	73.9	2.52s	2.74s	2.74s	10/10
14	namespace	72.5	1.90s	2.01s	2.01s	9/10

Staggered

#	Provider	Score	Median TTI	P95	P99	Status
1	declaw	99.6	0.04s	0.04s	0.04s	10/10
2	tensorlake	97.0	0.29s	0.32s	0.32s	10/10
3	daytona	96.9	0.23s	0.43s	0.43s	10/10
4	archil	96.8	0.31s	0.33s	0.33s	10/10
5	upstash	96.0	0.38s	0.42s	0.42s	10/10
6	blaxel	94.7	0.52s	0.56s	0.56s	10/10
7	e2b	93.9	0.47s	0.82s	0.82s	10/10
8	vercel	89.0	0.79s	1.56s	1.56s	10/10
9	hopx	84.3	1.40s	1.82s	1.82s	10/10
10	modal	81.6	1.69s	2.07s	2.07s	10/10
11	namespace	80.7	1.88s	2.00s	2.00s	10/10
12	cloudflare	78.5	2.00s	2.38s	2.38s	10/10
13	codesandbox	70.0	2.74s	3.40s	3.40s	10/10
14	runloop	50.8	1.54s	14.64s	14.64s	10/10

Burst

#	Provider	Score	Median TTI	P95	P99	Status
1	declaw	99.5	0.04s	0.07s	0.07s	10/10
2	tensorlake	96.7	0.30s	0.38s	0.38s	10/10
3	daytona	96.2	0.25s	0.58s	0.58s	10/10
4	upstash	95.4	0.42s	0.53s	0.53s	10/10
5	archil	95.3	0.44s	0.51s	0.51s	10/10
6	blaxel	94.0	0.59s	0.62s	0.62s	10/10
7	e2b	93.8	0.52s	0.78s	0.78s	10/10
8	vercel	90.7	0.84s	1.06s	1.06s	10/10
9	namespace	79.2	1.88s	2.38s	2.38s	10/10
10	cloudflare	74.5	2.06s	3.28s	3.28s	10/10
11	runloop	73.4	2.31s	3.19s	3.19s	10/10
12	codesandbox	67.1	3.06s	3.64s	3.64s	10/10
13	hopx	50.9	1.79s	9.60s	9.60s	10/10
14	modal	48.4	1.94s	29.27s	29.27s	10/10

View full run · SVGs available as build artifacts
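In every sandbox table, P95 and P99 are identical. That is what a nearest-rank percentile produces with 10 samples: both ranks round up to the 10th (slowest) sample. Whether the benchmark actually uses nearest-rank is an assumption; the sketch below only shows why that definition would give this pattern.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentileNearestRank(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}
```

For n = 10, ceil(0.95 × 10) and ceil(0.99 × 10) both equal 10, so P95 and P99 are the single slowest run. This is also why one slow outlier can dominate a tail column, as with runloop (staggered) or modal (burst). A linear-interpolation percentile would generally report two different values.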

github-actions · 2026-05-07T16:08:23Z

Storage Benchmark Results

1MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	AWS S3	95.7	0.05s	178.7 Mbps	0.06s	1000/1000
2	Cloudflare R2	94.9	0.10s	87.8 Mbps	0.18s	1000/1000
3	Tigris	94.5	0.20s	42.9 Mbps	0.11s	1000/1000

4MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Cloudflare R2	95.2	0.15s	217.7 Mbps	0.30s	1000/1000
2	AWS S3	93.8	0.45s	74.8 Mbps	0.22s	1000/1000
3	Tigris	91.6	1.14s	29.4 Mbps	0.33s	1000/1000

10MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Cloudflare R2	94.5	0.31s	267.1 Mbps	0.68s	1000/1000
2	AWS S3	90.7	1.36s	61.6 Mbps	0.46s	1000/1000
3	Tigris	86.2	3.40s	24.6 Mbps	0.52s	1000/1000

16MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Cloudflare R2	94.6	0.42s	321.8 Mbps	0.71s	1000/1000
2	AWS S3	87.1	2.95s	45.5 Mbps	0.55s	1000/1000
3	Tigris	83.7	4.39s	30.6 Mbps	0.53s	1000/1000

View full run · SVGs available as build artifacts
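The storage Throughput column does not obviously equal file size divided by the median download time. For example, 1 MB in 0.05 s is 160 Mbps with decimal units, while AWS S3 is reported at 178.7 Mbps. The reporter may average per-transfer rates, use unrounded timings, or count binary megabytes; the exact formula is not visible on this page. The sketch below assumes decimal megabits and shows how averaging per-transfer rates differs from using a single median time.

```typescript
// Megabits per second for a transfer of `bytes` that took `seconds`.
// Assumes decimal megabits (1 Mbit = 1,000,000 bits).
function throughputMbps(bytes: number, seconds: number): number {
  if (seconds <= 0) throw new Error("seconds must be positive");
  return (bytes * 8) / seconds / 1_000_000;
}

// Mean of per-transfer rates. Fast transfers pull this above the rate
// implied by the mean or median duration.
function meanThroughputMbps(bytes: number, durations: number[]): number {
  if (durations.length === 0) throw new Error("no durations");
  const rates = durations.map((s) => throughputMbps(bytes, s));
  return rates.reduce((a, b) => a + b, 0) / rates.length;
}
```

For two 1 MB transfers taking 0.04 s and 0.08 s, the mean per-transfer rate is 150 Mbps, while the mean duration (0.06 s) implies about 133 Mbps.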

Copilot

Pull request overview

Adds a new “browser step throughput” benchmark mode to measure per-action performance within a single long-lived browser session, complementing the existing browser lifecycle benchmark.

Changes:

Introduces a new browser-throughput benchmark runner (50-action Wikipedia loop), result schema, and composite scoring.
Adds provider configs, SVG generation, CLI wiring, and npm scripts for running and reporting throughput benchmarks.
Adds a dedicated GitHub Actions workflow to run/merge throughput results and post PR comments.
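The runner executes a fixed sequence of actions against one long-lived session and records per-action latency, which feeds columns such as Screenshot. A minimal, provider-agnostic sketch of that timing loop follows. The `Action` and `TaskTiming` types and `runTask` are hypothetical; the real runner in src/browser/throughput-benchmark.ts drives actual browser sessions. The clock is injectable so the loop can be tested without a browser.

```typescript
// A named step in an agent-style sequence (e.g. "navigate", "click", "screenshot").
interface Action {
  kind: string;
  run: () => Promise<void>;
}

interface TaskTiming {
  totalMs: number;
  perKindMs: Map<string, number[]>; // latency samples grouped by action kind
}

// Run every action sequentially in the same session, timing each one.
async function runTask(
  actions: Action[],
  now: () => number = Date.now,
): Promise<TaskTiming> {
  const perKindMs = new Map<string, number[]>();
  const start = now();
  for (const action of actions) {
    const t0 = now();
    await action.run();
    const samples = perKindMs.get(action.kind) ?? [];
    samples.push(now() - t0);
    perKindMs.set(action.kind, samples);
  }
  return { totalMs: now() - start, perKindMs };
}
```

Running actions strictly one after another, rather than in parallel, matches the "sequence of agent-style actions within a single session" described in the PR, so total task time is the sum of the per-action latencies plus loop overhead.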

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.


File	Description
THROUGHPUT.md	Documents the new throughput benchmark methodology, scoring, running, and scheduling.
src/run.ts	Adds a new `browser-throughput` mode to run the benchmark and write results.
src/merge-results.ts	Adds merge + table-printing logic for browser-throughput artifacts.
src/browser/throughput-types.ts	Defines result and provider config types for throughput benchmarking.
src/browser/throughput-scoring.ts	Implements composite scoring + sorting for throughput results.
src/browser/throughput-providers.ts	Adds provider definitions and session options (stealth/headless/viewport).
src/browser/throughput-benchmark.ts	Implements the 50-action throughput benchmark runner and JSON writer.
src/browser/generate-throughput-svg.ts	Generates an SVG leaderboard for throughput results.
results/browser-throughput/.gitkeep	Ensures the results directory exists in-repo.
package.json	Adds bench and SVG generation scripts for browser-throughput.
.github/workflows/browser-throughput-benchmarks.yml	Adds CI workflow to run, merge, render, and publish throughput benchmark results.
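src/merge-results.ts combines per-provider artifacts and prints the table posted by github-actions. A sketch of that merge-and-render step, producing rows in the same tab-separated layout as the Browser Throughput Benchmark Results comment, is below. The `ThroughputResult` field names are hypothetical; the real schema is defined in src/browser/throughput-types.ts and may differ.

```typescript
// Hypothetical per-provider result shape (see throughput-types.ts for the real one).
interface ThroughputResult {
  provider: string;
  score: number;
  apsMedian: number;
  taskMedianSec: number;
  taskP95Sec: number;
  screenshotMedianMs: number;
  succeeded: number;
  total: number;
}

// Merge results from several artifact files (one array per file), sort by
// score descending, and render tab-separated rows like the PR comment table.
function renderTable(artifacts: ThroughputResult[][]): string {
  const merged = ([] as ThroughputResult[]).concat(...artifacts);
  const rows = merged.sort((a, b) => b.score - a.score);
  const header =
    "#\tProvider\tScore\tAPS (med)\tTask (med)\tTask (p95)\tScreenshot\tStatus";
  const lines = rows.map((r, i) =>
    [
      String(i + 1),
      r.provider,
      r.score.toFixed(1),
      `${r.apsMedian.toFixed(2)}/s`,
      `${r.taskMedianSec.toFixed(2)}s`,
      `${r.taskP95Sec.toFixed(2)}s`,
      `${Math.round(r.screenshotMedianMs)}ms`,
      `${r.succeeded}/${r.total}`,
    ].join("\t"),
  );
  return [header, ...lines].join("\n");
}
```

Ranks are assigned after sorting, so the "#" column always reflects score order regardless of the order in which artifacts were downloaded.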


feat (browsers): create throughput benchmark for browser providers

f3ebce7

kisernl requested a review from Copilot May 7, 2026 16:09

Copilot started reviewing on behalf of kisernl May 7, 2026 16:09

Copilot AI reviewed May 7, 2026


Comment thread src/browser/throughput-scoring.ts Outdated

Comment thread src/run.ts

Comment thread src/browser/throughput-benchmark.ts

Comment thread THROUGHPUT.md

Comment thread THROUGHPUT.md Outdated

Comment thread src/merge-results.ts

fix: resolve PR comments

06440b7

kisernl wants to merge 2 commits into master from step-throughput-benchmark
