Commit 94603e5

spike: scaffold Cloudflare Sandbox execution tier

Author: StackMemory Bot (CLI)
Parent: 2ca09af

8 files changed (2269 additions & 0 deletions)

Lines changed: 159 additions & 0 deletions
# P3 Spike: Cloudflare Sandbox as L3 Remote Execution Tier

## Scope

This spike answers one question:

**Can Cloudflare Sandboxes serve as a viable `L3` remote execution tier for StackMemory agents?**

For this spike, `L3` means:
- remote, isolated, browser-addressable execution
- persistent enough to survive idle cycles via mounted storage or backups
- suitable for repo-oriented agent work

It does **not** mean:
- replacing Postgres/SQLite as StackMemory's primary metadata database
- replacing the hosted memory runtime

## Why this is worth testing now

Cloudflare's platform has changed materially:
- Sandboxes are now generally available.
- The SDK exposes commands, files, PTY terminals, Git workflows, file watching, mounted buckets, and backup/restore.
- The platform is explicitly targeted at agentic workloads, CI/CD, and interactive development environments.

That changes the answer from "interesting, maybe later" to "build a real spike now".

## Hypothesis

Cloudflare Sandboxes are viable for StackMemory if all of the following are true:

1. We can map one sandbox ID to one project/session cleanly.
2. Browser terminal UX is good enough over WebSockets.
3. Repo bootstrap plus restore beats repeated cold setup.
4. Mounted storage and backups are good enough for persistence.
5. Control-plane logic can stay in Workers without leaking secrets into the sandbox.
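
Hypothesis 1 becomes concrete once there is a deterministic, reversible mapping. The sketch below assumes a hypothetical naming scheme in which a sandbox ID is a project ID plus an optional session ID, joined by `--`. The scheme is not part of the spike package or the Sandbox SDK, and `sandboxIdFor` and `parseSandboxId` are illustrative names. Segments are restricted to lowercase alphanumerics with single dashes so that every sandbox ID traces back to exactly one owner.

```typescript
// Hypothetical identity scheme for the spike; not part of the Sandbox SDK.
// Segments are lowercase alphanumerics joined by single dashes, so "--"
// never occurs inside a segment and can safely act as the separator.
const SEGMENT_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
const SEPARATOR = "--";

interface SandboxOwner {
  projectId: string;
  sessionId?: string;
}

function assertSegment(kind: string, value: string): void {
  if (!SEGMENT_PATTERN.test(value)) {
    throw new Error(`invalid ${kind} "${value}": use a-z, 0-9 and single dashes`);
  }
}

// One project (and optionally one session) maps to exactly one sandbox ID.
function sandboxIdFor(projectId: string, sessionId?: string): string {
  assertSegment("projectId", projectId);
  if (sessionId === undefined) return projectId;
  assertSegment("sessionId", sessionId);
  return projectId + SEPARATOR + sessionId;
}

// Inverse mapping: every valid sandbox ID traces back to exactly one owner.
function parseSandboxId(sandboxId: string): SandboxOwner {
  const parts = sandboxId.split(SEPARATOR);
  if (parts.length > 2) throw new Error(`invalid sandbox ID "${sandboxId}"`);
  const projectId = parts[0];
  assertSegment("projectId", projectId);
  if (parts.length === 1) return { projectId };
  const sessionId = parts[1];
  assertSegment("sessionId", sessionId);
  return { projectId, sessionId };
}
```

For example, `sandboxIdFor("stackmemory", "run-42")` yields `stackmemory--run-42`, and `parseSandboxId` recovers both parts from that string.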

## What the spike package implements

See `packages/cloudflare-sandbox-spike/`.

It provides:
- Worker entrypoint
- Sandbox binding
- container image
- WebSocket terminal route
- Git checkout bootstrap route
- command execution route
- file read/write routes
- mounted R2 route
- backup/restore routes
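
To show how little control-plane code these routes need, here is a minimal sketch of the dispatch step a Worker entrypoint might perform before it calls into the Sandbox SDK. The paths follow the package README. The route names and the `matchRoute` helper are hypothetical, `GET /health` is left out, and no SDK calls are made.

```typescript
// Hypothetical route names; the paths mirror the package README.
type RouteName =
  | "bootstrap" | "exec" | "readFile" | "writeFile" | "list"
  | "mount" | "backup" | "restore" | "destroy" | "terminal";

interface RouteMatch {
  route: RouteName;
  sandboxId: string;
}

// [HTTP method, action segment of /v1/sandboxes/:id/<action>, route name]
const ROUTES: ReadonlyArray<readonly [string, string, RouteName]> = [
  ["POST", "bootstrap", "bootstrap"],
  ["POST", "exec", "exec"],
  ["GET", "files", "readFile"],
  ["PUT", "files", "writeFile"],
  ["GET", "ls", "list"],
  ["POST", "mount", "mount"],
  ["POST", "backup", "backup"],
  ["POST", "restore", "restore"],
  ["POST", "destroy", "destroy"],
  ["GET", "terminal", "terminal"], // a WebSocket upgrade arrives as an HTTP GET
];

const SANDBOX_PATH = /^\/v1\/sandboxes\/([^/]+)\/([^/]+)$/;

// Query strings (e.g. ?path=...) are not part of `pathname`, so they are ignored here.
function matchRoute(method: string, pathname: string): RouteMatch | null {
  const m = SANDBOX_PATH.exec(pathname);
  if (m === null) return null;
  const rawId = m[1];
  const action = m[2];
  const verb = method.toUpperCase();
  for (const [routeMethod, routeAction, route] of ROUTES) {
    if (routeMethod === verb && routeAction === action) {
      return { route, sandboxId: decodeURIComponent(rawId) };
    }
  }
  return null;
}
```

Inside a Worker's `fetch` handler this would be called as `matchRoute(request.method, new URL(request.url).pathname)`, with a `null` result turned into a 404.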
51+
52+
This is enough to validate the platform shape without dragging in full StackMemory runtime complexity.
53+
54+
## Viability criteria
55+
56+
The spike is a `GO` if we can demonstrate:
57+
58+
1. **Bootstrapping**
59+
- clone a repo into `/workspace/repo`
60+
- run install/test/build commands
61+
62+
2. **Interactive work**
63+
- connect browser terminal over websocket
64+
- keep shell state in a session
65+
66+
3. **Persistence**
67+
- mount an R2 bucket into the sandbox filesystem
68+
- persist files across sandbox destruction
69+
- create and restore a workspace backup
70+
71+
4. **Security model**
72+
- control secrets from the Worker side
73+
- avoid embedding live credentials directly into user-controlled code
74+
75+
5. **Operational clarity**
76+
- one sandbox ID maps cleanly to project/session identity
77+
- cleanup lifecycle is explicit
78+
- failure modes are understandable
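
Criterion 1 can be scripted against the `exec` route. This sketch makes two assumptions: that requests use the `{ command, args }` body from the package README's smoke test, and that the target repo defines `test` and `build` scripts runnable with pnpm (which the spike image installs). `shellStep` and `bootstrapPlan` are hypothetical helpers.

```typescript
// Request body shape used by POST /v1/sandboxes/:id/exec in the README smoke test.
interface ExecBody {
  command: string;
  args: string[];
}

const REPO_DIR = "/workspace/repo";

// Each step runs in a fresh login shell that first cd's into the repo,
// so no step depends on shell state left behind by an earlier one.
function shellStep(script: string): ExecBody {
  return { command: "bash", args: ["-lc", `cd ${REPO_DIR} && ${script}`] };
}

// Hypothetical plan for criterion 1; assumes the repo defines test/build scripts.
function bootstrapPlan(packageManager: string = "pnpm"): ExecBody[] {
  return [
    shellStep("git rev-parse --short HEAD"), // confirms the clone landed
    shellStep(`${packageManager} install --frozen-lockfile`),
    shellStep(`${packageManager} test`),
    shellStep(`${packageManager} run build`),
  ];
}
```

Each body would be sent in order to `POST /v1/sandboxes/:id/exec`. Because every step uses a fresh `bash -lc`, the plan does not depend on shell state, which criterion 2 tests separately.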
79+
80+
## Non-goals
81+
82+
- Multi-tenant billing
83+
- full StackMemory API integration
84+
- hosted retrieval/indexing layer
85+
- production authn/authz
86+
- scheduler and queue integration
87+
- fleet management
88+
89+
## Current platform reading
90+
91+
### What looks strong
92+
93+
Cloudflare now has the pieces we actually need:
94+
- Sandboxes are GA and explicitly positioned for untrusted code execution and agent workflows.
95+
- Sandbox instances are Durable Objects under the hood, which gives a natural coordination identity.
96+
- Git operations are first-class.
97+
- PTY terminals are first-class.
98+
- Mounted S3-compatible buckets are first-class.
99+
- Backup/restore is first-class and specifically designed to avoid repeating clone/install/setup costs.
100+
101+
### What still looks limiting
102+
103+
- Sandbox containers are still ephemeral unless you add mounted storage or backups.
104+
- Backup/restore is production-only right now, not `wrangler dev`.
105+
- Preview URLs need custom domain setup; `.workers.dev` is not enough for exposed-port workflows.
106+
- This is an execution platform, not a relational memory/query platform.
107+
108+
## Recommended production shape if this spike passes
109+
110+
### Keep
111+
- StackMemory hosted runtime for:
112+
- projects
113+
- runs
114+
- frames
115+
- anchors
116+
- retrieval
117+
- search
118+
- orchestration
119+
120+
### Add
121+
- Cloudflare Sandbox as:
122+
- per-project execution runtime
123+
- browser terminal endpoint
124+
- disposable worker session host
125+
126+
### Store
127+
- R2 for:
128+
- mounted project persistence
129+
- backup archives
130+
- large artifacts and logs
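
One way to keep these three uses of R2 from colliding is a fixed key layout. The prefixes and the `r2Key` helper below are hypothetical. They illustrate one choice: placing the project ID directly under each area, so that a project's data can be listed or cleaned up by prefix.

```typescript
// Hypothetical key layout for the R2 uses above; prefixes are illustrative.
type R2Area = "project" | "backup" | "artifact" | "log";

const AREA_PREFIX: Record<R2Area, string> = {
  project: "projects",   // mounted project persistence
  backup: "backups",     // backup archives
  artifact: "artifacts", // large build/test artifacts
  log: "logs",           // long command and terminal logs
};

// Builds "<area>/<projectId>/<...rest>", rejecting segments that could
// escape the project's prefix or produce empty path components.
function r2Key(area: R2Area, projectId: string, ...rest: string[]): string {
  const parts = [AREA_PREFIX[area], projectId, ...rest];
  for (const part of parts) {
    if (part === "" || part === "." || part === ".." || part.indexOf("/") !== -1) {
      throw new Error(`unsafe key segment "${part}"`);
    }
  }
  return parts.join("/");
}
```

For example, `r2Key("backup", "stackmemory", "snapshot-001")` yields `backups/stackmemory/snapshot-001`.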

### Coordinate
- Durable Objects for:
  - sandbox identity
  - session-to-sandbox routing
  - short-lived coordination state
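
The coordination role above is small enough to sketch. The code below is an in-memory approximation of the session-to-sandbox routing state a Durable Object might hold. It assumes short-lived leases with a fixed TTL. A real Durable Object would persist this state through its storage API, which is omitted here, and `SessionRouter` is a hypothetical name.

```typescript
// Hypothetical in-memory model of Durable Object coordination state.
// A real Durable Object would persist leases via its storage API (omitted).
interface Lease {
  sandboxId: string;
  expiresAt: number; // epoch milliseconds
}

class SessionRouter {
  private leases: { [sessionId: string]: Lease } = {};
  private retired: string[] = []; // sandboxes replaced in route(), awaiting cleanup

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  // Returns the session's current sandbox, or allocates one if the lease is
  // missing or expired; either way the lease is renewed for another TTL.
  route(sessionId: string, allocate: () => string): string {
    const t = this.now();
    const lease = this.leases[sessionId] as Lease | undefined;
    let sandboxId: string;
    if (lease !== undefined && lease.expiresAt > t) {
      sandboxId = lease.sandboxId; // live lease: keep the same sandbox
    } else {
      sandboxId = allocate();
      // An expired lease whose sandbox is being replaced must not be forgotten.
      if (lease !== undefined && lease.sandboxId !== sandboxId) this.retired.push(lease.sandboxId);
    }
    this.leases[sessionId] = { sandboxId, expiresAt: t + this.ttlMs };
    return sandboxId;
  }

  // Drops expired leases and returns every sandbox ID that is now a cleanup candidate.
  sweep(): string[] {
    const t = this.now();
    const expired = this.retired.splice(0);
    for (const sessionId of Object.keys(this.leases)) {
      const lease = this.leases[sessionId];
      if (lease.expiresAt <= t) {
        expired.push(lease.sandboxId);
        delete this.leases[sessionId];
      }
    }
    return expired;
  }
}
```

Returning cleanup candidates from `sweep` is one way to make the cleanup lifecycle explicit: the caller decides whether to back up and destroy each sandbox.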

## Recommended production shape if this spike fails

If terminal UX is poor, bootstrap/restore latency is too high, or operational complexity is too great, do not force it.

Fallback:
- keep execution local or VM-based
- use Cloudflare only for edge API/control-plane pieces
- do not contort StackMemory around a weak remote execution substrate

## First decision after the spike

At the end of P3, we should be able to state one of these clearly:

1. **GO**: Cloudflare Sandbox is good enough for a real remote execution track.
2. **PARTIAL GO**: good for ephemeral code execution, not good enough for long-lived interactive project sessions.
3. **NO GO**: useful technology, wrong fit for StackMemory's execution model.
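
The three outcomes can be tied back to the viability criteria. The mapping below is one reading of this note, not a rule the note states. Bootstrapping, security, and operational clarity are treated as required even for ephemeral execution. Interactive work and persistence are what separate `GO` from `PARTIAL GO`. `decide` and `CriteriaResults` are hypothetical names.

```typescript
type Verdict = "GO" | "PARTIAL GO" | "NO GO";

// One flag per viability criterion in this note.
interface CriteriaResults {
  bootstrapping: boolean;
  interactiveWork: boolean;
  persistence: boolean;
  securityModel: boolean;
  operationalClarity: boolean;
}

// Hypothetical mapping from criteria to the three outcomes above.
function decide(r: CriteriaResults): Verdict {
  // Even ephemeral execution needs working bootstrap, a sound secret boundary,
  // and an understandable lifecycle.
  if (!(r.bootstrapping && r.securityModel && r.operationalClarity)) return "NO GO";
  // Long-lived interactive project sessions additionally need terminals and persistence.
  return r.interactiveWork && r.persistence ? "GO" : "PARTIAL GO";
}
```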

## Deliverables

- `packages/cloudflare-sandbox-spike/`
- this decision note
- a short benchmark/result note after hands-on validation
Lines changed: 5 additions & 0 deletions
R2_ENDPOINT=https://YOUR_ACCOUNT_ID.r2.cloudflarestorage.com
AWS_ACCESS_KEY_ID=replace-me
AWS_SECRET_ACCESS_KEY=replace-me
CLOUDFLARE_ACCOUNT_ID=replace-me
BACKUP_BUCKET_NAME=stackmemory-spike-backups
Lines changed: 12 additions & 0 deletions
FROM docker.io/cloudflare/sandbox:0.7.0-python

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    jq \
    ripgrep \
    sqlite3 \
    && rm -rf /var/lib/apt/lists/*

RUN npm install -g pnpm@10

WORKDIR /workspace
Lines changed: 121 additions & 0 deletions
# Cloudflare Sandbox Spike

P3 spike to determine whether Cloudflare Sandboxes are a viable remote execution tier for StackMemory.

This package is intentionally narrow:
- Worker + Sandbox SDK scaffold
- terminal/WebSocket path
- command execution
- file read/write
- Git checkout
- R2-backed persistence hooks
- backup/restore hooks

It is not production-ready orchestration. The point is to validate platform fit.

## Why this spike exists

For StackMemory, the hard question is not "can Cloudflare run code?" It is:

1. Can it host an isolated, project-scoped agent runtime?
2. Can it preserve enough state to avoid cold-starting every session?
3. Can it support browser terminals and repo workflows cleanly?
4. Are its limits predictable enough for it to become a real `L3` remote execution layer?

Cloudflare's current Sandbox SDK is the first platform shape that makes this plausible without building a custom container control plane ourselves.

## What this spike proves

The scaffold demonstrates:

- `POST /v1/sandboxes/:id/bootstrap`
  - optionally mounts persistent storage
  - optionally clones a repo into `/workspace/repo`
- `POST /v1/sandboxes/:id/exec`
  - runs commands in the sandbox
- `GET /v1/sandboxes/:id/files?path=...`
- `PUT /v1/sandboxes/:id/files?path=...`
- `GET /v1/sandboxes/:id/ls?path=...`
- `POST /v1/sandboxes/:id/mount`
  - mounts project storage into the sandbox
- `POST /v1/sandboxes/:id/backup`
- `POST /v1/sandboxes/:id/restore`
- `POST /v1/sandboxes/:id/destroy`
- `GET /health`
- `GET/WS /v1/sandboxes/:id/terminal`
  - browser terminal passthrough to the sandbox PTY
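
For scripting an evaluation, a thin client that builds requests for these routes keeps URLs consistent. This sketch uses hypothetical names (`SpikeClient`, `SpikeRequest`), and it only builds request descriptors without sending anything. It assumes `PUT .../files` takes the raw file content as its body. It sends no body for `mount`, `backup`, `restore`, or `destroy`, because the README does not show one.

```typescript
type HttpMethod = "GET" | "PUT" | "POST";

interface SpikeRequest {
  method: HttpMethod;
  url: string;
  body?: string;
}

type LifecycleAction = "mount" | "backup" | "restore" | "destroy";

// Hypothetical client: builds request descriptors for the routes above, sends nothing.
class SpikeClient {
  constructor(private readonly baseUrl: string, private readonly sandboxId: string) {}

  private routeUrl(action: string, path?: string): string {
    const url = new URL(`/v1/sandboxes/${encodeURIComponent(this.sandboxId)}/${action}`, this.baseUrl);
    if (path !== undefined) url.searchParams.set("path", path); // percent-encodes the path
    return url.toString();
  }

  exec(command: string, args: string[] = []): SpikeRequest {
    return { method: "POST", url: this.routeUrl("exec"), body: JSON.stringify({ command, args }) };
  }

  readFile(path: string): SpikeRequest {
    return { method: "GET", url: this.routeUrl("files", path) };
  }

  // Assumption: the file content is sent as the raw request body.
  writeFile(path: string, content: string): SpikeRequest {
    return { method: "PUT", url: this.routeUrl("files", path), body: content };
  }

  list(path: string): SpikeRequest {
    return { method: "GET", url: this.routeUrl("ls", path) };
  }

  // No bodies: the README does not document request bodies for these routes.
  lifecycle(action: LifecycleAction): SpikeRequest {
    return { method: "POST", url: this.routeUrl(action) };
  }

  // Same path over ws:// (or wss:// for https) for the browser terminal.
  terminalUrl(): string {
    return this.routeUrl("terminal").replace(/^http/, "ws");
  }
}
```

Each descriptor maps directly onto `fetch(req.url, { method: req.method, body: req.body })`.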

## Local development

Prereqs:
- Docker running locally
- Cloudflare account
- Node.js

Install:

```bash
cd packages/cloudflare-sandbox-spike
npm install
```

Start locally:

```bash
npm run dev
```

Smoke test:

```bash
curl http://localhost:8787/health

curl -X POST http://localhost:8787/v1/sandboxes/demo/bootstrap \
  -H 'content-type: application/json' \
  -d '{"repoUrl":"https://github.com/stackmemoryai/stackmemory.git","depth":1,"mountProjectData":true,"localBucket":true}'

curl -X POST http://localhost:8787/v1/sandboxes/demo/exec \
  -H 'content-type: application/json' \
  -d '{"command":"bash","args":["-lc","cd /workspace/repo && git status --short"]}'
```
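
The bootstrap body in the smoke test has four fields, and a Worker would typically validate such input before touching the sandbox. The rules below are assumptions for illustration, not the spike's actual checks: HTTPS-only repo URLs, a positive integer `depth`, and boolean flags. `validateBootstrap` is a hypothetical helper.

```typescript
// Fields from the README's bootstrap smoke test; each is optional here.
interface BootstrapBody {
  repoUrl?: string;
  depth?: number;
  mountProjectData?: boolean;
  localBucket?: boolean;
}

// Hypothetical validation rules; returns human-readable problems (empty = valid).
function validateBootstrap(input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["body must be a JSON object"];
  }
  const body = input as { [key: string]: unknown };
  const problems: string[] = [];

  const repoUrl = body["repoUrl"];
  if (repoUrl !== undefined && (typeof repoUrl !== "string" || repoUrl.indexOf("https://") !== 0)) {
    problems.push("repoUrl must be an https:// URL");
  }

  const depth = body["depth"];
  if (depth !== undefined && !(typeof depth === "number" && depth > 0 && depth % 1 === 0)) {
    problems.push("depth must be a positive integer");
  }

  for (const flag of ["mountProjectData", "localBucket"]) {
    if (body[flag] !== undefined && typeof body[flag] !== "boolean") {
      problems.push(`${flag} must be a boolean`);
    }
  }
  return problems;
}
```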

## Production-only caveat

`backup` / `restore` do not work under `wrangler dev` because the current backup implementation requires FUSE support. Use deployed Workers for that part of the spike.
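
Given this limitation, a local run should fail fast on these two routes rather than surface a confusing FUSE error. This is a minimal sketch. It assumes a hypothetical `SPIKE_ENV` variable that is set to `local` in `.dev.vars` and left unset on deployed Workers. `localOnlyGuard` is illustrative, and 501 is just one reasonable status choice.

```typescript
// Hypothetical env shape: SPIKE_ENV is an assumed variable, e.g. set to
// "local" in .dev.vars and left unset on deployed Workers.
interface GuardEnv {
  SPIKE_ENV?: string;
}

interface GuardRejection {
  status: number;
  message: string;
}

// Routes that depend on FUSE-backed backup support.
const FUSE_ROUTES: string[] = ["backup", "restore"];

// Returns a rejection for routes that cannot work locally, or null to proceed.
function localOnlyGuard(route: string, env: GuardEnv): GuardRejection | null {
  if (env.SPIKE_ENV === "local" && FUSE_ROUTES.indexOf(route) !== -1) {
    return {
      status: 501,
      message: `${route} requires FUSE and only works on deployed Workers, not under wrangler dev`,
    };
  }
  return null;
}
```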

## Required config

`wrangler.jsonc` already includes:
- `containers`
- `durable_objects`
- `migrations`
- `PROJECT_DATA` R2 binding
- `BACKUP_BUCKET` R2 binding

For remote R2 bucket mounting and backup flows, populate secrets and environment variables following `.dev.vars.example`.
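
Before running the remote mount and backup flows, it can help to check that none of the example placeholders survived. This is a minimal sketch that assumes `.dev.vars` uses plain `KEY=VALUE` lines; the parser ignores quoting and other dotenv features. It treats `replace-me` and `YOUR_ACCOUNT_ID` as the placeholders from `.dev.vars.example`. `parseDevVars` and `unsetKeys` are hypothetical helpers.

```typescript
// Keys listed in .dev.vars.example for remote R2 mounting and backups.
const REQUIRED_KEYS: string[] = [
  "R2_ENDPOINT",
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "CLOUDFLARE_ACCOUNT_ID",
  "BACKUP_BUCKET_NAME",
];

// Placeholder markers used in .dev.vars.example.
const PLACEHOLDERS: string[] = ["replace-me", "YOUR_ACCOUNT_ID"];

// Simplified parser: KEY=VALUE per line, "#" comments; no quoting or multiline values.
function parseDevVars(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

// Returns required keys that are missing, empty, or still hold a placeholder.
function unsetKeys(vars: Record<string, string>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : "";
    return value === "" || PLACEHOLDERS.some((p) => value.indexOf(p) !== -1);
  });
}
```

Run against an unedited copy of `.dev.vars.example`, `unsetKeys` reports every key except `BACKUP_BUCKET_NAME`.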

## Suggested evaluation sequence

1. `health`
2. `bootstrap`
3. `exec`
4. WebSocket terminal
5. `mount`
6. write/read through mounted storage
7. `backup`
8. destroy sandbox
9. restore backup
10. re-run a command in the restored repo
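
This sequence is order-dependent: restore needs a backup, and the final command needs a restored repo. A harness should therefore stop at the first failure and record per-step timings for the bootstrap-versus-restore latency comparison. In this sketch, each step is supplied as an async function that would wrap the corresponding HTTP call. `EvalStep` and `runSequence` are hypothetical names.

```typescript
interface EvalStep {
  name: string;
  run: () => Promise<void>; // would wrap one HTTP call to the spike Worker
}

interface StepResult {
  step: string;
  ok: boolean;
  ms: number;
  error?: string;
}

// Runs steps strictly in order, timing each, and stops at the first failure
// because later steps (restore, re-run) depend on earlier ones (backup, bootstrap).
async function runSequence(
  steps: EvalStep[],
  now: () => number = Date.now,
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const s of steps) {
    const start = now();
    try {
      await s.run();
      results.push({ step: s.name, ok: true, ms: now() - start });
    } catch (err) {
      results.push({ step: s.name, ok: false, ms: now() - start, error: String(err) });
      break;
    }
  }
  return results;
}
```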
110+
111+
## Current recommendation
112+
113+
If this spike works end-to-end, the likely production shape is:
114+
115+
- Workers = control plane / auth / API
116+
- Sandbox = per-project or per-session execution runtime
117+
- Durable Object = instance identity and state coordination
118+
- R2 = mounted project persistence + backups + artifacts
119+
- StackMemory hosted runtime = metadata, indexing, retrieval, event routing
120+
121+
This should be treated as a remote execution tier, not as a replacement for StackMemory's hosted relational memory store.
