-Browser-agent-server code is in `browser-agent-server/nodejs/`
+- Agent server code comes from `submodules/browser-operator-core/agent-server/nodejs/`
+- Dependencies are installed during Docker build via `npm install`
+- Rebuild the image if dependencies are missing: `make rebuild`
 
 ### 5. Docker Volume Caching on macOS
 **Symptom:** File changes not visible in running container with docker-compose
@@ -293,10 +300,11 @@ make compose-up # OR make run
 **Advantages:**
 - Background operation
 - Easy restart without rebuilding
-- Volume-mounted eval-server code (live reload)
 - Managed by docker-compose
 - Better for long-running development
 
+**Note:** Agent server code is baked into the image, so rebuilds are needed for code changes
+
 **Usage:**
 ```bash
 # First time setup
@@ -312,9 +320,11 @@ make test # Run simple eval test
 # View logs
 make logs # Follow all logs
 
-# Iterate on eval-server code (NO REBUILD NEEDED)
-vim eval-server/nodejs/src/api-server.js
-docker-compose restart # Picks up changes immediately
+# Iterate on agent server code (REQUIRES REBUILD)
+vim submodules/browser-operator-core/agent-server/nodejs/src/api-server.js
+make rebuild
+docker-compose down
+docker-compose up -d
 
 # Stop
 make stop # OR docker-compose down
@@ -362,7 +372,7 @@ make run # Restart after rebuild
 |--------|-----------|-------------------|
 | **Logs** | Live in terminal | Background, use `make logs` |
 | **Stopping** | Ctrl+C or docker stop | `make stop` |
-| **Eval server code** | Baked into image, rebuild needed | Volume-mounted, restart only |
+| **Agent server code** | Baked into image, rebuild needed | Baked into image, rebuild needed |
 | **DevTools code** | Baked into image, rebuild needed | Baked into image, rebuild needed |
 | **Best for** | Debugging, seeing startup issues | Development iteration |
 | **Script** | `run-local.sh` | `docker-compose.yml` |
@@ -377,9 +387,11 @@ make run # Restart after rebuild
 ```bash
 cd deployments/local
 
-# Browser-agent-server changes (NO REBUILD)
-vim ../../browser-agent-server/nodejs/src/api-server.js
-docker-compose restart # Volume-mounted, picks up changes
+# Agent server changes (REQUIRES REBUILD)
+vim ../../submodules/browser-operator-core/agent-server/nodejs/src/api-server.js
+make rebuild
+docker-compose down
+docker-compose up -d
 
 # DevTools changes
 vim ../../browser-operator-core/front_end/panels/ai_chat/...
@@ -430,26 +442,29 @@ CHROMIUM_DATA_HOST=/tmp/browser URLS="https://example.com" make run
 
 ## Important Notes
 
-### Browser Agent Server Location
-The browser agent server code is in: `browser-agent-server/nodejs/`
+### Agent Server Location
+The agent server code is in: `submodules/browser-operator-core/agent-server/nodejs/`
 
 This is the main server that handles browser automation requests via HTTP/WebSocket APIs.
 
+**Note:** The submodule must be on the `feat/js-eval-endpoint` branch to have the `/page/execute` endpoint.
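
One way to confirm the submodule branch before rebuilding is sketched below. It assumes the submodule is checked out at `submodules/browser-operator-core` (inferred from the path above) and that Git 2.22 or later is installed, since it relies on `git branch --show-current`. The function name `check_submodule_branch` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: verify that a Git checkout is on the expected branch.
# Usage: check_submodule_branch <repo-dir> <expected-branch>
check_submodule_branch() {
  local repo_dir="$1" expected="$2" current
  # Prints nothing when HEAD is detached, which is common for submodules.
  current="$(git -C "$repo_dir" branch --show-current)" || return 2
  if [ "$current" = "$expected" ]; then
    echo "OK: $repo_dir is on $expected"
  else
    echo "MISMATCH: $repo_dir is on '${current:-detached HEAD}', expected $expected" >&2
    return 1
  fi
}
```

A possible call from the repository root would be `check_submodule_branch submodules/browser-operator-core feat/js-eval-endpoint`. A detached HEAD is reported as a mismatch even if it points at a commit on that branch, so treat the result as a hint rather than proof.
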
+
 ### CDP Port is 9223, Not 9222
 The default Chrome DevTools port is 9222, but this project uses 9223.
 
 Check these files:
 - `deployments/commons/supervisor/services/browser-agent-server.conf` - Must have `CDP_PORT="9223"`
 - Chromium startup config uses port 9223
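
The first item in the list above can be checked by searching the supervisor config for the port setting. This is a minimal sketch that assumes the file contains the literal text `CDP_PORT="9223"` exactly as quoted; the function name `check_cdp_port` is hypothetical.

```bash
# Hypothetical helper: confirm a config file sets CDP_PORT to 9223.
# Usage: check_cdp_port <conf-file>
check_cdp_port() {
  local conf="$1"
  # -F matches the string literally; -q suppresses output, leaving only the exit status.
  if grep -qF 'CDP_PORT="9223"' "$conf"; then
    echo "OK: $conf uses CDP port 9223"
  else
    echo "WARNING: CDP_PORT=\"9223\" not found in $conf" >&2
    return 1
  fi
}
```

From the repository root this could be run as `check_cdp_port deployments/commons/supervisor/services/browser-agent-server.conf`. A literal match was chosen over a regular expression so that the quotes around the port number are checked too.
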
 
-### Dependencies in browser-agent-server/nodejs/
+### Dependencies in submodules/browser-operator-core/agent-server/nodejs/
 Required packages:
-- js-yaml (for parsing YAML eval files)
-- express (HTTP server)
 - ws (WebSocket server)
-- chrome-remote-interface (CDP client)
+- uuid (ID generation)
+- winston (logging)
+- js-yaml (YAML parsing)
+- dotenv (environment variables)
 
-All managed by `package.json` and `npm install`.
+All managed by `package.json` and `npm install` during Docker build.
 
 ### Lock File Cleanup is Automatic
 After implementing `deployments/*/scripts/init-container.sh`, you should never need to manually clean lock files again. The script runs on every container start.
@@ -706,7 +721,7 @@ curl -X POST http://localhost:8080/page/screenshot \
evals/CLAUDE.md (8 additions & 4 deletions)
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 This is the **Evaluation Framework** for testing browser automation agents. It uses **LLM-as-a-judge** to evaluate agent responses against defined criteria, with support for **visual verification** through screenshots.
 
-The framework is completely independent of the main browser-agent server and operates as a standalone Python application that communicates with the browser-agent-server API at http://localhost:8080.
+The framework is completely independent of the main agent server and operates as a standalone Python application that communicates with the agent server API at http://localhost:8080.
 
 ## Framework Structure
 
@@ -58,9 +58,13 @@ cp .env.example .env
 # Navigate to native runner directory
 cd native
 
-# Run a specific evaluation by path (relative to data/)
+# Run a specific evaluation by file path (relative to data/)
 python3 run.py --path test-simple/math-001.yaml
 
+# Run a specific evaluation by directory path (NEW: auto-detects task.yaml)