Skip to content

Commit 928f53a

Browse files
Update yutori-computer-use templates for n1-latest (#113)
## Summary - Update yutori templates (Python + TypeScript) to the `n1-latest` model API, which uses OpenAI-compatible `tool_calls` format instead of the previous custom format - Remove Playwright computer mode in favor of a single Computer Controls path (Playwright is still used for `goto_url` in kiosk mode) - Add kiosk mode option that hides browser chrome and uses Playwright for navigation - Rename `max_tokens` to `max_completion_tokens`, update viewport to 1280x800, simplify the sampling loop and tool mapping ## Test plan All 13 action types were tested against live Kernel browser sessions for both Python and TypeScript implementations (27 tests each, all passing): | Action | Status | |---|---| | `left_click`, `double_click`, `triple_click`, `right_click` | Pass | | `hover`, `drag` | Pass | | `type` (basic, `clear_before_typing`, `press_enter_after`) | Pass | | `key_press` (single key, combos, modifier mapping) | Pass | | `scroll` (up, down, left, right) | Pass | | `goto_url`, `go_back`, `refresh`, `wait` | Pass | | Error handling (unknown action, missing required fields, invalid direction) | Pass | Made with [Cursor](https://cursor.com) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Moderate risk because it rewires the agent loop/tool protocol and action schema for both Python and TypeScript templates, which can break runtime behavior if the `tool_calls` or coordinate/action mapping differs from expectations. > > **Overview** > Updates the `yutori-computer-use` Python and TypeScript templates to use Yutori `n1-latest`’s OpenAI-compatible `tool_calls` flow (tool result messages keyed by `tool_call_id`) instead of the prior JSON-in-content + `observation` approach. > > Simplifies execution to a single Computer Controls path by removing the Playwright “mode”, adding an optional `kiosk`/`kioskMode` that launches the browser in kiosk mode and uses Playwright only for `goto_url`, and updating action/schema details (e.g., `coordinates` + new click variants, `max_completion_tokens`, default viewport `1280x800`, WebP screenshot encoding via `Pillow`/`sharp`). QA docs were updated to reflect the new Yutori invocation patterns, and `.gitignore` now excludes `.cursor/plans/`. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit c3757ab. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 477c6df commit 928f53a

File tree

17 files changed

+439
-1267
lines changed

17 files changed

+439
-1267
lines changed

.cursor/commands/qa.md

Lines changed: 9 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -60,8 +60,6 @@ Here are all valid language + template combinations:
6060
| typescript | claude-agent-sdk | ts-claude-agent-sdk | ts-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
6161
| typescript | yutori-computer-use | ts-yutori-cua | ts-yutori-cua | Yes | YUTORI_API_KEY |
6262

63-
> **Note:** The `yutori-computer-use` template supports two modes: `computer_use` (default, full VM screenshots) and `playwright` (viewport-only screenshots via CDP). Both modes should be tested.
64-
6563
| python | sample-app | py-sample-app | python-basic | No | - |
6664
| python | gemini-computer-use | py-gemini-cua | python-gemini-cua | Yes | GOOGLE_API_KEY |
6765
| python | captcha-solver | py-captcha-solver | python-captcha-solver | No | - |
@@ -72,9 +70,7 @@ Here are all valid language + template combinations:
7270
| python | claude-agent-sdk | py-claude-agent-sdk | py-claude-agent-sdk | Yes | ANTHROPIC_API_KEY |
7371
| python | yutori-computer-use | py-yutori-cua | python-yutori-cua | Yes | YUTORI_API_KEY |
7472

75-
> **Yutori Modes:**
76-
> - `computer_use` (default): Uses Kernel's Computer Controls API with full VM screenshots
77-
> - `playwright`: Uses Playwright via CDP WebSocket for viewport-only screenshots (optimized for n1 model)
73+
> **Yutori:** Test both default browser and `"kiosk": true` (uses Playwright for goto_url when kiosk is enabled).
7874
7975
### Create Commands
8076

@@ -275,8 +271,8 @@ kernel invoke ts-magnitude mag-url-extract --payload '{"url": "https://en.wikipe
275271
kernel invoke ts-openai-cua cua-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 5 articles"}'
276272
kernel invoke ts-gemini-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board. You are done successfully when the items are moved.", "record_replay": true}'
277273
kernel invoke ts-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
278-
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
279-
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
274+
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true}'
275+
kernel invoke ts-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "kiosk": true}'
280276

281277
# Python apps
282278
kernel invoke python-basic get-page-title --payload '{"url": "https://www.google.com"}'
@@ -287,8 +283,8 @@ kernel invoke python-openai-cua cua-task --payload '{"task": "Go to https://news
287283
kernel invoke python-openagi-cua openagi-default-task -p '{"instruction": "Navigate to https://agiopen.org and click the What is Computer Use? button"}'
288284
kernel invoke py-claude-agent-sdk agent-task --payload '{"task": "Go to https://news.ycombinator.com and get the top 3 stories"}'
289285
kernel invoke python-gemini-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and move the 5 items in the To Do and In Progress items to the Done section of the Kanban board. You are done successfully when the items are moved.", "record_replay": true}'
290-
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "computer_use"}'
291-
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "mode": "playwright"}'
286+
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true}'
287+
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items.", "record_replay": true, "kiosk": true}'
292288
```
293289

294290
## Step 7: Automated Runtime Testing (Optional)
@@ -313,8 +309,8 @@ If the human agrees, invoke each template use the Kernel CLI and collect results
313309
| ts-openai-cua | ts-openai-cua | | |
314310
| ts-gemini-cua | ts-gemini-cua | | |
315311
| ts-claude-agent-sdk | ts-claude-agent-sdk | | |
316-
| ts-yutori-cua | ts-yutori-cua | | mode: computer_use |
317-
| ts-yutori-cua | ts-yutori-cua | | mode: playwright |
312+
| ts-yutori-cua | ts-yutori-cua | | default |
313+
| ts-yutori-cua | ts-yutori-cua | | kiosk: true |
318314
| py-sample-app | python-basic | | |
319315
| py-captcha-solver | python-captcha-solver | | |
320316
| py-browser-use | python-bu | | |
@@ -323,8 +319,8 @@ If the human agrees, invoke each template use the Kernel CLI and collect results
323319
| py-openagi-cua | python-openagi-cua | | |
324320
| py-claude-agent-sdk | py-claude-agent-sdk | | |
325321
| py-gemini-cua | python-gemini-cua | | |
326-
| py-yutori-cua | python-yutori-cua | | mode: computer_use |
327-
| py-yutori-cua | python-yutori-cua | | mode: playwright |
322+
| py-yutori-cua | python-yutori-cua | | default |
323+
| py-yutori-cua | python-yutori-cua | | kiosk: true |
328324

329325
Status values:
330326
- **SUCCESS**: App started and returned a result

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,9 @@ report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json
3737

3838
# Finder (MacOS) folder config
3939
.DS_Store
40+
41+
# Cursor
42+
.cursor/plans/
4043
kernel
4144

4245
# QA testing directories

pkg/templates/python/yutori-computer-use/README.md

Lines changed: 30 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ kernel deploy main.py --env-file .env
2020
## Usage
2121

2222
```bash
23-
kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'
23+
kernel invoke python-yutori-cua cua-task --payload '{"query": "Go to https://www.magnitasks.com, Click the Tasks option in the left-side bar, and drag the 5 items in the To Do and In Progress columns to the Done section of the Kanban board. You are done successfully when the items are dragged to Done. Do not click into the items."}'
2424
```
2525

2626
## Recording Replays
@@ -35,19 +35,44 @@ kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https
3535

3636
When enabled, the response will include a `replay_url` field with a link to view the recorded session.
3737

38+
## Kiosk mode
39+
40+
Prefer **non-kiosk mode** by default and when the agent is expected to switch domains via URL. Use **kiosk (`"kiosk": true`)** when: (1) you're recording sessions and want a cleaner UI in the replay, or (2) you're automating on a single website and the combination of the complex site layout and browser chrome (address bar, tabs) may confuse the agent.
41+
42+
Note: In kiosk mode the agent may still try to use the address bar to enter URLs; it's not available, so it will eventually use `goto_url`, but those attempts may result in slowdown of the overall session.
43+
44+
Default (non-kiosk):
45+
46+
```bash
47+
kernel invoke python-yutori-cua cua-task --payload '{"query": "Navigate to https://example.com, then navigate to ign.com and describe the page"}'
48+
```
49+
50+
With kiosk (single-site or recording):
51+
52+
```bash
53+
kernel invoke python-yutori-cua cua-task --payload '{"query": "Enter https://example.com in the search box and then describe the page.", "kiosk": true}'
54+
```
55+
3856
## Viewport Configuration
3957

40-
Yutori n1 recommends a **1280×800 (WXGA, 16:10)** viewport for best grounding accuracy. Kernel's closest supported viewport is **1200×800 at 25Hz**, which this template uses by default.
58+
Yutori n1 recommends a **1280×800 (WXGA, 16:10)** viewport for best grounding accuracy.
4159

42-
> **Note:** n1 outputs coordinates in a 1000×1000 relative space, which are automatically scaled to the actual viewport dimensions. The slight width difference (1200 vs 1280) should have minimal impact on accuracy.
60+
> **Note:** n1 outputs coordinates in a 1000×1000 relative space, which are automatically scaled to the actual viewport dimensions.
4361
4462
See [Kernel Viewport Documentation](https://www.kernel.sh/docs/browsers/viewport) for all supported configurations.
4563

46-
## n1 Supported Actions
64+
## Screenshots
65+
66+
Screenshots are automatically converted to WebP format for better compression across multi-step trajectories, as recommended by Yutori.
67+
68+
## n1-latest Supported Actions
4769

4870
| Action | Description |
4971
|--------|-------------|
50-
| `click` | Left mouse click at coordinates |
72+
| `left_click` | Left mouse click at coordinates |
73+
| `double_click` | Double-click at coordinates |
74+
| `triple_click` | Triple-click at coordinates |
75+
| `right_click` | Right mouse click at coordinates |
5176
| `scroll` | Scroll page in a direction |
5277
| `type` | Type text into focused element |
5378
| `key_press` | Send keyboard input |
@@ -57,7 +82,6 @@ See [Kernel Viewport Documentation](https://www.kernel.sh/docs/browsers/viewport
5782
| `refresh` | Reload current page |
5883
| `go_back` | Navigate back in history |
5984
| `goto_url` | Navigate to a URL |
60-
| `stop` | End task with final answer |
6185

6286
## Resources
6387

0 commit comments

Comments
 (0)