From 5266cc3881c17ea6879ab5f7d0ec93f7439ea21a Mon Sep 17 00:00:00 2001
From: Nathan Flurry
Date: Mon, 22 Jun 2026 20:05:36 -0700
Subject: [PATCH 1/2] docs(integration): Zid dylib integration issue tracker

Living tracker for the Zid team's dylib-preview migration: their reported
blockers (Q0-Q5) and our agentOS/secure-exec bugs surfaced while
investigating.

Key finding: reproduced their exact pattern + a faithful replay of their
actor sequence on native linux-x64 with their pinned versions (815fcda).
Q1-Q4 do NOT reproduce and createSession("pi") works end-to-end, so their
blockers are environment-specific (custom bundled adapter and/or Rosetta
x86-emulation), not SDK bugs.

Tracks the genuine our-side gaps (toolKit Zod v3/v4, native-actor dropped
config, defaultSoftware ignored, node_modules-mount helper, no mountFs on
native actor, port DX, patch audit) and links secure-exec PR #114.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/integration/zid-dylib-issues.md | 85 ++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 docs/integration/zid-dylib-issues.md

diff --git a/docs/integration/zid-dylib-issues.md b/docs/integration/zid-dylib-issues.md
new file mode 100644
index 000000000..282da85ed
--- /dev/null
+++ b/docs/integration/zid-dylib-issues.md
@@ -0,0 +1,85 @@
+# Zid × agentOS dylib integration — issue tracker
+
+Living tracker for the Zid team's dylib-preview migration. Covers **their** reported
+blockers and **our** (agentOS / secure-exec) bugs surfaced while investigating them.
+
+**Versions under test (their pins):**
+- `@rivet-dev/agentos*` / `agentos-core` / `agentos-pi` / `agentos-sidecar`: `0.0.0-integrate-dylib-into-main.815fcda`
+- `rivetkit`: `0.0.0-feat-dylib-actor-plugin.c44621f`
+- `@agentos-software/common`: `0.3.0-rc.2`; `@secure-exec/core`: `0.3.0`
+- Their runtime: Node 22, **linux-x64-gnu prebuilts under OrbStack + Rosetta x86-emulation on macOS** (prod = Railway, native linux-x64).
+
+> Note: the dylib stack lives on the `integrate-dylib-into-main` branch (HEAD = `815fcda`),
+> **not** on `main` yet. `815fcda` is what their version is built from.
+
+## How they integrate
+They do **not** use the native `agentOs()` actor. They built their own RivetKit JS actor
+holding an `AgentOs` core instance (`c.vars.agentOs`) and drive it directly (the removed
+`rivetkit/agent-os` actor's replacement). ~12 custom actions, a host toolkit (in-VM Pi
+extension → HTTP to host), and server-side `onSessionEvent`.
+
+## Reproduction status (key finding)
+Reproduced their exact pattern on **native linux-x64** with their pinned versions, including a
+faithful replay of their actor sequence (create → seed writes → their three
+`createInMemoryFileSystem` JS-driver mounts → `createSession("pi")` with `cwd:"/workspace"` and
+`cwd:"/"`). **Q1–Q4 do NOT reproduce; `createSession("pi")` succeeds end-to-end.** Therefore
+their blockers are **environment-specific** — prime suspects: their **custom bundled adapter**
+(swapped over the stock `agentos-pi` adapter; the stock adapter works) and **Rosetta
+x86-emulation**. Decisive next test for them: run with the **stock adapter** and/or on **native
+linux-x64** (e.g. Railway prod), not OrbStack/Rosetta.
+
+Their own repro scripts (`scripts/diag-adapter.mjs`, `scripts/smoke-agentos.mjs`) drive their
+rivetkit server; the VM flow they wrap is what was replayed here.
+
+---
+
+## Their reported blockers
+
+| # | Issue | Finding | Status |
+|---|---|---|---|
+| 1 | **Q0** — native `agentOs()` actor can't host their custom actions / host toolkit / `onSessionEvent`; is wrapping the core class supported? | Yes — core-direct is the documented pattern (all 13 quickstarts). All 4 of their native-actor claims verified TRUE (`actions:{}`, no JS callbacks, callbacks parsed-and-dropped, `toolKits` not serialized). | ✅ Answered |
+| 2 | **Q1** — adapter `import @agentclientprotocol/sdk` → `_resolveModule returned non-string` | Not a resolver bug — means "not found in node_modules"; it's their **custom bundled adapter's** node_modules layout. Bundling is a valid fix; else mount the adapter's `node_modules`. | ◐ Diagnosed; error-message fix in **secure-exec PR #114** (diagnostics only) |
+| 3 | **Q2** — `chdir` ENOENT for every path incl `/` | Base rootfs **is** provisioned (proven on their version + full flow). Root cause is their custom adapter and/or Rosetta emulation, **not** the SDK. | ✅ Diagnosed (not our bug) |
+| 4 | **Q3** — `command not found: sh` | `sh`/`bash` **do** ship (in `@agentos-software/coreutils`, inside `common`); works in repro. Their "common dropped sh" belief is wrong (only the package *description* omits "sh"). | ✅ Diagnosed (not our bug) |
+| 5 | **Q4** — host `writeFile`/`mkdir` not visible to guest | Visible core-direct, **no mount required** (proven). Likely write-after-`createSession` ordering / a shadowing mount / env. | ✅ Diagnosed (not our bug) |
+| 6 | **Q5** — inherent to the VM model or core-class-specific? | **Neither** — full core-direct path (incl. `createSession("pi")`) reproduced working. | ✅ Answered |
+
+---
+
+## Our bugs / gaps (agentOS + secure-exec)
+
+| # | Issue | Impact | Status | Proposed fix |
+|---|---|---|---|---|
+| 7 | `toolKit→sidecar` runs Zod **v4** `toJSONSchema()`; throws on Zod **v3** schemas | **Why they dropped `toolKits`** and went host-tools-over-HTTP | 🟡 Open | accept v3 (or convert), or document the constraint |
+| 8 | Native actor **silently drops** `onSessionEvent`/`onPermissionRequest`/`onBeforeConnect`/`toolKits` | Silent footgun — users think these are wired | 🟡 Open | throw a clear "unsupported across the native boundary" error (or wire through) |
+| 9 | core `AgentOs.create()` ignores the `defaultSoftware` option it documents | Latent; auto-include of `common` is actor-only (`actor.ts:192-200`) | 🟡 Open | honor `defaultSoftware` in core, or fix the JSDoc |
+| 10 | `withAutoAgentNodeModulesMount` is **actor-only** — no public helper for core-direct users | Core-direct users with a custom adapter get no node_modules-mount helper (relates to #2) | 🟡 Open | export a public `nodeModulesMount`-style helper / do it in core |
+| 11 | Native actor has **no `mountFs`** action and rejects JS-driver mounts (static, serializable Native mounts only) | **Blocks the proxy-actor pattern** from hosting their session/skills JS-driver VFS | 🟡 Open | dynamic `mountFs` (incl. JS-driver) on the native actor |
+| 12 | Engine `:6420` vs httpPort `:6421` (`/metadata` 404) | DX confusion they hit | ⚪ Open | document, or client auto-detect |
+| 13 | Audit their 11 carried patches for dylib obsolescence — esp. the WASI **read-blocked-as-write** permission typo (their patch 1) | Some patches may be obsolete; the WASI one is a real correctness bug if it survived the move into secure-exec | ⚪ Not audited | audit + confirm against `0.3.0` |
+| 14 | Misleading `_resolveModule returned non-string` error (really "not found") | Sent them down the bundling path (#2) | ✅ **secure-exec PR #114 (open)** | reworded + main-only regression test |
+
+**Status key:** ✅ done/answered · ◐ partial · 🟡 our bug, identified, not fixed · ⚪ not started
+
+---
+
+## Reproduction recipe
+On native linux-x64 (NOT Rosetta), from public npm:
+```
+npm i @rivet-dev/agentos-core@0.0.0-integrate-dylib-into-main.815fcda \
+  @rivet-dev/agentos-pi@0.0.0-integrate-dylib-into-main.815fcda \
+  @agentos-software/common@0.3.0-rc.2
+```
+```js
+import { AgentOs, createInMemoryFileSystem } from "@rivet-dev/agentos-core";
+import common from "@agentos-software/common";
+import pi from "@rivet-dev/agentos-pi";
+const vm = await AgentOs.create({ software: [common, pi] });
+console.log((await vm.exec("ls -la / && sh -c 'echo SH_OK' && pwd")).stdout); // base rootfs + sh
+await vm.writeFile("/home/user/x.txt", "hi");
+console.log((await vm.exec("cat /home/user/x.txt")).stdout); // host write visible, no mount
+for (const p of ["/home/user/.pi/agent/sessions","/app/skills"]) vm.mountFs(p, createInMemoryFileSystem());
+console.log((await vm.createSession("pi", { cwd: "/workspace", env: { HOME:"/home/user" } })).sessionId); // works
+await vm.dispose();
+```
+All of the above succeed → Q1–Q4 are not SDK bugs.
From 3345adf6336cb93f9e759e4a380111b0ca340c56 Mon Sep 17 00:00:00 2001
From: Nathan Flurry
Date: Mon, 22 Jun 2026 20:19:27 -0700
Subject: [PATCH 2/2] docs(integration): reproduced with their actual adapter; root-cause Q1=node_modules hoist, Q2-4=Rosetta

Bundled their actual adapter.js with their build-adapter.mjs
(stub+eval+minify), swapped over agentos-pi, and ran createSession('pi')
on their pinned version: it reaches session/new and returns a sessionId.
Unbundled variant resolves too. So Q1-Q4 don't reproduce on native even
with their code.

Q1 root cause is the node_modules hoist layout (dep not on the
/root/node_modules ancestor chain core mounts); Q2-Q4 point to
OrbStack/Rosetta x86-emulation (no native repro).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/integration/zid-dylib-issues.md | 44 ++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/docs/integration/zid-dylib-issues.md b/docs/integration/zid-dylib-issues.md
index 282da85ed..47bf49772 100644
--- a/docs/integration/zid-dylib-issues.md
+++ b/docs/integration/zid-dylib-issues.md
@@ -19,17 +19,35 @@ holding an `AgentOs` core instance (`c.vars.agentOs`) and drive it directly (the
 extension → HTTP to host), and server-side `onSessionEvent`.
 
 ## Reproduction status (key finding)
-Reproduced their exact pattern on **native linux-x64** with their pinned versions, including a
-faithful replay of their actor sequence (create → seed writes → their three
-`createInMemoryFileSystem` JS-driver mounts → `createSession("pi")` with `cwd:"/workspace"` and
-`cwd:"/"`). **Q1–Q4 do NOT reproduce; `createSession("pi")` succeeds end-to-end.** Therefore
-their blockers are **environment-specific** — prime suspects: their **custom bundled adapter**
-(swapped over the stock `agentos-pi` adapter; the stock adapter works) and **Rosetta
-x86-emulation**. Decisive next test for them: run with the **stock adapter** and/or on **native
-linux-x64** (e.g. Railway prod), not OrbStack/Rosetta.
+Reproduced on **native linux-x64** with their pinned versions, escalating to **their actual code**:
+1. Faithful replay of their actor sequence (create → seed writes → their three
+   `createInMemoryFileSystem` JS-driver mounts → `createSession("pi")` with `cwd:"/workspace"`
+   and `cwd:"/"`) — **all pass.**
+2. **Their actual custom adapter, bundled with their actual `build-adapter.mjs`** (Proxy-stubbing
+   8 packages + `eval(require())` + minify), swapped over the stock `agentos-pi` adapter →
+   `createSession("pi")` reaches `session/new` and returns a sessionId in ~1.4s. **No chdir ENOENT.**
+3. **Their adapter UNBUNDLED** (runtime `import "@agentclientprotocol/sdk"`) → also resolves and
+   `createSession` succeeds. (The *stock* dylib adapter likewise imports `@agentclientprotocol/sdk`
+   at runtime and resolves.)
+
+**So Q1–Q4 do NOT reproduce on native, even with their exact adapter + flow + mounts.** Their
+blockers are **environment-specific**. Two root causes:
+- **Q1 (resolution)** = **node_modules hoist layout.** Core mounts the agent package's hoisted
+  `node_modules` tree at `/root/node_modules`; the adapter resolves `@agentclientprotocol/sdk` by
+  walking up from `/root/node_modules/@rivet-dev/agentos-pi`. In a **flat npm install** the dep is
+  hoisted top-level → resolves. In their monorepo install it isn't on that chain → "not found."
+  Bundling (their workaround) is correct; alternatively mount/hoist the dep onto the chain.
+- **Q2/Q3/Q4 (empty guest FS / no sh / writes invisible)** = **no native reproduction and no code
+  explanation** → the remaining uncontrolled variable is **OrbStack + Rosetta x86-emulation** of
+  the linux-x64 prebuilt (guest VFS/mount/exec syscalls misbehaving under emulation). Could be a
+  single shared cause (the sidecar's mount/VFS layer failing under emulation, which would explain
+  all three at once). **Not directly reproduced** (this host is native x64, can't run Rosetta).
+
+**Decisive test for them:** run on **native linux-x64** (their Railway prod is native
+linux-x64-glibc). If it works there but fails in OrbStack/Rosetta → emulation confirmed.
 
 Their own repro scripts (`scripts/diag-adapter.mjs`, `scripts/smoke-agentos.mjs`) drive their
-rivetkit server; the VM flow they wrap is what was replayed here.
+rivetkit server + swapped adapter; the VM flow + adapter they wrap is what was reproduced here.
 
 ---
 
@@ -38,10 +56,10 @@ rivetkit server; the VM flow they wrap is what was replayed here.
 | # | Issue | Finding | Status |
 |---|---|---|---|
 | 1 | **Q0** — native `agentOs()` actor can't host their custom actions / host toolkit / `onSessionEvent`; is wrapping the core class supported? | Yes — core-direct is the documented pattern (all 13 quickstarts). All 4 of their native-actor claims verified TRUE (`actions:{}`, no JS callbacks, callbacks parsed-and-dropped, `toolKits` not serialized). | ✅ Answered |
-| 2 | **Q1** — adapter `import @agentclientprotocol/sdk` → `_resolveModule returned non-string` | Not a resolver bug — means "not found in node_modules"; it's their **custom bundled adapter's** node_modules layout. Bundling is a valid fix; else mount the adapter's `node_modules`. | ◐ Diagnosed; error-message fix in **secure-exec PR #114** (diagnostics only) |
-| 3 | **Q2** — `chdir` ENOENT for every path incl `/` | Base rootfs **is** provisioned (proven on their version + full flow). Root cause is their custom adapter and/or Rosetta emulation, **not** the SDK. | ✅ Diagnosed (not our bug) |
-| 4 | **Q3** — `command not found: sh` | `sh`/`bash` **do** ship (in `@agentos-software/coreutils`, inside `common`); works in repro. Their "common dropped sh" belief is wrong (only the package *description* omits "sh"). | ✅ Diagnosed (not our bug) |
-| 5 | **Q4** — host `writeFile`/`mkdir` not visible to guest | Visible core-direct, **no mount required** (proven). Likely write-after-`createSession` ordering / a shadowing mount / env. | ✅ Diagnosed (not our bug) |
+| 2 | **Q1** — adapter `import @agentclientprotocol/sdk` → `_resolveModule returned non-string` | **Root cause: node_modules hoist layout.** Reproduced both stock & their adapter resolving the dep when hoisted flat; "not found" means it isn't on the `/root/node_modules` chain core mounts. Bundling (their workaround) is correct; else hoist/mount the dep. | ◐ Root-caused; error-message fix in **secure-exec PR #114** (diagnostics only) |
+| 3 | **Q2** — `chdir` ENOENT for every path incl `/` | Base rootfs **is** provisioned — proven on their version with **their actual bundled adapter** + full flow; `createSession` reaches `session/new`. Not reproducible on native → **Rosetta x86-emulation** is the leading cause. | ✅ Diagnosed (not an SDK bug) |
+| 4 | **Q3** — `command not found: sh` | `sh`/`bash` **do** ship (in `@agentos-software/coreutils`, inside `common`); works in repro. Their "common dropped sh" belief is wrong (only the package *description* omits "sh"). Likely same env (mount layer under emulation). | ✅ Diagnosed (not an SDK bug) |
+| 5 | **Q4** — host `writeFile`/`mkdir` not visible to guest | Visible core-direct, **no mount required** (proven w/ their flow). Likely write-after-`createSession` ordering, a shadowing mount, or the same emulation env. | ✅ Diagnosed (not an SDK bug) |
 | 6 | **Q5** — inherent to the VM model or core-class-specific? | **Neither** — full core-direct path (incl. `createSession("pi")`) reproduced working. | ✅ Answered |
 
 ---
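
Reviewer note on the Q1 root cause in PATCH 2/2: the "walking up from `/root/node_modules/@rivet-dev/agentos-pi`" behaviour can be illustrated with a minimal sketch of a Node-style ancestor-chain lookup. This is an assumption-laden illustration, not the secure-exec resolver: `candidateDirs` and `resolveFrom` are hypothetical helpers, the installed packages are modelled as a plain `Set` of paths, and the monorepo path is invented purely for the example.

```js
// Hypothetical sketch: none of these helpers are agentOS or secure-exec APIs.
// List the node_modules locations a Node-style resolver probes, nearest first.
function candidateDirs(fromDir, pkg) {
  const parts = fromDir.split("/").filter(Boolean);
  const out = [];
  for (let i = parts.length; i >= 0; i--) {
    // Node-style lookup never probes "<x>/node_modules/node_modules".
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const base = "/" + parts.slice(0, i).join("/");
    out.push((base === "/" ? "" : base) + "/node_modules/" + pkg);
  }
  return out;
}

// Return the first candidate that exists in the modelled install, else null.
function resolveFrom(fromDir, pkg, installed) {
  return candidateDirs(fromDir, pkg).find((dir) => installed.has(dir)) ?? null;
}

const adapterDir = "/root/node_modules/@rivet-dev/agentos-pi";
const dep = "@agentclientprotocol/sdk";

// Flat npm install: the dependency is hoisted to the top-level node_modules.
const flatInstall = new Set(["/root/node_modules/@agentclientprotocol/sdk"]);

// Assumed monorepo layout (illustrative path only): the dependency sits in a
// node_modules directory that is not an ancestor of the adapter package.
const monorepoInstall = new Set([
  "/workspace/packages/server/node_modules/@agentclientprotocol/sdk",
]);

console.log(resolveFrom(adapterDir, dep, flatInstall)); // /root/node_modules/@agentclientprotocol/sdk
console.log(resolveFrom(adapterDir, dep, monorepoInstall)); // null
```

Under these assumptions the second lookup returns `null` because no directory on the chain from the adapter package up to `/` contains the dependency, which matches the "not found" reading of `_resolveModule returned non-string` given in the tracker. Bundling the dependency or placing it on that chain would both make the lookup succeed.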