Summary
gstack-memory-ingest.ts (the per-skill transcript ingest writer added in #1298) cannot ingest at scale on Windows. Two layered bugs break the per-page gbrain put writer, and neither has a workaround that holds for transcript-sized payloads.
Filing as a follow-up to #1271 (Quirk 3) — that issue documented the smoke-test breakage at v0.18.2; this one covers the production-scale failure at v1.26+ and proposes a durable fix.
Repro: 876-transcript backfill on Windows 11 / Git Bash / Bun. Pre-patch: every page errored with ENOENT: no such file or directory '/dev/stdin'. Post-patch (refactor below): 466s → 5.6s for a 3-page benchmark, full 876 pages in 8 min.
Environment
bin/gstack-memory-ingest.ts, lines ~750-885
Bug 1 — command -v gbrain probe is Bun-on-Windows incompatible
gbrainAvailable() calls execSync("command -v gbrain", { stdio: "ignore" }) to gate the writer. On Windows + Bun, this throws because Bun's spawn shim doesn't resolve POSIX builtins (command -v) the way Bash does — it tries to launch a binary literally named command. Effect: writer disables itself silently, every page is reported as "skipped: gbrain CLI not in PATH" even though gbrain --version works fine in the same shell.
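A portable probe avoids shell builtins entirely by spawning the CLI itself and checking the exit status. A minimal sketch (the gbrainAvailable name matches the writer; cliAvailable and the rest are illustrative, not the shipped code):

```typescript
import { spawnSync } from "node:child_process";

// Probe for a CLI by spawning it directly instead of relying on the
// POSIX builtin `command -v`, which Bun-on-Windows cannot resolve.
function cliAvailable(bin: string): boolean {
  const r = spawnSync(bin, ["--version"], { stdio: "ignore" });
  // r.error is set when the spawn itself failed (e.g. binary not found);
  // a zero exit status means the CLI ran.
  return r.error == null && r.status === 0;
}

// Hypothetical writer-side gate built on the probe above.
const gbrainAvailable = () => cliAvailable("gbrain");
```

This also sidesteps shell differences entirely: no Bash, no cmd.exe, just a direct spawn of the binary the writer is about to use anyway.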
Bug 2 — gbrain put <slug> reads /dev/stdin literally on Windows
After patching past Bug 1, every page fails:
```
[gbrain-error] ENOENT: no such file or directory, open '/dev/stdin'
```
Root cause: gbrain put reads content from stdin. On Windows, Bun's execFileSync with input: body should pipe to the child's stdin, but gbrain's reader appears to open /dev/stdin as a literal path rather than reading from fd 0. /dev/stdin doesn't exist on Windows.
The --content flag (Quirk 3 workaround in #1271) sidesteps stdin but trips the 32K CreateProcess arg-length limit on transcripts averaging ~970KB. So --content is only a workaround for trivially-small smoke-test payloads, not real content.
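The CreateProcess ceiling is 32,767 characters for the entire command line, so any guard on --content has to budget for the executable path, the slug, and the other flags too. A hedged sketch of such a size check (the function name and headroom value are illustrative):

```typescript
// Windows CreateProcess caps the full command line at 32,767 characters.
const CREATE_PROCESS_ARG_LIMIT = 32767;
// Leave headroom for the exe path, subcommand, slug, and other flags.
const HEADROOM = 1024;

// Returns true only when the body is small enough to pass via --content
// without tripping the CreateProcess limit.
function canUseContentFlag(body: string): boolean {
  return body.length + HEADROOM <= CREATE_PROCESS_ARG_LIMIT;
}
```

A ~970KB transcript is roughly 30x over the limit, which is why --content survives only smoke-test payloads.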
Why this matters
gstack-memory-ingest.ts is the V1 transcript ingest path (#1298) — the canonical way new sessions land in the brain. On Windows it's been a no-op since v1.26 was released. Brain score on a fresh /setup-gbrain install caps at ~10/100 because transcripts never make it in. No error surfaces unless the user runs --verbose and notices the per-page skip messages.
Proposed fix — stage + batch import
gbrain import <dir> --workers N --json is the canonical bulk path: parallel workers, mtime/hash dedup, checkpoint/resume on interrupt, no stdin, no per-page CreateProcess. Refactor the writer to:
- Build the page body (with frontmatter injection — exactly as today).
- Write it to a staging dir (~/.gstack/.import-staging-<pid>/).
- At the end of ingestPass, fire one gbrain import <dir> --workers 4 --json call.
- Parse the JSON summary for aggregate counts; clean up staging.
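The steps above can be sketched roughly as follows. This stages under os.tmpdir() rather than ~/.gstack to keep the sketch self-contained, injects the import runner so it can be exercised without gbrain installed, and assumes a { imported, skipped } summary shape — gbrain's actual --json schema isn't documented in this issue, so treat all names here as illustrative:

```typescript
import { mkdtempSync, writeFileSync, readdirSync, rmSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

interface Page { slug: string; body: string }
// Assumed shape of the --json summary; gbrain's real schema may differ.
interface ImportSummary { imported: number; skipped: number }

type ImportRunner = (dir: string) => ImportSummary;

function stageAndImport(pages: Page[], runImport: ImportRunner): ImportSummary {
  // Stage every page body to disk, one file per slug.
  const dir = mkdtempSync(join(tmpdir(), "import-staging-"));
  try {
    for (const p of pages) {
      writeFileSync(join(dir, `${p.slug}.md`), p.body);
    }
    // One child process for the whole pass instead of one per page.
    // A real runner would execFileSync("gbrain",
    //   ["import", dir, "--workers", "4", "--json"]) and parse stdout.
    return runImport(dir);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Stand-in runner for local testing: counts staged files instead of
// actually invoking gbrain.
const countingRunner: ImportRunner = (dir) => ({
  imported: readdirSync(dir).length,
  skipped: 0,
});
```

Injecting the runner keeps the staging logic unit-testable and makes the single-spawn boundary explicit: no stdin, no per-page CreateProcess, one JSON summary to parse.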
Trade-off accepted: per-file failure attribution from import is aggregate-only (gbrain emits per-file errors to stderr but the summary line only reports counts). For our use case — bulk transcript ingest where retry-next-pass is fine — that's the right trade.
Diff is ~130 insertions / 39 deletions in gstack-memory-ingest.ts. Happy to open a PR.
Benchmark (Windows 11 + Bun + gbrain v0.18.2 + Supabase Postgres ap-southeast-1)
| Scenario | Before (per-page put) | After (batch import) |
| --- | --- | --- |
| 3-page test | 466s (all failing) | 5.6s |
| 876-page bulk backfill | ∞ (every page ENOENT) | 8 min |
| Brain score (post) | 10/100 | 45/100 |
| Embedding coverage | 0% | 99% |
Why not just fix gbrain put upstream
The writer-side fix is simpler, doesn't require a gbrain release, and gbrain import is already the recommended bulk path in gbrain's own docs. Per-page gbrain put would still need a Windows fix (open fd 0 instead of /dev/stdin), but that's a gbrain repo concern — not blocking gstack's writer.
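For reference, the portable pattern in Node/Bun is to read file descriptor 0 directly rather than opening the /dev/stdin path, since fd 0 exists on every platform. A minimal sketch (function names are illustrative; the self-check pipes a body through a child that reads fd 0, mirroring what gbrain put would need to do):

```typescript
import { readFileSync } from "node:fs";
import { spawnSync } from "node:child_process";

// readFileSync accepts a numeric fd: fd 0 is stdin on every platform,
// whereas the path "/dev/stdin" exists only on POSIX systems.
function readAllStdin(): string {
  return readFileSync(0, "utf8");
}

// Self-check: pipe a body into a child process whose script reads fd 0,
// the same way a fixed gbrain reader would.
function echoThroughChild(body: string): string {
  const r = spawnSync(
    process.execPath,
    ["-e", 'process.stdout.write(require("fs").readFileSync(0, "utf8"))'],
    { input: body, encoding: "utf8" },
  );
  return r.stdout;
}
```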
Reproducer
```
# Windows 11 + Git Bash + Bun
gstack-memory-ingest --verbose 2>&1 | head -20
# Expected (post-fix): "imported N, skipped M"
# Actual (current main): "skipped: gbrain CLI not in PATH" repeated N times
```