
Windows: gstack-memory-ingest.ts can't ingest at scale (follow-up to #1271) #1386

@mofoeden-star


Summary

gstack-memory-ingest.ts (the per-skill transcript ingest writer added in #1298) cannot ingest at scale on Windows. There are two layered bugs in the per-page gbrain put writer, and neither has a workaround that holds up for transcript-sized payloads.

Filing as a follow-up to #1271 (Quirk 3) — that issue documented the smoke-test breakage at v0.18.2; this one covers the production-scale failure at v1.26+ and proposes a durable fix.

Repro: 876-transcript backfill on Windows 11 / Git Bash / Bun. Before the refactor (with Bug 1 patched), every page errored with ENOENT: no such file or directory, open '/dev/stdin'. After the refactor below: 466s → 5.6s for a 3-page benchmark, and the full 876 pages import in 8 min.

Environment

Bug 1 — command -v gbrain probe is Bun-on-Windows incompatible

gbrainAvailable() calls execSync("command -v gbrain", { stdio: "ignore" }) to gate the writer. On Windows + Bun, this throws because Bun's spawn shim doesn't resolve POSIX builtins (command -v) the way Bash does — it tries to launch a binary literally named command. Effect: writer disables itself silently, every page is reported as "skipped: gbrain CLI not in PATH" even though gbrain --version works fine in the same shell.
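
A shell-free probe is one way to avoid the builtin lookup described above. The following is a minimal sketch, not the patch itself: it assumes gbrain resolves to a directly spawnable executable on PATH (a .cmd shim may need different handling) and that gbrain --version exits with status 0, as the issue reports. The helper name cliAvailable is hypothetical.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: returns true if `bin` can be spawned and exits with status 0.
// No shell is involved, so nothing depends on POSIX builtins such as `command -v`.
export function cliAvailable(bin: string, args: string[] = ["--version"]): boolean {
  try {
    execFileSync(bin, args, { stdio: "ignore", timeout: 10_000 });
    return true;
  } catch {
    // ENOENT (not on PATH), a non-zero exit, or a timeout all count as "unavailable".
    return false;
  }
}

// Assumed drop-in for the gate described above.
export const gbrainAvailable = (): boolean => cliAvailable("gbrain");
```

Probing with the real binary also confirms that it actually runs, which a PATH lookup alone would not.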

Bug 2 — gbrain put <slug> reads /dev/stdin literally on Windows

After patching past Bug 1, every page fails:

[gbrain-error] ENOENT: no such file or directory, open '/dev/stdin'

Root cause: gbrain put reads content from stdin. On Windows, Bun's execFileSync with input: body should pipe to the child's stdin, but gbrain's reader appears to open /dev/stdin as a literal path rather than reading from fd 0. /dev/stdin doesn't exist on Windows.
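
For illustration, here is a small round trip showing that a child which reads file descriptor 0 receives the parent's input: payload without touching the /dev/stdin path. This is only a sketch of the mechanism. It does not reproduce gbrain's reader, whose internals are not shown in this issue, and the child program is a stand-in run with the current runtime (process.execPath).

```typescript
import { execFileSync } from "node:child_process";

// Stand-in child program: reads all of fd 0 and echoes it back.
// Reading by descriptor number avoids opening the path '/dev/stdin',
// which does not exist on Windows.
const CHILD_SOURCE =
  "process.stdout.write(require('node:fs').readFileSync(0, 'utf8'))";

// Pipes `body` to the child's stdin and returns whatever the child echoes.
export function roundTripThroughStdin(body: string): string {
  return execFileSync(process.execPath, ["-e", CHILD_SOURCE], {
    input: body,
    encoding: "utf8",
  });
}
```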

The --content flag (Quirk 3 workaround in #1271) sidesteps stdin but trips the 32K CreateProcess arg-length limit on transcripts averaging ~970KB. So --content is only a workaround for trivially-small smoke-test payloads, not real content.
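
The size mismatch can be checked with a rough length test. This sketch assumes the documented CreateProcess command-line maximum of 32,767 characters; the function name and the reservedChars allowance (executable path, other arguments, quoting) are hypothetical, and argument escaping can make the real command line longer than this estimate.

```typescript
// Documented maximum CreateProcess command-line length, in characters,
// including the terminating null.
const WINDOWS_MAX_COMMAND_LINE = 32_767;

// Rough check: would passing `body` inline (for example via --content) fit?
// `reservedChars` is an assumed allowance for everything else on the command line.
export function fitsOnWindowsCommandLine(body: string, reservedChars = 512): boolean {
  // String.length counts UTF-16 code units, the same unit Windows uses for the limit.
  return body.length + reservedChars < WINDOWS_MAX_COMMAND_LINE;
}
```

A transcript of roughly 970KB is about thirty times over the limit, so no amount of trimming flags or shortening paths makes --content viable for real content.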

Why this matters

gstack-memory-ingest.ts is the V1 transcript ingest path (#1298) — the canonical way new sessions land in the brain. On Windows it's been a no-op since v1.26 was released. Brain score on a fresh /setup-gbrain install caps at ~10/100 because transcripts never make it in. No error surfaces unless the user runs --verbose and notices the per-page skip messages.

Proposed fix — stage + batch import

gbrain import <dir> --workers N --json is the canonical bulk path: parallel workers, mtime/hash dedup, checkpoint/resume on interrupt, no stdin, no per-page CreateProcess. Refactor the writer to:

  1. Build the page body (with frontmatter injection — exactly as today).
  2. Write it to a staging dir (~/.gstack/.import-staging-<pid>/).
  3. At end of ingestPass, fire one gbrain import <dir> --workers 4 --json call.
  4. Parse the JSON summary for aggregate counts; cleanup staging.
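
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual diff: StagedPage, stageAndImport, the .md file extension, and the injectable runner are hypothetical, and slugs are assumed to be flat, filesystem-safe names. Only the gbrain import <dir> --workers 4 --json invocation comes from this issue. The JSON summary is returned as raw stdout because its schema is not shown here.

```typescript
import { execFileSync } from "node:child_process";
import { mkdirSync, rmSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical page shape; the real writer's types may differ.
interface StagedPage {
  slug: string; // assumed flat and filesystem-safe
  body: string; // page body with frontmatter already injected
}

// Runs the bulk import and returns raw stdout. Injectable so the sketch
// can be exercised without gbrain installed.
type ImportRunner = (dir: string) => string;

const runGbrainImport: ImportRunner = (dir) =>
  execFileSync("gbrain", ["import", dir, "--workers", "4", "--json"], {
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024,
  });

export function stageAndImport(
  pages: StagedPage[],
  run: ImportRunner = runGbrainImport,
  root: string = join(homedir(), ".gstack"),
): string {
  const stagingDir = join(root, `.import-staging-${process.pid}`);
  mkdirSync(stagingDir, { recursive: true });
  try {
    // Steps 1-2: one file per page; no stdin and no per-page process spawn.
    for (const page of pages) {
      writeFileSync(join(stagingDir, `${page.slug}.md`), page.body, "utf8");
    }
    // Step 3: a single bulk import call for the whole pass.
    return run(stagingDir);
  } finally {
    // Step 4 (cleanup): remove staging even if the import throws.
    rmSync(stagingDir, { recursive: true, force: true });
  }
}
```

Keeping the process ID in the staging directory name means two concurrent ingest passes do not write into each other's directory.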

Trade-off accepted: per-file failure attribution from import is aggregate-only (gbrain emits per-file errors to stderr but the summary line only reports counts). For our use case — bulk transcript ingest where retry-next-pass is fine — that's the right trade.

Diff is ~130 insertions / 39 deletions in gstack-memory-ingest.ts. Happy to open a PR.

Benchmark (Windows 11 + Bun + gbrain v0.18.2 + Supabase Postgres ap-southeast-1)

| Scenario | Before (per-page put) | After (batch import) |
|---|---|---|
| 3-page test | 466s (all failing) | 5.6s |
| 876-page bulk backfill | ∞ (every page ENOENT) | 8 min |
| Brain score (post) | 10/100 | 45/100 |
| Embedding coverage | 0% | 99% |

Why not just fix gbrain put upstream

The writer-side fix is simpler, doesn't require a gbrain release, and gbrain import is already the recommended bulk path in gbrain's own docs. Per-page gbrain put would still need a Windows fix (open fd 0 instead of /dev/stdin), but that's a gbrain repo concern — not blocking gstack's writer.

Reproducer

# Windows 11 + Git Bash + Bun
gstack-memory-ingest --verbose 2>&1 | head -20
# Expected (post-fix): "imported N, skipped M"
# Actual (current main): "skipped: gbrain CLI not in PATH" repeated N times
