Dev#16

Merged
im4codes merged 90 commits into master from dev
May 8, 2026

@im4codes im4codes commented May 8, 2026

No description provided.

IM.codes added 30 commits May 1, 2026 23:43
imcodes-win and others added 29 commits May 7, 2026 19:21
Counterpart of scripts/restart-daemon.sh for Windows. The .sh relies on
`set -euo pipefail`, `setsid`/`nohup`, and `disown` — none of which work
in cmd.exe — so Windows users had no equivalent way to rebuild + relink
+ bounce the local daemon from a transport session.

The .cmd does the same three foreground steps (npm install, npm run
build, npm link --force), then launches a detached `imcodes restart`
via wscript -> VBS -> CMD. The detach matters because the calling
shell is usually inside an imcodes-managed transport session: a
synchronous restart would kill the daemon, then the session, and then
this script itself, all before the new daemon comes up.

Two cmd-on-Windows traps we hit while writing this:
  1. Pure ASCII + REM comments only. Unicode em-dashes and box-drawing
     chars in comments at the top of the file get reinterpreted in OEM
     codepage BEFORE `chcp 65001` runs, breaking cmd's parser. The
     `::` comment hack is also fragile near setlocal/EnableDelayedExpansion
     interactions, so we use REM throughout.
  2. Use `imcodes restart` (the standalone command), NOT `imcodes service
     restart --no-build` like the .sh does. The latter explicitly
     rejects win32 with "Unsupported platform: win32" — it only knows
     launchctl/systemd. The standalone `imcodes restart` routes to
     ensureDaemonRunning() in src/util/windows-daemon.ts which does
     the proper Windows pidfile/watchdog dance.

Tested end-to-end on Windows: rebuilds, relinks, daemon pid changes,
new code loads. Logs go to %TEMP%\imcodes-restart-daemon.log so users
can `Get-Content -Wait` from another shell during the restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e markers

Two bugs the 2026-05-07 prod upgrade hit, both pinned by tests in
Windows CI.

## Bug 1 — cmd.exe eats parens even with ^-escapes inside if-blocks

Commit 68947cc escaped literal parens in if-block echoes with ^( / ^).
Same day, prod log /tmp/imcodes-upgrade-cdFehr/upgrade.log showed the
echo

  echo Old daemon PID: !OLD_DAEMON_PID! ^(will be killed only after install succeeds^) >> log

producing the line

  Old daemon PID: 777468 (will be killed only after install succeeds

— `(` printed but `)` was eaten. Verified via `od -c` on the raw log,
not a display artifact. Inside `if exist (...)` blocks cmd.exe still
mis-counts ^-escaped parens at some pass, so the closing `^)` got
consumed and the rest of the echo's tail (`>> log`) became orphaned.

Permanent fix: forbid literal parens of ANY form (escaped or not)
inside if-block echoes. Use `[...]` or `--...--`. `[` and `]` are
not magic to cmd's block parser at any nesting depth.

Same fix applied to sharp-repair-script.ts which had the identical
pattern at two echo lines inside its `if "!SHARP_BROKEN!"=="1"` block.
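
The bracket rule can be checked mechanically. The sketch below is a simplified model, not the project's test code: `findParenEchoes` is a hypothetical name, and it assumes blocks open on lines ending in `(` and close on lines starting with `)`, which is narrower than what cmd.exe really accepts.

```typescript
// Hypothetical checker: a simplified model of cmd.exe block depth.
// Assumption: a block opens on a line ending in "(" and closes on a
// line starting with ")". Echo lines never change the depth here.
function findParenEchoes(batch: string): number[] {
  const offenders: number[] = [];
  let depth = 0;
  let lineNumber = 0;
  for (const raw of batch.split(/\r?\n/)) {
    lineNumber += 1;
    const line = raw.trim();
    if (/^echo\b/i.test(line)) {
      // Inside a block, any paren in an echo is rejected, escaped or not.
      if (depth > 0 && /[()]/.test(line)) offenders.push(lineNumber);
      continue;
    }
    if (line.charAt(0) === ")") depth = Math.max(0, depth - 1);
    if (line.charAt(line.length - 1) === "(") depth += 1;
  }
  return offenders;
}

const sampleBatch = [
  "echo top-level (fine)",
  'if exist "x" (',
  "  echo Old daemon PID: 1 ^(will be killed^) >> log",
  "  echo Old daemon PID: 1 [will be killed] >> log",
  ")",
].join("\r\n");

console.log(findParenEchoes(sampleBatch));
```

On this sample only line 3 is reported: the top-level echo is outside any block, and the bracketed variant contains no parens.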

## Bug 2 — silent death after npm install with no log evidence

Same prod log: npm install printed its successful "60 packages are
looking for funding" tail and then the script went absolutely silent.
No subsequent log line. The upgrade.lock stayed stranded at 19:11:57
and the watchdog couldn't respawn a new daemon. We could not tell
which step (sharp-repair, npm prefix -g, version check, taskkill,
repair-watchdog, VBS launch) hung or crashed.

Mitigation: emit `[trace] step=N <stage>` markers BEFORE and AFTER
every major step in upgrade.cmd. Next time it dies silently, the
last `[trace]` line in the log pinpoints exactly where. Also captures
exit codes for npm install and repair-watchdog so we can tell whether
those returned cleanly vs hung-then-killed.
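
A minimal sketch of how such markers can be written and read back, assuming the `[trace] step=N <stage>` format quoted above. The helper names and the stage names in the sample log are hypothetical.

```typescript
interface TraceMarker {
  step: number;
  stage: string;
}

// Formats one marker line in the "[trace] step=N <stage>" shape.
function traceLine(step: number, stage: string): string {
  return "[trace] step=" + step + " " + stage;
}

// Returns the last marker found in a log, or null when there is none.
// After a silent death, this is the step the script reached.
function lastTraceMarker(log: string): TraceMarker | null {
  let last: TraceMarker | null = null;
  for (const line of log.split(/\r?\n/)) {
    const match = /\[trace\] step=(\d+) (\S+)/.exec(line);
    if (match) last = { step: Number(match[1]), stage: match[2] ?? "" };
  }
  return last;
}

// Stage names below are illustrative only.
const sampleLog = [
  traceLine(1, "lock-acquired"),
  traceLine(2, "npm-install-begin"),
  "60 packages are looking for funding",
  traceLine(3, "npm-install-end"),
].join("\r\n");

console.log(lastTraceMarker(sampleLog));
```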

## Tests (run in both Windows CI jobs)

- `every echo INSIDE an if(...) block has NO literal parens at all
  (not even ^-escaped)` — tightened from previous version which only
  required ^-escapes. Walks generated batch line-by-line, tracks
  block depth, asserts no `(` or `)` in any echo at depth>0.
- `emits trace markers at every major step so silent deaths are
  localizable` — pins all 16 trace markers exist.
- `trace markers appear in source-order so the log is read top-down`
  — pins the strict ordering so log readability stays guaranteed.
- `post-npm-install trace captures the exit code so we can tell
  whether install actually succeeded` — pins the exit-code capture.
- `uses brackets (not parens) inside every if-block echo` — pins
  the specific phrasings that previously had paren bugs and forbids
  the old `^(`/`^)` forms.
- Mirror tests in `sharp-repair-script.test.ts` for the same bracket
  convention there.
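
The source-order check described above can be approximated with a forward-only scan. This is a sketch under assumptions: `markersAppearInOrder` is a hypothetical helper, and the real tests may compare positions differently.

```typescript
// Returns true when every stage string occurs in `source`, each one
// strictly after the previous match. A forward-only cursor enforces order.
function markersAppearInOrder(source: string, stages: string[]): boolean {
  let cursor = 0;
  for (const stage of stages) {
    const at = source.indexOf(stage, cursor);
    if (at === -1) return false;
    cursor = at + stage.length;
  }
  return true;
}

// Illustrative input; the stage names are not taken from the real script.
const generated = "step=1 lock-acquired\r\nstep=2 npm-install-begin\r\nstep=3 npm-install-end\r\n";
console.log(markersAppearInOrder(generated, ["lock-acquired", "npm-install-begin", "npm-install-end"]));
console.log(markersAppearInOrder(generated, ["npm-install-end", "lock-acquired"]));
```

The first call succeeds and the second fails, because `lock-acquired` does not occur again after `npm-install-end`.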

CI: added `sharp-repair-script.test.ts` to both `windows-unit-tests`
and `windows-conpty-tests` job runs (was previously only run in
Linux unit-tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/restart-daemon.cmd was using LF line endings.  cmd.exe's tolerance
for LF is uneven — modern Windows usually copes, but it's documented to
parse tokens across lines under some block-level conditions, and we hit
exactly that class of bug today in windows-upgrade-script.ts.  Convert
all 110 lines to CRLF and pin the convention with a test so a future
hand-edit (or git autocrlf misconfiguration) can't regress it silently.

While there, lock the rest of restart-daemon.cmd's invariants — every
one is something that cost a debug cycle in the past:

- Pure ASCII (no Unicode em-dashes, box-drawing, smart quotes).  chcp
  65001 only takes effect AFTER the file has been parsed, so any
  multi-byte UTF-8 in the comment header gets reinterpreted as OEM
  bytes and breaks the parser.  We lost a full restart cycle on a
  U+2014 (em-dash) in a comment line.
- REM comments only (NOT `::`).  The `::` hack works at top level
  but breaks inside parenthesized blocks; making rem the only style
  prevents future copy-paste surprises.
- No literal parens in if-block echoes.  Same root cause as the
  windows-upgrade-script.ts fix in the previous commit.
- Uses `imcodes restart` not `imcodes service restart` — the latter
  rejects win32 with "Unsupported platform".  The check filters
  rem-comments first so the documentation header (which mentions the
  forbidden form on purpose) doesn't trip the test.
- Uses ping not `timeout /t` — timeout aborts under wscript-spawned
  cmd because there's no console for stdin.
- Three-layer detached spawn: wscript -> VBS -> CMD with mode 0
  (hidden) + False (no wait), `On Error Resume Next` to suppress the
  modal error dialog if the inner CMD path is bad.
- Strict step ordering: npm install -> build -> link --force ->
  detached restart dispatch.  Each pre-build step checks errorlevel
  and exits 1 on failure so a broken build can't silently dispatch
  a stale restart.
- Per-run randomized tmp dir (%RANDOM%-%RANDOM%) so concurrent
  restarts don't trample each other; inner cmd self-cleans after
  60 s.
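
Three of these invariants (CRLF only, pure ASCII, no `::` comments) are plain byte-level checks. The sketch below shows one way to express them; `checkCmdInvariants` and its message strings are hypothetical and not the project's test code.

```typescript
// Hypothetical checker for a .cmd file's text content.
// Reports the first bare LF, the first non-ASCII character, and every "::" comment.
function checkCmdInvariants(content: string): string[] {
  const problems: string[] = [];

  for (let i = 0; i < content.length; i++) {
    if (content.charAt(i) === "\n" && (i === 0 || content.charAt(i - 1) !== "\r")) {
      problems.push("bare LF at offset " + i);
      break;
    }
  }

  for (let i = 0; i < content.length; i++) {
    if (content.charCodeAt(i) > 0x7f) {
      problems.push("non-ASCII character at offset " + i);
      break;
    }
  }

  let lineNumber = 0;
  for (const line of content.split(/\r?\n/)) {
    lineNumber += 1;
    if (line.trim().indexOf("::") === 0) problems.push('"::" comment on line ' + lineNumber);
  }
  return problems;
}

console.log(checkCmdInvariants("@echo off\r\nREM fine\r\n"));
console.log(checkCmdInvariants("@echo off\nREM fine\r\n"));
```

The first call reports nothing; the second reports a bare LF at offset 9, the position right after `@echo off`.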

Wired into both windows-unit-tests and windows-conpty-tests CI jobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two stability fixes for Windows daemon auto-upgrade after the 2026-05-07
incident where an in-flight auto-upgrade left the daemon wedged for hours
and `imcodes restart` could not recover, plus a .gitattributes fix for
the CRLF test failure on non-Windows CI runners.

1. Watchdog self-heals stuck upgrade.lock (10-minute mtime threshold)
   The auto-upgrade batch script removes the lock at its `:done`
   safety-net.  But if the script crashes BEFORE reaching :done — for
   example because cmd.exe's paren-counting derailed inside an
   `if exist (...)` block with an unescaped echo `(...)` (the actual
   2026-05-07 root cause) — the lock survives and the watchdog parks in
   :wait_loop forever.

   The watchdog now runs a tiny PowerShell probe inside :wait_loop:

     if ((Get-Item upgrade.lock).LastWriteTime -lt (Get-Date).AddMinutes(-10)) {
       Remove-Item upgrade.lock
     }

   Real upgrades finish in ~1-3 minutes, so a 10-minute threshold cannot
   race a live upgrade.  Two distinct exit messages let the operator
   tell normal lock release ("Upgrade lock cleared, resuming.") apart
   from auto-recovery ("Upgrade lock was stale (>10min) -- removed by
   watchdog self-heal.").

2. `imcodes restart` clears upgrade.lock before launching new watchdog
   When restartWindowsDaemon kills the existing daemon and watchdog
   tree, any in-flight auto-upgrade is by definition broken — we just
   killed its child processes.  Leaving the lock would only park the
   freshly-spawned watchdog in :wait_lock indefinitely; the user's
   `imcodes restart` would time out at 15 s with "Watchdog not found".
   The new clearUpgradeLock() helper uses fs.rmSync and falls back to
   PowerShell Remove-Item if the removal was blocked by an AV scan or a
   sharing violation.

3. .gitattributes forces CRLF for .cmd / .bat / .vbs / .ps1
   The 2026-05-07 CI failure was test/util/restart-daemon-cmd.test.ts
   asserting CRLF line endings.  The file was committed with LF blob
   bytes, which on Windows checkout converts to CRLF (matching local
   dev), but on macOS/Linux runners stays as LF — so the test trips
   "bare LF at offset 9" only on those CI jobs.  `*.cmd text eol=crlf`
   in .gitattributes makes the working-tree representation
   platform-independent: blob is LF, checkout is CRLF everywhere.
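
The staleness rule from item 1 and the remove-with-fallback idea from item 2 can be sketched in TypeScript as below. This is not the shipped code: the watchdog probe is PowerShell and `clearUpgradeLock()` calls real filesystem APIs, whereas here the removers are injected so the logic stays self-contained. All names are hypothetical.

```typescript
const STALE_LOCK_MS = 10 * 60 * 1000; // 10-minute threshold from item 1

// Mirrors the probe's comparison: stale only when strictly older than the threshold.
function isLockStale(lockMtimeMs: number, nowMs: number, thresholdMs: number = STALE_LOCK_MS): boolean {
  return nowMs - lockMtimeMs > thresholdMs;
}

type Remover = (path: string) => void;

// Tries the primary remover first and the fallback second.
// In item 2 these would be fs.rmSync and a PowerShell Remove-Item call.
function clearLockWithFallback(path: string, primary: Remover, fallback: Remover): "primary" | "fallback" | "failed" {
  try {
    primary(path);
    return "primary";
  } catch (primaryError) {
    try {
      fallback(path);
      return "fallback";
    } catch (fallbackError) {
      return "failed";
    }
  }
}

const now = Date.now();
console.log(isLockStale(now - 3 * 60 * 1000, now));  // a live 3-minute upgrade
console.log(isLockStale(now - 11 * 60 * 1000, now)); // a stranded lock
```

A live upgrade that finishes in 1-3 minutes stays well under the threshold, so only a stranded lock is reported as stale.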

Tests added:
  * windows-launch-artifacts.test.ts: 'self-heals stuck upgrade.lock
    when older than 10 minutes' asserts the PS probe is present, has
    the >10min mtime check, and the two distinct exit messages.
  * windows-launch-artifacts.test.ts: 'stale-lock probe runs INSIDE
    wait_loop, after the 30s sleep' guards against ordering regression
    that would race a slow-starting upgrade.
  * windows-daemon.test.ts: 'clears stuck upgrade.lock so the new
    watchdog does not park in :wait_lock' covers the manual restart
    recovery path with an existing-lock fixture.
  * windows-daemon.test.ts: 'skips lock removal when no lock exists'
    guards against fs noise on the happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the test that answers "how do you know the self-heal actually works
on Windows?" — instead of asserting string content of the generated
batch, this spawns a real `cmd.exe /c` against the production-generated
watchdog.cmd and verifies the lock-removal flow end-to-end.

Two cases:

1. Stuck lock (mtime 30s ago, threshold shrunk to 5s for test speed)
   → PowerShell probe detects staleness, removes the lock via Remove-Item,
   watchdog logs "Upgrade lock was stale (>10min) -- removed by watchdog
   self-heal", then resumes via `goto loop` and reaches the daemon launch
   line.  Asserts: status=0, lock gone from disk, log contains both the
   "waiting" entry message AND the self-heal message AND the post-loop
   marker echo.

2. Fresh lock (mtime = now, well under threshold)
   → PowerShell probe sees a young file and skips Remove-Item.  Asserts:
   lock STILL present on disk, no false-positive self-heal log line.
   Guards against the failure mode where the watchdog races a live
   upgrade by removing a lock the upgrade just placed.

Test mechanics:
- Generates the production watchdog.cmd via writeWatchdogCmd in a child
  Node process so vi.mock() from sibling tests can't pollute fs.
- Rewrites the cmd to redirect %USERPROFILE% paths to a per-test
  tmp dir, shrink the wait_loop ping from 30s → 2s, lower the stale
  threshold from 10 minutes → 5 seconds, and exit after one iteration
  so the test runs in ~3 seconds.
- The actual cmd.exe parses the actual `if exist`, `goto wait_lock`,
  `goto lock_cleared`, the PowerShell -Command one-liner, the single-
  quoted path expansion, and `(Get-Date).AddMinutes(-10)` — all the
  pieces that string-content tests can't verify.

Skipped on non-Windows hosts.  Runs on the Unit Tests (Windows) and
Unit Tests (Windows ConPTY) CI jobs to catch regressions in the
self-heal flow before they hit users in production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the Windows daemon auto-upgrade from a 200-line cmd.exe batch
script (generated from a TS string template) to a real Node.js runner.

Why: every Windows auto-upgrade outage we shipped traced back to a
cmd.exe parser quirk that JS doesn't have:

  - 2026-04-21: NODE_OPTIONS accumulating across upgrade cycles caused
    V8 to try reserving 16 GB heap.  Cause: cmd.exe `set` mutating the
    inherited env across `setlocal` boundaries.  In JS, spawnSync gets
    a fresh env object — accumulation is impossible.
  - 2026-04-27: silent `del` failure on a doubled-backslash path left
    upgrade.lock stranded for hours.  Cause: a sibling .mjs script
    over-escaping interpolated paths.  In JS, fs.unlinkSync uses the
    Windows wide-char API directly — paths embed verbatim.
  - 2026-05-07: an unescaped `(` inside an `if exist (...)` echo
    terminated the if-block early; the lock-removal step ended up
    outside any code path that ran.  Cause: cmd.exe parses if-blocks
    by counting parens.  In JS, control flow is normal try/catch/
    finally — no parsing surprises possible.

Plus the persistent `timeout /t N /nobreak` problem: under wscript-
spawned cmd (no console for stdin), `timeout` aborts immediately and
every "sleep" was a 0-second no-op.  Atomics.wait() in JS doesn't care
about console attachment.
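
For illustration, a blocking sleep built on `Atomics.wait()` can look like the sketch below. The runner's actual helper is not shown in this commit message, so the `sleepSync` name and shape are assumptions.

```typescript
// Blocks the current thread for roughly `ms` milliseconds.
// The cell holds 0 and nobody notifies it, so the wait ends by timeout.
function sleepSync(ms: number): "ok" | "not-equal" | "timed-out" {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  return Atomics.wait(cell, 0, 0, ms);
}

const startedAt = Date.now();
const outcome = sleepSync(50);
console.log(outcome, Date.now() - startedAt);
```

Because the wait does not read stdin, it behaves the same with or without an attached console, which is the property `timeout /t` lacks here.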

CHINESE / NON-ASCII PATHS: Node fs APIs use the Windows wide-char API
natively.  Paths like `C:\Users\张三\AppData\Local\Temp\...` round-
trip transparently — no `chcp 65001`, no codepage games, no escape
rules to remember.  Tested with a real `张三` USERPROFILE in the e2e
suite; lock + log files written and read correctly through the
non-ASCII directory.

Layout:

  src/util/windows-upgrade-runner.mjs   ← the actual JS runner (NEW)
  src/util/windows-upgrade-script.ts    ← refactored: drops the cmd
                                           batch generator, adds
                                           buildWindowsUpgradeRunnerVbs()
                                           and resolveWindowsUpgradeRunnerPath()
  src/daemon/command-handler.ts         ← copies the runner from
                                           dist/ into a per-upgrade
                                           %TEMP% dir, then spawns
                                           wscript→VBS→node upgrade.mjs

The runner is shipped via the existing scripts/copy-worker-bootstraps.mjs
postbuild step (any *.mjs under src/ gets copied to dist/src/ after tsc).
At upgrade time the daemon copies the bundled runner into a fresh
%TEMP%/imcodes-upgrade-X/ before spawning, so the in-flight `npm
install -g` doesn't overwrite the runner's source from under itself.

NODE 24 EINVAL: spawnSync('foo.cmd', args) returns EINVAL on Node 24
without `shell: true` (post-CVE-2024-27980 hardening).  The runner
wraps every .cmd/.bat invocation in `cmd.exe /d /s /c <bat> <args>` —
the documented Microsoft pattern with precise argv preservation, no
shell-quoting surprises.

Tests:

  test/util/windows-upgrade-script.test.ts (rewritten):
    - buildWindowsUpgradeRunnerVbs: hidden+detached, "" doubling for
      arg quoting, Chinese path round-trip, NEVER references cmd.exe.
    - bundled .mjs invariants: real Node fs imports, top-level
      .catch().finally() guarantees clearLock(), capped 4 GB heap,
      --ignore-scripts for sharp, no `timeout /t`, positional argv.

  test/util/windows-upgrade-runner.e2e.test.ts (NEW, Windows only):
    Real Node child invoking dist/src/util/windows-upgrade-runner.mjs
    with a mocked npm.cmd.  Six scenarios:
      - npm install fails → lock cleared, daemon untouched, exit 0
      - shim missing after install → lock cleared, abort message logged
      - npm.cmd doesn't exist (spawn ENOENT) → finally still clears lock
      - Chinese-character %USERPROFILE% (张三) → fs round-trips cleanly
      - lock mtime is current (so watchdog self-heal can age it out)
      - bundled file is plain JS (no executable cmd.exe dependencies)

All 6 e2e tests pass on real Windows + Node 24.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleWebCommand had no try/catch around its dispatch switch, so any
synchronous throw inside a handler (TypeError from `cmd.foo.bar` when
`foo` is undefined; validation throw before the first `await` of an
async function; etc.) propagated out of the WebSocket onMessage
callback, hit the global `uncaughtException` handler in src/index.ts,
and broadcast a noisy `daemon.error` event to every connected browser.
The daemon stayed technically alive (the global handler keeps it
that way), but to operators it LOOKED crashed — a sudden `daemon.error`
landed in the UI for every browser session, mid-task.

Repro: a real WebSocket client sending `{type: "terminal.subscribe"}`
with `session` missing or wrong-typed.  handleSubscribe at line 3257
does `cmd.session as string` then passes it on to getSession, which
when given a non-string can synchronously throw.  Before this fix:

  Cannot read properties of undefined (reading 'split')
  → uncaughtException → daemon.error broadcast to all browsers

After:

  WARN Web command handler threw synchronously — daemon stays alive
       { type: 'terminal.subscribe' }

Quiet warn-level log line scoped to the offending command type.

Also tightens input validation so arrays don't squeeze through
`typeof msg === 'object'` (Array.isArray check added) and so debug
logs flag non-string `cmd.type` values for diagnostics.
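
The shape of the fix can be sketched as a guarded dispatcher. This is an illustration under assumptions, not the real `handleWebCommand`: the `Map`-based handler table, `safeDispatch`, and the `warn` callback are hypothetical stand-ins.

```typescript
type WebCommand = { [key: string]: unknown };
type Handler = (cmd: WebCommand) => void;
type Warn = (message: string, context: { type: string }) => void;

// Validates the message shape, then runs the handler inside try/catch so a
// synchronous throw becomes a scoped warning instead of an uncaught exception.
function safeDispatch(msg: unknown, handlers: Map<string, Handler>, warn: Warn): boolean {
  // Arrays are objects too, so they need an explicit rejection.
  if (typeof msg !== "object" || msg === null || Array.isArray(msg)) return false;
  const cmd = msg as WebCommand;
  const type = cmd["type"];
  if (typeof type !== "string") return false;
  const handler = handlers.get(type);
  if (handler === undefined) return false;
  try {
    handler(cmd);
    return true;
  } catch (error) {
    warn("Web command handler threw synchronously", { type });
    return false;
  }
}

const handlers = new Map<string, Handler>();
handlers.set("terminal.subscribe", (cmd) => {
  // Throws a TypeError when `session` is missing, like the repro above.
  (cmd["session"] as string).split(":");
});

const ok = safeDispatch({ type: "terminal.subscribe" }, handlers, (message, context) => {
  console.warn(message, context);
});
console.log(ok);
```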

Tests added (test/daemon/command-handler-bad-input.test.ts):
  - non-object inputs (null, undefined, primitives, arrays) ignored
  - object with no/wrong-type .type field doesn't throw
  - unknown .type strings ignored silently
  - known .type strings with missing/wrong-type payload fields don't crash
  - source-level invariant: try/catch wrapper must remain in place

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When `imcodes upgrade` (or any `npm install -g imcodes@…`) gets killed
mid-write — power loss, OOM-kill, ssh disconnect — npm leaves critical
top-level deps in `node_modules/` as empty placeholder dirs (no
`package.json` inside). The next daemon start crashes with
ERR_MODULE_NOT_FOUND on the first import, systemd Restart=always
thrashes forever, and recovery requires a manual SSH + reinstall. Hit
this on 212/213/215.

Add a non-Node guardian in front of the daemon entry that pre-flight-
checks `node_modules/`, detects the half-install signature, and re-
installs the SAME pinned version (read from the surviving
package.json — never rolls forward) before exec'ing the real Node
entry. systemd / launchctl / Task Scheduler never has to know.
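
The half-install signature, a dependency directory that exists but has no `package.json`, can be sketched as a predicate. The real guardian on Linux and macOS is a bash script, so this TypeScript version is only an illustration; `findHalfInstalledDeps` and the injected filesystem probes are hypothetical.

```typescript
type PathProbe = (path: string) => boolean;

// Returns the deps whose directory exists but lacks a package.json.
// Filesystem access is injected so the logic can be exercised without disk I/O.
function findHalfInstalledDeps(
  nodeModulesDir: string,
  deps: string[],
  dirExists: PathProbe,
  fileExists: PathProbe
): string[] {
  return deps.filter((dep) => {
    const depDir = nodeModulesDir + "/" + dep;
    return dirExists(depDir) && !fileExists(depDir + "/package.json");
  });
}

// Simulated tree: "ws" is an empty placeholder dir, "commander" is intact.
const dirs = ["node_modules/ws", "node_modules/commander"];
const files = ["node_modules/commander/package.json"];
const broken = findHalfInstalledDeps(
  "node_modules",
  ["commander", "ws"],
  (p) => dirs.indexOf(p) !== -1,
  (p) => files.indexOf(p) !== -1
);
console.log(broken);
```

A non-empty result is the cue to reinstall the pinned version before handing off to the real entry point.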

Linux + macOS:
  - `bin/imcodes-launch.sh` — pure bash, no node_modules deps.
  - Wired into ExecStart / ProgramArguments by:
    - `src/bind/bind-flow.ts` (initial install)
    - `src/setup/setup-flow.ts` (one-click setup)
    - `src/daemon/command-handler.ts` step 3.5 (every upgrade)
    via shared helper `src/util/launch-target.ts` that resolves
    `program + args` based on whether the launcher is shipped.
  - Older installs without the launcher fall through to direct node
    invocation — graceful degradation.
  - Also clears stale `upgrade.lock.d/` (>1800 s) at every daemon
    start, sweeping aside leftovers from killed upgrades that the
    bash upgrade script's own 30-min watchdog wouldn't reach until a
    follow-up `imcodes upgrade` was triggered.

Windows:
  - `src/util/windows-launch-preflight.mjs` — same logic, pure Node
    built-ins (the .mjs runs via `imcodes-launch-preflight.cmd`
    npm-generated shim, NOT direct path embedding, so non-ASCII
    usernames stay out of the .cmd body).
  - Wired into the daemon-watchdog .cmd between the upgrade.lock
    wait and the launch line; emitted in env-var form
    (`%APPDATA%\npm\imcodes-launch-preflight.cmd`) when the install
    is at the default prefix, absolute path otherwise. Skipped
    silently when the shim isn't present (older installs predating
    this change).
  - Watchdog already self-heals stuck `upgrade.lock` files (>10 min)
    and sharp-specific dependencies; this closes the gap on
    top-level deps (commander/ws/cors/body-parser/hono/
    @huggingface/transformers).

Tests (42 new, all platforms):
  - `test/util/launch-target.test.ts` — helper picks launcher when
    present, falls back to direct node otherwise.
  - `test/util/imcodes-launch-script.test.ts` — bash launcher behavior:
    healthy → noop, half-install → reinstall pinned, dist missing →
    reinstall, repair fails → still exec's entry, doesn't roll
    forward, clears stale lock.
  - `test/util/windows-launch-preflight.test.ts` — Node preflight
    same matrix.
  - `test/util/windows-launch-artifacts.test.ts` — watchdog emits
    preflight in env-var form, skips line when shim absent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real incident on 2026-05-08: PID 849488 attempted three consecutive
auto-upgrades to imcodes@2026.5.2070-dev.2047 / 2071-dev.2048.  Each
runner spawned successfully according to daemon.log, but 15 minutes
later the daemon released its memory freeze without ever being killed.
By the time we looked, all three tmp dirs were preserved (good — the
preserve-on-failure fix landed earlier) and the upgrade.log files
clearly showed `install FAILED (exit null)` — npm never actually ran.

Root cause: `spawnSync('cmd.exe', ['/d', '/s', '/c', 'C:\Program Files\nodejs\npm.cmd', ...])`
fails on Node 24 + Windows 10 when the npm path contains spaces.  Node
serializes the argv into a Windows command line, wrapping the path
with embedded space in quotes.  Empirical reproduction:

    cmd.exe /d /s /c "C:\Program Files\nodejs\npm.cmd" --version
    → 'C:\Program' is not recognized as an internal or external command

Despite cmd.exe `/s` documentation suggesting inner quotes are
preserved, after Node 24's argv-to-command-line conversion cmd
actually ends up running `C:\Program` as the executable.  npm never
runs, status=1, the runner aborts via the install-FAILED branch, and
the daemon is left UNTOUCHED.  But the daemon's own memory-freeze
watchdog has no way to know the runner aborted, so it sits frozen
for 15 min until the watchdog times out.  Three cycles in a row
in real production = "daemon dead".

Fix: bypass cmd.exe entirely.

  Path A (preferred): npm-cli.js direct.  npm.cmd is just a thin
  shim around `node node_modules/npm/bin/npm-cli.js`.  Resolve that
  path next to npm.cmd and invoke `node.exe` (a real .exe, no
  cmd.exe needed) directly with the .js path.  Zero quoting issues
  because no batch interpreter is involved.

  Path B (fallback): shell:true with bare 'npm' and PATH-prepended
  npmDir.  cmd.exe's PATH lookup handles the path-with-spaces
  natively (it's how interactive cmd works).  Empirically verified
  to work where direct cmd.exe /d /s /c does not.

Same approach for the imcodes.cmd shim (repair-watchdog, --version
checks): spawnCmdShim() uses shell:true + PATH-prepended bare basename.
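
The choice between Path A and Path B can be sketched as a pure planning function. This is an assumption-laden illustration: `planNpmSpawn`, the `SpawnPlan` shape, and the injected `exists` probe are hypothetical, and the real runner passes the result to Node's spawn APIs.

```typescript
interface SpawnPlan {
  command: string;
  args: string[];
  shell: boolean;
  pathPrepend: string | null; // directory to put first on PATH, if any
}

// Path A: run npm-cli.js through node.exe, so no batch interpreter is involved.
// Path B: bare "npm" with shell:true and npm's directory prepended to PATH.
function planNpmSpawn(
  npmCmdPath: string,
  nodeExe: string,
  npmArgs: string[],
  exists: (path: string) => boolean
): SpawnPlan {
  const npmDir = npmCmdPath.slice(0, npmCmdPath.lastIndexOf("\\"));
  const npmCli = npmDir + "\\node_modules\\npm\\bin\\npm-cli.js";
  if (exists(npmCli)) {
    return { command: nodeExe, args: [npmCli].concat(npmArgs), shell: false, pathPrepend: null };
  }
  return { command: "npm", args: npmArgs.slice(), shell: true, pathPrepend: npmDir };
}

const plan = planNpmSpawn(
  "C:\\Program Files\\nodejs\\npm.cmd",
  "C:\\Program Files\\nodejs\\node.exe",
  ["--version"],
  () => true
);
console.log(plan);
```

In Path A the path with spaces travels as a single argv element to a real executable, so no cmd.exe quoting rules apply to it.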

Verified end-to-end: ran the bundled runner against the REAL
`C:\Program Files\nodejs\npm.cmd` with a bogus pkg spec.  Now
captures real npm error (`ETARGET No matching version found`)
instead of `'C:\Program' is not recognized`.

Tests added/updated:
  - windows-upgrade-runner.e2e.test.ts: 'npm spawn works through a
    path containing spaces (Node 24 cmd.exe quoting bug)' fixture
    builds a fake npm under `<tmp>/with space dir/` and asserts
    real npm output captured (not a quoting failure signature).
  - existing e2e tests updated to match the new trace-marker log
    format ("[trace] step=N lock-acquired") alongside the older
    "lock acquired" form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…port for repo-types

Two pre-existing Windows traps that bit us today:

## 1. core.autocrlf=true silently CRLF-converts shell scripts on checkout

Real-world hit on 2026-05-08: running scripts/restart-daemon.cmd (which
calls npm install / npm run build / npm link --force) caused git to
refresh bin/imcodes-launch.sh, and the developer's local
`core.autocrlf=true` (very common default on Windows installs) silently
converted it to CRLF.  `git status` showed it modified, with a
particularly subtle byte pattern: line 1 stayed LF (`#!/usr/bin/env
bash\n`) while every subsequent line gained `\r`.

A careless `git add .` would have committed CRLF into a #!/usr/bin/env
bash script, breaking bash on every Linux/macOS developer's machine
and in CI:
  - bash treats the trailing `\r` as part of the last token on the line
  - a line ending in `if [ "$X" = "y" ]\r` passes `]\r` instead of `]`
    as the final argument, so the test fails with `[: missing ]`
  - variable assignments get `\r` baked into values, breaking $PATH
    manipulation
  - "$'\r': command not found" on every line

Pre-existing .gitattributes only covered the opposite case (forcing
CRLF for `*.cmd` / `*.bat` / `*.vbs` / `*.ps1`) — there was no rule
defending Unix-format files against Windows checkouts.

Fix: explicit `text eol=lf` for every Unix-shebang or Node-spawn-target
extension we ship.  This OVERRIDES `core.autocrlf=true`, so a developer
with that setting still gets correct LF on checkout for these files.

  *.sh  *.bash  *.mjs  *.cjs  *.mts  *.cts  → eol=lf
  bin/*                                     → eol=lf
  .husky/_/husky.sh                         → eol=lf

(.mjs / .cjs / .mts / .cts included because they're Node entry points
spawned via wscript+VBS in the Windows upgrade flow — Node is more
tolerant of CRLF than bash, but not bulletproof, especially with
shebangs and source-map paths.)

## 2. src/shared/repo-types.ts symlink → tsc TS1128 on Windows

Pre-existing: the file was a `120000` git symlink pointing at
`../../shared/repo-types.ts`.  Default Windows git checkouts run with
`core.symlinks=false` (the default unless the user has Developer Mode
or runs git as admin), where git materializes symlinks as plain text
files containing the link target string.

Effect: tsc parses the literal text "../../shared/repo-types.ts" as
TypeScript source, fails with two TS1128 errors, and `npm run build`
exits non-zero.  This blocked scripts/restart-daemon.cmd from doing
the build step on any default-configured Windows dev machine.

Fix: replace the symlink with a re-export wrapper file.  Compiles
identically on every OS, no symlink support required.  Daemon and
server still import from `'../shared/repo-types.js'` — the new file
transparently forwards to the canonical `shared/repo-types.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ink)

Emergency CI green-fix.  My earlier commit 54856f3 replaced the file's
content (was a `../../shared/repo-types.ts` symlink target string,
became a TypeScript re-export `export * from '../../shared/repo-types.js'`)
but git's index entry kept the 120000 mode bit from the original
symlink.  Worst-of-both-worlds outcome:

  - On Linux CI: git tried symlink('../../shared/repo-types.ts...', dst)
    with the entire 14-line TypeScript source as the symlink target.
    Either fails on the embedded \n (EINVAL on most kernels) or
    creates a useless symlink to a non-existent path.  Either way the
    file isn't on disk → tsc errors:
      src/daemon/command-handler.ts(769,26): error TS2307:
        Cannot find module '../shared/repo-types.js'
      src/daemon/repo-handler.ts(15,26): error TS2307:
        Cannot find module '../shared/repo-types.js'

  - On Windows CI: even more visible —
      error: unable to create symlink src/shared/repo-types.ts:
        Filename too long
    (the "target" string is 600+ bytes; NTFS reparse-point targets
    cap at 16 KiB but most filesystems and git's helper bail much
    earlier on non-conforming input).

The Write tool I used to create the file content didn't change the
git mode bit — git still saw it as a symlink in the index.  Fixed by:

    git update-index --add --cacheinfo \
        100644,<existing-blob-sha>,src/shared/repo-types.ts

which keeps the same blob content but flips the mode to 100644
(regular file).  Now both Linux and Windows CI checkouts get a
real file on disk, tsc resolves the module via the re-export, and
the daemon-side `import { REPO_MSG } from '../shared/repo-types.js'`
in command-handler.ts and repo-handler.ts compiles cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Complements eb718a2's launch-time self-heal with INSTALL-time self-heal.
Three classes of failure the new preinstall handles:

  1. `.imcodes-XXXXX` siblings npm leaves in the global node_modules
     when an atomic-rename gets interrupted (power loss / OOM-kill /
     SSH disconnect). Future `npm install -g` against the same prefix
     would otherwise see "directory exists" obstacles and fail with
     ENOTEMPTY at cleanup.

  2. Stale `~/.imcodes/upgrade.lock.d/` (>30 min) — same threshold
     and logic as the bash upgrade script's own watchdog and the
     launcher's stale-lock sweeper, but applied at install time too.

  3. Concurrent `imcodes-upgrade` running in the background (the 215
     scenario from 2026-05-08): the daemon's auto-upgrade was halfway
     through `npm install imcodes@2026.5.2073-dev.2050` when the user
     also ran `npm i -g imcodes@dev` manually. Two parallel npm runs
     against the same lib/node_modules/imcodes/ collided with
     ENOTEMPTY at cleanup. The preinstall now ABORTS with exit 1 +
     a clear stdout message ("Another `imcodes upgrade` is already
     running... wait or pkill") so the failure is diagnosable instead
     of confusing.

Crucially, the concurrent-upgrade detection walks the FULL ancestry
chain (us → npm → bash upgrade.sh → daemon) so when the install
itself was triggered BY the daemon's upgrade.sh, we don't false-flag
our own grandparent as a competitor.
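
The ancestry exclusion can be sketched over an in-memory process table. Everything here is illustrative: the `Proc` shape, the function names, and the `imcodes-upgrade` command match are assumptions, and the real script has to obtain the process list from the OS.

```typescript
interface Proc {
  pid: number;
  ppid: number;
  command: string;
}

// Collects every ancestor pid of `pid`, stopping on unknown parents or cycles.
function ancestorsOf(pid: number, table: Map<number, Proc>): Set<number> {
  const seen = new Set<number>();
  let current = table.get(pid);
  while (current !== undefined && current.ppid !== current.pid && !seen.has(current.ppid)) {
    seen.add(current.ppid);
    current = table.get(current.ppid);
  }
  return seen;
}

// An upgrade process counts as a competitor only if it is neither us nor one of our ancestors.
function findCompetingUpgrades(selfPid: number, procs: Proc[]): Proc[] {
  const table = new Map<number, Proc>();
  for (const proc of procs) table.set(proc.pid, proc);
  const ancestors = ancestorsOf(selfPid, table);
  return procs.filter(
    (proc) => proc.pid !== selfPid && !ancestors.has(proc.pid) && proc.command.indexOf("imcodes-upgrade") !== -1
  );
}

const processes: Proc[] = [
  { pid: 100, ppid: 1, command: "node imcodes daemon" },
  { pid: 200, ppid: 100, command: "bash imcodes-upgrade.sh" },
  { pid: 300, ppid: 200, command: "npm install -g imcodes" },
  { pid: 400, ppid: 300, command: "node preinstall-cleanup.mjs" },
];
console.log(findCompetingUpgrades(400, processes).length);
```

With only our own chain present, nothing is flagged; an unrelated upgrade process with a different parent would be.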

Pure Node built-ins so it runs even when node_modules/ is broken.
Mostly idempotent: clean machines exit in milliseconds with nothing
to do.

Wired in via the strip-onnxruntime-gpu prepack hook — adds a
`preinstall: node dist/src/util/preinstall-cleanup.mjs || true` to
the published package.json (alongside the existing sharp-repair
postinstall). Verified the resulting tarball ships both lifecycle
scripts and the file is executable.

10 tests cover: clean install no-op, leftovers cleanup, non-`.imcodes-`
prefix safety, fresh vs stale lock, concurrent upgrade abort, ancestor
exclusion, env override escape hatch, npm_config_prefix fallback to
`npm prefix -g`, lock dir mtime fallback when `started` is missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a desktop-only "Aa" dropdown to the chat title bar that lets users
switch font family and size. Changes are real-time and propagated across
all open <ChatView> instances via a CustomEvent bus + native `storage`
event, so every chat window updates simultaneously.

Presets always include 5 generic stacks (system/sans/serif/mono/rounded)
plus JetBrains Mono — bundled as a webfont via @fontsource/jetbrains-mono
(OFL 1.1, ~43KB woff2 latin regular+bold) — and conditionally surface
12 well-known programmer mono families when actually installed
(Fira Code, Cascadia, Source Code Pro, IBM Plex Mono, Hack, Iosevka,
Inconsolata, Roboto Mono, Ubuntu Mono, Menlo, Consolas, SF Mono).

A "…" entry triggers the Local Font Access API (Chromium-only) to
enumerate every installed family with a searchable picker; gracefully
shows ⚠ on unsupported browsers / denied permission.

Preferences persist per-machine via localStorage under
`imcodes_fontPrefs:chat`. No i18n strings (icon-only UI). No daemon /
server / protocol changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-added custom phrases / commands occasionally vanished due to two
distinct races in QuickInputPanel's quick-data store:

1. A visibility-triggered `quickDataResource.invalidate()` that resolved
   inside the 2-second debounce window returned server data as-is,
   overwriting the optimistic in-memory value and erasing the
   just-added phrase from the UI. If the user then mutated again, the
   new `scheduleSave` cancelled the original closure-captured timer
   and the addition never reached the server.

2. Closing or reloading the tab inside the debounce window dropped the
   pending PUT entirely — the timer fell with the page.

Fix:
- Track unsaved phrase/command additions in module-level Sets
  (`_pendingPhraseAdds`, `_pendingCommandAdds`). `addPhrase`/`addCommand`
  insert; `removePhrase`/`removeCommand` evict.
- Layer pending adds onto every server response via `applyPendingAdds`
  so refreshes never wipe optimistic state.
- Snapshot pending sets at PUT start; clear only the in-flight items on
  success so concurrent mutations during the request are retained for
  the next save cycle.
- Add `flushPendingSave`, which synchronously fires the pending PUT
  with `keepalive: true` on `visibilitychange=hidden` and `pagehide`,
  covering desktop and mobile Safari unload paths.
- Stop wiping the in-memory resource to `EMPTY_QUICK_DATA` on transient
  fetch errors after first hydration.
- Add two regression tests reproducing both loss paths.
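
The pending-add layering and snapshot-then-clear steps above can be sketched as follows. The identifiers `_pendingPhraseAdds`, `_pendingCommandAdds`, and `applyPendingAdds` come from this message; the `QuickData` shape, the function bodies, and the `snapshotPending` / `clearSaved` helpers are assumptions for illustration.

```typescript
// Assumed shape of the quick-data payload.
interface QuickData {
  phrases: string[];
  commands: string[];
}

// Module-level sets of additions that have not yet been confirmed by the server.
const _pendingPhraseAdds = new Set<string>();
const _pendingCommandAdds = new Set<string>();

// Append pending items that the server response does not contain yet.
function mergePending(base: string[], pending: Set<string>): string[] {
  const extra = Array.from(pending).filter((item) => base.indexOf(item) === -1);
  return base.concat(extra);
}

// Layer optimistic additions onto a server response so a refresh cannot erase them.
function applyPendingAdds(server: QuickData): QuickData {
  return {
    phrases: mergePending(server.phrases, _pendingPhraseAdds),
    commands: mergePending(server.commands, _pendingCommandAdds),
  };
}

// Hypothetical helper: capture what is in flight when the PUT starts.
function snapshotPending(): QuickData {
  return {
    phrases: Array.from(_pendingPhraseAdds),
    commands: Array.from(_pendingCommandAdds),
  };
}

// Hypothetical helper: on success, clear only the items that were in flight.
function clearSaved(snapshot: QuickData): void {
  snapshot.phrases.forEach((p) => _pendingPhraseAdds.delete(p));
  snapshot.commands.forEach((c) => _pendingCommandAdds.delete(c));
}
```

Clearing by snapshot rather than emptying the sets is what keeps an addition made during the request alive for the next save cycle.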

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…acks

- Drop the desktop-only gate on the chat title bar and `chatFontStyle`
  so phones also get the "Aa" dropdown and use the chosen font.
- Centralize a `CJK_FALLBACK` stack covering PingFang SC/TC, Microsoft
  YaHei/JhengHei, Hiragino Sans (CN/JP), Yu Gothic, Apple SD Gothic
  Neo, Malgun Gothic, Noto Sans CJK SC, and Source Han Sans SC.
- Append CJK_FALLBACK to every monospace preset and the JetBrains Mono
  default so Chinese / Japanese / Korean characters render
  consistently across macOS / Windows / Linux / iOS / Android instead
  of falling onto the browser's last-resort font.
- Apply CJK_FALLBACK to `localFamilyToCssValue` so a user-picked local
  Latin-only font keeps CJK readable.
- Auto-migrate existing localStorage entries: `ensureCJKFallback`
  rewrites stored stacks that lack a CJK family on read, so users who
  picked a font in the previous build silently upgrade.
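
A rough sketch of the on-read migration, assuming stacks are comma-separated CSS family lists. `CJK_FALLBACK` and `ensureCJKFallback` are named in this message, but the list here is only a subset of the families mentioned, and the helper names, ordering, and matching logic are assumptions.

```typescript
// Assumed subset of the CJK families listed in the commit message.
const CJK_FALLBACK: string[] = [
  '"PingFang SC"',
  '"Microsoft YaHei"',
  '"Yu Gothic"',
  '"Malgun Gothic"',
  '"Noto Sans CJK SC"',
  '"Source Han Sans SC"',
];

// Naive split: does not handle commas inside quoted family names.
function splitStack(stack: string): string[] {
  return stack
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Strip surrounding quotes and case so '"PingFang SC"' matches 'pingfang sc'.
function normalizeFamily(family: string): string {
  return family.replace(/^["']|["']$/g, "").toLowerCase();
}

// Return the stack unchanged if it already names a CJK family; otherwise append the fallback.
function ensureCJKFallback(stack: string): string {
  const families = splitStack(stack);
  const known = CJK_FALLBACK.map(normalizeFamily);
  const hasCJK = families.some((f) => known.indexOf(normalizeFamily(f)) !== -1);
  if (hasCJK) return stack;
  return families.concat(CJK_FALLBACK).join(", ");
}
```

Because the function is idempotent, running it on every read is safe for values that were already migrated.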

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Aa" trigger sat at the right end of the chat title bar
(`justifyContent: flex-end`), which collided with the absolutely-
positioned `chat-panel-toggle` (⊞) at top:6/right:8 inside the same
`.chat-view-wrap`. The ⊞ button visually covered the Aa button on
both desktop and mobile.

- Flip the title bar to `flex-start` so the Aa trigger renders on the
  far left, opposite the ⊞ toggle on the far right.
- Re-anchor the popover from `right: 0` to `left: 0` so it expands
  rightward from its new origin and stays on screen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the wrap-grid of identical "Aa" tiles (which made it impossible
to tell presets apart on a small screen) with a two-row layout:

  Row 1: − [size] +  (the most common adjustment, closest to trigger)
  Row 2: native <select> showing each font's display name, rendered in
         its own family for inline preview, with the user's chosen font
         applied to the closed select itself.

Native select gives free scrolling, an OS-native picker on mobile, and
keyboard navigation — all of which the custom grid was missing. Each
preset now carries an explicit `name` field (System / Sans / Serif /
Mono / Rounded / JetBrains Mono / Fira Code / Cascadia Code / etc.).

Local-font enumeration (Local Font Access API) is integrated as a
single sentinel "…" entry inside the same select. Picking it triggers
`queryLocalFonts()` once; on success the resulting families appear in
an optgroup the next time the menu is opened. A "⚠" disabled entry
covers `unsupported`/`denied` permission states; a disabled "…"
covers the in-flight loading state.

Stored values that don't match any preset (or any local family) are
surfaced as an explicit orphan <option> so the select stays
controlled and the user can still see what they have selected. A
small `extractPrimaryFamily` helper reconciles slight stack
differences (e.g., post-`ensureCJKFallback` migration) so an old
saved preset still resolves to the right entry.
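
One way such a reconciliation could work is sketched below. `extractPrimaryFamily` is named in this message; its body, the `FontPreset` shape (beyond the `name` field), and the `resolvePreset` helper are assumptions for illustration.

```typescript
// Assumed preset shape: `name` is from the commit message, `value` is a CSS font stack.
interface FontPreset {
  name: string;
  value: string;
}

// First family in a stack, unquoted and lowercased, e.g. '"Fira Code", monospace' -> 'fira code'.
function extractPrimaryFamily(stack: string): string {
  const first = stack.split(",")[0] || "";
  return first.trim().replace(/^["']|["']$/g, "").toLowerCase();
}

// Hypothetical resolver: exact match first, then match on the primary family only.
function resolvePreset(stored: string, presets: FontPreset[]): FontPreset | undefined {
  const exact = presets.filter((p) => p.value === stored)[0];
  if (exact) return exact;
  const primary = extractPrimaryFamily(stored);
  return presets.filter((p) => extractPrimaryFamily(p.value) === primary)[0];
}
```

A stored value for which `resolvePreset` returns `undefined` is the case that would be rendered as the orphan option.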

Removes the now-unused showMore / query state, the wrap-grid family
button styles, the search input, and the custom scroll list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…opdown

Native select arrows are inconsistent across browsers/OSes: Safari on
iOS draws a discreet caret, Chrome on Android a different shape,
Firefox on desktop yet another, and our dark theme washes them all
out. Users mistook the closed select for a static label.

- Suppress the native arrow via `appearance: none` (+ -webkit / -moz
  prefixes for older Safari and Firefox).
- Wrap the <select> and overlay our own ▾ chevron at the right edge
  with `pointer-events: none` so taps still hit the underlying select
  and open the native picker.
- Reserve right padding on the select for the custom chevron so long
  font names don't run under it.
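
The three changes above could be expressed as inline style objects along these lines. All names and pixel values are assumptions; only `appearance: none`, the vendor-prefixed variants, `pointer-events: none`, and the reserved right padding come from this message.

```typescript
// Assumed width reserved for the custom chevron.
const CHEVRON_WIDTH_PX = 24;

// Positioning context for the overlaid chevron.
const wrapperStyle: Record<string, string> = {
  position: "relative",
  display: "inline-block",
};

// Hide the native arrow and leave room on the right for the custom one.
const selectStyle: Record<string, string> = {
  appearance: "none",
  WebkitAppearance: "none",
  MozAppearance: "none",
  paddingRight: `${CHEVRON_WIDTH_PX}px`,
  width: "100%",
};

// Chevron sits over the select's right edge but lets taps pass through to it.
const chevronStyle: Record<string, string> = {
  position: "absolute",
  right: "8px",
  top: "50%",
  transform: "translateY(-50%)",
  pointerEvents: "none",
};
```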

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The coverage report job was the slowest stage in CI. Four compounding
problems, all fixed here:

1. The `coverage:` block in `vitest.config.ts` was placed at the top
   level of the config object, where vitest silently ignores it and
   falls back to its built-in defaults — including the `html` reporter
   (writes hundreds of per-file pages) and an unbounded include glob
   that re-instruments the entire workspace on every run. Move it
   under `test.coverage` where it belongs and pin `provider: 'v8'`.

2. `test:coverage` ran ALL workspace projects, which meant the e2e
   suite (21 tests, `fileParallelism: false`, 60 s per-test timeout, real
   tmux spawning) ran a SECOND time inside coverage even though it already
   has its own `E2E Tests` job. Add `--project daemon --project web
   --project server` so only unit/component projects are instrumented.
   This also lets us drop `--no-file-parallelism --maxWorkers 1`
   (originally added to keep e2e stable) and the elevated 60 s
   timeouts — restoring full worker parallelism.

3. The CI coverage job ran `npm run build` and installed/primed tmux
   even though tests resolve from `src/` and no longer touch `dist/`
   or e2e. Drop both steps.

4. Switch the lcov reporter to `lcovonly` so we keep the data file
   (Codecov + the PR-comment action both consume it) but skip the
   sibling `lcov-report/` directory of ~556 per-file HTML pages
   (~24 MB) that nothing in CI reads.

Also tighten `coverage.include`/`exclude` so v8 only instruments
`src/`, `web/src/`, `server/src/`, and `shared/` — not tests, build
outputs, scripts, benches, or docs.

Local timing: 10+ min → 2:13 (~5–6× faster). CI gain should be
similar, plus extra savings from removing build + tmux setup steps.

All four expected outputs are still produced: lcov.info, coverage-summary.json,
coverage-final.json, and the terminal table.
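
For orientation, the corrected nesting might look like the sketch below. It is written as a plain object so it runs standalone; in a real `vitest.config.ts` it would typically be passed to `defineConfig` from `vitest/config`. The exact globs and the `exclude` entries are assumptions, and the reporter list is inferred from the four outputs named above.

```typescript
// Sketch of the relevant part of the config: `coverage` lives under `test`, not at the top level.
const coverageConfig = {
  test: {
    coverage: {
      provider: "v8",
      // text -> terminal table, lcovonly -> lcov.info,
      // json-summary -> coverage-summary.json, json -> coverage-final.json
      reporter: ["text", "lcovonly", "json-summary", "json"],
      // Assumed globs for the four source roots named in the commit message.
      include: ["src/**", "web/src/**", "server/src/**", "shared/**"],
      // Assumed patterns for tests and build outputs.
      exclude: ["**/*.test.ts", "**/dist/**"],
    },
  },
};
```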

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eed dist/

cea8af0 dropped `npm run build` from the Coverage Report job on the
assumption that "all current tests resolve from src/ via tsx transform."
That's wrong for two daemon-project suites that intentionally assert
against the built output:

  - test/packaging.test.ts — verifies package.json bin/main/files paths
    point at real files under dist/
  - test/util/postinstall-sharp-repair.test.ts — spawns the published
    dist/src/util/postinstall-sharp-repair.js and asserts behavior

Both pass in Unit Tests (which still runs npm run build) but fail in
the Coverage Report job, breaking CI on every push since cea8af0.

Restore the build step. tmux removal stays — coverage no longer runs
e2e — so the bulk of cea8af0's speed-up is preserved (build itself
is ~30s vs the 8+ min saved by skipping e2e + tightening reporters).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@im4codes im4codes merged commit af02a56 into master May 8, 2026
40 checks passed