feat(web): show active agent state as thinking #1

Open
fastestdevalive wants to merge 1 commit into main from feat/vscode-integration-thinking

@fastestdevalive
Owner

Summary

  • Present active session activity as thinking in UI status pills and detail headers.
  • Keep backend/activity state values unchanged (active), so lifecycle logic and APIs remain compatible.
  • Update component tests to match the new user-facing copy.

Why

  • The vscode-integration-thinking task aims to make in-progress agent work clearer to users; "thinking" better communicates active reasoning than the generic "active" label.

Validation

  • pnpm build (passes)
  • pnpm typecheck (passes)
  • pnpm lint (passes with pre-existing warnings)
  • pnpm test (fails in packages/core due to local config discovery at /home/gb/.ao-control/agent-orchestrator.yaml, pre-existing and unrelated to this change)

Closes #vscode-integration-thinking

Made with Cursor

Align status wording with vscode-integration-thinking by presenting `active` agent state as "thinking" in UI badges and detail headers, so users can better infer in-progress reasoning without changing backend state semantics.

Made-with: Cursor
fastestdevalive pushed a commit that referenced this pull request Mar 28, 2026
Fixes all 12 issues identified in the Cursor Bugbot review:

#4 – Setup tests now assert non-interactive mode skips validation and
  auto-generates tokens; removed incorrect validateToken call expectations.

#5 – Replaced module-level mutable `tsFailures` in doctor.ts with a
  `makeFailCounter()` closure that is local to each command invocation,
  eliminating potential state bleed between invocations.
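
A minimal sketch of the closure pattern described in #5. The commit names `makeFailCounter()` but does not show its shape, so the `fail`/`count` methods below are assumptions made for illustration only.

```typescript
// Hypothetical API: only the name makeFailCounter() comes from the commit.
interface FailCounter {
  fail(message: string): void;
  count(): number;
  messages(): readonly string[];
}

function makeFailCounter(): FailCounter {
  // State lives in the closure, so each call gets a fresh counter
  // instead of sharing one module-level variable.
  const recorded: string[] = [];
  return {
    fail(message: string): void {
      recorded.push(message);
    },
    count: () => recorded.length,
    messages: () => recorded,
  };
}

// Two "command invocations" with independent state.
const firstRun = makeFailCounter();
firstRun.fail("tsc not found");
const secondRun = makeFailCounter();
```

Because `secondRun` is created by a separate call, failures recorded by `firstRun` cannot leak into it.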

#6 – `notify`, `notifyWithActions`, and `post` in notifier-discord
  now all guard on `effectiveUrl` (which includes thread_id),
  not on the raw `webhookUrl`. Removes non-null assertions.

#7/#12 – setup.ts now writes `${OPENCLAW_HOOKS_TOKEN}` as the token
  value in the YAML config instead of the raw token, so credentials are
  never committed to version control. setup.test.ts already expected this
  placeholder; the test was correct, the code was not.

#8 – `ao_batch_spawn` follow-up setTimeout handles are tracked in
  `batchSpawnFollowUpTimeouts[]` and cleared when the health service stops,
  preventing timer leaks after plugin shutdown.
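
The timer bookkeeping in #8 might look roughly like the sketch below. Only the array name `batchSpawnFollowUpTimeouts` comes from the commit; `scheduleFollowUp` and `stopHealthService` are hypothetical names standing in for the real call sites.

```typescript
// Handles for follow-up timers that have been scheduled but not yet fired.
const batchSpawnFollowUpTimeouts: ReturnType<typeof setTimeout>[] = [];

// Hypothetical helper: schedule a follow-up and remember its handle.
function scheduleFollowUp(run: () => void, delayMs: number): void {
  const handle = setTimeout(() => {
    // Drop the handle once the timer has fired so the array does not grow.
    const index = batchSpawnFollowUpTimeouts.indexOf(handle);
    if (index !== -1) batchSpawnFollowUpTimeouts.splice(index, 1);
    run();
  }, delayMs);
  batchSpawnFollowUpTimeouts.push(handle);
}

// Hypothetical shutdown hook: cancel everything still pending.
function stopHealthService(): void {
  for (const handle of batchSpawnFollowUpTimeouts) clearTimeout(handle);
  batchSpawnFollowUpTimeouts.length = 0;
}
```

Clearing the handles on shutdown is what keeps pending callbacks from firing (or keeping the process alive) after the plugin has stopped.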

#11 – Discord 429 Retry-After handling no longer double-delays: a
  `skipNextBackoff` flag is set after waiting for Retry-After so the
  following iteration skips the standard exponential backoff.
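
To make the double-delay fix in #11 concrete, here is a sketch that simulates the retry loop and records the waits instead of sleeping. The `skipNextBackoff` flag is from the commit; the `Attempt` type, the function name, and the base delay of 500 ms are assumptions for illustration.

```typescript
// Hypothetical response shape for the simulation.
interface Attempt {
  status: number;
  retryAfterMs?: number; // present when a 429 carries Retry-After
}

// Returns the sequence of waits the loop would perform for the given responses.
function simulateRetryWaits(responses: Attempt[], baseDelayMs = 500): number[] {
  const waits: number[] = [];
  let skipNextBackoff = false;
  for (let attempt = 0; attempt < responses.length; attempt++) {
    // Standard exponential backoff before every retry, unless the previous
    // iteration already waited for Retry-After.
    if (attempt > 0 && !skipNextBackoff) {
      waits.push(baseDelayMs * 2 ** (attempt - 1));
    }
    skipNextBackoff = false;

    const res = responses[attempt];
    if (res.status === 429 && res.retryAfterMs !== undefined) {
      waits.push(res.retryAfterMs); // honor the server's delay
      skipNextBackoff = true; // and do not add backoff on top of it
      continue;
    }
    if (res.status < 500) break; // success or non-retryable client error
  }
  return waits;
}

const retryWaits = simulateRetryWaits([
  { status: 429, retryAfterMs: 1200 },
  { status: 500 },
  { status: 204 },
]);
```

In this run the 429 contributes only its Retry-After wait; exponential backoff applies again before the retry that follows the 500.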

Also removes the unused `yamlStringify` import from setup.ts.

Issues #1/#2/#3/#9/#10 were already correctly addressed in previous commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fastestdevalive pushed a commit that referenced this pull request Mar 30, 2026
fastestdevalive pushed a commit that referenced this pull request Apr 11, 2026
ComposioHQ#927)

* style(design): FINDING-001 — add prefers-reduced-motion support

All animations and transitions are disabled when the user's system
requests reduced motion, per DESIGN.md accessibility requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(design): FINDING-003 — remove concurrent breathe animations

Status pills had two animations: a breathe animation on the pill
and a dot-pulse on the child dot. DESIGN.md says "one animation per
element, one purpose" and "keep dot pulse, remove border heartbeat."

Removed all three breathe keyframes, kept dot-pulse only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(design): FINDING-004 — fix dashboard title weight and tracking

DESIGN.md specifies display headings at weight 680 and letter-spacing
-0.035em. The dashboard title was using 600 / -0.05em.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(design): FINDING-005 — fix detail-card text to blue-tinted graphite

Detail cards overrode text-secondary and text-tertiary with neutral
grays (#9898a0, #5c5c66). DESIGN.md specifies blue-tinted graphite
palette (#a5afc4, #6f7c94) for dark mode text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(design): FINDING-008 — add text-wrap: balance on headings

Dashboard title and kanban column titles now use text-wrap: balance
for more even line breaks on narrow viewports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(design): FINDING-006/009 — fix section label semantics and spacing

Changed "Attention Board" from <h2> to <div role="heading"> since it's
styled as a 12px uppercase label, not a heading. Also fixed letter-spacing
from 0.16em to 0.06em per DESIGN.md UI/Labels spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(design): FINDING-007 — contextual empty state messages

Empty kanban columns now show context-specific messages instead of
generic "No sessions" text. Each column's empty state reflects its
purpose: "No agents need your input" (Respond), "No code waiting
for review" (Review), etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(design): fresh design system — Warm Terminal

Complete redesign from Industrial Precision (blue-tinted) to Warm Terminal
(brown-tinted). Key changes:

- Warm charcoal surfaces (#121110, #1a1918, #222120) replace blue-gray
- Cream text (#f0ece8) replaces blue-white (#eef3ff)
- Warm periwinkle accent (#8b9cf7) replaces cool blue (#5B7EF8)
- Berkeley Mono for display headlines (mono cohesion)
- Added: Accessibility section (44px touch targets, WCAG AA, focus-visible)
- Added: Component anatomy (button states, card structure, input fields)
- Added: Light mode design rationale (warm parchment, not clinical white)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(design): swap Berkeley Mono for JetBrains Mono (free)

Berkeley Mono is a paid font ($75). JetBrains Mono is free, open source,
already loaded in the project, and the mono-for-headlines concept works
the same way with it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(design): fix light mode contrast failures

Light mode text-tertiary #a8a29e failed WCAG AA at 2.5:1 on white.
Darkened to #736e6b (5.0:1). Light mode accent #6b73c4 was borderline
at 4.3:1, darkened to #5c64b5 (5.3:1). All pairs now pass AA.

Added verified contrast ratios for both modes to accessibility section.
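
The ratios quoted above can be re-derived with the WCAG 2.x relative-luminance formula. The sketch below is a standalone check, not code from the repository; the function names are illustrative.

```typescript
// Convert one 8-bit sRGB channel to linear light.
function channelToLinear(channel8: number): number {
  const c = channel8 / 255;
  return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance of a "#rrggbb" color.
function relativeLuminance(hex: string): number {
  const n = parseInt(hex.slice(1), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), range 1 to 21.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

const WHITE = "#ffffff";
for (const color of ["#a8a29e", "#736e6b", "#6b73c4", "#5c64b5"]) {
  console.log(color, contrastRatio(color, WHITE).toFixed(2));
}
```

WCAG AA requires at least 4.5:1 for normal-size text, which is why the old tertiary and accent values were replaced.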

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: gitignore .gstack/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(design): add design audit report and screenshots

Design review audit report with before/after screenshots for all
dashboard pages (kanban, session detail, PRs) across desktop, tablet,
and mobile viewports in both light and dark mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(design): address PR review comments

- Fix mobile test expecting removed "No sessions" text. The merge zone
  emptyMessage is now "Nothing cleared to land yet." (Bugbot comment #1)
- Remove no-op .dark .detail-card override that duplicated global dark
  values after FINDING-005 fix aligned them (Bugbot comment #2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gitignore .gstack-report/ and remove from tracking

The .gstack-report/ directory contains local audit artifacts with
filesystem paths. Should not be tracked in the repository.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(design): align dashboard title CSS to new DESIGN.md spec

Dashboard title was using old Geist Sans values (weight 680, -0.035em).
New spec is JetBrains Mono, weight 500, letter-spacing -0.02em.
Added font-family: var(--font-mono) to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use native h2 element for Attention Board section heading

Replace the ARIA role="heading" div with a semantic h2, per the first rule of ARIA: native elements are preferred over ARIA roles for actual headings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: pre-landing review fixes — a11y, dead code, test coverage

- Add aria-controls + id to accordion button/body pair in AttentionZone
- Wrap empty-state messages in aria-live="polite" regions for AT announcements
- Remove dead message prop and isDefault from EmptyState (Skeleton.tsx)
- Add parameterized test covering all 6 zone-specific empty messages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: revert aria-live on empty states — causes false AT announcements

Codex review identified that role="status" aria-live on static empty-state
text causes burst announcements on page load (all empty columns fire) and
announces in collapsed mobile sections that aren't visible. Empty states are
static text, not dynamic transitions. The aria-controls fix is kept.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.0.1.0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: remove .gstack-report/ from .gitignore

* chore: remove VERSION and CHANGELOG (not used in this project)

* style(design): warm terminal color migration + inline style removal

Migrate all CSS tokens from cool blue-tinted graphite to warm
brown-tinted terminal aesthetic per DESIGN.md spec. Replace inline
style color mappings in ActivityDot, AttentionZone, Dashboard,
ProjectSidebar, and SessionCard with data-attribute CSS selectors.
Fix duplicate className bug on SessionCard done-title element.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: pre-landing review fixes — activity dot fallback + review stat color

Add base CSS fallback for activity-dot, activity-pill, and
activity-pill__text so null/unknown activity states render visibly
(gray) instead of invisible. Fix review stat card to use accent-orange
(matching kanban/sidebar/mobile review indicators) instead of cyan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: checkpoint current design branch state

* design changes

* feat(web): redesign session detail page — compact PR card, identity strip, layout reorder

- Redesign SessionTopStrip with simplified breadcrumbs, action buttons (Message/Kill)
- Replace stacked PR card with compact inline layout: title row + blocker/CI chips + collapsible comments
- Move PR card above terminal for better information hierarchy
- Replace vertical IssuesList with inline buildBlockerChips helper
- Add ~200 lines of new CSS classes for compact PR card design system
- Add changedFiles field to DashboardPR type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(web): design system tokens, sidebar redesign, and component primitives

- Align all color tokens (status, bg, border, text) across three HTML mockups
- Rewrite ProjectSidebar to match finalized.html: rotation chevron, session status text, border-bottom project separators, 224px width
- Add packages/web/DESIGN.md: agent-readable reference for tokens, typography, component patterns, anti-patterns
- Add Badge.tsx: generic badge/chip/pill primitive with status/outline/default variants
- Add Button.tsx: ghost/primary/danger button primitive
- Update CLAUDE.md to reference DESIGN.md as required pre-read for web UI work

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style(design): FINDING-001 — card border-radius 0 → 6px to match mockup

* style(design): FINDING-002 — column border-radius 0 → 7px, border subtle to match mockup

* style(design): FINDING-003 — column header mono font, 500 weight, muted color to match mockup

* style(design): FINDING-004/005/006 — fix accent-blue/yellow/purple tokens to match mockup

* style(design): FINDING-007 — add --color-bg-card token (light #fff, dark #1c1b19)

* Refine dashboard design system and remove fixture flow

* Fix respond status colors in dashboard indicators

* Align tests with updated dashboard and metadata behavior

* fix(core): register notifier aliases consistently

* chore(web): drop uncovered showcase routes

* Add desktop PullRequestsPage coverage tests

* Remove generated coverage artifact

* Consolidate web design guidance into the root design system

* Fix working and ready status color tokens

* Fix sidebar collapse and inline kill confirmation

* fix(qa): ISSUE-001 - show all mobile filter chips

* Restore full title contrast in session cards

* Fix review feedback in dashboard state styling

* style: implement mobile responsive designs (feed, terminal-first, dense PRs)

Dashboard: replace accordion with urgency-sorted priority feed, horizontal scroll filter pills.
Session Detail: terminal-first layout with floating header, status pill, PR bottom sheet.
PRs: dense rows with CI dots, grouped sections, muted merged/closed rows.
Update tests to match new mobile layouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix mobile terminal padding with PR sheet layout

* Polish mobile feed and session detail styling

* Align mobile dashboard layouts with gstack designs

* Fix mobile terminal actions and PR review labels

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fastestdevalive pushed a commit that referenced this pull request May 1, 2026
* feat(plugin): add kimicode agent plugin

Add @aoagents/ao-plugin-agent-kimicode implementing the Agent interface
for MoonshotAI's Kimi Code CLI. Follows the AO activity JSONL + PATH
wrapper pattern established by agent-aider/opencode, with a native-ish
signal sourced from ~/.kimi/<session>/ mtimes when present.

- Full Agent interface: getLaunchCommand (--yolo, --model, --agent-file),
  getEnvironment (AO_SESSION_ID + ~/.ao/bin PATH + GH_PATH), detectActivity,
  getActivityState (5-step cascade with mandatory JSONL entry fallback),
  isProcessRunning (tmux TTY + PID signal-0, matches `.kimi`/`uv run kimi`),
  getSessionInfo (state.json parsing), getRestoreCommand (--resume <id>
  with --continue fallback), setupWorkspaceHooks, postLaunchSetup,
  recordActivity, detect().
- Post-launch prompt delivery — kimi's `-p` implicitly enables --print and
  exits, which would break interactive supervised sessions.
- 58 unit tests covering all 7 mandatory getActivityState cases plus
  manifest, launch, env, prompt classification, process detection,
  session info extraction, restore command, and detect().
- Register in cli/src/lib/plugins.ts, detect-agent.ts, plugin-registry.json,
  cli package deps, and update user-facing docs / yaml examples.

Closes ComposioHQ#1384

* fix(plugin): register kimicode in core BUILTIN_PLUGINS and web services

The CLI-side registration in packages/cli/src/lib/plugins.ts only covers
`getAgentByName` callers. Code paths that go through the shared plugin
registry (session-manager, doctor, plugin, verify CLI commands, and the
web dashboard's services singleton) use `createPluginRegistry()` +
`loadBuiltins()` / explicit `register()`, which bypass the CLI map.

Without this wiring:
- `pnpm ao doctor` / `ao plugin` / `ao verify` wouldn't see kimicode
- Web dashboard would fail to render sessions with `agent: kimicode`
  because the webpack-bundled services.ts couldn't resolve the plugin

Add kimicode to:
- packages/core/src/plugin-registry.ts BUILTIN_PLUGINS
- packages/web/package.json dependencies
- packages/web/src/lib/services.ts static imports + register call

Caught while comparing against ComposioHQ#1395 (kimi-2-6-code plugin), which added
the same registry entry.

* fix(plugin-kimicode): address review feedback

Critical (from @harshitsinghbhandari, verified against kimi-cli source):
- Remove `promptDelivery: "post-launch"` — `-p`/`--prompt` is just a prompt
  string alias (also `--command`/`-c`), NOT a mode switch. The non-interactive
  flag is `--print`, which we never set. Inline delivery via `--prompt` is
  reliable and avoids the post-launch sendMessage() delay.
- Drop unchecked `as string` casts in getRestoreCommand in favor of typeof
  guards + `?? undefined` so null model values don't silently leak.

Medium (performance):
- Add 30s per-workspace cache to findKimiSessionMatch (mirrors codex's
  SESSION_FILE_CACHE_TTL_MS) so the ~/.kimi/ scan doesn't run 12×/min per
  active session. Cache keyed by workspacePath; cleared via the new
  `_resetSessionMatchCache` test-only export between test cases.

Minor (correctness):
- Collapse findKimiSessionDir + readKimiSessionState into one
  findKimiSessionMatch that returns {dir, state} from a single state.json
  read. Previously the file was parsed twice per getSessionInfo /
  getRestoreCommand call.
- Wire config.subagent → `kimi --agent <name>` (default / okabe / custom).
- Tighten detectActivity patterns so "I approve of this approach" and
  "Earlier I failed to connect" no longer falsely trigger waiting_input /
  blocked. Regexes are now line-anchored with `^`/`$` + `\b` word boundaries.
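
The effect of the regex tightening can be illustrated as follows. The plugin's real patterns are not shown in this commit message, so every regex below is a hypothetical stand-in that only demonstrates the line-anchoring idea.

```typescript
// Loose patterns: match the keyword anywhere, including agent narration.
const LOOSE_APPROVAL = /approve/i;
const LOOSE_FAILURE = /failed/i;

// Tightened (hypothetical): the keyword must start a line and end at a word
// boundary; an approval prompt must also end the line with a question mark.
const TIGHT_APPROVAL = /^\s*approve\b[^\n]*\?\s*$/im;
const TIGHT_FAILURE = /^\s*(?:error|failed)\b/im;

const narrationApproval = "I approve of this approach";
const narrationFailure = "Earlier I failed to connect";
const realApprovalPrompt = "Approve this command?";
const realFailureLine = "Failed to connect to API";
```

The narration sentences match the loose patterns but not the tightened ones, while the real prompt and error line still match.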

Tests: 58 → 71 (all green). New cases cover:
- Native-signal ready/idle decay (previously only active was tested)
- Cascade ordering: JSONL waiting_input wins over a matching native signal
- Malformed state.json in both getSessionInfo and getRestoreCommand
- `work_dir` alias accepted in addition to `cwd`
- project.agentConfig.model preferred over state.json's recorded model
- False-positive narration guards for both regex tightenings

* refactor(plugin-kimicode): clean up after second-round review

All changes are non-behavioral perf/style cleanups flagged during my second
review pass — no user-visible changes.

- Consolidate double JSON.parse in findKimiSessionMatchUncached: the previous
  pass parsed each candidate state.json once to extract cwd and a second time
  to extract session_id/model/title. Replaced both helpers with a single
  `parseKimiState(raw)` that returns all four fields in one traversal.
- Carry state.json's mtime through KimiSessionMatch so getKimiLiveSignalMtime
  (renamed from getKimiSessionMtime) doesn't re-stat state.json — the winner's
  mtime was already captured during the scan. Live-signal probe is now limited
  to context.jsonl + wire.jsonl (the per-turn files) and runs them in parallel
  via Promise.all instead of sequential awaits.
- Fold state.json mtime and the live-signal mtime into a single "freshest"
  timestamp in getActivityState so a recently-written context.jsonl wins even
  when state.json is stale.
- Tighten appendApprovalFlags signature: `string | undefined` → proper
  `AgentPermissionInput | undefined` so typos at call sites fail at compile
  time.
- Stricter detect(): don't trust every binary named `kimi` — verify the
  --version output mentions kimi/kimi-cli/kimi-code, and fall back to
  `kimi info` for builds that print a bare version number. Rejects unrelated
  tools that happen to install a `kimi` binary.

Tests: 71 → 75. New coverage:
- detect() accepts kimi-cli vendor strings
- detect() falls back to `kimi info` when --version is ambiguous
- detect() rejects an unrelated `kimi` binary
- Native signal picks the fresher of state.json vs context.jsonl mtimes

* fix(plugin-kimicode): correct session layout discovered via smoke test

Installing kimi-cli 1.38.0 locally (`uv tool install kimi-cli`) and running
it once revealed the plugin's session-discovery logic was built on wrong
assumptions about the on-disk layout.

Observed layout (kimi-cli 1.38.0):

  ~/.kimi/sessions/<md5(cwd)>/<session-uuid>/
    context.jsonl  — conversation history
    wire.jsonl     — turn events (TurnBegin/TurnEnd with user_input payload)

Differences from my original assumptions:

- Sessions are nested under `sessions/` (not direct subdirectories of
  `~/.kimi/`).
- The workspace is identified by an MD5 hash of the absolute path, not by
  a `cwd` field stored in a state file.
- There is no `state.json`. No `title`, `model`, or `cost` is persisted.
- The session ID is the UUID directory name and is accepted as-is by
  `kimi --resume <uuid>`.
- The old `--continue` fallback is unnecessary: if we found the directory,
  we always know its UUID.

Fixes:

- `findKimiSessionMatch` now computes `md5(workspacePath)` with node:crypto
  and lists `~/.kimi/sessions/<hash>/` directly. No more full-tree scan of
  `~/.kimi/`, no more `readFile` of a fictional `state.json`.
- `getKimiLiveSignalMtime` keeps the parallel `Promise.all` stat of
  context.jsonl + wire.jsonl (the only files that exist).
- `getSessionInfo` streams the first `TurnBegin` out of wire.jsonl as a
  best-effort summary, with a 1 MB byte ceiling. agentSessionId is the UUID.
- `getRestoreCommand` drops the `--continue` fallback branch, since a found dir
  always has a usable UUID.
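
The bucket lookup described in the first fix can be sketched like this. The function name `kimiBucketDir` is hypothetical, and hashing the UTF-8 bytes of the path string is an assumption about how kimi-cli derives the hash.

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: directory that holds all session UUID dirs for a workspace.
// Assumes the bucket name is the hex MD5 of the absolute workspace path.
function kimiBucketDir(workspacePath: string, home: string = homedir()): string {
  const hash = createHash("md5").update(workspacePath, "utf8").digest("hex");
  return join(home, ".kimi", "sessions", hash);
}

const exampleBucket = kimiBucketDir("/home/dev/project", "/home/dev");
console.log(exampleBucket); // <home>/.kimi/sessions/<32 hex chars>
```

Listing that single directory replaces the earlier full scan of `~/.kimi/`.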

Verified end-to-end against the real kimi-cli 1.38 binary on this machine:
- `detect()` → true
- `getLaunchCommand` output parses cleanly when run with `--help`
- `getSessionInfo` extracts the actual first user prompt ("say hello")
- `getRestoreCommand` produces the same UUID kimi itself prints as the
  resume hint: `kimi -r 6ec34626-aedf-4659-a061-c5fbfa4cf166`

Tests remain at 75 green. Coverage is now against real on-disk layouts
using temp directories with MD5-hashed bucket names — no mock-structure
drift from reality.

* fix(plugin-kimicode): address follow-up review issues

Follow-up to the issues filed as a review comment on the PR.

[MED] detect() too loose (\bkimi\b matches unrelated binaries)
  The old regex accepted plain "kimi" alone because the (?:cli|code)?
  suffix was optional — any binary whose output contains "kimi" passed.
  Real kimi-cli's --version prints just "kimi, version X.Y.Z" (no suffix),
  so --version alone can't distinguish it from, say, a hypothetical
  keyboard-input-manager named kimi. Switch to `kimi info` exclusively;
  real kimi-cli prints "kimi-cli version: ..." which is a distinct vendor
  string. Regex now requires "kimi-cli" / "kimi-code" / "moonshot"
  literally. Added maxBuffer cap (4 KB) so a hostile binary can't flood
  detect() with MB-scale output.

[MED] --work-dir not passed — investigated, not actionable in this PR
  AgentLaunchConfig doesn't expose session.workspacePath — only
  projectConfig.path (the project root), which would actively break
  discovery if passed. Runtime cwd handling is load-bearing. Left a
  comment explaining the constraint and pointing at the core-types
  change needed to fix it properly.

[LOW] Empty-bucket race returned transient null
  During session creation kimi mkdirs the UUID directory before writing
  context.jsonl / wire.jsonl. getKimiLiveSignalMtime returned null in
  that window and findKimiSessionMatch returned null, flickering the
  dashboard to "no signal". Fall back to the UUID directory's own mtime
  when live files are absent.

[LOW] isProcessRunning matched "kimi" anywhere in ps args
  Old regex /(?:^|\/)\.?kimi(?:\s|$)|(?:\s|^)kimi(?:\s|$)/ matched
  `cat kimi.log`, `vim ~/.kimi/config.toml`, etc. Anchor to argv[0]
  instead — only the executable itself, or a python/uv/node runner
  followed by `kimi` as the first positional argument, counts.
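
A sketch of what anchoring to argv[0] could look like. The plugin's new regex is not quoted in the commit, so both patterns and the function name below are assumptions that merely reproduce the described behavior.

```typescript
// Hypothetical: the executable itself is kimi (optionally with a path or leading dot).
const ARGV0_RE = /^(?:\S*\/)?\.?kimi(?:\s|$)/;

// Hypothetical: a python/uv/node runner whose first positional argument is kimi.
const RUNNER_RE =
  /^(?:\S*\/)?(?:python[\d.]*|uv|node)\s+(?:-m\s+|run\s+)?(?:\S*\/)?kimi(?:\s|$)/;

// `psArgs` is one line of `ps` output containing the full command line.
function looksLikeKimiProcess(psArgs: string): boolean {
  const args = psArgs.trim();
  return ARGV0_RE.test(args) || RUNNER_RE.test(args);
}
```

Because both patterns start with `^`, a command such as `cat kimi.log` is rejected: its argv[0] is `cat`, and `kimi` only appears later in the arguments.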

[NIT] Symlink normalization
  kimi's process reads cwd via os.getcwd(), which returns the realpath on
  Linux. If AO hands us a symlinked workspacePath, our MD5(symlink) won't
  match kimi's MD5(realpath). realpath-resolve with a best-effort fallback
  to the raw string (preserves behavior when the path doesn't exist yet).
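
The realpath-with-fallback step might be as small as the following sketch; `canonicalizeWorkspacePath` is a hypothetical name.

```typescript
import { realpathSync } from "node:fs";

// Resolve symlinks so the hashed path matches what kimi sees via os.getcwd().
// If the path cannot be resolved (for example, it does not exist yet),
// fall back to the raw string so behavior is unchanged.
function canonicalizeWorkspacePath(workspacePath: string): string {
  try {
    return realpathSync(workspacePath);
  } catch {
    return workspacePath;
  }
}
```

The canonical path, not the raw one, would then be fed into the MD5 hash used for bucket discovery.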

Tests: 75 → 80. New coverage:
- detect() vendor-string matrix: kimi-cli / kimi-code / moonshot accepted,
  unrelated "kimi keyboard input manager" rejected
- isProcessRunning rejects `cat kimi.log` / `vim ~/.kimi/config.toml`
- isProcessRunning accepts `python -m kimi`
- Native signal falls back to UUID-dir mtime during the empty-bucket race
- Symlinked workspace path matches the realpath-hashed bucket

Verified end-to-end against real kimi-cli 1.38.0:
- detect() → true (via `kimi info` vendor match)
- getSessionInfo → correct summary + UUID
- getRestoreCommand → matches kimi's own resume hint

* fix(plugin-kimicode): address inline review from illegalcall

Addresses all 10 inline comments on PR ComposioHQ#1390.

Load-bearing fixes:

[#6 line 327] detectActivity ordering was wrong
  The old code checked the idle prompt (`^kimi>\s*$`) before approval/error
  patterns. Real kimi UI re-renders `kimi>` on the last line when asking for
  a confirmation, so `(Y)es/(N)o\nkimi>` was misclassified as idle and the
  session would sit forever looking quiet while actually blocked on input.
  Reordered to: waiting_input → blocked → idle → active. Matches codex/aider.
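
The reordered cascade can be sketched as below. The idle pattern `^kimi>\s*$` is quoted from this commit message; the waiting-input and blocked patterns are hypothetical placeholders, as the real ones are not shown here.

```typescript
type Activity = "waiting_input" | "blocked" | "idle" | "active";

const WAITING_INPUT_RE = /^\s*\(Y\)es\/\(N\)o\s*$/m; // hypothetical
const BLOCKED_RE = /^\s*error\b/im; // hypothetical
const IDLE_PROMPT_RE = /^kimi>\s*$/m; // from the commit message

// Check the most specific states first: the idle prompt is re-rendered on the
// last line even while kimi is waiting for a confirmation or showing an error.
function detectActivity(terminalOutput: string): Activity {
  if (WAITING_INPUT_RE.test(terminalOutput)) return "waiting_input";
  if (BLOCKED_RE.test(terminalOutput)) return "blocked";
  if (IDLE_PROMPT_RE.test(terminalOutput)) return "idle";
  return "active";
}
```

With the old order (idle first), the confirmation-then-prompt output would have returned `idle`, which is the bug this fix addresses.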

[#2,#4,#8 lines 128,154,493] No stable AO↔Kimi session binding
  Discovery was pure (path-hash + recency). If the user ran kimi manually in
  the same repo, or two AO sessions shared a workspace hash, AO would attach
  to the wrong UUID — summary / activity / --resume target all corrupted.
  Now:
   - `session.metadata.kimiSessionId` pins a specific UUID when set; no
     fallback to recency when the pin misses (fails closed, no silent drift).
   - Unpinned lookups filter UUIDs by `liveMtime >= session.createdAt - 60s`
     so stray dirs from prior AO sessions don't attach.
   - findKimiSessionMatch now takes the whole Session (not just workspacePath)
     so createdAt + metadata are available.

[#3 line 141] Any recent subdir was treated as a real session
  Stray temp dirs and crash leftovers would match on mtime, producing
  `kimi --resume <garbage>` and bogus active states. Now require
  context.jsonl OR wire.jsonl to exist before trusting a dir. The race
  fallback (empty UUID dir → dir mtime) is removed — the JSONL activity
  fallback in getActivityState covers the startup window instead.

[#5 line 191] Symlink follow outside ~/.kimi/sessions/
  `stat()` / `createReadStream()` followed symlinks without rebinding, so
  a bucket entry that's a symlink to `/dev/zero` or `/etc/passwd` would
  hang forever or leak data. Added `isInsideKimiSessions(path)` that realpaths
  the candidate and rejects anything outside the sessions root. Every
  bucket entry is checked before use.
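
A containment check in the spirit of `isInsideKimiSessions` could look like this sketch. The real function takes only a path; here the root is passed explicitly (as the hypothetical `isInsideRoot`) so the example can run against temp directories.

```typescript
import { mkdirSync, mkdtempSync, realpathSync, symlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { isAbsolute, join, relative, sep } from "node:path";

// True only if `candidate`, after resolving symlinks, lies strictly inside `root`.
function isInsideRoot(root: string, candidate: string): boolean {
  let realRoot: string;
  let realCandidate: string;
  try {
    realRoot = realpathSync(root);
    realCandidate = realpathSync(candidate);
  } catch {
    return false; // unresolvable paths are never trusted
  }
  const rel = relative(realRoot, realCandidate);
  return rel !== "" && rel !== ".." && !rel.startsWith(".." + sep) && !isAbsolute(rel);
}

// Demo layout: one genuine entry and one symlink that escapes the root.
const sessionsRoot = mkdtempSync(join(tmpdir(), "kimi-sessions-"));
const outsideDir = mkdtempSync(join(tmpdir(), "outside-"));
const realEntry = join(sessionsRoot, "real-session");
mkdirSync(realEntry);
const escapingLink = join(sessionsRoot, "escaping-link");
symlinkSync(outsideDir, escapingLink);
```

Resolving both sides before comparing is what defeats the symlink: `escapingLink` sits inside the root by name, but its real path does not.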

Smaller cleanups:

[#1 line 89] Cache: 30s negative TTL + unbounded growth
  Negative results now cached 2s so a session appearing mid-poll is picked
  up on the next cycle. Expired entries evicted on read. Cache capped at
  256 entries with oldest-expiry pruning. Key changed to (workspacePath,
  pinnedUuid) so two AO sessions in the same bucket can't poison each
  other's cache entry.
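
The cache behavior described in [#1] can be sketched as follows. The 2s negative TTL and the 256-entry cap come from this commit message and the 30s positive TTL from the earlier review-feedback commit; the class name, method names, and injectable clock are assumptions.

```typescript
const POSITIVE_TTL_MS = 30_000;
const NEGATIVE_TTL_MS = 2_000;
const MAX_ENTRIES = 256;

interface CacheEntry<V> {
  value: V | null; // null means "looked, found nothing"
  expiresAt: number;
}

class SessionMatchCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get size(): number {
    return this.entries.size;
  }

  // undefined = miss or expired; null = cached negative result.
  get(key: string): V | null | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V | null): void {
    if (!this.entries.has(key) && this.entries.size >= MAX_ENTRIES) {
      this.evictOldestExpiry();
    }
    const ttl = value === null ? NEGATIVE_TTL_MS : POSITIVE_TTL_MS;
    this.entries.set(key, { value, expiresAt: this.now() + ttl });
  }

  private evictOldestExpiry(): void {
    let oldestKey: string | undefined;
    let oldestExpiry = Infinity;
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt < oldestExpiry) {
        oldestExpiry = entry.expiresAt;
        oldestKey = key;
      }
    }
    if (oldestKey !== undefined) this.entries.delete(oldestKey);
  }
}
```

The short negative TTL is the important part: a "no match" answer expires after about two seconds, so a session that appears mid-poll is found on the next cycle.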

[#7 line 440] Duplicate argv0Re regex — use the const.

[#9 line 532] maxBuffer: 4096 → 65536. Future `kimi info` releases that add
  plugin listings or telemetry banners won't silently break detect() with
  swallowed ENOBUFS.

[#10 test line 650] macOS test breakage: /var/folders is a symlink to
  /private/var/folders, so fakeHome under tmpdir() is a symlink path, while
  the plugin realpaths before hashing. Wrap the mkdtempSync in realpathSync
  so tests agree with the plugin on the canonical path. Linux CI masked this.

Tests: 80 → 86. New coverage:
  - detectActivity classifies confirmation-then-prompt-rerender as waiting_input
  - detectActivity classifies error-then-prompt-rerender as blocked
  - createdAt floor filter (ignores UUIDs from before the AO session)
  - Pinned kimiSessionId wins over recency
  - Pinned UUID missing returns null (no silent fallback)
  - Negative cache TTL ~2s (session appearing mid-poll picked up next cycle)
  - Empty UUID dir without live files is rejected (no stray-dir attach)

Verified end-to-end against real kimi-cli 1.38.0: detect() true,
getSessionInfo extracts correct summary + UUID, getRestoreCommand matches
kimi's own resume hint.

* fix(plugin-kimicode): use kimi.json for workspace mapping and add --work-dir

Read ~/.kimi/kimi.json work_dirs[] as the authoritative workspace-to-session
mapping. When last_session_id is populated, prefer it over the directory-mtime
recency heuristic — kimi itself wrote it. Falls back gracefully to the existing
MD5 hash scan when kimi.json is absent or last_session_id is null.

Add --work-dir to getLaunchCommand using projectConfig.path to establish an
explicit cwd contract, preventing shell-rc / tmux-hook drift from causing the
MD5(cwd) hash to diverge from kimi's session bucket.

* fix(plugin-kimicode): plumb workspacePath into AgentLaunchConfig

The kimicode plugin's --work-dir was passing projectConfig.path, which
breaks worktree-mode workspaces. In worktree mode, projectConfig.path is
the original repo root while session.workspacePath is the per-session
checkout — they differ. Either kimi would write to the project root
(breaking worktree isolation) or md5(projectConfig.path) would diverge
from md5(session.workspacePath), so getActivityState/getSessionInfo would
never find this session's bucket.

Fix:
- Add optional `workspacePath` field to AgentLaunchConfig.
- Plumb it through all 3 launch call sites in session-manager.ts.
- kimicode getLaunchCommand uses config.workspacePath, falling back to
  config.projectConfig.path when undefined.
- Tests for the divergent-paths case.

Public-interface change: AgentLaunchConfig grows one optional field.

Invariants preserved:
- Agent.getLaunchCommand signature unchanged — still takes one
  AgentLaunchConfig.
- Existing plugins (claude-code, aider, codex, opencode) compile and run
  unchanged; the new field is optional and they ignore it.
- Clone-mode workspaces (where workspacePath === projectConfig.path)
  produce the same launch command as before.
- Fallback to projectConfig.path keeps callers that don't pass the new
  field working — no flag day required.
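
The fallback rule can be shown with a reduced type. The `AgentLaunchConfig` below is a simplified stand-in containing only the two fields this commit discusses, and `resolveWorkDir` is a hypothetical helper name.

```typescript
// Simplified stand-in for the real AgentLaunchConfig.
interface AgentLaunchConfig {
  projectConfig: { path: string };
  workspacePath?: string; // new optional field
}

// Prefer the per-session checkout; fall back to the project root
// for callers that do not pass the new field.
function resolveWorkDir(config: AgentLaunchConfig): string {
  return config.workspacePath ?? config.projectConfig.path;
}

const worktreeMode: AgentLaunchConfig = {
  projectConfig: { path: "/repos/app" },
  workspacePath: "/worktrees/app-session-1",
};
const legacyCaller: AgentLaunchConfig = { projectConfig: { path: "/repos/app" } };
```

In worktree mode the two paths differ and the session checkout wins; a caller that omits `workspacePath` gets the same result as before the change.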

* fix(plugin-kimicode): capture baseline pre-launch to close startup race

captureKimiBaseline() previously ran in postLaunchSetup, which races
against kimi's own startup writes. If kimi created its UUID directory
before postLaunchSetup ran, that UUID landed in `preExistingUuids` and
was filtered out forever — so `findKimiSessionMatch` returned null
permanently for that session.
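
A small sketch of why the capture point matters. `newSessionUuids` is a hypothetical helper, and the UUID strings are placeholders.

```typescript
// UUID directories present now that were not in the baseline.
function newSessionUuids(
  baseline: ReadonlySet<string>,
  current: readonly string[],
): string[] {
  return current.filter((uuid) => !baseline.has(uuid));
}

const listingAfterStart = ["old-uuid", "new-uuid"];

// Baseline captured before kimi starts: the new directory is visible.
const capturedBeforeLaunch = new Set(["old-uuid"]);
const foundWithEarlyBaseline = newSessionUuids(capturedBeforeLaunch, listingAfterStart);

// Baseline captured after kimi already created its directory:
// the new UUID is treated as pre-existing and filtered out.
const capturedTooLate = new Set(["old-uuid", "new-uuid"]);
const foundWithLateBaseline = newSessionUuids(capturedTooLate, listingAfterStart);
```

With the late baseline the result is empty on every subsequent call, which matches the permanent null described above.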

Fix:
- Add optional `preLaunchSetup(workspacePath)` to the Agent interface,
  invoked from session-manager AFTER the workspace exists but BEFORE
  `runtime.create()` spawns the agent.
- Move captureKimiBaseline from postLaunchSetup to preLaunchSetup in
  the kimicode plugin.
- Test asserts the new UUID is attached even when written immediately
  after preLaunchSetup runs (i.e. in the race window).

Public-interface change: Agent.preLaunchSetup is optional. Existing
plugins (claude-code, aider, codex, opencode) compile and behave
unchanged. Only kimicode opts in.

Invariants preserved:
- Workspace exists before preLaunchSetup runs (called after the
  worktree/clone is created, never before).
- Failures in preLaunchSetup propagate just like other launch-path
  failures — the existing try/catch covers it.
- captureKimiBaseline is still write-once (returns early if the
  baseline file already exists), so restore preserves the original
  partition.
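
The ordering guarantee can be sketched roughly as follows. This is a simplified, synchronous stand-in: the `Agent` and `Runtime` shapes, the `launch` function, and the `log` parameter are all assumptions for illustration, and the real session-manager code is presumably async.

```typescript
// Hypothetical minimal shapes; the real interfaces are larger.
interface Agent {
  name: string;
  /** Optional hook: runs after the workspace exists, before the agent spawns. */
  preLaunchSetup?(workspacePath: string): void;
}

interface Runtime {
  create(workspacePath: string): void;
}

function launch(agent: Agent, runtime: Runtime, workspacePath: string, log: string[]): void {
  // Stand-in for worktree/clone creation having completed.
  log.push("workspace-ready");
  // Optional call: plugins that do not define the hook are simply skipped.
  agent.preLaunchSetup?.(workspacePath);
  // Only now is the agent process spawned, so it cannot race the hook.
  runtime.create(workspacePath);
}

const order: string[] = [];
launch(
  { name: "kimicode", preLaunchSetup: () => { order.push("baseline-captured"); } },
  { create: () => { order.push("agent-spawned"); } },
  "/tmp/ws",
  order,
);
console.log(order.join(" -> "));
```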

* fix(plugin-kimicode): persist UUID pin to disk instead of dead metadata

The session.metadata.kimiSessionId branch was treated as the highest-
priority signal but nothing ever populated it. That left the entire
"AO↔kimi UUID binding" mechanism dead — discovery fell through to the
recency heuristic on every call, so a manual `kimi` run in the same
workspace, a sibling AO session sharing a bucket, or any drift in
kimi's directory layout could attach the wrong session.

Fix:
- Remove the dead session.metadata.kimiSessionId branch from
  findKimiSessionMatchUncached and the cache key.
- Add a workspace-local pin file (.ao/kimi-session-id.json). Once
  findKimiSessionMatchUncached identifies a winner via the recency
  heuristic (or via kimi.json's last_session_id soft-pin), it writes
  the UUID to the pin file. Subsequent calls read the pin file as the
  highest-priority signal and skip the heuristic entirely — locking
  in the AO↔kimi binding for the rest of the session lifetime.
- Cache key simplified to workspacePath alone since the pin is now
  persistent and cannot drift between calls.
- Tests cover: pin wins over recency, first match writes the pin,
  pin holds when a newer non-pinned UUID appears later.

Mechanism mirrors the existing .ao/kimi-baseline.json pattern (also
file-based, write-once, lives in the workspace).
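
A rough sketch of the pin-first lookup, assuming a `{ uuid }` JSON shape for the pin file (the actual on-disk layout is not shown here) and hypothetical helper names `readPin` and `resolveSessionUuid`. The heuristic is passed in as a callback so the sketch stays independent of kimi's directory layout.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const PIN_FILE = path.join(".ao", "kimi-session-id.json");

/** Read the pinned UUID, or null when the pin is missing or unreadable. */
function readPin(workspacePath: string): string | null {
  try {
    const raw = fs.readFileSync(path.join(workspacePath, PIN_FILE), "utf8");
    const parsed = JSON.parse(raw) as { uuid?: unknown };
    return typeof parsed.uuid === "string" ? parsed.uuid : null;
  } catch {
    return null; // fall through to the heuristic
  }
}

function resolveSessionUuid(workspacePath: string, heuristic: () => string | null): string | null {
  // 1. The pin file is the highest-priority signal.
  const pinned = readPin(workspacePath);
  if (pinned !== null) return pinned;
  // 2. Otherwise run the heuristic and lock in its winner.
  const winner = heuristic();
  if (winner !== null) {
    const file = path.join(workspacePath, PIN_FILE);
    fs.mkdirSync(path.dirname(file), { recursive: true });
    fs.writeFileSync(file, JSON.stringify({ uuid: winner }));
  }
  return winner;
}

const demoWs = fs.mkdtempSync(path.join(os.tmpdir(), "ao-pin-"));
console.log(resolveSessionUuid(demoWs, () => "uuid-a")); // first match writes the pin
console.log(resolveSessionUuid(demoWs, () => "uuid-b")); // pin holds: still uuid-a
```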

* refactor(plugin-kimicode): extract session-discovery into its own module

index.ts had grown to 880 lines after the pin-file fix landed. The
discovery layer (kimi.json parsing, baseline capture, pin file, hash
bucket scan, cache) is one cohesive responsibility — pulling it out
keeps both files under the 500-line mark and makes the precedence
rules legible.

- New file: session-discovery.ts. Opens with a decision-table comment
  documenting the precedence (pin file → kimi.json soft-pin → recency
  heuristic) so future readers see the rule before the code.
- Public surface: captureKimiBaseline, findKimiSessionMatch,
  KimiSessionMatch, kimiShareDir, _resetSessionMatchCache.
- index.ts re-exports _resetSessionMatchCache so the existing test
  imports keep working.
- No behavioral change — all 98 tests pass unchanged.

* test(plugin-kimicode): worktree-mode end-to-end discovery test

Adds a test where workspacePath (per-session worktree) and
projectConfig.path (repo root) are different paths. Asserts that
discovery hashes workspacePath — not projectConfig.path — for the
kimi bucket lookup. Previously this scenario was untested; the bug
fixed in 9fcc1d9 (--work-dir using projectConfig.path) would have
been caught by this test.

Combined with the earlier --work-dir tests in 9fcc1d9, the worktree
divergent-paths case is now exercised at both the launch site
(getLaunchCommand) and the discovery site (getRestoreCommand) end
to end.

* fix(plugin-kimicode): sandbox-check live-signal files against symlinks

Addresses illegalcall's review comment (id 3127022353): the existing
isInsideKimiSessions check verified the session DIRECTORY but not its
children. A symlinked context.jsonl or wire.jsonl pointing
at /etc/passwd, /dev/zero, or a FIFO would be silently followed by
stat() / createReadStream() — leaking reads, hanging on devices, or
escaping the kimi-sessions sandbox.

Fix:
- New isKimiSessionFile(path) helper using lstat + isFile() — rejects
  symlinks, sockets, FIFOs, block/char devices. lstat (not stat) so we
  see the symlink itself before the kernel resolves it.
- getKimiLiveSignalMtime swapped to lstat-based check; non-regular
  files contribute no mtime.
- extractKimiSummary refuses to open wire.jsonl when it isn't a
  regular file.
- Tests cover both paths: getActivityState rejects a session whose
  live-signal files are symlinked outside the bucket; getSessionInfo
  returns null summary when wire.jsonl is symlinked even if context.jsonl
  is real.
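
The lstat-based check can be illustrated with a small synchronous sketch. The helper name `isRegularFile` is an assumption standing in for `isKimiSessionFile`, whose exact signature is not shown here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

/**
 * True only for regular files. lstat inspects the directory entry itself,
 * so a symlink reports isFile() === false even when its target is a file.
 */
function isRegularFile(filePath: string): boolean {
  try {
    return fs.lstatSync(filePath).isFile();
  } catch {
    return false; // missing or inaccessible paths contribute nothing
  }
}

const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "ao-lstat-"));
const realFile = path.join(demoDir, "context.jsonl");
const linkFile = path.join(demoDir, "wire.jsonl");
fs.writeFileSync(realFile, "{}\n");
fs.symlinkSync(realFile, linkFile);

console.log(isRegularFile(realFile)); // true
console.log(isRegularFile(linkFile)); // false: rejected before any read happens
```

Using `stat` instead would follow the link and report the target's type, which is exactly the behavior the fix avoids.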

* fix(plugin-kimicode): apply baseline + createdAt filters to kimi.json soft-pin

The kimi.json soft-pin used to record a candidate UUID before the baseline
and createdAt filters were applied, so a stale last_session_id pointing at
a pre-AO UUID (manual `kimi` run, kimi.json lag) would be captured into
.ao/kimi-session-id.json and route every later getActivityState /
getSessionInfo / getRestoreCommand call at the wrong conversation, with
no self-healing path.

Move the baseline + createdAt floor checks above the soft-pin branch so
the soft-pin candidate goes through the same gates as the recency contest.

Add two regression tests:
- soft-pin pointing at a baseline UUID is rejected and the AO pin file
  records the legitimate AO-spawned UUID instead
- soft-pin pointing at a UUID older than session.createdAt - 60s is
  rejected by the createdAt floor

Both tests fail on the prior code and pass after the fix.
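
The reordering can be sketched as a pure function. This is an illustration under assumptions: the `Candidate` shape, the use of mtime as the timestamp compared against the floor, and the name `pickSession` are all hypothetical.

```typescript
interface Candidate {
  uuid: string;
  mtimeMs: number; // assumed timestamp used for the createdAt floor
}

const CREATED_AT_SLACK_MS = 60_000; // the "createdAt - 60s" floor

function pickSession(
  candidates: Candidate[],
  baseline: ReadonlySet<string>,
  sessionCreatedAtMs: number,
  softPinUuid: string | null,
): string | null {
  const floor = sessionCreatedAtMs - CREATED_AT_SLACK_MS;
  // Gates run first, so the soft-pin and the recency contest see the same pool.
  const eligible = candidates.filter((c) => !baseline.has(c.uuid) && c.mtimeMs >= floor);
  // Soft-pin is honored only if it survived the gates.
  const softPinned = eligible.find((c) => c.uuid === softPinUuid);
  if (softPinned) return softPinned.uuid;
  // Otherwise the most recently modified eligible candidate wins.
  let newest: Candidate | null = null;
  for (const c of eligible) {
    if (newest === null || c.mtimeMs > newest.mtimeMs) newest = c;
  }
  return newest ? newest.uuid : null;
}

// A stale last_session_id pointing at a pre-AO UUID no longer wins.
console.log(
  pickSession(
    [{ uuid: "pre-ao", mtimeMs: 1_000_000 }, { uuid: "ao-spawned", mtimeMs: 1_000_500 }],
    new Set(["pre-ao"]),
    1_000_000,
    "pre-ao",
  ),
); // ao-spawned
```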
fastestdevalive pushed a commit that referenced this pull request May 7, 2026
…ComposioHQ#1511) (ComposioHQ#1620)

* rebase: forward branch onto main + resolve activity-events kind union conflict

* feat(core): wire scm/runtime/agent plugin-call failure events

Adds activity-event evidence for previously-silent failure paths in
lifecycle-manager.ts so the RCA agent can answer 'why did X happen?':

- scm.batch_enrich_failed (line 617 catch)
- scm.detect_pr_succeeded (line 658 success path)
- scm.detect_pr_failed (line 664 catch)
- scm.review_fetch_failed (line 1517 catch)
- scm.poll_pr_failed (line 1132 catch)
- runtime.probe_failed (line 938 catch)
- agent.process_probe_failed (lines 1054 + 1139 catches, with where field)
- agent.activity_probe_failed (line 1062 outer catch)

Plus 6 new tests covering the call shapes.

Invariants preserved (per CLAUDE.md):
- B1 state-mutate-before-emit: each emit follows existing observer call
- B2 never throws: recordActivityEvent best-effort by design
- B3 re-entrancy guard unchanged
- B4 Promise.allSettled semantics unchanged

* feat(core): wire reaction lifecycle activity events

Adds AE evidence around reaction triggers, escalations, and failures so
RCA can answer 'did AO try to auto-fix this? did it succeed?':

- reaction.action_succeeded (combined for send-to-agent / notify / auto-merge,
  with data.action variant) — fires after each successful reaction action
- reaction.send_to_agent_failed — fires in the previously-silent catch when
  sessionManager.send throws inside a send-to-agent reaction
- reaction.escalated — fires alongside the existing notifyHuman escalation
  with data.escalationCause = 'max_retries' | 'max_duration'

Plus 3 new tests covering the call shapes.

Invariants preserved: emits land after the existing notifyHuman/return
paths so state mutation order is unchanged.

* feat(core): wire auto-cleanup, poll-cycle, detecting escalation events

Adds AE evidence around session destruction, poll loop failures, and the
detecting→stuck transition so RCA can answer 'when did my session get
cleaned up?', 'did the polling loop crash?', and 'why did AO mark this
session stuck?':

- session.auto_cleanup_deferred — agent busy, cleanup deferred
- session.auto_cleanup_completed — kill succeeded, runtime + worktree gone
- session.auto_cleanup_failed (level=error) — kill threw, session stays merged
- lifecycle.poll_failed (level=error) — pollAll outer catch fired
- detecting.escalated — first cycle that promotes detecting→stuck, with
  cause = max_attempts | max_duration. Guarded by detectingEscalatedAt
  metadata so it fires once per escalation, not on every poll while stuck.

Plus 5 new tests covering the call shapes and the idempotency guard.

Invariants preserved:
- Auto-cleanup events fire AFTER existing observer.recordOperation (B1)
- detecting.escalated emits ONCE per escalation (invariant B9 in
  .context/lifecycle-manager-instrumentation.md)
- poll_failed emits inside the existing pollAll catch — flow unchanged

* feat(core): wire report_watcher.triggered activity event

Adds AE evidence when the report watcher fires (no_acknowledge / stale_report
/ agent_needs_input). RCA: 'AO thinks my agent is stuck — why?'

- report_watcher.triggered (level=warn) — emitted alongside the existing
  observer.recordOperation, only when a trigger is non-null (per invariant
  in .context/lifecycle-manager-instrumentation.md §B9)

Plus 1 test exercising the no_acknowledge trigger path.

* fix(core): one-shot guard on report_watcher.triggered AE emit

Live-observed regression: report_watcher.triggered fired 116 times in
production over a few hours because the emit was unguarded and re-fired
every 30s poll while a trigger stayed active. Symptom was massive event
flood for stuck/no-acknowledge/stale conditions.

Fix: gate the emit on the existing isNewTrigger variable (same one-shot
guard pattern used for detecting.escalated). The observer.recordOperation
above remains unguarded by design (it's a metric/heartbeat); the AE trail
is for actionable evidence only.

Adds a regression test that drives the same trigger across two polls and
asserts the AE event fires only on the first.
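
A minimal sketch of the one-shot pattern, assuming `isNewTrigger` means "the trigger differs from the one seen on the previous poll". The `TriggerEmitter` class and its `emitted` log are hypothetical stand-ins for the lifecycle manager and `recordActivityEvent`.

```typescript
type Trigger = "no_acknowledge" | "stale_report" | "agent_needs_input";

class TriggerEmitter {
  private readonly active = new Map<string, Trigger>();
  /** Stand-in for the activity-event trail. */
  readonly emitted: string[] = [];

  /** Called once per poll cycle with the currently detected trigger, if any. */
  poll(sessionId: string, trigger: Trigger | null): void {
    const previous = this.active.get(sessionId) ?? null;
    const isNewTrigger = trigger !== null && trigger !== previous;

    if (trigger === null) this.active.delete(sessionId);
    else this.active.set(sessionId, trigger);

    // Guarded emit: fires on the transition, not on every poll while active.
    if (isNewTrigger) this.emitted.push(`${sessionId}:${trigger}`);
  }
}

const emitter = new TriggerEmitter();
emitter.poll("s1", "no_acknowledge");
emitter.poll("s1", "no_acknowledge"); // same trigger still active: no second event
console.log(emitter.emitted.length); // 1
```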

* fix(core): address Greptile feedback on PR ComposioHQ#1620

Two findings from Greptile (same issue as Codex P2 #1):

1. scm.batch_enrich_failed omitted projectId/sessionId — when the
   lifecycle worker is project-scoped (deps.projectId set), this event
   is effectively project-scoped too. Without projectId, queries like
   `ao events list --project todo-app --type scm.batch_enrich_failed`
   return zero results, defeating the purpose of the instrumentation.
   Fix: pass scopedProjectId when set. Unscoped (multi-project) supervisors
   still leave projectId null because the batch crosses project boundaries.

2. Misleading field name pendingSinceMs in session.auto_cleanup_deferred
   data — the local variable of the same name is a Unix epoch timestamp,
   but the data field stored `Date.now() - pendingSinceMs` (an elapsed
   duration). RCA agents would mis-interpret it as a timestamp and compute
   a 1970-era "pending since" date. Renamed to pendingElapsedMs.

* fix(core): address Codex review on PR ComposioHQ#1620

- lifecycle.poll_failed: keep summary generic, route raw error text
  through `data.errorMessage` only. sanitizeSummary just truncates;
  sanitizeData redacts credential URLs. Since FTS5 indexes summary,
  interpolating subprocess error output (which can include
  https://x-oauth-basic:TOKEN@github.com/... from git/gh) made
  credentials persistently searchable.

- reaction.escalated: expand escalationCause to
  "max_retries" | "max_attempts" | "max_duration" and mirror the
  trigger checks. Numeric escalateAfter is an attempt-count gate, not
  a duration; previously got misattributed to "max_duration" whenever
  retries was unset (built-in defaults use {escalateAfter: 2}).

Adds two regression tests as guards for both behaviors.
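
The cause attribution might look roughly like this. Everything here is an assumption for illustration: the `ReactionConfig` shape, the `"10m"`-style duration strings, the `parseDurationMs` helper, and the strict `>` comparisons are not taken from the actual trigger checks.

```typescript
type EscalationCause = "max_retries" | "max_attempts" | "max_duration";

interface ReactionConfig {
  retries?: number;
  /** Number = attempt-count gate; string such as "10m" = duration gate (assumed format). */
  escalateAfter?: number | string;
}

/** Parse "30s" / "10m" / "2h" into milliseconds; hypothetical helper. */
function parseDurationMs(text: string): number | null {
  const m = /^(\d+)([smh])$/.exec(text);
  if (!m) return null;
  const unitMs = { s: 1_000, m: 60_000, h: 3_600_000 }[m[2] as "s" | "m" | "h"];
  return Number(m[1]) * unitMs;
}

function escalationCause(
  cfg: ReactionConfig,
  attempts: number,
  elapsedMs: number,
): EscalationCause | null {
  if (cfg.retries !== undefined && attempts > cfg.retries) return "max_retries";
  // A numeric escalateAfter counts attempts, so it must not be reported as a duration.
  if (typeof cfg.escalateAfter === "number" && attempts > cfg.escalateAfter) return "max_attempts";
  if (typeof cfg.escalateAfter === "string") {
    const limitMs = parseDurationMs(cfg.escalateAfter);
    if (limitMs !== null && elapsedMs > limitMs) return "max_duration";
  }
  return null;
}

// Built-in default shape from the message above, with retries unset.
console.log(escalationCause({ escalateAfter: 2 }, 3, 0)); // max_attempts
```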

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(core): replace import() type annotation with import type to satisfy lint

CI's @typescript-eslint/consistent-type-imports rule rejects inline
`typeof import("../activity-events.js")` inside the vi.mock factory.
Hoist it to a top-level `import type * as ActivityEventsModule` so the
type lives in a proper import declaration; vi.mock factory resolution
is unaffected (type-only imports emit no runtime code).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(core): keep report_watcher.triggered summary generic to plug FTS leak

auditResult.message for the agent_needs_input trigger embeds the
free-form report.note supplied via `ao report --note "..."`. Since
sanitizeSummary only truncates and FTS5 indexes the summary column,
a note containing a credential URL would be persistently searchable
from the events DB. Same class of bug as the prior poll_failed fix.

Summary becomes generic ("<trigger> triggered"); the full message
continues to flow through `data.message` where sanitizeData redacts
credential URLs.

Adds a regression test that seeds a needs_input report with a
credential-bearing note and asserts the summary stays clean.

Reported by @ashish921998 in PR ComposioHQ#1620.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(core): redact token-shaped secrets in activity-event data (P1)

Both `summary` and `data` columns are FTS5-indexed (events-db.ts:58-59).
Prior fixes moved raw error/report text from `summary` to `data.message` /
`data.errorMessage`, on the assumption that sanitizeData() would scrub it.
That assumption was incomplete: sanitizeData only redacted credential URLs
and entire values under sensitive *key* names. Token-shaped substrings
(`Bearer …`, `ghp_…`, `sk-…`, JWTs, `AKIA…`, ALL_CAPS_TOKEN=value) under
non-sensitive keys like `message`/`errorMessage` were stored as-is and
made searchable via FTS.

Adds a TOKEN_PATTERNS array applied to every string value during
sanitization, plus a 500-char per-string cap (matching sanitizeSummary's
existing precedent — limits blast radius if a new token format slips past
the patterns).

Patterns cover: Bearer headers, GitHub PATs (classic + fine-grained),
OpenAI/Anthropic sk- keys, Slack xox- tokens, AWS access key IDs, JWTs,
and ENV-style assignments scoped to ALL_CAPS keys ending in
TOKEN/PASSWORD/SECRET/etc.

Tests:
- 10 new sanitizeString unit tests (one per token shape + prose-preservation
  regression guard + 500-char cap + nested array/object recursion)
- 1 new FTS5 integration test that drives recordActivityEvent → real SQLite
  → both direct row read and FTS MATCH must return zero token leakage

Test fixtures use string concatenation across the prefix boundary so
literal token shapes don't appear in source (gitleaks pre-commit guard).
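
A sketch of the pattern-based redaction, covering only a subset of the shapes listed above. The regexes, the `[REDACTED]` placeholder, and the redact-then-truncate order are illustrative assumptions, not the real TOKEN_PATTERNS. As in the test fixtures, the demo builds its token by concatenation so no literal token shape appears in source.

```typescript
// Illustrative patterns only; the actual list may differ.
const TOKEN_PATTERNS: RegExp[] = [
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // Authorization header values
  /\bgh[pousr]_[A-Za-z0-9]{20,}\b/g, // classic GitHub tokens
  /\bgithub_pat_[A-Za-z0-9_]{20,}\b/g, // fine-grained GitHub PATs
  /\bsk-[A-Za-z0-9_-]{20,}/g, // sk- style API keys
  /\bxox[abprs]-[A-Za-z0-9-]{10,}/g, // Slack tokens
  /\bAKIA[0-9A-Z]{16}\b/g, // AWS access key IDs
  /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, // JWTs
];

const MAX_STRING_LENGTH = 500;

function sanitizeString(value: string): string {
  let out = value;
  for (const pattern of TOKEN_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  // Cap after redaction so a token is never cut in half and left unmatched.
  return out.length > MAX_STRING_LENGTH ? out.slice(0, MAX_STRING_LENGTH) : out;
}

const fakeToken = "ghp" + "_" + "a".repeat(36);
console.log(sanitizeString("auth failed: " + fakeToken)); // auth failed: [REDACTED]
```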

Reported by @ashish921998 in PR ComposioHQ#1620.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(core): bound credential-URL regex to prevent ReDoS (CodeQL alert)

CodeQL flagged CREDENTIAL_URL_RE as polynomial: input shaped like
`http://http://http://...` with no terminating `@` caused O(n²)
backtracking because the unbounded `[^@\s]+` greedily spanned multiple
`http://` prefixes before failing at end-of-string and walking back.

Two-part fix:
1. Exclude `/` from the userinfo character class — this is also semantically
   correct since RFC 3986 userinfo cannot contain unencoded `/`.
2. Add a hard length cap (200 chars) on the userinfo segment as a belt-and-
   braces guard against future pathological inputs.

The fix is observable: 14KB pathological input completes in single-digit
ms post-fix vs multiple seconds pre-fix. Adds a regression test that
runs the pathological input through the full sanitize pipeline and
asserts <100ms completion.

Reported by GitHub Advanced Security on PR ComposioHQ#1620.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(core): replace CREDENTIAL_URL_RE regex with linear scan

The bounded {1,200} quantifier in CREDENTIAL_URL_RE let credential URLs
with >200-char userinfo pass through unredacted. Since data is FTS5-indexed,
those credentials became searchable (P1 from PR ComposioHQ#1620 review).

Replace the regex with a simple linear scan (redactCredentialUrls) that:
- Has no length limit — scans until @, space, or /
- Is O(n) with no regex backtracking (fixes CodeQL polynomial-regex alert)
- Matches http:// and https:// case-insensitively (preserves old /gi behavior)

Adds regression tests for:
- >200-char userinfo bypass
- URLs without userinfo (no false positives)
- Multiple credential URLs in one string
- Pathological ReDoS-shaped input still completes in <100ms
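
One way such a linear scan could be written is sketched below. It is not the actual `redactCredentialUrls` implementation: the `[REDACTED]` placeholder, the `isUserinfoEnd` helper, and treating any whitespace as a terminator are assumptions.

```typescript
/** Characters that end the userinfo scan: "@", "/", or whitespace. */
function isUserinfoEnd(ch: string): boolean {
  return ch === "@" || ch === "/" || /\s/.test(ch);
}

function redactCredentialUrls(input: string): string {
  let out = "";
  let copied = 0; // everything before this index is already in `out`
  let search = 0; // where to look for the next "://"
  for (;;) {
    const sep = input.indexOf("://", search);
    if (sep === -1) break;
    const userStart = sep + 3;
    // Case-insensitive scheme check on the few characters before "://".
    const scheme = input.slice(Math.max(0, sep - 5), sep).toLowerCase();
    if (!scheme.endsWith("http") && !scheme.endsWith("https")) {
      search = userStart;
      continue;
    }
    // Single forward pass with no length cap and no backtracking.
    let end = userStart;
    while (end < input.length && !isUserinfoEnd(input.charAt(end))) end++;
    if (end > userStart && input.charAt(end) === "@") {
      out += input.slice(copied, userStart) + "[REDACTED]";
      copied = end; // keep "@host..." intact
      search = end;
    } else {
      // No userinfo here. Step back one char so a "://" that begins at the
      // terminator boundary (as in "http://http://...") is still found.
      search = Math.max(userStart, end - 1);
    }
  }
  return out + input.slice(copied);
}

console.log(redactCredentialUrls("clone https://user:secret@example.com/org/repo.git failed"));
// clone https://[REDACTED]@example.com/org/repo.git failed
```

Each scan resumes at or just before where the previous one stopped, so the scanned regions do not overlap and total work stays proportional to the input length.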

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: AO Bot <ao-bot@composio.dev>