fix: ship orient CLI subcommand + build BM25 index under --no-embed #95

Merged
clay-good merged 4 commits into main from fix/orient-cli-and-install-hook
May 29, 2026

@clay-good clay-good commented May 29, 2026

Summary

A clean-up PR to unblock everyone: make openlore dead-simple to install and fix the bugs that broke first-run setup. Started from field feedback on openlore@2.0.4; a full first-run dogfood on clean repos surfaced the rest. Targets main via this branch — not a direct push to main.

1. openlore install wrote a broken SessionStart hook

The hook ran npx --yes openlore orient --json, but no orient CLI subcommand existed → error: unknown command 'orient' on every session start.

Fix: ship the orient CLI subcommand (src/cli/commands/orient.ts) wrapping the existing handleOrient MCP handler. Flags --task/--json/--directory/--limit; with no task it prints a session-start primer (exit 0) so the hook is a useful no-op. --json keeps stdout pure JSON (diagnostic validateDirectory output routed to stderr) so wrappers can parse it.
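The stdout discipline described above can be sketched as follows. This is a minimal illustration, not the code in src/cli/commands/orient.ts: the `Io` interface, `runOrientSketch`, the payload type, and the human-readable strings are all hypothetical. Only the `{"openlore":"ready"}` no-task JSON shape is taken from the verification notes in this PR.

```typescript
// Hypothetical I/O seam so the sketch can be exercised without a real process.
interface Io {
  out: (line: string) => void; // stands in for stdout
  err: (line: string) => void; // stands in for stderr
}

interface OrientOptions {
  task?: string;
  json: boolean;
}

// Assumed stand-in; the real handleOrient result shape is not shown in this PR.
type OrientPayload = Record<string, unknown>;

function runOrientSketch(opts: OrientOptions, payload: OrientPayload, io: Io): number {
  // Diagnostics go to stderr in every mode, so they can never corrupt JSON on stdout.
  io.err("[ok] directory validated");

  if (opts.json) {
    // Exactly one JSON document on stdout; with no task, emit the primer object.
    io.out(JSON.stringify(opts.task ? payload : { openlore: "ready" }));
  } else if (opts.task) {
    io.out(`Results for task: ${opts.task}`);
  } else {
    io.out("openlore is ready. Pass --task for task-specific context.");
  }
  return 0; // the no-task primer is a successful no-op, not an error
}
```

A wrapper can then call `JSON.parse` on everything it reads from stdout without having to filter out log lines first.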

2. analyze --no-embed skipped the BM25 index entirely

The whole index build was gated on --embed, so --no-embed left orient reporting "No analysis found".

Fix: the index is now always built; --no-embed only disables the semantic-embedding attempt (keyword-only BM25). Removed a contradictory auto-enable block; fixed the help text.
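The change in gating can be summarised with a small before/after sketch. The `AnalyzeFlags` and `IndexPlan` types and both function names are hypothetical, and the boolean `embed` field is an assumed model of how --embed/--no-embed are represented; the actual analyze command is structured differently.

```typescript
// Assumed flag model: --no-embed sets embed to false.
interface AnalyzeFlags {
  embed: boolean;
}

interface IndexPlan {
  buildIndex: boolean;  // whether the BM25 index is built at all
  useEmbedder: boolean; // whether semantic embeddings are attempted
}

// Before, as described: the whole index build hung off the embed flag.
function planBefore(flags: AnalyzeFlags): IndexPlan {
  return { buildIndex: flags.embed, useEmbedder: flags.embed };
}

// After: the index is always built; the flag only toggles the embedder.
function planAfter(flags: AnalyzeFlags): IndexPlan {
  return { buildIndex: true, useEmbedder: flags.embed };
}
```

With `embed: false`, `planBefore` yields no index at all (the "No analysis found" symptom), while `planAfter` yields a keyword-only index.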

3. MCP watcher EMFILE on large repos

Dogfooding on a Rust repo (75GB target/, ~294k files) crashed the MCP server's auto-watcher with EMFILE: too many open files on the first tool call — breaking both the orient skill's MCP fallback and Claude Code's long-lived MCP server.

Fix (robust, not just a bigger list): match ignored dirs by root-relative path segment, which prunes the ignored directory itself so chokidar never opens FDs inside it (a substring includes('/target/') only matches descendants, so chokidar still descends and FD-storms first). Also eliminates false-positives for repos under paths like /home/user/dist/app. Broadened the ignore set across ecosystems (Rust/JS/Python/Go/JVM/.NET/VCS). Added a --no-watch-auto flag for one-shot callers; the orient-via-mcp.mjs helper detects and uses it. Real-chokidar test proves target/ is pruned, not stormed.
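To illustrate why root-relative segment matching prunes the directory itself while an absolute substring check does not, here is a minimal sketch. The name `isIgnoredUnderRoot`, its signature, and the short ignore list are assumptions for illustration; the PR's actual isIgnoredRelPath and its full ignore set are not shown here and may differ.

```typescript
// Hypothetical, shortened ignore set; the real list covers more ecosystems.
const IGNORED_SEGMENTS: ReadonlySet<string> = new Set([
  "node_modules", ".git", "dist", "target", "build",
  ".venv", "__pycache__", "vendor", ".gradle", "obj",
]);

// Use forward slashes everywhere and drop trailing slashes.
function normalize(p: string): string {
  return p.replace(/\\/g, "/").replace(/\/+$/, "");
}

// True when any path segment *below* the root is an ignored directory name.
function isIgnoredUnderRoot(root: string, absPath: string): boolean {
  const r = normalize(root);
  const p = normalize(absPath);
  if (p === r) return false;                // never ignore the root itself
  if (!p.startsWith(r + "/")) return false; // paths outside the root are not judged here
  const rel = p.slice(r.length + 1);
  return rel.split("/").some((seg) => IGNORED_SEGMENTS.has(seg));
}

// The naive approach, for comparison: substring match on the absolute path.
function naiveSubstringMatch(absPath: string): boolean {
  const p = normalize(absPath);
  return p.includes("/target/") || p.includes("/dist/");
}
```

For `/repo/target` the segment check returns true, so a watcher can skip the directory before descending, whereas the substring check returns false because there is no trailing `/target/`. For a repo rooted at `/home/user/dist/app`, the substring check wrongly matches every file, while the segment check only looks at the part of the path below the root.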

4. One-command setup (the "dead simple" goal)

openlore install previously only wired surfaces — so a new user's first orient() still returned "No analysis found" until they manually ran analyze.

Fix: install now also builds the index (runs init if needed, then the real analyze so the BM25 index orient reads actually gets built). orient works in the first session, from one command. Opt out with --no-analyze; skipped for --dry-run/--uninstall. Sub-command chatter is captured so install prints its own concise progress.
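The decision about whether install builds the index can be sketched as a pure function. `InstallFlags`, `planIndexBuild`, and the step names are hypothetical; the sketch also ignores details such as reusing a recent index, which the later commits mention.

```typescript
// Assumed flag model for the options named in this PR.
interface InstallFlags {
  noAnalyze: boolean; // --no-analyze
  dryRun: boolean;    // --dry-run
  uninstall: boolean; // --uninstall
}

type BuildStep = "init" | "analyze";

// Which index-build steps install would run after wiring surfaces.
function planIndexBuild(flags: InstallFlags, configExists: boolean): BuildStep[] {
  // Opt-out and non-mutating modes skip the build entirely.
  if (flags.noAnalyze || flags.dryRun || flags.uninstall) return [];
  // init only when there is no config yet; analyze builds the BM25 index.
  return configExists ? ["analyze"] : ["init", "analyze"];
}
```

On a fresh repo with default flags this yields `init` followed by `analyze`, which is what lets the first orient call return results.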

On MCP auto-start: the MCP server is launched by the agent (Claude Code spawns npx openlore mcp from settings.json), so install already wires auto-start — there's no daemon for openlore to start itself. The file watcher (now safe on large repos) stays on by default; openlore mcp --no-watch-auto disables it.

5. Docs

  • Dropped the stale "orient CLI not yet shipped / TODO(spec-02-followup)" language from SKILL.md, the skill README.md, and the wrappers.
  • Rewrote the project README.md Quickstart around the single openlore install command; documented MCP auto-start and the watcher's build-dir pruning / --no-watch-auto.

First-run verification (clean repos, via PR build)

  • Rust (proxilion): init → analyze (BM25, 4359 fns) → install → hook returns {"openlore":"ready"} → orient --task returns results; MCP server watches the 75GB repo with no EMFILE.
  • TS (vaulytica): a single openlore install wires the surface, builds the index (1889 fns), and orient returns real results immediately — no manual analyze step. Confirmed published 2.0.4 errors (unknown command 'orient') while the PR build returns clean JSON. Both repos left exactly as found.

Tests

  • orient.test.ts, analyze-no-embed.test.ts, install-analyze.test.ts (one-command setup builds the index; --no-analyze/dry-run don't), mcp-watcher.test.ts (isIgnoredRelPath + real-chokidar pruning).
  • Full suite: 2928 passing, 2 skipped; typecheck clean; build succeeds; openlore orient --help lists the command.

Note: dependabot advisory #43 on the default branch is pre-existing and unrelated.

🤖 Generated with Claude Code

clay-good and others added 4 commits May 29, 2026 07:49
Three fixes from field feedback on openlore@2.0.4:

1. Ship the `openlore orient` CLI subcommand (wraps the existing
   handleOrient MCP handler). Supports --task, --json, --directory,
   --limit. With no task it prints a session-start primer (exit 0) so the
   `npx --yes openlore orient --json` SessionStart hook written by
   `openlore install` is a useful no-op instead of erroring every session.
   Registered in src/cli/index.ts; the orient.sh/orient.ps1 wrappers
   auto-detect it via `openlore --help` and use it directly.
   In --json mode, stdout is kept pure JSON (validateDirectory's
   "[ok] Successfully validated…" line is routed to stderr, mirroring the
   stdout discipline the MCP server already applies) so wrappers can parse it.

2. `analyze --no-embed` now builds a keyword-only (BM25) index instead of
   skipping indexing entirely. Previously the whole index-build step was
   gated on --embed, so --no-embed left orient reporting "No analysis
   found". The index is now always built; --no-embed only disables the
   semantic embedding attempt. Fixed the misleading help text. Also
   removed a contradictory auto-enable block that re-forced embeddings
   back on when --no-embed was passed.

3. Aligned the skill wrappers, SKILL.md, and README with the shipped CLI
   (dropped the "not yet shipped / TODO(spec-02-followup)" language).

Tests: new orient.test.ts (command surface, primer, json passthrough,
limit validation, error path, stdout-purity) and analyze-no-embed.test.ts
(regression: --no-embed builds the index with a null embedder; default
path still attempts embeddings). Full suite green: 2920 passed, 2 skipped;
typecheck clean; build succeeds and `openlore orient --help` lists the command.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Found while dogfooding the full first-run flow on a clean Rust repo
(proxilion: 75GB target/, ~294k files). On the first tool call the MCP
server's auto-watcher recursively watched target/ and crashed with
EMFILE: too many open files — breaking both the orient skill's MCP
fallback and Claude Code's long-lived MCP server.

- Expand the watcher's IGNORED_SEGMENTS beyond node_modules/.git/dist to
  cover build-output and dependency dirs across ecosystems: Rust target/,
  JS build/.next/.turbo/coverage/..., Python .venv/__pycache__/..., Go
  vendor/, JVM .gradle/, .NET obj/, plus .hg/.svn/.idea. These are matched
  as cheap string segments before any FD is opened, so they never trigger
  the EMFILE they used to. Export isIgnoredPath and add regression tests.

- Add a --no-watch-auto flag to `openlore mcp` so one-shot callers can opt
  out of the watcher entirely. The orient skill's orient-via-mcp.mjs helper
  (a single initialize→orient→exit round-trip) now detects the flag via
  `mcp --help` and passes it when supported — older openlore builds that
  predate the flag are unaffected.

Verified end-to-end on proxilion: init → analyze (BM25, 4359 functions) →
install → SessionStart hook returns "ready" → orient returns real results,
and the MCP server now watches the repo with no EMFILE. Full suite green:
2923 passed, 2 skipped; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make first-run setup dead simple and remove the remaining friction a new
user hits.

install — one command, fully wired:
- `openlore install` now builds the index after configuring agent surfaces
  (runs init if needed, then the real analyze command so the BM25 index
  orient() reads actually gets built). Previously install only wired the
  MCP server + hook, so a user's first orient() call returned "No analysis
  found" until they manually ran analyze. Now orient works in the first
  session. Opt out with `--no-analyze`. Skipped for --dry-run/--uninstall.
- Sub-command stdout is captured to stderr during the build so install
  prints its own concise progress instead of init/analyze's "Next step:
  run generate" chatter.
- Note: the MCP server is launched by the agent (Claude Code spawns
  `npx openlore mcp` from settings.json), so install already wires
  auto-start — there's no daemon for openlore to start itself.

watcher EMFILE — robust fix (not just a bigger ignore list):
- Match ignored directories by root-RELATIVE path segment, not absolute
  substring. This prunes the ignored directory ITSELF (chokidar never
  opens FDs inside target/ etc. — the actual EMFILE fix; a substring
  `includes('/target/')` only matches descendants, so chokidar still
  descends and FD-storms before pruning). It also stops false-positives
  for repos living under a path like /home/user/dist/myapp.
- Broadened the ignore set across ecosystems (Rust target/, JS build dirs,
  Python venvs/caches, Go vendor/, JVM .gradle/, .NET obj/, VCS dirs).
- Export isIgnoredRelPath; add unit tests (incl. substring false-positive
  and windows-separator cases) plus a real-chokidar test proving target/
  is pruned, not FD-stormed.

README: rewrote the Quickstart around the single `openlore install`
command, documented MCP auto-start and the watcher's build-dir pruning /
`--no-watch-auto`, and refreshed the test count.

Verified end-to-end on a clean repo (vaulytica): one `openlore install`
→ surfaces wired, BM25 index built (1889 functions), SessionStart hook
returns "ready", orient() returns real results — no manual analyze step.
Full suite green: 2928 passed, 2 skipped; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Found via another dogfood pass (this time on nidus, a large mixed
Python/TS repo with a pre-existing CLAUDE.md):

- Re-running `openlore install` printed a scary "[error] Configuration
  exists. Use --force" plus a "Could not detect project type" warning to
  stderr, even though the outcome is a clean no-op (surfaces already wired,
  recent index reused). Root cause: buildIndex drove the init CLI command,
  which logs that error when config exists. Switch init to the programmatic
  openloreInit() API, which is silent and returns created:false when config
  already exists; analyze still runs via the CLI (it builds the BM25 index).
  Re-running install is now a clean no-op with no error noise.

  (Verified separately that "project type Unknown" on nidus is correct, not
  a bug: nidus has no top-level manifest — its pyproject.toml is nested
  under python/ — and analyze still indexed it fine.)

- Add a reusable `/first-run-hardening` slash command
  (.claude/commands/first-run-hardening.md) capturing the dogfooding method
  that found this and the earlier first-run bugs: build first and drive
  dist/ (not the published version), run the real install flow on a clean
  sibling repo, stress field edges (large/non-TS repos, --json purity,
  idempotency, pre-existing user files), root-cause before fixing, treat
  test repos as read-only, and keep main/PR discipline. Un-ignore
  .claude/commands/ (only) so the command is shared while local .claude
  settings stay ignored.

Full suite green: 2928 passed, 2 skipped; typecheck clean. Dogfooded on
nidus (install idempotency + CLAUDE.md merge preserved) and fresh temp
repos (bare-install surface detection, --no-analyze hint).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@clay-good clay-good merged commit ed7a754 into main May 29, 2026
4 checks passed
@clay-good clay-good deleted the fix/orient-cli-and-install-hook branch May 29, 2026 13:25