fix: ship orient CLI subcommand + build BM25 index under --no-embed by clay-good · Pull Request #95 · clay-good/OpenLore

clay-good · 2026-05-29T12:50:29Z

Summary

A clean-up PR to unblock everyone: make openlore dead-simple to install and fix the bugs that broke first-run setup. Started from field feedback on openlore@2.0.4; a full first-run dogfood on clean repos surfaced the rest. Targets main via this branch — not a direct push to main.

1. `openlore install` wrote a broken SessionStart hook

The hook ran npx --yes openlore orient --json, but no orient CLI subcommand existed → error: unknown command 'orient' on every session start.

Fix: ship the orient CLI subcommand (src/cli/commands/orient.ts) wrapping the existing handleOrient MCP handler. Flags --task/--json/--directory/--limit; with no task it prints a session-start primer (exit 0) so the hook is a useful no-op. --json keeps stdout pure JSON (diagnostic validateDirectory output routed to stderr) so wrappers can parse it.
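The stdout discipline described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/cli/commands/orient.ts: `emitOrient`, `Sink`, `OrientOutput`, and `OrientResult` are hypothetical names, and the sinks stand in for the process's stdout and stderr.

```typescript
// Hypothetical sink type: in a real CLI these would wrap stdout and stderr.
type Sink = (line: string) => void;

interface OrientOutput {
  stdout: Sink;
  stderr: Sink;
}

// Illustrative stand-in for the handler result; the real handleOrient payload may differ.
interface OrientResult {
  openlore: string;
  results?: unknown[];
}

function emitOrient(
  result: OrientResult,
  diagnostics: string[],
  json: boolean,
  out: OrientOutput,
): void {
  // In --json mode, diagnostics go to stderr so stdout carries exactly one JSON document.
  const diagnosticSink = json ? out.stderr : out.stdout;
  for (const line of diagnostics) {
    diagnosticSink(line);
  }
  out.stdout(json ? JSON.stringify(result) : `openlore: ${result.openlore}`);
}
```

Injecting the sinks keeps the routing rule easy to test without spawning a process, which is presumably what a stdout-purity test needs to check.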

2. `analyze --no-embed` skipped the BM25 index entirely

The whole index build was gated on --embed, so --no-embed left orient reporting "No analysis found".

Fix: the index is now always built; --no-embed only disables the semantic-embedding attempt (keyword-only BM25). Removed a contradictory auto-enable block; fixed the help text.
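A rough sketch of the gating change, under the assumption that the parsed CLI options expose a boolean `embed` (false when --no-embed is passed). `planBefore`, `planAfter`, and `IndexPlan` are hypothetical names used only to contrast the two behaviours.

```typescript
// Assumed option shape: --no-embed sets embed to false.
interface AnalyzeOptions {
  embed: boolean;
}

interface IndexPlan {
  buildIndex: boolean;        // whether the BM25 keyword index is built at all
  attemptEmbeddings: boolean; // whether semantic embeddings are attempted on top
}

// Old behaviour as described: the whole index build was gated on --embed.
function planBefore(opts: AnalyzeOptions): IndexPlan {
  return { buildIndex: opts.embed, attemptEmbeddings: opts.embed };
}

// New behaviour: the index is always built; --no-embed only skips embeddings.
function planAfter(opts: AnalyzeOptions): IndexPlan {
  return { buildIndex: true, attemptEmbeddings: opts.embed };
}
```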

3. MCP watcher EMFILE on large repos

Dogfooding on a Rust repo (75GB target/, ~294k files) crashed the MCP server's auto-watcher with EMFILE: too many open files on the first tool call — breaking both the orient skill's MCP fallback and Claude Code's long-lived MCP server.

Fix (robust, not just a bigger list): match ignored dirs by root-relative path segment, which prunes the ignored directory itself so chokidar never opens FDs inside it (a substring includes('/target/') only matches descendants, so chokidar still descends and FD-storms first). Also eliminates false-positives for repos under paths like /home/user/dist/app. Broadened the ignore set across ecosystems (Rust/JS/Python/Go/JVM/.NET/VCS). Added a --no-watch-auto flag for one-shot callers; the orient-via-mcp.mjs helper detects and uses it. Real-chokidar test proves target/ is pruned, not stormed.
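The difference between the two matching strategies can be illustrated with a small sketch. `isIgnoredRelPath` is the name the PR exports, but the body below is an assumed reimplementation, the ignore set is a small subset of the one described, and `toRelative` and `substringIgnored` are hypothetical helpers.

```typescript
// Illustrative subset of the ignore set named in the PR description.
const IGNORED_SEGMENTS = new Set([
  "node_modules", ".git", "dist", "target", ".venv", "__pycache__", "vendor", ".gradle", "obj",
]);

// Hypothetical helper: path of absPath relative to root, or null if it lies outside root.
function toRelative(root: string, absPath: string): string | null {
  const normalize = (p: string): string => p.replace(/\\/g, "/").replace(/\/+$/, "");
  const r = normalize(root);
  const a = normalize(absPath);
  if (a === r) return "";
  return a.startsWith(r + "/") ? a.slice(r.length + 1) : null;
}

// Segment match on the root-relative path: true for the ignored directory itself
// as well as for anything beneath it, on either separator style.
function isIgnoredRelPath(relPath: string): boolean {
  return relPath.split(/[\\/]/).some((segment) => IGNORED_SEGMENTS.has(segment));
}

// The substring approach the PR moves away from: it only matches descendants,
// and it also matches ignored names that appear above the repo root.
function substringIgnored(absPath: string): boolean {
  return absPath.includes("/target/") || absPath.includes("/dist/");
}
```

With this shape, `isIgnoredRelPath("target")` is true, so a watcher that consults it can skip the directory before descending, while `substringIgnored("/repo/target")` is false and only starts matching once paths inside target/ are already being visited.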

4. One-command setup (the "dead simple" goal)

openlore install previously only wired surfaces — so a new user's first orient() still returned "No analysis found" until they manually ran analyze.

Fix: install now also builds the index (runs init if needed, then the real analyze so the BM25 index orient reads actually gets built). orient works in the first session, from one command. Opt out with --no-analyze; skipped for --dry-run/--uninstall. Sub-command chatter is captured so install prints its own concise progress.
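The index-build decision described above might look roughly like this. It is a sketch under assumed names: `InstallOptions`, `BuildStep`, and `planIndexSteps` are hypothetical, and the option fields only mirror the flags mentioned in the text (--no-analyze, --dry-run, --uninstall).

```typescript
// Assumed option shape mirroring the flags named in the description.
interface InstallOptions {
  analyze: boolean;   // false when --no-analyze is passed
  dryRun: boolean;    // --dry-run
  uninstall: boolean; // --uninstall
}

type BuildStep = "init" | "analyze";

// Returns the index-build steps install would run after wiring surfaces.
function planIndexSteps(opts: InstallOptions, configExists: boolean): BuildStep[] {
  // The build is skipped entirely for opt-outs and non-mutating or teardown runs.
  if (!opts.analyze || opts.dryRun || opts.uninstall) {
    return [];
  }
  const steps: BuildStep[] = [];
  if (!configExists) {
    steps.push("init"); // init only if needed
  }
  steps.push("analyze"); // the real analyze builds the BM25 index orient reads
  return steps;
}
```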

On MCP auto-start: the MCP server is launched by the agent (Claude Code spawns npx openlore mcp from settings.json), so install already wires auto-start — there's no daemon for openlore to start itself. The file watcher (now safe on large repos) stays on by default; openlore mcp --no-watch-auto disables it.

5. Docs

- Dropped the stale "orient CLI not yet shipped / TODO(spec-02-followup)" language from SKILL.md, the skill README.md, and the wrappers.
- Rewrote the project README.md Quickstart around the single openlore install command; documented MCP auto-start and the watcher's build-dir pruning / --no-watch-auto.

First-run verification (clean repos, via PR build)

- Rust (proxilion): init → analyze (BM25, 4359 fns) → install → hook returns {"openlore":"ready"} → orient --task returns results; MCP server watches the 75GB repo with no EMFILE.
- TS (vaulytica): a single openlore install wires the surface, builds the index (1889 fns), and orient returns real results immediately, with no manual analyze step. Confirmed published 2.0.4 errors (unknown command 'orient') while the PR build returns clean JSON.

Both repos left exactly as found.

Tests

- orient.test.ts, analyze-no-embed.test.ts, install-analyze.test.ts (one-command setup builds the index; --no-analyze/dry-run don't), mcp-watcher.test.ts (isIgnoredRelPath + real-chokidar pruning).
- Full suite: 2928 passing, 2 skipped; typecheck clean; build succeeds; openlore orient --help lists the command.

Note: dependabot advisory #43 on the default branch is pre-existing and unrelated.

🤖 Generated with Claude Code

Three fixes from field feedback on openlore@2.0.4:

1. Ship the `openlore orient` CLI subcommand (wraps the existing handleOrient MCP handler). Supports --task, --json, --directory, --limit. With no task it prints a session-start primer (exit 0) so the `npx --yes openlore orient --json` SessionStart hook written by `openlore install` is a useful no-op instead of erroring every session. Registered in src/cli/index.ts; the orient.sh/orient.ps1 wrappers auto-detect it via `openlore --help` and use it directly. In --json mode, stdout is kept pure JSON (validateDirectory's "[ok] Successfully validated…" line is routed to stderr, mirroring the stdout discipline the MCP server already applies) so wrappers can parse it.

2. `analyze --no-embed` now builds a keyword-only (BM25) index instead of skipping indexing entirely. Previously the whole index-build step was gated on --embed, so --no-embed left orient reporting "No analysis found". The index is now always built; --no-embed only disables the semantic embedding attempt. Fixed the misleading help text. Also removed a contradictory auto-enable block that re-forced embeddings back on when --no-embed was passed.

3. Aligned the skill wrappers, SKILL.md, and README with the shipped CLI (dropped the "not yet shipped / TODO(spec-02-followup)" language).

Tests: new orient.test.ts (command surface, primer, json passthrough, limit validation, error path, stdout-purity) and analyze-no-embed.test.ts (regression: --no-embed builds the index with a null embedder; default path still attempts embeddings).

Full suite green: 2920 passed, 2 skipped; typecheck clean; build succeeds and `openlore orient --help` lists the command.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Found while dogfooding the full first-run flow on a clean Rust repo (proxilion: 75GB target/, ~294k files). On the first tool call the MCP server's auto-watcher recursively watched target/ and crashed with EMFILE: too many open files, breaking both the orient skill's MCP fallback and Claude Code's long-lived MCP server.

- Expand the watcher's IGNORED_SEGMENTS beyond node_modules/.git/dist to cover build-output and dependency dirs across ecosystems: Rust target/, JS build/.next/.turbo/coverage/..., Python .venv/__pycache__/..., Go vendor/, JVM .gradle/, .NET obj/, plus .hg/.svn/.idea. These are matched as cheap string segments before any FD is opened, so they never trigger the EMFILE they used to. Export isIgnoredPath and add regression tests.
- Add a --no-watch-auto flag to `openlore mcp` so one-shot callers can opt out of the watcher entirely. The orient skill's orient-via-mcp.mjs helper (a single initialize→orient→exit round-trip) now detects the flag via `mcp --help` and passes it when supported; older openlore builds that predate the flag are unaffected.

Verified end-to-end on proxilion: init → analyze (BM25, 4359 functions) → install → SessionStart hook returns "ready" → orient returns real results, and the MCP server now watches the repo with no EMFILE.

Full suite green: 2923 passed, 2 skipped; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
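The flag detection mentioned in this commit could be sketched as below. The real helper is orient-via-mcp.mjs and its code is not shown here; `helpListsFlag` and `buildMcpArgs` are hypothetical names, and the sketch assumes the help text has already been captured from `mcp --help`.

```typescript
// True when the help text lists the flag as a whole token, so that a longer
// flag sharing the same prefix does not count as a match.
function helpListsFlag(helpText: string, flag: string): boolean {
  const escaped = flag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`(^|[\\s,\\[])${escaped}(?=$|[\\s,=\\]])`, "m");
  return pattern.test(helpText);
}

// Builds the argument list for a one-shot MCP call: the watcher opt-out is
// passed only when the installed build advertises it, so older builds still work.
function buildMcpArgs(helpText: string): string[] {
  const args = ["mcp"];
  if (helpListsFlag(helpText, "--no-watch-auto")) {
    args.push("--no-watch-auto");
  }
  return args;
}
```

Probing the help output rather than a version number keeps the helper independent of release numbering, at the cost of depending on the help text format.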

Make first-run setup dead simple and remove the remaining friction a new user hits.

install (one command, fully wired):

- `openlore install` now builds the index after configuring agent surfaces (runs init if needed, then the real analyze command so the BM25 index orient() reads actually gets built). Previously install only wired the MCP server + hook, so a user's first orient() call returned "No analysis found" until they manually ran analyze. Now orient works in the first session. Opt out with `--no-analyze`. Skipped for --dry-run/--uninstall.
- Sub-command stdout is captured to stderr during the build so install prints its own concise progress instead of init/analyze's "Next step: run generate" chatter.
- Note: the MCP server is launched by the agent (Claude Code spawns `npx openlore mcp` from settings.json), so install already wires auto-start; there's no daemon for openlore to start itself.

watcher EMFILE, robust fix (not just a bigger ignore list):

- Match ignored directories by root-RELATIVE path segment, not absolute substring. This prunes the ignored directory ITSELF (chokidar never opens FDs inside target/ etc., which is the actual EMFILE fix; a substring `includes('/target/')` only matches descendants, so chokidar still descends and FD-storms before pruning). It also stops false-positives for repos living under a path like /home/user/dist/myapp.
- Broadened the ignore set across ecosystems (Rust target/, JS build dirs, Python venvs/caches, Go vendor/, JVM .gradle/, .NET obj/, VCS dirs).
- Export isIgnoredRelPath; add unit tests (incl. substring false-positive and windows-separator cases) plus a real-chokidar test proving target/ is pruned, not FD-stormed.

README: rewrote the Quickstart around the single `openlore install` command, documented MCP auto-start and the watcher's build-dir pruning / `--no-watch-auto`, and refreshed the test count.

Verified end-to-end on a clean repo (vaulytica): one `openlore install` → surfaces wired, BM25 index built (1889 functions), SessionStart hook returns "ready", orient() returns real results, with no manual analyze step.

Full suite green: 2928 passed, 2 skipped; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Found via another dogfood pass (this time on nidus, a large mixed Python/TS repo with a pre-existing CLAUDE.md):

- Re-running `openlore install` printed a scary "[error] Configuration exists. Use --force" plus a "Could not detect project type" warning to stderr, even though the outcome is a clean no-op (surfaces already wired, recent index reused). Root cause: buildIndex drove the init CLI command, which logs that error when config exists. Switch init to the programmatic openloreInit() API, which is silent and returns created:false when config already exists; analyze still runs via the CLI (it builds the BM25 index). Re-running install is now a clean no-op with no error noise. (Verified separately that "project type Unknown" on nidus is correct, not a bug: nidus has no top-level manifest (its pyproject.toml is nested under python/), and analyze still indexed it fine.)
- Add a reusable `/first-run-hardening` slash command (.claude/commands/first-run-hardening.md) capturing the dogfooding method that found this and the earlier first-run bugs: build first and drive dist/ (not the published version), run the real install flow on a clean sibling repo, stress field edges (large/non-TS repos, --json purity, idempotency, pre-existing user files), root-cause before fixing, treat test repos as read-only, and keep main/PR discipline. Un-ignore .claude/commands/ (only) so the command is shared while local .claude settings stay ignored.

Full suite green: 2928 passed, 2 skipped; typecheck clean. Dogfooded on nidus (install idempotency + CLAUDE.md merge preserved) and fresh temp repos (bare-install surface detection, --no-analyze hint).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
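The idempotency contract described for the programmatic init path can be sketched as follows. This is not the signature of openloreInit(): the source only states that it is silent and returns created:false when config exists, so `InitResult`, `ConfigStore`, and `initIfMissing` are hypothetical stand-ins.

```typescript
// Hypothetical result shape; only the `created` field is stated in the source.
interface InitResult {
  created: boolean;
}

// Hypothetical stand-in for the config file on disk.
interface ConfigStore {
  exists(): boolean;
  write(): void;
}

// Creates config only when it is missing. An existing config is reported
// through the return value instead of a logged error, so a re-run stays quiet.
function initIfMissing(store: ConfigStore): InitResult {
  if (store.exists()) {
    return { created: false };
  }
  store.write();
  return { created: true };
}
```

Returning the outcome lets the caller decide what, if anything, to print, which is what makes a repeated install a clean no-op.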

clay-good and others added 4 commits May 29, 2026 07:49

clay-good merged commit ed7a754 into main May 29, 2026
4 checks passed

clay-good deleted the fix/orient-cli-and-install-hook branch May 29, 2026 13:25

clay-good merged 4 commits into main from fix/orient-cli-and-install-hook
