fix: ship orient CLI subcommand + build BM25 index under --no-embed #95
Merged
Three fixes from field feedback on openlore@2.0.4:
1. Ship the `openlore orient` CLI subcommand (wraps the existing handleOrient MCP handler). Supports --task, --json, --directory, --limit. With no task it prints a session-start primer (exit 0) so the `npx --yes openlore orient --json` SessionStart hook written by `openlore install` is a useful no-op instead of erroring every session. Registered in src/cli/index.ts; the orient.sh/orient.ps1 wrappers auto-detect it via `openlore --help` and use it directly. In --json mode, stdout is kept pure JSON (validateDirectory's "[ok] Successfully validated…" line is routed to stderr, mirroring the stdout discipline the MCP server already applies) so wrappers can parse it.
2. `analyze --no-embed` now builds a keyword-only (BM25) index instead of skipping indexing entirely. Previously the whole index-build step was gated on --embed, so --no-embed left orient reporting "No analysis found". The index is now always built; --no-embed only disables the semantic embedding attempt. Fixed the misleading help text. Also removed a contradictory auto-enable block that forced embeddings back on when --no-embed was passed.
3. Aligned the skill wrappers, SKILL.md, and README with the shipped CLI (dropped the "not yet shipped / TODO(spec-02-followup)" language).
Tests: new orient.test.ts (command surface, primer, json passthrough, limit validation, error path, stdout-purity) and analyze-no-embed.test.ts (regression: --no-embed builds the index with a null embedder; default path still attempts embeddings).
Full suite green: 2920 passed, 2 skipped; typecheck clean; build succeeds and `openlore orient --help` lists the command.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
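The gating change described in fix 2 can be sketched roughly as follows. This is an illustration only: `Embedder`, `IndexPlan`, `planIndexBefore`, and `planIndexAfter` are hypothetical names, not openlore's actual code, and the real analyze command is assumed to do considerably more.

```typescript
// Hypothetical types for illustration; openlore's real index builder is not shown here.
type Embedder = (text: string) => number[];

interface IndexPlan {
  buildKeywordIndex: boolean; // whether the BM25 (keyword) index gets built
  embedder: Embedder | null;  // null means "keyword-only, no semantic embeddings"
}

// Old behavior as described: the whole index build was gated on the embed flag,
// so --no-embed produced no index at all.
function planIndexBefore(embed: boolean, embedder: Embedder): IndexPlan {
  return embed
    ? { buildKeywordIndex: true, embedder }
    : { buildKeywordIndex: false, embedder: null };
}

// New behavior as described: the index is always built; --no-embed only
// swaps the embedder for null.
function planIndexAfter(embed: boolean, embedder: Embedder): IndexPlan {
  return { buildKeywordIndex: true, embedder: embed ? embedder : null };
}
```

With `embed` set to false, the "before" plan builds nothing (which is what left orient with "No analysis found"), while the "after" plan still builds the keyword index with a null embedder, matching the regression test described above.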
Found while dogfooding the full first-run flow on a clean Rust repo (proxilion: 75GB target/, ~294k files). On the first tool call the MCP server's auto-watcher recursively watched target/ and crashed with EMFILE: too many open files, breaking both the orient skill's MCP fallback and Claude Code's long-lived MCP server.
- Expand the watcher's IGNORED_SEGMENTS beyond node_modules/.git/dist to cover build-output and dependency dirs across ecosystems: Rust target/, JS build/.next/.turbo/coverage/..., Python .venv/__pycache__/..., Go vendor/, JVM .gradle/, .NET obj/, plus .hg/.svn/.idea. These are matched as cheap string segments before any FD is opened, so they never trigger the EMFILE they used to. Export isIgnoredPath and add regression tests.
- Add a --no-watch-auto flag to `openlore mcp` so one-shot callers can opt out of the watcher entirely. The orient skill's orient-via-mcp.mjs helper (a single initialize→orient→exit round-trip) now detects the flag via `mcp --help` and passes it when supported; older openlore builds that predate the flag are unaffected.
Verified end-to-end on proxilion: init → analyze (BM25, 4359 functions) → install → SessionStart hook returns "ready" → orient returns real results, and the MCP server now watches the repo with no EMFILE.
Full suite green: 2923 passed, 2 skipped; typecheck clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
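The "detect the flag via `mcp --help`" step could look something like the sketch below. It assumes the helper already has the help output as a string; `helpMentionsFlag` and `buildMcpArgs` are hypothetical names, and the actual orient-via-mcp.mjs logic may differ.

```typescript
// Returns true when the help text lists the flag as a whole token, so that a
// shorter flag such as "--no-watch" does not match "--no-watch-auto".
function helpMentionsFlag(helpText: string, flag: string): boolean {
  const escaped = flag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`(^|[\\s,])${escaped}(?=$|[\\s,=])`, "m");
  return pattern.test(helpText);
}

// Build the argument list for a one-shot MCP round-trip: pass --no-watch-auto
// only when the installed build advertises it, so older builds keep working.
function buildMcpArgs(helpText: string): string[] {
  const args = ["mcp"];
  if (helpMentionsFlag(helpText, "--no-watch-auto")) {
    args.push("--no-watch-auto");
  }
  return args;
}
```

Feature-detecting from help output avoids pinning the helper to a minimum openlore version, at the cost of depending on the help text format.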
Make first-run setup dead simple and remove the remaining friction a new
user hits.
install — one command, fully wired:
- `openlore install` now builds the index after configuring agent surfaces
(runs init if needed, then the real analyze command so the BM25 index
orient() reads actually gets built). Previously install only wired the
MCP server + hook, so a user's first orient() call returned "No analysis
found" until they manually ran analyze. Now orient works in the first
session. Opt out with `--no-analyze`. Skipped for --dry-run/--uninstall.
- Sub-command stdout is captured to stderr during the build so install
prints its own concise progress instead of init/analyze's "Next step:
run generate" chatter.
- Note: the MCP server is launched by the agent (Claude Code spawns
`npx openlore mcp` from settings.json), so install already wires
auto-start — there's no daemon for openlore to start itself.
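As a rough sketch of the index-build decision described in the install bullets above (not the actual install code), the rule "run init if needed, then analyze, unless opted out or in dry-run/uninstall mode" can be written as a small pure function. `InstallFlags`, `BuildStep`, and `planIndexBuild` are hypothetical names.

```typescript
// Hypothetical flag bag; the real install command's option parsing is not shown here.
interface InstallFlags {
  dryRun: boolean;       // --dry-run
  uninstall: boolean;    // --uninstall
  noAnalyze: boolean;    // --no-analyze
  configExists: boolean; // whether openlore config is already present
}

type BuildStep = "init" | "analyze";

// Decide which index-build steps install should run after wiring agent surfaces.
function planIndexBuild(flags: InstallFlags): BuildStep[] {
  // The index build is skipped entirely for dry runs, uninstalls, and --no-analyze.
  if (flags.dryRun || flags.uninstall || flags.noAnalyze) {
    return [];
  }
  const steps: BuildStep[] = [];
  if (!flags.configExists) {
    steps.push("init"); // only initialize when no config exists yet
  }
  steps.push("analyze"); // always build the index that orient() reads
  return steps;
}
```

On a fresh repo this yields init followed by analyze; on a repo that is already initialized it yields analyze alone.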
watcher EMFILE — robust fix (not just a bigger ignore list):
- Match ignored directories by root-RELATIVE path segment, not absolute
substring. This prunes the ignored directory ITSELF (chokidar never
opens FDs inside target/ etc. — the actual EMFILE fix; a substring
`includes('/target/')` only matches descendants, so chokidar still
descends and FD-storms before pruning). It also stops false-positives
for repos living under a path like /home/user/dist/myapp.
- Broadened the ignore set across ecosystems (Rust target/, JS build dirs,
Python venvs/caches, Go vendor/, JVM .gradle/, .NET obj/, VCS dirs).
- Export isIgnoredRelPath; add unit tests (incl. substring false-positive
and windows-separator cases) plus a real-chokidar test proving target/
is pruned, not FD-stormed.
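The difference between root-relative segment matching and absolute substring matching can be illustrated with the sketch below. It is not the shipped `isIgnoredRelPath`: the function names, the shortened ignore set, and the `toRelPath` helper are assumptions made for the example.

```typescript
// Hypothetical subset of the ignore set; the real list is described as broader.
const IGNORED_SEGMENTS: ReadonlySet<string> = new Set([
  "node_modules", ".git", "dist", "target", "build",
  ".venv", "__pycache__", "vendor", ".gradle", "obj",
]);

// Strip the watch root from an absolute path using plain string operations.
function toRelPath(root: string, absPath: string): string {
  const norm = (p: string): string => p.replace(/\\/g, "/").replace(/\/+$/, "");
  const r = norm(root);
  const a = norm(absPath);
  if (a === r) return "";
  return a.startsWith(r + "/") ? a.slice(r.length + 1) : a;
}

// Segment match on the root-relative path: true for the ignored directory
// itself ("target") as well as anything beneath it ("target/debug/x").
function isIgnoredRelPathSketch(relPath: string): boolean {
  return relPath
    .replace(/\\/g, "/") // tolerate Windows separators
    .split("/")
    .some((segment) => IGNORED_SEGMENTS.has(segment));
}

// The older style of check, for contrast: absolute substring matching.
function isIgnoredBySubstring(absPath: string): boolean {
  return absPath.includes("/target/") || absPath.includes("/dist/");
}
```

For a watch root of `/repo`, the path `/repo/target` is ignored by the segment check but not by the substring check, which is why the watcher could still descend into `target/`. Conversely, with a root of `/home/user/dist/myapp`, the file `src/a.ts` is ignored by the substring check (a false positive) but not by the segment check.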
README: rewrote the Quickstart around the single `openlore install`
command, documented MCP auto-start and the watcher's build-dir pruning /
`--no-watch-auto`, and refreshed the test count.
Verified end-to-end on a clean repo (vaulytica): one `openlore install`
→ surfaces wired, BM25 index built (1889 functions), SessionStart hook
returns "ready", orient() returns real results — no manual analyze step.
Full suite green: 2928 passed, 2 skipped; typecheck clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Found via another dogfood pass (this time on nidus, a large mixed Python/TS repo with a pre-existing CLAUDE.md):
- Re-running `openlore install` printed a scary "[error] Configuration exists. Use --force" plus a "Could not detect project type" warning to stderr, even though the outcome is a clean no-op (surfaces already wired, recent index reused). Root cause: buildIndex drove the init CLI command, which logs that error when config exists. Switch init to the programmatic openloreInit() API, which is silent and returns created:false when config already exists; analyze still runs via the CLI (it builds the BM25 index). Re-running install is now a clean no-op with no error noise. (Verified separately that "project type Unknown" on nidus is correct, not a bug: nidus has no top-level manifest; its pyproject.toml is nested under python/, and analyze still indexed it fine.)
- Add a reusable `/first-run-hardening` slash command (.claude/commands/first-run-hardening.md) capturing the dogfooding method that found this and the earlier first-run bugs: build first and drive dist/ (not the published version), run the real install flow on a clean sibling repo, stress field edges (large/non-TS repos, --json purity, idempotency, pre-existing user files), root-cause before fixing, treat test repos as read-only, and keep main/PR discipline. Un-ignore .claude/commands/ (only) so the command is shared while local .claude settings stay ignored.
Full suite green: 2928 passed, 2 skipped; typecheck clean.
Dogfooded on nidus (install idempotency + CLAUDE.md merge preserved) and fresh temp repos (bare-install surface detection, --no-analyze hint).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
A clean-up PR to unblock everyone: make openlore dead-simple to install and fix the bugs that broke first-run setup. Started from field feedback on `openlore@2.0.4`; a full first-run dogfood on clean repos surfaced the rest. Targets `main` via this branch, not a direct push to main.

1. `openlore install` wrote a broken SessionStart hook

The hook ran `npx --yes openlore orient --json`, but no `orient` CLI subcommand existed → `error: unknown command 'orient'` on every session start.

Fix: ship the `orient` CLI subcommand (`src/cli/commands/orient.ts`) wrapping the existing `handleOrient` MCP handler. Flags `--task`/`--json`/`--directory`/`--limit`; with no task it prints a session-start primer (exit 0) so the hook is a useful no-op. `--json` keeps stdout pure JSON (diagnostic `validateDirectory` output routed to stderr) so wrappers can parse it.

2. `analyze --no-embed` skipped the BM25 index entirely

The whole index build was gated on `--embed`, so `--no-embed` left orient reporting "No analysis found".

Fix: the index is now always built; `--no-embed` only disables the semantic-embedding attempt (keyword-only BM25). Removed a contradictory auto-enable block; fixed the help text.

3. MCP watcher EMFILE on large repos

Dogfooding on a Rust repo (75GB `target/`, ~294k files) crashed the MCP server's auto-watcher with `EMFILE: too many open files` on the first tool call, breaking both the orient skill's MCP fallback and Claude Code's long-lived MCP server.

Fix (robust, not just a bigger list): match ignored dirs by root-relative path segment, which prunes the ignored directory itself so chokidar never opens FDs inside it (a substring `includes('/target/')` only matches descendants, so chokidar still descends and FD-storms first). Also eliminates false-positives for repos under paths like `/home/user/dist/app`. Broadened the ignore set across ecosystems (Rust/JS/Python/Go/JVM/.NET/VCS). Added a `--no-watch-auto` flag for one-shot callers; the `orient-via-mcp.mjs` helper detects and uses it. Real-chokidar test proves `target/` is pruned, not stormed.

4. One-command setup (the "dead simple" goal)

`openlore install` previously only wired surfaces, so a new user's first `orient()` still returned "No analysis found" until they manually ran `analyze`.

Fix: `install` now also builds the index (runs `init` if needed, then the real `analyze` so the BM25 index orient reads actually gets built). orient works in the first session, from one command. Opt out with `--no-analyze`; skipped for `--dry-run`/`--uninstall`. Sub-command chatter is captured so install prints its own concise progress.

On MCP auto-start: the MCP server is launched by the agent (Claude Code spawns `npx openlore mcp` from `settings.json`), so `install` already wires auto-start; there's no daemon for openlore to start itself. The file watcher (now safe on large repos) stays on by default; `openlore mcp --no-watch-auto` disables it.

5. Docs

- Aligned `SKILL.md`, the skill `README.md`, and the wrappers with the shipped CLI.
- Rewrote the `README.md` Quickstart around the single `openlore install` command; documented MCP auto-start and the watcher's build-dir pruning / `--no-watch-auto`.

First-run verification (clean repos, via PR build)

- `init` → `analyze` (BM25, 4359 fns) → `install` → hook returns `{"openlore":"ready"}` → `orient --task` returns results; MCP server watches the 75GB repo with no EMFILE.
- `openlore install` wires the surface, builds the index (1889 fns), and `orient` returns real results immediately, with no manual analyze step.
- Confirmed published 2.0.4 errors (`unknown command 'orient'`) while the PR build returns clean JSON.
- Both repos left exactly as found.

Tests

- `orient.test.ts`, `analyze-no-embed.test.ts`, `install-analyze.test.ts` (one-command setup builds the index; `--no-analyze`/dry-run don't), `mcp-watcher.test.ts` (`isIgnoredRelPath` + real-chokidar pruning).
- `openlore orient --help` lists the command.

Note: dependabot advisory #43 on the default branch is pre-existing and unrelated.
🤖 Generated with Claude Code