7 changes: 4 additions & 3 deletions AGENTS.md
@@ -22,9 +22,9 @@
   Avoid claiming universal binary-file support; unsupported proprietary formats need extraction or
   dedicated parsers.
 - Keep optional audio summaries separate from core ingestion/query behavior. The
-  `mimir-audio-summary` skill must prefer `kb audio` / `@jcode.labs/mimir-tts`, use the Edge MP3
-  path for global Voice Forge quality when online TTS is explicitly acceptable, support the
-  Transformers.js WAV path for offline/confidential rendering, and keep generated audio under
+  `mimir-audio-summary` skill must prefer `kb audio` / `@jcode.labs/mimir-tts`, default to the
+  Transformers.js WAV path for offline/confidential rendering, use the Edge MP3 path for global
+  Voice Forge quality only when online TTS is explicitly acceptable, and keep generated audio under
   ignored local Mimir state.
 - Keep the repository as a simple pnpm workspace monorepo. Add Turbo only if multiple packages or
   apps start needing task caching/orchestration beyond `pnpm --filter`.
@@ -70,6 +70,7 @@ General principles (KISS, DRY, YAGNI, SOLID) as applied in this codebase. Match
 
 - `packages/mimir` is the core package published as `@jcode.labs/mimir`.
 - `packages/mimir/src/cli.ts` exposes the `kb` CLI.
+- `packages/mimir/src/doctor.ts` owns the user-facing readiness diagnosis behind `kb doctor`.
 - `packages/mimir/src/config.ts` resolves `.kb/config.json` from the target repository.
 - `packages/mimir/src/defaults.ts` owns shared default paths, provider defaults, and generated-state ignore
   constants. Keep config/init/security/gitignore aligned through this module instead of copying
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
 # Changelog
 
+## 0.4.2 - 2026-06-29
+
+- Add `kb doctor` to diagnose initialization, index freshness, security posture, and next steps.
+- Make `kb audio` and `mimir-tts` default to the offline/confidential Transformers.js WAV path;
+  Edge MP3 now requires an explicit `--engine edge` flag.
+- Stop indexing the generated `private/README.md` helper file created by `kb init`.
+- Improve onboarding output from `kb init` and `kb install-skill`.
+
 ## 0.4.1 - 2026-06-29
 
 - Add an Edge-compatible Mimir TTS engine so `kb audio` can match the global Voice Forge quality
8 changes: 4 additions & 4 deletions CLAUDE.md
@@ -38,7 +38,7 @@ output in the same commit, or CI fails. This is the single easiest mistake to ma
 - TTS package: **Mimir TTS**, published as `@jcode.labs/mimir-tts`.
 - CLI binary: **`kb`** (`packages/mimir/bin.kb` -> `packages/mimir/dist/cli.js`). Commands: `init`,
   `ingest`, `search`, `ask`, `audit`, `status`, `security-audit`, `destroy-index`, `audio`,
-  `serve-mcp`, `skill-path`, `install-skill`.
+  `doctor`, `serve-mcp`, `skill-path`, `install-skill`.
 - TTS CLI binary: **`mimir-tts`** (`packages/mimir-tts/dist/cli.js`). Commands: `doctor`, `render`.
 - Project config/state in the target repo: **`.kb/`** (`config.json`, `sources.txt`, `access.log`,
   `storage/`), raw documents in **`private/`**, agent kit in **`.mimir/`**.
@@ -63,9 +63,9 @@ The ingest pipeline (`packages/mimir/src/ingest.ts`) chains single-responsibilit
 `embeddings.ts` (vectorize) → `store.ts` (LanceDB). `query.ts` embeds the query and runs vector
 search; `ask` returns cited passages only (no LLM synthesis in core).
 
-`packages/mimir-tts` is a separate ESM package. It uses `edge-tts` for high-quality MP3 when the
-external CLI is installed, and Transformers.js for offline WAV rendering without Python or ffmpeg.
-Core `kb audio` imports it dynamically.
+`packages/mimir-tts` is a separate ESM package. It defaults to Transformers.js for offline WAV
+rendering without Python or ffmpeg, and uses `edge-tts` for high-quality MP3 only when explicitly
+requested. Core `kb audio` imports it dynamically.
 
 Key behaviors to keep in mind before editing:
 
9 changes: 9 additions & 0 deletions README.md
@@ -10,6 +10,15 @@ agents.
 - [`@jcode.labs/mimir-tts`](./packages/mimir-tts): plug-and-play Edge-quality MP3 and offline
   Transformers.js WAV renderer used by `kb audio`.
 
+## Documentation
+
+- [Getting started](./docs/getting-started.md): install Mimir and get the first useful search.
+- [CLI reference](./docs/cli-reference.md): every `kb` and `mimir-tts` command with when to use it.
+- [Troubleshooting](./docs/troubleshooting.md): common setup, indexing, audio, and release issues.
+- [Security hardening](./SECURITY-HARDENING.md): threat model, offline operation, and release
+  verification.
+- [UX/DX audit](./docs/ux-dx-audit.md): current findings, fixes, and remaining product risks.
+
 ## Development
 
 ```bash
10 changes: 5 additions & 5 deletions SECURITY-HARDENING.md
@@ -19,9 +19,9 @@ built to minimize data movement, but it is not a certified high-assurance system
   default.
 - MCP is read-focused: destructive tools are not exposed over MCP, and MCP retrieval is capped by
   `mcpMaxTopK`.
-- Optional audio summaries use `kb audio` / `@jcode.labs/mimir-tts`. Edge MP3 gives the highest
-  quality when online TTS is acceptable. Transformers.js WAV is the offline/confidential path and
-  does not require Python, ffmpeg, Piper, XTTS, or a local TTS server.
+- Optional audio summaries use `kb audio` / `@jcode.labs/mimir-tts`. Transformers.js WAV is the
+  default offline/confidential path and does not require Python, ffmpeg, Piper, XTTS, or a local TTS
+  server. Edge MP3 gives the highest quality only when online TTS is explicitly acceptable.
 - npm releases are published with provenance from the protected GitHub Actions workflow.
 - Release artifacts include a package tarball, SHA256 checksums, SBOM, and manifest.
 
@@ -140,13 +140,13 @@ Confidentiality defaults:
 - narration text is written to a temp file outside the repository;
 - generated MP3 or WAV audio should be written under `.mimir/audio/`;
 - `.mimir/` is ignored by Git;
-- Edge MP3 uses the online Edge TTS service through the external `edge-tts` CLI and should be used
-  only when sending the narration text to that service is acceptable;
 - Transformers.js WAV does not require Python, ffmpeg, Piper, XTTS, or a local TTS server;
 - the first online-enabled Transformers render may download public model weights into
   `.mimir/models/tts`, but the narration text is processed locally;
 - `--engine transformers --offline` disables remote model loading and requires preloaded model
   files.
+- Edge MP3 uses the online Edge TTS service through the external `edge-tts` CLI and should be used
+  only when sending the narration text to that service is acceptable.
 
 Generated audio can still contain sensitive information. Treat it like a derived confidential
 document.
90 changes: 90 additions & 0 deletions docs/cli-reference.md
@@ -0,0 +1,90 @@
# CLI Reference

Mimir ships two CLIs:

- `kb`: the main local RAG, MCP, skills, security, and audio command.
- `mimir-tts`: the standalone text-to-speech renderer used by `kb audio`.

## Main Workflow

| Command | Use It When |
| --- | --- |
| `kb init` | Create `.kb/config.json`, `.kb/sources.txt`, `private/`, and Git ignore rules. |
| `kb doctor` | Diagnose setup, index freshness, security warnings, and the next command to run. |
| `kb ingest` | Parse source files, redact, chunk, embed, and rebuild the local LanceDB index. |
| `kb audit` | Check whether supported source files are missing from or stale in the index. |
| `kb search "<query>"` | Retrieve ranked passages without asking an LLM to write an answer. |
| `kb ask "<question>"` | Return cited retrieval context for an agent or trusted model runtime. |
| `kb security-audit` | Inspect privacy posture: telemetry, providers, redaction, Git ignore, MCP. |
| `kb status` | Print raw config paths, provider settings, and indexed chunk count. |
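
`kb ingest` chunks redacted text before embedding it, and `KB_CHUNK_SIZE` / `KB_CHUNK_OVERLAP` tune that step. As a mental model only, here is a TypeScript sketch of fixed-size chunking with overlap. It assumes sizes are measured in characters and uses a hypothetical `chunkText` helper; Mimir's actual chunker may count tokens or follow document structure instead.

```typescript
// Hypothetical sketch: fixed-size chunking with overlap, measured in characters.
// Mimir's real chunker may split on tokens, sentences, or headings instead.
export function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each new chunk advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end so the tail is not emitted twice.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some text twice.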

## Agent Integration

| Command | Use It When |
| --- | --- |
| `kb install-skill` | Copy portable agent skills and an MCP config snippet into `.mimir/`. |
| `kb skill-path` | Print the package-bundled skill path for agents that load installed package skills. |
| `kb serve-mcp` | Start the MCP stdio server for compatible agents. |

MCP tools exposed by `kb serve-mcp`:

- `mimir_status`
- `mimir_search`
- `mimir_ask`
- `mimir_audit`
- `mimir_security_audit`

## Maintenance And Safety

| Command | Use It When |
| --- | --- |
| `kb destroy-index --yes` | Delete generated `.kb/storage` index files. |
| `kb security-audit --strict` | Fail the command when privacy warnings are present. |

`destroy-index` does not securely erase SSD or copy-on-write storage. For strong deletion
guarantees, use encrypted storage and destroy the encryption key.
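
`kb security-audit --strict` is what makes the audit usable as a CI gate. Here is a minimal sketch of the documented rule, assuming warnings are collected as a list of strings. `securityAuditExitCode` is a hypothetical name, and the specific non-zero code (`1` here) is an assumption.

```typescript
// Hypothetical sketch of the `--strict` rule: warnings fail the run only in strict mode.
export function securityAuditExitCode(warnings: readonly string[], strict: boolean): number {
  // Without --strict the audit reports warnings but still exits 0.
  // With --strict any warning becomes a non-zero exit code so CI can fail the job.
  return strict && warnings.length > 0 ? 1 : 0;
}
```

For example, `securityAuditExitCode(["example warning"], false)` returns `0`, while the same call with `strict` set to `true` returns `1`.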

## Audio

| Command | Use It When |
| --- | --- |
| `kb audio --doctor` | Check TTS runtime readiness. |
| `kb audio <file> --engine transformers --offline --out .mimir/audio/name.wav` | Render a confidential/offline WAV. |
| `kb audio <file> --engine edge --out .mimir/audio/name.mp3` | Render a higher-quality online Edge MP3. |
| `mimir-tts doctor --json` | Inspect the standalone TTS package. |
| `mimir-tts render <file> --offline --out .mimir/audio/name.wav` | Render directly through the TTS package. |

`kb audio` defaults to the offline/confidential Transformers.js path. MP3 output requires explicit
`--engine edge` because Edge TTS is an online service.
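
The default-engine rule fits in a small pure function. This sketch follows the behavior described on this page and is not Mimir's source. `AudioOptions`, `AudioPlan`, and `resolveEngine` are hypothetical names, and rejecting `--engine edge --offline` is this sketch's own choice rather than documented CLI behavior.

```typescript
// Hypothetical sketch of the documented `kb audio` defaults; not Mimir's real code.
type Engine = "transformers" | "edge";

interface AudioOptions {
  engine?: Engine; // present only when --engine is passed explicitly
  offline?: boolean; // --offline
}

interface AudioPlan {
  engine: Engine;
  format: "wav" | "mp3";
}

export function resolveEngine(opts: AudioOptions): AudioPlan {
  // Default to the offline/confidential Transformers.js path.
  const engine: Engine = opts.engine ?? "transformers";
  // Edge TTS is an online service, so this sketch refuses to mix it with --offline
  // instead of silently picking one flag over the other.
  if (engine === "edge" && opts.offline) {
    throw new Error("--engine edge requires network access and cannot be combined with --offline");
  }
  // Transformers.js renders WAV; Edge renders MP3.
  return { engine, format: engine === "edge" ? "mp3" : "wav" };
}
```

Keeping the decision in one pure function makes the confidential default easy to test: MP3 can only come from an explicit `engine: "edge"`.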

## Important Options

| Option | Applies To | Meaning |
| --- | --- | --- |
| `--top-k <number>` | `search`, `ask` | Number of passages to return. |
| `--json` | `doctor`, `security-audit`, `audio --doctor`, `mimir-tts doctor` | Print machine-readable JSON. |
| `--strict` | `security-audit` | Exit non-zero when warnings exist. |
| `--offline` | `audio`, `mimir-tts render` | Disable remote model downloads and force the local Transformers.js path. |
| `--allow-remote-models` | `audio`, `mimir-tts render` | Explicitly allow model downloads for Transformers.js. |
| `--engine edge` | `audio`, `mimir-tts render` | Use online Edge TTS for MP3 output. |
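
`--top-k` sets how many passages come back, and retrieval over MCP is additionally capped by the `mcpMaxTopK` setting (`KB_MCP_MAX_TOP_K`). Here is a minimal sketch of such a cap. `clampTopK` is hypothetical, and falling back to the default for missing or invalid values is an assumption.

```typescript
// Hypothetical sketch: honor --top-k but never exceed a configured cap such as mcpMaxTopK.
export function clampTopK(requested: number | undefined, defaultTopK: number, maxTopK: number): number {
  // Missing, fractional, zero, or negative requests fall back to the default (an assumption).
  const wanted =
    requested !== undefined && Number.isInteger(requested) && requested > 0 ? requested : defaultTopK;
  // Cap the result so a client cannot pull an arbitrarily large slice of the index.
  return Math.min(wanted, maxTopK);
}
```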

## Environment Overrides

Config values can be overridden through environment variables:

- `KB_RAW_DIR`
- `KB_STORAGE_DIR`
- `KB_SOURCES_FILE`
- `KB_ACCESS_LOG_PATH`
- `KB_EMBEDDING_PROVIDER`
- `KB_EMBEDDING_MODEL`
- `KB_EMBEDDING_MODEL_PATH`
- `KB_TRANSFORMERS_ALLOW_REMOTE_MODELS`
- `KB_REDACTION_ENABLED`
- `KB_REDACTION_BUILT_IN`
- `KB_ACCESS_LOG`
- `KB_MCP_MAX_TOP_K`
- `KB_TOP_K`
- `KB_CHUNK_SIZE`
- `KB_CHUNK_OVERLAP`
- `KB_INCLUDE_EXTENSIONS`
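
Environment variables arrive as strings, so values such as `KB_TOP_K` or `KB_REDACTION_ENABLED` must be parsed before they replace config values. The sketch below handles four of the listed variables. The `KbConfig` field names, the accepted boolean spellings, and throwing on malformed numbers are assumptions; none of it is taken from `packages/mimir/src/config.ts`.

```typescript
// Hypothetical subset of the resolved config; real field names in .kb/config.json may differ.
interface KbConfig {
  topK: number;
  chunkSize: number;
  chunkOverlap: number;
  redactionEnabled: boolean;
}

type Env = Record<string, string | undefined>;

// Accepts a few common boolean spellings; Mimir's accepted values are an assumption here.
function parseBoolean(name: string, value: string): boolean {
  const normalized = value.trim().toLowerCase();
  if (normalized === "true" || normalized === "1" || normalized === "yes") return true;
  if (normalized === "false" || normalized === "0" || normalized === "no") return false;
  throw new Error(`${name} must be a boolean, got "${value}"`);
}

function parseInteger(name: string, value: string): number {
  // Number("") is 0, so reject blank strings explicitly.
  const parsed = value.trim() === "" ? Number.NaN : Number(value);
  if (!Number.isInteger(parsed)) throw new Error(`${name} must be an integer, got "${value}"`);
  return parsed;
}

// Environment variables take precedence; unset variables leave the file value in place.
export function applyEnvOverrides(base: KbConfig, env: Env): KbConfig {
  const out: KbConfig = { ...base };
  const topK = env["KB_TOP_K"];
  if (topK !== undefined) out.topK = parseInteger("KB_TOP_K", topK);
  const chunkSize = env["KB_CHUNK_SIZE"];
  if (chunkSize !== undefined) out.chunkSize = parseInteger("KB_CHUNK_SIZE", chunkSize);
  const chunkOverlap = env["KB_CHUNK_OVERLAP"];
  if (chunkOverlap !== undefined) out.chunkOverlap = parseInteger("KB_CHUNK_OVERLAP", chunkOverlap);
  const redaction = env["KB_REDACTION_ENABLED"];
  if (redaction !== undefined) out.redactionEnabled = parseBoolean("KB_REDACTION_ENABLED", redaction);
  return out;
}
```

In Node you would call it as `applyEnvOverrides(fileConfig, process.env)`, where `fileConfig` is the parsed `.kb/config.json`. Failing loudly on a malformed value is safer than silently keeping a default, especially for a security switch like redaction.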
119 changes: 119 additions & 0 deletions docs/getting-started.md
@@ -0,0 +1,119 @@
# Getting Started

This tutorial gets a repository from zero to a working local Mimir knowledge base.

## Who This Is For

Use this when you want an AI agent, CLI, or local workflow to retrieve grounded context from private
project documents without sending the dataset to a hosted RAG service.

## Prerequisites

- Node.js 20 or newer.
- pnpm, npm, yarn, or bun. The examples below use pnpm.
- A repository where local generated folders can be ignored by Git.

## 1. Install Mimir

```bash
pnpm add -D @jcode.labs/mimir
```

## 2. Initialize The Repository

```bash
pnpm exec kb init
pnpm exec kb doctor
```

`kb init` creates:

```text
private/ # raw documents to ingest
.kb/config.json # local config
.kb/sources.txt # optional extra source paths
.gitignore # ignores private/**, .kb/, and .mimir/
```

`kb doctor` explains what is missing and the next command to run.
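
To confirm that the Git ignore rules written by `kb init` are still present, for example in CI, you can run a quick check over `.gitignore`. This is a hypothetical helper that does exact line matching only. It skips comments but does not understand negation or equivalent patterns such as `/.kb/`, so a reported miss means "inspect the file", not "definitely unsafe". `kb security-audit` remains the supported check.

```typescript
// Patterns `kb init` is documented to add to .gitignore.
const REQUIRED_IGNORES = ["private/**", ".kb/", ".mimir/"] as const;

// Hypothetical helper: returns required patterns missing from the given .gitignore text.
// Exact line matching only; comments are skipped, negation and equivalent patterns are not understood.
export function missingIgnoreRules(gitignore: string): string[] {
  const lines = new Set(
    gitignore
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line !== "" && !line.startsWith("#")),
  );
  return REQUIRED_IGNORES.filter((pattern) => !lines.has(pattern));
}
```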

## 3. Add Documents

Put supported files under `private/`:

```text
private/
policy.md
meeting-notes.pdf
requirements.docx
```

Do not put secrets, env files, or public repo content under `private/` unless you intend Mimir to
index them.

## 4. Build The Local Index

```bash
pnpm exec kb ingest
pnpm exec kb doctor
```

When the index is ready, `kb doctor` prints `ready=true`.
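
In scripts or CI you can gate later steps on that line. The sketch assumes `kb doctor` prints `ready=true` on its own line when ready and that `kb` is reachable through `pnpm exec`; `isReady` and `checkDoctor` are hypothetical helpers. For anything richer, `kb doctor --json` is the machine-readable option, but its fields are not described here.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical parser: true only when some output line is exactly `ready=true`.
export function isReady(doctorOutput: string): boolean {
  return doctorOutput.split(/\r?\n/).some((line) => line.trim() === "ready=true");
}

// Hypothetical wrapper around `pnpm exec kb doctor` in the given directory.
// execFileSync throws if the command exits non-zero; on Windows, pnpm is a .cmd shim
// and may need `shell: true`.
export function checkDoctor(cwd: string = process.cwd()): boolean {
  const stdout = execFileSync("pnpm", ["exec", "kb", "doctor"], { cwd, encoding: "utf8" });
  return isReady(stdout);
}
```

A CI step could then stop early with `if (!checkDoctor()) process.exit(1);`.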

## 5. Retrieve Evidence

Use `search` to see the ranked source passages themselves:

```bash
pnpm exec kb search "approval for offline operation"
```

Use `ask` when you want cited retrieval context to hand to an AI agent or model:

```bash
pnpm exec kb ask "What evidence supports offline operation?"
```

Mimir does not synthesize an LLM answer. It returns cited local passages; your chosen agent or model
does the writing around those passages.

## 6. Connect An Agent

```bash
pnpm exec kb install-skill
```

This creates:

```text
.mimir/skills/mimir/SKILL.md
.mimir/skills/mimir-audio-summary/SKILL.md
.mimir/mcp.json
.mimir/README.md
```

Use `.mimir/mcp.json` with MCP-compatible agents. Load `.mimir/skills/mimir/` in agents that support
portable skill folders.

## 7. Optional Audio Summary

Confidential/offline audio:

```bash
pnpm exec kb audio /tmp/summary.txt \
--engine transformers \
--offline \
--out .mimir/audio/summary.wav
```

Higher-quality online MP3:

```bash
pipx install edge-tts
pnpm exec kb audio /tmp/summary.txt \
--engine edge \
--out .mimir/audio/summary.mp3
```

The Edge path sends narration text to the online Edge TTS service. Use it only when that is
acceptable for the content.
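
Whichever engine you use, keep `--out` under `.mimir/audio/`: `.mimir/` is Git-ignored, and generated audio should be treated like a derived confidential document. Here is a minimal guard for wrapper scripts, assuming paths are resolved against the repository root. `isUnderAudioDir` is a hypothetical helper, not a `kb audio` option, and it does not resolve symlinks.

```typescript
import * as path from "node:path";

// Hypothetical guard: true when `outPath` resolves to a file inside `<repoRoot>/.mimir/audio/`.
export function isUnderAudioDir(outPath: string, repoRoot: string): boolean {
  const audioDir = path.resolve(repoRoot, ".mimir", "audio");
  const target = path.resolve(repoRoot, outPath);
  const rel = path.relative(audioDir, target);
  // Reject the directory itself, anything that climbs out with "..", and other drives/roots.
  const firstSegment = rel.split(path.sep)[0];
  return rel !== "" && firstSegment !== ".." && !path.isAbsolute(rel);
}
```

Comparing the first path segment, rather than checking `rel.startsWith("..")`, keeps a legitimate file name such as `..summary.wav` from being rejected.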