Open-source, sovereign local RAG for confidential datasets and AI agents.
Mimir provides a TypeScript CLI, library, MCP server, and portable agent skills that can be installed in any Node.js repository. It indexes local files from the target repository, stores vectors locally with LanceDB, and can use either built-in local-hash retrieval or optional Transformers.js semantic embeddings.
Mimir core returns cited retrieval context. Answer synthesis belongs to the AI agent, LLM, or local model runtime you choose around it.
Created by Jean-Baptiste Thery and published under the JCode Labs npm scope.
Built by Jean-Baptiste Thery, freelance full-stack/AI tooling engineer at JCode Labs.
This root README is the canonical product documentation for the public npm packages.
| Package | Role |
|---|---|
| @jcode.labs/mimir | Core CLI, library, MCP server, bundled agent skills, and synthetic examples. |
| @jcode.labs/mimir-tts | Plug-and-play Edge-quality MP3 and offline Transformers.js WAV renderer used by kb audio. |
The package README files are intentionally short because npm displays each package README separately. They point npm readers back to this GitHub documentation.
Mimir is a public open-source project under the MIT License. It is designed to be inspectable, forkable, and usable without a JCode Labs account.
Contributions are welcome through pull requests. Start with CONTRIBUTING.md.
Security reports should stay private and follow SECURITY.md.
Mimir stays MIT open source. Sponsorship helps fund maintenance, issue triage, documentation, and practical agent-workflow improvements.
Sponsor the project through GitHub Sponsors.
Suggested GitHub Sponsors tiers:
- EUR 5/month: support the project.
- EUR 15/month: active sponsor.
- EUR 49/month: priority on issues and questions.
- EUR 199/month: company sponsor and light advisory support.
Early public package. APIs may evolve before 1.0.0.
- Build a local RAG knowledge base inside any repository.
- Analyze confidential datasets while keeping raw files and generated indexes local.
- Give Claude, Codex, Cursor, internal assistants, or other MCP-compatible tools the same private retrieval layer.
- Retrieve grounded local evidence through CLI, library calls, MCP tools, or bundled agent skills.
- Optionally create listenable MP3/WAV summaries or cited Markdown reports with bundled skills.
Mimir is not a hosted SaaS, not a remote vector database, and not a certified high-assurance system. For regulated or state-grade environments, pair it with encrypted disks, controlled machines, release verification, and an external security review.
Mimir is useful whenever source material should stay local but an AI agent still needs grounded context.
| Use case | Example questions |
|---|---|
| Understand a code repository | "Where is authentication implemented?", "What depends on this module?", "Summarize the payment flow." |
| Understand architecture | "What services exist?", "What are the data boundaries?", "Which components are risky to change?" |
| Analyze specifications | "What does the technical spec require?", "Which requirements are still unclear?", "Generate an implementation checklist." |
| Work through a request for proposal or tender | "What are the mandatory constraints?", "Which documents prove compliance?", "What risks should be clarified?" |
| Study courses and training material | "Summarize chapter three.", "Create revision questions.", "Compare these two concepts." |
| Analyze a book or long report | "Extract the main thesis.", "Find recurring arguments.", "Create a chapter-by-chapter brief." |
| Build an internal knowledge base | "What is the policy for incident review?", "Who owns this process?", "Which source says that?" |
| Prepare meetings or decisions | "Give me a one-page briefing.", "What is missing before deciding?", "List action items and evidence." |
| Ask questions over offline documents | "Which files mention local-only operation?", "What evidence supports this claim?" |
| Generate audio briefings | "Create a listenable high-quality or offline summary of the current dossier." |
| Generate Markdown reports | "Write a cited local report with findings, risks, next actions, and sources." |
- Node.js 20 or newer.
- pnpm, npm, yarn, or bun.
- A repository where generated local folders can be ignored by Git.
- No model runtime is required for the default embeddingProvider: "local-hash" mode.
- Optional semantic embeddings use Transformers.js with local model files under .mimir/models by default.
- Generated answers are intentionally outside Mimir core. Use Claude, Codex, OpenAI, a local model MCP server, or another trusted model runtime to synthesize from Mimir's cited context.
- Optional audio summaries use @jcode.labs/mimir-tts. For highest-quality MP3, install the external edge-tts CLI and render with --engine edge. For confidential or air-gapped content, use the Transformers.js WAV path with --engine transformers --offline; it does not require Python, ffmpeg, Piper, XTTS, or a local server.
- Optional Markdown reports use the bundled mimir-markdown-report skill and should stay under ignored .mimir/reports/ unless explicitly sanitized for sharing.
The package is public. Users do not need a JCode Labs account or npm token to install it.
With pnpm:
pnpm add -D @jcode.labs/mimir
With npm:
npm install --save-dev @jcode.labs/mimir
Install the standalone TTS package only when you want to use it directly:
pnpm add -D @jcode.labs/mimir-tts
Maintainer tokens are only needed to publish new versions.
Initialize a repository, install the portable agent kit, run readiness checks, and ingest documents when supported files are already present:
pnpm exec kb setup
kb setup creates or updates:
private/ # raw documents to ingest
.kb/config.json # local config
.kb/sources.txt # optional extra source paths
.mimir/skills/mimir/SKILL.md # portable agent skill
.mimir/skills/mimir-audio-summary/SKILL.md
.mimir/skills/mimir-markdown-report/SKILL.md
.mimir/mcp.json # generic MCP server config snippet
.mimir/claude-mcp-server.json # Claude Code add-json payload
.mimir/codex-mcp.toml # Codex config.toml snippet
.gitignore # ignores private/**, .kb/, and .mimir/
It detects the repository package manager and writes the MCP helper files with the right command:
pnpm exec kb serve-mcp, npx kb serve-mcp, yarn exec kb serve-mcp, or bunx kb serve-mcp.
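The mapping from package manager to MCP launch command can be pictured with a small sketch. The four commands come from the list above; the lockfile-based detection and the function names are assumptions for illustration, not necessarily how kb setup decides.

```typescript
// Hypothetical sketch: choose the MCP launch command for a repository.
// Lockfile-based detection is an assumption, not Mimir's confirmed logic.
type PackageManager = "pnpm" | "npm" | "yarn" | "bun";

const LOCKFILES: Array<[string, PackageManager]> = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["bun.lockb", "bun"],
  ["package-lock.json", "npm"],
];

// Commands copied from the list above.
const MCP_COMMANDS: Record<PackageManager, string> = {
  pnpm: "pnpm exec kb serve-mcp",
  npm: "npx kb serve-mcp",
  yarn: "yarn exec kb serve-mcp",
  bun: "bunx kb serve-mcp",
};

// Return the manager for the first known lockfile found; default to npm.
function detectPackageManager(rootFiles: string[]): PackageManager {
  const present = new Set(rootFiles);
  for (const [lockfile, manager] of LOCKFILES) {
    if (present.has(lockfile)) return manager;
  }
  return "npm";
}

// Resolve the command string that would be written into the MCP helper files.
function mcpCommand(rootFiles: string[]): string {
  return MCP_COMMANDS[detectPackageManager(rootFiles)];
}
```

Passing the file names found in the repository root, for example ["package.json", "pnpm-lock.yaml"], selects the pnpm command in this sketch.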
Check readiness at any time:
pnpm exec kb doctor
If files are missing from the index or stale, or the setup is incomplete, run:
pnpm exec kb doctor --fix
doctor --fix performs safe repairs: missing scaffolding, Git ignore entries, agent kit install, and
index rebuild when supported files are present and the privacy posture has no warnings.
Manual initialization is still available through kb init, which creates:
private/ # raw documents to ingest
.kb/config.json # local config
.kb/sources.txt # optional extra source paths
.gitignore # ignores private/**, .kb/, and .mimir/
Put supported files under private/:
private/
policy.md
meeting-notes.pdf
requirements.docx
Build the local index:
pnpm exec kb ingest
pnpm exec kb doctor
When the index is ready, kb doctor prints ready=true. kb ingest and kb audit also report
files that were discovered but not indexed because the type is unsupported, the file is too large,
or the file name looks like a secret/private key.
List skipped paths explicitly:
pnpm exec kb audit --unsupported
Retrieve exact passages:
pnpm exec kb search "approval for offline operation"
Return cited retrieval context for an agent or model:
pnpm exec kb ask "What evidence supports offline operation?"
Mimir does not synthesize an LLM answer. It returns cited local passages; your chosen agent or model does the writing around those passages.
With npm, use npx after installing the package:
npx kb setup
npx kb doctor
npx kb search "approval for offline operation"
Mimir has two embedding modes.
Use the default local-hash mode when you want a fully local, no-model smoke test or a dependency-light setup. Retrieval is lexical/hash-based, not semantic.
.kb/config.json:
{
"embeddingProvider": "local-hash"
}
Commands:
pnpm exec kb ingest
pnpm exec kb search "offline retrieval approval"
pnpm exec kb ask "What evidence supports offline operation?"
kb ask always returns cited retrieved passages instead of a generated synthesis. You can pass those
passages to any LLM or agent you trust.
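To give a feel for what lexical/hash-based retrieval means, here is a minimal sketch of a hashed bag-of-words embedding with cosine similarity. It is not Mimir's actual local-hash implementation: the vector dimension, the tokenizer, and the FNV-1a hash are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch of hash-based lexical retrieval.
// Dimension, tokenizer, and hash function are illustrative assumptions.
const DIMENSIONS = 256;

// 32-bit FNV-1a hash of a token.
function fnv1a(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Count tokens into hashed buckets, then L2-normalize the vector.
function hashEmbed(text: string): number[] {
  const vector = new Array<number>(DIMENSIONS).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    vector[fnv1a(token) % DIMENSIONS] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

// For normalized vectors, cosine similarity is the dot product.
function similarity(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}
```

Two texts score above zero only when they share tokens (or collide in a bucket), which is why this style of retrieval matches wording rather than meaning.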
Use Transformers.js embeddings when you want better semantic retrieval while keeping Mimir core free of an LLM server.
.kb/config.json:
{
"embeddingProvider": "transformers",
"embeddingModel": "mixedbread-ai/mxbai-embed-xsmall-v1",
"embeddingModelPath": ".mimir/models",
"transformersAllowRemoteModels": false
}
Commands:
pnpm exec kb ingest
pnpm exec kb ask "Which passages support offline operation?"
Keep transformersAllowRemoteModels false for confidential or air-gapped work and preload model
files into embeddingModelPath. Set it to true only when you explicitly allow Transformers.js to
download model files from Hugging Face.
Mimir ships with portable agent skills and a standard MCP server.
If kb setup was not used, install the agent kit into a repository:
pnpm exec kb install-skill
This creates:
.mimir/skills/mimir/SKILL.md
.mimir/skills/mimir-audio-summary/SKILL.md
.mimir/skills/mimir-markdown-report/SKILL.md
.mimir/mcp.json
.mimir/claude-mcp-server.json
.mimir/codex-mcp.toml
.mimir/README.md
Agents that support skill folders can load .mimir/skills/mimir/ for deep local RAG usage. Load
.mimir/skills/mimir-audio-summary/ only when an optional spoken summary is needed. Load
.mimir/skills/mimir-markdown-report/ when the user asks for a cited Markdown report, dossier,
audit memo, or planning note. Other agents can read the generated .mimir/README.md and use the MCP
config snippet.
Start the MCP server from the repository root:
pnpm exec kb serve-mcp
MCP tools exposed:
- mimir_status
- mimir_search
- mimir_ask
- mimir_audit
- mimir_security_audit
This MCP layer is the recommended way to let any compatible LLM or agent query the same local knowledge base. The LLM does not need to know about LanceDB or the raw file layout; it asks Mimir for ranked passages or cited context and uses the returned citations.
For Claude Code, from the target repository root:
pnpm exec kb setup
claude mcp add-json --scope local mimir "$(cat .mimir/claude-mcp-server.json)"
Claude Code provides the active project path to MCP servers through CLAUDE_PROJECT_DIR; Mimir uses
that value when serving MCP, so the same installed npm package can work inside each repository where
kb setup was run. Keep the MCP scope local unless you intentionally want to share the server
config.
For Codex, from the target repository root:
pnpm exec kb setup
cat .mimir/codex-mcp.toml
Copy the printed TOML into ~/.codex/config.toml or another trusted Codex config layer. The snippet
contains the repository cwd, so Codex can launch the Mimir MCP server from the right project.
For other MCP clients that cannot set cwd, set MIMIR_PROJECT_ROOT=/absolute/path/to/repository
when launching kb serve-mcp.
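The three ways a client can tell the server which repository to serve (MIMIR_PROJECT_ROOT, CLAUDE_PROJECT_DIR, or the working directory) can be sketched as a simple fallback chain. The precedence order shown here is an assumption for illustration; the text above does not state which value wins when several are set.

```typescript
// Hypothetical sketch of project-root resolution for an MCP server.
// The precedence (explicit override, then Claude Code, then cwd) is assumed.
type Env = Record<string, string | undefined>;

function resolveProjectRoot(env: Env, cwd: string): string {
  const candidates = [env.MIMIR_PROJECT_ROOT, env.CLAUDE_PROJECT_DIR, cwd];
  // Return the first candidate that is set and not blank.
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") return candidate;
  }
  return cwd;
}
```

In a Node.js process this would typically be called as resolveProjectRoot(process.env, process.cwd()).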
From a repository that already ran kb setup and has Mimir wired into the current agent, ask:
Use Mimir to audit the local evidence. First run mimir_status and mimir_audit. Then search for
"offline retrieval approval" and produce a cited Markdown report. Do not rely on memory if Mimir
does not contain enough evidence.
Agents that support skill folders should also load:
.mimir/skills/mimir/
.mimir/skills/mimir-markdown-report/
The Markdown report skill writes reports under .mimir/reports/ by default, which stays ignored by
Git.
Print the bundled skill path from the installed package:
pnpm exec kb skill-path
Mimir includes a plug-and-play text-to-speech path for listenable summaries.
For the same quality path as the global Voice Forge skill, install edge-tts and render MP3:
pnpm exec kb audio --doctor
pipx install edge-tts
pnpm exec kb audio /tmp/MIMIR-SUMMARY-project.txt \
--engine edge \
--out .mimir/audio/project-summary.mp3
The Edge path uses the online Microsoft Edge TTS service through the edge-tts CLI. Use it only
when sending the narration text to that service is acceptable. MP3 output requires explicit
--engine edge for this reason.
By default, kb audio uses the Transformers.js WAV path. For confidential or air-gapped work,
preload Transformers.js-compatible model files and render WAV offline:
pnpm exec kb audio /tmp/MIMIR-SUMMARY-project.txt \
--engine transformers \
--offline \
--model-path .mimir/models/tts \
--out .mimir/audio/project-summary.wav
Use the standalone package directly:
pnpm exec mimir-tts doctor --json
pnpm exec mimir-tts render /tmp/MIMIR-SUMMARY-project.txt \
--engine edge \
--out .mimir/audio/project-summary.mp3
The default standalone engine is transformers. The default Transformers.js model is
Xenova/mms-tts-fra. Override it with --model or MIMIR_TTS_MODEL.
The package code lives in node_modules or in this repository. Project data stays in the repository
where you run the CLI:
your-project/
private/ # raw documents to ingest
.kb/config.json # local config
.kb/sources.txt # optional extra source paths
.kb/storage/ # generated LanceDB index
.kb/access.log # metadata-only access log
The package never ships project documents. kb setup adds gitignore entries for .kb/,
.mimir/, and private/**. Generated indexes, agent files, and raw documents stay local to the
target repository.
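The metadata-only access log in the layout above records that a query happened without storing its text. A minimal sketch of that idea follows, assuming a SHA-256 digest and a JSON-lines format. The field names and the serialization are hypothetical, not Mimir's actual log schema.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape: field names are assumptions, not Mimir's schema.
interface AccessLogEntry {
  timestamp: string;
  action: "search" | "ask";
  queryHash: string;
  topK: number;
}

// Build an entry that stores a SHA-256 digest of the query, never the query text.
function accessLogEntry(
  action: AccessLogEntry["action"],
  query: string,
  topK: number,
  now: Date = new Date(),
): AccessLogEntry {
  const queryHash = createHash("sha256").update(query, "utf8").digest("hex");
  return { timestamp: now.toISOString(), action, queryHash, topK };
}

// Serialize as one JSON object per line, ready to append to a log file.
function toLogLine(entry: AccessLogEntry): string {
  return `${JSON.stringify(entry)}\n`;
}
```

The same query always produces the same digest, so repeated lookups can be correlated without the log revealing what was asked.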
Mimir is designed for private repositories and sensitive local evidence.
- Zero telemetry: no analytics or document content is sent to JCode Labs.
- No LLM generation in core: Mimir returns cited context for the agent/runtime you choose.
- Local-hash by default: no model runtime is required for the default retrieval path.
- Transformers.js remote model loading is disabled by default.
- Redaction before indexing: common secrets and identifiers are redacted before chunks are embedded and stored.
- Metadata-only access logs: query hashes and action metadata are logged, not raw queries.
- MCP is read-focused and bounded by mcpMaxTopK.
- Generated local state is ignored by Git.
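As an illustration of redaction before indexing, the sketch below replaces matches with labeled placeholders before text would be chunked and embedded. The two patterns are illustrative assumptions, not Mimir's built-in rule set, and the placeholder format is hypothetical.

```typescript
// Hypothetical redaction sketch: rules and placeholder format are assumptions.
interface RedactionRule {
  label: string;
  pattern: RegExp;
}

const EXAMPLE_RULES: RedactionRule[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "AWS_KEY", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
];

// Apply every rule in order, replacing each match with a labeled placeholder.
function redact(text: string, rules: RedactionRule[] = EXAMPLE_RULES): string {
  return rules.reduce(
    (output, rule) => output.replace(rule.pattern, `[REDACTED:${rule.label}]`),
    text,
  );
}
```

Because redaction runs before embedding in this sketch, neither the stored chunk text nor its vector is derived from the original secret.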
Run the strict security audit:
pnpm exec kb security-audit --strict
Remove the generated vector index:
pnpm exec kb destroy-index --yes
destroy-index does not securely erase SSD or copy-on-write storage. For strong deletion
guarantees, use encrypted storage and destroy the encryption key.
For air-gapped operation, release verification, secure deletion limits, and threat model details,
read SECURITY-HARDENING.md.
Mimir supports common text, document, data, config, log, and source-code files out of the box:
- Markdown: .md, .mdx
- Text: .txt, .text
- JSON: .json
- YAML: .yaml, .yml
- CSV/TSV: .csv, .tsv
- HTML: .html, .htm
- EPUB: .epub
- PDF: .pdf
- Office/OpenDocument: .docx, .pptx, .xlsx, .odt, .ods, .odp
- Rich text: .rtf
- Notebook: .ipynb
- Subtitles/calendars/mail: .vtt, .srt, .ics, .eml
- Line data and logs: .jsonl, .ndjson, .log
- XML feeds and documents: .xml, .rss, .atom, .svg
- Config and data files: .toml, .ini, .conf, .cfg, .properties, .sql
- Source code: .ts, .tsx, .mts, .cts, .js, .jsx, .mjs, .cjs, .py, .go, .rs, .java, .rb, .php, .cs, .c, .cpp, .h, .hpp, .css, .scss, .vue, .svelte, .astro, .sh, .bash, .ps1
- Documentation/code review text: .rst, .adoc, .tex, .diff, .patch, .markdown, .mdown
Custom UTF-8 text extensions can be enabled without changing code:
{
"includeExtensions": [".transcript", ".evidence"]
}
Or through:
KB_INCLUDE_EXTENSIONS=".transcript,.evidence" pnpm exec kb ingest
Images, scans, audio/video files, old proprietary Office binaries such as .doc, and other formats
that are not listed should be OCRed, transcribed, converted, or exported to text/PDF/HTML first.
Mimir intentionally avoids pretending that every binary format can be indexed safely without
extraction logic.
Secret-like files such as .env, .npmrc, private keys, and certificates are skipped by default.
Convert safe examples to a normal text format before ingestion.
Sensitive key/certificate-like files such as .pem, .key, .p12, .pfx, .jks, .gpg, and
common secret filenames such as .env, .npmrc, .netrc, and .pgpass are skipped by default even
if they sit under a source directory.
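A skip check built only from the extensions and file names listed above could look like the following sketch. The function name and matching rules are assumptions; Mimir's real detection may cover more cases.

```typescript
// Sketch of a secret-file skip check using the names listed above.
// The matching rules are assumptions; the real detection may be broader.
const SECRET_EXTENSIONS = new Set([".pem", ".key", ".p12", ".pfx", ".jks", ".gpg"]);
const SECRET_FILENAMES = new Set([".env", ".npmrc", ".netrc", ".pgpass"]);

function looksSecret(filePath: string): boolean {
  // Take the final path segment, accepting both / and \ separators.
  const name = (filePath.split(/[\\/]/).pop() ?? "").toLowerCase();
  if (SECRET_FILENAMES.has(name)) return true;
  const dot = name.lastIndexOf(".");
  // dot > 0 means the name has a real extension (dotfiles start at index 0).
  return dot > 0 && SECRET_EXTENSIONS.has(name.slice(dot));
}
```

The check depends only on the file name, so it applies wherever the file sits under a source directory.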
Default .kb/config.json:
{
"rawDir": "private",
"storageDir": ".kb/storage",
"sourcesFile": ".kb/sources.txt",
"accessLogPath": ".kb/access.log",
"embeddingModelPath": ".mimir/models",
"tableName": "chunks",
"embeddingProvider": "local-hash",
"embeddingModel": "mixedbread-ai/mxbai-embed-xsmall-v1",
"transformersAllowRemoteModels": false,
"redaction": {
"enabled": true,
"builtIn": true,
"patterns": []
},
"accessLog": true,
"mcpMaxTopK": 10,
"topK": 5,
"chunkSize": 1200,
"chunkOverlap": 150,
"maxFileBytes": 50000000,
"ingestConcurrency": 4,
"embeddingBatchSize": 32,
"includeExtensions": []
}
Environment overrides:
- KB_RAW_DIR
- KB_STORAGE_DIR
- KB_SOURCES_FILE
- KB_ACCESS_LOG_PATH
- KB_EMBEDDING_PROVIDER
- KB_EMBEDDING_MODEL
- KB_EMBEDDING_MODEL_PATH
- KB_TRANSFORMERS_ALLOW_REMOTE_MODELS
- KB_REDACTION_ENABLED
- KB_REDACTION_BUILT_IN
- KB_ACCESS_LOG
- KB_MCP_MAX_TOP_K
- KB_TOP_K
- KB_CHUNK_SIZE
- KB_CHUNK_OVERLAP
- KB_MAX_FILE_BYTES
- KB_INGEST_CONCURRENCY
- KB_EMBEDDING_BATCH_SIZE
- KB_INCLUDE_EXTENSIONS
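One way to picture how environment overrides layer on top of .kb/config.json is the sketch below, which handles three of the variables. The parsing rules (positive integers, comma-separated extension lists, invalid values ignored) are assumptions for illustration, not Mimir's documented behavior.

```typescript
// Hypothetical sketch of layering environment overrides over file config.
// Only three keys are shown; the parsing rules are assumptions.
interface PartialConfig {
  topK: number;
  chunkSize: number;
  includeExtensions: string[];
}
type Env = Record<string, string | undefined>;

// Parse a positive integer, returning undefined for missing or invalid input.
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const value = Number.parseInt(raw, 10);
  return Number.isInteger(value) && value > 0 ? value : undefined;
}

// Return a new config where valid environment values replace file values.
function applyEnvOverrides(config: PartialConfig, env: Env): PartialConfig {
  const result: PartialConfig = {
    ...config,
    includeExtensions: [...config.includeExtensions],
  };
  const topK = parsePositiveInt(env.KB_TOP_K);
  if (topK !== undefined) result.topK = topK;
  const chunkSize = parsePositiveInt(env.KB_CHUNK_SIZE);
  if (chunkSize !== undefined) result.chunkSize = chunkSize;
  if (env.KB_INCLUDE_EXTENSIONS !== undefined) {
    result.includeExtensions = env.KB_INCLUDE_EXTENSIONS.split(",")
      .map((ext) => ext.trim())
      .filter((ext) => ext.length > 0);
  }
  return result;
}
```

In this sketch the input config is never mutated, so the file values remain available for comparison or diagnostics.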
Mimir ships two CLIs:
- kb: the main local RAG, MCP, skills, security, and audio command.
- mimir-tts: the standalone text-to-speech renderer used by kb audio.
| Command | Use it when |
|---|---|
| kb setup | Initialize Mimir, install the agent kit, run doctor, and ingest when safe. |
| kb init | Create .kb/config.json, .kb/sources.txt, private/, and Git ignore rules. |
| kb doctor | Diagnose setup, index freshness, security warnings, and the next command to run. |
| kb doctor --fix | Create missing scaffolding, install skills/MCP config, and rebuild stale indexes when safe. |
| kb ingest | Parse source files, redact, chunk, embed, and rebuild the local LanceDB index. |
| kb audit | Check whether supported source files are missing from or stale in the index. |
| kb audit --unsupported | List files skipped because they are unsupported, too large, or secret-like. |
| kb search "<query>" | Retrieve ranked passages without asking an LLM to write an answer. |
| kb ask "<question>" | Return cited retrieval context for an agent or trusted model runtime. |
| kb security-audit | Inspect privacy posture: telemetry, providers, redaction, Git ignore, MCP. |
| kb status | Print raw config paths, provider settings, and indexed chunk count. |
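The chunk step of kb ingest, with the default chunkSize of 1200 and chunkOverlap of 150, can be illustrated with a sliding-window sketch. Treating both values as character counts is an assumption; Mimir's real chunker may count differently or respect sentence and heading boundaries.

```typescript
// Hypothetical character-based chunker with overlap.
// Interpreting chunkSize and chunkOverlap as character counts is an assumption.
function chunkText(text: string, chunkSize = 1200, chunkOverlap = 150): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  // Each new chunk starts chunkOverlap characters before the previous one ends.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.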
| Command | Use it when |
|---|---|
| kb install-skill | Copy portable agent skills and an MCP config snippet into .mimir/. |
| kb skill-path | Print the package-bundled skill path for agents that load installed package skills. |
| kb serve-mcp | Start the MCP stdio server for compatible agents. |

| Command | Use it when |
|---|---|
| kb destroy-index --yes | Delete generated .kb/storage index files. |
| kb security-audit --strict | Fail the command when privacy warnings are present. |

| Command | Use it when |
|---|---|
| kb audio --doctor | Check TTS runtime readiness. |
| kb audio <file> --engine transformers --offline --out .mimir/audio/name.wav | Render a confidential/offline WAV. |
| kb audio <file> --engine edge --out .mimir/audio/name.mp3 | Render a higher-quality online Edge MP3. |
| mimir-tts doctor --json | Inspect the standalone TTS package. |
| mimir-tts render <file> --offline --out .mimir/audio/name.wav | Render directly through the TTS package. |

| Option | Applies to | Meaning |
|---|---|---|
| --top-k <number> | search, ask | Number of passages to return. |
| --json | doctor, audit, security-audit, audio --doctor, mimir-tts doctor | Print machine-readable JSON. |
| --unsupported | audit | List skipped file paths and reasons. |
| --strict | security-audit | Exit non-zero when warnings exist. |
| --offline | audio, mimir-tts render | Disable remote model downloads and force the local Transformers.js path. |
| --allow-remote-models | audio, mimir-tts render | Explicitly allow model downloads for Transformers.js. |
| --engine edge | audio, mimir-tts render | Use online Edge TTS for MP3 output. |
import { ask, ingest, search } from "@jcode.labs/mimir"
await ingest({ rebuild: true })
const results = await search("vendor invoice status")
const answer = await ask("What documents support the project timeline?")
Use kb doctor first. It is the shortest path to the next useful action:
pnpm exec kb doctor
Use doctor --fix when you want Mimir to repair safe setup issues automatically:
pnpm exec kb doctor --fix
If the repository has not been set up yet, run:
pnpm exec kb setup
pnpm exec kb doctor
Commit only safe scaffolding if this is a real repository. Do not commit private documents,
.kb/storage, .mimir/, env files, or credentials.
Check that supported files exist under private/:
find private -maxdepth 2 -type f
pnpm exec kb ingest
pnpm exec kb doctor
If documents live elsewhere, add one path per line to .kb/sources.txt. Relative paths resolve from
the project root.
If files exist but are not supported yet, inspect the skipped inventory:
pnpm exec kb audit --unsupported
Then either convert them to a supported format, OCR/transcribe them, or add a safe custom UTF-8 text
extension with includeExtensions / KB_INCLUDE_EXTENSIONS.
The default local-hash provider is dependency-light and offline, but it is lexical/hash retrieval,
not semantic retrieval.
For better semantic retrieval, configure Transformers.js embeddings and preload the model when working offline:
{
"embeddingProvider": "transformers",
"embeddingModel": "mixedbread-ai/mxbai-embed-xsmall-v1",
"embeddingModelPath": ".mimir/models",
"transformersAllowRemoteModels": false
}
Switching providers requires a full re-ingest:
pnpm exec kb ingest
pnpm exec kb doctor
If the index is stale, run:
pnpm exec kb ingest
pnpm exec kb audit
Or let doctor perform the safe rebuild:
pnpm exec kb doctor --fix
Mimir rebuilds the index on each ingest. The --rebuild flag is accepted for compatibility, but
ingest already rebuilds.
If kb security-audit reports warnings, read the warning lines. Common causes:
- .kb/, .mimir/, or private/** are not ignored by Git.
- Redaction was disabled.
- Transformers.js remote model loading was enabled.
Run the safe repair command if Git ignore entries are missing:
pnpm exec kb doctor --fix
pnpm exec kb security-audit --strict
Requiring --engine edge for MP3 output is intentional. MP3 output uses online Edge TTS and requires explicit consent:
pnpm exec kb audio /tmp/summary.txt \
--engine edge \
--out .mimir/audio/summary.mp3
For confidential or offline work, use WAV:
pnpm exec kb audio /tmp/summary.txt \
--engine transformers \
--offline \
--out .mimir/audio/summary.wav
Install the external edge-tts CLI:
pipx install edge-tts
pnpm exec kb audio --doctor
Only use Edge TTS when sending narration text to the online service is acceptable.
Offline rendering requires model files to already exist under .mimir/models/tts or the path passed
with --model-path.
For a first online setup on non-sensitive text:
pnpm exec mimir-tts render /tmp/test.txt --out .mimir/audio/test.wav
Then reuse the cached files with:
pnpm exec mimir-tts render /tmp/test.txt --offline --out .mimir/audio/test.wav
Mimir can run retrieval without a model runtime. Some runtime dependencies remain because they own core features:
| Dependency | Why it remains |
|---|---|
| @huggingface/transformers | Optional local semantic embeddings and offline TTS. |
| LanceDB | Local vector storage and nearest-neighbor retrieval. |
| MCP SDK | MCP server for compatible agents. |
| fast-glob | Safe source-file discovery. |
| unpdf, html-to-text, yaml, fflate | Document parsing for PDF, HTML, YAML, Office/OpenDocument ZIP files. |
| commander, zod, picocolors | CLI, config validation, readable terminal output. |
Removing more dependencies is possible only by dropping features or replacing them with smaller
internal implementations. The current low-friction path is dependency-light at runtime for users who
choose local-hash, while preserving richer parsing, MCP support, and optional semantic embeddings.
This repository includes a synthetic example under
packages/mimir/examples/sovereign-rag-demo. It can
be used to test ingestion, retrieval, security-audit, and custom text extensions without using
private documents.
From a local checkout:
pnpm build
cd packages/mimir/examples/sovereign-rag-demo
node ../../dist/cli.js security-audit
node ../../dist/cli.js ingest
node ../../dist/cli.js search "offline retrieval approval"
node ../../dist/cli.js audit
The example uses the default local-hash retrieval mode, so it can run without downloading an embedding or chat model.
Install and validate the monorepo:
pnpm install
pnpm validate
Useful filtered commands:
pnpm --filter @jcode.labs/mimir test
pnpm --filter @jcode.labs/mimir-tts test
pnpm --filter @jcode.labs/mimir build
pnpm --filter @jcode.labs/mimir-tts build
packages/mimir/dist/ and packages/mimir-tts/dist/ are committed. After changing TypeScript
sources, run:
pnpm build
pnpm validate
CI checks that generated dist/ files match the source.
The root package is private and only orchestrates workspace tasks. npm publishing is handled by the
protected Publish npm GitHub Actions workflow, which publishes @jcode.labs/mimir-tts before
@jcode.labs/mimir.
Build from source:
git clone git@github.com:jcode-works/jcode-mimir.git
cd jcode-mimir
pnpm install
pnpm build
Use a local checkout in another repository:
pnpm add -D file:../jcode-mimir/packages/mimir
Create a local npm tarball:
pnpm build
pnpm --dir packages/mimir pack
- SECURITY-HARDENING.md: threat model, offline operation, release verification, and high-assurance deployment notes.
- docs/ux-dx-audit.md: current UX/DX findings, fixes, and remaining product risks.
MIT (c) Jean-Baptiste Thery.