lilbee

This is in active development. Cool things are coming, so bear with me please: #15. Feel free to use the latest published versions, but the entire project is actively being rebuilt and will be much more useful soon.

My motivation is a single executable that I can use for Q&A, programming, and keeping a locally curated encyclopedia like the one I used to have with Encarta 99, except this time I can talk to it and get responses without ever reaching the Internet. Having a local search engine is awesome, and being in full control of the inputs and outputs is even better.

Gain back some privacy while still having the awesome power of AI. Frontier AIs are awesome and local LLMs are no replacement, but local models certainly should be used much more than they currently are. Graphics cards are no longer just for gamers and crypto miners; thanks to lilbee, they have become very useful to my friends and me on a daily basis.

It's time the masses have something simple to use, fully local, and all in one process / install. Computers can be more than frontends for agents and web browsers, and it's time to take advantage of our hardware. This is my attempt at a solution to this problem. I think existing solutions have too many moving pieces, require too many heavy dependencies, and often rely on sidecar-style setups.

There's no simple way to make local AI immediately useful right in the terminal, and that was also a big motivation for me. I needed something terminal-first. A single executable that anyone can run is easier to install, easier to share, and more democratic.

Local AI at this time is for nerds, but it doesn't have to be, and I think this approach is the right direction towards that goal. There's a lot to this project, so check out the description below for what to expect.

For the GUI option, I'm releasing an Obsidian plugin on top of lilbee with feature parity to the terminal UI here: https://github.com/tobocop2/obsidian-lilbee

Interactively or programmatically chat with a database of documents using strictly your own hardware, completely offline. Augment any AI agent via MCP or shell: take a free model or even a frontier model and make it better. Talks to a wide range of data formats (see supported formats). Integrate document search into your favorite GUI using the built-in REST API, so there is no need for a separate web app when you already have a preferred GUI (see Obsidian plugin).

Why lilbee
Demos
Install
Quick start · Full usage guide
Agent integration
HTTP Server · API reference
Interactive chat
Supported formats

Why lilbee

Your hardware, your data — chat with your documents completely offline. No cloud, no telemetry, no API keys required
Make any model better — augment any AI agent via MCP or shell with hybrid RAG search. Take a free model or even a frontier model and make it leagues better at your data
Talks to everything — PDFs, Office docs, spreadsheets, images (OCR), ebooks, and 150+ code languages via tree-sitter
Bring your own GUI — built-in REST API means you can integrate document search into whatever tool you already use. No extra app needed (see Obsidian plugin)
Per-project databases — lilbee init creates a .lilbee/ directory (like .git/) so each project gets its own isolated index
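
The `.git/` comparison above suggests how a per-project index could be located from any subdirectory. The following is a minimal sketch of that idea, not lilbee's actual lookup code: the function name `find_project_index` is hypothetical, and walking up parent directories is an assumption borrowed from how Git finds its repository root.

```python
from pathlib import Path
from typing import Optional


def find_project_index(start: Path, marker: str = ".lilbee") -> Optional[Path]:
    """Return the nearest `marker` directory at or above `start`, or None.

    Hypothetical helper: mirrors the way Git searches upward for `.git/`.
    """
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / marker
        if candidate.is_dir():
            return candidate
    return None
```

Calling `find_project_index(Path.cwd())` from anywhere inside a project would return that project's `.lilbee/` directory under these assumptions, or `None` if no ancestor has one.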

Add files (lilbee add), then search or ask questions. Once indexed, search works without Ollama — agents use their own LLM to reason over the retrieved chunks.
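
For readers unfamiliar with the "hybrid" search mentioned above: hybrid retrieval usually means combining a keyword ranking with a vector-similarity ranking. This README does not say how lilbee merges the two, so the sketch below uses reciprocal rank fusion, a common generic technique, purely as an illustration. The function name and the constant `k = 60` are assumptions, not lilbee internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) to a chunk's score, so chunks that
    rank well in more than one list rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
```

Given one keyword ranking and one vector ranking, a chunk that appears near the top of both outranks a chunk that appears in only one.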

Demos


AI agent — lilbee search vs web search (detailed analysis)

opencode + minimax-m2.5-free, single prompt, no follow-ups. The Godot 4.4 XML class reference (917 files) is indexed in lilbee. The baseline uses Exa AI code search instead.

⚠️ Caution: minimax-m2.5-free is a cloud model — retrieved chunks are sent to an external API. Use a local model if your documents are private.

	API hallucinations	Lines
With lilbee (code · config)	0	261
Without lilbee (code · config)	4 (~22% error rate)	213

With lilbee — all Godot API calls match the class reference

Without lilbee — 4 hallucinated APIs (details)

If you spot issues with these benchmarks, please open an issue.

Vision OCR

Scanned PDF → searchable knowledge base

A scanned 1998 Star Wars: X-Wing Collector's Edition manual indexed with vision OCR (LightOnOCR-2), then queried in lilbee's interactive chat (qwen3-coder:30b, fully local). Three questions about dev team credits, energy management, and starfighter speeds — all answered from the OCR'd content.

See benchmarks, test documents, and sample output for model comparisons.

One-shot question from OCR'd content

The scanned Star Wars: X-Wing Collector's Edition guide, queried with a single lilbee ask command — no interactive chat needed.

Standalone

Interactive local offline chat

Note: Entirely local on a 2021 M1 Pro with 32 GB RAM.

Model switching via tab completion, then a Q&A grounded in an indexed PDF.

Code index and search

Add a codebase and search with natural language. Tree-sitter provides AST-aware chunking.
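
To illustrate what AST-aware chunking means, here is a small sketch that splits Python source at top-level function and class boundaries instead of at fixed character counts. It uses Python's standard-library `ast` module as a stand-in for tree-sitter, so it only handles Python and is not lilbee's chunker; the function name and the chunk dictionary layout are hypothetical.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            chunks.append(
                {
                    "name": node.name,
                    "start_line": start,
                    "end_line": end,
                    "text": "\n".join(lines[start - 1 : end]),
                }
            )
    return chunks
```

Because each chunk is a complete definition, a search hit returns a whole function or class rather than an arbitrary slice of text.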

JSON output

Structured JSON output for agents and scripts.

Hardware requirements

When used standalone, lilbee runs entirely on your machine — chat with your documents privately, no cloud required.

Resource	Minimum	Recommended
RAM	8 GB	16–32 GB
GPU / Accelerator	—	Apple Metal (M-series), NVIDIA GPU (6+ GB VRAM)
Disk	2 GB (models + data)	10+ GB if using multiple models
CPU	Any modern x86_64 / ARM64	—

Ollama handles inference and uses Metal on macOS or CUDA on Linux/Windows. Without a GPU, models fall back to CPU — usable for embedding but slow for chat.

Install

Prerequisites

Python 3.11+
Ollama — the embedding model (nomic-embed-text) is auto-pulled on first sync. If no chat model is installed, lilbee prompts you to pick and download one.
Optional (for scanned PDF/image OCR): Tesseract (brew install tesseract / apt install tesseract-ocr) or an Ollama vision model (recommended for better quality — see vision OCR)

First-time download: If you're new to Ollama, expect the first run to take a while — models are large files that need to be downloaded once. For example, qwen3:8b is ~5 GB and the embedding model nomic-embed-text is ~274 MB. After the initial download, models are cached locally and load in seconds. You can check what you have installed with ollama list.
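
If you want to check for installed models from a script rather than by eye, the output of `ollama list` can be parsed. This is a rough sketch that assumes the output is a whitespace-aligned table with a header row and the model name in the first column; the helper names are hypothetical.

```python
import subprocess


def parse_ollama_list(output: str) -> list[str]:
    """Extract model names from the text table printed by `ollama list`."""
    rows = [line for line in output.splitlines() if line.strip()]
    # Assumption: the first row is a header and the name is the first column.
    return [row.split()[0] for row in rows[1:]]


def installed_models() -> list[str]:
    """Run `ollama list` and return the installed model names."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return parse_ollama_list(result.stdout)
```

For example, `"nomic-embed-text:latest" in installed_models()` would tell you whether the embedding model has already been downloaded, assuming Ollama reports it under that tag.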

Install

pip install lilbee        # or: uv tool install lilbee

Development (run from source)

git clone https://github.com/tobocop2/lilbee && cd lilbee
uv sync
uv run lilbee

Quick start

See the usage guide.

Agent integration

lilbee can serve as a local retrieval backend for AI coding agents via MCP or JSON CLI. See docs/agent-integration.md for setup and usage.

HTTP Server

lilbee includes a REST API server so you can integrate document search into any GUI or tool:

lilbee serve                          # start on a random port (written to <data_dir>/server.port)
lilbee serve --port 8080              # or pick a fixed port
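
When the server picks a random port, a client first has to discover it. The sketch below reads the port file and builds a search URL. Several details are assumptions rather than documented behavior: that `server.port` contains the port as a plain integer, that the server listens on `127.0.0.1`, and that `/api/search` accepts a query parameter named `q`. The helper names are hypothetical, so check the server's own API docs for the real request shape.

```python
from pathlib import Path
from urllib.parse import urlencode


def read_server_port(data_dir: Path) -> int:
    """Read the port number that `lilbee serve` wrote to <data_dir>/server.port."""
    # Assumption: the file holds just the port number, possibly with a newline.
    return int((data_dir / "server.port").read_text().strip())


def build_search_url(port: int, query: str, host: str = "127.0.0.1") -> str:
    """Build a URL for the /api/search endpoint."""
    # Assumption: the query parameter is called "q"; the real name may differ.
    return f"http://{host}:{port}/api/search?{urlencode({'q': query})}"
```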

Endpoints include /api/search, /api/ask, /api/chat (with streaming SSE variants), /api/sync, /api/add, and /api/models. When the server is running, interactive API docs are available at /schema/redoc. See the API reference for the full OpenAPI schema.
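
The streaming variants use Server-Sent Events (SSE), a line-based text format in which each event is a run of `data:` lines terminated by a blank line. The sketch below parses that wire format from any iterable of lines, such as an HTTP response body read line by line. It shows only the generic SSE framing; what lilbee puts inside each `data:` payload is not described here, and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event left unterminated when the stream ends is discarded.
```

Multiple `data:` lines within one event are joined with newlines, which matters if a payload spans several lines.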

Interactive chat

Running lilbee or lilbee chat enters an interactive REPL with conversation history, streaming responses, and slash commands:

Command	Description
`/status`	Show indexed documents and config
`/add [path]`	Add a file or directory (tab-completes paths)
`/model [name]`	Switch chat model — no args opens a curated picker; with a name, switches directly or prompts to download if not installed (tab-completes installed models)
`/vision [name\|off]`	Switch vision OCR model — no args opens a curated picker; with a name, prompts to download if not installed; `off` disables (tab-completes catalog models)
`/settings`	Show all current configuration values
`/set <key> <value>`	Change a setting (e.g. `/set temperature 0.7`)
`/version`	Show lilbee version
`/reset`	Delete all documents and data (asks for confirmation)
`/help`	Show available commands
`/quit`	Exit chat

Slash commands and paths tab-complete. A spinner shows while waiting for the first token from the LLM. Background sync progress appears in the toolbar without interrupting the conversation.

Supported formats

Text extraction powered by Kreuzberg, code chunking by tree-sitter. Structured formats (XML, JSON, CSV) get embedding-friendly preprocessing. This list is not exhaustive — Kreuzberg supports additional formats beyond what's listed here.
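
As an illustration of what embedding-friendly preprocessing can look like for structured data, the sketch below flattens nested JSON into one `path: value` line per leaf, so that each value keeps its context when embedded. This is a guess at the general idea, not lilbee's preprocessing; the function name and output format are hypothetical.

```python
import json


def flatten_json(value, prefix: str = "") -> list[str]:
    """Flatten nested JSON data into 'path: value' lines, one per leaf."""
    if isinstance(value, dict):
        lines = []
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            lines.extend(flatten_json(child, path))
        return lines
    if isinstance(value, list):
        lines = []
        for index, child in enumerate(value):
            lines.extend(flatten_json(child, f"{prefix}[{index}]"))
        return lines
    # Leaf value: string, number, boolean, or None.
    return [f"{prefix}: {value}"]


doc = json.loads('{"name": "lilbee", "meta": {"license": "MIT"}}')
print("\n".join(flatten_json(doc)))
```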

Format	Extensions	Requires
PDF	`.pdf`	—
Scanned PDF	`.pdf` (no extractable text)	Tesseract (auto, plain text) or Ollama vision model (recommended — preserves tables, headings, and layout as markdown)
Office	`.docx`, `.xlsx`, `.pptx`	—
eBook	`.epub`	—
Images (OCR)	`.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.webp`	Tesseract
Data	`.csv`, `.tsv`	—
Structured	`.xml`, `.json`, `.jsonl`, `.yaml`, `.yml`	—
Text	`.md`, `.txt`, `.html`, `.rst`	—
Code	`.py`, `.js`, `.ts`, `.go`, `.rs`, `.java` and 150+ more via tree-sitter (AST-aware chunking)	—

See the usage guide for OCR setup and model benchmarks.

License

MIT
