> **Note:** This doc is generated by AI because what else can we use.
ledoxide is a specialized, client-polling HTTP server that implements a Vision-Language Model (VLM) based bookkeeping and expense-extraction workflow. Its primary goal is to process images of receipts, invoices, or screenshotted transaction records (e.g., social media purchase notifications) and autonomously extract structured billing data: descriptions (notes), exact monetary amounts, and an appropriate expense category.
The application is containerized and readily available via Docker.
Because ledoxide runs heavy Vision-Language Models through llama.cpp, running the container with NVIDIA GPU support (`--gpus all`) is highly recommended.
```shell
docker run -p 3100:3100 \
  --gpus all \
  -e HF_TOKEN="your_huggingface_token" \
  -e AUTH_KEY="your_secret_bearer_token" \
  -v $HOME/.cache/huggingface:/huggingface \
  zhufucdev/ledoxide:latest
```

| Variable | Description |
|---|---|
| `AUTH_KEY` | Used as the Bearer token to protect endpoints. If not provided via flag or env var, a random key is generated and logged on startup. |
| `HF_HOME` | Directory for the Hugging Face cache (defaults to `/huggingface` inside the Docker image). Crucial for caching the heavy LLM/VLM models between container restarts. |
| `HF_TOKEN` | Required for downloading gated models from Hugging Face, and useful for avoiding rate limits. |
| `HF_ENDPOINT` | Sets a custom Hugging Face proxy endpoint. |
| `RUST_LOG` | Set to `debug` to enable verbose logging, including the underlying llama.cpp inference logs. |
When running natively or overriding the Docker command, the following arguments are supported:
- `-b, --bind <BIND>`: The address to bind to (default: `127.0.0.1:3100`).
- `-c, --categories <CATEGORIES>`: A list of valid categories for expenses (defaults: Groceries, Transport, Rent, Entertainment, Shopping, Drink, Food).
- `--max-concurrency <N>`: Maximum number of models to run simultaneously (default: 4).
- `--large-model`: Instructs the server to use a larger vision model configuration.
- `--model-timeout-minutes <MINS>`: Time before an inactive model is evicted from RAM/VRAM to save resources (default: 5).
- `--offline`: Prevents reaching out to Hugging Face; forces the use of locally cached models only.
The server exposes a simple REST API:
- `GET /`: Returns the server package name and version string.
- `POST /create_task`: Accepts a `multipart/form-data` payload containing an image file (key: `image`) and, optionally, `lm_sampling` and `vlm_sampling` JSON parameters. Requires an `Authorization: Bearer <AUTH_KEY>` header. Returns a JSON `TaskControlBlock` containing a unique task ID, indicating the task is pending.
- `GET /get_task/{task_id}`: Checks the status of a specific task by ID. Requires an `Authorization: Bearer <AUTH_KEY>` header. Returns the task state (`pending`, `running`, or `finished`). If `finished`, the response includes the extracted structured data: `notes`, `amount`, and `category`.
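The create-then-poll flow can be sketched with a minimal stdlib-only Python client. Note the assumptions here: the base URL points at a local deployment, the `TaskControlBlock` is assumed to expose the task ID under an `id` field, and the task state is assumed to live in a `state` field; the docs above confirm only the endpoints, the `image` form key, and the Bearer header.

```python
# Minimal polling client for the ledoxide task API (a sketch, not the
# official client). Field names "id" and "state" are assumptions.
import json
import time
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:3100"        # assumed local deployment
AUTH_KEY = "your_secret_bearer_token"

def encode_multipart(image_bytes: bytes, filename: str = "receipt.jpg"):
    """Build a multipart/form-data body with the image under the "image" key."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: image/jpeg\r\n\r\n"
    ).encode() + image_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def create_task(image_bytes: bytes) -> dict:
    """POST the image to /create_task and return the TaskControlBlock JSON."""
    body, content_type = encode_multipart(image_bytes)
    req = urllib.request.Request(
        f"{BASE_URL}/create_task",
        data=body,
        headers={
            "Authorization": f"Bearer {AUTH_KEY}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def poll_task(task_id: str, interval: float = 2.0) -> dict:
    """Poll /get_task/{task_id} until the task reports a finished state."""
    req = urllib.request.Request(
        f"{BASE_URL}/get_task/{task_id}",
        headers={"Authorization": f"Bearer {AUTH_KEY}"},
    )
    while True:
        with urllib.request.urlopen(req) as resp:
            task = json.load(resp)
        if task.get("state") == "finished":
            return task
        time.sleep(interval)

# Usage (assumes a running server and a local receipt image):
#   with open("receipt.jpg", "rb") as f:
#       tcb = create_task(f.read())
#   result = poll_task(tcb["id"])
```

Polling with a short sleep matches the server's client-polling design: the task is queued immediately, and inference latency is absorbed by the `/get_task` loop rather than a long-lived HTTP request.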
- **Architecture**: The application is written in Rust, leveraging `tokio` for its async runtime and `axum` for HTTP routing.
- **Inference Engine**: It uses `llama-cpp-2` for efficient local inference and `hf-hub` for model distribution management. It heavily utilizes LLM grammar constraints (`llguidance` and `.lark` schema files) to strictly enforce output formatting, ensuring numbers are extracted cleanly and categories strictly match the configured list.
- **Model Pipeline**: The standard pipeline involves a vision model (defaulting to `Qwen3-VL-4B-Instruct-GGUF`) that extracts a highly detailed text description of the uploaded image. This description is then piped into a smaller text model (defaulting to `gemma-3-1b-it-qat-q4_0-gguf`) that runs targeted prompts to extract the summary notes, numeric amount, and category.
- **Model Memory Timeout**: To preserve system RAM and GPU VRAM, ledoxide wraps its loaded models in a `TimedModel` construct. If a model remains unused for the configurable timeout period (default: 5 minutes), it is automatically dropped from memory and seamlessly reloaded from disk on the next request.
- **Task Swapping**: To prevent the server's memory from bloating with historical task data over long uptimes, the internal `Scheduler` implements an on-disk swap queue. When the in-memory finished queue exceeds `--max-memory-size` (default: 468,000 items), older finished tasks are serialized using `postcard` and flushed to a temporary swap file on disk. The `/get_task` endpoint streams over both active memory and the disk swap seamlessly.
- **Hugging Face Mount**: To avoid redownloading multi-gigabyte `.gguf` files, it is vital to mount the `HF_HOME` cache to persistent host storage when using Docker.
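The task-swapping behaviour can be illustrated with a simplified, hypothetical sketch. The actual `Scheduler` is written in Rust and serializes with `postcard`; here a Python class with JSON lines in a temp file stands in, and the class and method names are invented for illustration only.

```python
# Illustration of a bounded finished-task queue that swaps overflow to disk,
# loosely modeled on ledoxide's Scheduler. All names here are hypothetical;
# JSON stands in for the postcard serialization used by the real server.
import json
import tempfile
from collections import OrderedDict

class SwappingQueue:
    def __init__(self, max_memory_size: int):
        self.max_memory_size = max_memory_size
        self.memory: "OrderedDict[str, dict]" = OrderedDict()
        self.swap = tempfile.NamedTemporaryFile(mode="w+", suffix=".swap")
        self.swap_index: dict = {}  # task_id -> byte offset in the swap file

    def push(self, task_id: str, task: dict) -> None:
        self.memory[task_id] = task
        # Once the in-memory queue exceeds its bound, flush the oldest
        # finished tasks to the swap file and remember their offsets.
        while len(self.memory) > self.max_memory_size:
            old_id, old_task = self.memory.popitem(last=False)
            self.swap.seek(0, 2)              # append at end of file
            self.swap_index[old_id] = self.swap.tell()
            self.swap.write(json.dumps(old_task) + "\n")
            self.swap.flush()

    def get(self, task_id: str):
        # Mirror /get_task: check active memory first, then the disk swap.
        if task_id in self.memory:
            return self.memory[task_id]
        offset = self.swap_index.get(task_id)
        if offset is None:
            return None
        self.swap.seek(offset)
        return json.loads(self.swap.readline())
```

For example, with `max_memory_size=2`, pushing a third finished task evicts the first to disk, yet `get` still returns it transparently, which is the property the real `/get_task` endpoint preserves across memory and swap.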
- **Task Removal**: Finished tasks remain in memory or in the on-disk swap file indefinitely. There is currently no API to delete or acknowledge a task to free its disk footprint once retrieved, so over extreme uptimes on busy servers the swap file could grow continuously.
- **CUDA Optimization**: Depending on your GPU architecture, the Docker image may log a warning about an unsupported `UPSCALE` operator in the `MTL0` backend during CLIP execution, though inference typically falls back gracefully. Flash Attention is enabled by default to reduce the memory footprint.
- **Gated Model**: Attempts to download Gemma 3 will fail without a valid `HF_TOKEN`. Visit a Gemma repository on Hugging Face to check whether you have been granted access.