renAI is a Python tool that automatically renames and organizes your digital files using Large Language Models (LLMs) and Optical Character Recognition (OCR). It extracts meaningful metadata from your files using either text extraction or multimodal AI vision, creating consistent, searchable filenames.
Last updated: 2026-04-16
Managing large collections of digital files can be overwhelming when they arrive with cryptic or inconsistent names. renAI solves this by intelligently analyzing your files and generating descriptive, organized filenames.
Two Powerful Approaches:
- Text Extraction Mode: Reads document content (PDF, EPUB, DOCX, etc.) to identify titles, authors, and publication years using traditional text-based LLMs
- Vision Mode: Uses vision-capable LLMs to "see" and understand documents and images. If text extraction fails, falls back to vision model instead of OCR.
Whether you're dealing with scanned PDFs, digital documents, or photos, renAI adapts to give you the best results.
The tool supports both cloud-based LLMs (DeepInfra, OpenRouter, OpenAI) and local inference (Ollama, LM Studio), giving you flexibility between convenience and privacy.
Documents (Text Extraction Mode):
Title -- Subtitle -- Edition -- Author -- (Year).ext
Example: Introduction to Machine Learning -- 3rd Edition -- Andrew Ng -- (2018).pdf
Documents (Vision Mode):
Title -- Author -- (Year) -- (Language).ext
Example: Deep Learning Research -- Ian Goodfellow -- (2016) -- (en).pdf
Images (Vision Mode):
Description -- [Date] -- (Language).ext
Example: Sunset -- [2024-07-15] -- (en).jpg
| Feature | Description |
|---|---|
| Multi-Format Support | Processes PDF, EPUB, MOBI, DOCX, PPTX, XLSX, TXT, MD, and image files (JPG, PNG, HEIC, etc.) |
| Multiple PDF Extractors | Choose from PyMuPDF, pdfplumber, pypdf, or pypdfium2 |
| AI-Driven Metadata | Uses LLMs to extract titles, authors, years, and categories |
| Vision Mode | Tries text extraction first; falls back to vision-capable LLM for PDFs if text extraction fails (instead of OCR) |
| Image Renaming | Rename image files using AI vision analysis |
| Advanced OCR | Tesseract integration with Enhanced Mode for low-quality scans |
| mutool Support | Extracts high-quality images from fixed-layout PDFs |
| Smart Caching | SHA256-based caching system (separate layers for both text and metadata). |
| AsyncIO-First | Fully asynchronous architecture using asyncio for non-blocking I/O and high concurrency |
| Async RPM Limiting | Integrated AsyncRateLimiter ensures provider compliance across concurrent tasks |
| Ruff Powered | ultra-fast linting and formatting for consistent code style |
| SSoT Architecture | Single Source of Truth for metadata using strictly-typed Pydantic models |
| Modular Prompts | Centralized, reusable prompt components with dynamic JSON schema injection |
| Unified Refinement | Standardized second-pass 'Senior Editor' logic for both text and vision modes |
| Token Optimization | Context-free refinement pass reduces secondary API costs by ~90% |
| Custom Categories | Define your own categories per language in categories.toml |
rename: Renames files based on extracted metadataorganize: Moves renamed files into category foldersbenchmark: Compares PDF extraction library performanceevaluate: Tests multiple providers and models for accuracy--fallback-mode ocr: Tries text extraction first. If it fails, falls back to Tesseract OCR (default behavior).--fallback-mode vision: Tries text extraction first (no OCR). If it fails, falls back to vision-capable LLM for PDFs.--fallback-mode move: Moves files where text extraction fails toNeeds_Scan/folder.--rename-images: Renames image files using vision-capable LLM. Also processes non-image files with text extraction + selected fallback.
-
Python 3.12+
-
Tesseract OCR - Download for Windows or install via package manager:
# Ubuntu/Debian sudo apt-get install tesseract-ocr # macOS brew install tesseract
-
mutool (Optional) - Part of MuPDF
# Clone the repository
git clone https://github.com/ozgurulukir/renAI.git
cd renAI
# Install using uv (creates 'renai' command)
uv sync --all-extras
# Initialize renAI (Interactive Wizard)
# This will guide you through provider setup and create necessary config files
renai initrenAI uses an interactive wizard for first-time setup. Run the following command:
renai initThis will create your configuration files in the appropriate OS directory:
- Windows:
%LOCALAPPDATA%\renAI - Linux:
~/.config/renAI - macOS:
~/Library/Application Support/renAI
The wizard helps you configure:
- LLM Provider: DeepInfra (default), OpenRouter, OpenAI, or Custom
- API Keys: Stored securely in
secrets.toml - General Settings: Concurrent workers, fallback strategies, etc.
renAI uses a robust, layered configuration system:
- User Overrides:
settings.tomlin your user config directory. - Physical Defaults:
settings.default.toml(auto-generated reference). - Internal Defaults: Hardcoded baseline settings for zero-setup operation.
settings.toml: Customize application behavior and override provider defaults.secrets.toml: Sensitive API keys. Never share this file!providers.toml: Register custom LLM providers or override endpoints.categories.toml: Define your own categories for organization (--mode organize).eval_models.toml: List models to test with theevaluatecommand.
Get keys from: DeepInfra | OpenRouter | OpenAI
You can also configure renAI using environment variables (prefixed with RENAI_):
RENAI_PROVIDER: e.g.,openaiRENAI_MODEL: e.g.,gpt-4oRENAI_WORKERS: e.g.,8
renAI follows a "Zero-Code Defaults" philosophy. You don't need to edit configuration files to start. Baseline factory defaults are included within the package, allowing it to "just work" after simple initialization.
Override options are available via:
- Environment variables (prefixed with
RENAI_) settings.tomlin your user config directorysettings.default.toml(Global baseline)- CLI flags (highest priority)
# Rename all files in directory
uv run renai process "C:/Path/To/Books" --mode rename# Rename and organize into category folders
renai process "C:/Path/To/Books" --mode organizeEvaluate multiple LLM providers and models against your sample files to find the best performer. Results are saved to evaluation.log in your app data directory.
# Evaluate ALL providers and models defined in eval_models.toml
renai evaluate "C:/Path/To/Samples"
# Evaluate only a specific provider
renai evaluate "C:/Path/To/Samples" --provider deepinfra
# Evaluate with custom worker count
renai evaluate "C:/Path/To/Samples" --workers 8Vision mode implements a "smart fallback" strategy: it tries text extraction first (which is faster and cheaper). If extraction fails, it falls back to a vision-capable LLM to process the PDF as an image.
# Process PDFs with vision model (no OCR) - uses default provider/model
renai process "C:/Path/To/Books" --fallback-mode vision
# Use specific vision model
renai process "C:/Path/To/Books" --fallback-mode vision --provider openrouter --model openai/gpt-4o
# DeepInfra with Llama Vision
renai process "C:/Path/To/Books" --fallback-mode vision --provider deepinfra --model meta-llama/Llama-3.2-90B-Vision-Instruct
# OpenAI GPT-4o
renai process "C:/Path/To/Books" --fallback-mode vision --provider openai --model gpt-4oRename image files using AI vision analysis. Requires a vision-capable model:
# Rename images with vision model
renai process "C:/Path/To/Photos" --rename-images --provider openai --model gpt-4o
# With EXIF date (if available)
renai process "C:/Path/To/Photos" --fallback-mode vision --rename-images --provider openai --model gpt-4o --exif
# With DeepInfra
renai process "C:/Path/To/Images" --fallback-mode vision --rename-images --provider deepinfra --model Qwen/Qwen2-VL-72B-InstructProcess both PDFs and images in the same folder:
| Parameter | PDF/Supported | Image |
|---|---|---|
--fallback-mode vision |
Normal β Vision fallback | βοΈ Skipped |
--fallback-mode vision --rename-images |
Normal β Vision fallback | π Vision rename |
--rename-images |
Normal β OCR fallback | π Vision rename |
# PDFs with vision, images with vision
renai process "C:/Path/To/Mixed" --fallback-mode vision --rename-images --provider openai --model gpt-4o
# Only rename images (PDFs with OCR)
renai process "C:/Path/To/Mixed" --rename-images --provider openai --model gpt-4o
# Only process PDFs with vision (skip images)
renai process "C:/Path/To/PDFs" --fallback-mode vision --provider openai --model gpt-4o# Enhanced OCR with image preprocessing
renai process "C:/Path/To/Books" --ocr-enhanced --use-mutooluv run renai process "C:/Path/To/Books" --mode rename --provider deepinfra
uv run renai process "C:/Path/To/Books" --mode rename --provider openrouter
uv run renai process "C:/Path/To/Books" --mode rename --provider openai
Use the custom provider to connect to local inference servers like Ollama or LM Studio. These tools provide OpenAI-compatible APIs.
β οΈ Important: Not all models are capable of structured output, particularly LLMs below 7B parameters. Check the model card README if you are unsure if the model supports structured output.renAI requires JSON output from the LLM for reliable metadata extraction. Models with fewer than 7B parameters may struggle with structured output formatting.
# Set environment variables (Windows PowerShell)
$env:CUSTOM_API_BASE_URL = "http://localhost:11434/v1" # Ollama default
$env:CUSTOM_MODEL = "llama3" # Your model name
# Run with custom provider
renai process "C:/Path/To/Books" --mode rename --provider custom
# Or override model directly
renai process "C:/Path/To/Books" --mode rename --provider custom --model mistralWhen running local models on consumer hardware, or to save tokens, you can control how much text is extracted and sent to the LLM:
# Reduce text length to 5000 characters (default: 10000)
renai process "C:/Path/To/Books" --mode rename --provider custom --text-length 5000
# Increase for better context if your model can handle it
renai process "C:/Path/To/Books" --mode rename --provider custom --text-length 20000
# Regular mode: Stop reading after 5 pages, no skipping
renai process "C:/Path/To/Books" --mode rename --text-extraction regular --extraction-pages 5| Option | Description | Default |
|---|---|---|
--text-length |
Target text length sent to LLM | 10000 |
--text-extraction |
Extraction chunking logic (mixed or regular) |
regular |
--extraction-pages |
Target pages to process from the document | 10 |
| Variable | Description | Default |
|---|---|---|
CUSTOM_API_BASE_URL |
Base URL of your local API server | http://localhost:8000/v1 |
CUSTOM_MODEL |
Model name to use | llama3 |
CUSTOM_API_KEY |
API key (usually not needed for local) | not-required |
| Tool | URL |
|---|---|
| Ollama | http://localhost:11434/v1 |
| LM Studio | http://localhost:1234/v1 |
| LocalAI | http://localhost:8080/v1 |
| Text Generation WebUI | http://localhost:5001/v1 |
# DeepInfra with Qwen model
renai process "C:/Path/To/Books" --mode rename --model Qwen/Qwen2.5-72B-Instruct
# OpenRouter with Gemini
renai process "C:/Path/To/Books" --mode rename --provider openrouter --model google/gemini-2.0-flash-exp:free# Increase concurrent workers
renai process "C:/Path/To/Books" --mode rename --workers 8# Bypass cache and re-extract
renai process "C:/Path/To/Books" --mode rename --update-metadataCompare the performance and accuracy of different PDF extraction libraries on your documents.
renai process "C:/Path/To/Books" --benchmarkThe evaluate command performs a competitive test:
- It takes the first few files from your directory.
- It processes them using every model listed in your
eval_models.toml. - It logs the extracted metadata and a confidence score for each attempt.
Configuration:
Models are loaded from eval_models.toml (User > Default > Internal).
Example eval_models.toml:
[deepinfra]
models = ["meta-llama/Llama-3.2-90B-Vision-Instruct", "meta-llama/Llama-3.1-405B-Instruct"]
[openai]
models = ["gpt-4o", "gpt-4o-mini"]# Clear all cache (text, metadata, and ocr_debug)
renai cache
# Clear cache without confirmation
renai cache --yesrenAI supports vision-capable LLMs that can process PDFs and images as visual content. By default, it tries text extraction first to save costs, falling back to vision only if necessary.
| Provider | Model | Best For |
|---|---|---|
| OpenAI | gpt-4o |
Best accuracy, reliable |
| OpenAI | gpt-4o-mini |
Fast, cost-effective |
| OpenRouter | openai/gpt-4o |
Flexible pricing |
| OpenRouter | google/gemini-1.5-flash |
Free tier available |
| DeepInfra | meta-llama/Llama-3.2-90B-Vision-Instruct |
Open source, powerful |
| DeepInfra | Qwen/Qwen2-VL-72B-Instruct |
Excellent value |
| Custom | llava, qwen-vl |
Local inference |
| Feature | Vision Mode (--vision) |
OCR Mode (default) |
|---|---|---|
| Speed | Faster for text PDFs | Slower |
| Accuracy | Higher for clear text | Depends on scan quality |
| Scanned Docs | May struggle | Optimized for scans |
| Languages | Limited to model | Turkish+English optimized |
| API Cost | Image tokens | Text tokens |
When to use Vision Mode:
- PDFs where text extraction might fail
- When you don't have Tesseract installed
- When OCR is too slow or inaccurate
- When you want better results on formatted documents
When to use OCR:
- Scanned documents
- Low-quality images
- Multi-language documents
- When you need Turkish character support
β οΈ Always backup your files before running renAI inrenameororganizemode.
- The tool renames and moves files
- Safety checks are included, but unexpected issues can occur
- Review the naming convention before applying to your entire collection
- Test with a small directory first
When using --mode organize:
- Files from all subdirectories are moved to category folders at the root
- Empty subdirectories may remain after files are moved
- Category folders are created at the root level, not within subdirectories
- Running multiple times will re-process files
| Limitation | Description |
|---|---|
| Text Length | Configurable via --text-length (e.g., 10000 chars) |
| API Costs | Each file processed sends text to an LLM API |
| Rate Limits | Providers may enforce rate limits |
| OCR Quality | Scanned documents require good quality scans |
| Language Support | Optimized for Turkish and English |
| Vision Mode | PDF only (images supported with --rename-images) |
--fallback-mode move: Moves files with insufficient text toNeeds_Scanfolder (skips fallback processing)--move-scan-failures: Moves files that fail both standard and fallback extraction toScan_Failuresfolder
- Regular Backups: Always keep a backup of your original files
- Start Small: Test with 5-10 files first to verify quality
- Monitor Costs: Set up usage alerts with your LLM provider
- Use Caching: Avoid
--update-metadataunless necessary - Choose OCR Wisely: Use
--ocr-enhanced --use-mutoolonly for scanned documents - Vision for PDFs: Use
--fallback-mode visionfor text-based PDFs (faster, more accurate) - Compare Providers: OpenRouter often offers competitive pricing
- Clear Cache: Use
renai cachewhen switching models or settings - Protect Privacy: For sensitive or private documents, use a local LLM (Ollama, LM Studio) with the
customprovider. This keeps your document content on your machine rather than sending it to cloud services.
renAI uses an intelligent caching system that includes provider-model awareness:
| Cache Type | Key Components | Invalidation |
|---|---|---|
| Text Cache | File hash + OCR settings | File Content: Cache is invalidated when the SHA256 hash of the file changes. |
| Metadata Cache | File hash + provider + model + schema validation | When file content, provider, model, or schema changes |
How it works:
- Each file is identified by its SHA256 hash
- Metadata cache keys include a hash of the provider-model pair
- Switching providers or models automatically invalidates old cache
- Use
--update-metadatato force fresh extraction regardless of cache
Example:
# First run with DeepInfra - caches metadata
renai process "C:/Books" --provider deepinfra
# Same files with OpenRouter - NEW cache entry (old one preserved)
renai process "C:/Books" --provider openrouter
# Same provider/model - uses cached metadata
renai process "C:/Books" --provider deepinfra
# Force refresh - bypasses cache
renai process "C:/Books" --provider deepinfra --update-metadataCache Location: Managed automatically by the OS via platformdirs (e.g., ~/.cache/renAI on Linux, ~/Library/Caches/renAI on macOS, %LOCALAPPDATA%\renAI\Cache on Windows).
Clear Cache:
# Clear all cache (metadata, text, and ocr_debug)
renai cache| Scenario | Recommended Provider |
|---|---|
| Public documents | Any cloud provider (DeepInfra, OpenRouter, OpenAI) |
| Private/personal files | custom provider with local LLM |
| Sensitive business documents | custom provider with local LLM |
| Testing/experimentation | Cloud providers for convenience |
For detailed privacy guidance, data retention policies, and compliance considerations, see PRIVACY.md.
| Files | Est. Cost (~$0.001/file) |
|---|---|
| 100 | $0.10 |
| 1,000 | $1.00 |
| 10,000 | $10.00 |
Actual costs vary by provider and model. Vision mode may have different pricing (image tokens vs text tokens).
renAI now utilizes modern OS-agnostic path management via the platformdirs library.
- Cache Data: Managed in
%LOCALAPPDATA%\renAI\Cache(Windows) or~/.cache/renAI(Linux). - Log and Debug Data: Managed in the OS user data directory.
- Configuration: Managed via
config.toml,secrets.toml, andcategories.tomlin the flat OS config directory (no nested folders).
Create categories.toml in your OS's respective user config directory for renAI (e.g., ~/.config/renAI/categories.toml on Linux, %APPDATA%\renAI\categories.toml on Windows):
[en]
categories = [
"Article", "Fiction", "Non-Fiction", "Textbook", "Report",
"Gov", "Presentation", "Manual", "Thesis", "Research",
"Other", "Dictionary", "Magazine"
]
[tr]
categories = [
"Makale", "Kurgu", "Kurgu DΔ±ΕΔ±", "Ders KitabΔ±", "Rapor",
"Devlet", "Sunum", "KΔ±lavuz", "Tez", "AraΕtΔ±rma",
"DiΔer", "SΓΆzlΓΌk", "Dergi"
]| Option | Description |
|---|---|
--provider |
deepinfra, openrouter, openai, or custom |
--model |
Model name (for text processing or vision if --fallback-mode vision is used) |
--extractor |
pymupdf, pypdf, pdfplumber, pypdfium2 |
--workers |
Concurrent workers (default: 4) |
--ocr-enhanced |
Enable enhanced OCR |
--use-mutool |
Use mutool for high-quality PDF-to-image conversion during OCR fallback (recommended for scanned PDFs) |
--debug-ocr |
Save debug images |
--recursive |
Process subdirectories (default: True) |
--no-recursive |
Process root only |
--exif |
Add EXIF date to image filenames when available |
--text-length |
Target text length for LLM (default: 10000) |
--text-extraction |
Extraction mode (mixed or regular) (default: regular) |
--extraction-pages |
Pages to process per document |
--update-metadata |
Force refresh metadata cache |
--fallback-mode |
Strategy when extraction fails (none, ocr, vision, move) |
--move-scan-failures |
Move failed files to Scan_Failures folder |
cache |
Command to clear cache |
init |
Command to start interactive setup wizard |
providers |
Command to list configured providers |
--version |
Show renAI version |
Important: When using
--fallback-mode visionor--rename-images, you must use a vision-capable model. renAI does not validate vision capability - please ensure your model supports multimodal/vision input.
βββ src/
β βββ renai/
β βββ cli.py # CLI entry point (Typer)
β βββ settings.py # Configuration & Pydantic settings loading
β βββ config/ # [NEW] Factory default TOML files
β βββ prompts/ # Modular .txt prompt templates
β βββ models/ # SSoT Pydantic Schemas (Book, Image)
β βββ engines/ # Vision & OCR engines
β βββ extractors/ # Document parsers
β βββ processors/ # Core processing logic
β β βββ book.py
β β βββ orchestrator.py # Pipeline orchestration
β β βββ setup.py # [NEW] Interactive Init Wizard
β β βββ text_extractor.py
β β βββ visual.py
β βββ ...
βββ tests/ # Async-powered unit tests (pytest-asyncio)
βββ pyproject.toml # Package metadata & dependencies
βββ ...
Contributions are welcome!
- Fork the repository
- Create a feature branch:
git checkout -b feature/NewFeature - Commit changes:
git commit -m 'Add NewFeature' - Push to branch:
git push origin feature/NewFeature - Open a Pull Request
MIT License - see LICENSE file for details.
- Issues: Open an issue on GitHub
- Discussions: Use GitHub Discussions
- Email: Contact maintainers
Vision mode allows renAI to process PDFs and images without text extraction, using multimodal AI models that can "see" document content.
Rename image files (photos, screenshots) using AI:
renai process "C:/Photos" --mode rename --fallback-mode vision --provider deepinfra --model meta-llama/Llama-3.2-90B-Vision-Instruct --rename-imagesThis will analyze each image and rename it based on its visual content.
Process folders containing both PDFs and images:
renai process "C:/Documents" --mode rename --fallback-mode vision --provider deepinfra --model meta-llama/Llama-3.2-90B-Vision-Instruct --rename-imagesThis processes:
- PDFs with vision model (no OCR needed)
- Images with vision model
When processing images, you can include EXIF date information in filenames:
renai process "C:/Photos" --mode rename --fallback-mode vision --provider deepinfra --model meta-llama/Llama-3.2-90B-Vision-Instruct --rename-images --exifThis adds the photo date from EXIF metadata to the filename when available.
| Model | Provider | Notes |
|---|---|---|
meta-llama/Llama-3.2-90B-Vision-Instruct |
DeepInfra | Best balance of quality and speed |
meta-llama/Llama-3.2-11B-Vision-Instruct |
DeepInfra | Faster, good for large batches |
openai/gpt-4o |
OpenAI | Highest quality, more expensive |
google/gemini-1.5-pro |
OpenRouter | Good alternative |
| Feature | Vision Mode | OCR Mode |
|---|---|---|
| Speed | Faster (no text extraction) | Slower |
| Accuracy | Depends on model | Good for clear text |
| Cost | API calls per file | API calls per file |
| Best for | Scanned PDFs, images | Text-based PDFs |
# Linting & Formatting
uv run ruff check .
uv run ruff format .
# Type Checking
uv run pyright
# Run Tests
uv run pytest tests/ -vrenAI - Organize your digital library with AI