A background service that watches directories for documents, converts them to Markdown, and indexes them for full-text search. Includes an MCP server for integration with AI assistants.
- Document Conversion: Automatically converts documents (PDF, DOCX, XLSX, PPTX, HTML, etc.) to Markdown using markitdown
- Full-Text Search: Indexes converted documents with Tantivy for fast search
- Background Service: Runs as a daemon watching for file changes
- MCP Server: Exposes search functionality to AI assistants via the Model Context Protocol
- XDG Compliant: Respects XDG_CONFIG_HOME, XDG_DATA_HOME, and XDG_STATE_HOME
# Clone the repository
git clone https://github.com/byteowlz/ingestr.git
cd ingestr
# Install both binaries
cargo install --path ingestr-cli
cargo install --path ingestr-mcp

Or use just:

just install-all

- Initialize configuration:

ingestr init

This creates a config file at ~/.config/ingestr/config.toml.

- Start the service:
# Run in foreground
ingestr service run
# Or run as background daemon
ingestr service start

- Search your documents:

ingestr search "quarterly report"

ingestr <COMMAND>
Commands:
service Manage the background conversion and indexing service
search Query the search index
init Create config directories and default files
config Inspect and manage configuration
completions Generate shell completions
# Start in foreground (useful for debugging)
ingestr service run
# Start as background daemon
ingestr service start
# Stop the daemon
ingestr service stop
# Check status
ingestr service status
# Restart
ingestr service restart
# Convert existing files once and exit
ingestr service run --once

--watch-dir <PATH> Directory to watch for documents (default: ~/Documents)
--output-dir <PATH> Directory for converted Markdown files (default: ~/markdown)
--index-dir <PATH> Directory for search index (default: $XDG_DATA_HOME/ingestr-cli/index)
--disable-index Convert files without indexing
--once Process existing files and exit

# Basic search
ingestr search "search terms"
# Limit results
ingestr search "budget" --limit 5
# Output as JSON
ingestr search "report" --json
# Output as YAML
ingestr search "report" --yaml

# Show effective configuration
ingestr config show
# Show config file path
ingestr config path
# Reset to defaults
ingestr config reset

# Bash
ingestr completions bash > ~/.local/share/bash-completion/completions/ingestr
# Zsh
ingestr completions zsh > ~/.zfunc/_ingestr
# Fish
ingestr completions fish > ~/.config/fish/completions/ingestr.fish

Configuration is loaded from (in order of increasing priority):
- Default values
- Global config: $XDG_CONFIG_HOME/ingestr/config.toml (or ~/.config/ingestr/config.toml)
- Local config: ./config.toml
- Environment variables: INGESTR_CLI__<SECTION>__<KEY>
- CLI-specified config: --config <path>
- Command-line arguments

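The layering above can be sketched in shell for a single setting. This is an illustration only, not ingestr's actual loader. `ingestr_env_name` and `resolve_setting` are hypothetical helpers, and the sketch assumes that variable names are the upper-cased section and key. All config-file layers (global, local, and `--config`) are left out because reading them would need a TOML parser.

```shell
#!/bin/sh
# Hypothetical helper: build the override variable name for a config key,
# following the INGESTR_CLI__<SECTION>__<KEY> pattern.
ingestr_env_name() {
    section=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    key=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    printf 'INGESTR_CLI__%s__%s\n' "$section" "$key"
}

# Hypothetical helper: resolve one setting from lowest to highest priority.
# Usage: resolve_setting <section> <key> <default> [cli-value]
# Config-file layers are omitted (assumption: this sketch only covers
# defaults, environment variables, and command-line arguments).
resolve_setting() {
    value=$3                                   # 1. built-in default
    env_value=$(printenv "$(ingestr_env_name "$1" "$2")" || true)
    if [ -n "$env_value" ]; then
        value=$env_value                       # 2. environment override
    fi
    if [ -n "${4:-}" ]; then
        value=$4                               # 3. command-line argument wins
    fi
    printf '%s\n' "$value"
}

# Example call: resolve_setting watcher watch_dir "$HOME/Documents" /tmp/docs
```

Each later source simply overwrites the value set by an earlier one, which is why the highest-priority source is checked last. An empty environment variable is treated as unset in this sketch.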
profile = "default"
[logging]
level = "info"
# file = "~/Library/Logs/ingestr.log"
[runtime]
# parallelism = 8
timeout = 60
fail_fast = true
[watcher]
watch_dir = "~/Documents"
debounce_ms = 750
skip_hidden = true
[output]
markdown_dir = "~/markdown"
[index]
enabled = true
# index_dir = "$XDG_DATA_HOME/ingestr/index"
[paths]
# data_dir = "$XDG_DATA_HOME/ingestr"
# state_dir = "$XDG_STATE_HOME/ingestr"

Override any config value using environment variables:

INGESTR_CLI__WATCHER__WATCH_DIR=~/MyDocs ingestr service run
INGESTR_CLI__INDEX__ENABLED=false ingestr service run

The ingestr-mcp binary provides an MCP server for AI assistant integration.

Add to your MCP client configuration (e.g., Claude):
{
"ingestr": {
"command": "ingestr-mcp",
"args": []
}
}

Or generate the config:

ingestr-mcp --show-config

| Tool | Description |
|---|---|
| search | Search the document index for relevant content |
| open_source | Open a source document (requires confirmation) |

--config <PATH> Override config file path
--index-dir <PATH> Override index directory
--log-level <LEVEL> Set log level (error, warn, info, debug, trace)
--show-config Print MCP client configuration JSON and exit

The MCP server automatically starts the ingestr service if not already running.

| Path | Description |
|---|---|
| $XDG_CONFIG_HOME/ingestr/config.toml | Configuration file |
| $XDG_DATA_HOME/ingestr/index/ | Tantivy search index |
| $XDG_STATE_HOME/ingestr/service.pid | Background service PID |
| $XDG_STATE_HOME/ingestr/service.log | Background service logs |

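
To see where these files end up on a given machine, the paths in the table can be expanded with a small shell sketch. It assumes the fallbacks defined by the XDG Base Directory Specification (`~/.config`, `~/.local/share`, `~/.local/state`) when a variable is unset; ingestr itself may choose different fallbacks on some platforms. The function names are hypothetical, and the liveness check assumes `service.pid` contains a single decimal PID, which the table does not confirm.

```shell
#!/bin/sh
# Print the four locations from the table, using the XDG spec defaults
# when a variable is unset or empty.
ingestr_paths() {
    config_home=${XDG_CONFIG_HOME:-$HOME/.config}
    data_home=${XDG_DATA_HOME:-$HOME/.local/share}
    state_home=${XDG_STATE_HOME:-$HOME/.local/state}
    printf '%s\n' \
        "$config_home/ingestr/config.toml" \
        "$data_home/ingestr/index/" \
        "$state_home/ingestr/service.pid" \
        "$state_home/ingestr/service.log"
}

# Return 0 if the PID file names a process this user can signal,
# non-zero otherwise.
# Assumption: service.pid holds a single decimal PID.
ingestr_service_alive() {
    pid_file=${XDG_STATE_HOME:-$HOME/.local/state}/ingestr/service.pid
    [ -r "$pid_file" ] || return 1
    pid=$(tr -d '[:space:]' < "$pid_file")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    # Signal 0 delivers nothing; it only tests that the process exists
    # and may be signalled by the current user.
    kill -0 "$pid" 2>/dev/null
}
```

For everyday use, `ingestr service status` remains the supported way to check the daemon; the sketch only shows how the PID file could be inspected by hand.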
ingestr/
  ingestr-cli/   # CLI application and background service
  ingestr-core/  # Shared search index library
  ingestr-mcp/   # MCP server for AI assistants
# Check all crates
just check
# Format code
just fmt
# Run tests
just test
# Run the service during development
just serve
# Run MCP server during development
just mcp

See LICENSE file for details.