This guide walks through what happens the first time you run Archive Brain, what to expect, and how to tell whether things are working as intended.
If you’re comfortable with Docker but new to this project, this is the right place to start.
On the first run, Archive Brain does more work than on subsequent starts. Specifically, it will:
- Pull required LLM models (several GB total)
- Initialize the PostgreSQL database and vector indexes
- Start scanning configured source directories
- Begin ingesting, segmenting, and enriching documents
Depending on your hardware and dataset size, this can take anywhere from a few minutes to over an hour.
During this time:
- CPU usage may spike
- Memory usage may increase
- Fans may spin up
- The UI may appear empty or partially populated
All of this is expected behavior.
The background pipeline runs inside the worker container.
To follow progress in real time:
docker compose logs -f worker

To check model downloads and LLM readiness:
docker compose logs ollama

If models are still downloading, the pipeline will wait and retry automatically.
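If you want to script around that wait instead of watching logs, a small polling helper works. This is a hedged sketch of the general technique, not part of Archive Brain; `list_models` simply wraps the `ollama list` command shown below, and the function names are illustrative:

```shell
# Sketch only: poll until a model appears in `ollama list`, or give up.
# `list_models` stands in for the real command; swap it for your setup.
list_models() {
  docker compose exec ollama ollama list
}

wait_for_model() {
  local model="$1" tries="${2:-30}" i
  for i in $(seq "$tries"); do
    if list_models 2>/dev/null | grep -q "$model"; then
      return 0   # model is present
    fi
    sleep 2      # wait before retrying
  done
  return 1       # gave up after $tries attempts
}

# Usage: wait_for_model llava 60   (polls for up to ~2 minutes)
```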
Archive Brain only processes files from directories explicitly listed in:
config/config.yaml
For your first run:
- Start with a small test folder
- Avoid pointing at your entire home directory
- Confirm ingestion works as expected before expanding scope
This makes it easier to understand the system and avoids long initial processing times.
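For orientation, a source-directory entry might look something like the fragment below. The key names here are illustrative assumptions, not Archive Brain's actual schema; check the shipped config/config.yaml for the real structure:

```yaml
# Hypothetical sketch — key names are assumptions, not the project's real schema.
sources:
  - path: /data/test-folder   # start with one small test directory
```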
Recommended minimums:
- 16 GB RAM
- SSD storage
- GPU optional (but helpful)
If you experience out-of-memory issues:
- Switch to a smaller LLM model
- Use an external Ollama instance with GPU acceleration
- Reduce the number or size of source directories
The system is designed to favor correctness and clarity over speed.
It is safe to:
- Restart containers
- Re-run the pipeline
- Change models and retry enrichment
- Adjust source directories
Most pipeline steps are idempotent, meaning unchanged files are skipped automatically.
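The idea behind that skip can be illustrated with a content-hash check. This is a hedged sketch of the general technique only; Archive Brain's real bookkeeping lives in its database, not a text file, and the function and variable names here are made up for illustration:

```shell
# Illustrative only: skip re-ingesting files whose content hash we've seen.
HASH_LOG="${HASH_LOG:-.ingested_hashes}"

ingest_if_new() {
  local f="$1" h
  h=$(sha256sum "$f" | cut -d' ' -f1)
  if grep -qx "$h" "$HASH_LOG" 2>/dev/null; then
    echo "skip: $f"      # unchanged content, nothing to do
  else
    echo "ingest: $f"    # new or changed content
    echo "$h" >> "$HASH_LOG"
  fi
}
```

Restarting such a pipeline is cheap: already-processed, unchanged files fall straight into the skip branch.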
To fully reset the system and start fresh:
docker compose -f docker-compose.yml --profile prod down -v
docker compose -f docker-compose.yml --profile prod up -d --build

This removes all stored data and embeddings.
If the UI appears empty or search returns nothing:
- Documents may still be processing
- Check worker logs for activity
- First runs are the slowest
- Subsequent runs reuse metadata and embeddings
If image-heavy documents aren't being enriched, verify the vision model is installed:
docker compose exec ollama ollama list

Pull it manually if needed:
docker compose exec ollama ollama pull llava

You’ll know the first run has stabilized when:
- Worker logs quiet down
- Search results start appearing consistently
- CPU and memory usage drop back to idle levels
From here, Archive Brain behaves incrementally — only new or changed files are processed.
Suggested next steps:
- Explore semantic search in the UI
- Try natural-language questions
- Gradually expand your source directories
- Read the architecture overview if you want to customize or extend the system
This system rewards curiosity and iteration. Take it slow, observe the logs, and adjust as you go.