A polished, local-first OCR web app powered by Gemma-3 Vision and Streamlit.
GemmaOCR-APP lets you upload an image, run OCR with a multimodal LLM, and receive well-structured Markdown output (headings, lists, code blocks, etc.)—all in a clean interface.
- Local OCR pipeline with Ollama +
gemma3:12b - Simple Streamlit UI with:
- image upload (
png,jpg,jpeg) - one-click text extraction
- clear/reset action
- image upload (
- Structured output, not plain text dump
- Runs on your machine (no external OCR SaaS required)
- Python
- Streamlit (UI)
- Ollama Python client (model inference)
- Gemma-3 Vision model (
gemma3:12b) - Pillow (image handling)
GemmaOCR-APP/
├── app.py # Streamlit application
├── README.md # Project documentation
└── LICENSE # License file
Note:
app.pyreferences./assets/gemma3.pngfor the header icon. Add this file if it is missing in your local copy.
Before running the app, ensure:
- Python 3.9+ is installed
- Ollama is installed and running
- The Gemma model is pulled locally
ollama pull gemma3:12bgit clone https://github.com/<your-username>/GemmaOCR-APP.git
cd GemmaOCR-APPpython -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows PowerShellpip install streamlit ollama pillowstreamlit run app.pyThen open the local URL shown by Streamlit (usually http://localhost:8501).
- Upload an image from the sidebar.
- Click Extract Text 🔍.
- The app sends your image and OCR prompt to
gemma3:12bvia Ollama. - The model response is rendered as Markdown in the main panel.
- Click Clear 🗑️ to reset current output.
This project is designed to run locally and does not require usernames, passwords, or API keys by default.
For security reasons, do not place real personal credentials in source code, README files, or Git history.
If you later add external integrations (e.g., cloud storage, paid APIs), use environment variables and a .env file that is excluded from version control.
- Confirm Ollama is running:
ollama list
- Ensure
gemma3:12bis installed:ollama pull gemma3:12b
- Re-launch Streamlit after model download.
Create assets/ and add gemma3.png, or adjust the image path in app.py.
- Batch OCR for multiple images
- Download results as
.md/.txt - Bounding-box OCR visualization overlay
- Language selection and prompt presets
- Dockerized deployment profile
Contributions are welcome.
- Fork the repo
- Create a feature branch
- Commit changes
- Open a pull request with a clear description
This project is licensed under the terms in LICENSE.