This is a PDF to DOCX converter application built with Streamlit that uses Google's Gemini AI for OCR (Optical Character Recognition) and text correction. The application converts PDF documents to Word documents with support for both Persian and English text, preserving formatting while correcting spelling, grammar, and punctuation.
Preferred communication style: Simple, everyday language.
- Streamlit - Used as the web framework for the user interface
- Single-page application with centered layout
- Page configured with custom title and icon
- PDF Processing - Uses the `pdf2image` library to convert PDF pages to images
- OCR & Text Correction - Leverages Google Gemini AI API to extract and correct text from images
- Document Generation - Uses the `python-docx` library to create Word documents from extracted text
- Progress Tracking - JSON-based progress file (`progress.json`) for tracking conversion state
- Temporary Storage - Dedicated directories for temporary images (`temp_images/`) and output files (`output/`)
- Gemini Integration - Client wrapper for Google's Generative AI API with structured prompts for OCR
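A minimal sketch of how JSON-based progress tracking could work. The file name `progress.json` comes from the description above; the state fields (`pdf_name`, `total_pages`, `completed_pages`) and the function names are hypothetical, chosen only for illustration, and may not match the application's actual schema.

```python
import json
import os
import tempfile
from pathlib import Path

PROGRESS_FILE = Path("progress.json")  # file name taken from the description


def load_progress(path=PROGRESS_FILE):
    """Return the saved state, or a fresh state if the file is missing or corrupt."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # Hypothetical schema: the real application may store different fields.
        return {"pdf_name": None, "total_pages": 0, "completed_pages": []}


def save_progress(state, path=PROGRESS_FILE):
    """Write the state atomically so an interrupted run cannot leave a partial file."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, ensure_ascii=False, indent=2)
    os.replace(tmp_name, path)  # atomic rename on the same filesystem


def pages_remaining(state):
    """List the 1-based page numbers that still need OCR."""
    done = set(state["completed_pages"])
    return [p for p in range(1, state["total_pages"] + 1) if p not in done]
```

Saving after every page keeps the cost of an interruption to at most one page of repeated OCR work.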
- AI-Powered OCR: Chose Gemini AI over traditional OCR libraries (like Tesseract) for better accuracy with mixed Persian/English text and automatic grammar/spelling correction
- Markdown Intermediate Format: Text is extracted as Markdown to preserve formatting (headings, bold, lists, tables) before converting to DOCX
- Progress Persistence: JSON file-based progress tracking enables resumable conversions
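To illustrate the Markdown intermediate format, here is a rough sketch of turning extracted Markdown into a Word document with `python-docx`. It is not the application's actual converter: the function names are hypothetical, and it only handles headings, bullet items, and `**bold**` text. Tables and right-to-left handling for Persian are left out.

```python
import re


def parse_markdown_lines(markdown_text):
    """Classify each non-empty line as a heading, bullet, or paragraph."""
    blocks = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        heading = re.match(r"^(#{1,6})\s+(.*)$", stripped)
        bullet = re.match(r"^[-*]\s+(.*)$", stripped)
        if heading:
            blocks.append(("heading", len(heading.group(1)), heading.group(2)))
        elif bullet:
            blocks.append(("bullet", 0, bullet.group(1)))
        else:
            blocks.append(("paragraph", 0, stripped))
    return blocks


def split_bold(text):
    """Split text into (fragment, is_bold) pairs on **bold** markers."""
    parts = re.split(r"\*\*(.+?)\*\*", text)
    # With one capture group, re.split alternates plain, bold, plain, ...
    return [(part, i % 2 == 1) for i, part in enumerate(parts) if part]


def write_docx(markdown_text, output_path):
    """Build a Word document from the parsed blocks (requires python-docx)."""
    from docx import Document  # imported here so the parser works without it

    document = Document()
    for kind, level, text in parse_markdown_lines(markdown_text):
        if kind == "heading":
            document.add_heading(text, level=level)
            continue
        style = "List Bullet" if kind == "bullet" else None
        paragraph = document.add_paragraph(style=style)
        for fragment, is_bold in split_bold(text):
            run = paragraph.add_run(fragment)
            if is_bold:
                run.bold = True
    document.save(output_path)
```

Keeping the parsing step separate from the DOCX writing step means the Markdown handling can be checked without creating any files.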
- Google Gemini AI - Primary OCR and text correction service
- Requires the `GEMINI_API_KEY` environment variable
- Uses the `google-genai` Python client library
- `streamlit` - Web application framework
- `pdf2image` - PDF to image conversion (requires poppler system dependency)
- `Pillow` (PIL) - Image processing
- `python-docx` - Word document generation
- `google-genai` - Google Generative AI client
- Poppler - Required by pdf2image for PDF rendering (must be installed at system level)
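Since the app depends on both an environment variable and a system-level package, a startup check can catch a missing piece before any conversion begins. The sketch below is an assumption about how such a check might look, not part of the described application: `check_environment` is a hypothetical helper, and it assumes Poppler's `pdftoppm` and `pdfinfo` tools are expected on `PATH`.

```python
import os
import shutil


def check_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY environment variable is not set")
    # pdf2image shells out to Poppler's command-line tools.
    for tool in ("pdftoppm", "pdfinfo"):
        if which(tool) is None:
            problems.append(f"Poppler tool '{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Setup problem:", problem)
```

In a Streamlit app, the returned messages could be shown with `st.error` before the upload widget is rendered.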