OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
Updated Dec 2, 2022 - Jupyter Notebook
Convert scanned PDFs into searchable text locally using Vision LLMs (olmOCR). 100% private, offline, and free. Features a modern Web UI & CLI.
A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.
Perform Optical Character Recognition (OCR) on a scanned PDF file containing Arabic text and output a searchable PDF.
A Python script that runs Paddle OCR on a possibly unsearchable PDF to make it searchable.
A batch script that creates a searchable PDF from a PDF with one or more scanned pages that contain images.
Create a searchable PDF with ALTO-XML and JP2 files.
Extract tables from searchable as well as non-searchable PDF files.
Self-hosted GPU-accelerated OCR web app — convert scanned PDFs to searchable PDF, Markdown, or Word. Powered by PaddleOCR. Supports Chinese (Traditional & Simplified) and multilingual documents. Single Docker container deployment.
NeuroScan-AI is an advanced document-understanding engine built with modern computer vision and OCR pipelines. It performs smart perspective correction, illumination normalization, and adaptive enhancement to transform raw camera captures into clean, searchable, professional-grade documents.
Convert scanned PDF documents into searchable, OCR-processed, PDF-compliant files using ocrmypdf, with an interactive Streamlit interface. Supports parallel processing to handle large documents efficiently.
Quick proof of concept to perform OCR on images.
PySide6 app to perform batch image/PDF processing and OCR.
A wrapper around Python OCR tools such as pytesseract and easyocr that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.
Tool for creating searchable PDFs.
A lightning-fast, privacy-first web app for offline text extraction. Paste (Ctrl+V) or drop any image to instantly generate plain text and a searchable PDF entirely within your browser using Tesseract.js. No server uploads required.
Lightweight bash script to convert scanned PDFs into searchable, copyable PDFs using Tesseract OCR with parallel processing.