searchable-pdf

Here are 18 public repositories matching this topic...

NanoNets / ocr-python

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

python pdf ocr tesseract pdf-to-text image-to-text textract pdf-to-csv pdf-to-json searchable-pdf pytesseract-ocr extract-table table-extract image-to-text-converter extract-text-from-image extract-text-from-pdf

Updated Dec 2, 2022
Jupyter Notebook

ahnafnafee / local-llm-pdf-ocr

Convert scanned PDFs into searchable text locally using Vision LLMs (olmOCR). 100% private, offline, and free. Features a modern Web UI & CLI.

python ocr web-ui document-processing fastapi privacy-focused searchable-pdf no-api-key pdf-ocr local-llm offline-ai surya-ocr olmocr vision-llm

Updated Apr 28, 2026
Python

Achiwilms / OCR-Wizard

A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.

python pdf ocrmypdf ocr-recognition pdf-ocr-extraction ocr-python searchable-pdf ocr-pdf pdf-ocr

Updated Oct 28, 2023
Python

zaakki-ahamed / Arabic_OCR_From_PDF

Perform Optical Character Recognition (OCR) on a scanned PDF file containing Arabic text and output a searchable PDF

optical-character-recognition arabic pytesseract searchable-pdf

Updated Dec 18, 2023
Python

msmarkgu / PdfOCRer

A Python script that runs Paddle OCR on a possibly unsearchable PDF to make it searchable.

python ocr searchable-pdf paddleocr

Updated Dec 18, 2024
Python

timberger / Searchable-Image-PDF-Creat-O-Mat

This batch script creates a searchable PDF of a PDF with one or more scanned pages which contain images.

pdf ghostscript imagemagick converter ocr drag drop tesseract scan batch scanned-documents batch-script scanned-pages imagemagick-wrapper searchable-pdfs scanned-image-pdfs tesseract-wrapper ghostscript-wrapper searchable-pdf

Updated Oct 22, 2022
Batchfile

Haighton / create_searchable_pdf

Create a searchable PDF with ALTO-XML and JP2 files.

pdf alto-xml searchable-pdf

Updated Nov 30, 2020
CSS

pratik149 / pdf-table-extractor

Extract tables from searchable as well as non-searchable pdf files

python console pdf opencv excel table extract-data searchable-pdf

Updated Oct 6, 2020
Jupyter Notebook

cyanyux / pdf-ocr

Self-hosted, GPU-accelerated OCR web app that converts scanned PDFs to searchable PDF, Markdown, or Word. Powered by PaddleOCR. Supports Chinese (Traditional & Simplified) and multilingual documents. Single Docker container deployment.

docker pdf flask ocr document-conversion gpu cuda self-hosted paddlepaddle pdf-to-word chinese-ocr searchable-pdf paddleocr pdf-to-markdown

Updated Apr 6, 2026
Python

mwasifanwar / NeuroScan-AI

NeuroScan-AI is an advanced document-understanding engine built with modern computer vision and OCR pipelines. It performs smart perspective correction, illumination normalization, and adaptive enhancement to transform raw camera captures into clean, searchable, professional-grade documents.

python pdf opencv ocr ai computer-vision deep-learning cv image-processing tesseract machine-vision document-scanner binarization deskew fastapi streamlit searchable-pdf ai-engineering document-enhancement

Updated Oct 28, 2025
Python

mghulamqadir / scanned-to-searchable-pdf

Convert scanned PDF documents into searchable, OCR-processed, and PDF compliant files using ocrmypdf, powered by an interactive Streamlit interface. Supports parallel processing to handle large documents efficiently.

python pdf ocr tesseract-ocr google-colab searchable-pdf

Updated Jul 6, 2025
Python

jidel / Searchable-PDF-Creator

Quick proof of concept to perform OCR on images.

ocr wpf tesseract searchable-pdf

Updated Jul 22, 2020
C#

irakliskhirtladze / OCR_viewer

PySide6 app to perform batch image/PDF processing and OCR.

python computer-vision image-processing gui-application tesseract-ocr image-to-text opencv-python ocr-recognition ocr-python image-to-pdf searchable-pdf easyocr pyside6

Updated Dec 26, 2025
Python

sxaxmz / handle_scanned_pdf

A wrapper around Python OCR tools such as pytesseract and easyocr that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.

tesseract-ocr pytesseract ocr-python scanned-image-pdfs searchable-pdf easyocr scanned-pdf-documents extract-text-from-image extract-text-from-pdf

Updated Jul 6, 2024
Python

AlfredoCubitos / ocr2pdf

Tool for creating searchable PDFs

ocr tesseract pdf-document searchable-pdf ocr2pdf

Updated Nov 27, 2019
Python

R0mb0 / Rapid_OCR

A lightning-fast, privacy-first web app for offline text extraction. Paste (Ctrl+V) or drop any image to instantly generate plain text and a searchable PDF entirely within your browser using Tesseract.js. No server uploads required.

javascript css html ocr html5 offline-first css3 text-extraction html-css-javascript browser-based tailwind-css productivity-tool privacy-focused tesseract-js searchable-pdf italian-developers r0mb0

Updated Apr 27, 2026
HTML

maxgfr / copyable-pdf

Lightweight bash script to convert scanned PDFs into searchable, copyable PDFs using Tesseract OCR with parallel processing.

cli pdf automation ocr tesseract scanned-documents poppler document-processing pdfunite searchable-pdf pdftoppm scanned-pdf

Updated Mar 6, 2026
Shell

juanso123 / local-llm-pdf-ocr

python nlp docker-compose web-ui embeddings feature-extraction privacy-focused searchable-pdf geometric-transformations no-api-key llm ollama llimage olmocr vision-llm

Updated Apr 28, 2026
Python
