Python-OCR

This project combines Google's Tesseract OCR engine with OpenCV to detect text in real time from a computer webcam feed, with support for multiple languages. Video capture and OCR processing run in separate threads, so slow OCR does not stall the video display.

An OpenCV video stream runs in a dedicated thread that continuously captures frames from the webcam and stores the most recent frame as a class attribute. Concurrently, a separate OCR thread reads this latest frame, performs text recognition with Tesseract via pytesseract, and returns the detected text along with its bounding boxes once processing is complete. Running the two concurrently avoids the display delays that occur when OCR is performed synchronously inside the main video loop, which keeps the video feed responsive.
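The two-thread arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `LatestFrame`, `capture_loop`, `ocr_loop`, and `run_demo` are hypothetical names, and the frame source and OCR step are passed in as plain callables so the pattern can be shown without a webcam or Tesseract.

```python
import threading
import time


class LatestFrame:
    """Thread-safe holder for the most recent value (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


def capture_loop(read_frame, latest, stop):
    """Keep overwriting `latest` with whatever `read_frame()` returns."""
    while not stop.is_set():
        frame = read_frame()
        if frame is not None:
            latest.set(frame)


def ocr_loop(run_ocr, latest, results, stop):
    """Run OCR on the newest frame only; frames arriving meanwhile are skipped."""
    last_seen = None
    while not stop.is_set():
        frame = latest.get()
        if frame is None or frame is last_seen:
            time.sleep(0.001)  # nothing new yet; yield briefly
            continue
        last_seen = frame
        results.set(run_ocr(frame))


def run_demo(read_frame, run_ocr, duration=0.2):
    """Start both threads, let them run for `duration` seconds, then stop them."""
    latest, results, stop = LatestFrame(), LatestFrame(), threading.Event()
    threads = [
        threading.Thread(target=capture_loop, args=(read_frame, latest, stop), daemon=True),
        threading.Thread(target=ocr_loop, args=(run_ocr, latest, results, stop), daemon=True),
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return latest.get(), results.get()
```

In a real setup, `read_frame` would presumably wrap the `read()` method of a `cv2.VideoCapture` object (which returns a success flag and a frame), and `run_ocr` would wrap a pytesseract call such as `pytesseract.image_to_data`. The display loop would then read from both holders without ever waiting on OCR.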

Because OCR is slower than frame capture, the OCR thread does not process every frame. The resulting delay is acceptable for typical applications, where the camera views stable text for a few seconds. The bounding boxes and recognized text shown on the frame display are updated whenever new results become available.
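As a rough sketch of how recognized words and their boxes might be overlaid on a frame: the example below assumes the OCR result is the dictionary returned by `pytesseract.image_to_data(frame, output_type=pytesseract.Output.DICT)`, which contains parallel lists under keys such as `text`, `conf`, `left`, `top`, `width`, and `height`. The helper names and the confidence threshold of 60 are illustrative assumptions, not values taken from the project.

```python
def boxes_from_data(data, min_conf=60.0):
    """Convert an image_to_data dict into (word, x, y, w, h) tuples.

    Entries with empty text or a confidence below `min_conf` are dropped.
    Confidence values may arrive as strings or numbers, so they are cast.
    """
    boxes = []
    for i, text in enumerate(data["text"]):
        word = text.strip()
        conf = float(data["conf"][i])
        if word and conf >= min_conf:
            boxes.append(
                (word, data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            )
    return boxes


def draw_boxes(frame, boxes):
    """Draw each box and its word onto `frame` in place and return the frame."""
    import cv2  # imported lazily so boxes_from_data works without OpenCV

    for word, x, y, w, h in boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, word, (x, max(y - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
```

Since the OCR thread lags behind capture, the display loop would typically redraw the most recent set of boxes on every new frame until the next OCR result replaces them.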

The script is modular, so it can be adapted to different camera sources, image types, or image processing techniques. This makes it usable as a general OpenCV video stream implementation with real-time OCR added on top.

Tesseract must be installed on the system beforehand; it is available for Windows, macOS, and Linux. With video streaming and OCR running asynchronously, the result is continuous text recognition from a standard webcam.
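A quick way to confirm the prerequisite before starting the video stream is to look for the Tesseract executable on the `PATH`. The sketch below uses only the standard library; `find_tesseract` and `describe_tesseract` are hypothetical helper names, and the executable is assumed to be called `tesseract`.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def describe_tesseract():
    """Return a human-readable status line about the Tesseract installation."""
    path = find_tesseract()
    if path is None:
        return "Tesseract not found on PATH; install it before running the script."
    return f"Tesseract found at {path}"


if __name__ == "__main__":
    print(describe_tesseract())
```

If Tesseract is installed somewhere outside the `PATH`, pytesseract lets you point to it by setting `pytesseract.pytesseract.tesseract_cmd`. Recognizing additional languages also requires the matching Tesseract language data to be installed; languages are then selected through the `lang` argument of the pytesseract calls (for example `lang="eng+deu"`).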