This Python script uses Tesseract OCR and Regex to extract phone numbers from images.
It processes all images inside an images folder and saves results into a CSV file.
- If phone numbers are found, the image is moved to success/.
- If no phone number is found or an error occurs, the image is moved to failed/.
- A numbers.csv file is generated with the extracted numbers and their source filenames.
- The main images folder will be emptied after processing.
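
The OCR and regex steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the pattern, `extract_phone_numbers`, and `ocr_image` are assumptions made for this example, and the regex is a simple heuristic (10 to 15 digits with optional separators) that you would likely tune for your own number formats.

```python
import re

# Assumed pattern: an optional "+", then 10 to 15 digits that may be
# separated by single spaces, dots, or hyphens. Adjust for your locale.
PHONE_PATTERN = re.compile(r"(?<!\d)\+?\d(?:[ .-]?\d){9,14}(?!\d)")


def extract_phone_numbers(text):
    """Return unique phone-like strings from text, with separators removed."""
    found = []
    for match in PHONE_PATTERN.finditer(text):
        normalized = re.sub(r"[ .-]", "", match.group())
        if normalized not in found:
            found.append(normalized)
    return found


def ocr_image(path):
    """Run Tesseract on one image; requires pillow and pytesseract."""
    from PIL import Image  # imported lazily so the regex part works alone
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


print(extract_phone_numbers("Call +880 1712-345678 or 01712345678 today"))
# ['+8801712345678', '01712345678']
```

Because the pattern only counts digits, other long digit runs such as timestamps can also match, so treat the output as candidates rather than verified numbers.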
project/
├── main.py
├── README.md
├── numbers.csv   (generated automatically)
├── images/       # put your input images here
├── success/      # created automatically
└── failed/       # created automatically
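
One way the two output folders could be created automatically is sketched below. The helper name `ensure_output_folders` and the choice of the project root as the base directory are assumptions for illustration; main.py may do this differently.

```python
from pathlib import Path


def ensure_output_folders(base_dir="."):
    """Create success/ and failed/ under base_dir if they do not exist yet."""
    base = Path(base_dir)
    folders = {name: base / name for name in ("success", "failed")}
    for folder in folders.values():
        # exist_ok=True makes repeated runs safe
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `ensure_output_folders()` at the start of a run means only images/ has to be prepared by hand.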
- Clone the repository:

      git clone https://github.com/pythonicshariful/phone-number-extractor.git
      cd phone-number-extractor

- Install dependencies:

      pip install -r requirements.txt

- Install Tesseract OCR:
  - Windows: download and run the Tesseract installer.
  - Linux (Ubuntu/Debian):

        sudo apt install tesseract-ocr

  - macOS (Homebrew):

        brew install tesseract

- Update the path to tesseract.exe inside main.py if needed:

      pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
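
If you would rather not hard-code the path, something like the following sketch could locate the binary first. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical and not part of main.py. The sketch assumes that a `tesseract` executable on PATH should win over the default Windows location.

```python
import os
import shutil

# Default install location from this README; only relevant on Windows.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=WINDOWS_DEFAULT):
    """Return a Tesseract executable path, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if os.path.isfile(fallback):
        return fallback
    return None


def configure_pytesseract():
    """Point pytesseract at the located binary; requires pytesseract."""
    import pytesseract  # imported lazily so find_tesseract works alone

    path = find_tesseract()
    if path is None:
        raise RuntimeError("Tesseract not found; install it or set the path manually")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

On Linux and macOS the package managers normally put `tesseract` on PATH, so the fallback is only expected to matter on Windows.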
- Place your images inside the images/ folder. Supported formats: .png, .jpg, .jpeg
- Run the script:

      python main.py

- Results:
  - Extracted phone numbers: numbers.csv
  - Successfully processed images: success/
  - Failed images: failed/
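
The run described above (pick up supported images, sort them into success/ or failed/, write numbers.csv) might look roughly like this. It is a sketch under assumptions, not the actual main.py: `process_folder`, the `extract` callable, and the CSV column names are all hypothetical.

```python
import csv
import shutil
from pathlib import Path

SUPPORTED = {".png", ".jpg", ".jpeg"}


def process_folder(base_dir, extract):
    """Route each image to success/ or failed/ and write numbers.csv.

    `extract` is a hypothetical callable that takes an image path and
    returns a list of phone numbers (for example OCR followed by a regex).
    """
    base = Path(base_dir)
    success, failed = base / "success", base / "failed"
    success.mkdir(exist_ok=True)
    failed.mkdir(exist_ok=True)

    images = sorted(
        p for p in (base / "images").iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
    rows = []
    for image in images:
        try:
            numbers = extract(image)
        except Exception:
            numbers = []  # any OCR or read error counts as a failure
        rows.extend((number, image.name) for number in numbers)
        target = success if numbers else failed
        shutil.move(str(image), str(target / image.name))

    with open(base / "numbers.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["phone_number", "source_file"])  # assumed column names
        writer.writerows(rows)
    return rows
```

Passing the extraction step in as a function keeps the file handling easy to try without Tesseract installed. Since tqdm is among the required libraries, wrapping `images` in `tqdm(...)` would be one way to show progress.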
- Python 3.8+
- Tesseract OCR
- Python libraries: pillow, pytesseract, tqdm

Install them with:

    pip install pillow pytesseract tqdm

MIT License