Phone Number Extractor from Images

This Python script uses Tesseract OCR and regular expressions (regex) to extract phone numbers from images.
It processes every image in the images/ folder and saves the extracted numbers to a CSV file.

  • ✅ If phone numbers are found → image is moved to success/
  • ❌ If no phone number or error → image is moved to failed/
  • 📊 A numbers.csv file is generated with extracted numbers and source filenames.
  • 🔄 The images/ folder is emptied after processing, because each image is moved to success/ or failed/.
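The OCR-and-regex step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual code in main.py: the pattern `PHONE_RE` and the helpers `extract_phone_numbers` and `ocr_image` are hypothetical. The pattern is a loose heuristic, so it can also match other long digit runs such as dates.

```python
import re
from typing import List

# Hypothetical pattern (not necessarily the one main.py uses): an optional "+",
# an optional "(", a digit, at least 7 more digits/spaces/dots/dashes/parentheses,
# and a final digit. Only plain spaces are allowed, so numbers on separate
# OCR lines are not glued together.
PHONE_RE = re.compile(r"\+?\(?\d[\d ().-]{7,}\d")


def extract_phone_numbers(text: str) -> List[str]:
    """Return phone-number-like strings from OCR text, in order, without duplicates."""
    # dict.fromkeys drops repeats while keeping first-seen order.
    return list(dict.fromkeys(PHONE_RE.findall(text)))


def ocr_image(path: str) -> str:
    """Run Tesseract on one image file and return the recognized text."""
    # Imported here so extract_phone_numbers works without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)
```

For example, `extract_phone_numbers("Call +1 (555) 123-4567 or 555.987.6543")` returns `["+1 (555) 123-4567", "555.987.6543"]`. Keeping the regex separate from the OCR call makes the pattern easy to test on plain strings.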

📂 Folder Structure

project/
│── main.py
│── README.md
│── numbers.csv (generated automatically)
│── images/        # put your input images here
│── success/       # created automatically
│── failed/        # created automatically
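The sorting into success/ and failed/ and the CSV output could look roughly like this. It is a sketch, not the script's real implementation: the function name `process_folder`, its `find_numbers` callback, and the CSV header `phone_number,source_file` are hypothetical. Files with unsupported extensions are simply left in images/ here.

```python
import csv
import shutil
from pathlib import Path
from typing import Callable, List, Tuple

SUPPORTED = {".png", ".jpg", ".jpeg"}  # compared case-insensitively


def process_folder(
    images_dir: Path,
    success_dir: Path,
    failed_dir: Path,
    csv_path: Path,
    find_numbers: Callable[[Path], List[str]],
) -> List[Tuple[str, str]]:
    """Move each image to success/ or failed/ and write (number, filename) rows to CSV."""
    # success/ and failed/ are created on demand, as in the folder layout above.
    success_dir.mkdir(parents=True, exist_ok=True)
    failed_dir.mkdir(parents=True, exist_ok=True)
    rows: List[Tuple[str, str]] = []

    # sorted() builds the full list first, so moving files inside the loop is safe.
    for image in sorted(images_dir.iterdir()):
        if not image.is_file() or image.suffix.lower() not in SUPPORTED:
            continue  # unsupported files stay in images_dir
        try:
            numbers = find_numbers(image)
        except Exception:
            numbers = []  # an OCR error is treated like "no number found"
        target = success_dir if numbers else failed_dir
        shutil.move(str(image), str(target / image.name))
        rows.extend((number, image.name) for number in numbers)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["phone_number", "source_file"])  # hypothetical header
        writer.writerows(rows)
    return rows
```

Here `find_numbers` is any callable that takes an image path and returns a list of numbers (for instance, OCR followed by a regex search). Passing it in keeps the file handling testable without Tesseract.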

⚙️ Installation

  1. Clone the repository:

    git clone https://github.com/pythonicshariful/phone-number-extractor.git
    cd phone-number-extractor
  2. Install dependencies:

    pip install -r requirements.txt
  3. Install Tesseract OCR:

    • Windows: Download here
    • Linux (Ubuntu/Debian):
      sudo apt install tesseract-ocr
    • macOS (Homebrew):
      brew install tesseract
  4. On Windows, if Tesseract is not on your PATH or is installed in a different folder, update the path to tesseract.exe in main.py:

    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
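Instead of hard-coding the path, one option is to look for the binary first. This is a hypothetical alternative, not what main.py currently does: `find_tesseract` and `configure_pytesseract` are illustrative names. It prefers a `tesseract` found on PATH and falls back to the Windows default location shown above.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# The Windows install location shown above; adjust if you installed elsewhere.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates: Iterable[str] = (WINDOWS_DEFAULT,)) -> Optional[str]:
    """Return a tesseract executable: the one on PATH, else the first existing candidate."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def configure_pytesseract() -> str:
    """Point pytesseract at the detected binary, or fail with a clear message."""
    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError("Tesseract not found: install it or edit the path in main.py")
    import pytesseract  # imported lazily; only needed once a binary is found

    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once at startup would make the same script work on Linux and macOS (where Tesseract is usually on PATH) and on a default Windows install.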

▶️ Usage

  1. Place your images inside the images/ folder.
    Supported formats: .png, .jpg, .jpeg

  2. Run the script:

    python main.py
  3. Results:

    • Extracted phone numbers → numbers.csv
    • Successfully processed images → success/
    • Failed images → failed/

🛠 Requirements

  • Python 3.8+
  • Tesseract OCR
  • Python libraries:
    pillow
    pytesseract
    tqdm

Install them with:

pip install pillow pytesseract tqdm
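To confirm everything in this section is available before running main.py, a small check like the one below can help. It is a hypothetical helper (`missing_requirements` is not part of the project). Note that the pip package pillow is imported as `PIL`, and the Tesseract check only looks at PATH, so a Windows install configured through `tesseract_cmd` would still be reported as missing.

```python
import importlib.util
import shutil
import sys
from typing import List

# pip name -> import name (Pillow installs the "PIL" package).
PYTHON_PACKAGES = {"pillow": "PIL", "pytesseract": "pytesseract", "tqdm": "tqdm"}


def missing_requirements() -> List[str]:
    """List the requirements from this section that are not available here."""
    missing = []
    if sys.version_info < (3, 8):
        missing.append("Python 3.8+")
    for pip_name, module in PYTHON_PACKAGES.items():
        if importlib.util.find_spec(module) is None:
            missing.append(pip_name)
    # Only checks PATH; a binary configured via tesseract_cmd is not detected.
    if shutil.which("tesseract") is None:
        missing.append("Tesseract OCR")
    return missing
```

For example, `print(missing_requirements() or "All requirements found")` prints an empty-list fallback message when nothing is missing.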

📜 License

MIT License