InvoSync

InvoSync is an AI-powered invoice and purchase order reconciliation system that uses Tesseract OCR and RapidFuzz to detect, extract, and compare data from scanned invoices and purchase orders. It automatically highlights mismatches, corrects inconsistencies, and exports clean, standardized results — saving hours of manual verification work.

Features

OCR Extraction: Reads invoice and purchase order data using Tesseract OCR.
Discrepancy Detection: Automatically compares and corrects vendor, quantity, and pricing mismatches.
Data Export: Generates corrected CSV reports and discrepancy summaries.
Modern Frontend: Built using React.js + Vite for a fast and smooth UI.
Seamless Backend Integration: Flask backend for OCR, comparison, and API handling.
Modular Architecture: Clean separation of OCR, parsing, comparison, and export logic.

Project Demo

Tech Stack

Layer	Technologies
Frontend	React.js, Vite, TailwindCSS
Backend	Python, Flask
OCR Engine	Tesseract OCR
Text Matching	RapidFuzz
Data Processing	Pandas, Regex
File Handling	Pillow, pdf2image

Project Structure

InvoSync/
│
├── backend/
│   ├── app.py               # Flask app entry point
│   ├── compare.py           # Handles comparison between PO and invoice
│   ├── export.py            # Exports corrected data to CSV
│   ├── ocr.py               # OCR logic using Tesseract
│   ├── parse.py             # Parses extracted text into structured format
│   └── requirements.txt     # Backend dependencies
│
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── package-lock.json
│   └── src/
│       ├── App.jsx
│       ├── main.jsx
│       ├── components/
│       │   ├── HowItWorks.jsx
│       │   ├── Page.jsx
│       │   └── Shell.jsx
│       ├── lib/
│       │   └── api.js
│       └── pages/
│           ├── Dashboard.jsx
│           ├── Exports.jsx
│           ├── Login.jsx
│           ├── RecordDetail.jsx
│           ├── Records.jsx
│           ├── Review.jsx
│           ├── Settings.jsx
│           ├── Signup.jsx
│           └── Upload.jsx
│
└── README.md

Architecture Diagram

     +---------------------+
     |  Frontend (React)   |
     |  ────────────────   |
     |  Upload Invoices     |
     |  View Discrepancies  |
     +----------+----------+
                |
                v
     +---------------------+
     |   Backend (Flask)   |
     |   ───────────────   |
     | OCR Extraction       |
     | Parsing & Comparison |
     | Export to CSV        |
     +----------+----------+
                |
                v
     +---------------------+
     |   Tesseract OCR     |
     |  Image → Text       |
     +---------------------+

Setup & Deployment

Backend Setup

cd backend
python -m venv venv
source venv/bin/activate      # (Windows: venv\Scripts�ctivate)
pip install -r requirements.txt
python app.py

Backend runs on http://localhost:5000

Frontend Setup

cd frontend
npm install
npm run dev

Frontend runs on http://localhost:5173

How Tesseract OCR is Used

Tesseract OCR is an open-source engine developed by Google that extracts text from images and PDFs.
In InvoSync, it’s used inside backend/ocr.py to process invoice and purchase order files.

Pipeline:

Preprocess image using Pillow and ImageEnhance for better clarity.

Extract text via:

import pytesseract
text = pytesseract.image_to_string(image)

Parse the extracted text into structured fields (item name, quantity, price, total).
Compare extracted invoice and purchase order details using RapidFuzz for fuzzy string matching.

Workflow

Upload invoice.jpg and purchase_order.jpg through the frontend.
Backend extracts and parses data using Tesseract OCR.
Mismatches (e.g., price/quantity/vendor) are identified.
Corrected invoice and discrepancy report are exported as:
- corrected_invoice.csv
- discrepancy_report.csv

Output Files

File	Description
`corrected_invoice.csv`	Final corrected invoice after comparison
`discrepancy_report.csv`	Log of mismatches and corrections made

Contributors

Developed by:
Ashish Kumar Nanda
Kushagra Gupta
Rashi Gupta
Radhi Pahuja

If you find this project useful, please consider giving it a ⭐ on GitHub!

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
backend		backend
frontend		frontend
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

InvoSync

Features

Project Demo

Tech Stack

Project Structure

Architecture Diagram

Setup & Deployment

Backend Setup

Frontend Setup

How Tesseract OCR is Used

Workflow

Output Files

Contributors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

InvoSync

Features

Project Demo

Tech Stack

Project Structure

Architecture Diagram

Setup & Deployment

Backend Setup

Frontend Setup

How Tesseract OCR is Used

Workflow

Output Files

Contributors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages