Azure Document Intelligence – AI-Based PDF Classification & Splitting System

This repository demonstrates a real-world AI document processing system built using Azure Document Intelligence, designed to automatically:

Classify multi-document PDFs
Split them into individual files
Upload each document to the correct SharePoint folder

The solution replicates a production-grade back-office automation workflow used in enterprise document pipelines.

🚀 What This Project Does

Given a single multi-page PDF containing several document types (e.g. passport + license), the pipeline:

Sends the PDF to a trained Azure custom classifier
Reads the document type assigned to each page
Splits the PDF at the boundaries between document types
Uploads each resulting document to SharePoint
Files each document into its matching business folder
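
The page-level steps above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the classifier result exposes a list of documents that each carry a `doc_type` and `bounding_regions` with a `page_number`, and the helper names `page_labels` and `group_pages` are hypothetical.

```python
from types import SimpleNamespace  # used only to fake a classifier result


def page_labels(documents):
    """Map 1-based page number -> document type from classifier output."""
    labels = {}
    for doc in documents:
        for region in doc.bounding_regions or []:
            labels[region.page_number] = doc.doc_type
    return labels


def group_pages(labels):
    """Collapse per-page labels into (doc_type, first_page, last_page) runs."""
    segments = []
    for page in sorted(labels):
        doc_type = labels[page]
        last = segments[-1] if segments else None
        if last and last[0] == doc_type and last[2] == page - 1:
            # Same type on the next consecutive page: extend the current run.
            segments[-1] = (doc_type, last[1], page)
        else:
            segments.append((doc_type, page, page))
    return segments


# Stand-in for a real result: two passport pages followed by one license page.
fake_documents = [
    SimpleNamespace(
        doc_type="passport",
        bounding_regions=[SimpleNamespace(page_number=1),
                          SimpleNamespace(page_number=2)],
    ),
    SimpleNamespace(
        doc_type="license",
        bounding_regions=[SimpleNamespace(page_number=3)],
    ),
]
segments = group_pages(page_labels(fake_documents))
```

Each tuple in `segments` describes one output file: the document type plus the inclusive page range it covers.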

🧠 Use Cases

This system is ideal for:

KYC automation
Identity verification systems
Loan onboarding workflows
Back-office document segregation
Insurance & banking operations
Compliance document routing

📂 Project Structure


Azure-Document-Intelligence/
│
├── azure_doc_intelligence.py
│   → Simple local PDF classification demo
│
├── classify_split_pdf_using_azure_doc_intelligence.py
│   → Full enterprise pipeline with:
│      - Classification
│      - Splitting
│      - SharePoint upload
│
└── README.md (this file)

⚙️ Technology Stack

Azure Document Intelligence
Azure Custom Classifier
Python 3.10+
PyMuPDF (PDF splitting)
Microsoft Graph API
Azure AD (MSAL authentication)
SharePoint Online
REST APIs

🔑 Key Features

AI-powered document classification
Automatic PDF splitting
Multi-document handling
Zero manual sorting
Microsoft Graph integration
SharePoint upload automation
Enterprise authentication
Modular processing pipeline

🧩 System Architecture


PDF Input
↓
Azure Document Intelligence
↓
AI Classification Model
↓
Page-level Document Detection
↓
PDF Split Engine
↓
Microsoft Graph API
↓
SharePoint Document Library
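
The "PDF Split Engine" stage could look roughly like the sketch below, which uses PyMuPDF's `insert_pdf` to copy page ranges into new files. The function names `to_zero_based` and `split_pdf`, the segment tuple layout, and the output naming scheme are assumptions for illustration; the repository's script may be organised differently.

```python
from pathlib import Path


def to_zero_based(segments):
    """Convert 1-based inclusive page ranges to the 0-based ones PyMuPDF expects."""
    return [(doc_type, first - 1, last - 1) for doc_type, first, last in segments]


def split_pdf(src_path, segments, out_dir="."):
    """Write one PDF per (doc_type, first_page, last_page) segment; return the paths."""
    import fitz  # PyMuPDF, imported lazily so the helper above works without it

    src_path = Path(src_path)
    written = []
    with fitz.open(str(src_path)) as src:
        for doc_type, first, last in to_zero_based(segments):
            out = fitz.open()  # new, empty PDF
            out.insert_pdf(src, from_page=first, to_page=last)
            # Assumed naming scheme; two segments of the same type would
            # overwrite each other and would need an index suffix.
            out_path = Path(out_dir) / f"{src_path.stem}_{doc_type}.pdf"
            out.save(str(out_path))
            out.close()
            written.append(out_path)
    return written
```

Called as `split_pdf("combined_kyc.pdf", [("passport", 1, 2), ("license", 3, 3)])`, this would write one file per segment next to the working directory.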

🔐 Security Design

Azure AD app authentication
Credentials are not hardcoded in production deployments
Token-based Microsoft Graph access
OAuth2 client credentials flow
The repository contains sample (placeholder) keys only
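
As an illustration of the client credentials flow listed above, the sketch below acquires an app-only Microsoft Graph token with MSAL. The helper names `authority_url`, `acquire_graph_token`, and `auth_headers` are hypothetical, and the sketch assumes the app registration already has the Graph application permissions it needs.

```python
GRAPH_SCOPE = ["https://graph.microsoft.com/.default"]


def authority_url(tenant_id):
    """Build the Azure AD authority URL for a tenant."""
    return f"https://login.microsoftonline.com/{tenant_id}"


def acquire_graph_token(client_id, client_secret, tenant_id):
    """Request an app-only access token using the client credentials flow."""
    import msal  # imported lazily so the pure helpers work without it

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=authority_url(tenant_id),
        client_credential=client_secret,
    )
    result = app.acquire_token_for_client(scopes=GRAPH_SCOPE)
    if "access_token" not in result:
        # MSAL returns an error dict instead of raising on failure.
        raise RuntimeError(
            f"Token request failed: {result.get('error_description', result)}"
        )
    return result["access_token"]


def auth_headers(token):
    """Headers to attach to every Microsoft Graph request."""
    return {"Authorization": f"Bearer {token}"}
```

Keeping the secret out of the source and passing it in as an argument matches the "no hardcoded credentials" point above.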

▶️ How to Run

1. Install dependencies

pip install azure-ai-documentintelligence pymupdf msal requests

2. Configure environment

Set the following values:

DI_ENDPOINT
DI_KEY
CLASSIFIER_ID
CLIENT_ID
CLIENT_SECRET
TENANT_ID
SITE_URL
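
One way to supply these values without committing them is to read them from environment variables and fail fast when any are missing. This is a sketch under that assumption; the README does not state where the script reads its configuration, and `load_config` is a hypothetical helper.

```python
import os

REQUIRED = (
    "DI_ENDPOINT",
    "DI_KEY",
    "CLASSIFIER_ID",
    "CLIENT_ID",
    "CLIENT_SECRET",
    "TENANT_ID",
    "SITE_URL",
)


def load_config(env=None):
    """Return the required settings as a dict, or raise listing what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing configuration values: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Passing a plain dict as `env` makes the function easy to exercise in tests without touching the real environment.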

3. Execute

python classify_split_pdf_using_azure_doc_intelligence.py input.pdf

📈 Why This Project Is Valuable

The full pipeline goes beyond a tutorial demo.

It demonstrates:

Production-style AI pipelines
Real enterprise authentication
Business document automation
End-to-end system design
Cloud-native AI integration

Similar architectures are common in:

Banks
FinTech platforms
Insurance companies
Legal tech systems
Government document processing

🧪 Sample Output

Input:

combined_kyc.pdf

Output:

combined_kyc_passport.pdf → SharePoint/Passport  
combined_kyc_license.pdf  → SharePoint/License
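
The naming and routing shown above can be reproduced with the small sketch below. The helpers `route` and `upload_url` are hypothetical, the folder name is assumed to be the capitalised document type, and `site_id` stands for a Microsoft Graph site identifier that would have to be resolved from `SITE_URL` separately. The URL follows Graph's simple upload pattern, to which the file bytes are sent with an HTTP PUT.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def route(source_name, doc_type):
    """Return (output filename, SharePoint folder) for one split document."""
    stem = PurePosixPath(source_name).stem
    return f"{stem}_{doc_type}.pdf", doc_type.capitalize()


def upload_url(site_id, folder, filename):
    """Build the Graph simple-upload URL for a file in the site's default library."""
    path = quote(f"{folder}/{filename}")  # keeps "/" but escapes spaces etc.
    return f"{GRAPH_ROOT}/sites/{site_id}/drive/root:/{path}:/content"


filename, folder = route("combined_kyc.pdf", "passport")
url = upload_url("SITE_ID", folder, filename)
```

With the sample input, `route` yields `combined_kyc_passport.pdf` in the `Passport` folder, matching the output listed above.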

👤 Author

Dinakar S
AI Engineer | Automation Engineer | Cybersecurity
