This repository demonstrates a real-world AI document processing system built using Azure Document Intelligence, designed to automatically:
- Classify multi-document PDFs
- Split them into individual files
- Upload each document to the correct SharePoint folder
The solution replicates a production-grade back-office automation workflow used in enterprise document pipelines.
Given a single multi-page PDF containing different document types (e.g. passport + license), the pipeline:
- Classifies each page using a trained Azure classifier
- Identifies document type per page
- Splits the PDF accordingly
- Uploads each document to SharePoint
- Organizes files into business folders automatically
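The step from page-level labels to per-document page sets can be sketched as below. This is a minimal illustration, not the repository's actual code: it assumes the classifier output has already been reduced to one label per page, and the function name `group_pages_by_type` and the sample labels are hypothetical.

```python
from collections import defaultdict


def group_pages_by_type(page_labels):
    """Map each document type to the 0-based page indices labeled with it.

    Assumption: page_labels[i] is the document type predicted for page i.
    """
    groups = defaultdict(list)
    for page_index, label in enumerate(page_labels):
        groups[label].append(page_index)
    return dict(groups)


# Hypothetical labels for a 4-page scan: two passport pages, then two license pages.
labels = ["passport", "passport", "license", "license"]
print(group_pages_by_type(labels))
# {'passport': [0, 1], 'license': [2, 3]}
```

Grouping by type (rather than by contiguous run) yields one output file per document type, which matches the `combined_kyc_passport.pdf` / `combined_kyc_license.pdf` naming shown in this README. A PDF containing two separate passports would need run-based grouping instead.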
This system is ideal for:
- KYC automation
- Identity verification systems
- Loan onboarding workflows
- Back-office document segregation
- Insurance & banking operations
- Compliance document routing
Azure-Document-Intelligence/
│
├── azure_doc_intelligence.py
│     → Simple local PDF classification demo
│
├── classify_split_pdf_using_azure_doc_intelligence.py
│     → Full enterprise pipeline with:
│         - Classification
│         - Splitting
│         - SharePoint upload
│
└── README.md (this file)
- Azure Document Intelligence
- Azure Custom Classifier
- Python 3.10+
- PyMuPDF (PDF splitting)
- Microsoft Graph API
- Azure AD (MSAL authentication)
- SharePoint Online
- REST APIs
- AI-powered document classification
- Automatic PDF splitting
- Multi-document handling
- Zero manual sorting
- Microsoft Graph integration
- SharePoint upload automation
- Enterprise authentication
- Modular processing pipeline
PDF Input
↓
Azure Document Intelligence
↓
AI Classification Model
↓
Page-level Document Detection
↓
PDF Split Engine
↓
Microsoft Graph API
↓
SharePoint Document Library
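The PDF Split Engine step in the flow above could look roughly like the following sketch, which uses PyMuPDF's `insert_pdf` to copy selected pages into a new file. The helper names `to_page_ranges` and `write_subset` are illustrative assumptions, not functions from this repository.

```python
def to_page_ranges(pages):
    """Collapse 0-based page indices into inclusive (start, end) runs."""
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page directly follows the current run, so extend it.
            ranges[-1] = (ranges[-1][0], page)
        else:
            ranges.append((page, page))
    return ranges


def write_subset(src_path, pages, out_path):
    """Copy the given pages of src_path into a new PDF at out_path."""
    if not pages:
        raise ValueError("no pages selected")
    import fitz  # PyMuPDF; imported lazily so to_page_ranges works without it

    with fitz.open(src_path) as src:
        out = fitz.open()  # new, empty PDF
        for start, end in to_page_ranges(pages):
            out.insert_pdf(src, from_page=start, to_page=end)
        out.save(out_path)
        out.close()


print(to_page_ranges([0, 1, 3]))
# [(0, 1), (3, 3)]
```

Copying contiguous runs in one `insert_pdf` call keeps the number of copy operations small for long documents. A call such as `write_subset("combined_kyc.pdf", [0, 1], "combined_kyc_passport.pdf")` would then produce one of the split files.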
- Azure AD app authentication
- No hardcoded credentials in production
- Token-based Microsoft Graph access
- OAuth2 client credentials flow
- Sample keys only in repository
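One way to honor the points above is to read secrets from the environment and request a Graph token through MSAL's client credentials flow. This is a hedged sketch: reading the values from environment variables is an assumption (the repository may store them differently), the setting names are the ones listed in this README, and the helper names `load_settings` and `acquire_graph_token` are hypothetical.

```python
import os

REQUIRED_SETTINGS = (
    "DI_ENDPOINT", "DI_KEY", "CLASSIFIER_ID",
    "CLIENT_ID", "CLIENT_SECRET", "TENANT_ID", "SITE_URL",
)


def load_settings(env=None):
    """Read all required settings, failing fast if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}


def acquire_graph_token(settings):
    """Request an app-only Microsoft Graph token (OAuth2 client credentials)."""
    import msal  # imported lazily so load_settings works without msal installed

    app = msal.ConfidentialClientApplication(
        settings["CLIENT_ID"],
        authority="https://login.microsoftonline.com/" + settings["TENANT_ID"],
        client_credential=settings["CLIENT_SECRET"],
    )
    result = app.acquire_token_for_client(
        scopes=["https://graph.microsoft.com/.default"]
    )
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "token request failed"))
    return result["access_token"]
```

Failing fast on missing settings keeps a misconfigured run from reaching Azure or SharePoint with partial credentials.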
pip install azure-ai-documentintelligence pymupdf msal requests
Edit:
DI_ENDPOINT
DI_KEY
CLASSIFIER_ID
CLIENT_ID
CLIENT_SECRET
TENANT_ID
SITE_URL
python classify_split_pdf_using_azure_doc_intelligence.py input.pdf
This is not a tutorial demo.
It demonstrates:
- Production-style AI pipelines
- Real enterprise authentication
- Business document automation
- End-to-end system design
- Cloud-native AI integration
The same architecture is used in:
- Banks
- FinTech platforms
- Insurance companies
- Legal tech systems
- Government document processing
Input:
combined_kyc.pdf
Output:
combined_kyc_passport.pdf → SharePoint/Passport
combined_kyc_license.pdf → SharePoint/License
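The mapping from input file and document type to output name and SharePoint destination can be sketched as follows. The naming rule is inferred from the example above. The helper names and the `SITE_ID` placeholder are hypothetical, and the URL assumes the files go to the site's default document library through Microsoft Graph's path-based upload endpoint.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def output_name(input_name, doc_type):
    """Build '<stem>_<doc_type><suffix>' from the input file name."""
    path = PurePosixPath(input_name)
    return f"{path.stem}_{doc_type}{path.suffix}"


def target_folder(doc_type):
    """Assumed convention: the folder is the capitalized document type."""
    return doc_type.capitalize()


def graph_upload_url(site_id, folder, filename):
    """URL for a PUT upload into the site's default drive (small files)."""
    item_path = quote(f"{folder}/{filename}")
    return (
        f"https://graph.microsoft.com/v1.0/sites/{site_id}"
        f"/drive/root:/{item_path}:/content"
    )


name = output_name("combined_kyc.pdf", "passport")
print(name)
# combined_kyc_passport.pdf
print(graph_upload_url("SITE_ID", target_folder("passport"), name))
# https://graph.microsoft.com/v1.0/sites/SITE_ID/drive/root:/Passport/combined_kyc_passport.pdf:/content
```

The file bytes would be sent with an HTTP PUT to that URL, with the Graph token in an `Authorization: Bearer` header. Large files generally call for an upload session rather than this single-request form.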
Dinakar S AI Engineer | Automation Engineer | Cybersecurity