niphoenixo/AI-Document-Intelligence-API-Remote-Friendly-Use-Case-

📄 AI Document Intelligence API

A scalable, async, AI-powered document processing and question-answering system built with FastAPI, Python, and LLM frameworks. The service lets users upload structured and unstructured documents, run semantic search over them, and ask natural language questions, using embeddings stored in a vector database.


🚀 Key Features

  • Upload and process CSV, Excel, PDF, and JSON documents
  • Clean and preprocess data using Pandas / NumPy
  • Generate embeddings using LLM frameworks (LangChain)
  • Store and retrieve vectors using a Vector Database (FAISS / Chroma)
  • Ask natural language questions over uploaded documents
  • Fully async REST APIs for high performance
  • Clean, modular, and scalable backend architecture
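
To illustrate how the async API can stay responsive while doing blocking work such as parsing, here is a minimal standard-library sketch. The function names (`parse_csv`, `process_upload`, `process_many`) are hypothetical and not taken from this repository, and the `csv` module stands in for the Pandas-based cleaning step.

```python
import asyncio
import csv
import io


def parse_csv(raw: bytes) -> list[dict[str, str]]:
    """Blocking parse step (hypothetical stand-in for Pandas cleaning)."""
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    return [dict(row) for row in reader]


async def process_upload(name: str, raw: bytes) -> tuple[str, int]:
    # Run the blocking parser in a worker thread so the event loop stays free
    # to accept other requests in the meantime.
    rows = await asyncio.to_thread(parse_csv, raw)
    return name, len(rows)


async def process_many(files: dict[str, bytes]) -> dict[str, int]:
    # Handle several uploads concurrently and map each file name to its row count.
    results = await asyncio.gather(
        *(process_upload(name, raw) for name, raw in files.items())
    )
    return dict(results)


if __name__ == "__main__":
    sample = {"a.csv": b"id,name\n1,Ada\n2,Linus\n", "b.csv": b"id\n7\n"}
    print(asyncio.run(process_many(sample)))
```

In a FastAPI route handler the same pattern would apply: the handler awaits the offloaded work instead of calling the blocking parser directly.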

🧠 Architecture Overview

Client
  ↓
FastAPI (Async REST APIs)
  ↓
Document Service
  ├── File Processors (CSV / Excel / PDF / JSON)
  ├── Data Cleaning (Pandas / NumPy)
  ├── Text Chunking
  ↓
Embedding Service (LangChain)
  ↓
Vector Store (FAISS / Chroma)
  ↓
LLM (Semantic Search & Q&A)
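
The chunk, embed, store, and search stages of the flow above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `chunk_text`, `embed`, and `InMemoryVectorStore` are hypothetical names, the word-count "embedding" is a toy placeholder for a real embedding model, and the in-memory list stands in for FAISS or Chroma.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (placeholder for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the query, most similar first.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add([
        "Invoices are due within thirty days",
        "The office is closed on public holidays",
    ])
    print(store.search("When are invoices due?"))
```

In the Q&A step, the top-ranked chunks would be passed to the LLM as context alongside the user's question.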

🛠️ Tech Stack

Backend

  • Python 3.10+
  • FastAPI
  • Pydantic
  • AsyncIO

AI / LLM

  • LangChain
  • OpenAI / Vertex AI (pluggable)

Data

  • Pandas
  • NumPy
  • PostgreSQL / MySQL (metadata)
  • FAISS / ChromaDB (vector storage)

DevOps

  • Docker
  • Docker Compose
  • Git

📁 Project Structure

ai-document-intelligence/
├── app/
│   ├── api/            # API routes
│   ├── services/       # Business logic
│   ├── processors/     # File processors
│   ├── models/         # DB models
│   ├── schemas/        # API schemas
│   ├── core/           # Config & logging
│   ├── db/             # DB session
│   └── main.py         # App entry point
├── embeddings/         # Vector DB storage
├── data/               # Sample documents
├── tests/              # Unit & API tests
├── Dockerfile
├── docker-compose.yml
└── README.md

🔌 API Endpoints

1️⃣ Upload Document

POST /api/v1/documents/upload

Supports:

  • CSV
  • Excel
  • PDF
  • JSON

What happens internally:

  • The file is validated and parsed
  • The data is cleaned using Pandas
  • The text is chunked and embedded
  • The embeddings are stored in the vector DB
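
The validate, parse, and clean steps listed above might look roughly like the following sketch. It uses only the standard library and covers CSV and JSON; Excel and PDF would need third-party parsers, and the real service is described as cleaning with Pandas. All names here (`SUPPORTED`, `validate`, `parse`, `clean`, `to_text`) are hypothetical, and JSON input is assumed to be an object or a list of objects.

```python
import csv
import io
import json
from pathlib import PurePath

# Assumption: only formats parseable with the standard library are handled here.
SUPPORTED = {".csv", ".json"}


def validate(filename: str, raw: bytes) -> str:
    """Return the lowercase file suffix, or raise if the upload is unusable."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if not raw:
        raise ValueError("file is empty")
    return suffix


def parse(filename: str, raw: bytes) -> list[dict]:
    """Parse the upload into a list of records (one dict per row or object)."""
    suffix = validate(filename, raw)
    text = raw.decode("utf-8")
    if suffix == ".csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def clean(records: list[dict]) -> list[dict]:
    """Strip whitespace, drop empty values, and drop records left with no fields."""
    cleaned = []
    for record in records:
        row = {str(k).strip(): str(v).strip() for k, v in record.items() if v is not None}
        row = {k: v for k, v in row.items() if v}
        if row:
            cleaned.append(row)
    return cleaned


def to_text(records: list[dict]) -> list[str]:
    """Flatten each record into one line of text, ready for chunking."""
    return ["; ".join(f"{k}: {v}" for k, v in record.items()) for record in records]


if __name__ == "__main__":
    rows = clean(parse("people.csv", b"name,city\n Ada , London\n,\n"))
    print(to_text(rows))
```

The resulting text lines would then be chunked, embedded, and written to the vector store.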
