
AI-Powered PDF Helper: Interact, analyze, and chat with your PDFs using an advanced Hybrid AI Architecture (Google Generative AI + Qwen2-VL-2B-Instruct). Features include smart summarization, multimodal analysis, note management, and a modern Next.js/Express.js UI. Privacy-first with local Qwen2-VL model deployment.

govindmehta/PdfHelperAI

PDF Helper AI 📄🤖

A modern, AI-powered PDF management and interaction platform built with Next.js and Express.js. Upload, analyze, and chat with your PDF documents using advanced AI technology.

✨ Features

  • 🚀 AI-Powered PDF Chat: Interact with your PDF documents using advanced AI
  • 🧠 Qwen2-VL-2B-Instruct Model: Local vision-language model for advanced PDF understanding
  • 🤖 Multi-Model AI Integration: Google Generative AI + Qwen2-VL for optimal performance
  • 📱 Modern UI/UX: Beautiful, responsive design with glass morphism effects
  • 🔍 Smart Analysis: Extract insights and analyze PDF content with AI
  • 📝 Note Management: Create and manage notes from your PDFs
  • 🖼️ Image Extraction: Extract and analyze images from PDF documents
  • 💬 Conversational AI: Natural language processing for interactive discussions
  • 🎯 Context-Aware Responses: AI maintains conversation history for meaningful interactions
  • 📊 Automated Summarization: Generate intelligent summaries of PDF content
  • 🔐 User Authentication: Secure user management with JWT
  • 📊 Dashboard: Centralized management of all your PDFs and notes
  • 🌐 RESTful API: Well-documented API with Swagger/OpenAPI

🤖 AI & Machine Learning

Generative AI Integration

PDF Helper AI combines a local model with a cloud model for document interaction:

  • 🧠 Qwen2-VL-2B-Instruct: Advanced vision-language model deployed locally for superior PDF understanding and multimodal analysis
  • 🌟 Google Generative AI (Gemini): Integrated for advanced reasoning, content generation, and multi-modal understanding
  • 🔄 Hybrid AI Architecture: Combines the power of cloud-based GenAI with local custom models for optimal performance and privacy
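The hybrid architecture described above can be pictured as a small routing decision made per request. The README does not show how the project actually chooses between the two models, so the sketch below is only an illustration: the names `ModelTarget`, `RoutingInput`, and `chooseModel`, and the policy itself, are hypothetical.

```typescript
// Hypothetical routing policy for a hybrid (local + cloud) setup.
// None of these names come from the repository; they only illustrate the idea.
type ModelTarget = "local-qwen2-vl" | "cloud-gemini";

interface RoutingInput {
  containsImages: boolean; // the page has charts, diagrams, or scanned content
  confidential: boolean;   // the user marked the document as private
  cloudAvailable: boolean; // an API key is configured and the API is reachable
}

function chooseModel(input: RoutingInput): ModelTarget {
  // Confidential documents are never sent to a cloud API.
  if (input.confidential) return "local-qwen2-vl";
  // Without cloud access the local model is the only option.
  if (!input.cloudAvailable) return "local-qwen2-vl";
  // Assumed policy: visual pages go to the local vision-language model,
  // text-heavy reasoning goes to the cloud model.
  return input.containsImages ? "local-qwen2-vl" : "cloud-gemini";
}
```

Keeping the policy in one pure function makes the privacy rule easy to audit and to unit test without loading either model.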

Qwen2-VL-2B-Instruct Model Features

  • 👁️ Vision-Language Understanding: Specialized model capable of processing both text and visual content from PDFs
  • 📄 Document Analysis: Optimized for document understanding with 2B parameters for efficient local inference
  • 🖼️ Image Comprehension: Advanced visual reasoning capabilities for charts, diagrams, and images within PDFs
  • 💡 Instruction Following: Fine-tuned for following complex instructions and providing detailed responses
  • ⚡ Lightweight Architecture: 2B parameter model optimized for local deployment with modest resource requirements
  • 🔒 Privacy-First: The local model runs offline, so documents it processes stay on your machine

Custom Model Features

  • 📚 Vision-Language Processing: Qwen2-VL-2B-Instruct model trained for comprehensive document understanding
  • 🏠 Local Deployment: Runs entirely on-premise for maximum privacy and data security
  • ⚡ Optimized Inference: GPU-accelerated processing with model quantization for fast responses
  • 🔒 Privacy-First: All document processing happens locally, ensuring confidentiality
  • 🎯 Multimodal Understanding: Enhanced capability for processing text, images, charts, and diagrams in PDFs
  • 📊 Efficient Architecture: 2B parameter model provides excellent performance with minimal resource usage

AI-Powered Features

  • 💬 Intelligent Conversations: Natural language interface for document queries and analysis
  • 📊 Smart Summarization: Automatic generation of key insights and executive summaries
  • 🔍 Semantic Search: Advanced content discovery using vector embeddings and similarity matching
  • 🖼️ Multimodal Analysis: Process both text and images within PDFs using OCR and vision models
  • 🎨 Content Generation: Create structured notes, outlines, and reports from PDF content
  • 🔮 Predictive Analysis: AI suggests relevant questions and topics based on document context
  • 📈 Performance Optimization: Continuous model improvement through feedback loops and usage analytics
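Semantic search over vector embeddings usually comes down to ranking stored chunks by their similarity to a query vector. The following is a minimal in-memory sketch of that ranking using cosine similarity. It assumes embeddings have already been computed elsewhere; `EmbeddedChunk`, `cosineSimilarity`, and `topK` are illustrative names, not the project's API, and a real deployment would delegate the search to the vector store.

```typescript
// Illustrative in-memory similarity search; the names here are hypothetical.
interface EmbeddedChunk {
  id: string;
  vector: number[]; // embedding computed elsewhere
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query, best match first.
function topK(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

The retrieved chunks would then be passed to the chat model as context for the user's question.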

Technical Implementation

  • Qwen2-VL-2B-Instruct: Local vision-language model for document understanding and analysis
  • LM Studio SDK: Local model management and inference optimization
  • Redis Vector Store: Efficient storage and retrieval of document embeddings
  • Custom Training Pipeline: Automated model fine-tuning and deployment workflow
  • API Gateway: Seamless integration between multiple AI models and services

🛠️ Tech Stack

Frontend

  • Next.js 15 - React framework with App Router
  • TypeScript - Type-safe JavaScript
  • Tailwind CSS - Utility-first CSS framework
  • Lucide React - Beautiful icons
  • Zustand - State management
  • React Hot Toast - Notifications

Backend

  • Express.js - Web framework for Node.js
  • MongoDB - NoSQL database with Mongoose ODM
  • Google Generative AI - AI integration
  • Redis - Caching and session management
  • JWT - Authentication
  • Multer - File upload handling
  • PDF-Parse - PDF text extraction
  • Tesseract.js - OCR for image text extraction

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • MongoDB database
  • Redis server
  • Google Generative AI API key

Installation

  1. Clone the repository

    git clone https://github.com/govindmehta/pdfHelper.git
    cd pdfHelper
  2. Backend Setup

    cd backend
    npm install

    Create a .env file in the backend directory:

    PORT=5000
    MONGODB_URI=mongodb://localhost:27017/pdfhelper
    JWT_SECRET=your-jwt-secret-key
    GOOGLE_API_KEY=your-google-generative-ai-key
    REDIS_URL=redis://localhost:6379
  3. Frontend Setup

    cd ../frontend
    npm install

    Create a .env.local file in the frontend directory:

    NEXT_PUBLIC_API_URL=http://localhost:5000
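A missing or mistyped variable in the backend .env file tends to surface later as a confusing connection error, so it can help to validate the configuration once at startup. The sketch below assumes the five variable names listed in the backend .env example; the `BackendConfig` shape and the `loadConfig` helper are hypothetical and not part of the repository.

```typescript
// Hypothetical startup check for the backend .env values shown above.
interface BackendConfig {
  port: number;
  mongodbUri: string;
  jwtSecret: string;
  googleApiKey: string;
  redisUrl: string;
}

function loadConfig(env: Record<string, string | undefined>): BackendConfig {
  const required = ["MONGODB_URI", "JWT_SECRET", "GOOGLE_API_KEY", "REDIS_URL"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // PORT is optional here and falls back to 5000, the value in the example file.
  const port = Number(env["PORT"] ?? "5000");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return {
    port,
    mongodbUri: env["MONGODB_URI"] as string,
    jwtSecret: env["JWT_SECRET"] as string,
    googleApiKey: env["GOOGLE_API_KEY"] as string,
    redisUrl: env["REDIS_URL"] as string,
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)` before the server starts listening.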

Running the Application

  1. Start the backend server

    cd backend
    npm run dev
  2. Start the frontend development server

    cd frontend
    npm run dev
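  3. Access the application: the backend listens on the PORT set in backend/.env (http://localhost:5000 with the values above), and the Next.js development server defaults to http://localhost:3000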

📁 Project Structure

pdfhelper/
├── backend/
│   ├── config/          # Configuration files
│   ├── controllers/     # Route controllers
│   ├── middlewares/     # Custom middleware
│   ├── models/          # Database models
│   ├── routes/          # API routes
│   ├── services/        # Business logic
│   ├── utils/           # Utility functions
│   ├── uploads/         # File uploads
│   └── server.js        # Main server file
├── frontend/
│   ├── src/
│   │   ├── app/         # Next.js app router
│   │   ├── components/  # React components
│   │   └── lib/         # Utility libraries
│   ├── public/          # Static assets
│   └── package.json
└── README.md

🔧 API Endpoints

Authentication

  • POST /api/users/register - Register new user
  • POST /api/users/login - User login
  • GET /api/users/profile - Get user profile
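The login endpoint presumably returns a JWT that protected routes such as the profile endpoint then verify. To make that mechanism concrete, here is a hand-rolled HS256 sign and verify sketch built on Node's `crypto` module. It is an illustration only: the README does not say which JWT library the backend uses, the function names are hypothetical, and production code should rely on a maintained JWT library instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Encode a JSON value as a base64url segment, as the JWT format requires.
function encodeSegment(value: object): string {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

function hmacSha256(data: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Build a compact HS256 token: header.payload.signature
function signToken(payload: Record<string, unknown>, secret: string): string {
  const header = encodeSegment({ alg: "HS256", typ: "JWT" });
  const body = encodeSegment(payload);
  const signature = hmacSha256(`${header}.${body}`, secret).toString("base64url");
  return `${header}.${body}.${signature}`;
}

// Return the payload if the signature is valid and the token has not expired,
// otherwise null. nowSeconds is passed in so the check is easy to test.
function verifyToken(
  token: string,
  secret: string,
  nowSeconds: number
): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = hmacSha256(`${header}.${body}`, secret);
  const received = Buffer.from(signature, "base64url");
  // Constant-time comparison; lengths must match before calling timingSafeEqual.
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
    return null;
  }
  const parsedHeader = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (parsedHeader.alg !== "HS256") return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (typeof payload.exp === "number" && payload.exp <= nowSeconds) return null;
  return payload;
}
```

The secret would come from the JWT_SECRET environment variable, and the `exp` claim is expressed in seconds since the Unix epoch.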

PDF Management

  • POST /api/pdfs/upload - Upload PDF file
  • GET /api/pdfs - Get user's PDFs
  • GET /api/pdfs/:id - Get specific PDF
  • DELETE /api/pdfs/:id - Delete PDF
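Before an uploaded file is stored or parsed, the upload endpoint can reject anything that is obviously not a PDF. The sketch below checks the file extension, a size limit, and the `%PDF-` signature that a well-formed PDF starts with. The 10 MiB limit and the names `validatePdfUpload` and `UploadCheck` are assumptions made for illustration, not values taken from the project.

```typescript
// Assumed limit for illustration; the project's real limit is not documented here.
const MAX_PDF_BYTES = 10 * 1024 * 1024;

interface UploadCheck {
  ok: boolean;
  reason?: string;
}

function validatePdfUpload(fileName: string, bytes: Uint8Array): UploadCheck {
  if (!fileName.toLowerCase().endsWith(".pdf")) {
    return { ok: false, reason: "File name must end in .pdf" };
  }
  if (bytes.length === 0) {
    return { ok: false, reason: "File is empty" };
  }
  if (bytes.length > MAX_PDF_BYTES) {
    return { ok: false, reason: "File exceeds the size limit" };
  }
  // A well-formed PDF begins with the ASCII bytes "%PDF-".
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d];
  const hasSignature = signature.every((byte, i) => bytes[i] === byte);
  if (!hasSignature) {
    return { ok: false, reason: "File does not start with the %PDF- signature" };
  }
  return { ok: true };
}
```

Checking the leading bytes catches renamed files that an extension or MIME type check alone would let through.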

AI Chat

  • POST /api/ai/chat - Chat with PDF content
  • GET /api/ai/conversation/:pdfId - Get conversation history
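Context-aware chat means each request carries some of the earlier conversation, but a model can only accept a limited amount of input. One common approach is to keep the most recent turns that fit a budget. The sketch below uses a character budget as a stand-in for a token budget; `ChatMessage` and `buildContext` are hypothetical names, and the project's real message format may differ.

```typescript
// Hypothetical message shape; the backend's actual schema is not shown in this README.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Keep the newest history messages that fit within maxChars together with
// the new question, then append the question as the final user message.
function buildContext(
  history: ChatMessage[],
  question: string,
  maxChars: number
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = question.length;
  // Walk backwards so the most recent turns are considered first.
  for (let i = history.length - 1; i >= 0; i--) {
    const message = history[i];
    if (used + message.content.length > maxChars) break;
    used += message.content.length;
    kept.unshift(message);
  }
  return [...kept, { role: "user", content: question }];
}
```

The question itself is always included, even when it alone exceeds the budget, so the model never receives a context without the current request.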

Notes

  • POST /api/notes - Create note
  • GET /api/notes - Get user's notes
  • PUT /api/notes/:id - Update note
  • DELETE /api/notes/:id - Delete note
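On the frontend, the endpoints above can be wrapped in a small typed helper so that paths and headers are built in one place. The following sketch assumes the token is sent as an `Authorization: Bearer` header, which the README does not confirm; `buildRequest` and the `api` object are illustrative names rather than code from the repository.

```typescript
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE";

interface ApiRequest {
  method: HttpMethod;
  url: string;
  headers: Record<string, string>;
}

// Join the base URL and path and attach the token if one is provided.
// Sending the JWT as a Bearer token is an assumption, not documented behavior.
function buildRequest(
  baseUrl: string,
  method: HttpMethod,
  path: string,
  token?: string
): ApiRequest {
  const url = `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
  const headers: Record<string, string> = { Accept: "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return { method, url, headers };
}

// A few of the documented endpoints expressed through the helper.
const api = {
  listPdfs: (base: string, token: string) =>
    buildRequest(base, "GET", "/api/pdfs", token),
  getPdf: (base: string, token: string, id: string) =>
    buildRequest(base, "GET", `/api/pdfs/${encodeURIComponent(id)}`, token),
  deleteNote: (base: string, token: string, id: string) =>
    buildRequest(base, "DELETE", `/api/notes/${encodeURIComponent(id)}`, token),
};
```

A request built this way can be passed to `fetch` as `fetch(request.url, { method: request.method, headers: request.headers })`, with the base URL taken from NEXT_PUBLIC_API_URL.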

🎨 UI Components

  • Landing Page: Modern hero section with gradient animations
  • Dashboard: Glass morphism design with PDF management
  • Chat Interface: Real-time AI conversation with structured responses
  • Authentication: Clean login/register forms
  • AI Response: Structured content parsing with icons and formatting

🔒 Security Features

  • JWT-based authentication
  • Input validation and sanitization
  • File upload restrictions
  • CORS configuration
  • Rate limiting (recommended for production)
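Since rate limiting is listed as a recommendation rather than a built-in feature, here is a minimal sketch of what a limiter does, using a fixed-window counter per client key. `FixedWindowLimiter` is a hypothetical class written for illustration; it keeps state in memory, so a multi-instance deployment would need a shared store such as Redis or an established middleware package.

```typescript
// Illustrative fixed-window rate limiter; not part of the repository.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by key (for example an IP address)
  // is allowed at time nowMs, and records it against the current window.
  allow(key: string, nowMs: number): boolean {
    const entry = this.hits.get(key);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has ended.
      this.hits.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

In an Express middleware the key would typically be the client IP or user ID, and a rejected request would receive HTTP status 429.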

📱 Responsive Design

  • Mobile-first approach
  • Breakpoint-specific layouts
  • Touch-friendly interactions
  • Optimized performance

🚀 Deployment

Backend Deployment

  1. Set up MongoDB Atlas or your preferred database
  2. Configure Redis instance
  3. Set environment variables
  4. Deploy to your preferred platform (Heroku, AWS, etc.)

Frontend Deployment

  1. Build the application: npm run build
  2. Deploy to Vercel, Netlify, or your preferred platform
  3. Configure environment variables

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the ISC License.

🙏 Acknowledgments

  • Google Generative AI for powerful AI capabilities
  • The open-source community for amazing tools and libraries
  • Contributors who help improve this project

📞 Support

For support, please create an issue in the GitHub repository or contact the maintainers.


Made with ❤️ by Govind Mehta
