Skip to content

Pranavsingh431/Inceptia_ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

13 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸš€ Inceptia AI - Startup India Assistant

Production-quality AI chatbot providing accurate, real-time information about Startup India policies, funding, registration, and benefits.

StartupGuru Demo

Python 3.11+ FastAPI Streamlit License: MIT

✨ Features

🎯 Intelligent Query Processing

  • RAG Pipeline: Retrieval-Augmented Generation with confidence scoring
  • Smart Intent Detection: Automatically categorizes queries (eligibility, funding, registration, etc.)
  • Fallback Handling: Graceful handling of off-topic or unclear queries
  • Session Management: Tracks conversations and maintains context

πŸ•·οΈ Advanced Web Scraping

  • Dynamic Content: Handles JavaScript-rendered pages with Playwright
  • PDF Processing: Extracts text from government documents using PyMuPDF
  • Content Cleaning: Removes navigation, footers, and irrelevant content
  • Structured Output: Organized by topics, sections, and metadata

🧠 Production-Quality AI

  • Vector Search: ChromaDB with sentence-transformers embeddings
  • LLM Integration: Groq API with llama3-8b-8192 for fast responses
  • Confidence Scoring: Evaluates answer reliability
  • Source Attribution: Links responses to original documents

🎨 Beautiful Interface

  • Modern UI: Gradient designs, animations, and responsive layout
  • Real-time Chat: Live conversation with typing indicators
  • Source Cards: Interactive source display with relevance scores
  • Quick Topics: Pre-defined questions for common queries
  • Admin Panel: System stats, reload functions, and debugging tools

πŸš€ Quick Start

Prerequisites

# Python 3.11 or lower required
python --version

# Install Playwright browsers
playwright install

1. Clone & Install

git clone <repository-url>
cd startupguru
pip install -r requirements.txt

2. Configure Environment

# Copy the example environment file
cp .env.example .env

# Edit .env and add your actual Groq API key
# Get your API key from: https://console.groq.com/
# Replace 'your_groq_api_key_here' with your actual key

3. Run Complete Pipeline

# Single command to scrape, process, and deploy
python startupguru_main.py pipeline

4. Deploy Application

# Start both backend and frontend
python startupguru_main.py deploy

Access Points:

πŸ“‹ Detailed Setup

Step-by-Step Installation

1. Environment Setup

# Create virtual environment
python -m venv startupguru_env
source startupguru_env/bin/activate  # Linux/Mac
# or
startupguru_env\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

2. Install Playwright

# Install browser drivers
playwright install chromium

3. Run Individual Components

Scraping Only:

python startupguru_main.py scrape --force

Processing Only:

python startupguru_main.py process --rebuild

API Server Only:

python startupguru_main.py serve --port 8000

Frontend Only:

python startupguru_main.py frontend --port 8501

4. Testing

# Test complete system
python startupguru_main.py test

# View system statistics
python startupguru_main.py stats

πŸ—οΈ Architecture

graph TB
    A[User Query] --> B[Streamlit Frontend]
    B --> C[FastAPI Backend]
    C --> D[Query Handler]
    D --> E[Intent Detection]
    D --> F[Vector Search]
    F --> G[ChromaDB]
    D --> H[LLM Processing]
    H --> I[Groq API]
    D --> J[Response Generation]
    J --> B
    
    K[Web Scraper] --> L[Document Processor]
    L --> M[Text Chunking]
    M --> N[Embeddings]
    N --> G
Loading

Core Components

  1. Smart Scraper (smart_scraper.py)

    • Playwright-based dynamic content scraping
    • PDF text extraction
    • Content cleaning and structuring
  2. Document Processor (document_processor.py)

    • LangChain text splitting
    • Sentence-transformers embeddings
    • ChromaDB vector storage
  3. Query Handler (query_handler.py)

    • RAG pipeline implementation
    • Confidence scoring
    • Fallback mechanisms
  4. FastAPI Backend (startupguru_api.py)

    • RESTful API endpoints
    • Background task processing
    • Comprehensive error handling
  5. Streamlit Frontend (startupguru_app.py)

    • Modern chat interface
    • Real-time system monitoring
    • Interactive source display

πŸ“Š Tech Stack

Backend

  • Framework: FastAPI 0.104+
  • LLM: Groq API (llama3-8b-8192)
  • Embeddings: sentence-transformers (all-MiniLM-L6-v2)
  • Vector DB: ChromaDB 0.4.18+
  • Text Processing: LangChain 0.1+

Scraping

  • Browser Automation: Playwright 1.40+
  • PDF Processing: PyMuPDF 1.23+
  • HTML Parsing: BeautifulSoup4 4.12+

Frontend

  • UI Framework: Streamlit 1.28+
  • Styling: Custom CSS with gradients and animations
  • Charts: Plotly (optional)

Infrastructure

  • Logging: Loguru
  • CLI: Click
  • Environment: python-dotenv

πŸ”§ API Reference

Chat Endpoint

POST /chat
Content-Type: application/json

{
  "message": "What is Startup India?",
  "session_id": "optional_session_id",
  "include_debug": false
}

Response:

{
  "response": "Startup India is a flagship initiative...",
  "confidence": 0.89,
  "sources": [
    {
      "title": "Startup India Initiative",
      "url": "https://www.startupindia.gov.in/...",
      "topic": "general",
      "similarity": 0.92
    }
  ],
  "topic_detected": "startup_definition",
  "processing_time": 1.23,
  "session_id": "session_123"
}

Other Endpoints

  • GET /health - System health check
  • GET /stats - System statistics
  • POST /scrape - Start background scraping
  • POST /process - Start document processing
  • POST /reload - Full system reload
  • GET /search - Direct document search

πŸ“ˆ Performance

Benchmarks

  • Response Time: ~1-3 seconds per query
  • Throughput: 50+ concurrent requests
  • Accuracy: 85%+ confidence on domain queries
  • Scraping Speed: ~100 pages in 10 minutes

Optimizations

  • βœ… Sentence-transformers for fast embeddings
  • βœ… ChromaDB for efficient vector search
  • βœ… Groq API for ultra-fast LLM responses
  • βœ… Background task processing
  • βœ… Caching and session management

🎯 Sample Queries

General Information

  • "What is Startup India initiative?"
  • "How does Startup India help entrepreneurs?"

Registration & Process

  • "How to register a startup in India?"
  • "What is DPIIT recognition process?"
  • "Steps to get startup certification"

Eligibility & Criteria

  • "What are the eligibility criteria for startups?"
  • "Who can apply for Startup India benefits?"
  • "Age limit for startup recognition"

Funding & Support

  • "What funding options are available?"
  • "Seed fund scheme details"
  • "Government grants for startups"

Tax Benefits

  • "Tax exemptions for startups"
  • "Income tax benefits under Startup India"
  • "Angel tax provisions"

Women Entrepreneurs

  • "Support for women entrepreneurs"
  • "Women-specific startup schemes"

πŸ” Monitoring & Debugging

System Statistics

# View comprehensive stats
python startupguru_main.py stats

Query Logs

All queries are logged to ./logs/query_log.csv with:

  • Timestamp
  • Query text
  • Response
  • Confidence score
  • Processing time
  • Topic detected

Debug Mode

Enable debug information in API calls:

{
  "message": "your query",
  "include_debug": true
}

πŸš€ Deployment

Local Development

# Development mode with auto-reload
python startupguru_main.py serve --reload
python startupguru_main.py frontend

Production Deployment

Docker (Recommended)

# Coming soon - Dockerfile for containerized deployment

Manual Deployment

# Production server
python startupguru_main.py deploy

Cloud Deployment

  • Vercel: Deploy Streamlit frontend
  • Railway: Deploy FastAPI backend
  • Heroku: Full-stack deployment

🀝 Contributing

Development Setup

# Clone repository
git clone <repo-url>
cd startupguru

# Install development dependencies
pip install -r requirements.txt
pip install black flake8 pytest

# Run tests
python startupguru_main.py test

# Format code
black .

Adding New Features

  1. Fork the repository
  2. Create feature branch
  3. Add tests
  4. Submit pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Startup India: Official government initiative data
  • Groq: Fast LLM API infrastructure
  • LangChain: Document processing framework
  • ChromaDB: Vector database solution
  • Streamlit: Beautiful web app framework

πŸ“ž Support

  • Issues: GitHub Issues
  • Documentation: This README + API docs
  • Community: [Link to community/discord if available]

Built with ❀️ for the Startup India ecosystem

Star this repo Follow us

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors