Legal documents are often a labyrinth of complex jargon, creating a significant information asymmetry that can lead to unforeseen financial and legal risks for individuals and small businesses. Our solution, Demystify Legal Documents with Generative AI, directly addresses this critical challenge.
We've developed an AI-powered platform that acts as a reliable first point of contact, simplifying complex legal documents into clear, accessible guidance. Our goal is to empower users to make informed decisions and protect themselves, providing a private, safe, and supportive environment.
- Bridging the Information Gap: Transforms opaque legal language into understandable insights.
- Risk Mitigation: Helps users avoid unknowingly agreeing to unfavorable terms.
- Accessibility: Makes essential legal information available to everyone, from everyday citizens to small business owners.
- Empowerment: Provides the knowledge needed for informed decision-making.
Our Minimum Viable Product (MVP) focuses on two core functionalities:
- Instant Summarization: Get a quick, clear overview of any uploaded legal document.
- AI-Powered Q&A: Ask specific questions and receive grounded answers, leveraging both your document's content and relevant constitutional laws.
Beyond the MVP, the platform's key features include:
- Intelligent Document Ingestion: Seamlessly upload various legal document formats (PDF, DOCX, TXT) with intelligent text and structure extraction via Vertex AI Document AI.
- Contextual Summarization: Obtain immediate, concise, and easy-to-understand summaries of complex legal texts powered by Gemini 1.5 Pro.
- AI-Powered Q&A (RAG): Engage in a natural language chat to ask specific questions, receiving accurate answers grounded in your uploaded document and relevant constitutional laws, utilizing a robust Retrieval-Augmented Generation (RAG) pipeline.
- Persistent Chat History: Securely save and review all past conversations within the app, ensuring continuity and easy access to previous insights (powered by Firestore).
- Secure Authentication: User authentication powered by Google OAuth for secure and seamless access.
- End-to-End Google Cloud Platform (GCP) Integration: Leveraging a suite of GCP services for scalability, reliability, and security from frontend to AI processing.
- Multilingual Support (Future): Planned integration with Google Cloud Translation API to offer communication in multiple languages.
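To make the persistent chat-history feature concrete, here is a small local stand-in for the Firestore layout (one message list per user). The collection shape, field names, and helper functions are illustrative assumptions, not the app's actual schema.

```python
# Local stand-in for the Firestore chat-history layout.
# Field names ("role", "text", "ts") are illustrative; the real app defines its own schema.
from datetime import datetime, timezone

db: dict[str, list[dict]] = {}  # maps user id -> ordered message documents

def save_message(uid: str, role: str, text: str) -> None:
    """Append one chat message document to a user's history."""
    db.setdefault(uid, []).append({
        "role": role,                                  # "user" or "assistant"
        "text": text,
        "ts": datetime.now(timezone.utc).isoformat(),  # Firestore would use a server timestamp
    })

def chat_history(uid: str) -> list[dict]:
    """Return the user's saved conversation, oldest first."""
    return db.get(uid, [])

save_message("u1", "user", "What does clause 4 mean?")
save_message("u1", "assistant", "Clause 4 limits the landlord's liability.")
print([m["role"] for m in chat_history("u1")])  # ['user', 'assistant']
```

In the real app the same shape maps naturally onto Firestore subcollections, with the server timestamp supplied by the database rather than the client.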
Our solution is built on a modern, scalable, and robust technology stack:
- Angular: A powerful framework for building dynamic and responsive user interfaces.
- FastAPI: A high-performance, easy-to-use Python web framework for building our robust API backend.
- Postman: Used for efficient API testing and development during the entire lifecycle.
- Vertex AI Document AI: For intelligent document parsing, OCR, and information extraction.
- Gemini 1.5 Pro (LLM): The core Large Language Model for advanced text summarization and generating responses within the RAG pipeline.
- Vertex AI Vector Search (formerly Matching Engine): A highly scalable vector database for efficient storage and semantic retrieval of document embeddings.
- Google Cloud Storage: Secure and scalable object storage for housing original user-uploaded legal documents.
- Firestore: A flexible, scalable NoSQL document database used for managing user profiles and persisting chat history.
- Cloud Identity & Access Management (IAM) / OAuth: For secure user authentication and granular access control across GCP services.
- Cloud Translation API: (Planned for future) For enabling multilingual capabilities.
- LangChain: An indispensable framework for orchestrating the Retrieval-Augmented Generation (RAG) pipeline, including text chunking and managing interactions with LLMs and vector databases.
- text-embedding-004: The specific embedding model used to convert text chunks and user queries into numerical vectors for semantic search.
This project consists of two main parts: the frontend (Angular) and the server (FastAPI).
- Node.js (LTS version) & npm
- Python 3.11+ & pip
- Docker (if deploying locally or using Cloud Run)
- Google Cloud SDK (`gcloud` CLI) configured with your project.
- A Google Cloud Project with billing enabled and the necessary APIs enabled (Vertex AI, Document AI, Cloud Storage, Firestore, Cloud Run, Artifact Registry, etc.).
- Service accounts with appropriate roles for your GCP services.
- Clone the repository:
```bash
git clone https://github.com/varunpareek690/demystifyDocs.git
cd demystifyDocs/server
```
- Create a virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
```
- Configure environment variables:
  Create a `.env` file in the `server/` directory based on `.env.example`:
```bash
# Example .env content:
GOOGLE_PROJECT_ID="your-gcp-project-id"
# ... other API keys/configurations for Document AI, Gemini, etc.
```
  Ensure your gcloud authentication is set up for your local environment.
- Run the FastAPI server locally:
```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
  The API should be accessible at `http://localhost:8000`.
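As a sketch of how the backend might consume this configuration, the snippet below reads the project ID from the environment and fails fast when it is missing. The variable name `GOOGLE_PROJECT_ID` comes from the example `.env` above; the `load_settings` helper itself is illustrative, not the project's actual code.

```python
import os

def load_settings() -> dict:
    """Read required configuration from the environment, failing fast if absent."""
    project_id = os.environ.get("GOOGLE_PROJECT_ID")
    if not project_id:
        raise RuntimeError(
            "GOOGLE_PROJECT_ID is not set; create server/.env from .env.example "
            "or export the variable before starting the server."
        )
    return {"project_id": project_id}

# Demo only: force-set the variable as if it had been loaded from server/.env.
os.environ["GOOGLE_PROJECT_ID"] = "your-gcp-project-id"
settings = load_settings()
print(settings["project_id"])  # your-gcp-project-id
```

Failing at startup with a clear message is generally preferable to letting a missing credential surface later as an opaque GCP API error.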
- Navigate to the frontend directory:
```bash
cd ../frontend  # Assuming frontend is a sibling directory
```
- Install npm dependencies:
```bash
npm install
```
- Run the Angular development server:
```bash
ng serve --open
```
  The Angular app should open in your browser, typically at `http://localhost:4200`.
- See `frontend/README.md` to set up Angular.
To deploy your FastAPI backend to Google Cloud Run, follow these steps from your `RAG_Service/` directory:
This project uses two main scripts, `ingest.py` and `query.py`, to interact with Google Cloud services. Follow these steps to set up your environment and run the scripts.
- Python 3.11+ installed.
- Google Cloud SDK (`gcloud` CLI) installed and configured with your GCP project. Authenticate your local environment by running `gcloud auth login` and `gcloud config set project <YOUR_PROJECT_ID>`.
- Ensure billing is enabled on your GCP project and that all necessary APIs are enabled.
First, you need to set up your Python environment and install the required dependencies. Navigate to your project's root directory.
- Create a Python virtual environment to isolate your project's dependencies:
```bash
python -m venv venv
```
- Activate the virtual environment:
  - On macOS/Linux: `source venv/bin/activate`
  - On Windows: `venv\Scripts\activate`
- Install the required libraries from `requirements.txt`:
```bash
pip install -r requirements.txt
```
Now that your environment is set up, you can execute the scripts.
- Run `ingest.py`: This script handles the document ingestion and embedding process. It connects to the Document AI and Vector Search services to prepare your data.
```bash
python ingest.py
```
  This script performs tasks like extracting text, creating embeddings, and storing them in your vector search database.
- Run `query.py`: After the data is ingested, run this script to interact with your RAG model. It takes a user query, finds relevant information, and generates a response using the Gemini API.
```bash
python query.py
```
  This script handles the conversational aspect, demonstrating the core Q&A functionality of your application.
Each of these files is designed to connect directly to the respective GCP services you've enabled in your project, allowing you to run the full workflow locally.
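The end-to-end flow of the two scripts can be sketched locally without any cloud calls. The block below mirrors the shape of `ingest.py` (split a document into chunks and index them) and `query.py` (retrieve the best chunk and assemble a grounded prompt). The chunker, in-memory store, keyword scorer, and prompt template are illustrative stand-ins for Document AI, Vector Search, text-embedding-004, and the Gemini call.

```python
# A cloud-free sketch of the ingest -> query workflow. The real scripts call
# Document AI (parsing), text-embedding-004 (vectors), Vector Search (storage),
# and Gemini 1.5 Pro (generation); each stage here is a local stand-in.

def chunk_text(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    """Fixed-size character chunking with overlap, as LangChain splitters do."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(document: str, store: list[str]) -> None:
    """Stand-in for ingest.py: chunk the document and 'index' the chunks."""
    store.extend(chunk_text(document))

def retrieve(query: str, store: list[str]) -> str:
    """Stand-in for Vector Search: keyword-overlap scoring instead of embeddings."""
    words = set(query.lower().split())
    return max(store, key=lambda c: len(words & set(c.lower().split())))

def build_prompt(query: str, context: str) -> str:
    """A grounded prompt of the kind a RAG pipeline sends to the LLM."""
    return (
        "Answer using ONLY the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

store: list[str] = []
ingest("The deposit is refundable within 14 days of lease termination. "
       "Either party may terminate with 60 days written notice.", store)
context = retrieve("deposit refunded days", store)
prompt = build_prompt("When is the deposit refunded?", context)
print(prompt)
```

Keeping this workflow shape fixed while swapping each stand-in for its GCP counterpart is what lets the full pipeline run locally first and then scale in the cloud.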