Skip to content

DevVictor19/rag-nestjs

Repository files navigation

RAG NestJS

Application Processing Flow

This project is a web scraping and LLM-powered content analysis system built with NestJS. It scrapes articles from various sources, processes them using LLM (Language Learning Model), and stores them in a vector database for semantic search capabilities.

Application Flow

The application follows a multi-step processing flow as shown in the diagram above:

  1. CSV Upload & Initial Processing

    • Users upload a CSV file containing article URLs through the /scraper/upload-csv endpoint
    • The system processes the CSV and extracts URLs and sources
  2. Message Queue Processing

    • URLs are published to a RabbitMQ queue for asynchronous processing
    • This ensures reliable handling of large numbers of URLs
  3. Web Scraping

    • The system fetches content from each URL
    • HTML content is cleaned and converted to markdown format
    • Scraped data is stored in MongoDB
  4. LLM Processing

    • The scraped content is processed by the LLM (Google Gemini)
    • Content is analyzed and transformed into knowledge representations
  5. Vector Storage

    • Processed content is stored in the Qdrant vector database
    • This enables semantic search capabilities
  6. Query Processing

    • Users can interact with the system through the /agent endpoint
    • The system uses the stored vector embeddings to provide relevant responses

Features

  • Web scraping of articles
  • LLM-powered content analysis
  • Vector database integration for semantic search
  • RabbitMQ for message queuing
  • MongoDB for data storage
  • RESTful API endpoints

Prerequisites

  • Node.js (v16 or higher)
  • pnpm package manager
  • MongoDB
  • RabbitMQ
  • Qdrant vector database
  • Google Gemini API key

Installation

  1. Clone the repository:
git clone <repository-url>
cd develops-today-llm-challenge
  1. Install dependencies:
pnpm install
  1. Create a .env file based on .env.example:
cp .env.example .env
  1. Update the .env file with your configuration:
  • Set your Gemini API key
  • Configure MongoDB connection details
  • Set RabbitMQ URL
  • Configure Qdrant URL

Environment Variables

The following environment variables are required for the application to function properly:

LLM Configuration

  • GEMINI_API_KEY: Your Google Gemini API key for LLM operations

Database Configuration

  • DB_URI: MongoDB connection URI (e.g., mongodb://localhost:27017)
  • DB_NAME: Name of the MongoDB database
  • DB_USER: MongoDB username
  • DB_PASSWORD: MongoDB password
  • DB_PORT: MongoDB port (default: 27017)

Message Queue Configuration

  • RABBITMQ_URL: RabbitMQ connection URL (e.g., amqp://localhost)

Vector Database Configuration

  • QDRANT_URL: Qdrant vector database URL (e.g., http://localhost:6333)

Example .env file:

GEMINI_API_KEY=your_gemini_api_key
RABBITMQ_URL=amqp://localhost

DB_URI=mongodb://localhost:27017
DB_NAME=scraper
DB_USER=admin
DB_PASSWORD=admin
DB_PORT=27017

QDRANT_URL=http://localhost:6333

Running the Application

Development Mode

pnpm start:dev

Production Mode

pnpm build
pnpm start:prod

Using Docker

docker compose up -d

API Endpoints

Scraper Endpoint

  • POST /scraper/upload-csv - Upload a CSV file containing article URLs to be scraped
    • Accepts a multipart form-data with a 'file' field containing the CSV
    • CSV should contain 'URL' and 'Source' columns

Agent Endpoint

  • POST /agent - Generate a prompt using the LLM
    • Request body should contain a 'query' field with the text to process
    • Returns the generated prompt

Project Structure

  • src/scraper/ - Web scraping functionality
  • src/llm/ - Language Learning Model integration
  • src/vectors/ - Vector database operations
  • src/rabbitmq/ - Message queue handling
  • src/database/ - Database operations and models

Testing

Run the test suite:

pnpm test

Run tests with coverage:

pnpm test:cov

Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the UNLICENSED license.

About

RAG implementation using Gemini, NestJS and LangChain.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors