industrial-rag-pipeline

industrial-rag-pipeline is a Retrieval-Augmented Generation (RAG) pipeline for question answering (QA) about electrical motors. It combines document extraction, indexing, retrieval, and generation, using Elasticsearch for indexing and search and Google Generative AI for embeddings and answers.

Overview

This project provides a pipeline for processing PDF documents, indexing their content into Elasticsearch, and enabling semantic search and question-answering capabilities. It uses Google Generative AI for embedding generation and response generation.
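To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the project's code: in the actual pipeline Elasticsearch performs the vector search and Google Generative AI produces the embeddings and the answer. The function names, the toy 3-dimensional vectors, and the prompt wording are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose embeddings are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _vec in ranked[:k]]


def build_prompt(question, contexts):
    """Combine retrieved chunks and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )


# Toy "embeddings", made up for illustration only (hypothetical data).
INDEXED_CHUNKS = [
    ("chunk about motor nameplate ratings", [0.9, 0.1, 0.0]),
    ("chunk about bearing maintenance", [0.1, 0.9, 0.0]),
    ("chunk about an unrelated topic", [0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.2, 0.0]  # stands in for the embedded user question
contexts = top_k_chunks(query_vec, INDEXED_CHUNKS, k=2)
prompt = build_prompt("What is the rated voltage?", contexts)
print(prompt)
```

The resulting prompt is what would be handed to the generative model; only the ranking and prompt-assembly steps are shown here.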

Features

- PDF Extraction: Extracts text from PDF files and splits it into chunks for indexing.
- Elasticsearch Integration: Indexes document chunks and performs semantic search using vector embeddings.
- RAG Agent: Combines retrieved documents with generative AI to answer user queries.
- FastAPI: Provides RESTful endpoints for document indexing and question answering.
- Streamlit Playground: Interactive interface for testing the RAG pipeline.
- Logging: Color-coded logging for better debugging and monitoring.
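The chunking step mentioned in the feature list can be sketched as follows. This README does not say how the project splits text, so fixed-size character windows with overlap are an assumption, as are the function name `split_into_chunks` and its default sizes.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper: the project's real chunking strategy may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample_chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
print(sample_chunks)
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of indexing some text twice.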

Project Structure

.
├── api/                # FastAPI endpoints for the RAG pipeline
├── app/                # Core application logic (pipeline, prompts, schemas, utils)
├── notebooks/          # Jupyter notebooks for evaluation and experimentation
├── tests/              # Unit and integration tests
├── playground.py       # Streamlit app for interactive RAG testing
├── Makefile            # Automation scripts for development and testing
├── .env                # Environment variables (not included in version control)
├── pyproject.toml      # Project dependencies and metadata
└── README.md           # Project documentation

Installation

Prerequisites

- Python 3.12 or higher
- uv package manager
- Elasticsearch instance
- Google Generative AI API access

Steps

Install uv (if not already installed):

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv

Clone the repository:

git clone <repository-url>
cd industrial-rag-pipeline

Create and activate virtual environment with uv:

uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:
```
uv sync
```
Set up environment variables in .env (see Environment Variables).

Usage

Running the API

Start the FastAPI server:

make dev

Or directly with uv:

uv run uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

Access the API documentation:
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc

Running the Playground

Start the Streamlit playground:
```
make playground
```
Or directly with uv:
```
uv run streamlit run playground.py
```
Open the playground in your browser:
- http://localhost:8501

Testing

Run all tests using the Makefile:

make test

Run specific tests:

Local integration tests:

make test-local
# Or: uv run pytest tests/local/

API tests:

make test-api
# Or: uv run pytest tests/api/

Development

Adding Dependencies

Add new dependencies using uv:

# Add a regular dependency
uv add package-name

# Add a development dependency
uv add --dev package-name

# Add with a version constraint
uv add "package-name>=1.0.0"

Updating Dependencies

# Update all dependencies
uv sync --upgrade

# Update a specific package
uv sync --upgrade-package package-name

Running Scripts

Use uv run to execute scripts within the project environment:

# Run Python scripts
uv run python scripts/your_script.py

# Run CLI tools
uv run black .
uv run isort .
uv run mypy .

Environment Variables

The project reads its sensitive configuration from a .env file. The Google and Elasticsearch variables below are required; INDEX_NAME is optional:

# Google API Key
GEMINI_API_KEY="your-google-api-key"

# Elasticsearch configs
ELASTIC_SEARCH_API_KEY="your-elasticsearch-api-key"
ELASTIC_SEARCH_URL="your-elasticsearch-url"

# Optional
INDEX_NAME="your-default-index-name"
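As an illustration of how these variables might be validated at startup, the sketch below reads them from the process environment and fails early if a required one is missing. The `Settings` class and `load_settings` function are hypothetical names, not part of the project, and the sketch assumes the .env file has already been loaded into the environment by whatever mechanism the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names taken from the .env listing; INDEX_NAME is optional.
REQUIRED = ("GEMINI_API_KEY", "ELASTIC_SEARCH_API_KEY", "ELASTIC_SEARCH_URL")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the pipeline's configuration."""
    gemini_api_key: str
    elastic_search_api_key: str
    elastic_search_url: str
    index_name: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, raising if a required variable is unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        elastic_search_api_key=env["ELASTIC_SEARCH_API_KEY"],
        elastic_search_url=env["ELASTIC_SEARCH_URL"],
        index_name=env.get("INDEX_NAME") or None,
    )
```

Passing the mapping as a parameter makes the function easy to exercise in tests with a plain dictionary instead of the real environment.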

Endpoints

Health Check

GET /health
- Returns the health status of the API.

Document Indexing

POST /documents/
- Uploads and indexes PDF documents into Elasticsearch.

Question Answering

POST /question/
- Generates answers to user queries using the RAG pipeline.
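
The endpoints above could be called from Python roughly as shown below, using only the standard library. This README does not document the request or response schemas, so the JSON body with a `question` field and the assumption that responses are JSON are guesses; check the Swagger UI at /docs for the real shapes. The base URL matches the local development server started earlier.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local dev server from "Running the API"


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the health check endpoint."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def question_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the question answering endpoint.

    ASSUMPTION: the body is JSON with a "question" field; the real schema
    is not documented in this README.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/question/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the API running locally:
#   send(health_request())
#   send(question_request("your question here"))
```

Uploading PDFs to POST /documents/ is left out because it most likely needs a multipart form upload, whose field names are also not stated here.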

Why uv?

This project uses uv as its Python package manager for several reasons:

- Speed: The uv project reports 10-100x faster dependency resolution and installation than pip
- Reliability: Consistent dependency resolution with lockfile support
- Simplicity: Single tool for virtual environments, dependency management, and script running
- Modern: Built with Rust for performance and reliability
- Compatibility: Works with existing pyproject.toml and requirements.txt files

If you prefer using pip, you can still generate a requirements file:

uv export --format requirements-txt --output-file requirements.txt
