A Retrieval-Augmented Generation (RAG) application using LLM models and vector stores.
Requires Python 3.11.
- Embeddings: Uses `LlamaCppEmbeddings`; requires GGUF format models (quantized llama.cpp models)
- LLM: Uses `LlamaCpp`; requires GGUF format models
- Vector Store: FAISS (model-agnostic), via the `faiss-cpu` package
- RAG Pipeline: LangChain's `RetrievalQA` (works with any LangChain-compatible LLM)
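As a sketch of how these pieces fit together (assuming the `langchain` and `langchain-community` packages and valid GGUF model paths; the function name and parameters here are illustrative, not the application's actual code):

```python
def build_qa_chain(embeddings_model_path, llm_model_path, docs):
    """Wire LlamaCpp embeddings + FAISS + RetrievalQA into one pipeline."""
    # Imports are local so the sketch can be read without the dependencies.
    from langchain_community.embeddings import LlamaCppEmbeddings
    from langchain_community.llms import LlamaCpp
    from langchain_community.vectorstores import FAISS
    from langchain.chains import RetrievalQA

    # Embed the documents and index them in a FAISS vector store.
    embeddings = LlamaCppEmbeddings(model_path=embeddings_model_path)
    vector_store = FAISS.from_documents(docs, embeddings)

    # The LLM answers questions over the retrieved chunks.
    llm = LlamaCpp(model_path=llm_model_path, n_ctx=4096)
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vector_store.as_retriever(),
    )
```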
The application has been tested with the following models:

- Embeddings: `bge-small-en-v1.5-q4_k_m.gguf`
- LLM: `llama-2-7b.Q4_K_M.gguf`
Models in other formats are not supported.
The application supports the following file types:
- PDF (*.pdf)
- CSV (*.csv)
- JSON (*.json)
- HTML (*.html)
- Text (*.txt)
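How the application discovers source files is not shown here; a minimal sketch of the idea, assuming files are matched by extension under a source directory (the function name is hypothetical):

```python
from pathlib import Path

# Supported extensions, mirroring the list above.
SUPPORTED_TYPES = {"pdf", "csv", "json", "html", "txt"}

def find_source_files(source_dir: str, source_type: str) -> list[Path]:
    """Return all files of the given type directly under source_dir."""
    if source_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported source type: {source_type}")
    return sorted(Path(source_dir).glob(f"*.{source_type}"))
```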
The application can be configured using environment variables:
- `SOURCE_DIR`: Directory containing the source files to process
- `SOURCE_TYPE`: Type of source files to process (pdf, csv, json, html, txt)
- `EMBEDDINGS_MODEL_PATH`: Path to the embeddings model
- `LLM_MODEL_PATH`: Path to the LLM model
- `FLASK_PORT`: Port for the Flask API server
- `LANGCHAIN_TRACING_V2`: Set to "true" to enable LangSmith tracing
- `LANGCHAIN_ENDPOINT`: LangSmith API endpoint (default: https://api.smith.langchain.com)
- `LANGCHAIN_API_KEY`: Your LangSmith API key
- `LANGCHAIN_PROJECT`: LangSmith project name for organizing traces
- `CPU_THREADS`: Number of CPU threads to use (default: 6; recommended: match your CPU cores)
- `GPU_LAYERS`: Number of layers to offload to the GPU (-1 means all)
- `GPU_BATCH_SIZE`: Batch size for processing multiple inputs at once (default: 256)
- `N_CTX`: Context window size/token limit (default: 4096; should match your LLM model's context size)
- `TEMPERATURE`: Controls randomness in generation (0.0 = deterministic; higher = more random)
- `MAX_TOKENS`: Maximum number of tokens to generate in responses (default: 2000)
- `TOP_P`: Nucleus sampling probability threshold (default: 0.9)
- `TOP_K`: Limits sampling to the top K tokens (default: 40)
- `REPEAT_PENALTY`: Penalizes token repetition (default: 1.3; higher = less repetition)
- `GRAMMAR_PATH`: Optional path to a JSON grammar for structured output
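The generation-related variables can be read with the documented defaults roughly as follows (a sketch, not the application's actual code; the `TEMPERATURE` fallback of 0.0 is an assumption, since no default is documented for it):

```python
import os

def load_generation_config() -> dict:
    """Read generation settings from the environment with documented defaults."""
    return {
        "cpu_threads": int(os.environ.get("CPU_THREADS", "6")),
        "gpu_layers": int(os.environ.get("GPU_LAYERS", "-1")),
        "gpu_batch_size": int(os.environ.get("GPU_BATCH_SIZE", "256")),
        "n_ctx": int(os.environ.get("N_CTX", "4096")),
        # Assumed default: deterministic output unless overridden.
        "temperature": float(os.environ.get("TEMPERATURE", "0.0")),
        "max_tokens": int(os.environ.get("MAX_TOKENS", "2000")),
        "top_p": float(os.environ.get("TOP_P", "0.9")),
        "top_k": int(os.environ.get("TOP_K", "40")),
        "repeat_penalty": float(os.environ.get("REPEAT_PENALTY", "1.3")),
    }
```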
The easiest way to run the application is using Docker Compose:
- Make sure you have a `.env` file in the project root directory with all required environment variables (see the Configuration section above):

  ```bash
  cp .env.example .env
  ```

- Run the application:

  ```bash
  docker-compose up
  ```

The Docker container will use the environment variables from your `.env` file. Make sure this file exists before running the container.
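A compose file along these lines would match the steps above (the service name, build context, and volume path are hypothetical; only the `env_file` setting and the port from the API examples below are taken from this document):

```yaml
services:
  rag-app:
    build: .
    env_file: .env
    ports:
      - "8080:8080"
    volumes:
      - ./models:/app/models
```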
You can also run the application directly:
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file based on `.env.example` and configure your environment variables.

- Run the application:

  ```bash
  python -m rag_app.run
  ```

To force recreation of the vector store:

```bash
python -m rag_app.run --force-create
```

The application provides a simple API to query the vector store. You can access the interactive Swagger documentation at:
http://localhost:8080/api/docs
The API endpoints are:
`POST /api/query`
Request body:

```json
{
  "question": "your question here"
}
```

Example:

```bash
curl -X POST "http://localhost:8080/api/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the document?"}'
```

Response:

```json
{
  "data": {
    "answer": "string",
    "question": "string"
  }
}
```

Run the tests with:
```bash
python -m unittest discover tests
```
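The `/api/query` endpoint can also be called from Python with only the standard library; a minimal client sketch (the helper names are hypothetical, and it assumes the server is running locally on port 8080 as in the curl example):

```python
import json
from urllib import request

def build_query_payload(question: str) -> bytes:
    """Encode the request body exactly as the API expects it."""
    return json.dumps({"question": question}).encode("utf-8")

def query_rag(question: str, base_url: str = "http://localhost:8080") -> dict:
    """POST a question and return the "data" object from the response."""
    req = request.Request(
        f"{base_url}/api/query",
        data=build_query_payload(question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]
```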