This repository provides a complete setup for deploying a retrieval-augmented generation (RAG) system, including:
- Elasticsearch schema setup and migrations.
- Redis user setup and ACL configuration.
- Automatic Elasticsearch snapshots using cron jobs.
- Docker-based infrastructure for running the embedding processes, as well as a Streamlit UI and a Telegram bot for interaction.
- Prerequisites
- Decomposed Scripts
- Running the Setup
- Docker Configuration
- Running the Docker Containers
- Environment Files Setup
Before starting, ensure you have the following installed:
- Docker & Docker Compose
- SSH access to the remote server (a DigitalOcean Droplet or similar environment).
- Access to GitHub secrets for environment variables such as `REDIS_USERNAME`, `REDIS_PASSWORD`, etc.
Script: `/scripts/core/setup_directories_and_permissions.sh`
This script creates the necessary directories for Elasticsearch, Redis, Neo4j, and Chroma, and sets the proper permissions.
Commands Executed:
- Creates required directories for Elasticsearch, Redis, Neo4j, etc.
- Sets `chmod 777` permissions on critical directories.
- Changes ownership to user `1000:1000` where needed.
- Executes the following scripts:
  - `/scripts/redis/setup_redis_acl.sh`
  - `/scripts/elasticsearch/setup_elasticsearch.sh`
  - `/scripts/elasticsearch/setup_elasticsearch_cron.sh`
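The steps above can be sketched as follows. This is an illustrative sketch only; the authoritative version is `/scripts/core/setup_directories_and_permissions.sh`, and the base path `DATA_ROOT` is an assumption made here for demonstration:

```shell
#!/usr/bin/env bash
# Sketch of the directory setup. DATA_ROOT is an assumed base path;
# the real script defines its own directory locations.
set -euo pipefail

DATA_ROOT="${DATA_ROOT:-./data}"

for dir in elasticsearch redis neo4j chroma; do
  mkdir -p "$DATA_ROOT/$dir"
  chmod 777 "$DATA_ROOT/$dir"   # world-writable, as the setup script does
done

# Elasticsearch containers typically run as UID/GID 1000; chown usually
# requires root, so failure is tolerated when running this sketch locally.
chown -R 1000:1000 "$DATA_ROOT/elasticsearch" 2>/dev/null || true

echo "directories prepared under $DATA_ROOT"
```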
Script: `/scripts/elasticsearch/setup_elasticsearch.sh`
This script runs the Elasticsearch schema setup and applies the required migrations.
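The schema itself lives in the setup script and is not reproduced in this README. Purely for orientation, an index mapping for a document store with dense-vector embeddings might look like this (all field names and the vector dimension are illustrative, not the repository's actual schema):

```json
{
  "mappings": {
    "properties": {
      "content":   { "type": "text" },
      "embedding": { "type": "dense_vector", "dims": 1536 },
      "source":    { "type": "keyword" }
    }
  }
}
```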
Script: `/scripts/elasticsearch/setup_elasticsearch_cron.sh`
This script installs a cron job that snapshots the Elasticsearch data every hour.
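The exact job is installed by the script; an equivalent crontab entry, assuming a pre-registered snapshot repository named `backup_repo` and Elasticsearch reachable on `localhost:9200`, could look like:

```
0 * * * * curl -s -X PUT "http://localhost:9200/_snapshot/backup_repo/snapshot_$(date +\%Y\%m\%d\%H\%M)?wait_for_completion=false"
```

Note that `%` has special meaning in crontab entries and must be escaped as `\%`.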
Script: `/scripts/redis/setup_redis_acl.sh`
This script configures Redis users and ACLs by creating the file `/usr/local/etc/redis/users.acl` and setting the usernames and passwords in the environment files.
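The generated file follows Redis's standard ACL syntax. A sketch with placeholder names (the real usernames and passwords come from `REDIS_USERNAME1`/`REDIS_PASSWORD1` and `REDIS_USERNAME2`/`REDIS_PASSWORD2` in the environment files):

```
# /usr/local/etc/redis/users.acl (illustrative entries only)
user default off
user appuser1 on >changeme1 ~* +@all
user appuser2 on >changeme2 ~* +@all
```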
Create the files `.env.production`, `.env.api.production`, and `.env.ui.production`.
Use the `env.production.copy`, `.env.api.production.copy`, and `.env.ui.production.copy` files as guidance to structure your environment configuration.
Embedding the complete documentation can take 6 to 12 hours per index.
For a relatively quick trial run, change:
`documents_directory = "../data/documentation_optimal/"`
to
`documents_directory = "../data/documentation_optimal/test"`
in `multi_representation_indexing.py`
as well as
`folders = ['decision-system', 'habitat-system', 'lifestyle-system', 'material-system', 'project-execution', 'project-plan', 'social-system', 'system-overview']`
to
`folders = ['test1']`
in `create_raptor_indexing_langchain.py`.
If you decide to create the indexes from a different data set, you'll also need to manually change the collection names:
`chroma_collection_name = "MRITESTTTTTTTTTTT4"`
`redis_namespace = "parent-documents-MRITESTTTTTTTTTTT4"`
in `multi_representation_indexing.py`
and
`chroma_collection_name = "raptor-locll-test12"`
in `create_raptor_indexing_langchain.py`.
Open `docker-compose.yaml` and navigate to the `app` service.
You'll need to modify it depending on which indexing process or UI you want to run.
Comment out all other commands in the `app` service except:
command: python ./modules/multi_representation_indexing.py
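For reference, the relevant part of the `app` service might look like the sketch below. The surrounding keys (`build`, `env_file`) are assumptions; only the `command` lines come from this README:

```yaml
app:
  build: .
  env_file:
    - .env.production
  # Leave exactly one command uncommented per run:
  command: python ./modules/multi_representation_indexing.py
  # command: python ./modules/create_raptor_indexing_langchain.py
```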
Build and run the Docker containers:
make build
Wait until the embedding process finishes. You should see the following log:
Created MRI embeddings for complete documentation...
Bring down the Docker containers:
make stop
Comment out all other commands in the `app` service of `docker-compose.yaml` except:
command: python ./modules/create_raptor_indexing_langchain.py
Build and run the Docker containers again:
make build
Wait for the process to complete. The console will log:
Created RAPTOR embeddings for complete documentation.
Bring down the Docker containers:
make stop
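The `make build` and `make stop` targets are defined in the repository's Makefile. If you need to reproduce them manually, they presumably wrap Docker Compose along these lines (a sketch under that assumption, not the actual Makefile):

```makefile
# Sketch only; the repository's actual Makefile is authoritative.
build:
	docker compose up --build -d

stop:
	docker compose down
```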
Create a Telegram bot and save its token in both `.env.production` and `.env.api.production` under `TELEGRAM_DEVELOPMENT_INTEGRATION_TOKEN`. This enables you to interact with the bot via Telegram.
There are unresolved issues with the Streamlit UI, so this Telegram-based setup is only a temporary workaround.
Once the embedding processes have finished, uncomment the `api` and `telegram_bot` services in `docker-compose.yaml` so you can test them.
Finally, run `make build`.
1. **Go to the Repository:** Navigate to the repository you want to fork on GitHub.
2. **Fork the Repository:** In the top-right corner, click **Fork**. GitHub will create a copy of the repository under your account.
3. **Set Up GitHub Secrets in the Fork:** After forking, go to your forked repository and follow the steps above to create your own secrets for deployment.
By forking the repository and setting up your own secrets, you can customize and deploy the solution to your remote servers.
1. **Navigate to Your Repository:** Open the repository page on GitHub.
2. **Open Settings:** Click the **Settings** tab at the top of the repository.
3. **Access Secrets:** In the sidebar under **Security**, click **Secrets and variables** > **Actions**.
4. **Add a New Secret:** Click **New repository secret**.
5. **Name the Secret:** Enter a name using uppercase letters and underscores, e.g., `API_KEY`, `DB_PASSWORD`.
6. **Add the Secret Value:** Paste the sensitive value (e.g., API key, token) in the **Secret** field.
7. **Save the Secret:** Click **Add secret** to save it.
```
DROPLET_IP_ADDRESS=XXXXXXXXXXXXXXXXXXXX  # Public IP address of the remote server, used to establish a connection for deployment and management
SSH_PRIVATE_KEY=XXXXXXXXXXXXXXXXXXXX  # Private half of an SSH key pair; must match the public key added to the remote server's authorized keys
HUGGING_FACE_INFERENCE_ENDPOINT=XXXXXXXXXXXXXXXXXXXX
HUGGING_FACE_API_KEY=XXXXXXXXXXXXXXXXXXXX
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=XXXXXXXXXXXXXXXXXXXX
NEO4J_URL=bolt://neo4j:7687
NEO4J_DATABASE=neo4j
CHROMA_URL=chromadb
CHROMA_PORT=8000
ELASTIC_SCHEME=http
ELASTIC_URL=elasticsearch
ELASTIC_PORT=9200
OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXX
NEBULA_URL=graphd
NEBULA_PORT=9669
NEBULA_USERNAME=XXXXXXXXXXXXXXXXXXXX
NEBULA_PASSWORD=XXXXXXXXXXXXXXXXXXXX
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_USERNAME1=XXXXXXXXXXXXXXXXXXXX
REDIS_PASSWORD1=XXXXXXXXXXXXXXXXXXXX
REDIS_USERNAME2=XXXXXXXXXXXXXXXXXXXX
REDIS_PASSWORD2=XXXXXXXXXXXXXXXXXXXX
COHERE_API_KEY=XXXXXXXXXXXXXXXXXXXX
ENV=production
TELEGRAM_DEVELOPMENT_INTEGRATION_TOKEN=XXXXXXXXXXXXXXXXXXXX
API_URL=http://api:5000
GROQ_API_KEY=XXXXXXXXXXXXXXXXXXXX
DATABASE_URL=XXXXXXXXXXXXXXXXXXXX
LANGFUSE_PORT=3000
NEXTAUTH_SECRET=XXXXXXXXXXXXXXXXXXXX
NEXTAUTH_URL=XXXXXXXXXXXXXXXXXXXX
SALT=XXXXXXXXXXXXXXXXXXXX
ENCRYPTION_KEY=XXXXXXXXXXXXXXXXXXXX
POSTGRES_DB=XXXXXXXXXXXXXXXXXXXX
POSTGRES_USER=XXXXXXXXXXXXXXXXXXXX
POSTGRES_PASSWORD=XXXXXXXXXXXXXXXXXXXX
```