RAG Implementation with Cohere and LangChain

This project implements a Retrieval-Augmented Generation (RAG) system using Cohere's language models and the LangChain framework. The system answers questions about HR policies by retrieving relevant passages from a knowledge base.

Prerequisites
Installation
Project Structure
Features
Usage
Configuration
How It Works

Prerequisites

Python 3.10 or higher
Cohere API key
Required Python packages (listed under Installation)

Installation

Clone the repository:

git clone <repository-url>
cd Chains

Install required packages:

pip install langchain-cohere langchain-community chromadb pydantic python-dotenv

Set up environment variables: Create a .env file in the project root:

COHERE_API_KEY=your_api_key_here
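A minimal sketch of how a script might read the key at startup, assuming the .env values have already been loaded into the process environment (for example by python-dotenv). The helper name get_cohere_api_key is hypothetical and is not taken from CohereQnA.py.

```python
import os


def get_cohere_api_key(env=None):
    """Return COHERE_API_KEY, or raise a clear error if it is missing or blank."""
    # `env` defaults to the real process environment; passing a dict eases testing.
    env = os.environ if env is None else env
    key = env.get("COHERE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "COHERE_API_KEY is not set; add it to .env or export it in your shell"
        )
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error raised later by the API client.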

Project Structure

Chains/
├── data/
│   ├── globalcorp_hr_policy.txt
│   ├── local_vectorstore/
│   └── local_docstore/
├── src/
│   └── CohereQnA.py
└── README.md

Features

Document Loading: Automatic processing of HR policy documents
Vector Embeddings: Powered by Cohere's embedding models
Local Storage: Using Chroma for vector storage
Smart Retrieval: Parent-child document architecture
Interactive QA: Question answering using RAG

Usage

Place your HR policy document in data/globalcorp_hr_policy.txt
Run the QnA system:

python src/CohereQnA.py

The system will:
- Load and process the document
- Create embeddings
- Store them in a local vector store
- Answer questions about the HR policy

Configuration

Parameter          Value          Description
Parent chunk size  1000           Characters per parent chunk
Child chunk size   200            Characters per child chunk
Overlap            20             Characters overlapping between chunks
Model              command-light  Cohere model used
Temperature        0              Deterministic output setting
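To illustrate how chunk size and overlap interact, here is a minimal sketch of a fixed-window character splitter. It is an assumption made for illustration, not the splitter the project uses: LangChain's text splitters prefer to break on separators such as newlines, so real chunks are usually shorter than the configured maximum. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Cut text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each new window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# With the child settings from the table, consecutive chunks share 20 characters.
sample = "".join(str(i % 10) for i in range(450))
children = split_with_overlap(sample, chunk_size=200, overlap=20)
```

The shared characters reduce the chance that a sentence cut at a chunk boundary becomes unfindable, at the cost of storing a little duplicated text.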

How It Works

Parent-Child Document Retrieval

The system uses a two-tier document splitting approach:

Child Splitter: Creates small, precise chunks for accurate matching
Parent Splitter: Maintains larger chunks for context preservation

Example scenario:

Document Structure:
└── Parent Chunk (section-sized, up to 1000 chars)
    └── Child Chunks (sentence-sized, up to 200 chars each)
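The two-tier idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the splitter is a naive fixed-size cut without overlap, word overlap stands in for embedding similarity, and the names chunk, build_index, and retrieve_parent are hypothetical.

```python
import uuid


def chunk(text, size):
    """Naive fixed-size split (no overlap), used only for illustration."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(document, parent_size=1000, child_size=200):
    """Return (parents, children).

    parents maps a generated id to the parent text;
    children is a list of (child_text, parent_id) pairs.
    """
    parents = {}
    children = []
    for parent_text in chunk(document, parent_size):
        parent_id = str(uuid.uuid4())
        parents[parent_id] = parent_text
        for child_text in chunk(parent_text, child_size):
            children.append((child_text, parent_id))
    return parents, children


def retrieve_parent(query, parents, children):
    """Match the query against child chunks, then return the best match's parent."""
    query_words = set(query.lower().split())

    def score(child):
        # Word overlap is a stand-in for vector similarity in this sketch.
        return len(query_words & set(child[0].lower().split()))

    _best_text, best_parent_id = max(children, key=score)
    return parents[best_parent_id]
```

Matching happens on the small chunks, but the caller receives the larger parent, so the language model sees the surrounding context rather than an isolated fragment.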

Key-Value Document Store

The system uses a key-value document store to maintain relationships between document chunks and their sources. This is implemented using LangChain's storage system.

Features

Relationship Preservation: Maintains links between parent and child document chunks
Source Tracking: Associates chunks with their original source documents
Efficient Retrieval: Enables quick lookup of parent documents when child chunks are retrieved
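The lookup described above can be sketched with a small in-memory store. The class below only mimics the mset/mget style of LangChain's store interface and is not the library's implementation. The dictionary shape used for child hits and the names InMemoryDocStore and parents_for_hits are assumptions for illustration. The "doc_id" metadata key follows the default used by LangChain's multi-vector retrievers.

```python
class InMemoryDocStore:
    """Tiny key-value store exposing mset/mget, kept in a plain dict."""

    def __init__(self):
        self._data = {}

    def mset(self, pairs):
        """Store each (key, value) pair, overwriting existing keys."""
        for key, value in pairs:
            self._data[key] = value

    def mget(self, keys):
        """Return values in key order; missing keys yield None."""
        return [self._data.get(key) for key in keys]


def parents_for_hits(child_hits, store, id_key="doc_id"):
    """Resolve retrieved child chunks to their unique parents, keeping hit order."""
    ordered_ids = []
    for hit in child_hits:
        parent_id = hit["metadata"][id_key]
        if parent_id not in ordered_ids:
            ordered_ids.append(parent_id)  # several children may share one parent
    return [doc for doc in store.mget(ordered_ids) if doc is not None]
```

Deduplicating parent ids matters because several child chunks from the same parent often rank highly for one query, and returning that parent more than once would waste context.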

Benefits

Precise Answers: Find exact relevant passages
Context Preservation: Access broader context when needed
Balanced Retrieval: Optimal mix of specificity and context
