RAG retrieval evaluation

RAG systems can be evaluated on two sides:

  • retrieval evaluation: assess the accuracy and relevance of the information retrieved by the system.
  • response evaluation: assess the quality and appropriateness of the responses generated by the system based on the retrieved information.

This folder contains three scripts for retrieval evaluation. They calculate three metrics on automatically generated questions: hit rate, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (NDCG). The approach is based on this OpenAI cookbook by LlamaIndex.
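
For reference, here is a minimal sketch of how these metrics are typically computed per generated question; the function and data shapes are illustrative, not the repository's actual code:

import math

def retrieval_metrics(rank, k=10):
    """Per-question metrics, given the 1-based rank at which the source chunk
    of a generated question was retrieved (None = not retrieved in the top k).
    With a single relevant chunk per question, NDCG reduces to 1 / log2(rank + 1)."""
    hit = 1.0 if rank is not None and rank <= k else 0.0
    mrr = 1.0 / rank if rank is not None else 0.0
    ndcg = 1.0 / math.log2(rank + 1) if rank is not None and rank <= k else 0.0
    return {"hit_rate": hit, "mrr": mrr, "ndcg": ndcg}

# Averaging the per-question values over all generated questions gives the reported scores:
ranks = [1, 3, None, 2]
per_question = [retrieval_metrics(r) for r in ranks]
print({m: sum(q[m] for q in per_question) / len(per_question) for m in ("hit_rate", "mrr", "ndcg")})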

The scripts can be used to (1) download an entire index (a collection of chunks) from Azure AI Search, (2) use an LLM to generate a question for each chunk, and (3) run the retriever on the generated questions to compute the metrics.
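
As an illustration of step 2, question generation with Azure OpenAI might look roughly like the following; the prompt, deployment name, and configuration values are assumptions, not the repository's actual code:

from openai import AzureOpenAI

# Placeholder configuration -- in the scripts these values come from config.py.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<AZURE_OPENAI_API_KEY>",
    api_version="2024-02-01",
)

PROMPT = (
    "Generate one question that can be answered using only the context below. "
    "Return just the question.\n\nContext:\n{chunk}"
)

def generate_question(chunk, deployment="gpt-4o"):
    """Ask the LLM for a single question whose answer is contained in the chunk."""
    response = client.chat.completions.create(
        model=deployment,  # the Azure OpenAI deployment name (assumed)
        messages=[{"role": "user", "content": PROMPT.format(chunk=chunk)}],
    )
    return response.choices[0].message.content.strip()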

Note: this repository assumes you're working with Azure AI Search and Azure OpenAI models. The idea remains the same if you're working with a different vector database, cloud, or LLM API.

Installation

Install the dependencies like so:

uv add rag-evaluation

Then complete config.py with the required environment variables.
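
As a rough idea of what that file might contain (the variable names below are hypothetical; check config.py for the actual ones):

# config.py -- hypothetical variable names, for illustration only
import os

# Azure AI Search
SEARCH_ENDPOINT = os.environ["AZURE_SEARCH_ENDPOINT"]      # e.g. https://<name>.search.windows.net
SEARCH_INDEX_NAME = os.environ["AZURE_SEARCH_INDEX_NAME"]
SEARCH_API_KEY = os.environ["AZURE_SEARCH_API_KEY"]

# Azure OpenAI
OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
OPENAI_DEPLOYMENT = os.environ["AZURE_OPENAI_DEPLOYMENT"]  # chat model used to generate questions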

Usage

Run the three scripts in sequence (note that you still need to enter the Azure AI Search API key in the first and third scripts):

uv run step_1_download_index.py
uv run step_2_generate_question_context_pairs.py
uv run step_3_calculate_metrics.py --search-type hybrid --use-reranker
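
For context on the --search-type and --use-reranker options, a hybrid query with optional semantic reranking against Azure AI Search could be issued roughly as sketched below; the vector field name, semantic configuration name, and returned id field are assumptions, not the repository's actual code:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",  # placeholder
    index_name="<your-index>",                             # placeholder
    credential=AzureKeyCredential("<AZURE_SEARCH_API_KEY>"),
)

def retrieve(query, embedding, k=10, use_reranker=False):
    """Hybrid search: keyword query plus a vector query over an assumed 'contentVector' field."""
    kwargs = {}
    if use_reranker:
        # Semantic reranking requires a semantic configuration on the index (assumed name).
        kwargs = {"query_type": "semantic", "semantic_configuration_name": "default"}
    results = client.search(
        search_text=query,
        vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=k, fields="contentVector")],
        top=k,
        **kwargs,
    )
    return [doc["chunk_id"] for doc in results]  # 'chunk_id' field name is an assumption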
