Airline Delay Analysis Assistant: Evolution from Fine-Tuning to RAG

Project Overview

This repository contains an AI-powered assistant for analyzing airline delay data. The project has evolved through multiple approaches, showcasing the advancement of LLM application techniques from basic fine-tuning to RAG (Retrieval-Augmented Generation) systems.

Evolution Timeline

Stage 1: OpenAI Fine-Tuning Approach (v1)

Our initial approach utilized OpenAI's fine-tuning capabilities to create a specialized model for airline delay analysis. This method involved:

Creating a custom dataset of airline delay Q&A pairs
Fine-tuning the OpenAI base model on this domain-specific data
Deploying the fine-tuned model to answer queries about airline delays

While this approach produced good results for standard questions within the fine-tuning dataset, it had limitations with novel queries and couldn't adapt to new data without retraining.

Stage 2: Context-in-Prompt Approach

As an intermediate step, we explored embedding relevant context directly in prompts:

Preprocessing the airline data into digestible chunks
Using basic retrieval mechanisms to find relevant data for a query
Including this data directly in the prompt sent to the LLM

This improved flexibility but was limited by context window constraints and lacked sophistication in retrieval. Manual inspection showed that the accuracy of the answer was quite low.

Stage 3: RAG System (v2)

Our current implementation leverages a full Retrieval-Augmented Generation (RAG) system built with LangChain:

Vector Database: Airline data is embedded and stored in a vector database for semantic search
Advanced Retrieval: LangChain's query-aware chunking and hybrid search algorithms find the most relevant information
LLM Integration: Retrieved content is intelligently incorporated into prompts using LangChain's chains and agents
Evaluation Framework: Comprehensive metrics provided by Giskard to measure accuracy, relevance, and bias in model responses

LangChain provides the flexible framework that powers our RAG pipeline, enabling modular components that can be swapped or upgraded as needed. Giskard's evaluation suite allows us to systematically test and validate our system, ensuring reliability and identifying areas for improvement.

Usage

Installation

# Clone the repository
git clone https://github.com/yourusername/airline-delay-analysis.git
cd airline-delay-analysis

# Set up virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Configuration

Create a config.yaml file with your API keys and settings
Place your airline data in the data/ directory

Running the RAG System

python rag_system.py --query "What are the main causes of delays in Chicago?"

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
__pycache__		__pycache__
data		data
.gitignore		.gitignore
README.md		README.md
airlines_delay_sample.db		airlines_delay_sample.db
config.yaml		config.yaml
rag_fly.py		rag_fly.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Airline Delay Analysis Assistant: Evolution from Fine-Tuning to RAG

Project Overview

Evolution Timeline

Stage 1: OpenAI Fine-Tuning Approach (v1)

Stage 2: Context-in-Prompt Approach

Stage 3: RAG System (v2)

Usage

Installation

Configuration

Running the RAG System

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Airline Delay Analysis Assistant: Evolution from Fine-Tuning to RAG

Project Overview

Evolution Timeline

Stage 1: OpenAI Fine-Tuning Approach (v1)

Stage 2: Context-in-Prompt Approach

Stage 3: RAG System (v2)

Usage

Installation

Configuration

Running the RAG System

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages