This project is a comprehensive Data Science and AI pipeline developed in three milestones, progressing from traditional Machine Learning prediction to Knowledge Graph construction, and finally to an end-to-end Graph-RAG (Retrieval-Augmented Generation) application for an Airline Travel Assistant.
- Neo4j: Install Neo4j Desktop or Community Server locally.
- Python Libraries: install the Neo4j driver with `pip install neo4j`.
Goal: Develop a machine learning pipeline to predict passenger satisfaction based on customer feedback and flight data.
Key Features:
- Data Engineering: Cleaning the datasets and running sentiment analysis on review text with the VADER library.
- Exploratory Data Analysis (EDA): Visualizing top flight routes, booking distributions, and rating patterns by traveler type.
- Predictive Modeling: Implementing a Statistical ML model or Shallow Feed-Forward Neural Network (FFNN) to classify passengers as "Satisfied" (Rating ≥ 5) or "Dissatisfied".
- Model Explainability (XAI): Using SHAP and LIME to interpret model predictions and identify influential features.
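
The binary target described above can be derived from the numeric rating with a single threshold. The following is a minimal sketch, assuming ratings are numeric; the `rating` key and the sample records are hypothetical and only illustrate the rule "Satisfied" when Rating ≥ 5.

```python
SATISFIED_THRESHOLD = 5  # from the rule above: Rating >= 5 means "Satisfied"


def label_satisfaction(rating: float) -> str:
    """Map a numeric rating to the binary class used as the model target."""
    return "Satisfied" if rating >= SATISFIED_THRESHOLD else "Dissatisfied"


# Hypothetical records; the real dataset's column names may differ.
reviews = [
    {"rating": 8},
    {"rating": 5},
    {"rating": 2},
]

labels = [label_satisfaction(review["rating"]) for review in reviews]
print(labels)
```

Keeping the threshold in one named constant makes it easy to check that the training labels and any later evaluation use the same cut-off.
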
Deliverables:
- Jupyter Notebook with reproducible workflow.
- Analytical Report & Visualizations.
- Predictive Model & XAI plots.
Goal: Transition from tabular data to a structured Neo4j Knowledge Graph (KG) that models relationships between passengers, flights, airports, and journeys.
Schema:
- Nodes: `Passenger`, `Journey`, `Flight`, `Airport`.
- Relationships: `(:Passenger)-[:TOOK]->(:Journey)`, `(:Journey)-[:ON]->(:Flight)`, `(:Flight)-[:DEPARTS_FROM]->(:Airport)`, `(:Flight)-[:ARRIVES_AT]->(:Airport)`.
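
To show how one CSV row could map onto this schema, here is a hedged sketch of a parameterized Cypher statement plus a helper that prepares its parameters. The node labels and relationship types come from the schema above; every property name (`passenger_id`, `journey_id`, `flight_number`, `code`) and every CSV column name is an assumption for illustration, not necessarily what `Create_kg.py` uses.

```python
# Cypher that creates one path through the schema above.
# MERGE avoids duplicating nodes when the same airport or flight reappears.
# All property names here are hypothetical.
INGEST_QUERY = """
MERGE (p:Passenger {passenger_id: $passenger_id})
MERGE (j:Journey {journey_id: $journey_id})
MERGE (f:Flight {flight_number: $flight_number})
MERGE (o:Airport {code: $origin})
MERGE (d:Airport {code: $destination})
MERGE (p)-[:TOOK]->(j)
MERGE (j)-[:ON]->(f)
MERGE (f)-[:DEPARTS_FROM]->(o)
MERGE (f)-[:ARRIVES_AT]->(d)
"""


def row_to_params(row: dict) -> dict:
    """Pick the fields the query needs from one CSV row (assumed column names)."""
    return {
        "passenger_id": row["passenger_id"],
        "journey_id": row["journey_id"],
        "flight_number": row["flight_number"],
        "origin": row["origin"].strip().upper(),
        "destination": row["destination"].strip().upper(),
    }


# Made-up example row; with the neo4j driver the call would look like
# session.run(INGEST_QUERY, params).
params = row_to_params({
    "passenger_id": "P001",
    "journey_id": "J001",
    "flight_number": "XY123",
    "origin": " cai ",
    "destination": "dxb",
})
print(params)
```

Passing values as query parameters rather than formatting them into the string keeps the statement reusable for every row.
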
Key Tasks:
- KG Construction: A Python script (`Create_kg.py`) to ingest CSV data and build the graph adhering to a strict schema.
- Scoring Rule Implementation: Calculating a weighted `overall_satisfaction_score` based on food, delay, legs, and miles to identify passengers above a specific threshold.
- Cypher Analytics: Developing queries to answer business questions (e.g., busiest routes, average delays, food satisfaction by generation).
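
The scoring rule can be pictured as a weighted sum over the four inputs followed by a threshold filter. The sketch below is illustrative only: the weights, the threshold, the field names, and the sample values are all placeholders, since the actual values used in the project are not stated here.

```python
# PLACEHOLDER weights: not the project's real values.
EXAMPLE_WEIGHTS = {"food": 1.0, "delay": -0.1, "legs": -0.5, "miles": 0.001}
EXAMPLE_THRESHOLD = 2.5  # placeholder cut-off


def overall_satisfaction_score(journey: dict, weights: dict) -> float:
    """Weighted sum of the journey fields named in `weights`."""
    return sum(weight * journey[field] for field, weight in weights.items())


def passengers_above(journeys: list[dict], weights: dict, threshold: float) -> list[str]:
    """Return the ids of passengers whose score exceeds the threshold."""
    return [
        journey["passenger_id"]
        for journey in journeys
        if overall_satisfaction_score(journey, weights) > threshold
    ]


# Made-up journeys with assumed field names.
journeys = [
    {"passenger_id": "P001", "food": 4, "delay": 10, "legs": 2, "miles": 1000},
    {"passenger_id": "P002", "food": 2, "delay": 60, "legs": 3, "miles": 500},
]

print(passengers_above(journeys, EXAMPLE_WEIGHTS, EXAMPLE_THRESHOLD))
```

Passing the weights in as an argument keeps the rule easy to adjust and to test against the same calculation expressed in Cypher.
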
Goal: Build a Graph Retrieval-Augmented Generation (Graph-RAG) system that uses the Neo4j Knowledge Graph from Milestone 2 as a grounding mechanism for an LLM-based assistant.
System Architecture:
- Input Preprocessing:
  - Intent Classification: Routing queries (e.g., "Search" vs. "Recommend").
  - Entity Extraction: Using NER to identify airports, flights, and dates from user input.
- Graph Retrieval Layer:
  - Baseline: Structured Cypher templates populated with extracted entities.
  - Embeddings: Semantic similarity search using vector embeddings (Node or Feature embeddings).
- LLM Layer:
  - Combines retrieved graph context with a structured prompt (Context, Persona, Task) to generate accurate answers.
  - Comparison of at least 3 different LLMs (e.g., Llama, Gemma, Mistral).
- User Interface:
  - A Streamlit UI to visualize the retrieved KG context, executed Cypher queries, and the final LLM response.
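
The baseline path through this architecture (intent routing, entity extraction, Cypher template selection, prompt assembly) can be sketched in a few functions. This is a simplified illustration under stated assumptions, not the project's implementation: keyword matching stands in for a trained intent classifier, a regex for three-letter airport codes stands in for NER, and the Cypher template, its `code` property, and all function names are hypothetical.

```python
import re

# Keyword routing stands in for a real intent classifier.
INTENT_KEYWORDS = {
    "recommend": ("recommend", "suggest"),
    "search": ("find", "show", "list", "search"),
}

# Hypothetical template; the `code` property name is assumed.
CYPHER_TEMPLATES = {
    "search": (
        "MATCH (f:Flight)-[:DEPARTS_FROM]->(:Airport {code: $origin}), "
        "(f)-[:ARRIVES_AT]->(:Airport {code: $destination}) "
        "RETURN f LIMIT 10"
    ),
}


def classify_intent(query: str) -> str:
    """Return the first intent whose keywords appear; default to 'search'."""
    text = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(word in text for word in keywords):
            return intent
    return "search"


def extract_airports(query: str) -> list[str]:
    """Regex stand-in for NER: collect uppercase three-letter codes."""
    return re.findall(r"\b[A-Z]{3}\b", query)


def plan_retrieval(query: str) -> tuple[str, str, dict]:
    """Choose a Cypher template and fill its parameters from the query."""
    intent = classify_intent(query)
    codes = extract_airports(query)
    params = {"origin": codes[0], "destination": codes[1]} if len(codes) >= 2 else {}
    cypher = CYPHER_TEMPLATES.get(intent, CYPHER_TEMPLATES["search"])
    return intent, cypher, params


def build_prompt(context_rows: list[dict], question: str) -> str:
    """Assemble the Context / Persona / Task prompt sent to the LLM."""
    context = "\n".join(str(row) for row in context_rows) or "(no graph results)"
    return (
        f"Context:\n{context}\n\n"
        "Persona: You are an airline travel assistant.\n\n"
        f"Task: Answer using only the context above.\nQuestion: {question}"
    )


question = "Find flights from JFK to LAX"
intent, cypher, params = plan_retrieval(question)
# The rows would normally come from running `cypher` with `params` in Neo4j.
prompt = build_prompt([{"flight_number": "XY123"}], question)
print(intent, params)
print(prompt)
```

Returning the Cypher string and its parameters separately makes it straightforward to display the executed query in the UI alongside the retrieved context.
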
cd ms3
pip install -r requirements.txt