🚀 Hybrid Semantic Retrieval & Intelligence System (HSRIS)

Academic assignment: an end-to-end NLP pipeline for customer-support ticket retrieval in PyTorch, with the encoders and retrieval components implemented from scratch (no scikit-learn).

📌 Overview

HSRIS is a multi-stage NLP retrieval system that combines:

| Stage | Method | Output |
|---|---|---|
| Encoders | Label Encoding + One-Hot Encoding | Priority integers, Channel binary vectors |
| Sparse Retrieval | Bag-of-Words, N-Grams, TF-IDF (sparse tensors) | Keyword similarity scores |
| Dense Retrieval | GloVe 300-d + TF-IDF-weighted averaging | Semantic similarity scores |
| Hybrid Search | α·TF-IDF + (1-α)·GloVe | Ranked ticket results |
| Dual-GPU | `torch.nn.DataParallel` | Batch of 100 queries on Tesla T4 ×2 |
| Evaluation | Precision@5 | Quantitative comparison of all methods |
| UI | Gradio App | Interactive live demo |
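
The sparse-retrieval stage in the table above (bag-of-words, n-grams, TF-IDF, cosine similarity) can be sketched without scikit-learn as follows. This is a minimal, dependency-free illustration, not the notebook's code: it assumes a simple regex tokenizer, word n-grams up to length `n`, the smoothed IDF `log((1 + N) / (1 + df)) + 1` (scikit-learn's default smoothing), and L2-normalised vectors stored as `{term: weight}` dicts, whereas the notebook stores TF-IDF as `torch.sparse_coo_tensor`. All function names and the toy tickets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text, n=1):
    """Lowercase, keep alphanumeric words, then append word n-grams up to length n."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    grams = list(words)  # bag-of-words unigrams
    for k in range(2, n + 1):
        grams += [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return grams


def fit_idf(docs, n=1):
    """Smoothed IDF for every term (assumed formula): log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc, n)))  # count each term once per document
    num_docs = len(docs)
    return {t: math.log((1 + num_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(text, idf, n=1):
    """Sparse, L2-normalised TF-IDF vector as {term: weight}; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(text, n) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine_sparse(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Toy usage with hypothetical tickets, unigrams + bigrams.
tickets = [
    "My laptop battery drains too fast",
    "Refund request for a damaged product",
    "Battery replacement for my phone",
]
idf = fit_idf(tickets, n=2)
doc_vecs = [tfidf_vector(t, idf, n=2) for t in tickets]
query_vec = tfidf_vector("battery drains quickly", idf, n=2)
scores = [cosine_sparse(query_vec, d) for d in doc_vecs]
```

Here the first ticket ranks highest because it shares both the unigrams and the bigram "battery drains" with the query, while the refund ticket scores zero: TF-IDF only rewards exact token overlap.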

🗂️ Repository Structure

HSRIS/
├── HSRIS_FINAL.ipynb          ← ✅ Upload this to Kaggle (46 cells)
├── HSRIS_COMPLETE.py          ← Full single Python source file
├── build_notebook.py          ← Assembles .ipynb from source
├── make_single_notebook.py    ← Converts merged .py → .ipynb
├── hsris_part1_encoders.py    ← Label + One-Hot encoders
├── hsris_part2_sparse_retrieval.py  ← BoW, N-Grams, TF-IDF
├── hsris_part3_dense_layer.py ← GloVe, nn.Embedding, cosine sim
├── hsris_part4_hybrid_eval.py ← Hybrid search, Dual-GPU, Precision@5
├── hsris_part5_gradio_app.py  ← Gradio UI
└── README.md

⚙️ Environment

| Setting | Value |
|---|---|
| Platform | Kaggle Notebook |
| GPU | Tesla T4 ×2 (Dual GPU) |
| Python | 3.10+ |
| Libraries | PyTorch · NumPy · Pandas · Regex · Gradio · Matplotlib |
| Forbidden | scikit-learn (`TfidfVectorizer`, `LabelEncoder`, etc.) |
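
Since scikit-learn's `LabelEncoder` and one-hot utilities are off-limits, the encoders have to be written by hand. The sketch below is one possible pure-Python version, not the project's implementation. It assumes codes follow sorted category order by default (scikit-learn's convention) with an optional explicit order, which suits an ordinal column like priority. The class names and the example priority/channel values are hypothetical.

```python
class SimpleLabelEncoder:
    """Maps each category to an integer; order defaults to the sorted unique values."""

    def fit(self, values, order=None):
        self.classes_ = list(order) if order is not None else sorted(set(values))
        self.index_ = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Raises KeyError for categories not seen during fit.
        return [self.index_[v] for v in values]


class SimpleOneHotEncoder:
    """Maps each category to a binary vector with a single 1 at its label index."""

    def fit(self, values):
        self.labels = SimpleLabelEncoder().fit(values)
        return self

    def transform(self, values):
        width = len(self.labels.classes_)
        rows = []
        for idx in self.labels.transform(values):
            row = [0] * width
            row[idx] = 1
            rows.append(row)
        return rows


# Hypothetical column values; an explicit order keeps priority codes ordinal.
priority = ["Low", "High", "Critical", "Medium", "Low"]
prio_enc = SimpleLabelEncoder().fit(priority, order=["Low", "Medium", "High", "Critical"])
prio_codes = prio_enc.transform(priority)

channel = ["Email", "Phone", "Chat", "Email"]
chan_enc = SimpleOneHotEncoder().fit(channel)
chan_vecs = chan_enc.transform(channel)
```

Passing an explicit order matters for priority: plain alphabetical sorting would place "Critical" before "Low", which breaks the natural ranking.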

📦 Kaggle Datasets Required

| Dataset | Kaggle Slug |
|---|---|
| Customer Support Tickets | `suraj520/customer-support-ticket-dataset` |
| GloVe 6B 300d | `thanakomsn/glove6b300dtxt` |

🚀 How to Run on Kaggle

1. Upload `HSRIS_FINAL.ipynb` to Kaggle: New Notebook → Import.
2. Add both datasets (right panel → + Add Data):
   - `suraj520/customer-support-ticket-dataset`
   - `thanakomsn/glove6b300dtxt`
3. Enable dual GPUs: Settings → Accelerator → GPU T4 x2.
4. Click Run All. The public Gradio URL appears at the bottom of the notebook.

📐 Architecture

Query String
     │
     ├──► Tokenize ──► TF-IDF Vector ──► Cosine Sim ──► TF-IDF Scores
     │                                                        │
     └──► Tokenize ──► GloVe Embedding ──► Cosine Sim ──► GloVe Scores
                                                              │
                          α × TF-IDF + (1-α) × GloVe  ◄──────┘
                                      │
                               Top-K Ranked Results
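
The fusion step at the bottom of the diagram can be sketched as a weighted sum followed by a top-K sort. This assumes both score lists are cosine similarities over the same ticket order and that α lies in [0, 1]; the function name, `alpha=0.5` default, and toy scores are illustrative, not the notebook's tuned values.

```python
def hybrid_top_k(tfidf_scores, glove_scores, alpha=0.5, k=5):
    """Rank tickets by alpha * tfidf + (1 - alpha) * glove and return the top k."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(tfidf_scores) != len(glove_scores):
        raise ValueError("score lists must cover the same tickets")
    fused = [alpha * s + (1 - alpha) * g for s, g in zip(tfidf_scores, glove_scores)]
    # Sort ticket indices by fused score, highest first (ties keep original order).
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return [(i, fused[i]) for i in order[:k]]


# Hypothetical per-ticket scores for one query.
tfidf = [0.50, 0.00, 0.13, 0.05]
glove = [0.80, 0.20, 0.60, 0.90]
top = hybrid_top_k(tfidf, glove, alpha=0.5, k=2)
```

α = 1 reduces to pure keyword search and α = 0 to pure semantic search. Because the two similarity scales can differ (TF-IDF cosines are often near zero, averaged-embedding cosines often high), α usually needs tuning rather than being fixed at 0.5.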

📊 Deliverables

| # | Deliverable | Location |
|---|---|---|
| 1 | Jupyter Notebook (all tasks) | `HSRIS_FINAL.ipynb` |
| 2 | Execution Time vs. Batch Size Plot | Cell 27 → `execution_time_plot.png` |
| 3 | Precision@5 Report Table | Cell 28 |
| 4 | Precision@5 Bar Chart | Cell 28 → `precision_at5_chart.png` |
| 5 | Five Qualitative Examples Where GloVe Beats TF-IDF | Cell 29 |
| 6 | Gradio App Live Link | Cell 32 → printed URL |
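
Precision@5 is the fraction of the top five retrieved tickets that are relevant, averaged over all queries. The README does not say how relevance is judged, so this sketch assumes a set of relevant ticket IDs per query is already available; the function names and toy runs are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k ranked IDs that are relevant (always divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=5):
    """Average P@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Hypothetical ranked results and relevance sets for two queries.
runs = [
    ([3, 7, 1, 9, 4, 2], {3, 1, 4}),  # 3 relevant in the top 5
    ([5, 6, 8, 0, 2, 1], {1}),        # only relevant ticket ranked 6th
]
p5 = mean_precision_at_k(runs, k=5)
```

Dividing by k rather than by the number of results returned follows the usual definition, so a method that returns fewer than five tickets is penalised.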

🔑 Key Design Decisions

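- No sklearn: TF-IDF, label encoding, and one-hot encoding are all implemented from scratch with NumPy/PyTorch.
- Sparse tensors: the TF-IDF matrix is stored as a `torch.sparse_coo_tensor`, so only non-zero weights occupy GPU VRAM.
- TF-IDF-weighted GloVe: each word vector is weighted by its TF-IDF score before averaging, so rare, informative terms dominate the ticket embedding. This SIF-inspired weighting is expected to give more accurate embeddings than a simple unweighted mean.
- `nn.DataParallel`: replicates the model on both T4 GPUs, splits each input batch along its first dimension, and gathers the outputs on the first GPU.
- Gradio: runs inside a Kaggle notebook, and `launch(share=True)` provides a temporary public URL.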
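
The TF-IDF-weighted GloVe pooling can be sketched in pure Python as below. It assumes a word-to-vector dict (toy 3-d vectors stand in for the real 300-d GloVe file), that each token's weight is its count times its IDF, and that out-of-vocabulary tokens are skipped; names, vectors, and IDF values are illustrative only. It is only inspired by SIF: the original SIF method weights words by a/(a + p(w)) and also removes a common principal component, which this sketch does not do.

```python
import math
from collections import Counter


def weighted_sentence_vector(tokens, vectors, idf):
    """TF-IDF-weighted mean of word vectors; tokens without a vector or IDF are skipped."""
    counts = Counter(t for t in tokens if t in vectors and t in idf)
    if not counts:
        return None  # no usable tokens
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, c in counts.items():
        w = c * idf[tok]  # term frequency times IDF
        weight_sum += w
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy stand-ins for GloVe vectors and corpus IDF values.
vectors = {
    "refund": [0.9, 0.1, 0.0],
    "money": [0.8, 0.2, 0.1],
    "back": [0.3, 0.3, 0.3],
    "battery": [0.0, 0.1, 0.9],
}
idf = {"refund": 2.0, "money": 1.5, "back": 0.5, "battery": 2.0}

query = weighted_sentence_vector(["money", "back"], vectors, idf)
ticket_a = weighted_sentence_vector(["refund"], vectors, idf)
ticket_b = weighted_sentence_vector(["battery"], vectors, idf)
```

In this toy setup the query "money back" shares no token with the "refund" ticket, so TF-IDF would score it zero, yet the weighted embeddings place them close together. This is the kind of case the qualitative GloVe-over-TF-IDF examples are meant to show.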

👤 Author

Built as part of an academic NLP assignment on Information Retrieval systems.
