Engineering Knowledge Assistant — RAG Pipeline

A production-grade Retrieval-Augmented Generation (RAG) pipeline grounded in published research on railroad condition monitoring, DAS signal processing, and AI-driven manufacturing optimization.

🔗 Try it live → huggingface.co/spaces/arifme071/engineering-knowledge-rag

What it does

This app lets you query a knowledge base built from peer-reviewed research papers using natural language. Type a question, and the pipeline:

1. Retrieves the most semantically relevant passages from the knowledge base (FAISS + SentenceTransformers)
2. Augments a prompt with those passages as context
3. Generates a grounded, citation-backed answer (HuggingFace Flan-T5)

Every answer is anchored to passages retrieved from real research, which limits hallucination about domain specifics.
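
The augmentation step described above can be sketched as follows. This is a minimal illustration, not the repository's actual prompt builder: the function name `build_prompt`, the chunk keys (`text`, `source`), and the instruction wording are all assumptions made for the example.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble retrieved passages and the user question into one prompt.

    Each chunk is assumed (hypothetically) to be a dict with "text" and
    "source" keys; the real pipeline may store passages differently.
    """
    # Number each passage so the answer can cite it as [1], [2], ...
    context_lines = [
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite passages by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder passages, not real excerpts from the knowledge base.
example_chunks = [
    {"source": "Paper A", "text": "First retrieved passage."},
    {"source": "Paper B", "text": "Second retrieved passage."},
]
prompt = build_prompt("What does the first passage say?", example_chunks)
print(prompt)
```

Keeping the source label next to each passage is what would let the generated answer point back to a specific paper.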

Architecture

User Query
    │
    ▼
┌─────────────────────────┐
│   SentenceTransformers  │  → Embed query (MiniLM-L6-v2, 384-dim)
│   (all-MiniLM-L6-v2)    │
└────────────┬────────────┘
             │ query vector
             ▼
┌─────────────────────────┐
│    FAISS Vector Index   │  → Cosine similarity search (Inner Product)
│    (IndexFlatIP)        │    Returns top-k most relevant chunks
└────────────┬────────────┘
             │ retrieved chunks + scores
             ▼
┌─────────────────────────┐
│    Prompt Builder       │  → Assembles context + question into LLM prompt
│                         │    with source attribution
└────────────┬────────────┘
             │ prompt
             ▼
┌─────────────────────────┐
│  HuggingFace Flan-T5    │  → Generates grounded answer
│  (google/flan-t5-base)  │    Open-source, runs locally — no API key needed
└────────────┬────────────┘
             │ answer + sources
             ▼
┌─────────────────────────┐
│   Streamlit Chat UI     │  → Displays answer with expandable source cards
│                         │    showing relevance scores and paper metadata
└─────────────────────────┘
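
To make the retrieval stage concrete, here is a pure-Python stand-in for what an exact inner-product index computes. It is not FAISS code: the helper names `normalize`, `inner_product`, and `top_k` are hypothetical, and the 3-dimensional vectors are toy values in place of real 384-dimensional embeddings. It illustrates that inner product equals cosine similarity once vectors are scaled to unit length.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def inner_product(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: list[float], chunks: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Exhaustively score every chunk and return the k best (index, score) pairs."""
    q = normalize(query)
    scored = [(i, inner_product(q, normalize(c))) for i, c in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" standing in for real chunk vectors.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
results = top_k([2.0, 0.0, 0.0], chunk_vectors, k=2)
print(results)
```

Because the query `[2.0, 0.0, 0.0]` is normalized before scoring, its length has no effect on the ranking; only its direction matters.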

Knowledge Base

The app ships with a built-in knowledge base drawn from 7 peer-reviewed publications (184+ citations), including:

| Paper | Venue | Year | Domain |
|---|---|---|---|
| CNN-LSTM-SW for Railroad Anomaly Detection via DAS | Elsevier GEITS | 2024 | Railroad AI |
| DAS-based Railroad CM with GRU/LSTM | SPIE JARS | 2024 | DAS Signal Processing |
| Review of DAS Applications for Railroad CM | Elsevier MSSP | 2023 | DAS Review |
| HMM-RL for WAAM Intelligent Control | Springer | 2026 | Manufacturing AI |

You can extend the knowledge base by dropping your own PDFs into data/papers/. The pipeline ingests and indexes them at startup when no cached index exists in data/index/.
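
Ingestion typically splits each document into overlapping chunks before embedding. The sketch below shows one common approach, word-based windows with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions for illustration; the repository's `document_loader.py` may chunk differently.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of chunk_size words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_text(sample, chunk_size=4, overlap=1)
print(pieces)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of indexing some words twice.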

Example queries

How does the CNN-LSTM sliding window correct misclassifications?
What are the four condition classes in the DAS railroad dataset?
How does SMOTE handle class imbalance in the training data?
What is the difference between GRU and LSTM for DAS signal processing?
How does the HMM-RL pipeline optimize WAAM manufacturing?
What features are extracted from DAS signals for model training?
What is the ROC AUC score for each condition class?
How does distributed acoustic sensing work on railroad tracks?

Repository Structure

engineering-knowledge-rag/
├── app/
│   └── main.py                        # Streamlit UI — chat interface, settings sidebar
│
├── src/
│   ├── retrieval/
│   │   └── rag_pipeline.py            # Core RAG: embed → FAISS search → LLM generate
│   ├── ingestion/
│   │   ├── builtin_knowledge.py       # Pre-loaded paper excerpts (works offline)
│   │   └── document_loader.py         # PDF/TXT ingestion + chunking pipeline
│   └── utils/
│       └── ui_helpers.py              # Source cards, example query buttons
│
├── data/
│   ├── papers/                        # Drop your PDFs here to extend knowledge base
│   └── index/                         # Auto-generated FAISS index (git-ignored)
│
├── Dockerfile                         # Container for HuggingFace Spaces deployment
├── requirements.txt
├── .gitignore
└── README.md

Quickstart

Run locally

git clone https://github.com/arifme071/engineering-knowledge-rag.git
cd engineering-knowledge-rag
pip install -r requirements.txt
streamlit run app/main.py

Open http://localhost:8501. The app starts with the built-in knowledge base, so no documents need to be ingested first.

Add your own documents

# Drop PDFs into data/papers/
cp your_paper.pdf data/papers/

# Delete cached index so it rebuilds
rm -rf data/index/

# Rerun — auto-ingests on startup
streamlit run app/main.py
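
One plausible reading of the steps above is that the app rebuilds its index only when no cached one exists. The sketch below shows a startup check of that shape. It is a guess at the logic, not the repository's code: `should_rebuild_index` and the file name `faiss.index` are hypothetical.

```python
from pathlib import Path
import tempfile


def should_rebuild_index(index_dir: Path) -> bool:
    """Return True when the index directory is missing or has no entries."""
    if not index_dir.is_dir():
        return True
    return not any(index_dir.iterdir())


# Demonstrate with a throwaway directory instead of the real data/index/.
with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "index"
    missing = should_rebuild_index(index_dir)      # directory absent
    index_dir.mkdir()
    empty = should_rebuild_index(index_dir)        # directory present but empty
    (index_dir / "faiss.index").write_bytes(b"")   # hypothetical cached file
    cached = should_rebuild_index(index_dir)       # a cached file exists

print(missing, empty, cached)
```

Under this assumption, new PDFs alone would not trigger a rebuild, which is why the cached index is removed first.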

Run with Docker

docker build -t engineering-rag .
docker run -p 7860:7860 engineering-rag

Deploy to HuggingFace Spaces (free)

1. Go to huggingface.co → New Space
2. Name: engineering-knowledge-rag | SDK: Docker/Streamlit | Visibility: Public
3. Upload all files from this repo
4. HuggingFace builds and deploys automatically (~5 min)
5. Live URL: https://huggingface.co/spaces/YOUR_USERNAME/engineering-knowledge-rag

Tech Stack

| Component | Technology | Why |
|---|---|---|
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` | Fast, high-quality 384-dim embeddings, runs on CPU |
| Vector store | FAISS `IndexFlatIP` | Exact inner-product search (equals cosine similarity on L2-normalized embeddings), no server needed |
| LLM | `google/flan-t5-base` | Open-source, instruction-tuned, zero API cost |
| Frontend | Streamlit | Shareable demo, native chat components |
| Deployment | HuggingFace Spaces + Docker | Free hosting, permanent public URL |

Extending the Pipeline

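Swap the LLM. Other HuggingFace models can be substituted by changing the model name. Pick one of the options below (the second is shown commented out). Flan-T5 is an encoder-decoder model while Mistral is decoder-only, so the model-loading code may need to change along with the name:

llm_model: str = "mistralai/Mistral-7B-Instruct-v0.1"    # Better quality, needs GPU
# llm_model: str = "google/flan-t5-large"                # Bigger, still CPU-friendly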
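
The `llm_model: str = ...` form suggests a typed configuration field. As a sketch under that assumption, a dataclass like the one below would let callers override the model without editing source. `RAGConfig`, `embed_model`, and `top_k` are hypothetical names, not confirmed parts of the repository.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RAGConfig:
    # Model defaults come from the Tech Stack section; field names are illustrative.
    embed_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    llm_model: str = "google/flan-t5-base"
    top_k: int = 3  # assumed default, not stated in this README


default_cfg = RAGConfig()
# Swap only the LLM, leaving the other settings untouched.
large_cfg = replace(default_cfg, llm_model="google/flan-t5-large")
print(large_cfg)
```

A frozen dataclass keeps each configuration immutable, so a settings sidebar could build a new config per request rather than mutating shared state.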

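Use a hosted API such as OpenAI or Anthropic instead (OpenAI shown):

import os

from openai import OpenAI

# Read the API key from the environment rather than hard-coding it.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.chat.completions.create(model="gpt-4o", messages=[...])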

Persistent vector store with ChromaDB:

import chromadb
client = chromadb.PersistentClient(path="data/chroma")

Related Repositories

railroad-anomaly-detection-cnn-lstm — CNN-LSTM-SW paper repo (Elsevier GEITS 2024)
📚 Google Scholar — Full publication list

Author

Md Arifur Rahman
PIN Fellow (AI in Manufacturing) · Georgia Tech | MSc Applied Engineering · Georgia Southern University

License

MIT License — see LICENSE for details.
