pdfBot is a privacy-first, fully local Retrieval-Augmented Generation (RAG) chatbot that allows you to chat interactively with your PDF documents. Built entirely with open-source tools, it runs completely offline, ensuring your sensitive data never leaves your machine.
- 100% Local & Private: No API keys required. Powered by a local Llama 2 model running on CTransformers.
- Conversational UI: A sleek, interactive web interface built with Chainlit.
- Document Processing: Automatically loads, splits, and processes PDF documents placed in a local directory.
- Efficient Retrieval: Uses `HuggingFaceEmbeddings` and FAISS for fast, accurate local vector search.
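The retrieval step boils down to embedding the question and ranking stored chunk vectors by similarity. A minimal, library-free sketch of that idea (the toy vectors and `top_k` helper are illustrative only; the real app delegates this to HuggingFace embeddings and FAISS):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k stored chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional "embeddings" standing in for real 768-dim model outputs.
chunks = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.0, 0.0]
print(top_k(query, chunks))  # → [0, 2]: those chunks point nearly the same way
```

FAISS does the same ranking over hundreds of thousands of vectors using optimized index structures instead of a linear scan.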
- Orchestration: LangChain
- UI Framework: Chainlit
- Local LLM Inference: CTransformers
- Vector Store: FAISS
- Embeddings: HuggingFace (`sentence-transformers/all-mpnet-base-v2`)
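Before anything is embedded, each PDF's text is split into overlapping chunks so retrieved passages fit the LLM's context window. A simplified, library-free sketch of that splitting step (chunk sizes are illustrative; the project itself relies on LangChain's text splitters):

```python
def split_text(text, chunk_size=100, chunk_overlap=20):
    """Split text into fixed-size character chunks with overlap, so content
    cut at a chunk boundary still appears intact in a neighboring chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 250
print([len(c) for c in split_text(doc)])  # → [100, 100, 90]
```

The overlap is what keeps a sentence that straddles a boundary retrievable as a whole from at least one chunk.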
- Python 3.8+
- Download the LLM: You will need to download the `llama-2-7b-chat.Q5_K_M.gguf` model (or update `app.py` to point to your preferred `.gguf` model) and place it in the root directory. You can find this model on HuggingFace (e.g., in TheBloke's repositories).
- Clone this repository:

  ```bash
  git clone https://github.com/tilakraj0308/pdfbot.git
  cd pdfbot
  ```

- Create and activate a virtual environment (optional but recommended):

  ```bash
  python -m venv myenv
  source myenv/bin/activate  # On Windows use `myenv\Scripts\activate`
  ```

- Install the required dependencies:
  ```bash
  pip install langchain chainlit ctransformers faiss-cpu huggingface-hub sentence-transformers pypdf
  ```

  (Note: use `faiss-gpu` instead of `faiss-cpu` if you have a compatible NVIDIA GPU.)

- Add your documents: Place all the `.pdf` files you want to chat with inside the `data/` directory.

- Create the vector database: Run the data ingestion script. This will process your PDFs, generate embeddings, and save the FAISS database locally in the `db/` folder.

  ```bash
  python data_store.py
  ```

- Start the chatbot: Launch the Chainlit interface to start interacting with your documents.

  ```bash
  chainlit run app.py -w
  ```

  The UI will open in your default web browser (usually at http://localhost:8000).
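At query time, a RetrievalQA-style chain essentially "stuffs" the retrieved chunks into a prompt before handing it to the local Llama 2 model. A simplified, library-free sketch of that step (the template and function names are illustrative, not LangChain internals):

```python
PROMPT_TEMPLATE = """Use the following context to answer the question.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question, retrieved_chunks):
    """Join the retrieved chunks and fill the QA prompt template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    "What is the refund policy?",
    ["Refunds are issued within 30 days.", "Contact support to start a refund."],
)
print(prompt)
```

The grounding instruction in the template is what keeps the chatbot's answers tied to your PDFs rather than the model's general training data.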
- `app.py`: The main Chainlit application and LangChain RetrievalQA setup.
- `data_store.py`: Script to load PDFs, chunk text, create embeddings, and build the FAISS index.
- `data/`: Directory where you drop your input PDF files.
- `db/`: Directory where the local FAISS vector database is saved.