A simple question-answering bot using BERT that can find answers in documents.
You give it a document and ask questions about it. The model reads through the text and pulls out answers.
- Install dependencies:
pip install transformers torch
- Run the notebook:
jupyter notebook bert_qa_note.ipynb
- Load a document, ask questions, see answers. That's it.
- Loader: Reads .txt files
- Preprocessor: Cleans up the text (removes LaTeX, fixes spacing, etc.)
- Chunker: Splits long documents into chunks so the model can handle them
- QA Model: Uses BERT to find where the answer is
- Visualizer: Shows which tokens got the highest scores
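To make the Preprocessor and Chunker steps more concrete, here is a rough sketch of what they might look like. The names `clean_text` and `chunk_words`, the regexes, and the word-based chunk sizes are assumptions for illustration, not the actual contents of preprocess.py or chunker.py (which may split on tokens rather than words).

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleaner: drop simple LaTeX markup and normalize spacing."""
    # Remove inline math such as $QK^T$ (single-line, non-nested only).
    text = re.sub(r"\$[^$\n]*\$", " ", text)
    # Remove LaTeX command names such as \textbf, keeping the braced text.
    text = re.sub(r"\\[a-zA-Z]+\*?", " ", text)
    text = text.replace("{", " ").replace("}", " ")
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Hypothetical chunker: overlapping windows of whitespace-separated words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap is there so an answer that straddles a chunk boundary still appears whole in at least one chunk.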
bert_qa/
├── qa_model.py # The actual model
├── loader.py # Load documents
├── chunker.py # Split text into chunks
├── preprocess.py # Clean text
└── visualize.py # Plot results
data/
└── transformer_wiki.txt # Sample document
bert_qa_note.ipynb # Try it here
Works best with questions that have clear answers in the text. If the question or document is very different from the training data, answers may be wrong or incomplete.
The model is fine-tuned on SQuAD, so it's pretty good at extractive QA (pulling answers directly from text).
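For a feel of what "pulling answers directly from text" means: a BERT QA head produces one start score and one end score per token, and the answer is the span whose start and end scores add up highest. The sketch below shows that selection step on made-up scores. `best_span` and `max_answer_len` are hypothetical names, and qa_model.py may do this differently (for example by relying on the library's own post-processing).

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end, score) of the highest-scoring valid span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens. Scores here are plain floats (e.g. logits).
    """
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        last = min(i + max_answer_len, len(end_scores))
        for j in range(i, last):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best[0], best[1], best_score


# Made-up scores, only to show the mechanics.
tokens = ["the", "transformer", "was", "introduced", "in", "2017"]
start = [0.1, 0.2, 0.0, 0.1, 0.3, 4.0]
end = [0.0, 0.1, 0.0, 0.2, 0.1, 5.0]
i, j, score = best_span(start, end)
answer = " ".join(tokens[i:j + 1])
print(answer)  # 2017
```

Searching over valid pairs, instead of taking the best start and best end independently, avoids spans where the end comes before the start.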
The sample document can be swapped for another one. Currently only .txt files are supported, but the loader could be extended to accept CSV, PDF, etc.
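One way such an extension could look is a small dispatch table keyed on file extension. This is only a sketch: `load_document`, `load_txt`, `load_csv`, and `LOADERS` are hypothetical names and not the current loader.py. PDF is left out because it would need a third-party library.

```python
import csv
from pathlib import Path


def load_txt(path: Path) -> str:
    """Read a plain-text file as UTF-8."""
    return path.read_text(encoding="utf-8")


def load_csv(path: Path) -> str:
    """Flatten a CSV file into text: one line per row, cells joined by spaces."""
    with path.open(newline="", encoding="utf-8") as f:
        return "\n".join(" ".join(row) for row in csv.reader(f))


# Hypothetical registry: add an entry here to support another format.
LOADERS = {
    ".txt": load_txt,
    ".csv": load_csv,
}


def load_document(path) -> str:
    """Pick a loader based on the file extension (case-insensitive)."""
    path = Path(path)
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    return loader(path)
```

With this shape, the rest of the pipeline keeps receiving a single string, so the preprocessor and chunker would not need to change.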